94 datasets found

Manipal Image Sentiment Analysis Dataset
figshare.com
search.datacite.org
xlsx
Updated Jan 20, 2016
Cite
Stuti Jindal; Sanjay Singh (2016). Manipal Image Sentiment Analysis Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.1496534.v2
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.1496534.v2
Dataset updated
Jan 20, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
Stuti Jindal; Sanjay Singh
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Manipal
Description
This dataset was created through a survey in which 267 undergraduate and postgraduate students of Manipal Institute of Technology annotated 1000 images with a sentiment score on a 7-point scale. Each image was presented to at least three annotators. After collecting all the annotations, we took the majority vote of the three scores for each image; that is, an image annotation is considered valid only when at least two of the three annotators agree on the exact label (out of 7 labels). The dataset uses the following sentiment label map: 1: Depressed, 2: Very Sad, 3: Sad, 4: Neutral, 5: Happy, 6: Very Happy, 7: Excited.
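The majority-vote rule described above can be sketched in Python as follows. This is only an illustration, not the authors' code: the function name `majority_label` and the `min_agree` parameter are hypothetical, and the sketch assumes exactly three scores per image.

```python
from collections import Counter
from typing import Optional, Sequence

# Label map as given in the dataset description.
LABELS = {
    1: "Depressed", 2: "Very Sad", 3: "Sad", 4: "Neutral",
    5: "Happy", 6: "Very Happy", 7: "Excited",
}

def majority_label(scores: Sequence[int], min_agree: int = 2) -> Optional[int]:
    """Return the label chosen by at least `min_agree` annotators, else None.

    Hypothetical helper: assumes three scores per image, so at most one
    label can reach the agreement threshold.
    """
    if not scores:
        return None
    label, count = Counter(scores).most_common(1)[0]
    return label if count >= min_agree else None

print(majority_label([5, 5, 6]))  # two of three agree on 5 (Happy)
print(majority_label([1, 4, 7]))  # no agreement, annotation is invalid
```

Under this rule an image whose three annotators all disagree gets no valid label.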
Multimodal Sentiment Dataset
gts.ai
json
Updated Aug 20, 2024
+ more versions
Cite
GTS (2024). Multimodal Sentiment Dataset [Dataset]. https://gts.ai/dataset-download/multimodal-sentiment-dataset/
Available download formats: json
Dataset updated
Aug 20, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore our Multimodal Sentiment Dataset, featuring 100 diverse classes of images and corresponding texts with sentiment labels. Ideal for AI-driven sentiment analysis, image classification, and multimodal fusion tasks.
Image and text datasets for sentiment analysis
figshare.com
zip
Updated Jun 4, 2025
Cite
Chuang Dong (2025). Image and text datasets for sentiment analysis [Dataset]. http://doi.org/10.6084/m9.figshare.29234471.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.29234471.v1
Dataset updated
Jun 4, 2025
Dataset provided by
figshare
Authors
Chuang Dong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is an image and text dataset for sentiment analysis.
Data from: MOSABench: Multi-Object Sentiment Analysis Benchmark for...
ieee-dataport.org
Updated Nov 24, 2024
Cite
Shezheng Song (2024). MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image [Dataset]. https://ieee-dataport.org/documents/mosabench-multi-object-sentiment-analysis-benchmark-evaluating-multimodal-large-language
Dataset updated
Nov 24, 2024
Authors
Shezheng Song
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
image captioning
multimodal-sentiment-data
kaggle.com
Updated May 8, 2023
Cite
Suraj (2023). multimodal-sentiment-data [Dataset]. https://www.kaggle.com/datasets/suraj520/multimodal-sentiment-data/suggestions?status=pending&yourSuggestions=true
Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 8, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Suraj
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides a collection of images with their corresponding texts and sentiment labels, which makes it a multimodal sentiment analysis dataset.

The dataset contains images of 100 different classes of animals and objects, including sharks, birds, lizards, spiders, and more.

This dataset can be used for various computer vision and natural language processing tasks, such as image classification, sentiment analysis, and image captioning.
Sentiment Analysis for Movie Reviews
gts.ai
json
Updated Nov 20, 2023
Cite
GTS (2023). Sentiment Analysis for Movie Reviews [Dataset]. https://gts.ai/case-study/sentiment-analysis-for-movie-reviews/
Available download formats: json
Dataset updated
Nov 20, 2023
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The objective of sentiment analysis for movie reviews is to automatically analyze and categorize the sentiments expressed in reviews, providing insights into audience opinions, emotions, and reactions towards films.
ColorEmoNet
data.mendeley.com
Updated Jun 26, 2025
Cite
SHANKAR MALI (2025). ColorEmoNet [Dataset]. http://doi.org/10.17632/zm46z6y597.1
Unique identifier
https://doi.org/10.17632/zm46z6y597.1
Dataset updated
Jun 26, 2025
Authors
SHANKAR MALI
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The ColorEmoNet dataset has been constructed using foundational concepts from colour theory to explore the relationship between colours and emotions.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, with details as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
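Because the rows are ordered by class, a shuffle step is needed before any train/test split. The sketch below uses only the Python standard library. The column names `reviews` and `labels` come from the description above; the function name, the seed, and the tiny in-memory sample are illustrative assumptions.

```python
import csv
import io
import random

def load_shuffled_reviews(csv_file, seed=0):
    """Read review rows from an open CSV file and return them in shuffled order."""
    rows = list(csv.DictReader(csv_file))
    random.Random(seed).shuffle(rows)  # seeded so the order is reproducible
    return rows

# Tiny in-memory stand-in for data_rt.csv: negatives first, then positives.
sample = io.StringIO(
    "reviews,labels\n"
    "dull and lifeless,0\n"
    "a tedious mess,0\n"
    "warm and funny,1\n"
    "a quiet triumph,1\n"
)
shuffled = load_shuffled_reviews(sample)
print([row["labels"] for row in shuffled])
```

For the real file, the same function could be called on `open("data_rt.csv", newline="", encoding="utf-8")`; the encoding is an assumption.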
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
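The description does not say how the polarity score was turned into the categorical division column. As a hedged sketch only: TextBlob polarity scores lie in the range -1.0 to 1.0, and one plausible mapping splits them at zero. The function name, the `neutral_band` parameter, and the label strings below are assumptions, not the dataset author's scheme.

```python
def polarity_to_division(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Assumed rule: scores above `neutral_band` are positive, scores below
    `-neutral_band` are negative, everything in between is neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

for score in (0.5, 0.0, -0.2):
    print(score, polarity_to_division(score))
```

A wider `neutral_band` (for example 0.1) would treat weakly polarised reviews as neutral.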
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research): AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research): Musical_instruments_reviews2.csv (contains 7137 reviews)
Emoji Sentiment Ranking
figshare.com
txt
Updated May 30, 2023
+ more versions
Cite
Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetic (2023). Emoji Sentiment Ranking [Dataset]. http://doi.org/10.6084/m9.figshare.1600931.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1600931.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Petra Kralj Novak; Jasmina Smailović; Borut Sluban; Igor Mozetic
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A lexicon of 751 emoji characters with automatically assigned sentiment. The sentiment is computed from 70,000 tweets, labeled by 83 human annotators in 13 European languages. The Emoji Sentiment Ranking web page at http://kt.ijs.si/data/Emoji_sentiment_ranking/ is automatically generated from the data provided in this repository. The process and analysis of emoji sentiment ranking is described in the paper: P. Kralj Novak, J. Smailović, B. Sluban, I. Mozetič, Sentiment of Emojis, submitted; arXiv preprint, http://arxiv.org/abs/1509.07761, 2015.
Twitter Public Sentiment Dataset
opendatabay.com
Updated Jul 6, 2025
+ more versions
Cite
Datasimple (2025). Twitter Public Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/04ea3224-1b10-48d4-871a-496c9a2633ff
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Telecommunications & Network Data
Description
This dataset provides a collection of 1000 tweets designed for sentiment analysis. The tweets were sourced from Twitter using Python and systematically generated using various modules to ensure a balanced representation of different tweet types, user behaviours, and sentiments. This includes the use of a random module for IDs and text, a faker module for usernames and dates, and a textblob module for assigning sentiment. The dataset's purpose is to offer a robust foundation for analysing and visualising sentiment trends and patterns, aiding in the initial exploration of data and the identification of significant patterns or trends.

Columns

Tweet ID: A unique identifier assigned to each individual tweet.

Text: The actual textual content of the tweet.

User: The username of the individual who posted the tweet.

Created At: The date and time when the tweet was originally published.

Likes: The total number of likes or approvals the tweet received.

Retweets: The total count of times the tweet was shared by other users.

Sentiment: The categorised emotional tone of the tweet, typically labelled as positive, neutral, or negative.

Distribution

The dataset is provided in a CSV file format. It consists of 1000 individual tweet records, structured in a tabular layout with the columns detailed above. A sample file will be made available separately on the platform.

Usage

This dataset is ideal for:

* Analysing and visualising sentiment trends and patterns in social media.
* Initial data exploration to uncover insights into tweet characteristics and user emotions.
* Identifying underlying patterns or trends within social media conversations.
* Developing and training machine learning models for sentiment classification.
* Academic research into Natural Language Processing (NLP) and social media dynamics.
* Educational purposes, allowing students to practise data analysis and visualisation techniques.

Coverage

The dataset spans tweets created between January and April 2023, as observed from the included data samples. While specific geographic or demographic information for users is not available within the dataset, the nature of Twitter implies a general global scope, reflecting a variety of user behaviours and sentiments without specific regional or population group focus.

License

CC0

Who Can Use It

This dataset is valuable for:

* Data Scientists and Machine Learning Engineers working on NLP tasks and model development.
* Researchers in fields such as Natural Language Processing, Machine Learning Algorithms, Deep Learning, and Computer Science.
* Data Analysts looking to extract insights from social media content.
* Academics and Students undertaking projects related to sentiment analysis or social media studies.
* Anyone interested in understanding online sentiment and user behaviour on social media platforms.

Dataset Name Suggestions

Twitter Public Sentiment Dataset

Social Media Text Sentiment Analysis

General Tweet Mood Data

Twitter Sentiment Collection 2023

Microblog Sentiment Dataset

Attributes

Original Data Source: Twitter Sentiment Analysis using Roberta and Vader
COVID-19 Bimodal Sentiment Analysis Dataset
zenodo.org
bin, json
Updated Apr 1, 2025
Cite
Hoque Anik Md Saidul (2025). COVID-19 Bimodal Sentiment Analysis Dataset [Dataset]. http://doi.org/10.5281/zenodo.15117479
Available download formats: json, bin
Unique identifier
https://doi.org/10.5281/zenodo.15117479
Dataset updated
Apr 1, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hoque Anik Md Saidul
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 31, 2025
Description
A Twitter sentiment bimodal analysis code and dataset using text and image before and during COVID-19
Czech image captioning, machine translation, sentiment analysis and...
live.european-language-grid.eu
Updated Jan 6, 2020
+ more versions
Cite
(2020). Czech image captioning, machine translation, sentiment analysis and summarization (Neural Monkey models) [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/18236
Dataset updated
Jan 6, 2020
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment analysis, and summarization. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd

In addition to the models presented in the referenced paper (developed and published in 2018), we include models for automatic news summarization for Czech and English developed in 2019. The Czech models were trained using the SumeCzech dataset (https://www.aclweb.org/anthology/L18-1551.pdf), the English models were trained using the CNN-Daily Mail corpus (https://arxiv.org/pdf/1704.04368.pdf) using the standard recurrent sequence-to-sequence architecture.

There are several separate ZIP archives here, each containing one model solving one of the tasks for one language.

To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory.

Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc.

For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/

For the machine translation, you do not need to tokenize the data, as this is done by the model.

For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py); you need to specify the path to ResNet and to the TensorFlow models to this script

The summarization models require input that is tokenized with Moses Tokenizer (https://github.com/alvations/sacremoses) and lower-cased.

Feel free to contact the authors of this submission in case you run into problems!
Multimodal Sentiment Analysis for Urdu Language
ieee-dataport.org
Updated Dec 2, 2024
Cite
Ghulam Rabbani (2024). Multimodal Sentiment Analysis for Urdu Language [Dataset]. https://ieee-dataport.org/documents/multimodal-sentiment-analysis-urdu-language
Dataset updated
Dec 2, 2024
Authors
Ghulam Rabbani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
natural language processing
IFEED: Interactive Facial Expression and Emotion Detection Dataset
data.niaid.nih.gov
zenodo.org
Updated May 26, 2023
Cite
Oliveira, Nuno (2023). IFEED: Interactive Facial Expression and Emotion Detection Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7963451
Dataset updated
May 26, 2023
Dataset provided by
Vitorino, João
Oliveira, Nuno
Oliveira, Jorge
Praça, Isabel
Maia, Eva
Dias, Tiago
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interactive Facial Expression and Emotion Detection (IFEED) is an annotated dataset that can be used to train, validate, and test Deep Learning models for facial expression and emotion recognition. It contains pre-filtered and analysed images of the interactions between the six main characters of the Friends television series, obtained from the video recordings of the Multimodal EmotionLines Dataset (MELD).

The images were obtained by decomposing the videos into multiple frames and extracting the facial expression of the correctly identified characters. A team composed of 14 researchers manually verified and annotated the processed data into several classes: Angry, Sad, Happy, Fearful, Disgusted, Surprised and Neutral.

IFEED can be valuable for the development of intelligent facial expression recognition solutions and emotion detection software, enabling binary or multi-class classification, or even anomaly detection or clustering tasks. The images with ambiguous or very subtle facial expressions can be repurposed for adversarial learning. The dataset can be combined with additional data recordings to create more complete and extensive datasets and improve the generalization of robust deep learning models.
Sentiment Analysis Dataset
universe.roboflow.com
zip
Updated Dec 28, 2023
Cite
Akshat Bhandari (2023). Sentiment Analysis Dataset [Dataset]. https://universe.roboflow.com/akshat-bhandari-h45vx/sentiment-analysis/dataset/7
Available download formats: zip
Dataset updated
Dec 28, 2023
Dataset authored and provided by
Akshat Bhandari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Boxes Polygons
Description
Sentiment Analysis

## Overview

Sentiment Analysis is a dataset for instance segmentation tasks - it contains Boxes annotations for 324 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Comparison of results obtained through CAGT by analyzing image discrepancies...
plos.figshare.com
xls
Updated May 27, 2025
Cite
Xiaocong Jiang; Ahmad Edwin bin Mohamed; Amirul Husni bin Affifudin (2025). Comparison of results obtained through CAGT by analyzing image discrepancies with those obtained from a single perspective. [Dataset]. http://doi.org/10.1371/journal.pone.0324148.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0324148.t002
Dataset updated
May 27, 2025
Dataset provided by
PLOS ONE
Authors
Xiaocong Jiang; Ahmad Edwin bin Mohamed; Amirul Husni bin Affifudin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Comparison of results obtained through CAGT by analyzing image discrepancies with those obtained from a single perspective.
Composing alt text using large language models: dataset in Russian
data.mendeley.com
Updated Jun 17, 2024
+ more versions
Cite
Yekaterina Kosova (2024). Composing alt text using large language models: dataset in Russian [Dataset]. http://doi.org/10.17632/73dptbyxbb.1
Unique identifier
https://doi.org/10.17632/73dptbyxbb.1
Dataset updated
Jun 17, 2024
Authors
Yekaterina Kosova
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains the results of developing alternative text for images using chatbots based on large language models. The study was carried out in April-June 2024. Microsoft Copilot, Google Gemini, and YandexGPT chatbots were used to generate 108 text descriptions for 12 images. Descriptions were generated by the chatbots using keywords specified by a person. Experts then rated the resulting descriptions on a Likert scale (from 1 to 5). The data set is presented in a Microsoft Excel table on the “Data” sheet with the following fields: record number; image number; chatbot; image type (photo, logo); request date; list of keywords; number of keywords; length of keywords; time of compilation of keywords; generated descriptions; required length of descriptions; actual length of descriptions; description generation time; usefulness; reliability; completeness; accuracy; literacy. The “Images” sheet contains links to the original images. The data set is presented in Russian.

BanglaBook Dataset

paperswithcode.com

Updated May 10, 2023

+ more versions

Cite

Mohsinul Kabir; Obayed Bin Mahfuz; Syed Rifat Raiyan; Hasan Mahmud; Md Kamrul Hasan (2023). BanglaBook Dataset [Dataset]. https://paperswithcode.com/dataset/banglabook

Dataset updated

May 10, 2023

Authors

Mohsinul Kabir; Obayed Bin Mahfuz; Syed Rifat Raiyan; Hasan Mahmud; Md Kamrul Hasan

Description

This repository contains the code, data, and models of the paper titled "BᴀɴɢʟᴀBᴏᴏᴋ: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews" published in the Findings of the Association for Computational Linguistics: ACL 2023.

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

Data Format

Each row consists of a book review sample. The table below describes what each column signifies.

Column Title	Description
id	The unique identification number of the sample
Book_Name	The title of the book that has been evaluated by the review
Writer_Name	The name of the book's author
Category	The genre to which the book belongs
Rating	A score reflecting the reviewer's subjective assessment of the book's quality: a numerical value $r$ such that $1 \leq r \leq 5$
Review	The review text written by the reviewer
Site	The name of the online bookshop
sentiment	The conveyed sentiment and class label of the review. For a review sample $i$ with rating $r_i$, the sentiment label $S_i$ is $S_i = \begin{cases}\text{Negative}, & \text{if } r_i \leq 2 \\ \text{Neutral}, & \text{if } r_i = 3 \\ \text{Positive}, & \text{if } r_i \geq 4\end{cases}$
label	The numerical representation of the sentiment label. For a review sample $i$ with sentiment label $S_i$, the numerical label is $\mathrm{label}_i = \begin{cases}0, & \text{if } S_i = \text{Negative} \\ 1, & \text{if } S_i = \text{Neutral} \\ 2, & \text{if } S_i = \text{Positive}\end{cases}$
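
The two piecewise definitions in the table can be written as a short Python sketch. The function and constant names are illustrative and are not taken from the BanglaBook repository. Returning None for a missing rating reflects the statement in the description that reviews without a rating are treated as unannotated.

```python
from typing import Optional

# Numerical label for each sentiment class, as defined in the table above.
SENTIMENT_TO_LABEL = {"Negative": 0, "Neutral": 1, "Positive": 2}

def rating_to_sentiment(rating: Optional[int]) -> Optional[str]:
    """Map a 1-5 rating to a sentiment class; None means unannotated."""
    if rating is None:
        return None
    if not 1 <= rating <= 5:
        raise ValueError("rating must satisfy 1 <= r <= 5")
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"

for r in (1, 3, 5):
    sentiment = rating_to_sentiment(r)
    print(r, sentiment, SENTIMENT_TO_LABEL[sentiment])
```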

Data Construction

Data Collection Process

For the data collection and preparation process of the BᴀɴɢʟᴀBᴏᴏᴋ dataset, we first compile a list of URLs for authors from online bookstores. From there, we procure URLs for the books. We meticulously scrape information such as book titles, author names, book categories, review texts, reviewer names, review dates, and ratings by utilizing these book URLs.

Figure: https://github.com/mohsinulkabir14/BanglaBook/raw/main/images/banglabookgithub1.png

Labeling, Translation, and Validation of the Curated Samples

If a review does not have a rating, we deem it unannotated. Reviews with a rating of 1 or 2 are classified as negative, a rating of 3 is considered neutral, and a rating of 4 or 5 is classified as positive. After discarding the unannotated reviews, we curate a final dataset of 158,065 annotated reviews. Of these, 89,371 are written entirely in Bangla. The remaining 68,694 reviews were written in Romanized Bangla, English, or a mix of languages. They are translated into Bangla with Google Translator and a custom Python program using the googletrans library. The translations are subsequently subjected to manual review and scrutiny to confirm their accuracy.

Figure: https://github.com/mohsinulkabir14/BanglaBook/raw/main/images/banglabookgithub2.png

Results

Figure: https://github.com/mohsinulkabir14/BanglaBook/raw/main/images/banglabookgithub3.png

Citation

If you find this work useful, please cite our paper:

@inproceedings{kabir-etal-2023-banglabook,
    title = "{B}angla{B}ook: A Large-scale {B}angla Dataset for Sentiment Analysis from Book Reviews",
    author = "Kabir, Mohsinul and Bin Mahfuz, Obayed and Raiyan, Syed Rifat and Mahmud, Hasan and Hasan, Md Kamrul",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.80",
    pages = "1237--1247",
    abstract = "The analysis of consumer sentiment, as expressed through reviews, can provide a wealth of insight regarding the quality of a product. While the study of sentiment analysis has been widely explored in many popular languages, relatively less attention has been given to the Bangla language, mostly due to a lack of relevant data and cross-domain adaptability. To address this limitation, we present BanglaBook, a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We provide a detailed statistical analysis of the dataset and employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features, emphasizing the necessity for additional training resources in this domain. Additionally, we conduct an in-depth error analysis by examining sentiment unigrams, which may provide insight into common classification errors in under-resourced languages like Bangla. Our codes and data are publicly available at https://github.com/mohsinulkabir14/BanglaBook.",
}

Data from: Facial Expression Image Dataset for Computer Vision Algorithms
salford.figshare.com
Updated Apr 29, 2025
Cite
Ali Alameer; Odunmolorun Osonuga (2025). Facial Expression Image Dataset for Computer Vision Algorithms [Dataset]. http://doi.org/10.17866/rd.salford.21220835.v2
Unique identifier
https://doi.org/10.17866/rd.salford.21220835.v2
Dataset updated
Apr 29, 2025
Dataset provided by
University of Salford
Authors
Ali Alameer; Odunmolorun Osonuga
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset consists of photos of individual human emotional expressions, taken with both a digital camera and a mobile phone camera from different angles, postures, backgrounds, light exposures, and distances. The task may sound easy, but several challenges were encountered along the way, reviewed below:

1) People constraint. One of the major challenges was getting people to participate in the image capture. School was on vacation, and other individuals in the area were unwilling to let their images be captured, for personal and security reasons, even after it was explained that the project is mainly for academic research purposes. Because of this, we resorted to capturing images of the researcher and a few other willing individuals.

2) Time constraint. As with all deep learning projects, the more data available, the more accurate and less error-prone the results tend to be. At the initial stage it was agreed to collect 10 emotional expression photos from each of at least 50 people, with the option of increasing the number of photos for more accurate results. Because of the project's time constraint, it was later agreed to capture only the researcher and a few other people who were willing and available. For the same reason, photos were taken for just two types of emotional expression: "happy" and "sad" faces. As future work, photos of other facial expressions such as anger, contempt, disgust, fright, and surprise can be included if time permits.

3) The approved facial emotions captured. It was agreed to capture as many angles and postures as possible of just two facial emotions, with at least 10 images per individual. Because of the time and people constraints, only a few persons were captured, in as many postures as possible, giving the following counts:
- Happy faces: 65 images
- Sad faces: 62 images
There are many other types of facial emotion; to expand the project in the future, these can be included if time permits and people are readily available.

4) Expand further. The project can be improved in many ways; because of the limited time given to it, these improvements are left as future work. In simple terms, the project aims to detect or predict human emotion in real time, which involves creating a model that reports a percentage confidence that a facial image is happy or sad. The higher the percentage confidence, the more accurate the prediction for the facial image fed into the model.

5) Other questions. Can the model be reproduced? The answer should be yes, if and only if the model is fed with the proper data (images), such as images of other types of emotional expression.
RoMEMES v2
zenodo.org
zip
Updated May 15, 2025
Cite
Vasile Florian Pais; Daniela Gifu (2025). RoMEMES v2 [Dataset]. http://doi.org/10.5281/zenodo.15424025
Explore at:
zip (available download format)
Unique identifier
https://doi.org/10.5281/zenodo.15424025
Dataset updated
May 15, 2025
Dataset provided by
Zenodo, http://zenodo.org/
Authors
Vasile Florian Pais; Daniela Gifu
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
RoMEMESv2 is a dataset of Romanian language memes, collected from public social media platforms. The dataset was manually annotated with:

associated text in Romanian language;

image complexity;

polarity;

sentiment;

political content.

In addition, the dataset contains associated metadata and the text part was automatically annotated in the RELATE platform with part-of-speech tags, lemmas, and dependency parsing.

Files and folders in this dataset:

metadata.tsv - contains metadata and annotations; the first column is the file ID;

LICENSE - contains licensing information;

README - is this file;

images - folder containing image files, following the file naming convention ID.extension, where extension is the original file extension (sometimes this may not correspond with the mime/type of the file, as indicated in metadata.tsv);

text - folder containing text files, following the file naming convention ID.txt; this is only the message from the meme, without additional text (text from logos, unrelated text, etc.);

conllup - folder containing automatic text annotations for the files in the "text" folder, created in the RELATE platform, following the file naming convention ID.conllup;

text_complete - folder with the complete text extracted from the meme (contains additional text which may not be directly related to the meme message);

conllup_complete - folder containing automatic text annotations for the files in the "text_complete" folder, created in the RELATE platform, following the file naming convention ID.conllup.
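Because image files are named ID.extension and the extension varies, a lookup by ID has to search for any extension. The sketch below shows one way to do that; the function name `find_image` and the `root` argument (the directory where the archive was unpacked) are hypothetical, and the code assumes IDs contain no glob wildcard characters.

```python
from pathlib import Path
from typing import Optional


def find_image(root: Path, meme_id: str) -> Optional[Path]:
    """Return the image file for a meme ID, whatever its extension, or None."""
    # Files follow the ID.extension convention inside the "images" folder.
    matches = sorted((root / "images").glob(f"{meme_id}.*"))
    return matches[0] if matches else None
```

Since the extension may not match the actual MIME type, the returned path is best paired with the type recorded in metadata.tsv rather than trusted on its own.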

A first version of this corpus was released here: RoMEMES, https://doi.org/10.5281/zenodo.13120215
The current version has more data and the additional text_complete and conllup_complete folders. These are new levels of annotation, which were not available in the initial release. To maintain compatibility with existing code, the rest of the data is in the same format. Currently not all memes have the text_complete annotation. In case a text file is missing in one of the folders, use the text from the other folder.
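The fallback between the two text folders can be sketched as follows. This is an illustrative helper, not part of the dataset: the name `read_meme_text`, the `root` argument, the `prefer_complete` flag, and the UTF-8 encoding are assumptions.

```python
from pathlib import Path
from typing import Optional


def read_meme_text(root: Path, meme_id: str, prefer_complete: bool = False) -> Optional[str]:
    """Read ID.txt from the preferred folder, falling back to the other one."""
    order = ["text_complete", "text"] if prefer_complete else ["text", "text_complete"]
    for folder in order:
        candidate = root / folder / f"{meme_id}.txt"
        if candidate.is_file():
            # Encoding is assumed to be UTF-8; adjust if the files differ.
            return candidate.read_text(encoding="utf-8")
    return None  # the ID has no text file in either folder
```

Calling `read_meme_text(root, some_id, prefer_complete=True)` returns the complete text when it exists and the message-only text otherwise.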
