100+ datasets found

emotion analysis based on text Dataset
cubig.ai
Updated Feb 25, 2025
Cite
CUBIG (2025). emotion analysis based on text Dataset [Dataset]. https://cubig.ai/store/products/139/emotion-analysis-based-on-text-dataset
Dataset updated
Feb 25, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data introduction
• The emotion-analysis dataset is data for analyzing the emotions expressed in text.

2) Data utilization
(1) Emotion-analysis data has the following characteristics:
• It contains a variety of texts that convey emotions ranging from happiness to anger to sadness. The goal is to build an efficient model for detecting emotions in text.
(2) Emotion-analysis data can be used for:
• Sentiment classification models: This dataset can be used to train machine learning models that classify text by sentiment, which helps companies and researchers understand public opinion and sentiment trends.
• Market research: Researchers can analyze sentiment data to understand consumer preferences and market trends and to support data-driven decision making.
BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis
data.mendeley.com
Updated Nov 20, 2020
+ more versions
Cite
Md Ataur Rahman (2020). BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis [Dataset]. http://doi.org/10.17632/24xd7w7dhp.1
Unique identifier
https://doi.org/10.17632/24xd7w7dhp.1
Dataset updated
Nov 20, 2020
Authors
Md Ataur Rahman
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a manually annotated Bangla emotion corpus that captures the diversity of fine-grained emotion expressions in social-media text. We consider the fine-grained emotion labels Sadness, Happiness, Disgust, Surprise, Fear and Anger, which are, according to Paul Ekman (1999), the six basic emotion categories. For this task, we collected a large amount of raw text from user comments on two Facebook groups (Ekattor TV and Airport Magistrates) and from the public posts of a popular blogger and activist, Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and to the economic successes and failures of Bangladesh. We scraped a total of 32923 comments from these three sources. Of these, 6314 comments were annotated into the six categories. The distribution of the annotated corpus is as follows:

sad = 1341
happy = 1908
disgust = 703
surprise = 562
fear = 384
angry = 1416

We have also provided a balanced set drawn from the above data and split the dataset into training and test sets of equal ratio. We used a proportion of 5:1 for training and evaluation. More information on the dataset and the experiments on it can be found in our paper (related links below).
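To make the class imbalance in the counts above concrete, here is a minimal Python sketch that computes each class's share of the annotated corpus. The inverse-frequency weights and the "cut every class to the rarest one" balancing rule are assumptions for illustration; the description does not say how the authors built their balanced set.

```python
# Class counts copied from the distribution listed in the description.
counts = {
    "sad": 1341, "happy": 1908, "disgust": 703,
    "surprise": 562, "fear": 384, "angry": 1416,
}

total = sum(counts.values())  # should equal the 6314 annotated comments

# Share of each class in the annotated corpus.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights (an assumed heuristic, not taken from the authors):
# weight = total / (number_of_classes * class_count).
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

# Size of a class-balanced subset if every class is cut to the rarest one
# (again an assumption about how balancing could be done).
balanced_size = min(counts.values()) * len(counts)

for label in counts:
    print(f"{label:9s} share={shares[label]:.3f} weight={weights[label]:.2f}")
```

Weights of this kind are one common way to keep a classifier from ignoring rare classes such as fear; downsampling to a balanced subset is the simpler alternative.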
emotion
huggingface.co
Updated Feb 16, 2023
+ more versions
Cite
DAIR.AI (2023). emotion [Dataset]. https://huggingface.co/datasets/dair-ai/emotion
Dataset updated
Feb 16, 2023
Dataset provided by
DAIR.AI
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for "emotion"

Dataset Summary

Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information please refer to the paper.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure Data Instances

An example looks as follows. { "text": "im feeling quite sad and sorry for myself but… See the full description on the dataset page: https://huggingface.co/datasets/dair-ai/emotion.
Sentiment Analysis Dataset
cubig.ai
Updated May 20, 2025
Cite
CUBIG (2025). Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/270/sentiment-analysis-dataset
Dataset updated
May 20, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction
• The Sentiment Analysis Dataset is a dataset for sentiment analysis. It contains large-scale tweet text collected from Twitter and a sentiment polarity label (0 = negative, 2 = neutral, 4 = positive) for each tweet; the labels were assigned automatically based on emoticons.

2) Data Utilization
(1) The Sentiment Analysis Dataset has the following characteristics:
• Each sample consists of six columns: sentiment polarity, tweet ID, date of writing, search word, author, and tweet body. It is suitable for training natural language processing and classification models using tweet text and sentiment labels.
(2) The Sentiment Analysis Dataset can be used for:
• Sentiment classification model development: Using tweet text and polarity labels, positive, negative, and neutral sentiment classifiers can be built with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM.
• Analysis of SNS public opinion and trends: By analyzing the distribution of sentiment over time and by keyword, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key sentiment-bearing keywords.
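The six-column layout and the 0/2/4 polarity codes described above can be read as in the following Python sketch. The column names, the sample rows, and the absence of a header row are assumptions; the description gives only the meaning and order of the columns.

```python
import csv
import io

# Polarity codes as stated in the description.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}

# Hypothetical column names; only their meaning and order come from the description.
COLUMNS = ["polarity", "tweet_id", "date", "query", "author", "text"]

# Two made-up rows standing in for the real file.
SAMPLE = (
    '"0","1001","Mon Apr 06 22:19:45 2009","NO_QUERY","user_a","my phone died again :("\n'
    '"4","1002","Mon Apr 06 22:20:03 2009","NO_QUERY","user_b","great coffee this morning :)"\n'
)

def read_rows(handle):
    """Yield one dict per tweet with a readable sentiment label added."""
    for record in csv.reader(handle):
        row = dict(zip(COLUMNS, record))
        row["sentiment"] = POLARITY[int(row["polarity"])]
        yield row

rows = list(read_rows(io.StringIO(SAMPLE)))
for row in rows:
    print(row["sentiment"], "|", row["text"])
```

For the real file, replace `io.StringIO(SAMPLE)` with an open file handle; a code outside 0, 2, and 4 would raise a `KeyError`, which is a cheap sanity check on the labels.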
A Sentiment Analysis Dataset for Code-Mixed Malayalam-English
live.european-language-grid.eu
zenodo.org
tsv
Updated Dec 13, 2021
Cite
(2021). A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7634
Available download formats: tsv
Dataset updated
Dec 13, 2021
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There is an increasing demand for sentiment analysis of social media text, which is mostly code-mixed. Systems trained on monolingual data fail on code-mixed data due to the complexity of mixing at different levels of the text. However, very few resources are available for building models specific to code-mixed data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Only a few datasets for popular language pairs such as English-Spanish, English-Hindi, and English-Chinese are available. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators. This gold standard corpus obtained a Krippendorff’s alpha above 0.8 for the dataset. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.
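The description reports a Krippendorff's alpha above 0.8. As a rough illustration of what that number measures, here is a minimal Python sketch of alpha for nominal labels, computed from the coincidence counts of label pairs within each item. The function name and the toy labels are hypothetical, and the authors' actual tooling is not stated.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists; each inner list holds the labels that the
    annotators gave to one item (items with fewer than two labels are skipped).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels within an item counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _), value in coincidences.items():
        totals[a] += value
    n = sum(totals.values())

    observed = sum(v for (a, b), v in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return 1.0  # convention used here: a single label leaves nothing to disagree on
    return 1 - (n - 1) * observed / expected

# Toy annotations from two hypothetical annotators.
perfect = [["positive", "positive"], ["negative", "negative"]]
mixed = [["a", "a"], ["b", "b"], ["a", "b"], ["b", "a"]]
print(krippendorff_alpha_nominal(perfect), krippendorff_alpha_nominal(mixed))
```

Unlike simple percent agreement, alpha corrects for the agreement expected by chance and tolerates items with differing numbers of annotators.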
Text Emotion Recognition
kaggle.com
Updated Mar 6, 2023
Cite
Shreejit Cheela (2023). Text Emotion Recognition [Dataset]. https://www.kaggle.com/shreejitcheela/text-emotion-recognition/discussion
Dataset updated
Mar 6, 2023
Dataset provided by
Kaggle
Authors
Shreejit Cheela
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Emotions play a vital role in human communication, and detecting emotions from text data is a challenging task. The ability to automatically recognize emotions from text has many practical applications, such as in sentiment analysis, social media monitoring, and customer feedback analysis.

In this project, we discuss the working principle of a text emotion recognition model and its key terminology. We also describe the model architecture and its training process in detail. Finally, we evaluate the model using a confusion matrix and a classification report. In the "emotions" column, 0 = sad and 1 = happy.

The slang.txt file used in the Abbreviations step can be taken from: https://www.kaggle.com/datasets/mansis97/slangs
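The evaluation step mentioned above (confusion matrix and classification report) can be sketched in plain Python for the binary coding 0 = sad, 1 = happy. The predictions below are made up, and the project itself may well use a library for this.

```python
# Label coding from the description: 0 = sad, 1 = happy.
LABELS = {0: "sad", 1: "happy"}

def confusion_matrix(y_true, y_pred):
    """Return matrix[t][p] = number of items with true label t predicted as p."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def report(matrix):
    """Per-class precision, recall and F1 derived from the confusion matrix."""
    rows = {}
    for c, name in LABELS.items():
        tp = matrix[c][c]
        predicted = matrix[0][c] + matrix[1][c]  # column sum
        actual = matrix[c][0] + matrix[c][1]     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows[name] = {"precision": precision, "recall": recall, "f1": f1}
    return rows

# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
scores = report(cm)
print(cm)
print(scores)
```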
Emotions Analytics (EA) Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated Jul 13, 2025
Cite
Data Insights Market (2025). Emotions Analytics (EA) Software Report [Dataset]. https://www.datainsightsmarket.com/reports/emotions-analytics-ea-software-1973364
Available download formats: ppt, doc, pdf
Dataset updated
Jul 13, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Emotions Analytics (EA) Software market is experiencing robust growth, driven by increasing demand for personalized customer experiences, advancements in artificial intelligence (AI) and machine learning (ML), and the rising adoption of digital channels across various industries. The market, estimated at $2 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $7 billion by 2033. This expansion is fueled by several key factors. Firstly, businesses are leveraging EA to gain deeper insights into consumer behavior, enabling more effective marketing strategies, product development, and customer service improvements. Secondly, the sophistication of EA technology continues to improve, with more accurate emotion detection capabilities and the integration of diverse data sources (facial expressions, voice tone, text analysis) resulting in more comprehensive and reliable insights. Finally, growing regulatory requirements concerning data privacy and ethical considerations are driving demand for robust and compliant EA solutions. However, the market's growth is not without its challenges. High initial investment costs for implementing EA systems and the need for specialized expertise to interpret and analyze the collected data can act as significant barriers to entry for smaller businesses. Moreover, concerns surrounding data privacy and the potential for misuse of emotionally sensitive information remain important hurdles that need to be addressed through transparent data handling practices and robust ethical guidelines. The competitive landscape is characterized by a mix of large established technology firms like Microsoft and IBM, alongside innovative specialized companies like iMotions and Affectiva, fostering a dynamic market environment with varied technological approaches and service offerings. 
Future growth will depend on continued technological advancements, the development of robust ethical frameworks, and increased awareness of the value proposition of EA across diverse sectors.
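The growth figures quoted above follow the usual compound-growth formula, value × (1 + rate)^years. A small Python sketch, assuming yearly compounding, makes it easy to check how the 2033 figure depends on how many yearly steps are counted; the helper name is hypothetical.

```python
def project(value, rate, periods):
    """Compound `value` at `rate` per year for `periods` years."""
    return value * (1 + rate) ** periods

start_usd_bn = 2.0   # 2025 estimate quoted in the description
cagr = 0.15          # 15% per year, as quoted in the description

# 2025 -> 2033 spans 8 yearly steps; 9 steps is shown for comparison.
for periods in (8, 9):
    print(periods, round(project(start_usd_bn, cagr, periods), 2))
```

Eight steps give about $6.1 billion and nine give about $7.0 billion, so the "approximately $7 billion" figure appears to be closer to nine compounding steps than to eight.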
Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News...
data.niaid.nih.gov
zenodo.org
Updated Sep 13, 2022
Cite
Anonymized (2022). Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News Media Headlines Using Automated Labelling with Transformer Language Models" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5144112
Dataset updated
Sep 13, 2022
Dataset authored and provided by
Anonymized
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains automated sentiment and emotionality annotations of 23 million headlines from 47 news media outlets popular in the United States.

The set of 47 news media outlets analysed (listed in Figure 1 of the main manuscript) was derived from the AllSides organization 2019 Media Bias Chart v1.1. The human ratings of outlets’ ideological leanings were also taken from this chart and are listed in Figure 2 of the main manuscript.

Article headlines from the outlets analyzed in the manuscript are available in the outlets’ online domains and/or public cache repositories such as The Internet Wayback Machine, Google cache and Common Crawl. Headlines were located in the articles’ raw HTML using outlet-specific XPath expressions.
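As an illustration of outlet-specific XPath extraction, the sketch below uses Python's standard-library ElementTree, which supports a limited XPath subset and requires well-formed markup. The outlet names and XPath expressions are hypothetical, and real-world HTML would normally need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-outlet XPath expressions; real outlets need their own rules.
OUTLET_XPATHS = {
    "example-outlet-a": ".//h1[@class='headline']",
    "example-outlet-b": ".//header/h2",
}

def extract_headline(markup, outlet):
    """Return the headline text, or None when the outlet's XPath matches nothing."""
    root = ET.fromstring(markup)
    node = root.find(OUTLET_XPATHS[outlet])
    if node is None:
        return None
    # itertext() also collects text nested in child tags such as <em>.
    return "".join(node.itertext()).strip()

page_a = "<html><body><h1 class='headline'>Markets rally <em>sharply</em></h1></body></html>"
page_b = "<html><body><header><h2>Storm nears coast</h2></header></body></html>"

print(extract_headline(page_a, "example-outlet-a"))
print(extract_headline(page_b, "example-outlet-b"))
print(extract_headline(page_b, "example-outlet-a"))  # wrong rule for this page
```

The last call returns `None`, which mirrors the failure mode of an expression that does not fit a page's markup.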

The temporal coverage of headlines across news outlets is not uniform. For some media organizations, the availability of news articles in online domains or Internet cache repositories becomes sparse for earlier years. Furthermore, some news outlets popular in 2019, such as The Huffington Post or Breitbart, did not exist in the early 2000s. Hence, our data set is sparser in headline sample size and representativeness for earlier years in the 2000-2019 timeline. Nevertheless, 20 outlets in our data set have chronologically continuous partial or full headline data availability since the year 2000. Figure S1 in the SI reports the number of headlines per outlet and per year in our analysis.

In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the headline due to the heterogeneity of HTML elements and CSS styling combinations with which article text is arranged in outlets’ online domains. After manual testing, we determined that the percentage of headlines falling into this category is very small. Additionally, our method might miss some articles in the online domains of news outlets. To conclude, in a data analysis of over 23 million headlines, we cannot manually check the correctness of every single data instance, and one hundred percent accuracy at capturing headlines’ content is elusive due to the small number of difficult-to-detect boundary cases, such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our headlines set is representative of headlines in print news media content for the studied time period and outlets analyzed.

The compressed files in this data set are listed next:

-analysisScripts.rar contains the analysis scripts used in the main manuscript as well as aggregated data of sentiment and emotionality automated annotations of the headlines and human annotations of a subset of headlines sentiment and emotionality used as ground truth.

-models.rar contains the Transformer sentiment and emotion annotation models used in the analysis. Namely:

Siebert/sentiment-roberta-large-english from https://huggingface.co/siebert/sentiment-roberta-large-english. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). See more information from the original authors at https://huggingface.co/siebert/sentiment-roberta-large-english

DistilbertSST2.rar is the default sentiment classification model of the HuggingFace Transformer library https://huggingface.co/ This model is only used to replicate the results of the sentiment analysis with sentiment-roberta-large-english

DistilRoberta j-hartmann/emotion-english-distilroberta-base from https://huggingface.co/j-hartmann/emotion-english-distilroberta-base. The model is a fine-tuned checkpoint of DistilRoBERTa-base. The model allows annotation of English text with Ekman's 6 basic emotions, plus a neutral class. The model was trained on 6 diverse datasets. Please refer to the original author at https://huggingface.co/j-hartmann/emotion-english-distilroberta-base for an overview of the data sets used for fine tuning.

-headlinesDataWithSentimentLabelsAnnotationsFromSentimentRobertaLargeModel.rar URLs of headlines analyzed and the sentiment annotations of the siebert/sentiment-roberta-large-english Transformer model. https://huggingface.co/siebert/sentiment-roberta-large-english

-headlinesDataWithSentimentLabelsAnnotationsFromDistilbertSST2.rar URLs of headlines analyzed and the sentiment annotations of the default HuggingFace sentiment analysis model fine-tuned on the SST-2 dataset. https://huggingface.co/

-headlinesDataWithEmotionLabelsAnnotationsFromDistilRoberta.rar URLs of headlines analyzed and the emotion categories annotations of the j-hartmann/emotion-english-distilroberta-base Transformer model. https://huggingface.co/j-hartmann/emotion-english-distilroberta-base
Data from: An emotion analysis dataset of course comment texts in massive...
dataverse.harvard.edu
Updated Sep 26, 2022
Cite
Xiang Feng; Keyi Yuan; Xiu Guan; Longhui Qiu (2022). An emotion analysis dataset of course comment texts in massive online learning course platforms [Dataset]. http://doi.org/10.7910/DVN/LC6GHO
Unique identifier
https://doi.org/10.7910/DVN/LC6GHO
Dataset updated
Sep 26, 2022
Dataset provided by
Harvard Dataverse
Authors
Xiang Feng; Keyi Yuan; Xiu Guan; Longhui Qiu
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Datasets are critical for emotion analysis in the machine learning field. This study aims to explore emotion analysis datasets and related benchmarks in online learning, since very few studies currently address this topic. We labeled the topic and nine-category emotion of 4715 comment texts from online learning platforms using the “three-person voting label method”, based on “sentence-level” and multi-category labeling dimensions, with our self-developed system. After testing the consistency of the labeling results using the Fleiss Kappa method, we found that the consistency of the dataset was about 0.51, representing a moderate strength of agreement. Based on the dataset, the prediction accuracy of the Long Short-Term Memory (LSTM) method is about 0.68. This dataset provides a benchmark for multi-category emotion datasets in the Chinese online learning field. It can provide a basis for subsequent work on emotion analysis, monitoring, and intervention in the education field. It can also provide a reference for constructing subsequent datasets in the education field. Please note that this is a Chinese dataset. If you want to use this dataset, please contact the author and submit a request for the dataset below.
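The Fleiss Kappa consistency check mentioned above can be sketched in a few lines of Python. The toy table uses three raters (matching the three-person voting setup) but only three categories instead of nine; the function name and data are illustrative and not the authors' code.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same rater count."""
    n_items = len(table)
    n_raters = sum(table[0])
    n_cats = len(table[0])

    # Proportion of all assignments that fell into each category.
    p_cat = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_cats)]

    # Agreement within each item: share of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Note: undefined (division by zero) if all raters always use one category.
    return (p_observed - p_expected) / (1 - p_expected)

# Toy example: 4 comments, 3 raters, 3 emotion categories (made-up votes).
votes = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 1, 2],
]
kappa = fleiss_kappa(votes)
print(round(kappa, 3))
```

A value near 0.5, like the one reported for this dataset, means the raters agree noticeably more often than chance would predict but still disagree on a substantial share of comments.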
multiclass-sentiment-analysis-dataset
huggingface.co
Updated Jul 14, 2023
Cite
Shahriar Parvez (2023). multiclass-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
Dataset updated
Jul 14, 2023
Authors
Shahriar Parvez
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset Card for Dataset Name

Dataset Summary

This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure Data Instances

[More Information Needed]

Data Fields

[More Information Needed]

Data Splits

[More Information Needed]

Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset.

Emotion Recognition and Sentiment Analysis Software Market Analysis North...

technavio.com

Updated Jan 19, 2024

Cite

Technavio (2024). Emotion Recognition and Sentiment Analysis Software Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, China, Japan, UK, Germany - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/emotion-recognition-and-sentiment-analysis-software-market-industry-analysis

Dataset updated

Jan 19, 2024

Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

United States, Global

Description

Emotion Recognition and Sentiment Analysis Software Market Size 2024-2028

The emotion recognition and sentiment analysis software market size is forecast to increase by USD 797.17 million at a CAGR of 14.15% between 2023 and 2028.
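The forecast above gives an absolute increase and a growth rate but no starting size. Assuming five yearly compounding steps between 2023 and 2028 (an assumption, since the report does not say how the periods are counted), the implied base can be backed out as increase / ((1 + rate)^years − 1). The helper name is hypothetical.

```python
def implied_base(increase, cagr, years):
    """Starting value that grows by `increase` after `years` at `cagr`."""
    return increase / ((1 + cagr) ** years - 1)

increase_usd_mn = 797.17  # forecast increase quoted in the description
cagr = 0.1415             # 14.15% per year, as quoted in the description
years = 5                 # assumption: 2023 -> 2028 counted as five yearly steps

base = implied_base(increase_usd_mn, cagr, years)
final = base + increase_usd_mn
print(round(base, 2), round(final, 2))
```

Under that assumption the implied 2023 base comes out at roughly USD 850 million, which would put the 2028 market size at roughly USD 1.65 billion.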

The market is experiencing significant growth, driven by the increasing popularity of wearable devices and the adoption of real-time sensing analysis. These technologies enable more accurate and timely emotion recognition, providing valuable insights for various applications, including healthcare, marketing, and customer service. However, the market faces challenges, most notably the issue of low-quality video content hampering emotional interpretation. Regulatory hurdles also impact adoption, as organizations navigate complex data privacy and security regulations.
To capitalize on market opportunities and navigate challenges effectively, companies must focus on improving data quality, investing in advanced algorithms, and addressing regulatory requirements. By doing so, they can differentiate themselves in a competitive landscape and drive innovation in the market.

What will be the Size of the Emotion Recognition and Sentiment Analysis Software Market during the forecast period?

The market is experiencing significant growth, driven by the increasing adoption of conversational AI and virtual assistants. This technology enables the analysis of both textual and multimedia data, including audio and video, to extract emotional insights from user interactions. Data mining techniques, such as predictive modeling and model deployment, play a crucial role in processing and interpreting this data. Sentiment analysis dashboards and emotion recognition dashboards provide valuable insights into user experience, allowing businesses to map and optimize both the employee and customer journey. Cognitive computing and cognitive AI technologies are also integral to this market, enabling real-time analysis of user behavior and feedback.
Data ethics and responsible AI are becoming increasingly important considerations in this market, with a focus on data governance and model training to ensure accurate and explainable AI. Biometric data and behavioral data are also being leveraged to enhance the capabilities of emotion recognition systems, further expanding their applications. Model evaluation and model training are essential components of this market, ensuring the accuracy and effectiveness of AI models. Interpretable AI and explainable AI are also gaining traction, enabling businesses to understand the reasoning behind AI decisions and build trust in the technology. Data annotation and data annotation tools are critical for training AI models, ensuring high-quality data and accurate sentiment analysis.
Overall, the market is poised for continued growth, offering businesses valuable insights into user emotions and improving the user experience.

How is this Emotion Recognition and Sentiment Analysis Software Industry segmented?

The emotion recognition and sentiment analysis software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

Application

  Customer service/experience
  Product/market research
  Patient diagnosis
  Others


Deployment

  On-premises
  Cloud-based


Geography

  North America

    US


  Europe

    Germany
    UK


  APAC

    China
    Japan


  Rest of World (ROW)

By Application Insights

The customer service/experience segment is estimated to witness significant growth during the forecast period.

Emotion AI technology, integrated with sentiment analysis tools, is revolutionizing business operations by enabling real-time understanding of customer emotions and feedback. These solutions utilize machine learning, natural language processing, and computer vision to analyze text, voice, and facial expressions for sentiment scoring, emotion classification, and polarity analysis. Emotion lexicons and sentiment lexicons are used to identify and categorize emotions, while deep learning and predictive analytics provide insights into historical trends. Sentiment analysis plays a crucial role in various industries, including human resources for employee engagement and feedback analysis, fraud detection, and brand reputation management. It is also used in customer service to enhance customer experience through personalized communication and proactive issue resolution.

Social media monitoring and text analysis help businesses stay updated on brand mentions and customer sentiments, while voice analysis and tone analysis provide valuable insights from customer interactions. Integration with APIs, cloud computing, and data visualization tools streamlines the process, allowing for seamless im

Emotion Recognition and Sentiment Analysis Software Market Report
marketreportanalytics.com
doc, pdf, ppt
Updated Mar 19, 2025
Cite
Market Report Analytics (2025). Emotion Recognition and Sentiment Analysis Software Market Report [Dataset]. https://www.marketreportanalytics.com/reports/emotion-recognition-and-sentiment-analysis-software-market-11382
Available download formats: ppt, pdf, doc
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Report Analytics
License
https://www.marketreportanalytics.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Emotion Recognition and Sentiment Analysis Software Market is experiencing robust growth, projected to reach $849.76 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 14.15% from 2025 to 2033. This expansion is fueled by several key drivers. Increasing adoption of AI-powered solutions across diverse sectors, including customer service, market research, and healthcare (patient diagnosis), is a primary factor. Businesses leverage these tools to gain valuable insights into customer preferences, improve product development, and personalize user experiences. The rise of cloud-based deployment models further accelerates market growth, offering scalability, cost-effectiveness, and enhanced accessibility. Furthermore, the growing need for effective brand monitoring and reputation management, particularly on social media, is driving demand for sentiment analysis tools. While data privacy concerns and ethical considerations surrounding emotion recognition technology pose certain restraints, the overall market outlook remains exceptionally positive. The market is segmented by application (customer service/experience, product/market research, patient diagnosis, others) and deployment (on-premises, cloud-based), reflecting the diverse use cases and deployment preferences of different industries. North America currently holds a significant market share, driven by early adoption and technological advancements. However, APAC is expected to exhibit substantial growth in the coming years, fueled by increasing digitalization and a burgeoning tech industry in countries like China and Japan. Leading companies are focusing on strategic partnerships, acquisitions, and the development of innovative solutions to maintain a competitive edge in this rapidly evolving landscape. The competitive landscape is characterized by a mix of established tech giants like Microsoft and IBM alongside specialized emotion AI companies. 
The market’s success hinges on the continuous improvement of algorithm accuracy, addressing ethical concerns, and ensuring responsible data handling. Future growth will depend on advancements in deep learning and computer vision, enabling more nuanced and accurate emotion recognition across various modalities, including facial expressions, voice tone, and text analysis. Addressing data bias and ensuring compliance with data privacy regulations are crucial for sustainable growth. The market's segmentation reflects its adaptability across various industries, underscoring its potential for widespread application and sustained expansion throughout the forecast period.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (Repository creator)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating-inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
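Because the rows are ordered by class, a naive train/test split by position would separate the two classes. The following is a minimal sketch of a reproducible shuffle, assuming the rows have already been loaded as (review, label) pairs; the helper name `shuffle_rows`, the seed, and the toy rows are illustrative and not part of the repository.

```python
import random


def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of rows; the input list is left untouched."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical stand-in for data_rt.csv: negatives (0) first, positives (1) last.
rows = [(f"negative review {i}", 0) for i in range(5)] + [
    (f"positive review {i}", 1) for i in range(5)
]

shuffled = shuffle_rows(rows, seed=42)
```

Fixing the seed makes later experiments comparable, since every run sees the same row order.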
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
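The description does not state which polarity cut-offs were used to derive the division column. As a sketch only, the mapping below assumes the common convention that a TextBlob polarity (a float in [-1, 1]) above zero is positive, below zero is negative, and exactly zero is neutral; the function name and the `neutral_band` parameter are hypothetical.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    The thresholds are an assumption; the dataset does not document its own.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


labels = [polarity_to_division(p) for p in (0.6, 0.0, -0.25)]
```

Widening `neutral_band` (for example to 0.1) would treat weakly polar reviews as neutral, which may suit noisy review text better.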
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
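The division column is described as a categorical label generated from ReviewStar, but the exact rule is not given. The sketch below shows one plausible way to add such a column with the standard csv module, assuming 1-2 stars are negative, 3 is neutral, and 4-5 are positive; the thresholds, helper names, and sample rows are all illustrative.

```python
import csv
import io


def star_to_division(stars):
    """Assumed mapping: 1-2 negative, 3 neutral, 4-5 positive."""
    stars = int(stars)
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


def add_division(csv_text):
    """Parse CSV text and append a 'division' column to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["division"] = star_to_division(row["ReviewStar"])
        rows.append(row)
    return rows


# Made-up rows using the column names listed above.
sample = (
    "ReviewTitle,ReviewBody,ReviewStar,Product\n"
    "Great,Sound is clear,5,Earphone A\n"
    "Okay,Average battery,3,Earphone B\n"
    "Bad,Stopped working,1,Earphone C\n"
)
labelled = add_division(sample)
```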
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Four Text Datasets Used For Comparison Between Hedonometer and Azure...
figshare.com
txt
Updated Nov 9, 2023
Cite
Siddhant Jaydeep Mahajani; Shashank Srivastava; Alan Smeaton (2023). Four Text Datasets Used For Comparison Between Hedonometer and Azure Sentiment Analysis Tools [Dataset]. http://doi.org/10.6084/m9.figshare.24539410.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24539410.v1
Dataset updated
Nov 9, 2023
Dataset provided by
figshare
Authors
Siddhant Jaydeep Mahajani; Shashank Srivastava; Alan Smeaton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Lexicon-based approaches to sentiment analysis of text are based on each word or lexical entry having a pre-defined weight indicating its sentiment polarity. We compute sentiment for more than 150,000 English language texts drawn from 4 domains using the Hedonometer, a lexicon-based technique, and Azure, a contemporary machine-learning based approach. We model differences in sentiment scores between approaches for documents in each domain using a regression and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences.
1. Finance Data: This dataset contains 5,000 records of different financial news texts from company press reviews and news headlines.
2. News Headlines Data: This dataset consists of 50,000 news headlines for the period of 8 months (November 2015 to July 2016) on four different topics: Economy, Microsoft, Obama, and Palestine.
3. IMDb Dataset: This dataset consists of 50,000 reviews posted by customers on the online IMDb (Internet Movie Database) platform.
4. Twitter Dataset: This dataset consists of almost 40,000 tweets from users around the globe on everything.
5. Hedonometer Bag of Words: This is the bag of words used to perform sentiment analysis using the traditional lexicon approach. It consists of 10,223 words with their respective happiness scores. The actual file can be downloaded from here: https://hedonometer.org/words/labMT-en-v2/
6. Combined p-values results: This is the result file generated once we performed sentiment analysis on all the above domains and identified only the words that are present in the Hedonometer sheet. The sheet consists of the words, their respective happiness scores, and their p-values on all the different domains.
7. Data visualisations: This is the visualisation code base in Tableau which was used to generate visualisations.
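To make the lexicon-based idea concrete, here is a minimal sketch that scores a text as the mean happiness score of the words found in a lexicon. It is a generic lexicon average, not the Hedonometer's exact procedure, which may apply additional filtering; the function name, the tokenisation, and the toy lexicon with invented scores are all assumptions for illustration.

```python
import re


def lexicon_score(text, lexicon):
    """Average the lexicon scores of the words in text that the lexicon covers.

    Returns None when no word of the text appears in the lexicon.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Invented scores for illustration only; real values come from the lexicon file.
toy_lexicon = {"happy": 8.0, "love": 8.5, "sad": 2.0, "terrible": 2.0}

score = lexicon_score("I love this happy day", toy_lexicon)
missing = lexicon_score("nothing here", toy_lexicon)
```

Words outside the lexicon contribute nothing to the score, which is one reason a lexicon method and a learned model can disagree on the same document.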
Seven-element emotion classification algorithm on event-related microblog...
plos.figshare.com
xls
Updated Jun 5, 2023
Cite
Mingyang Wang; Huan Wu; Tianyu Zhang; Shengqing Zhu (2023). Seven-element emotion classification algorithm on event-related microblog texts. [Dataset]. http://doi.org/10.1371/journal.pone.0241355.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0241355.t002
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Mingyang Wang; Huan Wu; Tianyu Zhang; Shengqing Zhu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Seven-element emotion classification algorithm on event-related microblog texts.
Multimodal Sentiment Analysis Dataset
gts.ai
json
Updated Jun 28, 2024
+ more versions
Cite
GTS (2024). Multimodal Sentiment Analysis Dataset [Dataset]. https://gts.ai/dataset-download/multimodal-sentiment-analysis-dataset/
Explore at:
Available download formats: json
Dataset updated
Jun 28, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore our unique Multimodal Sentiment Analysis Dataset, featuring high-quality images and corresponding text descriptions with sentiment labels.
Twitter Tweets Sentiment Dataset
kaggle.com
Updated Apr 8, 2022
Cite
M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
M Yasser H
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

Description:

Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish the negative tweets and block them. Can you build a strong classifier model to predict the same?

Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

Columns:

textID - unique ID for each piece of text

text - the text of the tweet

sentiment - the general sentiment of the tweet
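The advice about removing the quotes around the text field is easiest to follow by letting a CSV parser handle the quoting rather than splitting lines on commas. The sketch below uses Python's standard csv module, which drops the enclosing quotes and keeps embedded commas; the sample rows and IDs are invented, and only the column names come from the list above.

```python
import csv
import io


def load_tweets(csv_text):
    """Parse CSV text into dicts; csv.DictReader strips the field quoting."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Invented rows: the quoted text fields contain a comma and must stay intact.
sample = (
    "textID,text,sentiment\n"
    'id001,"I love this, really",positive\n'
    'id002,"not my day, at all",negative\n'
)
tweets = load_tweets(sample)
texts = [row["text"] for row in tweets]
```

Splitting each line on commas instead would break the first tweet into two fields and leave stray quote characters in the training text.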

Acknowledgement:

The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

Understand the Dataset & cleanup (if required).

Build classification models to predict the twitter sentiments.

Compare the evaluation metrics of various classification algorithms.
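For the comparison step, two metrics that are often reported for multi-class sentiment are accuracy and macro-averaged F1. The sketch below computes both from scratch so the definitions are visible; the label names and the predictions of the hypothetical model are made up, and the dataset page itself does not prescribe any particular metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes an F1 of 0.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["positive", "negative", "neutral", "positive"]
y_model = ["positive", "negative", "positive", "positive"]  # hypothetical output

acc = accuracy(y_true, y_model)
f1 = macro_f1(y_true, y_model)
```

Macro F1 penalises a model that ignores a whole class (here, neutral) more strongly than accuracy does, which matters when the classes are imbalanced.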
Product Reviews Dataset for Emotions Classification Tasks - Indonesian...
data.mendeley.com
Updated May 19, 2022
+ more versions
Cite
Rhio Sutoyo (2022). Product Reviews Dataset for Emotions Classification Tasks - Indonesian (PRDECT-ID) Dataset [Dataset]. http://doi.org/10.17632/574v66hf2v.1
Explore at:
Unique identifier
https://doi.org/10.17632/574v66hf2v.1
Dataset updated
May 19, 2022
Authors
Rhio Sutoyo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
PRDECT-ID Dataset is a collection of Indonesian product review data annotated with emotion and sentiment labels. The data were collected from Tokopedia, one of the e-commerce giants in Indonesia. The dataset contains Indonesian-language product reviews from 29 product categories on Tokopedia. Each product review is annotated with a single emotion, i.e., love, happiness, anger, fear, or sadness. A group of annotators assigned the emotion labels by following annotation criteria created by an expert in clinical psychology. Other attributes related to the product review are also extracted, such as Location, Price, Overall Rating, Number Sold, Total Review, and Customer Rating, to support further research.
turkish-sentiment-analysis-dataset
huggingface.co
Updated Jun 21, 2022
Cite
Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 21, 2022
Authors
Batuhan
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset

This dataset contains positive, negative and neutral ("notr") sentences from several data sources given in the references. Most sentiment models have only two labels, positive and negative. However, user input can be an entirely neutral sentence, and I could find no data for such cases. Therefore I created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from a Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
EmoLit
data.niaid.nih.gov
zenodo.org
Updated Jun 27, 2023
Cite
Rei, Luis (2023). EmoLit [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7883953
Explore at:
Dataset updated
Jun 27, 2023
Dataset authored and provided by
Rei, Luis
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Emotions in Literature

Description: Literature sentences from Project Gutenberg. 38 emotion labels (+ neutral examples). Semi-supervised dataset.

Article

Detecting Fine-Grained Emotions in Literature

Please cite:

@Article{app13137502, AUTHOR = {Rei, Luis and Mladenić, Dunja}, TITLE = {Detecting Fine-Grained Emotions in Literature}, JOURNAL = {Applied Sciences}, VOLUME = {13}, YEAR = {2023}, NUMBER = {13}, ARTICLE-NUMBER = {7502}, URL = {https://www.mdpi.com/2076-3417/13/13/7502}, ISSN = {2076-3417}, DOI = {10.3390/app13137502} }

Abstract

Emotion detection in text is a fundamental aspect of affective computing and is closely linked to natural language processing. Its applications span various domains, from interactive chatbots to marketing and customer service. This research specifically focuses on its significance in literature analysis and understanding. To facilitate this, we present a novel approach that involves creating a multi-label fine-grained emotion detection dataset, derived from literary sources. Our methodology employs a simple yet effective semi-supervised technique. We leverage textual entailment classification to perform emotion-specific weak-labeling, selecting examples with the highest and lowest scores from a large corpus. Utilizing these emotion-specific datasets, we train binary pseudo-labeling classifiers for each individual emotion. By applying this process to the selected examples, we construct a multi-label dataset. Using this dataset, we train models and evaluate their performance within a traditional supervised setting. Our model achieves an F1 score of 0.59 on our labeled gold set, showcasing its ability to effectively detect fine-grained emotions. Furthermore, we conduct evaluations of the model's performance in zero- and few-shot transfer scenarios using benchmark datasets. Notably, our results indicate that the knowledge learned from our dataset exhibits transferability across diverse data domains, demonstrating its potential for broader applications beyond emotion detection in literature. Our contribution thus includes a multi-label fine-grained emotion detection dataset built from literature, the semi-supervised approach used to create it, as well as the models trained on it. This work provides a solid foundation for advancing emotion detection techniques and their utilization in various scenarios, especially within the cultural heritage analysis.
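The weak-labeling step in the abstract, selecting the examples with the highest and lowest entailment scores for one emotion, can be sketched as follows. This assumes the entailment scores have already been computed by some classifier and are supplied as plain floats; the function name, the cut-off `k`, and the sample sentences and scores are hypothetical, and the paper's actual selection rule may differ.

```python
def select_weak_labels(scored_sentences, k):
    """Pick the k highest-scoring sentences as weak positives and the
    k lowest-scoring sentences as weak negatives for a single emotion.

    scored_sentences: list of (sentence, entailment_score) pairs.
    """
    if k < 1 or 2 * k > len(scored_sentences):
        raise ValueError("k must be >= 1 and at most half the corpus size")
    ranked = sorted(scored_sentences, key=lambda pair: pair[1], reverse=True)
    positives = [sentence for sentence, _ in ranked[:k]]
    negatives = [sentence for sentence, _ in ranked[-k:]]
    return positives, negatives


# Made-up entailment scores for the emotion "joy".
scored = [
    ("She laughed with delight.", 0.97),
    ("The road ran north.", 0.04),
    ("He wept at the grave.", 0.02),
    ("They rejoiced at the news.", 0.91),
    ("It was a Tuesday.", 0.30),
]
joy_positives, joy_negatives = select_weak_labels(scored, k=2)
```

Mid-scoring sentences are left out on purpose: they are the ones a weak labeler is least reliable on, and the binary pseudo-labeling classifier trained afterwards can decide them.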

Labels

admiration: finds something admirable, impressive or worthy of respect

amusement: finds something funny, entertaining or amusing

anger: is angry, furious, or strongly displeased; displays ire, rage, or wrath

annoyance: is annoyed or irritated

approval: expresses a favorable opinion, approves, endorses or agrees with something or someone

boredom: feels bored, uninterested, monotony, tedium

calmness: is calm, serene, free from agitation or disturbance, experiences emotional tranquility

caring: cares about the well-being of someone else, feels sympathy, compassion, affectionate concern towards someone, displays kindness or generosity

courage: feels courage or the ability to do something that frightens one, displays fearlessness or bravery

curiosity: is interested, curious, or has strong desire to learn something

desire: has a desire or ambition, wants something, wishes for something to happen

despair: feels despair, helpless, powerless, loss or absence of hope, desperation, despondency

disappointment: feels sadness or displeasure caused by the non-fulfillment of hopes or expectations, being let down, expresses regret due to the unfavorable outcome of a decision

disapproval: expresses an unfavorable opinion, disagrees or disapproves of something or someone

disgust: feels disgust, revulsion, finds something or someone unpleasant, offensive or hateful

doubt: has doubt or is uncertain about something, bewildered, confused, or shows lack of understanding

embarrassment: feels embarrassed, awkward, self-conscious, shame, or humiliation

envy: is covetous, feels envy or jealousy; begrudges or resents someone for their achievements, possessions, or qualities

excitement: feels excitement or great enthusiasm and eagerness

faith: expresses religious faith, has a strong belief in the doctrines of a religion, or trust in god

fear: is afraid or scared due to a threat, danger, or harm

frustration: feels frustrated: upset or annoyed because of inability to change or achieve something

gratitude: is thankful or grateful for something

greed: is greedy, rapacious, avaricious, or has selfish desire to acquire or possess more than what one needs

grief: feels grief or intense sorrow, or grieves for someone who has died

guilt: feels guilt, remorse, or regret to have committed wrong or failed in an obligation

indifference: is uncaring, unsympathetic, uncharitable, or callous, shows indifference, lack of concern, coldness towards someone

joy: is happy, feels joy, great pleasure, elation, satisfaction, contentment, or delight

love: feels love, strong affection, passion, or deep romantic attachment for someone

nervousness: feels nervous, anxious, worried, uneasy, apprehensive, stressed, troubled or tense

nostalgia: feels nostalgia, longing or wistful affection for the past, something lost, or for a period in one's life, feels homesickness, a longing for one's home, city, or country while being away; longing for a familiar place

optimism: feels optimism or hope, is hopeful or confident about the future, that something good may happen, or the success of something

pain: feels physical pain or experiences physical suffering

pride: is proud, feels pride from one's own achievements, self-fulfillment, or from the achievements of those with whom one is closely associated, or from qualities or possessions that are widely admired

relief: feels relaxed, relief from tension or anxiety

sadness: feels sadness, sorrow, unhappiness, depression, dejection

surprise: is surprised, astonished or shocked by something unexpected

trust: trusts or has confidence in someone, or believes that someone is good, honest, or reliable
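Since the dataset is multi-label, a sentence may carry several of the 38 labels at once. One common representation is a multi-hot vector with one position per label, sketched below. The encoding of neutral examples as an all-zero vector is an assumption, as is the label order; the actual file format of EmoLit may differ.

```python
EMOLIT_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "boredom",
    "calmness", "caring", "courage", "curiosity", "desire", "despair",
    "disappointment", "disapproval", "disgust", "doubt", "embarrassment",
    "envy", "excitement", "faith", "fear", "frustration", "gratitude",
    "greed", "grief", "guilt", "indifference", "joy", "love", "nervousness",
    "nostalgia", "optimism", "pain", "pride", "relief", "sadness",
    "surprise", "trust",
]


def encode(labels):
    """Turn a collection of label names into a 38-element multi-hot vector."""
    active = set(labels)
    unknown = active - set(EMOLIT_LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in EMOLIT_LABELS]


def decode(vector):
    """Recover the label names whose positions are set in the vector."""
    return [name for name, flag in zip(EMOLIT_LABELS, vector) if flag]


vector = encode(["joy", "love"])
neutral = encode([])  # assumed encoding of a neutral sentence
```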

Dataset

EmoLit (Zenodo)

Code

EmoLit Train (Github)

Models

LARGE

BASE

DISTILL

CUBIG (2025). emotion analysis based on text Dataset [Dataset]. https://cubig.ai/store/products/139/emotion-analysis-based-on-text-dataset

10 scholarly articles cite this dataset (View in Google Scholar)
