https://cubig.ai/store/terms-of-service
1) Data introduction • The Emotion-Analysis dataset contains text for analyzing emotions.
2) Data utilization (1) Characteristics of the Emotion-Analysis data: • It contains a variety of texts conveying emotions ranging from happiness to anger to sadness. The goal is to build an efficient model for detecting emotions in text. (2) The Emotion-Analysis data can be used for: • Sentiment classification models: This dataset can be used to train machine learning models that classify text by sentiment, helping companies and researchers understand public opinion and sentiment trends. • Market research: Researchers can analyze the sentiment data to understand consumer preferences and market trends and to support data-driven decision making.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a manually annotated Bangla Emotion corpus that captures the diversity of fine-grained emotion expressions in social-media text. We use fine-grained emotion labels, namely Sadness, Happiness, Disgust, Surprise, Fear and Anger, which according to Paul Ekman (1999) are the six basic emotion categories. For this task, we collected a large amount of raw text data from users’ comments on two different Facebook groups (Ekattor TV and Airport Magistrates) and from the public posts of a popular blogger and activist, Dr. Imran H Sarker. These comments are mostly reactions to ongoing socio-political issues and to the economic successes and failures of Bangladesh. We scraped a total of 32,923 comments from these three sources. Of these, 6,314 comments were annotated into the six categories. The distribution of the annotated corpus is as follows:
sad = 1341, happy = 1908, disgust = 703, surprise = 562, fear = 384, angry = 1416
We also provide a balanced set drawn from the above data, split into training and test sets in a 5:1 ratio. More information on the dataset and the experiments on it can be found in our paper (related links below).
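The class counts above add up to the 6,314 annotated comments. The sketch below checks that total and shows one way to produce a 5:1 train/test split that keeps each class's share. It is an illustrative stratified split applied to the full, unbalanced distribution, not the authors' balancing procedure (described in their paper); `stratified_split` and the placeholder label list are hypothetical.

```python
import random
from collections import Counter

# Class counts of the annotated corpus, copied from the distribution above.
CLASS_COUNTS = {"sad": 1341, "happy": 1908, "disgust": 703,
                "surprise": 562, "fear": 384, "angry": 1416}

def stratified_split(labels, train_parts=5, test_parts=1, seed=0):
    """Split example indices per class so each class keeps roughly a
    train_parts:test_parts ratio (5:1 by default)."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer division rounds each class's test share down.
        n_test = len(indices) * test_parts // (train_parts + test_parts)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

# Placeholder label list that reproduces the reported class distribution.
labels = [label for label, n in CLASS_COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)

print(sum(CLASS_COUNTS.values()))      # 6314
print(len(train_idx), len(test_idx))   # 5263 1051
print(test_counts["fear"])             # 64
```

Splitting per class rather than over the whole list avoids small classes such as fear ending up under-represented in the test set by chance.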
https://choosealicense.com/licenses/other/
Dataset Card for "emotion"
Dataset Summary
Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information please refer to the paper.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure
Data Instances
An example looks as follows. { "text": "im feeling quite sad and sorry for myself but… See the full description on the dataset page: https://huggingface.co/datasets/dair-ai/emotion.
https://cubig.ai/store/terms-of-service
1) Data Introduction • The Sentiment Analysis Dataset is a dataset for sentiment analysis consisting of large-scale tweet text collected from Twitter, with a sentiment polarity label (0 = negative, 2 = neutral, 4 = positive) for each tweet; the labels were assigned automatically based on emoticons.
2) Data Utilization (1) Characteristics of the Sentiment Analysis Dataset: • Each sample consists of six columns: sentiment polarity, tweet ID, date of writing, search query, author, and tweet text, which makes the dataset suitable for training natural language processing and classification models on tweet text and sentiment labels. (2) The Sentiment Analysis Dataset can be used for: • Sentiment classification model development: Using the tweet text and polarity labels, you can build automatic positive/negative/neutral classifiers with a range of machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM. • Analysis of social media public opinion and trends: By analyzing how sentiment is distributed over time and across keywords, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key emotional keywords.
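A minimal sketch of reading such a file with Python's `csv` module, assuming it is a header-less CSV whose six columns appear in the order listed above. The column names in `COLUMNS` and the two sample rows are made up for illustration.

```python
import csv
import io

# Column order assumed from the six fields listed above (no header row assumed).
COLUMNS = ["polarity", "tweet_id", "date", "query", "author", "text"]
POLARITY_NAMES = {0: "negative", 2: "neutral", 4: "positive"}

def read_tweets(file_obj):
    """Yield one dict per well-formed row, adding a readable 'sentiment' field."""
    for row in csv.reader(file_obj):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows
        record = dict(zip(COLUMNS, row))
        record["polarity"] = int(record["polarity"])
        record["sentiment"] = POLARITY_NAMES.get(record["polarity"], "unknown")
        yield record

# Two made-up rows in the assumed layout.
sample = io.StringIO(
    '"0","101","Mon Apr 06 2009","example_query","user_a","my day was awful"\n'
    '"4","102","Mon Apr 06 2009","example_query","user_b","what a lovely morning"\n'
)
tweets = list(read_tweets(sample))
print([t["sentiment"] for t in tweets])  # ['negative', 'positive']
```

For a real file, open it with `open(path, newline="", encoding=...)`; the correct encoding depends on the release you download.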
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There is increasing demand for sentiment analysis of social media text, which is mostly code-mixed. Systems trained on monolingual data fail on code-mixed data because of the complexity of mixing at different levels of the text. However, very few resources are available for building models specific to code-mixed data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Datasets exist for only a few popular language pairs, such as English-Spanish, English-Hindi, and English-Chinese, and there are no resources for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed Malayalam-English text, annotated by volunteer annotators. The corpus achieved a Krippendorff’s alpha above 0.8. We use this new corpus to provide a benchmark for sentiment analysis of Malayalam-English code-mixed text.
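Krippendorff's alpha compares observed disagreement with the disagreement expected by chance. For nominal labels it can be computed from a coincidence matrix o_ck with marginals n_c and total n as alpha = 1 - (n - 1) * sum_{c != k} o_ck / sum_{c != k} n_c * n_k. The sketch below implements that nominal variant; the source does not say which variant or tool the authors used, and the three toy items are made up.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: one list per annotated item, holding the labels the annotators gave it
    (missing ratings are simply left out).
    """
    # Coincidence matrix: every ordered pair of labels from different annotators
    # within an item, weighted by 1 / (m - 1) where m is the item's label count.
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # items with fewer than two labels cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    # Marginal totals n_c and grand total n.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    disagreement = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # undefined when only one category is ever used
    return 1.0 - (n - 1) * disagreement / expected

units = [["pos", "pos"], ["neg", "neg"], ["pos", "neg"]]
print(round(krippendorff_alpha_nominal(units), 4))  # 0.4444
```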
https://creativecommons.org/publicdomain/zero/1.0/
Emotions play a vital role in human communication, and detecting emotions from text data is a challenging task. The ability to automatically recognize emotions from text has many practical applications, such as in sentiment analysis, social media monitoring, and customer feedback analysis.
In this project, we discuss the working principle of a text emotion recognition model and its important terminology. We also describe the model architecture used and its training process in detail. Finally, we evaluate the model using a confusion matrix and a classification report. In the "emotions" column, 0 denotes sad and 1 denotes happy.
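The evaluation step can be sketched without extra dependencies. In practice `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` produce the same information; the hand-written version below only shows what they compute for the two labels (0 = sad, 1 = happy). The predictions are toy values, not results from the model described here.

```python
LABEL_NAMES = {0: "sad", 1: "happy"}

def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_report(matrix, labels=(0, 1)):
    """Precision, recall and F1 per class, computed from the confusion matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[LABEL_NAMES[label]] = {"precision": precision, "recall": recall, "f1": f1}
    return report

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
report = per_class_report(cm)
print(cm)  # [[2, 1], [1, 2]]
```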
The slang.txt file used in the abbreviation-expansion step can be downloaded from: https://www.kaggle.com/datasets/mansis97/slangs
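A minimal sketch of an abbreviation-expansion step. It assumes each line of slang.txt has the form `ABBR=expansion`; the actual file format should be checked after downloading. `load_slang` and `expand_slang` are hypothetical helpers.

```python
import re

def load_slang(lines):
    """Parse lines assumed to look like 'ABBR=expansion' into a lookup table."""
    table = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank or unexpected lines
        abbr, expansion = line.split("=", 1)
        table[abbr.strip().lower()] = expansion.strip()
    return table

def expand_slang(text, table):
    """Replace whole-word abbreviations, matching case-insensitively."""
    def replace(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    return re.sub(r"\b\w+\b", replace, text)

slang = load_slang(["AFAIK=As Far As I Know", "IMO=In My Opinion"])
print(expand_slang("imo this is great", slang))  # In My Opinion this is great
```

With the real file this would be `with open("slang.txt", encoding="utf-8") as f: table = load_slang(f)`. The `\w+` pattern only matches alphanumeric tokens, so entries containing symbols (for example emoticons) would need a different tokenizer.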
https://www.datainsightsmarket.com/privacy-policy
The Emotions Analytics (EA) Software market is experiencing robust growth, driven by increasing demand for personalized customer experiences, advancements in artificial intelligence (AI) and machine learning (ML), and the rising adoption of digital channels across various industries. The market, estimated at $2 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $7 billion by 2033. This expansion is fueled by several key factors. Firstly, businesses are leveraging EA to gain deeper insights into consumer behavior, enabling more effective marketing strategies, product development, and customer service improvements. Secondly, the sophistication of EA technology continues to improve, with more accurate emotion detection capabilities and the integration of diverse data sources (facial expressions, voice tone, text analysis) resulting in more comprehensive and reliable insights. Finally, growing regulatory requirements concerning data privacy and ethical considerations are driving demand for robust and compliant EA solutions. However, the market's growth is not without its challenges. High initial investment costs for implementing EA systems and the need for specialized expertise to interpret and analyze the collected data can act as significant barriers to entry for smaller businesses. Moreover, concerns surrounding data privacy and the potential for misuse of emotionally sensitive information remain important hurdles that need to be addressed through transparent data handling practices and robust ethical guidelines. The competitive landscape is characterized by a mix of large established technology firms like Microsoft and IBM, alongside innovative specialized companies like iMotions and Affectiva, fostering a dynamic market environment with varied technological approaches and service offerings. 
Future growth will depend on continued technological advancements, the development of robust ethical frameworks, and increased awareness of the value proposition of EA across diverse sectors.
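The growth figures can be checked with the standard compound-growth relation V_end = V_start * (1 + r)^t. Using only the numbers quoted above, compounding $2 billion at 15% for the eight years from 2025 to 2033 gives roughly $6.1 billion, while reaching about $7 billion would correspond to a CAGR of roughly 17%, so the quoted figures are best read as rounded estimates. The helper names are hypothetical.

```python
def project_value(start_value, rate, years):
    """Compound start_value at a constant annual growth rate for `years` years."""
    return start_value * (1 + rate) ** years

def implied_cagr(start_value, end_value, years):
    """Constant annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1

years = 2033 - 2025
projected_2033 = project_value(2.0, 0.15, years)  # USD billion
rate_for_7bn = implied_cagr(2.0, 7.0, years)
print(f"{projected_2033:.2f}")  # 6.12
print(f"{rate_for_7bn:.3f}")    # 0.170
```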
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains automated sentiment and emotionality annotations of 23 million headlines from 47 news media outlets popular in the United States.
The set of 47 news media outlets analysed (listed in Figure 1 of the main manuscript) was derived from the AllSides 2019 Media Bias Chart v1.1. The human ratings of the outlets’ ideological leanings were also taken from this chart and are listed in Figure 2 of the main manuscript.
Headlines of news articles from the outlets analyzed in the manuscript are available in the outlets’ online domains and/or public cache repositories such as the Internet Wayback Machine, Google cache and Common Crawl. Article headlines were located in the articles’ raw HTML using outlet-specific XPath expressions.
The temporal coverage of headlines across news outlets is not uniform. For some media organizations, the availability of news articles in online domains or Internet cache repositories becomes sparse for earlier years. Furthermore, some news outlets popular in 2019, such as The Huffington Post or Breitbart, did not exist in the early 2000s. Hence, our data set has a smaller headline sample size and is less representative for the earlier years of the 2000-2019 timeline. Nevertheless, 20 outlets in our data set have chronologically continuous partial or full headline data availability since the year 2000. Figure S1 in the SI reports the number of headlines per outlet and per year in our analysis.
For a small percentage of articles, outlet-specific XPath expressions might fail to capture the headline correctly because of the heterogeneity of the HTML elements and CSS styling combinations with which article text is arranged in the outlets’ online domains. Manual testing indicated that the percentage of headlines falling into this category is very small. Additionally, our method might miss some articles in the online domains of news outlets. In short, in a data analysis of over 23 million headlines we cannot manually check the correctness of every single data instance, and 100% accuracy in capturing headline content is elusive because of a small number of difficult-to-detect edge cases, such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our headline set is representative of the headlines in print news media content for the studied time period and outlets.
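Outlet-specific XPath extraction can be sketched with Python's standard library. This assumes well-formed (XHTML-like) markup, because `xml.etree.ElementTree` only parses XML and supports a limited XPath subset; real-world HTML usually needs a tolerant parser such as `lxml.html`, whose elements offer full XPath 1.0 through `.xpath()`. The outlet name and path in `OUTLET_HEADLINE_PATHS` are hypothetical, not the authors' expressions. Returning `None` when the path matches nothing mirrors the small share of pages on which an expression fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical outlet-specific paths (ElementTree's XPath subset).
OUTLET_HEADLINE_PATHS = {
    "example-outlet": ".//h1[@class='headline']",
}

def extract_headline(xhtml, outlet):
    """Return the text of the first element matching the outlet's path, or None."""
    root = ET.fromstring(xhtml)
    node = root.find(OUTLET_HEADLINE_PATHS[outlet])
    if node is None:
        return None  # the expression failed to locate a headline
    # itertext() also collects text inside nested tags such as <em>.
    return "".join(node.itertext()).strip()

page = """<html><body>
  <div><h1 class="headline">Example <em>headline</em> text</h1></div>
</body></html>"""
print(extract_headline(page, "example-outlet"))  # Example headline text
```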
The compressed files in this data set are listed below:
- analysisScripts.rar contains the analysis scripts used in the main manuscript, as well as aggregated data of the automated sentiment and emotionality annotations of the headlines and the human annotations of the sentiment and emotionality of a subset of headlines, used as ground truth.
- models.rar contains the Transformer sentiment and emotion annotation models used in the analysis, namely:
  - siebert/sentiment-roberta-large-english from https://huggingface.co/siebert/sentiment-roberta-large-english. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text, predicting either positive (1) or negative (0) sentiment for each instance. The model was fine-tuned and evaluated on 15 data sets from diverse text sources (reviews, tweets, etc.) to improve generalization across different types of text. See the original authors' model page for more information.
  - DistilbertSST2.rar is the default sentiment classification model of the Hugging Face Transformers library (https://huggingface.co/). This model is only used to replicate the results of the sentiment analysis with sentiment-roberta-large-english.
  - DistilRoberta: j-hartmann/emotion-english-distilroberta-base from https://huggingface.co/j-hartmann/emotion-english-distilroberta-base. The model is a fine-tuned checkpoint of DistilRoBERTa-base that annotates English text with Ekman's 6 basic emotions plus a neutral class. It was trained on 6 diverse datasets; see the original author's model page for an overview of the data sets used for fine-tuning.
- headlinesDataWithSentimentLabelsAnnotationsFromSentimentRobertaLargeModel.rar: URLs of the headlines analyzed and the sentiment annotations of the siebert/sentiment-roberta-large-english Transformer model (https://huggingface.co/siebert/sentiment-roberta-large-english).
- headlinesDataWithSentimentLabelsAnnotationsFromDistilbertSST2.rar: URLs of the headlines analyzed and the sentiment annotations of the default Hugging Face sentiment analysis model fine-tuned on the SST-2 dataset (https://huggingface.co/).
- headlinesDataWithEmotionLabelsAnnotationsFromDistilRoberta.rar: URLs of the headlines analyzed and the emotion category annotations of the j-hartmann/emotion-english-distilroberta-base Transformer model (https://huggingface.co/j-hartmann/emotion-english-distilroberta-base).
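Text-classification pipelines return predictions as dictionaries with a `label` and a `score`. The sketch below turns such predictions into the 1/0 sentiment coding described above and picks the top emotion from a list of per-class scores. The label strings (`POSITIVE`/`NEGATIVE` and the seven emotion names) are assumptions based on the model descriptions; verify them against each model's `id2label` configuration before relying on them.

```python
# Assumed label strings; check each model's id2label mapping before use.
SENTIMENT_TO_INT = {"NEGATIVE": 0, "POSITIVE": 1}
EMOTION_LABELS = {"anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"}

def sentiment_to_int(prediction):
    """Map one pipeline-style prediction {'label': ..., 'score': ...} to 0/1."""
    return SENTIMENT_TO_INT[prediction["label"].upper()]

def top_emotion(scores):
    """Return the highest-scoring label from a list of {'label', 'score'} dicts."""
    best = max(scores, key=lambda item: item["score"])
    if best["label"] not in EMOTION_LABELS:
        raise ValueError(f"unexpected emotion label: {best['label']!r}")
    return best["label"]

print(sentiment_to_int({"label": "POSITIVE", "score": 0.98}))  # 1
print(top_emotion([{"label": "joy", "score": 0.7},
                   {"label": "neutral", "score": 0.2},
                   {"label": "fear", "score": 0.1}]))  # joy
```

With the `transformers` library installed, `pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")` returns a list of dictionaries of this shape; how all per-class scores are requested from the emotion model depends on the library version.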
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Datasets are critical for emotion analysis in machine learning. This study explores emotion analysis datasets and related benchmarks in online learning, a setting that very few studies currently address. Using our self-developed system, we labeled the topic and the nine-category emotion of 4,715 comments from online learning platforms with a “three-person voting” labeling method, at the sentence level and along multi-category labeling dimensions. Testing the consistency of the labels with Fleiss' kappa gave a value of about 0.51, representing moderate agreement. On this dataset, a Long Short-Term Memory (LSTM) model achieves a prediction accuracy of about 0.68. The dataset provides a benchmark for multi-category emotion analysis in the Chinese online learning field. It can serve as a basis for subsequent work on emotion analysis, monitoring, and intervention in education, and as a reference for constructing future datasets in the education field. Please note that this is a Chinese-language dataset. To use it, please contact the author and request the dataset below.
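Fleiss' kappa measures agreement among a fixed number of raters per item. With N items, n raters and category counts n_ij, it is kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean per-item agreement and P_e the chance agreement from the overall category shares. A minimal sketch with three raters per item, as in the three-person voting scheme; the toy counts use two categories, whereas the dataset's nine emotions would give nine columns. On the commonly used Landis and Koch scale, values from 0.41 to 0.60 count as moderate agreement.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put item i in category j.
    Every item must be rated by the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    # Proportion of agreeing rater pairs on each item.
    agreement = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
                 for row in counts]
    p_bar = sum(agreement) / n_items
    p_e = sum(x * x for x in p)
    return (p_bar - p_e) / (1 - p_e)

toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]  # made-up votes from 3 raters
print(round(fleiss_kappa(toy_counts), 4))  # 0.3333
```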
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
Data Instances
[More Information Needed]
Data Fields
[More Information Needed]
Data Splits
[More Information Needed]
Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset.
Emotion Recognition and Sentiment Analysis Software Market Size 2024-2028
The emotion recognition and sentiment analysis software market size is forecast to increase by USD 797.17 million at a CAGR of 14.15% between 2023 and 2028.
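If the USD 797.17 million increase is read as compounding at 14.15% per year over the five years from 2023 to 2028 (an assumption about how the forecast is defined), the implied starting size V0 satisfies V0 * ((1 + r)^5 - 1) = 797.17. The sketch below solves for V0, which comes out at roughly USD 850 million, for a 2028 market of roughly USD 1.65 billion; `implied_base` is a hypothetical helper.

```python
def implied_base(increase, rate, years):
    """Starting value V0 such that V0 * (1 + rate) ** years - V0 == increase."""
    return increase / ((1 + rate) ** years - 1)

base_2023 = implied_base(797.17, 0.1415, 2028 - 2023)  # USD million
size_2028 = base_2023 + 797.17
print(round(base_2023), round(size_2028))  # 850 1647
```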
The market is experiencing significant growth, driven by the increasing popularity of wearable devices and the adoption of real-time sensing analysis. These technologies enable more accurate and timely emotion recognition, providing valuable insights for various applications, including healthcare, marketing, and customer service. However, the market faces challenges, most notably the issue of low-quality video content hampering emotional interpretation. Regulatory hurdles also impact adoption, as organizations navigate complex data privacy and security regulations.
To capitalize on market opportunities and navigate challenges effectively, companies must focus on improving data quality, investing in advanced algorithms, and addressing regulatory requirements. By doing so, they can differentiate themselves in a competitive landscape and drive innovation in the market.
What will be the Size of the Emotion Recognition and Sentiment Analysis Software Market during the forecast period?
The market is experiencing significant growth, driven by the increasing adoption of conversational AI and virtual assistants. This technology enables the analysis of both textual and multimedia data, including audio and video, to extract emotional insights from user interactions. Data mining and predictive modeling, supported by model deployment, play a crucial role in processing and interpreting this data. Sentiment analysis dashboards and emotion recognition dashboards provide valuable insights into user experience, allowing businesses to map and optimize both the employee and customer journey. Cognitive computing and cognitive AI technologies are also integral to this market, enabling real-time analysis of user behavior and feedback.
Data ethics and responsible AI are becoming increasingly important considerations in this market, with a focus on data governance and model training to ensure accurate and explainable AI. Biometric and behavioral data are also being used to enhance emotion recognition systems, further expanding their applications. Model training and evaluation are essential for ensuring that AI models are accurate and effective. Interpretable and explainable AI are also gaining traction, helping businesses understand the reasoning behind AI decisions and build trust in the technology. Data annotation, and the tools that support it, are critical for producing the high-quality training data that accurate sentiment analysis depends on.
Overall, the market is poised for continued growth, offering businesses valuable insights into user emotions and improving the user experience.
How is this Emotion Recognition and Sentiment Analysis Software Industry segmented?
The emotion recognition and sentiment analysis software industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
Application
Customer service/experience
Product/market research
Patient diagnosis
Others
Deployment
On-premises
Cloud-based
Geography
North America
US
Europe
Germany
UK
APAC
China
Japan
Rest of World (ROW)
By Application Insights
The customer service/experience segment is estimated to witness significant growth during the forecast period.
Emotion AI technology, integrated with sentiment analysis tools, is revolutionizing business operations by enabling real-time understanding of customer emotions and feedback. These solutions utilize machine learning, natural language processing, and computer vision to analyze text, voice, and facial expressions for sentiment scoring, emotion classification, and polarity analysis. Emotion lexicons and sentiment lexicons are used to identify and categorize emotions, while deep learning and predictive analytics provide insights into historical trends. Sentiment analysis plays a crucial role in various industries, including human resources for employee engagement and feedback analysis, fraud detection, and brand reputation management. It is also used in customer service to enhance customer experience through personalized communication and proactive issue resolution.
Social media monitoring and text analysis help businesses stay updated on brand mentions and customer sentiments, while voice analysis and tone analysis provide valuable insights from customer interactions. Integration with APIs, cloud computing, and data visualization tools streamlines the process, allowing for seamless im
https://www.marketreportanalytics.com/privacy-policy
The Emotion Recognition and Sentiment Analysis Software Market is experiencing robust growth, projected to reach $849.76 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 14.15% from 2025 to 2033. This expansion is fueled by several key drivers. Increasing adoption of AI-powered solutions across diverse sectors, including customer service, market research, and healthcare (patient diagnosis), is a primary factor. Businesses leverage these tools to gain valuable insights into customer preferences, improve product development, and personalize user experiences. The rise of cloud-based deployment models further accelerates market growth, offering scalability, cost-effectiveness, and enhanced accessibility. Furthermore, the growing need for effective brand monitoring and reputation management, particularly on social media, is driving demand for sentiment analysis tools. While data privacy concerns and ethical considerations surrounding emotion recognition technology pose certain restraints, the overall market outlook remains exceptionally positive. The market is segmented by application (customer service/experience, product/market research, patient diagnosis, others) and deployment (on-premises, cloud-based), reflecting the diverse use cases and deployment preferences of different industries. North America currently holds a significant market share, driven by early adoption and technological advancements. However, APAC is expected to exhibit substantial growth in the coming years, fueled by increasing digitalization and a burgeoning tech industry in countries like China and Japan. Leading companies are focusing on strategic partnerships, acquisitions, and the development of innovative solutions to maintain a competitive edge in this rapidly evolving landscape. The competitive landscape is characterized by a mix of established tech giants like Microsoft and IBM alongside specialized emotion AI companies. 
The market’s success hinges on the continuous improvement of algorithm accuracy, addressing ethical concerns, and ensuring responsible data handling. Future growth will depend on advancements in deep learning and computer vision, enabling more nuanced and accurate emotion recognition across various modalities, including facial expressions, voice tone, and text analysis. Addressing data bias and ensuring compliance with data privacy regulations are crucial for sustainable growth. The market's segmentation reflects its adaptability across various industries, underscoring its potential for widespread application and sustained expansion throughout the forecast period.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. Its purpose is to store the datasets used in some of the studies that served as research material for this thesis, along with the datasets used in the experimental part of this work.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on Amazon's official website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
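Because the rows are sorted by label, any contiguous train/test split of data_rt.csv would be badly skewed. A minimal sketch of shuffling on load, assuming the CSV has a header row with the two column names given above (`reviews`, `labels`); the sample rows are made up. With pandas, `df.sample(frac=1, random_state=42)` achieves the same.

```python
import csv
import io
import random

def load_shuffled(file_obj, seed=42):
    """Read (review, label) pairs and shuffle them so the classes are interleaved."""
    rows = [(row["reviews"], int(row["labels"])) for row in csv.DictReader(file_obj)]
    random.Random(seed).shuffle(rows)  # fixed seed keeps the order reproducible
    return rows

# Made-up rows; in the real file every negative row precedes every positive one.
sample_csv = "reviews,labels\nflat and dull,0\nstiff plot,0\na joyous ride,1\nsharp and funny,1\n"
rows = load_shuffled(io.StringIO(sample_csv))
```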
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data of the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
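TextBlob's `TextBlob(text).sentiment.polarity` returns a score between -1.0 and 1.0. The cut-offs used to derive the `division` column are not stated here, so the mapping below is a hypothetical sketch: scores above `neutral_band` count as positive, scores below `-neutral_band` as negative, and everything in between as neutral.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.
    The threshold is an assumption, not the dataset's documented rule."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"

print([polarity_to_division(s) for s in (0.4, 0.0, -0.2)])
# ['positive', 'neutral', 'negative']
```

Widening `neutral_band` (for example to 0.1) treats weakly polar reviews as neutral, which can matter because short reviews often receive small non-zero scores.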
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9,930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for training machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw) and division (manually added; categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
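Both edited files add a `division` column derived from a rating (`ReviewStar` for the earphones, `overall` for the musical instruments). The exact thresholds are not given, so the mapping below (4-5 stars positive, 3 neutral, 1-2 negative) is a hypothetical sketch. Ratings are parsed through `float` first because `overall` values may be stored as strings like `"5.0"`.

```python
def stars_to_division(stars):
    """Hypothetical mapping from a 1-5 star rating to a sentiment class."""
    stars = int(float(stars))  # accepts 5, "5" or "5.0"
    if not 1 <= stars <= 5:
        raise ValueError("expected a rating between 1 and 5")
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"

def add_division(rows, rating_field):
    """Return copies of dict rows with a 'division' field derived from rating_field."""
    return [{**row, "division": stars_to_division(row[rating_field])} for row in rows]

earphones = add_division([{"ReviewStar": "5"}, {"ReviewStar": "2"}], "ReviewStar")
instruments = add_division([{"overall": "3.0"}], "overall")
print([r["division"] for r in earphones + instruments])
# ['positive', 'negative', 'neutral']
```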
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Lexicon-based approaches to sentiment analysis assume that each word, or lexical entry, has a predefined weight indicating its sentiment polarity. We compute sentiment for more than 150,000 English-language texts drawn from 4 domains using the Hedonometer, a lexicon-based technique, and Azure, a contemporary machine-learning-based approach. We model the differences in sentiment scores between the two approaches for documents in each domain using regression, and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences.
1. Finance Data: 5,000 records of financial news texts from company press reviews and news headlines.
2. News Headlines Data: 50,000 news headlines from a period of 8 months (November 2015 to July 2016) on four topics: Economy, Microsoft, Obama, and Palestine.
3. IMDb Dataset: 50,000 reviews posted by users of IMDb, the Internet Movie Database.
4. Twitter Dataset: almost 40,000 tweets from users around the globe on all kinds of topics.
5. Hedonometer Bag of Words: the bag of words used for sentiment analysis with the traditional lexicon approach, consisting of 10,223 words with their respective happiness scores. The original file can be downloaded from: https://hedonometer.org/words/labMT-en-v2/
6. Combined p-values results: the results file generated after performing sentiment analysis on all the above domains, restricted to words present in the Hedonometer sheet. It lists each word, its happiness score and its p-values in each domain.
7. Data visualisations: the Tableau code base used to generate the visualisations.
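A lexicon-based score of the Hedonometer kind can be sketched as the frequency-weighted mean of the happiness scores of the words that appear in the lexicon. The lexicon below is a toy with made-up scores on the 1-9 scale, not the labMT values. The optional `stop_band` reflects one common choice of ignoring near-neutral words (for example those scoring between 4 and 6); whether the analysis above used such a filter is not stated.

```python
import re
from collections import Counter

# Toy lexicon with made-up scores on a 1-9 happiness scale (not real labMT values).
TOY_LEXICON = {"love": 8.4, "happy": 8.3, "rain": 5.1, "sad": 2.4, "war": 1.8}

def happiness_score(text, lexicon, stop_band=None):
    """Frequency-weighted mean happiness of the words in text found in lexicon.
    stop_band=(low, high) optionally skips words scoring strictly inside it."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = weight = 0.0
    for word, freq in counts.items():
        score = lexicon.get(word)
        if score is None:
            continue  # words outside the lexicon do not contribute
        if stop_band and stop_band[0] < score < stop_band[1]:
            continue
        total += score * freq
        weight += freq
    return total / weight if weight else None

text = "I love rain but I am sad"
print(round(happiness_score(text, TOY_LEXICON), 2))          # 5.3
print(round(happiness_score(text, TOY_LEXICON, (4, 6)), 2))  # 5.4
```

Words missing from the lexicon are simply skipped, which is one source of the score differences between lexicon-based and machine-learning-based approaches.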
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Seven-category emotion classification algorithm for event-related microblog texts.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore our unique Multimodal Sentiment Analysis Dataset, featuring high-quality images and corresponding text descriptions with sentiment labels.
https://creativecommons.org/publicdomain/zero/1.0/
Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we can help by creating a strong NLP-based classifier that identifies negative tweets so they can be blocked. Can you build a strong classifier model to predict them?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
When parsing the CSV, make sure to remove the leading and trailing quotes from the text field so that you don't include them in your training data.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
The dataset was downloaded from a Kaggle competition:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
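A proper CSV parser removes the enclosing quotes from quoted fields, so they never reach the training text. The sketch below reads rows with `csv.DictReader` and locates `selected_text` inside `text` as character offsets, so the target span keeps every character, commas and spaces included. The sample row is made up, and only the `text`, `selected_text` and `sentiment` columns are assumed; `find_span` is a hypothetical helper.

```python
import csv
import io

def find_span(text, selected_text):
    """Return (start, end) character offsets of selected_text in text, or None."""
    start = text.find(selected_text)
    if start == -1:
        return None  # selected_text does not occur verbatim in text
    return start, start + len(selected_text)

# Made-up row; csv.DictReader strips the enclosing quotes of quoted fields.
sample = io.StringIO('text,selected_text,sentiment\n'
                     '"I hate waiting, seriously.","hate waiting,",negative\n')
row = next(csv.DictReader(sample))
span = find_span(row["text"], row["selected_text"])
print(span, repr(row["text"][span[0]:span[1]]))  # (2, 15) 'hate waiting,'
```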
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PRDECT-ID Dataset is a collection of Indonesian product review data annotated with emotion and sentiment labels. The data were collected from Tokopedia, one of the largest e-commerce platforms in Indonesia. The dataset contains Indonesian-language product reviews from 29 product categories on Tokopedia. Each product review is annotated with a single emotion: love, happiness, anger, fear, or sadness. A group of annotators assigned the emotion labels following emotion-annotation criteria created by an expert in clinical psychology. Other attributes related to each product review, such as Location, Price, Overall Rating, Number Sold, Total Review, and Customer Rating, were also extracted to support further research.
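One way to represent a single-label review record like the ones described above is sketched below. The field names are hypothetical, derived from the listed attributes; the column names and types in the released files may differ. The `Emotion` enum restricts each review to exactly one of the five emotions, and `emotion_distribution` counts labels, which is a common first check for class imbalance.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Emotion(Enum):
    LOVE = "love"
    HAPPINESS = "happiness"
    ANGER = "anger"
    FEAR = "fear"
    SADNESS = "sadness"


@dataclass
class ProductReview:
    # Hypothetical field names based on the attributes listed in the description.
    category: str
    location: str
    price: float
    overall_rating: float
    number_sold: int
    total_review: int
    customer_rating: int
    review_text: str
    emotion: Emotion  # exactly one emotion per review


def parse_emotion(label):
    """Map a raw label such as ' Love ' to the Emotion enum (ValueError if unknown)."""
    return Emotion(label.strip().lower())


def emotion_distribution(reviews):
    """Count how many reviews carry each emotion label."""
    return Counter(r.emotion for r in reviews)
```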
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset
This dataset contains positive, negative and neutral ("notr") sentences from several data sources given in the references. Most sentiment models have only two labels: positive and negative. However, user input can be an entirely neutral sentence, and I could not find data for such cases. Therefore, I created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from a Turkish Wikipedia dump. In addition, some random text was added… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Emotions in Literature
Detecting Fine-Grained Emotions in Literature
Please cite:
@Article{app13137502,
  AUTHOR = {Rei, Luis and Mladenić, Dunja},
  TITLE = {Detecting Fine-Grained Emotions in Literature},
  JOURNAL = {Applied Sciences},
  VOLUME = {13},
  YEAR = {2023},
  NUMBER = {13},
  ARTICLE-NUMBER = {7502},
  URL = {https://www.mdpi.com/2076-3417/13/13/7502},
  ISSN = {2076-3417},
  DOI = {10.3390/app13137502}
}
Emotion detection in text is a fundamental aspect of affective computing and is closely linked to natural language processing. Its applications span various domains, from interactive chatbots to marketing and customer service. This research specifically focuses on its significance in literature analysis and understanding. To facilitate this, we present a novel approach that involves creating a multi-label fine-grained emotion detection dataset, derived from literary sources. Our methodology employs a simple yet effective semi-supervised technique. We leverage textual entailment classification to perform emotion-specific weak-labeling, selecting examples with the highest and lowest scores from a large corpus. Utilizing these emotion-specific datasets, we train binary pseudo-labeling classifiers for each individual emotion. By applying this process to the selected examples, we construct a multi-label dataset. Using this dataset, we train models and evaluate their performance within a traditional supervised setting. Our model achieves an F1 score of 0.59 on our labeled gold set, showcasing its ability to effectively detect fine-grained emotions. Furthermore, we evaluate the model's performance in zero- and few-shot transfer scenarios using benchmark datasets. Notably, our results indicate that the knowledge learned from our dataset transfers across diverse data domains, demonstrating its potential for broader applications beyond emotion detection in literature. Our contribution thus includes a multi-label fine-grained emotion detection dataset built from literature, the semi-supervised approach used to create it, as well as the models trained on it. This work provides a solid foundation for advancing emotion detection techniques and their utilization in various scenarios, especially in cultural heritage analysis.
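The weak-labeling and merging steps described in the abstract can be sketched as below. This is an illustrative reading, not the authors' implementation: the scores stand in for an entailment classifier's output for one emotion, and `k` (how many examples to keep at each extreme) and the `0.5` threshold for merging binary classifier outputs are hypothetical parameters.

```python
def select_weak_labels(scores, k):
    """Build one emotion-specific weakly labeled set.

    scores: dict mapping example id -> entailment score for one emotion.
    k: number of examples to keep at each extreme (hypothetical parameter).
    Returns (positives, negatives): the k highest-scoring ids labelled 1
    and the k lowest-scoring ids labelled 0.
    """
    if k < 1 or 2 * k > len(scores):
        raise ValueError("k must be >= 1 and leave positives and negatives disjoint")
    ranked = sorted(scores, key=scores.get, reverse=True)
    positives = [(ex_id, 1) for ex_id in ranked[:k]]
    negatives = [(ex_id, 0) for ex_id in ranked[-k:]]
    return positives, negatives


def build_binary_datasets(scores_by_emotion, k):
    """Run the selection independently for each emotion."""
    return {
        emotion: select_weak_labels(scores, k)
        for emotion, scores in scores_by_emotion.items()
    }


def combine_pseudo_labels(probs_by_emotion, threshold=0.5):
    """Merge per-emotion binary classifier probabilities into multi-label sets.

    probs_by_emotion: dict emotion -> dict example id -> probability.
    An example receives every emotion whose probability reaches `threshold`.
    """
    labels = {}
    for emotion, probs in probs_by_emotion.items():
        for ex_id, p in probs.items():
            labels.setdefault(ex_id, set())
            if p >= threshold:
                labels[ex_id].add(emotion)
    return labels
```

Keeping only the extremes trades coverage for label precision: the middle of the score distribution, where the entailment model is least reliable, never enters the binary training sets.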
admiration: finds something admirable, impressive or worthy of respect
amusement: finds something funny, entertaining or amusing
anger: is angry, furious, or strongly displeased; displays ire, rage, or wrath
annoyance: is annoyed or irritated
approval: expresses a favorable opinion, approves, endorses or agrees with something or someone
boredom: feels bored, uninterested, monotony, tedium
calmness: is calm, serene, free from agitation or disturbance, experiences emotional tranquility
caring: cares about the well-being of someone else, feels sympathy, compassion, affectionate concern towards someone, displays kindness or generosity
courage: feels courage or the ability to do something that frightens one, displays fearlessness or bravery
curiosity: is interested, curious, or has a strong desire to learn something
desire: has a desire or ambition, wants something, wishes for something to happen
despair: feels despair, helpless, powerless, loss or absence of hope, desperation, despondency
disappointment: feels sadness or displeasure caused by the non-fulfillment of hopes or expectations, being let down, or expresses regret due to the unfavorable outcome of a decision
disapproval: expresses an unfavorable opinion, disagrees or disapproves of something or someone
disgust: feels disgust, revulsion, finds something or someone unpleasant, offensive or hateful
doubt: has doubt or is uncertain about something, bewildered, confused, or shows lack of understanding
embarrassment: feels embarrassed, awkward, self-conscious, shame, or humiliation
envy: is covetous, feels envy or jealousy; begrudges or resents someone for their achievements, possessions, or qualities
excitement: feels excitement or great enthusiasm and eagerness
faith: expresses religious faith, has a strong belief in the doctrines of a religion, or trust in God
fear: is afraid or scared due to a threat, danger, or harm
frustration: feels frustrated: upset or annoyed because of inability to change or achieve something
gratitude: is thankful or grateful for something
greed: is greedy, rapacious, avaricious, or has a selfish desire to acquire or possess more than what one needs
grief: feels grief or intense sorrow, or grieves for someone who has died
guilt: feels guilt, remorse, or regret to have committed wrong or failed in an obligation
indifference: is uncaring, unsympathetic, uncharitable, or callous, shows indifference, lack of concern, coldness towards someone
joy: is happy, feels joy, great pleasure, elation, satisfaction, contentment, or delight
love: feels love, strong affection, passion, or deep romantic attachment for someone
nervousness: feels nervous, anxious, worried, uneasy, apprehensive, stressed, troubled or tense
nostalgia: feels nostalgia, longing or wistful affection for the past, something lost, or for a period in one's life, feels homesickness, a longing for one's home, city, or country while being away; longing for a familiar place
optimism: feels optimism or hope, is hopeful or confident about the future, that something good may happen, or the success of something
pain: feels physical pain or experiences physical suffering
pride: is proud, feels pride from one's own achievements, self-fulfillment, or from the achievements of those with whom one is closely associated, or from qualities or possessions that are widely admired
relief: feels relaxed, relief from tension or anxiety
sadness: feels sadness, sorrow, unhappiness, depression, dejection
surprise: is surprised, astonished or shocked by something unexpected
trust: trusts or has confidence in someone, or believes that someone is good, honest, or reliable
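Because the dataset is multi-label, each example's emotions are typically encoded as a multi-hot vector over the label inventory above (38 labels, including pain). A minimal sketch; the label order here simply follows the alphabetical order of the list and is an assumption, not a documented column order of the released files.

```python
# Label inventory from the definitions above, in alphabetical order
# (assumed ordering; check it against the released data).
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "boredom",
    "calmness", "caring", "courage", "curiosity", "desire", "despair",
    "disappointment", "disapproval", "disgust", "doubt", "embarrassment",
    "envy", "excitement", "faith", "fear", "frustration", "gratitude",
    "greed", "grief", "guilt", "indifference", "joy", "love", "nervousness",
    "nostalgia", "optimism", "pain", "pride", "relief", "sadness",
    "surprise", "trust",
]
INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def to_multi_hot(labels):
    """Encode a collection of emotion names as a 0/1 vector (KeyError on unknown names)."""
    vec = [0] * len(EMOTIONS)
    for name in labels:
        vec[INDEX[name]] = 1
    return vec


def from_multi_hot(vec):
    """Decode a 0/1 vector back into the list of emotion names, in inventory order."""
    return [EMOTIONS[i] for i, v in enumerate(vec) if v]
```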