GoEmotions
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral.
Number of examples: 58,009. Number of labels: 27 + Neutral. Maximum sequence length in training and evaluation datasets: 30.
On top of the raw data, we also include a version filtered based on rater agreement, which contains a train/test/validation split:
Size of training dataset: 43,410. Size of test dataset: 5,427. Size of… See the full description on the dataset page: https://huggingface.co/datasets/mrm8488/goemotions.

The GoEmotions dataset contains 58k carefully curated Reddit comments labeled for 27 emotion categories or Neutral. The emotion categories are admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.
To use this dataset:
import tensorflow_datasets as tfds
# Load the training split of the GoEmotions dataset.
ds = tfds.load('goemotions', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for GoEmotions
Dataset Summary
The GoEmotions dataset contains 58k carefully curated Reddit comments labeled for 27 emotion categories or Neutral. The raw data is included as well as the smaller, simplified version of the dataset with predefined train/val/test splits.
Supported Tasks and Leaderboards
This dataset is intended for multi-class, multi-label emotion classification.
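As a rough illustration of what multi-label means here, the sketch below encodes a comment's emotions as a 0/1 vector with one slot per label. The label order follows the emotion list given on this page with Neutral appended last, which is an assumption about indexing; `to_multi_hot` is a hypothetical helper, not part of any dataset loader.

```python
# The 27 emotion names listed on this page, plus "neutral" as the 28th label.
# Assumption: this ordering is used only for illustration.
EMOTIONS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(EMOTIONS)}


def to_multi_hot(labels):
    """Turn a list of emotion names into a 28-dimensional 0/1 vector."""
    vector = [0] * len(EMOTIONS)
    for name in labels:
        vector[LABEL_TO_INDEX[name]] = 1
    return vector


# A hypothetical comment annotated with two emotions at once.
example = to_multi_hot(["gratitude", "joy"])
print(sum(example))  # number of active labels for this comment
```

A vector like this is the usual target for a classifier with one independent sigmoid output per emotion, as opposed to a single softmax over mutually exclusive classes.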
Languages
The data is in English and Brazilian Portuguese… See the full description on the dataset page: https://huggingface.co/datasets/antoniomenezes/go_emotions_ptbr.

GoEmotions 5-Point Sentiment Dataset
This dataset is a modified version of the GoEmotions dataset created by Google. The original dataset consists of 58k carefully curated Reddit comments labeled with 27 fine-grained emotion categories plus a neutral label.
📘 About This Version
This version maps the original GoEmotions emotion labels into a 5-point sentiment scale, making it more suitable for traditional sentiment analysis tasks:
Original Label(s) Mapped Sentiment… See the full description on the dataset page: https://huggingface.co/datasets/spacesedan/goemotions-5point-sentiment.

This is Google's GoEmotions dataset, which labels 56k English Reddit comments with 27 emotion categories.

This dataset was created by Enes Ozturk

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Summary
The GoEmotions Cleaned dataset is a refined version of the original Google GoEmotions dataset. It has been cleaned, simplified, and reformatted for use in text classification tasks such as emotion detection, sentiment analysis, and multi-label emotion prediction. This version retains only two essential columns — text and label — making it ideal for model fine-tuning and experimentation with Transformer-based architectures.
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/Keyurjotaniya007/go-emotions-cleaned.

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GoEmotions Ekman Emotions Dataset
Dataset Description
This dataset contains 10,000 text samples from Reddit comments mapped to the 7 basic Ekman emotions. It's derived from the original GoEmotions dataset and processed specifically for emotion classification research using Paul Ekman's fundamental emotion model.
Supported Tasks
Text Classification: Multi-class emotion classification
Sentiment Analysis: Fine-grained emotion detection
Psychology Research:… See the full description on the dataset page: https://huggingface.co/datasets/Frankhihi/goemotion-ekman-emotions.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This dataset is a translation of the Google GoEmotions emotion classification dataset. All features remain unchanged, except for the addition of a new ru_text column containing the translated text in Russian. For the translation process, I used the Deep translator with the Google engine. You can find all the details about translation, raw .csv files and other stuff in this Github repository. For more information also check the official original dataset card.… See the full description on the dataset page: https://huggingface.co/datasets/seara/ru_go_emotions.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Text emotion detection constitutes a crucial foundation for advancing artificial intelligence from basic comprehension to the exploration of emotional reasoning. Most existing emotion detection datasets rely on manual annotations, which are associated with high costs, substantial subjectivity, and severe label imbalances. This is particularly evident in the inadequate annotation of micro-emotions and the absence of emotional intensity representation, which fail to capture the rich emotions embedded in sentences and adversely affect the quality of downstream task completion. By proposing an all-labels and training-set label regression method, we map label values to energy intensity levels, thereby fully leveraging the learning capabilities of machine models and the interdependencies among labels to uncover multiple emotions within samples. This led to the establishment of the Emotion Quantization Network (EQN) framework for micro-emotion detection and annotation. Using five commonly employed sentiment datasets, we conducted comparative experiments with various models, validating the broad applicability of our framework within NLP machine learning models. Based on the EQN framework, emotion detection and annotation are conducted on the GoEmotions dataset. A comprehensive comparison with the results reported in the literature demonstrates that the EQN framework possesses a high capability for automatic detection and annotation of micro-emotions. The EQN framework is the first to achieve automatic micro-emotion annotation with energy-level scores, providing strong support for further emotion detection analysis and the quantitative research of emotion computing.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Original dataset: GoEmotions dataset
Added labels_ekman column with multi-label emotion annotations mapped to 7 base emotions, following Dr. Ekman's theory.
Column labels contains multi-label emotion annotations with the 28 emotion labels of the GoEmotions dataset:
0: admiration, 1: amusement, 2: anger, 3: annoyance, 4: approval, 5: caring, 6: confusion, 7: curiosity, 8: desire, 9: disappointment, 10: disapproval, 11: disgust, 12: embarrassment, 13: excitement, 14: fear, 15: gratitude, 16: grief, 17: joy, 18: love, 19:… See the full description on the dataset page: https://huggingface.co/datasets/SkyWater21/en_go_emotions.
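A mapping of this kind can be sketched as a simple lookup. The grouping below is recalled from the `ekman_mapping.json` file in the official GoEmotions repository and should be checked against that file; treating neutral as the seventh category is an assumption, and `to_ekman` is a hypothetical helper, not necessarily how the labels_ekman column was built.

```python
# Assumed grouping of the 27 fine-grained emotions into six Ekman emotions.
# Verify against ekman_mapping.json in the official repository before use.
EKMAN_GROUPS = {
    "anger": ["anger", "annoyance", "disapproval"],
    "disgust": ["disgust"],
    "fear": ["fear", "nervousness"],
    "joy": [
        "joy", "amusement", "approval", "excitement", "gratitude", "love",
        "optimism", "relief", "pride", "admiration", "desire", "caring",
    ],
    "sadness": ["sadness", "disappointment", "embarrassment", "grief", "remorse"],
    "surprise": ["surprise", "realization", "confusion", "curiosity"],
}

# Invert the groups into a fine-label -> base-emotion lookup.
FINE_TO_EKMAN = {
    fine: base for base, fines in EKMAN_GROUPS.items() for fine in fines
}
# Assumption: neutral is kept as its own seventh category.
FINE_TO_EKMAN["neutral"] = "neutral"


def to_ekman(fine_labels):
    """Map fine-grained label names to a sorted list of distinct base emotions."""
    return sorted({FINE_TO_EKMAN[name] for name in fine_labels})


print(to_ekman(["annoyance", "disapproval", "curiosity"]))  # ['anger', 'surprise']
```

Because several fine-grained labels collapse into one base emotion, a comment with two fine labels may end up with a single coarse label, so the coarse column stays multi-label but is usually sparser.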

https://creativecommons.org/publicdomain/zero/1.0/
The Google AI GoEmotions dataset consists of comments from Reddit users labeled with their emotional tone. GoEmotions is designed for training neural networks to perform fine-grained analysis of the emotional tone of text. Most existing emotion classification datasets cover only certain domains (for example, news headlines and movie subtitles), are small, and use a scale of only six basic emotions (anger, surprise, disgust, joy, fear, and sadness). Expanding the emotional spectrum covered by datasets could make it possible to create more sensitive chatbots and models for detecting dangerous behavior on the Internet, as well as to improve customer support services.
The categories of emotions were identified by Google together with psychologists and include 12 positive, 11 negative, and 4 ambiguous emotions, plus 1 neutral category, which makes the dataset suitable for tasks that require subtle differentiation between emotions.
Source: https://arxiv.org/pdf/2005.00547.pdf
GitHub: https://github.com/google-research/google-research/tree/master/goemotions
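The 12 / 11 / 4 / 1 split described above can be written down as a lookup and used to tally coarse sentiment over a set of labels. The group membership below is recalled from the `sentiment_mapping.json` file in the GitHub repository linked above and should be verified there; `sentiment_counts` is a hypothetical helper.

```python
from collections import Counter

# Assumed membership of each sentiment group; verify against
# sentiment_mapping.json in the official repository.
SENTIMENT_GROUPS = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire",
        "excitement", "gratitude", "joy", "love", "optimism", "pride",
        "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
    "neutral": ["neutral"],
}
EMOTION_TO_SENTIMENT = {
    emotion: sentiment
    for sentiment, emotions in SENTIMENT_GROUPS.items()
    for emotion in emotions
}


def sentiment_counts(emotions):
    """Count how many of the given emotion labels fall into each group."""
    return Counter(EMOTION_TO_SENTIMENT[name] for name in emotions)


# Group sizes should match the 12 / 11 / 4 / 1 breakdown in the description.
group_sizes = {name: len(members) for name, members in SENTIMENT_GROUPS.items()}
print(group_sizes)
print(sentiment_counts(["joy", "relief", "grief", "curiosity"]))
```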

Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset is designed for building and evaluating sentiment and emotion classification models in Natural Language Processing (NLP). It includes two well-known datasets:
Each dataset is provided in CSV format and includes text samples along with corresponding emotion or sentiment labels.
This dataset is useful for:

This dataset was created by mtlsuda

This dataset was created by htchch123

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The EQN framework is a micro-emotion annotation and detection system that, for the first time, automatically annotates text with micro-emotions scored by energy level. The text emotion datasets it annotates are no longer simply single-label or multi-label; they carry macro-emotions and micro-emotions with continuous emotion-intensity values, so the labeling of emotion datasets changes from discrete to continuous. This plays an important role in fine-grained emotion research in fields such as emotional computing, human-computer alignment, humanoid robots, and psychology.
This release contains the experimental results of the EQN micro-emotion detection and annotation framework we proposed: the train.csv of the GoEmotions dataset with micro-emotion labels carrying energy-level intensity values, and the model trained on the GoEmotions dataset based on the BERT model. Attached is the PyTorch-based micro-emotion annotation code, which can be used to annotate the GoEmotions dataset yourself or to predict emotion classes based on the annotation results. For the specific implementation method, please refer to our paper.
Note:
1. gotrainadd.csv: GoEmotions dataset with additional annotation (micro-emotion labels with energy level intensity values (0-10)).
2. 28pd.py: Micro-emotion detection and annotation code based on PyTorch.
3. 55770-1.pth: Model trained on the GoEmotions dataset based on the BERT model (emotion energy level intensity is a value between 0-1).
4. GoEmotions dataset: Data and code available at https://github.com/google-research/google-research/tree/master/goemotions
The experimental environment of this project:
GPU: NVIDIA GeForce RTX 3090
Bert-base-cased pre-trained model: https://huggingface.co/google-bert/bert-base-cased
python=3.7, pytorch=1.9.0, cudatoolkit=11.3.1, cudnn=8.9.7.29
Instructions for use:
1. Refer to our usage environment instructions and install the operating environment.
2. Download our EQN-model.
3. Change the loading model name in 28pd.py to the actual name of the downloaded EQN-model.
4. Create a directory named "28pd" to place the .csv format data files to be labeled or predicted.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Yamini Suman
Released under MIT

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of annotation results for the GoEmotions test set using CoEQN and EQN.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
alisha4walunj/goemotions-binary dataset hosted on Hugging Face and contributed by the HF Datasets community

jonasbacci/goemotions-ekman dataset hosted on Hugging Face and contributed by the HF Datasets community