GoEmotions
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations to 27 emotion categories or Neutral.
Number of examples: 58,009. Number of labels: 27 + Neutral. Maximum sequence length in training and evaluation datasets: 30.
On top of the raw data, we also include a version filtered based on rater agreement, which contains a train/test/validation split:
Size of training dataset: 43,410. Size of test dataset: 5,427. Size of… See the full description on the dataset page: https://huggingface.co/datasets/mrm8488/goemotions.
The GoEmotions dataset contains 58k carefully curated Reddit comments labeled for 27 emotion categories or Neutral. The emotion categories are admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split and print the first four examples.
ds = tfds.load('goemotions', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains 58K carefully curated Reddit comments labeled for 27 emotion categories: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, and surprise.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GoEmotions Ekman Emotions Dataset
Dataset Description
This dataset contains 10,000 text samples from Reddit comments mapped to the 7 basic Ekman emotions. It's derived from the original GoEmotions dataset and processed specifically for emotion classification research using Paul Ekman's fundamental emotion model.
Supported Tasks
Text Classification: Multi-class emotion classification
Sentiment Analysis: Fine-grained emotion detection
Psychology Research:… See the full description on the dataset page: https://huggingface.co/datasets/Frankhihi/goemotion-ekman-emotions.
GoEmotions 5-Point Sentiment Dataset
This dataset is a modified version of the GoEmotions dataset created by Google. The original dataset consists of 58k carefully curated Reddit comments labeled with 27 fine-grained emotion categories plus a neutral label.
📘 About This Version
This version maps the original GoEmotions emotion labels into a 5-point sentiment scale, making it more suitable for traditional sentiment analysis tasks:
Original Label(s) Mapped Sentiment… See the full description on the dataset page: https://huggingface.co/datasets/spacesedan/goemotions-5point-sentiment.
This dataset was created by Enes Ozturk
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset is designed for building and evaluating sentiment and emotion classification models in Natural Language Processing (NLP). It includes two well-known datasets:
Each dataset is provided in CSV format and includes text samples along with corresponding emotion or sentiment labels.
This dataset is useful for:
spacesedan/goemotions-5point-sentiment-refined dataset hosted on Hugging Face and contributed by the HF Datasets community
Hostileic/hinglish-goemotions dataset hosted on Hugging Face and contributed by the HF Datasets community
AutoTrain Dataset for project: twitter-goemotions-binary-fear-classification
Dataset Description
This dataset has been automatically processed by AutoTrain for project twitter-goemotions-binary-fear-classification.
Languages
The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: [ { "text": "Downvoting comments you don't like is your right.", "feat_id":… See the full description on the dataset page: https://huggingface.co/datasets/garrettbaber/twitter-roberta-goemotions-binary-fear-classification.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
alisha4walunj/goemotions-binary dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Humans' emotional perception is subjective by nature, in which each individual could express different emotions regarding the same textual content. Existing datasets for emotion analysis commonly depend on a single ground truth per data sample, derived from majority voting or averaging the opinions of all annotators. We introduce a new non-aggregated dataset, namely StudEmo, that contains 5,182 customer reviews, each annotated by 25 people with intensities of eight emotions from Plutchik's model, extended with valence and arousal. We also propose three personalized models that use not only textual content but also the individual human perspective, providing the model with different approaches to learning human representations. The experiments were carried out as a multitask classification on two datasets: our StudEmo dataset and GoEmotions dataset, which contains 28 emotional categories. The proposed personalized methods significantly improve prediction results, especially for emotions that have low inter-annotator agreement.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Original dataset: GoEmotions dataset. Filtered using the following mapping, based on the basic emotions found in Plutchik's Wheel of Emotions, and filtered to use only the sentences with one label:
wheel_dict = {
    "joy": [
        "joy", "amusement", "excitement", "gratitude", "pride",
        "relief", "admiration", "love", "optimism",
    ],
    "trust": ["approval", "caring"],
    "fear": ["fear", "nervousness"],
    "surprise":… See the full description on the dataset page: https://huggingface.co/datasets/Jsevisal/go_emotions_wheel_unilabel.
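The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's code: it uses only the three mapping entries fully visible in the excerpt (joy, trust, fear), since the rest of the mapping is truncated in the source, and the function name `to_wheel_label` and the sample data are hypothetical.

```python
from typing import Optional

# Partial mapping: only the entries fully visible in the excerpt.
# The remaining wheel categories are truncated in the source and omitted here.
wheel_dict = {
    "joy": ["joy", "amusement", "excitement", "gratitude", "pride",
            "relief", "admiration", "love", "optimism"],
    "trust": ["approval", "caring"],
    "fear": ["fear", "nervousness"],
}

# Invert the mapping: fine-grained GoEmotions label -> wheel category.
fine_to_wheel = {
    fine: wheel for wheel, fines in wheel_dict.items() for fine in fines
}


def to_wheel_label(labels: list[str]) -> Optional[str]:
    """Return the wheel category for a single-label example, else None."""
    if len(labels) != 1:
        return None  # multi-label (or unlabeled) examples are dropped
    # None if the label is not covered by this partial mapping.
    return fine_to_wheel.get(labels[0])


# Hypothetical sample data: (text, fine-grained labels).
examples = [
    ("Thanks so much!", ["gratitude"]),
    ("Take care of yourself.", ["caring"]),
    ("Wow, that is scary and cool.", ["fear", "excitement"]),
]

# Keep only examples that have exactly one label covered by the mapping.
kept = [
    (text, to_wheel_label(labels))
    for text, labels in examples
    if to_wheel_label(labels) is not None
]
```

Inverting the dictionary once keeps each lookup constant-time, which matters little for three examples but helps when filtering tens of thousands of comments.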
Original dataset: GoEmotions dataset. Filtered using "ekman_mapping.json" from the original dataset repo and restricted to sentences with exactly one label. The dataset contains 7 emotion labels, following Dr. Ekman's theory. Labels are as follows:
0: anger
1: disgust
2: fear
3: joy
4: sadness
5: surprise
6: neutral
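For convenience, the label ids listed in this description can be turned into lookup tables. The sketch below is an illustration only: the names `id2label`, `label2id`, and `decode` are hypothetical helpers, not part of the dataset.

```python
# Label ids exactly as listed in the dataset description.
id2label = {
    0: "anger",
    1: "disgust",
    2: "fear",
    3: "joy",
    4: "sadness",
    5: "surprise",
    6: "neutral",
}

# Reverse lookup: label name -> id.
label2id = {name: idx for idx, name in id2label.items()}


def decode(ids):
    """Convert a sequence of integer label ids to label names."""
    return [id2label[i] for i in ids]


names = decode([3, 6, 0])
```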
tgelton/GoEmotions dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GoEmotions Spanish
A Spanish translation (using EasyNMT) of the GoEmotions dataset.
For more information check the official Model Card
sangkm/go-emotions-cleaned dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for RuIzardEmotions
Dataset Summary
The RuIzardEmotions dataset is a high-quality translation of the go-emotions dataset and the other emotion-detection dataset. It contains 30k Reddit comments labeled for 10 emotion categories (joy, sadness, anger, enthusiasm, surprise, disgust, fear, guilt, shame and neutral). The datasets were translated using the accurate translator DeepL and additional processing. The idea for the dataset was inspired by the Izard's… See the full description on the dataset page: https://huggingface.co/datasets/Djacon/ru-izard-emotions.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for GoEmotions
Dataset Summary
The RuGoEmotions dataset contains 34k Reddit comments labeled for 9 emotion categories (joy, interest, surprise, sadness, anger, disgust, fear, guilt and neutral). The dataset comes with predefined train/val/test splits.
Supported Tasks and Leaderboards
This dataset is intended for multi-class, multi-label emotion classification.
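In a multi-label setting, each comment can carry several emotions at once, so targets are commonly encoded as multi-hot vectors. The sketch below shows that encoding for the nine categories named in the summary. The index order and the helper name `multi_hot` are assumptions for illustration, not something the dataset card specifies.

```python
# Categories as named in the dataset summary; the index order is an assumption.
LABELS = [
    "joy", "interest", "surprise", "sadness", "anger",
    "disgust", "fear", "guilt", "neutral",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(LABELS)}


def multi_hot(active):
    """Encode a collection of label names as a 0/1 vector over LABELS."""
    vec = [0] * len(LABELS)
    for name in active:
        vec[LABEL_TO_INDEX[name]] = 1
    return vec


# A comment annotated with two emotions sets two positions to 1.
target = multi_hot(["joy", "surprise"])
```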
Languages
The data is in Russian.
Dataset Structure
Data… See the full description on the dataset page: https://huggingface.co/datasets/Djacon/ru_goemotions.