GoEmotions
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations to 27 emotion categories or Neutral.
Number of examples: 58,009. Number of labels: 27 + Neutral. Maximum sequence length in training and evaluation datasets: 30.
On top of the raw data, we also include a version filtered based on rater agreement, which contains a train/test/validation split:
Size of training dataset: 43,410. Size of test dataset: 5,427. Size of… See the full description on the dataset page: https://huggingface.co/datasets/mrm8488/goemotions.
The GoEmotions dataset contains 58k carefully curated Reddit comments labeled for 27 emotion categories or Neutral. The emotion categories are admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.
To use this dataset:
import tensorflow_datasets as tfds
# Load the training split and print the first four examples.
ds = tfds.load('goemotions', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for GoEmotions
Dataset Summary
The GoEmotions dataset contains 58k carefully curated Reddit comments labeled for 27 emotion categories or Neutral. The raw data is included as well as the smaller, simplified version of the dataset with predefined train/val/test splits.
Supported Tasks and Leaderboards
This dataset is intended for multi-class, multi-label emotion classification.
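A multi-label target is usually represented as a multi-hot vector with one slot per emotion. The following is a minimal sketch, assuming the 27 categories of the original GoEmotions dataset in alphabetical order with neutral last; the names `LABELS` and `to_multi_hot` are illustrative and not part of this dataset's API.

```python
# The 27 GoEmotions emotion categories plus neutral.
# ASSUMPTION: label ids follow this alphabetical order, with neutral as id 27.
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]


def to_multi_hot(label_ids, num_labels=len(LABELS)):
    """Turn the label ids of one comment into a 0/1 vector of length num_labels."""
    vec = [0] * num_labels
    for label_id in label_ids:
        vec[label_id] = 1
    return vec


# A comment annotated with both admiration (0) and gratitude (15).
example = to_multi_hot([0, 15])
active = [LABELS[i] for i, flag in enumerate(example) if flag]
print(active)
```

A classifier trained on such vectors would typically use one sigmoid output per label rather than a single softmax, since several emotions can be active at once.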
Languages
The data is in English and Brazilian Portuguese… See the full description on the dataset page: https://huggingface.co/datasets/antoniomenezes/go_emotions_ptbr.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GoEmotions Ekman Emotions Dataset
Dataset Description
This dataset contains 10,000 text samples from Reddit comments mapped to the 7 basic Ekman emotions. It's derived from the original GoEmotions dataset and processed specifically for emotion classification research using Paul Ekman's fundamental emotion model.
Supported Tasks
Text Classification: Multi-class emotion classification
Sentiment Analysis: Fine-grained emotion detection
Psychology Research:… See the full description on the dataset page: https://huggingface.co/datasets/Frankhihi/goemotion-ekman-emotions.
The GoEmotions dataset is a collection of 58k Reddit comments, each labeled with the publisher's affect from a taxonomy of 28 emotions.
Hostileic/hinglish-goemotions dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Original dataset: GoEmotions dataset. Added a labels_ekman column with multi-label emotion annotations mapped to 7 base emotions as per Dr. Ekman's theory. The labels column contains multi-label emotion annotations with the 28 emotion labels of the GoEmotions dataset: 0: admiration 1: amusement 2: anger 3: annoyance 4: approval 5: caring 6: confusion 7: curiosity 8: desire 9: disappointment 10: disapproval 11: disgust 12: embarrassment 13: excitement 14: fear 15: gratitude 16: grief 17: joy 18: love 19:… See the full description on the dataset page: https://huggingface.co/datasets/SkyWater21/en_go_emotions.
AutoTrain Dataset for project: twitter-goemotions-binary-fear-classification
Dataset Description
This dataset has been automatically processed by AutoTrain for project twitter-goemotions-binary-fear-classification.
Languages
The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: [ { "text": "Downvoting comments you don't like is your right.", "feat_id":… See the full description on the dataset page: https://huggingface.co/datasets/garrettbaber/twitter-roberta-goemotions-binary-fear-classification.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
GoEmotions dataset
The original dataset: GoEmotions (paper). The derived dataset contains an additional labels_ekman column with emotion labels as per Paul Ekman's theory.
The original 27 + neutral emotion labels (may contain more than one label per sample):
0: admiration 1: amusement 2: anger 3: annoyance 4: approval 5: caring 6: confusion 7: curiosity 8: desire 9: disappointment 10: disapproval 11: disgust 12: embarrassment 13: excitement 14: fear 15: gratitude 16: grief 17: joy… See the full description on the dataset page: https://huggingface.co/datasets/AiLab-IMCS-UL/go_emotions-en.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Humans' emotional perception is subjective by nature, as each individual may express different emotions regarding the same textual content. Existing datasets for emotion analysis commonly depend on a single ground truth per data sample, derived from majority voting or averaging the opinions of all annotators. We introduce a new non-aggregated dataset, namely StudEmo, that contains 5,182 customer reviews, each annotated by 25 people with intensities of eight emotions from Plutchik's model, extended with valence and arousal. We also propose three personalized models that use not only textual content but also the individual human perspective, providing the model with different approaches to learning human representations. The experiments were carried out as a multitask classification on two datasets: our StudEmo dataset and the GoEmotions dataset, which contains 28 emotional categories. The proposed personalized methods significantly improve prediction results, especially for emotions that have low inter-annotator agreement.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Original dataset: GoEmotions dataset. Filtered using the following mapping, based on the basic emotions found in Plutchik's Wheel of Emotions, and restricted to sentences with a single label:
wheel_dict = {
    "joy": ["joy", "amusement", "excitement", "gratitude", "pride", "relief", "admiration", "love", "optimism"],
    "trust": ["approval", "caring"],
    "fear": ["fear", "nervousness"],
    "surprise":… See the full description on the dataset page: https://huggingface.co/datasets/Jsevisal/go_emotions_wheel_unilabel.
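To illustrate how such a mapping can be applied, here is a minimal sketch that inverts the wheel_dict entries visible in the description and keeps only single-label examples. It uses only the three complete entries shown (the rest is truncated in the listing), and the helper names `fine_to_wheel` and `to_wheel_unilabel` are hypothetical, not taken from the dataset's own script. It also assumes the single-label filter is applied to the original GoEmotions labels.

```python
# Only the entries that are fully visible in the dataset description;
# the remaining wheel emotions are truncated there and deliberately omitted.
wheel_dict = {
    "joy": [
        "joy", "amusement", "excitement", "gratitude", "pride",
        "relief", "admiration", "love", "optimism",
    ],
    "trust": ["approval", "caring"],
    "fear": ["fear", "nervousness"],
}

# Invert the mapping: fine-grained GoEmotions label -> wheel emotion.
fine_to_wheel = {
    fine: wheel for wheel, fines in wheel_dict.items() for fine in fines
}


def to_wheel_unilabel(fine_labels):
    """Return the wheel emotion for a single-label example, otherwise None.

    None is also returned when the label is not covered by the partial mapping.
    """
    if len(fine_labels) != 1:
        return None
    return fine_to_wheel.get(fine_labels[0])


print(to_wheel_unilabel(["caring"]))
print(to_wheel_unilabel(["joy", "love"]))
```

Examples for which the helper returns None would be dropped from the filtered dataset.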
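Dataset name: Go Emotions: Google emotion dataset. Description: The Google AI GoEmotions dataset consists of comments by Reddit users, annotated with emotion labels. GoEmotions is intended for training neural networks to analyze the tone of text in depth. Most existing emotion classification datasets cover particular domains (such as news headlines and movie subtitles), are small, and use a scale of only six basic emotions (anger, surprise, disgust, joy, fear, and sadness). Widening the range of emotions considered in a dataset makes it possible to build more responsive chatbots, models that detect dangerous behavior online, and better customer support services. Data date: updated in 2021. Data source: internet data.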
Original dataset: GoEmotions dataset. Filtered using "ekman_mapping.json" from the original dataset repo and restricted to sentences with a single label. The dataset contains 7 emotion labels as per Dr. Ekman's theory. Labels are as follows: 0: anger 1: disgust 2: fear 3: joy 4: sadness 5: surprise 6: neutral
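The id-to-label table above can be turned into a small lookup for decoding model outputs. This is a minimal sketch using only the seven ids listed in the description; the names `EKMAN_ID2LABEL`, `EKMAN_LABEL2ID`, and `decode` are illustrative.

```python
# Label ids exactly as listed in the dataset description.
EKMAN_ID2LABEL = {
    0: "anger",
    1: "disgust",
    2: "fear",
    3: "joy",
    4: "sadness",
    5: "surprise",
    6: "neutral",
}

# Reverse lookup, useful when preparing training targets from label names.
EKMAN_LABEL2ID = {label: idx for idx, label in EKMAN_ID2LABEL.items()}


def decode(label_ids):
    """Map a sequence of integer label ids to their emotion names."""
    return [EKMAN_ID2LABEL[i] for i in label_ids]


print(decode([3, 6]))
```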
tgelton/GoEmotions dataset hosted on Hugging Face and contributed by the HF Datasets community
sangkm/go-emotions-cleaned dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for RuIzardEmotions
Dataset Summary
The RuIzardEmotions dataset is a high-quality translation of the go-emotions dataset and one other emotion-detection dataset. It contains 30k Reddit comments labeled for 10 emotion categories (joy, sadness, anger, enthusiasm, surprise, disgust, fear, guilt, shame and neutral). The datasets were translated with the DeepL translator and then further processed. The idea for the dataset was inspired by the Izard's… See the full description on the dataset page: https://huggingface.co/datasets/Djacon/ru-izard-emotions.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Lithuanian GoEmotions dataset
The original dataset: GoEmotions (paper). The derived dataset was machine translated from English into Lithuanian using the free Google Translate API (with deep-translator). The translation script:
from datasets import load_dataset
from deep_translator import GoogleTranslator
from deep_translator.exceptions import TranslationNotFound
original_dataset = load_dataset("go_emotions", name="simplified")
translator = GoogleTranslator(source="en"… See the full description on the dataset page: https://huggingface.co/datasets/AiLab-IMCS-UL/go_emotions-lt.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for CAMEO
Dataset to accompany the EMNLP'23 paper titled: "Misery Loves Complexity: Exploring Linguistic Complexity in the Context of Emotion Detection".
Dataset Details
A 50,000-sample subset of the GoEmotions dataset, automatically annotated with the following linguistic complexity measures:
idt: Incomplete Dependency Theory
dlt: Dependency Locality Theory
nnd: Nested-Nouns Distance
le: Left-embeddedness
percentage_polysyllable_words: % of polysyllable words… See the full description on the dataset page: https://huggingface.co/datasets/pranaydeeps/CAMEO.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Russian GoEmotions dataset
The original dataset: GoEmotions (paper). The derived dataset was machine translated from English into Russian using the free Google Translate API (with deep-translator). The translation script:
from datasets import load_dataset
from deep_translator import GoogleTranslator
from deep_translator.exceptions import TranslationNotFound
original_dataset = load_dataset("go_emotions", name="simplified")
translator = GoogleTranslator(source="en", target="ru")
def… See the full description on the dataset page: https://huggingface.co/datasets/AiLab-IMCS-UL/go_emotions-ru.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Original dataset: GoEmotions dataset. The dataset was machine translated to Latvian using the free Google Translate API. Tool used for translation: deep-translator. Translation script:
from datasets import load_dataset
from deep_translator import GoogleTranslator
from deep_translator.exceptions import TranslationNotFound
original_dataset = load_dataset("go_emotions", name="simplified")
translator = GoogleTranslator(source="en", target="lv")
def translate_batch(batch): original_text =… See the full description on the dataset page: https://huggingface.co/datasets/SkyWater21/lv_go_emotions.
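The script above is truncated in this listing, so the authors' translate_batch is not reproduced here. As a rough illustration of the general pattern, the sketch below translates a batch of texts and falls back to the original text whenever translation fails. The function `translate_texts` and the stand-in `fake_translate` are hypothetical, and the broad exception handling is an assumption about how failures such as TranslationNotFound might be treated.

```python
def translate_texts(texts, translate_one):
    """Translate each text with translate_one, keeping the original on failure.

    translate_one is any callable that takes a string and returns its
    translation; it may raise an exception or return an empty result.
    """
    translated = []
    for text in texts:
        try:
            result = translate_one(text)
        except Exception:  # a real script might catch TranslationNotFound only
            result = None
        # Fall back to the untranslated text when nothing usable came back.
        translated.append(result if result else text)
    return translated


def fake_translate(text):
    """Offline stand-in for a translator, used only to demonstrate the flow."""
    if not text.strip():
        raise ValueError("nothing to translate")
    return text.upper()


print(translate_texts(["hello", "  "], fake_translate))
```

With deep-translator installed, a callable such as `GoogleTranslator(source="en", target="lv").translate` could be passed in place of the stand-in; that call needs network access.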