https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Dataset Card for "emotion"
Dataset Summary
Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information please refer to the paper.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure
Data Instances
An example looks as follows. { "text": "im feeling quite sad and sorry for myself but… See the full description on the dataset page: https://huggingface.co/datasets/dair-ai/emotion.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is collected and annotated for the SMILE project http://www.culturesmile.org. This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions, expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets, with 5 emotions namely anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset.License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Twitter is an online Social Media Platform where people share their their though as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem and we shall help it by creating a strong NLP based-classifier model to distinguish the negative tweets & block such tweets. Can you build a strong classifier model to predict the same?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
Columns:
textID - unique ID for each piece of text
text - the text of the tweet
sentiment - the general sentiment of the tweet
Acknowledgement:
The dataset is download from Kaggle Competetions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
Objective: Understand the Dataset & cleanup (if required). Build classification models to predict the twitter sentiments. Compare the evaluation metrics of vaious classification algorithms.
Original Data Source: Twitter Tweets Sentiment Dataset
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset is about public conversation on Twitter surrounding the COVID-19 pandemic. They annotated seventeen latent semantic attributes for each public tweet using natural language processing techniques and machine-learning based algorithms. The latent semantic attributes include: 1) ten attributes indicating the tweet’s relevance to ten detected topics, 2) five quantitative attributes indicating the degree of intensity in the valence (i.e., unpleasantness/pleasantness) and emotional intensities across four primary emotions of fear, anger, sadness and joy, and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively. Data is accessible to people who have an OPEN ICPSR account.
The RANLP-Emotions-Twitter dataset contains 210 English tweets annotated by six trained annotators for Ekman's basic emotions plus the neutral class.
The details of the annotation procedure and various analyses can be found in [1].
Dataset can be used only for research non-commercial purposes.
If you use this dataset, please reference the following paper:
[1] Štajner, S. 2021. Exploring Reliability of Gold Labels for Emotion Detection in Twitter. In Proceedings of the 13th international conference on Recent Advances in Natural Language Processing (RANLP), pp. 1350-1359.
Bibtex reference:
@inproceedings{stajner-2021-ranlp-emotions, title = "Exploring Reliability of Gold Labels for Emotion Detection in Twitter", author = "\v{S}tajner, Sanja", booktitle = "Proceedings of the 13th international conference on Recent Advances in Natural Language Processing (RANLP)", month = sep, year = "2021", address = "Online", pages = "1350--1359", abstract = "Emotion detection from social media posts has attracted noticeable attention from natural language processing (NLP) community in recent years. The ways for obtaining gold labels for training and testing of the systems for automatic emotion detection differ significantly from one study to another, and pose the question of reliability of gold labels and obtained classification results. This study systematically explores several ways for obtaining gold labels for Ekman's emotion model on Twitter data and the influence of the chosen strategy on the manual classification results."}
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Sentiment Analysis Dataset is a dataset for emotional analysis, including large-scale tweet text collected from Twitter and emotional polarity (0=negative, 2=neutral, 4=positive) labels for each tweet, featuring automatic labeling based on emoticons.
2) Data Utilization (1) Sentiment Analysis Dataset has characteristics that: • Each sample consists of six columns: emotional polarity, tweet ID, date of writing, search word, author, and tweet body, and is suitable for training natural language processing and classification models using tweet text and emotion labels. (2) Sentiment Analysis Dataset can be used to: • Emotional Classification Model Development: Using tweet text and emotional polarity labels, we can build positive, negative, and neutral emotional automatic classification models with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM. • Analysis of SNS public opinion and trends: By analyzing the distribution of emotions by time series and keywords, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key emotional keywords.
This is an entity-level Twitter Sentiment Analysis dataset. For each message, the task is to judge the sentiment of the entire sentence towards a given entity. For example, A outperforms B is positive for entity A but negative for entity B. The dataset contains ~70K labeled training messages and 1K labeled validation messages. It is available online for free on Kaggle.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The CaSET dataset is a Catalan corpus of Tweets annotated with Emotions, Static Stance, and Dynamic Stance. The dataset contains 11k unique sentences on five controversial topics, grouped in 6k pairs of sentences, paired as parent messages and replies to these messages.
We provide the following files and folders:
This dataset can be used to train models for emotion detection, static stance detection, and dynamic stance detection.
The dataset is in Catalan (ca-ES).
Each instance in the dataset is a pair of parent-reply messages, annotated with the relation between the two messages (the dynamic stance) and the topic of the messages. For each message there is the id to retrieve it with the Twitter API, the emotions identified in the message, and the relation between the message and the topic (static stance). The text fields have to be retrieved using the Twitter API.
{
"id_parent": "1413960970066710533",
"id_reply": "1413968453690658816",
"parent_text": "",
"reply_text": "",
"topic": "vaccines",
"dynamic_stance": "Disagree",
"parent_stance": "FAVOUR",
"reply_stance": "AGAINST",
"parent_emotion": ["distrust", "joy", "disgust"],
"reply_emotion": ["distrust"]
}
The dataset does not contain splits.
We created this corpus to contribute to the development of language models in Catalan, a low-resource language.
Source Data
The data was collected using the Twitter API by the Barcelona Supercomputing Center.
The data was collected based on a list of keywords related to the five topics included in the dataset: vaccines, rent regulation, surrogate pregnancy, airport expansion, and a TV show rigging. Specific periods in which the topic was under discussion were also selected.
The source language producers are users of Twitter.
All the annotators are native speakers of Catalan.
Social Impact of Dataset
We hope this corpus contributes to the development of language models in Catalan, a low-resource language.
We are aware that, since the data comes from social media, this will contain biases, hate speech and toxic content. We have not applied any steps to reduce their impact.
The dataset has to be downloaded using the Twitter API, therefore some instances might be lost.
Language Technologies Unit (LangTech) at the Barcelona Supercomputing Center.
This work has been promoted and financed by the Generalitat de Catalunya through the Aina project.
Creative Commons Attribution 4.0.
@inproceedings{figueras-etal-2023-dynamic,
title = "Dynamic Stance: Modeling Discussions by Labeling the Interactions",
author = "Figueras, Blanca and
Baucells, Irene and
Caselli, Tommaso",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.432",
doi = "10.18653/v1/2023.findings-emnlp.432",
pages = "6503--6515",
}
@inproceedings{gonzalez-agirre-etal-2024-building-data,
title = "Building a Data Infrastructure for a Mid-Resource Language: The Case of {C}atalan",
author = "Gonzalez-Agirre, Aitor and
Marimon, Montserrat and
Rodriguez-Penagos, Carlos and
Aula-Blasco, Javier and
Baucells, Irene and
Armentano-Oller, Carme and
Palomar-Giner, Jorge and
Kulebi, Baybars and
Villegas, Marta",
editor = "Calzolari, Nicoletta and
Kan, Min-Yen and
Hoste, Veronique and
Lenci, Alessandro and
Sakti, Sakriani and
Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.231",
pages = "2556--2566",
}
For further information, please send an email to langtech@bsc.es.
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasetshttps://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
This dataset consists of a few thousand Twitter user reviews (input text) and Emotions (output labels ) for learning how to train the text for emotion analysis. This dataset was created using Twitter API by implementing the Keywords. The idea here is a dataset is more than a toy - real business data on a reasonable scale - but can be trained in minutes on a modest laptop.
This file has Sl no, Tweets, Search key, Feeling.
Description of columns in the file:
Tweets - text of the review Search key - Keyword used Feeling - Emotion classified using the keyword, this column contains 6 emotions i.e., Happy, Sad, Surprise, Fear, Disgust, Angry
This would be helpful for the organization to understand Customer reviews/feedbacks.
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data introduction • Twitter-tweets-sentiment dataset is a dataset that aims to analyze tweet sentiment for Twitter and natural language processing.
2) Data utilization (1)Twitter-tweets-sentiment data has characteristics that: • The data consists of three columns, including emotion and text, and aims to block negative tweets through a powerful classification model. (2) Twitter-tweets-sentiment data can be used to: • Social Media Monitoring: Businesses and organizations can use data to monitor social media platforms and gauge public sentiment about a brand, product, event, or social issue. • Sentiment analysis: This dataset can be used to train models that classify the sentiment of tweets, which can help companies and researchers understand public opinion on a variety of topics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collected and processed a dataset and make it available for the research community to study the COVD-19 pandemic in multiple possibilities.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
because of COVID-19
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset features the training models, emotion classifications and emotion patterns before and after events, related to the paper:F. Kunneman, M. van Mulken and A. Van den Bosch, Anticipointment detection in event tweets (under review)Abstract of the study:We developed a system to detect positive expectation, disappointment, and satisfaction in tweets that refer to events automatically discovered in the Twitter stream. The emotional content shared on Twitter when referring to public events can provide insights into the presumed and experienced quality of the event. We expected to find a connection between positive expectation and disappointment, a succession that is sometimes referred to as anticipointment. The application of computational approaches makes it possible to detect the presence and strength of this hypothetical relation for a large number of events. We extracted events from a longitudinal data set of Dutch Twitter posts, and modeled classifiers to recognize emotion in the tweets related to those events by means of hashtag-labeled training data. After classifying all tweets before and after the events in our data set, we summarized the collective emotions by calculating the percentage of tweets classified with an emotion as well as ranking tweets based on the classifier confidence score for an emotion and selecting the 90th percentile. Only a weak correlation of around 0.2 was found between positive expectation and disappointment, while a higher correlation of 0.6 was found between positiveexpectation and satisfaction. The most anticipointing events were events with a clear loss, such as a canceled event or when the favored sports team had lost. We conclude that senders of Twitter posts might be more inclined to share satisfaction than disappointment after a much anticipated event.Subject period: January 1st 2011 until October 31st 2015 Date: start=2015-11-01; end=2016-02-28 (data collection)
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
This dataset is a processed version of the Sentiment140 corpus, containing 1.6 million tweets with binary sentiment labels. The original data has been cleaned, tokenized, and prepared for natural language processing (NLP) and machine learning tasks. It provides a rich resource for sentiment analysis, text classification, and other NLP applications. The dataset includes the full processed corpus (train-processed.csv) and a smaller sample of 10,000 tweets (train-processed-sample.csv) for quick experimentation and model prototyping. Key Features:
1.6 million labeled tweets Binary sentiment classification (0 for negative, 1 for positive) Preprocessed and tokenized text Balanced class distribution Suitable for various NLP tasks and model architectures
Citation If you use this dataset in your research or project, please cite the original Sentiment140 dataset: Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Large twitter tweets sentiment analysis"
Dataset Description
Dataset Summary
This dataset is a collection of tweets formatted in a tabular data structure, annotated for sentiment analysis. Each tweet is associated with a sentiment label, with 1 indicating a Positive sentiment and 0 for a Negative sentiment.
Languages
The tweets in English.
Dataset Structure
Data Instances
An instance of the dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
We designed an emotional speech database that can be used for emotion recognition as well as recognition and synthesis of speech with various emotions. The database was designed by compiling tweets acquired from Twitter and selecting emotion- dependent tweets considering phonetic and prosodic balance. We classified gathered tweets into four emotions: joy, anger, sadness and neutral, and then selected 50 sentences from sentences of each emotion based on the entropy-based algorithm. We compared the selected sentence sets with randomly selected sentence sets from aspects of phonetic and prosodic balance and sentence length, and confirmed that the sets selected by the algorithm were more balanced. Next, we recorded emotional speech based on the selected sentences. Then, we evaluated the speech from the viewpoint of emotional recognition and emotional speech recognition.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their sentiment.
The dataset holds 11,932 documents annotated with 3 labels:
sentiments = { "LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral" }
The data was collected using the Twitter API. The current dataset supports the multi-class classification… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based in a range of activities related to Open Science, inclusivity and diversity – especially with regards Southern and Eastern Europe and different career stages – including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events.This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.The documents uploaded here are part of WP2 whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written-up as a chapter in an edited book collection. Three books will make up this edited collection - one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved us collecting tweets between 2017-2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage is a csv files containing the tweets collected.”.
https://choosealicense.com/licenses/other/https://choosealicense.com/licenses/other/
Dataset Card for "emotion"
Dataset Summary
Emotion is a dataset of English Twitter messages with six basic emotions: anger, fear, joy, love, sadness, and surprise. For more detailed information please refer to the paper.
Supported Tasks and Leaderboards
More Information Needed
Languages
More Information Needed
Dataset Structure
Data Instances
An example looks as follows. { "text": "im feeling quite sad and sorry for myself but… See the full description on the dataset page: https://huggingface.co/datasets/dair-ai/emotion.