100+ datasets found

turkish-sentiment-analysis-dataset
huggingface.co
kaggle.com
Updated Jun 22, 2022
Cite
Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 22, 2022
Authors
Batuhan
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset

This dataset contains positive, negative, and neutral sentences from several data sources given in the references. Most sentiment models have only two labels: positive and negative. However, user input can be an entirely neutral sentence. I could not find any data for such cases, so I created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from the Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
Sentiment Analysis Dataset
kaggle.com
zip
Updated May 3, 2025
Cite
abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset
Explore at:
Available download formats: zip (9105036 bytes)
Dataset updated
May 3, 2025
Authors
abdelmalek eladjelet
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

📌 Description

This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

0 — Negative

1 — Neutral

2 — Positive

The data has been gathered from multiple sources:

Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset

Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset, https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis, https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed (lowercased and stripped of punctuation, URLs, numbers, and stopwords) and is ready for NLP pipelines.
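The preprocessing steps named above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual pipeline: the stopword list, the regular expressions, and the function name `preprocess` are all assumptions made for the example.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; the dataset description
# does not say which stopword list was used).
STOPWORDS = {"the", "a", "an", "at", "of", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment: str) -> str:
    """Lowercase, then strip URLs, numbers, punctuation, and stopwords."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)         # drop URLs before punctuation removal
    text = DIGIT_RE.sub(" ", text)       # drop numbers
    text = text.translate(PUNCT_TABLE)   # drop ASCII punctuation
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("Check THIS out: https://example.com, 100% worth it!"))
```

URLs are removed before punctuation so that the URL pattern can still match on `://`.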

📊 Columns

| Column | Description |
|:--|:--|
| Comment | User-generated text content |
| Sentiment | Sentiment label (0=Negative, 1=Neutral, 2=Positive) |

🚀 Use Cases

🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa

🔍 Evaluate preprocessing and tokenization strategies

📈 Benchmark NLP models on multi-class classification tasks

🎓 Educational projects and research in opinion mining or text classification

🧪 Fine-tune transformer models on a large and diverse sentiment dataset

💬 Example

Comment: "apple pay is so convenient secure and easy to use" Sentiment: 2 (Positive)
amazon-reviews-sentiment-analysis
huggingface.co
Cite
fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
Explore at:
Croissant
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
fastai X Hugging Face Group 2022
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for amazon reviews for sentiment analysis

Dataset Summary

One of the most important problems in e-commerce is the correct calculation of the points given to after-sales products. The solution to this problem is to provide greater customer satisfaction for the e-commerce site, product prominence for sellers, and a seamless shopping experience for buyers. Another problem is the correct ordering of the comments given to the products. The prominence of misleading… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis.
sst2
huggingface.co
Updated Mar 26, 2024
Cite
Stanford NLP (2024). sst2 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst2
Explore at:
Croissant
Dataset updated
Mar 26, 2024
Dataset authored and provided by
Stanford NLP
License
https://choosealicense.com/licenses/unknown/
Description
Dataset Card for [Dataset Name]

Dataset Summary

The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/sst2.
Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News...
data.niaid.nih.gov
data-staging.niaid.nih.gov
+1 more
Updated Sep 13, 2022
Cite
Anonymized (2022). Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News Media Headlines Using Automated Labelling with Transformer Language Models" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5144112
Explore at:
Dataset updated
Sep 13, 2022
Dataset provided by
NA
Authors
Anonymized
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set contains automated sentiment and emotionality annotations of 23 million headlines from 47 news media outlets popular in the United States.

The set of 47 news media outlets analysed (listed in Figure 1 of the main manuscript) was derived from the AllSides organization 2019 Media Bias Chart v1.1. The human ratings of outlets’ ideological leanings were also taken from this chart and are listed in Figure 2 of the main manuscript.

News article headlines from the set of outlets analyzed in the manuscript are available in the outlets’ online domains and/or public cache repositories such as The Internet Wayback Machine, Google cache and Common Crawl. Article headlines were located in the articles’ raw HTML data using outlet-specific XPath expressions.

The temporal coverage of headlines across news outlets is not uniform. For some media organizations, the availability of news articles in online domains or Internet cache repositories becomes sparse for earlier years. Furthermore, some news outlets popular in 2019, such as The Huffington Post or Breitbart, did not exist in the early 2000s. Hence, our data set is sparser in headline sample size and representativeness for earlier years in the 2000-2019 timeline. Nevertheless, 20 outlets in our data set have chronologically continuous partial or full headline data availability since the year 2000. Figure S 1 in the SI reports the number of headlines per outlet and per year in our analysis.

In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the headline due to the heterogeneity of HTML elements and CSS styling combinations with which article text content is arranged in outlets’ online domains. After manual testing, we determined that the percentage of headlines falling into this category is very small. Additionally, our method might miss some articles in the online domains of news outlets. To conclude, in a data analysis of over 23 million headlines, we cannot manually check the correctness of every single data instance, and one hundred percent accuracy at capturing headlines’ content is elusive due to the small number of difficult-to-detect boundary cases, such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our headlines set is representative of headlines in print news media content for the studied time period and outlets analyzed.
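The idea of outlet-specific XPath expressions, including the case where an expression fails to capture a headline, might look roughly like the sketch below. It is illustrative only: the outlet name, the XPath expression, and the sample page are hypothetical, and the standard-library `xml.etree.ElementTree` parser used here only handles well-formed markup and a limited XPath subset, so real-world HTML would need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical mapping from outlet name to the XPath that locates its headline.
# The manuscript's actual expressions are not given in the description.
OUTLET_XPATHS = {
    "example-outlet": ".//h1[@class='headline']",
}

# Hypothetical, well-formed sample page.
SAMPLE_PAGE = (
    "<html><body><article>"
    "<h1 class='headline'>Markets rally</h1><p>Body text</p>"
    "</article></body></html>"
)


def extract_headline(page: str, outlet: str) -> Optional[str]:
    """Return the headline text, or None when the XPath finds nothing usable."""
    root = ET.fromstring(page)
    node = root.find(OUTLET_XPATHS[outlet])
    if node is None or not (node.text or "").strip():
        return None  # the boundary case: expression did not capture a headline
    return node.text.strip()


print(extract_headline(SAMPLE_PAGE, "example-outlet"))
```

Returning `None` instead of raising keeps the failure cases countable, which is one way the share of missed headlines could be estimated.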

The compressed files in this data set are listed next:

- analysisScripts.rar contains the analysis scripts used in the main manuscript, aggregated data of the automated sentiment and emotionality annotations of the headlines, and human annotations of sentiment and emotionality for a subset of headlines, used as ground truth.

- models.rar contains the Transformer sentiment and emotion annotation models used in the analysis, namely:

siebert/sentiment-roberta-large-english, from https://huggingface.co/siebert/sentiment-roberta-large-english. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). See the original authors' page at the same URL for more information.

DistilbertSST2.rar is the default sentiment classification model of the HuggingFace Transformers library (https://huggingface.co/). This model is only used to replicate the results of the sentiment analysis with sentiment-roberta-large-english.

j-hartmann/emotion-english-distilroberta-base, from https://huggingface.co/j-hartmann/emotion-english-distilroberta-base. The model is a fine-tuned checkpoint of DistilRoBERTa-base. It allows annotation of English text with Ekman's 6 basic emotions, plus a neutral class. The model was trained on 6 diverse datasets. Please refer to the original author's page at the same URL for an overview of the data sets used for fine-tuning.

- headlinesDataWithSentimentLabelsAnnotationsFromSentimentRobertaLargeModel.rar: URLs of headlines analyzed and the sentiment annotations of the siebert/sentiment-roberta-large-english Transformer model (https://huggingface.co/siebert/sentiment-roberta-large-english).

- headlinesDataWithSentimentLabelsAnnotationsFromDistilbertSST2.rar: URLs of headlines analyzed and the sentiment annotations of the default HuggingFace sentiment analysis model fine-tuned on the SST-2 dataset (https://huggingface.co/).

- headlinesDataWithEmotionLabelsAnnotationsFromDistilRoberta.rar: URLs of headlines analyzed and the emotion category annotations of the j-hartmann/emotion-english-distilroberta-base Transformer model (https://huggingface.co/j-hartmann/emotion-english-distilroberta-base).
financial-sentiment-analysis
huggingface.co
+ more versions
Cite
ysj, financial-sentiment-analysis [Dataset]. https://huggingface.co/datasets/sjyuxyz/financial-sentiment-analysis
Explore at:
Croissant
Authors
ysj
Description
sjyuxyz/financial-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community
sentiment-analysis-for-mental-health
huggingface.co
Updated Oct 25, 2024
Cite
Sayan Golder (2024). sentiment-analysis-for-mental-health [Dataset]. https://huggingface.co/datasets/btwitssayan/sentiment-analysis-for-mental-health
Explore at:
Croissant
Dataset updated
Oct 25, 2024
Authors
Sayan Golder
Description
btwitssayan/sentiment-analysis-for-mental-health dataset hosted on Hugging Face and contributed by the HF Datasets community
sentiment140
huggingface.co
opendatalab.com
+2 more
Updated Apr 23, 2010
Cite
Stanford NLP (2010). sentiment140 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sentiment140
Explore at:
Dataset updated
Apr 23, 2010
Dataset authored and provided by
Stanford NLP
Description
Sentiment140 consists of Twitter messages with emoticons, which are used as noisy labels for sentiment classification. For more detailed information please refer to the paper.
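Using emoticons as noisy labels can be sketched as below. This is a simplified illustration, not the procedure from the Sentiment140 paper: the emoticon lists and the rule for conflicting emoticons are assumptions chosen for the example.

```python
from typing import Optional

# Illustrative emoticon lists (assumptions; the description does not give
# the lists used to build Sentiment140).
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def noisy_label(tweet: str) -> Optional[str]:
    """Assign a noisy sentiment label from emoticons, or None if undecidable."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None  # no emoticon at all, or conflicting emoticons
    return "positive" if has_pos else "negative"


for tweet in ["Loving the new phone :)", "Missed my flight :(", "Just landed"]:
    print(tweet, "->", noisy_label(tweet))
```

The labels are "noisy" because an emoticon is only a proxy for the sentiment of the text, for example in sarcastic tweets.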
Rotten Tomatoes Movie Reviews
kaggle.com
Updated Nov 20, 2022
Cite
The Devastator (2022). Rotten Tomatoes Movie Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/movie-review-data-set-from-rotten-tomatoes
Explore at:
Croissant
Dataset updated
Nov 20, 2022
Dataset provided by
Kaggle
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Rotten Tomatoes Movie Reviews

Predicting Movie Review Sentiment

Source

Huggingface Hub: link

About this dataset

The Rotten Tomatoes Movie Review Sentiment Analysis Dataset contains a set of 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. Bo Pang and Lillian Lee first used this data in their paper Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, which was published in Proceedings of the ACL in 2005. The data fields are identical in every split. The text column contains the review itself, and the label column indicates whether the review is positive or negative.

How to use the dataset

The Performance of Sentiment Analysis

In this post we take a look at the performance of different sentiment analysis systems on a movie review dataset from Rotten Tomatoes. This data was first used in Bo Pang and Lillian Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, Proceedings of the ACL, 2005. The data fields are the same among all splits.

We will be using three different libraries for this post: 1) Scikit-learn, 2) NLTK, and 3) TextBlob. We will also compare the results of these systems with those from human raters. Each library takes different amounts of time and resources to run, so we will also be considering these factors in our comparisons.

NLTK

NLTK is a popular library for working with text data in Python. It includes many useful features for pre-processing text data, including tokenization, lemmatization, and part-of-speech tagging. NLTK also includes a number of helpful classes for building and evaluating predictive models (such as decision trees and maximum entropy classifiers).

TextBlob

TextBlob is a relatively new library that attempts to provide an easy-to-use interface for common text processing tasks (such as part-of-speech tagging, sentence parsing, spelling correction, etc). TextBlob is built on top of NLTK and Pattern, another Python library for web mining (see below).

Scikit-learn

Scikit-learn is a popular machine learning library for Python that provides efficient implementations of common algorithms such as support vector machines, random forests, and k-nearest neighbors classifiers. It also includes helpful utilities for pre-processing data and assessing model performance.

Research Ideas

Identify positive and negative sentiment in movie reviews

Categorize movie reviews by rating

Cluster movie reviews to group together similar reviews

Acknowledgements

Huggingface Hub: link

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |

File: train.csv

| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |

File: test.csv

| Column name | Description |
|:--------------|:----------------------------------|
| text | The text of the review. (String) |
| label | The label of the review. (String) |
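Since all three files share the same `text` and `label` columns, one loader can serve every split. The sketch below reads an in-memory sample rather than the real files; the sample rows and the label values `"0"` and `"1"` are assumptions, as the listing only says that the label is a string.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for train.csv / validation.csv / test.csv.
# The rows and the "0"/"1" label encoding are illustrative assumptions.
SAMPLE_CSV = '''text,label
"a gripping, well-acted drama",1
"dull and overlong",0
'''


def load_rows(handle):
    """Read (text, label) pairs from a CSV file object with a header row."""
    reader = csv.DictReader(handle)
    return [(row["text"], row["label"]) for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
label_counts = Counter(label for _, label in rows)
print(rows)
print(label_counts)
```

With the real files, the same function could be called on `open("train.csv", newline="", encoding="utf-8")`, assuming the files sit in the working directory.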
multilingual-sentiments
huggingface.co
Updated Sep 19, 2022
+ more versions
Cite
Tay Yong Qiang (2022). multilingual-sentiments [Dataset]. https://huggingface.co/datasets/tyqiangz/multilingual-sentiments
Explore at:
Croissant
Dataset updated
Sep 19, 2022
Authors
Tay Yong Qiang
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Multilingual Sentiments Dataset

A collection of multilingual sentiments datasets grouped into 3 classes -- positive, neutral, negative. Most multilingual sentiment datasets are either 2-class positive or negative, 5-class ratings of products reviews (e.g. Amazon multilingual dataset) or multiple classes of emotions. However, to an average person, sometimes positive, negative and neutral classes suffice and are more straightforward to perceive and annotate. Also, a positive/negative… See the full description on the dataset page: https://huggingface.co/datasets/tyqiangz/multilingual-sentiments.
tweet-sentiment-multilingual
opendatalab.com
huggingface.co
zip
Updated Jan 1, 2022
Cite
Cardiff University (2022). tweet-sentiment-multilingual [Dataset]. https://opendatalab.com/OpenDataLab/tweet-sentiment-multilingual
Explore at:
Available download formats: zip
Dataset updated
Jan 1, 2022
Dataset provided by
Cardiff University
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Tweet Sentiment Multilingual is a sentiment analysis dataset of tweets in 8 different languages.
Data from: imdb
huggingface.co
Updated Aug 3, 2003
+ more versions
Cite
Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb
Explore at:
Croissant
Dataset updated
Aug 3, 2003
Dataset authored and provided by
Stanford NLP
License
https://choosealicense.com/licenses/other/
Description
Dataset Card for "imdb"

Dataset Summary

Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed

Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
Issues and News Embeddings
kaggle.com
zip
Updated Nov 9, 2025
Cite
João Pedro Ribeiro (2025). Issues and News Embeddings [Dataset]. https://www.kaggle.com/datasets/ribe3iro/issues-and-news-embeddings
Explore at:
Available download formats: zip (5049484375 bytes)
Dataset updated
Nov 9, 2025
Authors
João Pedro Ribeiro
Description
Sentiment analysis: RoBERTa;

Embeddings extraction: MPNet.
news-sentiment-data
huggingface.co
Updated Jul 8, 2024
Cite
amitk17 (2024). news-sentiment-data [Dataset]. https://huggingface.co/datasets/sweatSmile/news-sentiment-data
Explore at:
Croissant
Dataset updated
Jul 8, 2024
Authors
amitk17
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
sweatSmile/news-sentiment-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: News Sentiment Analysis
kaggle.com
zip
Updated Aug 3, 2024
Cite
myrios (2024). News Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/myrios/news-sentiment-analysis
Explore at:
Available download formats: zip (30198931 bytes)
Dataset updated
Aug 3, 2024
Authors
myrios
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Three news sources have been used in creating this dataset. 1. Sun, J. (2016, August). Daily News for Stock Market Prediction, Version 1. Retrieved (2024, August) from https://www.kaggle.com/aaron7sun/stocknews. 2. ARYAN SINGH. NYT Articles: 2.1M+ (2000-Present) Daily Updated. https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present. 3. GABRIEL PREDA. BBC News. https://www.kaggle.com/datasets/gpreda/bbc-news.

The first source covers from 2008-06-08 to 2016-07-01, the top 25 news of each day from Reddit World News. The second source is a direct import of the abstract column from New York Times articles from 2016-07-01 to 2017-09-05. The third is also a direct import of the description column from BBC News from 2017-09-05 to 2024-08-03. Thus, the whole coverage is from 2008-06-08 to 2024-08-03.

Three models have been used for sentiment results. NLTK VADER is applied first, as it is the most lightweight and the fastest to apply to large amounts of data. However, because news is mostly neutral, NLTK VADER gave a 1.0 neutral score for around 25% of the data. Therefore, two more advanced models, NLTK RoBERTa and HUGGING FACE distilbert-base-uncased-finetuned-sst-2-english, are applied to these neutral articles to identify them accurately.
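The cascade described above (a fast scorer first, heavier models only for items scored fully neutral) could be sketched as follows. Everything here is illustrative: the scorers are stubs standing in for the real models, the function and key names are hypothetical, and the description does not say how the author combined the fallback results, so the sketch simply records each one. The stub mimics the `neg`/`neu`/`pos` score dictionary that VADER returns.

```python
from typing import Callable, Dict, List


def annotate(
    texts: List[str],
    fast_scorer: Callable[[str], Dict[str, float]],
    fallback_models: Dict[str, Callable[[str], str]],
) -> List[dict]:
    """Score every text with the fast model; run fallbacks only on neutral ones."""
    annotated = []
    for text in texts:
        scores = fast_scorer(text)
        record = {"text": text, "fast_scores": scores, "fallback_labels": {}}
        if scores["neu"] >= 1.0:  # the fast model saw no sentiment at all
            for name, model in fallback_models.items():
                record["fallback_labels"][name] = model(text)
        annotated.append(record)
    return annotated


# Stub scorers (assumptions) so the sketch runs without downloading models.
def stub_fast_scorer(text: str) -> Dict[str, float]:
    if "crash" in text:
        return {"neg": 0.6, "neu": 0.4, "pos": 0.0}
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0}


def stub_model_a(text: str) -> str:
    return "negative" if "cuts" in text else "positive"


def stub_model_b(text: str) -> str:
    return "negative"


result = annotate(
    ["Markets crash overnight", "Central bank cuts rates"],
    stub_fast_scorer,
    {"model_a": stub_model_a, "model_b": stub_model_b},
)
for record in result:
    print(record)
```

Running the expensive models only on the neutral subset is what keeps the approach cheap on large amounts of data.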

Part of my school project for Nanyang Polytechnic | AI & Data Engineering
sentiment-analysis-dataset
huggingface.co
+ more versions
Cite
stepan, sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/skibastepan/sentiment-analysis-dataset
Explore at:
Authors
stepan
Description
skibastepan/sentiment-analysis-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
course-review-multilabel-sentiment-analysis
huggingface.co
Updated May 28, 2024
Cite
Nguyen Minh Chi (2024). course-review-multilabel-sentiment-analysis [Dataset]. https://huggingface.co/datasets/chillies/course-review-multilabel-sentiment-analysis
Explore at:
Croissant
Dataset updated
May 28, 2024
Authors
Nguyen Minh Chi
Description
chillies/course-review-multilabel-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community
DailyDialog: Multi-Turn Dialog+Intention+Emotion
kaggle.com
zip
Updated Dec 2, 2023
Cite
The Devastator (2023). DailyDialog: Multi-Turn Dialog+Intention+Emotion [Dataset]. https://www.kaggle.com/datasets/thedevastator/dailydialog-multi-turn-dialog-with-intention-and
Explore at:
Available download formats: zip (2581845 bytes)
Dataset updated
Dec 2, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
DailyDialog: Multi-Turn Dialog+Intention+Emotion

Human-written dialogues with communication intention and emotion labels

By daily_dialog (From Huggingface) [source]

About this dataset

The DailyDialog dataset is a meticulously curated collection of multi-turn dialogues that aims to accurately represent the way we communicate in our daily lives. It covers a wide range of topics that are relevant to our everyday experiences. What sets this dataset apart is that it includes human-written conversations, which means the language used is more natural and realistic, resulting in less noise and higher quality data.

Each dialogue in the dataset consists of two or more participants engaging in a conversation. The conversations are provided in textual form, allowing for easy analysis and processing. Alongside the dialogues, there are also corresponding labels for communication intention and emotion attached to each utterance.

The communication intention labels categorize each utterance based on its intended purpose or goal within the conversation. These categories provide valuable insights into how different participants express their intentions through speech.

In addition to the communication intention labels, there are also emotion labels assigned to each utterance in the dialogues. These emotion labels capture the emotional state or sentiment expressed by participants during various points in the conversation.

To facilitate model evaluation and testing, DailyDialog provides three separate files: validation.csv, train.csv, and test.csv. The validation set (validation.csv) contains dialogues with their respective communication intention and emotion labels for assessing model performance during development stages. The train set (train.csv) includes dialogues paired with corresponding communication intention and emotion labels for training purposes. Lastly, test.csv serves as an independent test set that enables evaluating models' proficiency by providing unseen dialogues along with their associated communication intention and emotion labels.

Overall, DailyDialog stands out as a high-quality dataset due to its accurate representation of daily life conversations paired with comprehensive labeling of both communication intentions and emotions expressed throughout these dialogues. This makes it an invaluable resource for developing robust dialogue systems capable of understanding human interactions on a deeper level while being able to identify diverse intentions behind speech acts alongside various emotional states encountered during daily life exchanges.

How to use the dataset

Welcome to the DailyDialog dataset! This high-quality multi-turn dialog dataset has been curated to reflect our daily communication style and covers a wide range of topics related to our everyday lives. The dataset consists of human-written conversations, making it less noisy and more realistic. Each conversation in the dataset has been manually labeled with communication intention and emotion information, providing valuable insights into the dialogues.

To make the most of this dataset, here is a step-by-step guide on how you can use it effectively:

1. Understanding the columns:

- dialog: This column contains the actual conversation between two or more participants. It is in text format.

- act: The act column represents the communication intention labels for each utterance in the dialogue. These labels categorize each utterance based on its intention.

- emotion: The emotion column contains emotion labels for each utterance in the dialogue. These labels represent the emotions expressed during that particular utterance.

2. Familiarize yourself with validation.csv:

The validation.csv file serves as a validation set for evaluating your model's performance. It contains pre-labeled conversations along with their corresponding communication intentions and emotion labels.

3. Explore train.csv for training purposes:

The train.csv file is meant for training purposes and provides conversations along with their communication intentions and emotion labels.

4. Test your model using test.csv:

The test.csv file contains conversations along with their communication intention and emotion labels, which can be used to evaluate your model once it has been built.

Finally, remember that this DailyDialog dataset offers an excellent opportunity to develop models capable of understanding multi-turn dialogues in a wide range of everyday scenarios. By utilizing both communication intention and emotion information provided, you can gain valuable insights into analyzing human conversations.

So dive into this rich resource, experiment with different techn...
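Because the act and emotion labels are attached to each utterance, one natural first step is to align the three columns per dialogue. The sketch below assumes each column has already been parsed into a Python list of equal length and that labels are integer codes; the sample values and the function name `pair_utterances` are hypothetical, and the meaning of each code is not given in this description.

```python
from typing import Dict, List, Union


def pair_utterances(
    dialog: List[str], act: List[int], emotion: List[int]
) -> List[Dict[str, Union[str, int]]]:
    """Align each utterance with its communication-intention and emotion label."""
    if not (len(dialog) == len(act) == len(emotion)):
        raise ValueError("dialog, act, and emotion must have the same length")
    return [
        {"utterance": u, "act": a, "emotion": e}
        for u, a, e in zip(dialog, act, emotion)
    ]


# Hypothetical row; the label codes are placeholders, not documented values.
sample_dialog = ["How was your day?", "Pretty good, thanks!"]
sample_act = [2, 1]
sample_emotion = [0, 4]

for turn in pair_utterances(sample_dialog, sample_act, sample_emotion):
    print(turn)
```

Checking the lengths up front surfaces malformed rows early instead of silently truncating them, which `zip` alone would do.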
Leadership-sentiment-analysis
huggingface.co
Updated Aug 28, 2024
Cite
K. A. Consultants (2024). Leadership-sentiment-analysis [Dataset]. https://huggingface.co/datasets/orYx-models/Leadership-sentiment-analysis
Explore at:
Croissant
Dataset updated
Aug 28, 2024
Dataset authored and provided by
K. A. Consultants
Description
orYx-models/Leadership-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community
wisesight_sentiment
huggingface.co
Updated Feb 4, 2018
Cite
PyThaiNLP (2018). wisesight_sentiment [Dataset]. https://huggingface.co/datasets/pythainlp/wisesight_sentiment
Explore at:
Croissant
Dataset updated
Feb 4, 2018
Dataset authored and provided by
PyThaiNLP
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for wisesight_sentiment

Dataset Summary

Wisesight Sentiment Corpus: Social media messages in Thai language with sentiment label (positive, neutral, negative, question)

Released to public domain under Creative Commons Zero v1.0 Universal license. Labels: {"pos": 0, "neu": 1, "neg": 2, "q": 3} Size: 26,737 messages Language: Central Thai Style: Informal and conversational. With some news headlines and advertisement. Time period: Around 2016 to early 2019. With… See the full description on the dataset page: https://huggingface.co/datasets/pythainlp/wisesight_sentiment.
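The label mapping quoted above can be inverted to turn integer ids back into readable names, for instance when inspecting the class balance. In this sketch only the mapping itself comes from the description; the sample ids and the helper name `label_distribution` are made up for illustration.

```python
from collections import Counter
from typing import Iterable

# Mapping as given in the dataset description.
LABEL_TO_ID = {"pos": 0, "neu": 1, "neg": 2, "q": 3}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}


def label_distribution(ids: Iterable[int]) -> Counter:
    """Count how often each label name occurs in a sequence of label ids."""
    return Counter(ID_TO_LABEL[i] for i in ids)


# Hypothetical label ids, not taken from the corpus.
print(label_distribution([0, 1, 1, 3, 2, 1]))
```

Note that `pos` is 0 and `neg` is 2 here, the reverse of the 0=Negative, 2=Positive convention some other datasets use, so ids should not be compared across datasets without remapping.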
turkish-sentiment-analysis-dataset (winvoker/turkish-sentiment-analysis-dataset): 26 scholarly articles cite this dataset.