100+ datasets found
  1. turkish-sentiment-analysis-dataset

    • huggingface.co
    • kaggle.com
    Updated Jun 22, 2022
    Cite
    Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 22, 2022
    Authors
    Batuhan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset contains positive, negative and neutral ("notr") sentences from several data sources given in the references. In most sentiment models, there are only two labels: positive and negative. However, user input can be an entirely neutral sentence. For such cases there were no data I could find. Therefore I created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from the Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.

  2. Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated May 3, 2025
    Cite
    abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset
    Explore at:
    Available download formats: zip (9105036 bytes)
    Dataset updated
    May 3, 2025
    Authors
    abdelmalek eladjelet
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

    📌 Description

    This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

    • 0 — Negative
    • 1 — Neutral
    • 2 — Positive

    The data has been gathered from multiple sources:

    • Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
    • Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
    • Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
    • Kaggle: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

    The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed (lowercased and stripped of punctuation, URLs, numbers, and stopwords) and is ready for NLP pipelines.

    📊 Columns

    | Column    | Description |
    |:----------|:------------|
    | Comment   | User-generated text content |
    | Sentiment | Sentiment label (0=Negative, 1=Neutral, 2=Positive) |

    🚀 Use Cases

    • 🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa
    • 🔍 Evaluate preprocessing and tokenization strategies
    • 📈 Benchmark NLP models on multi-class classification tasks
    • 🎓 Educational projects and research in opinion mining or text classification
    • 🧪 Fine-tune transformer models on a large and diverse sentiment dataset

    💬 Example

    Comment: "apple pay is so convenient secure and easy to use"
    Sentiment: 2 (Positive)
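
    The label scheme above can be decoded with a small lookup table. This is a minimal sketch, assuming rows are available as (Comment, Sentiment) pairs; the helper name `label_counts` and all sample rows except the first are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Label scheme as stated in the dataset description.
SENTIMENT_LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}

def label_counts(rows):
    """Count rows per sentiment name; rows are (comment, label) pairs."""
    return Counter(SENTIMENT_LABELS[int(label)] for _, label in rows)

# The first row is the example from the description; the others are made up.
rows = [
    ("apple pay is so convenient secure and easy to use", 2),
    ("delivery arrived late again", 0),
    ("package arrived tuesday", 1),
]

counts = label_counts(rows)
print(counts)
```

    Checking the class distribution this way before training is a cheap guard against a heavily imbalanced split.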
    
  3. amazon-reviews-sentiment-analysis

    • huggingface.co
    Cite
    fastai X Hugging Face Group 2022, amazon-reviews-sentiment-analysis [Dataset]. https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    fastai X Hugging Face Group 2022
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for amazon reviews for sentiment analysis

      Dataset Summary
    

    One of the most important problems in e-commerce is the correct calculation of the points given to after-sales products. The solution to this problem is to provide greater customer satisfaction for the e-commerce site, product prominence for sellers, and a seamless shopping experience for buyers. Another problem is the correct ordering of the comments given to the products. The prominence of misleading… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/amazon-reviews-sentiment-analysis.

  4. sst2

    • huggingface.co
    Updated Mar 26, 2024
    Cite
    Stanford NLP (2024). sst2 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sst2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 26, 2024
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for [Dataset Name]

      Dataset Summary
    

    The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/sst2.

  5. Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Sep 13, 2022
    Cite
    Anonymized (2022). Data for manuscript: "Longitudinal Analysis of Sentiment and Emotion in News Media Headlines Using Automated Labelling with Transformer Language Models" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5144112
    Explore at:
    Dataset updated
    Sep 13, 2022
    Dataset provided by
    NA
    Authors
    Anonymized
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set contains automated sentiment and emotionality annotations of 23 million headlines from 47 news media outlets popular in the United States.

    The set of 47 news media outlets analysed (listed in Figure 1 of the main manuscript) was derived from the AllSides organization 2019 Media Bias Chart v1.1. The human ratings of outlets’ ideological leanings were also taken from this chart and are listed in Figure 2 of the main manuscript.

    News article headlines from the set of outlets analyzed in the manuscript are available in the outlets’ online domains and/or public cache repositories such as The Internet Wayback Machine, Google cache and Common Crawl. Article headlines were located in the articles’ raw HTML using outlet-specific XPath expressions.
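
    The idea of outlet-specific XPath extraction can be sketched as follows. This is only an illustration under stated assumptions: the outlet name, the XPath expression and the sample page are hypothetical (the authors' actual expressions are not given here), and Python's standard `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so real-world HTML would likely need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-outlet XPath expressions; the real ones are not
# listed in the description.
OUTLET_XPATHS = {
    "example-outlet": ".//h1[@class='headline']",
}

def extract_headlines(markup, outlet):
    """Return the text of every element matched by the outlet's XPath."""
    root = ET.fromstring(markup)
    return [
        "".join(node.itertext()).strip()
        for node in root.findall(OUTLET_XPATHS[outlet])
    ]

# Made-up, well-formed sample page.
page = """<html><body>
  <h1 class="headline">Markets rally as <em>rates</em> hold</h1>
  <p>Body text.</p>
</body></html>"""

headlines = extract_headlines(page, "example-outlet")
print(headlines)
```

    Keeping one expression per outlet in a lookup table mirrors the "outlet-specific" wording above; an expression that matches nothing simply yields an empty list.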

    The temporal coverage of headlines across news outlets is not uniform. For some media organizations, news article availability in online domains or Internet cache repositories becomes sparse for earlier years. Furthermore, some news outlets popular in 2019, such as The Huffington Post or Breitbart, did not exist in the early 2000s. Hence, our data set is sparser in headline sample size and representativeness for earlier years in the 2000-2019 timeline. Nevertheless, 20 outlets in our data set have chronologically continuous partial or full headline data availability since the year 2000. Figure S1 in the SI reports the number of headlines per outlet and per year in our analysis.

    In a small percentage of articles, outlet-specific XPath expressions might fail to properly capture the content of the headline due to the heterogeneity of HTML elements and CSS styling combinations with which article text is arranged in outlets’ online domains. After manual testing, we determined that the percentage of headlines falling in this category is very small. Additionally, our method might fail to detect some articles in the online domains of news outlets. To conclude, in a data analysis of over 23 million headlines, we cannot manually check the correctness of every single data instance, and one hundred percent accuracy at capturing headlines’ content is elusive due to the small number of difficult-to-detect boundary cases such as incorrect HTML markup syntax in online domains. Overall, however, we are confident that our headlines set is representative of headlines in print news media content for the studied time period and outlets analyzed.

    The compressed files in this data set are listed next:

    -analysisScripts.rar contains the analysis scripts used in the main manuscript as well as aggregated data of sentiment and emotionality automated annotations of the headlines and human annotations of a subset of headlines sentiment and emotionality used as ground truth.

    -models.rar contains the Transformer sentiment and emotion annotation models used in the analysis. Namely:

    Siebert/sentiment-roberta-large-english from https://huggingface.co/siebert/sentiment-roberta-large-english. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). See more information from the original authors at https://huggingface.co/siebert/sentiment-roberta-large-english

    DistilbertSST2.rar is the default sentiment classification model of the Hugging Face Transformers library (https://huggingface.co/). This model is only used to replicate the results of the sentiment analysis with sentiment-roberta-large-english.

    DistilRoBERTa: j-hartmann/emotion-english-distilroberta-base from https://huggingface.co/j-hartmann/emotion-english-distilroberta-base. The model is a fine-tuned checkpoint of DistilRoBERTa-base. The model allows annotation of English text with Ekman's 6 basic emotions, plus a neutral class. The model was trained on 6 diverse datasets. Please refer to the original author at https://huggingface.co/j-hartmann/emotion-english-distilroberta-base for an overview of the data sets used for fine-tuning.

    -headlinesDataWithSentimentLabelsAnnotationsFromSentimentRobertaLargeModel.rar URLs of headlines analyzed and the sentiment annotations of the siebert/sentiment-roberta-large-english Transformer model. https://huggingface.co/siebert/sentiment-roberta-large-english

    -headlinesDataWithSentimentLabelsAnnotationsFromDistilbertSST2.rar URLs of headlines analyzed and the sentiment annotations of the default HuggingFace sentiment analysis model fine-tuned on the SST-2 dataset. https://huggingface.co/

    -headlinesDataWithEmotionLabelsAnnotationsFromDistilRoberta.rar URLs of headlines analyzed and the emotion categories annotations of the j-hartmann/emotion-english-distilroberta-base Transformer model. https://huggingface.co/j-hartmann/emotion-english-distilroberta-base

  6. financial-sentiment-analysis

    • huggingface.co
    + more versions
    Cite
    ysj, financial-sentiment-analysis [Dataset]. https://huggingface.co/datasets/sjyuxyz/financial-sentiment-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    ysj
    Description

    sjyuxyz/financial-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. sentiment-analysis-for-mental-health

    • huggingface.co
    Updated Oct 25, 2024
    Cite
    Sayan Golder (2024). sentiment-analysis-for-mental-health [Dataset]. https://huggingface.co/datasets/btwitssayan/sentiment-analysis-for-mental-health
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 25, 2024
    Authors
    Sayan Golder
    Description

    btwitssayan/sentiment-analysis-for-mental-health dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. sentiment140

    • huggingface.co
    • opendatalab.com
    • +2 more
    Updated Apr 23, 2010
    Cite
    Stanford NLP (2010). sentiment140 [Dataset]. https://huggingface.co/datasets/stanfordnlp/sentiment140
    Explore at:
    Dataset updated
    Apr 23, 2010
    Dataset authored and provided by
    Stanford NLP
    Description

    Sentiment140 consists of Twitter messages with emoticons, which are used as noisy labels for sentiment classification. For more detailed information please refer to the paper.

  9. Rotten Tomatoes Movie Reviews

    • kaggle.com
    Updated Nov 20, 2022
    Cite
    The Devastator (2022). Rotten Tomatoes Movie Reviews [Dataset]. https://www.kaggle.com/datasets/thedevastator/movie-review-data-set-from-rotten-tomatoes
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 20, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Rotten Tomatoes Movie Reviews

    Predicting Movie Review Sentiment

    Source

    Huggingface Hub: link

    About this dataset

    The Rotten Tomatoes Movie Review Sentiment Analysis Dataset contains a set of 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. Bo Pang and Lillian Lee first used this data in their paper "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales", which was published in Proceedings of the ACL in 2005. The data fields are identical in all of the splits. The text column contains the review itself, and the label column indicates whether the review is positive or negative.

    How to use the dataset

    The Performance of Sentiment Analysis: In this post we take a look at the performance of different sentiment analysis systems on a movie review dataset from Rotten Tomatoes. This data was first used in Bo Pang and Lillian Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales", Proceedings of the ACL, 2005. The data fields are the same among all splits.

    We will be using three different libraries for this post: 1) Scikit-learn, 2) NLTK, and 3) TextBlob. We will also compare the results of these systems with those from human raters. Each library takes different amounts of time and resources to run, so we will also be considering these factors in our comparisons.

    NLTK

    NLTK is a popular library for working with text data in Python. It includes many useful features for pre-processing text data, including tokenization, lemmatization, and part-of-speech tagging. NLTK also includes a number of helpful classes for building and evaluating predictive models (such as decision trees and maximum entropy classifiers).

    TextBlob

    TextBlob is a relatively new library that attempts to provide an easy-to-use interface for common text processing tasks (such as part-of-speech tagging, sentence parsing, spelling correction, etc). TextBlob is built on top of NLTK and Pattern, another Python library for web mining (see below).

    Scikit-learn

    Scikit-learn is a popular machine learning library for Python that provides efficient implementations of common algorithms such as support vector machines, random forests, and k-nearest neighbors classifiers. It also includes helpful utilities for pre-processing data and assessing model performance.

    Research Ideas

    • Identify positive and negative sentiment in movie reviews
    • Categorize movie reviews by rating
    • Cluster movie reviews to group together similar reviews

    Acknowledgements

    Huggingface Hub: link

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: validation.csv

    | Column name | Description |
    |:------------|:------------|
    | text | The text of the review. (String) |
    | label | The label of the review. (String) |

    File: train.csv

    | Column name | Description |
    |:------------|:------------|
    | text | The text of the review. (String) |
    | label | The label of the review. (String) |

    File: test.csv

    | Column name | Description |
    |:------------|:------------|
    | text | The text of the review. (String) |
    | label | The label of the review. (String) |
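
    Since all three splits share the same two columns, one reader covers them. A minimal sketch, assuming a comma-separated file with a `text,label` header row; the sample rows and the label values "1" and "0" are made up, as the description only says the label is a String.

```python
import csv
import io

# Made-up sample in the documented two-column layout (text, label).
# The label values "1" and "0" are an assumption for illustration.
sample_csv = io.StringIO(
    "text,label\n"
    '"a charming , funny film",1\n'
    '"dull and overlong",0\n'
)

# DictReader keys each row by the header names, so the columns can be
# pulled out by name regardless of their order in the file.
reviews = list(csv.DictReader(sample_csv))
texts = [row["text"] for row in reviews]
labels = [row["label"] for row in reviews]
print(texts, labels)
```

    For the real files, the in-memory sample would be replaced by an opened train.csv, validation.csv or test.csv handle.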

  10. multilingual-sentiments

    • huggingface.co
    Updated Sep 19, 2022
    + more versions
    Cite
    Tay Yong Qiang (2022). multilingual-sentiments [Dataset]. https://huggingface.co/datasets/tyqiangz/multilingual-sentiments
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 19, 2022
    Authors
    Tay Yong Qiang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Multilingual Sentiments Dataset

    A collection of multilingual sentiments datasets grouped into 3 classes -- positive, neutral, negative. Most multilingual sentiment datasets are either 2-class positive or negative, 5-class ratings of products reviews (e.g. Amazon multilingual dataset) or multiple classes of emotions. However, to an average person, sometimes positive, negative and neutral classes suffice and are more straightforward to perceive and annotate. Also, a positive/negative… See the full description on the dataset page: https://huggingface.co/datasets/tyqiangz/multilingual-sentiments.

  11. tweet-sentiment-multilingual

    • opendatalab.com
    • huggingface.co
    zip
    Updated Jan 1, 2022
    Cite
    Cardiff University (2022). tweet-sentiment-multilingual [Dataset]. https://opendatalab.com/OpenDataLab/tweet-sentiment-multilingual
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 1, 2022
    Dataset provided by
    Cardiff University
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Tweet Sentiment Multilingual consists of sentiment analysis datasets of Twitter messages in 8 different languages.

  12. Data from: imdb

    • huggingface.co
    Updated Aug 3, 2003
    + more versions
    Cite
    Stanford NLP (2003). imdb [Dataset]. https://huggingface.co/datasets/stanfordnlp/imdb
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 3, 2003
    Dataset authored and provided by
    Stanford NLP
    License

    https://choosealicense.com/licenses/other/

    Description

    Dataset Card for "imdb"

      Dataset Summary
    

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

      Supported Tasks and Leaderboards
    

    More Information Needed

      Languages
    

    More Information Needed

      Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/stanfordnlp/imdb.
    
  13. Issues and News Embeddings

    • kaggle.com
    zip
    Updated Nov 9, 2025
    Cite
    João Pedro Ribeiro (2025). Issues and News Embeddings [Dataset]. https://www.kaggle.com/datasets/ribe3iro/issues-and-news-embeddings
    Explore at:
    Available download formats: zip (5049484375 bytes)
    Dataset updated
    Nov 9, 2025
    Authors
    João Pedro Ribeiro
    Description
    • Sentiment analysis: RoBERTa;
    • Embeddings extraction: MPNet.
  14. news-sentiment-data

    • huggingface.co
    Updated Jul 8, 2024
    Cite
    amitk17 (2024). news-sentiment-data [Dataset]. https://huggingface.co/datasets/sweatSmile/news-sentiment-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2024
    Authors
    amitk17
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    sweatSmile/news-sentiment-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  15. Data from: News Sentiment Analysis

    • kaggle.com
    zip
    Updated Aug 3, 2024
    Cite
    myrios (2024). News Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/myrios/news-sentiment-analysis
    Explore at:
    Available download formats: zip (30198931 bytes)
    Dataset updated
    Aug 3, 2024
    Authors
    myrios
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Three news sources have been used in creating this dataset. 1. Sun, J. (2016, August). Daily News for Stock Market Prediction, Version 1. Retrieved (2024, August) from https://www.kaggle.com/aaron7sun/stocknews. 2. ARYAN SINGH. NYT Articles: 2.1M+ (2000-Present) Daily Updated. https://www.kaggle.com/datasets/aryansingh0909/nyt-articles-21m-2000-present. 3. GABRIEL PREDA. BBC News. https://www.kaggle.com/datasets/gpreda/bbc-news.

    The first source covers from 2008-06-08 to 2016-07-01, the top 25 news of each day from Reddit World News. The second source is a direct import of the abstract column from New York Times articles from 2016-07-01 to 2017-09-05. The third is also a direct import of the description column from BBC News from 2017-09-05 to 2024-08-03. Thus, the whole coverage is from 2008-06-08 to 2024-08-03.

    Three models have been used for sentiment results. NLTK VADER is applied first, as it is the most lightweight and the fastest to apply to large amounts of data. However, as news is mostly neutral, NLTK VADER gave a 1.0 neutral score for around 25% of the data. Therefore, two more advanced models, NLTK RoBERTa and HUGGING FACE distilbert-base-uncased-finetuned-sst-2-english, are applied to these neutral articles to classify them more accurately.
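
    The two-stage procedure described above can be sketched as a simple cascade. This is not the author's code: the function names and the toy scorers are hypothetical stand-ins, and the only assumption carried over from VADER is that the fast scorer returns a dictionary with a "neu" score.

```python
def cascade_sentiment(texts, fast_scorer, fallback_scorer):
    """Score with the fast model; re-score fully neutral texts with the fallback."""
    results = []
    for text in texts:
        scores = fast_scorer(text)
        if scores["neu"] == 1.0:
            # The fast model found no sentiment signal at all, so defer
            # to the heavier model.
            results.append(("fallback", fallback_scorer(text)))
        else:
            results.append(("fast", scores))
    return results

# Toy stand-ins for the real models (hypothetical, for illustration only).
def toy_fast(text):
    if "great" in text:
        return {"neg": 0.0, "neu": 0.5, "pos": 0.5}
    return {"neg": 0.0, "neu": 1.0, "pos": 0.0}

def toy_fallback(text):
    return {"label": "NEGATIVE" if "falls" in text else "POSITIVE"}

results = cascade_sentiment(
    ["great quarter for exporters", "index falls on open"],
    toy_fast,
    toy_fallback,
)
print(results)
```

    Running the heavier model only on the fully neutral subset keeps most of the speed advantage of the lightweight first pass.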

    Part of my school project for Nanyang Polytechnic | AI & Data Engineering

  16. sentiment-analysis-dataset

    • huggingface.co
    + more versions
    Cite
    stepan, sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/skibastepan/sentiment-analysis-dataset
    Explore at:
    Authors
    stepan
    Description

    skibastepan/sentiment-analysis-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community

  17. course-review-multilabel-sentiment-analysis

    • huggingface.co
    Updated May 28, 2024
    Cite
    Nguyen Minh Chi (2024). course-review-multilabel-sentiment-analysis [Dataset]. https://huggingface.co/datasets/chillies/course-review-multilabel-sentiment-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 28, 2024
    Authors
    Nguyen Minh Chi
    Description

    chillies/course-review-multilabel-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. DailyDialog: Multi-Turn Dialog+Intention+Emotion

    • kaggle.com
    zip
    Updated Dec 2, 2023
    Cite
    The Devastator (2023). DailyDialog: Multi-Turn Dialog+Intention+Emotion [Dataset]. https://www.kaggle.com/datasets/thedevastator/dailydialog-multi-turn-dialog-with-intention-and
    Explore at:
    Available download formats: zip (2581845 bytes)
    Dataset updated
    Dec 2, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    DailyDialog: Multi-Turn Dialog+Intention+Emotion

    Human-written dialogues with communication intention and emotion labels

    By daily_dialog (From Huggingface) [source]

    About this dataset

    The DailyDialog dataset is a meticulously curated collection of multi-turn dialogues that aims to accurately represent the way we communicate in our daily lives. It covers a wide range of topics that are relevant to our everyday experiences. What sets this dataset apart is that it includes human-written conversations, which means the language used is more natural and realistic, resulting in less noise and higher quality data.

    Each dialogue in the dataset consists of two or more participants engaging in a conversation. The conversations are provided in textual form, allowing for easy analysis and processing. Alongside the dialogues, there are also corresponding labels for communication intention and emotion attached to each utterance.

    The communication intention labels categorize each utterance based on its intended purpose or goal within the conversation. These categories provide valuable insights into how different participants express their intentions through speech.

    In addition to the communication intention labels, there are also emotion labels assigned to each utterance in the dialogues. These emotion labels capture the emotional state or sentiment expressed by participants during various points in the conversation.

    To facilitate model evaluation and testing, DailyDialog provides three separate files: validation.csv, train.csv, and test.csv. The validation set (validation.csv) contains dialogues with their respective communication intention and emotion labels for assessing model performance during development stages. The train set (train.csv) includes dialogues paired with corresponding communication intention and emotion labels for training purposes. Lastly, test.csv serves as an independent test set that enables evaluating models' proficiency by providing unseen dialogues along with their associated communication intention and emotion labels.

    Overall, DailyDialog stands out as a high-quality dataset due to its accurate representation of daily life conversations paired with comprehensive labeling of both communication intentions and emotions expressed throughout these dialogues. This makes it an invaluable resource for developing robust dialogue systems capable of understanding human interactions on a deeper level while being able to identify diverse intentions behind speech acts alongside various emotional states encountered during daily life exchanges

    How to use the dataset

    Welcome to the DailyDialog dataset! This high-quality multi-turn dialog dataset has been curated to reflect our daily communication style and covers a wide range of topics related to our everyday lives. The dataset consists of human-written conversations, making it less noisy and more realistic. Each conversation in the dataset has been manually labeled with communication intention and emotion information, providing valuable insights into the dialogues.

    To make the most of this dataset, here is a step-by-step guide on how you can use it effectively:

    • Understanding the columns:

      • dialog: This column contains the actual conversation between two or more participants. It is in text format.
      • act: The act column represents the communication intention labels for each utterance in the dialogue. These labels categorize each utterance based on its intention.
      • emotion: The emotion column contains emotion labels for each utterance in the dialogue. These labels represent the emotions expressed during that particular utterance.
    • Familiarize yourself with validation.csv:

      • The validation.csv file serves as a validation set for evaluating your model's performance. It contains pre-labeled conversations along with their corresponding communication intentions and emotion labels.
    • Explore train.csv for training purposes:

      • The train.csv file is meant for training purposes and provides conversations along with their communication intentions and emotion labels.
    • Test your model using test.csv:

      • The test.csv file contains conversations along with their communication intention and emotion labels, which can be used to evaluate the model once it has been trained.
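
    Because act and emotion are given per utterance, a natural first step is to line the three columns up turn by turn. A minimal sketch, assuming each row has already been parsed into three equal-length Python lists; the helper name `align_utterances`, the sample row and the integer label codes are hypothetical, and how the CSV stores these lists is not specified here.

```python
def align_utterances(dialog, act, emotion):
    """Pair each utterance with its act and emotion label."""
    if not (len(dialog) == len(act) == len(emotion)):
        raise ValueError("dialog, act and emotion must have equal length")
    return [
        {"utterance": u, "act": a, "emotion": e}
        for u, a, e in zip(dialog, act, emotion)
    ]

# Made-up row; the integer codes are placeholders, not the dataset's
# documented label scheme.
row = {
    "dialog": ["How was your day?", "Pretty good, thanks!"],
    "act": [2, 1],
    "emotion": [0, 4],
}

turns = align_utterances(row["dialog"], row["act"], row["emotion"])
print(turns)
```

    The length check surfaces rows where a label list and the dialogue have drifted out of step, instead of silently truncating them as a bare `zip` would.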

    Finally, remember that this DailyDialog dataset offers an excellent opportunity to develop models capable of understanding multi-turn dialogues in a wide range of everyday scenarios. By utilizing both communication intention and emotion information provided, you can gain valuable insights into analyzing human conversations.

    So dive into this rich resource, experiment with different techn...

  19. Leadership-sentiment-analysis

    • huggingface.co
    Updated Aug 28, 2024
    Cite
    K. A. Consultants (2024). Leadership-sentiment-analysis [Dataset]. https://huggingface.co/datasets/orYx-models/Leadership-sentiment-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2024
    Dataset authored and provided by
    K. A. Consultants
    Description

    orYx-models/Leadership-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. wisesight_sentiment

    • huggingface.co
    Updated Feb 4, 2018
    Cite
    PyThaiNLP (2018). wisesight_sentiment [Dataset]. https://huggingface.co/datasets/pythainlp/wisesight_sentiment
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 4, 2018
    Dataset authored and provided by
    PyThaiNLP
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for wisesight_sentiment

      Dataset Summary
    

    Wisesight Sentiment Corpus: Social media messages in Thai language with sentiment label (positive, neutral, negative, question)

    Released to public domain under Creative Commons Zero v1.0 Universal license.

    • Labels: {"pos": 0, "neu": 1, "neg": 2, "q": 3}
    • Size: 26,737 messages
    • Language: Central Thai
    • Style: Informal and conversational. With some news headlines and advertisement.
    • Time period: Around 2016 to early 2019.

    With… See the full description on the dataset page: https://huggingface.co/datasets/pythainlp/wisesight_sentiment.


turkish-sentiment-analysis-dataset

Turkish Sentiment Dataset

winvoker/turkish-sentiment-analysis-dataset

Explore at:
26 scholarly articles cite this dataset (View in Google Scholar)
