100+ datasets found
  1. Sentiment Analysis for Mental Health

    • kaggle.com
    zip
    Updated Jul 5, 2024
    Cite
    Suchintika Sarkar (2024). Sentiment Analysis for Mental Health [Dataset]. https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health
    Available download formats: zip (11587194 bytes)
    Dataset updated
    Jul 5, 2024
    Authors
    Suchintika Sarkar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset is a curated collection of statements tagged with mental health statuses. It combines raw data from multiple sources, cleaned and compiled into a single resource for developing chatbots and performing sentiment analysis.

    Data Source:

    The dataset integrates information from the following Kaggle datasets:

    Data Overview:

    The dataset consists of statements tagged with one of the following seven mental health statuses:

    • Normal
    • Depression
    • Suicidal
    • Anxiety
    • Stress
    • Bi-Polar
    • Personality Disorder

    Data Collection:

    The data is sourced from diverse platforms, including social media posts, Reddit posts, Twitter posts, and more. Each entry is tagged with a specific mental health status, which makes the dataset useful for:

    • Developing intelligent mental health chatbots.
    • Performing in-depth sentiment analysis.
    • Research and studies related to mental health trends.

    Features:

    • unique_id: A unique identifier for each entry.
    • Statement: The textual data or post.
    • Mental Health Status: The tagged mental health status of the statement.
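
    Because every statement carries one of seven status labels, a stratified split keeps each status represented in both the training and the test set. The sketch below uses only the Python standard library; the column names `statement` and `status` are hypothetical stand-ins for the Statement and Mental Health Status fields, so check the header of the downloaded CSV before reusing it.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=42):
    """Split rows into train/test so each label keeps roughly the same share in both.

    rows: list of dicts, e.g. produced by csv.DictReader
    label_key: name of the label column (check the real CSV header)
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):  # sorted so the result is reproducible
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows; "statement" and "status" are hypothetical column names.
labels = ["Normal"] * 10 + ["Depression"] * 5 + ["Anxiety"] * 5
rows = [{"statement": f"text {i}", "status": s} for i, s in enumerate(labels)]
train, test = stratified_split(rows, "status")
print(len(train), len(test))  # 16 4
```

    With very small classes, `round()` can assign zero rows of a label to the test set, so it is worth checking per-label counts after splitting.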

    Usage:

    This dataset is suited to training machine learning models that aim to understand and predict mental health conditions from textual data. It can be used in applications such as:

    • Chatbot development for mental health support.
    • Sentiment analysis to gauge mental health trends.
    • Academic research on mental health patterns.

    Acknowledgments:

    This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.

  2. Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated May 3, 2025
    Cite
    abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset
    Available download formats: zip (9105036 bytes)
    Dataset updated
    May 3, 2025
    Authors
    abdelmalek eladjelet
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

    📌 Description

    This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

    • 0 — Negative
    • 1 — Neutral
    • 2 — Positive

    The data has been gathered from multiple sources:

    • Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
    • Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
    • Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
    • Kaggle: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

    The goal is to enable training and evaluation of multi-class sentiment analysis models on real-world text. The dataset is already preprocessed (lowercased, with punctuation, URLs, numbers, and stopwords removed) and is ready for NLP pipelines.
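
    Since the text is already preprocessed, new input passed to a model trained on it should be cleaned the same way. A minimal sketch of such a cleaning function follows. The exact rules used to build the dataset are not documented: the regular expressions are assumptions, and the stopword list is left as a parameter (empty by default) because the example comment in this description still contains words such as "is" and "and".

```python
import re
import string

# Assumed patterns; the dataset's own cleaning rules are not documented.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, stopwords=frozenset()):
    """Lowercase, then strip URLs, numbers, punctuation and the given stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)  # remove URLs before punctuation stripping breaks them apart
    text = NUM_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(tok for tok in text.split() if tok not in stopwords)


print(preprocess("Apple Pay is SO convenient, secure and easy to use! https://apple.com 100%"))
# → apple pay is so convenient secure and easy to use
```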

    📊 Columns

    • Comment: User-generated text content
    • Sentiment: Sentiment label (0 = Negative, 1 = Neutral, 2 = Positive)

    🚀 Use Cases

    • 🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa
    • 🔍 Evaluate preprocessing and tokenization strategies
    • 📈 Benchmark NLP models on multi-class classification tasks
    • 🎓 Educational projects and research in opinion mining or text classification
    • 🧪 Fine-tune transformer models on a large and diverse sentiment dataset

    💬 Example

    Comment: "apple pay is so convenient secure and easy to use"
    Sentiment: 2 (Positive)
    
  3. Chat Sentiment Dataset

    • kaggle.com
    zip
    Updated Mar 22, 2023
    Cite
    Nursyahrina (2023). Chat Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/nursyahrina/chat-sentiment-dataset
    Available download formats: zip (7598 bytes)
    Dataset updated
    Mar 22, 2023
    Authors
    Nursyahrina
    Description

    Chat Sentiment Dataset

    A Simple but Rich Dataset for Sentiment Analysis of Chat Messages

    Description:

    This dataset contains a collection of chat messages that can be used to develop a sentiment analysis machine learning model to classify messages into 3 sentiment classes - positive, negative, and neutral. The messages are diverse in nature, containing not only simple text but also special characters, numbers, emoji/emoticons, and URL addresses. The dataset can be used for various natural language processing tasks related to chat analysis.

    Column Descriptions:

    1. message: the content of the chat message.
    2. sentiment: the sentiment of the chat message, can be positive, negative, or neutral.
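
    Because the messages mix plain text with numbers, emoji and URLs, one common option is to replace URLs and numbers with placeholder tokens while leaving emoji in place, since emoji often carry sentiment. A minimal sketch; the placeholder tokens `<url>` and `<num>` are arbitrary choices, not part of the dataset.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUM_RE = re.compile(r"\b\d+(?:[.,]\d+)?\b")


def normalize_message(message):
    """Replace URLs and numbers with placeholder tokens; emoji and other characters are kept."""
    message = URL_RE.sub("<url>", message)  # URLs first, so digits inside them are not touched
    message = NUM_RE.sub("<num>", message)
    return " ".join(message.split())  # collapse repeated whitespace


print(normalize_message("Check https://x.co/a1 now 😀 I paid 25 bucks"))
# → Check <url> now 😀 I paid <num> bucks
```
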
  4. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Campos Arias
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. Its purpose is to store the datasets used in some of the studies that served as research material for this thesis, as well as the datasets used in the experimental part of this work.

    The datasets are listed below, along with their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on Amazon's official website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description of the Product
    • user_id - ID of the user who wrote the review for the Product
    • user_name - Name of the user who wrote the review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
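
    Because the file is sorted by label (all negative rows first, then all positive rows), taking the first N rows as a training set would yield a single class. A minimal sketch of loading and shuffling with the standard library, assuming the CSV header is literally `reviews,labels`:

```python
import csv
import io
import random


def load_shuffled(csv_file, seed=0):
    """Read (review, label) pairs and shuffle them.

    The file lists all negative rows first and all positive rows last, so
    without shuffling any head/tail split would contain a single class.
    """
    reader = csv.DictReader(csv_file)
    rows = [(row["reviews"], int(row["labels"])) for row in reader]
    random.Random(seed).shuffle(rows)  # seeded for reproducible splits
    return rows


# Toy stand-in for data_rt.csv, sorted by label like the real file.
sample = "reviews,labels\ndull plot,0\nflat acting,0\ngreat cast,1\nlovely score,1\n"
rows = load_shuffled(io.StringIO(sample))
split = len(rows) // 2
train, test = rows[:split], rows[split:]
```

    For the real file, open it with `open("data_rt.csv", newline="", encoding="utf-8")` and pass the handle to `load_shuffled`; the encoding is an assumption.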

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data of Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using NLTK.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
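
    The description does not state which polarity thresholds were used to derive the division label. TextBlob polarity scores lie in [-1, 1] (in TextBlob they come from `TextBlob(text).sentiment.polarity`), so one plausible mapping is sketched below; the `neutral_band` parameter and the label strings are assumptions, not the dataset's documented rule.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a TextBlob-style polarity score in [-1, 1] to a categorical label.

    Scores within +/- neutral_band of zero count as neutral. Both the band and
    the label strings are assumptions, not the dataset's documented rule.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f"polarity out of range: {polarity}")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print([polarity_to_division(p) for p in (0.45, 0.0, -0.3)])
# → ['positive', 'neutral', 'negative']
```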

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, unix time), reviewTime (time of the review, raw), and division (manually added; categorical label generated using the overall score).
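
    The `helpful` and `unixReviewTime` columns usually need parsing before analysis. A minimal sketch, assuming `helpful` is stored as a list literal such as "[2, 3]" (2 of 3 voters found the review helpful), which is how the field appears in McAuley's raw data; check a few rows of the CSV first.

```python
import ast
from datetime import datetime, timezone


def helpfulness_ratio(helpful):
    """Parse a field like "[2, 3]" (2 of 3 voters found the review helpful).

    Returns None when nobody voted, instead of dividing by zero.
    """
    helpful_votes, total_votes = ast.literal_eval(helpful)
    if total_votes == 0:
        return None
    return helpful_votes / total_votes


def review_date(unix_review_time):
    """Convert an unixReviewTime value (seconds since the epoch) to an ISO date in UTC."""
    return datetime.fromtimestamp(int(unix_review_time), tz=timezone.utc).date().isoformat()


print(helpfulness_ratio("[2, 3]"), helpfulness_ratio("[0, 0]"), review_date("86400"))
# → 0.6666666666666666 None 1970-01-02
```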

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  5. Brand Sentiment Analysis Dataset (Twitter)

    • kaggle.com
    zip
    Updated Jan 7, 2024
    Cite
    Tushar Paul (2024). Brand Sentiment Analysis Dataset (Twitter) [Dataset]. https://www.kaggle.com/datasets/tusharpaul2001/brand-sentiment-analysis-dataset
    Available download formats: zip (375745 bytes)
    Dataset updated
    Jan 7, 2024
    Authors
    Tushar Paul
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Users assessed tweets related to various brands and products, evaluating whether the sentiment conveyed was positive, negative, or neutral. Additionally, if a tweet conveyed any sentiment, contributors identified the specific brand or product targeted by that emotion.


    Train dataset: 8589 rows x 3 columns

    Test dataset: 504 rows x 1 column

  6. tweet-sentiment-analysis-from-kaggle

    • huggingface.co
    Updated Jul 31, 2024
    Cite
    Tek Loon (2024). tweet-sentiment-analysis-from-kaggle [Dataset]. https://huggingface.co/datasets/tekloon/tweet-sentiment-analysis-from-kaggle
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 31, 2024
    Authors
    Tek Loon
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    tekloon/tweet-sentiment-analysis-from-kaggle dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. Twitter Tweets Sentiment Dataset

    • kaggle.com
    zip
    Updated Apr 8, 2022
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Available download formats: zip (1289519 bytes)
    Dataset updated
    Apr 8, 2022
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description


    Description:

    Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to post hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that identifies negative tweets so that they can be blocked. Can you build a strong classifier model to predict this?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    When parsing the CSV, make sure to remove the beginning and ending quotes from the text field so that they are not included in your training data.
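
    Python's `csv` module already removes the quoting used to escape commas inside a field, so only quote characters stored as part of the text itself remain. A minimal sketch, using the column names listed in this description (`textID`, `text`, `sentiment`); leading and trailing spaces are deliberately kept because the target spans depend on exact characters.

```python
import csv
import io


def read_tweets(csv_file):
    """Yield (textID, text, sentiment) with leftover surrounding quotes removed."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        # csv has already undone standard CSV quoting; this strips any quote
        # characters that remain at either end of the tweet text itself.
        text = row["text"].strip('"')
        yield row["textID"], text, row["sentiment"]


sample = 'textID,text,sentiment\na1," I love this",positive\nb2,"""so tired""",negative\n'
print(list(read_tweets(io.StringIO(sample))))
# → [('a1', ' I love this', 'positive'), ('b2', 'so tired', 'negative')]
```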

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
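
    Since a prediction must reproduce the exact character span, including commas and spaces, it can help to convert each training `selected_text` into character offsets within `text`; a span-extraction model can then be trained to predict start and end positions. A minimal sketch; using the first occurrence when the phrase appears more than once is an assumption.

```python
def find_span(text, selected_text):
    """Return (start, end) character offsets of selected_text within text, or None.

    Uses the first occurrence; end is exclusive, so text[start:end] == selected_text.
    """
    start = text.find(selected_text)
    if start == -1:
        return None
    return start, start + len(selected_text)


tweet = "I love this, really"
span = find_span(tweet, "love this,")
print(span, repr(tweet[span[0]:span[1]]))
# → (2, 12) 'love this,'
```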

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet

    Acknowledgement:

    The dataset was downloaded from the Kaggle competition:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the twitter sentiments.
    • Compare the evaluation metrics of various classification algorithms.
  8. turkish-sentiment-analysis-dataset

    • huggingface.co
    • kaggle.com
    Updated Jun 22, 2022
    Cite
    Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 22, 2022
    Authors
    Batuhan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset contains positive, negative, and neutral ("notr") sentences from several data sources given in the references. Most sentiment models have only two labels, positive and negative. However, user input can be a completely neutral sentence, and I could not find any data for such cases, so I created this dataset with 3 classes. The sources of the positive and negative sentences are listed below. Neutral examples are extracted from a Turkish Wikipedia dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.

  9. Amazon Kindle Book Review for Sentiment Analysis

    • kaggle.com
    zip
    Updated Sep 3, 2021
    Cite
    Meet Nagadia (2021). Amazon Kindle Book Review for Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/meetnagadia/amazon-kindle-book-review-for-sentiment-analysis
    Available download formats: zip (6686485 bytes)
    Dataset updated
    Sep 3, 2021
    Authors
    Meet Nagadia
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Context

    This is a small subset of a dataset of book reviews from the Amazon Kindle Store category.

    Content

    5-core dataset of product reviews from the Amazon Kindle Store category, covering May 1996 to July 2014, with 982,619 entries in total. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset. Columns:

    • asin: ID of the product, like B000FA64PK
    • helpful: helpfulness rating of the review, for example 2/3
    • overall: rating of the product
    • reviewText: text of the review
    • reviewTime: time of the review (raw)
    • reviewerID: ID of the reviewer, like A3SPTOKDG7WBLN
    • reviewerName: name of the reviewer
    • summary: summary of the review
    • unixReviewTime: unix timestamp
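
    A k-core is built by repeatedly dropping reviews whose reviewer or product has fewer than k reviews, because removing one sparse product can push a reviewer below the threshold (and vice versa). A minimal sketch over `(reviewerID, asin)` pairs; it illustrates the idea and is not necessarily how the original 5-core files were produced.

```python
from collections import Counter


def k_core(reviews, k=5):
    """Keep only reviews whose reviewer and product both have at least k reviews.

    reviews: list of (reviewerID, asin) pairs. Filtering repeats until nothing
    changes, because each removal can push another reviewer or product below k.
    """
    current = list(reviews)
    while True:
        per_reviewer = Counter(reviewer for reviewer, _ in current)
        per_product = Counter(asin for _, asin in current)
        kept = [
            (reviewer, asin)
            for reviewer, asin in current
            if per_reviewer[reviewer] >= k and per_product[asin] >= k
        ]
        if len(kept) == len(current):  # stable: every remaining reviewer and product has >= k reviews
            return kept
        current = kept


# Toy example with k=2: dropping (u3, p3) leaves u3 with one review, so (u3, p2) goes too.
toy = [("u1", "p1"), ("u1", "p2"), ("u2", "p1"), ("u2", "p2"), ("u3", "p2"), ("u3", "p3")]
print(k_core(toy, k=2))
# → [('u1', 'p1'), ('u1', 'p2'), ('u2', 'p1'), ('u2', 'p2')]
```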

    Which file to use?

    There are two files: one is preprocessed and ready for sentiment analysis, and the other is unprocessed, so you have to preprocess it yourself before performing sentiment analysis.

    Acknowledgements

    This dataset is taken from Amazon product data, Julian McAuley, UCSD website. http://jmcauley.ucsd.edu/data/amazon/

    The license to the data files belongs to them.

    Inspiration

    • Sentiment analysis on reviews.
    • Understanding how people rate the usefulness of a review, and what factors influence a review's helpfulness.
    • Fake reviews and outliers.
    • Best-rated product IDs, or similarity between products based on reviews alone (admittedly not the best idea).
    • Any other interesting analysis.

  10. 2.5M+ reviews dataset for sentiment analysis

    • kaggle.com
    Updated Jan 16, 2025
    Cite
    Mike Shperling (2025). 2.5M+ reviews dataset for sentiment analysis [Dataset]. https://www.kaggle.com/datasets/dolbokostya/test-dataset
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mike Shperling
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🌟 Dive into a reviews dataset with 2.5M entries, each labeled for sentiment!

    Perfect for AI enthusiasts, data scientists, and researchers to supercharge your NLP projects.

    💡 Why you’ll love it:

    • 📈 Boost your sentiment analysis models with massive, clean data
    • 🧠 Ideal for NLP and deep learning experiments
    • 🚀 Save time and focus on building winning solutions

    ⚡ Upvote & download now to take your projects to the next level! 🖤

  11. Twitter Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Aug 16, 2023
    Cite
    Tùng Lê Thanh (2023). Twitter Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/tungle98/twitter-sentiment-dataset
    Available download formats: zip (1291530 bytes)
    Dataset updated
    Aug 16, 2023
    Authors
    Tùng Lê Thanh
    Description

    Dataset

    This dataset was created by Tùng Lê Thanh

    Contents

  12. Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Dec 2, 2024
    Cite
    Abhay Mudgal (2024). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abhaymudgal/sentiment-analysis-dataset
    Available download formats: zip (3597460 bytes)
    Dataset updated
    Dec 2, 2024
    Authors
    Abhay Mudgal
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Social networking, customer support, and market research are where sentiment analysis is most frequently used. In social media, it is often used to examine how users feel about and talk about a brand or product. Organizations can use it to learn how various segments of society view different issues, ranging from hot topics to breaking news. With this knowledge, businesses can react swiftly to public sentiment.

    In this challenge, the goal is to detect the sentiment of naturally occurring sentences.

    The dataset consists of the following files:

    • Dev-datasets: the train and dev datasets, along with a sample submission file (answer.txt)
    • test-datasets: the test dataset on which your models will be evaluated

    Train size: 92,228

    Development size: 4,855

    The ground truth contains 3 categorical values:

    • Positive (1)
    • Neutral (0)
    • Negative (-1)

    You have to predict the labels and save the predictions (1, 0, -1) in the "answer.txt" file.
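
    The layout of answer.txt is not spelled out beyond the three label values. A minimal sketch assuming one integer per line, in the same order as the test set; the name-to-integer mapping and the helper names are hypothetical. Comparing the output against the sample answer.txt shipped in Dev-datasets is the safest check.

```python
from pathlib import Path

# Hypothetical name-to-integer mapping; the task itself only fixes the values 1, 0 and -1.
NAME_TO_INT = {"positive": 1, "neutral": 0, "negative": -1}


def format_answers(predictions):
    """Render predictions as one integer label per line (order must match the test set)."""
    lines = []
    for pred in predictions:
        value = NAME_TO_INT[pred.lower()] if isinstance(pred, str) else int(pred)
        if value not in (-1, 0, 1):
            raise ValueError(f"unexpected label: {pred!r}")
        lines.append(str(value))
    return "\n".join(lines) + "\n"


def write_answers(predictions, path="answer.txt"):
    """Write the formatted predictions to disk as UTF-8 text."""
    Path(path).write_text(format_answers(predictions), encoding="utf-8")
```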

  13. multimodal-sentiment-data

    • kaggle.com
    zip
    Updated May 8, 2023
    Cite
    Suraj (2023). multimodal-sentiment-data [Dataset]. https://www.kaggle.com/datasets/suraj520/multimodal-sentiment-data
    Available download formats: zip (1021992 bytes)
    Dataset updated
    May 8, 2023
    Authors
    Suraj
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides a collection of images and their corresponding texts and sentiment which makes it a multi-modal sentiment analysis dataset.

    The dataset contains images of 100 different classes of animals and objects, including sharks, birds, lizards, spiders, and more.

    This dataset can be used for various computer vision and natural language processing tasks, such as image classification, sentiment analysis, and image captioning.

  14. Bitcoin Sentiment Analysis | Twitter Data

    • kaggle.com
    zip
    Updated Nov 7, 2022
    Cite
    Gautam Chettiar (2022). Bitcoin Sentiment Analysis | Twitter Data [Dataset]. https://www.kaggle.com/datasets/gautamchettiar/bitcoin-sentiment-analysis-twitter-data
    Available download formats: zip (192139671 bytes)
    Dataset updated
    Nov 7, 2022
    Authors
    Gautam Chettiar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Twitter tweet data can be used for sentiment analysis for Bitcoin.

    1. Rudimentary preprocessing has already been applied to the tweet text, so you can omit that step.
    2. The sentiment polarity score should be removed, since it also acts as a classifier.
    3. The final column is the class label.
    4. If you can use more than just the text data, that will add multi-modality to your approach.
    5. Enough data points are provided.
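
    Point 2 is about label leakage: a polarity score computed from the same text largely determines the label, so keeping it as a feature inflates evaluation scores. A minimal sketch that treats the last column as the label and drops the score column; the column name `polarity` and the sample rows are hypothetical, so substitute the actual header.

```python
import csv
import io


def load_without_leakage(csv_file, leaky=frozenset({"polarity"})):
    """Return (feature_names, label_name, features, labels).

    The last column is treated as the label; columns named in `leaky`
    (a hypothetical polarity-score column) are dropped from the features.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    label_name = header[-1]
    keep = [i for i, name in enumerate(header[:-1]) if name not in leaky]
    features, labels = [], []
    for row in reader:
        features.append({header[i]: row[i] for i in keep})
        labels.append(row[-1])
    return [header[i] for i in keep], label_name, features, labels


sample = "date,text,polarity,sentiment\n2022-11-01,btc up again,0.6,Positive\n2022-11-02,btc crash,-0.4,Negative\n"
names, label_name, features, labels = load_without_leakage(io.StringIO(sample))
print(names, label_name, labels)
# → ['date', 'text'] sentiment ['Positive', 'Negative']
```
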
  15. Multilabel sentiment analysis dataset

    • kaggle.com
    zip
    Updated Mar 25, 2024
    Cite
    mansh_anand (2024). Multilabel sentiment analysis dataset [Dataset]. https://www.kaggle.com/datasets/manshanand/multilabel-sentiment-analysis-dataset
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 25, 2024
    Authors
    mansh_anand
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by mansh_anand

    Released under Apache 2.0

    Contents

  16. YouTube Comments Sentiment Dataset

    • kaggle.com
    zip
    Updated Feb 7, 2025
    Cite
    Amaan Poonawala (2025). YouTube Comments Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/amaanpoonawala/youtube-comments-sentiment-dataset
    Available download formats: zip (156821847 bytes)
    Dataset updated
    Feb 7, 2025
    Authors
    Amaan Poonawala
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube Comments Sentiment Analysis Dataset (1M+ Labeled Comments)

    Overview

    This dataset comprises over one million YouTube comments, each annotated with a sentiment label (Positive, Neutral, or Negative). The comments span a diverse range of topics, including programming, news, sports, and politics, and are enriched with comprehensive metadata to facilitate various NLP and sentiment analysis tasks.

    Dataset Contents

    Each record in the dataset includes the following fields:

    • CommentID: A unique identifier assigned to each YouTube comment. This allows for individual tracking and analysis of comments.
    • VideoID: The unique identifier of the YouTube video to which the comment belongs. This links each comment to its corresponding video.
    • VideoTitle: The title of the YouTube video where the comment was posted. This provides context about the video's content.
    • AuthorName: The display name of the user who posted the comment. This indicates the commenter's identity.
    • AuthorChannelID: The unique identifier of the YouTube channel of the comment's author. This allows for tracking comments across different videos from the same author.
    • CommentText: The actual text content of the YouTube comment. This is the raw data used for sentiment analysis.
    • Sentiment: The sentiment classification of the comment, typically categorized as positive, negative, or neutral. This represents the emotional tone of the comment.
    • Likes: The number of likes received by the comment. This indicates the comment's popularity or agreement from other users.
    • Replies: The number of replies to the comment. This indicates the level of engagement and discussion generated by the comment.
    • PublishedAt: The date and time when the comment was published. This allows for time-based analysis of comment trends.
    • CountryCode: The two-letter country code of the user who posted the comment. This can be used to analyze regional sentiment.
    • CategoryID: The category ID of the video that the comment was posted on. This allows for analysis of sentiment across video categories.

    Key Features:

    • Sentiment Analysis: Each comment has been categorized into positive, negative, or neutral sentiment, allowing for direct analysis of emotional tone.
    • Video and Author Metadata: The dataset includes information about the videos (title, category, ID) and authors (channel ID, name), enabling contextual analysis.
    • Engagement Metrics: Columns such as "Likes" and "Replies" provide insights into comment popularity and discussion levels.
    • Temporal and Geographical Data: "PublishedAt" and "CountryCode" columns allow for time-based and regional sentiment analysis.
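
    As an example of the regional analysis these columns allow, the share of each sentiment per `CountryCode` can be computed with the standard library alone. A minimal sketch, assuming rows are dicts keyed by the column names listed in this description (for example from `csv.DictReader`); the toy rows are made up.

```python
from collections import Counter, defaultdict


def sentiment_share_by_country(rows):
    """Return {CountryCode: {Sentiment: fraction of that country's comments}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["CountryCode"]][row["Sentiment"]] += 1
    shares = {}
    for country, counter in counts.items():
        total = sum(counter.values())
        shares[country] = {label: n / total for label, n in counter.items()}
    return shares


toy_rows = [
    {"CountryCode": "US", "Sentiment": "Positive"},
    {"CountryCode": "US", "Sentiment": "Negative"},
    {"CountryCode": "IN", "Sentiment": "Neutral"},
]
print(sentiment_share_by_country(toy_rows))
# → {'US': {'Positive': 0.5, 'Negative': 0.5}, 'IN': {'Neutral': 1.0}}
```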

    Data Collection & Labeling Process

    • Extraction:
      Comments were gathered using the YouTube Data API, ensuring a rich and diverse collection from multiple channels and regions.
    • Sentiment Labeling:
      A combination of advanced AI (using models such as Gemini) and manual validation was used to accurately label each comment.
    • Cleaning & Preprocessing:
      Comprehensive cleaning steps were applied—removing extraneous noise like timestamps, code snippets, and special characters—to ensure high-quality, ready-to-use text.
    • Augmentation for Balance:
      To address class imbalances (especially for underrepresented negative and neutral sentiments), a comment augmentation process was implemented. This process generated synthetic variations of selected comments, increasing linguistic diversity while preserving the original sentiment, thus ensuring a more balanced dataset.

    Benefits for Users

    • Scale & Diversity:
      With over 1M comments from various domains, this dataset offers a rich resource for training and evaluating sentiment analysis models.
    • Quality & Consistency:
      Rigorous cleaning, preprocessing, and augmentation ensure that the data is both reliable and representative of real-world YouTube interactions.
    • Versatility:
      Ideal for researchers, data scientists, and developers looking to build or fine-tune large language models for sentiment analysis, content moderation, and other NLP applications.

    Uses:

    • Sentiment analysis of YouTube comments.
    • Analysis of viewer engagement and discussion patterns.
    • Exploration of sentiment trends across different video categories.
    • Regional sentiment analysis.
    • Building machine learning models for sentiment prediction.
    • Analyzing the impact of video content on viewer sentiment.

    This dataset is open-sourced to encourage collaboration and innovation. Detailed documentation and the code used for extraction, labeling, and augmentation are available in the accompanying GitHub repository.

  17. Twitter Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Jul 3, 2023
    Durgesh Rao (2023). Twitter Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/durgeshrao9993/twitter-analysis-dataset-2022
    Available download formats: zip (1291530 bytes)
    Dataset updated
    Jul 3, 2023
    Authors
    Durgesh Rao
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The Twitter Sentiment Analysis Dataset is a widely used dataset in the field of natural language processing and sentiment analysis. It consists of a collection of tweets, each labeled with the sentiment expressed in the tweet, which can be positive, negative, or neutral. This dataset is commonly used for training and evaluating machine learning models that aim to automatically analyze and classify the sentiment behind Twitter messages.

    The dataset contains a diverse range of tweets, capturing the opinions, emotions, and attitudes of Twitter users on various topics such as movies, products, events, or general daily experiences. The tweets cover a broad spectrum of sentiments, including expressions of joy, satisfaction, anger, disappointment, sarcasm, or indifference.
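    Tweets typically carry URLs, @mentions, and hashtags that add noise for a sentiment classifier. This is one common normalization step, shown here as an assumption rather than something this dataset prescribes. It removes URLs and mentions, keeps the word inside a hashtag, and lowercases the text. The function and pattern names are hypothetical. Lowercasing discards emphasis such as all-caps words, so whether to apply it is a modeling choice.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
SPACE_RE = re.compile(r"\s+")


def normalize_tweet(text):
    """Strip URLs and @mentions, keep hashtag words, collapse spaces, lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#Oscars" -> "Oscars"
    return SPACE_RE.sub(" ", text).strip().lower()
```

    For example, `normalize_tweet("Loved the #Oscars tonight @friend https://t.co/abc")` returns `"loved the oscars tonight"`.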

  18. Sentiment Analysis on Financial Tweets

    • kaggle.com
    zip
    Updated Sep 5, 2019
    Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
    Available download formats: zip (2538259 bytes)
    Dataset updated
    Sep 5, 2019
    Authors
    Vivek Rathi
    License

    Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.

    The rest of this description was written by David Wallach. Using this information, I performed my first sentiment analysis.

    "I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my Raspberry Pi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).

    I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of a public sentiment score."

    Content

    This dataset lists all the publicly traded companies (tickers and company names) used as input to populate tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']
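    The original project ran as a NodeJS service, and its matching logic is not shown here. This rough Python sketch illustrates the idea of keeping only tweets that mention a tracked company. It matches a tweet against (ticker, company name) pairs, such as those that might be read from stocks_cleaned.csv, whose column layout is not documented here. Matching on a cashtag like "$AAPL" or on the full company name is an assumption. The sketch expects names that start and end with a letter or digit, and the function names are hypothetical.

```python
import re


def build_matcher(companies):
    """Compile one regex per company from (ticker, name) pairs.

    A tweet matches a company if it contains the cashtag (e.g. "$AAPL")
    or the company name as a whole phrase, case-insensitively.
    """
    matchers = {}
    for ticker, name in companies:
        pattern = r"\$" + re.escape(ticker) + r"\b|\b" + re.escape(name) + r"\b"
        matchers[ticker] = re.compile(pattern, re.IGNORECASE)
    return matchers


def companies_mentioned(tweet, matchers):
    """Return the sorted tickers whose cashtag or name appears in `tweet`."""
    return sorted(t for t, rx in matchers.items() if rx.search(tweet))
```

    The word boundary after the ticker keeps "$AAPL" from matching inside a longer symbol such as "$AAPLX".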

    Acknowledgements

    The data used here was gathered from a project I developed: https://github.com/dwallach1/StockerBot

    Inspiration

    I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency).

  19. Hindi_Sentiment_Dataset

    • kaggle.com
    zip
    Updated Apr 11, 2024
    Pratham R Shetty (2024). Hindi_Sentiment_Dataset [Dataset]. https://www.kaggle.com/datasets/praths71018/hindi-sentiment-dataset
    Available download formats: zip (230354 bytes)
    Dataset updated
    Apr 11, 2024
    Authors
    Pratham R Shetty
    Description

    The dataset contains about 8,000 Hindi sentences, each classified with one of seven labels: 'neutral', 'surprise', 'fear', 'sadness', 'joy', 'disgust', and 'anger'. It can be used for Hindi sentiment analysis with NLP or sequence-learning models.
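    With seven labels, some emotions may be much rarer than others. A stratified split helps keep every label represented in the test set. This minimal standard-library sketch assumes the data has been loaded as (sentence, label) pairs that use the label names listed above. The integer ID order and the function names are hypothetical.

```python
import random
from collections import defaultdict

LABELS = ["neutral", "surprise", "fear", "sadness", "joy", "disgust", "anger"]
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}  # arbitrary order


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (sentence, label) pairs so each label keeps roughly the same
    proportion in the train and test sets."""
    by_label = defaultdict(list)
    for sentence, label in examples:
        by_label[label].append((sentence, label))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def encode(examples):
    """Replace label names with integer IDs for model training."""
    return [(sentence, LABEL_TO_ID[label]) for sentence, label in examples]
```

    Splitting within each label, rather than shuffling the whole dataset once, guarantees that even a small class contributes test examples.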

  20. Sentiment Analysis Dataset for NLP Projects

    • kaggle.com
    zip
    Updated Nov 16, 2025
    AlyAhmedTS13 (2025). Sentiment Analysis Dataset for NLP Projects [Dataset]. https://www.kaggle.com/datasets/alyahmedts13/reddit-sentiment-analysis-dataset-for-nlp-projects
    Available download formats: zip (1204347 bytes)
    Dataset updated
    Nov 16, 2025
    Authors
    AlyAhmedTS13
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0 (license information was derived automatically)

    Description

    🕹️ About Dataset

    This dataset contains short Reddit posts (≤280 characters) about pop music and pop stars, labeled for sentiment analysis.

    We collected ~124k posts using keywords like Taylor Swift, Olivia Rodrigo, Grammy, and Billboard, and from subreddits like popheads, Music, and Billboard. After cleaning and filtering, we kept only short-form English posts and combined each post’s title and body into a single text column.

    The final dataset has over 32,000 rows.
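    The preprocessing described above can be sketched as follows. It combines each post's title and body and keeps only short posts. The field names "title" and "body" are assumptions, as are the function names. The sketch applies the 280-character limit to the combined text, since the description does not say whether the limit applied before or after combining. English-language filtering is omitted because it would need a separate language-identification step.

```python
def to_text(post):
    """Join a post's title and body with a space; either may be missing or None."""
    parts = [(post.get("title") or "").strip(), (post.get("body") or "").strip()]
    return " ".join(p for p in parts if p)


def keep_short_posts(posts, max_chars=280):
    """Return the combined text of each post that is non-empty and at most
    `max_chars` characters long."""
    texts = (to_text(post) for post in posts)
    return [text for text in texts if 0 < len(text) <= max_chars]
```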

    Sentiment labels (positive, neutral, negative) were generated using a BERT-based model fine-tuned for social media (CardiffNLP’s Twitter RoBERTa).

    This version is ready for NLP sentiment projects: train your own model, explore pop fandom discourse, or benchmark transformer performance on real-world Reddit data.

Suchintika Sarkar (2024). Sentiment Analysis for Mental Health [Dataset]. https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health

Sentiment Analysis for Mental Health

Unlocking Mental Health Patterns through Statements

14 scholarly articles cite this dataset (according to Google Scholar)
Available download formats: zip (11587194 bytes)
Dataset updated
Jul 5, 2024
Authors
Suchintika Sarkar
License

Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

Description

This dataset is a curated collection of statements, each tagged with a mental health status. It combines raw data from multiple sources, cleaned and compiled into a resource for developing chatbots and performing sentiment analysis.

Data Source:

The dataset integrates information from the following Kaggle datasets:

Data Overview:

The dataset consists of statements tagged with one of the following seven mental health statuses:

  • Normal
  • Depression
  • Suicidal
  • Anxiety
  • Stress
  • Bi-Polar
  • Personality Disorder

Data Collection:

The data is sourced from diverse platforms, including social media posts from Reddit, Twitter, and elsewhere. Each entry is tagged with a specific mental health status, making it a valuable resource for:

  • Developing intelligent mental health chatbots.
  • Performing in-depth sentiment analysis.
  • Research and studies related to mental health trends.

Features:

  • unique_id: A unique identifier for each entry.
  • Statement: The textual data or post.
  • Mental Health Status: The tagged mental health status of the statement.

Usage:

This dataset is ideal for training machine learning models aimed at understanding and predicting mental health conditions based on textual data. It can be used in various applications such as:

  • Chatbot development for mental health support.
  • Sentiment analysis to gauge mental health trends.
  • Academic research on mental health patterns.
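A multinomial naive Bayes classifier is a common first baseline for text classification before trying larger models. This from-scratch sketch is illustrative, not the author's method. It assumes the statements and labels have already been read from the "Statement" and "Mental Health Status" columns into two parallel lists, and the class and function names are hypothetical. Its tokenizer keeps only ASCII letters and apostrophes, so it suits English text. In practice, a library implementation such as scikit-learn's MultinomialNB would be the usual choice.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase and split into runs of ASCII letters and apostrophes."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior of the class
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Calling `NaiveBayes().fit(statements, statuses).predict(text)` returns the status with the highest log-posterior score. Add-one smoothing keeps a word that never co-occurred with a status from ruling that status out entirely.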

Acknowledgments:

This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.
