100+ datasets found
  1. Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated May 3, 2025
    Cite
    abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset
    Available download formats: zip (9105036 bytes)
    Dataset updated
    May 3, 2025
    Authors
    abdelmalek eladjelet
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

    📌 Description

    This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

    • 0 — Negative
    • 1 — Neutral
    • 2 — Positive

    The data has been gathered from multiple websites, such as:

    • Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
    • Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
    • Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
    • Kaggle: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

    The goal is to enable training and evaluation of multi-class sentiment analysis models on real-world text data. The dataset is already preprocessed (lowercased and cleaned of punctuation, URLs, numbers, and stopwords) and is ready for NLP pipelines.
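
    The preprocessing steps named above can be sketched roughly as follows. This is only an illustration, not the author's actual pipeline: the stopword list, the regular expressions, and the function name `preprocess` are assumptions, since the listing does not say how the cleaning was done.

```python
import re

# Hypothetical stopword list for illustration; the dataset's actual list is not stated.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# After lowercasing, anything that is not an ASCII letter or whitespace is dropped,
# which removes punctuation and digits (and, as a side effect, accented letters).
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Label codes as given in the dataset description.
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}


def preprocess(comment: str) -> str:
    """Lowercase, strip URLs, punctuation and numbers, then drop stopwords."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


cleaned = preprocess("Apple Pay is SO convenient, secure & easy to use!! https://example.com 123")
print(cleaned, "->", LABELS[2])
```

    Because the comments are described as already cleaned, a step like this would mainly matter when new, raw text has to be brought into the same form before inference.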

    📊 Columns

    Column      Description
    Comment     User-generated text content
    Sentiment   Sentiment label (0=Negative, 1=Neutral, 2=Positive)

    🚀 Use Cases

    • 🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa
    • 🔍 Evaluate preprocessing and tokenization strategies
    • 📈 Benchmark NLP models on multi-class classification tasks
    • 🎓 Educational projects and research in opinion mining or text classification
    • 🧪 Fine-tune transformer models on a large and diverse sentiment dataset

    💬 Example

    Comment: "apple pay is so convenient secure and easy to use"
    Sentiment: 2 (Positive)
    
  2. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Cite
    Julie R. Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Available download formats: csv
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Julie R. Campos Arias (repository creator)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. It stores the datasets used in some of the studies that served as research material for the thesis, as well as the datasets used in its experimental part.

    The datasets are listed below, along with their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset contains the ratings and reviews of 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, each review consists of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
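
    Since the rows are ordered by class, a shuffle along the following lines may be useful before splitting into training and test sets. It is a minimal sketch: the review texts are made-up stand-ins for the real file, and the helper name `shuffled` and the seed value are assumptions.

```python
import random

# Tiny stand-in for data_rt.csv: negatives (0) first, positives (1) last,
# mirroring the ordering described above. The texts are invented.
rows = [("dull and lifeless", 0)] * 3 + [("a charming film", 1)] * 3


def shuffled(rows, seed=42):
    """Return a shuffled copy; a fixed seed keeps the order reproducible."""
    out = list(rows)
    random.Random(seed).shuffle(out)
    return out


mixed = shuffled(rows)
print([label for _, label in mixed])
```

    Shuffling a copy rather than the original list keeps the source order available, which can help when results need to be traced back to row numbers in the file.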

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
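
    How a polarity score might be turned into a categorical division can be sketched as below. The listing does not give the cut-offs or the label names, so the zero threshold and the three labels here are assumptions for illustration only; the sketch takes a precomputed score in the range -1.0 to 1.0 (the range TextBlob uses for polarity) rather than calling TextBlob itself.

```python
def division_from_polarity(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    The threshold and the label names are hypothetical, not taken from the dataset.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


for score in (0.6, 0.0, -0.35):
    print(score, division_from_polarity(score))
```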

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review as Unix time), reviewTime (raw time of the review), and division (manually added; categorical label generated using the overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  3. turkish-sentiment-analysis-dataset

    • huggingface.co
    • kaggle.com
    Updated Jun 22, 2022
    Cite
    Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 22, 2022
    Authors
    Batuhan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset contains positive, negative, and neutral ("notr") sentences from several data sources given in the references. Most sentiment models have only two labels, positive and negative. However, user input can be an entirely neutral sentence, and I could find no data for such cases. Therefore I created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from a Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.

  4. Sentiment Analysis Dataset

    • cubig.ai
    zip
    Updated May 20, 2025
    Cite
    CUBIG (2025). Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/270/sentiment-analysis-dataset
    Available download formats: zip
    Dataset updated
    May 20, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Sentiment Analysis Dataset is a dataset for sentiment analysis. It contains large-scale tweet text collected from Twitter and a sentiment polarity label (0=negative, 2=neutral, 4=positive) for each tweet, with labels assigned automatically based on emoticons.

    2) Data Utilization
    (1) Characteristics of the Sentiment Analysis Dataset:
    • Each sample consists of six columns: sentiment polarity, tweet ID, date of writing, search word, author, and tweet body. The dataset is suitable for training natural language processing and classification models using the tweet text and sentiment labels.
    (2) Uses of the Sentiment Analysis Dataset:
    • Sentiment classification model development: Using the tweet text and sentiment polarity labels, you can build automatic positive, negative, and neutral classifiers with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM.
    • Analysis of SNS public opinion and trends: By analyzing the distribution of sentiment over time and by keyword, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key sentiment keywords.
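
    Reading such six-column rows and mapping the numeric polarity to a name could look like the sketch below. The column names in `FIELDS`, the helper `read_tweets`, and the two sample rows are assumptions made for illustration; the listing names the six fields but not the file's actual headers or layout.

```python
import csv
import io

# Hypothetical column names, in the order the six fields are listed above.
FIELDS = ["polarity", "tweet_id", "date", "query", "author", "text"]
# Polarity codes as stated in the description.
POLARITY = {0: "negative", 2: "neutral", 4: "positive"}

# Two made-up rows standing in for the real file.
sample = io.StringIO(
    '"0","1","2009-04-06","none","user_a","missed the bus again"\n'
    '"4","2","2009-04-06","none","user_b","what a lovely morning"\n'
)


def read_tweets(handle):
    """Yield (text, label) pairs, with the numeric polarity mapped to a name."""
    for row in csv.DictReader(handle, fieldnames=FIELDS):
        yield row["text"], POLARITY[int(row["polarity"])]


pairs = list(read_tweets(sample))
print(pairs)
```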

  5. A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

    • live.european-language-grid.eu
    • data-staging.niaid.nih.gov
    • +2 more
    tsv
    Updated Dec 13, 2021
    Cite
    (2021). A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7634
    Available download formats: tsv
    Dataset updated
    Dec 13, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is an increasing demand for sentiment analysis of text from social media, which is mostly code-mixed. Systems trained on monolingual data fail on code-mixed data because of the complexity of mixing at different levels of the text. However, very few resources are available for building models specific to code-mixed data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Only a few datasets are available, and only for popular language pairs such as English-Spanish, English-Hindi, and English-Chinese. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed Malayalam-English text, annotated by voluntary annotators. The corpus obtained a Krippendorff’s alpha above 0.8. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.
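
    For readers unfamiliar with the agreement measure mentioned above, Krippendorff's alpha for nominal labels can be computed as in the sketch below. This is a generic textbook formulation (alpha = 1 - observed disagreement / expected disagreement, built from a coincidence matrix), not the authors' code; the description does not say which variant they used, and the sample annotations are invented.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists, each holding the labels that annotators gave to one item.
    Items with fewer than two labels carry no agreement information and are skipped.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels from different annotators, weighted by 1/(m-1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - (n - 1) * observed / expected


# Invented annotations: four items, two annotators each.
annotations = [["pos", "pos"], ["neg", "neg"], ["pos", "neg"], ["neu", "neu"]]
alpha = krippendorff_alpha_nominal(annotations)
print(round(alpha, 3))
```

    Values close to 1 indicate near-perfect agreement, and a value of 0 corresponds to agreement no better than chance.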

  6. Text Analysis

    • kaggle.com
    zip
    Updated Apr 13, 2023
    Cite
    Vivek603 (2023). Text Analysis [Dataset]. https://www.kaggle.com/datasets/vivek603/text-analysis
    Available download formats: zip (174251 bytes)
    Dataset updated
    Apr 13, 2023
    Authors
    Vivek603
    Description

    Title: Text-Analysis Dataset with Stopwords, Positive Words, and Negative Words

    Description: This dataset is designed for text analysis tasks and contains three types of words: stopwords, positive words, and negative words. Stopwords are common words that are typically removed from text during preprocessing because they don't carry much meaning, such as "the," "and," "a," etc. Positive words are words that convey a positive sentiment, while negative words are words that convey a negative sentiment.

    The stopwords were obtained from a standard list used in natural language processing, while the positive and negative words were obtained from publicly available sentiment lexicons.

    Each word is provided as a separate entry in the dataset.

    The dataset is provided in CSV format and is suitable for use in various text analysis tasks, such as sentiment analysis, text classification, and natural language processing.

    Columns: All the csvs contain a single column having the specified set of words.

    Example (positive-words.txt): a+, abound, abounds, abundance, abundant, accessable, accessible, acclaim, acclaimed, acclamation, accolade, accolades, accommodative, and so on.

    This dataset can be used to build models that can automatically classify text as positive or negative, or to identify which words are likely to carry more meaning in a given text.
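
    A simple lexicon-based classifier built on word lists like these might look as follows. It is a sketch under assumptions: the positive words are taken from the example above, while the negative words and stopwords are hypothetical placeholders for the dataset's files, and returning "neutral" on a tie is a design choice of this sketch rather than something the dataset prescribes.

```python
# In practice the three sets would be read from the dataset's single-column files.
POSITIVE = {"abundant", "accessible", "acclaim", "acclaimed", "accolade"}
NEGATIVE = {"awful", "broken", "poor"}   # hypothetical placeholder entries
STOPWORDS = {"the", "and", "a"}          # the examples named in the description


def lexicon_label(text: str) -> str:
    """Label text by counting positive and negative lexicon hits."""
    tokens = [tok.strip(".,!?\"'") for tok in text.lower().split()]
    tokens = [tok for tok in tokens if tok and tok not in STOPWORDS]
    score = sum(tok in POSITIVE for tok in tokens) - sum(tok in NEGATIVE for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("An acclaimed and accessible guide."))
print(lexicon_label("The screen arrived broken."))
```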

  7. The Bible and The Quran: Sentiment Analysis.

    • kaggle.com
    zip
    Updated Aug 1, 2024
    Cite
    Patrick L Ford (2024). The Bible and The Quran: Sentiment Analysis. [Dataset]. https://www.kaggle.com/datasets/patricklford/bible-and-quran-sentiment-analysis
    Available download formats: zip (1713644 bytes)
    Dataset updated
    Aug 1, 2024
    Authors
    Patrick L Ford
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    Text analysis, also known as text mining or natural language processing (NLP), is a branch of computer science and artificial intelligence that involves the extraction of useful information and knowledge from unstructured text data. It encompasses a wide range of techniques and applications, from sentiment analysis and topic modelling to information retrieval and machine translation.

    Core Concepts in Text Analysis

    Before delving into specific tools and techniques, it's essential to understand some fundamental concepts:
    - Tokenization: The process of breaking down text into individual words or tokens.
    - Stop word removal: Eliminating common words (like "the," "and," "of") that often carry little semantic value.
    - Stemming and Lemmatization: Reducing words to their root form to improve analysis accuracy.
    - Part-of-speech tagging: Identifying the grammatical role of words (noun, verb, adjective, etc.).
    - Named entity recognition (NER): Recognising and classifying named entities (people, organisations, locations, etc.).
    - Sentiment analysis (Bing and NRC): Sentiment analysis aims to determine the emotional tone behind a piece of text. It's widely used in social media monitoring, customer feedback analysis, and market research.
    - Bing sentiment analysis: In text-mining toolkits, the "Bing" lexicon usually refers to the opinion lexicon compiled by Bing Liu and collaborators, which labels words as positive or negative, rather than to a Microsoft Bing service. It's relatively easy to use, but it is not as granular as emotion lexicons.
    - NRC sentiment analysis: The NRC lexicon, from the National Research Council Canada, is a widely used resource for sentiment analysis. It associates words with emotions (such as anger, fear, joy, sadness, surprise, and disgust), allowing for more nuanced analysis. It's often used as a baseline for comparison with other sentiment analysis methods.
    - Word counts: A simple but informative metric that measures the frequency of words in a text. It can be used to identify keywords, identify the most common topics, and compare texts.
    - Lexical diversity: This metric measures the variety of words used in a text. It can help assess the complexity and richness of language. Common measures include type-token ratio (TTR) and lexical density.
    - Word clouds (also known as tag clouds): Word clouds are visual representations of text data where the size of each word corresponds to its frequency or importance. They are useful for quickly identifying prominent terms and themes in a text.
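
    Several of the concepts above (tokenization, stop word removal, word counts, and the type-token ratio) can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not the tooling used for the dataset: the stop word set holds only the three examples named above, the tokenizer is a simple regular expression, and the sample sentence is invented.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "and", "of"}  # the three examples named above; real lists are longer


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct tokens divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


text = "The cat sat on the mat and the dog sat on the rug."
tokens = tokenize(text)
content = [tok for tok in tokens if tok not in STOP_WORDS]
counts = Counter(content)

print(counts.most_common(2))
print(round(type_token_ratio(tokens), 3))
```

    The word frequencies in `counts` are also the kind of input a word cloud is drawn from, with each word sized by its count.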

    Advanced Text Analysis Techniques

    Beyond the basics, text analysis offers a wealth of sophisticated techniques:
    - Topic modelling: Uncovers hidden thematic structures within a large collection of documents.
    - Text classification: Categorises text into predefined categories (e.g., spam/not spam, news/sports).
    - Named entity recognition (NER): Identifies and classifies named entities (people, organisations, locations, etc.).
    - Relationship extraction: Discovers relationships between entities in text (e.g., "Apple acquired Beats").
    - Machine translation: Translates text from one language to another.

    Applications of Text Analysis

    Text analysis has a wide range of applications across various industries:
    - Social media monitoring: Analysing public sentiment, identifying trends, and tracking brand reputation.
    - Customer service: Analysing customer feedback to improve products and services.
    - Market research: Understanding customer preferences and market trends.
    - Healthcare: Extracting information from medical records, literature, and patient reviews.
    - Legal: Analysing legal documents for information extraction and discovery.

    Challenges and Considerations

    Text analysis is not without its challenges:
    - Ambiguity: Natural language is inherently ambiguous, making it difficult for computers to interpret meaning accurately.
    - Data quality: The quality of the text data can significantly impact the results of analysis.
    - Computational resources: Some text analysis techniques, especially those involving deep learning, require significant computational power.

    By understanding the core concepts and techniques of text analysis, we can harness the power of text data to extract valuable insights and drive decision-making.

    Visualisations: Old_Testament_KJ_Bible.csv and Quran_english.csv

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2F30ac54122eb49d5e712b5133316e9654%2FScreenshot%202024-08-04%2015.46.50.png?generation=1722784916593731&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F13231939%2F0701222f04c944c7b8064d20e9a6278b%2FScreenshot%202024-08-04%2015.47.45.png?generation=1722784617149634&alt=media

    ![](https://www.googleapis.com/downloa...

  8. Image and text datasets for sentiment analysis

    • figshare.com
    zip
    Updated Jun 4, 2025
    Cite
    Chuang Dong (2025). Image and text datasets for sentiment analysis [Dataset]. http://doi.org/10.6084/m9.figshare.29234471.v1
    Available download formats: zip
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Chuang Dong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is an image and text dataset for sentiment analysis.

  9. Multimodal Sentiment Analysis Dataset

    • gts.ai
    json
    Updated Jun 28, 2024
    + more versions
    Cite
    GTS (2024). Multimodal Sentiment Analysis Dataset [Dataset]. https://gts.ai/dataset-download/multimodal-sentiment-analysis-dataset/
    Available download formats: json
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore our unique Multimodal Sentiment Analysis Dataset, featuring high-quality images and corresponding text descriptions with sentiment labels.

  10. Sentiment Analysis Dataset(3labels)

    • kaggle.com
    zip
    Updated Apr 15, 2023
    Cite
    AmiteshPatel_ (2023). Sentiment Analysis Dataset(3labels) [Dataset]. https://www.kaggle.com/datasets/amiteshpatel16/sentiment-analysis-dataset3labels
    Available download formats: zip (60221965 bytes)
    Dataset updated
    Apr 15, 2023
    Authors
    AmiteshPatel_
    Description

    What data does this file contain? The file has two columns, text and target. The first column holds the text, which can be neutral, positive, or negative. The second column holds the sentiment of that text as a label: 0 for neutral, 1 for positive, and 2 for negative.
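
    Note that this encoding differs from the first listing on this page, where 0 is negative, 1 is neutral, and 2 is positive. When combining the two sources, translating the integer codes to names first avoids silently mixing labels. The sketch below shows one way to do that; the scheme keys and the helper name `to_name` are hypothetical.

```python
# Integer codes as stated in the two dataset descriptions; the keys are made-up names.
SCHEMES = {
    "three_labels": {0: "neutral", 1: "positive", 2: "negative"},  # this dataset
    "multi_class": {0: "negative", 1: "neutral", 2: "positive"},   # first listing above
}


def to_name(scheme: str, code: int) -> str:
    """Translate a dataset-specific integer label into a shared label name."""
    return SCHEMES[scheme][code]


# The same integer means different things in the two sources.
print(to_name("three_labels", 2), to_name("multi_class", 2))
```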

  11. sentiment-analysis-dataset

    • huggingface.co
    Updated Apr 1, 2025
    Cite
    krusty crab (2025). sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/krusty99/sentiment-analysis-dataset
    Dataset updated
    Apr 1, 2025
    Authors
    krusty crab
    Description

    license: mit
    task_categories: text-classification
    language: en
    tags: finance
    pretty_name: sentiment-analysis-dataset
    size_categories: n<1K

    Dataset Card for Sentiment Analysis Dataset

    This dataset card aims to provide a comprehensive overview of a sentiment analysis dataset containing product reviews labeled with sentiment.

    Dataset Details

    Dataset Description

    This dataset contains 1,000 product reviews categorized into two sentiment… See the full description on the dataset page: https://huggingface.co/datasets/krusty99/sentiment-analysis-dataset.

  12. Twitter Sentiment Analysis Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Cite
    Bright Data, Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis
    Available download formats: .json, .csv, .xlsx
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

    Key Features:

    • Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
    • Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
    • Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
    • Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
    • Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
    • Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.

    Use Cases:

    • Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
    • Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
    • Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
    • AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
    • Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.

    Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
    
  13. Depression Detection using Sentiment Analysis

    • kaggle.com
    zip
    Updated Jan 5, 2025
    Cite
    SZE GEE LIM (2025). Depression Detection using Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/szegeelim/mental-health
    Available download formats: zip (11587194 bytes)
    Dataset updated
    Jan 5, 2025
    Authors
    SZE GEE LIM
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset is a collection of mental health statuses, gathered from real statements shared by individuals. By bringing together and refining data from various sources, it provides a rich and reliable foundation for developing chatbots and exploring sentiment analysis. The goal is to offer meaningful insights and create tools that can better understand and respond to mental health needs.

    The dataset brings together information from the following Kaggle datasets:
    • 3k Conversations Dataset for Chatbot
    • Depression Reddit Cleaned
    • Human Stress Prediction
    • Predicting Anxiety in Mental Health Data
    • Mental Health Dataset Bipolar
    • Reddit Mental Health Data
    • Students Anxiety and Depression Dataset
    • Suicidal Mental Health Dataset
    • Suicidal Tweet Detection Dataset

    Dataset Overview:

    The data is collected from various platforms, including social media, Reddit, Twitter, and others. Each entry is labeled with a specific mental health status. The dataset contains statements categorized under one of the following seven mental health statuses:
    • Normal
    • Depression
    • Suicidal
    • Anxiety
    • Stress
    • Bipolar
    • Personality Disorder

    Features:

    1. A unique identifier for each entry
    2. Statement: The textual data or post
    3. Status: The labeled mental health status of each statement

    Usage:

    The dataset's structure and features make it highly versatile for various applications, particularly in the fields of mental health, artificial intelligence, and data analysis. Here are some key use cases:
    1. Mental health chatbot development
    2. Sentiment and emotion analysis
    3. Early detection and intervention
    4. Academic and clinical research
    5. Social media monitoring
    6. NLP model training

    Acknowledgments:

    This dataset is created by gathering and refining data from publicly available Kaggle datasets. Special thanks to the original dataset creators for their valuable contributions.

  14. synthetic-sentiment-analysis-dataset-v1

    • huggingface.co
    Updated Nov 27, 2025
    Cite
    Tanaos (2025). synthetic-sentiment-analysis-dataset-v1 [Dataset]. https://huggingface.co/datasets/tanaos/synthetic-sentiment-analysis-dataset-v1
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Tanaos
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📊🙂😐😞 Tanaos Sentiment Analysis Training Dataset

    This dataset was created synthetically by Tanaos with the Artifex Python library. The dataset is designed to train and evaluate sentiment analysis systems — models that classify the sentiment expressed in text as one of five possible categories: very_negative, negative, neutral, positive or very_positive. It can be used to build sentiment analysis models for various applications, such as customer feedback analysis, social… See the full description on the dataset page: https://huggingface.co/datasets/tanaos/synthetic-sentiment-analysis-dataset-v1.

  15. Arabic Sentiment Datasets

    • data.mendeley.com
    Updated Sep 4, 2025
    Cite
    Tamara Alqablan (2025). Arabic Sentiment Datasets [Dataset]. http://doi.org/10.17632/6w9g62xc67.2
    Dataset updated
    Sep 4, 2025
    Authors
    Tamara Alqablan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is specifically designed for sentiment analysis (SA) in the Arabic language, serving as a crucial resource for developing and evaluating various SA models. The dataset contains [briefly describe the content, e.g., number of entries, types of sentiments (positive, negative, neutral), sources of the data like social media, reviews, etc.]. It has been curated to meet the unique linguistic characteristics of Arabic text, facilitating the training, validation, and benchmarking of machine learning and natural language processing models. While there are several sentiment analysis datasets available in multiple languages, this dataset focuses on Arabic, supporting research aimed at understanding sentiment in Arabic-speaking communities. To ensure the effectiveness of feature selection approaches in sentiment analysis, the dataset can be used alongside well-known datasets such as those available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/), which provides a range of datasets commonly employed for evaluating feature selection techniques.

    The dataset aligns with previous work such as Al-Moslmi et al.'s construction of an Arabic sentiment lexicon for public use, which contributed significantly to Arabic sentiment analysis resources [1]. Additionally, this dataset draws inspiration from established Arabic corpora such as the Opinion Corpus for Arabic (OCA) by Rushdi-Saleh et al. [2], and Ar-Twitter, a corpus designed for sentiment analysis on Arabic tweets, as demonstrated by Abdulla et al. [3].

    References:
    [1] Al-Moslmi, T., Albared, M., Al-Shabi, A., Omar, N., Abdullah, S.: Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis. Journal of Information Science, 44(3), 345–362 (2018).
    [2] Rushdi-Saleh, M., Martín-Valdivia, M.T., Ureña-López, L.A., Perea-Ortega, J.M.: OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology, 62(10), 2045–2054 (2011).
    [3] Abdulla, N., Mahyoub, N., Shehab, M., Al-Ayyoub, M.: Arabic sentiment analysis: Corpus-based and lexicon-based. In: Proceedings of The IEEE Conference on Applied Electrical Engineering and Computing Technologies (AEECT) (2013).

  16. Twitter dataset

    • figshare.com
    csv
    Updated Feb 11, 2025
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.

  17. twitter-sentiment-analysis

    • huggingface.co
    Updated Oct 30, 2025
    Md. Abdullah Al Mamun (2025). twitter-sentiment-analysis [Dataset]. https://huggingface.co/datasets/bdstar/twitter-sentiment-analysis
    Explore at:
    Dataset updated
    Oct 30, 2025
    Authors
    Md. Abdullah Al Mamun
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    🐦 Twitter Sentiment Analysis (bdstar/twitter-sentiment-analysis)

    🧠 Overview

    A refined and merged version of Twitter text sentiment datasets, providing a clean and well-balanced dataset for sentiment classification across three sentiment categories: positive, negative, and neutral. This dataset is split into three parts (train, test, and validation), each sourced from highly reputable open datasets. It is designed for training, evaluating, and benchmarking NLP models for… See the full description on the dataset page: https://huggingface.co/datasets/bdstar/twitter-sentiment-analysis.

  18. CaSSA-catalan-structured-sentiment-analysis

    • huggingface.co
    Updated May 8, 2024
    Projecte Aina (2024). CaSSA-catalan-structured-sentiment-analysis [Dataset]. http://doi.org/10.57967/hf/1714
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    May 8, 2024
    Dataset authored and provided by
    Projecte Aina
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset Card for CaSSA, the Catalan Structured Sentiment Analysis dataset

    Dataset Summary

    The CaSSA dataset is a corpus of 6,400 reviews and forum messages annotated with polar expressions. Each piece of text is annotated with all the expressions of polarity that it contains. For each polar expression, we annotated the expression itself, the target (the object of the expression), and the source (the subject expressing the sentiment). 25,453 polar expressions have been… See the full description on the dataset page: https://huggingface.co/datasets/projecte-aina/CaSSA-catalan-structured-sentiment-analysis.

  19. A sentence-level sentiment analysis of some literary texts in English...

    • figshare.com
    txt
    Updated Sep 24, 2024
    Ricardo Hernández-Pérez; Pablo A. Lara-Martínez; Bibiana Obregon-Quintana; Larry S. Liebovitch; Lev Guzmanv (2024). A sentence-level sentiment analysis of some literary texts in English language [Dataset]. http://doi.org/10.6084/m9.figshare.27092008.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Ricardo Hernández-Pérez; Pablo A. Lara-Martínez; Bibiana Obregon-Quintana; Larry S. Liebovitch; Lev Guzmanv
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We perform a sentence-level sentiment analysis study of different literary texts in the English language. Each text is converted into a series in which each data point is the sentiment value of one sentence. By applying the Detrended Fluctuation Analysis (DFA) and the Higuchi Fractal Dimension (HFD) methods to these sentiment series, we find that they are monofractal with long-term correlations, which can be explained by the fact that the writing process has memory by construction, with a sentiment evolution that is self-similar. Furthermore, we discretize these series by applying a classification approach that transforms each series into one in which each data point has only three possible values, corresponding to positive, neutral, or negative sentiment. We map these three-state series to a Markov chain and investigate the transitions of sentiment from one sentence to the next, obtaining a state transition matrix for each book that gives the probability of moving between sentiments from one sentence to the next. This approach shows that there are biases towards increasing the probability of switching to neutral or positive sentences. The two approaches supplement each other: the long-term correlation approach allows a global assessment of the sentiment of the book, while the state transition matrix approach provides local information about the sentiment evolution along the text.
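The Markov-chain step described above can be illustrated with a minimal sketch. It is not the authors' code: the `threshold` used for discretization, the function names, and the sample scores are all assumptions chosen for illustration, and the description does not state how the authors classify sentences.

```python
from collections import Counter

STATES = ("negative", "neutral", "positive")


def discretize(scores, threshold=0.05):
    """Map sentence-level scores to three states.

    The symmetric threshold is a hypothetical choice, not taken from the source.
    """
    labels = []
    for score in scores:
        if score > threshold:
            labels.append("positive")
        elif score < -threshold:
            labels.append("negative")
        else:
            labels.append("neutral")
    return labels


def transition_matrix(labels):
    """Estimate first-order transition probabilities between consecutive sentences."""
    counts = Counter(zip(labels, labels[1:]))
    matrix = {}
    for src in STATES:
        row_total = sum(counts[(src, dst)] for dst in STATES)
        # Rows with no observed outgoing transitions are left as all zeros.
        matrix[src] = {
            dst: (counts[(src, dst)] / row_total if row_total else 0.0)
            for dst in STATES
        }
    return matrix


# Made-up sentiment scores for eight consecutive sentences.
scores = [0.4, 0.0, -0.3, 0.1, 0.2, 0.0, -0.6, 0.5]
labels = discretize(scores)
P = transition_matrix(labels)

for src in STATES:
    print(src, {dst: round(p, 3) for dst, p in P[src].items()})
```

Each row of `P` is the estimated distribution of the next sentence's sentiment given the current one, so a row with observed transitions sums to 1.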

  20. Multiple Data for Sentiment Analysis

    • kaggle.com
    zip
    Updated May 26, 2024
    dangerous AI (2024). Multiple Data for Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/dangerousai/multiple-data-for-sentiment-analysis/code
    Explore at:
    Available download formats: zip (72367351 bytes)
    Dataset updated
    May 26, 2024
    Authors
    dangerous AI
    Description

    Special Purpose

    This data set was created to enhance sentiment analysis of texts written by English beginners rather than texts from online blogs, so it may not be useful for sentiment classification of tweets or Reddit posts. There is also a somewhat larger file containing samples from tweets, which are much more diverse but less clean.

    Methodology

    Samples from LLMs

    The texts are all generated by LLMs, including GPT-3.5-turbo and ChatGLM-4, using simple prompts. The LLMs are prompted to generate new texts on the basis of previous texts and are strictly required to generate distinct sentences.

    Large Scale Data

    The large data set contains more than 1.7 million diverse and clean text-sentiment pairs. The data come from the following datasets:
    https://www.kaggle.com/datasets/saurabhshahane/twitter-sentiment-dataset
    https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
    https://www.kaggle.com/datasets/kazanova/sentiment140
    https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset

abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset

Sentiment Analysis Dataset

Dataset for text classification

Explore at:
Available download formats: zip (9105036 bytes)
Dataset updated
May 3, 2025
Authors
abdelmalek eladjelet
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

📌 Description

This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

  • 0: Negative
  • 1: Neutral
  • 2: Positive

The data has been gathered from multiple websites, such as:

  • Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
  • Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
  • Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
  • Kaggle: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed (lowercased and cleaned of punctuation, URLs, numbers, and stopwords) and is ready for NLP pipelines.

📊 Columns

Column | Description
Comment | User-generated text content
Sentiment | Sentiment label (0=Negative, 1=Neutral, 2=Positive)

🚀 Use Cases

  • 🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa
  • 🔍 Evaluate preprocessing and tokenization strategies
  • 📈 Benchmark NLP models on multi-class classification tasks
  • 🎓 Educational projects and research in opinion mining or text classification
  • 🧪 Fine-tune transformer models on a large and diverse sentiment dataset

💬 Example

Comment: "apple pay is so convenient secure and easy to use"
Sentiment: 2 (Positive)
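
As a rough sketch of how the two columns might be read, the snippet below parses CSV text with a `Comment` and a `Sentiment` column and maps the integer labels to names. The file layout inside the zip is not stated on this page, so the CSV shape is an assumption; the in-memory sample holds the example row above plus two hypothetical rows added only for illustration.

```python
import csv
import io
from collections import Counter

# Label scheme as documented in the description above.
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}

# In-memory stand-in for the real file. The first row is the example quoted
# above; the other two rows are hypothetical and not taken from the dataset.
SAMPLE_CSV = """Comment,Sentiment
apple pay is so convenient secure and easy to use,2
package arrived tuesday,1
battery died two days,0
"""


def load_rows(handle):
    """Read (comment, label) pairs, converting the label to an int."""
    reader = csv.DictReader(handle)
    return [(row["Comment"], int(row["Sentiment"])) for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
distribution = Counter(LABEL_NAMES[label] for _, label in rows)

for comment, label in rows:
    print(f"{LABEL_NAMES[label]:<8} {comment}")
print(dict(distribution))
```

For the real download, `io.StringIO(SAMPLE_CSV)` would be replaced with an opened file handle, assuming the extracted file is a CSV with these column names.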