100+ datasets found
  1. Datasets for Sentiment Analysis

    • zenodo.org
    csv
    Updated Dec 10, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
    Explore at:
    csvAvailable download formats
    Dataset updated
    Dec 10, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

    Below are the datasets specified, along with the details of their references, authors, and download sources.

    ----------- STS-Gold Dataset ----------------

    The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

    Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    File name: sts_gold_tweet.csv

    ----------- Amazon Sales Dataset ----------------

    This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

    Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

    Features:

    • product_id - Product ID
    • product_name - Name of the Product
    • category - Category of the Product
    • discounted_price - Discounted Price of the Product
    • actual_price - Actual Price of the Product
    • discount_percentage - Percentage of Discount for the Product
    • rating - Rating of the Product
    • rating_count - Number of people who voted for the Amazon rating
    • about_product - Description about the Product
    • user_id - ID of the user who wrote review for the Product
    • user_name - Name of the user who wrote review for the Product
    • review_id - ID of the user review
    • review_title - Short review
    • review_content - Long review
    • img_link - Image Link of the Product
    • product_link - Official Website Link of the Product

    License: CC BY-NC-SA 4.0

    File name: amazon.csv

    ----------- Rotten Tomatoes Reviews Dataset ----------------

    This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

    This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

    Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

    File name: data_rt.csv

    ----------- Preprocessed Dataset Sentiment Analysis ----------------

    Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
    Stemmed and lemmatized using nltk.
    Sentiment labels are generated using TextBlob polarity scores.

    The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

    DOI: 10.34740/kaggle/dsv/3877817

    Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

    This dataset was used in the experimental phase of my research.

    File name: EcoPreprocessed.csv

    ----------- Amazon Earphones Reviews ----------------

    This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

    License: U.S. Government Works

    Source: www.amazon.in

    File name (original): AllProductReviews.csv (contains 14337 reviews)

    File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

    ----------- Amazon Musical Instruments Reviews ----------------

    This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

    This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

    The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

    Source: http://jmcauley.ucsd.edu/data/amazon/

    File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

    File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

  2. Sentiment Analysis for Mental Health

    • kaggle.com
    zip
    Updated Jul 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Suchintika Sarkar (2024). Sentiment Analysis for Mental Health [Dataset]. https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health
    Explore at:
    zip(11587194 bytes)Available download formats
    Dataset updated
    Jul 5, 2024
    Authors
    Suchintika Sarkar
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This comprehensive dataset is a meticulously curated collection of mental health statuses tagged from various statements. The dataset amalgamates raw data from multiple sources, cleaned and compiled to create a robust resource for developing chatbots and performing sentiment analysis.

    Data Source:

    The dataset integrates information from the following Kaggle datasets:

    Data Overview:

    The dataset consists of statements tagged with one of the following seven mental health statuses: - Normal - Depression - Suicidal - Anxiety - Stress - Bi-Polar - Personality Disorder

    Data Collection:

    The data is sourced from diverse platforms including social media posts, Reddit posts, Twitter posts, and more. Each entry is tagged with a specific mental health status, making it an invaluable asset for:

    • Developing intelligent mental health chatbots.
    • Performing in-depth sentiment analysis.
    • Research and studies related to mental health trends.

    Features:

    • unique_id: A unique identifier for each entry.
    • Statement: The textual data or post.
    • Mental Health Status: The tagged mental health status of the statement.

    Usage:

    This dataset is ideal for training machine learning models aimed at understanding and predicting mental health conditions based on textual data. It can be used in various applications such as:

    • Chatbot development for mental health support.
    • Sentiment analysis to gauge mental health trends.
    • Academic research on mental health patterns.

    Acknowledgments:

    This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.

  3. Data for Sentiment Analysis

    • kaggle.com
    zip
    Updated Oct 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Arun Jangir (2023). Data for Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/arunjangir245/data-for-sentiment-analysis
    Explore at:
    zip(84855679 bytes)Available download formats
    Dataset updated
    Oct 11, 2023
    Authors
    Arun Jangir
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description
    1. Sentiment: Numeric sentiment label (0 for negative, 2 for neutral, 4 for positive).
    2. ID: Unique tweet identifier (e.g., 2087).
    3. Date: Date and time of the tweet (e.g., Sat May 16 23:58:44 UTC 2009).
    4. Query: Search term or "NO_QUERY" if not applicable.
    5. User: Twitter username (e.g., robotickilldozr).
    6. Text: The tweet content (e.g., "Lyx is cool").

    These fields contain sentiment analysis data, tweet details, and content.

  4. h

    news-sentiment-data

    • huggingface.co
    Updated Jul 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    amitk17 (2024). news-sentiment-data [Dataset]. https://huggingface.co/datasets/sweatSmile/news-sentiment-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 8, 2024
    Authors
    amitk17
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    sweatSmile/news-sentiment-data dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. h

    sentiment-analysis-for-mental-health-Combined-Data

    • huggingface.co
    Updated Oct 2, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ahmed Soliman (2024). sentiment-analysis-for-mental-health-Combined-Data [Dataset]. https://huggingface.co/datasets/AhmedSSoliman/sentiment-analysis-for-mental-health-Combined-Data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 2, 2024
    Authors
    Ahmed Soliman
    Description

    AhmedSSoliman/sentiment-analysis-for-mental-health-Combined-Data dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. c

    Sentiment Analysis Dataset

    • cubig.ai
    zip
    Updated May 20, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/270/sentiment-analysis-dataset
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 20, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Sentiment Analysis Dataset is a dataset for emotional analysis, including large-scale tweet text collected from Twitter and emotional polarity (0=negative, 2=neutral, 4=positive) labels for each tweet, featuring automatic labeling based on emoticons.

    2) Data Utilization (1) Sentiment Analysis Dataset has characteristics that: • Each sample consists of six columns: emotional polarity, tweet ID, date of writing, search word, author, and tweet body, and is suitable for training natural language processing and classification models using tweet text and emotion labels. (2) Sentiment Analysis Dataset can be used to: • Emotional Classification Model Development: Using tweet text and emotional polarity labels, we can build positive, negative, and neutral emotional automatic classification models with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM. • Analysis of SNS public opinion and trends: By analyzing the distribution of emotions by time series and keywords, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key emotional keywords.

  7. Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated May 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset
    Explore at:
    zip(9105036 bytes)Available download formats
    Dataset updated
    May 3, 2025
    Authors
    abdelmalek eladjelet
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

    📌 Description

    This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

    • 0 — Negative
    • 1 — Neutral
    • 2 — Positive

    The Data has been gathered from multiple websites such as : Hugginface : https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset Kaggle : https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
    https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

    The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed — lowercase, cleaned from punctuation, URLs, numbers, and stopwords — and is ready for NLP pipelines.

    📊 Columns

    ColumnDescription
    CommentUser-generated text content
    SentimentSentiment label (0=Negative, 1=Neutral, 2=Positive)

    🚀 Use Cases

    • 🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa
    • 🔍 Evaluate preprocessing and tokenization strategies
    • 📈 Benchmark NLP models on multi-class classification tasks
    • 🎓 Educational projects and research in opinion mining or text classification
    • 🧪 Fine-tune transformer models on a large and diverse sentiment dataset

    💬 Example

    Comment: "apple pay is so convenient secure and easy to use"
    Sentiment: 2 (Positive)
    
  8. m

    Twitter Sentiments Dataset

    • data.mendeley.com
    Updated May 14, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SHERIF HUSSEIN (2021). Twitter Sentiments Dataset [Dataset]. http://doi.org/10.17632/z9zw7nt5h2.1
    Explore at:
    Dataset updated
    May 14, 2021
    Authors
    SHERIF HUSSEIN
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.

  9. Product Review Datasets for User Sentiment Analysis

    • datarade.ai
    Updated Sep 28, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Oxylabs (2018). Product Review Datasets for User Sentiment Analysis [Dataset]. https://datarade.ai/data-products/product-review-datasets-for-user-sentiment-analysis-oxylabs
    Explore at:
    .json, .xml, .csv, .xlsAvailable download formats
    Dataset updated
    Sep 28, 2018
    Dataset authored and provided by
    Oxylabs
    Area covered
    Italy, Egypt, South Africa, Argentina, Canada, Antigua and Barbuda, Hong Kong, Sudan, Libya, Barbados
    Description

    Product Review Datasets: Uncover user sentiment

    Harness the power of Product Review Datasets to understand user sentiment and insights deeply. These datasets are designed to elevate your brand and product feature analysis, help you evaluate your competitive stance, and assess investment risks.

    Data sources:

    • Trustpilot: datasets encompassing general consumer reviews and ratings across various businesses, products, and services.

    Leave the data collection challenges to us and dive straight into market insights with clean, structured, and actionable data, including:

    • Product name;
    • Product category;
    • Number of ratings;
    • Ratings average;
    • Review title;
    • Review body;

    Choose from multiple data delivery options to suit your needs:

    1. Receive data in easy-to-read formats like spreadsheets or structured JSON files.
    2. Select your preferred data storage solutions, including SFTP, Webhooks, Google Cloud Storage, AWS S3, and Microsoft Azure Storage.
    3. Tailor data delivery frequencies, whether on-demand or per your agreed schedule.

    Why choose Oxylabs?

    1. Fresh and accurate data: Access organized, structured, and comprehensive data collected by our leading web scraping professionals.

    2. Time and resource savings: Concentrate on your core business goals while we efficiently handle the data extraction process at an affordable cost.

    3. Adaptable solutions: Share your specific data requirements, and we'll craft a customized data collection approach to meet your objectives.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA standards.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Join the ranks of satisfied customers who appreciate our meticulous attention to detail and personalized support. Experience the power of Product Review Datasets today to uncover valuable insights and enhance decision-making.

  10. Twitter Sentiment Analysis Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bright Data, Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis
    Explore at:
    .json, .csv, .xlsxAvailable download formats
    Dataset authored and provided by
    Bright Datahttps://brightdata.com/
    License

    https://brightdata.com/licensehttps://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

    Key Features:
    
      Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
      Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
      Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
      Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
      Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
      Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
    
    
    Use Cases:
    
      Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
      Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
      Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
      AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
      Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
    
    
    
      Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. 
      Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
    
  11. E

    A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

    • live.european-language-grid.eu
    • data-staging.niaid.nih.gov
    • +2more
    tsv
    Updated Dec 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7634
    Explore at:
    tsvAvailable download formats
    Dataset updated
    Dec 13, 2021
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is an increasing demand for sentiment analysis of text from social media which are mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels of the text. However, very few resources are available for code-mixed data to create models specific for this data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still performs better. Only a few datasets for popular languages such as English-Spanish, English-Hindi, and English-Chinese are available. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators. This gold standard corpus obtained a Krippendorff’s alpha above 0.8 for the dataset. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.

  12. h

    turkish-sentiment-analysis-dataset

    • huggingface.co
    • kaggle.com
    Updated Jun 22, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 22, 2022
    Authors
    Batuhan
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset contains positive , negative and notr sentences from several data sources given in the references. In the most sentiment models , there are only two labels; positive and negative. However , user input can be totally notr sentence. For such cases there were no data I could find. Therefore I created this dataset with 3 class. Positive and negative sentences are listed below. Notr examples are extraced from turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.

  13. Positive-Neutral-Negative Sentiment Analysis Data

    • kaggle.com
    zip
    Updated Oct 27, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    JAYESH CHAK (2023). Positive-Neutral-Negative Sentiment Analysis Data [Dataset]. https://www.kaggle.com/datasets/jayeshchak/positive-neutral-negative-sentiment-analysis-data
    Explore at:
    zip(64412 bytes)Available download formats
    Dataset updated
    Oct 27, 2023
    Authors
    JAYESH CHAK
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by JAYESH CHAK

    Released under Apache 2.0

    Contents

  14. E

    Data from: Facebook Data for Sentiment Analysis

    • live.european-language-grid.eu
    binary format
    Updated Jul 16, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2013). Facebook Data for Sentiment Analysis [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/1057
    Explore at:
    binary formatAvailable download formats
    Dataset updated
    Jul 16, 2013
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Corpus consisting of 10,000 Facebook posts manually annotated on sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gols-labels.txt) on corresponding lines.

  15. f

    A Labelled Dataset for Sentiment Analysis of videos on YouTube, TikTok, and...

    • figshare.com
    • data.niaid.nih.gov
    • +1more
    application/csv
    Updated Jun 24, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian (2024). A Labelled Dataset for Sentiment Analysis of videos on YouTube, TikTok, and other sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.6084/m9.figshare.26086492.v1
    Explore at:
    application/csvAvailable download formats
    Dataset updated
    Jun 24, 2024
    Dataset provided by
    figshare
    Authors
    Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset:N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A.Bian “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: https://doi.org/10.48550/arXiv.2406.07693AbstractThis dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.

  16. Z

    CryptoSentiment: A large scale sentiment dataset for cryptocurrencies

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 7, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Loukia Avramelou; Paraskevi Nousi; Nikolaos Passalis; Stavros Doropoulos; Anastasios Tefas (2023). CryptoSentiment: A large scale sentiment dataset for cryptocurrencies [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7684409
    Explore at:
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Aristotle University of Thessaloniki
    Datascouting, Thessaloniki,
    Authors
    Loukia Avramelou; Paraskevi Nousi; Nikolaos Passalis; Stavros Doropoulos; Anastasios Tefas
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CryptoSentiment is a dataset, which contains sentiment information about cryptocurrency assets, gathered by various online sources, and analyzed by FinBERT sentiment extractor. More specifically, we provide a publicly available dataset containing fine-grained sentiment analysis data (minute-basis) about cryptocurrency market collected by different online sources. CryptoSentiment dataset includes 235,907 sentiment scores for 14 different cryptocurrencies gathered from various online sources such as news articles and social media.

  17. Bitcoin Sentiment Analysis | Twitter Data

    • kaggle.com
    zip
    Updated Nov 7, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gautam Chettiar (2022). Bitcoin Sentiment Analysis | Twitter Data [Dataset]. https://www.kaggle.com/datasets/gautamchettiar/bitcoin-sentiment-analysis-twitter-data
    Explore at:
    zip(192139671 bytes)Available download formats
    Dataset updated
    Nov 7, 2022
    Authors
    Gautam Chettiar
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Twitter tweet data can be used for sentiment analysis for Bitcoin.

    1. Preprocessing on the tweet text has already been done very rudimentarily, you can omit it.
    2. The sentiment polarity score should be removed, it too acts as a classifier.
    3. The final column is the classifier.
    4. If you can use more than just the text data, that will add multi-modality to your functionality.
    5. Enough data points are provided.
  18. G

    Social Media Sentiment Dataset

    • gomask.ai
    csv, json
    Updated Nov 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GoMask.ai (2025). Social Media Sentiment Dataset [Dataset]. https://gomask.ai/marketplace/datasets/social-media-sentiment-dataset
    Explore at:
    json, csv(10 MB)Available download formats
    Dataset updated
    Nov 18, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    post_id, user_id, language, platform, comment_id, post_content, post_datetime, comment_content, comment_user_id, comment_datetime, and 4 more
    Description

    This dataset contains synthetic social media posts and their associated comments, each labeled with sentiment (positive, negative, neutral) and optional sentiment scores. It is designed for training and benchmarking machine learning and NLP models in social analytics, enabling fine-grained sentiment analysis across multiple platforms and languages.

  19. m

    The Climate Change Twitter Dataset

    • data.mendeley.com
    • kaggle.com
    Updated May 19, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. http://doi.org/10.17632/mw8yd7z9wc.2
    Explore at:
    Dataset updated
    May 19, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541

    The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.

    The following columns are in the dataset:

    ➡ created_at: The timestamp of the tweet. ➡ id: The unique id of the tweet. ➡ lng: The longitude the tweet was written. ➡ lat: The latitude the tweet was written. ➡ topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined. ➡ sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral. ➡ stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral). ➡ gender: Whether the user that made the tweet is male, female, or undefined. ➡ temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written. ➡ aggressiveness: That is if the tweet contains aggressive language or not.

    Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.

  20. c

    Hacker News Sentiment Analysis Dataset

    • cubig.ai
    zip
    Updated Jul 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CUBIG (2025). Hacker News Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/586/hacker-news-sentiment-analysis-dataset
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 14, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction • The Hacker News Sentiment Analysis Dataset is a technology community public opinion analysis data that provides an emotional analysis (polarity, subjectivity, and emotional categories) of each of the top 141 hacker news posts along with the title, URL, point, and comment count.

    2) Data Utilization (1) Hacker News Sentiment Analysis Dataset has characteristics that: • This dataset includes polar (-1-1), subjectivity (0-1), and category (positive/neutral/negative) columns that quantify the sentiment of comments using TextBlob, based on the latest top posts as of June 24, 2025. • It is generated through web scraping and NLP preprocessing, and allows for quantitative comparison of community responses to technology news. (2) Hacker News Sentiment Analysis Dataset can be used to: • Visualize technology trends Emotional: Connect emotional scores with post topics to visually analyze community response patterns to specific technology news such as AI and policies. • NLP Model Learning: Emotional classification models can be trained using comment data with real-world technical discussions or applied to research on the subjectivity prediction of comments.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Organization logo

Datasets for Sentiment Analysis

Explore at:
csvAvailable download formats
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Julie R. Repository creator - Campos Arias; Julie R. Repository creator - Campos Arias
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.

Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------

The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.

Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

File name: sts_gold_tweet.csv

----------- Amazon Sales Dataset ----------------

This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.

Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)

Features:

  • product_id - Product ID
  • product_name - Name of the Product
  • category - Category of the Product
  • discounted_price - Discounted Price of the Product
  • actual_price - Actual Price of the Product
  • discount_percentage - Percentage of Discount for the Product
  • rating - Rating of the Product
  • rating_count - Number of people who voted for the Amazon rating
  • about_product - Description about the Product
  • user_id - ID of the user who wrote review for the Product
  • user_name - Name of the user who wrote review for the Product
  • review_id - ID of the user review
  • review_title - Short review
  • review_content - Long review
  • img_link - Image Link of the Product
  • product_link - Official Website Link of the Product

License: CC BY-NC-SA 4.0

File name: amazon.csv

----------- Rotten Tomatoes Reviews Dataset ----------------

This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.

This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).

Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics

File name: data_rt.csv

----------- Preprocessed Dataset Sentiment Analysis ----------------

Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.

The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).

DOI: 10.34740/kaggle/dsv/3877817

Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }

This dataset was used in the experimental phase of my research.

File name: EcoPreprocessed.csv

----------- Amazon Earphones Reviews ----------------

This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.

This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)

License: U.S. Government Works

Source: www.amazon.in

File name (original): AllProductReviews.csv (contains 14337 reviews)

File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)

----------- Amazon Musical Instruments Reviews ----------------

This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.

This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.

The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).

Source: http://jmcauley.ucsd.edu/data/amazon/

File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)

File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)

Search
Clear search
Close search
Google apps
Main menu