100+ datasets found

Sentiment Analysis Dataset
kaggle.com
zip
Updated May 3, 2025
Cite
abdelmalek eladjelet (2025). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abdelmalekeladjelet/sentiment-analysis-dataset
Explore at:
Available download formats: zip (9105036 bytes)
Dataset updated
May 3, 2025
Authors
abdelmalek eladjelet
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
🧠 Multi-Class Sentiment Analysis Dataset (240K+ English Comments)

📌 Description

This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:

0 — Negative

1 — Neutral

2 — Positive

The data was gathered from multiple sources:
Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment

The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed — lowercase, cleaned from punctuation, URLs, numbers, and stopwords — and is ready for NLP pipelines.

📊 Columns

Column | Description
Comment | User-generated text content
Sentiment | Sentiment label (0 = Negative, 1 = Neutral, 2 = Positive)

🚀 Use Cases

🧠 Train sentiment classifiers using LSTM, BiLSTM, CNN, BERT, or RoBERTa

🔍 Evaluate preprocessing and tokenization strategies

📈 Benchmark NLP models on multi-class classification tasks

🎓 Educational projects and research in opinion mining or text classification

🧪 Fine-tune transformer models on a large and diverse sentiment dataset

💬 Example

Comment: "apple pay is so convenient secure and easy to use" Sentiment: 2 (Positive)
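As a minimal sketch of how the two columns described above might be consumed, the Python snippet below counts comments per class. It assumes the download unpacks to a CSV whose header row uses exactly the names Comment and Sentiment; the helper name label_distribution and the in-memory sample are illustrative and not part of the dataset.

```python
import csv
import io
from collections import Counter

# Label scheme as stated in the dataset description.
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}


def label_distribution(csv_file):
    """Count rows per sentiment class in a CSV with Comment/Sentiment columns."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[LABEL_NAMES[int(row["Sentiment"])]] += 1
    return counts


# Tiny in-memory stand-in for the real file (header names are an assumption).
sample = io.StringIO(
    "Comment,Sentiment\n"
    "apple pay is so convenient secure and easy to use,2\n"
)
print(label_distribution(sample))
```

Checking the class distribution first is a cheap way to see whether the three classes are balanced before training anything on them.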
turkish-sentiment-analysis-dataset
huggingface.co
kaggle.com
Updated Jun 22, 2022
Cite
Batuhan (2022). turkish-sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 22, 2022
Authors
Batuhan
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset

This dataset contains positive, negative, and neutral ("notr") sentences from several data sources given in the references. Most sentiment models have only two labels, positive and negative. However, user input can be an entirely neutral sentence, and I could find no data for such cases. I therefore created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from a Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
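Because the rows of data_rt.csv are ordered by class, a reproducible shuffle is a sensible first step. The sketch below is one way to do it with the standard library; the helper name shuffle_rows, the seed value, and the stand-in rows are assumptions for illustration, not part of the repository.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; the input list is left unchanged."""
    rng = random.Random(seed)  # local generator, so the order is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled


# Stand-in for data_rt.csv: negatives (0) first, then positives (1).
rows = [("bad film", 0)] * 3 + [("good film", 1)] * 3
mixed = shuffle_rows(rows)
print([label for _, label in mixed])
```

Shuffling before any train/test split matters here: a plain head/tail split of the unshuffled file would put a single class in each part.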
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
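The description says the division column is derived from a TextBlob polarity score but does not state the cut-offs. The sketch below shows one plausible mapping from a polarity value in [-1, 1] to a categorical label; the threshold of 0.0 and the label names are assumptions and may differ from what the dataset author used.

```python
def polarity_to_division(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    Scores above +threshold become 'positive', scores below -threshold
    become 'negative', and everything in between becomes 'neutral'.
    Both the threshold and the label names are illustrative assumptions.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


for score in (0.6, 0.0, -0.3):
    print(score, polarity_to_division(score))
```

A wider neutral band (for example threshold=0.1) would treat weakly polar reviews as neutral; which choice is right depends on the downstream task.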
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9,930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machines for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
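In both edited Amazon files, the added division column is described as a categorical label generated from the star score (ReviewStar or overall), but the exact rule is not given. The following sketch shows one common convention, assuming 1-2 stars are negative, 3 is neutral, and 4-5 are positive; these boundaries and the label names are assumptions, not the author's documented rule.

```python
def stars_to_division(stars, negative_max=2, positive_min=4):
    """Map a 1-5 star rating to a sentiment category.

    The default boundaries (<=2 negative, >=4 positive, otherwise neutral)
    are an illustrative convention, not taken from the dataset.
    """
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    if stars <= negative_max:
        return "negative"
    if stars >= positive_min:
        return "positive"
    return "neutral"


print([stars_to_division(s) for s in (1, 3, 5)])
```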
Brand Sentiment Analysis Dataset (Twitter)
kaggle.com
zip
Updated Jan 7, 2024
Cite
Tushar Paul (2024). Brand Sentiment Analysis Dataset (Twitter) [Dataset]. https://www.kaggle.com/datasets/tusharpaul2001/brand-sentiment-analysis-dataset
Explore at:
Available download formats: zip (375745 bytes)
Dataset updated
Jan 7, 2024
Authors
Tushar Paul
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset description

Users assessed tweets related to various brands and products, providing evaluations on whether the sentiment conveyed was positive, negative, or neutral. Additionally, if the tweet conveyed any sentiment, contributors identified the specific brand or product targeted by that emotion.

Train Dataset: 8589 rows x 3 columns

Test Dataset: 504 rows x 1 column
Sentiment Analysis Dataset
cubig.ai
zip
Updated May 20, 2025
Cite
CUBIG (2025). Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/270/sentiment-analysis-dataset
Explore at:
Available download formats: zip
Dataset updated
May 20, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction
• The Sentiment Analysis Dataset is a dataset for sentiment analysis. It includes large-scale tweet text collected from Twitter and a sentiment polarity label (0 = negative, 2 = neutral, 4 = positive) for each tweet, with labels assigned automatically based on emoticons.

2) Data Utilization
(1) Characteristics of the Sentiment Analysis Dataset:
• Each sample consists of six columns: sentiment polarity, tweet ID, date of writing, search word, author, and tweet body. It is suitable for training natural language processing and classification models using tweet text and sentiment labels.
(2) The Sentiment Analysis Dataset can be used for:
• Sentiment classification model development: using tweet text and polarity labels, you can build automatic positive, negative, and neutral classifiers with machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM.
• Analysis of SNS public opinion and trends: by analyzing the distribution of sentiment over time and by keyword, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key sentiment keywords.
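Most classifiers expect contiguous class indices, so the 0/2/4 polarity coding usually needs remapping. The sketch below reads six-column rows and yields (text, class index) pairs. The column names in COLUMNS, the absence of a header row, and the sample rows are all assumptions made for illustration; only the six-column layout and the 0/2/4 coding come from the description.

```python
import csv
import io

# Polarity coding from the description, remapped to contiguous class indices.
POLARITY_TO_CLASS = {0: 0, 2: 1, 4: 2}
CLASS_NAMES = ["negative", "neutral", "positive"]

# Hypothetical column names for the six columns, in the order described.
COLUMNS = ["polarity", "tweet_id", "date", "query", "author", "text"]


def read_examples(csv_file):
    """Yield (tweet text, class index) pairs from a header-less six-column CSV."""
    for row in csv.DictReader(csv_file, fieldnames=COLUMNS):
        yield row["text"], POLARITY_TO_CLASS[int(row["polarity"])]


# Made-up rows standing in for the real file.
sample = io.StringIO(
    '0,1,2009-05-11,none,user_a,missed the bus again\n'
    '4,2,2009-05-11,none,user_b,"great day, thanks all"\n'
)
for text, cls in read_examples(sample):
    print(CLASS_NAMES[cls], "-", text)
```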
2.5M+ reviews dataset for sentiment analysis
kaggle.com
Updated Jan 16, 2025
Cite
Mike Shperling (2025). 2.5M+ reviews dataset for sentiment analysis [Dataset]. https://www.kaggle.com/datasets/dolbokostya/test-dataset
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 16, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mike Shperling
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
🌟 Dive into the largest reviews dataset with 2.5M entries, each labeled for sentiment!

Perfect for AI enthusiasts, data scientists, and researchers to supercharge your NLP projects.

💡 Why you’ll love it:

📈 Boost your sentiment analysis models with massive, clean data

🧠 Ideal for NLP and deep learning experiments

🚀 Save time and focus on building winning solutions

⚡ Upvote & download now to take your projects to the next level! 🖤
Sentiment Analysis Dataset
kaggle.com
zip
Updated Dec 2, 2024
Cite
Abhay Mudgal (2024). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/abhaymudgal/sentiment-analysis-dataset
Explore at:
Available download formats: zip (3597460 bytes)
Dataset updated
Dec 2, 2024
Authors
Abhay Mudgal
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Social networking, customer support, and market research are where sentiment analysis is most frequently used. In social media, sentiment analysis is frequently used to examine how users feel about and talk about a brand or product. Organizations can use it to learn how various societal segments see various issues, ranging from hot topics to breaking news. With this knowledge, businesses may react swiftly to public sentiment.

In this challenge, the goal is to detect the sentiment of naturally occurring sentences.

The dataset consists of the following files:

Dev-datasets: contains the train and dev datasets along with a sample submission file (answer.txt)
test-datasets: contains the test dataset on which your models will be evaluated

Train Size - 92,228

Development Size - 4,855

Ground Truth contains 3 categorical values -

Positive (1)

Neutral (0)

Negative (-1)

You have to predict the labels and save the predictions (1, 0, -1) in "answer.txt" file.
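The submission step can be sketched as follows. The layout of answer.txt is not specified beyond the allowed values, so writing one label per line is an assumption; the sample file shipped in Dev-datasets should be treated as the authority. The function names are illustrative.

```python
from pathlib import Path

VALID_LABELS = {1, 0, -1}  # Positive, Neutral, Negative


def format_predictions(predictions):
    """Render predictions as text, one label per line (assumed layout)."""
    labels = list(predictions)
    for label in labels:
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
    return "".join(f"{label}\n" for label in labels)


def save_predictions(predictions, path="answer.txt"):
    """Write the formatted predictions to the submission file."""
    Path(path).write_text(format_predictions(predictions), encoding="utf-8")
```

Calling save_predictions([1, 0, -1]) would create answer.txt in the current directory; validating labels first avoids submitting values outside the three allowed classes.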
Sentiment Analysis Dataset
kaggle.com
zip
Updated Jul 27, 2024
+ more versions
Cite
Afroz (2024). Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/pythonafroz/sentiment-analysis-dataset
Explore at:
Available download formats: zip (2090137 bytes)
Dataset updated
Jul 27, 2024
Authors
Afroz
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Sentiment Analysis Dataset

This dataset focuses on entity-level sentiment analysis for Twitter data. This means that for each tweet (message) and a specific entity mentioned within it, the goal is to determine the sentiment expressed towards that entity.

The dataset categorizes sentiments into three classes:

Positive: The tweet expresses a positive opinion about the entity.
Negative: The tweet expresses a negative opinion about the entity.
Neutral: This category includes two types of tweets:

Tweets that are irrelevant to the entity.

Tweets that express a neutral opinion about the entity (which is difficult to distinguish from irrelevance in practice).

Usage

The dataset is divided into two parts:

twitter_training.csv: This file contains data used to train sentiment analysis models.
twitter_validation.csv: This file contains data used to evaluate the performance of trained models.

Evaluation Metric:

The performance of different models is measured using top-1 classification accuracy. This means the model's prediction for the sentiment (positive, negative, or neutral) should match the correct label in the dataset for the highest number of instances.
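Top-1 classification accuracy is simply the fraction of examples whose predicted label equals the reference label. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


gold = ["Positive", "Negative", "Neutral", "Positive"]
pred = ["Positive", "Negative", "Positive", "Positive"]
print(top1_accuracy(gold, pred))
```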

In Summary

The dataset provides a collection of tweets and their corresponding entities, labeled with sentiment. Researchers and developers can use this data to build models that accurately predict the sentiment towards a given entity within a tweet. The performance of these models is assessed based on how often they correctly identify the sentiment.
Twitter Sentiments Dataset
data.mendeley.com
Updated May 14, 2021
+ more versions
Cite
SHERIF HUSSEIN (2021). Twitter Sentiments Dataset [Dataset]. http://doi.org/10.17632/z9zw7nt5h2.1
Explore at:
Unique identifier
https://doi.org/10.17632/z9zw7nt5h2.1
Dataset updated
May 14, 2021
Authors
SHERIF HUSSEIN
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.
Multimodal Sentiment Analysis Dataset
gts.ai
json
Updated Jun 28, 2024
+ more versions
Cite
GTS (2024). Multimodal Sentiment Analysis Dataset [Dataset]. https://gts.ai/dataset-download/multimodal-sentiment-analysis-dataset/
Explore at:
Available download formats: json
Dataset updated
Jun 28, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore our unique Multimodal Sentiment Analysis Dataset, featuring high-quality images and corresponding text descriptions with sentiment labels.
sentiment-analysis-dataset
huggingface.co
+ more versions
Cite
stepan, sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/skibastepan/sentiment-analysis-dataset
Explore at:
Authors
stepan
Description
skibastepan/sentiment-analysis-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Hacker News Sentiment Analysis Dataset
cubig.ai
zip
Updated Jul 14, 2025
Cite
CUBIG (2025). Hacker News Sentiment Analysis Dataset [Dataset]. https://cubig.ai/store/products/586/hacker-news-sentiment-analysis-dataset
Explore at:
Available download formats: zip
Dataset updated
Jul 14, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
Description
1) Data Introduction
• The Hacker News Sentiment Analysis Dataset is technology-community public opinion data that provides a sentiment analysis (polarity, subjectivity, and sentiment category) of each of the top 141 Hacker News posts, along with the title, URL, points, and comment count.

2) Data Utilization
(1) Characteristics of the Hacker News Sentiment Analysis Dataset:
• This dataset includes polarity (-1 to 1), subjectivity (0 to 1), and category (positive/neutral/negative) columns that quantify the sentiment of comments using TextBlob, based on the latest top posts as of June 24, 2025.
• It is generated through web scraping and NLP preprocessing, and allows for quantitative comparison of community responses to technology news.
(2) The Hacker News Sentiment Analysis Dataset can be used for:
• Visualizing sentiment in technology trends: connect sentiment scores with post topics to visually analyze community response patterns to specific technology news, such as AI and policy.
• NLP model training: sentiment classification models can be trained on comment data from real-world technical discussions, or the data can be applied to research on predicting the subjectivity of comments.

Twitter Sentiment Analysis Datasets

brightdata.com

.json, .csv, .xlsx

Cite

Bright Data, Twitter Sentiment Analysis Datasets [Dataset]. https://brightdata.com/products/datasets/twitter/sentiment-analysis

Explore at:

Available download formats: .json, .csv, .xlsx

Dataset authored and provided by

Bright Data (https://brightdata.com/)

License

https://brightdata.com/license

Area covered

Worldwide

Description

Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.

Key Features:

  Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
  Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
  Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
  Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
  Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
  Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.


Use Cases:

  Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
  Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
  Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
  AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
  Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.



  Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. 
  Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.

Turkish Sentiment Analysis Dataset
humirapps.cs.hacettepe.edu.tr
zip
Updated Apr 12, 2017
Cite
Hacettepe University Multimedia Information Retrieval Laboratory (2017). Turkish Sentiment Analysis Dataset [Dataset]. http://doi.org/10.1109/SITIS.2016.57
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.1109/SITIS.2016.57
Dataset updated
Apr 12, 2017
Dataset provided by
Hacettepe University (http://hacettepe.edu.tr/)
Authors
Hacettepe University Multimedia Information Retrieval Laboratory
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We selected the two most popular movie and hotel recommendation websites among those ranked highly by Alexa: “beyazperde.com” for movie reviews and “otelpuan.com” for hotel reviews. The reviews of 5,660 movies were investigated. All 220,000 extracted reviews had already been rated by their authors with 1 to 5 stars. As most of the reviews were positive, we selected as many positive reviews as negative ones to obtain a balanced set. There were 26,700 negative reviews rated with 1 or 2 stars, so we randomly selected 26,700 of the 130,210 positive reviews rated with 4 or 5 stars. Overall, 53,400 movie reviews with an average length of 33 words were selected. A similar procedure was applied to hotel reviews, with the difference that hotel reviews had been rated with numbers between 0 and 100 instead of stars. From 18,478 reviews extracted from 550 hotels, a balanced set of positive and negative reviews was selected. As there were only 5,802 negative hotel reviews with ratings from 0 to 40, we selected 5,800 of the 6,499 positive reviews rated from 80 to 100. The average length of the 11,600 selected positive and negative hotel reviews was 74 words, more than twice that of the movie reviews.
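The balancing step described above amounts to random undersampling of the majority class. The sketch below illustrates the idea with the standard library; the function name, the seed, and the toy data are assumptions, and the original authors' sampling code is not available here.

```python
import random


def balance_by_undersampling(positives, negatives, seed=0):
    """Randomly downsample the larger class so both classes have equal size."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


# Toy stand-in: many positive reviews, fewer negative ones.
pos = [f"pos_{i}" for i in range(10)]
neg = [f"neg_{i}" for i in range(4)]
bal_pos, bal_neg = balance_by_undersampling(pos, neg)
print(len(bal_pos), len(bal_neg))
```

Undersampling discards data from the majority class, which is the trade-off the authors accepted in exchange for a balanced set.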
news-sentiment-data
huggingface.co
Updated Jul 8, 2024
Cite
amitk17 (2024). news-sentiment-data [Dataset]. https://huggingface.co/datasets/sweatSmile/news-sentiment-data
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 8, 2024
Authors
amitk17
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
sweatSmile/news-sentiment-data dataset hosted on Hugging Face and contributed by the HF Datasets community
Arabic Sentiment Datasets
data.mendeley.com
Updated Sep 4, 2025
Cite
Tamara Alqablan (2025). Arabic Sentiment Datasets [Dataset]. http://doi.org/10.17632/6w9g62xc67.2
Explore at:
Unique identifier
https://doi.org/10.17632/6w9g62xc67.2
Dataset updated
Sep 4, 2025
Authors
Tamara Alqablan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is specifically designed for sentiment analysis (SA) in the Arabic language, serving as a crucial resource for developing and evaluating various SA models. The dataset contains [briefly describe the content, e.g., number of entries, types of sentiments (positive, negative, neutral), sources of the data like social media, reviews, etc.]. It has been curated to meet the unique linguistic characteristics of Arabic text, facilitating the training, validation, and benchmarking of machine learning and natural language processing models. While there are several sentiment analysis datasets available in multiple languages, this dataset focuses on Arabic, supporting research aimed at understanding sentiment in Arabic-speaking communities. To ensure the effectiveness of feature selection approaches in sentiment analysis, the dataset can be used alongside well-known datasets such as those available from the UCI Machine Learning Repository (https://archive.ics.uci.edu/), which provides a range of datasets commonly employed for evaluating feature selection techniques.

The dataset aligns with previous work such as Al-Moslmi et al.'s construction of an Arabic sentiment lexicon for public use, which contributed significantly to Arabic sentiment analysis resources [1]. Additionally, this dataset draws inspiration from established Arabic corpora such as the Opinion Corpus for Arabic (OCA) by Rushdi-Saleh et al. [2], and Ar-Twitter, a corpus designed for sentiment analysis on Arabic tweets, as demonstrated by Abdulla et al. [3].

References:
[1] Al-Moslmi, T., Albared, M., Al-Shabi, A., Omar, N., Abdullah, S.: Arabic sentilexicon: Constructing publicly available language resources for Arabic sentiment analysis. Journal of Information Science, 44(3), 345–362 (2018).
[2] Rushdi-Saleh, M., Martín-Valdivia, M.T., Ureña-López, L.A., Perea-Ortega, J.M.: OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology, 62(10), 2045–2054 (2011).
[3] Abdulla, N., Mahyoub, N., Shehab, M., Al-Ayyoub, M.: Arabic sentiment analysis: Corpus-based and lexicon-based. In: Proceedings of The IEEE Conference on Applied Electrical Engineering and Computing Technologies (AEECT) (2013).
Twitter Sentiment Analysis Dataset
kaggle.com
zip
Updated Jul 3, 2023
Cite
Durgesh Rao (2023). Twitter Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/durgeshrao9993/twitter-analysis-dataset-2022
Explore at:
Available download formats: zip (1291530 bytes)
Dataset updated
Jul 3, 2023
Authors
Durgesh Rao
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
The Twitter Sentiment Analysis Dataset is a widely used dataset in the field of natural language processing and sentiment analysis. It consists of a collection of tweets, each labeled with the sentiment expressed in the tweet, which can be positive, negative, or neutral. This dataset is commonly used for training and evaluating machine learning models that aim to automatically analyze and classify the sentiment behind Twitter messages.

The dataset contains a diverse range of tweets, capturing the opinions, emotions, and attitudes of Twitter users on various topics such as movies, products, events, or general daily experiences. The tweets cover a broad spectrum of sentiments, including expressions of joy, satisfaction, anger, disappointment, sarcasm, or indifference.
A Sentiment Analysis Dataset for Code-Mixed Malayalam-English
live.european-language-grid.eu
zenodo.org
+1 more
tsv
Updated Dec 13, 2021
Cite
(2021). A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7634
Explore at:
Available download formats: tsv
Dataset updated
Dec 13, 2021
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
There is an increasing demand for sentiment analysis of text from social media, which is mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels of the text. However, very few resources are available for code-mixed data to create models specific for this data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Only a few datasets for popular languages such as English-Spanish, English-Hindi, and English-Chinese are available. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators. This gold standard corpus obtained a Krippendorff’s alpha above 0.8 for the dataset. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.
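For readers unfamiliar with the agreement statistic mentioned above, the sketch below computes Krippendorff's alpha for nominal labels from a coincidence matrix, using alpha = 1 - D_o / D_e. It assumes the nominal (exact match) distance; the description does not say which variant the corpus authors used, and the toy annotations are invented.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists, each holding the labels assigned to one item.
    """
    coincidences = Counter()  # weighted counts of ordered label pairs
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    totals = Counter()  # marginal total per label value
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(totals[c] * totals[k] for c in totals for k in totals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label value occurs")
    return 1.0 - (n - 1) * observed / expected


# Three items, two annotators each; one disagreement.
print(krippendorff_alpha_nominal([["pos", "pos"], ["pos", "neg"], ["neg", "neg"]]))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so a value above 0.8 indicates strong agreement among annotators.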
sentiment-analysis-dataset
huggingface.co
Updated Apr 1, 2025
Cite
krusty crab (2025). sentiment-analysis-dataset [Dataset]. https://huggingface.co/datasets/krusty99/sentiment-analysis-dataset
Dataset updated: Apr 1, 2025
Authors
krusty crab
Description
license: mit
task_categories:
- text-classification
language:
- en
tags:
- finance
pretty_name: sentiment-analysis-dataset
size_categories:
- n<1K

Dataset Card for Sentiment Analysis Dataset

This dataset card aims to provide a comprehensive overview of a sentiment analysis dataset containing product reviews labeled with sentiment.

Dataset Details

Dataset Description

This dataset contains 1,000 product reviews categorized into two sentiment… See the full description on the dataset page: https://huggingface.co/datasets/krusty99/sentiment-analysis-dataset.
Sentiment Analysis for Mental Health
kaggle.com
zip
Updated Jul 5, 2024
Cite
Suchintika Sarkar (2024). Sentiment Analysis for Mental Health [Dataset]. https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health
Available download formats: zip (11587194 bytes)
Dataset updated: Jul 5, 2024
Authors
Suchintika Sarkar
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset is a curated collection of statements, each tagged with a mental health status. It combines raw data from multiple sources, cleaned and compiled into a single resource for developing chatbots and performing sentiment analysis.

Data Source:

The dataset integrates information from the following Kaggle datasets:

3k Conversations Dataset for Chatbot

Depression Reddit Cleaned

Human Stress Prediction

Predicting Anxiety in Mental Health Data

Mental Health Dataset Bipolar

Reddit Mental Health Data

Students Anxiety and Depression Dataset

Suicidal Mental Health Dataset

Suicidal Tweet Detection Dataset

Data Overview:

The dataset consists of statements tagged with one of the following seven mental health statuses:

Normal

Depression

Suicidal

Anxiety

Stress

Bi-Polar

Personality Disorder

Data Collection:

The data is sourced from diverse platforms including social media posts, Reddit posts, Twitter posts, and more. Each entry is tagged with a specific mental health status, making it an invaluable asset for:

Developing intelligent mental health chatbots.

Performing in-depth sentiment analysis.

Research and studies related to mental health trends.

Features:

unique_id: A unique identifier for each entry.

Statement: The textual data or post.

Mental Health Status: The tagged mental health status of the statement.
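As a minimal sketch of how these three fields might be read and summarized, the example below counts statements per status using only the Python standard library. The header names are assumed to match the feature list above and may differ in the actual file. The sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the header names follow the feature list above
# and are an assumption about the real CSV file.
SAMPLE = """unique_id,Statement,Mental Health Status
0,i slept well and feel fine today,Normal
1,i cannot stop worrying about everything,Anxiety
2,deadlines are piling up and i feel overwhelmed,Stress
3,had a quiet day at work,Normal
"""


def count_statuses(handle):
    """Count how many statements carry each mental health status."""
    reader = csv.DictReader(handle)
    return Counter(row["Mental Health Status"].strip() for row in reader)


counts = count_statuses(io.StringIO(SAMPLE))
for status, n in counts.most_common():
    print(f"{status}: {n}")
```

Checking the class distribution this way before training can help reveal label imbalance across the seven statuses.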

Usage:

This dataset is ideal for training machine learning models aimed at understanding and predicting mental health conditions based on textual data. It can be used in various applications such as:

Chatbot development for mental health support.

Sentiment analysis to gauge mental health trends.

Academic research on mental health patterns.

Acknowledgments:

This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.
