Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
Data Instances
[More Information Needed]
Data Fields
[More Information Needed]
Data Splits
[More Information Needed]
Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label: negative, neutral, or positive.
The data was gathered from multiple sources:
Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
Kaggle: https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
Kaggle: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
Kaggle: https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
The goal is to enable training and evaluation of multi-class sentiment analysis models on real-world text data. The dataset is already preprocessed (lowercased and stripped of punctuation, URLs, numbers, and stopwords) and is ready for NLP pipelines.
| Column | Description |
|---|---|
| Comment | User-generated text content |
| Sentiment | Sentiment label (0=Negative, 1=Neutral, 2=Positive) |
Comment: "apple pay is so convenient secure and easy to use"
Sentiment: 2 (Positive)
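The preprocessing described above (lowercasing, then removing URLs, punctuation, numbers, and stopwords) could look roughly like the sketch below. The regular expressions and the stopword list are assumptions made for illustration; the cleaning script actually used for this dataset is not given. The sample row still contains words such as "is" and "to", so the real stopword list was evidently different from common defaults.

```python
import re

# Hypothetical stopword list, for illustration only; the list used by
# the dataset author is not documented.
STOPWORDS = {"the", "a", "an", "of"}

# Label encoding taken from the column table above.
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_comment(text: str) -> str:
    """Lowercase, then drop URLs, punctuation, digits, and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove URLs before punctuation
    text = NON_LETTER_RE.sub(" ", text)   # remove punctuation and numbers
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


raw = "Apple Pay is SO convenient, secure, and easy to use! https://example.com 123"
cleaned = clean_comment(raw)
```

URLs are removed before punctuation so that a link is dropped as one unit rather than being split into stray word fragments.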
https://creativecommons.org/publicdomain/zero/1.0/
The Social Media Sentiments Analysis Dataset captures a vibrant tapestry of emotions, trends, and interactions across various social media platforms. This dataset provides a snapshot of user-generated content, encompassing text, timestamps, hashtags, countries, likes, and retweets. Each entry unveils unique stories—moments of surprise, excitement, admiration, thrill, contentment, and more—shared by individuals worldwide.
Key Features
| Feature | Description |
|---|---|
| Text | User-generated content showcasing sentiments |
| Sentiment | Categorized emotions |
| Timestamp | Date and time information |
| User | Unique identifiers of users contributing |
| Platform | Social media platform where the content originated |
| Hashtags | Identifies trending topics and themes |
| Likes | Quantifies user engagement (likes) |
| Retweets | Reflects content popularity (retweets) |
| Country | Geographical origin of each post |
| Year | Year of the post |
| Month | Month of the post |
| Day | Day of the post |
| Hour | Hour of the post |
How to Use The Social Media Sentiments Analysis Dataset 📊
The Social Media Sentiments Analysis Dataset is a rich source of information that can be leveraged for various analytical purposes. Below are key ways to make the most of this dataset:
Sentiment Analysis:
Explore the emotional landscape by conducting sentiment analysis on the "Text" column. Classify user-generated content into categories such as surprise, excitement, admiration, thrill, contentment, and more.
Temporal Analysis:
Investigate trends over time using the "Timestamp" column. Identify patterns, fluctuations, or recurring themes in social media content.
User Behavior Insights:
Analyze user engagement through the "Likes" and "Retweets" columns. Discover popular content and user preferences.
Platform-Specific Analysis:
Examine variations in content across different social media platforms using the "Platform" column. Understand how sentiments vary across platforms.
Hashtag Trends:
Identify trending topics and themes by analyzing the "Hashtags" column. Uncover popular or recurring hashtags.
Geographical Analysis:
Explore content distribution based on the "Country" column. Understand regional variations in sentiment and topic preferences.
User Identification:
Use the "User" column to track specific users and their contributions. Analyze the impact of influential users on sentiment trends.
Cross-Analysis:
Combine multiple features for in-depth insights. For example, analyze sentiment trends over time or across different platforms and countries.
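As a rough illustration of the cross-analysis idea, the sketch below counts sentiments per platform and averages likes per sentiment using only the Python standard library. The column names follow the Key Features table; the three rows are made-up placeholders, not records from the dataset.

```python
from collections import Counter, defaultdict

# Made-up rows for illustration only; keys follow the Key Features table.
rows = [
    {"Platform": "Twitter", "Sentiment": "Excitement", "Likes": 30, "Retweets": 15},
    {"Platform": "Twitter", "Sentiment": "Contentment", "Likes": 10, "Retweets": 5},
    {"Platform": "Instagram", "Sentiment": "Excitement", "Likes": 50, "Retweets": 25},
]


def sentiment_by_platform(rows):
    """Return {platform: Counter({sentiment: count})}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Platform"]][row["Sentiment"]] += 1
    return counts


def mean_likes_by_sentiment(rows):
    """Return {sentiment: average number of likes}."""
    likes = defaultdict(list)
    for row in rows:
        likes[row["Sentiment"]].append(row["Likes"])
    return {sentiment: sum(v) / len(v) for sentiment, v in likes.items()}


platform_counts = sentiment_by_platform(rows)
mean_likes = mean_likes_by_sentiment(rows)
```

The same grouping pattern extends to "Country", "Year", or "Hour" by swapping the key used in the outer dictionary.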
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for more than 1,000 Amazon products, as listed on the official Amazon website. The data was scraped from that website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
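Because the file lists all negative rows before all positive rows, a seeded shuffle before splitting is a sensible first step. The sketch below is one way to do it with the standard library. It assumes the CSV header uses exactly the column names "reviews" and "labels" mentioned above, and the seed value is arbitrary.

```python
import csv
import random


def load_reviews(path):
    """Read (review, label) pairs from a CSV such as data_rt.csv.

    Assumes header columns named 'reviews' and 'labels'.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return [(row["reviews"], int(row["labels"])) for row in csv.DictReader(handle)]


def shuffled(rows, seed=0):
    """Return a shuffled copy; the input list is left untouched."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows


# Toy stand-in for the file layout: negatives first, then positives.
toy = [("bad film", 0)] * 3 + [("good film", 1)] * 3
mixed = shuffled(toy)
```

With the real file, `shuffled(load_reviews("data_rt.csv"))` would give the same effect on all 10,662 rows.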
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
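The division column is described as a categorical label derived from the TextBlob polarity score, which ranges from -1.0 to 1.0. A minimal sketch of such a mapping follows. The label names and the zero-centred threshold are assumptions; the cut-offs actually used for this file are not stated.

```python
def polarity_to_division(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    Scores within +/- neutral_band of zero count as neutral. The label
    names and the default band are illustrative assumptions.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


examples = [polarity_to_division(p) for p in (0.5, -0.2, 0.0)]
```

Widening `neutral_band` (for example to 0.1) treats weakly polar reviews as neutral, which changes the class balance.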
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
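The added division column is described only as a categorical label generated from ReviewStar. One plausible mapping is sketched below. The three label names and the 1-2 / 3 / 4-5 grouping are assumptions, not the rule used to build AllProductReviews2.csv.

```python
def stars_to_division(review_star: int) -> str:
    """Map a 1-5 star rating to a sentiment label (assumed grouping)."""
    if not 1 <= review_star <= 5:
        raise ValueError(f"ReviewStar out of range: {review_star}")
    if review_star <= 2:
        return "negative"
    if review_star == 3:
        return "neutral"
    return "positive"


divisions = [stars_to_division(star) for star in (1, 3, 5)]
```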
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Sentiment analysis uses natural language processing and machine learning techniques to analyze the emotional tone or sentiment behind a piece of text. It involves identifying and categorizing opinions expressed in a text as positive, negative, or neutral. This dataset contains different kinds of tweets and their sentiment labels (0 and 1): 1 stands for a negative tweet and 0 stands for a positive tweet.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Twitter Sentiment Analysis Dataset contains 1,578,627 classified tweets; each row is marked as 1 for positive sentiment and 0 for negative sentiment. The dataset is based on data from the following two sources:
University of Michigan Sentiment Analysis competition on Kaggle
Twitter Sentiment Corpus by Niek Sanders
Finally, I randomly selected a subset of them, applied a cleaning process, and divided them between the test and train subsets, keeping a balance between the number of positive and negative tweets within each of these subsets.
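A split that keeps positive and negative tweets balanced in both subsets could be done along the following lines. This is a sketch, not the author's procedure: downsampling every class to the size of the smallest one, the 20% test fraction, and the seed are all assumptions.

```python
import random
from collections import defaultdict


def balanced_split(rows, test_fraction=0.2, seed=0):
    """Split (text, label) rows into train/test with equal counts per label.

    Every class is downsampled to the size of the smallest class first.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    per_class = min(len(items) for items in by_label.values())
    cut = int(per_class * test_fraction)

    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        kept = items[:per_class]
        test.extend(kept[:cut])
        train.extend(kept[cut:])

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy input: 10 rows with label 0 and 15 rows with label 1.
toy = [(f"t{i}", i % 2) for i in range(20)] + [(f"x{i}", 1) for i in range(5)]
train, test = balanced_split(toy)
```

Downsampling discards data from the larger class; weighting classes during training is an alternative when every tweet should be kept.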
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
Title: [Sentiment Analysis Dataset]
Description: This dataset contains [social media comments, customer reviews, etc.], specifically collected from [Reddit, Twitter, etc.]. The primary goal of this dataset is to [describe the purpose, e.g., analyze sentiment, predict outcomes, etc.].
**Features**
- Number of Rows: [2000]
- Number of Columns: [3]
- Column Descriptors:
  - Id: A unique identifier for each entry.
  - Body: The text content or main body of the entry.
  - Sentiment Type: The sentiment classification of the text (e.g., positive, negative, neutral).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiment classes: negative, neutral, and positive. It contains two fields: the tweet and its label.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🐦 Twitter Sentiment Analysis (bdstar/twitter-sentiment-analysis)
🧠 Overview
A refined and merged version of Twitter text sentiment datasets, providing a clean and well-balanced dataset for sentiment classification across three sentiment categories: positive, negative, and neutral. This dataset is split into three parts (train, test, and validation), each sourced from highly reputable open datasets. It is designed for training, evaluating, and benchmarking NLP models for… See the full description on the dataset page: https://huggingface.co/datasets/bdstar/twitter-sentiment-analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
model-generated predictions
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset
This dataset contains positive, negative, and neutral ("notr") sentences from several data sources given in the references. Most sentiment models have only two labels, positive and negative. However, user input can be an entirely neutral sentence, and I could find no data for such cases. Therefore I created this dataset with 3 classes. Positive and negative sentences are listed below. Neutral examples are extracted from the Turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
YouTube Comments Sentiment Analysis Dataset (1M+ Labeled Comments)
Overview
This dataset comprises over one million YouTube comments, each annotated with sentiment labels—Positive, Neutral, or Negative. The comments span a diverse range of topics including programming, news, sports, politics and more, and are enriched with comprehensive metadata to facilitate various NLP and sentiment analysis tasks.
How to use:
import pandas as pd df =… See the full description on the dataset page: https://huggingface.co/datasets/AmaanP314/youtube-comment-sentiment.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Sentiment and Emotion Analysis Dataset is a meticulously curated collection of textual data, designed to empower researchers, data scientists, and NLP enthusiasts to delve into the intricacies of human emotions and sentiments embedded in text. With a blend of large-scale emotional diversity and sentiment categorization, this dataset offers a rich playground for building state-of-the-art machine learning and deep learning models.
1. Emotion Analysis: Over 422,000 sentences, labeled with six distinct emotions:
   - Joy: 143,067 samples
   - Sadness: 121,187 samples
   - Anger: 59,317 samples
   - Fear: 49,649 samples
   - Love: 34,554 samples
   - Surprise: 14,972 samples
2. Sentiment Analysis: A supplementary set of 3,309 sentences, categorized into two primary sentiments:
   - Positive: 1,679 samples
   - Negative: 1,630 samples
3. Versatile Applications: This dataset is perfectly suited for tasks like:
   - Emotion detection in text
   - Sentiment polarity classification
   - Multitask NLP applications
   - Pre-training or fine-tuning transformer models like BERT, GPT, or similar architectures
4. Balanced and Well-Structured: Each sample consists of a sentence and its corresponding label (emotion or sentiment), ensuring ease of use and streamlined preprocessing for your projects.
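The per-emotion counts listed above differ considerably in size (Joy has between 9 and 10 times as many samples as Surprise), so it can help to compute class shares and weights before training. The sketch below uses the counts exactly as listed. The inverse-frequency weighting formula is a common heuristic chosen here for illustration, not something the dataset description prescribes.

```python
# Sample counts copied from the emotion list above.
emotion_counts = {
    "joy": 143_067,
    "sadness": 121_187,
    "anger": 59_317,
    "fear": 49_649,
    "love": 34_554,
    "surprise": 14_972,
}

total = sum(emotion_counts.values())
n_classes = len(emotion_counts)

# Fraction of the emotion set taken up by each class.
shares = {name: count / total for name, count in emotion_counts.items()}

# Inverse-frequency weights: rarer classes receive larger weights.
weights = {name: total / (n_classes * count) for name, count in emotion_counts.items()}

largest = max(shares, key=shares.get)
smallest = min(shares, key=shares.get)
```

Weights of this form can be passed to a loss function that accepts per-class weights, so that rare emotions such as Surprise are not overwhelmed by Joy and Sadness.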
The Sentiment and Emotion Analysis Dataset stands out with its extensive scale, class diversity, and real-world relevance. The data spans a variety of contexts, making it ideal for developing models that excel in understanding human psychology through textual cues. By leveraging this dataset, you can push the boundaries of NLP applications, from chatbots to mental health analysis tools.
Whether you're a beginner exploring NLP or a seasoned data scientist, this dataset is your gateway to mastering emotion and sentiment analysis. Dive in to create impactful solutions and uncover insights like never before!
sjyuxyz/financial-sentiment-analysis dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Tanaos Sentiment Analysis Training Dataset
This dataset was created synthetically by Tanaos with the Artifex Python library. The dataset is designed to train and evaluate sentiment analysis systems — models that classify the sentiment expressed in text as one of five possible categories: very_negative, negative, neutral, positive or very_positive. It can be used to build sentiment analysis models for various applications, such as customer feedback analysis, social media… See the full description on the dataset page: https://huggingface.co/datasets/tanaos/synthetic-sentiment-analysis-dataset-v1.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an image and text dataset for sentiment analysis.
This dataset contains a sentiment analysis of news collected as of July 1, 2024. The news was collected using the MediaStack API, which provides access to real-time news from multiple global sources. Sentiment analysis was performed using the TextBlob library for Python.
News was collected using the MediaStack API. Each news story was categorized and analyzed for the sentiment of its description. The MediaStack API was chosen due to its flexibility and permissions for using data in development and analysis projects.
Sentiment Analysis
Sentiment analysis was conducted using the TextBlob library. Each news description was analyzed to determine whether the sentiment was positive, neutral or negative. This process helps you understand the overall tone of the news over a one-week period.
License
This dataset is shared for educational and developmental purposes. Redistribution of original news data must follow the guidelines and terms of use of MediaStack and the original news sources.
license: mit
task_categories:
- text-classification
language:
- en
tags:
- finance
pretty_name: sentiment-analysis-dataset
size_categories:
- n<1K
Dataset Card for Sentiment Analysis Dataset
This dataset card aims to provide a comprehensive overview of a sentiment analysis dataset containing product reviews labeled with sentiment.
Dataset Details
Dataset Description
This dataset contains 1,000 product reviews categorized into two sentiment… See the full description on the dataset page: https://huggingface.co/datasets/krusty99/sentiment-analysis-dataset.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Sentiment Analysis (Teeny-Tiny Castle)
This dataset is part of a tutorial tied to the Teeny-Tiny Castle, an open-source repository containing educational tools for AI Ethics and Safety research.
How to Use
from datasets import load_dataset
dataset = load_dataset("AiresPucrs/sentiment-analysis", split = 'train')
https://brightdata.com/license
Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.
Key Features:
Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
Use Cases:
Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.