Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:
The Data has been gathered from multiple websites such as :
Hugginface : https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
Kaggle : https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed — lowercase, cleaned from punctuation, URLs, numbers, and stopwords — and is ready for NLP pipelines.
| Column | Description |
|---|---|
Comment | User-generated text content |
Sentiment | Sentiment label (0=Negative, 1=Neutral, 2=Positive) |
Comment: "apple pay is so convenient secure and easy to use"
Sentiment: 2 (Positive)
Facebook
Twitterhttps://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Sentiment Analysis Dataset is a dataset for emotional analysis, including large-scale tweet text collected from Twitter and emotional polarity (0=negative, 2=neutral, 4=positive) labels for each tweet, featuring automatic labeling based on emoticons.
2) Data Utilization (1) Sentiment Analysis Dataset has characteristics that: • Each sample consists of six columns: emotional polarity, tweet ID, date of writing, search word, author, and tweet body, and is suitable for training natural language processing and classification models using tweet text and emotion labels. (2) Sentiment Analysis Dataset can be used to: • Emotional Classification Model Development: Using tweet text and emotional polarity labels, we can build positive, negative, and neutral emotional automatic classification models with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM. • Analysis of SNS public opinion and trends: By analyzing the distribution of emotions by time series and keywords, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key emotional keywords.
Facebook
TwitterDataset Card for Custom Text Dataset
Dataset Name
Custom Text Dataset
Overview
This dataset contains text data for training sentiment analysis models. The data is collected from various sources, including books, articles, and web pages.
Composition
Number of records: 50,000 Fields: text, label Size: 134 MB
Collection Process
The data was collected using web scraping and manual extraction from public domain sources.
Preprocessing… See the full description on the dataset page: https://huggingface.co/datasets/t7439/custom_sentiment_analysis_dataset.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset consists of a comprehensive list of Portuguese words and the corresponding sentiment labels attached to them. By providing finer-grained annotation and labeling, this dataset allows for comparative sentiment analysis in Portuguese from Twitter and Buscapé reviews. With humans assigned to annotate this data, it provides an accurate measure of the sentiment of Portuguese words in multiple contexts. The labels range from positive to negative with numeric values, allowing for more nuanced categorization and comparison between different subcategories within reviews. Whether you’re mining social media conversations or utilizing customer feedback for analytics purposes, this labeled corpus provides an invaluable resource that can help inform your decision making process
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset, comprised of Twitter and Buscapé reviews from Portuguese-speaking areas, provides sentiment labels at the word level. This makes it easy to apply to natural language processing models for analysis. The corpus is composed of 3,457 tweets and 476 Buscapé reviews, with a total of 114 unique words in the lexicon along with associated human-annotated sentiment scores for each word.
To properly utilize this resource for comparative sentiment analysis, you need an environment that can read CSV files containing both text and numerical data. With such setting, users can use machine learning algorithms to compare words or phrases within texts or across different datasets and gain an understanding of the opinion expressed towards various topics so far as they have been labeled in this corpus. This data has been annotated according to 3 possible sentiment labels: negative (–1), neutral (0) or positive (+1).
In order to work with this dataset effectively here are some tips:
- Familiarize yourself with the data which contains a list of Portuguese words and their associated sentiment labels – by reading through a full content list you will be able to understand how it works better;
- Create a visualization tool that allows you not only see the weight assigned for each word but also do comparative analyses such as finding differences between same nouns used in different sentences;
- Analyzing text holistically by taking into account contextual information;
Experimenting on different methods that may increase accuracy when dealing with unequal distribution of examples due to class imbalance;
By applying these above measures one should easily achieve reliable results by making use of this linguistically labeled database generated from two distinct corpora including tweets and Buscapé reviews which have previously never been bridged together like this before! With its help it is now easier than ever before gain insights into people’s opinion on various products based on their textual expressions in real time!
- Comparing the sentiment of Twitter and Buscapé reviews to identify trends in customer opinions over time.
- Understanding how the sentiment of customer reviews compares between different Portuguese languages and dialects.
- Utilizing the labeled corpus for training machine learning models in natural language processing tasks such as sentiment analysis, text classification, and automated opinion summarization
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: portuguese_lexicon.csv
If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit .
Facebook
Twitterhttp://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
These fields contain sentiment analysis data, tweet details, and content.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Name: BBC Articles Sentiment Analysis Dataset
Source: BBC News
Description: This dataset consists of articles from the BBC News website, containing a diverse range of topics such as business, politics, entertainment, technology, sports, and more. The dataset includes articles from various time periods and categories, along with labels representing the sentiment of the article. The sentiment labels indicate whether the tone of the article is positive, negative, or neutral, making it suitable for sentiment analysis tasks.
Number of Instances: [Specify the number of articles in the dataset, for example, 2,225 articles]
Number of Features: 1. Article Text: The content of the article (string). 2. Sentiment Label: The sentiment classification of the article. The possible labels are: - Positive - Negative - Neutral
Data Fields: - id: Unique identifier for each article. - category: The category or topic of the article (e.g., business, politics, sports). - title: The title of the article. - content: The full text of the article. - sentiment: The sentiment label (positive, negative, or neutral).
Example: | id | category | title | content | sentiment | |----|-----------|---------------------------|-------------------------------------------------------------------------|-----------| | 1 | Business | "Stock Market Surge" | "The stock market has surged to new highs, driven by strong earnings..." | Positive | | 2 | Politics | "Election Results" | "The election results were a mixed bag, with some surprises along the way." | Neutral | | 3 | Sports | "Team Wins Championship" | "The team won the championship after a thrilling final match." | Positive | | 4 | Technology | "New Smartphone Release" | "The new smartphone release has received mixed reactions from users." | Negative |
Preprocessing Notes: - The text has been preprocessed to remove special characters and any HTML tags that might have been included in the original articles. - Tokenization or further text cleaning (e.g., lowercasing, stopword removal) may be necessary depending on the model and method used for sentiment classification.
Use Case: This dataset is ideal for training and evaluating machine learning models for sentiment classification, where the goal is to predict the sentiment (positive, negative, or neutral) based on the article's text.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text covering the Olympic legacy of Rio 2016 and London 2012. Data was searched via Google search engine. It is composed of sentiment labels assigned to 1271 news articles in total. News outlets: BBC Daily Mail The Telegraph The Guardian Globo Estadao Folha de S. Paulo Events covered by the articles: London 2012 Olympic legacy Rio 2016 Olympic legacy All classifiers were used in texts in English. Text originally published in Portuguese by the Brazilian media were automatically translated. Sentiment classifiers used: Vader BERT (Trained on Amazon data) BERT (Trained on twitter data - 140) Each document (spreadsheet - xlsx) refers to one outlet and one event (London 2012 or Rio 2016). How were labels assigned to the texts? These labels are a combination of the three sentiment classifiers listed above. If two of them agree with the same label, then this label would be considered as right. Otherwise, the label ‘other’ was assigned. For news article body text: the proportion of sentences of each sentiment type was used to assign labels to the whole article instead of averaging the sentence scores. For example, if the proportion of sentences with negative labels is greater than 50%, then the article is assigned a negative label. The documents are composed of the following columns: Rank: the position of the article on Google search ranking Date: date of article's publication (DD/MM/YYYY) Link: article's link Title: article's title Sentiment_Title: final sentiment for article headline Sentiment_Text: final sentiment for article's body text PS: Documents do not include articles' body text. Sentiment is presented in labels as follows: Pos: Positive Neg: Negative Neutral: Neutral other: inconclusive - if each of the 3 classifiers assigned a different label to the article, the label 'other' was used. Therefore, 'other' identifies contradictory results.
Facebook
TwitterAttribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
[Overview]This dataset targets multimodal sentiment/emotion analysis. It contains aligned and processed features and intermediate artifacts derived from the public dataset(s)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Bangla Product Comments Dataset is a comprehensive collection of product reviews gathered from diverse ecommerce platforms in Bangladesh. This dataset offers a rich source of information reflecting customer opinions and sentiments towards various products available online. This dataset holds significant value for businesses, researchers, and data scientists interested in understanding consumer behavior, product perception, and sentiment analysis within the Bangladeshi ecommerce landscape. By leveraging this dataset, stakeholders can derive actionable insights to enhance product quality, marketing strategies, and overall customer satisfaction.
Columns:
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a comprehensive and realistic representation of customer sentiment across multiple online and offline shopping platforms. It contains 25,000 customer feedback records, each including demographic attributes, product categories, purchase channels, ratings, review text, and sentiment classification.
The dataset reflects how customers express their experiences on platforms such as Amazon, Flipkart, Meesho, Facebook Marketplace, Myntra, Ajio, Nykaa, Croma, Boat, Reliance Digital, BigBasket, JioMart, Swiggy Instamart, Zepto, and many others. It captures a wide spectrum of sentiments, from highly satisfied customers praising product quality and delivery speed to dissatisfied users reporting issues such as delayed deliveries, low-quality items, or unsatisfactory support.
Each review is paired with a star rating (1 to 5). Ratings of 4 and 5 are mapped to positive sentiment, 3 to neutral, and 1 and 2 to negative sentiment. Corresponding review text is generated to match the sentiment tone, making the dataset ideal for text and sentiment understanding.
In addition to sentiment and rating, the dataset includes essential service metrics such as response time (in hours), whether the issue was resolved, and whether a formal complaint was registered. This creates a richer ecosystem of customer experience and feedback patterns.
The dataset is suitable for a wide variety of uses, including customer insight studies, retail analytics, sentiment analysis, product review exploration, behavior understanding, or business decision making. Since the dataset is fully synthetic and free from personal identifiers, it is safe for all academic, analytical, and research purposes.
Facebook
TwitterAttribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Corpus consisting of 10,000 Facebook posts manually annotated on sentiment (2,587 positive, 5,174 neutral, 1,991 negative and 248 bipolar posts). The archive contains data and statistics in an Excel file (FBData.xlsx) and gold data in two text files with posts (gold-posts.txt) and labels (gols-labels.txt) on corresponding lines.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Kurdish language is regarded as one of the less-resourced languages. The language is globally practised by 30-40 people. The language has 33 letters that are largely similar to the Arabic language. The Kurdish language has two major dialects Sorani and Badini. The dataset includes a collection of texts written in the Sorani dialect. It contains tweets the Twitter API. Due to security reasons and following the policies of Twitter, we removed the user's identity. We collected the tweets which was published during the time of the Corona Virus pandemic. The tweets are raw texts, and the content covers a varied range of topics, starting from politics, sports, entertainment, social life, etc. Data Labeling We used the Twitter developer (Twitter API) to mine the tweets. The dataset was annotated manually by three Kurdish native speakers. The annotators were required to identify the classes and categories of each text. The classes included positive, negative and neutral and the categories consisted of news, technology, art, social and health. The texts which were agreed upon by at least two annotators to possess a specific label and category were regarded as conflict-free and accepted for further processing. Other texts that caused conflict among all three raters were ignored and have been removed from the dataset. The doccano program was used to help the annotators label each text one by one.
Facebook
TwitterWith the advent and expansion of social networking, the amount of generated text data has seen a sharp increase. In order to handle such a huge volume of text data, new and improved text mining techniques are a necessity. One of the characteristics of text data that makes text mining difficult, is multi-labelity. In order to build a robust and effective text classification method which is an integral part of text mining research, we must consider this property more closely. This kind of property is not unique to text data as it can be found in non-text (e.g., numeric) data as well. However, in text data, it is most prevalent. This property also puts the text classification problem in the domain of multi-label classification (MLC), where each instance is associated with a subset of class-labels instead of a single class, as in conventional classification. In this paper, we explore how the generation of pseudo labels (i.e., combinations of existing class labels) can help us in performing better text classification and under what kind of circumstances. During the classification, the high and sparse dimensionality of text data has also been considered. Although, here we are proposing and evaluating a text classification technique, our main focus is on the handling of the multi-labelity of text data while utilizing the correlation among multiple labels existing in the data set. Our text classification technique is called pseudo-LSC (pseudo-Label Based Subspace Clustering). It is a subspace clustering algorithm that considers the high and sparse dimensionality as well as the correlation among different class labels during the classification process to provide better performance than existing approaches. Results on three real world multi-label data sets provide us insight into how the multi-labelity is handled in our classification process and shows the effectiveness of our approach.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset comprises over one million YouTube comments, each annotated with sentiment labels—**Positive**, Neutral, or Negative. The comments span a diverse range of topics including programming, news, sports, politics and more, and are enriched with comprehensive metadata to facilitate various NLP and sentiment analysis tasks.
Each record in the dataset includes the following fields: - CommentID: A unique identifier assigned to each YouTube comment. This allows for individual tracking and analysis of comments. - VideoID: The unique identifier of the YouTube video to which the comment belongs. This links each comment to its corresponding video. - VideoTitle: The title of the YouTube video where the comment was posted. This provides context about the video's content. - AuthorName: The display name of the user who posted the comment. This indicates the commenter's identity. - AuthorChannelID: The unique identifier of the YouTube channel of the comment's author. This allows for tracking comments across different videos from the same author. - CommentText: The actual text content of the YouTube comment. This is the raw data used for sentiment analysis. - Sentiment: The sentiment classification of the comment, typically categorized as positive, negative, or neutral. This represents the emotional tone of the comment. - Likes: The number of likes received by the comment. This indicates the comment's popularity or agreement from other users. - Replies: The number of replies to the comment. This indicates the level of engagement and discussion generated by the comment. - PublishedAt: The date and time when the comment was published. This allows for time-based analysis of comment trends. - CountryCode: The two-letter country code of the user that posted the comment. This can be used to analyze regional sentiment. - CategoryID: The category ID of the video that the comment was posted on. This allows for analysis of sentiment across video categories.
This dataset is open-sourced to encourage collaboration and innovation. Detailed documentation and the code used for extraction, labeling, and augmentation are available in the accompanying GitHub repository.
Facebook
TwitterThis dataset provides millions of consumer reviews enriched with sentiment labels (positive, neutral, or negative), making it an essential asset for training AI models, analyzing customer satisfaction, and detecting risk signals in customer feedback.
Collected across 970+ marketplaces (including Amazon, eBay, Temu, Flipkart, and others) and spanning 160+ industries, it reflects how consumers express delight, frustration, or dissatisfaction in real purchase and service situations.
Each entry includes:
Use this dataset to:
Whether you're building models or measuring brand trust, this dataset offers a structured view of consumer emotion, helping you turn unstructured feedback into meaningful action.
The more you purchase, the lower the price will be.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset
This dataset contains positive , negative and notr sentences from several data sources given in the references. In the most sentiment models , there are only two labels; positive and negative. However , user input can be totally notr sentence. For such cases there were no data I could find. Therefore I created this dataset with 3 class. Positive and negative sentences are listed below. Notr examples are extraced from turkish wiki dump. In addition, added some random text… See the full description on the dataset page: https://huggingface.co/datasets/winvoker/turkish-sentiment-analysis-dataset.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data produced by the running of the Naive Bayes classifier algorithm. It is a list of every word in the vocabulary of the classifier, as well as the number of occurrences of each word, as well as the likelihood ratio of this word. Please note the likelihood ratio is calculated by taking the likelihood of word given a positive label divided by the likelihood of a word given a negative label. This data is licensed under the CC BY 4.0 international license, and may be taken and used freely with credit given. This data was produced by two different datasets, using a Naive Bayes classifier. These datasets were the Polarity Review v2.0 dataset from Cornell, and the Large Movie Review Dataset from Stanford.
Facebook
Twitterhttps://www.archivemarketresearch.com/privacy-policyhttps://www.archivemarketresearch.com/privacy-policy
Explore the surging Text Annotation Tool market, projected to reach $850 million by 2025 with an 18.5% CAGR. Discover key drivers like NLP and AI adoption, alongside market trends and competitive landscape.
Facebook
TwitterAttribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We provide annotated datasets on a three-point sentiment scale (positive, neutral and negative) for Serbian, Bosnian, Macedonian, Albanian, and Estonian. For all languages except Estonian, we include pairs of source URL (where corresponding text can be found) and sentiment label.
For Estonian, we randomly sampled 100 articles from "Ekspress news article archive (in Estonian and Russian) 1.0" (http://hdl.handle.net/11356/1408).
The data is organized in Tab-Separated Values (TSV) format. For Serbian, Bosnian, Macedonian, and Albanian, the dataset contains two columns: sourceURL and sentiment. For Estonian, the dataset consists of three columns: text ID (from the CLARIN.SI reference above), body text, and sentiment label.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:
The Data has been gathered from multiple websites such as :
Hugginface : https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
Kaggle : https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
The goal is to enable training and evaluation of multi-class sentiment analysis models for real-world text data. The dataset is already preprocessed — lowercase, cleaned from punctuation, URLs, numbers, and stopwords — and is ready for NLP pipelines.
| Column | Description |
|---|---|
Comment | User-generated text content |
Sentiment | Sentiment label (0=Negative, 1=Neutral, 2=Positive) |
Comment: "apple pay is so convenient secure and easy to use"
Sentiment: 2 (Positive)