76 datasets found

Twitter Tweets Sentiment Dataset
kaggle.com
opendatabay.com
Updated Apr 8, 2022
Cite
M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Apr 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
M Yasser H
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description

Description:

Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to post hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that distinguishes negative tweets so that such tweets can be blocked. Can you build a strong classifier model to do this?

Each row contains the text of a tweet and a sentiment label. In the training set, you are also provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

When parsing the CSV, make sure to remove the beginning and ending quotes from the text field so that you don't include them in your training data.
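A minimal Python sketch of this step, using only the standard library. Python's `csv` module already undoes the CSV quoting itself; the extra `strip('"')` removes literal quote characters left at the ends of the field. The `textID`, `text`, and `sentiment` headers follow the column list in this description; the sample row is made up.

```python
import csv
import io


def load_tweets(csv_text: str) -> list:
    """Parse CSV text into dicts, stripping leftover quote characters from `text`."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader has already undone the CSV quoting; this removes any literal
        # quote characters still sitting at the beginning or end of the field.
        row["text"] = row["text"].strip('"')
        rows.append(row)
    return rows


# Made-up sample: the field """great day""" parses to the literal string "great day".
sample = 'textID,text,sentiment\nabc123,"""great day""",positive\n'
print(load_tweets(sample)[0]["text"])
```

For the real file, passing `open(path, newline="", encoding="utf-8")` to `csv.DictReader` instead of a string buffer should work the same way (the UTF-8 encoding is an assumption).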

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (e.g., commas, spaces, etc.).
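To make "all characters within that span" concrete, here is a small sketch that locates a phrase by character offsets and slices it back out, so punctuation and spaces inside the span are preserved. The function names and the example tweet are hypothetical, and only the first occurrence of the phrase is found.

```python
from typing import Optional, Tuple


def find_span(text: str, selected: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character offsets of `selected` in `text`, or None."""
    start = text.find(selected)
    if start == -1:
        return None
    return start, start + len(selected)


def extract_span(text: str, start: int, end: int) -> str:
    # A plain slice keeps every character in the span, including commas and spaces.
    return text[start:end]


tweet = "Rainy day, but the coffee is great"  # made-up example tweet
span = find_span(tweet, "Rainy day,")
print(span, repr(extract_span(tweet, *span)))
```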

Columns:

textID - unique ID for each piece of text

text - the text of the tweet

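sentiment - the general sentiment of the tweet

selected_text - (training set only) the word or phrase from the tweet that encapsulates the provided sentiment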

Acknowledgement:

The dataset was downloaded from a Kaggle competition:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

Understand the dataset and clean it up (if required).

Build classification models to predict Twitter sentiment.

Compare the evaluation metrics of various classification algorithms.
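Comparing classifiers usually means scoring each one on the same held-out labels. The sketch below computes accuracy and macro-averaged F1 with plain Python so the arithmetic is visible; in practice scikit-learn's `accuracy_score` and `f1_score(..., average="macro")` do the same job. The gold labels, model names, and predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions from two hypothetical models.
y_true = ["positive", "negative", "neutral", "negative"]
predictions = {
    "model_a": ["positive", "negative", "neutral", "neutral"],
    "model_b": ["positive", "positive", "neutral", "negative"],
}
for name, y_pred in predictions.items():
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f}, macro-F1={macro_f1(y_true, y_pred):.3f}")
```

Macro-F1 weights each sentiment class equally, which makes it a useful companion to accuracy when the classes are not balanced.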
‘Sentiment Analysis Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Oct 17, 2016
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘Sentiment Analysis Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sentiment-analysis-dataset-caeb/f26f1fc2/?iid=004-932&v=presentation
Dataset updated
Oct 17, 2016
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Sentiment Analysis Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sonaam1234/sentimentdata on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Data for sentiment analysis

--- Original source retains full ownership of the source dataset ---
Twitter Sentiment Analysis Dataset
kaggle.com
zip
Updated Feb 13, 2021
+ more versions
Cite
Zohair Ahmed (2021). Twitter Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/zohairahmed007/twitter-sentiment-analysis-dataset
Available download formats: zip (38,737,743 bytes)
Dataset updated
Feb 13, 2021
Authors
Zohair Ahmed
Description
Dataset

This dataset was created by Zohair Ahmed

Contents
‘Financial Sentiment Analysis’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Financial Sentiment Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-financial-sentiment-analysis-5b39/latest
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Financial Sentiment Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sbhatti/financial-sentiment-analysis on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Data

The following data is intended for advancing financial sentiment analysis research. It combines two datasets (FiQA and Financial PhraseBank) into one easy-to-use CSV file of financial sentences with sentiment labels.

Citations

Malo, Pekka, et al. "Good debt or bad debt: Detecting semantic orientations in economic texts." Journal of the Association for Information Science and Technology 65.4 (2014): 782-796.

--- Original source retains full ownership of the source dataset ---
‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-product-reviews-and-ratings-sentiment-analysis-fb82/latest
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Product Reviews and Ratings (Sentiment Analysis)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mafaisal007/product-reviews-and-ratings-sentiment-analysis on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset comes from a toy store in Europe and contains customer reviews about a particular product. It is intended for text mining and sentiment analysis.

--- Original source retains full ownership of the source dataset ---
NLP Preprocessed Sentiment Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). NLP Preprocessed Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/6323a1b5-7112-49bd-ad55-c1ef6968abc3
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset is a substantial collection of over 241,000 English-language comments, gathered from various online platforms. Each comment within the dataset has been carefully annotated with a sentiment label: 0 for negative sentiment, 1 for neutral, and 2 for positive. The primary aim of this dataset is to facilitate the training and evaluation of multi-class sentiment analysis models, designed to work effectively with real-world text data. The dataset has undergone a preprocessing stage, ensuring comments are in lowercase, and are cleaned of punctuation, URLs, numbers, and stopwords, making it readily usable for Natural Language Processing (NLP) pipelines.
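The preprocessing described above (lowercasing, then removing URLs, numbers, punctuation, and stopwords) can be sketched as follows. The listing does not say which tools or stopword list were used, so the tiny `STOPWORDS` set here is purely hypothetical, and removing URLs before punctuation is an assumed ordering that keeps URL fragments from surviving as tokens.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; the dataset's actual list is not stated.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "on", "this", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(comment: str) -> str:
    """Lowercase, then strip URLs, digits, punctuation, and stopwords."""
    text = comment.lower()
    text = URL_RE.sub(" ", text)  # URLs first, so their punctuation leaves no fragments
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("This is GREAT!!! See https://example.com for 2 more reasons."))
```

Because the published comments are already cleaned this way, new text fed to a model trained on this dataset would likely need the same treatment.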

Columns

Comment: This column contains the user-generated text content.

Sentiment: This column provides the corresponding sentiment label for each comment, where 0 denotes Negative, 1 denotes Neutral, and 2 denotes Positive.
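A quick sanity check on the integer labels is to map them to names and look at the class shares. This sketch uses the 0/1/2 encoding stated above; the function name and the sample labels are hypothetical.

```python
from collections import Counter

# Label encoding as stated in this listing.
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}


def label_distribution(labels):
    """Return the share of each named sentiment class in `labels`."""
    counts = Counter(labels)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    total = len(labels)
    return {LABEL_NAMES[k]: counts[k] / total for k in sorted(LABEL_NAMES)}


print(label_distribution([0, 2, 2, 1]))  # made-up labels
```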

Distribution

The dataset comprises over 241,000 records. While the specific file format is not detailed, such datasets are typically provided in a tabular format, often as a CSV file. It is structured with two distinct columns as described above, suitable for direct integration into machine learning workflows.

Usage

This dataset is ideally suited for a variety of applications and use cases, including:
* Training sentiment classifiers utilising advanced models such as LSTM, BiLSTM, CNN, BERT, or RoBERTa.
* Evaluating the efficacy of different preprocessing and tokenisation strategies for text data.
* Benchmarking NLP models on multi-class classification tasks to assess their performance.
* Supporting educational projects and research initiatives in the fields of opinion mining or text classification.
* Fine-tuning transformer models on a large and diverse collection of sentiment-annotated text.

Coverage

The dataset's coverage is global, comprising English-language comments. It focuses on general user-generated text content without specific demographic notes. The dataset is listed with a version of 1.0.

License

CC0

Who Can Use It

This dataset is suitable for individuals and organisations involved in data science and analytics. Intended users include:
* Data Scientists and Machine Learning Engineers for developing and deploying sentiment analysis models.
* Researchers and Academics for studies in NLP, text classification, and opinion mining.
* Students undertaking educational projects in artificial intelligence and machine learning.

Dataset Name Suggestions

Multi-class Comment Sentiment Data

User Text Sentiment Collection

Online Comment Sentiment Analysis Dataset

English Sentiment Labelled Comments

Preprocessed Sentiment Dataset

Attributes

Original Data Source: Sentiment Analysis Dataset
Twitter dataset
figshare.com
csv
Updated Feb 11, 2025
Cite
Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.28390334.v2
Dataset updated
Feb 11, 2025
Dataset provided by
figshare
Authors
Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
Stock Market Dataset for Predictive Analysis
kaggle.com
Updated Feb 24, 2025
Cite
WARNER (2025). Stock Market Dataset for Predictive Analysis [Dataset]. https://www.kaggle.com/datasets/s3programmer/stock-market-dataset-for-predictive-analysis
Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Feb 24, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
WARNER
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
This Stock Market Dataset is designed for predictive analysis and machine learning applications in financial markets. It includes 13647 records of simulated stock trading data with features commonly used in stock price forecasting.

🔹 Key Features
* Date – Trading day timestamps (business days only)
* Open, High, Low, Close – Simulated stock prices
* Volume – Trading volume per day
* RSI (Relative Strength Index) – Measures market momentum
* MACD (Moving Average Convergence Divergence) – Trend-following momentum indicator
* Sentiment Score – Simulated market sentiment from financial news & social media
* Target – Binary label (1: Price goes up, 0: Price goes down) for next-day prediction

This dataset is useful for training hybrid deep learning models such as LSTM, CNN, and Attention-based networks for stock market forecasting. It enables financial analysts, traders, and AI researchers to experiment with market trends, technical analysis, and sentiment-based predictions.
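The listing does not say how its simulated RSI, MACD, or Target columns were generated. For readers who want to recompute comparable features from the Close column, the sketch below uses common textbook definitions: a 14-period RSI with Wilder smoothing, MACD as EMA(12) minus EMA(26) with a 9-period signal line, and EMAs seeded with the first value (the same recursion as pandas `ewm(span=..., adjust=False)`). Labelling an unchanged close as 0 in `next_day_target` is an assumption.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (MACD line, signal line) using EMAs of the closing prices."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return macd_line, ema(macd_line, signal)


def rsi(closes, period=14):
    """Wilder-smoothed RSI; the first `period` positions are None (not enough history)."""
    if len(closes) <= period:
        return [None] * len(closes)
    changes = [cur - prev for prev, cur in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    def to_rsi(avg_gain, avg_loss):
        if avg_loss == 0:
            return 100.0  # no losses in the smoothing window
        return 100 - 100 / (1 + avg_gain / avg_loss)

    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out = [None] * period + [to_rsi(avg_gain, avg_loss)]
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        out.append(to_rsi(avg_gain, avg_loss))
    return out


def next_day_target(closes):
    """1 if the next close is higher than today's, else 0 (the last day gets no label)."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


# Made-up closing prices for illustration.
closes = [100.0, 101.5, 100.8, 102.3, 103.0, 102.1, 104.2, 105.0,
          104.4, 106.1, 107.0, 106.2, 108.3, 109.1, 108.7, 110.2]
print(rsi(closes)[-1], macd(closes)[0][-1], next_day_target(closes)[:5])
```

When training a next-day model, the Target for day t depends on day t+1, so any split should be chronological rather than random to avoid leaking future prices.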
‘Sentiment Analysis of Commodity News (Gold)’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 27, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Sentiment Analysis of Commodity News (Gold)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sentiment-analysis-of-commodity-news-gold-732f/e3232de2/?iid=002-045&v=presentation
Dataset updated
Sep 27, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Sentiment Analysis of Commodity News (Gold)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankurzing/sentiment-analysis-in-commodity-market-gold on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This is a news dataset for the commodity market where we have manually annotated 11,412 news headlines across multiple dimensions into various classes. The dataset has been sampled from a period of 20+ years (2000-2021).

Content

The dataset has been collected from various news sources and annotated by three human annotators who were subject experts. Each news headline was evaluated on various dimensions, for instance - if a headline is a price related news then what is the direction of price movements it is talking about; whether the news headline is talking about the past or future; whether the news item is talking about asset comparison; etc.

Acknowledgements

Sinha, Ankur, and Tanmay Khandait. "Impact of News on the Commodity Market: Dataset and Results." In Future of Information and Communication Conference, pp. 589-601. Springer, Cham, 2021.

https://arxiv.org/abs/2009.04202 Sinha, Ankur, and Tanmay Khandait. "Impact of News on the Commodity Market: Dataset and Results." arXiv preprint arXiv:2009.04202 (2020)

We would like to acknowledge the financial support provided by the India Gold Policy Centre (IGPC).

Inspiration

Commodity prices are known to be quite volatile. Machine learning models that understand commodity news well will be able to provide an additional input to short-term and long-term price forecasting models. The dataset will also be useful for creating news-based indicators for commodities.

Apart from researchers and practitioners working in the area of news analytics for commodities, the dataset will also be useful for researchers looking to evaluate their models on classification problems in the context of text-analytics. Some of the classes in the dataset are highly imbalanced and may pose challenges to the machine learning algorithms.
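One common way to counter highly imbalanced classes (not something the source prescribes) is to weight each class inversely to its frequency in the training loss. The sketch uses the same formula as scikit-learn's `class_weight="balanced"`; the label names in the example are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), so rarer classes get larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical, imbalanced labels for one annotation dimension.
labels = ["price_up"] * 6 + ["price_down"] * 3 + ["none"] * 1
print(balanced_class_weights(labels))
```

With these weights, every class contributes the same total weight to the loss, since each count times its weight equals `n_samples / n_classes`.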

--- Original source retains full ownership of the source dataset ---
Data from: Financial News Sentiment Analysis Dataset
opendatabay.com
Updated Jul 3, 2025
Cite
Datasimple (2025). Financial News Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/dataset/6c0503f5-8003-44c8-a56c-d65bec8e5d40
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Datasimple
Area covered
Data Science and Analytics
Description
This dataset provides fine-grained financial sentiment analysis on news headlines. It is a human-annotated collection of over 10,700 news headlines, specifically designed to address the challenging task of sentiment extraction in financial news where multiple entities may be present, often with conflicting sentiments. Over 2,800 headlines within the dataset feature multiple entities. The dataset is balanced, containing more than 4,100 positive entities, 3,200 negative entities, and 4,500 neutral entities. It is ideal for furthering research in entity-aware sentiment analysis and can be used for training models for extracting financial named entities.

Columns

S No.: Represents the serial number for each entry.

Title: Contains the news headlines.

Decisions: Provides the sentiment annotations for various financial entities identified within the news headlines.

Words: Indicates the number of words in each news headline.
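The listing does not show how the Decisions column is serialized. Purely as an assumption, the sketch below treats each cell as a Python-style dict literal such as `{'Entity': 'positive'}` and parses it safely with `ast.literal_eval`; check the actual file before relying on this. The whitespace word count is likewise only one plausible way the Words column could have been computed, and the row is made up.

```python
import ast


def parse_decisions(cell: str) -> dict:
    """Parse a Decisions cell, ASSUMED to be a Python-style dict literal, into {entity: sentiment}."""
    parsed = ast.literal_eval(cell)  # evaluates literals only, never arbitrary code
    if not isinstance(parsed, dict):
        raise ValueError("expected a dict literal in the Decisions column")
    return {str(entity): str(sentiment).lower() for entity, sentiment in parsed.items()}


def word_count(title: str) -> int:
    """Whitespace word count (an assumed definition for the Words column)."""
    return len(title.split())


# Made-up row with two entities carrying conflicting sentiments.
row = {
    "Title": "Acme shares rise while Globex slips",
    "Decisions": "{'Acme': 'positive', 'Globex': 'negative'}",
}
decisions = parse_decisions(row["Decisions"])
print(decisions, "multi-entity:", len(decisions) > 1, "words:", word_count(row["Title"]))
```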

Distribution

The dataset is provided as a data file, typically in CSV format. It contains over 10,700 distinct news headlines. The structure includes human-annotated sentiment labels for financial entities, with sentiments balanced across positive, negative, and neutral categories. The dataset includes headlines with varying word counts, and its sentiment annotations are quite diverse, representing 10,686 unique values for entity sentiments.

Usage

This dataset is well-suited for several applications:
* Performing Aspect-based Sentiment Analysis on financial texts.
* Training machine learning models for the extraction of named financial entities.
* Conducting research into fine-grained financial sentiment analysis.
* Validating the effect of news sentiments on aggregate market movements.
* Developing and evaluating learning schemes, including those utilising lexicon-based and pre-trained sentence representations, and various classification approaches.

Coverage

The dataset has a global regional coverage. While the listing date is 08/06/2025, the underlying research for SEntFiN 1.0 was published in 2022. The news headlines themselves do not specify a fixed time range within the provided information, focusing instead on the content and its annotations. No specific demographic scope is outlined, but the content is inherently focused on financial markets and related entities.

License

CC-BY

Who Can Use It

This dataset is beneficial for a range of users and purposes:
* Researchers focusing on financial Natural Language Processing (NLP) and sentiment analysis.
* Data Scientists and Analysts working on text classification, entity extraction, and market behaviour prediction.
* Anyone involved in Business or Finance seeking to understand or model sentiment from news.
* Developers of AI and Machine Learning models requiring human-annotated financial text data.

Dataset Name Suggestions

SEntFiN 1.0

Financial News Entity Sentiment Headlines

Aspect-Based Financial Sentiment Data

Annotated Financial News for NLP

Financial News Sentiment Analysis Dataset

Attributes

Original Data Source: Aspect based Sentiment Analysis for Financial News
Human Written Text
kaggle.com
Updated May 13, 2025
Cite
Youssef Elebiary (2025). Human Written Text [Dataset]. https://www.kaggle.com/datasets/youssefelebiary/human-written-text
Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
May 13, 2025
Dataset provided by
Kaggle
Authors
Youssef Elebiary
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Overview

This dataset contains 20,000 pieces of text collected from Wikipedia, Gutenberg, and CNN/DailyMail. The text was cleaned with automated scripts and regular expressions that replace symbols such as (.*?/) with whitespace.
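A minimal sketch of that kind of regex cleaning. The full symbol set used by the author is not stated, so the character class below (taken from the example "(.*?/)") is hypothetical, and collapsing repeated whitespace afterwards is an extra assumption.

```python
import re

# Hypothetical symbol set based on the example "(.*?/)"; inside [...] these are literals.
SYMBOLS_RE = re.compile(r"[().*?/]")
SPACES_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace the listed symbols with a space, then collapse runs of whitespace."""
    text = SYMBOLS_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()  # whitespace collapsing is an assumption


print(clean_text("What? (see p.5/6)"))
```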

Data Source Distribution

10,000 Wikipedia Articles: From the 20220301 dump.

3,000 Gutenberg Books: Via the GutenDex API.

7,000 CNN/DailyMail News Articles: From the CNN/DailyMail 3.0.0 dataset.

Why These Sources

The data was collected from these sources to ensure the highest level of integrity against AI-generated text:
* Wikipedia: The 20220301 dump was chosen to minimize the chance of including articles generated or heavily edited by AI.
* Gutenberg: Books from this source are guaranteed to be written by real humans and span various genres and time periods.
* CNN/DailyMail: These news articles were written by professional journalists and cover a variety of topics, ensuring diversity in writing style and subject matter.

Dataset Structure

The dataset consists of 5 CSV files:
1. CNN_DailyMail.csv: Contains all processed news articles.
2. Gutenberg.csv: Contains all processed books.
3. Wikipedia.csv: Contains all processed Wikipedia articles.
4. Human.csv: Combines all three datasets in order.
5. Shuffled_Human.csv: A randomly shuffled version of Human.csv.

Each file has 2 columns:
- Title: The title of the item.
- Text: The content of the item.

Uses

This dataset is suitable for a wide range of NLP tasks, including:
- Training models to distinguish between human-written and AI-generated text (Human/AI classifiers).
- Training LSTMs or Transformers for chatbots, summarization, or topic modeling.
- Sentiment analysis, genre classification, or linguistic research.

Disclaimer

Although the data was collected from these sources, it may not be 100% free of AI-generated text. Wikipedia articles may reflect systemic biases in contributor demographics. CNN/DailyMail articles may focus on specific news topics or regions.

For details on how the dataset was created, click here to view the Kaggle notebook used.

Licensing

This dataset is published under the MIT License, allowing free use for both personal and commercial purposes. Attribution is encouraged but not required.
Film Review Binary Sentiment Dataset
opendatabay.com
Updated Jul 5, 2025
Cite
Datasimple (2025). Film Review Binary Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/dfae630e-27db-4111-905d-0eb10d8f3ce7
Dataset updated
Jul 5, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Entertainment & Media Consumption
Description
This dataset provides a collection of 10,000 text reviews about films, each classified with a binary sentiment label: either positive or negative [1]. It serves as a small version of IMDB text reviews, ideal for machine learning projects focused on sentiment analysis and natural language processing [1, 2].

Columns

review: This column contains the full text of the movie review itself [1].

sentiment: This column indicates the sentiment classification for the corresponding review. A value of 0 represents a positive sentiment, while 1 signifies a negative sentiment [1].

Distribution

The dataset comprises 10,000 unique movie reviews [1]. It is typically distributed in a CSV data file format [3]. Out of the total reviews, 5,037 are classified as positive (0) and 4,963 are classified as negative (1), offering a balanced distribution for training purposes [1]. The dataset is available globally [2].
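Note that this dataset encodes positive as 0 and negative as 1, the opposite of the convention many tools assume; for example, scikit-learn's `f1_score` treats label 1 as the positive class by default (`pos_label=1`). A small sketch for flipping the encoding before training or scoring; the function name is hypothetical.

```python
def to_positive_is_one(labels):
    """Map this dataset's encoding (0 = positive, 1 = negative) to 1 = positive, 0 = negative."""
    flip = {0: 1, 1: 0}
    return [flip[y] for y in labels]  # raises KeyError on any value other than 0 or 1


print(to_positive_is_one([0, 1, 1, 0]))  # made-up labels
```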

Usage

This dataset is particularly well-suited for a variety of applications, including:
* Training and evaluating sentiment analysis models [1].
* Developing and testing algorithms for binary text classification [4].
* Enhancing Natural Language Processing (NLP) capabilities [4].
* Experimenting with Transformers and PyTorch models for text understanding [4].

Coverage

The dataset focuses on general movie reviews [1, 2]. While specific geographic or demographic details about the original reviewers are not provided, the dataset is listed as having a global region coverage [2]. The listing date for the dataset is noted as 17/06/2025 [2].

License

CC0

Who Can Use It

This dataset is valuable for:
* Data scientists and machine learning engineers who need labelled text data to build and improve sentiment prediction models [1].
* Researchers in the fields of NLP and artificial intelligence exploring text classification techniques [4].
* Students and developers learning about text data processing and sentiment analysis [5].

Dataset Name Suggestions

IMDB Movie Review Sentiment (10K)

Film Review Binary Sentiment Dataset

Movie Sentiment Classification Dataset

Textual Movie Review Sentiment Analysis

Attributes

Original Data Source: imdb_sentiment_10k_reviews_binary_classification
‘Data for Aspect Based Sentimental Analysis (ABSA)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Data for Aspect Based Sentimental Analysis (ABSA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-data-for-aspect-based-sentimental-analysis-absa-ccb8/010e645a/?iid=000-672&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Data for Aspect Based Sentimental Analysis (ABSA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandhalkhandeka/data-for-aspect-based-sentimental-analysis on 28 January 2022.

--- Dataset description provided by original source is as follows ---

This data consists of reviews of an app, each paired with a feature: a word taken from the review. Aspect-based sentiment analysis can be used to determine the sentiment toward that word with respect to the review text.
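The source does not describe a method, but a common first step in aspect-based analysis is to isolate the words surrounding the aspect term, since nearby words tend to carry the sentiment toward it. A minimal sketch, assuming single-word aspects and whitespace tokenization; the function name and review text are hypothetical.

```python
from typing import Optional


def aspect_context(review: str, aspect: str, window: int = 3) -> Optional[str]:
    """Return up to `window` words on each side of the first occurrence of `aspect`."""
    tokens = review.split()
    # Compare case-insensitively and ignore punctuation attached to words.
    normalized = [tok.strip(".,!?;:").lower() for tok in tokens]
    target = aspect.lower()
    for i, tok in enumerate(normalized):
        if tok == target:
            start = max(0, i - window)
            return " ".join(tokens[start:i + window + 1])
    return None  # aspect word not found as a standalone token


review = "The app is great but the login screen keeps crashing."  # made-up review
print(aspect_context(review, "login", window=2))
```

The resulting snippet could then be fed to any sentence-level sentiment classifier, or paired with the aspect word as input to a model trained for aspect-based sentiment.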

--- Original source retains full ownership of the source dataset ---
Turkish Tweet Sentiment Analysis Dataset
opendatabay.com
Updated Jul 6, 2025
Cite
Datasimple (2025). Turkish Tweet Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/d13ec616-4d59-4471-8b41-1f69546351ed
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset comprises over 11,000 tweets primarily in Turkish, curated to facilitate sentiment analysis and the detection of cyberbullying within social media contexts. Each tweet is pre-labelled with either a positive or negative sentiment, making it ideal for training and evaluating machine learning models. The dataset's creation was specifically driven by a project focused on identifying cyberbullying, providing a valuable resource for similar research and development efforts.

Columns

Tip (Turkish for "type"): This column indicates the sentiment label for each tweet, classifying it as either 'Positive' or 'Negative'.

Paylaşım (Turkish for "post"): This column contains the full text of the tweet itself.

Distribution

The dataset is structured as a collection of individual social media posts. It contains 11,006 unique entries, with sentiment distribution approximately 55% positive and 45% negative. The specific file format is typically CSV, though a sample file will be made available separately on the platform.
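Because the headers are Turkish and the text contains characters such as ş, ı, and ü, it can help to rename the columns on load and read the file as UTF-8. A minimal sketch using the standard library; the English names `label` and `text` are illustrative choices, and UTF-8 encoding of the real file is an assumption (pass `open(path, newline="", encoding="utf-8")` to `csv.DictReader` in that case).

```python
import csv
import io

# Turkish headers from the listing mapped to illustrative English names
# ("Tip" means "type", "Paylaşım" means "post").
COLUMN_MAP = {"Tip": "label", "Paylaşım": "text"}


def load_turkish_tweets(csv_text: str) -> list:
    """Parse CSV text with the Turkish headers; raises KeyError on unexpected headers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{COLUMN_MAP[key]: value for key, value in row.items()} for row in reader]


# Made-up rows: "Bugün harika bir gün!" = "Today is a great day!",
# "Çok kötü bir yorum" = "A very bad comment".
sample = "Tip,Paylaşım\nPositive,Bugün harika bir gün!\nNegative,Çok kötü bir yorum\n"
rows = load_turkish_tweets(sample)
positive_share = sum(r["label"] == "Positive" for r in rows) / len(rows)
print(rows[0], f"positive share: {positive_share:.2f}")
```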

Usage

This dataset is particularly well-suited for applications such as:
* Developing and testing algorithms for social media sentiment analysis.
* Building models for the detection and classification of online cyberbullying.
* Research in Natural Language Processing (NLP) and Deep Learning, especially concerning Turkish text.
* General data science and analytics projects requiring labelled social media data.

Coverage

The dataset primarily covers social media content written in Turkish, making it linguistically specific. While no explicit geographic or time range is specified for the tweets' origin or collection period, its focus on Turkish language posts implies a scope relevant to Turkish-speaking online communities.

License

CC0

Who Can Use It

This dataset is designed for use by:
* Data Scientists: For developing and refining sentiment analysis and classification models.
* Machine Learning Engineers: To train and test deep learning models on text data.
* NLP Researchers: For studies on linguistic patterns, sentiment, and cyberbullying detection in Turkish.
* Academics and Students: For educational projects, research, and thesis work related to social media analysis and AI.
* Organisations: Looking to implement social media monitoring or content moderation systems.

Dataset Name Suggestions

Turkish Tweet Sentiment Analysis Dataset

Social Media Cyberbullying Tweets (Turkish)

Turkish Sentiment Labelled Tweets

Turkish Social Media Sentiment Dataset

Attributes

Original Data Source: Türkçe Sosyal Medya Paylaşımı Veri Seti (Turkish Social Media Post Dataset)
‘Stock Sentiment’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Stock Sentiment’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-stock-sentiment-6560/latest
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Stock Sentiment’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/purvitsharma/stock-sentiment on 14 February 2022.

--- Dataset description provided by original source is as follows ---

Introduction

The Stock Sentiment dataset contains text (tweets) related to changes in stock prices, along with other relevant information, such as whether the customer is willing to buy or sell the stock.

Data Info

There are two columns in the dataset: the text and its respective sentiment regarding stocks.

--- Original source retains full ownership of the source dataset ---
IMDB 50K Movie Reviews (TEST your BERT)
kaggle.com
Updated Dec 19, 2019
Cite
Atul Anand {Jha} (2019). IMDB 50K Movie Reviews (TEST your BERT) [Dataset]. https://www.kaggle.com/atulanandjha/imdb-50k-movie-reviews-test-your-bert/discussion
Available format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated
Dec 19, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atul Anand {Jha}
License
GNU LGPL 3.0 (http://www.gnu.org/licenses/lgpl-3.0.html)
Description
Context

Large Movie Review Dataset v1.0. 😃


This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag-of-words formats are provided.

In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain a disjoint set of movies, so no significant performance is obtained by memorising movie-unique terms and their association with observed labels. In the labelled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus, reviews with more neutral ratings are not included in the train/test sets. In the unsupervised set, reviews of any rating are included, and there are an even number of reviews > 5 and <= 5.
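The labelling rule above can be written as a small function that maps a 1 to 10 rating to a label and drops the neutral middle. The "pos"/"neg" tag names follow the tagging described for this dataset; the function name is hypothetical.

```python
from typing import Optional


def rating_to_label(score: int) -> Optional[str]:
    """Apply the labelling thresholds (<= 4 negative, >= 7 positive) to a 1-10 rating."""
    if not 1 <= score <= 10:
        raise ValueError(f"rating out of range: {score}")
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None  # ratings 5 and 6 are excluded from the labelled train/test sets


for s in (2, 5, 6, 9):
    print(s, rating_to_label(s))
```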

Reference: http://ai.stanford.edu/~amaas/data/sentiment/

NOTE

A starter kernel is here: https://www.kaggle.com/atulanandjha/bert-testing-on-imdb-dataset-starter-kernel

A kernel to expose the dataset collection:

Content

Now let's understand the task at hand: given a movie review, predict whether it is positive or negative.

The dataset we use is 50,000 IMDB reviews (25K for train and 25K for test) from the PyTorch-NLP library.

Each review is tagged pos or neg.

Both the train and test sets contain 50% positive and 50% negative reviews.

Columns:

text: Reviews from people.

Sentiment: Negative or Positive tag on the review/feedback (Boolean).
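Because the Sentiment column is described as a Boolean tag, a common first step is to encode it as 1/0 and confirm the 50/50 balance. This is a minimal sketch, assuming the tags are strings such as "pos"/"neg" (as stated above) or "positive"/"negative"; the helper names `encode_sentiment` and `class_balance` are hypothetical, and the sample tags are toy values, not rows from the dataset.

```python
from collections import Counter

# Assumption: tags are "pos"/"neg" or "positive"/"negative", any casing.
POSITIVE = {"pos", "positive"}
NEGATIVE = {"neg", "negative"}


def encode_sentiment(tag: str) -> int:
    """Return 1 for a positive tag and 0 for a negative tag."""
    t = tag.strip().lower()
    if t in POSITIVE:
        return 1
    if t in NEGATIVE:
        return 0
    raise ValueError(f"unexpected sentiment tag: {tag!r}")


def class_balance(tags) -> float:
    """Fraction of positive labels among the given sentiment tags."""
    if not tags:
        raise ValueError("no tags given")
    counts = Counter(encode_sentiment(t) for t in tags)
    return counts[1] / sum(counts.values())


sample = ["pos", "neg", "Positive", "negative"]  # toy tags, not real data
print([encode_sentiment(t) for t in sample])  # [1, 0, 1, 0]
print(class_balance(sample))  # 0.5
```

On the full train or test split, `class_balance` should come out at 0.5 if the split is balanced as described.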

Acknowledgements

When using this dataset, please cite this ACL paper:

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

Link to the reference dataset: https://pytorchnlp.readthedocs.io/en/latest/_modules/torchnlp/datasets/imdb.html

https://www.samyzaf.com/ML/imdb/imdb.html

Inspiration

BERT and other Transformer-architecture models have attracted a great deal of attention recently because they brought a major breakthrough by introducing transfer learning to NLP. So let's use this simple yet effective dataset to test these models and compare our results with theirs. I also invite fellow researchers to try out their state-of-the-art algorithms on this dataset.
AI Training Dataset Report
datainsightsmarket.com
doc, pdf, ppt
Updated Apr 30, 2025
Cite
Data Insights Market (2025). AI Training Dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1501897
Available download formats: pdf, doc, ppt
Dataset updated
Apr 30, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the urgent need for high-quality data to train sophisticated AI models capable of handling complex tasks. Key application areas, such as autonomous vehicles in the automotive industry, advanced medical diagnosis in healthcare, and personalized experiences in retail and e-commerce, are significantly contributing to this market's upward trajectory. The prevalence of text, image/video, and audio data types further diversifies the market, offering opportunities for specialized dataset providers. While the market faces challenges like data privacy concerns and the high cost of data annotation, the overall trajectory remains positive, with a projected Compound Annual Growth Rate (CAGR) exceeding 20% for the forecast period (2025-2033). This growth is further supported by advancements in deep learning techniques that demand increasingly larger and more diverse datasets for optimal performance. Leading companies like Google, Amazon, and Microsoft are actively investing in this space, expanding their dataset offerings and fostering competition within the market. Furthermore, the emergence of specialized data annotation providers caters to the specific needs of various industries, ensuring accurate and reliable data for AI model development. The geographic distribution of the market reveals strong presence in North America and Europe, driven by early adoption of AI technologies and the presence of major technology players. However, Asia Pacific is projected to witness significant growth in the coming years, propelled by increasing digitalization and a burgeoning AI ecosystem in countries like China and India. Government initiatives promoting AI development in various regions are also expected to stimulate demand for high-quality training datasets. 
While challenges related to data security and ethical considerations remain, the long-term outlook for the AI training dataset market is exceptionally promising, fueled by the continued evolution of artificial intelligence and its increasing integration into various aspects of modern life. The market segmentation by application and data type allows for granular analysis and targeted investments for businesses operating in this rapidly expanding sector.
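The CAGR figure quoted above follows the standard compound-growth formula, CAGR = (V_end / V_start)^(1/n) - 1, where n is the number of years. This is a minimal sketch, assuming growth compounds once per year over the eight periods from 2025 to 2033; the report gives no absolute market sizes, so the start value below is illustrative only. Under that assumption, a CAGR of exactly 20% would multiply the market size by about 4.3 over the forecast period.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Size multiple after compounding `cagr` once a year for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """CAGR implied by growing from start_value to end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# 2025 -> 2033 spans 8 annual compounding periods.
m = growth_multiple(0.20, 2033 - 2025)
print(round(m, 2))  # 4.3
# Round trip: recover the 20% rate from an illustrative start value of 100.
print(round(implied_cagr(100.0, 100.0 * m, 8), 4))  # 0.2
```

Because the report says the CAGR is "exceeding 20%", the 4.3x multiple is a lower bound under this annual-compounding assumption rather than a forecast in itself.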
‘⭐ McDonalds Review Sentiment’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘⭐ McDonalds Review Sentiment’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-mcdonalds-review-sentiment-6d6c/9da444f4/?iid=000-968&v=presentation
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘⭐ McDonalds Review Sentiment’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/mcdonalds-review-sentimente on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

A sentiment analysis of negative McDonald's reviews. Contributors were given reviews culled from low-rated McDonald's locations in random metro areas and asked to classify why the locations received low reviews. Options given were:

* Rude Service
* Slow Service
* Problem with Order
* Bad Food
* Bad Neighborhood
* Dirty Location
* Cost
* Missing Item

Added: March 6, 2015 by CrowdFlower | Data Rows: 1500

Source: https://www.crowdflower.com/data-for-everyone/

This dataset was created by CrowdFlower and contains around 2000 samples along with Unit State, Policies Violated, technical information, and other features such as:

- Review
- Policies Violated Gold
- and more.

How to use this dataset

Analyze Policies Violated:confidence in relation to City

Study the influence of Last Judgment At on Trusted Judgments
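The first suggestion, analyzing Policies Violated:confidence in relation to City, amounts to a per-city average. This is a minimal sketch using only the standard library; the key names `city` and `policies_violated:confidence` are assumptions about how the CrowdFlower export names these columns, the function name is hypothetical, and the rows are toy values, not real data.

```python
from collections import defaultdict


def mean_confidence_by_city(rows):
    """Average the policies_violated:confidence score for each city.

    `rows` is an iterable of dicts (e.g. from csv.DictReader). The key
    names are assumed, not confirmed column names. Rows with an empty
    confidence value are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        raw = row.get("policies_violated:confidence", "")
        if raw in ("", None):
            continue  # no confidence recorded for this review
        totals[row["city"]] += float(raw)
        counts[row["city"]] += 1
    return {city: totals[city] / counts[city] for city in totals}


rows = [  # toy rows for illustration only
    {"city": "CityA", "policies_violated:confidence": "1.0"},
    {"city": "CityA", "policies_violated:confidence": "0.5"},
    {"city": "CityB", "policies_violated:confidence": "0.8"},
    {"city": "CityB", "policies_violated:confidence": ""},
]
print(mean_confidence_by_city(rows))  # {'CityA': 0.75, 'CityB': 0.8}
```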


Acknowledgements

If you use this dataset in your research, please credit CrowdFlower.

--- Original source retains full ownership of the source dataset ---
AI-Generated Tech News Summaries
kaggle.com
zip
Updated Mar 20, 2025
Cite
Parth Tyagi (2025). AI-Generated Tech News Summaries [Dataset]. https://www.kaggle.com/datasets/tyagi586/ai-generated-tech-news-summaries
Available download formats: zip (0 bytes)
Dataset updated
Mar 20, 2025
Authors
Parth Tyagi
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset contains 200+ summarized tech news articles covering AI, machine learning, robotics, cybersecurity, and more. Each entry includes:

✅ Headline (original news title)
✅ Source & publication date
✅ News summary (AI-generated short version)
✅ Category (AI, Cybersecurity, Startups, etc.)
✅ Sentiment analysis (Positive, Neutral, Negative)
✅ Keywords (key topics covered)
✅ Original article link

🔹 Perfect for NLP projects, sentiment analysis, and trend analysis!
‘STS-Gold Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘STS-Gold Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sts-gold-dataset-8075/ffd81669/?iid=000-180&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘STS-Gold Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/divyansh22/stsgold-dataset on 28 January 2022.

--- Dataset description provided by original source is as follows ---

This dataset was originally prepared by Saif et al. (2013). Please cite the paper if you intend to use this dataset.

Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

The dataset consists of 2026 tweets with their corresponding IDs and polarity labels. Use this data to train your sentiment analysis models.

--- Original source retains full ownership of the source dataset ---


Cite

M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset

Twitter Tweets Sentiment Dataset

Twitter Tweets Sentiment Analysis for Natural Language Processing

39 scholarly articles cite this dataset.

Croissant: a format for machine-learning datasets (learn more at mlcommons.org/croissant).

Dataset updated

Apr 8, 2022

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

M Yasser H

License

https://creativecommons.org/publicdomain/zero/1.0/

Description


Description:

Twitter is an online social media platform where people share their thoughts as tweets. It has been observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier model that distinguishes negative tweets so that such tweets can be blocked. Can you build a strong classifier model to predict the same?

Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

When parsing the CSV, make sure to remove the beginning and ending quotes from the text field so that you don't include them in your training data.
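Python's standard `csv` module already strips the enclosing quotes from quoted fields (and keeps commas inside them), so reading the file with it rather than splitting lines by hand avoids leaking quotes into the training text. This is a minimal sketch, assuming the header uses the column names listed below (`textID`, `text`, `sentiment`); the helper name `read_tweets` and the sample row are hypothetical.

```python
import csv
import io


def read_tweets(csv_text: str):
    """Parse tweet rows into dicts keyed by the CSV header.

    csv.DictReader removes the surrounding double quotes from quoted
    fields and unescapes doubled quotes, so the text field comes back
    without the leading/trailing quote characters. Whitespace inside the
    field is left untouched so character spans stay valid.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


# Toy input: the text field is quoted because it contains a comma.
sample = 'textID,text,sentiment\nabc123,"I love this, thanks",positive\n'
rows = read_tweets(sample)
print(rows[0]["text"])  # I love this, thanks
```

For a file on disk, the same reader works with `open(path, newline="", encoding="utf-8")` in place of `io.StringIO`.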

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
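Since the target is an exact span of the tweet, a useful sanity check is to locate `selected_text` inside `text` by character offsets. This is a minimal sketch; the function name `find_span` is hypothetical, and it returns only the first occurrence, so a phrase that appears twice in a tweet would need extra disambiguation.

```python
from typing import Optional, Tuple


def find_span(text: str, selected_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of selected_text in text.

    The match is exact, so commas, spaces and other punctuation inside
    the span are kept. `end` is exclusive, matching Python slicing.
    Returns None if selected_text does not occur in text.
    """
    start = text.find(selected_text)
    if start == -1:
        return None
    return start, start + len(selected_text)


tweet = "so happy, thanks for the tea!"  # toy tweet, not from the dataset
span = find_span(tweet, "so happy,")
print(span)  # (0, 9)
print(tweet[span[0]:span[1]])  # so happy,
```

Rows where `find_span` returns `None` usually point to parsing problems, such as stray quotes left in one of the two fields.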

Columns:

textID - unique ID for each piece of text
text - the text of the tweet
sentiment - the general sentiment of the tweet

Acknowledgement:

The dataset was downloaded from a Kaggle competition:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

Understand the dataset and clean it up (if required).
Build classification models to predict Twitter sentiments.
Compare the evaluation metrics of various classification algorithms.
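To compare classifiers on equal terms, the same metrics can be computed for each model's predictions. This is a minimal sketch using accuracy and macro-averaged F1 (where per-class F1 = 2TP / (2TP + FP + FN)), assuming the three labels negative, neutral, and positive; the model names and predictions are hypothetical toy values, not results on this dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


gold = ["negative", "neutral", "positive", "neutral"]
preds = {  # hypothetical model outputs for illustration
    "model_a": ["negative", "neutral", "positive", "positive"],
    "model_b": ["neutral", "neutral", "neutral", "neutral"],
}
for name, pred in preds.items():
    print(name, round(accuracy(gold, pred), 3), round(macro_f1(gold, pred), 3))
# model_a 0.75 0.778
# model_b 0.5 0.222
```

Macro-F1 is included alongside accuracy because a model that always predicts the majority class (like `model_b`) can still reach a reasonable accuracy while scoring poorly on the minority classes.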
