76 datasets found
  1. Twitter Tweets Sentiment Dataset

    • kaggle.com
    • opendatabay.com
    Updated Apr 8, 2022
    Cite
    M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset
    Dataset updated
    Apr 8, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    M Yasser H
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description:

    Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to post hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that identifies negative tweets so they can be blocked. Can you build a strong classifier model to predict this?

    Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

    Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.

    You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

    Columns:

    1. textID - unique ID for each piece of text
    2. text - the text of the tweet
    3. sentiment - the general sentiment of the tweet
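
    The quote-stripping advice above can be sketched with the standard csv module. This is a minimal illustration, not the author's code: the two sample rows are invented, and only the column names (textID, text, sentiment) come from the listing.

```python
import csv
import io

# Hypothetical two-row sample in the column layout listed above; not real data.
SAMPLE_CSV = '''textID,text,sentiment
a1,"""I love this!""",positive
b2,"so tired, again",negative
'''

def load_rows(handle):
    """Read rows and strip stray leading/trailing double quotes from the text field."""
    rows = []
    for row in csv.DictReader(handle):
        # csv already removes the field-delimiting quotes; this removes any
        # literal quote characters left at the edges of the tweet text.
        row["text"] = row["text"].strip().strip('"')
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE_CSV))
```

    For a real file, pass an open file handle (opened with newline="") instead of the in-memory sample.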

    Acknowledgement:

    The dataset was downloaded from Kaggle Competitions:
    https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

    Objective:

    • Understand the Dataset & cleanup (if required).
    • Build classification models to predict the twitter sentiments.
    • Compare the evaluation metrics of various classification algorithms.
  2. ‘Sentiment Analysis Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Oct 17, 2016
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘Sentiment Analysis Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sentiment-analysis-dataset-caeb/f26f1fc2/?iid=004-932&v=presentation
    Dataset updated
    Oct 17, 2016
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Sentiment Analysis Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sonaam1234/sentimentdata on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Data for sentiment analysis

    --- Original source retains full ownership of the source dataset ---

  3. Twitter Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Feb 13, 2021
    + more versions
    Cite
    Zohair Ahmed (2021). Twitter Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/zohairahmed007/twitter-sentiment-analysis-dataset
    Available download formats: zip (38737743 bytes)
    Dataset updated
    Feb 13, 2021
    Authors
    Zohair Ahmed
    Description

    Dataset

    This dataset was created by Zohair Ahmed

    Contents

  4. ‘Financial Sentiment Analysis’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Financial Sentiment Analysis’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-financial-sentiment-analysis-5b39/latest
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Financial Sentiment Analysis’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sbhatti/financial-sentiment-analysis on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Data

    The following data is intended for advancing financial sentiment analysis research. It's two datasets (FiQA, Financial PhraseBank) combined into one easy-to-use CSV file. It provides financial sentences with sentiment labels.

    Citations

    Malo, Pekka, et al. "Good debt or bad debt: Detecting semantic orientations in economic texts." Journal of the Association for Information Science and Technology 65.4 (2014): 782-796.

    --- Original source retains full ownership of the source dataset ---

  5. ‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Product Reviews and Ratings (Sentiment Analysis)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-product-reviews-and-ratings-sentiment-analysis-fb82/latest
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Product Reviews and Ratings (Sentiment Analysis)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mafaisal007/product-reviews-and-ratings-sentiment-analysis on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset is from a toy store in Europe and contains customer reviews about a particular product. It is intended for text mining and sentiment analysis.

    --- Original source retains full ownership of the source dataset ---

  6. NLP Preprocessed Sentiment Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). NLP Preprocessed Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/6323a1b5-7112-49bd-ad55-c1ef6968abc3
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset is a substantial collection of over 241,000 English-language comments, gathered from various online platforms. Each comment within the dataset has been carefully annotated with a sentiment label: 0 for negative sentiment, 1 for neutral, and 2 for positive. The primary aim of this dataset is to facilitate the training and evaluation of multi-class sentiment analysis models, designed to work effectively with real-world text data. The dataset has undergone a preprocessing stage, ensuring comments are in lowercase, and are cleaned of punctuation, URLs, numbers, and stopwords, making it readily usable for Natural Language Processing (NLP) pipelines.

    Columns

    • Comment: This column contains the user-generated text content.
    • Sentiment: This column provides the corresponding sentiment label for each comment, where 0 denotes Negative, 1 denotes Neutral, and 2 denotes Positive.

    Distribution

    The dataset comprises over 241,000 records. While the specific file format is not detailed, such datasets are typically provided in a tabular format, often as a CSV file. It is structured with two distinct columns as described above, suitable for direct integration into machine learning workflows.
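
    As a rough illustration of the label scheme described above (0 negative, 1 neutral, 2 positive), the sketch below maps integer labels to names and computes class shares. The sample labels are invented; the helper name label_distribution is hypothetical and not part of the dataset.

```python
from collections import Counter

# Label scheme as stated in the listing: 0 = negative, 1 = neutral, 2 = positive.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

def label_distribution(labels):
    """Return the share of each sentiment class among the given integer labels."""
    counts = Counter(LABEL_NAMES[int(label)] for label in labels)
    total = sum(counts.values())
    return {name: counts[name] / total for name in LABEL_NAMES.values()}

# Invented sample standing in for the Sentiment column.
sample = [0, 2, 2, 1, 2, 0, 1, 2]
dist = label_distribution(sample)
```

    Checking the class shares before training is a cheap way to see whether a multi-class model needs class weighting or resampling.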

    Usage

    This dataset is ideally suited for a variety of applications and use cases, including:

    • Training sentiment classifiers utilising advanced models such as LSTM, BiLSTM, CNN, BERT, or RoBERTa.
    • Evaluating the efficacy of different preprocessing and tokenisation strategies for text data.
    • Benchmarking NLP models on multi-class classification tasks to assess their performance.
    • Supporting educational projects and research initiatives in the fields of opinion mining or text classification.
    • Fine-tuning transformer models on a large and diverse collection of sentiment-annotated text.

    Coverage

    The dataset's coverage is global, comprising English-language comments. It focuses on general user-generated text content without specific demographic notes. The dataset is listed with a version of 1.0.

    License

    CC0

    Who Can Use It

    This dataset is suitable for individuals and organisations involved in data science and analytics. Intended users include:

    • Data Scientists and Machine Learning Engineers for developing and deploying sentiment analysis models.
    • Researchers and Academics for studies in NLP, text classification, and opinion mining.
    • Students undertaking educational projects in artificial intelligence and machine learning.

    Dataset Name Suggestions

    • Multi-class Comment Sentiment Data
    • User Text Sentiment Collection
    • Online Comment Sentiment Analysis Dataset
    • English Sentiment Labelled Comments
    • Preprocessed Sentiment Dataset

    Attributes

    Original Data Source: Sentiment Analysis Dataset

  7. Twitter dataset

    • figshare.com
    csv
    Updated Feb 11, 2025
    Cite
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
    Available download formats: csv
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    figshare
    Authors
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.

  8. Stock Market Dataset for Predictive Analysis

    • kaggle.com
    Updated Feb 24, 2025
    Cite
    WARNER (2025). Stock Market Dataset for Predictive Analysis [Dataset]. https://www.kaggle.com/datasets/s3programmer/stock-market-dataset-for-predictive-analysis
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    WARNER
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This Stock Market Dataset is designed for predictive analysis and machine learning applications in financial markets. It includes 13647 records of simulated stock trading data with features commonly used in stock price forecasting.

    🔹 Key Features

    • Date – Trading day timestamps (business days only)
    • Open, High, Low, Close – Simulated stock prices
    • Volume – Trading volume per day
    • RSI (Relative Strength Index) – Measures market momentum
    • MACD (Moving Average Convergence Divergence) – Trend-following momentum indicator
    • Sentiment Score – Simulated market sentiment from financial news & social media
    • Target – Binary label (1: Price goes up, 0: Price goes down) for next-day prediction

    This dataset is useful for training hybrid deep learning models such as LSTM, CNN, and Attention-based networks for stock market forecasting. It enables financial analysts, traders, and AI researchers to experiment with market trends, technical analysis, and sentiment-based predictions.
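
    The listing does not say how the Target column was generated. One plausible reading, sketched below purely as an assumption, is that each day's label compares the next day's close with the current close. The function name and the sample prices are hypothetical, and treating an unchanged price as 0 is a choice made here, not something the source states.

```python
def next_day_targets(closes):
    """Label each day 1 if the next day's close is higher, else 0.

    The last day has no following close, so it gets no label.
    An unchanged price is labelled 0 (an assumption; the source only
    distinguishes "up" and "down").
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]

# Invented closing prices, not taken from the dataset.
closes = [100.0, 101.5, 101.0, 101.0, 103.2]
targets = next_day_targets(closes)
```

    Because the label looks one day ahead, features for a given row should use only information available up to that day to avoid leakage.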

  9. ‘Sentiment Analysis of Commodity News (Gold)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 27, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘Sentiment Analysis of Commodity News (Gold)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sentiment-analysis-of-commodity-news-gold-732f/e3232de2/?iid=002-045&v=presentation
    Dataset updated
    Sep 27, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Sentiment Analysis of Commodity News (Gold)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankurzing/sentiment-analysis-in-commodity-market-gold on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This is a news dataset for the commodity market where we have manually annotated 11,412 news headlines across multiple dimensions into various classes. The dataset has been sampled from a period of 20+ years (2000-2021).

    Content

    The dataset has been collected from various news sources and annotated by three human annotators who were subject experts. Each news headline was evaluated on various dimensions, for instance - if a headline is a price related news then what is the direction of price movements it is talking about; whether the news headline is talking about the past or future; whether the news item is talking about asset comparison; etc.

    Acknowledgements

    Sinha, Ankur, and Tanmay Khandait. "Impact of News on the Commodity Market: Dataset and Results." In Future of Information and Communication Conference, pp. 589-601. Springer, Cham, 2021.

    https://arxiv.org/abs/2009.04202 Sinha, Ankur, and Tanmay Khandait. "Impact of News on the Commodity Market: Dataset and Results." arXiv preprint arXiv:2009.04202 (2020)

    We would like to acknowledge the financial support provided by the India Gold Policy Centre (IGPC).

    Inspiration

    Commodity prices are known to be quite volatile. Machine learning models that understand commodity news well can provide an additional input to short-term and long-term price forecasting models. The dataset will also be useful in creating news-based indicators for commodities.

    Apart from researchers and practitioners working in the area of news analytics for commodities, the dataset will also be useful for researchers looking to evaluate their models on classification problems in the context of text-analytics. Some of the classes in the dataset are highly imbalanced and may pose challenges to the machine learning algorithms.

    --- Original source retains full ownership of the source dataset ---

  10. Data from: Financial News Sentiment Analysis Dataset

    • opendatabay.com
    Updated Jul 3, 2025
    Cite
    Datasimple (2025). Financial News Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/dataset/6c0503f5-8003-44c8-a56c-d65bec8e5d40
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Datasimple
    Area covered
    Data Science and Analytics
    Description

    This dataset provides fine-grained financial sentiment analysis on news headlines. It is a human-annotated collection of over 10,700 news headlines, specifically designed to address the challenging task of sentiment extraction in financial news where multiple entities may be present, often with conflicting sentiments. Over 2,800 headlines within the dataset feature multiple entities. The dataset is balanced, containing more than 4,100 positive entities, 3,200 negative entities, and 4,500 neutral entities. It is ideal for furthering research in entity-aware sentiment analysis and can be used for training models for extracting financial named entities.

    Columns

    • S No.: Represents the serial number for each entry.
    • Title: Contains the news headlines.
    • Decisions: Provides the sentiment annotations for various financial entities identified within the news headlines.
    • Words: Indicates the number of words in each news headline.

    Distribution

    The dataset is provided as a data file, typically in CSV format. It contains over 10,700 distinct news headlines. The structure includes human-annotated sentiment labels for financial entities, with sentiments balanced across positive, negative, and neutral categories. The dataset includes headlines with varying word counts, and its sentiment annotations are quite diverse, representing 10,686 unique values for entity sentiments.
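
    The listing does not show what a Decisions cell looks like. Assuming, for illustration only, that each cell is a JSON object string mapping an entity name to its sentiment label, entity-level labels could be unpacked as follows. The sample rows, entity names, and the helper entity_sentiments are all hypothetical.

```python
import json
from collections import Counter

def entity_sentiments(decisions):
    """Parse one Decisions cell into (entity, sentiment) pairs.

    Assumes the cell is a JSON object string; the real file format
    may differ and would need a different parser.
    """
    parsed = json.loads(decisions)
    return [(entity, label.lower()) for entity, label in parsed.items()]

# Invented rows using the Title and Decisions column names from the listing.
sample_rows = [
    {"Title": "Acme gains while Globex slips",
     "Decisions": '{"Acme": "positive", "Globex": "negative"}'},
    {"Title": "Initech holds steady",
     "Decisions": '{"Initech": "neutral"}'},
]

# Count sentiment labels per entity, not per headline.
counts = Counter(
    label
    for row in sample_rows
    for _, label in entity_sentiments(row["Decisions"])
)

# Headlines mentioning more than one entity.
multi = [row["Title"] for row in sample_rows
         if len(entity_sentiments(row["Decisions"])) > 1]
```

    Counting per entity rather than per headline matters here because a single headline can carry conflicting sentiments for different entities.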

    Usage

    This dataset is well-suited for several applications:

    • Performing Aspect-based Sentiment Analysis on financial texts.
    • Training machine learning models for the extraction of named financial entities.
    • Conducting research into fine-grained financial sentiment analysis.
    • Validating the effect of news sentiments on aggregate market movements.
    • Developing and evaluating learning schemes, including those utilising lexicon-based and pre-trained sentence representations, and various classification approaches.

    Coverage

    The dataset has a global regional coverage. While the listing date is 08/06/2025, the underlying research for SEntFiN 1.0 was published in 2022. The news headlines themselves do not specify a fixed time range within the provided information, focusing instead on the content and its annotations. No specific demographic scope is outlined, but the content is inherently focused on financial markets and related entities.

    License

    CC-BY

    Who Can Use It

    This dataset is beneficial for a range of users and purposes:

    • Researchers focusing on financial Natural Language Processing (NLP) and sentiment analysis.
    • Data Scientists and Analysts working on text classification, entity extraction, and market behaviour prediction.
    • Anyone involved in Business or Finance seeking to understand or model sentiment from news.
    • Developers of AI and Machine Learning models requiring human-annotated financial text data.

    Dataset Name Suggestions

    • SEntFiN 1.0
    • Financial News Entity Sentiment Headlines
    • Aspect-Based Financial Sentiment Data
    • Annotated Financial News for NLP
    • Financial News Sentiment Analysis Dataset

    Attributes

    Original Data Source: Aspect based Sentiment Analysis for Financial News

  11. Human Written Text

    • kaggle.com
    Updated May 13, 2025
    Cite
    Youssef Elebiary (2025). Human Written Text [Dataset]. https://www.kaggle.com/datasets/youssefelebiary/human-written-text
    Dataset updated
    May 13, 2025
    Dataset provided by
    Kaggle
    Authors
    Youssef Elebiary
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    This dataset contains 20000 pieces of text collected from Wikipedia, Gutenberg, and CNN/DailyMail. The text is cleaned by replacing symbols such as (.*?/) with a white space using automatic scripts and regex.
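
    The cleaning step above is described only loosely. A minimal sketch of one way to do it with a regex is shown below; the exact symbol set (taken literally from the characters listed above) and the extra whitespace-collapsing step are assumptions, not the author's script.

```python
import re

# Character class built from the symbols named in the description: ( . * ? / )
# Inside [...] these characters are literal, so no escaping is needed.
SYMBOLS = re.compile(r"[().*?/]")
SPACES = re.compile(r"\s+")

def clean_text(text):
    """Replace the listed symbols with a space, then collapse repeated whitespace."""
    replaced = SYMBOLS.sub(" ", text)
    return SPACES.sub(" ", replaced).strip()

cleaned = clean_text("Hello (world)? yes/no...")
```

    Note that this also removes sentence-ending periods, which may matter for tasks that depend on sentence boundaries.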

    Data Source Distribution

    1. 10,000 Wikipedia Articles: From the 20220301 dump.
    2. 3,000 Gutenberg Books: Via the GutenDex API.
    3. 7,000 CNN/DailyMail News Articles: From the CNN/DailyMail 3.0.0 dataset.

    Why These Sources

    The data was collected from these sources to ensure the highest level of integrity against AI-generated text.

    • Wikipedia: The 20220301 dataset was chosen to minimize the chance of including articles generated or heavily edited by AI.
    • Gutenberg: Books from this source are guaranteed to be written by real humans and span various genres and time periods.
    • CNN/DailyMail: These news articles were written by professional journalists and cover a variety of topics, ensuring diversity in writing style and subject matter.

    Dataset Structure

    The dataset consists of 5 CSV files.

    1. CNN_DailyMail.csv: Contains all processed news articles.
    2. Gutenberg.csv: Contains all processed books.
    3. Wikipedia.csv: Contains all processed Wikipedia articles.
    4. Human.csv: Combines all three datasets in order.
    5. Shuffled_Human.csv: This is the randomly shuffled version of Human.csv.

    Each file has 2 columns:

    • Title: The title of the item.
    • Text: The content of the item.

    Uses

    This dataset is suitable for a wide range of NLP tasks, including:

    • Training models to distinguish between human-written and AI-generated text (Human/AI classifiers).
    • Training LSTMs or Transformers for chatbots, summarization, or topic modeling.
    • Sentiment analysis, genre classification, or linguistic research.

    Disclaimer

    While the data was collected from these sources, it may not be 100% free of AI-generated text. Wikipedia articles may reflect systemic biases in contributor demographics. CNN/DailyMail articles may focus on specific news topics or regions.

    For details on how the dataset was created, click here to view the Kaggle notebook used.

    Licensing

    This dataset is published under the MIT License, allowing free use for both personal and commercial purposes. Attribution is encouraged but not required.

  12. Film Review Binary Sentiment Dataset

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Film Review Binary Sentiment Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/dfae630e-27db-4111-905d-0eb10d8f3ce7
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset provides a collection of 10,000 text reviews about films, each classified with a binary sentiment label: either positive or negative [1]. It serves as a small version of IMDB text reviews, ideal for machine learning projects focused on sentiment analysis and natural language processing [1, 2].

    Columns

    • review: This column contains the full text of the movie review itself [1].
    • sentiment: This column indicates the sentiment classification for the corresponding review. A value of 0 represents a positive sentiment, while 1 signifies a negative sentiment [1].

    Distribution

    The dataset comprises 10,000 unique movie reviews [1]. It is typically distributed in a CSV data file format [3]. Out of the total reviews, 5,037 are classified as positive (0) and 4,963 are classified as negative (1), offering a balanced distribution for training purposes [1]. The dataset is available globally [2].
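
    The balance claim can be checked with simple arithmetic on the counts quoted above. The sketch below computes the class shares and the accuracy of a trivial majority-class baseline, which is the floor any trained classifier should beat. Note that this listing encodes positive as 0 and negative as 1.

```python
# Counts quoted in the listing: 5,037 positive (label 0), 4,963 negative (label 1).
positive, negative = 5037, 4963
total = positive + negative

shares = {
    "positive": positive / total,
    "negative": negative / total,
}

# Accuracy obtained by always predicting the more frequent class.
majority_baseline = max(shares.values())
```

    With shares this close to 50/50, plain accuracy is a reasonable first metric for this dataset.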

    Usage

    This dataset is particularly well-suited for a variety of applications, including:

    • Training and evaluating sentiment analysis models [1].
    • Developing and testing algorithms for binary text classification [4].
    • Enhancing Natural Language Processing (NLP) capabilities [4].
    • Experimenting with Transformers and PyTorch models for text understanding [4].

    Coverage

    The dataset focuses on general movie reviews [1, 2]. While specific geographic or demographic details about the original reviewers are not provided, the dataset is listed as having a global region coverage [2]. The listing date for the dataset is noted as 17/06/2025 [2].

    License

    CC0

    Who Can Use It

    This dataset is valuable for:

    • Data scientists and machine learning engineers who need labelled text data to build and improve sentiment prediction models [1].
    • Researchers in the fields of NLP and artificial intelligence exploring text classification techniques [4].
    • Students and developers learning about text data processing and sentiment analysis [5].

    Dataset Name Suggestions

    • IMDB Movie Review Sentiment (10K)
    • Film Review Binary Sentiment Dataset
    • Movie Sentiment Classification Dataset
    • Textual Movie Review Sentiment Analysis

    Attributes

    Original Data Source: imdb_sentiment_10k_reviews_binary_classification

  13. ‘Data for Aspect Based Sentimental Analysis (ABSA)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Data for Aspect Based Sentimental Analysis (ABSA)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-data-for-aspect-based-sentimental-analysis-absa-ccb8/010e645a/?iid=000-672&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Data for Aspect Based Sentimental Analysis (ABSA)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/kandhalkhandeka/data-for-aspect-based-sentimental-analysis on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    This data consists of reviews about an app along with a feature consisting of a word from the review. We can use aspect based sentimental analysis to check for the sentiment of the word w.r.t the text in the review!

    --- Original source retains full ownership of the source dataset ---

  14. Turkish Tweet Sentiment Analysis Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Turkish Tweet Sentiment Analysis Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/d13ec616-4d59-4471-8b41-1f69546351ed
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset comprises over 11,000 tweets primarily in Turkish, curated to facilitate sentiment analysis and the detection of cyberbullying within social media contexts. Each tweet is pre-labelled with either a positive or negative sentiment, making it ideal for training and evaluating machine learning models. The dataset's creation was specifically driven by a project focused on identifying cyberbullying, providing a valuable resource for similar research and development efforts.

    Columns

    • Tip: This column indicates the sentiment label for each tweet, classifying it as either 'Positive' or 'Negative'.
    • Paylaşım: This column contains the full text of the tweet itself.

    Distribution

    The dataset is structured as a collection of individual social media posts. It contains 11,006 unique entries, with sentiment distribution approximately 55% positive and 45% negative. The specific file format is typically CSV, though a sample file will be made available separately on the platform.

    Usage

    This dataset is particularly well-suited for applications such as:

    • Developing and testing algorithms for social media sentiment analysis.
    • Building models for the detection and classification of online cyberbullying.
    • Research in Natural Language Processing (NLP) and Deep Learning, especially concerning Turkish text.
    • General data science and analytics projects requiring labelled social media data.

    Coverage

    The dataset primarily covers social media content written in Turkish, making it linguistically specific. While no explicit geographic or time range is specified for the tweets' origin or collection period, its focus on Turkish language posts implies a scope relevant to Turkish-speaking online communities.

    License

    CC0

    Who Can Use It

    This dataset is designed for use by:

    • Data Scientists: For developing and refining sentiment analysis and classification models.
    • Machine Learning Engineers: To train and test deep learning models on text data.
    • NLP Researchers: For studies on linguistic patterns, sentiment, and cyberbullying detection in Turkish.
    • Academics and Students: For educational projects, research, and thesis work related to social media analysis and AI.
    • Organisations: Looking to implement social media monitoring or content moderation systems.

    Dataset Name Suggestions

    • Turkish Tweet Sentiment Analysis Dataset
    • Social Media Cyberbullying Tweets (Turkish)
    • Turkish Sentiment Labelled Tweets
    • Turkish Social Media Sentiment Dataset

    Attributes

    Original Data Source: Türkçe Sosyal Medya Paylaşımı Veri Seti

  15. ‘Stock Sentiment’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Stock Sentiment’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-stock-sentiment-6560/latest
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Stock Sentiment’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/purvitsharma/stock-sentiment on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Introduction

    The Stock Sentiment dataset contains text (tweets) related to changes in stock prices, along with related information such as whether the customer is willing to buy or sell the stock.

    Data Info

    The dataset has two columns: the text and its corresponding sentiment regarding stocks.

    --- Original source retains full ownership of the source dataset ---

  16. IMDB 50K Movie Reviews (TEST your BERT)

    • kaggle.com
    Updated Dec 19, 2019
    Cite
    Atul Anand {Jha} (2019). IMDB 50K Movie Reviews (TEST your BERT) [Dataset]. https://www.kaggle.com/atulanandjha/imdb-50k-movie-reviews-test-your-bert/discussion
    Dataset updated
    Dec 19, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Atul Anand {Jha}
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Context

    Large Movie Review Dataset v1.0. 😃

    This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag-of-words formats are provided.

    In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings. Further, the train and test sets contain a disjoint set of movies, so no significant performance is obtained by memorising movie-unique terms and their association with observed labels. In the labelled train/test sets, a negative review has a score <= 4 out of 10, and a positive review has a score >= 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets. In the unsupervised set, reviews of any rating are included and there are an even number of reviews > 5 and <= 5.
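
    The labelling rule described above can be sketched in a few lines. This is an illustration only, not code from the dataset authors: the function name `label_from_rating` is hypothetical, the `pos`/`neg` strings follow the tags mentioned in this listing, and the 1 to 10 range check is an assumption about the rating scale.

```python
from typing import Optional


def label_from_rating(rating: int) -> Optional[str]:
    """Map a review score (out of 10) to the label used in the labelled sets.

    Returns "neg" for scores <= 4, "pos" for scores >= 7, and None for the
    more neutral scores (5 and 6), which the labelled sets leave out.
    """
    # Assumption: scores are integers on a 1-10 scale.
    if not 1 <= rating <= 10:
        raise ValueError(f"rating out of range: {rating}")
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None


ratings = [1, 4, 5, 6, 7, 10]
labels = [label_from_rating(r) for r in ratings]
print(labels)
```

    Returning `None` for neutral scores makes it easy to filter those reviews out before training, which mirrors how the labelled sets are described.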

    Reference: http://ai.stanford.edu/~amaas/data/sentiment/

    NOTE

    A starter kernel is here : https://www.kaggle.com/atulanandjha/bert-testing-on-imdb-dataset-starter-kernel

    A kernel to expose Dataset collection :

    Content

    Now let’s understand the task at hand: given a movie review, predict whether it’s positive or negative.

    The dataset we use is 50,000 IMDB reviews (25K for train and 25K for test) from the PyTorch-NLP library.

    Each review is tagged pos or neg.

    There are 50% positive reviews and 50% negative reviews both in train and test sets.

    Columns:

    text : Reviews from people.

    Sentiment : Negative or Positive tag on the review/feedback (Boolean).
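
    A quick way to confirm the stated 50/50 class balance is to count the labels after loading. The sketch below uses only the standard library and a tiny in-memory sample; the sample rows, the lowercase `sentiment` header, and the helper name `label_shares` are assumptions for illustration, so adjust them to the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for the real file.
SAMPLE_CSV = """text,sentiment
"A moving, beautifully shot film.",pos
"Two hours I will never get back.",neg
"Great cast, sharp script.",pos
"Flat characters and a dull plot.",neg
"""


def label_shares(csv_text: str, label_column: str = "sentiment") -> dict:
    """Return the fraction of rows carrying each label."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[label_column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


shares = label_shares(SAMPLE_CSV)
print(shares)
```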

    Acknowledgements

    When using this dataset, please cite the following ACL paper:

    @InProceedings{maas-EtAl:2011:ACL-HLT2011,
      author    = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
      title     = {Learning Word Vectors for Sentiment Analysis},
      booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
      month     = {June},
      year      = {2011},
      address   = {Portland, Oregon, USA},
      publisher = {Association for Computational Linguistics},
      pages     = {142--150},
      url       = {http://www.aclweb.org/anthology/P11-1015}
    }

    Link to ref Dataset: https://pytorchnlp.readthedocs.io/en/latest/_modules/torchnlp/datasets/imdb.html

    https://www.samyzaf.com/ML/imdb/imdb.html

    Inspiration

    BERT and other Transformer-architecture models have attracted a lot of attention recently, owing to the breakthrough of introducing transfer learning in NLP. So let's use this simple yet efficient dataset to test these models and compare our results with theirs. I also invite fellow researchers to try out their state-of-the-art algorithms on this dataset.

  17. AI Training Dataset Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 30, 2025
    Cite
    Data Insights Market (2025). AI Training Dataset Report [Dataset]. https://www.datainsightsmarket.com/reports/ai-training-dataset-1501897
    Available download formats: pdf, doc, ppt
    Dataset updated
    Apr 30, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The AI training dataset market is experiencing robust growth, driven by the increasing adoption of artificial intelligence across diverse sectors. The market's expansion is fueled by the urgent need for high-quality data to train sophisticated AI models capable of handling complex tasks. Key application areas, such as autonomous vehicles in the automotive industry, advanced medical diagnosis in healthcare, and personalized experiences in retail and e-commerce, are significantly contributing to this market's upward trajectory. The prevalence of text, image/video, and audio data types further diversifies the market, offering opportunities for specialized dataset providers.

    While the market faces challenges like data privacy concerns and the high cost of data annotation, the overall trajectory remains positive, with a projected Compound Annual Growth Rate (CAGR) exceeding 20% for the forecast period (2025-2033). This growth is further supported by advancements in deep learning techniques that demand increasingly larger and more diverse datasets for optimal performance. Leading companies like Google, Amazon, and Microsoft are actively investing in this space, expanding their dataset offerings and fostering competition within the market. Furthermore, the emergence of specialized data annotation providers caters to the specific needs of various industries, ensuring accurate and reliable data for AI model development.

    The geographic distribution of the market reveals strong presence in North America and Europe, driven by early adoption of AI technologies and the presence of major technology players. However, Asia Pacific is projected to witness significant growth in the coming years, propelled by increasing digitalization and a burgeoning AI ecosystem in countries like China and India. Government initiatives promoting AI development in various regions are also expected to stimulate demand for high-quality training datasets.

    While challenges related to data security and ethical considerations remain, the long-term outlook for the AI training dataset market is exceptionally promising, fueled by the continued evolution of artificial intelligence and its increasing integration into various aspects of modern life. The market segmentation by application and data type allows for granular analysis and targeted investments for businesses operating in this rapidly expanding sector.
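
    To put the quoted growth rate in perspective, the compounding arithmetic can be worked through directly. This is a rough sketch, not a figure from the report: it assumes a constant rate of exactly 20% (the report says "exceeding 20%") and treats 2025 as the base year, giving 8 compounding periods to 2033. The function name `growth_multiple` is hypothetical.

```python
def growth_multiple(cagr: float, periods: int) -> float:
    """Return the factor by which a quantity grows at a constant annual rate."""
    return (1.0 + cagr) ** periods


# Assumption: 2025 is the base year, so 2025-2033 spans 8 compounding periods.
PERIODS = 2033 - 2025
multiple = growth_multiple(0.20, PERIODS)
print(f"{multiple:.2f}x")  # prints "4.30x"
```

    Under these assumptions the market would end the forecast period at roughly 4.3 times its starting size; a higher rate or a different base year changes the result.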

  18. ‘⭐ McDonalds Review Sentiment’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘⭐ McDonalds Review Sentiment’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-mcdonalds-review-sentiment-6d6c/9da444f4/?iid=000-968&v=presentation
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘⭐ McDonalds Review Sentiment’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/mcdonalds-review-sentimente on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    A sentiment analysis of negative McDonald's reviews. Contributors were given reviews culled from low-rated McDonald's from random metro areas and asked to classify why the locations received low reviews. Options given were:

    • Rude Service
    • Slow Service
    • Problem with Order
    • Bad Food
    • Bad Neighborhood
    • Dirty Location
    • Cost
    • Missing Item

    Added: March 6, 2015 by CrowdFlower | Data Rows: 1500

    Source: https://www.crowdflower.com/data-for-everyone/

    This dataset was created by CrowdFlower and contains around 2000 samples along with Unit State, Policies Violated, technical information and other features such as: - Review - Policies Violated Gold - and more.

    How to use this dataset

    • Analyze Policies Violated:confidence in relation to City
    • Study the influence of Last Judgment At on Trusted Judgments

    Acknowledgements

    If you use this dataset in your research, please credit CrowdFlower

    --- Original source retains full ownership of the source dataset ---

  19. AI-Generated Tech News Summaries

    • kaggle.com
    zip
    Updated Mar 20, 2025
    Cite
    Parth Tyagi (2025). AI-Generated Tech News Summaries [Dataset]. https://www.kaggle.com/datasets/tyagi586/ai-generated-tech-news-summaries
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2025
    Authors
    Parth Tyagi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains 200+ summarized tech news articles covering AI, machine learning, robotics, cybersecurity, and more. Each entry includes:

    • Headline (original news title)
    • Source & Publication Date
    • News Summary (AI-generated short version)
    • Category (AI, Cybersecurity, Startups, etc.)
    • Sentiment Analysis (Positive, Neutral, Negative)
    • Keywords (key topics covered)
    • Original Article Link

    🔹 Perfect for NLP projects, sentiment analysis, and trend analysis!

  20. ‘STS-Gold Dataset’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘STS-Gold Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sts-gold-dataset-8075/ffd81669/?iid=000-180&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘STS-Gold Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/divyansh22/stsgold-dataset on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    This dataset was originally prepared by Saif et al. (2013). Please cite the paper if you intend to use this dataset.

    Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.

    The dataset consists of 2026 tweets, their corresponding ids, and polarity. Use this data to train your sentiment analysis models.

    --- Original source retains full ownership of the source dataset ---

Cite
M Yasser H (2022). Twitter Tweets Sentiment Dataset [Dataset]. https://www.kaggle.com/datasets/yasserh/twitter-tweets-sentiment-dataset

Twitter Tweets Sentiment Dataset

Twitter Tweets Sentiment Analysis for Natural Language Processing

39 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Apr 8, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
M Yasser H
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Description:

Twitter is an online social media platform where people share their thoughts as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by building a strong NLP-based classifier to distinguish negative tweets and block them. Can you build a strong classifier model to predict the same?

Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
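
One way to follow this advice is to let a real CSV parser handle the quoting instead of splitting lines on commas. The sketch below uses Python's standard `csv` module, which strips the enclosing quotes and keeps commas inside a quoted field intact. The sample rows and IDs are made up for illustration, and the helper name `read_rows` is hypothetical; the column names follow the ones mentioned in this listing.

```python
import csv
import io

# Hypothetical rows in the same shape as the training file.
SAMPLE_CSV = '''textID,text,selected_text,sentiment
a1,"I love this, truly","love this",positive
b2,"So tired of delays","tired of delays",negative
c3,Nothing much today,Nothing much today,neutral
'''


def read_rows(csv_text: str) -> list:
    """Parse CSV text into a list of dicts, one per tweet.

    csv.DictReader removes the enclosing quotes from quoted fields, so the
    text values come back without them and embedded commas are preserved.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    print(repr(row["text"]), row["sentiment"])
```

Whitespace is deliberately left untouched here, since trimming the text could shift the character span that `selected_text` is drawn from.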

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)

Columns:

  1. textID - unique ID for each piece of text
  2. text - the text of the tweet
  3. sentiment - the general sentiment of the tweet

Acknowledgement:

The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

  • Understand the dataset and clean it up (if required).
  • Build classification models to predict the Twitter sentiments.
  • Compare the evaluation metrics of various classification algorithms.