License: Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
This dataset is a curated collection of statements tagged with mental health statuses. It combines raw data from multiple sources, cleaned and compiled into a resource for developing chatbots and performing sentiment analysis.
The dataset integrates information from the following Kaggle datasets:
The dataset consists of statements tagged with one of the following seven mental health statuses:
- Normal
- Depression
- Suicidal
- Anxiety
- Stress
- Bi-Polar
- Personality Disorder
The data is sourced from diverse platforms including social media posts, Reddit posts, Twitter posts, and more. Each entry is tagged with a specific mental health status, making it an invaluable asset for:
This dataset is ideal for training machine learning models aimed at understanding and predicting mental health conditions based on textual data. It can be used in various applications such as:
This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Nitesh sureja
Released under Apache 2.0
License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains product reviews along with the corresponding product names, prices, review text, summaries, and sentiment labels. The sentiment labels indicate whether a review expresses a positive, negative, or neutral sentiment toward the product. One possible application is sentiment analysis of product reviews: machine learning algorithms could automatically classify reviews as positive, negative, or neutral based on the review text and associated metadata such as the product name and price. Businesses could use such a system to track customer sentiment toward their products and identify areas for improvement, and consumers could use it to make more informed purchasing decisions based on the experiences of others.
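The review-classification application described above can be sketched with a multinomial Naive Bayes classifier. This is a minimal standard-library illustration, not the dataset author's method: the toy training sentences and the function names `train_naive_bayes` and `predict` are hypothetical, and a real pipeline would train on the dataset's review and summary columns, typically with a library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict(model, text):
    priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Laplace-smoothed log likelihood of the token given the class
            score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy examples standing in for rows of the review dataset.
train_docs = [
    "great product works perfectly",
    "excellent quality love it",
    "terrible broke after one day",
    "awful waste of money",
    "it is okay nothing special",
    "average product does the job",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "love this great product"))  # positive
```

Working in log space avoids floating-point underflow when many token probabilities are multiplied together.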
License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a large-scale collection of 241,000+ English-language comments sourced from various online platforms. Each comment is annotated with a sentiment label:
The data has been gathered from multiple sources, including:
- Hugging Face: https://huggingface.co/datasets/Sp1786/multiclass-sentiment-analysis-dataset
- Kaggle:
  - https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset
  - https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis
  - https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
The goal is to enable training and evaluation of multi-class sentiment analysis models on real-world text data. The dataset is already preprocessed (lowercased, with punctuation, URLs, numbers, and stopwords removed) and is ready for NLP pipelines.
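The preprocessing described above can be approximated with a few regular expressions. This is a sketch only: the dataset's actual cleaning code and stopword list are not documented, so the small `STOPWORDS` set below is hypothetical and the output will not match the released text exactly.

```python
import re
import string

# Small illustrative stopword list (hypothetical); NLTK or spaCy provide fuller lists.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "to", "of", "in", "it", "this", "so"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop URLs before punctuation removal splits them up
    text = NUMBER_RE.sub(" ", text)     # drop digits
    text = text.translate(PUNCT_TABLE)  # strip ASCII punctuation
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(preprocess("Check out https://example.com, it's 100% AMAZING!!!"))
# check out its amazing
```

URLs are removed first because stripping punctuation beforehand would leave fragments such as `httpsexamplecom` behind.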
| Column | Description |
|---|---|
| Comment | User-generated text content |
| Sentiment | Sentiment label (0 = Negative, 1 = Neutral, 2 = Positive) |
Comment: "apple pay is so convenient secure and easy to use"
Sentiment: 2 (Positive)
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Social networking, customer support, and market research are the areas where sentiment analysis is most frequently used. In social media, sentiment analysis is often used to examine how users feel about and talk about a brand or product. Organizations can use it to learn how different segments of society view various issues, ranging from hot topics to breaking news. With this knowledge, businesses can react swiftly to public sentiment.
In this challenge, the goal is to detect the sentiment of naturally occurring sentences.
The dataset consists of the following files:
- Dev-datasets: the train and dev datasets, along with a sample submission file (answer.txt).
- test-datasets: the test dataset on which your models will be evaluated.
Train Size - 92,228
Development Size - 4,855
The ground truth contains three categorical values (1, 0, and -1).
Predict the labels for the test set and save the predictions (1, 0, or -1) in the "answer.txt" file.
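A sketch of writing the submission file follows. The exact layout of answer.txt is not described here, so one label per line in test-set order is an assumption; compare it with the sample answer.txt in Dev-datasets before submitting. The helper name `write_answers` is hypothetical.

```python
from pathlib import Path

VALID_LABELS = {1, 0, -1}


def write_answers(predictions, path="answer.txt"):
    """Write one integer label per line, in the same order as the test rows.

    The one-label-per-line layout is an assumption; compare it with the
    sample answer.txt in Dev-datasets before submitting.
    """
    labels = [int(p) for p in predictions]
    bad = [p for p in labels if p not in VALID_LABELS]
    if bad:
        raise ValueError(f"unexpected labels {bad}; expected only 1, 0, or -1")
    Path(path).write_text("\n".join(str(p) for p in labels) + "\n")


# Example: predictions from any model, already mapped to 1 / 0 / -1.
write_answers([1, 0, -1, 1])
```

Validating the labels before writing catches a common mistake: submitting a model's internal class indices (for example 0, 1, 2) instead of the required 1, 0, -1.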
License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/
Our dataset comprises 1,000 tweets generated with the Python programming language and stored in a CSV file. The random module was used to generate random IDs and text, the faker module was used to generate random user names and dates, and the textblob module was used to assign a sentiment to each tweet.
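The generation procedure above can be sketched with the standard library alone. This is not the author's script: `random` plus `datetime` stand in for faker, and a tiny hypothetical word lexicon stands in for TextBlob's polarity score. The vocabularies and the column names `id`, `user`, `date`, `text`, and `sentiment` are assumptions.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical vocabularies; the original used faker for names/dates and TextBlob for sentiment.
POSITIVE = ["love", "great", "happy", "awesome"]
NEGATIVE = ["hate", "awful", "sad", "terrible"]
NEUTRAL = ["today", "phone", "coffee", "weather", "meeting"]
FIRST_NAMES = ["alex", "sam", "jordan", "taylor"]


def polarity(words):
    # Crude stand-in for TextBlob polarity: +1 per positive word, -1 per negative word.
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def generate_tweets(n=1000, seed=42):
    rng = random.Random(seed)  # seeded so the CSV is reproducible
    start = date(2023, 1, 1)
    rows = []
    for _ in range(n):
        words = rng.choices(POSITIVE + NEGATIVE + NEUTRAL, k=6)
        rows.append({
            "id": rng.randint(10**9, 10**10 - 1),
            "user": f"{rng.choice(FIRST_NAMES)}{rng.randint(1, 999)}",
            "date": (start + timedelta(days=rng.randrange(365))).isoformat(),
            "text": " ".join(words),
            "sentiment": label(polarity(words)),
        })
    return rows


rows = generate_tweets()
buf = io.StringIO()  # in-memory CSV; use open("tweets.csv", "w", newline="") to write a file
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])  # id,user,date,text,sentiment
```

Note that random IDs drawn this way are not guaranteed to be unique; a real generator would check for collisions or use a counter.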
This systematic approach is intended to produce a balanced dataset that represents different types of tweets, user behavior, and sentiment. A balanced dataset helps keep the analysis and visualization of the data accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.
In addition to generating the tweets, we have also prepared a visual representation of the dataset. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. It aids in the initial exploration of the dataset and helps identify any patterns or trends that may be present.
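The frequency distribution of sentiment categories and the distribution of tweets over time can be computed with `collections.Counter` before any plotting. This sketch uses a few hypothetical rows with assumed `date` and `sentiment` fields and prints a text bar chart; the tooling behind the original visualization is not described.

```python
from collections import Counter

# Hypothetical rows with assumed field names (date, sentiment).
rows = [
    {"date": "2023-01-01", "sentiment": "positive"},
    {"date": "2023-01-01", "sentiment": "negative"},
    {"date": "2023-01-02", "sentiment": "positive"},
    {"date": "2023-01-03", "sentiment": "neutral"},
]

sentiment_counts = Counter(r["sentiment"] for r in rows)
tweets_per_day = Counter(r["date"] for r in rows)

for label, count in sentiment_counts.most_common():
    # Simple text bar chart; a plotting library would normally be used here.
    print(f"{label:<9} {'#' * count} ({count})")
for day in sorted(tweets_per_day):
    print(day, tweets_per_day[day])
```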
Natural Language Processing, Machine Learning Algorithm, Deep Learning
Jannatul Ferdoshi
Institutions: BRAC University
Image source: Twitter Sentiment Analysis Using Python (GeeksforGeeks), lacienciadelcafe.com.ar
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains 1 million unique English sentences, each labeled with one of three sentiment categories: positive, negative, or neutral. The sentences were generated automatically by a GPT (Generative Pre-trained Transformer) model, o3-mini-high, via ChatGPT, and are designed for training and evaluating sentiment analysis models. The variety of sentence structures and emotional tones provides a diverse foundation for NLP tasks, particularly sentiment classification. The dataset is intended for machine learning practitioners, researchers, and developers working on sentiment analysis, text classification, and natural language understanding.
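When carving an evaluation set out of a labeled corpus like this one, a stratified split keeps the three classes in the same proportions in both parts. The sketch below is a standard-library illustration with hypothetical sentences; scikit-learn's `train_test_split(..., stratify=y)` does the same job in practice.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, test = [], []
    for items in by_label.values():
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Hypothetical sentences standing in for the dataset's rows.
data = [(f"sentence {i}", label) for i, label in enumerate(["positive", "negative", "neutral"] * 10)]
train, test = stratified_split(data)
print(len(train), len(test))  # 24 6
```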
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, "A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles," arXiv [cs.CY], 2024. Available: https://doi.org/10.48550/arXiv.2406.07693
Abstract
This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
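VADER reports a `compound` score in [-1, 1], which has to be mapped to the positive, negative, or neutral classes. The sketch below uses the ±0.05 thresholds recommended in the VADER documentation; whether the dataset authors used exactly these cut-offs is an assumption, and `vader_class` is a hypothetical helper name.

```python
def vader_class(compound):
    """Map a VADER compound score in [-1, 1] to a sentiment class.

    Uses the +/-0.05 thresholds recommended by the VADER authors; that the
    dataset used the same cut-offs is an assumption.
    """
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# With the vaderSentiment package installed, a title's score would come from:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(title)["compound"]
for score in (0.62, 0.0, -0.4):
    print(score, vader_class(score))
```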
License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains synthetic review data modeled on popular online learning platforms such as Coursera, Udemy, and RateMyProfessors. It is designed to support sentiment analysis research by providing structured review content labeled with sentiment classes.
Purpose: The dataset aims to facilitate Natural Language Processing (NLP) tasks, especially in the context of educational feedback analysis, by enabling users to:
Train and evaluate sentiment classification models.
Analyze learner satisfaction across platforms.
Visualize sentiment trends in online education.
Dataset composition: The dataset is synthetically generated and includes review texts with associated sentiment labels. It may include:
Review text: A learner's comment or review.
Sentiment label: Categories like positive, neutral, or negative.
Source indicator: Platform such as Coursera, Udemy, or RateMyProfessors.
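To analyze learner satisfaction across platforms, the reviews can be grouped by source and the share of each sentiment label computed per platform. This is a sketch under stated assumptions: the field names `source` and `sentiment` and the rows themselves are hypothetical, since the exact columns are not specified.

```python
from collections import Counter, defaultdict

# Hypothetical rows; the actual column names in the CSV may differ.
reviews = [
    {"source": "Coursera", "sentiment": "positive"},
    {"source": "Coursera", "sentiment": "negative"},
    {"source": "Udemy", "sentiment": "positive"},
    {"source": "Udemy", "sentiment": "positive"},
    {"source": "RateMyProfessors", "sentiment": "neutral"},
]


def sentiment_share(rows):
    """Return {platform: {label: fraction of that platform's reviews}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["source"]][row["sentiment"]] += 1
    return {
        platform: {label: n / sum(c.values()) for label, n in c.items()}
        for platform, c in counts.items()
    }


for platform, shares in sentiment_share(reviews).items():
    print(platform, {k: round(v, 2) for k, v in shares.items()})
```

Reporting fractions rather than raw counts keeps platforms with very different review volumes comparable.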
Potential applications:
Sentiment classification using machine learning (e.g., Logistic Regression, SVM, BERT, VADER).
Topic modeling to extract key concerns or highlights from reviews.
Dashboards for educational insights and user experience monitoring.
Notes:
This dataset is synthetic and intended for academic and research purposes only.
No personally identifiable information (PII) is included.
Labeling is consistent with typical sentiment classification tasks.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains LinkedIn profile comments, capturing user interactions and engagement across various profiles. The dataset can be useful for researchers and developers working on natural language processing (NLP), sentiment analysis, and social media behavior analysis.
Key features of the dataset:
- Captures LinkedIn comments from various profiles.
- User engagement insights: analyze the language and sentiment of comments to gauge user engagement.
- Potential applications: the dataset is suited to machine learning projects such as sentiment analysis, text classification, and recommendation systems.

This dataset can help with:
- Identifying sentiment in LinkedIn comments.
- Detecting popular or trending topics based on comment activity.
- Enhancing user engagement analysis on professional networking platforms.
A Simple but Rich Dataset for Sentiment Analysis of Chat Messages
This dataset contains a collection of chat messages that can be used to develop a sentiment analysis machine learning model that classifies messages into three sentiment classes: positive, negative, and neutral. The messages are diverse in nature, containing not only plain text but also special characters, numbers, emoji/emoticons, and URLs. The dataset can be used for various natural language processing tasks related to chat analysis.
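Because emoticons often carry sentiment, a normalization step for these messages might replace them with placeholder tokens instead of deleting them, and do the same for URLs and numbers. The sketch below is illustrative only: the emoticon table and the token names (`<url>`, `<num>`, `<smile>`, and so on) are hypothetical, and Unicode emoji are not handled.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# A few common ASCII emoticons mapped to hypothetical placeholder tokens (not exhaustive).
EMOTICONS = {":)": " <smile> ", ":-)": " <smile> ", ":D": " <laugh> ", ":(": " <sad> ", ":-(": " <sad> "}
# Longest emoticons first so ":-(" is not split into ":" and "-(".
EMOTICON_RE = re.compile("|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)))
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize_chat(message):
    message = URL_RE.sub(" <url> ", message)
    message = EMOTICON_RE.sub(lambda m: EMOTICONS[m.group(0)], message)
    message = NUMBER_RE.sub(" <num> ", message)
    return " ".join(message.split())  # collapse repeated whitespace


print(normalize_chat("Got 2 tickets :D see https://example.com/show :)"))
# Got <num> tickets <laugh> see <url> <smile>
```

URLs are replaced first so that characters inside a link (for example `:/`) are not mistaken for emoticons.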
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by DEEPAK POLISETTI
Released under MIT
This dataset was created by Gilbert Kiprotich
Context
This dataset is part of our research work titled "Opinion Mining of Customer Reviews Using Supervised Learning Algorithms". If you use this dataset, please cite our work. You can find the article at https://ieeexplore.ieee.org/document/9733435
Content
Nowadays, many people express their opinions on various topics on social networking sites. Twitter has become a popular social networking site where people express their opinions concisely, which makes it a valuable source for opinion mining. In this research, the goal was to train and build a model that can automatically and accurately categorize the opinions in customer tweet reviews about popular cell phone brands. We used the Python TextBlob library to obtain polarity values for all the tweet reviews in the dataset. We also used Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Decision Tree, and Random Forest algorithms, each combined separately with Bag of Words and TF-IDF vectorizers, to train and build the model. We investigated the opinions using five classes: Strongly Positive, Positive, Neutral, Negative, and Strongly Negative.
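TextBlob's polarity lies in [-1, 1], so turning it into the five opinion classes requires cut-off points. The paper's exact thresholds are not given here; the sketch below assumes a hypothetical symmetric scheme (0 for Neutral, ±0.5 for the "Strongly" classes), and `five_class` is a hypothetical helper name.

```python
def five_class(polarity, strong=0.5):
    """Bucket a polarity score in [-1, 1] into five opinion classes.

    The cut-offs (exactly 0 for Neutral, +/-strong for the "Strongly"
    classes) are hypothetical, not the paper's published thresholds.
    """
    if polarity > strong:
        return "Strongly Positive"
    if polarity > 0:
        return "Positive"
    if polarity == 0:
        return "Neutral"
    if polarity >= -strong:
        return "Negative"
    return "Strongly Negative"


# With TextBlob installed, the score would come from:
#   from textblob import TextBlob
#   polarity = TextBlob(tweet).sentiment.polarity
for p in (0.8, 0.2, 0.0, -0.3, -0.9):
    print(p, five_class(p))
```

The resulting labels can then serve as targets for the Bag of Words or TF-IDF classifiers described above.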
When referencing this dataset, please cite the paper below:
Bibtex @inproceedings{arif2021opinion, title={Opinion Mining of Customer Reviews Using Supervised Learning Algorithms}, author={Arif, Shibbir Ahmed and Hossain, Taslima Binte}, booktitle={2021 5th International Conference on Electrical Information and Communication Technology (EICT)}, pages={1--6}, year={2021}, organization={IEEE} }
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Why you'll love it:
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About this Dataset
This dataset is designed for sentiment analysis tasks, specifically to classify text comments as positive or negative. It's a supervised dataset, meaning each comment is already labeled with its corresponding sentiment.
Key Features:
Two columns:
- Text: Contains the raw text of the comments.
- Tag: Indicates the sentiment of the comment, labeled as either "positive" or "negative."
Supervised Learning: Ideal for training and evaluating machine learning models for sentiment classification.
Potential Applications:
- Sentiment Analysis: Build models to automatically analyze emotions and opinions in various text data.
- Social Media Analysis: Understand public sentiment towards brands, products, or topics on social media platforms.
- Customer Feedback Analysis: Gauge customer satisfaction and identify areas for improvement based on reviews and feedback.
- Text Classification: Develop text categorization systems for diverse applications.
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Name: BBC Articles Sentiment Analysis Dataset
Source: BBC News
Description: This dataset consists of articles from the BBC News website, containing a diverse range of topics such as business, politics, entertainment, technology, sports, and more. The dataset includes articles from various time periods and categories, along with labels representing the sentiment of the article. The sentiment labels indicate whether the tone of the article is positive, negative, or neutral, making it suitable for sentiment analysis tasks.
Number of Instances: [Specify the number of articles in the dataset, for example, 2,225 articles]
Number of Features:
1. Article Text: The content of the article (string).
2. Sentiment Label: The sentiment classification of the article. The possible labels are:
   - Positive
   - Negative
   - Neutral
Data Fields:
- id: Unique identifier for each article.
- category: The category or topic of the article (e.g., business, politics, sports).
- title: The title of the article.
- content: The full text of the article.
- sentiment: The sentiment label (positive, negative, or neutral).
Example:

| id | category | title | content | sentiment |
|----|----------|-------|---------|-----------|
| 1 | Business | "Stock Market Surge" | "The stock market has surged to new highs, driven by strong earnings..." | Positive |
| 2 | Politics | "Election Results" | "The election results were a mixed bag, with some surprises along the way." | Neutral |
| 3 | Sports | "Team Wins Championship" | "The team won the championship after a thrilling final match." | Positive |
| 4 | Technology | "New Smartphone Release" | "The new smartphone release has received mixed reactions from users." | Negative |
Preprocessing Notes:
- The text has been preprocessed to remove special characters and any HTML tags that might have been included in the original articles.
- Tokenization or further text cleaning (e.g., lowercasing, stopword removal) may be necessary depending on the model and method used for sentiment classification.
Use Case: This dataset is ideal for training and evaluating machine learning models for sentiment classification, where the goal is to predict the sentiment (positive, negative, or neutral) based on the article's text.
This is tourist review data collected from the top 10 tourist destinations in Nepal. Using various methods, you can analyze the sentiment of these reviews by converting each sentence into a sentiment polarity score.
Suggestions for learners:
1. Clean the dataset.
2. Convert each review into a sentiment polarity.
3. Apply different machine learning algorithms.
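Steps 1 and 2 can be sketched with a simple lexicon-based scorer: clean the text, then average the scores of known sentiment words. The tiny `LEXICON` below is hypothetical; in practice a library such as TextBlob or VADER, or a larger lexicon, would supply the word scores.

```python
import re

# Tiny hypothetical lexicon; real projects would use TextBlob, VADER, or a larger word list.
LEXICON = {"beautiful": 1.0, "amazing": 1.0, "friendly": 0.5,
           "crowded": -0.5, "dirty": -1.0, "expensive": -0.5}


def clean(review):
    # Step 1: lowercase and keep only letters and whitespace.
    return re.sub(r"[^a-z\s]", " ", review.lower())


def polarity(review):
    # Step 2: average the scores of lexicon words found in the review (0.0 if none).
    scores = [LEXICON[w] for w in clean(review).split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(polarity("Beautiful lake, friendly locals, but a bit crowded!"))  # 0.3333333333333333
```

For step 3, the polarity scores can be thresholded into labels (for example positive above 0, negative below 0) and used as targets for the machine learning algorithms of your choice.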
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
The Amazon Product Reviews Dataset contains customer opinions and sentiments about a wide range of products available on Amazon's platform. It is intended for data enthusiasts, analysts, and machine learning practitioners interested in consumer feedback, sentiment analysis, and product performance evaluation.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Dataset description: Contributors assessed tweets related to various brands and products, indicating whether the sentiment conveyed was positive, negative, or neutral. If a tweet conveyed any sentiment, contributors also identified the specific brand or product targeted by that emotion.
Train dataset: 8,589 rows × 3 columns
Test dataset: 504 rows × 1 column