License: https://creativecommons.org/publicdomain/zero/1.0/
Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to tweet hateful content. Twitter is trying to tackle this problem, and we shall help by creating a strong NLP-based classifier model to distinguish the negative tweets and block them. Can you build a strong classifier model to predict the same?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
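The quote-handling note above can be sketched with Python's standard csv module, which strips the enclosing quotes of a quoted field while parsing. This is only a minimal sketch: the two sample rows and the textID column are illustrative assumptions, not rows from the competition file, while the text, selected_text, and sentiment fields follow the description above. The clean helper is a hypothetical fallback for fields that still carry a literal pair of quotes after parsing.

```python
import csv
import io

# Illustrative stand-in for train.csv; the rows and the textID column are assumptions.
SAMPLE_CSV = '''textID,text,selected_text,sentiment
a1,"I really, really love this song",love this song,positive
b2,"""so tired of waiting, honestly""","tired of waiting, honestly",negative
'''


def clean(field: str) -> str:
    """Trim whitespace and drop one pair of leftover enclosing double quotes."""
    field = field.strip()
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        field = field[1:-1]
    return field


def load_rows(handle):
    """Parse the CSV and return rows with cleaned text fields."""
    rows = []
    for row in csv.DictReader(handle):
        row["text"] = clean(row["text"])
        row["selected_text"] = clean(row["selected_text"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    # The target span keeps every character (commas, spaces) and is a substring of the tweet.
    assert row["selected_text"] in row["text"]
    print(row["sentiment"], "|", row["text"], "|", row["selected_text"])
```

Checking that selected_text is a substring of text is a cheap sanity test that the quotes were removed consistently from both fields.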
The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiment classes: negative, neutral, and positive. It contains two fields, one for the tweet and one for the label.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains a collection of tweets from the Indonesian community, expressing their opinions on the government's implementation of PPKM (Enforcement of Community Activity Restrictions). The dataset consists of approximately 20,000 tweets gathered within the time range from April 1, 2020, to April 1, 2022.
The selected time range for data collection is based on when Indonesia started implementing PPKM extensively and when the government revoked the policy. Within this dataset, diverse opinions, comments, and reactions from the public regarding the PPKM policy during that period can be found.
This dataset provides an opportunity to analyze the sentiment and public views regarding the PPKM policy, as well as observe changes in opinions over time. It offers valuable insights into understanding the perceptions and reactions of the community towards government policies related to PPKM.
Label: 0 (Positive), 1 (Neutral), 2 (Negative)
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Twitter Sentiment Analysis Dataset contains 1,578,627 classified tweets; each row is marked as 1 for positive sentiment and 0 for negative sentiment. The dataset is based on data from the following two sources:
- University of Michigan Sentiment Analysis competition on Kaggle
- Twitter Sentiment Corpus by Niek Sanders
Finally, I randomly selected a subset of them, applied a cleaning process, and divided them between the test and train subsets, keeping a balance between the number of positive and negative tweets within each of these subsets.
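One way to keep positive and negative tweets balanced within both subsets is a per-class split. The following is a minimal sketch of that idea and not the author's actual procedure, which the description does not spell out; the balanced_split helper, the 20% test fraction, the seed, and the toy rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balanced_split(rows, test_fraction=0.2, seed=0):
    """Split (text, label) rows so each label appears equally often within each subset."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    # Downsample every class to the size of the smallest one.
    per_class = min(len(group) for group in by_label.values())
    train, test = [], []
    for group in by_label.values():
        sample = rng.sample(group, per_class)
        cut = int(per_class * test_fraction)
        test.extend(sample[:cut])
        train.extend(sample[cut:])

    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Hypothetical toy data: 60 positive (1) and 40 negative (0) placeholder tweets.
rows = [(f"tweet {i}", 1) for i in range(60)] + [(f"tweet {i}", 0) for i in range(60, 100)]
train, test = balanced_split(rows)
print(len(train), len(test))
```

Downsampling to the smallest class discards data; splitting each class separately without downsampling would preserve the original class ratio instead of forcing an even one.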
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Twitter Sentiment Analysis (bdstar/twitter-sentiment-analysis)
Overview
A refined and merged version of Twitter text sentiment datasets, providing a clean and well-balanced dataset for sentiment classification across three sentiment categories: positive, negative, and neutral. This dataset is split into three parts (train, test, and validation), each sourced from highly reputable open datasets. It is designed for training, evaluating, and benchmarking NLP models for… See the full description on the dataset page: https://huggingface.co/datasets/bdstar/twitter-sentiment-analysis.
License: http://opendatacommons.org/licenses/dbcl/1.0/
The Twitter Sentiment Analysis Dataset is a widely used dataset in the field of natural language processing and sentiment analysis. It consists of a collection of tweets, each labeled with the sentiment expressed in the tweet, which can be positive, negative, or neutral. This dataset is commonly used for training and evaluating machine learning models that aim to automatically analyze and classify the sentiment behind Twitter messages.
The dataset contains a diverse range of tweets, capturing the opinions, emotions, and attitudes of Twitter users on various topics such as movies, products, events, or general daily experiences. The tweets cover a broad spectrum of sentiments, including expressions of joy, satisfaction, anger, disappointment, sarcasm, or indifference.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their sentiment.
The dataset holds 11,932 documents annotated with 3 labels:
sentiments = { "LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral" }
The data was collected using the Twitter API. The current dataset supports the multi-class classification… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment.
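The label table above can be used to turn class ids into readable names. This is a small sketch that assumes integer ids 0, 1, and 2 correspond to the LABEL_0, LABEL_1, and LABEL_2 keys; the label_name helper and the sample predictions are hypothetical.

```python
# Label names as given in the dataset description above.
sentiments = {"LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral"}


def label_name(label_id: int) -> str:
    """Map an integer class id (assumed 0, 1, or 2) to its sentiment name."""
    return sentiments[f"LABEL_{label_id}"]


# Hypothetical model outputs, used only to show the lookup.
predicted_ids = [2, 0, 1, 2]
print([label_name(i) for i in predicted_ids])
```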
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Dataset description
Users assessed tweets related to various brands and products, providing evaluations on whether the sentiment conveyed was positive, negative, or neutral. Additionally, if the tweet conveyed any sentiment, contributors identified the specific brand or product targeted by that emotion.
Train Dataset : 8589 rows x 3 columns
Test Dataset : 504 rows x 1 column
License: https://brightdata.com/license
Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.
Key Features:
Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
Use Cases:
Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
License: https://cubig.ai/store/terms-of-service
1) Data introduction
• The Twitter-tweets-sentiment dataset aims to support tweet sentiment analysis for Twitter and natural language processing.
2) Data utilization
(1) Twitter-tweets-sentiment data has characteristics that:
• The data consists of three columns, including emotion and text, and aims to block negative tweets through a powerful classification model.
(2) Twitter-tweets-sentiment data can be used to:
• Social Media Monitoring: Businesses and organizations can use data to monitor social media platforms and gauge public sentiment about a brand, product, event, or social issue.
• Sentiment analysis: This dataset can be used to train models that classify the sentiment of tweets, which can help companies and researchers understand public opinion on a variety of topics.
License: https://creativecommons.org/publicdomain/zero/1.0/
This is the Sentiment dataset. The tweets have been annotated with 4 different categories (positive, negative, uncertainty, litigious) and can be used to detect sentiment.
It contains the following 3 fields:
- Language
- Text
- Label
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Twitter Sentiment Dataset
Sample English-only tweet sentiment dataset. Each row represents a single tweet with anonymized text and conversation structure. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.
Files Included
dataset.csv: tweets data
What's included
- Anonymized tweet text
- Conversation linkage via root_id and parent_id
- 3-class sentiment label (positive…
See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Twitter-Conversations-Sentiment-Dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created as part of a sentiment analysis project using enriched Twitter data. The objective was to train and test a machine learning model to automatically classify the sentiment of tweets (e.g., Positive, Negative, Neutral).
The data was generated using tweets that were sentiment-scored with a custom sentiment scorer. A machine learning pipeline was applied, including text preprocessing, feature extraction with CountVectorizer, and prediction with a HistGradientBoostingClassifier.
The dataset includes five main files:
- test_predictions_full.csv: Predicted sentiment labels for the test set.
- sentiment_model.joblib: Trained machine learning model.
- count_vectorizer.joblib: Text feature extraction model (CountVectorizer).
- model_performance.txt: Evaluation metrics and performance report of the trained model.
- confusion_matrix.png: Visualization of the model's confusion matrix.
The files follow standard naming conventions based on their purpose.
The .joblib files can be loaded into Python using the joblib and scikit-learn libraries.
The .csv, .txt, and .png files can be opened with any standard text reader, spreadsheet software, or image viewer.
Additional performance documentation is included within the model_performance.txt file.
The data was constructed to ensure reproducibility.
No personal or sensitive information is present.
It can be reused by researchers, data scientists, and students interested in Natural Language Processing (NLP), machine learning classification, and sentiment analysis tasks.
License: https://creativecommons.org/publicdomain/zero/1.0/
Our dataset comprises 1000 tweets, which were taken from Twitter using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a random sentiment to each tweet.
This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.
In addition to generating the tweets, we have also prepared a visual representation of the data sets. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. This visualization will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.
Natural Language Processing, Machine Learning Algorithm, Deep Learning
Jannatul Ferdoshi
Institutions: BRAC University
Image Source:Twitter Sentiment Analysis Using Python GeeksforGeeks | lacienciadelcafe.com.ar
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a large-scale, multilingual, and longitudinal Twitter sentiment dataset sampled through distant supervision from the Twitter Stream Grab archive (https://archive.org/details/twitterstream). It covers the time period between January 2013 and June 2020 for 7 languages:
- Arabic (ar)
- German (de)
- English (en)
- Spanish (es)
- French (fr)
- Italian (it)
- Chinese (zh)
With the files in this repository, we provide tweet IDs that can be used to rehydrate the datasets by using the files available from the Twitter Stream Grab.
Files are formatted as TSV files, with the following columns:
date \t tweetid \t sentiment \t evidence
where:
- date is the day in which the tweet was posted.
- tweetid is the ID of the tweet.
- sentiment is either pos or neg.
- evidence is the set of emojis or emoticons used to determine if the tweet was positive or negative.
More details about the dataset can be found in the following paper (please cite the paper if you use the dataset): TBA
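Reading the four-column TSV layout described above can be sketched with the standard csv module. The sample rows, the ISO date format, and the absence of a header row are assumptions made for illustration; only the column order (date, tweetid, sentiment, evidence) and the pos/neg values come from the description.

```python
import csv
import io
from collections import Counter

# Made-up rows in the documented column order: date, tweetid, sentiment, evidence.
SAMPLE_TSV = (
    "2013-01-05\t1000000000000000001\tpos\t:)\n"
    "2013-01-05\t1000000000000000002\tneg\t:(\n"
    "2020-06-30\t1000000000000000003\tpos\t:D\n"
)

COLUMNS = ["date", "tweetid", "sentiment", "evidence"]


def read_rows(handle):
    """Yield one dict per TSV line, keyed by the documented column names."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        yield dict(zip(COLUMNS, fields))


rows = list(read_rows(io.StringIO(SAMPLE_TSV)))
counts = Counter(row["sentiment"] for row in rows)
tweet_ids = [row["tweetid"] for row in rows]  # IDs to look up when rehydrating the tweets
print(counts["pos"], counts["neg"], len(tweet_ids))
```

Tweet IDs are kept as strings here so that large values are never altered by numeric conversion.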
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for Twitter US Airline Sentiment
Dataset Summary
This data originally came from Crowdflower's Data for Everyone library. As the original source says,
A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service").
The data we're… See the full description on the dataset page: https://huggingface.co/datasets/osanseviero/twitter-airline-sentiment.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is made up of unique annotated English-Malay code-switching, pure English, and pure Malay tweets built from raw_tweets_012019_to_062020.csv on Kaggle (Carlson, 2020). The raw tweets file contains collected users' tweets about a Malaysian brand called "The dUCk Group", which was founded by Vivy Yusof and focuses on selling scarves, bags, cosmetics, stationery, and Home & Living products. When preparing this dataset, duplicated, invalid, and unusable data rows were removed. The tweets were then annotated with the language category "ENG" for pure English tweets, "BM" for pure Malay tweets, and "ENG-BM" for the code-switching tweets. In addition, the tweets are annotated with sentiment value 0 for neutral, 1 for positive, and -1 for negative.
The sub-folders contained in this dataset are as follows:
1) Full Training Dataset: This sub-folder contains a full set of annotated pure English, pure Malay, and English-Malay code-switching tweets regarding "The dUCk Group" brand, which can be used to train machine learning models. The tweets are kept in both CSV and XML format files, namely 'full_training_dataset.csv' and 'full_training_dataset.xml'.
2) Full Testing Dataset: This sub-folder contains a full set of annotated pure English, pure Malay, and English-Malay code-switching tweets regarding "The dUCk Group" brand, which can be used to test the performance of learning models. The tweets are kept in both CSV and XML format files, namely 'full_testing_dataset.csv' and 'full_testing_dataset.xml'.
3) Code-Switching Training Dataset: This sub-folder comprises only annotated English-Malay code-switching tweets regarding "The dUCk Group" brand for training the learning models. The tweets are kept in both CSV and XML format files, namely 'eng_malay_training_dataset.csv' and 'eng_malay_training_dataset.xml'.
4) Code-Switching Testing Dataset: This sub-folder comprises only annotated English-Malay code-switching tweets regarding "The dUCk Group" brand, which can be used to evaluate the performance of the learning models. The tweets are kept in both CSV and XML format files, namely 'eng_malay_testing_dataset.csv' and 'eng_malay_testing_dataset.xml'.
*Note:
- 'Language' column represents the language category the tweet belongs to
- 'TweetText' column represents the whole tweet
- 'TweetSentiment' column represents the sentiment value of the tweet (0, 1, and -1)
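The three columns in the note can be read and filtered with the standard csv module. This is a sketch under assumptions: the sample rows are placeholders rather than tweets from the dataset, and the header row and column order shown here are assumed. The column names, the ENG, BM, and ENG-BM categories, and the 0, 1, and -1 sentiment values come from the description.

```python
import csv
import io
from collections import Counter

# Made-up rows using the documented columns; the real files are not reproduced here.
SAMPLE_CSV = """Language,TweetText,TweetSentiment
ENG,placeholder english tweet,1
BM,placeholder malay tweet,0
ENG-BM,placeholder code-switching tweet,-1
ENG-BM,another placeholder code-switching tweet,1
"""

SENTIMENT_NAMES = {0: "neutral", 1: "positive", -1: "negative"}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Keep only the code-switching tweets and decode their sentiment values.
code_switched = [
    (row["TweetText"], SENTIMENT_NAMES[int(row["TweetSentiment"])])
    for row in rows
    if row["Language"] == "ENG-BM"
]
by_language = Counter(row["Language"] for row in rows)
print(by_language)
print(code_switched)
```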
License: https://choosealicense.com/licenses/other/
yogiyulianto/twitter-sentiment-dataset-en dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset contains over 690,000 tweets labeled as Positive, Negative, or Neutral. The data can be used for sentiment analysis and natural language processing tasks. The tweets span various topics, making this a versatile dataset for training and evaluating machine learning models. The dataset was collected and labeled through. It offers a balanced distribution of sentiments to enable robust analysis.
Sentiment Distribution: Positive: 248,516 (35.9%) Negative: 244,146 (35.3%) Neutral: 198,586 (28.7%)
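The class shares can be recomputed from the listed counts as a quick check. The counts below are copied from the distribution above; they sum to 691,248, consistent with the "over 690,000" figure. Nothing else is assumed.

```python
# Counts as listed in the distribution above.
counts = {"Positive": 248_516, "Negative": 244_146, "Neutral": 198_586}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

print(f"total: {total:,}")
for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```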