Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our dataset comprises 1000 tweets, which were taken from Twitter using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a random sentiment to each tweet.
This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.
In addition to generating the tweets, we have also prepared a visual representation of the data sets. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. This visualization will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based in a range of activities related to Open Science, inclusivity and diversity – especially with regards Southern and Eastern Europe and different career stages – including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events.This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.The documents uploaded here are part of WP2 whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written-up as a chapter in an edited book collection. Three books will make up this edited collection - one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved us collecting tweets between 2017-2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage is a csv files containing the tweets collected.”.
Dataset consists of 5 emotion labels. These labels are anger, happy, distinguish, surprise and fear. There are 800 tweets in the dataset for each label. Hence, total tweet count is 4000 for dataset.
You can use the data set in many areas such as sentiment, emotion analysis and topic modeling.
Info: Hashtags and usernames was removed in the dataset. Dataset has used many studies and researches. These researches are followed as: -(please citation this article) Güven, Z. A., Diri, B., & Cąkaloglu, T. (2020). Comparison of n-stage Latent Dirichlet Allocation versus other topic modeling methods for emotion analysis. Journal of the Faculty of Engineering and Architecture of Gazi University. https://doi.org/10.17341/gazimmfd.556104 -Güven, Z. A., Diri, B., & Çakaloğlu, T. (2019). Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets. Academic Platform Journal of Engineering and Science. https://doi.org/10.21541/apjes.459447 -Guven, Z. A., Diri, B., & Cakaloglu, T. (2019). Comparison Method for Emotion Detection of Twitter Users. Proceedings - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019. https://doi.org/10.1109/ASYU48272.2019.8946435
https://brightdata.com/licensehttps://brightdata.com/license
Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.
Key Features:
Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
Use Cases:
Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Financial Sentiment Analysis Dataset
Overview
This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.
Data Description
The dataset comprises tweets… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541
The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.
The following columns are in the dataset:
➡ created_at: The timestamp of the tweet. ➡ id: The unique id of the tweet. ➡ lng: The longitude the tweet was written. ➡ lat: The latitude the tweet was written. ➡ topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined. ➡ sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral. ➡ stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral). ➡ gender: Whether the user that made the tweet is male, female, or undefined. ➡ temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written. ➡ aggressiveness: That is if the tweet contains aggressive language or not.
Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset is having the data of 1K+ Amazon Product's Ratings and Reviews as per their details listed on the official website of Amazon. The data was scraped in the month of January 2023 from the Official Website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contains only negative samples and the last 5331 rows contain only positive samples, thus the data should be shuffled before usage.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed amazon product review data of Gen3EcoDot (Alexa) scrapped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of a 9930 Amazon reviews, star ratings, for 10 latest (as of mid-2019) bluetooth earphone devices for learning how to train Machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review (raw) and division (manually added - categorical label generated using overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
https://brightdata.com/licensehttps://brightdata.com/license
Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Large twitter tweets sentiment analysis"
Dataset Description
Dataset Summary
This dataset is a collection of tweets formatted in a tabular data structure, annotated for sentiment analysis. Each tweet is associated with a sentiment label, with 1 indicating a Positive sentiment and 0 for a Negative sentiment.
Languages
The tweets in English.
Dataset Structure
Data Instances
An instance of the dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data introduction • Twitter-tweets-sentiment dataset is a dataset that aims to analyze tweet sentiment for Twitter and natural language processing.
2) Data utilization (1)Twitter-tweets-sentiment data has characteristics that: • The data consists of three columns, including emotion and text, and aims to block negative tweets through a powerful classification model. (2) Twitter-tweets-sentiment data can be used to: • Social Media Monitoring: Businesses and organizations can use data to monitor social media platforms and gauge public sentiment about a brand, product, event, or social issue. • Sentiment analysis: This dataset can be used to train models that classify the sentiment of tweets, which can help companies and researchers understand public opinion on a variety of topics.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created as part of a sentiment analysis project using enriched Twitter data. The objective was to train and test a machine learning model to automatically classify the sentiment of tweets (e.g., Positive, Negative, Neutral).
The data was generated using tweets that were sentiment-scored with a custom sentiment scorer. A machine learning pipeline was applied, including text preprocessing, feature extraction with CountVectorizer, and prediction with a HistGradientBoostingClassifier.
The dataset includes five main files:
test_predictions_full.csv – Predicted sentiment labels for the test set.
sentiment_model.joblib – Trained machine learning model.
count_vectorizer.joblib – Text feature extraction model (CountVectorizer).
model_performance.txt – Evaluation metrics and performance report of the trained model.
confusion_matrix.png – Visualization of the model’s confusion matrix.
The files follow standard naming conventions based on their purpose.
The .joblib files can be loaded into Python using the joblib and scikit-learn libraries.
The .csv,.txt, and .png files can be opened with any standard text reader, spreadsheet software, or image viewer.
Additional performance documentation is included within the model_performance.txt file.
The data was constructed to ensure reproducibility.
No personal or sensitive information is present.
It can be reused by researchers, data scientists, and students interested in Natural Language Processing (NLP), machine learning classification, and sentiment analysis tasks.
this dataset is a combination of over 6 different datasets found on Kaggle. the labels are 0 and 1 which means negative and positive tweets. in the cleared dataset I delete mentions. you can do any preprocessing you want on the dataset. I will appreciate any notebooks submitted on this dataset to help others with sentiment analysis tasks. I will submit mine as well.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets for their sentiment.
The dataset holds 11,932 documents annotated with 3 labels:
sentiments = { "LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral" }
The data was collected using the Twitter API. The current dataset supports the multi-class classification… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
AfriSenti is the largest sentiment analysis benchmark dataset for under-represented African languages---covering 110,000+ annotated tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and yoruba).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset gives a cursory glimpse at the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.
The dataset comprises tweets labeled with sentiment ratings in an ordinal five-point scale, including classes for strongly negative, negative, neutral, positive, and strongly positive.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Arabic news credibility on Twitter using sentiment analysis and ensemble learning.
WHAT IS IT?
an Arabic news credibility model on Twitter using sentiment analysis and ensemble learning.
Here we include the Collected dataset and the source code of the proposed model written in Python language and using Keras library with Tensorflow backend.
Required Packages
Keras (https://keras.io/).
Scikit-learn (http://scikit-learn.org/)
Imnlearn (imbalanced-learn documentation — Version 0.10.1)
To Run the model
One data file is required to run the model which are:
The data that were used are the collected dataset in the file, set the path of the required data file in the code.
The dataset
There are the dataset file with all features, you can choose the features that you need and apply it on the model.
There are a description file that describe each feature in the news credibility dataset
The file Tweet_ID contains the list of tweets id in the dataset.
The annotated replies based on credibility is provided.
CONTACTS
If you want to report bugs or have general queries email to
A crucial part of sentiment classification is featuring extraction because it involves extracting valuable information from text data, which affects the model’s performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. In order to provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. There are several methods under consideration, including Bag-of-words (BOW), Word2Vector, N-gram, Term Frequency- Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global vector for word representation (GloVe). To prove the ability of each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. Finally, we trained a random forest classifier using 70% of the training data and 30% of the testing data, enabling us to evaluate and compare the performance using different metrics. Based on our results, we find that the TD-IDF technique demonstrates superior performance, with an accuracy of 99% in the Amazon reviews dataset and 96% in the Twitter US airlines dataset. This study underscores the paramount significance of feature extraction in sentiment analysis, endowing pragmatic insights to elevate model performance and steer future research pursuits.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.