Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.
Dataset consists of 5 emotion labels. These labels are anger, happy, distinguish, surprise and fear. There are 800 tweets in the dataset for each label. Hence, total tweet count is 4000 for dataset.
You can use the data set in many areas such as sentiment, emotion analysis and topic modeling.
Info: Hashtags and usernames was removed in the dataset. Dataset has used many studies and researches. These researches are followed as: -(please citation this article) Güven, Z. A., Diri, B., & Cąkaloglu, T. (2020). Comparison of n-stage Latent Dirichlet Allocation versus other topic modeling methods for emotion analysis. Journal of the Faculty of Engineering and Architecture of Gazi University. https://doi.org/10.17341/gazimmfd.556104 -Güven, Z. A., Diri, B., & Çakaloğlu, T. (2019). Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets. Academic Platform Journal of Engineering and Science. https://doi.org/10.21541/apjes.459447 -Guven, Z. A., Diri, B., & Cakaloglu, T. (2019). Comparison Method for Emotion Detection of Twitter Users. Proceedings - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019. https://doi.org/10.1109/ASYU48272.2019.8946435
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Financial Sentiment Analysis Dataset
Overview
This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.
Data Description
The dataset comprises tweets… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
this dataset is a combination of over 6 different datasets found on Kaggle. the labels are 0 and 1 which means negative and positive tweets. in the cleared dataset I delete mentions. you can do any preprocessing you want on the dataset. I will appreciate any notebooks submitted on this dataset to help others with sentiment analysis tasks. I will submit mine as well.
https://brightdata.com/licensehttps://brightdata.com/license
Our Twitter Sentiment Analysis Dataset provides a comprehensive collection of tweets, enabling businesses, researchers, and analysts to assess public sentiment, track trends, and monitor brand perception in real time. This dataset includes detailed metadata for each tweet, allowing for in-depth analysis of user engagement, sentiment trends, and social media impact.
Key Features:
Tweet Content & Metadata: Includes tweet text, hashtags, mentions, media attachments, and engagement metrics such as likes, retweets, and replies.
Sentiment Classification: Analyze sentiment polarity (positive, negative, neutral) to gauge public opinion on brands, events, and trending topics.
Author & User Insights: Access user details such as username, profile information, follower count, and account verification status.
Hashtag & Topic Tracking: Identify trending hashtags and keywords to monitor conversations and sentiment shifts over time.
Engagement Metrics: Measure tweet performance based on likes, shares, and comments to evaluate audience interaction.
Historical & Real-Time Data: Choose from historical datasets for trend analysis or real-time data for up-to-date sentiment tracking.
Use Cases:
Brand Monitoring & Reputation Management: Track public sentiment around brands, products, and services to manage reputation and customer perception.
Market Research & Consumer Insights: Analyze consumer opinions on industry trends, competitor performance, and emerging market opportunities.
Political & Social Sentiment Analysis: Evaluate public opinion on political events, social movements, and global issues.
AI & Machine Learning Applications: Train sentiment analysis models for natural language processing (NLP) and predictive analytics.
Advertising & Campaign Performance: Measure the effectiveness of marketing campaigns by analyzing audience engagement and sentiment.
Our dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into social media sentiment and enhance your decision-making with high-quality, structured Twitter data.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our dataset comprises 1000 tweets, which were taken from Twitter using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a random sentiment to each tweet.
This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.
In addition to generating the tweets, we have also prepared a visual representation of the data sets. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. This visualization will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based in a range of activities related to Open Science, inclusivity and diversity – especially with regards Southern and Eastern Europe and different career stages – including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events.This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.The documents uploaded here are part of WP2 whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written-up as a chapter in an edited book collection. Three books will make up this edited collection - one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved us collecting tweets between 2017-2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage is a csv files containing the tweets collected.”.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Large twitter tweets sentiment analysis"
Dataset Description
Dataset Summary
This dataset is a collection of tweets formatted in a tabular data structure, annotated for sentiment analysis. Each tweet is associated with a sentiment label, with 1 indicating a Positive sentiment and 0 for a Negative sentiment.
Languages
The tweets in English.
Dataset Structure
Data Instances
An instance of the dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
LYTinn/sentiment-analysis-tweet dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created as part of a sentiment analysis project using enriched Twitter data. The objective was to train and test a machine learning model to automatically classify the sentiment of tweets (e.g., Positive, Negative, Neutral).
The data was generated using tweets that were sentiment-scored with a custom sentiment scorer. A machine learning pipeline was applied, including text preprocessing, feature extraction with CountVectorizer, and prediction with a HistGradientBoostingClassifier.
The dataset includes five main files:
test_predictions_full.csv – Predicted sentiment labels for the test set.
sentiment_model.joblib – Trained machine learning model.
count_vectorizer.joblib – Text feature extraction model (CountVectorizer).
model_performance.txt – Evaluation metrics and performance report of the trained model.
confusion_matrix.png – Visualization of the model’s confusion matrix.
The files follow standard naming conventions based on their purpose.
The .joblib files can be loaded into Python using the joblib and scikit-learn libraries.
The .csv,.txt, and .png files can be opened with any standard text reader, spreadsheet software, or image viewer.
Additional performance documentation is included within the model_performance.txt file.
The data was constructed to ensure reproducibility.
No personal or sensitive information is present.
It can be reused by researchers, data scientists, and students interested in Natural Language Processing (NLP), machine learning classification, and sentiment analysis tasks.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset gives a cursory glimpse at the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Arabic news credibility on Twitter using sentiment analysis and ensemble learning.
WHAT IS IT?
an Arabic news credibility model on Twitter using sentiment analysis and ensemble learning.
Here we include the Collected dataset and the source code of the proposed model written in Python language and using Keras library with Tensorflow backend.
Required Packages
Keras (https://keras.io/).
Scikit-learn (http://scikit-learn.org/)
Imnlearn (imbalanced-learn documentation — Version 0.10.1)
To Run the model
One data file is required to run the model which are:
The data that were used are the collected dataset in the file, set the path of the required data file in the code.
The dataset
There are the dataset file with all features, you can choose the features that you need and apply it on the model.
There are a description file that describe each feature in the news credibility dataset
The file Tweet_ID contains the list of tweets id in the dataset.
The annotated replies based on credibility is provided.
CONTACTS
If you want to report bugs or have general queries email to
Sentiment analysis for airline tweets utilizing machine learning techniques.
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Sentiment Analysis Dataset is a dataset for emotional analysis, including large-scale tweet text collected from Twitter and emotional polarity (0=negative, 2=neutral, 4=positive) labels for each tweet, featuring automatic labeling based on emoticons.
2) Data Utilization (1) Sentiment Analysis Dataset has characteristics that: • Each sample consists of six columns: emotional polarity, tweet ID, date of writing, search word, author, and tweet body, and is suitable for training natural language processing and classification models using tweet text and emotion labels. (2) Sentiment Analysis Dataset can be used to: • Emotional Classification Model Development: Using tweet text and emotional polarity labels, we can build positive, negative, and neutral emotional automatic classification models with various machine learning and deep learning models such as logistic regression, SVM, RNN, and LSTM. • Analysis of SNS public opinion and trends: By analyzing the distribution of emotions by time series and keywords, you can explore changes in public opinion on specific issues or brands, positive and negative trends, and key emotional keywords.
https://brightdata.com/licensehttps://brightdata.com/license
Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present dataset contains Tweets in any language supported by Twitter obtained during the months January to March 2023, with any mention to the topic CCS/CCUS. The scraping process were done in Python, using the official Twitter API. All tweets were manually annotated after being machine translated into English.
Structure Every row contains: 1st cell (A): Language 2nd cell (B): Tweet-text 3rd cell (Cc: Benefit 4th cell (D): Concern 5th cell (E): Perception – Fight climate change 6th cell (F): Perception – Climate-friendly technology 7th cell (G): Perception – Extensive R&D needed 8th cell (H): Perception – Better options than CCS 9th cell (I): Sentiment 10th cell (J): Relatedness 11th cell (K): Comments
Annotations Benefit Preventing c. change Reducing c. change risks Safeguarding jobs Creating new jobs Fossil energy production envir. friendly Products envir. friendly Reducing envir. impact Other None Concern Accidents Leakages Environmental Earthquake-related Increased local traffic Investment Greenwashing Lock-in effects for fossil energy Increase cost Other None Perception (Yes / No / None) Fight climate change Climate-friendly technology Extensive R&D needed Better options than CCS Sentiment Positive Negative Neutral
https://www.gnu.org/licenses/old-licenses/lgpl-2.0-standalone.htmlhttps://www.gnu.org/licenses/old-licenses/lgpl-2.0-standalone.html
Dataset contains over 10k Arabic sentiment tweets classified into 4 classes: subjective positive, subjective negative, subjective mixed, and objective.
Context
Twitter Sentiment Analysis Dataset especially for classification using Logistic Regression
Content
tweet - Preprocessed token array for each tweet (Preprocessing done are remove hyperlinks, remove hashtags, remove stop words and punctuation)
bias - Just a simple bias value (default 1)
pos - Sum of positive frequencies of each word in the tweet tokens.
neg - Sum of negative frequencies of each word in the tweet tokens.
label - 1.0 for Positive Tweet and 0.0 for Negative Tweet.
Acknowledgements
This dataset was part of the Week 1 Labs of Coursera Natural Language Processing Course. This dataset was custom created from scratch using NLTL Library for text preprocessing and all functions for preprocessing were from scratch.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
General Description
This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.
Data Collection Method
Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.
Dataset Content
ID: A unique identifier for each tweet.
text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.
polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).
favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.
retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.
user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.
user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.
user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.
user_followers_count: The current number of followers the account has. It is a non-negative integer.
user_friends_count: The number of users that the account is following. It is a non-negative integer.
user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.
user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.
user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.
user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
Cite as
Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.
Potential Use Cases
This dataset is aimed at academic researchers and practitioners with interests in:
Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.
Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.
Exploring correlations between user engagement metrics and sentiment in discussions about AI.
Data Format and File Type
The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.
License
The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.