Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset are created by me.
Anything you read in the description is written by David Wallach and using all this information, I happen to perform my first ever sentiment analysis.
"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries.
I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."
This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'
The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot
I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
because of COVID-19
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541
The most comprehensive dataset to date regarding climate change and human opinions via Twitter. It has the heftiest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, while accompanied by environmental disaster events information. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.
The following columns are in the dataset:
➡ created_at: The timestamp of the tweet. ➡ id: The unique id of the tweet. ➡ lng: The longitude the tweet was written. ➡ lat: The latitude the tweet was written. ➡ topic: Categorization of the tweet in one of ten topics namely, seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined. ➡ sentiment: A score on a continuous scale. This scale ranges from -1 to 1 with values closer to 1 being translated to positive sentiment, values closer to -1 representing a negative sentiment while values close to 0 depicting no sentiment or being neutral. ➡ stance: That is if the tweet supports the belief of man-made climate change (believer), if the tweet does not believe in man-made climate change (denier), and if the tweet neither supports nor refuses the belief of man-made climate change (neutral). ➡ gender: Whether the user that made the tweet is male, female, or undefined. ➡ temperature_avg: The temperature deviation in Celsius and relative to the January 1951-December 1980 average at the time and place the tweet was written. ➡ aggressiveness: That is if the tweet contains aggressive language or not.
Since Twitter forbids making public the text of the tweets, in order to retrieve it you need to do a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
https://choosealicense.com/licenses/cc0-1.0/https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for Twitter Dataset: Tesla
Dataset Summary
This dataset contains all the Tweets regarding #Tesla or #tesla till 12/07/2022 (dd-mm-yyyy). It can be used for sentiment analysis research purpose or used in other NLP tasks or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in the Tweets, and other important features.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Financial Sentiment Analysis Dataset
Overview
This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.
Data Description
The dataset comprises tweets… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a comprehensive collection of tweets designed for multilingual sentiment analysis tasks. It consists of three primary columns: "tweet","language" and "sentiment". The dataset's goal is to facilitate research and development in natural language processing and multi-task learning applications.
The "tweet" column contains the actual tweet scraped. They encompass a wide range of topics, opinions and sentiments expressed by users across the social media platforms. THe tweets are provided in their original text format.
The "language" column specifies the language in which each tweet is written. The dataset is carefully curated to include tweets from multiple languages.
The "sentiment" column contains sentiment ratings for each tweet in the range of 1 to 5 stars. 1 star represent strongly negative sentiment whereas 5 stars represent strongly positive sentiment. Intermediate values like 2,3 and 4 stars represent negative, neutral and positive sentiment.
Researchers and developers can leverage this dataset for a wide range of NLP and MTL tasks where they can jointly try to predict language and sentiment given a tweet.
The dataset is well structured and formatted in CSV file format.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based in a range of activities related to Open Science, inclusivity and diversity – especially with regards Southern and Eastern Europe and different career stages – including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events.This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.The documents uploaded here are part of WP2 whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written-up as a chapter in an edited book collection. Three books will make up this edited collection - one on climate, one on energy and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved us collecting tweets between 2017-2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage is a csv files containing the tweets collected.”.
This dataset contains over 690,000 tweets labeled as Positive, Negative, or Neutral. The data can be used for sentiment analysis and natural language processing tasks. The tweets span various topics, making this a versatile dataset for training and evaluating machine learning models. The dataset was collected and labeled through. It offers a balanced distribution of sentiments to enable robust analysis
Sentiment Distribution: Positive: 248,516 (35.9%) Negative: 244,146 (35.3%) Neutral: 198,586 (28.7%)
https://brightdata.com/licensehttps://brightdata.com/license
Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Large twitter tweets sentiment analysis"
Dataset Description
Dataset Summary
This dataset is a collection of tweets formatted in a tabular data structure, annotated for sentiment analysis. Each tweet is associated with a sentiment label, with 1 indicating a Positive sentiment and 0 for a Negative sentiment.
Languages
The tweets in English.
Dataset Structure
Data Instances
An instance of the dataset includes… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset gives a cursory glimpse at the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
https://raw.githubusercontent.com/Masterx-AI/Project_Twitter_Sentiment_Analysis_/main/twitt.jpg" alt="">
Twitter is an online Social Media Platform where people share their their though as tweets. It is observed that some people misuse it to tweet hateful content. Twitter is trying to tackle this problem and we shall help it by creating a strong NLP based-classifier model to distinguish the negative tweets & block such tweets. Can you build a strong classifier model to predict the same?
Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.
Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
The dataset is download from Kaggle Competetions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv
This dataset was created by Tùng Lê Thanh
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a large-scale, multilingual and longitudinal Twitter sentiment dataset sampled through distant supervision from the Twitter Stream Grab archive (https://archive.org/details/twitterstream). It covers the time period between January 2013 and June 2020 for 7 languages:- Arabic (ar)- German (de)- English (en)- Spanish (es)- French (fr)- Italian (it)- Chinese (zh)With the files in this repository, we provide tweet IDs that can be used to rehydrate the datasets by using the files available from the Twitter Stream Grab.Files are formatted as TSV files, with the following columns:date \t tweetid \t sentiment \t evidencewhere:- date is the day in which the tweet was posted.- tweetid is the ID of the tweet- sentiment is either pos or neg- evidence is the set of emojis or emoticons used to determine if the tweet was positive or negative.More details about the dataset can be found in the following paper (please cite the paper if you use the dataset):TBA
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Overview This is an entity-level sentiment analysis dataset of twitter. Given a message and an entity, the task is to judge the sentiment of the message about the entity. There are three classes in this dataset: Positive, Negative and Neutral. We regard messages that are not relevant to the entity (i.e. Irrelevant) as Neutral.
Usage Please use twitter_training.csv as the training set and twitter_validation.csv as the validation set. Top 1 classification accuracy is used as the metric.
Original Data Source: Twitter Sentiment Analysis
General Description
This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.
Data Collection Method
Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.
Dataset Content
ID: A unique identifier for each tweet.
text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.
polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).
favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.
retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.
user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.
user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.
user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.
user_followers_count: The current number of followers the account has. It is a non-negative integer.
user_friends_count: The number of users that the account is following. It is a non-negative integer.
user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.
user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.
user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.
user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
Cite as
Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.
Potential Use Cases
This dataset is aimed at academic researchers and practitioners with interests in:
Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.
Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.
Exploring correlations between user engagement metrics and sentiment in discussions about AI.
Data Format and File Type
The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.
License
The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is collected and annotated for the SMILE project http://www.culturesmile.org. This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions, expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets, with 5 emotions namely anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset.License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/
This comprehensive dataset is a meticulously curated collection of mental health statuses tagged from various statements. The dataset amalgamates raw data from multiple sources, cleaned and compiled to create a robust resource for developing chatbots and performing sentiment analysis.
The dataset integrates information from the following Kaggle datasets:
The dataset consists of statements tagged with one of the following seven mental health statuses: - Normal - Depression - Suicidal - Anxiety - Stress - Bi-Polar - Personality Disorder
The data is sourced from diverse platforms including social media posts, Reddit posts, Twitter posts, and more. Each entry is tagged with a specific mental health status, making it an invaluable asset for:
This dataset is ideal for training machine learning models aimed at understanding and predicting mental health conditions based on textual data. It can be used in various applications such as:
This dataset was created by aggregating and cleaning data from various publicly available datasets on Kaggle. Special thanks to the original dataset creators for their contributions.
This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 1810 unique tweet IDs, written in Bulgarian, with annotations (positive, negative, neutral). The tweets are on the topics of lies, manipulation, and Covid-19 and are a subset of the following datasets:
https://zenodo.org/record/7296865
https://zenodo.org/record/7296736
https://zenodo.org/record/7296877
The tweets have been collected via Twitter API under academic access between 1 Jan 2020 - 28 June 2022 and thus cannot be used for commercial purposes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset has three sentiments namely, negative, neutral, and positive. It contains two fields for the tweet and label.