MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for "Large twitter tweets sentiment analysis"
Dataset Description
Dataset Summary
This dataset is a collection of tweets in a tabular format, annotated for sentiment analysis. Each tweet has a sentiment label: 1 indicates positive sentiment and 0 indicates negative sentiment.
Languages
The tweets are in English.
Dataset Structure
Data Instances
An instance of… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
General Description
This dataset comprises 4,038 tweets in Spanish related to discussions about artificial intelligence (AI). It was created for and used in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights" (DOI: 10.1109/IE61493.2024.10599899), presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment, user engagement metrics, and user profile characteristics, among others.
Data Collection Method
Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.
Dataset Content
ID: A unique identifier for each tweet.
text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.
polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).
favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.
retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.
user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.
user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.
user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.
user_followers_count: The current number of followers the account has. It is a non-negative integer.
user_friends_count: The number of users that the account is following. It is a non-negative integer.
user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.
user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.
user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.
user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
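The field list above implies a few simple constraints (text of at most 280 characters, non-negative counts, True/False flags). The sketch below checks them while loading the CSV with Python's standard `csv` module. It assumes the header uses exactly the field names listed above and that booleans are stored as the literal strings `True` and `False`; neither detail is confirmed by this description, and the sample row is made up.

```python
import csv
import io

# Field groups taken from the column list above; the CSV header is assumed
# to use exactly these names.
INT_FIELDS = [
    "favorite_count",
    "retweet_count",
    "user_followers_count",
    "user_friends_count",
    "user_favourites_count",
    "user_statuses_count",
]
BOOL_FIELDS = [
    "user_verified",
    "user_default_profile",
    "user_has_extended_profile",
    "user_protected",
    "user_is_translator",
]
MAX_TEXT_LENGTH = 280


def parse_bool(value: str) -> bool:
    """Map the strings 'True' / 'False' (assumed CSV encoding) to a bool."""
    if value == "True":
        return True
    if value == "False":
        return False
    raise ValueError(f"expected True or False, got {value!r}")


def validate_row(row: dict) -> dict:
    """Convert one CSV row to typed values and check the documented constraints."""
    text = row["text"]
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")
    record = {"ID": row["ID"], "text": text, "polarity": row["polarity"]}
    for name in INT_FIELDS:
        number = int(row[name])
        if number < 0:
            raise ValueError(f"{name} must be a non-negative integer")
        record[name] = number
    for name in BOOL_FIELDS:
        record[name] = parse_bool(row[name])
    return record


# Usage on a single made-up row (not taken from the dataset).
sample = io.StringIO(
    "ID,text,polarity,favorite_count,retweet_count,user_verified,"
    "user_default_profile,user_has_extended_profile,user_followers_count,"
    "user_friends_count,user_favourites_count,user_statuses_count,"
    "user_protected,user_is_translator\n"
    "1,La IA avanza,Positive,3,1,False,True,False,120,80,45,900,False,False\n"
)
records = [validate_row(row) for row in csv.DictReader(sample)]
```

Because `csv.DictReader` maps values by header name, the sketch does not depend on the column order in the actual file.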
Cite as
Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.
Potential Use Cases
This dataset is aimed at academic researchers and practitioners with interests in:
Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.
Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.
Exploring correlations between user engagement metrics and sentiment in discussions about AI.
Data Format and File Type
The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.
License
The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
This dataset was created by Ghany Fitriamara
Released under Other (specified in description)
CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/
Dataset Card for Twitter Dataset: Tesla
Dataset Summary
This dataset contains all the Tweets with #Tesla or #tesla up to 12/07/2022 (dd/mm/yyyy, i.e. 12 July 2022). It can be used for sentiment analysis research, for other NLP tasks, or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in the Tweets, and other important features.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project that engages directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU's transition to carbon neutrality. SSH CENTRE is built around a range of activities related to Open Science, inclusivity, and diversity (especially with regard to Southern and Eastern Europe and different career stages), including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events. This is captured in a high-profile virtual SSH CENTRE that generates and shares best practice for SSH policy advice, overcoming fragmentation to accelerate the EU's journey to a sustainable future.
The documents uploaded here are part of WP2, in which novel interdisciplinary teams received funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written up as a chapter in an edited book collection. Three books will make up this edited collection: one on climate, one on energy, and one on mobility.
As part of writing a chapter for the SSH CENTRE book on "Mobility", we set out to analyse the sentiment of Twitter users regarding shared and active mobility modes in Brussels. This involved collecting tweets posted between 2017 and 2022.
A tweet was collected if it contained a previously defined mobility keyword (for example, metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo page are CSV files containing the collected tweets.
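The collection rule above is a conjunction: a mobility keyword AND at least one context term (politician, place, or provider). A minimal sketch of that filter follows. Only "metro" comes from the description; every other term list is a hypothetical placeholder (Ixelles and Schaerbeek are simply example Brussels municipalities), and matching single lower-cased tokens is a simplification that would miss multi-word names.

```python
import re

# Hypothetical term lists: only "metro" is named in the description above;
# the real keyword, politician, place and provider lists are not published here.
MOBILITY_KEYWORDS = {"metro", "tram", "bike"}
POLITICIANS = {"politician_a"}
PLACES = {"ixelles", "schaerbeek"}
PROVIDERS = {"provider_x"}


def tokens(text: str) -> set:
    """Lower-cased word tokens; multi-word names would need phrase matching."""
    return set(re.findall(r"\w+", text.lower()))


def is_collected(text: str) -> bool:
    """Keep a tweet if it has a mobility keyword AND at least one context term."""
    words = tokens(text)
    has_mobility = bool(words & MOBILITY_KEYWORDS)
    has_context = bool(words & (POLITICIANS | PLACES | PROVIDERS))
    return has_mobility and has_context
```

For example, `is_collected("Metro delays in Ixelles again")` returns `True`, while a tweet mentioning only "metro" or only "Ixelles" is rejected.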
https://market.oceanprotocol.com/terms
This report delves into the correlation between Twitter engagement metrics, including likes, retweets, and influential tweets, and the price movements of the OCEAN token. By analyzing the relationship between these social media engagement indicators and the token's price, we aim to gain valuable insights into the impact of Twitter sentiment on OCEAN's market dynamics.
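The report does not state which correlation measure it uses. As one possible reading, the sketch below assumes a Pearson correlation between a daily engagement series (likes, here) and daily OCEAN price returns, with likes on day t aligned to the return from day t-1 to day t. The series are hypothetical placeholders, not data from the report.

```python
import math


def simple_returns(prices):
    """Day-over-day relative price change: (p_t - p_{t-1}) / p_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily series (not real OCEAN data).
daily_likes = [120, 340, 90, 410, 150]
daily_prices = [0.40, 0.42, 0.41, 0.45, 0.44]
# Align likes on day t with the return from day t-1 to day t.
r = pearson(daily_likes[1:], simple_returns(daily_prices))
```

Correlating with returns rather than raw prices avoids the spurious correlation that two trending series can show. Rank-based alternatives such as Spearman correlation would be equally reasonable choices.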
Additionally, this report showcases a Transformer model designed specifically for sentiment classification of tweets related to the OCEAN token. Using "The Twitter Financial Dataset (sentiment) version 1.0.0", the model classifies tweets as bullish, bearish, or neutral. This classification capability allows us to gauge the prevailing sentiment of the Twitter community towards the OCEAN token.
This is a data set of 482,251 public tweets and retweets (Twitter IDs) posted by the #edchat online community of educators who discuss current trends in teaching with technology. The data set was collected via Twitter's Streaming API between Feb 1, 2018 and Apr 4, 2018, and was used as part of the research on developing a learning analytics dashboard for teaching and learning with Twitter. Following Twitter's terms of service, the data set only includes unique identifiers of relevant tweets. To collect the actual tweets that are part of this data set, you will need to use one of the available third party tools such as Hydrator or Twarc ("hydrate" function). As part of this release, we are also attaching an enriched version of this data set that contains sentiment and opinion analysis labels that were produced by analyzing each tweet with the help of the NLTK SentimentAnalyzer Python package. *This work was supported in part by eCampusOntario and The Social Sciences and Humanities Research Council of Canada.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a longitudinal Twitter dataset of 143K users during the period 2017-2021. The columns of the data files are described below:
1. user: This column represents the identifier for the user. Each row in the CSV corresponds to a specific user, and this column helps to track and differentiate between the users.
2. avg_topic_probability: This column contains the average probability of the topics for each user calculated across all of the tweets in order to compare users in a meaningful way. It represents the average likelihood that a particular user discusses various topics over the observed period.
3. maximum_topic_avg: This column holds the value of the highest average probability among all topics for each user. It indicates the topic that the user most frequently discusses, on average.
4. index_max_avg_topic_probability_200: This column specifies the index or identifier of the topic with the highest average probability out of 200 possible topics. It shows which topic (out of 200) the user discusses the most.
5. global_avg: This column includes the global average probability of topics across all users. It provides a baseline or overall average topic probability that can be used for comparative purposes.
6. max_global_avg: This column contains the maximum global average probability across all topics for all users. It identifies the most discussed topic across the entire user base.
7. index_max_global_avg: This column shows the index or identifier of the topic with the highest global average probability. It indicates which topic (out of 200) is the most popular across all users.
8. entropy_200_topic: This column represents the entropy of the topics for each user, calculated over 200 topics. Entropy measures the diversity or unpredictability in the user's discussion of topics, with higher entropy indicating more varied topic discussion.
In summary, these columns are used to analyze the topic engagement and preferences of users on a platform, highlighting the most frequently discussed topics, the variability in topic discussions, and how individual user behavior compares to overall trends.
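The column definitions above can be reproduced from per-tweet topic distributions. The sketch below assumes several things the description leaves open. It assumes a topic model (for example LDA with 200 topics) has already produced one probability vector per tweet. It treats `avg_topic_probability` and `global_avg` as per-topic vectors, uses zero-based topic indices, and computes entropy as Shannon entropy in nats over the user's average distribution. The toy example uses 3 topics instead of 200.

```python
import math


def column_means(rows):
    """Element-wise mean of equal-length probability vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def user_features(tweet_topic_dists):
    """Per-user columns from per-tweet topic distributions (one list per tweet)."""
    avg = column_means(tweet_topic_dists)
    max_avg = max(avg)
    return {
        "avg_topic_probability": avg,
        "maximum_topic_avg": max_avg,
        # Zero-based topic index (the dataset's indexing convention is unknown).
        "index_max_avg_topic_probability_200": avg.index(max_avg),
        # Shannon entropy in nats; zero-probability topics contribute nothing.
        "entropy_200_topic": -sum(p * math.log(p) for p in avg if p > 0),
    }


def global_features(per_user_avgs):
    """Global columns: per-topic mean of the users' average distributions."""
    global_avg = column_means(per_user_avgs)
    max_global = max(global_avg)
    return {
        "global_avg": global_avg,
        "max_global_avg": max_global,
        "index_max_global_avg": global_avg.index(max_global),
    }


# Toy example with 3 topics and two hypothetical users.
alice = user_features([[0.5, 0.5, 0.0], [0.7, 0.1, 0.2]])
bob = user_features([[0.1, 0.1, 0.8]])
overall = global_features(
    [alice["avg_topic_probability"], bob["avg_topic_probability"]]
)
```

A user who spreads attention evenly over all topics reaches the maximum entropy, the log of the number of topics, which matches the description of higher entropy as more varied topic discussion.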
This dataset was created by Madhavi
Released under Other (specified in description)
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Financial Sentiment Analysis Dataset
Overview
This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.
Data Description
The dataset comprises… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
This dataset was created by Blesson Densil
Released under Data files © Original Authors
https://brightdata.com/license
Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MAVIS dataset comprises a full knowledge base of Twitter messages published in Spanish during the period 2015-2018, in the context of sentiment analysis of specific vaccines and their related diseases. These diseases and vaccines are summarized as follows:
Tweets have been manually classified as having a negative or non-negative sentiment by 5 experts. Moreover, an automatic classification has been performed by 3 different tools: IBM Watson (now Watson Tone Analyzer, https://www.ibm.com/watson/services/tone-analyzer/), Google Cloud Natural Language (https://cloud.google.com/natural-language), and Meaning Cloud (https://www.meaningcloud.com/). IBM Watson and Google Cloud Natural Language returned a numerical sentiment score ranging from -1 to 1, while Meaning Cloud returned a categorical variable with the values "P+", "P", "NEU", "N" and "N+", which were converted to 1, 2, 3, 4 and 5 respectively.
With these variables (IBM Watson, Google Cloud Natural Language, and Meaning Cloud annotations, and the experts' classification as the target label), a machine learning metamodel was developed. Tweets were also annotated with the sentiment output given by this classifier.
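The metamodel's inputs follow from the description above: two numeric scores in [-1, 1] plus Meaning Cloud's label mapped to 1-5, with the experts' negative/non-negative label as the target. The sketch below only assembles that feature matrix and target vector. The record keys (`ibm`, `google`, `meaningcloud`, `expert_label`) are hypothetical, and the metamodel itself is not reproduced because its type is not stated here.

```python
# Ordinal encoding of Meaning Cloud polarity labels, as described above.
MEANINGCLOUD_CODES = {"P+": 1, "P": 2, "NEU": 3, "N": 4, "N+": 5}


def encode_meaningcloud(label: str) -> int:
    """Map a Meaning Cloud label to its 1-5 code; reject anything else."""
    if label not in MEANINGCLOUD_CODES:
        raise ValueError(f"unexpected Meaning Cloud label {label!r}")
    return MEANINGCLOUD_CODES[label]


def build_features(ibm_score: float, google_score: float, meaningcloud_label: str) -> list:
    """One metamodel input row: two scores in [-1, 1] plus the ordinal code."""
    for score in (ibm_score, google_score):
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score} outside [-1, 1]")
    return [ibm_score, google_score, encode_meaningcloud(meaningcloud_label)]


def build_training_set(records):
    """Feature matrix X and target y (1 = experts labelled the tweet negative).

    The record keys used here are hypothetical, not the dataset's column names.
    """
    X = [build_features(r["ibm"], r["google"], r["meaningcloud"]) for r in records]
    y = [1 if r["expert_label"] == "negative" else 0 for r in records]
    return X, y


# Two made-up records for illustration.
records = [
    {"ibm": -0.6, "google": -0.4, "meaningcloud": "N+", "expert_label": "negative"},
    {"ibm": 0.2, "google": 0.0, "meaningcloud": "NEU", "expert_label": "non-negative"},
]
X, y = build_training_set(records)
```

`X` and `y` can then be passed to any binary classifier. Note that the 1-5 code is ordinal, so a model that treats it as a plain number assumes equal spacing between the categories.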
The provided data includes intrinsic tweets information, intrinsic information regarding the users that posted the tweets, the keywords mentioned in each tweet, and the annotations that the experts, the tools, and the model gave to each tweet.
Funding: This dataset was obtained with funding from MSD, Spain under MAVIS Study (VEAP ID: 7789).
Studies using this dataset at the time of publication:
This dataset was created by Subhashini
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rapid evolution of microblogging and the emergence of sites such as Twitter have propelled online communities to flourish by enabling people to create, share, and disseminate free-flowing messages and information globally. The exponential growth of product-based user reviews has made them an ever-increasing resource, playing a key role in emerging Twitter-based sentiment analysis (SA) techniques and applications that collect and analyse customer trends and reviews. Existing supervised black-box sentiment analysis systems do not provide adequate information about the rules that explain why a certain review was assigned to a particular class, and their accuracy is in some respects lower than human judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system that analyses user reviews of certain products using rough set theory (RST)-based rule induction algorithms. RST classifies microblogging reviews of products into the positive, negative, or neutral class using rules extracted from training decision tables by RST-centric rule induction algorithms. The primary focus of this study is also to perform sentiment classification of microblogs (also known as tweets) of product reviews using conventional and RST-based rule induction algorithms.
Two RST-centric rule induction algorithms are used: Learning from Examples Module version 2 (LEM2) and the proposed LEM2 + Corpus-based rules (LEM2 + CBR), an extension of the traditional LEM2 algorithm. Corpus-based rules are generated from tweets that remain unclassified by the conventional LEM2 rules. Experimental results show that the proposed method compares favourably with baseline methods with regard to accuracy, coverage, and the number of rules employed. It achieves an average accuracy of 92.57% and an average coverage of 100%, with an average of 19.14 rules.
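The fallback idea above can be sketched as follows. Tweets not matched by any LEM2 rule are passed to corpus-based rules, and the system is scored by coverage (share of tweets that receive a class) and accuracy (share of classified tweets that are correct), one common way these two measures are defined for rule-based classifiers. This is not the authors' implementation. Rule induction itself is omitted, the first matching rule wins (real LEM2-based systems may resolve rule conflicts by voting or rule strength), and the word-presence attributes and rules are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A rule is (conditions, decision): every attribute=value pair must match.
Rule = Tuple[Dict[str, str], str]


def apply_rules(rules: List[Rule], example: Dict[str, str]) -> Optional[str]:
    """Return the decision of the first matching rule, or None if none fires."""
    for conditions, decision in rules:
        if all(example.get(attr) == value for attr, value in conditions.items()):
            return decision
    return None


def classify(example, lem2_rules, corpus_rules):
    """Try the LEM2 rules first; fall back to corpus-based rules if none fire."""
    label = apply_rules(lem2_rules, example)
    if label is None:
        label = apply_rules(corpus_rules, example)
    return label


def evaluate(examples, labels, lem2_rules, corpus_rules):
    """Coverage = classified / all; accuracy = correct / classified."""
    predictions = [classify(e, lem2_rules, corpus_rules) for e in examples]
    covered = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(covered) / len(examples)
    accuracy = sum(p == y for p, y in covered) / len(covered) if covered else 0.0
    return accuracy, coverage


# Hypothetical word-presence attributes and rules.
lem2_rules = [({"great": "yes"}, "positive"), ({"broken": "yes"}, "negative")]
corpus_rules = [({"ok": "yes"}, "neutral")]
examples = [{"great": "yes"}, {"broken": "yes"}, {"ok": "yes"}, {"meh": "yes"}]
labels = ["positive", "negative", "positive", "neutral"]
accuracy, coverage = evaluate(examples, labels, lem2_rules, corpus_rules)
```

In this toy run, the fourth tweet matches no rule, so coverage is below 100%. The corpus-based fallback exists to close exactly that gap.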
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present dataset contains tweets in any language supported by Twitter, collected from January to March 2023, that mention the topic of CCS/CCUS. The scraping was done in Python using the official Twitter API. All tweets were manually annotated after being machine-translated into English.
Structure
Every row contains:
1st cell (A): Language
2nd cell (B): Tweet-text
3rd cell (C): Benefit
4th cell (D): Concern
5th cell (E): Perception: Fight climate change
6th cell (F): Perception: Climate-friendly technology
7th cell (G): Perception: Extensive R&D needed
8th cell (H): Perception: Better options than CCS
9th cell (I): Sentiment
10th cell (J): Relatedness
11th cell (K): Comments
Annotations
Benefit: Preventing c. change; Reducing c. change risks; Safeguarding jobs; Creating new jobs; Fossil energy production envir. friendly; Products envir. friendly; Reducing envir. impact; Other; None
Concern: Accidents; Leakages; Environmental; Earthquake-related; Increased local traffic; Investment; Greenwashing; Lock-in effects for fossil energy; Increase cost; Other; None
Perception (Yes / No / None): Fight climate change; Climate-friendly technology; Extensive R&D needed; Better options than CCS
Sentiment: Positive; Negative; Neutral
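The cell layout and annotation values above are enough to parse and sanity-check each row. The sketch below assumes the sheet has been exported to CSV with the 11 cells in the order A-K and no header row, and that each Benefit or Concern cell holds a single category. None of this is stated in the description. The allowed Relatedness values are not documented, so they are passed through unchecked, and the sample row (including "Related") is made up.

```python
import csv
import io

PERCEPTION_COLUMNS = [
    "Fight climate change",
    "Climate-friendly technology",
    "Extensive R&D needed",
    "Better options than CCS",
]
PERCEPTION_VALUES = {"Yes", "No", "None"}
SENTIMENT_VALUES = {"Positive", "Negative", "Neutral"}


def parse_row(cells):
    """Map the 11 cells (columns A-K) of one annotated row to a dict."""
    if len(cells) != 11:
        raise ValueError(f"expected 11 cells, got {len(cells)}")
    perception = dict(zip(PERCEPTION_COLUMNS, cells[4:8]))
    for name, value in perception.items():
        if value not in PERCEPTION_VALUES:
            raise ValueError(f"{name}: unexpected value {value!r}")
    if cells[8] not in SENTIMENT_VALUES:
        raise ValueError(f"unexpected sentiment {cells[8]!r}")
    return {
        "language": cells[0],
        "text": cells[1],
        "benefit": cells[2],
        "concern": cells[3],
        "perception": perception,
        "sentiment": cells[8],
        "relatedness": cells[9],  # allowed values not documented above
        "comments": cells[10],
    }


# A made-up example row, exported as CSV without a header (an assumption).
sample = io.StringIO(
    "en,CCS could cut emissions,Preventing c. change,None,"
    "Yes,Yes,No,None,Positive,Related,\n"
)
rows = [parse_row(cells) for cells in csv.reader(sample)]
```

The trailing comma in the sample yields an empty 11th cell, which is how an empty Comments column appears after CSV export.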
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is the result of scraping Twitter with the twint application, targeting the hashtag #IndonesiaHumanRightsSOS. It covers tweets posted from December 18, 2020, 10:59 AM to December 19, 2020, 11:18 PM.
The dataset contains 106,903 rows of data with related information: User ID, Username, Twitter Name, Tweets, etc.
Example analyses of the data are also included: a word cloud, a username cloud, the 100 most used words, and the most active username.
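The "100 most used words" and "most active username" analyses amount to frequency counts. Below is a minimal sketch using `collections.Counter`. The tokenization (lower-casing, keeping a leading `#` on hashtags) and the optional stopword set are assumptions, since the description does not say how the published lists were produced.

```python
import re
from collections import Counter


def top_words(tweets, n=100, stopwords=frozenset()):
    """Most frequent lower-cased word tokens (hashtags kept with their '#')."""
    counts = Counter()
    for text in tweets:
        words = re.findall(r"#?\w+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)


def most_active(usernames, n=10):
    """Usernames ranked by the number of tweets they posted in the dataset."""
    return Counter(usernames).most_common(n)
```

The same word counts could feed a word-cloud renderer, and counting the Username column instead of words gives the basis for the username cloud.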
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was collected and annotated for the SMILE project (http://www.culturesmile.org). This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for classifying emotions expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets labelled with 5 emotions: anger, disgust, happiness, surprise, and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset.
License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Number of initial and usable tweets for extracting the discussion topic.