100+ datasets found

large-twitter-tweets-sentiment
huggingface.co
Updated Mar 6, 2024
+ more versions
Cite
Gong Xiangbo (2024). large-twitter-tweets-sentiment [Dataset]. https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 6, 2024
Authors
Gong Xiangbo
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for "Large twitter tweets sentiment analysis"

Dataset Description Dataset Summary

This dataset is a collection of tweets formatted in a tabular data structure, annotated for sentiment analysis. Each tweet is associated with a sentiment label, with 1 indicating a Positive sentiment and 0 for a Negative sentiment.
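
As a small illustration of the label convention described above, the following sketch maps the integer labels to sentiment names. The sample rows and the names `LABEL_NAMES` and `label_name` are hypothetical and not part of the dataset; the dataset's actual column names are not shown in this excerpt.

```python
# Label convention from the dataset summary: 1 = Positive, 0 = Negative.
LABEL_NAMES = {0: "Negative", 1: "Positive"}


def label_name(label: int) -> str:
    """Return the sentiment name for an integer label; reject unknown values."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return LABEL_NAMES[label]


# Hypothetical rows, invented for illustration only.
sample_rows = [("what a great day", 1), ("my flight got cancelled", 0)]
named = [(text, label_name(label)) for text, label in sample_rows]
```

Rejecting unknown labels early makes it easier to notice when a file uses a different labeling scheme (for example, one with a neutral class).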

Languages

The tweets are in English.

Dataset Structure Data Instances

An instance of… See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.
Twitter dataset
figshare.com
csv
Updated Feb 11, 2025
Cite
Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.28390334.v2
Dataset updated
Feb 11, 2025
Dataset provided by
figshare
Authors
Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.
Data from: IA Tweets Analysis Dataset (Spanish)
produccioncientifica.uca.es
data.niaid.nih.gov
+1 more
Updated 2024
Cite
Guerrero-Contreras, Gabriel; Balderas-Díaz, Sara; Serrano-Fernández, Alejandro; Muñoz, Andrés (2024). IA Tweets Analysis Dataset (Spanish) [Dataset]. https://produccioncientifica.uca.es/documentos/67321e53aea56d4af04854bf
Explore at:
Dataset updated
2024
Authors
Guerrero-Contreras, Gabriel; Balderas-Díaz, Sara; Serrano-Fernández, Alejandro; Muñoz, Andrés
Description
General Description

This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.

Data Collection Method

Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.

Dataset Content

ID: A unique identifier for each tweet.

text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.

polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).

favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.

retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.

user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.

user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.

user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.

user_followers_count: The current number of followers the account has. It is a non-negative integer.

user_friends_count: The number of users that the account is following. It is a non-negative integer.

user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.

user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.

user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.

user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
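
The field descriptions above can be turned into a small validating loader. The sketch below covers only a subset of the columns and rests on several assumptions: that the CSV header uses exactly the field names listed, and that booleans are serialized as the strings `True` and `False`. The sample rows, the `TweetRecord` class, and the helper names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TweetRecord:
    tweet_id: str
    text: str
    polarity: str
    favorite_count: int
    retweet_count: int
    user_verified: bool
    user_followers_count: int


def parse_bool(value: str) -> bool:
    """Parse a boolean column; assumes the literal strings 'True'/'False'."""
    if value not in ("True", "False"):
        raise ValueError(f"not a boolean: {value!r}")
    return value == "True"


def parse_count(value: str) -> int:
    """Parse a count column, which is described as a non-negative integer."""
    count = int(value)
    if count < 0:
        raise ValueError(f"count must be non-negative: {count}")
    return count


def parse_row(row: dict) -> TweetRecord:
    """Convert one CSV row (as a dict) into a typed record."""
    if len(row["text"]) > 280:
        raise ValueError("text exceeds the 280-character limit")
    return TweetRecord(
        tweet_id=row["ID"],
        text=row["text"],
        polarity=row["polarity"],
        favorite_count=parse_count(row["favorite_count"]),
        retweet_count=parse_count(row["retweet_count"]),
        user_verified=parse_bool(row["user_verified"]),
        user_followers_count=parse_count(row["user_followers_count"]),
    )


# Hypothetical sample rows, invented for illustration only.
SAMPLE_CSV = """ID,text,polarity,favorite_count,retweet_count,user_verified,user_followers_count
1,La IA avanza muy rápido,Positive,3,1,False,120
2,No me fío de la IA,Negative,0,0,True,4500
"""

records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

To read the real file, the `io.StringIO` source would be replaced by an opened CSV file, and the remaining columns could be added to `TweetRecord` in the same way.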

Cite as

Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.

Potential Use Cases

This dataset is aimed at academic researchers and practitioners with interests in:

Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.

Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.

Exploring correlations between user engagement metrics and sentiment in discussions about AI.

Data Format and File Type

The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.

License

The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.
Twitter Sentiment Analysis of Election 2024
kaggle.com
zip
Updated Jan 22, 2024
Cite
Ghany Fitriamara (2024). Twitter Sentiment Analysis of Election 2024 [Dataset]. https://www.kaggle.com/datasets/ghanyfitria/twitter-sentiment-analysis-of-election-2024
Explore at:
Available download formats: zip (454401 bytes)
Dataset updated
Jan 22, 2024
Authors
Ghany Fitriamara
Description
Dataset

This dataset was created by Ghany Fitriamara

Released under Other (specified in description)

Contents
twitter-dataset-tesla
huggingface.co
Updated Jul 11, 2022
Cite
fastai X Hugging Face Group 2022 (2022). twitter-dataset-tesla [Dataset]. https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla
Explore at:
Croissant
Dataset updated
Jul 11, 2022
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
fastai X Hugging Face Group 2022
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Twitter Dataset: Tesla

Dataset Summary

This dataset contains all the Tweets regarding #Tesla or #tesla up to 12/07/2022 (dd-mm-yyyy). It can be used for sentiment analysis research, for other NLP tasks, or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in the Tweets, and other important features.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More… See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.
Brussel mobility Twitter sentiment analysis CSV Dataset
data.niaid.nih.gov
zenodo.org
Updated May 31, 2024
Cite
Betancur Arenas, Juliana (2024). Brussel mobility Twitter sentiment analysis CSV Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11401123
Explore at:
Dataset updated
May 31, 2024
Dataset provided by
van Vessem, Charlotte
Ginis, Vincent
Betancur Arenas, Juliana
Tori, Floriano
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Brussels
Description
SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project, engaging directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU's transition to carbon neutrality. SSH CENTRE is based on a range of activities related to Open Science, inclusivity and diversity, especially with regard to Southern and Eastern Europe and different career stages, including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events. This is captured in a high-profile virtual SSH CENTRE generating and sharing best practice for SSH policy advice, overcoming fragmentation to accelerate the EU's journey to a sustainable future.

The documents uploaded here are part of WP2, whereby novel, interdisciplinary teams were provided funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written up as a chapter in an edited book collection. Three books will make up this edited collection: one on climate, one on energy and one on mobility.

As part of writing a chapter for the SSH CENTRE book on 'Mobility', we set out to analyse the sentiment of users on Twitter regarding shared and active mobility modes in Brussels. This involved collecting tweets posted between 2017 and 2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo webpage are CSV files containing the tweets collected.
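
The collection rule described above (a mobility keyword plus at least one politician, place, or provider name) can be sketched as a simple filter. This is only an illustration of the rule's logic, not the authors' collection code: apart from "metro", every term in the two sets below is a hypothetical placeholder, and single-word matching is an assumption.

```python
import re

# Only "metro" is named in the description; the other terms are hypothetical.
MOBILITY_KEYWORDS = {"metro", "tram", "bike"}
# Hypothetical stand-ins for politician, neighbourhood/municipality,
# and (shared) mobility provider names.
CONTEXT_TERMS = {"ixelles", "schaerbeek", "villo"}


def tokenize(text: str) -> set:
    """Lower-case the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def should_collect(text: str) -> bool:
    """Keep a tweet only if it has a mobility keyword AND a context term."""
    words = tokenize(text)
    return bool(words & MOBILITY_KEYWORDS) and bool(words & CONTEXT_TERMS)
```

For example, `should_collect("The metro in Ixelles is late again")` is true, while a tweet that mentions only the metro, or only Ixelles, is skipped. Multi-word names would need phrase matching rather than the single-token intersection used here.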
Report | OCEAN Token Sentiment Analysis
market.oceanprotocol.com
Updated Jun 14, 2023
Cite
Tarandros (2023). Report | OCEAN Token Sentiment Analysis [Dataset]. https://market.oceanprotocol.com/asset/did:op:9a24d68687f535e09f92c98ec875c0a29210ec153be954db3fd3c5ea9821f085
Explore at:
Dataset updated
Jun 14, 2023
Dataset authored and provided by
Tarandros
License
https://market.oceanprotocol.com/terms
Description
This report delves into the correlation between Twitter engagement metrics, including likes, retweets, and influential tweets, and the price movements of the OCEAN token. By analyzing the relationship between these social media engagement indicators and the token's price, we aim to gain valuable insights into the impact of Twitter sentiment on OCEAN's market dynamics.

Additionally, this report showcases a Transformer model specifically designed for sentiment classification of tweets related to the OCEAN token. Leveraging the rich dataset of "The Twitter Financial Dataset (sentiment) version 1.0.0," the model classifies tweets as bullish, bearish, or neutral. This classification capability allows us to gauge the prevailing sentiment of the Twitter community towards the OCEAN token.
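
The report does not say which correlation measure it uses. As one plausible reading, the sketch below computes a Pearson correlation coefficient between a daily engagement series and a daily price series. The function name and the numbers are hypothetical and invented for illustration; they are not taken from the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values, invented for illustration only.
daily_likes = [120, 95, 180, 210, 160]
daily_price = [0.31, 0.30, 0.34, 0.36, 0.33]
r = pearson(daily_likes, daily_price)
```

A coefficient near 1 or -1 indicates a strong linear relationship, but it says nothing about causation or about which series leads the other.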
EdChat Public Tweets
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Cite
Gruzd, Anatoliy; Conroy, Nadia (2023). EdChat Public Tweets [Dataset]. https://search.dataone.org/view/sha256%3A1badf4ddc248d00bcd77d23dbff6f03aebe31d7ce40490aee2acbc79d468ecfa
Explore at:
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Gruzd, Anatoliy; Conroy, Nadia
Description
This is a data set of 482,251 public tweets and retweets (Twitter IDs) posted by the #edchat online community of educators who discuss current trends in teaching with technology. The data set was collected via Twitter's Streaming API between Feb 1, 2018 and Apr 4, 2018, and was used as part of the research on developing a learning analytics dashboard for teaching and learning with Twitter. Following Twitter's terms of service, the data set only includes unique identifiers of relevant tweets. To collect the actual tweets that are part of this data set, you will need to use one of the available third party tools such as Hydrator or Twarc ("hydrate" function). As part of this release, we are also attaching an enriched version of this data set that contains sentiment and opinion analysis labels that were produced by analyzing each tweet with the help of the NLTK SentimentAnalyzer Python package. *This work was supported in part by eCampusOntario and The Social Sciences and Humanities Research Council of Canada.
SenTopX: A Benchmark Twitter Dataset for User Sentiment on Various Topics
zenodo.org
csv, zip
Updated May 27, 2024
Cite
Hina Qayyum (2024). SenTopX: A Benchmark Twitter Dataset for User Sentiment on Various Topics [Dataset]. http://doi.org/10.5281/zenodo.11243662
Explore at:
Available download formats: zip, csv
Unique identifier
https://doi.org/10.5281/zenodo.11243662
Dataset updated
May 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hina Qayyum
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
May 25, 2024
Description
This is a longitudinal Twitter dataset of 143K users during the period 2017-2021. The following is the detail of all the files:

SenTopX_userIDs.txt: contains user IDs of 143K Twitter users.

userIDs_tweetIDs.zip: contains Tweet IDs of users, the name of the file is the user ID and the file contains the list of all the tweet IDs.

users_16_perspective_toxicity_scores.csv: contains user IDs and 16 Perspective API scores per user; each score is shared as the mean, median, and Gini index calculated over all tweets of a user.

LDAvis_top30_words_for_extracted_topics.csv contains the top 30 most relevant words extracted from each topic extracted by tweet-level topic modeling using the BERTweet topic model.

topic_modelling_statistics_per_user.csv contains important and relevant statistics related to topic modeling results:

1. user: This column represents the identifier for the user. Each row in the CSV corresponds to a specific user, and this column helps to track and differentiate between the users.

2. avg_topic_probability: This column contains the average probability of the topics for each user calculated across all of the tweets in order to compare users in a meaningful way. It represents the average likelihood that a particular user discusses various topics over the observed period.

3. maximum_topic_avg: This column holds the value of the highest average probability among all topics for each user. It indicates the topic that the user most frequently discusses, on average.

4. index_max_avg_topic_probability_200: This column specifies the index or identifier of the topic with the highest average probability out of 200 possible topics. It shows which topic (out of 200) the user discusses the most.

5. global_avg: This column includes the global average probability of topics across all users. It provides a baseline or overall average topic probability that can be used for comparative purposes.

6. max_global_avg: This column contains the maximum global average probability across all topics for all users. It identifies the most discussed topic across the entire user base.

7. index_max_global_avg: This column shows the index or identifier of the topic with the highest global average probability. It indicates which topic (out of 200) is the most popular across all users.

8. entropy_200_topic: This column represents the entropy of the topics for each user, calculated over 200 topics. Entropy measures the diversity or unpredictability in the user's discussion of topics, with higher entropy indicating more varied topic discussion.

In summary, these columns are used to analyze the topic engagement and preferences of users on a platform, highlighting the most frequently discussed topics, the variability in topic discussions, and how individual user behavior compares to overall trends.
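
The per-user columns described above can be illustrated with a short computation. This is a sketch under stated assumptions, not the dataset authors' code: it assumes each tweet comes with a probability vector over topics, reads `avg_topic_probability` as the per-topic mean over a user's tweets, and measures entropy in bits after normalizing the averaged vector. The function name and the toy two-topic input are hypothetical (the dataset uses 200 topics).

```python
import math


def user_topic_stats(tweet_topic_probs):
    """Summarize one user's tweets, given one topic-probability vector per tweet."""
    n_tweets = len(tweet_topic_probs)
    n_topics = len(tweet_topic_probs[0])
    # Per-topic average probability across all of the user's tweets.
    avg = [sum(t[k] for t in tweet_topic_probs) / n_tweets for k in range(n_topics)]
    max_avg = max(avg)
    index_max = avg.index(max_avg)
    # Normalize before computing Shannon entropy (in bits, an assumption).
    total = sum(avg)
    normed = [p / total for p in avg]
    entropy = -sum(p * math.log2(p) for p in normed if p > 0)
    return {
        "avg_topic_probability": avg,
        "maximum_topic_avg": max_avg,
        "index_max_avg_topic_probability": index_max,
        "entropy": entropy,
    }


# Toy example with 2 topics instead of 200, invented for illustration.
focused = user_topic_stats([[1.0, 0.0], [1.0, 0.0]])
varied = user_topic_stats([[1.0, 0.0], [0.0, 1.0]])
```

A user who always tweets about one topic gets entropy 0, while a user split evenly between two topics gets entropy 1 bit, which matches the description that higher entropy indicates more varied topic discussion.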
Twitter Sentiment Analysis
kaggle.com
zip
Updated Mar 27, 2023
+ more versions
Cite
Madhavi (2023). Twitter Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/madhavirpa/twitter-sentiment-analysis
Explore at:
Available download formats: zip (1982553 bytes)
Dataset updated
Mar 27, 2023
Authors
Madhavi
Description
Dataset

This dataset was created by Madhavi

Released under Other (specified in description)

Contents
financial-tweets-sentiment
huggingface.co
Updated Jul 8, 2024
Cite
Tim Koornstra (2024). financial-tweets-sentiment [Dataset]. https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment
Explore at:
Croissant
Dataset updated
Jul 8, 2024
Authors
Tim Koornstra
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Financial Sentiment Analysis Dataset

Overview

This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.

Data Description

The dataset comprises… See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.
Twitter Product Sentiment Analysis
kaggle.com
zip
Updated Sep 10, 2020
Cite
Blesson Densil (2020). Twitter Product Sentiment Analysis [Dataset]. https://www.kaggle.com/blessondensil294/twitter-product-sentiment-analysis
Explore at:
Available download formats: zip (582707 bytes)
Dataset updated
Sep 10, 2020
Authors
Blesson Densil
Description
Dataset

This dataset was created by Blesson Densil

Released under Data files © Original Authors

Contents
Tweets Dataset
brightdata.com
.json, .csv, .xlsx
Updated Nov 13, 2024
+ more versions
Cite
Bright Data (2024). Tweets Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/tweets
Explore at:
Available download formats: .json, .csv, .xlsx
Dataset updated
Nov 13, 2024
Dataset authored and provided by
Bright Data
License
https://brightdata.com/license
Area covered
Worldwide
Description
Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.
Data from: MAVIS Twitter dataset: A collection of tweets and sentiment...
zenodo.org
portalinvestigacion.uniovi.es
+1 more
bin, tsv
Updated Dec 18, 2020
+ more versions
Cite
Alejandro Rodríguez González; Juan Manuel Tuñas; Lucia Prieto Santamaría; Diego Fernandez Peces-Barba; Ernestina Menasalvas Ruiz; Almudena Jaramillo; Manuel Cotarelo; Antonio J. Conejo Fernández; Amalia Arce; Angel Gil (2020). MAVIS Twitter dataset: A collection of tweets and sentiment analysis in Spanish about vaccines and diseases during the period 2015-2018 [Dataset]. http://doi.org/10.5281/zenodo.4335594
Explore at:
Available download formats: tsv, bin
Unique identifier
https://doi.org/10.5281/zenodo.4335594
Dataset updated
Dec 18, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alejandro Rodríguez González; Juan Manuel Tuñas; Lucia Prieto Santamaría; Diego Fernandez Peces-Barba; Ernestina Menasalvas Ruiz; Almudena Jaramillo; Manuel Cotarelo; Antonio J. Conejo Fernández; Amalia Arce; Angel Gil
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
MAVIS dataset comprises a full knowledge base regarding Twitter messages published in Spanish during the period 2015-2018, in the context of sentiment analysis of specific vaccines and their related diseases. Such diseases and vaccines are summarized as follows:

Invasive meningococcal disease (“EMI” in Spanish): Bexsero, Trumenba, Nimenrix

Invasive pneumococcal disease (“ENI” in Spanish)

Influenza

Hepatitis

Rotavirus: Rotarix, Rotateq

Measles (“Sarampión” in Spanish) and MMR (“Triple vírica” in Spanish)

Sepsis

Whooping cough (“Tosferina” in Spanish)

Chickenpox (“Varicela” in Spanish): Varivax, Varilrix; and Shingles (“Zoster” in Spanish)

Human papillomavirus infection (“VPH” in Spanish): Cervarix, Gardasil

Tweets have been manually classified as having a negative or non-negative sentiment by 5 experts. Moreover, an automatic classification has been performed by 3 different tools: IBM Watson (now Watson Tone Analyzer, https://www.ibm.com/watson/services/tone-analyzer/), Google Cloud Natural Language (https://cloud.google.com/natural-language), and Meaning Cloud (https://www.meaningcloud.com/). IBM Watson and Google Cloud Natural Language returned a numerical sentiment score ranging from -1 to 1, while Meaning Cloud returned a categorical variable with the values ‘P+’, ‘P’, ‘NEU’, ‘N’ and ‘N+’, which were converted to 1, 2, 3, 4 and 5 respectively.

With these variables (IBM Watson, Google Cloud Natural Language, and Meaning Cloud annotations and the experts’ classification as the target label), a machine learning metamodel was developed. Tweets were also annotated with the sentiment output given by this classifier.
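
The conversion of the three tools' outputs into numeric variables can be sketched as follows. The category-to-integer mapping and the score range come from the description above; the function name, the feature order, and the example values are hypothetical, and the metamodel itself is not reproduced here.

```python
# Mapping stated in the description: 'P+', 'P', 'NEU', 'N', 'N+' -> 1..5.
MEANINGCLOUD_TO_INT = {"P+": 1, "P": 2, "NEU": 3, "N": 4, "N+": 5}


def build_features(watson_score: float, google_score: float, meaningcloud_label: str):
    """Return a numeric feature vector for one tweet (hypothetical layout)."""
    for score in (watson_score, google_score):
        # Both tools are described as returning scores between -1 and 1.
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range [-1, 1]: {score}")
    if meaningcloud_label not in MEANINGCLOUD_TO_INT:
        raise ValueError(f"unknown Meaning Cloud label: {meaningcloud_label!r}")
    return [watson_score, google_score, float(MEANINGCLOUD_TO_INT[meaningcloud_label])]


# Hypothetical annotations for a single tweet.
features = build_features(0.4, -0.2, "N+")
```

Note that under this mapping a larger Meaning Cloud value means a more negative tweet, which is the opposite direction from the two numerical scores; a classifier trained on these features would need to learn that relationship.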

The provided data includes intrinsic tweets information, intrinsic information regarding the users that posted the tweets, the keywords mentioned in each tweet, and the annotations that the experts, the tools, and the model gave to each tweet.

Funding: This dataset was obtained with funding from MSD, Spain under MAVIS Study (VEAP ID: 7789).

Current studies using this dataset at the moment of the publication:

Rodríguez-González et al., “Creating a metamodel based on machine learning to identify the sentiment of vaccine and disease-related messages in Twitter: the MAVIS study” in 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Jul. 2020, p. 6. DOI: 10.1109/CBMS49503.2020.00053

Rodríguez-González et al., "Identifying Polarity in Tweets from an Imbalanced Dataset about Diseases and Vaccines Using a Meta-Model Based on Machine Learning Techniques" in Applied Sciences, 2020, 10. DOI: 10.3390/app10249019
Twitter data for sentiment analysis
kaggle.com
zip
Updated Jun 7, 2020
+ more versions
Cite
Subhashini (2020). Twitter data for sentiment analysis [Dataset]. https://www.kaggle.com/subhamila/twitter-data-for-sentiment-analysis
Explore at:
Available download formats: zip (5807698 bytes)
Dataset updated
Jun 7, 2020
Authors
Subhashini
Description
Dataset

This dataset was created by Subhashini

Contents
Data from: RIFT: A Rule Induction Framework for Twitter Sentiment Analysis
figshare.com
html
Updated Aug 19, 2017
Cite
Zubair Asghar; Furqan Khan; Aurangzeb Khan; Fazal Masud Kundi (2017). RIFT: A Rule Induction Framework for Twitter Sentiment Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5327065.v1
Explore at:
Available download formats: html
Unique identifier
https://doi.org/10.6084/m9.figshare.5327065.v1
Dataset updated
Aug 19, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Zubair Asghar; Furqan Khan; Aurangzeb Khan; Fazal Masud Kundi
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The rapid evolution of microblogging and the emergence of sites such as Twitter have propelled online communities to flourish by enabling people to create, share and disseminate free-flowing messages and information globally. The exponential growth of product-based user reviews has become an ever-increasing resource playing a key role in emerging Twitter-based sentiment analysis (SA) techniques and applications to collect and analyse customer trends and reviews. Existing studies on supervised black-box sentiment analysis systems do not provide adequate information, regarding rules as to why a certain review was classified to a class or classification. The accuracy in some ways is less than our personal judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system to analyse user reviews on certain products using rough set theory (RST)-based rule induction algorithms. RST classifies microblogging reviews of products into positive, negative, or neutral class using different rules extracted from training decision tables using RST-centric rule induction algorithms. The primary focus of this study is also to perform sentiment classification of microblogs (i.e. also known as tweets) of product reviews using conventional, and RST-based rule induction algorithms. 
The proposed RST-centric rule induction algorithm, namely Learning from Examples Module version 2, and LEM2 + Corpus-based rules (LEM2 + CBR), which is an extension of the traditional LEM2 algorithm, are used. Corpus-based rules are generated from tweets, which are unclassified using other conventional LEM2 algorithm rules. Experimental results show the proposed method, when compared with baseline methods, is excellent, with regard to accuracy, coverage and the number of rules employed. The approach using this method achieves an average accuracy of 92.57% and an average coverage of 100%, with an average number of rules of 19.14.
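
To make the accuracy and coverage figures concrete, the sketch below evaluates a tiny hand-written rule set. It does not implement LEM2 or LEM2 + CBR; the rules, the examples, and the metric definitions (coverage as the share of examples matched by any rule, accuracy as the share of matched examples labeled correctly) are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

# A rule pairs a condition over the tweet's tokens with a predicted class.
Rule = Tuple[Callable[[Set[str]], bool], str]

# Hypothetical hand-written rules; a rule induction algorithm would learn these.
RULES: List[Rule] = [
    (lambda tokens: "love" in tokens, "positive"),
    (lambda tokens: "broken" in tokens, "negative"),
]


def classify(tokens: Set[str], rules: List[Rule]) -> Optional[str]:
    """Return the class of the first matching rule, or None if no rule fires."""
    for condition, label in rules:
        if condition(tokens):
            return label
    return None


def evaluate(examples: List[Tuple[str, str]], rules: List[Rule]) -> Tuple[float, float]:
    """Return (accuracy over covered examples, coverage over all examples)."""
    covered = 0
    correct = 0
    for text, gold in examples:
        predicted = classify(set(text.lower().split()), rules)
        if predicted is not None:
            covered += 1
            if predicted == gold:
                correct += 1
    coverage = covered / len(examples)
    accuracy = correct / covered if covered else 0.0
    return accuracy, coverage


# Hypothetical labeled reviews, invented for illustration only.
EXAMPLES = [
    ("I love this phone", "positive"),
    ("screen arrived broken", "negative"),
    ("love it but it is broken", "negative"),
    ("it is a phone", "neutral"),
]
accuracy, coverage = evaluate(EXAMPLES, RULES)
```

Because each rule is an explicit condition, a misclassification such as the third example can be traced to the rule that fired, which is the white-box property the abstract contrasts with black-box classifiers.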
CCUS Sentiment Analysis - Tweets Dataset
data.niaid.nih.gov
zenodo.org
Updated May 16, 2024
Cite
Padilla, Marielisa (2024). CCUS Sentiment Analysis - Tweets Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11202682
Explore at:
Dataset updated
May 16, 2024
Dataset provided by
Sánchez, Alberto
Padilla, Marielisa
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The present dataset contains Tweets in any language supported by Twitter, obtained from January to March 2023, with any mention of the topic CCS/CCUS. The scraping process was done in Python, using the official Twitter API. All tweets were manually annotated after being machine translated into English.

Structure

Every row contains:

1st cell (A): Language

2nd cell (B): Tweet-text

3rd cell (C): Benefit

4th cell (D): Concern

5th cell (E): Perception – Fight climate change

6th cell (F): Perception – Climate-friendly technology

7th cell (G): Perception – Extensive R&D needed

8th cell (H): Perception – Better options than CCS

9th cell (I): Sentiment

10th cell (J): Relatedness

11th cell (K): Comments

Annotations

Benefit: Preventing c. change; Reducing c. change risks; Safeguarding jobs; Creating new jobs; Fossil energy production envir. friendly; Products envir. friendly; Reducing envir. impact; Other; None

Concern: Accidents; Leakages; Environmental; Earthquake-related; Increased local traffic; Investment; Greenwashing; Lock-in effects for fossil energy; Increase cost; Other; None

Perception (Yes / No / None): Fight climate change; Climate-friendly technology; Extensive R&D needed; Better options than CCS

Sentiment: Positive; Negative; Neutral
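
The row layout and annotation values listed above can be checked with a short parser. This is a minimal sketch: the snake_case key names, the `parse_row` function, and the sample row are hypothetical, and the allowed values assume the labels are stored exactly as spelled in the description.

```python
# Column order A..K as described; the key names are hypothetical.
COLUMNS = [
    "language", "tweet_text", "benefit", "concern",
    "fight_climate_change", "climate_friendly_technology",
    "extensive_rd_needed", "better_options_than_ccs",
    "sentiment", "relatedness", "comments",
]
PERCEPTION_COLUMNS = COLUMNS[4:8]
PERCEPTION_VALUES = {"Yes", "No", "None"}
SENTIMENT_VALUES = {"Positive", "Negative", "Neutral"}


def parse_row(cells):
    """Map one 11-cell row to a dict and validate the categorical fields."""
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    row = dict(zip(COLUMNS, cells))
    for key in PERCEPTION_COLUMNS:
        if row[key] not in PERCEPTION_VALUES:
            raise ValueError(f"invalid perception value in {key}: {row[key]!r}")
    if row["sentiment"] not in SENTIMENT_VALUES:
        raise ValueError(f"invalid sentiment: {row['sentiment']!r}")
    return row


# Hypothetical row, invented for illustration only.
sample = parse_row([
    "en", "CCS could help cut emissions", "Reducing c. change risks", "None",
    "Yes", "Yes", "None", "No", "Positive", "Yes", "",
])
```

The Benefit, Concern, and Relatedness cells are passed through unvalidated here, since the description does not make clear whether a cell can hold more than one value.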
#IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset
zenodo.org
data.niaid.nih.gov
bin, csv, png, txt
Updated Jul 19, 2024
Cite
Azmi Nawwar (2024). #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset [Dataset]. http://doi.org/10.5281/zenodo.4362505
Explore at:
Available download formats: txt, bin, png, csv
Unique identifier
https://doi.org/10.5281/zenodo.4362505
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Azmi Nawwar
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset is the result of scraping Twitter with the twint application, targeting the hashtag #IndonesiaHumanRightsSOS. The scrape covers tweets posted from December 18, 2020, 10:59 AM to December 19, 2020, 11:18 PM.

The dataset contains 106,903 rows of data with related information: User ID, Username, Twitter Name, Tweets, etc.

Also included are examples of the analyzed data in the form of a word cloud, a username cloud, the 100 most used words, and the most active usernames.
SMILE Twitter Emotion dataset
figshare.com
kaggle.com
txt
Updated Apr 21, 2016
Cite
Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen (2016). SMILE Twitter Emotion dataset [Dataset]. http://doi.org/10.6084/m9.figshare.3187909.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.3187909.v2
Dataset updated
Apr 21, 2016
Dataset provided by
figshare
Authors
Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was collected and annotated for the SMILE project (http://www.culturesmile.org). This collection of tweets mentioning 13 Twitter handles associated with British museums was gathered between May 2013 and June 2015. It was created for the purpose of classifying emotions expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets, with 5 emotions, namely anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset. License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
Number of initial and usable tweets for extracting the discussion topic.
figshare.com
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
James Waters; Nicos Nicolaou; Dimosthenis Stefanidis; Hariton Efstathiades; George Pallis; Marios Dikaiakos (2023). Number of initial and usable tweets for extracting the discussion topic. [Dataset]. http://doi.org/10.1371/journal.pone.0254337.t006
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0254337.t006
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
James Waters; Nicos Nicolaou; Dimosthenis Stefanidis; Hariton Efstathiades; George Pallis; Marios Dikaiakos
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Number of initial and usable tweets for extracting the discussion topic.
