100+ datasets found
  1. large-twitter-tweets-sentiment

    • huggingface.co
    Updated Mar 6, 2024
    Cite
    Gong Xiangbo (2024). large-twitter-tweets-sentiment [Dataset]. https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 6, 2024
    Authors
    Gong Xiangbo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for "Large twitter tweets sentiment analysis"

      Dataset Description

      Dataset Summary
    

    This dataset is a collection of tweets in tabular form, annotated for sentiment analysis. Each tweet has a sentiment label: 1 for Positive and 0 for Negative.
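As a minimal sketch, assuming the labels are stored as the integers 0 and 1 described above (the example rows are hypothetical, not taken from the dataset), the encoding can be decoded and the class balance checked like this:

```python
from collections import Counter

# Label encoding stated in the dataset card: 1 = Positive, 0 = Negative.
LABEL_NAMES = {0: "Negative", 1: "Positive"}


def label_distribution(labels):
    """Count tweets per named sentiment; raises KeyError on any label other than 0 or 1."""
    return dict(Counter(LABEL_NAMES[label] for label in labels))


# Hypothetical (text, label) rows for illustration only.
rows = [("love this", 1), ("worst day ever", 0), ("great game", 1)]
print(label_distribution(label for _, label in rows))  # {'Positive': 2, 'Negative': 1}
```

Raising on unexpected labels is deliberate: a silent fallback would hide rows that do not follow the documented 0/1 encoding.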

      Languages
    

    The tweets are in English.

      Dataset Structure

      Data Instances
    

    An instance of
 See the full description on the dataset page: https://huggingface.co/datasets/gxb912/large-twitter-tweets-sentiment.

  2. Twitter dataset

    • figshare.com
    csv
    Updated Feb 11, 2025
    Cite
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan (2025). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28390334.v2
    Available download formats: csv
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    figshare
    Authors
    Shreyas Poojary; Mohammed Riza; Rashmi Laxmikant Malghan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains tweets labeled for sentiment analysis, categorized into Positive, Negative, and Neutral sentiments. The dataset includes tweet IDs, user metadata, sentiment labels, and tweet text, making it suitable for Natural Language Processing (NLP), machine learning, and AI-based sentiment classification research. Originally sourced from Kaggle, this dataset is curated for improved usability in social media sentiment analysis.

  3. Data from: IA Tweets Analysis Dataset (Spanish)

    • produccioncientifica.uca.es
    • data.niaid.nih.gov
    • +1 more
    Updated 2024
    Cite
    Guerrero-Contreras, Gabriel; Balderas-Díaz, Sara; Serrano-Fernández, Alejandro; Muñoz, Andrés (2024). IA Tweets Analysis Dataset (Spanish) [Dataset]. https://produccioncientifica.uca.es/documentos/67321e53aea56d4af04854bf
    Dataset updated
    2024
    Authors
    Guerrero-Contreras, Gabriel; Balderas-Díaz, Sara; Serrano-Fernández, Alejandro; Muñoz, Andrés
    Description

    General Description

    This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.

    Data Collection Method

    Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.

    Dataset Content

    ID: A unique identifier for each tweet.

    text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.

    polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).

    favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.

    retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.

    user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.

    user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.

    user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.

    user_followers_count: The current number of followers the account has. It is a non-negative integer.

    user_friends_count: The number of users that the account is following. It is a non-negative integer.

    user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.

    user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.

    user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.

    user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.
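The field list above can be turned into a small typed loader. This is a sketch under assumptions the description does not state: that the CSV header uses exactly these field names, that booleans are written as the strings True and False, and that the invented sample row below reflects the file layout.

```python
import csv
import io

# Boolean columns, documented as taking the values True or False.
BOOL_FIELDS = ("user_verified", "user_default_profile", "user_has_extended_profile",
               "user_protected", "user_is_translator")
# Count columns, documented as non-negative integers.
INT_FIELDS = ("favorite_count", "retweet_count", "user_followers_count",
              "user_friends_count", "user_favourites_count", "user_statuses_count")


def parse_bool(value: str) -> bool:
    """Map the literal strings 'True'/'False' to booleans (assumed encoding)."""
    if value == "True":
        return True
    if value == "False":
        return False
    raise ValueError(f"expected True or False, got {value!r}")


def parse_row(row: dict) -> dict:
    """Convert one CSV row of strings into typed values, checking the documented constraints."""
    # IDs are kept as strings so large tweet IDs are never altered.
    record = {"ID": row["ID"], "text": row["text"], "polarity": row["polarity"]}
    if len(record["text"]) > 280:
        raise ValueError("text exceeds the 280-character limit")
    for name in BOOL_FIELDS:
        record[name] = parse_bool(row[name])
    for name in INT_FIELDS:
        count = int(row[name])
        if count < 0:
            raise ValueError(f"{name} must be non-negative, got {count}")
        record[name] = count
    return record


# Invented one-row CSV using the documented field names.
header = ["ID", "text", "polarity", *INT_FIELDS, *BOOL_FIELDS]
sample = ",".join(header) + "\n" + "1,Me encanta la IA,Positive,3,1,120,80,500,900,True,False,True,False,False\n"
records = [parse_row(row) for row in csv.DictReader(io.StringIO(sample))]
print(records[0]["user_verified"], records[0]["retweet_count"])  # True 1
```

Validating on load means a row that breaks the documented constraints fails loudly instead of skewing later engagement statistics.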

    Cite as

    Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.

    Potential Use Cases

    This dataset is aimed at academic researchers and practitioners with interests in:

    Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.

    Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.

    Exploring correlations between user engagement metrics and sentiment in discussions about AI.

    Data Format and File Type

    The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.

    License

    The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.

  4. Twitter Sentiment Analysis of Election 2024

    • kaggle.com
    zip
    Updated Jan 22, 2024
    Cite
    Ghany Fitriamara (2024). Twitter Sentiment Analysis of Election 2024 [Dataset]. https://www.kaggle.com/datasets/ghanyfitria/twitter-sentiment-analysis-of-election-2024
    Available download formats: zip (454,401 bytes)
    Dataset updated
    Jan 22, 2024
    Authors
    Ghany Fitriamara
    Description

    Dataset

    This dataset was created by Ghany Fitriamara

    Released under Other (specified in description)

  5. twitter-dataset-tesla

    • huggingface.co
    Updated Jul 11, 2022
    Cite
    fastai X Hugging Face Group 2022 (2022). twitter-dataset-tesla [Dataset]. https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 11, 2022
    Dataset provided by
    Hugging Face (https://huggingface.co/)
    Authors
    fastai X Hugging Face Group 2022
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for Twitter Dataset: Tesla

      Dataset Summary
    

    This dataset contains all the Tweets regarding #Tesla or #tesla up to 12/07/2022 (dd-mm-yyyy). It can be used for sentiment analysis research, for other NLP tasks, or just for fun. It contains 10,000 recent Tweets with the user ID, the hashtags used in each Tweet, and other important features.
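Because the cutoff date is written in dd-mm-yyyy order, 12/07/2022 means 12 July 2022, not 7 December. The sketch below parses the date with that order and applies a simple hashtag filter mirroring the description; the `matches` helper and its whitespace tokenization are illustrative assumptions, not the collection code used for this dataset.

```python
from datetime import date, datetime

# 12/07/2022 read in dd-mm-yyyy order, i.e. 12 July 2022.
CUTOFF = datetime.strptime("12/07/2022", "%d/%m/%Y").date()
HASHTAGS = {"#Tesla", "#tesla"}


def matches(text: str, posted: date) -> bool:
    """Illustrative filter: the tweet uses #Tesla or #tesla and was posted on or before the cutoff.
    Whitespace tokenization is a simplification; '#Tesla!' would not match."""
    return bool(set(text.split()) & HASHTAGS) and posted <= CUTOFF


print(CUTOFF)                                               # 2022-07-12
print(matches("Driving my new #Tesla", date(2022, 7, 1)))   # True
print(matches("Driving my new #Tesla", date(2022, 7, 13)))  # False
```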

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More
 See the full description on the dataset page: https://huggingface.co/datasets/hugginglearners/twitter-dataset-tesla.

  6. Brussel mobility Twitter sentiment analysis CSV Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 31, 2024
    Cite
    Betancur Arenas, Juliana (2024). Brussel mobility Twitter sentiment analysis CSV Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11401123
    Dataset updated
    May 31, 2024
    Dataset provided by
    van Vessem, Charlotte
    Ginis, Vincent
    Betancur Arenas, Juliana
    Tori, Floriano
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brussels
    Description

    SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project that engages directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is built on a range of activities related to Open Science, inclusivity, and diversity (especially with regard to Southern and Eastern Europe and different career stages), including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and stakeholder synergies built through regular Policy Insight events. This work is captured in a high-profile virtual SSH CENTRE that generates and shares best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.

    The documents uploaded here are part of WP2, in which novel, interdisciplinary teams received funding to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each policy recommendation, and the activities that inform it, will be written up as a chapter in an edited collection of three books: one on climate, one on energy, and one on mobility. As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of Twitter users regarding shared and active mobility modes in Brussels. To do so, we collected tweets posted between 2017 and 2022. A tweet was collected if it contained a previously defined mobility keyword (for example, metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The files attached to this Zenodo page are CSV files containing the collected tweets.
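The selection rule above is a conjunction: a mobility keyword AND at least one politician, place, or provider name. A minimal sketch, in which every keyword list is a hypothetical placeholder (the authors' actual lists are not given here) and matching is done on lower-cased single-word tokens:

```python
import re

# Placeholder lists for illustration only; these are not the authors' keyword lists.
MOBILITY_KEYWORDS = {"metro", "tram", "bike"}
POLITICIANS = {"politician_a"}          # stand-in for (local) politician names
PLACES = {"ixelles", "schaerbeek"}      # example Brussels municipalities
PROVIDERS = {"villo"}                   # example shared-mobility provider


def tokens(text: str) -> set:
    """Lower-case the text and split it into word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def is_collected(text: str) -> bool:
    """True if the tweet has a mobility keyword AND a politician, place, or provider name."""
    words = tokens(text)
    has_mobility = bool(words & MOBILITY_KEYWORDS)
    has_context = bool(words & (POLITICIANS | PLACES | PROVIDERS))
    return has_mobility and has_context


print(is_collected("The metro in Ixelles is late again"))  # True
print(is_collected("The metro is late again"))             # False
```

Multi-word names (for example a politician's full name) would need phrase matching rather than single-token lookups.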

  7. Report | OCEAN Token Sentiment Analysis

    • market.oceanprotocol.com
    Updated Jun 14, 2023
    Cite
    Tarandros (2023). Report | OCEAN Token Sentiment Analysis [Dataset]. https://market.oceanprotocol.com/asset/did:op:9a24d68687f535e09f92c98ec875c0a29210ec153be954db3fd3c5ea9821f085
    Dataset updated
    Jun 14, 2023
    Dataset authored and provided by
    Tarandros
    License

    https://market.oceanprotocol.com/terms

    Description

    This report examines the correlation between Twitter engagement metrics (likes, retweets, and influential tweets) and the price movements of the OCEAN token. By analyzing the relationship between these engagement indicators and the token's price, we aim to gain insight into the impact of Twitter sentiment on OCEAN's market dynamics.

    Additionally, this report presents a Transformer model designed for sentiment classification of tweets related to the OCEAN token. Building on "The Twitter Financial Dataset (sentiment) version 1.0.0," the model classifies tweets as bullish, bearish, or neutral, which allows us to gauge the prevailing sentiment of the Twitter community towards the OCEAN token.

  8. EdChat Public Tweets

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Gruzd, Anatoliy; Conroy, Nadia (2023). EdChat Public Tweets [Dataset]. https://search.dataone.org/view/sha256%3A1badf4ddc248d00bcd77d23dbff6f03aebe31d7ce40490aee2acbc79d468ecfa
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Gruzd, Anatoliy; Conroy, Nadia
    Description

    This is a data set of 482,251 public tweets and retweets (Twitter IDs) posted by the #edchat online community of educators who discuss current trends in teaching with technology. The data set was collected via Twitter's Streaming API between Feb 1, 2018 and Apr 4, 2018, and was used in research on developing a learning analytics dashboard for teaching and learning with Twitter. Following Twitter's terms of service, the data set only includes the unique identifiers of relevant tweets. To collect the actual tweets, you will need to use one of the available third-party tools, such as Hydrator or Twarc (its "hydrate" function). As part of this release, we are also attaching an enriched version of this data set that contains sentiment and opinion analysis labels produced by analyzing each tweet with the NLTK SentimentAnalyzer Python package. *This work was supported in part by eCampusOntario and The Social Sciences and Humanities Research Council of Canada.

  9. SenTopX: A Benchmark Twitter Dataset for User Sentiment on Various Topics

    • zenodo.org
    csv, zip
    Updated May 27, 2024
    Cite
    Hina Qayyum (2024). SenTopX: A Benchmark Twitter Dataset for User Sentiment on Various Topics [Dataset]. http://doi.org/10.5281/zenodo.11243662
    Available download formats: zip, csv
    Dataset updated
    May 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hina Qayyum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 25, 2024
    Description

    This is a longitudinal Twitter dataset of 143K users covering the period 2017-2021. The files are:

    • SenTopX_userIDs.txt: contains user IDs of 143K Twitter users.
    • userIDs_tweetIDs.zip: contains one file per user; each file is named after the user ID and lists all of that user's tweet IDs.
    • users_16_perspective_toxicity_scores.csv: contains user IDs and 16 Perspective API scores per user; each score is reported as the mean, median, and Gini index calculated over all of the user's tweets.
    • LDAvis_top30_words_for_extracted_topics.csv: contains the 30 most relevant words for each topic found by tweet-level topic modeling with the BERTweet topic model.
    • topic_modelling_statistics_per_user.csv: contains statistics related to the topic modeling results:
        1. user: This column represents the identifier for the user. Each row in the CSV corresponds to a specific user, and this column helps to track and differentiate between the users.

        2. avg_topic_probability: This column contains the average probability of the topics for each user calculated across all of the tweets in order to compare users in a meaningful way. It represents the average likelihood that a particular user discusses various topics over the observed period.

        3. maximum_topic_avg: This column holds the value of the highest average probability among all topics for each user. It indicates the topic that the user most frequently discusses, on average.

        4. index_max_avg_topic_probability_200: This column specifies the index or identifier of the topic with the highest average probability out of 200 possible topics. It shows which topic (out of 200) the user discusses the most.

        5. global_avg: This column includes the global average probability of topics across all users. It provides a baseline or overall average topic probability that can be used for comparative purposes.

        6. max_global_avg: This column contains the maximum global average probability across all topics for all users. It identifies the most discussed topic across the entire user base.

        7. index_max_global_avg: This column shows the index or identifier of the topic with the highest global average probability. It indicates which topic (out of 200) is the most popular across all users.

        8. entropy_200_topic: This column represents the entropy of the topics for each user, calculated over 200 topics. Entropy measures the diversity or unpredictability in the user's discussion of topics, with higher entropy indicating more varied topic discussion.

        In summary, these columns are used to analyze the topic engagement and preferences of users on a platform, highlighting the most frequently discussed topics, the variability in topic discussions, and how individual user behavior compares to overall trends.
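The per-user columns above can be reproduced from tweet-level topic probabilities under one reading of the descriptions: average each topic's probability over a user's tweets, take the maximum and its index, and compute the Shannon entropy of the averaged distribution; the global columns then average those per-user vectors. The function names, the natural-log base, and the three-topic example are assumptions (the dataset itself uses 200 topics). A Gini index helper, using the standard sorted-values formula, is included for the aggregated Perspective scores.

```python
import math


def topic_statistics(tweet_topic_probs):
    """Per-user statistics from a list of per-tweet topic distributions (each sums to 1)."""
    n_tweets = len(tweet_topic_probs)
    n_topics = len(tweet_topic_probs[0])
    # Average probability of each topic across the user's tweets.
    avg = [sum(tweet[k] for tweet in tweet_topic_probs) / n_tweets for k in range(n_topics)]
    max_avg = max(avg)
    # Shannon entropy (natural log) of the averaged distribution; zero terms are skipped.
    entropy = -sum(p * math.log(p) for p in avg if p > 0)
    return {
        "avg_topic_probability": avg,
        "maximum_topic_avg": max_avg,
        "index_max_avg_topic_probability": avg.index(max_avg),
        "entropy": entropy,
    }


def global_statistics(user_avgs):
    """Average the users' per-topic averages, then find the most discussed topic overall."""
    n_users = len(user_avgs)
    global_avg = [sum(u[k] for u in user_avgs) / n_users for k in range(len(user_avgs[0]))]
    max_global = max(global_avg)
    return {"global_avg": global_avg, "max_global_avg": max_global,
            "index_max_global_avg": global_avg.index(max_global)}


def gini(values):
    """Gini index of non-negative values: 0 when all equal, up to (n - 1) / n when one value holds everything."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


user = topic_statistics([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
print(user["index_max_avg_topic_probability"])  # 0
print(round(gini([0, 0, 0, 1]), 2))              # 0.75
```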

  10. Twitter Sentiment Analysis

    • kaggle.com
    zip
    Updated Mar 27, 2023
    Cite
    Madhavi (2023). Twitter Sentiment Analysis [Dataset]. https://www.kaggle.com/datasets/madhavirpa/twitter-sentiment-analysis
    Available download formats: zip (1,982,553 bytes)
    Dataset updated
    Mar 27, 2023
    Authors
    Madhavi
    Description

    Dataset

    This dataset was created by Madhavi

    Released under Other (specified in description)

  11. financial-tweets-sentiment

    • huggingface.co
    Updated Jul 8, 2024
    Cite
    Tim Koornstra (2024). financial-tweets-sentiment [Dataset]. https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jul 8, 2024
    Authors
    Tim Koornstra
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Financial Sentiment Analysis Dataset

      Overview
    

    This dataset is a comprehensive collection of tweets focused on financial topics, meticulously curated to assist in sentiment analysis in the domain of finance and stock markets. It serves as a valuable resource for training machine learning models to understand and predict sentiment trends based on social media discourse, particularly within the financial sector.

      Data Description
    

    The dataset comprises
 See the full description on the dataset page: https://huggingface.co/datasets/TimKoornstra/financial-tweets-sentiment.

  12. Twitter Product Sentiment Analysis

    • kaggle.com
    zip
    Updated Sep 10, 2020
    Cite
    Blesson Densil (2020). Twitter Product Sentiment Analysis [Dataset]. https://www.kaggle.com/blessondensil294/twitter-product-sentiment-analysis
    Available download formats: zip (582,707 bytes)
    Dataset updated
    Sep 10, 2020
    Authors
    Blesson Densil
    Description

    Dataset

    This dataset was created by Blesson Densil

    Released under Data files © Original Authors

  13. Tweets Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Nov 13, 2024
    Cite
    Bright Data (2024). Tweets Dataset [Dataset]. https://brightdata.com/products/datasets/twitter/tweets
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Nov 13, 2024
    Dataset authored and provided by
    Bright Data
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Utilize our Tweets dataset for a range of applications to enhance business strategies and market insights. Analyzing this dataset offers a comprehensive view of social media dynamics, empowering organizations to optimize their communication and marketing strategies. Access the full dataset or select specific data points tailored to your needs. Popular use cases include sentiment analysis to gauge public opinion and brand perception, competitor analysis by examining engagement and sentiment around rival brands, and crisis management through real-time tracking of tweet sentiment and influential voices during critical events.

  14. Data from: MAVIS Twitter dataset: A collection of tweets and sentiment...

    • zenodo.org
    • portalinvestigacion.uniovi.es
    • +1 more
    bin, tsv
    Updated Dec 18, 2020
    Cite
    Alejandro Rodríguez González; Juan Manuel Tuñas; Lucia Prieto Santamaría; Diego Fernandez Peces-Barba; Ernestina Menasalvas Ruiz; Almudena Jaramillo; Manuel Cotarelo; Antonio J. Conejo Fernández; Amalia Arce; Angel Gil (2020). MAVIS Twitter dataset: A collection of tweets and sentiment analysis in Spanish about vaccines and diseases during the period 2015-2018 [Dataset]. http://doi.org/10.5281/zenodo.4335594
    Available download formats: tsv, bin
    Dataset updated
    Dec 18, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alejandro Rodríguez González; Juan Manuel Tuñas; Lucia Prieto Santamaría; Diego Fernandez Peces-Barba; Ernestina Menasalvas Ruiz; Almudena Jaramillo; Manuel Cotarelo; Antonio J. Conejo Fernández; Amalia Arce; Angel Gil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MAVIS dataset is a full knowledge base of Twitter messages published in Spanish during the period 2015-2018, compiled for sentiment analysis of specific vaccines and their related diseases. These diseases and vaccines are summarized as follows:

    • Invasive meningococcal disease (“EMI” in Spanish): Bexsero, Trumenba, Nimenrix
    • Invasive pneumococcal disease (“ENI” in Spanish)
    • Influenza
    • Hepatitis
    • Rotavirus: Rotarix, Rotateq
    • Measles (“Sarampión” in Spanish) and MMR (“Triple vírica” in Spanish)
    • Sepsis
    • Whooping cough (“Tosferina” in Spanish)
    • Chickenpox (“Varicela” in Spanish): Varivax, Varilrix; and Shingles (“Zoster” in Spanish)
    • Human papillomavirus infection (“VPH” in Spanish): Cervarix, Gardasil

    Tweets have been manually classified as having a negative or non-negative sentiment by 5 experts. Moreover, an automatic classification has been performed by 3 different tools: IBM Watson (now Watson Tone Analyzer, https://www.ibm.com/watson/services/tone-analyzer/), Google Cloud Natural Language (https://cloud.google.com/natural-language), and Meaning Cloud (https://www.meaningcloud.com/). IBM Watson and Google Cloud Natural Language returned a numerical sentiment score ranging from -1 to 1, while Meaning Cloud returned a categorical variable with the values ‘P+’, ‘P’, ‘NEU’, ‘N’ and ‘N+’, which were converted to 1, 2, 3, 4 and 5 respectively.
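The Meaning Cloud conversion above is a fixed lookup. A minimal sketch of turning the three tool outputs for one tweet into a numeric feature vector, as a metamodel would need; the function name and the range check are illustrative assumptions, and the dataset's own column names are not used here:

```python
# Conversion stated in the dataset description.
MEANING_CLOUD_CODES = {"P+": 1, "P": 2, "NEU": 3, "N": 4, "N+": 5}


def tool_features(ibm_score, google_score, meaning_cloud_label):
    """Combine IBM Watson and Google Cloud scores (each in [-1, 1]) with the coded Meaning Cloud label."""
    for score in (ibm_score, google_score):
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score {score} is outside [-1, 1]")
    return [ibm_score, google_score, MEANING_CLOUD_CODES[meaning_cloud_label]]


print(tool_features(0.4, -0.2, "N+"))  # [0.4, -0.2, 5]
```

Under this coding the Meaning Cloud value grows as sentiment becomes more negative, whereas the two numeric scores presumably grow with positivity, so a model consuming these features should not assume all three point the same way.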

    Using these variables (the IBM Watson, Google Cloud Natural Language, and Meaning Cloud annotations as inputs, with the experts’ classification as the target label), a machine learning metamodel was developed. Tweets were also annotated with the sentiment output given by this classifier.

    The provided data includes intrinsic tweet information, intrinsic information about the users who posted the tweets, the keywords mentioned in each tweet, and the annotations that the experts, the tools, and the model gave to each tweet.

    Funding: This dataset was obtained with funding from MSD, Spain under MAVIS Study (VEAP ID: 7789).

    Studies using this dataset at the time of publication:

    • Rodríguez-González et al., “Creating a metamodel based on machine learning to identify the sentiment of vaccine and disease-related messages in Twitter: the MAVIS study” in 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), Jul. 2020, p. 6. DOI: 10.1109/CBMS49503.2020.00053
    • Rodríguez-González et al., “Identifying Polarity in Tweets from an Imbalanced Dataset about Diseases and Vaccines Using a Meta-Model Based on Machine Learning Techniques” in Applied Sciences, 2020, 10. DOI: 10.3390/app10249019

  15. Twitter data for sentiment analysis

    • kaggle.com
    zip
    Updated Jun 7, 2020
    Cite
    Subhashini (2020). Twitter data for sentiment analysis [Dataset]. https://www.kaggle.com/subhamila/twitter-data-for-sentiment-analysis
    Available download formats: zip (5,807,698 bytes)
    Dataset updated
    Jun 7, 2020
    Authors
    Subhashini
    Description

    Dataset

    This dataset was created by Subhashini

  16. Data from: RIFT: A Rule Induction Framework for Twitter Sentiment Analysis

    • figshare.com
    html
    Updated Aug 19, 2017
    Cite
    Zubair Asghar; Furqan Khan; Aurangzeb Khan; Fazal Masud Kundi (2017). RIFT: A Rule Induction Framework for Twitter Sentiment Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.5327065.v1
    Available download formats: html
    Dataset updated
    Aug 19, 2017
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zubair Asghar; Furqan Khan; Aurangzeb Khan; Fazal Masud Kundi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The rapid evolution of microblogging and the emergence of sites such as Twitter have propelled online communities to flourish by enabling people to create, share, and disseminate free-flowing messages and information globally. The exponential growth of product-based user reviews has made them an ever-increasing resource that plays a key role in emerging Twitter-based sentiment analysis (SA) techniques and applications for collecting and analysing customer trends and reviews. Existing supervised black-box sentiment analysis systems do not provide adequate information about the rules explaining why a given review was assigned to a particular class, and in some respects their accuracy falls short of human judgement. To address these shortcomings, alternative approaches, such as supervised white-box classification algorithms, need to be developed to improve the classification of Twitter-based microblogs. The purpose of this study was to develop a supervised white-box microblogging SA system that analyses user reviews of certain products using rough set theory (RST)-based rule induction algorithms. RST classifies microblogging reviews of products into the positive, negative, or neutral class using rules extracted from training decision tables by RST-centric rule induction algorithms. The study also focuses on sentiment classification of microblogs (i.e., tweets) of product reviews using both conventional and RST-based rule induction algorithms.

    The proposed RST-centric rule induction algorithm, LEM2 + Corpus-based rules (LEM2 + CBR), is an extension of the traditional Learning from Examples Module version 2 (LEM2) algorithm. Corpus-based rules are generated from the tweets that remain unclassified by the conventional LEM2 rules. Experimental results show that the proposed method compares favourably with baseline methods with regard to accuracy, coverage, and the number of rules employed, achieving an average accuracy of 92.57% and an average coverage of 100% with an average of 19.14 rules.
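The accuracy and coverage figures above are standard metrics for rule-based classifiers. The sketch below does not implement LEM2 or any rule induction; it only applies already-induced rules, with a fallback rule set standing in for the corpus-based rules that handle otherwise unclassified tweets. Coverage is taken as the share of tweets some rule classifies and accuracy as the share of covered tweets classified correctly; these are common conventions and may differ from the paper's exact definitions. The rules and examples are hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

# A rule pairs a condition on a tweet's token set with the class it assigns.
Rule = Tuple[Callable[[Set[str]], bool], str]


def classify(tokens: Set[str], primary: List[Rule], fallback: List[Rule]) -> Optional[str]:
    """Return the class of the first primary rule that fires; if none does, try the fallback rules."""
    for rules in (primary, fallback):
        for condition, label in rules:
            if condition(tokens):
                return label
    return None  # tweet not covered by any rule


def evaluate(examples, primary, fallback):
    """Return (accuracy over covered tweets, coverage over all tweets)."""
    covered = correct = 0
    for text, gold in examples:
        predicted = classify(set(text.lower().split()), primary, fallback)
        if predicted is not None:
            covered += 1
            correct += predicted == gold
    accuracy = correct / covered if covered else 0.0
    return accuracy, covered / len(examples)


# Hypothetical rules and labelled tweets.
primary_rules = [(lambda t: "excellent" in t, "positive"), (lambda t: "broken" in t, "negative")]
fallback_rules = [(lambda t: "battery" in t, "neutral")]
examples = [("Excellent camera", "positive"), ("screen broken", "negative"),
            ("battery size", "neutral"), ("arrived today", "neutral")]
print(evaluate(examples, primary_rules, fallback_rules))  # (1.0, 0.75)
```

Adding the fallback rules is what raises coverage here: without them, "battery size" would stay unclassified.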

  17. CCUS Sentiment Analysis - Tweets Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 16, 2024
    Padilla, Marielisa (2024). CCUS Sentiment Analysis - Tweets Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11202682
    Dataset updated
    May 16, 2024
    Dataset provided by
    Sánchez, Alberto
    Padilla, Marielisa
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The present dataset contains tweets, in any language supported by Twitter, posted between January and March 2023 that mention the topic of CCS/CCUS. The scraping was done in Python using the official Twitter API. All tweets were manually annotated after being machine-translated into English.

    • Structure: every row contains the following cells:
        A: Language
        B: Tweet text
        C: Benefit
        D: Concern
        E: Perception – Fight climate change
        F: Perception – Climate-friendly technology
        G: Perception – Extensive R&D needed
        H: Perception – Better options than CCS
        I: Sentiment
        J: Relatedness
        K: Comments

    • Annotations:
        Benefit: Preventing c. change; Reducing c. change risks; Safeguarding jobs; Creating new jobs; Fossil energy production envir. friendly; Products envir. friendly; Reducing envir. impact; Other; None
        Concern: Accidents; Leakages; Environmental; Earthquake-related; Increased local traffic; Investment; Greenwashing; Lock-in effects for fossil energy; Increase cost; Other; None
        Perception (Yes / No / None): Fight climate change; Climate-friendly technology; Extensive R&D needed; Better options than CCS
        Sentiment: Positive; Negative; Neutral
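    If the spreadsheet is exported to CSV with the eleven cells in the order listed above (A to K), each row can be mapped to a named record and its closed-vocabulary fields checked, as in the minimal sketch below. The CSV export, the absence of a header row, the field names and the sample row are assumptions for illustration; only the column order, the Yes / No / None perception values and the Positive / Negative / Neutral sentiment values come from the description.

```python
import csv
import io

# Hypothetical field names; order follows cells A..K in the description.
FIELDS = [
    "language", "tweet_text", "benefit", "concern",
    "fight_climate_change", "climate_friendly_technology",
    "extensive_rd_needed", "better_options_than_ccs",
    "sentiment", "relatedness", "comments",
]
PERCEPTION_FIELDS = FIELDS[4:8]  # cells E..H
PERCEPTIONS = {"Yes", "No", "None"}
SENTIMENTS = {"Positive", "Negative", "Neutral"}


def parse_rows(text: str) -> list:
    """Parse header-less CSV text into dicts keyed by FIELDS, validating labels."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        # Pad short rows (e.g. a missing Comments cell) and drop extras.
        row = (row + [""] * len(FIELDS))[: len(FIELDS)]
        rec = dict(zip(FIELDS, row))
        if rec["sentiment"] not in SENTIMENTS:
            raise ValueError(f"unexpected sentiment: {rec['sentiment']!r}")
        for name in PERCEPTION_FIELDS:
            if rec[name] not in PERCEPTIONS:
                raise ValueError(f"unexpected value in {name}: {rec[name]!r}")
        records.append(rec)
    return records


# Hypothetical sample row (Relatedness value invented for illustration).
sample = 'en,"CCS can help cut emissions",Reducing c. change risks,None,Yes,Yes,No,No,Positive,Related,\n'
records = parse_rows(sample)
print(records[0]["sentiment"])  # Positive
```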

  18. #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png, txt
    Updated Jul 19, 2024
    Azmi Nawwar (2024). #IndonesiaHumanRightsSOS Twitter Hashtag Tweets Dataset [Dataset]. http://doi.org/10.5281/zenodo.4362505
    Available download formats: txt, bin, png, csv
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Azmi Nawwar
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is the result of scraping the social media platform Twitter with the twint application, targeting the hashtag #IndonesiaHumanRightsSOS. It covers tweets posted from 18 December 2020, 10:59 AM to 19 December 2020, 23:18.

    The dataset contains 106,903 rows with related information: User ID, Username, Twitter Name, Tweets, etc.

    Examples of analyses of the data are also attached: a word cloud, a username cloud, the 100 most used words and the most active usernames.
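    Summaries such as "100 most used words" and "most active usernames" are plain frequency counts over the scraped rows. A minimal sketch, assuming the rows have already been loaded as dictionaries with hypothetical `username` and `tweet` keys (twint's actual export column names may differ); the tokenizer is a deliberately simple lowercase regex split that keeps leading `#`/`@` and does no stop-word removal, so it will not necessarily reproduce the attached figures.

```python
import re
from collections import Counter

# Words, optionally prefixed by # (hashtags) or @ (mentions).
TOKEN = re.compile(r"[#@]?\w+")


def top_words(rows: list, n: int = 100) -> list:
    """Return the n most frequent lowercase tokens as (token, count) pairs."""
    counts = Counter()
    for row in rows:
        counts.update(TOKEN.findall(row["tweet"].lower()))
    return counts.most_common(n)


def most_active_users(rows: list, n: int = 10) -> list:
    """Return the n usernames with the most tweets as (username, count) pairs."""
    return Counter(row["username"] for row in rows).most_common(n)


# Hypothetical rows for illustration.
rows = [
    {"username": "a", "tweet": "#IndonesiaHumanRightsSOS stop violence"},
    {"username": "b", "tweet": "Stop the violence #IndonesiaHumanRightsSOS"},
    {"username": "a", "tweet": "stop"},
]
print(top_words(rows, 1))          # [('stop', 3)]
print(most_active_users(rows, 1))  # [('a', 2)]
```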

  19. SMILE Twitter Emotion dataset

    • figshare.com
    • kaggle.com
    txt
    Updated Apr 21, 2016
    Share
    Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen (2016). SMILE Twitter Emotion dataset [Dataset]. http://doi.org/10.6084/m9.figshare.3187909.v2
    Available download formats: txt
    Dataset updated
    Apr 21, 2016
    Dataset provided by
    figshare
    Authors
    Bo Wang; Adam Tsakalidis; Maria Liakata; Arkaitz Zubiaga; Rob Procter; Eric Jensen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was collected and annotated for the SMILE project (http://www.culturesmile.org). The collection consists of tweets mentioning 13 Twitter handles associated with British museums, gathered between May 2013 and June 2015. It was created for classifying the emotions expressed on Twitter towards arts and cultural experiences in museums. It contains 3,085 tweets annotated with five emotions: anger, disgust, happiness, surprise and sadness. Please see our paper "SMILE: Twitter Emotion Classification using Domain Adaptation" for more details of the dataset.

    License: the annotations are provided under a CC-BY license, while Twitter retains ownership of and rights to the content of the tweets.

  20. Number of initial and usable tweets for extracting the discussion topic.

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    James Waters; Nicos Nicolaou; Dimosthenis Stefanidis; Hariton Efstathiades; George Pallis; Marios Dikaiakos (2023). Number of initial and usable tweets for extracting the discussion topic. [Dataset]. http://doi.org/10.1371/journal.pone.0254337.t006
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    James Waters; Nicos Nicolaou; Dimosthenis Stefanidis; Hariton Efstathiades; George Pallis; Marios Dikaiakos
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Number of initial and usable tweets for extracting the discussion topic.
