100+ datasets found
  1. twitter-sentiment-analysis

    • huggingface.co
    Updated Aug 16, 2022
    Cite
    Miguel Carlos Blanco Cacharrón (2022). twitter-sentiment-analysis [Dataset]. https://huggingface.co/datasets/carblacac/twitter-sentiment-analysis
    Dataset updated
    Aug 16, 2022
    Authors
    Miguel Carlos Blanco Cacharrón
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Twitter Sentiment Analysis Dataset contains 1,578,627 classified tweets; each row is labeled 1 for positive sentiment or 0 for negative sentiment. The dataset is based on data from the following two sources:

    • University of Michigan Sentiment Analysis competition on Kaggle
    • Twitter Sentiment Corpus by Niek Sanders

    From these sources, I randomly selected a subset of tweets, applied a cleaning process, and split them into train and test subsets, keeping the number of positive and negative tweets balanced within each subset.
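    A class-balanced split like the one described above can be sketched as follows. This is a minimal illustration, assuming the tweets are available as (text, label) pairs with 0/1 labels; `balanced_split` is a hypothetical helper, not the author's actual cleaning or splitting code, and the 20% test fraction is an arbitrary default.

```python
import random
from collections import defaultdict


def balanced_split(rows, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train/test sets with equal class counts.

    Hypothetical helper: each class is downsampled to the size of the smallest
    class, then every class contributes the same number of examples to the
    test set, so both subsets end up balanced.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)
    n_per_label = min(len(group) for group in by_label.values())
    n_test = int(n_per_label * test_fraction)

    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        group = group[:n_per_label]  # random downsample to the minority-class size
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

    Downsampling first is one design choice among several; a stratified split without downsampling would instead preserve whatever class ratio the input already has.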

  2. Twitter Sentiment Analysis Dataset

    • kaggle.com
    zip
    Updated Aug 16, 2023
    + more versions
    Cite
    Tùng Lê Thanh (2023). Twitter Sentiment Analysis Dataset [Dataset]. https://www.kaggle.com/datasets/tungle98/twitter-sentiment-dataset
    Available download formats: zip (1,291,530 bytes)
    Dataset updated
    Aug 16, 2023
    Authors
    Tùng Lê Thanh
    Description

    This dataset was created by Tùng Lê Thanh.

  3. Twitter Sentiment Analysis Data

    • ieee-dataport.org
    Updated Aug 6, 2024
    Cite
    Rabindra Lamsal (2024). Twitter Sentiment Analysis Data [Dataset]. http://doi.org/10.21227/t4mp-ce93
    Dataset updated
    Aug 6, 2024
    Dataset provided by
    IEEE Dataport
    Authors
    Rabindra Lamsal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset page is currently being updated. The tweets collected by the model deployed at https://live.rlamsal.com.np/ are shared here. However, because of COVID-19, all of my computing resources are being used for a dedicated collection of tweets related to the pandemic. Those tweets are available in the following datasets:

    • Coronavirus (COVID-19) Tweets Dataset: https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset
    • Coronavirus (COVID-19) Geo-tagged Tweets Dataset: https://ieee-dataport.org/open-access/coronavirus-covid-19-geo-tagged-tweets-dataset

  4. Twitter Sentiments Dataset

    • data.mendeley.com
    Updated May 14, 2021
    + more versions
    Cite
    SHERIF HUSSEIN (2021). Twitter Sentiments Dataset [Dataset]. http://doi.org/10.17632/z9zw7nt5h2.1
    Dataset updated
    May 14, 2021
    Authors
    SHERIF HUSSEIN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset has three sentiment classes: negative, neutral, and positive. It contains two fields: the tweet and its label.
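    As a minimal sketch of reading such a two-field file, the snippet below assumes a CSV with a header row, hypothetical column names `tweet` and `label`, and string labels; the actual header and label encoding should be checked against the downloaded file.

```python
import csv
import io

# Hypothetical label strings and integer ids; verify against the real file.
LABEL_TO_ID = {"negative": 0, "neutral": 1, "positive": 2}


def load_three_class(csv_text):
    """Parse a two-column CSV (tweet, label) into (text, label_id) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        label = row["label"].strip().lower()
        pairs.append((row["tweet"], LABEL_TO_ID[label]))
    return pairs


# Made-up sample rows for illustration.
sample = "tweet,label\nLove this,positive\nIt is Monday,neutral\nWorst service ever,negative\n"
pairs = load_three_class(sample)
```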

  5. twitter-airline-sentiment

    • huggingface.co
    Updated Feb 24, 2015
    Cite
    Omar Sanseviero (2015). twitter-airline-sentiment [Dataset]. https://huggingface.co/datasets/osanseviero/twitter-airline-sentiment
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Feb 24, 2015
    Authors
    Omar Sanseviero
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for Twitter US Airline Sentiment

    Dataset Summary

    This data originally came from Crowdflower's Data for Everyone library. As the original source says,

    A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service").

    The data… See the full description on the dataset page: https://huggingface.co/datasets/osanseviero/twitter-airline-sentiment.
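    The two-stage annotation described above (a sentiment label first, then a reason for negative tweets) lends itself to a per-airline summary. A minimal sketch, assuming records are dicts with hypothetical keys `airline`, `sentiment`, and `negative_reason`; the actual column names in the released files may differ, and the airline names below are placeholders.

```python
from collections import Counter, defaultdict


def negative_reasons_by_airline(records):
    """Count negative-reason categories per airline.

    Only tweets labeled negative carry a reason in the two-stage scheme,
    so everything else is skipped.
    """
    counts = defaultdict(Counter)
    for rec in records:
        if rec["sentiment"] == "negative" and rec.get("negative_reason"):
            counts[rec["airline"]][rec["negative_reason"]] += 1
    return counts


# Placeholder records using the example reasons from the description.
records = [
    {"airline": "Airline A", "sentiment": "negative", "negative_reason": "late flight"},
    {"airline": "Airline A", "sentiment": "negative", "negative_reason": "late flight"},
    {"airline": "Airline A", "sentiment": "positive", "negative_reason": None},
    {"airline": "Airline B", "sentiment": "negative", "negative_reason": "rude service"},
]
summary = negative_reasons_by_airline(records)
```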

  6. Sentiment Analysis on Financial Tweets

    • kaggle.com
    zip
    Updated Sep 5, 2019
    Cite
    Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
    Available download formats: zip (2,538,259 bytes)
    Dataset updated
    Sep 5, 2019
    Authors
    Vivek Rathi
    License

    Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I cleaned the .csv files to perform a sentiment analysis, so both .csv files in this dataset were created by me.

    Everything else in this description was written by David Wallach; using this information, I performed my first-ever sentiment analysis.

    "I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my Raspberry Pi to monitor Twitter for all tweets coming from the accounts listed in the Content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).

    I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."

    Content

    This dataset has all the publicly traded companies (tickers and company names) that were used as input to populate tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']
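    The collection rule described above (store a tweet if a monitored account mentions a listed company) can be sketched as a simple filter. This is an illustration only, not StockerBot's code: the matching strategy (cashtag or whole-word company name), the subset of accounts, and the `COMPANIES` rows standing in for stocks_cleaned.csv are all assumptions.

```python
import re

# A subset of the monitored accounts listed above.
MONITORED = {"MarketWatch", "business", "YahooFinance", "WSJ", "Reuters"}

# Hypothetical stand-in for stocks_cleaned.csv: ticker -> company name.
COMPANIES = {"AAPL": "Apple", "TSLA": "Tesla"}


def mentioned_tickers(text, companies):
    """Return tickers whose cashtag ($TICKER) or company name appears in text."""
    found = set()
    for ticker, name in companies.items():
        cashtag = re.search(r"\$" + re.escape(ticker) + r"\b", text)
        by_name = re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        if cashtag or by_name:
            found.add(ticker)
    return found


def should_store(author, text):
    """Keep a tweet only if a monitored account wrote it and it mentions a listed company."""
    return author in MONITORED and bool(mentioned_tickers(text, COMPANIES))
```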

    Acknowledgements

    The data used here was gathered from a project I developed: https://github.com/dwallach1/StockerBot

    Inspiration

    I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency).

  7. Twitter US Airline Sentiment Dataset

    • paperswithcode.com
    Updated May 10, 2022
    Cite
    (2022). Twitter US Airline Sentiment Dataset [Dataset]. https://paperswithcode.com/dataset/twitter-us-airline-sentiment
    Dataset updated
    May 10, 2022
    Description

    A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service"). You can download the non-aggregated results (55,000 rows) here.

  8. Brussel mobility Twitter sentiment analysis CSV Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 31, 2024
    Cite
    Betancur Arenas, Juliana (2024). Brussel mobility Twitter sentiment analysis CSV Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11401123
    Dataset updated
    May 31, 2024
    Dataset provided by
    Tori, Floriano
    Betancur Arenas, Juliana
    Ginis, Vincent
    van Vessem, Charlotte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Brussels
    Description

    SSH CENTRE (Social Sciences and Humanities for Climate, Energy aNd Transport Research Excellence) is a Horizon Europe project that engages directly with stakeholders across research, policy, and business (including citizens) to strengthen social innovation, SSH-STEM collaboration, transdisciplinary policy advice, inclusive engagement, and SSH communities across Europe, accelerating the EU’s transition to carbon neutrality. SSH CENTRE is based on a range of activities related to Open Science, inclusivity, and diversity (especially with regard to Southern and Eastern Europe and different career stages), including: development of novel SSH-STEM collaborations to facilitate the delivery of the EU Green Deal; SSH knowledge brokerage to support regions in transition; and the effective design of strategies for citizen engagement in EU R&I activities. Outputs include action-led agendas and building stakeholder synergies through regular Policy Insight events. This is captured in a high-profile virtual SSH CENTRE that generates and shares best practice for SSH policy advice, overcoming fragmentation to accelerate the EU’s journey to a sustainable future.

    The documents uploaded here are part of WP2, in which novel, interdisciplinary teams were funded to undertake activities to develop a policy recommendation related to EU Green Deal policy. Each of these policy recommendations, and the activities that inform them, will be written up as a chapter in an edited book collection. Three books will make up this collection: one on climate, one on energy, and one on mobility.

    As part of writing a chapter for the SSH CENTRE book on ‘Mobility’, we set out to analyse the sentiment of Twitter users regarding shared and active mobility modes in Brussels. This involved collecting tweets posted between 2017 and 2022. A tweet was collected if it contained a previously defined mobility keyword (for example: metro) and either the name of a (local) politician, a neighbourhood or municipality, or a (shared) mobility provider. The file attached to this Zenodo page is a CSV file containing the collected tweets.
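    The collection criterion above can be expressed as a boolean rule: a mobility keyword AND at least one politician, place, or provider name. A minimal sketch; apart from "metro" (the example given above) and two real Brussels municipalities, the keyword lists are hypothetical placeholders, and single-token matching will miss multi-word names.

```python
import re

# Hypothetical keyword lists; only "metro" is named in the description.
MOBILITY_KEYWORDS = {"metro", "tram", "bike"}
POLITICIANS = {"example_politician"}
PLACES = {"ixelles", "schaerbeek"}
PROVIDERS = {"example_provider"}


def tokens(text):
    """Lower-case word tokens (letters, digits, underscores)."""
    return set(re.findall(r"\w+", text.lower()))


def is_collected(text):
    """Keep a tweet if it has a mobility keyword AND a politician, place, or provider."""
    words = tokens(text)
    has_mobility = bool(words & MOBILITY_KEYWORDS)
    has_context = bool(words & (POLITICIANS | PLACES | PROVIDERS))
    return has_mobility and has_context
```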

  9. Data from: Twitter sentiment for 15 European languages

    • clarin.si
    • live.european-language-grid.eu
    Updated Feb 23, 2016
    Cite
    Igor Mozetič; Miha Grčar; Jasmina Smailović (2016). Twitter sentiment for 15 European languages [Dataset]. https://www.clarin.si/repository/xmlui/handle/11356/1054
    Dataset updated
    Feb 23, 2016
    Authors
    Igor Mozetič; Miha Grčar; Jasmina Smailović
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    The dataset contains over 1.6 million tweets (tweet IDs), labeled with sentiment by human annotators. There are 15 Twitter corpora for the corresponding 15 European languages. The data can be used to train and evaluate Twitter sentiment classifiers, to compute annotator agreement, or to study the differences between language usage on Twitter.

    The data analysis is described in the following papers:

    I. Mozetič, M. Grčar, J. Smailović. Multilingual Twitter sentiment classification: The role of human annotators, PLoS ONE 11(5): e0155036, doi: 10.1371/journal.pone.0155036, 2016. (http://dx.doi.org/10.1371/journal.pone.0155036)

    I. Mozetič, L. Torgo, V. Cerqueira, J. Smailović. How to evaluate sentiment classifiers for Twitter time-ordered data?, PLoS ONE 13(3): e0194317, doi: 10.1371/journal.pone.0194317, 2018. (https://dx.doi.org/10.1371/journal.pone.0194317)
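    The second paper concerns evaluating sentiment classifiers on time-ordered data. One procedure in that space is a sequential holdout: train on earlier tweets and test on later ones, so no test tweet predates the training data. A minimal sketch, assuming each record is a (timestamp, text, label) tuple; this is not necessarily the procedure the paper recommends.

```python
from datetime import datetime


def sequential_holdout(records, train_fraction=0.8):
    """Split (timestamp, text, label) records by time.

    The earliest `train_fraction` of records go to training and the rest to
    testing, so no test tweet predates any training tweet.
    """
    ordered = sorted(records, key=lambda record: record[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical records, deliberately out of order.
records = [
    (datetime(2013, 3, 1), "tweet c", "positive"),
    (datetime(2013, 1, 1), "tweet a", "negative"),
    (datetime(2013, 5, 1), "tweet e", "negative"),
    (datetime(2013, 2, 1), "tweet b", "neutral"),
    (datetime(2013, 4, 1), "tweet d", "positive"),
]
train, test = sequential_holdout(records)
```

    Unlike a shuffled split, this keeps future tweets out of training, which matters when topics and vocabulary drift over time.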

  10. twitter-financial-news-sentiment

    • huggingface.co
    • opendatalab.com
    Updated Dec 4, 2022
    + more versions
    Cite
    not a (2022). twitter-financial-news-sentiment [Dataset]. https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Dec 4, 2022
    Authors
    not a
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Description

    The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. It is used to classify the sentiment of finance-related tweets.

    The dataset holds 11,932 documents annotated with 3 labels:

    sentiments = { "LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral" }

    The data was collected using the Twitter API. The current dataset supports the multi-class… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment.
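    A minimal sketch for turning the labels above into readable names, assuming the integer label n stored in the data corresponds to "LABEL_n" in the mapping; `label_name` is a hypothetical helper.

```python
# Label mapping as given in the dataset description.
SENTIMENTS = {"LABEL_0": "Bearish", "LABEL_1": "Bullish", "LABEL_2": "Neutral"}


def label_name(label):
    """Map an integer class id or a 'LABEL_n' string to its sentiment name."""
    if isinstance(label, int):
        label = f"LABEL_{label}"  # assumption: integer n corresponds to LABEL_n
    return SENTIMENTS[label]


print([label_name(i) for i in (0, 1, 2)])  # ['Bearish', 'Bullish', 'Neutral']
```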

  11. Dataset of tweets in English language about the COVID-19 pandemic for binary...

    • data.mendeley.com
    Updated Sep 13, 2021
    Cite
    Larissa Santos da Motta (2021). Dataset of tweets in English language about the COVID-19 pandemic for binary sentiment analysis [Dataset]. http://doi.org/10.17632/6fx22vj6g6.1
    Dataset updated
    Sep 13, 2021
    Authors
    Larissa Santos da Motta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is intended for sentiment analysis of tweets about the COVID-19 pandemic. There are 3 versions of the dataset, composed of 186,000, 132,000, and 82,000 English-language tweets respectively, with stopwords removed. Positive tweets have polarity equal to 1 and negative tweets have polarity equal to 0 in all versions. All datasets were selected, cleaned, and organized from the public dataset available at https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset. The datasets are accompanied by embedding matrices generated from the pre-trained Word2Vec shallow neural network available at https://data.mendeley.com/datasets/t8bxg423yk/1.
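    Embedding matrices like the ones mentioned above are typically used to initialise an embedding layer, with one row per vocabulary word. A minimal sketch of building such a matrix, assuming a word-to-index vocabulary and a word-to-vector lookup (for example, loaded from the pre-trained Word2Vec model); the zero-vector fallback for unknown words and the reserved padding row 0 are assumptions, not details of the published matrices.

```python
def build_embedding_matrix(vocab, word_vectors, dim):
    """Return a list of rows indexed by vocabulary id.

    `vocab` maps word -> row index (row 0 is left as a zero padding row,
    a common but hypothetical convention); `word_vectors` maps
    word -> sequence of `dim` floats. Words without a vector keep zeros.
    """
    size = max(vocab.values()) + 1
    matrix = [[0.0] * dim for _ in range(size)]
    for word, index in vocab.items():
        vector = word_vectors.get(word)
        if vector is not None:
            matrix[index] = list(vector)
    return matrix


vocab = {"vaccine": 1, "lockdown": 2, "zzz": 3}
vectors = {"vaccine": [0.1, 0.2], "lockdown": [0.3, 0.4]}
matrix = build_embedding_matrix(vocab, vectors, dim=2)
```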

  12. Stock Market Tweets Data

    • ieee-dataport.org
    Updated Apr 15, 2021
    Cite
    Bruno Taborda (2021). Stock Market Tweets Data [Dataset]. http://doi.org/10.21227/g8vy-5w61
    Dataset updated
    Apr 15, 2021
    Dataset provided by
    IEEE Dataport
    Authors
    Bruno Taborda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Twitter is one of the most popular social networks for sentiment analysis. The tweets in this data set relate to the stock market. We collected 943,672 tweets between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), references to the top 25 companies in the S&P 500 index, and the Bloomberg tag (#stocks). Of the 943,672 tweets, 1,300 were manually annotated into positive, neutral, or negative classes, and a second independent annotator reviewed the manually annotated tweets. This annotated data set can help create new domain-specific lexicons or enrich existing dictionaries, and researchers can use it to train supervised models. Additionally, the full data set can be used for text mining and sentiment analysis related to the stock market.
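    The description does not report an agreement statistic between the first annotator and the reviewer. Cohen's kappa is one common way to quantify agreement between two annotators beyond chance; the sketch below is a plain-Python illustration with made-up labels, not a computation the authors report.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own class frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single, identical class
    return (p_observed - p_expected) / (1 - p_expected)


first = ["positive", "positive", "negative", "neutral"]
reviewer = ["positive", "negative", "negative", "neutral"]
kappa = cohens_kappa(first, reviewer)  # 7/11, about 0.636
```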

  13. Data from: A Large Scale Tweet Dataset for Urdu Text Sentiment Analysis

    • data.mendeley.com
    Updated Dec 7, 2020
    Cite
    Rakhi Batra (2020). A Large Scale Tweet Dataset for Urdu Text Sentiment Analysis [Dataset]. http://doi.org/10.17632/rz3xg97rm5.1
    Dataset updated
    Dec 7, 2020
    Authors
    Rakhi Batra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of tweets in the Urdu language. There are 1,140,824 tweets in the dataset, collected from Twitter during September and October 2020. This large-scale corpus was generated by preprocessing, which included removing columns containing user information, retweet counts, and follower counts; removing duplicate tweets; removing unnecessary punctuation, links, symbols, and spaces; and finally extracting emojis, if present, from the tweet text. Each final tweet record contains columns for the tweet ID, the text, and the emoji extracted from the text with a sentiment score. Emojis are extracted to validate machine learning models used for multilingual sentiment and behavior analysis.
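    The preprocessing steps listed above can be sketched for a single tweet: strip links, pull out emojis, drop punctuation, and collapse whitespace. This is a minimal illustration, not the authors' pipeline; the emoji pattern covers only common Unicode emoji blocks (it misses, for example, flag sequences), and the punctuation set (ASCII plus the Arabic comma, Arabic question mark, and Arabic full stop used in Urdu) is an assumption.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Approximate emoji blocks; misses flags, keycaps, and some ZWJ sequences.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# ASCII punctuation plus Arabic comma (U+060C), Arabic question mark (U+061F),
# and Arabic full stop (U+06D4).
PUNCT_RE = re.compile(r"[!-/:-@\[-`{-~\u060C\u061F\u06D4]")


def clean_tweet(text):
    """Return (cleaned_text, emojis) for one tweet."""
    text = URL_RE.sub(" ", text)     # remove links before punctuation is stripped
    emojis = EMOJI_RE.findall(text)  # keep emojis in order of appearance
    text = EMOJI_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text, emojis


cleaned, emojis = clean_tweet("بہت اچھا دن۔ 😊👍 https://t.co/abc")
```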

  14. Coronavirus (COVID-19) Tweets Dataset

    • paperswithcode.com
    Updated Sep 15, 2022
    + more versions
    Cite
    (2022). Coronavirus (COVID-19) Tweets Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/cov19tweets-dataset
    Dataset updated
    Sep 15, 2022
    Description

    This dataset includes CSV files that contain the IDs and sentiment scores of tweets related to the COVID-19 pandemic. The real-time Twitter feed is monitored for coronavirus-related tweets using 90+ different keywords and hashtags that are commonly used when referring to the pandemic. The oldest tweets in this dataset date back to October 1, 2019. The dataset was wholly redesigned on March 20, 2020, to comply with Twitter's content redistribution policy, which restricts sharing Twitter data other than IDs; therefore, only the tweet IDs are released through this dataset. You need to hydrate the tweet IDs to obtain the complete data.
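    Hydration means looking the IDs up again through the Twitter API (tools such as Twarc or Hydrator automate this). Lookup endpoints accept a limited number of IDs per request, so a common first step is batching. A minimal sketch, assuming a limit of 100 IDs per request (the documented limit of the v2 tweet lookup endpoint at the time of writing; check the current API documentation); the request itself is omitted because it needs API credentials.

```python
from itertools import islice


def id_batches(tweet_ids, batch_size=100):
    """Yield lists of at most `batch_size` tweet IDs, one list per lookup request.

    IDs are kept as strings: tweet IDs are 64-bit integers and lose
    precision if a tool parses them as floating-point numbers.
    """
    it = iter(tweet_ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```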

  15. sentiment140

    • tensorflow.org
    • opendatalab.com
    • +2 more
    Updated Dec 23, 2022
    Cite
    (2022). sentiment140 [Dataset]. https://www.tensorflow.org/datasets/catalog/sentiment140
    Dataset updated
    Dec 23, 2022
    Description

    Sentiment140 allows you to discover the sentiment of a brand, product, or topic on Twitter.

    The data is a CSV file with emoticons removed. Each row has 6 fields:

    1. the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
    2. the id of the tweet (2087)
    3. the date of the tweet (Sat May 16 23:58:44 UTC 2009)
    4. the query (lyx). If there is no query, then this value is NO_QUERY.
    5. the user that tweeted (robotickilldozr)
    6. the text of the tweet (Lyx is cool)
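    A minimal sketch of parsing one row in the 6-field format above with Python's standard `csv` and `datetime` modules. The sample line is hypothetical (assembled from the example values in the list, with an assumed polarity of 4), and `parse_row` is an illustrative helper, not part of tensorflow_datasets.

```python
import csv
from datetime import datetime

# Polarity codes as listed in the field description.
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}


def parse_row(row):
    """Turn one 6-field Sentiment140 CSV row into a dict."""
    polarity, tweet_id, date, query, user, text = row
    return {
        "sentiment": POLARITY[polarity],
        "id": int(tweet_id),
        # Dates look like "Sat May 16 23:58:44 UTC 2009"; "UTC" is matched literally.
        "date": datetime.strptime(date, "%a %b %d %H:%M:%S UTC %Y"),
        "query": None if query == "NO_QUERY" else query,
        "user": user,
        "text": text,
    }


# Hypothetical line assembled from the example values above (polarity assumed).
line = '"4","2087","Sat May 16 23:58:44 UTC 2009","lyx","robotickilldozr","Lyx is cool"'
record = parse_row(next(csv.reader([line])))
```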

    For more information, refer to the paper Twitter Sentiment Classification with Distant Supervision at https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of Sentiment140.
    ds = tfds.load('sentiment140', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)

    See the tensorflow_datasets guide for more information.

  16. ASTD Dataset

    • paperswithcode.com
    + more versions
    Cite
    Mahmoud Nabil; Mohamed Aly; Amir Atiya, ASTD Dataset [Dataset]. https://paperswithcode.com/dataset/astd
    Authors
    Mahmoud Nabil; Mohamed Aly; Amir Atiya
    Description

    Arabic Sentiment Tweets Dataset (ASTD) is an Arabic social sentiment analysis dataset gathered from Twitter. It consists of about 10,000 tweets which are classified as objective, subjective positive, subjective negative, and subjective mixed.

  17. Twitter Sentiment Analysis ID Election 2024

    • kaggle.com
    zip
    Updated Jan 22, 2024
    + more versions
    Cite
    Ghany Fitriamara (2024). Twitter Sentiment Analysis ID Election 2024 [Dataset]. https://www.kaggle.com/datasets/ghanyfitria/twitter-sentiment-analysis-id-election-2024
    Available download formats: zip (23,474 bytes)
    Dataset updated
    Jan 22, 2024
    Authors
    Ghany Fitriamara
    Description

    This dataset was created by Ghany Fitriamara.

    Released under Other (specified in description)

  18. twitter-sentiment-dataset-en

    • huggingface.co
    Updated Aug 1, 2023
    Cite
    Yogi Yulianto (2023). twitter-sentiment-dataset-en [Dataset]. https://huggingface.co/datasets/yogiyulianto/twitter-sentiment-dataset-en
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
    Dataset updated
    Aug 1, 2023
    Authors
    Yogi Yulianto
    License

    Other: https://choosealicense.com/licenses/other/

    Description

    The yogiyulianto/twitter-sentiment-dataset-en dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  19. The Climate Change Twitter Dataset

    • data.mendeley.com
    Updated May 19, 2022
    + more versions
    Cite
    Dimitrios Effrosynidis (2022). The Climate Change Twitter Dataset [Dataset]. http://doi.org/10.17632/mw8yd7z9wc.2
    Dataset updated
    May 19, 2022
    Authors
    Dimitrios Effrosynidis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    If you use the dataset, cite the paper: https://doi.org/10.1016/j.eswa.2022.117541

    This is the most comprehensive dataset to date on climate change and human opinions on Twitter. It has the longest temporal coverage, spanning over 13 years, includes over 15 million tweets spatially distributed across the world, and provides the geolocation of most tweets. Seven dimensions of information are tied to each tweet, namely geolocation, user gender, climate change stance and sentiment, aggressiveness, deviations from historic temperature, and topic modeling, and the tweets are accompanied by information on environmental disaster events. These dimensions were produced by testing and evaluating a plethora of state-of-the-art machine learning algorithms and methods, both supervised and unsupervised, including BERT, RNN, LSTM, CNN, SVM, Naive Bayes, VADER, Textblob, Flair, and LDA.

    The following columns are in the dataset:

    ➡ created_at: The timestamp of the tweet.
    ➡ id: The unique ID of the tweet.
    ➡ lng: The longitude where the tweet was written.
    ➡ lat: The latitude where the tweet was written.
    ➡ topic: Categorization of the tweet into one of ten topics, namely: seriousness of gas emissions, importance of human intervention, global stance, significance of pollution awareness events, weather extremes, impact of resource overconsumption, Donald Trump versus science, ideological positions on global warming, politics, and undefined.
    ➡ sentiment: A score on a continuous scale from -1 to 1; values closer to 1 indicate positive sentiment, values closer to -1 indicate negative sentiment, and values close to 0 indicate no sentiment (neutral).
    ➡ stance: Whether the tweet supports the belief in man-made climate change (believer), does not believe in man-made climate change (denier), or neither supports nor refutes it (neutral).
    ➡ gender: Whether the user who made the tweet is male, female, or undefined.
    ➡ temperature_avg: The temperature deviation in Celsius, relative to the January 1951-December 1980 average, at the time and place the tweet was written.
    ➡ aggressiveness: Whether the tweet contains aggressive language.

    Since Twitter forbids publishing the text of tweets, you need to retrieve it through a process called hydrating. Tools such as Twarc or Hydrator can be used to hydrate tweets.
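    Because the `sentiment` column described above is continuous, a coarse positive/neutral/negative label has to be derived with a threshold. A minimal sketch; the ±0.05 neutral band is an arbitrary assumption, since the dataset only states that values near 0 are neutral.

```python
def sentiment_label(score, neutral_band=0.05):
    """Map a continuous sentiment score in [-1, 1] to a coarse label.

    The neutral band is a hypothetical threshold, not part of the dataset.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie in [-1, 1]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```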

  20. Data from: Using Twitter Dataset for Social Listening in Singapore

    • researchdata.ntu.edu.sg
    zip
    Updated Jul 2, 2024
    Cite
    DR-NTU (Data) (2024). Using Twitter Dataset for Social Listening in Singapore [Dataset]. http://doi.org/10.21979/N9/PALUID
    Available download formats: zip (7282650657)
    Dataset updated
    Jul 2, 2024
    Dataset provided by
    DR-NTU (Data)
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    2008 - 2023
    Area covered
    Singapore
    Dataset funded by
    Ministry of National Development (MND)
    National Research Foundation (NRF)
    Description

    This study analyzes social media data sourced from Twitter within the context of Singapore, forming a crucial component of a broader social listening initiative. We provide a decade’s worth of social data from Singapore, offering invaluable insights for the research community. This work presents two analytical approaches using this dataset: sentiment analysis and bursty topic detection. Sentiment analysis for direct search is based on a zero-shot pretrained model, while bursty topic detection is based on the biterm topic model. Detailed experiments demonstrate the efficacy of the approach for analyzing social trends using Twitter data. We collected all Twitter data posted in Singapore from 2008 to 2023, using the geocode setting (1.346353, 103.807526, 25km) in the Twitter API to cover the whole of Singapore. The total number of tweets in this dataset is 96,686,894. There are 3 data files:

    1. place.json includes detailed information on 10k places in Singapore.
    2. subzones.json includes information on 332 subzones in Singapore.
    3. tweets.json includes 96M+ tweets posted in Singapore.

    MongoDB was used as the database to store and manage the data.
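    The geocode setting above describes a circle: a centre (latitude, longitude) and a 25 km radius. A minimal sketch of checking whether a geotagged point falls inside that circle, using the haversine great-circle distance with a mean Earth radius of 6,371 km; this is an illustrative approximation, not how the Twitter API evaluated the parameter.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
CENTER = (1.346353, 103.807526)   # geocode centre from the description
RADIUS_KM = 25.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def in_search_area(lat, lon):
    """True if the point lies within RADIUS_KM of CENTER."""
    return haversine_km(CENTER[0], CENTER[1], lat, lon) <= RADIUS_KM
```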

twitter-sentiment-analysis (carblacac/twitter-sentiment-analysis)

TSATC: Twitter Sentiment Analysis Training Corpus

4 scholarly articles cite this dataset.