73 datasets found
  1. Data from: A large-scale COVID-19 Twitter chatter dataset for open...

    • explore.openaire.eu
    • data.niaid.nih.gov
    • +1more
    Updated Feb 7, 2021
    Cite
    Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Katya Artemova; Elena Tutubalina; Gerardo Chowell (2021). A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration [Dataset]. http://doi.org/10.5281/zenodo.4516518
    Explore at:
    Dataset updated
    Feb 7, 2021
    Authors
    Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Katya Artemova; Elena Tutubalina; Gerardo Chowell
    Description

    Version 48 of the dataset. Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets.

    The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (948,493,362 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (238,771,950 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files.

    For more statistics and some visualizations, visit: http://www.panacealab.org/covid19/

    More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our pre-print about the dataset (https://arxiv.org/abs/2004.03688).

    As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions, which allow Twitter data to be re-distributed ONLY for research purposes. They need to be hydrated to be used. This dataset will be updated at least bi-weekly with additional tweets; check the GitHub repository for these updates. Release: We have standardized the name of the resource to match our pre-print manuscript and to avoid having to update it every week.
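
As a rough illustration of how an identifier file such as full_dataset-clean.tsv might be summarized before hydration, the sketch below counts rows per day. The column names (tweet_id, date, time) and the sample rows are assumptions made for this example; the description only states that identifiers come with date and time added, so the real header should be checked first.

```python
import csv
import io
from collections import Counter

# Hypothetical three-column layout (tweet_id, date, time) with made-up rows.
# The real file's header and columns may differ.
SAMPLE_TSV = (
    "tweet_id\tdate\ttime\n"
    "1237000000000000001\t2020-03-11\t00:00:01\n"
    "1237000000000000002\t2020-03-11\t00:00:05\n"
    "1237400000000000003\t2020-03-12\t08:15:30\n"
)


def tweets_per_day(handle):
    """Count rows per calendar date in a tab-separated identifier file."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        counts[row["date"]] += 1
    return counts


# An in-memory file stands in for open("full_dataset-clean.tsv", newline="").
daily_counts = tweets_per_day(io.StringIO(SAMPLE_TSV))
for day, count in sorted(daily_counts.items()):
    print(day, count)
```

Reading row by row keeps memory use flat, which matters for files with hundreds of millions of lines; only the per-day counter is held in memory.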

  2. A Twitter Dataset of 150+ million tweets related to COVID-19 for open...

    • zenodo.org
    application/gzip, csv +1
    Updated Apr 17, 2023
    Cite
    Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell (2023). A Twitter Dataset of 150+ million tweets related to COVID-19 for open research [Dataset]. http://doi.org/10.5281/zenodo.3738018
    Explore at:
    Available download formats: application/gzip, csv, tsv
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Gerardo Chowell
    Description

    Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started from March 11th yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage.

    The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (152,920,832 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (30,990,645 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.

    More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

    As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions, which allow Twitter data to be re-distributed ONLY for research purposes. They need to be hydrated to be used.

  3. Data from: Global Reactions to COVID-19 on Twitter: A Labelled Dataset with...

    • openicpsr.org
    delimited
    Updated Feb 16, 2021
    Cite
    Raj Gupta; Ajay Vishwanath; Yinping Yang (2021). Global Reactions to COVID-19 on Twitter: A Labelled Dataset with Latent Topic, Sentiment and Emotion Attributes [Dataset]. http://doi.org/10.3886/E120321V6
    Explore at:
    Available download formats: delimited
    Dataset updated
    Feb 16, 2021
    Dataset provided by
    Institute of High Performance Computing (IHPC), A*STAR
    Authors
    Raj Gupta; Ajay Vishwanath; Yinping Yang
    Time period covered
    Jan 28, 2020 - Jul 1, 2020
    Area covered
    Global
    Description

    This project aims to present a large dataset for researchers to discover public conversation on Twitter surrounding the COVID-19 pandemic. As strong concerns and emotions are expressed in the publicly available tweets, we annotated seventeen latent semantic attributes for each public tweet using natural language processing techniques and machine-learning based algorithms. The latent semantic attributes include: 1) ten attributes indicating the tweet’s relevance to ten detected topics, 2) five quantitative attributes indicating the degree of intensity in the valence (i.e., unpleasantness/pleasantness) and emotional intensities across four primary emotions of fear, anger, sadness and joy, and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively.

  4. The complete corpus of #COVID-19 Twitter dataset

    • zenodo.org
    txt
    Updated Jun 24, 2021
    Cite
    Despoina Antonakaki (2021). The complete corpus of #COVID-19 Twitter dataset [Dataset]. http://doi.org/10.5281/zenodo.4899941
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Despoina Antonakaki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    The COVID-19 pandemic, which began over a year ago, continues to spread around the globe, and research on COVID-19 continues to grow as well. The online discourse on social media regarding COVID-19 has been growing along with the timeline of the pandemic.

    Open data on Twitter have been released and offer the research community the opportunity for new findings and for addressing this new threat. In this dataset, we open a corpus of Twitter data from March 2020 until today, which is being updated every day based on the two most important hashtags regarding COVID-19. This dataset will offer the research community the opportunity to explore the social extensions of this pandemic, including topic analysis, hate speech, and sentiment analysis, regarding the opinion of the users on the pandemic, the comments on the public discourse, or the vaccination releases. The dataset has been collected by retrieving all the tweets that contain the hashtags #coronavirus and #COVID19, including approximately 208M tweets for hashtag #coronavirus and 392M tweets for hashtag #COVID-19, resulting in a total of 600M tweets.

  5. Place#Hashtag Twitter Dataset: COVID-19 Hashtags

    • ieee-dataport.org
    Updated Jul 29, 2025
    Cite
    Suraksha Pokhrel (2025). Place#Hashtag Twitter Dataset: COVID-19 Hashtags [Dataset]. https://ieee-dataport.org/open-access/placehashtag-twitter-dataset-covid-19-hashtags
    Explore at:
    Dataset updated
    Jul 29, 2025
    Authors
    Suraksha Pokhrel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    hashtag] relations from 190 countries and territories

  6. Data from: COVID-19 Twitter Dataset with Latent Topics, Sentiments and...

    • openicpsr.org
    Updated Jul 18, 2020
    Cite
    Yinping Yang (2020). COVID-19 Twitter Dataset with Latent Topics, Sentiments and Emotions Attributes [Dataset]. http://doi.org/10.3886/E120321V1
    Explore at:
    Dataset updated
    Jul 18, 2020
    Dataset provided by
    Dr.
    Authors
    Yinping Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 28, 2020 - Jul 1, 2020
    Area covered
    Global
    Description

    We collected and processed a dataset and made it available for the research community to study the COVID-19 pandemic in multiple ways.

  7. Covid - 19 Twitter Dataset

    • figshare.com
    zip
    Updated Feb 3, 2021
    Cite
    Kate finch (2021). Covid - 19 Twitter Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.13698856.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 3, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kate finch
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    I am sharing a COVID-19 Twitter dataset containing a large number of tweets with the research community. I hope this dataset will enable the study of online conversation dynamics in the context of a global outbreak of unprecedented proportions and implications. I collected this dataset using Trackmyhashtag, an affordable platform. I hope researchers find it helpful. If you need more datasets, let me know.

  8. Data from: A Review of Models for Hydrating Large-scale Twitter Data of...

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Arafat, Mahmoud (2023). A Review of Models for Hydrating Large-scale Twitter Data of COVID-19-related Tweets for Transportation Research [Dataset]. http://doi.org/10.7910/DVN/LJWIGZ
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Arafat, Mahmoud
    Description

    In response to the Coronavirus disease (COVID-19) outbreak and the Transportation Research Board’s (TRB) urgent need for work related to transportation and pandemics, this paper contributes with a sense of urgency and provides a starting point for research on the topic. The main goal of this paper is to support transportation researchers and the TRB community during this COVID-19 pandemic by reviewing the performance of software models used for extracting large-scale data from Twitter streams related to COVID-19. The study extends the previous research efforts in social media data mining by providing a review of contemporary tools, including their computing maturity and their potential usefulness. The paper also includes an open repository for the processed data frames to facilitate the quick development of new transportation research studies. The output of this work is recommended to be used by the TRB community when deciding to further investigate topics related to COVID-19 and social media data mining tools.

  9. A Twitter Dataset of 70+ million tweets related to COVID-19

    • zenodo.org
    csv, tsv, zip
    Updated Apr 17, 2023
    Cite
    Juan M. Banda; Ramya Tekumalla; Gerardo Chowell (2023). A Twitter Dataset of 70+ million tweets related to COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3732460
    Explore at:
    Available download formats: csv, tsv, zip
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juan M. Banda; Ramya Tekumalla; Gerardo Chowell
    Description

    Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts, as we filtered other data we were collecting for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 29th and yielded over 4 million tweets a day.

    The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (70,569,368 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (13,535,912 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.

    More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

    As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions on re-distributing Twitter data. They need to be hydrated to be used.

  10. SpanishTweetsCOVID-19: A Social Media Enriched Covid-19 Twitter Spanish...

    • data.mendeley.com
    Updated Feb 19, 2025
    Cite
    Antonela Tommasel (2025). SpanishTweetsCOVID-19: A Social Media Enriched Covid-19 Twitter Spanish Dataset [Dataset]. http://doi.org/10.17632/nv8k69y59d.4
    Explore at:
    Dataset updated
    Feb 19, 2025
    Authors
    Antonela Tommasel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset presents a large-scale collection of millions of Twitter posts related to the coronavirus pandemic in the Spanish language. The collection was built by monitoring public posts written in Spanish containing a diverse set of hashtags related to COVID-19, as well as tweets shared by the official Argentinian government offices, such as ministries and secretaries at different levels. Data was collected between March and October 2020 using the Twitter API, and will be periodically updated.

    In addition to tweet IDs, the dataset includes information about mentions, retweets, media, URLs, hashtags, replies, users and content-based user relations, allowing the observation of the dynamics of the shared information. Data is presented in different tables that can be analysed separately or combined.

    The dataset aims to serve as a source for studying several effects of the coronavirus on people through social media, including the impact of public policies, the perception of risk and related disease consequences, the adoption of guidelines, the emergence, dynamics and propagation of disinformation and rumours, the formation of communities and other social phenomena, and the evolution of health-related indicators (such as fear, stress, sleep disorders, or children's behaviour changes), among other possibilities. In this sense, the dataset can be useful for multi-disciplinary researchers in the fields of data science, social network analysis, social computing, medical informatics, and the social sciences, among others.

  11. Dataset: Tracking the Twitter attention around the research efforts on the...

    • figshare.com
    xlsx
    Updated Feb 28, 2021
    Cite
    Zhichao Fang (2021). Dataset: Tracking the Twitter attention around the research efforts on the COVID-19 pandemic [Dataset]. http://doi.org/10.6084/m9.figshare.12490457.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 28, 2021
    Dataset provided by
    figshare
    Authors
    Zhichao Fang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets of the figures in the paper "Tracking the Twitter attention around the research effort on the COVID-19 pandemic".

  12. COVID-19 vaccine sentiment on Twitter

    • figshare.com
    txt
    Updated Nov 28, 2023
    Cite
    Ferenc Beres; Andras A. Benczur (2023). COVID-19 vaccine sentiment on Twitter [Dataset]. http://doi.org/10.6084/m9.figshare.24647883.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 28, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ferenc Beres; Andras A. Benczur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    From 24 January to 31 July 2021, we collected data that anyone can view on Twitter by using the free Twitter API. By using the keywords “vaccine”, “vaccination”, “vaccinated”, “vaxxer”, “vaxxers”, “#CovidVaccine”, “covid denier”, “pfizer”, “moderna”, “astra” and “zeneca”, “sinopharm”, “sputnik”, we collected 33K tweets published by popular Twitter accounts. For each tweet, the following variables were recorded: the author (user ID), the author's categorization (healthcare professional, news media source, other accounts with thousands of followers), the date of publication (to the precision of seconds), the vaccine mentioned, the language, and the general sentiment of the tweet text on a scale from 1 to 5. For multilingual sentiment analysis, we used an open-source BERT model from Huggingface (https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment). When multiple vaccines are mentioned in a tweet, it is recorded in our data as multiple tweets, one for each vaccine.

    To uphold the privacy policy for publishing Twitter data, the tweet texts, as well as the original user identifiers for the authors of the tweets, are not disclosed. Instead, we encoded the user information with random integers. To access the complete content of these tweets, researchers may use the Twitter search API by referencing the provided tweet identifiers.

  13. A Twitter Dataset of 40+ million tweets related to COVID-19

    • zenodo.org
    • explore.openaire.eu
    csv, tsv
    Updated Apr 17, 2023
    Cite
    Juan M. Banda; Ramya Tekumalla (2023). A Twitter Dataset of 40+ million tweets related to COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3723940
    Explore at:
    Available download formats: tsv, csv
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juan M. Banda; Ramya Tekumalla
    Description

    Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts, as we filtered other data we were collecting for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 22nd and yielded over 4 million tweets a day.

    The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (40,823,816 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (7,479,940 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.

    More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

    As always, the tweets distributed here are only tweet identifiers (with date and time added), due to Twitter's terms and conditions on re-distributing Twitter data. They need to be hydrated to be used.

  14. Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...

    • ieee-dataport.org
    Updated Aug 10, 2022
    Cite
    Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave [Dataset]. https://ieee-dataport.org/documents/large-scale-dataset-twitter-chatter-about-online-learning-during-current-covid-19-omicron
    Explore at:
    Dataset updated
    Aug 10, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    no. 8

  15. Coronavirus (COVID-19) Tweets Dataset

    • ieee-dataport.org
    • search.datacite.org
    • +1more
    Updated May 7, 2025
    Cite
    Rabindra Lamsal (2025). Coronavirus (COVID-19) Tweets Dataset [Dataset]. https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset
    Explore at:
    Dataset updated
    May 7, 2025
    Authors
    Rabindra Lamsal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    2020

  16. Social Media Dataset of Covid-aware Publics

    • ordo.open.ac.uk
    csv
    Updated Sep 30, 2024
    Cite
    Christian Nold (2024). Social Media Dataset of Covid-aware Publics [Dataset]. http://doi.org/10.21954/ou.rd.27044467.v1
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    The Open University
    Authors
    Christian Nold
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset of tweets by and about COVID-aware publics from the 'X' (Twitter) social media platform collected by the author. The dataset consists of 344 textual tweets regarding COVID-related material practices gathered during the research period Jan 2023 - Sep 2024, yet the dataset also includes tweets created before this period.

    The textual data has been rewritten to fully anonymise the people who made the tweets, and identifiable contexts have been removed. In addition, all date/time metadata and hashtags, as well as any attached images, have been removed. Square brackets have been used for editorial edits to obfuscate entities or add context to tweets. The dataset consists of a structured comma-separated text file that can be read in any spreadsheet software to maximise accessibility.

    The research dataset was created with Open University HREC approval: HREC/4557/Nold

  17. 000 Tweets

    • ieee-dataport.org
    Updated Jul 25, 2022
    Cite
    Nirmalya Thakur (2022). 000 Tweets [Dataset]. https://ieee-dataport.org/documents/twitter-conversations-about-covid-19-omicron-variant-large-scale-dataset-more-500000
    Explore at:
    Dataset updated
    Jul 25, 2022
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    2022

  18. Coronavirus Tweet Ids

    • dataverse.harvard.edu
    Updated Nov 17, 2022
    Cite
    Daniel Kerchner; Laura Wrubel; Dolsy Smith (2022). Coronavirus Tweet Ids [Dataset]. http://doi.org/10.7910/DVN/LW0BTB
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 17, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Daniel Kerchner; Laura Wrubel; Dolsy Smith
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Mar 2, 2020 - Jul 27, 2022
    Description

    This dataset contains the tweet ids of 354,903,485 tweets related to Coronavirus or COVID-19. They were collected between March 3, 2020 and December 3, 2020 from the Twitter API using Social Feed Manager. Please note that this is VERSION 9 of this data set. See the Versions tab below for all versions. Version 1 contains tweets from March 3, 2020 through March 19, 2020. Version 2 contains tweets from March 3, 2020 through March 31, 2020. Version 4 contains tweets from March 3, 2020 through April 16, 2020. Version 5 contains tweets from March 3 through May 1, 2020. Version 6 contains tweets from March 3 through May 27, 2020. Version 7 contains tweets from March 3 through June 9, 2020. Version 8 contains tweets from March 3, 2020 through July 27, 2022.

    These tweets were collected using the POST statuses/filter method of the Twitter Stream API, using the track parameter with the following keywords: #Coronavirus, #Coronaoutbreak, #COVID19. Because of the size of the collection, the list of identifiers is split into 36 files of up to 10 million lines each, with a tweet identifier on each line. There is a covid19filter-README.txt file containing additional documentation on how the tweets were collected. Data from the first and last days of the collection do not represent complete days.

    The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. This dataset contains only tweet ids, not the actual tweets.

    We intend to continue updating this dataset periodically, as the collection is ongoing. Please check the Versions tab below for new versions. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets.
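
Since the identifier files hold one tweet id per line, a hydration job typically starts by reading ids lazily and grouping them into request-sized batches. The sketch below shows only that preparation step and makes no network calls. The batch size of 100 is an assumption based on the historical per-request limit of GET statuses/lookup and should be checked against current API documentation; the ids are synthetic, and in practice a tool such as Twarc or Hydrator handles batching and rate limits.

```python
import io
from itertools import islice

# Assumed per-request limit for an id lookup call; verify before relying on it.
BATCH_SIZE = 100


def read_ids(handle):
    """Yield non-empty tweet ids from a file with one id per line."""
    for line in handle:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batched(ids, size=BATCH_SIZE):
    """Group an iterable of ids into lists of at most `size` items."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Synthetic ids stand in for one of the identifier files.
sample_file = io.StringIO(
    "\n".join(str(1234567890000000000 + i) for i in range(250)) + "\n"
)
batches = list(batched(read_ids(sample_file)))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Both helpers are generators, so a 10-million-line file can be processed without loading it into memory; each batch would then be handed to whichever hydration client is in use.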

  19. A dataset of media releases (Twitter, News and Comments, Youtube, Facebook)...

    • data.niaid.nih.gov
    Updated Mar 29, 2021
    Cite
    Andrzej Jarynowski (2021). A dataset of media releases (Twitter, News and Comments, Youtube, Facebook) form Poland related to COVID-19 for open research [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3985567
    Explore at:
    Dataset updated
    Mar 29, 2021
    Dataset authored and provided by
    Andrzej Jarynowski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Poland, YouTube
    Description

    Social behavior has a fundamental impact on the dynamics of infectious diseases (such as COVID-19), challenging public health mitigation strategies and possibly the political consensus. The widespread use of traditional and social media on the Internet provides us with an invaluable source of information on societal dynamics during pandemics. With this dataset, we aim to understand mechanisms of COVID-19 epidemic-related social behavior in Poland by deploying methods of computational social science and digital epidemiology. We collected and analyzed COVID-19 perception on the Polish-language Internet during 15.01-31.07 (06.08) and labeled the data quantitatively (Twitter, YouTube, articles) and qualitatively (Facebook, articles, and article comments) using an infomediological approach.

    • manually labelled 1,449 articles / Facebook posts from Lower Silesia (facebook_articles_lower_silesia.zip) and 111 texts from outside this region;

    • manually labelled the 1000 most popular tweets (twits_annotated.xlsx) with the categories is_fake (categorical and numeric), topic and sentiment;

    • extracted 57,306 representative articles (articles_till_06_08.zip) in Polish with the topic "Coronavirus" in the article body, using the Eventregitry.org tool;

    • extracted 1,015,199 tweets (tweets_till_31_07_users.zip and tweets_till_31_07_text.zip) from #Koronawirus in Polish using the Twitter API;

    • collected 1,574 videos (youtube_comments_till_31_07.zip and youtube_movie.csv) with the keyword Koronawirus on YouTube and 247,575 comments on them using the Google API;

    • supplemented the media observations with an analysis of 244 social empirical studies on COVID-19 in Poland up to 25.05 (empirical_social_studies.csv).

    Reports, analyses and codebooks can be found in Polish at: http://www.infodemia-koronawirusa.pl

    Main report (in Polish) https://depot.ceon.pl/handle/123456789/19215

  20. i

    Coronavirus (COVID-19) Tweets Sentiment Trend

    • ieee-dataport.org
    Updated Nov 4, 2022
    Cite
    Rabindra Lamsal (2022). Coronavirus (COVID-19) Tweets Sentiment Trend [Dataset]. https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-sentiment-trend
    Dataset updated
    Nov 4, 2022
    Authors
    Rabindra Lamsal
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset gives a cursory glimpse at the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.

Cite
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Katya Artemova; Elena Tutubalina; Gerardo Chowell (2021). A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration [Dataset]. http://doi.org/10.5281/zenodo.4516518

Data from: A large-scale COVID-19 Twitter chatter dataset for open scientific research - an international collaboration

23 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 7, 2021
Authors
Juan M. Banda; Ramya Tekumalla; Guanyu Wang; Jingyuan Yu; Tuo Liu; Yuning Ding; Katya Artemova; Elena Tutubalina; Gerardo Chowell
Description

Version 48 of the dataset. Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added additional data provided by our new collaborators from January 27th to March 27th, to provide extra longitudinal coverage. Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets. The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (948,493,362 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (238,771,950 unique tweets). There are several practical reasons for us to leave the retweets in; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files.
More statistics and some visualizations are available at http://www.panacealab.org/covid19/ and more details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter and in our pre-print about the dataset (https://arxiv.org/abs/2004.03688). As always, the tweets distributed here are only tweet identifiers (with date and time added), because Twitter's terms and conditions allow Twitter data to be re-distributed ONLY for research purposes. They need to be hydrated to be used. This dataset will be updated at least bi-weekly with additional tweets; see the GitHub repo for these updates. Release: We have standardized the name of the resource to match our pre-print manuscript and to not have to update it every week.
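
To illustrate how the language field might be used before hydration, here is a minimal sketch that counts rows per language in a tab-separated file and selects the ids for one language. The column names `tweet_id` and `lang` are assumptions made for this example, as are the function names; check the header of the actual full_dataset-clean.tsv file (or the GitHub repo) before relying on them.

```python
import csv
from collections import Counter
from typing import Iterator, TextIO


def count_by_language(tsv_file: TextIO, lang_column: str = "lang") -> Counter:
    """Count rows per language code in a tab-separated file with a header row."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    counts: Counter = Counter()
    for row in reader:
        # Rows with a missing or empty language value are grouped as "unknown".
        counts[row.get(lang_column) or "unknown"] += 1
    return counts


def ids_for_language(
    tsv_file: TextIO,
    language: str,
    id_column: str = "tweet_id",   # assumed column name
    lang_column: str = "lang",     # assumed column name
) -> Iterator[str]:
    """Yield the tweet ids of rows whose language matches `language`."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        if row.get(lang_column) == language:
            yield row[id_column]
```

Both functions take an open text file, so the same code works on a decompressed file opened with `open(path, newline="", encoding="utf-8")` or on an in-memory sample.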
