Version 48 of the dataset. Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added data provided by our new collaborators covering January 27th to March 27th to provide extra longitudinal coverage.

Version 10 added ~1.5 million tweets in the Russian language collected between January 1st and May 8th, graciously provided to us by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). From version 12 we have included daily hashtags, mentions, and emojis and their frequencies in the respective zip files. From version 14 we have included the tweet identifiers and their respective language for the clean version of the dataset. Since version 20 we have included language and place location for all tweets.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (948,493,362 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (238,771,950 unique tweets). There are several practical reasons to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. General per-day statistics are included for both datasets in the full_dataset-statistics.tsv and full_dataset-clean-statistics.tsv files.
For more statistics and some visualizations, visit: http://www.panacealab.org/covid19/ More details can be found (and will be updated faster) at https://github.com/thepanacealab/covid19_twitter, and in our pre-print about the dataset (https://arxiv.org/abs/2004.03688). As always, the tweets distributed here are only tweet identifiers (with date and time added) due to Twitter's terms and conditions, which permit re-distribution of Twitter data ONLY for research purposes. They need to be hydrated to be used. This dataset will be updated at least bi-weekly with additional tweets; check the GitHub repo for these updates. Release note: we have standardized the name of the resource to match our pre-print manuscript so that we do not have to update it every week.
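Since only identifiers are distributed, the IDs must be hydrated before use; dedicated tools such as twarc or Hydrator handle this end to end. Purely as an illustration, here is a minimal Python sketch of the batching step, assuming the classic lookup endpoint's limit of 100 IDs per request (the HTTP request itself is omitted because it requires API credentials):

```python
from itertools import islice
from typing import Iterable, Iterator, List

def batch_ids(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group tweet IDs into batches of `size` (the classic
    statuses/lookup endpoint accepted up to 100 IDs per request)."""
    it = iter(ids)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Each batch would then be passed to a hydration tool or the Twitter API.
batches = list(batch_ids((str(i) for i in range(250)), size=100))
```

In practice one would stream the ID files line by line into `batch_ids` rather than materializing them in memory.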
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. We have added data provided by our new collaborators covering January 27th to March 27th to provide extra longitudinal coverage.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (152,920,832 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (30,990,645 unique tweets). There are several practical reasons to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. General per-day statistics are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.
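The frequent-terms, bigram, and trigram files can be reproduced from any hydrated subset of the tweets. A minimal sketch, assuming naive lowercase whitespace tokenization (the release does not specify the authors' actual tokenization):

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_ngrams(texts: Iterable[str], n: int = 2,
               k: int = 1000) -> List[Tuple[tuple, int]]:
    """Count the k most frequent word n-grams across tweet texts,
    using naive lowercase whitespace tokenization."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(tuple(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts.most_common(k)

tweets = ["stay home stay safe", "stay home please"]
print(top_ngrams(tweets, n=2, k=3))
```

Swapping `n=1` or `n=3` yields the term and trigram counts analogous to frequent_terms.csv and frequent_trigrams.csv.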
More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter
As always, the tweets distributed here are only tweet identifiers (with date and time added) due to Twitter's terms and conditions, which permit re-distribution of Twitter data ONLY for research purposes. They need to be hydrated to be used.
This project aims to present a large dataset for researchers to discover public conversation on Twitter surrounding the COVID-19 pandemic. As strong concerns and emotions are expressed in the publicly available tweets, we annotated seventeen latent semantic attributes for each public tweet using natural language processing techniques and machine-learning based algorithms. The latent semantic attributes include: 1) ten attributes indicating the tweet’s relevance to ten detected topics, 2) five quantitative attributes indicating the degree of intensity in the valence (i.e., unpleasantness/pleasantness) and emotional intensities across four primary emotions of fear, anger, sadness and joy, and 3) two qualitative attributes indicating the sentiment category and the most dominant emotion category, respectively.
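As a rough illustration of the schema just described, the seventeen attributes per tweet could be represented as follows; the field names here are an assumption for illustration, not the release's actual column names:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class TweetAttributes:
    """Hypothetical per-tweet record: ten topic-relevance scores,
    five intensity scores, and two categorical labels (17 total)."""
    topic_relevance: Dict[str, float]  # 10 detected topics -> relevance
    valence: float                     # unpleasantness/pleasantness
    fear: float
    anger: float
    sadness: float
    joy: float
    sentiment_category: str            # qualitative sentiment label
    dominant_emotion: str              # most dominant emotion label

    def count_attributes(self) -> int:
        # 10 topic scores + 5 intensities + 2 categorical labels
        return len(self.topic_relevance) + 5 + 2
```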
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COVID-19 pandemic, which began over a year ago, continues to spread around the globe, and research on COVID-19 continues to grow as well. The online discourse on social media regarding COVID-19 has been growing along with the timeline of the pandemic.
Open data on Twitter has been released, offering the research community the opportunity for new findings in response to this new threat. In this dataset, we open a corpus of Twitter data from March 2020 until today, updated every day, based on the two most important hashtags regarding COVID-19. This dataset offers the research community the opportunity to explore the social dimensions of this pandemic, including topic analysis, hate speech detection, and sentiment analysis, regarding the opinion of users on the pandemic, comments on the public discourse, or the vaccine releases. The dataset has been collected by retrieving all tweets that contain the hashtags #coronavirus and #COVID19, comprising approximately 208M tweets for the hashtag #coronavirus and 392M tweets for the hashtag #COVID19, for a total of 600M tweets.
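A minimal sketch of the per-hashtag counting described above, assuming simple text matching on hydrated tweet texts (the actual collection tracked the hashtags through the Twitter API):

```python
import re
from collections import Counter
from typing import Iterable

# The two hashtags this corpus tracks, matched case-insensitively.
TRACKED = {"#coronavirus", "#covid19"}

def count_tracked_hashtags(tweets: Iterable[str]) -> Counter:
    """Count how many tweets carry each tracked hashtag;
    a tweet with both hashtags is counted under both."""
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in re.findall(r"#\w+", text)}
        counts.update(tags & TRACKED)
    return counts
```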
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
hashtag] relations from 190 countries and territories
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We collected and processed a dataset and make it available for the research community to study the COVID-19 pandemic from multiple angles.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
I am sharing a COVID-19 Twitter dataset with the research community containing a large number of tweets. I hope this dataset will enable the study of online conversation dynamics in the context of a global outbreak of unprecedented proportions and implications. I collected this dataset using TrackMyHashtag, an affordable platform. I hope researchers find it helpful. If you need more datasets, let me know.
In response to the Coronavirus disease (COVID-19) outbreak and the Transportation Research Board’s (TRB) urgent need for work related to transportation and pandemics, this paper contributes with a sense of urgency and provides a starting point for research on the topic. The main goal of this paper is to support transportation researchers and the TRB community during this COVID-19 pandemic by reviewing the performance of software models used for extracting large-scale data from Twitter streams related to COVID-19. The study extends the previous research efforts in social media data mining by providing a review of contemporary tools, including their computing maturity and their potential usefulness. The paper also includes an open repository for the processed data frames to facilitate the quick development of new transportation research studies. The output of this work is recommended to be used by the TRB community when deciding to further investigate topics related to COVID-19 and social media data mining tools.
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts, as we were filtering data collected for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 29th, yielding over 4 million tweets a day.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (70,569,368 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (13,535,912 unique tweets). There are several practical reasons to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. General per-day statistics are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.
More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter
As always, the tweets distributed here are only tweet identifiers (with date and time added) due to the terms and conditions of Twitter for re-distributing Twitter data. They need to be hydrated to be used.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents a large-scale collection of millions of Twitter posts related to the coronavirus pandemic in the Spanish language. The collection was built by monitoring public posts written in Spanish containing a diverse set of hashtags related to COVID-19, as well as tweets shared by official Argentinian government offices, such as ministries and secretaries at different levels. Data was collected between March and October 2020 using the Twitter API, and will be periodically updated.
In addition to tweet IDs, the dataset includes information about mentions, retweets, media, URLs, hashtags, replies, users, and content-based user relations, allowing observation of the dynamics of the shared information. Data is presented in different tables that can be analysed separately or combined.
The dataset aims to serve as a source for studying the effects of the coronavirus on people through social media, including the impact of public policies, the perception of risk and related disease consequences, the adoption of guidelines, the emergence, dynamics, and propagation of disinformation and rumours, the formation of communities and other social phenomena, and the evolution of health-related indicators (such as fear, stress, sleep disorders, or changes in children's behaviour), among other possibilities. In this sense, the dataset can be useful for multi-disciplinary researchers in data science, social network analysis, social computing, medical informatics, the social sciences, and other fields.
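Since the tables can be combined, a minimal sketch of one such combination follows; the table layouts here (a tweets table and a hashtags table keyed by tweet ID) are hypothetical, as the release's actual file names and columns are not specified in this summary:

```python
from typing import Dict, List

def join_on_tweet_id(tweets: List[Dict], hashtags: List[Dict]) -> List[Dict]:
    """Attach the list of hashtags to each tweet record by tweet_id,
    a simple left join between two of the distributed tables."""
    by_id: Dict[int, List[str]] = {}
    for h in hashtags:
        by_id.setdefault(h["tweet_id"], []).append(h["hashtag"])
    return [dict(t, hashtags=by_id.get(t["tweet_id"], [])) for t in tweets]
```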
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets of the figures in the paper "Tracking the Twitter attention around the research effort on the COVID-19 pandemic".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
From 24 January to 31 July 2021, we collected data that anyone can view on Twitter by using the free Twitter API. Using the keywords “vaccine”, “vaccination”, “vaccinated”, “vaxxer”, “vaxxers”, “#CovidVaccine”, “covid denier”, “pfizer”, “moderna”, “astra” and “zeneca”, “sinopharm”, and “sputnik”, we collected 33K tweets published by popular Twitter accounts. For each tweet, the following variables were recorded: the author (user ID), the author's categorization (healthcare professional, news media source, or other account with thousands of followers), the date of publication (to the precision of seconds), the vaccine mentioned, the language, and the general sentiment of the tweet text on a scale from 1 to 5. For multilingual sentiment analysis, we used an open-source BERT model from Huggingface (https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment). When multiple vaccines are mentioned in a tweet, it is recorded in our data as multiple tweets, one for each vaccine. To uphold the privacy policy for publishing Twitter data, the tweet texts, as well as the original user identifiers of the tweets' authors, are not disclosed. Instead, we encoded the user information with random integers. To access the complete content of these tweets, researchers may utilize the Twitter search API by referencing the provided tweet identifiers.
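A hypothetical sketch of the per-vaccine expansion described above, where a tweet mentioning several vaccines becomes one record per vaccine; the vaccine keyword list here is illustrative, not the authors' exact matching rules:

```python
from typing import Dict, List

# Illustrative vaccine names only; the study's own matching rules
# (e.g. for "astra" and "zeneca") are not reproduced here.
VACCINES = ["pfizer", "moderna", "astrazeneca", "sinopharm", "sputnik"]

def expand_by_vaccine(tweet_id: int, text: str,
                      sentiment: int) -> List[Dict]:
    """Emit one record per vaccine mentioned in the tweet text,
    carrying the 1-5 sentiment score along with each record."""
    text_l = text.lower()
    mentioned = [v for v in VACCINES if v in text_l] or ["unspecified"]
    return [{"tweet_id": tweet_id, "vaccine": v, "sentiment": sentiment}
            for v in mentioned]

rows = expand_by_vaccine(1, "Got my Pfizer dose, Moderna next?", 4)
```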
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts, as we were filtering data collected for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 22nd, yielding over 4 million tweets a day.
The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (40,823,816 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (7,479,940 unique tweets). There are several practical reasons to keep the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. General per-day statistics are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.
More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter
As always, the tweets distributed here are only tweet identifiers (with date and time added) due to the terms and conditions of Twitter for re-distributing Twitter data. They need to be hydrated to be used.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
no. 8
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2020
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of tweets by and about COVID-aware publics from the 'X' (Twitter) social media platform, collected by the author. The dataset consists of 344 textual tweets regarding COVID-related material practices gathered during the research period Jan 2023 - Sep 2024, though it also includes tweets created before this period. The textual data has been rewritten to fully anonymise the people who made the tweets, and identifiable contexts have been removed. In addition, all date/time metadata and hashtags, as well as any attached images, have been removed. Square brackets have been used for editorial edits to obfuscate entities or add context to tweets. The dataset consists of a structured comma-separated text file that can be read in any spreadsheet software to maximise accessibility. The research dataset was created with Open University HREC approval: HREC/4557/Nold
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
2022
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains the tweet ids of 354,903,485 tweets related to Coronavirus or COVID-19. They were collected between March 3, 2020 and December 3, 2020 from the Twitter API using Social Feed Manager. Please note that this is VERSION 9 of this data set; see the Versions tab below for all versions. Version 1 contains tweets from March 3, 2020 through March 19, 2020. Version 2 contains tweets from March 3, 2020 through March 31, 2020. Version 4 contains tweets from March 3, 2020 through April 16, 2020. Version 5 contains tweets from March 3 through May 1, 2020. Version 6 contains tweets from March 3 through May 27, 2020. Version 7 contains tweets from March 3 through June 9, 2020. Version 8 contains tweets from March 3 through July 27, 2020.

These tweets were collected using the POST statuses/filter method of the Twitter Stream API, using the track parameter with the following keywords: #Coronavirus, #Coronaoutbreak, #COVID19. Because of the size of the collection, the list of identifiers is split into 36 files of up to 10 million lines each, with a tweet identifier on each line. There is a covid19filter-README.txt file containing additional documentation on how the tweets were collected. Data from the first and last days of the collection do not represent complete days.

The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets. Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. This dataset contains only tweet ids, not the actual tweets.

We intend to continue updating this dataset periodically, as the collection is ongoing. Please check the Versions tab below for new versions. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets.
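The split into files of up to 10 million identifiers, one per line, can be reproduced with a simple chunked writer. A stdlib-only sketch, with the output file naming being an assumption rather than the release's actual naming scheme:

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, List

def write_id_chunks(ids: Iterable[str], out_dir: str,
                    lines_per_file: int = 10_000_000) -> List[Path]:
    """Split an iterator of tweet IDs into numbered files of up to
    `lines_per_file` lines each, one ID per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    it = iter(ids)
    written: List[Path] = []
    part = 1
    while True:
        chunk = list(islice(it, lines_per_file))
        if not chunk:
            break
        path = out / f"tweet_ids_{part:03d}.txt"  # hypothetical naming
        path.write_text("\n".join(chunk) + "\n")
        written.append(path)
        part += 1
    return written
```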
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social behavior has a fundamental impact on the dynamics of infectious diseases (such as COVID-19), challenging public health mitigation strategies and possibly the political consensus. The widespread use of traditional and social media on the Internet provides us with an invaluable source of information on societal dynamics during pandemics. With this dataset, we aim to understand the mechanisms of COVID-19 epidemic-related social behavior in Poland, deploying methods of computational social science and digital epidemiology. We collected and analyzed COVID-19 perception on the Polish-language Internet during 15.01-31.07 (06.08) and labeled data quantitatively (Twitter, YouTube, articles) and qualitatively (Facebook, articles, and article comments) using an infodemiological approach.
- manually labelled the 1,000 most popular tweets (twits_annotated.xlsx) with the categories is_fake (categorical and numeric), topic, and sentiment;
- extracted 57,306 representative articles in Polish (articles_till_06_08.zip) using the EventRegistry.org tool, matching the topic "Coronavirus" in the article body;
- extracted 1,015,199 tweets (tweets_till_31_07_users.zip and tweets_till_31_07_text.zip) from #Koronawirus in Polish using the Twitter API;
- collected 1,574 videos (youtube_comments_till_31_07.zip and youtube_movie.csv) with the keyword Koronawirus on YouTube, and 247,575 comments on them, using the Google API.
We supplemented the media observations with an analysis of 244 empirical social studies on COVID-19 in Poland conducted up to 25.05 (empirical_social_studies.csv).
Reports, analyses, and coding books can be found in Polish at: http://www.infodemia-koronawirusa.pl
Main report (in Polish) https://depot.ceon.pl/handle/123456789/19215
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset gives a cursory glimpse at the overall sentiment trend of the public discourse regarding the COVID-19 pandemic on Twitter. The live scatter plot of this dataset is available as The Overall Trend block at https://live.rlamsal.com.np. The trend graph reveals multiple peaks and drops that need further analysis. The n-grams during those peaks and drops can prove beneficial for better understanding the discourse.