100+ datasets found
  1. Types of digital media used the most in coronavirus outbreak in China 2020

    • statista.com
    Updated Nov 27, 2025
    Cite
    Statista (2025). Types of digital media used the most in coronavirus outbreak in China 2020 [Dataset]. https://www.statista.com/statistics/1108394/china-popular-digital-media-usage-in-coronavirus-covid19-outbreak-period-by-type/
    Dataset updated
    Nov 27, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 20, 2020 - Feb 21, 2020
    Area covered
    China
    Description

    According to a survey on the impact of coronavirus COVID-19 on media usage conducted in February 2020, there was a significant increase in digital media usage among Chinese consumers during the epidemic period. About ********* of respondents stated that they spent most of the time using the Chinese instant messenger WeChat. TV and online video platforms were other media types that captured most of the consumer attention.

  2. COVID-19 impact on daily social media usage in Finland 2020, by platform

    • statista.com
    Updated May 14, 2020
    Cite
    Statista (2020). COVID-19 impact on daily social media usage in Finland 2020, by platform [Dataset]. https://www.statista.com/statistics/1186531/coronavirus-impact-on-social-media-usage-by-platform-finland/
    Dataset updated
    May 14, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 13, 2020 - Mar 22, 2020
    Area covered
    Finland
    Description

    Based on survey results from **********, daily usage of WhatsApp and Instagram increased the most due to the coronavirus (COVID-19) outbreak in Finland. WhatsApp usage among Finns increased by **** percent compared to the period before the COVID-19 restriction measures were put in place. While most social media platforms increased their popularity, daily usage of Facebook, internet forums, blogs, and LinkedIn decreased during the pandemic.

  3. COVID-19 Sentiment: 500K Instagram Posts (2020-24)

    • kaggle.com
    zip
    Updated Oct 21, 2024
    Cite
    Nirmalya Thakur, PhD (2024). COVID-19 Sentiment: 500K Instagram Posts (2020-24) [Dataset]. https://www.kaggle.com/datasets/thakurnirmalya/covid-19-sentiment-500k-instagram-posts-2020-24
    Available download formats: zip (118444389 bytes)
    Dataset updated
    Oct 21, 2024
    Authors
    Nirmalya Thakur, PhD
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.
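    The abstract says each post was labelled positive, negative, or neutral using VADER and twitter-xlm-roberta-base-sentiment, but it does not say how the two were combined or which thresholds were applied. As a minimal sketch, assuming the ±0.05 cut-offs on VADER's compound score that the VADER documentation recommends, the three-way mapping could look like this (the function and constant names are hypothetical):

```python
# Minimal sketch: map a VADER compound score in [-1, 1] to a three-way label.
# The +/-0.05 cut-offs follow the VADER documentation's recommendation; the
# dataset author may have used other thresholds (assumption).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Return 'positive', 'negative', or 'neutral' for a compound score."""
    if not -1.0 <= compound <= 1.0:
        raise ValueError(f"compound score out of range: {compound}")
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


for score in (0.62, 0.01, -0.3):
    print(score, label_from_compound(score))
```

    The narrow neutral band means only near-zero scores are labelled neutral; how the transformer model's output was reconciled with this label is not covered by the sketch.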

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are in 161 different languages, of which the top 10 by frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), and Turkish (4632 posts).

    There are 535,021 distinct hashtags in this dataset, with the top 10 hashtags by frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), and #coronavirusoutbreak (34567 posts).

    The following attributes are present in this dataset:

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
    • Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
    • Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
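    To make the attribute list concrete, here is a minimal sketch that parses rows with these attributes and counts sentiment labels per language. It assumes the data is a CSV whose headers match the attribute names above; the actual file layout inside the Kaggle archive may differ, and the sample rows and helper names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import date, datetime

# Hypothetical sample rows: the attribute names come from the description above,
# but the actual CSV headers and layout inside the Kaggle archive may differ.
SAMPLE = """Post ID,Post Description,Date,Language code,Full Language,Sentiment
p1,Stay safe everyone,03/15/2020,en,English,positive
p2,Quédate en casa,04/02/2020,es,Spanish,neutral
p3,Another lockdown again,11/20/2020,en,English,negative
"""


def parse_posts(text: str) -> list[dict]:
    """Read CSV text and convert the MM/DD/YYYY Date field to datetime.date."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Date"] = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        rows.append(row)
    return rows


def sentiment_by_language(rows: list[dict]) -> dict[str, Counter]:
    """Count sentiment labels separately for each language code."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        counts[row["Language code"]][row["Sentiment"]] += 1
    return dict(counts)


posts = parse_posts(SAMPLE)
first_half_2020 = [p for p in posts if p["Date"] < date(2020, 7, 1)]
by_lang = sentiment_by_language(posts)
print(len(first_half_2020), dict(by_lang["en"]))
```

    Parsing the date explicitly with `%m/%d/%Y` avoids silently swapping day and month, which a day-first parser would do for dates such as 03/04/2020.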

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    • How does sentiment toward COVID-19 vary across different languages?
    • How has public sentiment toward COVID-19 evolved from 2020 to the present?
    • How do cultural differences affect social media discourse about COVID-19 across various languages?
    • How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
    • How effective were public health campaigns in shifting public sentiment in different languages?
    • What patterns of vaccine hesitancy or support are present in different languages?
    • How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
    • What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
    • How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
    • What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  4. May 2020 Covid-19 Twitter Streaming Dataset

    • figshare.com
    application/gzip
    Updated Oct 28, 2021
    Cite
    Social Media Lab (2021). May 2020 Covid-19 Twitter Streaming Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.16897045.v1
    Available download formats: application/gzip
    Dataset updated
    Oct 28, 2021
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Social Media Lab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file contains Tweet IDs* for COVID-19-related tweets collected in May 2020 from Twitter's COVID-19 Streaming Endpoint via a custom script developed by the Social Media Lab (https://socialmedialab.ca/). Visit our interactive dashboard at https://stream.covid19misinfo.org/ for a preview and some general stats about this COVID-19 Twitter streaming dataset. For more info about Twitter's COVID-19 Streaming Endpoint, visit https://developer.twitter.com/en/docs/labs/covid19-stream/overview

    *Note: In accordance with the Twitter API Terms, the dataset only includes Tweet IDs (as opposed to the actual tweets and associated metadata). To recollect the tweets contained in this dataset, you can use programs such as Hydrator (https://github.com/DocNow/hydrator/) or the Python library Twarc (https://github.com/DocNow/twarc/).
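    Because the file ships only Tweet IDs, rehydration starts by reading the ID list and grouping it into lookup-sized batches. A minimal sketch, assuming the gzip file holds one ID per line (the layout is not documented here) and using batches of 100, the per-request limit of Twitter's tweet-lookup endpoints. Hydrator and Twarc handle this internally, and current X API access terms apply; the helper names are hypothetical.

```python
import gzip
import io
from typing import Iterable, Iterator


def read_tweet_ids(fileobj) -> Iterator[str]:
    """Yield non-empty, stripped tweet IDs from a gzip-compressed text stream,
    assuming one ID per line (assumption about the file layout)."""
    with gzip.open(fileobj, "rt", encoding="utf-8") as fh:
        for line in fh:
            tweet_id = line.strip()
            if tweet_id:
                yield tweet_id


def batched(ids: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Group IDs into lists of at most `size`, e.g. one lookup request each."""
    batch: list[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Build a small in-memory .gz file so the sketch runs without the real dataset.
raw = "\n".join(str(1256000000000000000 + i) for i in range(250)).encode()
buffer = io.BytesIO(gzip.compress(raw))
batches = list(batched(read_tweet_ids(buffer)))
print([len(b) for b in batches])  # [100, 100, 50]
```

    IDs are kept as strings throughout, since 64-bit tweet IDs lose precision if a tool parses them as floating-point numbers.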

  5. Increased digital media consumption after coronavirus outbreak in China...

    • statista.com
    Updated Jul 7, 2025
    Cite
    Statista (2025). Increased digital media consumption after coronavirus outbreak in China 2020, by type [Dataset]. https://www.statista.com/statistics/1108438/china-digital-media-usage-after-coronavirus-covid19-outbreak-period-by-type/
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 20, 2020 - Feb 21, 2020
    Area covered
    China
    Description

    According to a survey on the impact of coronavirus COVID-19 on media usage conducted in February 2020, Chinese consumers intended to increase their digital media usage after the epidemic. Almost ** percent of respondents stated that they would spend more time watching TV once everything was back to normal.

  6. COVID-19 Social Media Counts & Sentiment

    • covid-hub.gio.georgia.gov
    Updated Apr 6, 2020
    Cite
    foustl32 (2020). COVID-19 Social Media Counts & Sentiment [Dataset]. https://covid-hub.gio.georgia.gov/datasets/feb6280d42de4e91b47cf37344a91eae
    Dataset updated
    Apr 6, 2020
    Dataset authored and provided by
    foustl32
    Description

    Update: As of August 26th, 2020, we are sunsetting updates to this free dataset. Please reach out to lyden@spatial.ai if you are interested in this data, Geosocial data, or other related datasets. As part of an effort to provide open-source resources and data related to the COVID-19 outbreak, this feature layer includes counts of social media posts that mention COVID-19, aggregated at the county level. The data is provided week over week, going back as far as January 26th, 2020. This feature service was refreshed regularly; it was most recently updated using data collected through August 24th. The data also includes the sentiment of the collected posts: posts are classified as negative, neutral, or positive using the VADER (Valence Aware Dictionary and sEntiment Reasoner) model and aggregated at the county level per week. This feature service was developed in collaboration between Datastory and Spatial.ai. There's a powerful story hidden in your data... Datastory can help you see it. Visit www.datastoryconsulting.com to learn more. Social media counts and statistics come from Twitter data collected by Spatial.ai for the creation of Geosocial data, which uses machine learning to create geographic social media segmentation. Learn more about the underlying data at https://spatial.ai/esri or reach out to lyden@spatial.ai for more information.
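    The county-per-week aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the Datastory/Spatial.ai pipeline: weeks are assumed to start on Sunday (26 January 2020, the first week in the layer, was a Sunday), counties are keyed by a hypothetical FIPS code, and each post is assumed to already carry a sentiment label.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Sunday that opens the week containing `day`.

    Sunday-start weeks are an assumption (26 January 2020 was a Sunday)."""
    return day - timedelta(days=(day.weekday() + 1) % 7)


def aggregate(posts):
    """Aggregate (county, day, label) tuples into per-county, per-week counts.

    Returns {(county, week_start): Counter} where each Counter holds a
    'total' count plus one count per sentiment label.
    """
    table = defaultdict(Counter)
    for county, day, label in posts:
        key = (county, week_start(day))
        table[key]["total"] += 1
        table[key][label] += 1
    return dict(table)


# Hypothetical posts: (county FIPS code, post date, sentiment label).
posts = [
    ("13121", date(2020, 1, 26), "neutral"),
    ("13121", date(2020, 1, 29), "negative"),
    ("13121", date(2020, 2, 2), "positive"),
    ("13089", date(2020, 1, 30), "negative"),
]
weekly = aggregate(posts)
print(weekly[("13121", date(2020, 1, 26))])
```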

  7. 100 days of COVID-19 in the Australian Twittersphere

    • researchdatafinder.qut.edu.au
    Updated Sep 6, 2020
    Cite
    Digital Observatory (2020). 100 days of COVID-19 in the Australian Twittersphere [Dataset]. https://researchdatafinder.qut.edu.au/individual/n10613
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Digital Observatory
    Description

    The Australian Twittersphere is a database of tweets from identified Australian accounts, originally set up through the TrISMA project and now managed by the QUT Digital Observatory. The database includes 3.7 million Australian Twitter accounts, with 1.8 billion tweets captured to date. Since the beginning of 2019, there have been about 800,000 new tweets per day from 100,000 daily active users. The 100 days of COVID-19 in the Australian Twittersphere dataset consists of 2.8 million tweet IDs corresponding to tweets from the Australian Twittersphere that mention the COVID-19 pandemic through coronavirus-specific hashtags or keywords. The tweets were created on or after 20 January 2020 and up until 3 May 2020 (the 15 weeks that form the first ‘100 days’ of COVID-19 in Australia). This dataset provides a glimpse of the experiences and attitudes of Australians presently living through this global pandemic. We are all in this together, and as such this dataset has been released as rapidly as possible to enable use by the broader research community.

    The SQL used to extract the tweets from the Australian Twittersphere database is as follows:

        SELECT tweet_id
        FROM oz_twitter.tweet
        WHERE created_at >= '2020-01-19 14:00:00' -- corresponds to >= '2020-01-20 00:00:00' in Brisbane time
          AND created_at < '2020-05-03 14:00:00'  -- corresponds to < '2020-05-04 00:00:00' in Brisbane time
          AND multiMatchAny(lower(text), ['covid', 'corona', 'flattenthecurve', 'socialdistancing', 'stayhome', 'lockdown', 'wuhan', 'pandemic'])
          AND notEmpty(hashtags) = 1;

    Access to the Australian Twittersphere database is managed by the QUT Digital Observatory.
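    The extraction query relies on ClickHouse functions (`multiMatchAny`, `notEmpty`). A minimal Python sketch of the same filter shows where the `14:00:00` UTC boundaries come from. It assumes `created_at` is stored as naive UTC and that Brisbane stays on UTC+10 (Queensland does not observe daylight saving). `multiMatchAny` treats each needle as a regular expression; since none of these keywords contains regex metacharacters, a plain substring test is used instead. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Queensland (Brisbane) stays on AEST, UTC+10, all year (no daylight saving),
# so a fixed offset is sufficient for this window.
BRISBANE = timezone(timedelta(hours=10))
KEYWORDS = ["covid", "corona", "flattenthecurve", "socialdistancing",
            "stayhome", "lockdown", "wuhan", "pandemic"]


def brisbane_midnight_as_utc(year: int, month: int, day: int) -> datetime:
    """Convert midnight Brisbane time to a naive UTC datetime (assumes
    created_at is stored as naive UTC, as the query's comments imply)."""
    local = datetime(year, month, day, tzinfo=BRISBANE)
    return local.astimezone(timezone.utc).replace(tzinfo=None)


START_UTC = brisbane_midnight_as_utc(2020, 1, 20)
END_UTC = brisbane_midnight_as_utc(2020, 5, 4)


def matches(created_at: datetime, text: str, hashtags: list[str]) -> bool:
    """Mirror the WHERE clause: time window, any keyword as a substring of the
    lower-cased text, and a non-empty hashtag list."""
    in_window = START_UTC <= created_at < END_UTC
    has_keyword = any(keyword in text.lower() for keyword in KEYWORDS)
    return in_window and has_keyword and len(hashtags) > 0


print(START_UTC, END_UTC)  # 2020-01-19 14:00:00 2020-05-03 14:00:00
```

    The half-open window (`>=` start, `<` end) covers 20 January through 3 May inclusive in Brisbane time, which is exactly 105 days, or 15 weeks.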

  8. Data from: A dataset of Covid-related misinformation videos and their spread...

    • data.niaid.nih.gov
    Updated Feb 24, 2021
    Cite
    Knuutila, Aleksi (2021). A dataset of Covid-related misinformation videos and their spread on social media [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4557827
    Dataset updated
    Feb 24, 2021
    Dataset provided by
    Oxford Internet Institute
    Authors
    Knuutila, Aleksi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains metadata about all Covid-related YouTube videos that circulated on public social media but that YouTube eventually removed because they contained false information. It describes 8,122 videos that were shared between November 2019 and June 2020. The dataset contains unique identifiers for the videos and for the social media accounts that shared them, statistics on social media engagement, and metadata such as video titles and view counts where these were recoverable. We publish the data alongside the code used to produce it on GitHub. The dataset has reuse potential for research studying narratives related to the coronavirus, the impact of social media on knowledge about health, and the politics of social media platforms.

  9. Social media company responsibilities during COVID-19 crisis 2020, by...

    • statista.com
    Cite
    Statista, Social media company responsibilities during COVID-19 crisis 2020, by generation [Dataset]. https://www.statista.com/statistics/1107331/social-media-role-during-coronavirus-crisis-by-generation/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 16, 2020 - Mar 20, 2020
    Area covered
    Worldwide
    Description

    Data from a global survey held in March 2020 revealed that almost a third of responding Gen Z internet users worldwide felt that social media companies should provide live-streams of events during the coronavirus crisis. However, only ** percent of Baby Boomer respondents thought the same. For further information about the coronavirus (COVID-19) pandemic, please visit our dedicated Fact and Figures page.

  10. Covid-19 Go Away 2020 (C-19GA20)

    • kaggle.com
    • data.mendeley.com
    zip
    Updated Mar 25, 2022
    Cite
    Priti Rai Jain (2022). Covid-19 Go Away 2020 (C-19GA20) [Dataset]. https://www.kaggle.com/datasets/pritiraijain/covid19-go-away-2020-c19ga20
    Available download formats: zip (83628 bytes)
    Dataset updated
    Mar 25, 2022
    Authors
    Priti Rai Jain
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The C-19GA20 dataset was gathered online in April 2020 from school and university students between 14 and 24 years of age. It provides information about the students’ mental health, social lives, attitudes towards Covid-19, the impact of the Covid-19 pandemic on their education, and their experience with online learning. The data includes 5 major groups of variables:

    1) Socio-demographic data: age group, gender, current place of stay, and study level in their institution.
    2) 4 items on internet connectivity during the lockdown: device availability for exclusive use, internet bandwidth, the top 5 most commonly used online tools, and screen time.
    3) 9 items measuring the impact of Covid-19 on the students’ social lives: their current living situation, the number of people around them where they live, and their feelings about meeting their friends, visiting their institution of study, and events that would otherwise have been held offline. Students were also asked about their top 5 pastime activities during the lockdown and the amount of time they spend on social media.
    4) 6 items gauging their experience with online learning during the lockdown: feeling connected to their peers, maintaining discipline, structured learning, and the stress or burden they felt due to online learning.
    5) 11 items gathering information about the students’ mental health: how well they have adapted to stay-at-home instructions, their overall mood during the lockdown, their feelings towards Covid-19, their main concerns about their academic schedule, staying updated and informed about Covid-19, and the impact of social media on their beliefs.

    Finally, the students were asked to write about how they feel the pandemic has changed them as a person and affected their thinking, and to share a one-line message for the world during the lockdown.

  11. COVID-19 Twitter Dataset

    • borealisdata.ca
    • figshare.com
    Updated Nov 10, 2020
    Cite
    Anatoliy Gruzd; Philip Mai (2020). COVID-19 Twitter Dataset [Dataset]. http://doi.org/10.5683/SP2/PXF2CU
    Also available in Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant).
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Borealis
    Authors
    Anatoliy Gruzd; Philip Mai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July 2020. Sampling method: hourly requests sent to the Twitter Search API using Social Feed Manager, open-source software that harvests social media data and related content from Twitter and other platforms.

    NOTES:
    1) In accordance with the Twitter API Terms, only Tweet IDs are provided as part of this dataset.
    2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs such as Hydrator (https://github.com/DocNow/hydrator) or the Python library Twarc (https://github.com/DocNow/twarc).
    3) Like most datasets collected via the Twitter Search API, this dataset is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might be missing, either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because they used hashtags or keywords other than COVID (e.g., Coronavirus or #nCoV).
    4) To broaden this sample, consider comparing or merging this dataset with other public COVID-19-related datasets, such as:
       • https://github.com/thepanacealab/covid19_twitter
       • https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
       • https://github.com/echen102/COVID-19-TweetIDs
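    Merging this ID list with the other public datasets suggested above comes down to an order-preserving union with de-duplication. A minimal sketch, assuming each source has already been read into an iterable of ID strings (the list contents and function name are hypothetical):

```python
def merge_id_lists(*sources):
    """Union several iterables of tweet IDs, keeping first-seen order.

    IDs stay as strings: 64-bit tweet IDs exceed 2**53, so tools that parse
    them as floating-point numbers silently round them.
    """
    seen = set()
    merged = []
    for source in sources:
        for raw in source:
            tweet_id = raw.strip()
            if tweet_id and tweet_id not in seen:
                seen.add(tweet_id)
                merged.append(tweet_id)
    return merged


# Hypothetical ID lists standing in for this dataset and one of the others above.
this_dataset = ["1240000000000000001", "1240000000000000002"]
other_dataset = ["1240000000000000002\n", "1240000000000000003", ""]
merged = merge_id_lists(this_dataset, other_dataset)
print(merged)

# Why strings matter: a float cannot represent this ID exactly.
print(int(float("1240000000000000001")))  # 1240000000000000000
```

    Keeping first-seen order makes it easy to trace which source contributed each ID; a plain `set` union would be shorter but would lose that ordering.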

  12. sv-COVID-19

    • researchdata.se
    • data.europa.eu
    Updated Jan 1, 2024
    Cite
    Språkbanken Text (2024). sv-COVID-19 [Dataset]. http://doi.org/10.23695/K6FH-4F59
    Dataset updated
    Jan 1, 2024
    Dataset provided by
    University of Gothenburg
    Authors
    Språkbanken Text
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    sv-covid-19 is a collection of Swedish news texts, scientific and popular-science articles, and posts from certain blogs and social media such as Flashback and Twitter, published from the beginning of the coronavirus pandemic (early 2020) onwards. The latest version of the corpus consists of approximately eight million words and 9,000 articles. The corpus contains various text types and texts at different stylistic levels. The texts have been annotated with part-of-speech tags, morphological analysis, and lemmas, as well as some structural and functional information, such as author names.

  13. Data from: The Shapes of the Fourth Estate During the Pandemic: Profiling...

    • socialmediaarchive.org
    txt
    Updated Aug 3, 2023
    Cite
    (2023). The Shapes of the Fourth Estate During the Pandemic: Profiling COVID-19 News Consumption in Eight Countries [Dataset]. https://socialmediaarchive.org/record/51
    Available download formats: txt (1071535680), txt (891847740), txt (2498070060), txt (877999820), txt (868355040), txt (786700960), txt (1187419620), txt (756712180), txt (1093118000), txt (3682130280), txt (775548020), txt (630662160), txt (1555341960), txt (1074894660), txt (1372195300), txt (858269220)
    Dataset updated
    Aug 3, 2023
    Description

    The COVID2020 dataset is a new, high-volume COVID-19 tweet dataset. It was collected from March 2020 to November 2020, covering eight months in the first year of the pandemic. The list of tracked COVID-19 keywords is taken from "Emily Chen, Kristina Lerman, and Emilio Ferrara. 2020. Tracking Social Media Discourse about the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set. JMIR Public Health and Surveillance (2020)". These keywords include not only generic terms such as "corona virus" and "covid", but also non-pharmaceutical interventions such as "lockdown", "n95", and "social distancing".

    This dataset consists of tweet IDs.
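    Tracking multi-word keywords such as "social distancing" or "corona virus" raises the question of how to match them in free text. A minimal sketch, assuming case-insensitive whole-word matching where multi-word keywords may be separated by any whitespace; only the example keywords quoted above are used, and the actual COVID2020 matching rules are not described here:

```python
import re

# Example keywords quoted in the description above; the full tracked list
# (from Chen, Lerman and Ferrara, 2020) is not reproduced here.
KEYWORDS = ["corona virus", "covid", "lockdown", "n95", "social distancing"]


def compile_keyword_pattern(keywords):
    """Build one case-insensitive regex that matches any keyword as a whole
    word; multi-word keywords may be separated by any run of whitespace."""
    parts = [r"\s+".join(re.escape(word) for word in kw.split()) for kw in keywords]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


PATTERN = compile_keyword_pattern(KEYWORDS)


def matched_keywords(text):
    """Return the keywords found in `text`, lower-cased and whitespace-normalised."""
    return [" ".join(m.group(0).lower().split()) for m in PATTERN.finditer(text)]


print(matched_keywords("Day 3 of LOCKDOWN, keep Social  Distancing! #covid"))
print(matched_keywords("covid19 cases"))
```

    Whole-word matching avoids spurious hits inside longer tokens but also misses variants such as "covid19" or "lockdowns"; one option is to list such variants explicitly in the keyword set.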

  14. Replication Data for: Social Media and Policy Responses to the COVID-19...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 19, 2023
    Cite
    Gilardi, Fabrizio; Gessler, Theresa; Kubli, Mael; Müller, Stefan (2023). Replication Data for: Social Media and Policy Responses to the COVID-19 Pandemic in Switzerland [Dataset]. http://doi.org/10.7910/DVN/BKGZUL
    Dataset updated
    Nov 19, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Gilardi, Fabrizio; Gessler, Theresa; Kubli, Mael; Müller, Stefan
    Description

    We study the role of social media in debates regarding two policy responses to COVID-19 in Switzerland: face-mask rules and contact-tracing apps. We use a dictionary classifier to categorize 612,177 tweets by parties, politicians, and the public as well as 441,458 articles published in 76 newspapers from February until August 2020. We distinguish between "problem" (COVID-19) and "solutions" (face masks and contact-tracing apps) and, using a vector autoregression approach, we analyze the relationship between their salience on social and traditional media, as well as among different groups on social media. We find that overall attention to COVID-19 was not driven by endogenous dynamics between the different actors. By contrast, the debate on face masks was led by the attentive public and by politicians, whereas parties and newspapers followed. The results illustrate how social media challenge the capacity of party and media elites to craft a consensus regarding the appropriateness of different measures as responses to a major crisis.
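    The first step of this design, a dictionary classifier that turns texts into daily salience counts per topic, can be sketched as below. The dictionaries are hypothetical (the authors' actual terms are not listed here), and the vector autoregression is not shown; one option for it would be the `VAR` class in statsmodels applied to these daily series.

```python
import re
from collections import Counter

# Hypothetical dictionaries: the terms the authors actually used are not
# listed here, so these German/French/English examples are illustrative only.
DICTIONARIES = {
    "problem": ["covid", "coronavirus", "corona"],
    "face_masks": ["mask", "masks", "maske", "masken", "masque", "masques"],
    "tracing_app": ["swisscovid", "contact tracing", "tracing app"],
}

# One case-insensitive whole-word pattern per category.
PATTERNS = {
    category: re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    for category, terms in DICTIONARIES.items()
}


def classify(text):
    """Return every category whose dictionary matches `text` (possibly none)."""
    return {category for category, pattern in PATTERNS.items() if pattern.search(text)}


def salience_series(docs):
    """Count matching documents per (day, category) from (day, text) pairs."""
    counts = Counter()
    for day, text in docs:
        for category in classify(text):
            counts[(day, category)] += 1
    return counts


docs = [
    ("2020-07-06", "Masks on trains from today"),
    ("2020-07-06", "covid: masks everywhere"),
    ("2020-07-07", "SwissCovid downloads keep rising"),
]
print(salience_series(docs))
```

    A text can fall into several categories at once, so "problem" and "solution" counts are not mutually exclusive, which suits a salience measure rather than a single-label classification.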

  15. Impact of COVID-19 on Pharmaceutical Social Media Influencer Activity - June...

    • store.globaldata.com
    Updated Jun 30, 2020
    Cite
    GlobalData UK Ltd. (2020). Impact of COVID-19 on Pharmaceutical Social Media Influencer Activity - June 2020 [Dataset]. https://store.globaldata.com/report/impact-of-covid-19-on-pharmaceutical-social-media-influencer-activity/
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    GlobalData (https://www.globaldata.com/)
    Authors
    GlobalData UK Ltd.
    License

    https://www.globaldata.com/privacy-policy/

    Time period covered
    2020 - 2024
    Area covered
    Global
    Description

    The highly contagious coronavirus SARS-CoV-2 (formerly 2019-nCoV), which causes COVID-19 and emerged at the close of 2019, has led to a medical emergency across the world, with the World Health Organization (WHO) officially declaring COVID-19 a pandemic on March 11, 2020. This report analyzes GlobalData’s social media Influencer dashboards to understand Influencer trends since the pandemic began and what key Influencers are discussing online about COVID-19.

  16. Data from: PANACEA dataset - Heterogeneous COVID-19 Claims

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    Cite
    Arana-Catania, Miguel; Kochkina, Elena; Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; He, Yulan (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6493846
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    University of Warwick
    Queen Mary University of London
    Authors
    Arana-Catania, Miguel; Kochkina, Elena; Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; He, Yulan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The peer-reviewed publication for this dataset was presented at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite it when using the dataset.

    This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

    The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

    The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).
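    The similarity analysis above starts from BM25 scoring before MonoT5 re-ranking and BERTScore. A minimal pure-Python sketch of first-stage Okapi BM25 follows. The parameters k1 = 1.5 and b = 0.75, the non-negative IDF variant, and whitespace tokenisation are common choices assumed here, not the settings reported for PANACEA; the re-ranking stages are not shown, and the example claims are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenisation (a simplification; assumption)."""
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small in-memory corpus of claims."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in documents]
        self.doc_lens = [sum(tf.values()) for tf in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        df = Counter()
        for tf in self.docs:
            df.update(tf.keys())
        n = len(self.docs)
        # Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, index):
        """BM25 score of document `index` for `query`."""
        tf, dl = self.docs[index], self.doc_lens[index]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def rank(self, query):
        """Document indices ordered by descending score (ties by index)."""
        return sorted(range(len(self.docs)), key=lambda i: (-self.score(query, i), i))


# Hypothetical claims, not taken from the PANACEA dataset.
claims = [
    "5G networks spread the coronavirus",
    "Drinking hot water kills the coronavirus",
    "Vitamin C cures covid",
]
bm25 = BM25(claims)
query = "does hot water kill the coronavirus"
print(bm25.rank(query))
```

    In a de-duplication pass, each claim could be used as a query against the rest, with only the top-ranked candidates passed to a stronger re-ranker before a similarity threshold is applied; this keeps the expensive model off most pairs.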

    The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

    The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities were detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using spaCy.

    The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

    The data sources used are:

    The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

    The entries in the dataset contain the following information:

    • Claim. Text of the claim.

    • Claim label. The labels are: False, and True.

    • Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

    • Original information source. Information about which general information source was used to obtain the claim.

    • Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.

    Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

    References

    • M. Arana-Catania, E. Kochkina, A. Zubiaga, M. Liakata, R. Procter, and Y. He. 2022. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022. https://arxiv.org/abs/2205.02596

    • Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.

    • Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “Is this document relevant? ... Probably”: A survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

    • Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

    • Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

    • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.

    • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

    • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. OntoNotes Release 5.0 LDC2013T19. Linguistic Data Consortium, Philadelphia, PA, 23.

    • Limeng Cui and Dongwon Lee. 2020. CoAID: COVID-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

    • Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. MM-COVID: A multilingual and multimodal data repository for combating COVID-19 disinformation.

    • Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

    • Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. TREC-COVID: Constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.

  17. A Twitter Dataset of 40+ million tweets related to COVID-19

    • zenodo.org
    csv, tsv
    Updated Apr 17, 2023
    Cite
    Juan M. Banda; Ramya Tekumalla (2023). A Twitter Dataset of 40+ million tweets related to COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3723940
    Available download formats: tsv, csv
    Dataset updated
    Apr 17, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Juan M. Banda; Ramya Tekumalla
    Description

    Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets related to COVID-19 chatter, acquired from the Twitter Stream. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts, as these tweets were filtered from data we were collecting for other research purposes; however, the dramatic increase is visible as awareness of the virus spread. Dedicated data gathering ran from March 11th to March 22nd and yielded over 4 million tweets a day.

    The data collected from the stream covers all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full_dataset.tsv file (40,823,816 unique tweets), and a cleaned version with no retweets in the full_dataset-clean.tsv file (7,479,940 unique tweets). We kept the retweets for several practical reasons, one of which is tracing important tweets and their dissemination. For NLP tasks, we provide the top 1000 frequent terms in frequent_terms.csv, the top 1000 bigrams in frequent_bigrams.csv, and the top 1000 trigrams in frequent_trigrams.csv. Some general statistics per day are included for both datasets in the statistics-full_dataset.tsv and statistics-full_dataset-clean.tsv files.
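    Per-day tweet counts similar to those in the statistics files can be recomputed from the tweet-ID file itself. The sketch assumes a tab-separated file with a header row that includes a `date` column; check the actual header of the release before relying on it. The function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def tweets_per_day(lines: Iterable[str]) -> Counter:
    """Count rows per value of the (assumed) "date" column of a tweet-ID TSV.

    Rows are streamed one at a time, so only the per-day counts are kept in memory.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row["date"] for row in reader)
```

    Because an open file is an iterable of lines, usage would look like `with open("full_dataset-clean.tsv", newline="", encoding="utf-8") as fh: counts = tweets_per_day(fh)`; streaming matters here, since the full file holds over 40 million rows.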

    More details can be found (and will be updated faster) at: https://github.com/thepanacealab/covid19_twitter

    As always, the tweets distributed here are only tweet identifiers (with date and time added), because Twitter's terms and conditions restrict the redistribution of Twitter data. They need to be hydrated before they can be used.
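    Hydration means looking the identifiers up again through the Twitter API. The sketch below only prepares that step: it streams the IDs out of the TSV and groups them into batches, assuming (as with Twitter's tweet-lookup endpoints) that one lookup request accepts at most 100 IDs. The `tweet_id` column name and both function names are assumptions, and the API calls and authentication are left out.

```python
import csv
from typing import Iterable, Iterator


def read_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield tweet IDs from a TSV with an (assumed) "tweet_id" header column.

    IDs are kept as strings: they exceed the range that floats represent exactly.
    """
    for row in csv.DictReader(lines, delimiter="\t"):
        yield row["tweet_id"]


def batched_ids(ids: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Group IDs into lists of at most `size` (100 is the assumed per-request limit)."""
    batch: list[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch
```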

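  18. Research dataset gathered during the RHD project that explored the impact of online trauma threats faced by journalists during and immediately after the lockdowns (1Q 2020 to 3Q 2021) prompted by the COVID-19 pandemic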

    • acquire.cqu.edu.au
    • researchdata.edu.au
    Updated Jun 3, 2023
    Cite
    Amantha Perera (2023). Research dataset gathered during the RHD project that explored the impact of online trauma threats faced by journalists during and immediately after the lockdowns (1Q 2020 to 3Q 2021) prompted by the COVID-19 pandemic. [Dataset]. http://doi.org/10.25946/19446812.v1
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    CQUniversity
    Authors
    Amantha Perera
    License

    https://rightsstatements.org/page/InC/1.0/?language=en

    Description

    The research interviews with journalists and experts, and the data extracted from the survey, conducted for the Master's research project 'The Impact of Online Trauma Threats Faced by Journalists: The Case of COVID-19 Imposed Remote Working Regimes'.

    Background: The global reach of the COVID-19 pandemic, with its sustained infection and fatality rates from the first quarter of 2020, deeply affected the majority of journalists across the world, who found themselves working on stories of trauma linked to the pandemic from remote locations and under restrictive working conditions. These COVID-19-enforced working conditions exponentially increased journalists' exposure to online trauma threats. This research examines the confluence of online trauma threats, their manifestations and impacts, and the mitigating measures some journalists took to ease the impact of this confluence. The research was guided by the central question: ‘How are journalists experiencing and responding to online trauma threats they face in the line of work during and “post” COVID-19 lockdowns?’ It used three distinct yet interrelated methods: an online survey; in-depth, semi-structured interviews; and narrative case studies in the form of feature-length journalism. Thematic analysis of the survey and interviews provided a framework for the works of journalism, which are situated in the broader contexts of the journalism profession and online trauma reporting. Responding to the increase in online trauma threat activity exacerbated by the COVID-19 pandemic, the research points towards potential transformations within the profession that might help journalists continue to fulfil their important role in and for society.

  19. Effect of social media on mental health during Covid-19 lockdowns in India

    • search.dataone.org
    • datadryad.org
    Updated May 8, 2025
    Cite
    Saptorshi Gupta; Ayan Ganguly (2025). Effect of social media on mental health during Covid-19 lockdowns in India [Dataset]. http://doi.org/10.5061/dryad.0cfxpnw32
    Dataset updated
    May 8, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Saptorshi Gupta; Ayan Ganguly
    Time period covered
    Jan 1, 2021
    Description

    As the COVID-19 pandemic confined people to their homes for substantial periods, they turned to the virtual world to stay connected with their peers, family and friends. Likewise, news channels and other forms of electronic media saw a steep rise in viewership across the globe. At the same time, social media has had adverse effects on individuals' mental health through addiction, stress, anxiety, depression and post-traumatic stress syndromes.

    The primary objective of this dataset is to analyze both the positive and negative effects of social media use on individuals during an unprecedented global lockdown. Existing literature has found significant connections between social media use and mental health during extended periods of lockdown (Swarnam, S., 2021; Pragholapat, A., 2020; Hong, W. et al., 2020). This dataset is used to understand the extent of depression and anxiety experienced by persons restricted to stay-at-home confinements ...

  20. US COVID Tweets

    • kaggle.com
    zip
    Updated Jul 20, 2020
    Cite
    YazanShannak (2020). US COVID Tweets [Dataset]. https://www.kaggle.com/yazanshannak/us-covid-tweets
    Available download formats: zip (157,100,557 bytes)
    Dataset updated
    Jul 20, 2020
    Authors
    YazanShannak
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Description

    Context

    During the 2020 COVID-19 pandemic, social media exploded with content discussing the pandemic and related issues. Tweets discussing this topic were collected from Twitter to gain better insight into how the state of the pandemic correlated with people's opinions.

    Content

    The dataset consists of multiple features obtained from Twitter using a web crawler built with Python and Scrapy. We queried tweets containing specific keywords related to the pandemic in the United States from the beginning of February to the end of April 2020. Other features, such as mentions and hashtags, were also extracted from the original tweets.
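    The description does not say how mentions and hashtags were extracted. A simple regular-expression approach is sketched below as an assumption; it is not the crawler's actual code, and the function name is hypothetical. Python's `\w` is Unicode-aware for string patterns, so non-Latin hashtags are matched too.

```python
import re

# A hashtag or mention must not be glued to a preceding word character, so
# e-mail addresses such as "info@example.org" are not picked up as mentions.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def extract_tags(text: str) -> dict[str, list[str]]:
    """Return the hashtags and mentions found in a tweet's text, in order."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }
```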

    Inspiration

    How could we correlate the state of the pandemic through the specified timeline with people's thoughts on social media?
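    A simple starting point for this question is to aggregate the tweets per day and correlate a daily series (for example tweet volume or the share of negative tweets) with a daily pandemic indicator such as new case counts. The sketch computes Pearson's correlation coefficient between two equal-length daily series; the choice of series and of Pearson's r are assumptions, not part of the dataset, and a correlation alone does not establish a causal link.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```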
