As of March 2020, social media users in the United States were spending more time online. According to a survey of U.S. social media users, 29.7 percent of respondents were using social media an additional 1-2 hours per day. A further 20.5 percent used social media 30 minutes to 1 hour more than usual per day. Only 1.6 percent of users were adding less than 15 minutes to their usage. The additional usage was a result of the coronavirus pandemic, which led to stay-at-home orders and social distancing measures in the country.
Among social media platforms being used during the coronavirus pandemic, Facebook was the most used with 78.1 percent of adults in the United States using the platform as of March 2020. The second-most used platform was Instagram, with 49.5 percent of U.S. adults using the image sharing social platform. However, 7.7 percent of responding adults stated that they were not using social media. Social networks are a popular method to stay in touch with friends and family amidst social distancing directives during the COVID-19 outbreak in the United States.
Based on survey results from March 2020, daily usage of WhatsApp and Instagram increased the most due to the coronavirus (COVID-19) outbreak in Finland. WhatsApp usage among Finns increased by nine percent compared to the period before the COVID-19 restriction measures were put in place. While most social media platforms increased their popularity, daily usage of Facebook, internet forums, blogs, and LinkedIn decreased during the pandemic.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of tweets by and about COVID-aware publics from the 'X' (Twitter) social media platform, collected by the author. The dataset consists of 344 textual tweets regarding COVID-related material practices gathered during the research period Jan 2023 - Sep 2024, although it also includes tweets created before this period. The textual data has been rewritten to fully anonymise the people who made the tweets, and identifiable contexts have been removed. In addition, all date/time metadata and hashtags, as well as any attached images, have been removed. Square brackets mark editorial edits that obfuscate entities or add context to tweets. The dataset consists of a structured comma-separated text file that can be read in any spreadsheet software to maximise accessibility. The research dataset was created with Open University HREC approval: HREC/4557/Nold
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains metadata about all Covid-related YouTube videos which circulated on public social media, but which YouTube eventually removed because they contained false information. It describes 8,122 videos that were shared between November 2019 and June 2020. The dataset contains unique identifiers for the videos and for the social media accounts that shared them, statistics on social media engagement, and metadata such as video titles and view counts where they were recoverable. We publish the data alongside the code used to produce it on GitHub. The dataset has reuse potential for research studying narratives related to the coronavirus, the impact of social media on knowledge about health, and the politics of social media platforms.
Data from a global survey held in March 2020 revealed that 43 percent of responding internet users worldwide felt that social media companies should help neighbors and local communities to connect with each other during the coronavirus crisis, though this varied somewhat by country. More than half of survey participants from the UK and the Philippines were in favor of this, yet only 20 percent of Japanese respondents thought the same. Over two thirds of global respondents also thought that it was the responsibility of social media platforms to provide fact-checked content to help people cope with the outbreak.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Social media can be both a source of information and misinformation during health emergencies. During the COVID-19 pandemic, social media became a ubiquitous tool for people to communicate and represents a rich source of data researchers can use to analyse users’ experiences, knowledge and sentiments. Research on social media posts during COVID-19 has identified, to date, the perpetuity of traditional gendered norms and experiences. Yet these studies are mostly based on Western social media platforms. Little is known about gendered experiences of lockdown communicated on non-Western social media platforms. Using data from Weibo, China’s leading social media platform, we examine gendered user patterns and sentiment during the first wave of the pandemic between 1 January 2020 and 1 July 2020. We find that Weibo posts by self-identified women and men conformed with some gendered norms identified on other social media platforms during the COVID-19 pandemic (posting patterns and keyword usage) but not all (sentiment). This insight may be important for targeted public health messaging on social media during future health emergencies.
To cite: Gan CCR, Feng SA, Feng H, et al. #WuhanDiary and #WuhanLockdown: gendered posting patterns and behaviours on Weibo during the COVID-19 pandemic. BMJ Global Health 2022;0:e008149. doi:10.1136/bmjgh-2021-008149
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To identify common social media misconceptions about COVID-19 vaccination in pregnancy, explain the spread of misinformation, and identify solutions to guide clinical practice and policy.
Methodology: A systematic review was conducted, and the databases Embase and Medline were searched from December 2019 until February 8, 2023, using terms related to social media, pregnancy, COVID-19 vaccines and misinformation. Studies were included if they were original research and discussed misinformation about COVID-19 vaccination during pregnancy on social media. Studies were excluded if they were review articles, had no full text available, or were not published in English. Two independent reviewers conducted screening, extraction, and quality assessment.
Results: Our search identified 76 articles, of which 3 fulfilled the eligibility criteria. Included studies were of moderate and high quality. The social media platforms investigated included Facebook, Google Searches, Instagram, Reddit, TikTok, and Twitter. Misinformation was related to concerns regarding vaccine safety and its association with infertility. Misinformation was increased by a lack of content monitoring on social media, the exclusion of pregnant women from early vaccine trials, a lack of information from reputable health sources on social media, and other factors. Suggested solutions were directed at pregnancy care providers (PCPs) and public health/government. Suggestions included integrating COVID-19 vaccination information into antenatal care, increasing the social media presence of PCPs and public health bodies to disseminate information, addressing population-specific vaccine concerns in a culturally relevant manner, and others.
Conclusion: Increased availability of information from reputable health sources through multiple channels could increase COVID-19 vaccine uptake in the pregnant population and help combat misinformation.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)
Abstract
The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.
For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.
The Instagram posts in this dataset are present in 161 different languages, of which the top 10 in terms of frequency are English (343,041 posts), Spanish (30,220 posts), Hindi (15,832 posts), Portuguese (15,779 posts), Indonesian (11,491 posts), Tamil (9,592 posts), Arabic (9,416 posts), German (7,822 posts), Italian (5,162 posts), and Turkish (4,632 posts).
There are 535,021 distinct hashtags in this dataset, with the top 10 in terms of frequency being #covid19 (169,865 posts), #covid (132,485 posts), #coronavirus (117,518 posts), #covid_19 (104,069 posts), #covidtesting (95,095 posts), #coronavirusupdates (75,439 posts), #corona (39,416 posts), #healthcare (38,975 posts), #staysafe (36,740 posts), and #coronavirusoutbreak (34,567 posts).
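Hashtag frequencies of this kind (number of posts containing each hashtag) can be recomputed from the post descriptions. The following is a minimal Python sketch, not the author's method: the regular expression, the lowercasing step, and the function name `hashtag_post_counts` are assumptions made for illustration, and the sample strings are made up.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by word characters (letters, digits, underscore).
# For str patterns in Python 3, \w is Unicode-aware, so non-Latin hashtags also match.
HASHTAG_RE = re.compile(r"#\w+")


def hashtag_post_counts(descriptions):
    """Count, for each hashtag, the number of posts that contain it."""
    counts = Counter()
    for text in descriptions:
        # A set ensures a hashtag repeated within one post is counted once.
        # Lowercasing is an assumption; the dataset's own counts may be case-sensitive.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        counts.update(tags)
    return counts


# Made-up descriptions for illustration only (not taken from the dataset).
sample = [
    "Stay home, stay safe #covid19 #StaySafe",
    "Testing site open today #covid19 #covidtesting #covid19",
    "No hashtags here",
]
counts = hashtag_post_counts(sample)
print(counts.most_common(1))
print(len(counts), "distinct hashtags")
```

Counting per post rather than per occurrence matches the "(N posts)" phrasing above; a different tokenization rule would give somewhat different totals.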
The following is a description of the attributes present in this dataset:
- Post ID: Unique ID of each Instagram post
- Post Description: Complete description of each post in the language in which it was originally published
- Date: Date of publication in MM/DD/YYYY format
- Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
- Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
- Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
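As a rough illustration of how these attributes might be consumed, the Python sketch below parses rows with the standard library and converts the MM/DD/YYYY date. It assumes the data is available as CSV with column headers spelled exactly like the attribute names above; the actual file format and header spellings are not stated here, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up rows for illustration only; the column headers are an assumption
# based on the attribute names listed in the dataset description.
SAMPLE_CSV = """Post ID,Post Description,Date,Language code,Full Language,Sentiment
1,Example post about testing,03/15/2020,en,English,neutral
2,Ejemplo de publicación,11/02/2021,es,Spanish,positive
3,Another example post,09/30/2024,en,English,negative
"""


def load_posts(handle):
    """Read posts from a CSV file handle and parse the MM/DD/YYYY date column."""
    rows = []
    for row in csv.DictReader(handle):
        row["Date"] = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        rows.append(row)
    return rows


posts = load_posts(io.StringIO(SAMPLE_CSV))

# Two simple summaries: posts per language code, and posts per (year, sentiment).
by_language = Counter(post["Language code"] for post in posts)
by_year_sentiment = Counter((post["Date"].year, post["Sentiment"]) for post in posts)
print(by_language)
print(by_year_sentiment)
```

For the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the download was saved.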
Open Research Questions
This dataset is expected to be helpful for the investigation of the following research questions and even beyond:
All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study utilized a non-experimental descriptive survey design to examine social media information-seeking behaviors during the COVID-19 outbreak, particularly during lockdown periods. The objectives were to describe perceptions of COVID-19 information on social media, explore the platforms used during lockdown, identify groups of connections on social media, and determine if platform use varied based on connected groups. To gather data on information-seeking behaviors, an online survey was administered via Qualtrics, reaching 1,048 respondents in the United States through non-probability opt-in sampling. The survey included the perceptions of information availability scale (Woolley & Propst, 2005), an information-seeking behavior scale (Timmers & Glas, 2010), and some researcher-adapted Likert-type scales. The results revealed that more than 70% of respondents felt overwhelmed while searching for COVID-19 information, encountered difficulties in accessing and interpreting additional information, and sometimes even avoided news about the pandemic. Among social media platforms, Facebook, Instagram, and Twitter were the most popular for obtaining COVID-19 information. Notably, Facebook emerged as the most widely used platform during lockdowns. Furthermore, respondents primarily utilized Facebook to connect with friends and family during the pandemic, and those with larger social networks tended to access social media platforms more frequently. These findings highlight the significant role of Facebook in disseminating reliable information during the COVID-19 pandemic. They also emphasize the importance of implementing strategies to help individuals navigate the overwhelming amount of information, including misinformation, present on social media platforms, particularly during times of crisis. Generalizability is limited by the US-centric sample.
Data from a global survey held in March 2020 revealed that almost a third of responding Gen Z internet users worldwide felt that social media companies should provide live-streams of events during the coronavirus crisis. However, only 22 percent of Baby Boomer respondents thought the same.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The current dataset contains Tweet IDs for tweets mentioning "COVID" (e.g., COVID-19, COVID19) and shared between March and July of 2020.
Sampling Method: hourly requests sent to Twitter Search API using Social Feed Manager, an open source software that harvests social media data and related content from Twitter and other platforms.
NOTE:
1) In accordance with Twitter API Terms, only Tweet IDs are provided as part of this dataset.
2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs like Hydrator (https://github.com/DocNow/hydrator) or Python library Twarc (https://github.com/DocNow/twarc).
3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because tweets used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV).
4) To broaden this sample, consider comparing/merging this dataset with other COVID-19 related public datasets such as:
https://github.com/thepanacealab/covid19_twitter
https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
https://github.com/echen102/COVID-19-TweetIDs
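Before handing a Tweet ID list to a rehydration tool, it can help to clean the list and split it into fixed-size batches. The Python sketch below covers only that preparation step and makes no API calls; the function names and the default batch size of 100 are assumptions for illustration, and the appropriate batch size depends on the tool or endpoint actually used.

```python
from itertools import islice


def read_tweet_ids(lines):
    """Yield tweet IDs from an iterable of lines, skipping blanks and non-numeric lines."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id.isdigit():
            yield tweet_id


def batched(ids, size=100):
    """Yield successive lists of at most `size` IDs (the size is an assumed default)."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Made-up ID lines for illustration, including a blank line and a header line.
lines = ["1001\n", "1002\n", "\n", "tweet_id\n", "1003\n"]
batches = list(batched(read_tweet_ids(lines), size=2))
print(batches)
```

With a real ID file, `lines` would be an opened text file, and each batch would be passed to whichever rehydration client is in use.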
https://spdx.org/licenses/CC0-1.0.html
Background: COVID-related misinformation is prevalent online, including on social media. The purpose of this study was to explore factors associated with user engagement with COVID-related misinformation on the social media platform TikTok.
Methods: A sample of TikTok videos associated with the hashtag #coronavirus was downloaded on September 20, 2020. Misinformation was evaluated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with number of views and presence of user comments indicating intention to change behavior.
Results: 166 TikTok videos were identified. Moderate misinformation was present in 36 (22%) videos, and high-level misinformation was present in 11 (7%). After controlling for characteristics and content, videos containing moderate misinformation were less likely to generate a user response indicating intended behavior change. By contrast, videos containing high-level misinformation were less likely to be viewed but demonstrated a non-significant trend towards higher engagement among viewers.
Conclusions: COVID-related misinformation is less frequently viewed on TikTok but more likely to engage viewers. Public health authorities can combat misinformation on social media by posting content of their own.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The United States of America
Government censorship (internet shutdowns, blockages, firewalls) imposes significant barriers to the transnational flow of information despite the connective power of digital technologies. In this paper, we examine whether and how information flows across borders despite government censorship. We develop a semi-automated system that combines deep learning and human annotation to find co-occurring content across different social media platforms and languages. We use this system to detect co-occurring content between Twitter and Sina Weibo as Covid-19 spread globally, and we conduct in-depth investigations of co-occurring content to identify those that constitute an inflow of information from the global information ecosystem into China. We find that approximately one-fourth of content with relevance for China that gains widespread public attention on Twitter makes its way to Weibo. Unsurprisingly, Chinese state-controlled media and commercialized domestic media play a dominant role in facilitating these inflows of information. However, we find that Weibo users without traditional media or government affiliation are also an important mechanism for transmitting information into China. These results imply that while censorship combined with media control provides substantial leeway for the government to set the agenda, social media provides opportunities for non-institutional actors to influence the information environment. Methodologically, the system we develop offers a new approach for the quantitative analysis of cross-platform and cross-lingual communication.
Young adulthood represents a sensitive period for young people's mental health. The lockdown restrictions associated with the COVID-19 pandemic have reduced young people's access to traditional sources of mental health support. This exploratory study aimed to investigate the online resources young people were using to support their mental health during the first lockdown period in Ireland. It made use of an anonymous online survey targeted at young people aged 18–25. Participants were recruited using ads on social media including Facebook, Twitter, Instagram, and Snapchat. A total of 393 respondents completed the survey. Many of the respondents indicated that they were using social media (51.4%, 202/393) and mental health apps (32.6%, 128/393) as sources of support. Fewer were making use of formal online resources such as charities (26%, 102/393) or professional counseling services (13.2%, 52/393). Different social media platforms were used for different purposes; Facebook was used for support groups whilst Instagram was used to engage with influencers who focused on mental health issues. Google search, recommendations from peers and prior knowledge of services played a role in how resources were located. Findings from this survey indicate that digital technologies and online resources have an important role to play in supporting young people's mental health. The COVID-19 pandemic has highlighted these digital tools' potential as well as how they can be improved to better meet young people's needs.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The study examines 30 consecutive episodes of Xinwen Lianbo (541 stories), 332 posts on “cctvxwlianbo” WeChat official account, 161 articles on the front page of People’s Daily newspaper in 30 days, as well as 1,015 hashtags and news articles on Sina Weibo’s two categories of ranking (top searched and real-time hot topic).
The study's methodology is related to these publications: An, Seon-Kyoung, and Karla K. Gower. “How Do the News Media Frame Crises? A Content Analysis of Crisis News Coverage.” Public Relations Review 35, no. 2 (2009): 107–12. https://doi.org/10.1016/j.pubrev.2009.01.010. Goodall, Catherine; Sabo, Jason; Cline, Rebecca & Egbert Nichole (2012), Threat, Efficacy, and Uncertainty in the First 5 Months of National Print and Electronic News Coverage of the H1N1 Virus, Journal of Health Communication, 17:3, 338-355, DOI: 10.1080/10810730.2011.626499
According to a survey on the impact of the COVID-19 pandemic in the Middle East and North Africa (MENA) in 2020, there was a 52 percent increase in the revenue of Snapchat during the COVID-19 pandemic. In 2020, the growth curve of the e-commerce industry in the region had accelerated by five years in about five months.
In 2024, Facebook was the leading social media platform for news in Thailand, which was used by 64 percent of respondents. Other leading platforms for news consumption were YouTube and LINE. LINE, a popular messaging app in Thailand, has also recently created the ‘LINE Today’ section as a news source within the LINE app.
Social media as a news source for the younger generation
Social media is an efficient way for audiences to access news and share content, especially since it is less regulated than other forms of media in Thailand. Popular social media platforms among Thais such as Facebook and YouTube are the major sites for audiences to join in on live events and exercise freedom of speech. The LINE messaging application is also the main platform for users to share the news. Additionally, Twitter and TikTok are playing an increasingly significant role in shaping how news and sentiment on recent events are discussed, such as the recent youth-led protests in Thailand. The fast and unregulated form of communication on social media, therefore, has cultivated a preference for online news among the younger generation in Thailand.
The rise of fake news
The ease of sharing news and stories through social media has also led to an issue of fake news in Thailand. Recent events such as the COVID-19 pandemic and the political movements in the country have led to a proliferation of fake news. The COVID-19 pandemic, in particular, has been one of the major causes of the increase in the spread of fake news, especially regarding the information on COVID-19 preventative measures. Since fake news has been rampant in the country, there have been legal restrictions on how content is created and shared in Thailand.