72 datasets found

Facebook users worldwide 2017-2027
statista.com
tokrwards.com
Stacy Jo Dixon, Facebook users worldwide 2017-2027 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users, a new peak, in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here for the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
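As a quick consistency check on the figures quoted in this description, the sketch below backs out the 2023 baseline implied by the stated increase (391 million, +14.36 percent) and adds the increase to see whether the result lands near the 3.1 billion forecast. The variable names are illustrative, and the baseline is derived here rather than quoted from Statista.

```python
# Figures quoted in the description (users, in millions).
increase_millions = 391.0
increase_pct = 14.36

# Implied 2023 baseline, from: increase = baseline * pct / 100.
baseline_2023 = increase_millions / (increase_pct / 100)

# Implied 2027 forecast = baseline + increase.
forecast_2027 = baseline_2023 + increase_millions

print(f"Implied 2023 baseline: {baseline_2023 / 1000:.2f} billion")
print(f"Implied 2027 forecast: {forecast_2027 / 1000:.2f} billion")
```

Under these assumptions the implied forecast rounds to the 3.1 billion stated in the description, so the quoted absolute and relative increases agree with each other.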
Comprehensive Dataset of European Interest Groups Across Social Media...
search.dataone.org
portaldelainvestigacion.uma.es
Updated Dec 30, 2023
Castillo Esparcia, Antonio; Almansa Martinez, Ana; Gorostiza Cervino, Aritz (2023). Comprehensive Dataset of European Interest Groups Across Social Media Platforms: Twitter, Facebook, Instagram, TikTok, and YouTube [Dataset]. http://doi.org/10.7910/DVN/RRYJV7
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/RRYJV7
Dataset updated
Dec 30, 2023
Dataset provided by
Harvard Dataverse
Authors
Castillo Esparcia, Antonio; Almansa Martinez, Ana; Gorostiza Cervino, Aritz
Area covered
YouTube
Description
Introducing a comprehensive and meticulously curated dataset: "European Interest Groups' Social Media Engagement Dataset." This dataset offers a panoramic view of the digital footprint and social media presence of various interest groups within Europe, encompassing a diverse range of platforms including Twitter, Facebook, Instagram, TikTok, and YouTube. These are the variables:
1. Name: the name of the organization
2. twitter_link: the Twitter link, if there is one
3. facebook_link: the Facebook link, if there is one
4. instagram_link: the Instagram link, if there is one
5. tiktok_link: the TikTok link, if there is one
6. linkedin_link: the LinkedIn link, if there is one
7. youtube_link: the YouTube link, if there is one
With a focus on transparency and relevance, this dataset presents a wealth of information that delves into the strategies, content, and reach of interest groups across these dynamic online platforms. Researchers, policymakers, and analysts can explore trends, patterns, and correlations between online activities and real-world influence, shedding light on the evolving landscape of digital interaction within the realm of European interest groups.
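To illustrate how the seven variables listed in this description might be used, here is a minimal sketch that reads rows with those column names and reports which platforms each organization has a link for. The sample rows, the CSV layout, and the helper name `platforms_present` are assumptions made for illustration; the actual file format on Harvard Dataverse may differ.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the description.
SAMPLE = """Name,twitter_link,facebook_link,instagram_link,tiktok_link,linkedin_link,youtube_link
Example Org A,https://twitter.com/example_a,https://facebook.com/example_a,,,,https://youtube.com/@example_a
Example Org B,,,https://instagram.com/example_b,,,
"""

LINK_COLUMNS = [
    "twitter_link", "facebook_link", "instagram_link",
    "tiktok_link", "linkedin_link", "youtube_link",
]

def platforms_present(row):
    """Return the platform names whose link column is non-empty."""
    suffix = "_link"
    return [
        col[: -len(suffix)]
        for col in LINK_COLUMNS
        if (row.get(col) or "").strip()
    ]

presence = {
    row["Name"]: platforms_present(row)
    for row in csv.DictReader(io.StringIO(SAMPLE))
}

for name, platforms in presence.items():
    print(name, "->", ", ".join(platforms) or "no links")
```

An empty cell is treated as "no account on that platform", which matches the "if there is one" wording of the variable list.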

Countries with the most Facebook users 2024

statista.com
tokrwards.com

Stacy Jo Dixon, Countries with the most Facebook users 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

Which country has the most Facebook users?

              There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.

              Facebook – the most used social media

              Meta, the company previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media platform worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.

              Facebook usage by device
              As of July 2021, it was found that 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is not only available through mobile browser as the company has published several mobile apps for users to access their products and services. As of the third quarter 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.

U.S. Facebook data requests from government agencies 2013-2023
statista.com
de.statista.com
Stacy Jo Dixon, U.S. Facebook data requests from government agencies 2013-2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statista (http://statista.com/)
Authors
Stacy Jo Dixon
Description
Facebook received 73,390 user data requests from federal agencies and courts in the United States during the second half of 2023. The social network produced some user data in 88.84 percent of requests from U.S. federal authorities. The United States accounts for the largest share of Facebook user data requests worldwide.
A dataset of media releases (Twitter, News and Comments, Youtube, Facebook)...
data.niaid.nih.gov
Updated Mar 29, 2021
Andrzej Jarynowski (2021). A dataset of media releases (Twitter, News and Comments, Youtube, Facebook) form Poland related to COVID-19 for open research [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3985567
Explore at:
Dataset updated
Mar 29, 2021
Dataset authored and provided by
Andrzej Jarynowski
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Poland, YouTube
Description
Social behavior has a fundamental impact on the dynamics of infectious diseases (such as COVID-19), challenging public health mitigation strategies and possibly the political consensus. The widespread use of traditional and social media on the Internet provides us with an invaluable source of information on societal dynamics during pandemics. With this dataset, we aim to understand the mechanisms of COVID-19 epidemic-related social behavior in Poland by deploying methods of computational social science and digital epidemiology. We collected and analyzed COVID-19 perception on the Polish-language Internet during 15.01-31.07(06.08) and labeled the data quantitatively (Twitter, Youtube, Articles) and qualitatively (Facebook, Articles and Comments of Article) using an infomediological approach.

- manually labelled 1,449 articles / Facebook posts from Lower Silesia (facebook_articles_lower_silesia.zip) and 111 texts from outside this region;

- manually labelled the 1,000 most popular tweets (twits_annotated.xlsx) with the categories is_fake (categorical and numeric), topic and sentiment;

- extracted 57,306 representative articles (articles_till_06_08.zip) in Polish using the Eventregitry.org tool, with the topic "Coronavirus" in the article body;

- extracted 1,015,199 tweets (tweets_till_31_07_users.zip and tweets_till_31_07_text.zip) from #Koronawirus in Polish using the Twitter API;

- collected 1,574 videos (youtube_comments_till_31_07.zip and youtube_movie.csv) with the keyword Koronawirus on YouTube and 247,575 comments on them using the Google API.

We supplemented the media observations with an analysis of 244 social empirical studies till 25.05 on COVID-19 in Poland (empirical_social_studies.csv).

Reports, analyses, and codebooks can be found in Polish at: http://www.infodemia-koronawirusa.pl

Main report (in Polish) https://depot.ceon.pl/handle/123456789/19215
Social media datasets from data donation: Data from WhatsApp, Facebook,...
dataverse.harvard.edu
Updated Jan 15, 2025
Venkata Rama Kiran Garimella (2025). Social media datasets from data donation: Data from WhatsApp, Facebook, YouTube, Telegram and Instagram [Dataset]. http://doi.org/10.7910/DVN/VOFPK1
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/VOFPK1
Dataset updated
Jan 15, 2025
Dataset provided by
Harvard Dataverse
Authors
Venkata Rama Kiran Garimella
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
YouTube
Description
Datasets from the ICWSM dataset paper: "Data Donation on Social Media: Tools and Datasets" The datasets were collected using data donation tools developed by Kiran Garimella's team at Rutgers University.

Number of global social network users 2017-2028

statista.com
grusthub.com

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

How many people use social media?

              Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

              Who uses social media?
              Social networking is one of the most popular digital activities worldwide, and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

              How much time do people spend on social media?
              Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

              What are the most popular social media platforms?
              Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

Dataset - Information Bubble and Learning in the Digital Age
zenodo.org
data.niaid.nih.gov
bin
Updated Sep 22, 2023
Rodrigo Franklin Frogeri; Deusdedit Faria Lopes; Mariana Aranha de Souza (2023). Dataset - Information Bubble and Learning in the Digital Age [Dataset]. http://doi.org/10.5281/zenodo.8368711
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.8368711
Dataset updated
Sep 22, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rodrigo Franklin Frogeri; Deusdedit Faria Lopes; Mariana Aranha de Souza
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset was utilized in the analyses presented in the paper entitled "Information Bubble and Learning in the Digital Age: An Analysis from the Perspective of European and African Students." Details regarding the dataset can be found in the Methodology section of the paper.
Social Media Ad Exposure | 1st Party | 3B+ events verified, US consumers |...
omnitrafficdata.mfour.com
datarade.ai
Updated Aug 11, 2025
MFour (2025). Social Media Ad Exposure | 1st Party | 3B+ events verified, US consumers | Facebook, TikTok, X, Instagram and YouTube [Dataset]. https://omnitrafficdata.mfour.com/
Explore at:
Dataset updated
Aug 11, 2025
Dataset authored and provided by
MFour
Area covered
United States
Description
This dataset encompasses social media exposure to sponsored posts, collected from over 150,000 triple-opt-in first-party U.S. Daily Active Users (DAU). Use it for measurement, attribution or brand lift surveying. Platforms covered include Facebook, TikTok, X, Instagram and YouTube.
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...
search.dataone.org
Updated Sep 24, 2024
Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.7910/DVN/QTJ9HC
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/QTJ9HC
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew
Time period covered
Jan 1, 2024 - May 31, 2024
Area covered
YouTube
Description
Please cite the following paper when using this dataset: N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: http://arxiv.org/abs/2406.07693

Abstract

This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset.

After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e. positive, negative, or neutral, (ii) one of the subjectivity classes, i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes, i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.

Leading social media platforms used by marketers worldwide 2024

statista.com
de.statista.com

Christopher Ross, Leading social media platforms used by marketers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Christopher Ross

Description

During a 2024 survey among marketers worldwide, around 86 percent reported using Facebook for marketing purposes. Instagram and LinkedIn followed, respectively mentioned by 79 and 65 percent of the respondents.

              The global social media marketing segment

              According to the same study, 59 percent of responding marketers intended to increase their organic use of YouTube for marketing purposes throughout that year. LinkedIn and Instagram followed with similar shares, rounding up the top three social media platforms attracting a planned growth in organic use among global marketers in 2024. Their main driver is increasing brand exposure and traffic, which led the ranking of benefits of social media marketing worldwide.

              Social media for B2B marketing

              Social media platform adoption rates among business-to-consumer (B2C) and business-to-business (B2B) marketers vary according to each subsegment's focus. While B2C professionals prioritize Facebook and Instagram – both run by Meta, Inc. – due to their popularity among online audiences, B2B marketers concentrate their endeavors on Microsoft-owned LinkedIn due to its goal to connect people and companies in a corporate context.

UltraLAMBDA
huggingface.co
Updated Jul 19, 2024
Behavior In The Wild (2024). UltraLAMBDA [Dataset]. https://huggingface.co/datasets/behavior-in-the-wild/UltraLAMBDA
Explore at:
Dataset updated
Jul 19, 2024
Authors
Behavior In The Wild
Description
Dataset Summary

UltraLAMBDA is a large-scale dataset of ads sourced from brand videos on platforms such as YouTube and Facebook Ads, as well as from CommonCrawl. The memorability scores for the ads are assigned by our model Henry.

Dataset Structure

    from datasets import load_dataset

    ds = load_dataset("behavior-in-the-wild/UltraLAMBDA")
    ds

    DatasetDict({
        train: Dataset({
            features: ['id', 'memorability'],
            num_rows: 1964
        })
    })

Data… See the full description on the dataset page: https://huggingface.co/datasets/behavior-in-the-wild/UltraLAMBDA.
Social media usage by local government - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Jun 8, 2010
ckan.publishing.service.gov.uk (2010). Social media usage by local government - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/social-media-usage-by-local-government
Explore at:
Dataset updated
Jun 8, 2010
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
A list of UK local authorities which are using social media such as Facebook, Twitter, YouTube. Also includes those with RSS feeds, web development blogs and open data.
daily_socialmedia_engagement
kaggle.com
Updated Feb 27, 2023
Jeel Gajera (2023). daily_socialmedia_engagement [Dataset]. https://www.kaggle.com/datasets/earthian/daily-socialmedia-engagement
Explore at:
Dataset updated
Feb 27, 2023
Dataset provided by
Kaggle
Authors
Jeel Gajera
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset contains information about daily engagement hours on various social media platforms for 1000 users. The data includes user IDs, age, and daily engagement hours on Facebook, Instagram, WhatsApp, Twitter, LinkedIn, Snapchat, and YouTube.
Data from: Using Multistreaming Social Media Video as a Research Method for...
research.usc.edu.au
researchdata.edu.au
Updated Mar 23, 2022
Karen Sutherland; Krisztina Morris (2022). Using Multistreaming Social Media Video as a Research Method for Interview Data Collection [Dataset]. https://research.usc.edu.au/esploro/outputs/dataset/Using-Multistreaming-Social-Media-Video-as/99620208702621
Explore at:
Dataset updated
Mar 23, 2022
Dataset provided by
Sage (http://www.sagepublications.com/)
Authors
Karen Sutherland; Krisztina Morris
Time period covered
2022
Description
This dataset is designed to explore multistreaming social media video as a research method used to collect semi-structured interview data. The data are provided by Dr Karen E. Sutherland and Ms Krisztina Morris from the School of Business and Creative Industries at the University of the Sunshine Coast in Queensland, Australia. The dataset is drawn from the publicly available video recording of an interview undertaken as part of the research project called: ‘Like, Share, Follow’, a multistreaming show, featuring Dr Sutherland interviewing university graduates about their career journeys, that is broadcast across Facebook, LinkedIn, and Twitter and later uploaded to YouTube. This dataset examines how multistreaming video interview data can be used to answer research questions and the benefits and challenges this specific method of data collection can pose in the process of data analysis. The video example is accompanied by a teaching guide and a student guide.
SNA Is is conspiracy or truth.xlsx
figshare.manchester.ac.uk
figshare.com
xlsx
Updated Jan 28, 2022
Beatriz Buarque (2022). SNA Is is conspiracy or truth.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.14115515.v1
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.14115515.v1
Dataset updated
Jan 28, 2022
Dataset provided by
University of Manchester
Authors
Beatriz Buarque
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set has:
- Comments manually collected from a YouTube video containing the 5G conspiracy theory articulated as legitimate truth
- Number of followers and followed Twitter users found on posts that shared the aforementioned video
- Number of posts identified on Facebook sharing the same video and their respective number of followers
Social Media Profile Links by Name
openwebninja.com
json
Updated Feb 2, 2025
OpenWeb Ninja (2025). Social Media Profile Links by Name [Dataset]. https://www.openwebninja.com/api/social-links-search
Explore at:
Available download formats: json
Dataset updated
Feb 2, 2025
Dataset authored and provided by
OpenWeb Ninja
Area covered
Worldwide
Description
This dataset provides comprehensive social media profile links discovered through real-time web search. It includes profiles from major social networks like Facebook, TikTok, Instagram, Twitter, LinkedIn, Youtube, Pinterest, Github and more. The data is gathered through intelligent search algorithms and pattern matching. Users can leverage this dataset for social media research, influencer discovery, social presence analysis, and social media marketing. The API enables efficient discovery of social profiles across multiple platforms. The dataset is delivered in a JSON format via REST API.

Facebook: distribution of global audiences 2024, by age and gender

statista.com
de.statista.com

Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statista (http://statista.com/)

Authors

Stacy Jo Dixon

Description

As of April 2024, men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second largest audience base was men aged 18 to 24 years.

              Facebook connects the world

              Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (all together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: mobile messaging apps WhatsApp and Facebook Messenger, as well as photo-sharing app Instagram.

              Facebook users

              The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also all have more than 100 million Facebook users each.

Spanish Fake News Dataset

zenodo.org
produccioncientifica.ucm.es

csv, txt

Updated Jun 4, 2025

Arsenii Tretiakov; Sergio D'Antonio Maceiras; Alejandro Martín (2025). Spanish Fake News Dataset [Dataset]. http://doi.org/10.5281/zenodo.15592391

Explore at:

Available download formats: txt, csv

Unique identifier

https://doi.org/10.5281/zenodo.15592391

Dataset updated

Jun 4, 2025

Dataset provided by

Zenodo (http://zenodo.org/)

Authors

Arsenii Tretiakov; Sergio D'Antonio Maceiras; Alejandro Martín

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Feb 2021

Description

Spanish Fake News Dataset

This dataset contains a structured and annotated collection of false news items in Spanish (Castilian), gathered and processed for academic research on misinformation.

Dataset Scope

The dataset represents most of the recorded false news messages and their variations up to 01.02.2021.

Content Description

The dataset includes samples of false information in various formats:

News articles and headlines
Tweets and Facebook/Instagram/Telegram posts
YouTube video captions
WhatsApp text and voice message transcripts
Transcribed video/audio fragments with false claims
Fake government documents
Captions from photos and memes
Text extracted from images using OCR

Only Spanish (Castilian) texts were used, excluding regional variants (e.g., Catalan, Basque, Galician) for consistency.

Sources

The data was collected from the following verified fact-checking initiatives:

Fact-checkers from these organizations provide detailed articles identifying and explaining falsehoods, often including:

General context of the event
Quotes or links to false claims
Analysis and explanation of why the claims are false
Verified information or corrections

Collection Method

The dataset was built using both manual extraction (e.g., identifying and quoting false statements) and automated parsing:

MyNews service: an archive of Spanish mass media
Custom scripts: for parsing and extracting structured data
OCR tools: for extracting text from images (e.g., memes and screenshots)

Fields Description

Column Name	Description
Topic	The thematic category of the news item (e.g., Politics, Health, COVID-19, Crime). Normalized and translated to English.
Link source	URL to the original news piece, fact-check report, or source of the claim. Invalid links were removed.
Media	The platform or outlet where the false claim appeared (e.g., Facebook, YouTube, WhatsApp). Normalized for consistent spelling and language.
Date	Publication or verification date of the news item, in YYYY-MM-DD format.
Author	(Optional) Author of the news or platform source, if available. May be empty.
Headlines	Title or summary of the news item or article containing the false information.
Fake statement	Quoted false claim or misinformation as cited in the verification article.
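The field list above can be read as a simple record schema. The sketch below parses a few rows with those column names, converts the Date field from its stated YYYY-MM-DD format, treats an empty Author as missing, and counts items per Media value. The sample rows, the comma-separated layout, and the helper name `load_rows` are assumptions for illustration; the published csv and txt files may be organized differently.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical placeholder rows; only the column names come from the field table.
SAMPLE = """Topic,Link source,Media,Date,Author,Headlines,Fake statement
Health,https://example.org/check-1,WhatsApp,2020-04-02,,Placeholder headline one,Placeholder claim one
Politics,https://example.org/check-2,Facebook,2020-11-15,Example Author,Placeholder headline two,Placeholder claim two
Health,https://example.org/check-3,WhatsApp,2021-01-20,,Placeholder headline three,Placeholder claim three
"""

# The description states the dataset covers items up to 01.02.2021.
CUTOFF = date(2021, 2, 1)

def load_rows(text):
    """Parse rows, converting Date to a date object and empty Author to None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Date"] = date.fromisoformat(row["Date"])
        row["Author"] = row["Author"] or None  # Author is optional
        rows.append(row)
    return rows

rows = load_rows(SAMPLE)
by_media = Counter(row["Media"] for row in rows)
all_in_scope = all(row["Date"] <= CUTOFF for row in rows)

print(by_media.most_common())
print("All rows within stated scope:", all_in_scope)
```

Checking dates against the stated cutoff is a cheap sanity test to run before any per-platform or per-topic analysis.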

⚠️ Notes

The dataset was preprocessed to remove duplicates, invalid links, and non-textual clutter.
Field values were normalized to support multilingual and cross-platform analysis.
Only Castilian Spanish was retained for consistency and clarity.

📚 License & Use

This dataset is intended for non-commercial academic and research purposes. Please cite the original fact-checking organizations and this dataset if used in publications or analysis.

Data from: Five Years of COVID-19 Discourse on Instagram: A Labeled...
zenodo.org
data.niaid.nih.gov
bin
Updated Oct 21, 2024
Nirmalya Thakur, Ph.D. (2024). Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.13896353
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.13896353
Dataset updated
Oct 21, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nirmalya Thakur, Ph.D.
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 6, 2024
Description
Please cite the following paper when using this dataset:

N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

Abstract

The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.

For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

The Instagram posts in this dataset span 161 different languages, of which the 10 most frequent are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), and Turkish (4632 posts).

There are 535,021 distinct hashtags in this dataset. The 10 most frequent are #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), and #coronavirusoutbreak (34567 posts).

The following is a description of the attributes present in this dataset:

Post ID: Unique ID of each Instagram post

Post Description: Complete description of each post in the language in which it was originally published

Date: Date of publication in MM/DD/YYYY format

Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API

Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API

Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
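
As a minimal sketch of how these attributes might be read, the following Python snippet parses rows with the six columns described above and converts the Date field from MM/DD/YYYY. The column headers are assumed to match the attribute names exactly, and the two sample rows are invented placeholders, not records from the dataset. The actual file name, file format, and header spelling should be checked against the Zenodo download.

```python
import csv
import io
from datetime import datetime

# Placeholder rows for illustration only; they are not taken from the dataset.
# Header names are assumed to match the attribute names listed above.
SAMPLE_CSV = """Post ID,Post Description,Date,Language code,Full Language,Sentiment
1,example post text,01/15/2020,en,English,neutral
2,texto de ejemplo,09/30/2024,es,Spanish,positive
"""

def read_posts(handle):
    """Yield one dict per post, with Date parsed from MM/DD/YYYY."""
    for row in csv.DictReader(handle):
        row["Date"] = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        yield row

posts = list(read_posts(io.StringIO(SAMPLE_CSV)))
```

For a real file, `read_posts` could be given an open file handle (opened with `newline=""` and a UTF-8 encoding) in place of the in-memory sample.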

Open Research Questions

This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

How does sentiment toward COVID-19 vary across different languages?

How has public sentiment toward COVID-19 evolved from 2020 to the present?

How do cultural differences affect social media discourse about COVID-19 across various languages?

How has COVID-19 impacted mental health, as reflected in social media posts across different languages?

How effective were public health campaigns in shifting public sentiment in different languages?

What patterns of vaccine hesitancy or support are present in different languages?

How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?

What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?

How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?

What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?
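
Several of the questions above, such as how sentiment varies across languages or over time, start from the same grouping step. The sketch below shows one way that step might look, assuming each post is a dictionary keyed by the attribute names described earlier and that Date is still an MM/DD/YYYY string. The records in `demo` are invented placeholders and `sentiment_shares` is a hypothetical helper, not part of the dataset or the paper.

```python
from collections import Counter, defaultdict

def sentiment_shares(posts, key):
    """Return {group: {label: fraction}}, where group = key(post)."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[key(post)][post["Sentiment"]] += 1
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }

# Placeholder records for illustration only; not taken from the dataset.
demo = [
    {"Date": "03/01/2020", "Language code": "en", "Sentiment": "negative"},
    {"Date": "04/12/2020", "Language code": "en", "Sentiment": "neutral"},
    {"Date": "06/20/2024", "Language code": "es", "Sentiment": "positive"},
]

# Group by language code, then by year (the last four characters of MM/DD/YYYY).
by_language = sentiment_shares(demo, key=lambda p: p["Language code"])
by_year = sentiment_shares(demo, key=lambda p: p["Date"][-4:])
```

Passing a different `key` function (for example, year and language together) gives finer groupings without changing the helper.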

All the Instagram posts collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view them (at the time of writing this paper).


