Which country has the most Facebook users?
There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.
Facebook – the most used social media
Meta, the company that was previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media platform worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.
Facebook usage by device
As of July 2021, 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, almost 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is available not only through mobile browsers: the company has also published several mobile apps through which users can access its products and services. As of the third quarter of 2021, the four core Meta products were leading the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America spent the most time per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
Instagram’s most popular post
As of April 2024, the most popular post on Instagram was a photo of Lionel Messi and his teammates after winning the 2022 FIFA World Cup with Argentina, posted by the account @leomessi. Messi's post, which racked up over 61 million likes within a day, displaced the previous record holder, 'Photo of an Egg'. Originally posted in January 2019, 'Photo of an Egg' had itself overtaken the most popular Instagram post at that time, a photo of Kylie Jenner’s daughter with a total of 18 million likes.
After several cryptic posts published by the account, World Record Egg revealed itself to be a part of a mental health campaign aimed at the pressures of social media use.
Instagram’s most popular accounts
As of April 2024, the official Instagram account @instagram had the most followers of any account on the platform, with 672 million followers. Portuguese footballer Cristiano Ronaldo (@cristiano) was the most followed individual with 628 million followers, while Selena Gomez (@selenagomez) was the most followed woman on the platform with 429 million. Additionally, Inter Miami CF striker Lionel Messi (@leomessi) had a total of 502 million. Celebrities such as The Rock, Kylie Jenner, and Ariana Grande all had over 380 million followers each.
Instagram influencers
In the United States, the leading content category of Instagram influencers was lifestyle, with 15.25 percent of influencers creating lifestyle content in 2021. Music ranked in second place with 10.96 percent, followed by family with 8.24 percent. Having a large audience can be very lucrative: Instagram influencers in the United States, Canada and the United Kingdom with over 90,000 followers made around 1,221 US dollars per post.
Instagram around the globe
Instagram’s worldwide popularity continues to grow, and India is the leading country in terms of number of users, with over 362.9 million users as of January 2024. The United States had 169.65 million Instagram users and Brazil had 134.6 million users. The social media platform was also very popular in Indonesia and Turkey, with 100.9 million and 57.1 million users, respectively. As of January 2024, Instagram was the fourth most popular social network in the world, behind Facebook, YouTube and WhatsApp.
https://creativecommons.org/publicdomain/zero/1.0/
Top 50 accounts with most followers across major social media platforms
YouTube - as of 24 September 2021
Instagram - as of 26 September 2021
Twitter - as of 25 September 2021
TikTok - as of 23 September 2021
Facebook - as of 20 September 2021
Twitch - as of 21 September 2021
Wikipedia
A global survey conducted in the third quarter of 2024 found that the main reason for using social media was to keep in touch with friends and family, with over 50.8 percent of social media users saying this was their main reason for using online networks. Overall, 39 percent of social media users said that filling spare time was their main reason for using social media platforms, whilst 34.5 percent of respondents said they used it to read news stories. Less than one in five users were on social platforms for the reason of following celebrities and influencers.
The most popular social network
Facebook dominates the social media landscape. The world's most popular social media platform turned 20 in February 2024, and it continues to lead the way in terms of user numbers. As of February 2025, the social network had over three billion global users. YouTube, Instagram, and WhatsApp follow, but none of these well-known brands can surpass Facebook’s audience size.
Moreover, as of the final quarter of 2023, there were almost four billion Meta product users.
Ever-evolving social media usage
Using social media remains largely free of charge; however, companies have been encouraging users to become paid subscribers to reduce their dependence on advertising revenue. Meta Verified entices users by offering a blue verification badge and proactive account protection, among other things. X (formerly Twitter), Snapchat, and Reddit also offer users the chance to upgrade their social media accounts for a monthly fee.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
This dataset explores the impact of social media usage on suicide rates, presenting an analysis based on social media platform data and WHO suicide rate statistics. It is an insightful resource for researchers, data scientists, and analysts looking to understand the correlation between increased social media activity and suicide rates across different regions and demographics.
The dataset includes the following key sources:
WHO Suicide Rate Data (SDGSUICIDE): Retrieved from WHO data export, which tracks global suicide rates.
Social Media Usage Data: Information from major social media platforms, sourced from Kaggle, supplemented with data from:
We would like to acknowledge:
World Health Organization (WHO): For providing global suicide rate data, accessible under their data policy (WHO Data Policy).
Kaggle Dataset Contributors: For social media usage data that played a crucial role in the analysis.
This dataset is useful for studying the potential social factors contributing to suicide rates, especially the role of social media. Analysts can explore correlations using time-series analysis, regression models, or other statistical tools to derive meaningful insights. Please ensure compliance with the Creative Commons Attribution Non-Commercial Share Alike 4.0 International License (CC BY-NC-SA 4.0).
Impact-of-social-media-on-suicide-rates-results-1.1.0.zip (90.9 kB) Contains processed results and supplementary data.
If you use this dataset in your work, please cite:
Martin Winkler. (2021). Impact of social media on suicide rates: produced results (1.1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4701587 https://zenodo.org/records/4701587
This dataset is released under the Creative Commons Attribution Non-Commercial Share Alike 4.0 International (CC BY-NC-SA 4.0) license. You are free to share and adapt the material, provided proper attribution is given, it's not used for commercial purposes, and any derivatives are distributed under the same license.
Year: The year of the recorded data.
Sex: Demographic indicator (e.g., male, female).
Suicide Rate % Change Since 2010: Percentage change in suicide rates compared to the year 2010.
Twitter User Count % Change Since 2010: Percentage change in Twitter user counts compared to the year 2010.
Facebook User Count % Change Since 2010: Percentage change in Facebook user counts compared to the year 2010.
The dataset includes categorized data ranges, allowing for analysis of trends within specified intervals. For example, ranges for suicide rates, Twitter user counts, and Facebook user counts are represented in bins for better granularity.
The dataset summarizes counts for various intervals, enabling researchers to identify trends and patterns over time, highlighting periods of significant change or stability in both suicide rates and social media usage.
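As a rough illustration of how the "% Change Since 2010" columns relate to raw yearly values, and of how a correlation between two such columns could be computed, here is a minimal Python sketch. The function names and the toy numbers are assumptions made for illustration only; they are not taken from the dataset or from the author's own analysis.

```python
from math import sqrt


def pct_change_since(values_by_year, base_year=2010):
    """Return {year: percentage change relative to the base year}."""
    base = values_by_year[base_year]
    return {year: (value - base) / base * 100.0
            for year, value in values_by_year.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up placeholder numbers, NOT values from the dataset.
toy_users = {2010: 100.0, 2011: 150.0, 2012: 200.0}
toy_rate = {2010: 10.0, 2011: 10.5, 2012: 11.0}

users_change = pct_change_since(toy_users)
rate_change = pct_change_since(toy_rate)
years = sorted(toy_users)
r = pearson([users_change[y] for y in years],
            [rate_change[y] for y in years])
```

With real data, the same two helpers could be applied separately per value of the Sex column, so that each demographic group gets its own coefficient.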
This dataset can be used for:
Statistical analysis to understand correlations between social media usage and mental health outcomes.
Academic research focused on public health, psychology, or sociology.
Policy-making discussions aimed at addressing mental health concerns linked to social media.
The dataset contains sensitive information regarding suicide rates. Users should handle this data with care and sensitivity, considering ethical implications when presenting findings.
https://dataful.in/terms-and-conditions
This dataset presents year and month wise enforcement actions taken by Significant Social Media Intermediaries (SSMIs) from 2021 to the present, compiled from the mandatory monthly transparency reports published under Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021. It includes counts of content removed, accounts suspended or banned, and chatrooms, comments, edit profiles and livestreams restricted, along with the policy or violation category (e.g., child sexual exploitation, terrorism, hate speech, bullying, violence, regulated goods, misinformation, etc.).
To enable comparability across platforms with different reporting terms, the dataset uses a standardised enforcement classification:
The type of action taken:
a. Content Actioned (any enforcement such as warning, downranking, age-gating)
b. Content Removed (content deleted or made inaccessible)
c. Account Banned (account suspension or disabling)
d. Quality Metric (AI moderation accuracy indicators reported by some platforms)
Whether the platform identified and enforced before user reports:
a. Proactive = Found via automated detection or internal review systems
b. Unknown = Platform did not specify proactive vs reactive
Notes: 1. SSMI denotes Significant Social Media Intermediaries: intermediaries with over 50,00,000 (5 million) registered users in India that primarily or solely enable online interaction between two or more users and allow them to create, upload, share, disseminate, modify or access information using their services.
Facebook & Instagram (Meta) a. Content Actioned counts any enforcement, not only removals (e.g., removals, warning screens/covering, age gates, downranking). b. Proactive Rate = (items found & actioned proactively) ÷ (total content actioned).
X/Twitter a. Child Sexual Exploitation and terrorism suspensions are largely proactive, flagged using proprietary tools and industry hash-sharing systems. b. Data reflects global enforcement, not only India.
Google / YouTube a. Number of removal actions as a result of automated detection captures actions triggered by automated systems (ML + human-trained models).
ShareChat a. Content Removed / Taken Down / UGC discard / Comments/Chatrooms deleted are standardised as Content Removed. b. Also includes rights-holder reporting workflow for copyright/IP and automated proactive monitoring for harmful content.
WhatsApp a. Reports Proactively Banned Accounts, meaning accounts banned before any user reports.
Koo a. Distinguishes between Content Removed, Content Actioned (flagged/downranked), and Account Banned. b. Automation Correct/Wrong reflect AI moderation accuracy, not enforcement outcomes.
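The standardisation and the Proactive Rate formula in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup table covers just the ShareChat terms named in the notes, and the function names and the "Unmapped" fallback are hypothetical, not part of the dataset.

```python
# Hypothetical lookup from platform wording to the standardised action type.
# Only the ShareChat terms listed in the notes are included here.
STANDARD_ACTION = {
    "content removed": "Content Removed",
    "taken down": "Content Removed",
    "ugc discard": "Content Removed",
    "comments/chatrooms deleted": "Content Removed",
}


def standardise_action(reported_term):
    """Map a platform-specific term to a standard action type.

    Returns "Unmapped" (an illustrative fallback) for unknown terms.
    """
    return STANDARD_ACTION.get(reported_term.strip().lower(), "Unmapped")


def proactive_rate(proactive_items, total_actioned):
    """Proactive Rate = proactively found and actioned items / total content actioned."""
    if total_actioned <= 0:
        raise ValueError("total_actioned must be positive")
    if not 0 <= proactive_items <= total_actioned:
        raise ValueError("proactive_items must lie between 0 and total_actioned")
    return proactive_items / total_actioned
```

For example, `proactive_rate(75, 100)` gives 0.75, i.e. three quarters of the actioned items were found before any user report.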
During a 2024 survey among marketers worldwide, around 86 percent reported using Facebook for marketing purposes. Instagram and LinkedIn followed, respectively mentioned by 79 and 65 percent of the respondents.
The global social media marketing segment
According to the same study, 59 percent of responding marketers intended to increase their organic use of YouTube for marketing purposes throughout that year. LinkedIn and Instagram followed with similar shares, rounding up the top three social media platforms attracting a planned growth in organic use among global marketers in 2024. Their main driver is increasing brand exposure and traffic, which led the ranking of benefits of social media marketing worldwide.
Social media for B2B marketing
Social media platform adoption rates among business-to-consumer (B2C) and business-to-business (B2B) marketers vary according to each subsegment's focus. While B2C professionals prioritize Facebook and Instagram – both run by Meta, Inc. – due to their popularity among online audiences, B2B marketers concentrate their endeavors on Microsoft-owned LinkedIn due to its goal to connect people and companies in a corporate context.
During a January 2024 global survey among marketers, nearly 60 percent reported plans to increase their organic use of YouTube for marketing purposes in the following 12 months. LinkedIn and Instagram followed, respectively mentioned by 57 and 56 percent of the respondents intending to use them more. According to the same survey, Facebook was the most important social media platform for marketers worldwide.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Researcher(s): Alexandros Mokas, Eleni Kamateri
Supervisor: Ioannis Tsampoulatidis
This repository contains 3 social media datasets:
2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:
The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 5,695,253 social media posts collected from the Twitter platform, based on the initial version of search criteria relevant to UC2 defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia, with collection running from 26 June 2021 to March 2023.
The UC5 dataset, containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. This dataset contains a total of 58,143 social media posts collected from the Twitter and Instagram platforms (12,881 from Twitter and 45,262 from Instagram), based on the initial version of search criteria relevant to UC5 defined by MURMURATION SAS, focused on the region of Brazil, with collection running from 26 June 2021 to March 2023.
1 Annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:
The annotated UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from the Twitter platform, focused on the region of Somalia and covering 1 January 2010 to 31 December 2022.
For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
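The three-step analysis described above could be organised roughly as in the following Python sketch. The source does not name the web services or their interfaces, so the three callables passed to `preprocess`, the `ProcessedPost` record, and all field names are hypothetical stand-ins rather than the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

VALID_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class ProcessedPost:
    """Hypothetical record holding the output of the three steps."""
    text: str
    location: Optional[str]
    concepts: List[str] = field(default_factory=list)
    sentiment: str = "neutral"


def preprocess(text: str,
               image_urls: List[str],
               extract_location: Callable[[str], Optional[str]],
               extract_concepts: Callable[[str], List[str]],
               classify_sentiment: Callable[[str], str],
               top_k: int = 10) -> ProcessedPost:
    # Step 1: location extracted from the post text.
    location = extract_location(text)
    # Step 2: keep at most the top_k concepts for each image.
    concepts: List[str] = []
    for url in image_urls:
        concepts.extend(extract_concepts(url)[:top_k])
    # Step 3: sentiment of the text, restricted to the three classes.
    sentiment = classify_sentiment(text)
    if sentiment not in VALID_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return ProcessedPost(text, location, concepts, sentiment)
```

Passing the services in as callables keeps the sketch independent of any particular location, concept, or sentiment service, and makes it straightforward to substitute stubs when testing.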
After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.
The dataset is provided by INFALIA. INFALIA, being a spin-off of the CERTH institute and a partner in an EU research project, releases this dataset containing Tweet IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube project. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.
The global social media penetration rate was forecast to increase continuously between 2024 and 2028 by a total of 11.6 percentage points (+18.19 percent). After the ninth consecutive year of growth, the penetration rate is estimated to reach 75.31 percent, a new peak, in 2028. Notably, the social media penetration rate has increased continuously over the past years.
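The figures in this forecast can be cross-checked with a little arithmetic. The short Python sketch below assumes that 11.6 is an increase in percentage points and 75.31 a rate in percent, which the text implies but does not state; the variable names are illustrative.

```python
peak_2028 = 75.31        # forecast penetration rate in 2028 (from the text)
increase_points = 11.6   # forecast increase between 2024 and 2028 (from the text)

# Starting level implied by the two figures above.
implied_2024 = peak_2028 - increase_points

# Relative growth over the period, in percent.
relative_increase_pct = increase_points / implied_2024 * 100
```

The implied 2024 level is about 63.7, and the relative increase comes out at roughly 18.2 percent, close to the stated +18.19 percent; the small gap is presumably down to rounding in the published figures.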
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset consists of YouTube video and comment metadata for COVID-19 related videos in January 2021.
Social media is becoming a popular data source for analysts to observe, manipulate, and model. This is especially true for data related to COVID-19. Users can easily find a variety of COVID-19 related social media data on Kaggle, such as datasets of Tweets.
However, despite YouTube being one of the most used social media platforms, there aren't many datasets containing metadata for YouTube videos and comments. This dataset aims to provide usable COVID-19 related text data from YouTube.
The videos in this dataset are between January 1st, 2021 and January 30th, 2021. The dataset contains the most-viewed videos that were related to at least one of the following search queries:
For this project, I developed youcos, a simple Python package for collecting and saving YouTube video and comment data through the YouTube v3 API. Feel free to contribute to the project or use the package to collect your own data!
The data was primarily acquired through the YouTube v3 API. You can read their Terms of Service here
Here are some possible next steps for this data:
- Perform sentiment analysis for the videos and comments
- Compare sentiments for COVID related posts with Twitter, Reddit, and other social media platforms
- Predict the number of comments, views, and likes/dislikes a video will have based on its Title
- Predict the number of likes a comment will receive based on its text and sentiments
Future datasets / additions will include data collected based on a specific time frame, location, and view counts.
http://rdm.uva.nl/en/support/confidential-data.html
This data set belongs to: Pouwels, J. L., Valkenburg, P. M., Beyens, I., van Driel, I. I., & Keijsers, L. (2021). Some Socially Poor But Also Some Socially Rich Adolescents Feel Closer To Their Friends After Using Social Media. The preregistration of the design, sampling and analysis plan of the study (https://osf.io/hxf7t), as well as all syntax files and online supplemental materials, are available on the Open Science Framework (OSF) at https://osf.io/9ry7j/. The .csv files are used for the analyses in R; the .dat files are used for the analyses in Mplus. The variable names of the .dat file can be found in the Mplus input file. For more information, please contact the authors at j.l.pouwels@uva.nl or info@project-awesome.nl.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109
Abstract
The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore, this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data management.
Data Description
The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.
The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.
Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)
Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)
Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)
Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)
Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)
Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)
Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)
Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)
Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)
The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.
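Before hydrating, it can be useful to confirm that the downloaded files match the counts listed above. The Python sketch below assumes each .txt file holds one Tweet ID per line, which the description does not state explicitly; the helper names are illustrative and are not part of the dataset or of the Hydrator application.

```python
from pathlib import Path

# Number of Tweet IDs per file, as listed in the overview above.
EXPECTED_COUNTS = {
    "TweetIDs_November_2021.txt": 1283,
    "TweetIDs_December_2021.txt": 10545,
    "TweetIDs_January_2022.txt": 23078,
    "TweetIDs_February_2022.txt": 4751,
    "TweetIDs_March_2022.txt": 3434,
    "TweetIDs_April_2022.txt": 3355,
    "TweetIDs_May_2022.txt": 3120,
    "TweetIDs_June_2022.txt": 2361,
    "TweetIDs_July_2022.txt": 1057,
}


def load_tweet_ids(path):
    """Read Tweet IDs, assuming one ID per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_counts(directory):
    """Return {filename: (found, expected)} for every file whose count differs.

    A missing file is reported with a found count of 0.
    """
    mismatches = {}
    for name, expected in EXPECTED_COUNTS.items():
        file_path = Path(directory) / name
        found = len(load_tweet_ids(file_path)) if file_path.exists() else 0
        if found != expected:
            mismatches[name] = (found, expected)
    return mismatches


# The per-file counts add up to the stated total of 52,984 Tweet IDs.
total_expected = sum(EXPECTED_COUNTS.values())
```

An empty dictionary returned by `check_counts` would indicate that all nine files are present with the listed number of IDs.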
Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development
Terminology | List of synonyms and terms
COVID-19 | Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus
online learning | online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures
http://www.gnu.org/licenses/gpl.html
This dataset contains links to 486 newsroom posts published by five prominent digital platforms (Facebook, Tinder, YouTube, TikTok, and Twitter). Each record includes the name of the platform publishing the post, the date, the title of the post, and a URL to its original location on the official website of the platform in question.
All the newsroom posts collected were identified through content analysis as relating to harm or safety and were published between November 1, 2016 and November 1, 2021 on each platform’s newsroom or company blog. The outlets included the Twitter Blog, TikTok Newsroom, the YouTube Official Blog, Tinder Newsroom, and the Facebook (now Meta) Newsroom. While several of the platforms published newsroom posts in a range of languages, our sample only includes posts that were published in English.
https://creativecommons.org/publicdomain/zero/1.0/
By Twitter [source]
This dataset provides a comprehensive understanding of Jojo Siwa's Twitter content and engagement. Through detailed analysis, we can see the types of messages she posts, which ones receive the most likes or comments, and other insights into her social media strategy. The data spans from November 2018 to March 2021 and includes over 2,500 tweets posted by Siwa during this period, including text-based messages, images, quotes, outlinks, as well as other media. Using this data allows us to gain insight into how her posts are received by her followers on Twitter compared to other platforms. Our dataset’s columns include information on number of likes/retweets each tweet has received as well as whether the message contained any media like images or videos along with its captions. Utilize this powerful data set today to get a better handle on Jojo Siwa's popular presence on social media!
Welcome to the Jojo Siwa’s Twitter Insights dataset, where you can gain an insight into how she is engaging with her fanbase on social media. This dataset provides valuable information about her posts—including likes, media, and engagement statistics—and covers over two years of data for easy analysis. Here’s a quick guide to using this dataset:
- Examining engagement rates of different visual media types used in Jojo Siwa’s tweets (photos, graphics, GIFs, etc.), to determine what type of content has the highest engagement.
- Analyzing which topics and themes are the most popular among Jojo Siwa’s followers by examining the hashtags and keywords used in her tweets.
- Investigating trends in engagement throughout different periods of time by analyzing how much each tweet was liked compared to how many followers she has at that period of time
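The first idea above, comparing engagement across media types, could start from something like the following Python sketch. The dataset's actual column names are not given here, so the `media_type` and `likes` keys, like the function name, are assumptions that would need to be adapted to the real file.

```python
from collections import defaultdict


def mean_likes_by_media(tweets):
    """Average number of likes per media type.

    Each tweet is a dict with hypothetical keys "media_type" and "likes";
    tweets without media are grouped under "none".
    """
    totals = defaultdict(lambda: [0, 0])  # media type -> [sum of likes, tweet count]
    for tweet in tweets:
        media = tweet.get("media_type") or "none"
        totals[media][0] += tweet["likes"]
        totals[media][1] += 1
    return {media: total / count for media, (total, count) in totals.items()}
```

Dividing each tweet's likes by the follower count at the time it was posted would turn this into the engagement rate mentioned in the third idea, provided follower counts are available from another source.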
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
If you use this dataset in your research, please credit Twitter.
By downloading the data, you agree with the terms & conditions mentioned below:
Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.
Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.
We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.
Citation
Please cite our work as
@InProceedings{clef-checkthat:2022:task3,
author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
year = {2022},
booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
series = {CLEF~'2022},
address = {Bologna, Italy},}
@article{shahi2021overview,
title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
journal={Working Notes of CLEF},
year={2021}
}
Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.
Task 3: Multi-class fake news detection of news articles (English). Sub-task A frames fake news detection as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1,264 English-language articles with their respective labels. Our definitions for the categories are as follows:
False - The main claim made in an article is untrue.
Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.
True - This rating indicates that the primary elements of the main claim are demonstrably true.
Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.
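The category definitions above imply that ratings from different fact-checking services are collapsed into four labels. A minimal sketch of such a normalisation follows. Only the ratings named in the definitions are mapped; the function name, the label spellings, and the fallback to "other" for unknown ratings are assumptions, not the organisers' actual procedure.

```python
# Ratings the definitions above place under "Partially False".
PARTIALLY_FALSE_RATINGS = {
    "partially false",
    "partially true",
    "mostly true",
    "miscaptioned",
    "misleading",
}


def normalise_rating(raw_rating: str) -> str:
    """Map a fact-checker rating onto one of the four task labels."""
    key = raw_rating.strip().lower()
    if key in PARTIALLY_FALSE_RATINGS:
        return "partially false"
    if key in ("true", "false"):
        return key
    # Assumption: anything unrecognised is treated as "other".
    return "other"
```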
Cross-Lingual Task (German)
Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer learning to build a classification model for the German language.
Input Data
The data will be provided with the columns Id, title, text, rating, and domain; the description of the columns is as follows:
Output data format
Sample File
public_id, predicted_rating
1, false
2, true
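A submission in the sample layout above could be produced along these lines. This is a sketch under assumptions: the helper name is hypothetical, the set of accepted label spellings is inferred from the category definitions and the two sample rows, and the comma-plus-space separator simply mirrors the sample.

```python
# Assumed label spellings; the sample file only confirms "true" and "false".
ALLOWED_RATINGS = {"true", "false", "partially false", "other"}


def format_submission(predictions: list[tuple[int, str]]) -> str:
    """Render (public_id, predicted_rating) pairs in the sample layout."""
    lines = ["public_id, predicted_rating"]
    for public_id, rating in predictions:
        label = rating.strip().lower()
        if label not in ALLOWED_RATINGS:
            raise ValueError(f"unexpected rating for id {public_id}: {rating!r}")
        lines.append(f"{public_id}, {label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_submission([(1, "false"), (2, "true")]), end="")
```

Rejecting unknown labels before writing the file catches typos early, which matters when the file is scored automatically.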
IMPORTANT!
Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498
Related Work
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Social media platforms play an important role in the polarization of Brazilian society. Especially during the COVID-19 pandemic (2020-2021), these platforms saw a peak of posts about President Bolsonaro's speech and behavior. On the one hand, millions of people support Bolsonaro's attitudes and follow his controversial guidance. On the other hand, a vast number of social media users accuse Bolsonaro of acting against democracy and science.
In this context, this dataset presents a collection of tweets and their linked news articles (including all related media) covering two Brazilian demonstrations, PRO and AGAINST President Bolsonaro's government, held on September 7th and October 2nd of 2021. The dataset contains 4.7M tweets.
Data Collection
We use the Fake-News Crawler library (https://github.com/phillipecardenuto/fakenews-crawler) to collect the tweets and related media. To do so, we provided keywords related to both events and received data during the events and the following days. For instance, some of the keywords used were ‘7deSet’, ‘BolsonaroAte2026’, ‘VemParaRua’, ‘EleNao’, ‘07EuVou’, ‘SupremoÉOPovo’, ‘aculpaédobolsonaro’, ‘2outeuvou’.
Disclaimer: We did not perform any filtering or procedure to assert that all collected data is, in fact, related to the demonstrations; therefore, some of the content of the dataset might not be related to these events.
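Since no relevance filtering was applied, users may want a first-pass check of which tweets actually contain one of the collection keywords. The sketch below is one simple way to do that and is not part of the authors' pipeline. The `text` field name is an assumption about the JSON records, and only the example keywords quoted above are included.

```python
import unicodedata

# Example keywords quoted in the dataset description (not the full list).
KEYWORDS = [
    "7deSet", "BolsonaroAte2026", "VemParaRua", "EleNao",
    "07EuVou", "SupremoÉOPovo", "aculpaédobolsonaro", "2outeuvou",
]


def _fold(text: str) -> str:
    """Normalise accents to composed form and ignore letter case."""
    return unicodedata.normalize("NFC", text).casefold()


_FOLDED_KEYWORDS = [(keyword, _fold(keyword)) for keyword in KEYWORDS]


def matched_keywords(text: str) -> list[str]:
    """Return the keywords that occur as substrings of the text."""
    folded = _fold(text)
    return [keyword for keyword, needle in _FOLDED_KEYWORDS if needle in folded]


def filter_tweets(tweets: list[dict]) -> list[dict]:
    """Keep tweets whose (assumed) 'text' field mentions any keyword."""
    return [tweet for tweet in tweets if matched_keywords(tweet.get("text", ""))]
```

Substring matching is crude: it cannot tell whether a tweet supports or opposes an event, only that a collection keyword appears in it.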
Content
brazilian_demonstration_events.json: It contains the tweet posts, their metadata (e.g., post time, language), and all related media content URLs (i.e., news article link and media links).
Media Content
Due to the social networks' terms of use, we do not make the collected images and videos publicly available. However, additional media content related to one or more events can be provided on request by contacting the authors.
Funding
DéjàVu thematic project, São Paulo Research Foundation (grants 2017/12646-3, 2020/02241-9 and 2020/02211-2)
As of January 2024, #love was the most used hashtag on Instagram, being included in over two billion posts on the social media platform. #Instagood and #instagram were used over one billion times as of early 2024.
During a 2024 survey, 77 percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just 23 percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.
Social media: trust and consumption
Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than 35 percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than 50 percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media.
What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis.
Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers.
Like it or not, reading news on social media is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network.