The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach 3.1 billion users in 2027, a new peak. Notably, the number of Facebook users has also increased continuously over the past years. User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by the same person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see the supplementary notes under details for more information).
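An absolute change and a percent change together imply both endpoints of the forecast, which makes headline figures like these easy to cross-check. The following is a minimal Python sketch; the helper name `implied_levels` is ours for illustration and is not part of Statista's methodology.

```python
def implied_levels(change, pct_change):
    """Return the (start, end) levels implied by an absolute and a percent change."""
    start = change / (pct_change / 100)  # change = start * pct_change / 100
    return start, start + change

# Global forecast quoted above: +391 million users, +14.36 percent (2023 -> 2027).
start_2023, end_2027 = implied_levels(391, 14.36)
print(f"2023: {start_2023:,.0f} million, 2027: {end_2027:,.0f} million")
```

The implied 2027 level of roughly 3,114 million users is consistent with the rounded "3.1 billion" in the text.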
https://academictorrents.com/nolicensespecified
171 million names (100 million unique). This torrent contains: the URL of every searchable Facebook user's profile; the name of every searchable Facebook user, both unique and by count (perfect for post-processing, data mining, etc.); processed lists, including first names with count, last names with count, potential usernames with count, etc.; and the programs I used to generate everything. So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)
Limitations
So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an SSH account and Nmap installed. An additional limitation is that these are on
Which country has the most Facebook users?
There are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. To put this into context, if India’s Facebook audience were a country then it would be ranked third in terms of largest population worldwide. Apart from India, there are several other markets with more than 100 million Facebook users each: The United States, Indonesia, and Brazil with 193.8 million, 119.05 million, and 112.55 million Facebook users respectively.
Facebook – the most used social media
Meta, the company previously called Facebook, owns four of the most popular social media platforms worldwide: WhatsApp, Facebook Messenger, Facebook, and Instagram. As of the third quarter of 2021, there were around 3.5 billion cumulative monthly users of the company’s products worldwide. With around 2.9 billion monthly active users, Facebook is the most popular social media platform worldwide. With an audience of this scale, it is no surprise that the vast majority of Facebook’s revenue is generated through advertising.
Facebook usage by device
As of July 2021, 98.5 percent of active users accessed their Facebook account from mobile devices. In fact, roughly 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone. Facebook is available not only through mobile browsers: the company has also published several mobile apps for users to access its products and services. As of the third quarter of 2021, the four core Meta products led the ranking of most downloaded mobile apps worldwide, with WhatsApp amassing approximately six billion downloads.
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Facebook is becoming an essential tool for more than just family and friends. Discover how Cheltenham Township (USA), a diverse community just outside of Philadelphia, deals with major issues such as the Bill Cosby trial, everyday traffic issues, sewer I/I problems and lost cats and dogs. And yes, theft.
Communities work when they're connected and exchanging information. What and who are the essential forces making a positive impact, and when and how do conversational threads get directed or misdirected?
Use Any Facebook Public Group
You can apply the examples here to any public Facebook group. For the source code used to collect this data, and a quick-start Docker image, take a look at the following project: facebook-group-scrape.
Data Sources
There are 4 csv files in the dataset, with data from the following 5 public Facebook groups:
post.csv
These are the main posts you will see on the group page; it may help to take a quick look at the page first. In the msg field, commas have been replaced with {COMMA} and apostrophes with {APOST}.
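Before analyzing message text, the escaped placeholders in msg need to be turned back into punctuation. The following is a minimal Python sketch using only the standard library; the two-column sample (with a `pid` column) is hypothetical and stands in for the real post.csv, which has more columns.

```python
import csv
import io

# Placeholder tokens used in post.csv's msg field, mapped back to punctuation.
TOKENS = {"{COMMA}": ",", "{APOST}": "'"}

def decode_msg(text):
    """Restore commas and apostrophes that were escaped in the msg field."""
    for token, char in TOKENS.items():
        text = text.replace(token, char)
    return text

# Hypothetical excerpt; because commas are escaped, msg parses as one CSV field.
sample = "pid,msg\n1,Road closed{COMMA} don{APOST}t drive on Church Rd\n"
for row in csv.DictReader(io.StringIO(sample)):
    print(row["pid"], decode_msg(row["msg"]))
```

Decoding after parsing, rather than before, keeps the CSV field boundaries intact.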
comment.csv
These are comments on the main posts. Note that Facebook posts have comments, and comments on comments.
like.csv
These are likes and responses. The two keys in this file (pid, cid) join to post.csv and comment.csv, respectively.
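The pid/cid keys make it straightforward to aggregate reactions per post or per comment. This is a minimal Python sketch under stated assumptions: the `response` column name and the convention that a like on a post (rather than on a comment) has an empty cid are hypothetical, so check them against the actual like.csv.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of like.csv. Assumption: an empty cid marks a
# reaction to the post itself; a non-empty cid marks a reaction to a comment.
likes_csv = "pid,cid,response\n10,,LIKE\n10,,LOVE\n10,501,LIKE\n11,,LIKE\n"

post_likes = Counter()     # keyed by pid, joins to post.csv
comment_likes = Counter()  # keyed by cid, joins to comment.csv
for row in csv.DictReader(io.StringIO(likes_csv)):
    if row["cid"]:
        comment_likes[row["cid"]] += 1
    else:
        post_likes[row["pid"]] += 1

print(dict(post_likes))
print(dict(comment_likes))
```

The resulting counters can then be matched against the pid and cid columns of post.csv and comment.csv.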
member.csv
These are all the members of the group. Some members never, or rarely, post or comment. You may find multiple entries in this table for the same person: the individual's name does not change, but each profile picture change is captured in this table, and Facebook gives users a new id in this table when they change their profile picture.
The Facebook Users by Country Data (Cleaned) dataset is a collection of information on Facebook users from different countries. The dataset contains five columns of data, which are named as follows:
The Facebook Users by Country Data (Cleaned) dataset can be used in several ways. Here are some potential use cases:
Market Research: Marketers can use this dataset to identify markets with the highest concentration of Facebook users. This information can be used to target Facebook ads to specific regions, optimize social media campaigns, and determine which markets to expand into.
Business Strategy: Businesses can use this dataset to identify potential markets for their products or services. By analyzing Facebook usage rates in different countries, businesses can identify countries with high engagement rates and target those markets.
Social Media Analysis: Researchers can use this dataset to analyze social media behavior in different countries. By comparing Facebook usage rates across different countries, researchers can identify cultural and social differences that affect social media behavior.
The study on Facebook users was conducted by infratest dimap on behalf of the Konrad Adenauer Foundation. During the survey period from November 26 to December 4, 2018, 2,041 Facebook users were surveyed in online interviews (CAWI) on the following topics: internet use, Facebook groups, Facebook use, political content on Facebook, reaction to content, an image experiment and the Sunday question (voting intention). Respondents were selected by quota sampling from an online access panel. Use of various internet services (Tinder, Facebook, Twitter, Snapchat, Instagram, YouTube, online newspapers, none of the above); use of open or closed Facebook groups; Facebook content on political topics, on job-related topics, on hobbies, on entertainment or on other topics; type of Facebook use (read/like/share content, write comments, disseminate own content); political Facebook use (read/like/share political content, write comments on political topics, disseminate own content on political topics); reaction to Facebook content or comments (feel informed, entertained, annoyed, provoked); agreement with various statements about Facebook (on Facebook, others upset me; I show others their limits; I can speak my mind anonymously; I find many different opinions; I find opinions that are otherwise suppressed; I dare to say/share things I would not otherwise say); party preference (Sunday question); comment (open) on a provocative image (split A: refugees, split B: Pegida). Demography: sex; age (year of birth); education; employment; occupational status; net household income (grouped); federal state. Additionally coded: serial number; weighting factor.
This table includes platform data for Facebook participants in the Deactivation experiment. Each row of the dataset corresponds to data from a participant’s Facebook user account. Each column contains a value, or set of values, that aggregates log data for this specific participant over a certain period of time.
https://creativecommons.org/publicdomain/zero/1.0/
With roughly 2.89 billion monthly active users as of the second quarter of 2021, Facebook is the biggest social network worldwide. In the third quarter of 2012, the number of active Facebook users surpassed one billion, making it the first social network ever to do so. Active users are those who have logged into Facebook during the past 30 days. During the first quarter of 2021, the company stated that 3.51 billion people were using at least one of the company's core products (Facebook, WhatsApp, Instagram, or Messenger) each month.
This data was collected by Facebook and was released in July 2021.
The number of Facebook users in Malaysia was forecast to decrease continuously between 2024 and 2028 by a total of 2.2 million users (-9.36 percent). According to this forecast, in 2028 the Facebook user base will have decreased for the sixth consecutive year, to 21.33 million users. User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by the same person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see the supplementary notes under details for more information).
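A multi-year decline like Malaysia's can be summarized as a compound annual rate: the constant yearly change that links the two endpoints. The following is a minimal Python sketch that reconstructs the implied 2024 level from the figures above; the `cagr` helper is our own, and the forecast's actual year-by-year values need not be evenly spaced.

```python
def cagr(start, end, years):
    """Compound annual growth rate linking two levels `years` apart."""
    return (end / start) ** (1 / years) - 1

end_2028 = 21.33             # million users, from the forecast above
start_2024 = end_2028 + 2.2  # implied 2024 level, in million users
rate = cagr(start_2024, end_2028, 2028 - 2024)
print(f"2024: {start_2024:.2f}M, average change: {rate:.2%} per year")
```

This works out to an average decline of about 2.4 percent per year. Because the 2.2 million figure is rounded, the implied percentage (2.2 / 23.53) differs slightly from the quoted -9.36 percent.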
The number of Facebook users in India was forecast to increase continuously between 2024 and 2028 by a total of **** million users (+*** percent). After the ninth consecutive year of growth, the Facebook user base is estimated to reach ****** million users in 2028, a new peak. Notably, the number of Facebook users has also increased continuously over the past years. User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by the same person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see the supplementary notes under details for more information).
The metrics in this dataset measure users who engaged with posts with links to civic news URLs and the volume of their engagement. The dataset contains URL-level metrics from Facebook activity data for adult U.S. monthly active users, aggregated over the study period. It includes content views, audience size, content attributes and user attributes.
The metrics in this dataset measure users who viewed posts with links to civic news URLs. The dataset contains URL-level metrics from Facebook activity data for adult U.S. monthly active users, aggregated over the study period. It includes content views, audience size, content attributes and user attributes.
The number of Facebook users in Indonesia was forecast to decrease continuously between 2024 and 2028 by a total of ** million users (-***** percent). According to this forecast, in 2028 the Facebook user base will have decreased for the fifth consecutive year, to ****** million users. User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by the same person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see the supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database contains regional estimates of Facebook users based on data from the Facebook Marketing API. It includes information on the number of individuals aged 18 and older who have accessed Facebook in the past month, with data separated by region. These estimates are intended for trend identification and triangulation purposes and are not designed to match official census data or other government sources.
These data can be used as a proxy for internet access.
Note that there may be duplicates across regions, and the data are anonymized by Meta.
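To use these estimates as an internet-access proxy, one option is to divide each region's 18+ audience estimate by an adult population figure from another source. The following is a minimal Python sketch with made-up regional numbers. The population source and the cap at 1.0 are our assumptions: the estimates are not designed to match census data and can exceed them.

```python
# Hypothetical regional rows: Marketing API audience estimate (18+) and an
# adult population figure taken from a separate source (both illustrative).
regions = {
    "Region A": {"fb_18plus": 1_200_000, "adult_pop": 2_000_000},
    "Region B": {"fb_18plus": 950_000, "adult_pop": 900_000},
}

def access_proxy(fb_users, adult_pop):
    """Share of adults reachable on Facebook, capped at 1.0.

    Audience estimates are not census counts and can exceed the
    population, so the ratio is clipped instead of being trusted blindly.
    """
    return min(fb_users / adult_pop, 1.0)

for name, r in regions.items():
    print(name, round(access_proxy(r["fb_18plus"], r["adult_pop"]), 3))
```

Because the same person can appear in several regions, regional estimates should be compared as ratios rather than summed into a national total.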
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Social mobilization is a process that enlists a large number of people to achieve a goal within a limited time, especially through the use of social media. There is increasing interest in understanding the factors that affect the speed of social mobilization. Based on the Langley Knights competition data set, we analyzed the differences in mobilization speed between users of Facebook and e-mail. We include other factors that may influence mobilization speed (gender, age, timing, and homophily of information source) in our model as control variables in order to isolate the effect of the communication channel. We show that, in this experiment, although more people used e-mail to recruit, Facebook users mobilized faster than e-mail users. Before controlling for other factors, the mobilization speed of Facebook users was on average seven times that of e-mail users. After controlling for other factors, Facebook users who had not yet registered were 1.84 times more likely than e-mail users to register in the next period. This finding could provide useful insights for future social mobilization efforts.
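A statement such as "1.84 times more likely to register in the next period, if not yet registered" is a hazard ratio. One common reading is the proportional-hazards model, in which the share still unregistered after t periods satisfies S_fb(t) = S_email(t) ** 1.84. The following is a minimal Python sketch of that reading. The 10 percent per-period baseline hazard for e-mail recruits is hypothetical, and we do not claim this is the exact model used in the study.

```python
def survival(per_period_hazard, periods):
    """Share still unregistered after `periods`, for a constant per-period hazard."""
    return (1 - per_period_hazard) ** periods

def survival_with_hr(baseline_survival, hazard_ratio):
    """Proportional hazards: S_1(t) = S_0(t) ** HR."""
    return baseline_survival ** hazard_ratio

HR_FACEBOOK = 1.84      # reported ratio, Facebook vs. e-mail (after controls)
baseline_hazard = 0.10  # hypothetical per-period hazard for e-mail recruits

for t in range(1, 4):
    s_email = survival(baseline_hazard, t)
    s_fb = survival_with_hr(s_email, HR_FACEBOOK)
    print(t, round(s_email, 3), round(s_fb, 3))
```

Under this reading, a hazard ratio above 1 means that at every period a smaller share of Facebook recruits remains unregistered.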
The metrics in this dataset measure users who potentially viewed posts with links to civic news URLs that were shared by one of their connections. The dataset contains URL-level metrics from Facebook activity data for adult U.S. monthly active users, aggregated over the study period. It includes potential audience size, content attributes, user attributes and political interest.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Multi-aspect Integrated Migration Indicators (MIMI) dataset is the result of gathering, embedding and combining traditional migration datasets, mostly from sources like Eurostat and the UNSD Demographic Statistics Database, with alternative types of data, which consist of multidisciplinary features and measures not typically employed in migration studies, such as the Facebook Social Connectedness Index (SCI). Its purpose is to exploit these novel types of data for nowcasting migration flows and stocks, studying the integration of multiple sources and knowledge, and investigating migration drivers.
The MIMI dataset is designed so that each row corresponds to a unique pair of countries. Each record contains country-to-country information about migration flows and stocks and their shares, the strength of Facebook connectedness between the two countries, and other features, such as the corresponding populations, GDP, coordinates, net migration, and many others.
Methodology
After bilateral flow records about international human mobility by citizenship, residence and country of birth had been collected (available for both sexes and, in some cases, for different age groups), they were merged to obtain a unique dataset in which each ordered pair (country of origin, country of destination) appears once. To avoid duplicate pairs, flow records were selected in the following order of priority: first migration by citizenship, then migration by residence, and lastly migration by country of birth.
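The deduplication rule above is a priority-based merge keyed on the ordered (origin, destination) pair. The following is a minimal Python sketch of that rule. The record layout (keys `origin`, `destination`, `basis`, `flow`) and the flow values are hypothetical and for illustration only.

```python
# Priority order described above: citizenship first, then residence,
# then country of birth (lower index = higher priority).
PRIORITY = ["citizenship", "residence", "birth"]

def merge_flows(records):
    """Keep one flow record per ordered (origin, destination) pair.

    `records` is a list of dicts with hypothetical keys
    'origin', 'destination', 'basis' and 'flow'.
    """
    best = {}
    for rec in records:
        key = (rec["origin"], rec["destination"])  # ordered: IT->DE != DE->IT
        rank = PRIORITY.index(rec["basis"])
        if key not in best or rank < PRIORITY.index(best[key]["basis"]):
            best[key] = rec
    return best

# Illustrative records, not real migration figures.
records = [
    {"origin": "IT", "destination": "DE", "basis": "birth", "flow": 900},
    {"origin": "IT", "destination": "DE", "basis": "citizenship", "flow": 1000},
    {"origin": "DE", "destination": "IT", "basis": "residence", "flow": 400},
]
merged = merge_flows(records)
print(merged[("IT", "DE")]["basis"], len(merged))
```

Because the key is an ordered pair, flows in opposite directions between the same two countries remain separate rows.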
The integration process started by choosing, collecting and meaningfully including many other indicators that could be helpful for the dataset's final purpose mentioned above.
Non-bidirectional migration measures for each country: total number of immigrants and emigrants for each year, net migration and net migration rate over a five-year range.
Other multidisciplinary indicators (cultural, social, anthropological, demographic, historical features) related to each country: religion (single one or list), yearly GDP at PPP, spoken language (or list of languages), yearly population stocks (and population densities, if available), number of Facebook users, percentage of Facebook users, and cultural indicators (Hofstede's dimensions PDI, IDV, MAS, UAI, LTO). The following feature was also included for each pair of countries: the Facebook Social Connectedness Index.
Once traditional and non-traditional knowledge had been gathered and integrated, we moved to the pre-processing phase, which covers data cleaning, preparation and transformation. Here the dataset was subjected to various standard computational processes and reshaped into the final structure established by our design choices.
The data quality assessment phase was one of the longest and most delicate, since many values were missing, which could have degraded the quality of the resulting knowledge. Missing values were filled in from additional sources such as The World Bank, World Population Review, Statista, DataHub and Wikipedia, and in some cases extracted from Python libraries such as PyPopulation, CountryInfo and PyCountry.
The final dataset has the structure of a large matrix indexed by country pairs (each pair uniquely identified by combining the two countries' ISO 3166-1 alpha-2 codes): it comprises 28,725 entries and 485 columns.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Overview
The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.
Task Uniqueness: The task of controllable multimodal feedback synthesis is unique: it is distinct from general LLM use and from tasks like VisDial, and it is not addressed by multimodal LLMs. LLMs often exhibit errors and hallucinations, and their auto-regressive, black-box nature can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, detailed in the supplementary material of the corresponding research publication, that demonstrates how metadata and multimodal features shape responses and how sentiments are learned. This controllability and interpretability aim to inspire new methodologies in related fields.
Data Collection and Annotation
Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
• Facebook was chosen for data collection because it uniquely provides metadata such as news article link, post shares, post reaction, comment like, comment rank, comment reaction rank, and relevance scores, not available on other platforms.
• Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
• Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
• The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it is 66.72% to 33.28%; Reddit does not report this data. [Ref]
Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
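The two filters can be expressed as a single per-comment predicate. The following is a minimal Python sketch. It assumes each of the four models contributes a binary label (1 = positive, 0 = negative) and that each comment has one consensus sentiment probability. Both the source of that probability and the inclusive 0.49/0.51 bounds are assumptions on our part, not details confirmed by the dataset authors.

```python
def keep_comment(labels, probability, min_agree=3, margin=(0.49, 0.51)):
    """Apply the two filters described above to one comment.

    labels: binary sentiment labels (1 = positive, 0 = negative), one per model.
    probability: the comment's sentiment probability in [0, 1] (assumed to be
    a single consensus value; bounds of the margin assumed inclusive).
    """
    positives = sum(labels)
    agreeing = max(positives, len(labels) - positives)
    if agreeing < min_agree:           # a) model agreement filter
        return False
    low, high = margin
    if low <= probability <= high:     # b) probability range safety margin
        return False
    return True

print(keep_comment([1, 1, 1, 0], 0.82))  # 3 of 4 agree, confident
print(keep_comment([1, 1, 0, 0], 0.90))  # only a 2-2 split
print(keep_comment([0, 0, 0, 0], 0.50))  # inside the safety margin
```

Note that "at least three of four models agree" is equivalent to an Agreement score of at least +2 or at most -2.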
After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
Dataset Description
• Total Samples: 61,734
• Total Samples Annotated: 57,222 after filtering.
• Total Posts: 3,646
• Average Likes per Post: 65.1
• Average Likes per Comment: 10.5
• Average Length of News Text: 655 words
• Average Number of Images per Post: 3.7
Components of the Dataset
The dataset comprises two main components:
• CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
• Images Folder: Contains folders with images corresponding to each post.
Data Format and Fields of the CSV File
The dataset is structured in CMFeed.csv file along with corresponding images in related folders. This CSV file includes the following fields:
• Id: Unique identifier
• Post: The heading of the news article.
• News_text: The text of the news article.
• News_link: URL link to the original news article.
• News_Images: A path to the folder containing images related to the post.
• Post_shares: Number of times the post has been shared.
• Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
• Comment: Text of the user comment.
• Comment_like: Number of likes on the comment.
• Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
• Comment_link: URL link to the original comment on Facebook.
• Comment_rank: Rank of the comment based on engagement and relevance.
• Score: Sentiment score computed based on the consensus of sentiment analysis models.
• Agreement: The consensus level among the sentiment models, ranging from -4 (all negative) to 4 (all positive). For example, 3 negative and 1 positive vote result in -2, and 3 positive and 1 negative vote result in +2.
• Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
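The fields above can be read with the standard library alone. Post_reaction is stored as JSON inside a CSV cell, and Agreement follows the vote-counting rule described for that field. The following is a minimal Python sketch. The one-row sample, the subset of columns, and the exact JSON layout of Post_reaction (reaction name mapped to count) are assumptions made for illustration.

```python
import csv
import io
import json

def agreement(labels):
    """Agreement score: +1 per positive model vote, -1 per negative vote."""
    return sum(1 if label == 1 else -1 for label in labels)

# Hypothetical one-row excerpt of CMFeed.csv with a subset of its columns;
# the JSON layout inside Post_reaction is assumed for illustration.
sample = (
    "Id,Comment,Post_reaction,Agreement,Sentiment_class\n"
    '1,Great news,"{""like"": 120, ""love"": 15}",4,1\n'
)
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    reactions = json.loads(row["Post_reaction"])  # dict of reaction -> count
    print(row["Id"], sum(reactions.values()), row["Agreement"], row["Sentiment_class"])

print(agreement([0, 0, 0, 1]))  # 3 negative, 1 positive
print(agreement([1, 1, 1, 0]))  # 3 positive, 1 negative
```

The CSV module handles the doubled quotes around the JSON cell, so `json.loads` receives a clean object string.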
More Considerations During Dataset Construction
We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:
• Why not merge data from different social media platforms? We chose not to merge data from platforms such as Reddit and Twitter with Facebook due to the lack of comprehensive metadata, clear ethical guidelines, and control mechanisms—such as who can comment and whether users' anonymity is maintained—on these platforms other than Facebook. These factors are critical for our analysis. Our focus on Facebook alone was crucial to ensure consistency in data quality and format.
• Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. Many other news handles, such as NDTV, The Hindu, Xinhua, and SCMP, are also large-scale but may contain information in regional Indian and Chinese languages; hence, they have not been selected. This selection ensures a broad spectrum of political discourse and audience engagement.
• Dataset Generalizability and Bias: With 3.07 billion of the total 5 billion social media users, the extensive user base of Facebook, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News), and spanning political spectra from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.
• Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1,000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4,000 posts were collected; after preprocessing (detailed in Section 3.1), 3,646 posts remained. We then processed all associated comments, resulting in a total of 61,734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.
Ethical considerations, data privacy and misuse prevention
The data collection adheres to Facebook’s ethical guidelines (https://developers.facebook.com/terms/).