The number of Facebook users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 12.6 million users (+5.04 percent). After the ninth consecutive year of growth, the Facebook user base is estimated to reach 262.8 million users, a new peak, in 2028. Notably, the number of Facebook users has increased continuously over the past years.
User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
https://brightdata.com/license
Use our Facebook Profiles dataset to explore public profile details such as names, profile and cover photos, work history, education, and photo galleries. Common use cases include people and company research, influencer discovery, and academic studies of career and education signals on Facebook.
Over 31M records available
Price starts at $250/100K records
Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet.
100% ethical and compliant data collection
Included datapoints:
Profile URL
Profile Name
Facebook Profile ID
Profile Photo
Cover Photo
Work History (Title, Company, Company ID, Company URL, Start/End Dates)
College Education (Name, ID, URL)
High School Education (Name, ID, URL)
Photo Galleries
And much more
https://creativecommons.org/publicdomain/zero/1.0/
With roughly 2.89 billion monthly active users as of the second quarter of 2021, Facebook is the biggest social network worldwide. In the third quarter of 2012, the number of active Facebook users surpassed one billion, making it the first social network ever to do so. Active users are those who have logged into Facebook during the past 30 days. During the first quarter of 2021, the company stated that 3.51 billion people were using at least one of the company's core products (Facebook, WhatsApp, Instagram, or Messenger) each month.
This data was collected by Facebook and was released in July 2021.
Facebook Users Engagement Analysis
Author: Tamara Banaim
Dataset: Pseudo Facebook Dataset (Kaggle, uploaded to Hugging Face)
Overview
This project analyzes data from 99,003 Facebook users, focusing on demographic information and engagement metrics such as likes given, likes received, friend count, and account tenure. The analysis explores how age and user activity are related, and what factors influence engagement on the platform.
Objective
To examine how age and… See the full description on the dataset page: https://huggingface.co/datasets/tamarabanaim/facebook-users-data.
https://brightdata.com/license
Access our extensive Facebook datasets that provide detailed information on public posts, pages, and user engagement. Gain insights into post performance, audience interactions, page details, and content trends with our ethically sourced data. Free samples are available for evaluation.
Over 940M records available
Price starts at $250/100K records
Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet.
100% ethical and compliant data collection
Included datapoints:
Post ID
Post Content & URL
Date Posted
Hashtags
Number of Comments
Number of Shares
Likes & Reaction Counts (by type)
Video View Count
Page Name & Category
Page Followers & Likes
Page Verification Status
Page Website & Contact Info
Is Sponsored Post
Attachments (Images/Videos)
External Link Data
And much more
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Facebook data is a valuable resource for businesses. It delivers essential information about your target audience. Over 2.9 billion people use Facebook worldwide. For example, you may learn about their demographics, hobbies, and online activities. Moreover, Facebook data may assist you in categorizing your target demographic. As a result, you may target your messaging to certain groups of people. This enhances the likelihood of your campaigns connecting with them. Furthermore, you may utilize Facebook data to monitor the effectiveness of your initiatives. This allows you to determine what is working and what is not. You can then make changes to enhance your outcomes. Facebook’s data is continually changing. Stay current with the newest trends and best practices. List To Data will help you get the most out of this important resource. Facebook number database is an invaluable tool for marketers looking to engage with their target demographic. This directory, often known as a contact list or dataset, contains crucial information such as user profiles and engagement metrics. This platform provides a wide reservoir of possible leads. Using this content, you may design tailored campaigns that increase engagement. Moving from general techniques to data-driven initiatives will help you achieve better outcomes. This information enables you to modify your messaging for higher response rates. Furthermore, this resource is updated regularly, guaranteeing that your campaigns always have new connections. Visit List To Data to get premium Facebook number databases and boost your company!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a longitudinal view of Facebook’s Daily Active Users (DAUs) across four primary geographic regions: US & Canada, Europe, Asia-Pacific, and the Rest of World. Spanning from December 2009 through December 2023, the data captures the platform's evolution from a burgeoning social network to a global utility. DAUs are defined as registered and logged-in users who visited Facebook through the website, a mobile device, or the Messenger application on a given day.
Key Data Components:
Regional Growth: Quarterly DAU counts for each major geography.
Engagement Metrics: DAUs expressed as a percentage of Monthly Active Users (MAUs) to measure user stickiness.
Geographic Attribution: Historical adjustments, such as the 2012 algorithm correction, which refined how users are assigned to specific regions.
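The stickiness metric described above (DAUs as a percentage of MAUs) can be sketched as follows. This is a minimal illustration only: the function name and the sample figures are hypothetical and are not taken from the dataset.

```python
def stickiness(dau: float, mau: float) -> float:
    """Return daily active users as a percentage of monthly active users."""
    if mau <= 0:
        raise ValueError("MAU must be positive")
    return 100.0 * dau / mau


# Hypothetical figures (millions of users), for illustration only.
example = stickiness(dau=66.0, mau=100.0)
print(f"{example:.1f}% of monthly users are active on a given day")
```

A higher percentage suggests that a larger share of the monthly audience returns every day.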
Limitations of the Data
The metrics are based on internal company estimates and are subject to several inherent challenges:
Estimation Challenges: Identifying unique individuals across multiple accounts or products (Facebook, Instagram, WhatsApp) requires complex algorithms and machine learning models that involve significant judgment.
Methodology Inconsistencies: Estimates may differ from third-party data due to variations in measurement techniques. Methodology improvements can also result in adjustments to historical data.
Technical and Survey Errors: Data is susceptible to technical errors and relies on user surveys for calibration, which are themselves subject to a margin of error (estimated at approximately 3% of worldwide MAP).
Trend Discrepancies: Due to attribution difficulties at scale, reported trends may not always match actual changes in the user base.
Reporting Shifts: Beginning in 2024, the company will cease reporting Facebook-specific DAUs in favor of broader "Family" metrics.
Potential Use Cases
Market Analysis: Tracking the maturation of social media markets in developed regions (US & Canada) versus rapid expansion in emerging regions like Asia-Pacific.
Academic Research: Studying the global diffusion of digital platforms and long-term user engagement trends over a 14-year period.
Investment Benchmarking: Evaluating historical platform health and the effectiveness of international expansion strategies.
Strategic Planning: Modeling future growth trajectories based on historical regional performance and engagement rates.
https://cdla.io/sharing-1-0/
Context: This dataset offers insights into the usage patterns of social media apps for 1,000 users across seven popular platforms: Facebook, Instagram, Twitter, Snapchat, TikTok, LinkedIn, and Pinterest. It tracks various metrics such as daily time spent on the app, number of posts made, likes received, and new followers gained.
Dataset Features:
User_ID: Unique identifier for each user.
App: The social media platform being used.
Daily_Minutes_Spent: Total time a user spends on the app each day, ranging from 5 to 500 minutes.
Posts_Per_Day: Number of posts a user creates per day, ranging from 0 to 20.
Likes_Per_Day: Total number of likes a user receives on their posts each day, ranging from 0 to 200.
Follows_Per_Day: The number of new followers a user gains daily, ranging from 0 to 50.
Context & Use Cases: This dataset could be particularly useful for social media analysts, digital marketers, or researchers interested in understanding user engagement trends across different platforms. It provides insights into how much time users spend, how actively they post, and the level of engagement they receive (in terms of likes and followers).
Conclusion & Outcome: Analyzing this dataset could yield several outcomes:
Engagement Patterns: Identifying which platforms have higher engagement in terms of time spent or likes received.
Active Users: Determining which users are the most active across various platforms based on the number of posts and followers gained.
User Retention: Studying the correlation between time spent and follower growth, providing insight into user retention strategies for different platforms.
Overall, the dataset allows for exploration of social media usage trends and helps drive decision-making for marketing strategies, content creation, and platform engagement.
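The analyses suggested above (average engagement per platform, and the correlation between time spent and follower growth) might be sketched as below. The column names come from the feature list; the inline sample rows are invented for illustration, and the helper names `mean_by_app` and `pearson` are hypothetical. Only the Python standard library is used.

```python
import csv
import io
from collections import defaultdict
from math import sqrt

# Invented sample rows that follow the documented column names.
SAMPLE = """User_ID,App,Daily_Minutes_Spent,Posts_Per_Day,Likes_Per_Day,Follows_Per_Day
U1,Facebook,120,2,40,5
U2,Facebook,60,1,20,3
U3,TikTok,200,4,90,10
U4,TikTok,100,2,50,6
"""


def mean_by_app(rows, column):
    """Average a numeric column separately for each value of App."""
    values = defaultdict(list)
    for row in rows:
        values[row["App"]].append(float(row[column]))
    return {app: sum(v) / len(v) for app, v in values.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(mean_by_app(rows, "Daily_Minutes_Spent"))

minutes = [float(r["Daily_Minutes_Spent"]) for r in rows]
follows = [float(r["Follows_Per_Day"]) for r in rows]
print(round(pearson(minutes, follows), 3))
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle; the file name is not given in the description, so it is left to the reader.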
The number of Facebook users in India was forecast to increase continuously between 2024 and 2028 by a total of **** million users (+*** percent). After the ninth consecutive year of growth, the Facebook user base is estimated to reach ****** million users, a new peak, in 2028. Notably, the number of Facebook users has increased continuously over the past years.
User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Find more key insights for the number of Facebook users in countries like Nepal and Pakistan.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of 'circles' (or 'friends lists') from Facebook. Facebook data was collected from survey participants using this Facebook app. The dataset includes node features (profiles), circles, and ego networks.
Facebook data has been anonymized by replacing the Facebook-internal ids for each user with a new value. Also, while feature vectors from this dataset have been provided, the interpretation of those features has been obscured. For instance, where the original dataset may have contained a feature "political=Democratic Party", the new data would simply contain "political=anonymized feature 1". Thus, using the anonymized data it is possible to determine whether two users have the same political affiliations, but not what their individual political affiliations represent. Data is also available from Google+ and Twitter.
Dataset statistics
Nodes: 4039
Edges: 88234
Nodes in largest WCC: 4039 (1.000)
Edges in largest WCC: 88234 (1.000)
Nodes in largest SCC: 4039 (1.000)
Edges in largest SCC: 88234 (1.000)
Average clustering coefficient: 0.6055
Number of triangles: 1612010
Fraction of closed triangles: 0.2647
Diameter (longest shortest path): 8
90-percentile effective diameter: 4.7
File Information
facebook_combined.txt – Contains the Facebook social network graph represented as an edge list. Each row represents a connection between two users (nodes) indicating a friendship relationship.
facebook_circles.txt – Contains labeled social circles (friend lists) created by users. Each circle represents a group of friends belonging to a specific social context (e.g., family, colleagues, classmates).
facebook_edges.txt – Lists all edges in individual ego networks, showing connections between friends of the ego user.
facebook_egofeat.txt – Contains feature vectors for ego users. These represent anonymized profile attributes of the central user in each ego network.
facebook_feat.txt – Contains anonymized feature vectors for all users in the dataset. Features represent profile attributes such as interests, education, or affiliations but are anonymized to protect privacy.
facebook_featnames.txt – Provides the mapping or index of anonymized features used in the feature vectors.
facebook_ego_networks_info.txt – Contains metadata or statistics about ego networks used to construct the combined social graph.
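As a rough illustration of how an edge list such as facebook_combined.txt could be read and summarized, the sketch below builds an undirected adjacency map and computes the node count, edge count, and average local clustering coefficient. It assumes one whitespace-separated pair of node ids per line, which is not stated in the description; the helper names are hypothetical, and the tiny sample graph is invented.

```python
from collections import defaultdict
from itertools import combinations


def load_edges(lines):
    """Build an undirected adjacency map from 'u v' lines (assumed format)."""
    adjacency = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or parts[0] == parts[1]:
            continue  # skip blank or malformed lines and self-loops
        u, v = parts
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def average_clustering(adjacency):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    if not adjacency:
        return 0.0
    total = 0.0
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


# Invented four-node graph: a triangle (0, 1, 2) plus a pendant node 3.
sample = ["0 1", "0 2", "1 2", "2 3"]
adjacency = load_edges(sample)
num_nodes = len(adjacency)
num_edges = sum(len(n) for n in adjacency.values()) // 2
print(num_nodes, num_edges, round(average_clustering(adjacency), 4))
```

For the real file, passing an open file handle to `load_edges` should work the same way, and the results can be compared with the statistics listed above.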
https://brightdata.com/license
Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.
Dataset Features
User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.
Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.
Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.
Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.
Customizable Subsets for Specific Needs
Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.
Popular Use Cases
Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.
Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.
Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.
Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.
AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.
Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
This layer shows the market potential that an adult has visited facebook.com in the last 30 days in the U.S. in 2016 in a multiscale map (by country, state, county, ZIP Code, tract, and block group). The pop-up is configured to include the following information for each geography level:
Market Potential Index and count of adults expected to visit Facebook
Market Potential Index and count of adults expected to visit various social media websites
Market Potential Index and count of adults expected to visit various news websites
Esri's 2016 Market Potential (MPI) data measures the likely demand for a product or service in an area. The database includes an expected number of consumers and a Market Potential Index (MPI) for each product or service. An MPI compares the demand for a specific product or service in an area with the national demand for that product or service. The MPI values at the US level are 100, representing average demand for the country. A value of more than 100 represents higher demand than the national average, and a value of less than 100 represents lower demand than the national average. For example, an index of 120 implies that demand in the area is 20 percent higher than the US average; an index of 80 implies that demand is 20 percent lower than the US average. See Market Potential database to view the methodology statement and complete variable list.
Esri's Electronics & Internet Data Collection includes data that measures the likely demand for electronics and internet usage. The database includes an expected number of consumers and a Market Potential Index (MPI) for each product, activity, or service. See the United States Data Browser to view complete variable lists for each Esri demographics collection.
Additional Esri Resources:
U.S. 2016/2021 Esri Updated Demographics
Essential demographic vocabulary
Esri's arcgis.com demographic map layers
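The interpretation of the Market Potential Index given above (100 is the national average, 120 is 20 percent higher, 80 is 20 percent lower) can be sketched as follows. The first function follows directly from that description. The second is an assumption for illustration only: it treats the index as a local rate divided by the national rate, scaled to 100, which the text implies but does not spell out; Esri's methodology statement is the authoritative source.

```python
def mpi_percent_difference(mpi: float) -> float:
    """Percent by which local demand differs from the US average (MPI 100)."""
    return mpi - 100.0


def market_potential_index(local_rate: float, national_rate: float) -> float:
    """ASSUMED formula: local rate relative to the national rate, scaled to 100."""
    if national_rate <= 0:
        raise ValueError("national rate must be positive")
    return 100.0 * local_rate / national_rate


print(mpi_percent_difference(120.0))  # demand 20 percent above the US average
print(mpi_percent_difference(80.0))   # demand 20 percent below the US average
```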
This dataset contains information about posts made on a famous cosmetic brand's Facebook page from 1 January to 31 December 2014. Each row represents a single post and includes the following attributes:
Citation: (Moro et al., 2016) S. Moro, P. Rita and B. Vala. Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach. Journal of Business Research, Elsevier, In press. Available at: http://dx.doi.org/10.1016/j.jbusres.2016.02.010
https://www.pewresearch.org/terms-and-conditions/
A line chart that shows % of U.S. adults who say they ever use …
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset describes the Facebook page performance of 220 libraries, archives and museums from all over the world. Performance is measured through 9 social media metrics: number of posts, link-posts, picture-posts, video-posts, total reactions, comments and shares, number of reactions, comments per post and reactions per post. The data were harvested through the FanPageKarma API. The gathered metrics and their values describe the performance of each Facebook page over a 30-day period.
The number of Facebook users in Bulgaria was forecast to increase continuously between 2024 and 2028 by a total of *** million users (+***** percent). After the ninth consecutive year of growth, the Facebook user base is estimated to reach **** million users, a new peak, in 2028. Notably, the number of Facebook users has increased continuously over the past years.
User figures for the Facebook platform have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social mobilization is a process that enlists a large number of people to achieve a goal within a limited time, especially through the use of social media. There is increasing interest in understanding the factors that affect the speed of social mobilization. Based on the Langley Knights competition data set, we analyzed the differences in mobilization speed between users of Facebook and e-mail. We include other factors that may influence mobilization speed (gender, age, timing, and homophily of information source) in our model as control variables in order to isolate the effect of such factors. We show that, in this experiment, although more people used e-mail to recruit, the mobilization speed of Facebook users was faster than that of those that used e-mail. We were also able to measure and show that the mobilization speed for Facebook users was on average seven times faster compared to e-mail before controlling for other factors. After controlling for other factors, we show that Facebook users were 1.84 times more likely to register compared to e-mail users in the next period if they have not done so at any point in time. This finding could provide useful insights for future social mobilization efforts.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset focuses on analyzing user engagement trends across major social media platforms such as Instagram, Twitter, and Facebook. It includes data on likes, comments, shares and other metrics that indicate how users interact with content online.
The goal of this dataset is to explore patterns in social media behavior, understand content performance, and support research in digital marketing, user behavior analytics, and social media strategy.
This dataset was created as part of an academic project and includes:
Engagement metrics (likes, shares, comments)
Time-based trends
User categories (influencers, brands, regular users)
Platform-specific observations (Instagram, Twitter, Facebook)
All data is either simulated or compiled from publicly available sources and does not include any personal or sensitive user information.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Purpose
For the purpose of informing tobacco intervention programs, this dataset was created and used to explore how online social networks of smokers differed from those of nonsmokers. The study was a secondary analysis of data collected as part of a randomized control trial conducted within Facebook. (See "Other References" in "Metadata" for parent study information.)
Basic description of 4 anonymized data files of study participants.
fbr_friends: Anonymized Facebook friends networks, basic ego demographics, basic ego social media activity
fbr_family: Anonymized Facebook family networks, basic ego demographics, basic ego social media activity
fbr_photos: Anonymized Facebook photo networks, basic ego demographics, basic ego social media activity
fbr_groups: Anonymized Facebook group networks, basic ego demographics, basic ego social media activity
Each network comprises the ego, the ego's first degree connections, and the (second degree) connections between the ego's friends. Missing data and users who did not have friend, family, photo, or group networks were cleaned from the data beforehand.
Each data file contains the following columns of data, taken with participant knowledge and consent:
participant_id: Nonidentifying ids assigned to different study participants.
is_smoker: Binary value (0,1) that takes on the value 1 if participant was a smoker and 0 otherwise.
gender: One of three categories: male, female, or blank, which signified Other (different from missing data).
country: One of four categories: Canada (ca), US (us), Mexico (mx), or Other (xx).
likes_count: Numeric data indicating number of Facebook likes the participant had made up to the date the data was collected.
wall_count: Numeric data indicating number of Facebook wall posts the participant had made up to the date the data was collected.
t_count_page_views: Numeric data indicating number of pages participant had visited in the UbiQUITous app up to the date the data was collected.
yearsOld: Numeric data indicating age in years of the participant; right censored at 90 years for data anonymity.
vertices: Number of people in the participant's network.
edges: Number of connections between people in the network.
density: The portion of potential connections in a network that are actual connections; a network-level metric; calculated after removing ego and isolates.
mean_betweenness_centrality: An average of the relative importance of all individuals within their own network; a network-level metric; calculated after removing ego and isolates.
transitivity: The extent to which the relationship between two nodes in a network that are connected by an edge is transitive (calculated as the number of triads divided by all possible connections); a network-level metric; calculated after removing ego and isolates.
mean_closeness: Average of how closely associated members are to one another; a network-level metric; calculated after removing ego and isolates.
isolates2: Number of individuals with no connections other than to the ego; a network-level metric.
diameter3: Maximum degree of separation between any two individuals in the network; a network-level metric; calculated after removing ego and isolates.
clusters3: Number of subnetworks; a network-level metric; calculated after removing ego and isolates.
communities3: Number of groups, sorted to increase dense connections within the group and decrease sparse connections outside it (i.e., to maximize modularity); a network-level metric; calculated after removing ego and isolates.
modularity3: The strength of division of a network into communities (calculated as the fraction of ties between community members in excess of the expected number of ties within communities if ties were random); a network-level metric.
Detailed information on network metrics in the associated manuscript: "An exploration of the Facebook social networks of smokers and non-smokers" by Fu, L, Jacobs MA, Brookover J, Valente TW, Cobb NK, and Graham AL.
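The density metric defined in the column list (actual connections as a portion of potential connections) might be computed as in the sketch below. It assumes an undirected network without self-loops, and it leaves the removal of the ego and isolates, which the dataset applies beforehand, to the caller. The function name is hypothetical, and the associated manuscript remains the reference for the exact procedure.

```python
def density(vertices: int, edges: int) -> float:
    """Share of possible undirected connections that are actually present."""
    if vertices < 2:
        return 0.0  # no pair of people exists, so no connection is possible
    possible = vertices * (vertices - 1) / 2
    return edges / possible


# Four people can form at most six connections; three of them exist here.
print(density(vertices=4, edges=3))
```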
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.
Task Uniqueness: The task of controllable multimodal feedback synthesis is unique, distinct from LLMs and tasks like VisDial, and not addressed by multi-modal LLMs. LLMs often exhibit errors and hallucinations, as evidenced by their auto-regressive and black-box nature, which can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, as detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and learn sentiments. This controllability and interpretability aim to inspire new methodologies in related fields.
Data Collection and Annotation
Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
• Facebook uniquely provides metadata such as news article link, post shares, post reaction, comment like, comment rank, comment reaction rank, and relevance scores, which are not available on other platforms.
• Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
• Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
• The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]
Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
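The two filtering levels described above could be sketched as follows. This is not the authors' code. It assumes each comment carries four binary model labels (1 for positive, 0 for negative) and a single sentiment probability, and it treats the 0.49 to 0.51 band as inclusive; the description does not say which probability is checked or whether the bounds are inclusive, so both are assumptions, and the function name is hypothetical.

```python
from typing import Sequence


def keep_comment(model_labels: Sequence[int], probability: float) -> bool:
    """Apply both filters to one comment.

    model_labels: 0/1 labels from the four sentiment models (assumed encoding).
    probability: sentiment probability for the comment (assumed single value).
    """
    positives = sum(model_labels)
    negatives = len(model_labels) - positives
    # a) Model agreement: at least three of the four models share a label.
    models_agree = max(positives, negatives) >= 3
    # b) Safety margin: drop low-confidence scores in [0.49, 0.51].
    confident = not (0.49 <= probability <= 0.51)
    return models_agree and confident


print(keep_comment([1, 1, 1, 0], 0.80))  # kept: three models agree, confident
print(keep_comment([1, 1, 0, 0], 0.80))  # dropped: the models split two to two
print(keep_comment([1, 1, 1, 1], 0.50))  # dropped: inside the safety margin
```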
Dataset Description
• Total Samples: 61,734
• Total Samples Annotated: 57,222 after filtering.
• Total Posts: 3,646
• Average Likes per Post: 65.1
• Average Likes per Comment: 10.5
• Average Length of News Text: 655 words
• Average Number of Images per Post: 3.7
Components of the Dataset
The dataset comprises two main components:
• CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
• Images Folder: Contains folders with images corresponding to each post.
Data Format and Fields of the CSV File
The dataset is structured as a CMFeed.csv file with the corresponding images in related folders. The CSV file includes the following fields:
• Id: Unique identifier
• Post: The heading of the news article.
• News_text: The text of the news article.
• News_link: URL link to the original news article.
• News_Images: A path to the folder containing images related to the post.
• Post_shares: Number of times the post has been shared.
• Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
• Comment: Text of the user comment.
• Comment_like: Number of likes on the comment.
• Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
• Comment_link: URL link to the original comment on Facebook.
• Comment_rank: Rank of the comment based on engagement and relevance.
• Score: Sentiment score computed based on the consensus of sentiment analysis models.
• Agreement: Indicates the consensus level among the four sentiment models, ranging from -4 (all negative) to 4 (all positive). For example, 3 negative and 1 positive votes result in -2, and 3 positive and 1 negative votes result in +2.
• Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
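The Agreement field described above can be reproduced with a small sketch in which each model contributes +1 for a positive vote and -1 for a negative vote. The function names are hypothetical, and deriving the class from the sign of the agreement value is an assumption; the source does not state how Sentiment_class is computed.

```python
from typing import List, Optional


def agreement(votes: List[int]) -> int:
    """Sum of per-model votes, counting 1 (positive) as +1 and 0 (negative) as -1."""
    return sum(1 if v == 1 else -1 for v in votes)


def sentiment_class(votes: List[int]) -> Optional[int]:
    """Assumed rule: 1 if agreement is positive, 0 if negative, None on a tie."""
    score = agreement(votes)
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None  # a 2-2 split has no majority


if __name__ == "__main__":
    print(agreement([0, 0, 0, 1]))  # 3 negative, 1 positive
    print(agreement([1, 1, 1, 0]))  # 3 positive, 1 negative
```

With four models the value is always one of -4, -2, 0, 2, or 4, so the "at least three out of four" filter is equivalent to requiring an absolute agreement of 2 or more.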
More Considerations During Dataset Construction
We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:
• Why not merge data from different social media platforms? We chose not to merge data from platforms such as Reddit and Twitter with the Facebook data because those platforms lack comprehensive metadata, clear ethical guidelines, and control mechanisms (such as who can comment and whether users' anonymity is maintained). These factors are critical for our analysis. Focusing on Facebook alone was crucial to ensure consistency in data quality and format.
• Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. This selection ensures a broad spectrum of political discourse and audience engagement. Other large-scale news handles such as NDTV, The Hindu, Xinhua, and SCMP were not selected because they may contain content in regional languages (for example, Indian languages and Chinese).
• Dataset Generalizability and Bias: Facebook accounts for 3.07 billion of the total 5 billion social media users. This extensive user base reflects broader social media engagement patterns, which supports applying the insights gained to other platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of the news sources, ranging from local (NY Daily News) to international (BBC News) and spanning the political spectrum from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further strengthening the robustness and applicability of our research.
• Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1,000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4,000 posts were collected; after preprocessing (detailed in Section 3.1), 3,646 posts remained. We then processed all associated comments, resulting in a total of 61,734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.
Ethical considerations, data privacy and misuse prevention
The data collection adheres to Facebook’s ethical guidelines [<a href="https://developers.facebook.com/terms/"