As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. A further 16 percent of users belonged to the 35 to 44 year age group.
Instagram users
With roughly one billion monthly active users, Instagram is one of the most popular social networks worldwide. The photo-sharing app is especially popular in India and the United States, which have 362.9 million and 169.7 million Instagram users, respectively.
Instagram features
One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories, where the content remains visible to others for 24 hours before it disappears. In January 2019, the company reported 500 million daily active Instagram Stories users. Instagram Stories competes directly with Snapchat, another photo-sharing app that initially became famous for its “vanishing photos” feature.
As of the second quarter of 2021, Snapchat had 293 million daily active users.
As of April 2024, around 16.5 percent of global active Instagram users were men aged between 18 and 24 years. More than half of Instagram's global user base was aged 34 years or younger.
Teens and social media
As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third among U.S. teenagers' preferred social networks, behind Snapchat and TikTok. Instagram was also one of the most influential advertising channels for female Gen Z users making purchasing decisions. Teens report feeling more confident, more popular, and better about themselves when using social media, as well as less lonely, depressed, and anxious.
Social media can also have negative effects on teens, and these effects are much more pronounced among those with low social-emotional well-being. Some 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying on social media, compared with only five percent of teenagers with high social-emotional well-being. As such, social media can have a big impact on already fragile states of mind.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gen Z and Millennials are the biggest social media users of all age groups.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
56.8% of the world’s total population is active on social media.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Regional differences in social media use have a significant effect on the gender breakdown of social media users.
As of January 2024, #love was the most used hashtag on Instagram, appearing in over two billion posts on the platform. #instagood and #instagram were used over one billion times as of early 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The results might surprise you when you look at the share of internet users who are active on social media in each country.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
This dataset captures insights from a survey on social media usage across diverse age groups and genders. It includes data on the most used platforms, daily screen time, reasons for usage, preferred content types, and how social media influences buying decisions. Additionally, it reflects users' concerns about privacy and their willingness to reduce usage. The dataset is useful for analyzing digital behavior, content preferences, and the social impact of online platforms. It can support research in marketing, psychology, and digital well-being, offering a snapshot of how people interact with and perceive social media in their daily lives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The results are in on which genders use which platforms.
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram’s Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, with two billion monthly active users (MAU). This number is projected to keep growing with no sign of slowing down, which is unsurprising given that social media penetration is steadily increasing across all regions.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram’s global audience is almost twice the size of TikTok’s, TikTok has proven to be a fierce competitor, particularly among younger audiences. TikTok was the most downloaded mobile app globally in 2022, with 672 million downloads. In 2022, members of Generation Z in the United States spent more time per month on TikTok than on Instagram.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This database comprises 951 participants who provided self-report data online in their school classrooms. The data were collected in 2016 and 2017. The dataset includes 509 males (54%) and 442 females (46%). Their ages ranged from 12 to 16 years (M = 13.69, SD = 0.72), and seven participants did not report their age. The majority were born in Australia (N = 849, 89%). The next most common countries of birth were China (N = 24, 2.5%), the UK (N = 23, 2.4%), and the USA (N = 9, 0.9%). Data were drawn from students at five Australian independent secondary schools.
The data contain item responses for the 44-item Spence Children’s Anxiety Scale (SCAS; Spence, 1998). Social media use was measured with a single frequency question, “How often do you use social media?”, with response options ranging from “constantly” to “once a week or less”. Fear of Missing Out was measured with the following five items, based on the APS Stress and Wellbeing in Australia Survey (APS, 2015): “When I have a good time it is important for me to share the details online”; “I am afraid that I will miss out on something if I don’t stay connected to my online social networks”; “I feel worried and uncomfortable when I can’t access my social media accounts”; “I find it difficult to relax or sleep after spending time on social networking sites”; and “I feel my brain burnout with the constant connectivity of social media”. Internal consistency for this measure was α = .81. Self-compassion was measured using the 12-item short form of the Self-Compassion Scale (SCS-SF; Raes et al., 2011).
The dataset can be downloaded as an Excel file (with two worksheet tabs) or as two CSV files: 1) Data and 2) Variable labels.
References:
Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4
Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250–255. https://doi.org/10.1002/cpp.702
Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545–566. https://doi.org/10.1016/S0005-7967(98)00034-5
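The α = .81 reported for the Fear of Missing Out items is presumably Cronbach's alpha, the usual internal-consistency statistic. For k items it is α = k/(k − 1) × (1 − Σ item variances / variance of total scores). Below is a minimal stdlib sketch. It assumes complete numeric responses, with one row per participant and one column per item. The example rows are made up and are not drawn from the dataset, whose column names are not listed here.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per participant).

    Assumes every row has the same k >= 2 items and no missing values.
    Sample variances are used throughout; population variances would give
    the same result because the (n - 1) / n factor cancels in the ratio.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Variance of each item across participants (column-wise)
    item_vars = [variance(col) for col in zip(*rows)]
    # Variance of each participant's total score across participants
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Made-up responses for 4 participants on 3 items (not from the dataset)
example = [
    [1, 2, 3],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
]
print(round(cronbach_alpha(example), 3))  # 0.964
```

The function does no missing-value handling. Rows with skipped items would need to be dropped or imputed first.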
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The average person has 8-9 social media accounts. This has doubled since 2013, when the average person had only 4-5 accounts.
Cristiano Ronaldo had one of the most popular Instagram accounts as of April 2024.
The Portuguese footballer was the most-followed person on the photo-sharing platform, with 628 million followers. Only Instagram's own account ranked higher, with roughly 672 million followers.
How popular is Instagram?
Instagram is a photo-sharing social networking service that lets users take pictures and edit them with filters. Users can post and share their images online and directly with their friends and followers on the network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States, and experts projected this figure would surpass 127 million in 2023.
Who uses Instagram?
Instagram audiences are predominantly young: recent data show that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveal that Instagram is also one of the most popular social media platforms among teens and one of the social networks with the biggest reach among teens in the United States.
Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income through social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first here as well, with an average media value of 985,441 U.S. dollars per Instagram post.
As of April 2024, men aged between 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Facebook's second-largest audience was men aged 18 to 24 years.
Facebook connects the world
Founded in 2004 and public since 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known by the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.
Facebook users
The vast majority of Facebook users access the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia also each have more than 100 million Facebook users.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
This dataset explores how daily digital habits — including social media usage, screen time, and notification exposure — relate to individual productivity, stress, and well-being.
The dataset contains 30,000 real-world-style records simulating behavioral patterns of people with various jobs, social habits, and lifestyle choices. The goal is to understand how different digital behaviors correlate with perceived and actual productivity.
✅ Designed for real-world ML workflows
Includes missing values, noise, and outliers — ideal for practicing data cleaning and preprocessing.
🔗 High correlation between target features
The perceived_productivity_score and actual_productivity_score are strongly correlated, making this dataset suitable for experiments in feature selection and multicollinearity.
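The multicollinearity mentioned above can be quantified with the Pearson correlation r between the two score columns. When those two are the only predictors in a linear model, r also gives the variance inflation factor, VIF = 1 / (1 − r²). Common rules of thumb flag a VIF above 5 or 10. The sketch below uses only the standard library. The five score pairs are hypothetical. In practice, you would first read the columns from the dataset's file and remove missing values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def two_feature_vif(x, y):
    """VIF of either feature when x and y are the only two predictors: 1 / (1 - r**2)."""
    r = pearson_r(x, y)
    denom = 1 - r * r
    return float("inf") if denom <= 1e-12 else 1 / denom

# Hypothetical perceived/actual productivity scores on 0-10 scales
perceived = [6.0, 7.5, 4.0, 8.0, 5.5]
actual = [5.5, 7.0, 4.5, 8.5, 5.0]
r = pearson_r(perceived, actual)
vif = two_feature_vif(perceived, actual)
print(round(r, 2), round(vif, 1))  # 0.94 9.0
```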
🛠️ Feature Engineering playground
Use this dataset to practice feature scaling, encoding, binning, interaction terms, and more.
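Three of the feature-engineering techniques listed above can be sketched in plain Python: min-max scaling to [0, 1], binning against fixed cut points, and an interaction term formed as an element-wise product. The column names come from the table below. The values and the bin edges (1, 3, and 5 hours) are arbitrary illustrations, not thresholds defined by the dataset.

```python
from bisect import bisect_right

def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def bin_values(values, edges, labels):
    """Assign each value a label; edges are sorted inner cut points.

    A value equal to an edge falls into the higher bin (bins are closed on the left).
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]

def interaction(a, b):
    """Element-wise product of two equal-length features (a simple interaction term)."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical values for two columns from the table below
social_time = [0.5, 2.0, 3.5, 6.0]   # daily_social_media_time (hours)
notifications = [20, 60, 110, 200]   # number_of_notifications
scaled_time = min_max_scale(social_time)
usage_band = bin_values(social_time, [1, 3, 5], ["light", "moderate", "heavy", "very heavy"])
time_x_notif = interaction(scaled_time, min_max_scale(notifications))
print(usage_band)  # ['light', 'moderate', 'heavy', 'very heavy']
```

Both inputs are scaled before they are multiplied, so that the interaction term is not dominated by the larger-valued notifications column.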
🧪 Perfect for EDA, regression & classification
You can model productivity, stress, or satisfaction based on behavior patterns and digital exposure.
Column Name | Description |
---|---|
age | Age of the individual (18–65 years) |
gender | Gender identity: Male, Female, or Other |
job_type | Employment sector or status (IT, Education, Student, etc.) |
daily_social_media_time | Average daily time spent on social media (hours) |
social_platform_preference | Most-used social platform (Instagram, TikTok, Telegram, etc.) |
number_of_notifications | Number of mobile/social notifications per day |
work_hours_per_day | Average hours worked each day |
perceived_productivity_score | Self-rated productivity score (scale: 0–10) |
actual_productivity_score | Simulated ground-truth productivity score (scale: 0–10) |
stress_level | Current stress level (scale: 1–10) |
sleep_hours | Average hours of sleep per night |
screen_time_before_sleep | Time spent on screens before sleeping (hours) |
breaks_during_work | Number of breaks taken during work hours |
uses_focus_apps | Whether the user uses digital focus apps (True/False) |
has_digital_wellbeing_enabled | Whether Digital Wellbeing is activated (True/False) |
coffee_consumption_per_day | Number of coffee cups consumed per day |
days_feeling_burnout_per_month | Number of burnout days reported per month |
weekly_offline_hours | Total hours spent offline each week (excluding sleep) |
job_satisfaction_score | Satisfaction with job/life responsibilities (scale: 0–10) |
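The dataset deliberately includes missing values and outliers, so a common first cleaning pass is to impute and cap each numeric column. This sketch uses median imputation and Tukey's 1.5 × IQR fences. Both are conventional choices rather than anything the dataset description prescribes. The values are hypothetical stand-ins for a column such as sleep_hours, with None marking a missing entry.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace None entries with the median of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    m = median(observed)  # raises StatisticsError if every value is missing
    return [m if v is None else v for v in values]

def clip_iqr(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those fences (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]

# Hypothetical sleep_hours column: one missing entry and one implausible value
sleep_hours = [7.0, None, 6.5, 8.0, 7.5, 30.0]
cleaned = clip_iqr(impute_median(sleep_hours))
print(cleaned)
```

The median is used for imputation because, unlike the mean, the outlier does not pull it upward. With very few rows the IQR fences are wide. An extreme value may therefore be capped at the fence rather than brought into a plausible range.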
👉 Sample notebook coming soon with data cleaning, visualization, and productivity prediction!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
In this post, I'll give you all the social media addiction statistics you need to be aware of to moderate your social media use.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
Data for a Brief Report/Short Communication published in Body Image (2021). Details of the study are included below via the abstract from the manuscript. The dataset includes online experimental data from 167 women who were recruited via social media and institutional participant pools. The experiment was completed in Qualtrics. Women viewed either neutral travel images (control), body positivity posts with an average-sized model (e.g., ~ UK size 14), or body positivity posts with a larger model (e.g., UK size 18+); which images each woman viewed is shown in the ‘condition’ variable in the data. The data include the age range, height, weight, calculated BMI, and Instagram use of participants. After viewing the images, women responded to the Positive and Negative Affect Schedule (PANAS), a state version of the Body Satisfaction Scale (BSS), and reported their immediate social comparison with the images (SAC items). Women then selected a lunch for themselves from a hypothetical menu; these selections are detailed in the data, as are the total calories calculated from this and the proportion of their picks which were (provided as a percentage, and as a categorical variable [as used in the paper analyses]). Women also reported whether they were on a special diet (e.g., vegan or vegetarian), had food intolerances, when they last ate, and how hungry they were.
Women also completed trait measures of Body Appreciation (BAS-2) and social comparison (PACS-R). Women were also asked to comment on what they thought the experiment was about. Items and computed scales are included within the dataset. This item includes the dataset collected for the manuscript (in SPSS and CSV formats), the variable list for the CSV file (for users working with the CSV datafile; the variable list and details are contained within the .sav file for the SPSS version), and the SPSS syntax for our analyses (.sps). Also included are the information and consent form (collected via Qualtrics) and the questions as completed by participants (both in PDF format). Please note that the survey order in the PDF is not the same as in the datafiles; users should utilise the variable list (in either CSV or SPSS format) to identify the items in the data. The SPSS syntax can be used to replicate the analyses reported in the Results section of the paper. Annotations within the syntax file guide the user through these.
A copy of SPSS Statistics is needed to open the .sav and .sps files.
Manuscript abstract:
Body Positivity (or ‘BoPo’) social media content may be beneficial for women’s mood and body image, but concerns have been raised that it may reduce motivation for healthy behaviours. This study examines differences in women’s mood, body satisfaction, and hypothetical food choices after viewing BoPo posts (featuring average or larger women) or a neutral travel control. Women (N = 167, 81.8% aged 18-29) were randomly assigned in an online experiment to one of three conditions (BoPo-average, BoPo-larger, or Travel/Control) and viewed three Instagram posts for two minutes, before reporting their mood and body satisfaction, and selecting a meal from a hypothetical menu. Women who viewed the BoPo posts featuring average-size women reported more positive mood than the control group; women who viewed posts featuring larger women did not. There were no effects of condition on negative mood or body satisfaction. Women did not make less healthy food choices than the control in either BoPo condition; women who viewed the BoPo images of larger women showed a stronger association between hunger and calories selected. These findings suggest that concerns over BoPo promoting unhealthy behaviours may be misplaced, but further research is needed regarding women’s responses to different body sizes.
Knowing who your consumers are is essential for businesses, marketers, and researchers. This detailed demographic file offers an in-depth look at American consumers, packed with insights about personal details, household information, financial status, and lifestyle choices. Let's take a closer look at the data:
Personal Identifiers and Basic Demographics
At the heart of this dataset are the key details that make up a consumer profile:
• Unique IDs (PID, HHID) for individuals and households
• Full names (First, Middle, Last) and suffixes
• Gender and age
• Date of birth
• Complete location details (address, city, state, ZIP)
These identifiers are critical for accurate marketing and form the base for deeper analysis.
Geospatial Intelligence
This file goes beyond just listing addresses by including rich geospatial data like:
• Latitude and longitude
• Census tract and block details
• Codes for Metropolitan Statistical Areas (MSA) and Core-Based Statistical Areas (CBSA)
• County size codes
• Geocoding accuracy
This allows for precise geographic segmentation and localized marketing.
Housing and Property Data
The dataset covers a lot of ground when it comes to housing, providing valuable insights for real estate professionals, lenders, and home service providers:
• Homeownership status
• Dwelling type (single-family, multi-family, etc.)
• Property values (market, assessed, and appraised)
• Year built and square footage
• Room count, amenities like fireplaces or pools, and building quality
This data is crucial for targeting homeowners with products and services like refinancing or home improvement offers.
Wealth and Financial Data
For a deeper dive into consumer wealth, the file includes:
• Estimated household income
• Wealth scores
• Credit card usage
• Mortgage info (loan amounts, rates, terms)
• Home equity estimates and investment property ownership
These indicators are invaluable for financial services, luxury brands, and fundraising organizations looking to reach affluent individuals.
Lifestyle and Interests
One of the most useful features of the dataset is its extensive lifestyle segmentation:
• Hobbies and interests (e.g., gardening, travel, sports)
• Book preferences, magazine subscriptions
• Outdoor activities (camping, fishing, hunting)
• Pet ownership, tech usage, political views, and religious affiliations
This data is perfect for crafting personalized marketing campaigns and developing products that align with specific consumer preferences.
Consumer Behavior and Purchase Habits
The file also sheds light on how consumers behave and shop:
• Online and catalog shopping preferences
• Gift-giving tendencies, presence of children, vehicle ownership
• Media consumption (TV, radio, internet)
Retailers and e-commerce businesses will find this behavioral data especially useful for tailoring their outreach.
Demographic Clusters and Segmentation
Pre-built segments make it easier to quickly target specific demographics, streamlining market analysis and campaign planning. They include:
• Household, neighborhood, family, and digital clusters
• Generational and lifestage groups
Ethnicity and Language Preferences
In today's multicultural market, knowing your audience's cultural background is key. The file includes:
• Ethnicity codes and language preferences
• Flags for Hispanic/Spanish-speaking households
This helps ensure culturally relevant and sensitive communication.
Education and Occupation Data
The dataset also tracks education and career info:
• Education level and occupation codes
• Home-based business indicators
This data is essential for B2B marketers, recruitment agencies, and education-focused campaigns.
Digital and Social Media Habits
With everyone online, digital behavior insights are a must:
• Internet, TV, radio, and magazine usage
• Social media platform engagement (Facebook, Instagram, LinkedIn)
• Streaming subscriptions (Netflix, Hulu)
This data helps marketers, app developers, and social media managers connect with their audience in the digital space.
Political and Charitable Tendencies
For political campaigns or non-profits, this dataset offers:
• Political affiliations and outlook
• Charitable donation history
• Volunteer activities
These insights are perfect for cause-related marketing and targeted political outreach.
Neighborhood Characteristics
By incorporating census data, the file provides a bigger picture of the consumer's environment:
• Population density, racial composition, and age distribution
• Housing occupancy and ownership rates
This offers important context for understanding the demographic landscape.
Predictive Consumer Indexes
The dataset includes forward-looking indicators in categories like:
• Fashion, automotive, and beauty products
• Health, home decor, pet products, sports, and travel
These predictive insights help businesses anticipate consumer trends and needs.
Contact Information
Finally, the file includes ke...
A structured, self-report online questionnaire designed by our research team was used to develop a customized dataset. The questionnaire comprised four main sections:
• Demographics: age, gender, and education.
• Technology and social media use: daily hours of screen time, time spent on social media, main platforms used, and preferred purpose of technology use (work or leisure).
• Psychological and cognitive indicators: self-rated concentration during the study (1–5), number of interruptions, change in mood following technology use, and perceived difficulty concentrating while using social media.
• Self-awareness and coping: perception of overuse, concerns about technology use, use of apps to reduce mental fatigue, and use of strategies to reduce usage time.
The responses were numerical. Respondents with missing or invalid responses were removed during the pre-processing stage. A new binary response variable was defined: Brain Rot (Yes/No). A participant was deemed to have brain rot if they demonstrated 3 or more of the following 6 patterns:
• Social media use ≥ 3 hours per day
• Screen time ≥ 4 hours per day
• Focus level ≤ 2 out of 5
• Reports frequent distraction
• Notices a mood shift when using technology
• Thinks social media is bad for mental health
This was the target variable and the outcome label for classification. Before analysis, the dataset was cleaned and pre-processed as follows:
• Elimination of incomplete or contradictory records
• Conversion of categorical variables into numerical form (namely, yes = 1, no = 0)
• Normalization of numerical features, where necessary
• Treatment of outliers and testing for normality
The final dataset was balanced, well formatted for statistical and machine learning analyses, and presented with well-defined input features and a binary classification output.
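The labeling rule above is simple enough to express directly. This sketch encodes yes/no answers as 1/0, as in the pre-processing step, and labels a respondent 1 (Brain Rot = Yes) when at least three of the six patterns hold. The field names are hypothetical, since the description does not publish the dataset's column names. "Reports frequent distraction" is also assumed to be a yes/no item.

```python
from dataclasses import dataclass

YES_NO = {"yes": 1, "no": 0}

def encode_yes_no(answer: str) -> int:
    """Map a yes/no answer (case and surrounding spaces ignored) to 1/0."""
    return YES_NO[answer.strip().lower()]

@dataclass
class Respondent:
    # Hypothetical field names; the source does not list the actual columns.
    social_media_hours: float   # daily time on social media (hours)
    screen_time_hours: float    # total daily screen time (hours)
    focus_level: int            # self-rated concentration, 1-5
    frequent_distraction: int   # 1 = reports frequent distraction
    mood_shift: int             # 1 = notices a mood shift with technology use
    thinks_sm_harmful: int      # 1 = thinks social media is bad for mental health

def brain_rot_patterns(r: Respondent) -> int:
    """Count how many of the six patterns a respondent shows."""
    checks = [
        r.social_media_hours >= 3,
        r.screen_time_hours >= 4,
        r.focus_level <= 2,
        r.frequent_distraction == 1,
        r.mood_shift == 1,
        r.thinks_sm_harmful == 1,
    ]
    return sum(checks)  # True counts as 1, False as 0

def brain_rot_label(r: Respondent) -> int:
    """Binary target: 1 (Yes) if 3 or more of the 6 patterns are present, else 0 (No)."""
    return int(brain_rot_patterns(r) >= 3)

example = Respondent(
    social_media_hours=3.5,
    screen_time_hours=5.0,
    focus_level=2,
    frequent_distraction=encode_yes_no("No"),
    mood_shift=encode_yes_no("yes"),
    thinks_sm_harmful=0,
)
print(brain_rot_label(example))  # 1
```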
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Overview
The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.
Task Uniqueness: The task of controllable multimodal feedback synthesis is unique. It is distinct from LLMs and from tasks like VisDial, and it is not addressed by multimodal LLMs. LLMs often exhibit errors and hallucinations, and their auto-regressive, black-box nature can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, as detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and learn sentiments. This controllability and interpretability aim to inspire new methodologies in related fields.
Data Collection and Annotation
Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction because of the following factors:
• Facebook uniquely provides metadata such as news article links, post shares, post reactions, comment likes, comment rank, comment reaction rank, and relevance scores, which are not available on other platforms.
• Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
• Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref].
• The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]
Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
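The two filtering levels can be sketched as a single predicate applied to each comment. The sketch assumes each comment carries four binary model labels (1 = positive, 0 = negative) and one sentiment probability. The text does not say how that probability is aggregated across the models, so it is taken as an input. Treating 0.49 and 0.51 as inclusive bounds is also an assumption.

```python
def keep_comment(votes, prob, min_agree=3, margin=(0.49, 0.51)):
    """Return True if a comment passes both filters described above.

    votes: per-model labels, 1 = positive, 0 = negative (four models in CMFeed).
    prob:  one sentiment probability for the comment; how it is derived from
           the four models is an assumption left to the caller.
    """
    positives = sum(votes)
    majority = max(positives, len(votes) - positives)
    # a) model agreement filter: at least `min_agree` models must share a label
    if majority < min_agree:
        return False
    # b) probability safety margin: drop low-confidence scores (inclusive bounds assumed)
    low, high = margin
    if low <= prob <= high:
        return False
    return True

print(keep_comment([1, 1, 1, 0], 0.80))  # True
print(keep_comment([1, 1, 0, 0], 0.80))  # False (2-2 split)
print(keep_comment([1, 1, 1, 1], 0.50))  # False (inside the safety margin)
```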
After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
Dataset Description
• Total Samples: 61,734
• Total Samples Annotated: 57,222 after filtering.
• Total Posts: 3,646
• Average Likes per Post: 65.1
• Average Likes per Comment: 10.5
• Average Length of News Text: 655 words
• Average Number of Images per Post: 3.7
Components of the Dataset
The dataset comprises two main components:
• CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
• Images Folder: Contains folders with images corresponding to each post.
Data Format and Fields of the CSV File
The dataset consists of the CMFeed.csv file along with corresponding images in related folders. This CSV file includes the following fields:
• Id: Unique identifier
• Post: The heading of the news article.
• News_text: The text of the news article.
• News_link: URL link to the original news article.
• News_Images: A path to the folder containing images related to the post.
• Post_shares: Number of times the post has been shared.
• Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
• Comment: Text of the user comment.
• Comment_like: Number of likes on the comment.
• Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
• Comment_link: URL link to the original comment on Facebook.
• Comment_rank: Rank of the comment based on engagement and relevance.
• Score: Sentiment score computed based on the consensus of sentiment analysis models.
• Agreement: Indicates the consensus level among the sentiment models, ranging from -4 (all negative) to 4 (all positive). For example, 3 negative and 1 positive vote result in -2, and 3 positive and 1 negative vote result in +2.
• Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
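The Agreement field can be read as a sum of ±1 votes, one per model. That reading reproduces both worked cases in the field list: 3 negative + 1 positive gives -2, and 3 positive + 1 negative gives +2. The sketch below assumes this reading. It further assumes that Sentiment_class follows the sign of Agreement. The field list confirms neither assumption.

```python
def agreement(votes):
    """Assumed Agreement: +1 per positive model vote (1), -1 per negative vote (0).

    With four models this ranges from -4 (all negative) to +4 (all positive).
    """
    return sum(1 if v == 1 else -1 for v in votes)

def sentiment_class(votes):
    """Assumed mapping: 1 (positive) if Agreement > 0, 0 (negative) if Agreement < 0.

    A 2-2 tie (Agreement == 0) fails the 3-of-4 agreement filter, so it gets no class.
    """
    a = agreement(votes)
    if a == 0:
        raise ValueError("tied vote: comment would be removed by the agreement filter")
    return 1 if a > 0 else 0

print(agreement([0, 0, 0, 1]), agreement([1, 1, 1, 0]))  # -2 2
```

With four models, only even values (-4, -2, 0, 2, 4) can occur. The 3-of-4 agreement filter is therefore equivalent to requiring |Agreement| >= 2.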
More Considerations During Dataset Construction
We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:
• Why not merge data from different social media platforms? We chose not to merge data from platforms such as Reddit and Twitter with Facebook because those platforms lack comprehensive metadata, clear ethical guidelines, and control mechanisms (such as who can comment and whether users' anonymity is maintained). These factors are critical for our analysis. Focusing on Facebook alone was crucial to ensure consistency in data quality and format.
• Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. Many other news handles, such as NDTV, The Hindu, Xinhua, and SCMP, are also large-scale but may contain information in regional languages (e.g., Indian languages and Chinese); hence, they were not selected. This selection ensures a broad spectrum of political discourse and audience engagement.
• Dataset Generalizability and Bias: With 3.07 billion of the total 5 billion social media users, the extensive user base of Facebook, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News), and spanning political spectra from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.
• Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, which required around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1,000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4,000 posts were collected; after preprocessing (detailed in Section 3.1), 3,646 posts remained. We then processed all associated comments, resulting in a total of 61,734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.
Ethical considerations, data privacy and misuse prevention
The data collection adheres to Facebook’s ethical guidelines [https://developers.facebook.com/terms/]