How much time do people spend on social media?
As of 2025, internet users worldwide spent an average of 141 minutes per day on social media, down from 143 minutes in the previous year. Brazil currently leads in daily time spent, with online users averaging 3 hours and 49 minutes on social media each day. In comparison, daily time spent on social media in the U.S. was just 2 hours and 16 minutes.
Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively.
People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.
Global impact of social media
Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
In 2023, Meta Platforms had a total annual revenue of over 134 billion U.S. dollars, up from 116 billion in 2022. LinkedIn reported its highest annual revenue to date, generating over 15 billion USD, whilst Snapchat reported an annual revenue of 4.6 billion USD.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The 2020 presidential election saw election officials experience physical and social media threats, harassment, and animosity. Although little research exists on the subject, observers noted a sharp increase in animosity toward US election officials in 2020. The harassment of election officials hindered their work in administering a free and fair election and may have generated doubts about electoral integrity. Our study: (1) Proposes a unique measurement and modeling strategy applicable across many social media networks to study toxicity directed at officials, institutions, or groups; (2) Collects a novel dataset of social media conversations about election administration in the 2020 election; (3) Uses joint sentiment-topic modeling to identify toxicity from the reactions of the public and election officials, and uses dynamic vector autoregression models to determine the temporal structure of the toxic conversations directed at election officials; (4) Finds that the level of animosity toward election officials spikes immediately after the election, that hostile topics overall make up about a quarter of the discussion share during this period, increasing to about 60% following the election, and that hostile topics come from left- and right-wing partisans. Our article concludes by discussing how similar data collection and topic modeling approaches could be deployed in future elections to monitor trolling and harassment of election officials, and to mitigate similar threats to successful election administration globally.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion DATA
Title: Using social media and personality traits to assess software developers' emotional polarity
Authors: Leo Moreira Silva, Marília Gurgel Castro, Miriam Bernardino Silva, Milena Santos, Uirá Kulesza, Margarida Lima, Henrique Madeira
Journal: PeerJ Computer Science
Github: https://github.com/leosilva/peerj_computer_science_2022
The folders contain:
Experiment_Protocol.pdf: document presenting the protocol for participant recruitment, data collection of public posts from Twitter, criteria for manual analysis, and the assessment of Big Five factors from participants and psychologists. English version.
/analysis
analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications
/dataset
alldata.json: contains the dataset used in the paper
/ethics_committee
committee_response_english_version.pdf: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. English version.
committee_response_original_portuguese_version: contains the acceptance response of the Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra. Portuguese version.
committee_submission_form_english_version.pdf: the project submitted to the committee. English version.
committee_submission_form_original_portuguese_version.pdf: the project submitted to the committee. Portuguese version.
consent_form_english_version.pdf: declaration of free and informed consent completed by participants. English version.
consent_form_original_portuguese_version.pdf: declaration of free and informed consent completed by participants. Portuguese version.
data_protection_declaration_english_version.pdf: personal data and privacy declaration, according to European Union General Data Protection Regulation. English version.
data_protection_declaration_original_portuguese_version.pdf: personal data and privacy declaration, according to European Union General Data Protection Regulation. Portuguese version.
/notebooks
General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper
Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study
Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results
Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results
Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics between the psychologists' and participants' manual analysis
Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during the work period and those posted outside of the working period
/surveys
Demographic_Survey_english_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. English version.
Demographic_Survey_portuguese_version.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts. Portuguese version.
Demographic_Survey_answers.xlsx: participants' demographic survey answers
ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
ibf_en.doc: English translation of the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits.
ibf_answers.xlsx: participants' and psychologists' answers for BFI
We have removed any sensitive data from the dataset to protect participants' privacy and anonymity. We have also removed any sensitive data from the demographic survey answers to protect participants' privacy and anonymity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Social media in general provide great opportunities for mining massive amounts of text, image, and video-based data. However, what questions can be addressed by analyzing such data? In this review, we focus on microblogging services and discuss applications of streaming data from the scientific literature. We concentrate on text-based approaches because they represent by far the largest cohort of studies, and we present a taxonomy of studied problems.
As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of the global Instagram population was aged 34 years or younger.
Teens and social media
As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.
Social media can also have negative effects on teens, which are much more pronounced among those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, while in comparison only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the raw results of the topic mapping presented at https://ngitopics.delabapps.eu. The dataset includes the title of and link to each collected article, as well as its location on the map. There is a separate dataset for each analysed map: six umbrella topics + 6 regions. For details on data collection and methodology see: https://ngitopics.delabapps.eu/report.pdf and https://ngitopics.delabapps.eu/report_multilanguage.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database is comprised of 951 participants who provided self-report data online in their school classrooms. The data was collected in 2016 and 2017. The dataset is comprised of 509 males (54%) and 442 females (46%). Their ages ranged from 12 to 16 years (M = 13.69, SD = 0.72). Seven participants did not report their age. The majority were born in Australia (N = 849, 89%). The next most common countries of birth were China (N = 24, 2.5%), the UK (N = 23, 2.4%), and the USA (N = 9, 0.9%). Data were drawn from students at five Australian independent secondary schools. The data contains item responses for the Spence Children’s Anxiety Scale (SCAS; Spence, 1998), which is comprised of 44 items. Social media use was assessed with the question “How often do you use social media?”, with response options ranging from “constantly” to “once a week or less”. Fear of Missing Out was measured with the following five questions based on the APS Stress and Wellbeing in Australia Survey (APS, 2015): “When I have a good time it is important for me to share the details online; I am afraid that I will miss out on something if I don’t stay connected to my online social networks; I feel worried and uncomfortable when I can’t access my social media accounts; I find it difficult to relax or sleep after spending time on social networking sites; I feel my brain burnout with the constant connectivity of social media.” Internal consistency for this measure was α = .81. Self-compassion was measured using the 12-item short form of the Self-Compassion Scale (SCS-SF; Raes et al., 2011). The dataset can be downloaded as an Excel file (composed of two worksheet tabs) or as CSV files: 1) Data and 2) Variable labels.
References:
Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4
Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250-255. https://doi.org/10.1002/cpp.702
Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545-566. https://doi.org/10.1016/S0005-7967(98)00034-5
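The internal consistency statistic reported for the Fear of Missing Out items (α = .81) is Cronbach's alpha. As a minimal sketch of how such a value could be computed from item-level responses, the snippet below applies the standard formula to a small table of made-up scores. The response values and the function name are hypothetical and are not taken from the dataset.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents answering five items on a 1-5 scale.
example_responses = [
    [4, 5, 4, 3, 4],
    [2, 2, 3, 2, 1],
    [5, 4, 5, 4, 5],
    [3, 3, 2, 3, 3],
    [1, 2, 1, 2, 2],
]

example_alpha = cronbach_alpha(example_responses)
print(f"alpha = {example_alpha:.2f}")
```

Values closer to 1 indicate that the items move together across respondents, which is the sense in which a measure is described as internally consistent.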
Social Networking Market Size 2025-2029
The social networking market size is forecast to increase by USD 312.3 billion, at a CAGR of 21.6% between 2024 and 2029.
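The relationship between a forecast increase and a compound annual growth rate (CAGR) can be sketched as follows. This is an illustration only: it assumes the 21.6% rate compounds annually over the five years from 2024 to 2029 and that USD 312.3 billion is the absolute increase over that period. The function names and the implied base-year size are hypothetical and do not come from the report.

```python
def implied_base(increase, cagr, years):
    """Base-year size implied by an absolute increase and a compound growth rate."""
    growth_factor = (1 + cagr) ** years
    return increase / (growth_factor - 1)


def project(base, cagr, years):
    """Size after compounding `base` at `cagr` for `years` periods."""
    return base * (1 + cagr) ** years


# Assumption: five annual compounding periods between 2024 and 2029.
base_2024 = implied_base(312.3, 0.216, 5)
size_2029 = project(base_2024, 0.216, 5)

print(f"implied 2024 size: USD {base_2024:.1f} billion")
print(f"implied 2029 size: USD {size_2029:.1f} billion")
```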
The market is experiencing significant growth, driven by the increasing internet penetration worldwide. This expansion is fueled by the rising number of active social media users, enabling businesses to reach a larger audience through digital platforms. However, the market's growth is not without challenges. Privacy concerns are increasingly obstructing market expansion, as users become more conscious of their online data and demand greater control over their information. Social media advertisements, a major revenue source for social networking companies, are gaining traction, creating intense competition among market players. Companies must navigate these challenges by addressing privacy concerns through transparent data handling policies and effective user data protection measures.
Additionally, innovation in advertising formats and targeting strategies will be crucial for businesses to differentiate themselves and maintain a competitive edge. In summary, the market presents both opportunities and challenges, with increasing internet penetration driving growth while privacy concerns and intense competition shaping the strategic landscape. Companies must effectively address these challenges to capitalize on the market's potential and stay ahead of the competition.
What will be the Size of the Social Networking Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, with dynamic patterns emerging across various sectors. Customer acquisition and sales conversion are key areas of focus, as social CRM and mobile marketing strategies gain traction. User engagement remains a priority, with social listening and social network analysis providing valuable insights. Big data and data analytics play a crucial role in informing business decisions, while media relations and crisis communication strategies adapt to the digital landscape. Influencer marketing and viral marketing campaigns continue to shape consumer behavior, with conversion optimization and organic reach driving growth. Live streaming and user-generated content offer new opportunities for brands to engage with audiences.
Data visualization and machine learning are transforming how businesses analyze and respond to market trends. E-commerce platforms and social commerce are disrupting traditional retail models, with advertising platforms and social media marketing becoming essential tools for businesses. Algorithm updates and link building strategies impact search engine optimization and content strategy. Privacy concerns and network externalities are shaping the platform economics, while network effects drive user growth. Content creation tools and search engine optimization are essential for effective brand building, with public relations and sentiment analysis playing a critical role in reputation management. Video marketing and customer satisfaction are key drivers of brand loyalty, with data security and competitor analysis essential for maintaining a competitive edge. Social media platforms continue to evolve, offering new opportunities for businesses to connect with their audiences and build strong brands.
How is this Social Networking Industry segmented?
The social networking industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Type
  Advertising
  In-app purchase
  Paid apps
Distribution Channel
  Google
  Apple
  App Store Distribution
Service
  Communication
  Entertainment
  Socialization
  Marketing
  Customer service
Platform
  Website-based
  Mobile apps
  Hybrid platforms
Geography
  North America
    US
    Canada
  Europe
    France
    Germany
    Italy
    Spain
    UK
  Middle East and Africa
    UAE
  APAC
    China
    India
    Japan
    South Korea
  South America
    Brazil
  Rest of World (ROW)
By Type Insights
The advertising segment is estimated to witness significant growth during the forecast period.
In the dynamic landscape of the market, various entities intertwine to shape its evolution. Big data and machine learning fuel social media analytics, enabling targeted advertising, conversion optimization, and customer satisfaction. Social listening and sentiment analysis inform brand monitoring, reputation management, and crisis communication. Social CRM and community management foster customer loyalty and engagement. Mobile marketing, including user-generated content and live streaming, e
http://reference.data.gov.uk/id/open-government-licence
Monthly Tweetreach reports monitoring the online metrics for the Mayor of London's 'Ask Boris' Twitter sessions. Each report is generated against the search term, hashtag #askboris and is run from the day before the session starts to the end of the day the session takes place. Data includes:
Reach: The number of unique Twitter accounts that received tweets about the session
Exposure: The number of impressions generated by tweets in the report
Activity: Total number of tweets, contributors, time period and volume
Type: Number of tweets, retweets and replies
Timeline: A full list of tweets
Notes: A full description of Tweetreach analytics and descriptors is available on www.tweetreach.com or in the article Understanding the TweetReach snapshot report
Please note that due to limitations with the listening tool not all tweets from Ask Boris sessions are captured in the reports.
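The metrics listed above can be illustrated with a minimal sketch. It assumes each captured tweet records its author, its type, and the set of accounts that received it, and it treats exposure as the sum of recipients per tweet. These field names and the way exposure is counted are assumptions made for illustration and may differ from how TweetReach computes its figures.

```python
from collections import Counter

# Hypothetical sample of captured tweets for one session.
sample_tweets = [
    {"author": "alice", "kind": "tweet", "recipients": {"u1", "u2", "u3"}},
    {"author": "bob", "kind": "retweet", "recipients": {"u2", "u4"}},
    {"author": "alice", "kind": "reply", "recipients": {"u1", "u2", "u3"}},
]


def summarize(tweets):
    """Compute reach, exposure, activity and type counts for a list of tweets."""
    reached_accounts = set()
    exposure = 0
    kinds = Counter()
    contributors = set()
    for tweet in tweets:
        # Reach counts each receiving account once, however many tweets it saw.
        reached_accounts |= tweet["recipients"]
        # Exposure counts every delivery, so repeat deliveries add up.
        exposure += len(tweet["recipients"])
        kinds[tweet["kind"]] += 1
        contributors.add(tweet["author"])
    return {
        "reach": len(reached_accounts),
        "exposure": exposure,
        "tweets": len(tweets),
        "contributors": len(contributors),
        "by_type": dict(kinds),
    }


report = summarize(sample_tweets)
print(report)
```

Under these assumptions exposure is always at least as large as reach, because an account that receives several tweets is counted once for reach but once per tweet for exposure.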
Tweet Reach report - 28 June 2012
Tweet Reach report - 20 July 2012
Tweet Reach report - 30 August 2012
Tweet Reach report - 28 September 2012
Tweet Reach report - 29 October 2012
Tweet Reach report - 23 November 2012
Tweet Reach report - 20 December 2012
Tweet Reach report - 18 January 2013
Tweet Reach report - 25 February 2013
Tweet Reach report - 22 March 2013
Tweet Reach report - 26 April 2013
Tweet Reach report - 20 June 2013
Tweet Reach report - 18 July 2013
Tweet Reach report - 29 August 2013
Tweet Reach report - 23 September 2013
Tweet Reach report - 22 October 2013
Tweet Reach report - 25 November 2013
Tweet Reach report - 13 December 2013
Tweet Reach report - 15 January 2014
Tweet Reach report - 13 February 2014
Tweet Reach report - 27 March 2014
Tweet Reach report - 29 May 2014
Tweet Reach report - 26 June 2014
Tweet Reach report - 16 July 2014
Tweet Reach report - 5 August 2014
Tweet Reach report - 11 September 2014
Tweet Reach report - 20 October 2014
Tweet Reach report - 10 November 2014
Tweet Reach report - 19 December 2014
Tweet Reach report - 20 January 2015
Tweet Reach report - 19 March 2015
Tweet Reach report - 19 May 2015
Tweet Reach report - 25 June 2015
The World Bank conducted an in-depth analysis of the digital economy in Indonesia through the Digital Economic Household Survey (DEHS). The plan to survey 6,600 households was disrupted due to the pandemic. Thus, the DEHS dataset contains 3,063 households (HHs) out of planned 6,600 HHs (46%) from 311 enumeration areas (EAs) out of the planned 660 EAs.
The datasets contain household and individual data. Separate data files are provided for particular modules containing matrix-style questions. All household-level datasets contain the variable "hhid" as household identifier, whereas individual-level datasets contain both "hhid" and "hh_memberid" to identify individuals. These variables can be used for merging purposes across data files.
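As a minimal sketch of the merging described above, the snippet below joins individual-level rows to household-level rows on "hhid", and joins two individual-level files on the pair ("hhid", "hh_memberid"). Only those two identifier names come from the dataset description. The other column names, the sample values, and the function names are hypothetical, and plain dictionaries stand in for the actual data files.

```python
# Hypothetical household-level rows (one row per household).
households = [
    {"hhid": "H001", "region": "Java"},
    {"hhid": "H002", "region": "Sulawesi"},
]

# Hypothetical individual-level rows from two different modules.
individuals = [
    {"hhid": "H001", "hh_memberid": 1, "uses_internet": True},
    {"hhid": "H001", "hh_memberid": 2, "uses_internet": False},
    {"hhid": "H002", "hh_memberid": 1, "uses_internet": True},
]
finance = [
    {"hhid": "H001", "hh_memberid": 1, "has_ewallet": True},
    {"hhid": "H002", "hh_memberid": 1, "has_ewallet": False},
]


def attach_household(individual_rows, household_rows):
    """Left-join household attributes onto each individual via hhid."""
    by_hhid = {row["hhid"]: row for row in household_rows}
    return [{**by_hhid.get(row["hhid"], {}), **row} for row in individual_rows]


def merge_individual_files(left_rows, right_rows):
    """Left-join two individual-level files on (hhid, hh_memberid)."""
    def key(row):
        return (row["hhid"], row["hh_memberid"])

    right_by_key = {key(row): row for row in right_rows}
    return [{**row, **right_by_key.get(key(row), {})} for row in left_rows]


with_household = attach_household(individuals, households)
with_finance = merge_individual_files(individuals, finance)
print(with_household[1])
print(with_finance[0])
```

Both helpers keep every row of the left-hand file, so individuals without a match simply lack the extra columns rather than being dropped.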
There are 6 modules available in this dataset: Module 1 contains general household-level information, including demographics, dwelling and ICT device usage. Module 2 asks about internet access and use, including device ownership, social media use, internet affordability, side effects and digital skills. Module 3 contains information related to service delivery, including government services, social assistance, education and health. Module 5 probes information related to household e-commerce activities as buyers and digital on-demand services. Module 6 focuses on use of digital finance in the household. Lastly, Module 9 collects information related to household enterprise activities, which includes e-commerce activities as sellers.
The survey is representative of major island regions in Indonesia (Sumatera, Java, Nusa Tenggara, Kalimantan, Sulawesi, Maluku, Papua).
Individual, Household
The survey uses Stratified Four-Stage PPES (Probability Proportional to Estimated Size) Sampling. As Primary Sampling Units (PSUs), districts in each region were stratified into 'rural' or 'urban'. Villages are Secondary Sampling Units (SSUs), while hamlets and households are Tertiary Sampling Units (TSUs) and Ultimate Sampling Units (USUs), respectively. Eligible villages are defined as villages with internet signal, regardless of the quality of the signal (4G, 3G, or 2.5G), based on Podes 2018 data.
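A single stage of probability-proportional-to-size selection can be sketched as follows. The snippet uses systematic selection along the cumulative size total, which is one common way to implement it. The survey documentation does not state which selection procedure was used, so the method, the unit names, and the sizes here are assumptions for illustration only.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size (systematic PPS).

    `sizes` maps unit name to its estimated size. A unit whose size exceeds
    the sampling interval can be selected more than once.
    """
    total = sum(sizes.values())
    interval = total / n
    # Random start inside the first interval, then equally spaced targets.
    start = rng.random() * interval
    targets = [start + i * interval for i in range(n)]
    selected = []
    cumulative = 0
    next_target = 0
    for unit, size in sizes.items():
        cumulative += size
        # Select this unit for every target that falls inside its size range.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(unit)
            next_target += 1
    return selected


# Hypothetical estimated sizes (for example, household counts) for four villages.
village_sizes = {"village_a": 100, "village_b": 300, "village_c": 50, "village_d": 550}
sample = pps_systematic(village_sizes, 2, random.Random(42))
print(sample)
```

In a multi-stage design such as the one described, a step like this would be repeated within each stratum and at each stage, with the selected units of one stage defining the frame for the next.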
The survey did not deviate from its sample design. However, the survey was unable to obtain its full sample (only 3,063 out of 6,600 households) due to early termination of the survey because of COVID-related restrictions.
Computer Assisted Personal Interview [capi]
The DEHS questionnaire includes the following modules:
Module 1: General Information (01_DEHS_Questionnaire_General_Module_final_070620_eng.pdf)
Module 2: Internet Access and Use (02_DEHS_Questionnaire_internet_access_and_use_final_070620_eng.pdf)
Module 3: Service Delivery (03_DEHS_Questionnaire_Service_Delivery_final_070620_eng.pdf)
Module 5: E-commerce (05_DEHS_Questionnaire_e_Commerce_final_070620_eng.pdf)
Module 6: Finance (06_DEHS_Questionnaire_Finance_final_070620_eng.pdf)
Module 9: HH Enterprise (09_DEHS_Questionnaire_HH_Enterprise_final_070620_eng.pdf)
Note: The initial survey design also included module 7 (last mile internet service delivery) and module 8 (community retail price). However, both modules were ultimately dropped in order to save enumeration time, and reduce respondent fatigue.
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram’s Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
Purpose
Around 5% of the United States (U.S.) population identifies as Sexual and Gender Diverse (SGD), yet there is limited research around cancer prevention among these populations. We present multi-pronged, low-cost, and systematic recruitment strategies used to reach SGD communities in New Mexico (NM), a state that is both largely rural and racially/ethnically classified as a “majority-minority” state.
Methods
Our recruitment focused on using: (1) the Every Door Direct Mail (EDDM) program of the United States Postal Service (USPS); (2) Google and Facebook advertisements; (3) organizational outreach via emails to publicly available SGD-friendly business contacts; (4) personal outreach via flyers at clinical and community settings across NM. Guided by previous research, we provide detailed descriptions of strategies to check for fraudulent and suspicious online responses and ensure data integrity.
Results
A total of 27,369 flyers were distributed through the EDDM program and 436,177 impressions were made through the Google and Facebook ads. We received a total of 6,920 responses on the eligibility survey. For the 5,037 eligible respondents, we received 3,120 (61.9%) complete responses. Of these, 13% (406/3120) were fraudulent/suspicious based on research-informed criteria and were removed. Final analysis included 2,534 respondents, of which the majority (59.9%) reported hearing about the study from social media. Of the respondents, 49.5% were between 31-40 years, 39.5% were Black, Hispanic, or American Indian/Alaskan Native, and 45.9% had an annual household income below $50,000. Over half (55.3%) were assigned male, 40.4% were assigned female, and 4.3% were assigned intersex at birth. Transgender respondents made up 10.6% (n=267) of the respondents. In terms of sexual orientation, 54.1% (n=1371) reported being gay or lesbian, 30% (n=749) bisexual, and 15.8% (n=401) queer. A total of 756 (29.8%) respondents reported receiving a cancer diagnosis and among screen-eligible respondents, 66.2% reported ever having a Pap, 78.6% reported ever having a mammogram, and 84.1% reported ever having a colonoscopy. Over half of eligible respondents (58.7%) reported receiving Human Papillomavirus vaccinations.
Conclusion
Study findings showcase effective strategies to reach communities, maximize data quality, and prevent the misrepresentation of data critical to improve health in SGD communities.
https://www.thebusinessresearchcompany.com/privacy-policy
The global social media market size is expected to reach $341.7 billion by 2029 at 13.2%. The market is segmented by type: social media advertisement, social media subscription
This data collection consists of anonymised survey data collected as part of a study into how people perceive the likelihood and risk of inferring sensitive information from social media data when injecting conflicts and uncertainty. Electronic files include an XLS spreadsheet of collected survey responses and PDF versions of the online survey instrument.
There is now a broad consensus that new forms of social data emerging from people’s day-to-day activities on the web have the potential to transform the social sciences. However, there is also agreement that current analytical techniques fall short of the methodological standards required for academic research and policymaking and that conclusions drawn from social media data have much greater utility when combined with results drawn from other datasets (including various public sector resources made available through open data initiatives). In this proposal we outline the case for further investigations into the challenges surrounding social media data and the social sciences. Aspects of the work will involve analysis of social media data in a number of contexts, including:
- transport disruption around the 2014 Commonwealth Games (Glasgow)
- news stories about Scottish independence and UK-EU relations
- island communities in the Western Isles.
Guided by insights from these case studies we will:
- develop a suite of software tools to support various aspects of data analysis and curation;
- provide guidance on ethical considerations surrounding analysis of social media data;
- deliver training workshops for social science researchers;
- engage with the public on this important topic through a series of festivals (food, music, science).
This archive contains the data and code required to replicate Barberá et al, "Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data" (APSR). See README.pdf for additional information. A copy of the code is also available at: https://github.com/SMAPPNYU/lead_follow_apsr
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion DATA
Title:
Using social media and personality traits to assess software developers’ emotions
Authors:
Leo Moreira Silva
Marília Gurgel Castro
Miriam Bernardino Silva
Milena Nestor Santos
Uirá Kulesza
Margarida Lima
Henrique Madeira
Journal:
PeerJ Computer Science
The folders contain:
/analysis
analyzed_tweets_by_psychologists.csv: file containing the manual analysis done by psychologists
analyzed_tweets_by_participants.csv: file containing the manual analysis done by participants
analyzed_tweets_by_psychologists_solved_divergencies.csv: file containing the manual analysis done by psychologists over 51 divergent tweets' classifications
/dataset
alldata.json: contains the dataset used in the paper
/ethics_committee
committee_response.pdf: contains the acceptance response of Research Ethics and Deontology Committee of the Faculty of Psychology and Educational Sciences of the University of Coimbra.
committee_submission_form.pdf: the project submitted to the committee.
consent_form.pdf: declaration of free and informed consent completed by participants.
data_protection_declaration.pdf: personal data and privacy declaration, according to European Union General Data Protection Regulation.
/notebooks
General - Charts.ipynb: notebook file containing all charts produced in the study, including those in the paper
Statistics - Lexicons and Ensembles.ipynb: notebook file with the statistics for the five lexicons and ensembles used in the study
Statistics - Linear Regression.ipynb: notebook file with the multiple linear regression results
Statistics - Polynomial Regression.ipynb: notebook file with the polynomial regression results
Statistics - Psychologists versus Participants.ipynb: notebook file with the statistics between the psychologists and participants manual analysis
Statistics - Working x Non-working.ipynb: notebook file containing the statistical analysis for the tweets posted during work period and those posted outside of working period
/surveys
Demographic_Survey.pdf: survey inviting participants to enroll in the study. We collect demographic data and participants' authorization to access their public Tweet posts
Demographic_Survey_answers.xlsx: participants' demographic survey answers
ibf_pt_br.doc: the Portuguese version of the Big Five Inventory (BFI) instrument to infer participants' Big Five polarity traits
ibf_answers.xlsx: participants' and psychologists' answers for BFI
Experiment Protocol.pdf: file containing the explanation of the experiment protocol.
We have removed any sensitive data from the dataset to protect participants' privacy and anonymity.
We have removed any sensitive data from the demographic survey answers to protect participants' privacy and anonymity.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/W31PH5
Dramatic increases in large-scale data generated through social media, combined with increased computational power, have enabled the growth of computational approaches to social media research, and social science in general. While many of these approaches require statistical or computational training, they have the great benefit of being inherently transparent, allowing for research that others can reproduce and learn from. To that end, we wrote a book chapter in the Sage Handbook of Social Media in which we obtain a large-scale dataset of metadata about social media research papers, which we analyze using a few commonly used computational methods. This repository provides the code, data, and documentation designed to tell you exactly how we did that and to walk you through how to reproduce our results and our paper by running the code we wrote. You can find the chapter here: Foote, Jeremy D., Aaron Shaw, and Benjamin Mako Hill. 2017. “A Computational Analysis of Social Media Scholarship.” In The SAGE Handbook of Social Media, edited by Jean Burgess, Alice Marwick, and Thomas Poell, 111–34. London, UK: SAGE. Documentation on how to download and use these data is provided on the following website: https://communitydata.science/social-media-chapter/ A copy of our documentation website can be found in the files README.md and README.html included in this repository.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This database is comprised of 603 participants who provided self-report data online in their school classrooms. The data was collected in 2016 and 2017. The dataset is comprised of 208 males (34%) and 395 females (66%). Their ages ranged from 12 to 15 years. Their age in years at baseline is provided. The majority were born in Australia. Data were drawn from students at two Australian independent secondary schools. The data contains total responses for the following scales:
The Intolerance of Uncertainty Scale (IUS-12; Short form; Carleton et al, 2007) is a 12-item scale measuring two dimensions of Prospective and Inhibitory intolerance of uncertainty.
Two subscales of the Children’s Automatic Thoughts Scale (CATS; Schniering & Rapee, 2002) were administered. The Personalising and Social Threat subscales were each composed of 10 items.
UPPS Impulsive Behaviour Scale (Whiteside & Lynam, 2001) which is comprised of 12 items.
Dispositional Envy Scale (DES; Smith et al, 1999) which is comprised of 8 items.
Spence Children’s Anxiety Scale (SCAS; Spence, 1998) which is comprised of 44 items. Three subscales totals included were the GAD subscale (labelled SCAS_GAD), the OCD subscale (labelled SCAS_OCD) and the Social Anxiety subscale (labelled SCAS_SA). Each subscale was comprised of 6 items.
Avoidance and Fusion Questionnaire for Youth (AFQ-Y; Greco et al., 2008) which is comprised of 17 items.
Distress Disclosure Index (DDI; Kahn & Hessling, 2001) which is comprised of 12 items.
Repetitive Thinking Questionnaire-10 (RTQ-10; McEvoy et al., 2014) which is comprised of 10 items.
The Brief Fear of Negative Evaluation Scale, Straightforward Items (BFNE-S; Rodebaugh et al., 2004) which is comprised of 8 items.
Short Mood and Feelings Questionnaire (SMFQ; Angold et al., 1995) which is comprised of 13 items.
The Self-Compassion Scale Short Form (SCS-SF; Raes et al., 2011) which is comprised of 12 items. The subscales include Self Kindness and Self Judgment.
Social Media subscales: These subscale scores were based on social media questions composed for this project and also drawn from three separate scales, as indicated by the markers [a], [b] and [c] in the table below. The original scales assessed whether participants experience discomfort and a fear of missing out when disconnected from social media (taken from the Australian Psychological Society Stress and Wellbeing Survey; Australian Psychological Society, 2015) [a], style of social media use (Tandoc et al., 2015) [b] and Fear of Missing Out (Przybylski et al., 2013) [c]. The items in each subscale are listed below.
Pub_Share (Public Sharing):
When I have a good time it is important for me to share the details online [c]
On social media how often do you write a status update [b]
On social media how often do you post photos [b]
Surveillance_SM:
On social media how often do you read the newsfeed
On social media how often do you read a friend’s status update [b]
On social media how often do you view a friend’s photo [b]
On social media how often do you browse a friend’s timeline [b]
Upset Share:
On social media how often do you go online to share things that have upset you?
Text private:
On social media how often do you text friends privately to share things that have upset you?
Insight_SM (Social Media Reduction):
I use social media less now because it often made me feel inadequate
FOMO:
I am afraid that I will miss out on something if I don’t stay connected to my online social networks. [a]
I feel worried and uncomfortable when I can’t access my social media accounts. [a]
Neg Eff of SM:
I find it difficult to relax or sleep after spending time on social networking sites. [a]
I feel my brain ‘burnout’ with the constant connectivity of social media. [a]
I notice I feel envy when I use social media.
I can easily detach from the envy that appears following the use of social media (reverse scored)
DES_SM (Envy Mean acts online):
Feeling envious about another person has led me to post a comment online about another person to make them laugh
Feeling envious has led me to post a photo online without someone’s permission to make them angry or to make fun of them
Feeling envious has prompted me to keep another student out of things on purpose, excluding her from my group of friends or ignoring them.
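As a rough illustration of how an item marked "(reverse scored)" feeds into a subscale total, the sketch below sums item responses after flipping the reversed ones. The 1 to 5 response range, the item names, and the use of a plain sum are assumptions made for the example; they are not taken from the dataset documentation.

```python
def reverse_score(value, scale_min, scale_max):
    """Flip a response so that the highest option becomes the lowest."""
    return scale_min + scale_max - value


def subscale_total(responses, reverse_items=(), scale_min=1, scale_max=5):
    """Sum item responses, reversing the items named in reverse_items.

    The 1-5 default range is an assumption for illustration only.
    """
    total = 0
    for item, value in responses.items():
        if item in reverse_items:
            value = reverse_score(value, scale_min, scale_max)
        total += value
    return total


# Hypothetical responses from one participant (item names are invented).
responses = {"envy_notice": 4, "envy_detach": 2}
score = subscale_total(responses, reverse_items={"envy_detach"})
print(score)
```

Here the reversed item contributes 4 instead of 2, so a participant who reports difficulty detaching from envy scores higher on the subscale.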
Substance Use: Two items measuring peer influence on alcohol consumption were adapted from the SHAHRP “Patterns of Alcohol Use” measure (McBride, Farringdon & Midford, 2000). These items were “When I am with friends I am quite likely to drink too much alcohol” and “Substances (alcohol, drugs, medication) are the immediate way I respond to my thoughts about a situation when I feel distressed or upset.”
Angold, A., Costello, E. J., Messer, S. C., & Pickles, A. (1995). Development of a short questionnaire for use in epidemiological studies of depression in children and adolescents. International Journal of Methods in Psychiatric Research, 5(4), 237–249.
Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4
Greco, L.A., Lambert, W. & Baer, R.A. (2008). Psychological inflexibility in childhood and adolescence: Development and evaluation of the Avoidance and Fusion Questionnaire for Youth. Psychological Assessment, 20, 93-102. https://doi.org/10.1037/1040-3590.20.2.9
Kahn, J. H., & Hessling, R. M. (2001). Measuring the tendency to conceal versus disclose psychological distress. Journal of Social and Clinical Psychology, 20(1), 41–65. https://doi.org/10.1521/jscp.20.1.41.22254
McBride, N., Farringdon, F. & Midford, R. (2000) What harms do young Australians experience in alcohol use situations. Australian and New Zealand Journal of Public Health, 24, 54–60 https://doi.org/10.1111/j.1467-842x.2000.tb00723.x
McEvoy, P.M., Thibodeau, M.A., & Asmundson, G.J.G. (2014). Trait Repetitive Negative Thinking: A brief transdiagnostic assessment. Journal of Experimental Psychopathology, 5, 1-17. https://doi.org/10.5127/jep.037813
Przybylski, A. K., Murayama, K., DeHaan, C. R., & Gladwell, V. (2013). Motivational, emotional, and behavioral correlates of fear of missing out. Computers in human behavior, 29(4), 1841-1848. https://doi.org/10.1016/j.chb.2013.02.014
Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250-255. https://doi.org/10.1002/cpp.702
Rodebaugh, T. L., Woods, C. M., Thissen, D. M., Heimberg, R. G., Chambless, D. L., & Rapee, R. M. (2004). More information from fewer questions: the factor structure and item properties of the original and brief fear of negative evaluation scale. Psychological Assessment, 16(2), 169. https://doi.org/10.1037/1040-3590.16.2.169
Schniering, C. A., & Rapee, R. M. (2002). Development and validation of a measure of children’s automatic thoughts: the children’s automatic thoughts scale. Behaviour Research and Therapy, 40(9), 1091-1109. https://doi.org/10.1016/S0005-7967(02)00022-0
Smith, R. H., Parrott, W. G., Diener, E. F., Hoyle, R. H., & Kim, S. H. (1999). Dispositional envy. Personality and Social Psychology Bulletin, 25(8), 1007-1020. https://doi.org/10.1177/01461672992511008
Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545-566. https://doi.org/10.1016/S0005-7967(98)00034-5
Tandoc, E. C., Ferrucci, P., & Duffy, M. (2015). Facebook use, envy, and depression among college students: Is facebooking depressing? Computers in Human Behavior, 43, 139–146. https://doi.org/10.1016/j.chb.2014.10.053
Whiteside, S.P. & Lynam, D.R. (2001). The five factor model and impulsivity: using a structural model of personality to understand impulsivity. Personality and Individual Differences, 30, 669-689. https://doi.org/10.1016/S0191-8869(00)00064-7
The data was collected by Dr Danielle A Einstein, Dr Madeleine Fraser, Dr Anne McMaugh, Prof Peter McEvoy, Prof Ron Rapee, Assoc/Prof Maree Abbott, Prof Warren Mansell and Dr Eyal Karin as part of the Insights Project.
The dataset can be downloaded as an Excel file (composed of two worksheet tabs) or as two CSV files: 1) Data and 2) Variable labels.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MultiSocial is a dataset (described in a paper) for benchmarking multilingual (22 languages) machine-generated text detection in the social-media domain (5 platforms). It contains 472,097 texts, of which about 58k are human-written and approximately the same amount is generated by each of 7 multilingual large language models by using 3 iterations of paraphrasing. The dataset has been anonymized to minimize the amount of sensitive data by hiding email addresses, usernames, and phone numbers.
If you use this dataset in any publication, project, tool or in any other form, please cite the paper.
Because of its data sources (described below), the dataset may contain harmful or offensive content, or disinformation. Based on a multilingual toxicity detector, about 8% of the text samples are probably toxic (from 5% in WhatsApp to 10% in Twitter). Although we used older data sources (which have a lower probability of including machine-generated texts), the labeling of human-written text might not be 100% accurate. The anonymization procedure might not successfully hide all sensitive or personal content; thus, use the data cautiously (if you feel affected by such content, report the issues you find to dpo[at]kinit.sk). The intended use is for non-commercial research purposes only.
The human-written part consists of a pseudo-randomly selected subset of social media posts from 6 publicly available datasets:
Telegram data originated in Pushshift Telegram, containing 317M messages (Baumgartner et al., 2020). It contains messages from 27k+ channels. The collection started with a set of right-wing extremist and cryptocurrency channels (about 300 in total) and was expanded based on occurrence of forwarded messages from other channels. In the end, it thus contains a wide variety of topics and societal movements reflecting the data collection time.
Twitter data originated in CLEF2022-CheckThat! Task 1, containing 34k tweets on COVID-19 and politics (Nakov et al., 2022), combined with Sentiment140, containing 1.6M tweets on various topics (Go et al., 2009).
Gab data originated in a dataset containing 22M posts from the Gab social network. The authors of the dataset (Zannettou et al., 2018) found that “Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls.” They also found that hate speech is much more prevalent there than on Twitter, but less prevalent than on 4chan's Politically Incorrect board.
Discord data originated in Discord-Data, containing 51M messages. This is a long-context, anonymized, clean, multi-turn and single-turn conversational dataset based on Discord data scraped from a large variety of servers, big and small. According to the dataset authors, it contains around 0.1% of potentially toxic comments (based on the applied heuristic/classifier).
WhatsApp data originated in whatsapp-public-groups, containing 300k messages (Garimella & Tyson, 2018). The public dataset contains the anonymised data, collected for around 5 months from around 178 groups. Original messages were made available to us on request to dataset authors for research purposes.
From these datasets, we pseudo-randomly sampled up to 1300 texts (up to 300 for the test split and the remaining up to 1000 for the train split, if available) for each combination of platform and the 22 selected languages (using a combination of automated approaches to detect the language). This process resulted in 61,592 human-written texts, which were further filtered based on the occurrence of some characters or on their length, resulting in about 58k human-written texts.
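The sampling scheme described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name, the fixed seed, the toy records, and the choice to fill the test split first are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_and_split(records, per_group=1300, test_size=300, seed=0):
    """Sample up to per_group texts per (language, platform) pair.

    The first test_size sampled texts of each pair go to the test split,
    the remainder (at most per_group - test_size) to the train split.
    """
    rng = random.Random(seed)  # fixed seed makes the pseudo-random draw repeatable
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["language"], rec["source"])].append(rec)

    sampled = []
    for key in sorted(groups):
        pool = groups[key]
        chosen = rng.sample(pool, min(per_group, len(pool)))
        for i, rec in enumerate(chosen):
            split = "test" if i < test_size else "train"
            sampled.append({**rec, "split": split})
    return sampled


# Toy input: one large group and one small group (hypothetical data).
records = [{"text": f"post {i}", "language": "en", "source": "telegram"}
           for i in range(2000)]
records += [{"text": f"sprava {i}", "language": "sk", "source": "gab"}
            for i in range(120)]

result = sample_and_split(records)
```

With this toy input, the large group is capped at 1300 texts (300 test, 1000 train), while the small group has too few texts to reach the train split at all, which mirrors the "if available" wording.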
The machine-generated part contains texts generated by 7 LLMs (Aya-101, Gemini-1.0-pro, GPT-3.5-Turbo-0125, Mistral-7B-Instruct-v0.2, opt-iml-max-30b, v5-Eagle-7B-HF, vicuna-13b). All these models were self-hosted except for GPT and Gemini, where we used the publicly available APIs. We generated the texts using 3 paraphrases of the original human-written data and then preprocessed the generated texts (filtered out cases when the generation obviously failed).
The dataset has the following fields:
'text' - a text sample,
'label' - 0 for human-written text, 1 for machine-generated text,
'multi_label' - a string representing a large language model that generated the text or the string "human" representing a human-written text,
'split' - a string identifying train or test split of the dataset for the purpose of training and evaluation respectively,
'language' - the ISO 639-1 language code identifying the detected language of the given text,
'length' - word count of the given text,
'source' - a string identifying the source dataset / platform of the given text,
'potential_noise' - 0 for text without identified noise, 1 for text with potential noise.
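To show how these fields might be used together, here is a small sketch that parses records with the listed columns and selects a subset. It assumes a CSV layout with exactly these column names; the sample rows, the generator label string, and the helper names are invented for illustration and do not come from the dataset.

```python
import csv
import io

# Hypothetical sample rows using the field names listed above.
SAMPLE = """text,label,multi_label,split,language,length,source,potential_noise
"hello there",0,human,train,en,2,telegram,0
"hi, how are you",1,gpt-3.5-turbo-0125,train,en,4,telegram,0
"ahoj",0,human,test,sk,1,gab,1
"""


def load_rows(handle):
    """Read rows and convert the numeric fields from strings to integers."""
    rows = []
    for row in csv.DictReader(handle):
        for field in ("label", "length", "potential_noise"):
            row[field] = int(row[field])
        rows.append(row)
    return rows


def select(rows, split, language=None, drop_noisy=True):
    """Keep rows of one split, optionally one language, optionally noise-free."""
    kept = []
    for row in rows:
        if row["split"] != split:
            continue
        if language is not None and row["language"] != language:
            continue
        if drop_noisy and row["potential_noise"] == 1:
            continue
        kept.append(row)
    return kept


rows = load_rows(io.StringIO(SAMPLE))
train_en = select(rows, "train", language="en")
labels = [row["label"] for row in train_en]
```

Filtering on 'potential_noise' before training is a design choice rather than a requirement; keeping the noisy rows may be preferable when evaluating robustness.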