The statistic represents the main types of problems French children encountered on social media in 2020 and 2021. Almost ** percent had ever encountered a problem on the Internet in 2020, a number which more than doubled in 2021. More than half of the sample reported having argued with one or more people through the web, followed by ** percent of children who were insulted online in 2020. The same survey asked children which social media activities they engaged in the most, finding that most of them used these platforms to communicate with their friends and family.

Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.
The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.
This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.
The following is the Google Colab link to the project, done on Jupyter Notebook -
https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN
The following is the GitHub Repository of the project -
https://github.com/daerkns/social-media-and-mental-health
Libraries used for the Project -
Pandas
NumPy
Matplotlib
Seaborn
scikit-learn

A February 2020 survey found that 63 percent of children in the United Kingdom had experienced unwelcome friend, follow or contact requests on social media. Additionally, 55 percent of respondents stated that they had experienced people pretending to be someone else when using online platforms. Furthermore, 48 percent of those asked reported having experienced bullying, abusive behavior or threats whilst accessing social networking services.

How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide, and it is no surprise that social networking penetration is constantly increasing across all regions. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America spent the most time per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This data was collected and analyzed as part of a study on PII disclosures in social media conversations, with special attention to influencer characteristics in the interactions. The study is reported in the dissertation titled "Privacy vs. Social Capital: Examining Information Disclosure Patterns within Social Media Influencer Networks" and the research paper titled "Unveiling Influencer-Driven Personal Data Sharing in Social Media Discourse".
Each study phase uses different data: X (Twitter) data in the pilot analysis and Reddit data in the main study. Both folders contain the analyzed_posts and cluster summary CSV files, broken down by collection (based on either trend or collection date).
Note: Raw data is not made available in these datasets due to the nature of the study and to protect the original authors.
| Column name | Type | Description |
|---|---|---|
| Node ID | UUID | Unique identifier for post (replaces original platform identifier) |
| User ID | UUID | Unique identifier assigned for user (replaces original platform identifier) |
| Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
| Influence Power | Float | Eigenvector centrality |
| Influencer Tier | Str | Categorical label calculated by follower count |
| Collection Name | Str | Trend collection assigned based on search query |
| Hashtags | Set(str) | The set of hashtags included in the node |
| PII Disclosed | Bool | Whether or not PII was disclosed |
| PII Detected | Set(str) | The detected token types in post |
| PII Risk Score | Float | The PII score for all tokens in a post |
| Is Comment | Bool | Whether or not the post is a comment or reply |
| Is Text Starter | Bool | Whether or not the post has text content |
| Community | Str | The group, community, channel, etc. associated with the post |
| Timestamp | Timestamp | Creation timestamp (provided by social media API) |
| Time Elapsed | Int | Time elapsed (seconds) from original influencer’s post |

| Column Name | Type | Description |
|---|---|---|
| Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
| Influencer Tiers Frequencies | List[dict] | Frequency of influencer tiers of all users in the cluster |
| Top Influence Power Score | Float | Eigenvector centrality of top influencer |
| Top Influencer Tier | Str | Size tier of top influencer |
| Collection Name | Str | Trend collection assigned based on search query. |
| Hashtags | Set(str) | The set of hashtags included in the cluster |
| PII Detection Frequencies | List[dict] | The detected token types in post with frequencies |
| Node Count | Int | Count of all nodes in the influencer cluster |
| Node Disclosures | Int | Count of all nodes with mean_risk_score > 1* |
| Disclosure Ratio | Float | Sum of nodes with confirmed disclosed PII divided by overall cluster size (count of nodes in the cluster) |
| Mean Risk Score | Float | The mean risk score for an entire network cluster |
| Median Risk Score | Float | The median risk score for an entire network cluster |
| Min Risk Score | Float | The min risk score for an entire network cluster |
| Max Risk Score | Float | The max risk score for an entire network cluster |
| Time Span | Float | Total Time Elapsed |
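
The cluster-level columns above can be derived from node-level rows. The following is a minimal sketch of that aggregation, assuming the disclosure ratio is computed from the PII Disclosed flag; the dictionary keys, the sample rows, and the `summarize_cluster` helper are hypothetical and are not taken from the study's code.

```python
from statistics import mean, median

# Hypothetical node rows; keys mirror a few of the node-level columns above.
nodes = [
    {"cluster_name": "trendA_0", "pii_disclosed": True, "pii_risk_score": 2.0},
    {"cluster_name": "trendA_0", "pii_disclosed": False, "pii_risk_score": 0.0},
    {"cluster_name": "trendA_0", "pii_disclosed": True, "pii_risk_score": 1.5},
    {"cluster_name": "trendA_0", "pii_disclosed": False, "pii_risk_score": 0.5},
]


def summarize_cluster(rows):
    """Aggregate the node-level rows of one cluster into summary statistics."""
    if not rows:
        raise ValueError("cluster has no nodes")
    scores = [row["pii_risk_score"] for row in rows]
    # Assumption: a node counts as a disclosure when its PII Disclosed flag is set.
    disclosed = sum(1 for row in rows if row["pii_disclosed"])
    return {
        "cluster_name": rows[0]["cluster_name"],
        "node_count": len(rows),
        "disclosure_ratio": disclosed / len(rows),
        "mean_risk_score": mean(scores),
        "median_risk_score": median(scores),
        "min_risk_score": min(scores),
        "max_risk_score": max(scores),
    }


summary = summarize_cluster(nodes)
print(summary)
```

In practice the rows would be grouped by Cluster Name first and the helper applied to each group.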

Many have argued that digital technologies such as smartphones and social media are addictive. We develop an economic model of digital addiction and estimate it using a randomized experiment. Temporary incentives to reduce social media use have persistent effects, suggesting social media are habit forming. Allowing people to set limits on their future screen time substantially reduces use, suggesting self-control problems. Additional evidence suggests people are inattentive to habit formation and partially unaware of self-control problems. Looking at these facts through the lens of our model suggests that self-control problems cause 31 percent of social media use.

https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description:
This dataset captures the real-world online behavior of teenagers, focusing on e-safety awareness, cybersecurity risks, and device interactions. The data was collected from network activity logs and e-safety monitoring systems across various educational institutions and households in Texas and California. Spanning from January 2017 to October 2024, this dataset includes interactions with social media platforms, educational websites, and other online services, providing an in-depth look at teenage online activities in urban and suburban settings. The dataset is anonymized to protect user privacy and contains real incidents of network threats, security breaches, and cybersecurity behavior patterns observed in teenagers.
Use Cases:
- Predicting e-safety awareness and online behavior patterns.
- Detecting malware exposure risk and cybersecurity vulnerabilities.
- Analyzing online habits related to social media and internet consumption.
- Evaluating cybersecurity behaviors like password strength, VPN usage, and phishing attempts.

Features Overview:

| S.No | Feature Name | Description |
|---|---|---|
| 1 | Device Type | The type of device used during the online session (Mobile, Laptop, Tablet, Desktop, etc.) |
| 2 | Malware Detection | Whether malware was detected on the device during the session (Yes/No) |
| 3 | Phishing Attempts | Number of phishing attempts experienced during online activity |
| 4 | Social Media Usage | Frequency of social media usage (Low, Medium, High) |
| 5 | VPN Usage | Whether a VPN was used during the session (Yes/No) |
| 6 | Cyberbullying Reports | Number of reported cyberbullying incidents |
| 7 | Parental Control Alerts | Number of alerts triggered by parental control software |
| 8 | Firewall Logs | Number of blocked or allowed network connections by the firewall |
| 9 | Login Attempts | Number of login attempts during the session |
| 10 | Download Risk | Risk level associated with downloaded files (Low, Medium, High) |
| 11 | Password Strength | Strength of the passwords used (Weak, Moderate, Strong) |
| 12 | Data Breach Notifications | Number of alerts regarding compromised personal information |
| 13 | Online Purchase Risk | Risk level of online purchases made (Low, Medium, High) |
| 14 | Education Content Usage | Frequency of engagement with educational content (Low, Medium, High) |
| 15 | Age Group | Age category of the teenager (Under 13, 13-16, 17-19) |
| 16 | Geolocation | Location of network access (US, EU, etc.) |
| 17 | Public Network Usage | Whether the online activity occurred over a public network (Yes/No) |
| 18 | Network Type | Type of network connection (WiFi, Cellular, etc.) |
| 19 | Hours Online | Total hours spent online during the session |
| 20 | Website Visits | Number of websites visited per hour during the session |
| 21 | Peer Interactions | Level of peer-to-peer interactions during online activity |
| 22 | Risky Website Visits | Whether visits to risky websites occurred (Yes/No) |
| 23 | Cloud Service Usage | Whether cloud services were accessed during the session (Yes/No) |
| 24 | Unencrypted Traffic | Whether unencrypted network traffic was accessed during the session (Yes/No) |
| 25 | Ad Clicks | Whether online advertisements were clicked during the session (Yes/No) |
| 26 | Insecure Login Attempts | Number of insecure login attempts made (e.g., over unencrypted networks) |

Potential Research and Machine Learning Applications:
- Cybersecurity and anomaly detection models.
- Predictive modeling for e-safety awareness and risk behaviors.
- Time-series analysis of internet consumption and security threat trends.
- Behavioral clustering and pattern recognition in teenage online activity.

Data Collection Method: The data was collected through collaboration with local schools and cybersecurity monitoring agencies. Real-time network monitoring systems captured interactions across different online platforms. All personally identifiable information (PII) was anonymized to ensure privacy, making the dataset ideal for public use in research and machine learning tasks.
This dataset provides a rich foundation for studying teenage online behavior patterns and developing predictive models for cybersecurity awareness and risk mitigation. Researchers and data scientists can use this data to create models that better understand online behavior, identify security risks, and design interventions to improve e-safety for teenagers.

https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a generated collection of 1,829 social media posts designed to support research in real-time hate speech classification. It simulates user-generated content from platforms like Twitter and Reddit, labeled into three categories:
- Neutral: Harmless or general conversation
- Offensive: Rude, aggressive, or insulting but not hate-inducing
- Hateful: Strongly derogatory, targeting identity groups, or inciting hate
The dataset includes fields such as post_id, timestamp, platform, user_id, text, label, language, and preprocessed_text. The labels and content are enriched with context-specific keywords to mimic real-world tone and intent without including any harmful real content.
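
As a first sanity check on a file with these fields, the class balance across the three labels can be tallied. The following is a minimal sketch using only the standard library; the sample rows, the exact label spellings, and the `label_counts` helper are assumptions for illustration, not part of the dataset.

```python
import csv
import io
from collections import Counter

# Assumption: labels are spelled exactly like the three category names.
LABELS = {"Neutral", "Offensive", "Hateful"}

# Hypothetical rows in the field layout listed above; real rows will differ.
SAMPLE_CSV = """post_id,timestamp,platform,user_id,text,label,language,preprocessed_text
1,2023-01-05T10:00:00,Twitter,u1,Nice weather today,Neutral,en,nice weather today
2,2023-02-11T12:30:00,Reddit,u2,You are such an idiot,Offensive,en,you are such an idiot
3,2024-03-20T08:15:00,Twitter,u3,Good morning all,Neutral,en,good morning all
"""


def label_counts(csv_text):
    """Count posts per label, rejecting labels outside the three categories."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["label"] not in LABELS:
            raise ValueError(f"unexpected label: {row['label']!r}")
        counts[row["label"]] += 1
    return counts


counts = label_counts(SAMPLE_CSV)
print(counts)
```

A strongly skewed count would suggest stratified splitting or class weighting before training a classifier.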
This dataset is suitable for:
- Training and evaluating NLP models for hate speech detection
- Benchmarking text classification pipelines
- Testing real-time moderation systems
- Research on social media monitoring and safety
Timestamps span only 2022–2024, ensuring no future data contamination, and the data is entirely safe, synthetic, and privacy-compliant.

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset explores the relationship between user behavior on social media platforms and their exposure to cybercrimes through malicious online advertisements. It simulates how demographic factors, device usage patterns, authentication habits, and browsing contexts influence the likelihood of clicking on harmful ads or being redirected to malicious sites.
The data aims to assist cybersecurity researchers, data scientists, and social media analysts in understanding online threat patterns, building predictive models, and designing preventive security mechanisms.

Financial overview and grant-giving statistics of the Organization for Social Media Safety

https://joinup.ec.europa.eu/page/eupl-text-11-12
SMDRM - Social Media for Disaster Risk Management
Social media has been described as a form of distributed cognition, a mechanism for understanding a situation using information spread across many minds. The interactions among people in social media are a form of collective intelligence, as they allow people to make sense of a developing event collectively. Social media users can contribute to creating a "sensor" for citizen-generated data that modelling or monitoring systems can assimilate during a crisis. Gaining situational awareness in a disaster is critical and time-sensitive. Social media presents the possibilities of a growing data source to help improve response in the early hours and days of a crisis. However, social media platforms may not provide the functionality of summarising the information that is useful for crisis responders.

SMDRM is a software platform that streamlines the processing of text and images extracted from Twitter in near real-time during a specific event. The data is collected using a combination of keywords and locations based on daily forecasts from the early warning systems of the Copernicus Emergency Management Service such as EFAS, GloFAS and EFFIS (emergency.copernicus.eu), or triggered manually in case of earthquakes or not-forecasted events. Text is automatically "annotated" using a binary multilingual classifier trained on 12 languages and extended with multilingual embeddings. Simultaneously, a multi-class convolutional neural network labels relevant images for floods, storms, earthquakes and fires. The information that doesn't embed coordinates is geolocated in a two-step algorithm where location candidates are first selected using a multilingual named-entity recognition tool and then searched on available gazetteers. The last step of the SMDRM data processing is the aggregation of relevant information in spatial (administrative areas) and temporal (daily) units. Social media activity about an event can finally be distributed as a data map, visualised on a map server, and made available to users.

SMDRM could offer timely information useful for reducing the hazard models' uncertainty and providing added-value information such as reports or descriptions of the situation on the ground or in the vicinity. Other stakeholders, such as research groups, could access new data to complement the ones extracted from traditional sensors or earth observation. The platform can adapt to cope with the varying workload as it uses scalable software containers. If the number of tweets is higher during an impactful event, the platform can use more containers to annotate them. The SMDRM code, together with the tens of thousands of annotated social media messages used for training its models, will be released as an open-source platform whose modules can be adapted to serve other research projects. We describe the platform's architecture and implementation details, and two use cases in which images and text were used to test the system's modules.
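
The final aggregation step described above, grouping relevant messages into administrative-area and daily units, can be illustrated with a short sketch. The field names, sample messages, and the `aggregate_daily` helper are hypothetical and do not reflect SMDRM's actual schema or code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical annotated messages; field names are illustrative only.
messages = [
    {"created_at": "2021-07-14T06:10:00", "admin_area": "Area A", "relevant": True},
    {"created_at": "2021-07-14T21:45:00", "admin_area": "Area A", "relevant": True},
    {"created_at": "2021-07-14T09:00:00", "admin_area": "Area B", "relevant": False},
    {"created_at": "2021-07-15T01:30:00", "admin_area": "Area A", "relevant": True},
]


def aggregate_daily(msgs):
    """Count relevant messages per (administrative area, calendar day)."""
    counts = Counter()
    for msg in msgs:
        if not msg["relevant"]:
            continue  # keep only messages the classifier marked as relevant
        day = datetime.fromisoformat(msg["created_at"]).date().isoformat()
        counts[(msg["admin_area"], day)] += 1
    return counts


daily_counts = aggregate_daily(messages)
for (area, day), n in sorted(daily_counts.items()):
    print(area, day, n)
```

Counts keyed this way map directly onto a per-area, per-day data map.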
Source: https://ui.adsabs.harvard.edu/abs/2021EGUGA..2315012L/abstract

A March 2023 survey of internet users in Canada found that 34 percent of LinkedIn users felt very safe from online harassment when using the business networking platform. Only 14 percent of respondents stated the same about Facebook.

The global social media penetration rate was forecast to increase continuously between 2024 and 2028 by a total of 11.6 percentage points (+18.19 percent). After the ninth consecutive year of increase, the penetration rate is estimated to reach 75.31 percent, a new peak, in 2028. Notably, the social media penetration rate has increased continuously over the past years.

By CrowdFlower
Welcome to the disaster tweets dataset! This collection of tweets holds a wealth of information about global disasters and their effects on people, governments, and organizations all over the world. With over 10,000 tweets collected and carefully annotated with labels of whether they reported an actual disaster or not, this dataset provides unique insight into what these events look like in terms of social media conversations.
This information is derived from a variety of key terms related to disaster events, such as “ablaze” and “pandemonium”, which were used to gather the tweets for analysis. The columns for each tweet include detailed metadata about the user who posted it along with variables such as keyword relevance and location. Alongside these attributes is the core text of each tweet, giving you access to all sorts of stories, from natural disasters and contagious disease outbreaks to conflicts between nations, in one place!
So whatever you're looking for - whether it's observations about first-hand accounts or conducting research on public sentiment during a major event - this dataset offers you an invaluable source full of timely information that could potentially save lives down the line. So take your journey through this data now and embark upon discovering what devastation looks like through social media!
This dataset contains tweets related to disaster events, including the keyword, location, text, tweetid and userid. It provides insights into how people interact with each other on social media during a disaster. Using this dataset you can gain valuable insight into the dynamics of online communication in disasters and provide an important point of reference for future disaster management initiatives.
- Analyzing the effectiveness of disaster relief and humanitarian aid efforts, by mapping tweets against public data of areas affected by disasters and donations made to help those affected.
- Developing advanced statistical models to predict the magnitude and impact of an oncoming natural disaster using keyword analysis in social media posts related to past disasters.
- Creating text-based classifiers to accurately detect disaster-related tweets in real-time, allowing emergency services providers early warning signs before a potential event occurs
If you use this dataset in your research, please credit the original authors.
Unknown License - Please check the dataset description for more information.
File: socialmedia-disaster-tweets-DFE.csv

| Column name | Description |
|:---|:---|
| _golden | A boolean value indicating whether the tweet is a golden tweet or not. (Boolean) |
| _unit_state | The state of the tweet (e.g. finalized, judged, etc.). (String) |
| _trusted_judgments | The number of trusted judgments for the tweet. (Integer) |
| _last_judgment_at | The date and time of the last judgment for the tweet. (DateTime) |
| choose_one | The label assigned to the tweet (e.g. relevant, not relevant, etc.). (String) |
| choose_one_gold | The gold label assigned to the tweet (e.g. relevant, not relevant, etc.). (String) |
| keyword | The keyword associated with the tweet. (String) |
| location | The location associated with the tweet. (String) |
| text | The text content of the tweet. (String) |
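
A typical first step with this file is to pull out the tweet texts carrying a given label. The following is a minimal sketch with the standard library; the sample rows, the label spellings ("Relevant", "Not Relevant"), and the `texts_with_label` helper are assumptions for illustration, and the real file may contain other values and columns.

```python
import csv
import io

# Hypothetical rows using the columns listed above; real data will differ.
SAMPLE_CSV = """_golden,_unit_state,_trusted_judgments,_last_judgment_at,choose_one,choose_one_gold,keyword,location,text
False,finalized,5,,Relevant,,ablaze,,Warehouse ablaze near the harbour
False,finalized,5,,Not Relevant,,ablaze,,This mixtape is ablaze
False,finalized,5,,Relevant,,pandemonium,,Pandemonium after the quake hit downtown
"""


def texts_with_label(csv_text, label):
    """Return tweet texts whose choose_one label matches, ignoring case."""
    wanted = label.strip().lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["text"] for row in reader
            if row["choose_one"].strip().lower() == wanted]


relevant = texts_with_label(SAMPLE_CSV, "relevant")
print(relevant)
```

To read the actual file, the same `csv.DictReader` loop can be applied to a file object opened on socialmedia-disaster-tweets-DFE.csv; the file's encoding is not stated here and may need to be set explicitly.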
If you use this dataset in your research, please credit CrowdFlower.

https://www.sci-tech-today.com/privacy-policy
Mental Health Statistics: Mental health is a state of mental well-being that enables people to cope with the stresses of life, realize their abilities, learn well, work well, and contribute to their community. It has intrinsic and instrumental value, and it is an integral part of our overall well-being. At any given time, a varied set of individual, family, community, and structural factors may combine to protect or undermine mental health.
Even though many people are resilient, people who are exposed to unfavorable circumstances, including violence, disability, and poverty, are at a very high risk of developing a mental health condition. When mental health care is delivered, it is generally of poor quality. People with mental health issues also often experience discrimination, stigma, and human rights violations. In this article, we shed more light on mental health statistics.

In these polarized and challenging times, not even perceptions of personal risk are immune to partisanship. This article introduces results from a new survey with an embedded social media experiment conducted during the first months of the COVID-19 pandemic in Brazil. Descriptive results show that pro-government and opposition partisans report very different expectations of health and job risks. Job and health policy have become wedge issues that elicit partisan responses. The analysis exploits random variation in the survey recruitment to show the effects of the president’s first speech on national television on the perceived risk and the moderating effect of partisanship. The article presents a framing experiment that models key cognitive mechanisms driving partisan differences in perceptions of health risks and job security during the COVID-19 crisis.

How much time do people spend on social media?
As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.
Global social media usage
Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively.
People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.
Global impact of social media
Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general.
During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.

https://www.mordorintelligence.com/privacy-policy
The Content Moderation Market is segmented by Type (Solutions and Services), Deployment (Cloud and On-Premises), Content Format (Image, Text, and More), End-User Enterprise Size (Large Enterprises, and Small and Medium Enterprises), End-Use Industry (Social Media and Communities, Gaming and Esports Platforms, and More), and Geography. The market forecasts are provided in terms of value (USD).

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This experiment uses a Google Forms questionnaire-based dataset collected from https://www.kaggle.com/datasets/parvezalmuqtadir2348/postpartum-depression. A Google Form was used to deliver a questionnaire that collected 1503 records from a medical institution. Of the fifteen attributes in the dataset, ten were chosen: nine were used for analysis and one was the target feature.
In addition, relevant postpartum-depression keywords, including terms for symptoms of postpartum depression and risk factors for depression, are used to extract related posts from social media (Instagram and Twitter). To extract the keyword-matched tweets and comments from Twitter and Instagram, the model uses an API function for content posted by health care professionals.
For details, refer to the published paper: Suganthi, D. and Geetha, A., 2024. Predicting Postpartum Depression with Aid of Social Media Texts Using Optimized Machine Learning Model. International Journal of Intelligent Engineering & Systems, 17(3). DOI: 10.22266/ijies2024.0630.33
Dataset DOI: https://doi.org/10.34740/kaggle/dsv/13404841

https://www.verifiedmarketresearch.com/privacy-policy/
Social Media Security Market size was valued at USD 1.26 Billion in 2024 and is projected to reach USD 2.91 Billion by 2031, growing at a CAGR of 11.10% from 2024 to 2031.

Global Social Media Security Market Drivers

The market drivers for the Social Media Security Market can be influenced by various factors. These may include:
- Growing Cyber Threats: As a result of the increase in cyberattacks directed at social media sites, there is a rising need for strong security solutions to safeguard user data and sensitive information.
- Growing Awareness: As people and organisations become more conscious of the dangers of using social media, they spend money on security measures to protect their online presence.
- Harsher Legislation: Businesses are being forced to strengthen their social media security measures as a result of governments and regulatory agencies enforcing harsher legislation and compliance requirements for data protection and privacy.
- Extending Digital Transformation: Social media platform usage for business reasons is being driven by the continuous digital transformation occurring across industries. Security solutions are even more important in order to reduce the threats that could arise from online interactions.
- Growing Uptake of BYOD Guidelines: Employees can now access social media sites from their own devices thanks to the increased acceptance of Bring Your Own Device (BYOD) regulations in the workplace, which makes corporate networks more susceptible to security breaches.
- Advanced Threat Emergence: Cybercriminals are always changing their strategies to take advantage of holes in social media networks. The need for sophisticated security solutions that can identify and neutralise sophisticated threats instantly has resulted from this.
- Brand Reputation Concerns: Companies are aware of how social media security events affect their reputation and the trust of their customers. They therefore have a tendency to spend money on security measures in order to guard against any breaches that can damage their reputation.
- Growth of E-commerce: Cybercriminals seeking to take advantage of weaknesses in payment systems and consumer data have been drawn to the e-commerce operations that have proliferated on social media platforms. As a result, security solutions are becoming more and more necessary to guarantee safe transactions and safeguard private data.
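
The valuation, projection, and growth rate quoted for this market are linked by the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The following is a minimal sketch of that formula; the text does not state how many compounding periods the report assumes, so `years` is left as a parameter and both values tried below are assumptions. The `cagr` helper is hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in USD billions.
start, end = 1.26, 2.91

# Assumption: try both 7 periods (2024 to 2031) and 8 periods (a base one year earlier).
for periods in (7, 8):
    print(periods, f"{cagr(start, end, periods):.2%}")
```

The result is sensitive to the period count, so the base year should be confirmed before comparing a computed rate with a published one.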