80 datasets found

Daily Social Media Active Users
kaggle.com
zip
Updated May 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users
Explore at:
zip(126814 bytes)Available download formats
Dataset updated
May 5, 2025
Authors
Shaik Barood Mohammed Umar Adnaan Faiz
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Description:

The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

Dataset Breakdown:

Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.

Context and Use Cases:

This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

Researchers, data scientists, and developers can use this dataset to:

Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.

Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

Future Considerations:

As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...
Social Media Dataset
kaggle.com
zip
Updated Apr 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nixie6254 (2025). Social Media Dataset [Dataset]. https://www.kaggle.com/datasets/nixie6254/social-media-dataset
Explore at:
zip(28057 bytes)Available download formats
Dataset updated
Apr 17, 2025
Authors
Nixie6254
Description
This dataset consists of 734 entries representing social media activity and performance from a local SME (Micro, Small, and Medium Enterprise) across TikTok, Instagram, and Twitter platforms. It captures key metrics related to audience interaction and content strategy effectiveness, and is valuable for evaluating and optimizing digital marketing efforts for small businesses.

Area : Target location or customer region where the UMKM's content is directed. Category : The business content category (e.g., product promotion, education, seasonal campaign). Day : The day of the week the content was published. Month : The month the post went live. Platform : The social media platform used by the UMKM (TikTok, Instagram, or Twitter). Post Type : The format of the content posted: image, video, carousel, or text. Timestamp : The exact date and time when the content was posted. User : The username or business account that posted the content. Week : Week number within the year for time-based analysis. Year : The year the content was posted. Comments : Total number of comments received on the post. Engagement Rate : A calculated metric showing how engaging the content is (based on likes, comments, shares vs. reach/impressions). Hour : Hour of the day the post was published. Impressions : Number of times the content appeared on users' feeds. Likes : Number of likes the post received. Reach : Number of unique users who saw the content. Shares : Number of times users shared the content.

Number of global social network users 2017-2028

statista.com
de.statista.com

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Stacy Jo Dixon, Number of global social network users 2017-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statistahttp://statista.com/

Authors

Stacy Jo Dixon

Description

How many people use social media?

              Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.

              Who uses social media?
              Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions
              when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.

              How much time do people spend on social media?
              Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. On average, internet users in Latin America had the highest average time spent per day on social media.

              What are the most popular social media platforms?
              Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.

Social Media Behavior Dataset
kaggle.com
zip
Updated Nov 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shibin Shereef (2024). Social Media Behavior Dataset [Dataset]. https://www.kaggle.com/datasets/shibinshereef1/social-media-behavior-dataset
Explore at:
zip(7429 bytes)Available download formats
Dataset updated
Nov 25, 2024
Authors
Shibin Shereef
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains 600 synthetic entries simulating social media activity across three major platforms: Twitter, Reddit, and Instagram. The data was generated to analyze trends, sentiments, and user engagement patterns based on hashtags and posts. It can be useful for researchers, data analysts, and machine learning enthusiasts interested in studying social media behavior.

Dataset Structure The dataset includes the following columns:

Date: The date of the post, ranging across a simulated timeline. Platform: The social media platform where the post was made (Twitter, Reddit, or Instagram). Hashtag: The main hashtag associated with the post, such as #AI, #MachineLearning, or #Python. Post Content: The text of the post, crafted to simulate common social media interactions. Sentiment: The sentiment of the post, classified as Positive, Neutral, or Negative. Likes: The number of likes the post received. Shares: The number of shares or retweets the post received. Potential Use Cases Sentiment analysis: Train machine learning models to detect sentiment in text. Hashtag popularity analysis: Determine which hashtags are most commonly used or generate the most engagement. Engagement trends: Explore correlations between post sentiment and engagement metrics (likes/shares). Platform comparison: Compare user behavior across different social media platforms. Acknowledgments This dataset is fully synthetic and was generated using Python. It does not contain any real user data and is intended for educational and research purposes.
Global social network penetration 2019-2028
statista.com
de.statista.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stacy Jo Dixon, Global social network penetration 2019-2028 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statistahttp://statista.com/
Authors
Stacy Jo Dixon
Description
The global social media penetration rate in was forecast to continuously increase between 2024 and 2028 by in total 11.6 (+18.19 percent). After the ninth consecutive increasing year, the penetration rate is estimated to reach 75.31 and therefore a new peak in 2028. Notably, the social media penetration rate of was continuously increasing over the past years.

Average daily time spent on social media worldwide 2012-2024

statista.com
de.statista.com

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statistahttp://statista.com/

Authors

Stacy Jo Dixon

Description

How much time do people spend on social media?

              As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in
              the U.S. was just two hours and 16 minutes. Global social media usageCurrently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively.
              People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events friends. Global impact of social mediaSocial media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general.
              During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased a polarization in politics and heightened everyday distractions.

YouTube Social Network with Communities (SNAP)
kaggle.com
zip
Updated Dec 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Subhajit Sahu (2021). YouTube Social Network with Communities (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-com-youtube
Explore at:
zip(13777811 bytes)Available download formats
Dataset updated
Dec 16, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
Youtube social network and ground-truth communities

https://snap.stanford.edu/data/com-Youtube.html

Dataset information

Youtube (http://www.youtube.com/) is a video-sharing web site that includes a social network. In the Youtube social network, users form friendship each other and users can create groups which other users can join. We consider
such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.
(http://socialnetworks.mpi-sws.org/data-imc2007.html)

We regard each connected component in a group as a separate ground-truth
community. We remove the ground-truth communities which have less than 3
nodes. We also provide the top 5,000 communities with highest quality
which are described in our paper (http://arxiv.org/abs/1205.6233). As for
the network, we provide the largest connected component.

Network statistics
Nodes 1,134,890
Edges 2,987,624
Nodes in largest WCC 1134890 (1.000)
Edges in largest WCC 2987624 (1.000)
Nodes in largest SCC 1134890 (1.000)
Edges in largest SCC 2987624 (1.000)
Average clustering coefficient 0.0808
Number of triangles 3056386
Fraction of closed triangles 0.002081
Diameter (longest shortest path) 20
90-percentile effective diameter 6.5
Community statistics
Number of communities 8,385
Average community size 13.50
Average membership size 0.10

Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233

Files
File Description
com-youtube.ungraph.txt.gz Undirected Youtube network
com-youtube.all.cmty.txt.gz Youtube communities
com-youtube.top5000.cmty.txt.gz Youtube communities (Top 5,000)

Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

The graph in the SNAP data set is 1-based, with nodes numbered 1 to
1,157,827.

In the SuiteSparse Matrix Collection, Problem.A is the undirected Youtube
network, a matrix of size n-by-n with n=1,134,890, which is the number of
unique user id's appearing in any edge.

Problem.aux.nodeid is a list of the node id's that appear in the SNAP data set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
node id's are the same as the SNAP data set (1-based).

C = Problem.aux.Communities_all is a sparse matrix of size n by 16,386
which represents the communities in the com-youtube.all.cmty.txt file.
The kth line in that file defines the kth community, and is the column
C(:,k), where C(i,k)=1 if person ...
Social Media Datasets
brightdata.com
.json, .csv, .xlsx
Updated Sep 7, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bright Data (2022). Social Media Datasets [Dataset]. https://brightdata.com/products/datasets/social-media
Explore at:
.json, .csv, .xlsxAvailable download formats
Dataset updated
Sep 7, 2022
Dataset authored and provided by
Bright Datahttps://brightdata.com/
License
https://brightdata.com/licensehttps://brightdata.com/license
Area covered
Worldwide
Description
Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.

Dataset Features

User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research. Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization. Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns. Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.

Customizable Subsets for Specific Needs Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.

Popular Use Cases

Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively. Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships. Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies. Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions. AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.

Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Dataset of directed signed networks from social domain
figshare.com
zip
Updated Sep 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Samin Aref; Ly Dinh; Rezvaneh Rezapour (2020). Dataset of directed signed networks from social domain [Dataset]. http://doi.org/10.6084/m9.figshare.12152628.v3
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12152628.v3
Dataset updated
Sep 4, 2020
Dataset provided by
Figsharehttp://figshare.com/
Authors
Samin Aref; Ly Dinh; Rezvaneh Rezapour
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a range of directed signed networks (signed digraphs) from social domain. The data come from 9 different sources and in total there are 29 network files. There are two temporal networks and one multilayer network in this dataset. Each network is provided in two formats: edgelist (.csv) and .gml format.This dataset is provided under a CC BY-NC-SA Creative Commons v 4.0 license (Attribution-NonCommercial-ShareAlike). This means that other individuals may remix, tweak, and build upon these data non-commercially, as long as they provide citations to this data repository (https://doi.org/10.6084/m9.figshare.12152628) and the reference article listed below (https://doi.org/10.1038/s41598-020-71838-6), and license the new creations under the identical terms.For more information about the data, one may refer to the article below:Samin Aref, Ly Dinh, Rezvaneh Rezapour, and Jana Diesner. "Multilevel Structural Evaluation of Signed Directed Social Networks based on Balance Theory" Scientific Reports (2020) https://doi.org/10.1038/s41598-020-71838-6
U.S. Facebook data requests from government agencies 2013-2023
statista.com
de.statista.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stacy Jo Dixon, U.S. Facebook data requests from government agencies 2013-2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statistahttp://statista.com/
Authors
Stacy Jo Dixon
Description
Facebook received 73,390 user data requests from federal agencies and courts in the United States during the second half of 2023. The social network produced some user data in 88.84 percent of requests from U.S. federal authorities. The United States accounts for the largest share of Facebook user data requests worldwide.
H
Replication data for: Power Positions: International Organizations, Social...
dataverse.harvard.edu
Updated Nov 28, 2007
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Emilie M. Hafner-Burton; Alexander Montgomery (2007). Replication data for: Power Positions: International Organizations, Social Networks, and Conflict [Dataset]. http://doi.org/10.7910/DVN/B9YYYH
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/B9YYYH
Dataset updated
Nov 28, 2007
Dataset provided by
Harvard Dataverse
Authors
Emilie M. Hafner-Burton; Alexander Montgomery
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
1885 - 1992
Description
A growing number of international relations scholars argue that intergovernmental organizations (IGOs) promote peace. Existing approaches emphasize IGO membership as an important causal attribute of individual states, much like economic development and regime type. The authors draw up on social network analysis, arguing that conflicts between states are also shaped by relative positions of social power created by IGO memberships and characterized by significant disparity. Membership partitions states into structurally equivalent clusters and establishes hierarchies of prestige in the international system. These relative positions promote common beliefs and alter the distribution of social power, making certain policy strategies more practical or rational. The authors introduce new IGO relational data and explore the empirical merits of their approach during the period from 1885 to 1992. They demonstrate that conflict is increased by the presence of many other states in structurally equivalent clusters, while large prestige disparities and in-group favoritism decrease it.

Social media as a news outlet worldwide 2024

statista.com
de.statista.com

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Amy Watson, Social media as a news outlet worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/

Explore at:

Dataset provided by

Statistahttp://statista.com/

Authors

Amy Watson

Description

During a 2024 survey, 77 percent of respondents from Nigeria stated that they used social media as a source of news. In comparison, just 23 percent of Japanese respondents said the same. Large portions of social media users around the world admit that they do not trust social platforms either as media sources or as a way to get news, and yet they continue to access such networks on a daily basis.

              Social media: trust and consumption

              Despite the majority of adults surveyed in each country reporting that they used social networks to keep up to date with news and current affairs, a 2018 study showed that social media is the least trusted news source in the world. Less than 35 percent of adults in Europe considered social networks to be trustworthy in this respect, yet more than 50 percent of adults in Portugal, Poland, Romania, Hungary, Bulgaria, Slovakia and Croatia said that they got their news on social media.

              What is clear is that we live in an era where social media is such an enormous part of daily life that consumers will still use it in spite of their doubts or reservations. Concerns about fake news and propaganda on social media have not stopped billions of users accessing their favorite networks on a daily basis.
              Most Millennials in the United States use social media for news every day, and younger consumers in European countries are much more likely to use social networks for national political news than their older peers.
              Like it or not, reading news on social is fast becoming the norm for younger generations, and this form of news consumption will likely increase further regardless of whether consumers fully trust their chosen network or not.

LiveJournal Social Network with Communities (SNAP)
kaggle.com
zip
Updated Dec 16, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Subhajit Sahu (2021). LiveJournal Social Network with Communities (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-com-livejournal
Explore at:
zip(162104147 bytes)Available download formats
Dataset updated
Dec 16, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LiveJournal social network and ground-truth communities

https://snap.stanford.edu/data/com-LiveJournal.html

Dataset information

LiveJournal (http://www.livejournal.com/) is a free on-line blogging
community where users declare friendship each other. LiveJournal also
allows users form a group which other members can then join. We consider
such user-defined groups as ground-truth communities. We provide the
LiveJournal friendship social network and ground-truth communities.

We regard each connected component in a group as a separate ground-truth
community. We remove the ground-truth communities which have less than 3
nodes. We also provide the top 5,000 communities with highest quality
which are described in our paper (http://arxiv.org/abs/1205.6233). As for
the network, we provide the largest connected component.

Dataset statistics
Nodes 3,997,962
Edges 34,681,189
Nodes in largest WCC 3997962 (1.000)
Edges in largest WCC 34681189 (1.000)
Nodes in largest SCC 3997962 (1.000)
Edges in largest SCC 34681189 (1.000)
Average clustering coefficient 0.2843
Number of triangles 177820130
Fraction of closed triangles 0.04559
Diameter (longest shortest path) 17
90-percentile effective diameter 6.5

Source (citation)
J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233

Files
File Description
com-lj.ungraph.txt.gz Undirected LiveJournal network
com-lj.all.cmty.txt.gz LiveJournal communities
com-lj.top5000.cmty.txt.gz LiveJournal communities (Top 5,000)

Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

The graph in the SNAP data set is 0-based, with nodes numbering 0 to
4,036,537.

In the SuiteSparse Matrix Collection, Problem.A is the undirected
LiveJournal network, a matrix of size n-by-n with n=3,997,962, which is
the number of unique user id's appearing in any edge.

Problem.aux.nodeid is a list of the node id's that appear in the SNAP data set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
node id's are the same as the SNAP data set (0-based).

C = Problem.aux.Communities_all is a sparse matrix of size n by 664,414
which represents the communities in the com-lj.all.cmty.txt file. The kth line in that file defines the kth community, and is the column C(:,k),
where C(i,k)=1 if person nodeid(i) is in the kth community. Row C(i,:)
and row/column i of the A matrix thus refer to the same person, nodeid(i).

Ctop = Problem.aux.Communities_top5000 is n-by-5000, with the same
structure as the C array above, with the content of the
com-lj.top5000.cmty.txt file.
s
A Blue Start: A large-scale pairwise and higher-order social network dataset...
socialmediaarchive.org
csv, json, pdf, zip
Updated May 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). A Blue Start: A large-scale pairwise and higher-order social network dataset [Dataset]. https://socialmediaarchive.org/record/78
Explore at:
json(619104394), zip(8502960667), json(132710350), pdf(73964), csv(85077755)Available download formats
Dataset updated
May 9, 2025
Description
This dataset consists of all starter packs and all following network data available on Bluesky in January and February 2025. Starter packs can be created by any Bluesky user. They are lists of users and curated feeds with a minimum of 6 and a maximum of 150 users, curated by the starter pack creator. The creator typically names them and provides a description. Other users can use a single click to follow all users in the starter pack, or they can scroll through a specific starter pack to decide who to follow within that starter pack. In our dataset, all DIDs (persistent, unique identifiers) are anonymized with a non-reversible hash function; users in the network, as well as users who created starter packs, or appear in starter packs, are identified by their hashed DIDs. Similarly, starter packs themselves are identified by their hashed identifiers.

First, we include the Bluesky following network as it appeared in late January/early February 2025. This shows all available directed following relationships on Bluesky. We also include a network dataset of starter packs with information on creators and starter pack members. This is intended for users who wish to undertake a computational analysis of the networks created by starter packs or starter packs’ influences on networks.
s
Dataset for Social Media Activity, Number of Friends, and Relationship...
eprints.soton.ac.uk
Updated Jul 8, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Elder, Lindsay; Brignell, Catherine; Cooke, Tim (2022). Dataset for Social Media Activity, Number of Friends, and Relationship Quality [Dataset]. http://doi.org/10.5258/SOTON/D1955
Explore at:
Unique identifier
https://doi.org/10.5258/SOTON/D1955
Dataset updated
Jul 8, 2022
Dataset provided by
University of Southampton
Authors
Elder, Lindsay; Brignell, Catherine; Cooke, Tim
Description
The data from my thesis. This data was collected using the Lifeguide Software and exported onto SPSS following data collection. The data was collected from young people aged 11-18 years old to explore the impact of different types of social media use.
GitHub Social Network
kaggle.com
Updated Jan 12, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gitanjali Wadhwa (2023). GitHub Social Network [Dataset]. https://www.kaggle.com/datasets/gitanjali1425/github-social-network-graph-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 12, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Gitanjali Wadhwa
Description
Description

An extensive social network of GitHub developers was collected from the public API in June 2019. Nodes are developers who have starred at most minuscule 10 repositories, and edges are mutual follower relationships between them. The vertex features are extracted based on the location; repositories starred, employer and e-mail address. The task related to the graph is binary node classification - one has to predict whether the GitHub user is a web or a machine learning developer. This targeting feature was derived from the job title of each user.

Properties

Directed: No.

Node features: Yes.

Edge features: No.

Node labels: Yes. Binary-labeled.

Temporal: No.

Nodes: 37,700

Edges: 289,003

Density: 0.001

Transitvity: 0.013

Possible Tasks

Binary node classification

Link prediction

Community detection

Network visualisation
S
Social media profile growth, engagement rate, and reach
data.sugarlandtx.gov
xlsx
Updated Jan 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Communications and Community Engagement (2024). Social media profile growth, engagement rate, and reach [Dataset]. https://data.sugarlandtx.gov/dataset/social-media-profile-growth-engagement-rate-and-reach
Explore at:
xlsxAvailable download formats
Dataset updated
Jan 3, 2024
Dataset authored and provided by
Communications and Community Engagement
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Profile growth - the growth on our social platforms to see where and when we're gaining followers. Engagement rate - a ratio of how many people interacted with ours posts based on when users are usually online. Reach - the number of feeds our posts appeared in (doesn't mean people interacted with the post).
Planned changes in use of selected social media for organic marketing...
statista.com
de.statista.com
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christopher Ross, Planned changes in use of selected social media for organic marketing worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
Explore at:
Dataset provided by
Statistahttp://statista.com/
Authors
Christopher Ross
Description
During a January 2024 global survey among marketers, nearly 60 percent reported plans to increase their organic use of YouTube for marketing purposes in the following 12 months. LinkedIn and Instagram followed, respectively mentioned by 57 and 56 percent of the respondents intending to use them more. According to the same survey, Facebook was the most important social media platform for marketers worldwide.
d
Removal and Enforcement Actions by Social Media Companies: Year and Month...
dataful.in
Updated Nov 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dataful (Factly) (2025). Removal and Enforcement Actions by Social Media Companies: Year and Month wise Number of Content Removed and Accounts Banned/Suspended by SSMIs and Violation Category [Dataset]. https://dataful.in/datasets/18652
Explore at:
csv, application/x-parquet, xlsxAvailable download formats
Dataset updated
Nov 5, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditionshttps://dataful.in/terms-and-conditions
Area covered
India
Variables measured
Social Media Intermediaries Ban actions
Description
This dataset presents year and month wise enforcement actions taken by Significant Social Media Intermediaries (SSMIs) from 2021 to the present, compiled from the mandatory monthly transparency reports published under Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021. It includes counts of content removed, accounts suspended or banned, and chatrooms, comments, edit profiles and livestreams restricted, along with the policy or violation category (e.g., child sexual exploitation, terrorism, hate speech, bullying, violence, regulated goods, misinformation, etc.).

To enable comparability across platforms with different reporting terms, the dataset uses a standardised enforcement classification:

enforcement_type:

The type of action taken: a. Content Actioned (any enforcement such as warning, downranking, age-gating), b. Content Removed (content deleted or made inaccessible), c. Account Banned (account suspension or disabling), d. Quality Metric (AI moderation accuracy indicators reported by some platforms).

proactive_flag:

Whether the platform identified and enforced before user reports: a. Proactive = Found via automated detection or internal review systems, b. Unknown = Platform did not specify proactive vs reactive.

Notes: 1. SSMI denotes to Significant Social Media Intermediaries, with over 50,00,000 registered users in India, which primarily or solely enables online interaction between two or more users and allows them to create, upload, share, disseminate, modify or access information using its services

Facebook & Instagram (Meta) a. Content Actioned counts any enforcement, not only removals (e.g., removals, warning screens/covering, age gates, downranking). b. Proactive Rate = (items found & actioned proactively) ÷ (total content actioned).

X/Twitter a. Child Sexual Exploitation and terrorism suspensions are largely proactive, flagged using proprietary tools and industry hash-sharing systems. b. Data reflects global enforcement, not only India.

Google / YouTube a. Number of removal actions as a result of automated detection captures actions triggered by automated systems (ML + human-trained models).

ShareChat a. Content Removed / Taken Down / UGC discard / Comments/Chatrooms deleted are standardised as Content Removed. b. Also includes rights-holder reporting workflow for copyright/IP and automated proactive monitoring for harmful content.

WhatsApp a. Reports Proactively Banned Accounts, meaning accounts banned before any user reports.

Koo a. Distinguishes between Content Removed, Content Actioned (flagged/downranked), and Account Banned. b. Automation Correct/Wrong reflect AI moderation accuracy, not enforcement outcomes.
r
Abbreviated FOMO and social media dataset
researchdata.edu.au
figshare.mq.edu.au
Updated Jul 7, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ron Rapee; McEvoy, Peter; Maree J. Abbott; Madeleine Ferrari; Eyal Karin; Danielle Einstein; Carol Dabb; Anne McMaugh (2022). Abbreviated FOMO and social media dataset [Dataset]. http://doi.org/10.25949/20188298.V1
Explore at:
Unique identifier
https://doi.org/10.25949/20188298.V1
Dataset updated
Jul 7, 2022
Dataset provided by
Macquarie University
Authors
Ron Rapee; McEvoy, Peter; Maree J. Abbott; Madeleine Ferrari; Eyal Karin; Danielle Einstein; Carol Dabb; Anne McMaugh
Description

This database is comprised of 951 participants who provided self-report data online in their school classrooms. The data was collected in 2016 and 2017. The dataset is comprised of 509 males (54%) and 442 females (46%). Their ages ranged from 12 to 16 years (M = 13.69, SD = 0.72). Seven participants did not report their age. The majority were born in Australia (N = 849, 89%). The next most common countries of birth were China (N = 24, 2.5%), the UK (N = 23, 2.4%), and the USA (N = 9, 0.9%). Data were drawn from students at five Australian independent secondary schools.

The data contains item responses for the Spence Children’s Anxiety Scale (SCAS; Spence, 1998) which is comprised of 44 items. The Social media question asked about frequency of use with the question “How often do you use social media?”. The response options ranged from constantly to once a week or less. Items measuring Fear of Missing Out were included and incorporated the following five questions based on the APS Stress and Wellbeing in Australia Survey (APS, 2015). These were “When I have a good time it is important for me to share the details online; I am afraid that I will miss out on something if I don’t stay connected to my online social networks; I feel worried and uncomfortable when I can’t access my social media accounts; I find it difficult to relax or sleep after spending time on social networking sites; I feel my brain burnout with the constant connectivity of social media. Internal consistency for this measure was α = .81. Self compassion was measured using the 12-item short-form of the Self-Compassion Scale (SCS-SF; Raes et al., 2011).

The data set has the option of downloading an excel file (composed of two worksheet tabs) or CSV files 1) Data and 2) Variable labels.

References:

Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. _{https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4}

Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250-255. _{https://doi.org/10.1002/cpp.702}

Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545-566. _{https://doi.org/10.1016/S0005-7967(98)00034-5}

Facebook

Twitter

Click to copy link

Link copied

Cite

Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users

Daily Social Media Active Users

"A thorough dataset that displays user activity on major social media platforms

Explore at:

zip(126814 bytes)Available download formats

Dataset updated

May 5, 2025

Authors

Shaik Barood Mohammed Umar Adnaan Faiz

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Description

Description:

The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

Dataset Breakdown:

Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.
Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.
Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.
Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.
Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.
Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.
Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.

Context and Use Cases:

This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

Researchers, data scientists, and developers can use this dataset to:

Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.
Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.
Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.
Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.
Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.
Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

Future Considerations:

As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...

Clear search

Close search

Google apps

Main menu

Daily Social Media Active Users

Social Media Dataset

Number of global social network users 2017-2028

Social Media Behavior Dataset

Global social network penetration 2019-2028

Average daily time spent on social media worldwide 2012-2024

YouTube Social Network with Communities (SNAP)

Youtube social network and ground-truth communities

Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

Social Media Datasets

Dataset of directed signed networks from social domain

U.S. Facebook data requests from government agencies 2013-2023

Replication data for: Power Positions: International Organizations, Social...

Social media as a news outlet worldwide 2024

LiveJournal Social Network with Communities (SNAP)

LiveJournal social network and ground-truth communities

Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

A Blue Start: A large-scale pairwise and higher-order social network dataset...

Dataset for Social Media Activity, Number of Friends, and Relationship...

GitHub Social Network

Social media profile growth, engagement rate, and reach

Planned changes in use of selected social media for organic marketing...

Removal and Enforcement Actions by Social Media Companies: Year and Month...

Abbreviated FOMO and social media dataset

Daily Social Media Active Users

"A thorough dataset that displays user activity on major social media platforms