Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description:
The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.
Dataset Breakdown:
Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.
Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.
Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.
Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.
Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.
Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.
Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
Context and Use Cases:
Researchers, data scientists, and developers can use this dataset to:
Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.
Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.
Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.
Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.
Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.
Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.
The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.
Future Considerations:
As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.
By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...
Facebook
TwitterThis dataset consists of 734 entries representing social media activity and performance from a local SME (Micro, Small, and Medium Enterprise) across TikTok, Instagram, and Twitter platforms. It captures key metrics related to audience interaction and content strategy effectiveness, and is valuable for evaluating and optimizing digital marketing efforts for small businesses.
Area : Target location or customer region where the UMKM's content is directed. Category : The business content category (e.g., product promotion, education, seasonal campaign). Day : The day of the week the content was published. Month : The month the post went live. Platform : The social media platform used by the UMKM (TikTok, Instagram, or Twitter). Post Type : The format of the content posted: image, video, carousel, or text. Timestamp : The exact date and time when the content was posted. User : The username or business account that posted the content. Week : Week number within the year for time-based analysis. Year : The year the content was posted. Comments : Total number of comments received on the post. Engagement Rate : A calculated metric showing how engaging the content is (based on likes, comments, shares vs. reach/impressions). Hour : Hour of the day the post was published. Impressions : Number of times the content appeared on users' feeds. Likes : Number of likes the post received. Reach : Number of unique users who saw the content. Shares : Number of times users shared the content.
Facebook
TwitterCristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.
The Portuguese footballer is the most-followed person on the photo sharing app platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.
How popular is Instagram?
Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.
Who uses Instagram?
Instagram audiences are predominantly young – recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media for teens and one of the social networks with the biggest reach among teens in the United States.
Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 600 synthetic entries simulating social media activity across three major platforms: Twitter, Reddit, and Instagram. The data was generated to analyze trends, sentiments, and user engagement patterns based on hashtags and posts. It can be useful for researchers, data analysts, and machine learning enthusiasts interested in studying social media behavior.
Dataset Structure The dataset includes the following columns:
Date: The date of the post, ranging across a simulated timeline. Platform: The social media platform where the post was made (Twitter, Reddit, or Instagram). Hashtag: The main hashtag associated with the post, such as #AI, #MachineLearning, or #Python. Post Content: The text of the post, crafted to simulate common social media interactions. Sentiment: The sentiment of the post, classified as Positive, Neutral, or Negative. Likes: The number of likes the post received. Shares: The number of shares or retweets the post received. Potential Use Cases Sentiment analysis: Train machine learning models to detect sentiment in text. Hashtag popularity analysis: Determine which hashtags are most commonly used or generate the most engagement. Engagement trends: Explore correlations between post sentiment and engagement metrics (likes/shares). Platform comparison: Compare user behavior across different social media platforms. Acknowledgments This dataset is fully synthetic and was generated using Python. It does not contain any real user data and is intended for educational and research purposes.
Facebook
Twitterhttps://brightdata.com/licensehttps://brightdata.com/license
Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.
Dataset Features
User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research. Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization. Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns. Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.
Customizable Subsets for Specific Needs Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.
Popular Use Cases
Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively. Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships. Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies. Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions. AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.
Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides detailed rankings and key metrics for 100+ social media platforms and sites in 2025. It includes information such as user base, popularity trends, and global reach. Ideal for analyzing social media growth, user engagement, and market trends. Whether you're a data scientist, marketer, or researcher, this dataset offers valuable insights into the evolving digital landscape.
Facebook
Twitterhttps://www.ibisworld.com/about/termsofuse/https://www.ibisworld.com/about/termsofuse/
Over the five years through 2025-26, industry revenue is forecast to expand at a compound annual rate of 20.3% to reach £12.5 billion. Social media platforms are integral to people's lives, offering ways to communicate, create and view content and share information. According to Ofcom, approximately 89% of UK internet users in 2023 used social media apps or sites. Teenagers and young adults are the biggest users. Advertising is the primary revenue source for social media platforms, although subscription-based services are gaining momentum as platforms seek to diversify their incomes. TikTok is the success story of the past five years, becoming the most downloaded app between 2020 and 2022, according to Apptopia. The short-form video platform has over 30 million monthly users in the UK in 2025. After Musk's takeover, X, formerly known as Twitter, adjusted its content moderation and allowed previously banned accounts to return. As a result, over 600 advertisers pulled their ads from the site because of fears their brand may be associated with malcontent. In response to falling ad revenue, X has introduced a subscription-based service which enables users to verify themselves and boosts the number of people who view their tweets. Meta-owned Facebook and Instagram have responded by introducing a similar service. In 2025, more social media platforms are using AI to boost user engagement. This improves click-through rates and drives higher advertising revenue. Industry revenue is expected to grow by 6.3% in 2025-26. Over the five years through 2030-31, social media platforms' revenue is projected to climb at an estimated 9.2% to reach £19.4 billion. Regulations relating to how data is collected, stored, and shared will force advertisers and platforms to rethink how they can target their desired demographics. The tightening of regulations will raise industry compliance costs, weighing on profit margin. Older age groups present a new revenue opportunity for social media platforms if they can bridge the gap between passive TV consumption and interactive digital engagement. Augmented Reality (AR) technology will move beyond filters to become standard for immersive product trials, interactive ads, and virtual meetups
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:
Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.
The dataset consists of three files: 1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies. 2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content. 3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media. (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
A list of the most popular (top 100 by followers) Instagram, Twitter, YouTube, Twitch, and TikTok users. NB! For YouTube the followers are subscribers and the posts are videos.
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
*****Documentation Process***** 1. Data Preparation: - Upload the data into Power Query to assess quality and identify duplicate values, if any. - Verify data quality and types for each column, addressing any miswriting or inconsistencies. 2. Data Management: - Duplicate the original data sheet for future reference and label the new sheet as the "Working File" to preserve the integrity of the original dataset. 3. Understanding Metrics: - Clarify the meaning of column headers, particularly distinguishing between Impressions and Reach, and comprehend how Engagement Rate is calculated. - Engagement Rate formula: Total likes, comments, and shares divided by Reach. 4. Data Integrity Assurance: - Recognize that Impressions should outnumber Reach, reflecting total views versus unique audience size. - Investigate discrepancies between Reach and Impressions to ensure data integrity, identifying and resolving root causes for accurate reporting and analysis. 5. Data Correction: - Collaborate with the relevant team to rectify data inaccuracies, specifically addressing the discrepancy between Impressions and Reach. - Engage with the concerned team to understand the root cause of discrepancies between Impressions and Reach. - Identify instances where Impressions surpass Reach, potentially attributable to data transformation errors. - Following the rectification process, meticulously adjust the dataset to reflect the corrected Impressions and Reach values accurately. - Ensure diligent implementation of the corrections to maintain the integrity and reliability of the data. - Conduct a thorough recalculation of the Engagement Rate post-correction, adhering to rigorous data integrity standards to uphold the credibility of the analysis. 6. Data Enhancement: - Categorize Audience Age into three groups: "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (<30 years) within a new column named "Age Group." - Split date and time into separate columns using the text-to-columns option for improved analysis. 7. Temporal Analysis: - Introduce a new column for "Weekend and Weekday," renamed as "Weekday Type," to discern patterns and trends in engagement. - Define time periods by categorizing into "Morning," "Afternoon," "Evening," and "Night" based on time intervals. 8. Sentiment Analysis: - Populate blank cells in the Sentiment column with "Mixed Sentiment," denoting content containing both positive and negative sentiments or ambiguity. 9. Geographical Analysis: - Group countries and obtain additional continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php). - Add a new column for "Audience Continent" and utilize XLOOKUP function to retrieve corresponding continent data.
*****Drawing Conclusions and Providing a Summary*****
Facebook
TwitterThis dataset contains simulated data for social media users' demographics, behaviors, and perceptions related to political content. It includes features such as age, gender, education level, occupation, social media usage frequency, exposure to political content, and perceptions of accuracy and relevance.
the features included in the "Social Media Political Content Analysis Dataset":
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a comprehensive analysis of the factors influencing the credibility of information on social media among Iranian users. The research focuses on identifying the most significant factors that affect the perceived credibility of information shared on various social media platforms. The dataset includes demographic information, social media usage patterns, and ratings of various attributes related to information credibility.
Facebook
TwitterThe Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.
Dataset Overview:
This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.
2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide: - Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions. - Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions. - Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.
Sourced Directly from Reddit:
All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.
Key Features:
Use Cases:
Data Quality and Reliability:
The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.
Integration and Usability:
The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.
User-Friendly Structure and Metadata:
The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.
Ideal For:
This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...
Facebook
Twitterhttps://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice
Social Media Listening Market Size 2025-2029
The social media listening market size is forecast to increase by USD 4.87 billion at a CAGR of 8.9% between 2024 and 2029.
The market is experiencing significant growth, driven primarily by the increasing usage of social media platforms worldwide. With over 4.3 billion users as of 2021, social media has become a powerful tool for businesses to engage with their customers and gain valuable insights into consumer behavior and preferences. A key trend in this market is the integration of Artificial Intelligence (AI) and Machine Learning (ML) technologies in social media listening solutions, enabling more accurate and efficient data analysis. However, this market is not without challenges. Data privacy and regulatory compliance are becoming increasingly important, with stricter regulations being implemented to protect user data.
Companies must ensure they have strong data security measures in place to comply with these regulations and maintain consumer trust. Additionally, the vast amount of data generated on social media requires sophisticated analytics tools to extract meaningful insights. As such, businesses seeking to capitalize on the opportunities presented by the market must invest in advanced analytics solutions and prioritize data security and privacy. By doing so, they can effectively navigate the challenges and stay ahead of the competition.
What will be the Size of the Social Media Listening Market during the forecast period?
Request Free Sample
Social media listening has emerged as a crucial business tool, enabling organizations to gain valuable insights from the vast amount of data generated through social media activity. This data is analyzed using techniques such as topic modeling and sentiment scoring to understand consumer behavior, preferences, and trends. Social media geographics and demographics provide essential context, while social media reach and volume measure the scope and impact of conversations. Social media pulse and sentiment reflect the current sentiment and buzz surrounding specific topics, offering real-time insights into market dynamics and trends.
Social media listening software is a vital component of the global market for social media analytics. Social media influence is assessed through the size and engagement of an audience, providing valuable information for marketing and brand management strategies. The social media landscape and heatmap offer a comprehensive view of the social media ecosystem, helping businesses stay informed and adapt to evolving patterns.
How is this Social Media Listening Industry segmented?
The social media listening industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Type
Software
Services
End-user
Retail and e-commerce
IT and telecom
BFSI
Media and entertainment
Others
Geography
North America
US
Canada
Europe
France
Germany
UK
Middle East and Africa
APAC
China
India
Japan
South Korea
South America
Brazil
Rest of World (ROW)
By Type Insights
The software segment is estimated to witness significant growth during the forecast period. This segment encompasses platforms and tools that offer real-time, automated, and scalable capabilities to monitor and analyze social media conversations across various channels such as Twitter, Facebook, Instagram, LinkedIn, TikTok, and Reddit. Real-time monitoring is a key feature of these solutions, empowering brands to identify mentions, trends, and sentiment as they emerge. By staying abreast of evolving topics, businesses can respond promptly to customer concerns, capitalize on viral events, and maintain a strong online presence. Artificial Intelligence (AI) and Machine Learning (ML) technologies are integral to social media listening software, enabling advanced topic identification, sentiment analysis, and trend recognition.
These technologies enable businesses to gain valuable customer insights, inform product development, and enhance customer experience. Social media listening platforms also offer data visualization and reporting features, allowing businesses to analyze and present their findings in a clear and actionable manner. Additionally, they provide social media dashboards, alerts, and governance tools to ensure compliance with social media policies and ethical standards. In summary, social media listening software plays a pivotal role in the global market for social media analytics, offering real-time insights and advanced capabilities to help businesses navigate the complex social media landscape and engage effectively with their audience.
Get a glance at the market report of share of v
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Weibo is one of the mainstream social media platforms in China. Among its features, trending topics serve as an important real-time information source for Weibo users, consisting of the most popular search terms at the moment. Weibo's official platform does not provide corresponding tag information for these trending topics, making it difficult for users to access specific categories of topics. To address this issue, we collected over 6,000 trending topic data entries from November 24th to December 23rd, 2020. Each entry was manually categorized into one of eight major categories: "(时政)Politics", "(科技)Technology", "(科普)Popular Science", "(娱乐)Entertainment", "(体育)Sports", "(社会讨论/话题)Social Discussions/Topics", "(时事)Current Affairs" and "(经济)Economy". This categorization aims to facilitate subsequent applications. Besides, we provide another dataset of hot search that are unlabeled. - Politics: The current political news happening now. - Technology: News related to high-tech products. - Popular Science: News topics about popularizing knowledge. - Entertainment: News related to celebrities or variety shows. - Sports: News related to sports events or sports celebrities. - Social Discussions/Topics: Hot topics being discussed by the general public. - Current Affairs: Current social events happening now. - Economy: News related to the economy.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The report provides a snapshot of the social media usage trends amongst online Canadian adults based on an online survey of 1500 participants. Canada continues to be one of the most connected countries in the world. An overwhelming majority of online Canadian adults (94%) have an account on at least one social media platform. However, the 2022 survey results show that the COVID-19 pandemic has ushered in some changes in how and where Canadians are spending their time on social media. Dominant platforms such as Facebook, messaging apps and YouTube are still on top but are losing ground to newer platforms such as TikTok and more niche platforms such as Reddit and Twitch.
Facebook
Twitterhttps://dataful.in/terms-and-conditionshttps://dataful.in/terms-and-conditions
This dataset presents year and month wise enforcement actions taken by Significant Social Media Intermediaries (SSMIs) from 2021 to the present, compiled from the mandatory monthly transparency reports published under Rule 4(1)(d) of the Information Technology (Intermediary Guidelines and Digital Media Ethics Code) Rules, 2021. It includes counts of content removed, accounts suspended or banned, and chatrooms, comments, edit profiles and livestreams restricted, along with the policy or violation category (e.g., child sexual exploitation, terrorism, hate speech, bullying, violence, regulated goods, misinformation, etc.).
To enable comparability across platforms with different reporting terms, the dataset uses a standardised enforcement classification:
The type of action taken: a. Content Actioned (any enforcement such as warning, downranking, age-gating), b. Content Removed (content deleted or made inaccessible), c. Account Banned (account suspension or disabling), d. Quality Metric (AI moderation accuracy indicators reported by some platforms).
Whether the platform identified and enforced before user reports: a. Proactive = Found via automated detection or internal review systems, b. Unknown = Platform did not specify proactive vs reactive.
Notes: 1. SSMI denotes to Significant Social Media Intermediaries, with over 50,00,000 registered users in India, which primarily or solely enables online interaction between two or more users and allows them to create, upload, share, disseminate, modify or access information using its services
Facebook & Instagram (Meta) a. Content Actioned counts any enforcement, not only removals (e.g., removals, warning screens/covering, age gates, downranking). b. Proactive Rate = (items found & actioned proactively) ÷ (total content actioned).
X/Twitter a. Child Sexual Exploitation and terrorism suspensions are largely proactive, flagged using proprietary tools and industry hash-sharing systems. b. Data reflects global enforcement, not only India.
Google / YouTube a. Number of removal actions as a result of automated detection captures actions triggered by automated systems (ML + human-trained models).
ShareChat a. Content Removed / Taken Down / UGC discard / Comments/Chatrooms deleted are standardised as Content Removed. b. Also includes rights-holder reporting workflow for copyright/IP and automated proactive monitoring for harmful content.
WhatsApp a. Reports Proactively Banned Accounts, meaning accounts banned before any user reports.
Koo a. Distinguishes between Content Removed, Content Actioned (flagged/downranked), and Account Banned. b. Automation Correct/Wrong reflect AI moderation accuracy, not enforcement outcomes.
Facebook
TwitterDataset Overview
This comprehensive dataset offers an in-depth analysis of social media engagements across various platforms. It captures the dynamics of user interactions by tracking the number of reactions, comments, shares, and types of posts. Ideal for social media analysts, marketers, and researchers, this dataset serves as a critical tool for understanding digital communication trends and enhancing social media strategies. Each entry provides detailed metrics on how posts are received by audiences, enabling data-driven insights into content performance.
Key Features:
📌 num_reactions: Total number of reactions a post receives, encapsulating the overall engagement. 📍 num_comments: Reflects the level of audience interaction through comments. 📸 num_shares: Indicates the virality of the post by counting how many times it has been shared. ❤️ num_likes: Tracks the number of likes, showing general approval of the content. 🥰 num_loves: Captures more intense affection reactions to posts. 😮 num_wows: Measures the surprise or awe factor of the post. 😂 num_hahas: Counts instances of amusement or laughter triggered by the post. 😢 num_sads: Reflects the number of sad reactions, indicating emotional impact. 😡 num_angrys: Tracks angry reactions, highlighting content that might be controversial or upsetting. 🔗 status_type_link: Binary indicator of whether the post includes a link, enhancing its informational value. 🖼️ status_type_photo: Identifies posts with photos, crucial for visual content analysis. 📝 status_type_status: Marks textual posts, focusing on written content engagement. 🎥 status_type_video: Distinguishes posts with videos, important for engagement in dynamic content.
This dataset not only aids in measuring the effectiveness of social media campaigns but also supports the development of targeted marketing strategies and content optimization efforts to maximize audience engagement.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset ini merupakan hasil dari scraping pada media sosial twitter dengan menggunakan aplikasi twint yang ditujukan pada hashtag #IndonesiaHumanRightsSOS. Scraping data dilakukan untuk cuitan yang dibuat dari tanggal 18 Desember 2020 10:59 AM s/d 19 Desember 2020 23:18 PM.
Pada dataset mengandung 106.903 Row data dengan informasi terkait: User ID, Username, Twitter Name,Tweets, dsb.
Selain itu dilampirkan juga contoh data yang telah dianalisis berupa wordcloud,username cloud, 100 most used word & most active username.
-
This dataset is the result of scraping on social media twitter using the twint application aimed at the hashtag #IndonesiaHumanRightsSOS. Data scraping is done for tweets made from December 18 2020 10:59 AM to December 19 2020 23:18 PM.
The dataset contains 106,903 rows of data with related information: User ID, Username, Twitter Name, Tweets, etc.
Also there is an example of the data that has been analyzed in the form of wordcloud, username cloud, 100 most used words & most active username.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description:
The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.
Dataset Breakdown:
Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.
Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.
Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.
Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.
Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.
Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.
Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
Context and Use Cases:
Researchers, data scientists, and developers can use this dataset to:
Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.
Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.
Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.
Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.
Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.
Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.
The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.
Future Considerations:
As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.
By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...