Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
trends
https://brightdata.com/licensehttps://brightdata.com/license
Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.
Dataset Features
User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research. Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization. Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns. Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.
Customizable Subsets for Specific Needs Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.
Popular Use Cases
Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively. Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships. Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies. Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions. AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.
Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.
The Portuguese footballer is the most-followed person on the photo sharing app platform with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.
How popular is Instagram?
Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.
Who uses Instagram?
Instagram audiences are predominantly young – recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media for teens and one of the social networks with the biggest reach among teens in the United States.
Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
1) Data Introduction • The Social Media Usage Dataset(Applications) features patterns and activity indicators that 1,000 users use seven major social media platforms, including Facebook, Instagram, and Twitter.
2) Data Utilization (1) Social Media Usage Dataset(Applications) has characteristics that: • This dataset provides different social media activity data for each user, including daily usage time, number of posts, number of likes received, and number of new followers. (2) Social Media Usage Dataset(Applications) can be used to: • Analysis of User Participation by Platform: You can analyze participation and popular trends by platform by comparing usage time and activity for each social media. • Establish marketing strategy: Based on user activity data, it can be used for targeted marketing, content production, and user retention strategies.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Social Media has become a part of our day-to-day routine, keeping users from across the world well-connected through digital platforms. With each passing year, social media is evolving at a rapid speed. With each passing year, the number of social media users is increasing at an immersive speed. Reports also suggest the number of social media users will reach a milestone of 5.85 billion in 2027.
In 2024, 62.6% of the world’s population will access social media, which clearly indicates the dominance of social media platforms in today’s world. In this article, we will examine social media statistics for 2024, uncovering monthly active users, daily time spent by users, most downloaded social media apps, etc.
A dataset consisting of recipient 46 users and, 26180 tweets. The dataset includes the news feed of the users and 13 features that may influence the relevance of the tweets.
As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of the global Instagram population worldwide was aged 34 years or younger.
Teens and social media
As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, second to Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.
Social media can have negative effects on teens, which is also much more pronounced on those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported to have experienced cyber bullying when using social media, while in comparison only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which have been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade.
MuMiN fills a gap in the existing misinformation datasets in multiple ways:
By having a large amount of social media information which have been semantically linked to fact-checked claims on an individual basis. By featuring 41 languages, enabling evaluation of multilingual misinformation detection models. By featuring both tweets, articles, images, social connections and hashtags, enabling multimodal approaches to misinformation detection.
MuMiN features two node classification tasks, related to the veracity of a claim:
Claim classification: Determine the veracity of a claim, given its social network context. Tweet classification: Determine the likelihood that a social media post to be fact-checked is discussing a misleading claim, given its social network context.
To use the dataset, see the "Getting Started" guide and tutorial at the MuMiN website.
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram’s Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
About Dataset This dataset captures the pulse of viral social media trends across Facebook, Instagram and Twitter. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:
Trend analysis 🔍 Sentiment modeling 💭 Understanding influencer marketing 📈 Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Facebook is fast approaching 3 billion monthly active users. That’s about 36% of the world’s entire population that log in and use Facebook at least once a month.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Fardeen Mohammad
Released under CC0: Public Domain
Weibo-COV is a large-scale COVID-19 social media dataset from Weibo, covering more than 30 million posts from 1 November 2019 to 30 April 2020. Moreover, the field information of the dataset is very rich, including basic posts information, interactive information, location information and retweet network.
https://www.ibisworld.com/about/termsofuse/https://www.ibisworld.com/about/termsofuse/
Social media platforms are integral to people's lives, offering ways to communicate, create and view content and share information. According to Ofcom, approximately 89% of UK internet users in 2023 used social media apps or sites. Teenagers and young adults are the biggest users, although there is rapid uptake among older age groups. Advertising is the primary revenue source for social media platforms, although subscription-based services are gaining momentum as platforms seek to diversify their incomes. TikTok is the success story of the last few years, becoming the most downloaded app between 2020 and 2022, according to Apptopia. The short-form video platform reported that it averaged revenue growth of over 450% between 2019 and 2022. After Musk's takeover, X, formerly known as Twitter, adjusted its content moderation and allowed previously banned accounts to return. As a result, over 600 advertisers have pulled their ads from the site because of fears their brand may be associated with malcontent. In response to falling ad revenue, X has introduced a subscription-based service which enables users to verify themselves and boosts the number of people who view their tweets. Meta-owned Facebook and Instagram have responded by introducing a similar service. Revenue is expected to grow by 14.3% in 2024-25, constrained by a slowdown in user growth for most major social media platforms. Over the five years through 2024-25, revenue is forecast to expand at a compound annual rate of 32.8% to reach £9.8 billion. Looking forward, regulations relating to how data is collected, stored, and shared will force advertisers and platforms to rethink how they can target their desired demographics. The rising prominence of AI will require the introduction of adequate regulations. The Online Safety Bill sets out new guidelines for social media platforms to abide by, with hefty fines in store for those who do not. Operating costs will swell as platforms look to meet consumers’ expectations, weighing on profit. Over the five years through 2029-30, social media platforms' revenue is projected to climb at an estimated 9.4% to reach £15.4 billion.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The cross-lingual natural disaster dataset includes public tweets collected using Twitter’s public API, filtering by location-related keywords and date, without using any additional filtering (e.g., we did not restrict the query to specific languages). We considered two disaster events and two long-term natural disasters across Europe (floods and wildfires) that received substantial news coverage internationally.
Three of the top languages were common to all the studied events: English (ISO 639-1 code: en), Spanish (es), and French (fr). Additionally, we found hundreds of messages for each event in other five languages, including Arabic (ar), German (de), Japanese (ja), Indonesian (id), Italian (it) and Portuguese (pt).
After collecting the data, we labelled tweets that contained potentially informative factual information. We name this group of tweets “informative messages.” Next, we used crowdsourcing to further categorize the messages into various informational categories. We asked three different workers to label each informative messages across languages. The target categories were based on an ontology from TREC-IS 2018, where we grouped some low level ontology categories into higher-level ones.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
A list of the most popular (top 100 by followers) Instagram, Twitter, YouTube, Twitch, and TikTok users. NB! For YouTube the followers are subscribers and the posts are videos.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The average Twitter user spends 5.1 hours per month on the platform.
Bluesky Social Dataset Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their timestamped “like” interactions and time of bookmarking.
This dataset allows unprecedented analysis of online behavior and human-machine engagement patterns. Notably, it provides ground-truth data for studying the effects of content exposure and self-selection, and performing content virality and diffusion analysis.
Dataset Here is a description of the dataset files.
followers.csv.gz. This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers u, v, representing a directed following relation (i.e., user u follows user v). posts.tar.gz. This compressed folder contains data on the individual posts collected. Decompressing this file results in 100 files, each containing the full posts of up to 50,000 users. Each post is stored as a JSON-formatted line. interactions.csv.gz. This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers, and represents a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author ,quoted_author, and date. graphs.tar.gz. This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread. feed_posts.tar.gz. This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Posts are stored as a JSON-formatted line. Fields are correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score), and reposts (repost_from, reposted_author); feed_bookmarks.csv. This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values, namely the feed name, the user id, and the timestamp. feed_post_likes.tar.gz. This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the ``liker'', the id of the post's author, the id of the liked post, and the like timestamp; scripts.tar.gz. A collection of Python scripts, including the ones originally used to crawl the data, and to perform experiments. These scripts are detailed in a document released within the folder.
Citation If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight": Insights from a Year Worth of Social Data. (2024) arXiv:2404.18984
Acknowledgments: This work is supported by :
the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu); SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021; EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).
Emotion recognition is a higher approach or special case of sentiment analysis. In this task, the result is not produced in terms of either polarity: positive or negative or in the form of rating (from 1 to 5) but of a more detailed level of sentiment analysis in which the result are depicted in more expressions like sadness, enjoyment, anger, disgust, fear and surprise. Emotion recognition plays a critical role in measuring brand value of a product by recognizing specific emotions of customers’ comments. In this study, we have achieved two targets. First and foremost, we built a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese which is a low-resource language in Natural Language Processing (NLP). Secondly, we assessed and measured machine learning and deep neural network models on our UIT-VSMEC. As a result, Convolutional Neural Network (CNN) model achieved the highest performance with 57.61% of F1-score.
Paper: Vong Ho, Duong Nguyen, Danh Nguyen, Linh Pham, Kiet Nguyen and Ngan Nguyen, Emotion Recognition for Vietnamese Social Media Text, 2019 16th International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), October 11-13, 2019, Ha Noi, Vietnam. Link.
https://sites.google.com/uit.edu.vn/uit-nlp/datasets-projects
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of content published in groups of Russian universities on the social media VKontakte. The dataset contains posts and comments from 9,215 university publics from June 2022 to August 2023.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
trends