License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is web-scraped from popular short video platforms like YouTube Shorts, TikTok, and Instagram Reels. It captures user interaction data, including views, likes, comments, shares, and watch duration, along with multimodal features from video content like text (titles, descriptions), image (visual characteristics), and audio (sound properties). The data has been processed and flattened into a structured CSV format with 17,654 Rows.
A sample of TikTok videos associated with the hashtag #coronavirus was downloaded on September 20, 2020. Misinformation was evaluated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with the number of views and the presence of user comments indicating an intention to change behavior. Videos and related metadata were downloaded with a third-party TikTok scraper using the search term #coronavirus. Videos were reviewed for content, and the data were entered into a spreadsheet.
License: https://cdla.io/permissive-1-0/
This dataset was created by Robson Caldeira
Released under Community Data License Agreement - Permissive - Version 1.0
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists, analysts, and researchers understand, analyze, and predict viral content across major social media platforms. It captures realistic engagement behavior, sentiment signals, and content attributes that influence virality in today's digital ecosystem.
The dataset includes multi-platform data from:
- TikTok
- Instagram
- X (Twitter)
- YouTube Shorts
Each platform is represented with consistent metrics, making cross-platform comparison easy and reliable.
Ideal for NLP tasks, sentiment analysis, and hashtag impact studies.
These metrics allow deep analysis of user interaction patterns.
Perfect for machine learning models and classification tasks.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
The dataset includes engagement metrics such as the number of plays, likes, shares, and comments for all videos posted by news publishers on TikTok up to July 2023.
If you use this dataset in any publication or study, please cite: Cheng, Z., & Li, Y. (2023). Like, Comment, and Share on TikTok: Exploring the Effect of Sentiment and Second-Person View on the User Engagement with TikTok News Videos. Social Science Computer Review, 42(1), 201-223. https://doi.org/10.1177/08944393231178603
License: https://brightdata.com/license
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, creation date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
License: https://brightdata.com/license
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
License: https://creativecommons.org/publicdomain/zero/1.0/
About Dataset
Overview
This Social Media Engagement Dataset contains comprehensive engagement metrics from 5,000 social media posts across six major platforms: Instagram, Twitter, Facebook, LinkedIn, TikTok, and YouTube. The dataset spans two years (2024-2025) and provides valuable insights into content performance, audience engagement patterns, and influencer analytics.
Dataset Contents
The dataset includes 20 detailed features covering various aspects of social media engagement:

Post Information
- Post_ID: Unique identifier for each post
- Timestamp: Date and time when the post was published
- Platform: Social media platform (Instagram, Twitter, Facebook, LinkedIn, TikTok, YouTube)
- Content_Type: Type of content (Photo, Video, Reel, Tweet, Story, etc.)
- Category: Content category (Technology, Fashion, Food, Travel, Fitness, Education, Entertainment, Business, Lifestyle, Gaming, Health, Sports)

Engagement Metrics
- Likes: Number of likes/reactions received
- Comments: Number of comments on the post
- Shares: Number of shares/retweets/reposts
- Views: Total number of views
- Saves: Number of bookmarks/saves
- Engagement_Rate: Calculated engagement rate percentage

Account Information
- Follower_Count: Number of followers of the account
- Influencer_Tier: Classification (Nano, Micro, Mid-tier, Macro)
- Is_Verified: Whether the account is verified (True/False)

Content Characteristics
- Hashtag_Count: Number of hashtags used
- Content_Length: Length in characters (text) or seconds (video)
- Sentiment: Sentiment analysis (Positive, Neutral, Negative)
- Has_Media: Whether post contains media (True/False)

Temporal Features
- Hour_of_Day: Hour when the post was published (0-23)
- Day_of_Week: Day of the week (Monday-Sunday)

Use Cases
This dataset is perfect for:
- Predictive Analytics: Build ML models to predict engagement rates
- Data Visualization: Create insightful dashboards and charts
- Machine Learning: Classification, regression, and clustering tasks
- Time Series Analysis: Analyze posting patterns and optimal timing
- Content Strategy: Optimize content strategy based on data insights
- Sentiment Analysis: Study correlation between sentiment and engagement
- Platform Comparison: Compare performance across different platforms
- Influencer Marketing: Analyze influencer tier performance

Technical Details
- Format: CSV
- Size: ~651 KB
- Rows: 5,000
- Columns: 20
- Time Period: January 2024 - December 2025
- Missing Values: None

Potential Research Questions
- What time of day generates the most engagement?
- Which platform has the highest engagement rates?
- How does content type affect performance?
- Does verified status impact engagement?
- What's the optimal hashtag count?
- How does sentiment correlate with engagement?

Notes
- Engagement metrics are platform-realistic and proportional
- All data is synthetically generated for educational and research purposes
- Suitable for beginners and advanced data scientists
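As a rough illustration of the platform and time-of-day questions listed for this dataset, the sketch below averages Engagement_Rate per Platform and per Hour_of_Day using only the Python standard library. The column names come from the description; the four sample rows and the helper name `mean_rate_by` are invented for the example, and the real file name is not given in the description, so the data is read from an in-memory string.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample rows; only the column names come from the dataset description.
SAMPLE_CSV = """Post_ID,Platform,Hour_of_Day,Engagement_Rate
1,TikTok,18,6.0
2,TikTok,9,4.0
3,Instagram,18,3.0
4,LinkedIn,9,2.0
"""


def mean_rate_by(rows, key):
    """Average Engagement_Rate for each distinct value of the column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row["Engagement_Rate"]))
    return {value: mean(rates) for value, rates in groups.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_platform = mean_rate_by(rows, "Platform")
by_hour = mean_rate_by(rows, "Hour_of_Day")  # keys are strings such as "18"

# Platform with the highest mean engagement rate in the sample.
best_platform = max(by_platform, key=by_platform.get)
print(by_platform, by_hour, best_platform)
```

With the real CSV, the in-memory string would be replaced by an opened file handle; because the description states that the data is synthetic, any ranking obtained this way describes the generator, not real platform behavior.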
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
TikTok Video Analytics Dataset
Sample TikTok video dataset with comprehensive engagement metrics and metadata. Each row represents a single TikTok video with content and detailed analytics. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.
Files Included
train.csv - TikTok video analytics data
What's included
Video URLs and identifiers, comprehensive engagement… See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Tiktok-Videos.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built with the Gephi network analysis software.
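The distinction between unique links and weighted links can be sketched as follows: repeated observations of the same pair of labels collapse into one unique link whose weight counts the repetitions. This is only an illustration under stated assumptions. The label pairs are invented, links are assumed to be undirected, and the description does not say how the original graph was actually constructed.

```python
from collections import Counter

# Hypothetical raw link observations (pairs of labels); invented for illustration.
raw_links = [
    ("news", "viral"),
    ("viral", "news"),
    ("news", "humor"),
    ("news", "viral"),
]

# Assume undirected links: sort each pair so (a, b) and (b, a) collapse together.
weights = Counter(tuple(sorted(pair)) for pair in raw_links)

nodes = {label for pair in weights for label in pair}
unique_links = len(weights)           # distinct label pairs
total_weight = sum(weights.values())  # every observation counted

print(len(nodes), unique_links, total_weight)
```

In the sample, four observations reduce to two unique links over three nodes, which mirrors how a total weighted count can exceed the number of unique links.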
Source of:
Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655
Abstract:
Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
License: https://brightdata.com/license
Our TikTok Influencer Dataset provides comprehensive insights into influencer profiles, audience engagement, and market impact. This dataset is ideal for brands, marketers, and researchers looking to identify top-performing influencers, analyze engagement metrics, and optimize influencer marketing strategies on TikTok.
Key Features:
Influencer Profiles: Access detailed influencer data, including profile name, bio, profile picture, and direct profile URL.
Follower & Engagement Metrics: Track key performance indicators such as follower count, engagement rate, and interaction levels.
Monetization Insights: Analyze influencer earnings with Gross Merchandise Value (GMV) and currency details.
Category & Niche Segmentation: Identify influencers based on their associated product categories to match brand campaigns with relevant audiences.
Contact Information: Retrieve available influencer email addresses for direct outreach and collaboration.
Use Cases:
Influencer Discovery & Marketing: Find high-performing TikTok influencers for brand partnerships and sponsored campaigns.
Competitive Analysis: Compare influencer engagement rates and audience reach to optimize marketing strategies.
Market Research & Trend Analysis: Identify emerging influencers and track content trends within different product categories.
Performance Benchmarking: Evaluate influencer success based on GMV, engagement rate, and follower growth.
Lead Generation & Outreach: Use available contact details to connect with influencers for collaborations and brand promotions.
Our TikTok Influencer Dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into the TikTok influencer landscape and enhance your marketing strategies with high-quality, structured data.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores factors associated with the reception of COVID-19 related content on TikTok. It captures overall levels of user engagement (likes, comments, and views) and source credibility, that is, whether the content comes from healthcare professionals, news sources, patients, or other outlets. It also covers demographic factors such as gender and age range, and content type, such as humor or clinical instruction. In addition, it records whether posts describe risk factors and symptoms, modes of transmission, and prevention. Finally, each post is rated for its level of misinformation (low, moderate, or high). Combined, these measures provide insight into how users engage with COVID-19 related misinformation on TikTok.
This dataset contains user engagement data and measures of source credibility related to COVID-19 misinformation on TikTok. It can be used to examine the factors associated with content reception, such as views, likes, comments, as well as factors relating to credibility, demographics and content type.
Using this dataset: - Explore the columns available in the dataset. There are a number of columns that measure user engagement (views, likes and comments) as well as source credibility (official source, healthcare professional etc.), demographic factors (gender, age group etc.), and content type (humor etc). Get familiar with all these columns so that you know what information is available for analysis.
- Decide what kind of analysis you want to perform. You can use this data for exploratory or explanatory work, depending on your aims or research question. For example, if you want to see how source credibility affects user engagement, you would need statistical techniques such as correlation tests or regression analyses, whereas if you just want an overall understanding of patterns in the data, exploratory techniques such as cross-tabulations may be more suitable.
- Developing a predictive model to identify which demographic and source characteristics are correlated with high user engagement for COVID-related posts on TikTok (e.g. views, likes, and comments).
- Investigating the difference in user engagement for posts from healthcare professionals vs non-professional sources to compare how different types of content are received by users on TikTok.
- Analyzing the sentiment of words related to masks and tests in order to gain insights into how content about this topic is perceived by users on TikTok (i.e., positive or negative sentiment)
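As one possible starting point for the comparison between healthcare professionals and other sources, the sketch below computes the median number of views for each value of the pub_hcp flag. The column names views and pub_hcp come from the file description for tiktok_data_open.csv. The sample rows are invented, and the encoding of Boolean columns as "1"/"0" is an assumption that should be checked against the real file.

```python
import csv
import io
from statistics import median

# Invented sample rows; the "1"/"0" Boolean encoding is an assumption.
SAMPLE_CSV = """views,likes,comments,pub_hcp
1000,100,10,1
3000,200,20,1
500,50,5,0
700,60,6,0
900,70,7,0
"""


def median_by_flag(rows, flag, metric):
    """Median of `metric` for rows where `flag` is set and where it is not."""
    groups = {True: [], False: []}
    for row in rows:
        groups[row[flag] == "1"].append(int(row[metric]))
    return {is_set: median(values) for is_set, values in groups.items() if values}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
median_views = median_by_flag(rows, "pub_hcp", "views")
print(median_views)
```

The median is used here instead of the mean because view counts on social platforms tend to be heavily skewed; a formal comparison would still need a significance test or a regression with controls.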
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: tiktok_data_open.csv

| Column name     | Description                                                             |
|:----------------|:------------------------------------------------------------------------|
| views           | Number of views for the video. (Integer)                                |
| likes           | Number of likes for the video. (Integer)                                |
| comments        | Number of comments for the video. (Integer)                             |
| official_source | Whether the source of the video is an official source. (Boolean)        |
| pub_hcp         | Whether the source of the video is a healthcare professional. (Boolean) |
| pub_news        | Whether the source of the video is a news source. (Boolean)             |
| pub_patient     | Whether the source of the video is a patient. (Boolean)                 |
| pub_other       | Whether the source of the video is another source. (Boolean)            |
| female ...
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed engagement metrics for TikTok influencer posts, including video performance, audience growth, and cross-platform mentions. It enables marketers, startups, and researchers to analyze influencer effectiveness, optimize campaigns, and uncover network trends across social media platforms.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains survey responses related to consumer behavior in TikTok live streaming commerce, with a particular focus on the beauty and personal care sector in Indonesia. The data was collected in 2025 through an online questionnaire distributed via Google Forms over a three-month period. A total of 390 respondents participated, all of whom had prior experience purchasing beauty and personal care products through TikTok live streams.
The dataset includes demographic information (such as age, gender, and education level) as well as variables measuring consumer perceptions and behaviors. These variables capture persuasive linguistic style of live stream hosts, customer trust, customer engagement, and purchase intention. All constructs were measured using a 5-point Likert scale.
The dataset is suitable for quantitative explanatory research and can be analyzed using advanced statistical techniques such as Partial Least Squares Structural Equation Modeling (PLS-SEM). It provides valuable insights into the influence of host communication styles on consumer trust, engagement, and purchase decisions in live streaming commerce. Researchers and practitioners can use this dataset to explore digital retail dynamics, customer behavior, and strategies for enhancing engagement and sales effectiveness in TikTok commerce.
This dataset consists of 734 entries representing social media activity and performance from a local MSME (Micro, Small, and Medium Enterprise) across the TikTok, Instagram, and Twitter platforms. It captures key metrics related to audience interaction and content strategy effectiveness, and is valuable for evaluating and optimizing digital marketing efforts for small businesses.
- Area: Target location or customer region where the MSME's content is directed.
- Category: The business content category (e.g., product promotion, education, seasonal campaign).
- Day: The day of the week the content was published.
- Month: The month the post went live.
- Platform: The social media platform used by the MSME (TikTok, Instagram, or Twitter).
- Post Type: The format of the content posted: image, video, carousel, or text.
- Timestamp: The exact date and time when the content was posted.
- User: The username or business account that posted the content.
- Week: Week number within the year for time-based analysis.
- Year: The year the content was posted.
- Comments: Total number of comments received on the post.
- Engagement Rate: A calculated metric showing how engaging the content is (based on likes, comments, shares vs. reach/impressions).
- Hour: Hour of the day the post was published.
- Impressions: Number of times the content appeared on users' feeds.
- Likes: Number of likes the post received.
- Reach: Number of unique users who saw the content.
- Shares: Number of times users shared the content.
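The description says the Engagement Rate column is based on likes, comments, and shares relative to reach or impressions, but it does not give the exact formula. A common convention, assumed here purely for illustration, divides total interactions by reach and expresses the result as a percentage. The function name and the sample numbers are hypothetical.

```python
def engagement_rate(likes, comments, shares, reach):
    """Interactions per unique user reached, as a percentage.

    Assumed formula: 100 * (likes + comments + shares) / reach.
    The dataset's own calculation may differ (e.g. it may use impressions).
    """
    if reach <= 0:
        return 0.0  # avoid division by zero for posts nobody saw
    return 100.0 * (likes + comments + shares) / reach


# Hypothetical post: 150 interactions across 3,000 unique users reached.
rate = engagement_rate(likes=120, comments=15, shares=15, reach=3000)
print(rate)
```

Passing the Impressions value in place of reach gives the impressions-based variant; comparing both against the stored Engagement Rate column is one way to check which denominator the dataset actually uses.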
Introducing a comprehensive and meticulously curated dataset: "European Interest Groups' Social Media Engagement Dataset." This dataset offers a panoramic view of the digital footprint and social media presence of various interest groups within Europe, encompassing a diverse range of platforms including Twitter, Facebook, Instagram, TikTok, and YouTube.
These are the variables:
1. Name: The name of the organization
2. twitter_link: The organization's Twitter link, if it has one
3. facebook_link: The organization's Facebook link, if it has one
4. instagram_link: The organization's Instagram link, if it has one
5. tiktok_link: The organization's TikTok link, if it has one
6. linkedin_link: The organization's LinkedIn link, if it has one
7. youtube_link: The organization's YouTube link, if it has one
With a focus on transparency and relevance, this dataset presents a wealth of information that delves into the strategies, content, and reach of interest groups across these dynamic online platforms. Researchers, policymakers, and analysts can explore trends, patterns, and correlations between online activities and real-world influence, shedding light on the evolving landscape of digital interaction within the realm of European interest groups.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Synthetic TikTok Virality Dataset
1. Dataset Overview
This dataset contains 10,000 synthetic TikTok video metadata records. It was generated to simulate the complex relationship between video content (text descriptions, hashtags) and viral performance (view counts). It is designed for:
- Virality Prediction: Training regression models to predict view counts.
- Content Optimization: Analyzing which keywords and hooks drive engagement.
- Social Media Analysis: Understanding… See the full description on the dataset page: https://huggingface.co/datasets/MatanKriel/social-assistent-synthetic-data.
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a comprehensive and diverse snapshot of social media users and their engagements across various popular platforms such as Instagram, Twitter, Facebook, YouTube, Pinterest, TikTok, and Spotify. With 100 rows of anonymized data, it offers valuable insights into the dynamic world of social media usage.
Each row in the dataset represents a unique user with a designated User ID and Username to ensure anonymity. Alongside user-specific details, the dataset captures essential information, including the platform being used, the post's content, timestamp, and media type (text, image, or video). Additionally, it tracks engagement metrics such as likes, comments, shares/retweets, and user interactions, providing an overview of the user's popularity and social impact.
The dataset also includes pertinent user attributes, such as account creation date, privacy settings, number of followers, and following. The users' profiles are further enriched with demographic characteristics, including anonymized representations of their age group and gender.
Hashtags, mentions, media URLs, post URLs, and self-reported location contribute to understanding user interests, content themes, and geographic distribution. Moreover, users' bios and language preferences offer insights into their passions, activities, and linguistic communication on the platforms.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As social platforms experience an influx of diverse content from users, the need to determine high-quality contributions becomes crucial, especially for educational purposes. This paper highlights the pivotal role of quality in assessing how educational-purposed user-generated content (UGC) shapes user experiences, fosters engagement, and establishes credibility. This study proposes a computational framework using a quasi-experimental evaluation through the sorting-based ELimination Et Choice TRanslating Reality, termed ELECTRE-SORT, with a dataset randomly generated from normally distributed user evaluations. Considering the diverse nature of contents, the method evaluates 16 educational-purposed UGC videos from different online media platforms (i.e. Facebook, YouTube, TikTok). These videos were categorized based on their concordance and discordance to three (3) main criteria: content quality, design quality, and technology quality. Employing the ELECTRE-SORT reveals that most UGC videos (i.e. 14 out of 16) fall into the "medium quality" category, possessing a considerable standard for the quality of educational purpose content. Their characteristics generally satisfy the quality attributes and can be used to guide the development of future relevant UGC videos. Finally, to demonstrate the robustness of the proposed approach, we presented a sensitivity analysis by designing different weight assignments to the quality attributes. Practical insights are outlined in this work.