License: https://creativecommons.org/publicdomain/zero/1.0/
TikTok is the leading destination for short-form mobile video. The platform is built to help imaginations thrive. TikTok's mission is to create a place for inclusive, joyful, and authentic content–where people can safely discover, create, and connect.
| Column name | Type | Description |
|---|---|---|
| # | int | TikTok assigned number for video with claim/opinion. |
| claim_status | obj | Whether the published video has been identified as an “opinion” or a “claim.” In this dataset, an “opinion” refers to an individual’s or group’s personal belief or thought. A “claim” refers to information that is either unsourced or from an unverified source. |
| video_id | int | Random identifying number assigned to video upon publication on TikTok. |
| video_duration_sec | int | Duration of the published video, in seconds. |
| video_transcription_text | obj | Transcribed text of the words spoken in the published video. |
| verified_status | obj | Indicates the status of the TikTok user who published the video in terms of their verification, either “verified” or “not verified.” |
| author_ban_status | obj | Indicates the status of the TikTok user who published the video in terms of their permissions: “active,” “under scrutiny,” or “banned.” |
| video_view_count | float | The total number of times the published video has been viewed. |
| video_like_count | float | The total number of times the published video has been liked by other users. |
| video_share_count | float | The total number of times the published video has been shared by other users. |
| video_download_count | float | The total number of times the published video has been downloaded by other users. |
| video_comment_count | float | The total number of comments on the published video. |
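As a rough illustration of working with the columns above, the sketch below averages `video_view_count` per `claim_status` while skipping empty cells. The sample rows are invented for the example, and the idea that the float-typed count columns may contain missing values is an assumption; the listing does not say so.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file is not shown in this listing.
SAMPLE = """claim_status,video_id,video_duration_sec,video_view_count,video_like_count
claim,7017666017,59,343296.0,19425.0
opinion,4014381136,32,140877.0,77355.0
claim,9859838091,31,902185.0,97690.0
opinion,1866847991,25,,
"""


def to_float(value):
    """Return the cell as a float, or None when the cell is empty."""
    return float(value) if value.strip() else None


def mean_views_by_status(rows):
    """Average video_view_count per claim_status, skipping missing counts."""
    totals = defaultdict(lambda: [0.0, 0])  # status -> [sum of views, row count]
    for row in rows:
        views = to_float(row["video_view_count"])
        if views is None:
            continue
        bucket = totals[row["claim_status"]]
        bucket[0] += views
        bucket[1] += 1
    return {status: total / n for status, (total, n) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = mean_views_by_status(rows)
print(result)
```

Reading the counts as optional floats keeps rows with missing engagement data from silently counting as zero views.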
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about TikTok videos, including user interactions and video details. It includes features such as video ID, username, video title, likes, comments, shares, views, and more. This dataset is useful for analyzing video performance and user engagement on TikTok.
Columns:
License: https://cdla.io/permissive-1-0/
This dataset was created by Robson Caldeira
Released under Community Data License Agreement - Permissive - Version 1.0
Background: COVID-related misinformation is prevalent online, including on social media. The purpose of this study was to explore factors associated with user engagement with COVID-related misinformation on the social media platform, TikTok. Methods: A sample of TikTok videos associated with the hashtag #coronavirus were downloaded on September 20, 2020. Misinformation was evaluated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with number of views and presence of user comments indicating intention to change behavior. Results: 166 TikTok videos were identified. Moderate misinformation was present in 36 (22%) videos, and high-level misinformation was present in 11 (7%). After controlling for characteristics and content, videos containing moderate misinformation were less likely to generate a user response indicating intended behavior change. By contrast, videos containing high-le...
This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.
Here's a breakdown of the columns included in this dataset:
- video_id: A unique identifier for each TikTok video.
- author: The username or handle of the TikTok account that posted the video.
- description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
- likes: The number of likes the video has received.
- comments: The number of comments on the video.
- shares: The number of times the video has been shared.
- plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
- hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
- music: Information about the background music or sound used in the video.
- create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
- video_url: The direct URL to the TikTok video.
- fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
- views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
- posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)

Potential Uses of This Dataset:
- Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
- Trend Identification: Identify trending topics, music, and creators on TikTok.
- Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
- Creator Analysis: Study the posting habits and performance of various TikTok creators.
- Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.

Notes on Data Quality:
- The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis.
- The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis.
- It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics.

This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
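One way to act on the data quality notes is to prefer `plays` and `create_time`, and fall back to the overlapping `views` and `posted_time` columns only when the preferred value is missing. The sketch below is a minimal illustration with invented rows. Treating the overlapping columns as interchangeable fallbacks is an assumption that goes beyond what the notes state, so it is worth checking against the real data first.

```python
def coalesce(row, primary, fallback):
    """Return row[primary] unless it is missing, else row[fallback], else None."""
    for key in (primary, fallback):
        value = row.get(key)
        if value not in (None, ""):
            return value
    return None


def clean_post(row):
    """Keep one play count and one timestamp per post."""
    return {
        "video_id": row["video_id"],
        "plays": coalesce(row, "plays", "views"),
        "create_time": coalesce(row, "create_time", "posted_time"),
    }


# Hypothetical rows; values are illustrative only.
posts = [
    {"video_id": "a1", "plays": "1200", "views": "",
     "create_time": "2024-01-05", "posted_time": ""},
    {"video_id": "b2", "plays": "", "views": "300",
     "create_time": "", "posted_time": "2024-02-01"},
    {"video_id": "c3", "plays": "", "views": "",
     "create_time": "", "posted_time": ""},
]

cleaned = [clean_post(p) for p in posts]
# Drop posts that have no usable play count at all.
usable = [p for p in cleaned if p["plays"] is not None]
print(usable)
```

Dropping rows is the simplest option here; imputation, which the notes also mention, would need a model of how plays relate to the other engagement columns.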
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:
Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
TikTok Video Analytics Dataset
Sample TikTok video dataset with comprehensive engagement metrics and metadata. Each row represents a single TikTok video with content and detailed analytics. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.
Files Included
train.csv – TikTok video analytics data
What's included
Video URLs and identifiers Comprehensive engagement… See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Tiktok-Videos.
License: https://brightdata.com/license
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, create date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.
License: https://brightdata.com/license
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
License: https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset explores various factors associated with the reception of COVID-19 related content on TikTok. It captures overall levels of user engagement such as likes, comments, and views, and it also records source credibility, including whether the information comes from healthcare professionals, news sources, patients, or other outlets. It further covers demographic factors such as gender and age range, as well as content type, such as humor or provision of clinical instruction. It also looks at whether the posts describe risk factors and symptoms, modes of transmission, and prevention. Finally, there is a discernment component that rates each post for its level of misinformation (low/moderate/high). All these measures combined provide insights into how users are engaging with COVID-19 related misinformation on TikTok.
This dataset contains user engagement data and measures of source credibility related to COVID-19 misinformation on TikTok. It can be used to examine the factors associated with content reception, such as views, likes, comments, as well as factors relating to credibility, demographics and content type.
Using this dataset:
- Explore the columns available in the dataset. There are a number of columns that measure user engagement (views, likes and comments) as well as source credibility (official source, healthcare professional etc.), demographic factors (gender, age group etc.), and content type (humor etc.). Get familiar with all these columns so that you know what information is available for analysis.
- Decide what kind of analysis you want to perform. You can use this data for exploratory or explanatory work, depending on your aims or research question. For example, if you want to see how source credibility affects user engagement, you would need inferential statistical techniques such as correlation tests or regression analyses, whereas if you just want to gain an overall understanding of patterns in this data, exploratory techniques such as cross tabulations may be more suitable.
- Developing a predictive model to identify which demographic and source characteristics are correlated with high user engagement for COVID-related posts on TikTok (e.g. views, likes, and comments).
- Investigating the difference in user engagement for posts from healthcare professionals vs non-professional sources to compare how different types of content are received by users on TikTok.
- Analyzing the sentiment of words related to masks and tests in order to gain insights into how content about this topic is perceived by users on TikTok (i.e., positive or negative sentiment)
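As a minimal sketch of the healthcare-professional comparison suggested above, the snippet below contrasts median views for videos whose source is a healthcare professional with all other videos. It uses the `views` and `pub_hcp` column names from the file's column listing; the rows themselves are invented for illustration and are not taken from tiktok_data_open.csv.

```python
from statistics import median

# Hypothetical rows using the `views` and `pub_hcp` columns; values are made up.
videos = [
    {"views": 1200, "pub_hcp": True},
    {"views": 300, "pub_hcp": False},
    {"views": 4500, "pub_hcp": True},
    {"views": 800, "pub_hcp": False},
    {"views": 150, "pub_hcp": False},
]


def median_views(rows, is_hcp):
    """Median view count for rows whose pub_hcp flag equals is_hcp."""
    return median(r["views"] for r in rows if r["pub_hcp"] == is_hcp)


hcp_median = median_views(videos, True)
other_median = median_views(videos, False)
print(f"HCP median views: {hcp_median}, other sources: {other_median}")
```

The median is used rather than the mean because view counts on social platforms tend to be heavily skewed; a formal comparison would still need a statistical test and controls for other video characteristics.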
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: tiktok_data_open.csv

| Column name | Description |
|:---|:---|
| views | Number of views for the video. (Integer) |
| likes | Number of likes for the video. (Integer) |
| comments | Number of comments for the video. (Integer) |
| official_source | Whether the source of the video is an official source. (Boolean) |
| pub_hcp | Whether the source of the video is a healthcare professional. (Boolean) |
| pub_news | Whether the source of the video is a news source. (Boolean) |
| pub_patient | Whether the source of the video is a patient. (Boolean) |
| pub_other | Whether the source of the video is another source. (Boolean) |
| female ...
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built with the Gephi network analysis software.
Source of:
Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655
Abstract:
Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists, analysts, and researchers understand, analyze, and predict viral content across major social media platforms. It captures realistic engagement behavior, sentiment signals, and content attributes that influence virality in today’s digital ecosystem.
The dataset includes multi-platform data from:
- TikTok
- Instagram
- X (Twitter)
- YouTube Shorts
Each platform is represented with consistent metrics, making cross-platform comparison easy and reliable.
Ideal for NLP tasks, sentiment analysis, and hashtag impact studies.
These metrics allow deep analysis of user interaction patterns.
Perfect for machine learning models and classification tasks.
Introducing a comprehensive and meticulously curated dataset: "European Interest Groups' Social Media Engagement Dataset." This dataset offers a panoramic view of the digital footprint and social media presence of various interest groups within Europe. It encompasses a diverse range of platforms, including Twitter, Facebook, Instagram, TikTok, and YouTube. These are the variables:
With a focus on transparency and relevance, this dataset presents a wealth of information that delves into the strategies, content, and reach of interest groups across these dynamic online platforms. Researchers, policymakers, and analysts can explore trends, patterns, and correlations between online activities and real-world influence, shedding light on the evolving landscape of digital interaction within the realm of European interest groups.
This repository contains all IDs for political TikTok posts used in the study “Toxic Politics and TikTok Engagement in the 2024 U.S. Election”, published in the Harvard Kennedy School Misinformation Review. The project investigates how political partisanship, toxicity, and topical content influence user engagement with TikTok videos during the 2024 U.S. presidential election cycle. If you use this dataset, please cite: Biswas, A., Javadian Sabet, A., & Lin, Y.-R. (2025). Toxic politics and TikTok engagement in the 2024 U.S. election. Harvard Kennedy School (HKS) Misinformation Review. https://doi.org/10.37016/mr-2020-181
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is web-scraped from popular short video platforms like YouTube Shorts, TikTok, and Instagram Reels. It captures user interaction data, including views, likes, comments, shares, and watch duration, along with multimodal features from video content like text (titles, descriptions), image (visual characteristics), and audio (sound properties). The data has been processed and flattened into a structured CSV format with 17,654 Rows.
Summary
This dataset contains TikTok comments and replies identified as antisemitic. This dataset contains metadata about each comment, such as user profile, engagement metrics, and geographical data, to support comprehensive analysis.
Data Fields
The dataset is structured as a CSV file with the following columns:
- post_id: Unique numerical identifier for the TikTok post
- comment_id: Unique numerical identifier for the comment or reply
- parent_id: Reference to the comment_id…

See the full description on the dataset page: https://huggingface.co/datasets/seanvelasco/tiktok-antisemitism.
License: https://brightdata.com/license
Our TikTok Influencer Dataset provides comprehensive insights into influencer profiles, audience engagement, and market impact. This dataset is ideal for brands, marketers, and researchers looking to identify top-performing influencers, analyze engagement metrics, and optimize influencer marketing strategies on TikTok.
Key Features:
Influencer Profiles: Access detailed influencer data, including profile name, bio, profile picture, and direct profile URL.
Follower & Engagement Metrics: Track key performance indicators such as follower count, engagement rate, and interaction levels.
Monetization Insights: Analyze influencer earnings with Gross Merchandise Value (GMV) and currency details.
Category & Niche Segmentation: Identify influencers based on their associated product categories to match brand campaigns with relevant audiences.
Contact Information: Retrieve available influencer email addresses for direct outreach and collaboration.
Use Cases:
Influencer Discovery & Marketing: Find high-performing TikTok influencers for brand partnerships and sponsored campaigns.
Competitive Analysis: Compare influencer engagement rates and audience reach to optimize marketing strategies.
Market Research & Trend Analysis: Identify emerging influencers and track content trends within different product categories.
Performance Benchmarking: Evaluate influencer success based on GMV, engagement rate, and follower growth.
Lead Generation & Outreach: Use available contact details to connect with influencers for collaborations and brand promotions.
Our TikTok Influencer Dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into the TikTok influencer landscape and enhance your marketing strategies with high-quality, structured data.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
TikTok Creator and Video Engagement (1M)
This release contains 1,035,817 TikTok videos from 4,926 creators with daily engagement and follower statistics, covering videos posted from 2024-06-09 to 2025-03-20.
GitHub: https://github.com/lingbowzd/tiktok-creator-video-trend-data
arXiv: coming soon...
Uses
This dataset supports research on TikTok creator behavior, content strategy, trend adoption, and audience engagement over time. Unlike random collections of TikTok videos… See the full description on the dataset page: https://huggingface.co/datasets/lingbow/tiktok-video-engagement-1m.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset accompanying the paper “Algorithmic Audit of Personalisation Drift in Polarising Topics on TikTok”, designed to analyze video interactions and user engagement patterns on the TikTok website. It contains records of the interactions of social media auditing agents with the TikTok website over the timespan of the present study.
The video excerpts included in this dataset are used solely as units of content for analytical purposes. They do not represent, reflect, or imply the personal views, intentions, or stance of the individuals who created them. Content should be interpreted as data artifacts, not as statements attributable to any person.
To minimize the risk of third-party misuse, the dataset is available only to researchers for non-commercial research purposes, upon verification of an email address associated with an academic organisation.
Paper: TBA
Preprint: TBA
GitHub repository: https://github.com/kinit-sk/ai-auditology-personalisation-drift-tiktok
If you use this dataset in any publication, project, tool or in any other form, please, cite the following paper:
TBA
The dataset consists of 3 CSV files:
ai-auditology-personalisation-drift-tiktok_32_agents_polarizing_plus_neutral.csv: data for the first user group (neutral+polarising), consisting of 30 users from runs that were seeded with both a polarising and a neutral topic.
ai-auditology-personalisation-drift-tiktok_32_agents_polarizing_only.csv: data for the second user group (polarising only), consisting of an additional 32 users (4 per topic+stance) that are seeded only with a polarising topic (representing maximum polarity) but interact with a neutral topic during the interaction phase.
ai-auditology-personalisation-drift-tiktok_US_politics_4_agents_mixed_polarity.csv: data for the third user group (mixed polarity), seeded in an equal manner with only the US politics topic.
The CSV files contain 28 columns (29 for data contained in ai-auditology-personalisation-drift-tiktok_US_politics_4_agents_mixed_polarity.csv), capturing details such as session and video identifiers, timestamps, ad classifications, visual indicators, user demographics, and video metadata.
| Column name | Data type | Description | Example |
|---|---|---|---|
| interaction_number | integer | Unique integer per interaction per agent | 1,2,3… |
| video_url | string | URL of video the agent interacted with | |
| video_id | string | TikTok unique video ID | 1234 |
| video_author | string | TikTok author name | author123 |
| video_description | string | Video description generated by video author plus hashtags | This video is about… |
| video_time_duration | integer | Duration of video in seconds | 67.9333 |
| video_transcript | string | Speech transcript by inhouse Whisper model | Welcome to my video about… |
| video_transcript_language | string | Code for language detected in transcript | en, fr … |
| video_action_skip | bool | Decision by user interaction predictor, TRUE if video is to be skipped | TRUE, FALSE |
| video_action_watch | bool | Decision by user interaction predictor, TRUE if video is to be watched | TRUE, FALSE |
| video_action_like | bool | Decision by user interaction predictor, TRUE if video is to be liked | TRUE, FALSE |
| video_action_bookmark | bool | Decision by user interaction predictor, TRUE if video is to be bookmarked | TRUE, FALSE |
| video_time_watch_loop_start | integer | UNIX timestamp of time when agent started watching particular video | 1765302470.8245792 |
| video_time_watch_loop_end | integer | UNIX timestamp of time when agent finished watching particular video | 1765302470.8245792 |
| video_time_skip | integer | UNIX timestamp of time when agent skipped particular video | 1765302470.8245792 |
| video_time_like | integer | UNIX timestamp of time when agent liked particular video | 1765302470.8245792 |
| video_time_bookmark | integer | UNIX timestamp of time when agent bookmarked particular video | 1765302470.8245792 |
| video_time_predict_interaction | integer | UNIX timestamp of time when user interaction predictor predicted how to interact with particular video | 1765302470.8245792 |
| agent_id | string | Unique ID of agent | agent_id |
| topic | string | Topic of interest of given agent | Vaccines, US Politics, Flatearth, Climate change, Cooking |
| stance | string | Stance towards the topic of interest of given agent | support, oppose |
| gender | string | Gender set for given agent in TikTok | male, female |
| country_code | string | Country of origin set for given agent | US |
| date_of_birth | string | Date of birth set for given agent in TikTok | 1/2/2005 |
| run_id | string | ID of given agent run | 1759515058.941394_main |
| predicted_topic_match | bool | TRUE if predicted_topic == topic of interest | TRUE, FALSE |
| predicted_stance_match | bool | TRUE if predicted stance == stance of given agent | TRUE, FALSE |
| predicted_topic | string | Topic predicted by data annotator using these data fields: video_author, video_description, video_transcript | Vaccines, US Politics, Flatearth, Climate change, Cooking |
| predicted_stance | string | Predicted stance towards the topic of interest of given agent. Only in ai-auditology-personalisation-drift-tiktok_US_politics_4_agents_mixed_polarity.csv | support, oppose |
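Two of the columns described above can be derived from others, which gives a simple consistency check on a loaded file. The sketch below computes the watch time from the two UNIX timestamps and recomputes `predicted_topic_match` from `predicted_topic` and `topic`. The rows are invented, and representing a skipped video by missing watch timestamps is an assumption rather than something the dataset description states.

```python
def watch_seconds(row):
    """Seconds between watch-loop start and end, or None if either is missing."""
    start = row.get("video_time_watch_loop_start")
    end = row.get("video_time_watch_loop_end")
    if start is None or end is None:
        return None
    return end - start


def topic_match(row):
    """Recompute predicted_topic_match as predicted_topic == topic."""
    return row["predicted_topic"] == row["topic"]


# Hypothetical interaction rows; values are illustrative only.
interactions = [
    {"interaction_number": 1, "topic": "Vaccines", "predicted_topic": "Vaccines",
     "video_time_watch_loop_start": 1765302470.0,
     "video_time_watch_loop_end": 1765302485.5},
    {"interaction_number": 2, "topic": "Cooking", "predicted_topic": "US Politics",
     "video_time_watch_loop_start": None,
     "video_time_watch_loop_end": None},
]

durations = [watch_seconds(r) for r in interactions]
matches = [topic_match(r) for r in interactions]
print(durations, matches)
```

Comparing the recomputed flag with the stored `predicted_topic_match` column is a cheap way to confirm that a file was parsed with the expected column order and types.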
Most of the ethical, legal and societal issues tied to this dataset were already described in the Ethical Considerations section of the associated paper. The most severe risks were tied to a Terms of Service (ToS) violation, various types of privacy intrusions, the possibility of third-party misuse, or the