License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information on TikTok users' reports of videos and comments that include user claims. These reports flag content for moderator review, generating a significant volume of user reports that need timely attention.
TikTok is developing a predictive model to determine whether a video contains a claim or offers an opinion. A successful prediction model will help reduce the backlog of user reports and enable more efficient prioritization.
This dataset is intended for exploratory data analysis (EDA), statistical analysis, and predictive modeling. It has been created for pedagogical purposes and aims to facilitate learning and research in data analysis and machine learning.
License: https://brightdata.com/license
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, create date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
A sample of TikTok videos associated with the hashtag #coronavirus was downloaded on September 20, 2020. Misinformation was evaluated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with number of views and presence of user comments indicating intention to change behavior. Videos and related metadata were downloaded using a third-party TikTok Scraper using the search term #coronavirus. Videos were reviewed for content and data were entered on a spreadsheet.
This dataset was created by Sharipov Ibrokhim
License: https://creativecommons.org/publicdomain/zero/1.0/
How do you measure the success of a video on social media? Is it the number of likes? The number of shares? The number of comments?
This dataset contains information on videos posted to the social media platform TikTok. The data includes the video ID, description, creation time, length, number of likes, shares, and comments, as well as a link to the video.
With this data, you can explore what factors make a video popular on TikTok and learn more about user preferences on this rapidly growing social media platform.
This dataset can be used to study user preferences in social media. The data includes the number of likes, shares, comments, and plays for each video, as well as the video's description, length, and link.
- Identifying trends in social media
- Analyzing user preferences in social media
- Predicting future trends in social media
Dataset by TikTok
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: omnibuslaw_videos.csv

| Column name | Description |
|:------------|:---------------------------------------------------------|
| createTime | The date and time the video was posted. (DateTime) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of times the video has been shared. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |

File: tiktok_liked_videos.csv

| Column name | Description |
|:------------|:----------------------------------------------------------|
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of times the video has been shared. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |
| user_name | The username of the person who posted the video. (String) |

File: trending.csv

| Column name | Description |
|:------------|:----------------------------------------------------------|
| user_name | The username of the person who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of times the video has been shared. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |

File: washingtonpost_videos.csv

| Column name | Description |
|:------------|:----------------------------------------------------------|
| user_name | The username of the person who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of times the video has been shared. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |
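One way to approach the question of what makes a video successful is to combine the count columns listed above into a single ratio. The sketch below is only an illustration: the "engagement rate" definition (likes plus shares plus comments, divided by plays) is an assumption and not something the dataset prescribes, and the in-memory sample rows are made up so the example runs without the real files.

```python
import csv
import io


def engagement_rows(csv_file):
    """Yield (row, engagement_rate) for every video with at least one play.

    The rate used here is an illustrative assumption:
    (n_likes + n_shares + n_comments) / n_plays.
    """
    for row in csv.DictReader(csv_file):
        plays = int(row["n_plays"])
        if plays == 0:
            continue  # skip unplayed videos to avoid dividing by zero
        interactions = (
            int(row["n_likes"]) + int(row["n_shares"]) + int(row["n_comments"])
        )
        yield row, interactions / plays


# Made-up stand-in for a file such as trending.csv (same column names).
sample = io.StringIO(
    "user_name,n_likes,n_shares,n_comments,n_plays\n"
    "alice,50,10,40,1000\n"
    "bob,5,0,0,0\n"
    "carol,300,50,150,2000\n"
)

# Rank videos from highest to lowest engagement rate.
ranked = sorted(engagement_rows(sample), key=lambda pair: pair[1], reverse=True)
for row, rate in ranked:
    print(row["user_name"], round(rate, 3))
```

With a real file, the `io.StringIO` object would be replaced by something like `open("trending.csv", newline="", encoding="utf-8")`. The actual files may contain missing or non-integer values that need handling first.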
License: https://brightdata.com/license
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
License: https://www.gesis.org/en/institute/data-usage-terms
TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).
This dataset consists of every second video published in 2023 by major news outlets active on TikTok from Germany, Austria, and Switzerland. The data collection was facilitated with the official TikTok API in January 2024. The manual coding took place between September 2024 and December 2024. For a detailed description of the data collection, validation, annotation and descriptive analysis, please refer to [Forthcoming dataset paper publication].
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TikTok users can submit reports that identify videos and comments that contain user claims. On a social media platform like TikTok, reporting a claim typically refers to the feature that allows users to report content that they believe violates the platform's community guidelines or terms of service. When a user reports a claim on a video, they are flagging the content for review by the platform's content moderation team. The team then assesses the reported content to determine whether it violates the guidelines and, if so, may take actions such as removing the content, issuing a warning to the user who posted it, or even suspending or banning that user's account. Reporting a claim is an important tool for maintaining a safe and respectful environment on social media platforms.
However, this process generates a large number of reports that are challenging to consider in a timely manner. Therefore, TikTok is working on the development of a predictive model that can determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently.
The TikTok data team is developing a machine learning model for classifying claims made in videos submitted to the platform.
The target variable:
The data dictionary shows that there is a column called claim_status. This is a binary value that indicates whether a video is a claim or an opinion. This is the target variable. In other words, for each video, the model should predict whether the video is a claim or an opinion. This is a classification task because the model is predicting a binary class.
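To determine which evaluation metric might be best, consider how the model might be wrong. There are two possibilities for bad predictions: a false positive, where the model predicts a claim but the video is actually an opinion, and a false negative, where the model predicts an opinion but the video is actually a claim.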
In the given scenario, it's better for the model to predict false positives when it makes a mistake, and worse for it to predict false negatives. It is very important to identify videos that break the terms of service, even if that means some opinion videos are misclassified as claims. The worst case for an opinion misclassified as a claim is that the video goes to human review. The worst case for a claim that is misclassified as an opinion is that the video does not get reviewed and it violates the terms of service.
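Because false negatives are described as the costlier error, one reasonable reading is that recall on the claim class deserves the most attention. The source does not name a metric, so treat that as an assumption. The sketch below counts the four outcomes by hand on made-up labels and computes recall and precision from them.

```python
def confusion_counts(y_true, y_pred, positive="claim"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # claim correctly flagged
        elif predicted == positive:
            fp += 1  # opinion sent to human review unnecessarily
        elif actual == positive:
            fn += 1  # claim missed: the costly error in this scenario
        else:
            tn += 1  # opinion correctly left alone
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of actual claims that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of predicted claims that really were claims."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels for illustration only.
y_true = ["claim", "claim", "claim", "opinion", "opinion", "opinion"]
y_pred = ["claim", "claim", "opinion", "claim", "opinion", "opinion"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("recall:", round(recall(tp, fn), 3))
print("precision:", round(precision(tp, fp), 3))
```

Raising recall usually costs some precision, which in this scenario means more opinion videos reach human review; the passage above treats that as the acceptable trade.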
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
TikTok Video Analytics Dataset
Sample TikTok video dataset with comprehensive engagement metrics and metadata. Each row represents a single TikTok video with content and detailed analytics. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.
Files Included
train.csv – TikTok video analytics data
What's included
Video URLs and identifiers Comprehensive engagement… See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Tiktok-Videos.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for TikTok Harmful Video Dataset (Vietnamese)
Dataset Details
Dataset Description
This dataset contains TikTok videos collected for research on harmful content detection in Vietnamese. Each sample is stored as a folder with:
the original video file (video.mp4)
a metadata file (metadata.json)
The dataset is designed for multimodal learning (video + audio + text from metadata).
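The folder-per-sample layout described above can be walked with a few lines of standard-library Python. This is a minimal sketch: only the file names video.mp4 and metadata.json come from the description, while the root directory name and the contents of the metadata are unknown here, so the metadata is returned as a plain parsed JSON object.

```python
import json
from pathlib import Path


def iter_samples(root):
    """Yield (video_path, metadata) for each complete sample folder under root.

    A folder counts as complete when it holds both video.mp4 and
    metadata.json; anything else is skipped silently.
    """
    for folder in sorted(Path(root).iterdir()):
        video = folder / "video.mp4"
        meta = folder / "metadata.json"
        if folder.is_dir() and video.is_file() and meta.is_file():
            with meta.open(encoding="utf-8") as fh:
                yield video, json.load(fh)


# Hypothetical usage; "tiktok-data" is a placeholder for the local copy:
# for video_path, metadata in iter_samples("tiktok-data"):
#     print(video_path, type(metadata))
```

Skipping incomplete folders keeps a partially downloaded copy usable, at the cost of hiding missing files; a stricter variant could raise an error instead.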
Curated by: Student research project (IE212 – Big Data, UIT, VNU-HCM)… See the full description on the dataset page: https://huggingface.co/datasets/funa21/tiktok-data.
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset was created by Funan Ma
Released under Database: Open Database, Contents: Database Contents
License: https://choosealicense.com/licenses/other/
TikTok-10M Dataset
Dataset Description
TikTok-10M is a large-scale dataset containing 10 million short-form posts from TikTok, designed for video understanding, multimodal learning, and social media content analysis. The dataset was curated to bridge the gap between academic video datasets and actual user-generated content, providing researchers with authentic patterns and characteristics of modern short-form video content that dominates social media platforms.… See the full description on the dataset page: https://huggingface.co/datasets/The-data-company/TikTok-10M.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.
The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7534952 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok
To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or "TikToks"). We then compiled comments associated with these videos. All of the data captured is publicly available information, and contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the web scraping PyTok library, developed by the author: https://github.com/networkdynamics/pytok.
Due to scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of TikTok, the dataset contains videos with a range of relatedness to the invasion.
We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.
To build this dataset from the IDs here:
Go to https://github.com/networkdynamics/pytok and clone the repo locally
Run pip install -e . in the pytok directory
Run pip install pandas tqdm to install these libraries if not already installed
Run get_videos.py to get the video data
Run video_comments.py to get the comment data
Run user_tiktoks.py to get the video history of the users
Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv
If you get an error about the wrong Chrome version, use the command line argument get_videos.py --chrome-version YOUR_CHROME_VERSION.
Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.
Please do not hesitate to make an issue in this repo to get our help with this!
The videos.csv will contain the following columns:
video_id: Unique video ID
createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
desc: The full video description from the author
hashtags: A list of hashtags used in the video description
share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty
share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty
share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty
share_type: If the video is sharing another video, this is the type of the share (stitch, duet, etc.)
mentions: A list of users mentioned in the video description, if any
The comments.csv will contain the following columns:
comment_id: Unique comment ID
createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
text: Text of the comment
mentions: A list of users that are tagged in the comment
video_id: The ID of the video the comment is on
comment_language: The language of the comment, as predicted by the TikTok API
reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
The data can be compiled into a user interaction network to facilitate study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
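To give a rough idea of what such an interaction network could look like, the sketch below builds weighted, directed edges from rows shaped like videos.csv and comments.csv. It is not the polar-seeds code. The edge definition is an assumption made for illustration: a reply points at the author of the comment it answers, any other comment points at the author of the video, and self-interactions are dropped.

```python
from collections import Counter


def interaction_edges(videos, comments):
    """Count directed (source_author_id, target_author_id) interactions.

    `videos` and `comments` are lists of dicts keyed by the column names
    listed above (video_id, comment_id, author_id, reply_comment_id).
    """
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    comment_author = {c["comment_id"]: c["author_id"] for c in comments}

    edges = Counter()
    for c in comments:
        reply_to = c.get("reply_comment_id")
        if reply_to:
            # A reply interacts with the author of the parent comment.
            target = comment_author.get(reply_to)
        else:
            # A top-level comment interacts with the video's author.
            target = video_author.get(c["video_id"])
        if target is not None and target != c["author_id"]:
            edges[(c["author_id"], target)] += 1
    return edges


# Made-up rows for illustration only.
videos = [{"video_id": "v1", "author_id": "u1"}]
comments = [
    {"comment_id": "c1", "author_id": "u2", "video_id": "v1", "reply_comment_id": ""},
    {"comment_id": "c2", "author_id": "u3", "video_id": "v1", "reply_comment_id": "c1"},
    {"comment_id": "c3", "author_id": "u2", "video_id": "v1", "reply_comment_id": ""},
    {"comment_id": "c4", "author_id": "u1", "video_id": "v1", "reply_comment_id": ""},
]

edges = interaction_edges(videos, comments)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

Rows read with `csv.DictReader` from the generated CSV files have the same dict shape, so they could be passed in directly. Replies whose parent comment was not captured are silently skipped.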
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was initially used in the paper "The use and impact of TikTok in the 2022 Brazilian presidential election". It contains data from the official TikTok accounts of the two main candidates in the 2022 Brazilian presidential election, Lula (@lulaoficial) and Bolsonaro (@bolsonaromessiasjair). It comprises 576 posts by the candidates and more than 540 million interactions with these posts. The data cover three periods of 2022: (i) Pre-campaign (Jun 30 to Aug 15); (ii) 1st round campaign (Aug 16 to Oct 1); and (iii) 2nd round campaign (Oct 2 to Oct 29). The dataset contains two files: (i) Accounts: how many followers each candidate has, on a day-to-day basis, starting on Sept 5; and (ii) Posts and interactions: individual data and metrics for each post, including date of the post, text, link to the post, and number of plays, likes, comments and shares.
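When analysing the posts file, it may help to tag each post with the campaign period it falls in. The sketch below encodes the three date ranges given in the description; treating both end dates as inclusive is an assumption, and the function name and period labels are illustrative rather than taken from the dataset.

```python
from datetime import date

# Date ranges from the dataset description (year 2022), assumed inclusive.
PERIODS = [
    ("pre-campaign", date(2022, 6, 30), date(2022, 8, 15)),
    ("1st round campaign", date(2022, 8, 16), date(2022, 10, 1)),
    ("2nd round campaign", date(2022, 10, 2), date(2022, 10, 29)),
]


def campaign_period(day):
    """Return the period label for a post date, or None if outside all ranges."""
    for name, start, end in PERIODS:
        if start <= day <= end:
            return name
    return None


print(campaign_period(date(2022, 9, 5)))    # falls in the 1st round range
print(campaign_period(date(2022, 10, 30)))  # after the last range
```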
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
🎬 YouTube Shorts & TikTok Trends (2025)
Author: Tarek Masryo
License: CC BY 4.0
A structured snapshot of short-form video activity across YouTube Shorts and TikTok during 2025 (Jan–Aug). Built for content intelligence, analytics dashboards, and ML baselines (classification/regression).
What’s inside
This repository ships:
Two loadable dataset configs (via datasets.load_dataset): default → ML-ready table (cleaned + modeling-friendly) raw → raw video-level table (wider… See the full description on the dataset page: https://huggingface.co/datasets/tarekmasryo/youtube-tiktok-trends-dataset-2025.
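Loading one of the two configs mentioned above might look like the following sketch. It assumes the Hugging Face `datasets` package is installed and that the config names are exactly `default` and `raw` as listed; the helper name `load_trends` is hypothetical, and the split names and columns inside each config are not stated here, so the code does not index into the result.

```python
CONFIGS = ("default", "raw")
REPO_ID = "tarekmasryo/youtube-tiktok-trends-dataset-2025"


def load_trends(config="default"):
    """Load one config of the dataset from the Hugging Face Hub.

    Requires network access and `pip install datasets`.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so the module can be imported without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config)


# Hypothetical usage (downloads data on first call):
# ds = load_trends("raw")
# print(ds)
```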
License: https://brightdata.com/license
Our TikTok Influencer Dataset provides comprehensive insights into influencer profiles, audience engagement, and market impact. This dataset is ideal for brands, marketers, and researchers looking to identify top-performing influencers, analyze engagement metrics, and optimize influencer marketing strategies on TikTok.
Key Features:
Influencer Profiles: Access detailed influencer data, including profile name, bio, profile picture, and direct profile URL.
Follower & Engagement Metrics: Track key performance indicators such as follower count, engagement rate, and interaction levels.
Monetization Insights: Analyze influencer earnings with Gross Merchandise Value (GMV) and currency details.
Category & Niche Segmentation: Identify influencers based on their associated product categories to match brand campaigns with relevant audiences.
Contact Information: Retrieve available influencer email addresses for direct outreach and collaboration.
Use Cases:
Influencer Discovery & Marketing: Find high-performing TikTok influencers for brand partnerships and sponsored campaigns.
Competitive Analysis: Compare influencer engagement rates and audience reach to optimize marketing strategies.
Market Research & Trend Analysis: Identify emerging influencers and track content trends within different product categories.
Performance Benchmarking: Evaluate influencer success based on GMV, engagement rate, and follower growth.
Lead Generation & Outreach: Use available contact details to connect with influencers for collaborations and brand promotions.
Our TikTok Influencer Dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download.
Gain valuable insights into the TikTok influencer landscape and enhance your marketing strategies with high-quality, structured data.
TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, using Gephi network analysis software.
Source of: Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655
Abstract: Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Video Tiktok Extraction is a dataset for object detection tasks - it contains Words annotations for 201 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: https://cdla.io/permissive-1-0/
This dataset was created by Robson Caldeira
Released under Community Data License Agreement - Permissive - Version 1.0