License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
TikTok is the leading destination for short-form mobile video. The platform is built to help imaginations thrive. TikTok's mission is to create a place for inclusive, joyful, and authentic content–where people can safely discover, create, and connect.
| Column name | Type | Description |
|---|---|---|
| # | int | TikTok-assigned number for the video with a claim or opinion. |
| claim_status | obj | Whether the published video has been identified as an “opinion” or a “claim.” In this dataset, an “opinion” refers to an individual’s or group’s personal belief or thought. A “claim” refers to information that is either unsourced or from an unverified source. |
| video_id | int | Random identifying number assigned to video upon publication on TikTok. |
| video_duration_sec | int | Length of the published video, in seconds. |
| video_transcription_text | obj | Transcribed text of the words spoken in the published video. |
| verified_status | obj | Indicates the status of the TikTok user who published the video in terms of their verification, either “verified” or “not verified.” |
| author_ban_status | obj | Indicates the status of the TikTok user who published the video in terms of their permissions: “active,” “under scrutiny,” or “banned.” |
| video_view_count | float | The total number of times the published video has been viewed. |
| video_like_count | float | The total number of times the published video has been liked by other users. |
| video_share_count | float | The total number of times the published video has been shared by other users. |
| video_download_count | float | The total number of times the published video has been downloaded by other users. |
| video_comment_count | float | The total number of comments on the published video. |
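The schema above can be explored with a few lines of standard-library Python. The sketch below is illustrative only: the three rows are hypothetical values that follow the documented column names, whereas a real analysis would read the dataset's own CSV file. It counts the labels in `claim_status` and averages `video_view_count` per label, skipping unlabeled rows and empty count fields.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical rows using the column names from the table above; a real
# analysis would read the dataset's own CSV file instead.
SAMPLE = """#,claim_status,video_id,video_duration_sec,video_transcription_text,verified_status,author_ban_status,video_view_count,video_like_count,video_share_count,video_download_count,video_comment_count
1,claim,1001,59,someone told me that,not verified,under scrutiny,5000.0,400.0,30.0,2.0,5.0
2,opinion,1002,32,i think that,verified,active,1000.0,70.0,5.0,0.0,1.0
3,,1003,31,,not verified,active,,,,,
"""


def summarize(rows):
    """Count claim/opinion labels and average view counts per label.

    Rows without a label are skipped, and empty view-count fields are
    treated as missing rather than as zero.
    """
    labels = Counter()
    views = defaultdict(list)
    for row in rows:
        label = row["claim_status"]
        if not label:
            continue  # unlabeled row
        labels[label] += 1
        if row["video_view_count"]:
            views[label].append(float(row["video_view_count"]))
    means = {label: sum(v) / len(v) for label, v in views.items() if v}
    return labels, means


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labels, means = summarize(rows)
print(labels)  # Counter({'claim': 1, 'opinion': 1})
print(means)   # {'claim': 5000.0, 'opinion': 1000.0}
```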
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains information on TikTok users' reports of videos and comments that include user claims. These reports flag content for moderator review, generating a significant volume of user reports that need timely attention.
TikTok is developing a predictive model to determine whether a video contains a claim or offers an opinion. A successful prediction model will help reduce the backlog of user reports and enable more efficient prioritization.
This dataset is intended for exploratory data analysis (EDA), statistical analysis, and predictive modeling. It has been created for pedagogical purposes and aims to facilitate learning and research in data analysis and machine learning.
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains information about TikTok videos, including user interactions and video details. It includes features such as video ID, username, video title, likes, comments, shares, views, and more. This dataset is useful for analyzing video performance and user engagement on TikTok.
Columns:
This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.
Here's a breakdown of the columns included in this dataset:
- video_id: A unique identifier for each TikTok video.
- author: The username or handle of the TikTok account that posted the video.
- description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
- likes: The number of likes the video has received.
- comments: The number of comments on the video.
- shares: The number of times the video has been shared.
- plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
- hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
- music: Information about the background music or sound used in the video.
- create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
- video_url: The direct URL to the TikTok video.
- fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
- views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
- posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)

Potential Uses of This Dataset:
- Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
- Trend Identification: Identify trending topics, music, and creators on TikTok.
- Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
- Creator Analysis: Study the posting habits and performance of various TikTok creators.
- Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.

Notes on Data Quality:
The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis. The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis. It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics. This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
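A minimal sketch of the handling recommended above, assuming rows are read as dictionaries of strings (as `csv.DictReader` would produce, with empty strings for missing values). It prefers `plays` and `create_time`; falling back to the sparse `views` and `posted_time` columns is an optional assumption based only on the note that they appear to overlap, and it can be switched off. Rows that still lack a play count or timestamp are removed rather than imputed. The sample rows and timestamp strings are hypothetical.

```python
def clean(rows, use_fallback=True):
    """Keep rows that have both a play count and a creation timestamp.

    plays and create_time are preferred, as recommended above. With
    use_fallback=True the sparse views/posted_time columns fill gaps; treating
    them as equivalent is an assumption, not a documented guarantee.
    """
    cleaned = []
    for row in rows:
        plays = row.get("plays") or (row.get("views") if use_fallback else None)
        created = row.get("create_time") or (row.get("posted_time") if use_fallback else None)
        if not plays or not created:
            continue  # removal rather than imputation
        cleaned.append({**row, "plays": float(plays), "create_time": created})
    return cleaned


# Hypothetical rows shaped like csv.DictReader output for this dataset.
rows = [
    {"video_id": "1", "plays": "1200", "views": "", "create_time": "2024-01-01 10:00:00", "posted_time": ""},
    {"video_id": "2", "plays": "", "views": "800", "create_time": "", "posted_time": "2024-01-02 09:00:00"},
    {"video_id": "3", "plays": "", "views": "", "create_time": "2024-01-03 08:00:00", "posted_time": ""},
]

print([r["video_id"] for r in clean(rows)])                      # ['1', '2']
print([r["video_id"] for r in clean(rows, use_fallback=False)])  # ['1']
```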
License: https://brightdata.com/license
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, create date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
License: https://cdla.io/permissive-1-0/
This dataset was created by Robson Caldeira
Released under Community Data License Agreement - Permissive - Version 1.0
License: https://brightdata.com/license
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
TikTok is one of the hottest social media platforms out there, and it's only getting bigger. If you're looking to get in on the action, this dataset is for you!
This dataset contains a collection of videos from TikTok, including information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description. With this data, you can see what types of videos are popular on TikTok and start planning your own viral content!
- The dataset contains a collection of videos from the social media platform TikTok.
- The videos include information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description.
- The dataset also contains information on popular TikTok authors, including their unique ID, nickname, avatar thumbnail, signature, and whether or not their account is verified or private.
- Additionally, the dataset includes a list of trending videos on TikTok, as well as the number of likes, shares, comments, and plays each video has received.
- Identifying popular TikTok authors to target for scraping videos and liked videos
- Finding trending videos on TikTok for further analysis
- Generating a list of videos from the TikTok app that are tagged with the #funny hashtag
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: tiktok_collected_liked_videos.csv

| Column name | Description |
|:---|:---|
| user_name | The name of the user who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of shares the video has received. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |

File: tiktok_collected_videos.csv

| Column name | Description |
|:---|:---|
| user_name | The name of the user who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of shares the video has received. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |

File: tiktok_funny_hashtag_videos.csv

| Column name | Description |
|:---|:---|
| author_nickname | The author's nickname. (String) |
| author_avatarThumb | The author's avatar thumbnail. (String) |
| author_signature | The author's signature. (String) |
| author_verification | Whether or not the author's account is verified. (Boolean) |
| author_privateAccount | Whether or not the author's account is private. (Boolean) |
| author_followingCount | The number of people the author is following. (Integer) |
| author_followerCount | The number of people following the author. (Integer) |
| author_heartCount | The number of hearts the author has. (Integer) |
| author_diggCount | The number of diggs the author has. (Integer) |
| music_title | The title of the music. (String) |
| music_playUrl | The play url of the music. (String) |
| music_coverThumb | The cover thumbnail of the music. (String) |
| music_authorName | The author name of the music. (String) |
| music_originality | The originality of the music. (String) |
| music_duration | The duration of the music. (String) |

File: trending_authors.csv | Column name | Description ...
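For the two files that share the `user_name`, `n_likes`, `n_shares`, `n_comments`, and `n_plays` columns, a per-play engagement score is a natural first metric. The definition used here, (likes + shares + comments) / plays, is a common convention assumed for illustration rather than anything the dataset defines, and the user names and counts are hypothetical.

```python
import csv
import io

# Hypothetical rows using the columns of tiktok_collected_videos.csv.
SAMPLE = """user_name,n_likes,n_shares,n_comments,n_plays
alice,90,5,5,1000
bob,0,0,0,0
carol,300,50,50,2000
"""


def engagement_per_play(row):
    """Return (likes + shares + comments) / plays, or None when plays is 0."""
    plays = int(row["n_plays"])
    if plays == 0:
        return None  # avoid dividing by zero for unplayed videos
    interactions = int(row["n_likes"]) + int(row["n_shares"]) + int(row["n_comments"])
    return interactions / plays


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
scored = [(r["user_name"], engagement_per_play(r)) for r in rows]
# Rank rows with a defined score from most to least engaging per play.
ranked = sorted((s for s in scored if s[1] is not None), key=lambda s: s[1], reverse=True)
print(ranked)  # [('carol', 0.2), ('alice', 0.1)]
```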
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
TikTok Video Analytics Dataset
Sample TikTok video dataset with comprehensive engagement metrics and metadata. Each row represents a single TikTok video with content and detailed analytics. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.
Files Included
train.csv – TikTok video analytics data
What's included
Video URLs and identifiers Comprehensive engagement… See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Tiktok-Videos.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.
License: https://www.gesis.org/en/institute/data-usage-terms
TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).
This dataset consists of every second video published in 2023 by major news outlets active on TikTok from Germany, Austria, and Switzerland. The data collection was facilitated with the official TikTok API in January 2024. The manual coding took place between September 2024 and December 2024. For a detailed description of the data collection, validation, annotation and descriptive analysis, please refer to [Forthcoming dataset paper publication].
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users during 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok and the nature of social interaction on the platform in relation to a major political event.
The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7534952 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok
To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or “TikToks”). We then compiled comments associated with these videos. All of the data captured is publicly available information, but it contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author collected this data using PyTok, a web-scraping library developed by the author: https://github.com/networkdynamics/pytok.
Due to scraping duration, this is only a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Because of TikTok's fuzzy search functionality, the dataset contains videos with a range of relatedness to the invasion.
We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.
To build this dataset from the IDs here:
Go to https://github.com/networkdynamics/pytok and clone the repo locally
Run pip install -e . in the pytok directory
Run pip install pandas tqdm to install these libraries if not already installed
Run get_videos.py to get the video data
Run video_comments.py to get the comment data
Run user_tiktoks.py to get the video history of the users
Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv
If you get an error about the wrong Chrome version, pass the command-line argument --chrome-version, e.g. get_videos.py --chrome-version YOUR_CHROME_VERSION. Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.
Please do not hesitate to make an issue in this repo to get our help with this!
The videos.csv will contain the following columns:
video_id: Unique video ID
createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
desc: The full video description from the author
hashtags: A list of hashtags used in the video description
share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty
share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty
share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty
share_type: If the video is sharing another video, this is the type of share (e.g., stitch or duet)
mentions: A list of users mentioned in the video description, if any
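A sketch of loading videos.csv with the standard library. `createtime` is parsed using the documented `YYYY-MM-DD HH:MM:SS` UTC format. How the list-valued `hashtags` column is serialized is not stated here; the helper assumes Python-style list literals such as `['ukraine', 'news']` (what pandas writes for list-valued columns, which is an assumption about load_json_to_csv.py) and returns an empty list otherwise. The sample rows are hypothetical and include only a subset of the columns.

```python
import ast
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def parse_createtime(value):
    """Parse the documented 'YYYY-MM-DD HH:MM:SS' UTC string into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def parse_list(value):
    """Parse a list column, assuming Python-style literals like "['a', 'b']"."""
    if not value:
        return []
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []  # unexpected serialization: treat as no entries
    return list(parsed) if isinstance(parsed, (list, tuple)) else []


# Hypothetical rows with a subset of the videos.csv columns.
SAMPLE = """video_id,createtime,author_name,hashtags
1,2022-03-01 12:00:00,user_a,"['ukraine', 'news']"
2,2022-03-02 08:30:00,user_b,"['ukraine']"
3,2022-03-05 17:45:00,user_a,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
hashtag_counts = Counter(tag for row in rows for tag in parse_list(row["hashtags"]))
videos_per_month = Counter(parse_createtime(row["createtime"]).strftime("%Y-%m") for row in rows)
print(hashtag_counts.most_common())  # [('ukraine', 2), ('news', 1)]
print(videos_per_month)              # Counter({'2022-03': 3})
```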
The comments.csv will contain the following columns:
comment_id: Unique comment ID
createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
text: Text of the comment
mentions: A list of users that are tagged in the comment
video_id: The ID of the video the comment is on
comment_language: The language of the comment, as predicted by the TikTok API
reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
The data can be compiled into a user interaction network to facilitate the study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
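The polar-seeds repository linked above contains the authors' own tooling for this. Independently of it, the sketch below shows one plausible way to derive a weighted, directed interaction network from the two CSV files: a reply links its author to the author of the replied-to comment, and any other comment links its author to the video's author. This edge definition and the sample IDs are assumptions for illustration.

```python
from collections import Counter


def interaction_edges(videos, comments):
    """Return a Counter of (source_author_id, target_author_id) -> weight.

    Assumed edge rule (hypothetical, not taken from polar-seeds): a reply
    points at the author of the replied-to comment; any other comment points
    at the author of the video it was posted on.
    """
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    comment_author = {c["comment_id"]: c["author_id"] for c in comments}
    edges = Counter()
    for c in comments:
        if c.get("reply_comment_id"):
            target = comment_author.get(c["reply_comment_id"])
        else:
            target = video_author.get(c["video_id"])
        if target is None or target == c["author_id"]:
            continue  # target not in the data, or a self-interaction
        edges[(c["author_id"], target)] += 1
    return edges


# Hypothetical IDs; only the columns the function needs are included.
videos = [{"video_id": "v1", "author_id": "u1"}]
comments = [
    {"comment_id": "c1", "author_id": "u2", "video_id": "v1", "reply_comment_id": ""},
    {"comment_id": "c2", "author_id": "u3", "video_id": "v1", "reply_comment_id": "c1"},
    {"comment_id": "c3", "author_id": "u2", "video_id": "v1", "reply_comment_id": ""},
    {"comment_id": "c4", "author_id": "u1", "video_id": "v1", "reply_comment_id": ""},
]
edges = interaction_edges(videos, comments)
print(edges)  # Counter({('u2', 'u1'): 2, ('u3', 'u2'): 1})
```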
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains a large collection of TikTok video metadata fetched using the TikTok Scraper API. It includes videos from multiple regions (e.g., US, India) and categories (e.g., fyp, dance, comedy, food, travel). Each video entry provides detailed information such as:
- Video ID: Unique identifier for the video.
- Region: The region where the video is popular.
- Category: The keyword/category used to fetch the video (e.g., dance, comedy).
- Title: The title of the video.
- Duration: The length of the video in seconds.
- Play URL: Direct link to the video.
- Watermarked URL: Link to the watermarked version of the video.
- Cover Image: URL of the video's cover image.
- Music URL: Link to the music used in the video.
- Timestamp: The date and time when the data was fetched.
How This Dataset Can Be Helpful
Trend Analysis: Analyze trending videos across different regions and categories. Identify patterns in video popularity based on region, duration, or category.
Machine Learning: Train models to predict video popularity based on features like duration, region, and category. Build recommendation systems for TikTok videos.
Content Moderation: Use the dataset to analyze video content for moderation purposes.
Sentiment Analysis: Perform sentiment analysis on video titles to understand user preferences.
Cross-Region Insights: Compare video trends across different regions to understand cultural differences.
How to Use This Dataset
Filter by Region: Analyze videos from a specific region (e.g., US or India).
Filter by Category: Focus on videos from a specific category (e.g., dance or comedy).
Trend Analysis: Identify trending videos based on timestamp and region.
Machine Learning: Use the dataset to train models for video popularity prediction or recommendation systems.
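The filtering steps above can be sketched as follows. The dictionary keys `region`, `category`, and `duration` are assumed names for the Region, Category, and Duration fields; check the actual column headers in the files before reusing this. The sample records are hypothetical.

```python
from collections import defaultdict


def filter_videos(videos, region=None, category=None):
    """Keep videos matching the region and/or category (None means no filter)."""
    return [
        v for v in videos
        if (region is None or v["region"] == region)
        and (category is None or v["category"] == category)
    ]


def mean_duration_by_category(videos):
    """Average duration in seconds per category."""
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of seconds, count]
    for v in videos:
        t = totals[v["category"]]
        t[0] += v["duration"]
        t[1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}


# Hypothetical records with assumed field names.
videos = [
    {"video_id": "1", "region": "US", "category": "dance", "duration": 15},
    {"video_id": "2", "region": "US", "category": "comedy", "duration": 30},
    {"video_id": "3", "region": "IN", "category": "dance", "duration": 20},
    {"video_id": "4", "region": "US", "category": "dance", "duration": 25},
]
us = filter_videos(videos, region="US")
print([v["video_id"] for v in us])     # ['1', '2', '4']
print(mean_duration_by_category(us))   # {'dance': 20.0, 'comedy': 30.0}
```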
The Measurable AI TikTok E-Receipt Dataset is a leading source of email receipts and transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.
We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.
Use Cases
Our clients leverage our datasets to produce actionable consumer insights such as:
- Market share analysis
- User behavioral traits (e.g., retention rates)
- Average order values
- Promotional strategies used by the key players
Several of our clients also use our datasets for forecasting and understanding industry trends better.
Coverage
- Asia: Japan, Thailand, Malaysia, Vietnam, Indonesia, Singapore, Hong Kong, Philippines
- EMEA: Spain, United Arab Emirates, Saudi Arabia, Qatar
- Latin America: Brazil, Mexico, Colombia, Argentina
Granular Data
Itemized, high-definition data at the transaction level, with metrics such as:
- Order value
- Items ordered
- No. of orders per user
- Delivery fee
- Service fee
- Promotions used
- Geolocation data and more
- Email ID (useful for working out user overlap with peers and loyalty)
Aggregate Data
- Weekly/monthly order volume
- Revenue
Delivered in aggregate form, with historical data dating back to 2018.
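To illustrate how itemized transactions relate to the weekly aggregates, the sketch below rolls (order date, order value) pairs up into order counts and revenue per ISO week. The record layout is a hypothetical simplification; the real field names are in the data dictionary available from Measurable AI.

```python
from collections import defaultdict
from datetime import date


def weekly_volume(orders):
    """Roll itemized orders up to (ISO year, ISO week) -> (order count, total value)."""
    weeks = defaultdict(lambda: [0, 0.0])
    for order_date, value in orders:
        iso = order_date.isocalendar()  # (ISO year, ISO week, ISO weekday)
        bucket = weeks[(iso[0], iso[1])]
        bucket[0] += 1
        bucket[1] += value
    return {key: tuple(v) for key, v in sorted(weeks.items())}


# Hypothetical orders: 2024-01-01 and 2024-01-03 fall in ISO week 1, 2024-01-08 in week 2.
orders = [(date(2024, 1, 1), 10.0), (date(2024, 1, 3), 5.5), (date(2024, 1, 8), 20.0)]
print(weekly_volume(orders))  # {(2024, 1): (2, 15.5), (2024, 2): (1, 20.0)}
```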
Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.
Our dataset is GDPR compliant, contains no PII, and is aggregated and anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of The Influence of TikTok Shop on MSMEs
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built using the Gephi network analysis software.
Source of:
Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655
Abstract:
Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
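The distinction between unique and weighted links in the network graph above can be illustrated with a co-occurrence graph: each pair of labels appearing together adds one to that pair's weight, so the number of unique edges counts distinct pairs while the sum of weights counts every occurrence. This sketch assumes labels are something like hashtags attached to individual videos; the source does not specify how its links were built, and the example posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(label_sets):
    """Undirected edge weights from labels appearing together on one item."""
    edges = Counter()
    for labels in label_sets:
        # Sorting makes (a, b) and (b, a) the same undirected edge.
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    return edges


posts = [["news", "ukraine", "war"], ["news", "ukraine"], ["fyp", "news"]]
edges = cooccurrence_edges(posts)
nodes = {n for pair in edges for n in pair}
# nodes, unique links, weighted links
print(len(nodes), len(edges), sum(edges.values()))  # 4 4 5
```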
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (number of unique users the video was seen by) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is calculated as the engagement parameter as a percentage of total reach (e.g., Likes / Audience Reached *100).
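The engagement-rate formula above as a small function: each engagement parameter (likes, comments, or shares) is divided by the unique audience reached and multiplied by 100. The example video's numbers are hypothetical.

```python
def engagement_rate(engagement, reach):
    """Return an engagement parameter as a percentage of unique audience reached."""
    if reach <= 0:
        raise ValueError("reach must be a positive number of unique viewers")
    return engagement / reach * 100


# Hypothetical per-video figures from TikTok's account analytics.
video = {"likes": 120, "comments": 15, "shares": 15, "reach": 3000}
rates = {m: engagement_rate(video[m], video["reach"]) for m in ("likes", "comments", "shares")}
print({m: round(r, 2) for m, r in rates.items()})  # {'likes': 4.0, 'comments': 0.5, 'shares': 0.5}
```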
License: https://choosealicense.com/licenses/other/
TikTok-10M Dataset
Dataset Description
TikTok-10M is a large-scale dataset containing 10 million short-form posts from TikTok, designed for video understanding, multimodal learning, and social media content analysis. The dataset was curated to bridge the gap between academic video datasets and actual user-generated content, providing researchers with authentic patterns and characteristics of modern short-form video content that dominates social media platforms.… See the full description on the dataset page: https://huggingface.co/datasets/The-data-company/TikTok-10M.
With the rise of new social media platforms designed for teenagers and adolescents, content moderation supported by algorithms is more important than ever. State-of-the-art hate speech detection algorithms are increasingly challenged by the rapid and creative evolution of modern language. To better understand the online discourse and phenomena of German far-right extremism on contemporary platforms, this research presents the first German TikTok dataset, consisting of 10,586 comments collected from comment sections and annotated for far-right extremism and hate speech. An extensive and novel annotation scheme comprising 32 labels was developed in collaboration with domain experts in online extremism, specifically tailored to the TikTok platform. Three trained annotators meticulously annotated the dataset, with 13.76% of the collected data annotated as hateful. A quantitative analysis was conducted, examining the primary keywords emerging within hate speech classes and identifying label combinations and distributions, and a sentiment analysis was performed. The dataset reveals extensive hate directed toward German politicians, particularly members of the Green Party, as well as women and immigrants. This research contributes to the field by introducing a new annotation schema, providing a fully annotated dataset, and analyzing the annotations and language used.
License: https://www.apache.org/licenses/LICENSE-2.0
English news articles that mention "TikTok". Crawl date: January 2026. Document count: 1,000+.