50 datasets found

P
TikTok Dataset Dataset
paperswithcode.com
Updated Jun 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yasamin Jafarian; Hyun Soo Park (2021). TikTok Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/tiktok-dataset
Explore at:
Dataset updated
Jun 9, 2021
Authors
Yasamin Jafarian; Hyun Soo Park
Description
We learn high fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application. It is by far one of the most popular video sharing applications across generations, which include short videos (10-15 seconds) of diverse dance challenges as shown above. We manually find more than 300 dance videos that capture a single person performing dance moves from TikTok dance challenge compilations for each month, variety, type of dances, which are moderate movements that do not generate excessive motion blur. For each video, we extract RGB images at 30 frame per second, resulting in more than 100K images. We segmented these images using Removebg application, and computed the UV coordinates from DensePose.

Download TikTok Dataset:

Please use the dataset only for the research purpose.

The dataset can be viewed and downloaded from the Kaggle page. (you need to make an account in Kaggle to be able to download the data. It is free!)

The dataset can also be downloaded from here (42 GB). The dataset resolution is: (1080 x 604)

The original YouTube videos corresponding to each sequence and the dance name can be downloaded from here (2.6 GB).
c
Data from: News on TikTok: An Annotated Dataset of TikTok Videos from...
datacatalogue.cessda.eu
search.gesis.org
Updated Apr 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan (2025). News on TikTok: An Annotated Dataset of TikTok Videos from German-Speaking News Outlets in 2023 [Dataset]. http://doi.org/10.7802/2863
Explore at:
Unique identifier
https://doi.org/10.7802/2863
Dataset updated
Apr 2, 2025
Dataset provided by
University of Copenhagen
Weizenbaum Institute for the Networked Society
Authors
Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan
Area covered
Österreich, Schweiz, Deutschland
Measurement technique
Aufzeichnung (mechanisch/elektronisch), Content Analysis
Description
TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).

This dataset consists of every second video published in 2023 by major news outlets active on TikTok from Germany, Austria, and Switzerland. The data collection was facilitated with the official TikTok API in January 2024. The manual coding took place between September 2024 and December 2024. For a detailed description of the data collection, validation, annotation and descriptive analysis, please refer to:

Mayer, A.-T., Wedel, L., Batzner, J., Hendrickx, J., Bremer, E., Iwan, A., Stocker, V., & Ohme, J. (2025). News on TikTok: An Annotated Dataset of TikTok Videos from German-Speaking News Outlets in 2023. Proceedings of the Nineteenth International AAAI Conference on Web and Social Media, 19, forthcoming.
c
from TikTok Dataset
cubig.ai
Updated Jun 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CUBIG (2025). from TikTok Dataset [Dataset]. https://cubig.ai/store/products/457/from-tiktok-dataset
Explore at:
Dataset updated
Jun 12, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-servicehttps://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction • The Dataset from TikTok contains 19,382 reports that users flagged as including "claim" in videos or comments, along with video length, transcription text, account status, and participation indicators, and is suitable for analyzing reporting reasons and viewer reactions by content.

2) Data Utilization (1) Dataset from TikTok has characteristics that: • This dataset consists of 12 columns, providing both the reported content type and the meta-participation index of the video. (2) Dataset from TikTok can be used to: • Claim Judgment Classification Model Development: By inputting video transcription text, participation indicators such as views, likes, shares, comments, and account authentication and sanctions information, the machine learning classification model can be automatically determined whether the content contains "claims." • Optimizing moderation tasks: Automate reporting priorities based on classification model predictability to speed up reporting processing and reduce supervision burden by selecting content that managers urgently need to review.
d
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...
search.dataone.org
Updated Sep 24, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.7910/DVN/QTJ9HC
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/QTJ9HC
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew
Time period covered
Jan 1, 2024 - May 31, 2024
Area covered
YouTube
Description
Please cite the following paper when using this dataset: N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A.Bian “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: http://arxiv.org/abs/2406.07693 Abstract This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
The Invasion of Ukraine Viewed through TikTok: A Dataset
zenodo.org
bin, csv +1
Updated May 13, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Benjamin Steel; Sara Parker; Derek Ruths; Benjamin Steel; Sara Parker; Derek Ruths (2023). The Invasion of Ukraine Viewed through TikTok: A Dataset [Dataset]. http://doi.org/10.5281/zenodo.7926959
Explore at:
text/x-python, bin, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7926959
Dataset updated
May 13, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Benjamin Steel; Sara Parker; Derek Ruths; Benjamin Steel; Sara Parker; Derek Ruths
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ukraine
Description
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.

The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or ”TikToks”). We then compiled comments associated with these videos. All of the data captured is publically available information, and contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the web scraping PyTok library, developed by the author: https://github.com/networkdynamics/pytok.

Due to scraping duration, this is just a sample of the publically available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of the TikTok, the dataset contains videos with a range of relatedness to the invasion.

We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

To build this dataset from the IDs here:

Go to https://github.com/networkdynamics/pytok and clone the repo locally

Run pip install -e . in the pytok directory

Run pip install pandas tqdm to install these libraries if not already installed

Run get_videos.py to get the video data

Run video_comments.py to get the comment data

Run user_tiktoks.py to get the video history of the users

Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms

Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

If you get an error about the wrong chrome version, use the command line argument get_videos.py --chrome-version YOUR_CHROME_VERSION Please note pulling data from TikTok takes a while! We recommend leaving the scripts running on a server for a while for them to finish downloading everything. Feel free to play around with the delay constants to either speed up the process or avoid TikTok rate limiting.

Please do not hesitate to make an issue in this repo to get our help with this!

The videos.csv will contain the following columns:

video_id: Unique video ID

createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

author_name: Unique author name

author_id: Unique author ID

desc: The full video description from the author

hashtags: A list of hashtags used in the video description

share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

share_video_user_id: If the video is sharing another video, this the user ID of the author of that video, else empty

share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

share_type: If the video is sharing another video, this is the type of the share, stitch, duet etc.

mentions: A list of users mentioned in the video description, if any

The comments.csv will contain the following columns:

comment_id: Unique comment ID

createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

author_name: Unique author name

author_id: Unique author ID

text: Text of the comment

mentions: A list of users that are tagged in the comment

video_id: The ID of the video the comment is on

comment_language: The language of the comment, as predicted by the TikTok API

reply_comment_id: If the comment is replying to another comment, this is the ID of that comment

The date can be compiled into a user interaction network to facilitate study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
g
TikHarm Dataset
gts.ai
json
Updated Jul 25, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GTS (2024). TikHarm Dataset [Dataset]. https://gts.ai/dataset-download/tikharm-dataset/
Explore at:
jsonAvailable download formats
Dataset updated
Jul 25, 2024
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Explore the TikHarm dataset, designed to train models for classifying harmful content on TikTok. This curated collection focuses on videos accessible to children, categorized into harmful content, adult content, safe content, and suicide-related material.
i
A Labeled Dataset for Sentiment Analysis of Videos on YouTube
ieee-dataport.org
Updated Jun 2, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nirmalya Thakur (2025). A Labeled Dataset for Sentiment Analysis of Videos on YouTube [Dataset]. https://ieee-dataport.org/documents/labeled-dataset-sentiment-analysis-videos-youtube-tiktok-and-other-sources-about-2024
Explore at:
Dataset updated
Jun 2, 2025
Authors
Nirmalya Thakur
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
USA
s
Data from: TikTok dataset - Current affairs on TikTok. Virality and...
research.science.eus
data.niaid.nih.gov
+1more
Updated 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi; Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. https://research.science.eus/documentos/668fc45ab9e7c03b01bdae53?lang=ca
Explore at:
Dataset updated
2022
Authors
Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi; Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi
Description
Tiktok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, using Gephi network analysis software. Source of: Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655 Abstract: Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
TikTok Trending Metadata
kaggle.com
Updated Feb 24, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Brad Culbertson (2023). TikTok Trending Metadata [Dataset]. https://www.kaggle.com/datasets/vbradculbertson/tiktok-trending-metadata
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 24, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Brad Culbertson
Description
The dataset was originally obtained from TikTok's trending API by a GitHub user named Ivan Tran. It contains metadata on engagement with user-created videos and user profile data. The original create time is in Unix timecode format and is extracted directly from the video id number. TikTok's API has become much more difficult to access recently, so more current data is harder to obtain. The hashtags column contains lists.
f
TikTokData.xlsx
figshare.com
xlsx
Updated Jun 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Emily Zawacki (2022). TikTokData.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.20069333.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20069333.v1
Dataset updated
Jun 14, 2022
Dataset provided by
figshare
Authors
Emily Zawacki
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (number of unique users the video was seen by) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is calculated as the engagement parameter as a percentage of total reach (e.g., Likes / Audience Reached *100).
TikTok Videos Reported Claims
kaggle.com
Updated May 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Murilo Zangari (2024). TikTok Videos Reported Claims [Dataset]. https://www.kaggle.com/datasets/murilozangari/tiktok-claim-analysis/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 9, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Murilo Zangari
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
TikTok users have the ability to submit reports that identify videos and comments that contain user claims. In a social media platform like TikTok, report a claim typically refers to the feature that allows users to report content that they believe violates the platform's community guidelines or terms of service. When a user reports a claim over a video, they are flagging the content for reviewing by the platform's content moderation team. The team then assess the reported content to determine if it indeed violates the guidelines, and if so, they may take actions such as removing the content, issuing a warning to the user who posted it, or even suspending or banning the user's account who posted the video. Reporting a claim is an important tool for maintaining a safe and respectful environment on social media platforms.

However, this process generates a large number of reports that are challenging to consider in a timely manner. Therefore, TikTok is working on the development of a predictive model that can determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently.

The TikTok data team is developing a machine learning model for classifying claims made in videos submitted to the platform.

The target variable:

The data dictionary shows that there is a column called claim_status. This is a binary value that indicates whether a video is a claim or an opinion. This is the target variable. In other words, for each video, the model should predict whether the video is a claim or an opinion. This is a classification task because the model is predicting a binary class.

To determine which evaluation metric might be best, consider how the model might be wrong. There are two possibilities for bad predictions:

False positives: When the model predicts a video is a claim when in fact it is an opinion

False negatives: When the model predicts a video is an opinion when in fact it is a claim

In the given scenario, it's better for the model to predict false positives when it makes a mistake, and worse for it to predict false negatives. It is very important to identify videos that break the terms of service, even if that means some opinion videos are misclassified as claims. The worst case for an opinion misclassified as a claim is that the video goes to human review. The worst case for a claim that is misclassified as an opinion is that the video does not get reviewed and it violates the terms of service.
h
ai-tube-tik-tak-tok
huggingface.co
Updated Dec 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julian Bilcke (2023). ai-tube-tik-tak-tok [Dataset]. https://huggingface.co/datasets/jbilcke-hf/ai-tube-tik-tak-tok
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 21, 2023
Authors
Julian Bilcke
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Description

Tik Tak Tok - Est. 2023

Model

HotshotXL

Voice

Julian

Orientation

Portrait

Tags

Short Dancing

Style

tiktok video, instagram, beautiful, sharp, detailed

Music

mainstream pop music

Prompt

A channel generating short vertical videos, between 20 seconds and 60 seconds Most videos are about people dancing, doing choregraphy, or talking selfies, filming their cats, daily life (eg. going to a cafe… See the full description on the dataset page: https://huggingface.co/datasets/jbilcke-hf/ai-tube-tik-tak-tok.
U
Data from: #Coronavirus on TikTok: user engagement with misinformation as a...
datacatalog.hshsl.umaryland.edu
data.niaid.nih.gov
+2more
Updated Jul 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jonathan D. Baghdadi; K.C. Coffey; Rachael Belcher; James Frisbie; Naeemul Hassan; Danielle Sim; Rena D. Malik (2024). #Coronavirus on TikTok: user engagement with misinformation as a potential threat to public health behavior [Dataset]. http://doi.org/10.5061/dryad.bvq83bkdp
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.bvq83bkdp
Dataset updated
Jul 18, 2024
Dataset provided by
HS/HSL
Authors
Jonathan D. Baghdadi; K.C. Coffey; Rachael Belcher; James Frisbie; Naeemul Hassan; Danielle Sim; Rena D. Malik
Area covered
United States
Description
A sample of TikTok videos associated with the hashtag #coronavirus were downloaded on September 20, 2020. Misinformation was evaluated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with number of views and presence of user comments indicating intention to change behavior. Videos and related metadata were downloaded using a third-party TikTok Scraper using the search term #coronavirus. Videos were reviewed for content and data were entered on a spreadsheet.
h
TikTok_Most_Shared_Video_Transcription_Example
huggingface.co
Updated Jul 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MasaAI (2025). TikTok_Most_Shared_Video_Transcription_Example [Dataset]. https://huggingface.co/datasets/MasaFoundation/TikTok_Most_Shared_Video_Transcription_Example
Explore at:
Dataset updated
Jul 17, 2025
Dataset authored and provided by
MasaAI
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
📲 Example Dataset: TikTok Scraper Tool

👉 Start Scraping TikTok: TikTok Scraper Tool

✨ Key Features

⚡ Instant Transcription – Turn any TikTok video into an AI-ready transcript
🎯 Metadata – Get the title, language description, and video hashtags
🔗 URL-Based Access – Just drop in a TikTok video URL to start scraping
🧩 LLM-Ready Output – Receive clean JSON ready for agents, RAG, or AI tools
💸 Free Tier – Use up to 100 queries during the beta period
💫 Easy… See the full description on the dataset page: https://huggingface.co/datasets/MasaFoundation/TikTok_Most_Shared_Video_Transcription_Example.
R
Video Tiktok Extraction Dataset
universe.roboflow.com
zip
Updated Mar 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Binokular Media Utama (2024). Video Tiktok Extraction Dataset [Dataset]. https://universe.roboflow.com/binokular-media-utama/video-tiktok-extraction/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Mar 1, 2024
Dataset authored and provided by
Binokular Media Utama
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Words Bounding Boxes
Description
Video Tiktok Extraction

## Overview Video Tiktok Extraction is a dataset for object detection tasks - it contains Words annotations for 201 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Brazilian TikTok Trending Videos
kaggle.com
Updated May 7, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ilan Brik (2021). Brazilian TikTok Trending Videos [Dataset]. https://www.kaggle.com/ilanbrik/brazilian-tiktok-trending-videos
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 7, 2021
Dataset provided by
Kaggle
Authors
Ilan Brik
Area covered
Brazil
Description
Context

US Supermarkets have seen a recent shortage of Feta Cheese due to a TikTok pasta that went viral. "https://www.fox5ny.com/news/viral-tiktok-video-recipe-prompts-feta-cheese-shortage"

The Brazilian music industry is already experiencing huge shifts in it's business model, TikTok changed young people playlists. Most of the biggest players in this market realized the day-light revolution of music going on, and are trying to influence as much as possible something many believe to be random: songs going viral.

Content

This data contains 10.000 rows, each describing a single video. Along with that, there are 14 columns: username, user id, video id, video desc, videotime, video length, video link, n likes, n shares, n comments, n plays, music name, music url

Acknowledgements

Thank you David Teather for developing a nice and easy-to-use API.
d
12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social...
datarade.ai
.csv, .xls, .txt
Updated Jun 17, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Unify (2025). 12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social Media Marketing & Brand Monitoring [Dataset]. https://datarade.ai/data-products/social-media-data-12-5m-tiktok-posts-with-50k-plays-pos-data-unify
Explore at:
.csv, .xls, .txtAvailable download formats
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Data Unify
Area covered
Georgia, Ethiopia, Uruguay, Seychelles, Malawi, Cayman Islands, Albania, Nigeria, Croatia, France
Description
Unlock insights into high-performing content with this curated dataset of TikTok posts, each with over 50,000 plays. This collection surfaces the videos that resonate most with audiences—spanning creators, themes, and formats that drive virality.

📈 Performance Threshold: Only includes posts that have exceeded 50K views, ensuring a focus on high-engagement, trend-relevant content.

📱 Detailed Post Data: Captures video captions, play counts, likes, shares, comments, sound IDs, hashtags, and posting timestamps.

👤 Creator Metadata: Includes usernames, follower counts, bio snippets, and profile metrics to support creator analysis.

📊 Engagement Benchmarking: Useful for identifying viral content, measuring campaign performance, and refining creative strategies.

⚡ Trend Analysis Ready: Track how themes, hashtags, or sounds perform at scale within and across verticals.

🚀 Structured for Scale: Delivered in clean CSV format API, or custom format, ready for integration into analytics tools, dashboards, or model training environments.

This dataset is designed for marketers, agencies, analysts, and researchers looking to decode the mechanics of virality, identify top-performing content, and inform influencer strategy on TikTok. Whether you're building recommendation engines or planning your next campaign, this dataset offers a high-signal view into TikTok's most impactful content.
tiktok_russian_celebrities_mentions_graph
kaggle.com
Updated May 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anton Kostin (2021). tiktok_russian_celebrities_mentions_graph [Dataset]. https://www.kaggle.com/visualcomments/tiktok-russian-celebrities-mentions-graph/tasks
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
May 14, 2021
Dataset provided by
Kaggle
Authors
Anton Kostin
License
http://www.gnu.org/licenses/lgpl-3.0.htmlhttp://www.gnu.org/licenses/lgpl-3.0.html
Area covered
Russia
Description
This graph contains weighted edges between names of celebrities (vertices) which have more than 60 percent of cosine similarity. Similarity info was taken from Gensim Fasttext model which was trained on texts (descriptions) for Tiktok videos.
D
Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions:...
darus.uni-stuttgart.de
Updated Sep 16, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt (2024). Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions: Effect of Context Switching On Prospective Memory" [Dataset]. http://doi.org/10.18419/DARUS-3327
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.18419/DARUS-3327
Dataset updated
Sep 16, 2024
Dataset provided by
DaRUS
Authors
Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
DFG
Description
Social media platforms use short, highly engaging videos to catch users’ attention. While the short-form video feeds popularized by TikTok are rapidly spreading to other platforms, we do not yet understand their impact on cognitive functions. We conducted a between-subjects experiment (𝑁 = 60) investigating the impact of engaging with TikTok, Twitter, and YouTube while performing a Prospective Memory task (i.e., executing a previously planned action). The study required participants to remember intentions over interruptions. We found that the TikTok condition significantly degraded the users’ performance in this task. As none of the other conditions (Twitter, YouTube, no activity) had a similar effect, our results indicate that the combination of short videos and rapid context-switching impairs intention recall and execution. We contribute a quantified understanding of the effect of social media feed format on Prospective Memory and outline consequences for media technology designers not to harm the users’ memory and wellbeing. Description of the Dataset Data frame: The ./data/rt.csv provides the data frame of reaction times. The ./data/acc.csv provides the data frame of reaction accuracy scores. The ./data/q.csv provides the data frame collected from questionnaires. The ./data/ddm.csv is the learned DDM features using ./appendix2_ddm_fitting.ipynb, which is then used in ./3.ddm_anova.ipynb. Figures: All figures appeared in the paper are placed in ./figures and can be reproduced using *_vis.ipynb files.
P
EVA Dataset
paperswithcode.com
Updated Jan 8, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pengpeng Yang; Daniele Baracchi; Massimo Iuliani; Dasara Shullani; Rongrong Ni; Yao Zhao; Alessandro Piva (2022). EVA Dataset [Dataset]. https://paperswithcode.com/dataset/eva
Explore at:
Dataset updated
Jan 8, 2022
Authors
Pengpeng Yang; Daniele Baracchi; Massimo Iuliani; Dasara Shullani; Rongrong Ni; Yao Zhao; Alessandro Piva
Description
The dataset contains 7000 videos: native, altered and exchanged through social platforms. The altered contents include manipulations with FFmpeg, AVIdemux, Kdenlive and Adobe Premiere. The social platforms used to exchange the native and altered videos are Facebook, Tiktok, Youtube and Weibo.

A detailed description of the dataset is available in the journal paper by Yang, Pengpeng, et al. "Efficient Video Integrity Analysis Through Container Characterization." IEEE Journal of Selected Topics in Signal Processing 14.5 (2020): 947-954, 10.1109/JSTSP.2020.3008088.

Facebook

Twitter

Click to copy link

Link copied

Cite

Yasamin Jafarian; Hyun Soo Park (2021). TikTok Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/tiktok-dataset

TikTok Dataset Dataset

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

Explore at:

208 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Jun 9, 2021

Authors

Yasamin Jafarian; Hyun Soo Park

Description

We learn high fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application. It is by far one of the most popular video sharing applications across generations, which include short videos (10-15 seconds) of diverse dance challenges as shown above. We manually find more than 300 dance videos that capture a single person performing dance moves from TikTok dance challenge compilations for each month, variety, type of dances, which are moderate movements that do not generate excessive motion blur. For each video, we extract RGB images at 30 frame per second, resulting in more than 100K images. We segmented these images using Removebg application, and computed the UV coordinates from DensePose.

Download TikTok Dataset:

Please use the dataset only for the research purpose.

The dataset can be viewed and downloaded from the Kaggle page. (you need to make an account in Kaggle to be able to download the data. It is free!)

The dataset can also be downloaded from here (42 GB). The dataset resolution is: (1080 x 604)

The original YouTube videos corresponding to each sequence and the dance name can be downloaded from here (2.6 GB).

Clear search

Close search

Google apps

Main menu

TikTok Dataset Dataset

Data from: News on TikTok: An Annotated Dataset of TikTok Videos from...

from TikTok Dataset

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

The Invasion of Ukraine Viewed through TikTok: A Dataset

TikHarm Dataset

A Labeled Dataset for Sentiment Analysis of Videos on YouTube

Data from: TikTok dataset - Current affairs on TikTok. Virality and...

TikTok Trending Metadata

TikTokData.xlsx

TikTok Videos Reported Claims

ai-tube-tik-tak-tok

Data from: #Coronavirus on TikTok: user engagement with misinformation as a...

TikTok_Most_Shared_Video_Transcription_Example

Video Tiktok Extraction Dataset

Video Tiktok Extraction

Brazilian TikTok Trending Videos

Context

Content

Acknowledgements

12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social...

tiktok_russian_celebrities_mentions_graph

Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions:...

EVA Dataset

TikTok Dataset Dataset

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos