50 datasets found
  1. TikTok Dataset Dataset

    • paperswithcode.com
    Updated Jun 9, 2021
    Yasamin Jafarian; Hyun Soo Park (2021). TikTok Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/tiktok-dataset
    Dataset updated
    Jun 9, 2021
    Authors
    Yasamin Jafarian; Hyun Soo Park
    Description

    We learn high-fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application. TikTok is one of the most popular video-sharing applications across generations and hosts short videos (10-15 seconds) of diverse dance challenges. We manually selected more than 300 dance videos, each capturing a single person performing dance moves, from monthly TikTok dance-challenge compilations, covering a variety of dance types with moderate movements that do not generate excessive motion blur. For each video, we extracted RGB images at 30 frames per second, resulting in more than 100K images. We segmented these images using the Removebg application and computed the UV coordinates with DensePose.
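
    As a rough consistency check on the image count, assuming the selected videos last about 10-15 seconds each (the typical TikTok length quoted above; individual videos may differ):

```latex
N_{\text{frames}} \approx 300~\text{videos} \times (10\text{--}15)~\text{s} \times 30~\tfrac{\text{frames}}{\text{s}} = 90{,}000\text{--}135{,}000
```

    This range brackets the reported figure of more than 100K images.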

    Download TikTok Dataset:

    Please use the dataset for research purposes only.

    The dataset can be viewed and downloaded from the Kaggle page (a free Kaggle account is required to download the data).

    The dataset can also be downloaded from here (42 GB). The image resolution is 1080 x 604.

    The original YouTube videos corresponding to each sequence and the dance name can be downloaded from here (2.6 GB).

  2. Data from: News on TikTok: An Annotated Dataset of TikTok Videos from...

    • datacatalogue.cessda.eu
    • search.gesis.org
    Updated Apr 2, 2025
    Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan (2025). News on TikTok: An Annotated Dataset of TikTok Videos from German-Speaking News Outlets in 2023 [Dataset]. http://doi.org/10.7802/2863
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    University of Copenhagen
    Weizenbaum Institute for the Networked Society
    Authors
    Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan
    Area covered
    Austria, Switzerland, Germany
    Measurement technique
    Recording (mechanical/electronic), Content Analysis
    Description

    TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).

    This dataset consists of every second video (i.e., every other video) published in 2023 by major news outlets from Germany, Austria, and Switzerland that are active on TikTok. The data were collected via the official TikTok API in January 2024, and the manual coding took place between September and December 2024. For a detailed description of the data collection, validation, annotation, and descriptive analysis, please refer to:

    Mayer, A.-T., Wedel, L., Batzner, J., Hendrickx, J., Bremer, E., Iwan, A., Stocker, V., & Ohme, J. (2025). News on TikTok: An Annotated Dataset of TikTok Videos from German-Speaking News Outlets in 2023. Proceedings of the Nineteenth International AAAI Conference on Web and Social Media, 19, forthcoming.

  3. from TikTok Dataset

    • cubig.ai
    Updated Jun 12, 2025
    CUBIG (2025). from TikTok Dataset [Dataset]. https://cubig.ai/store/products/457/from-tiktok-dataset
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction • The Dataset from TikTok contains 19,382 reports in which users flagged videos or comments as containing a "claim", along with video length, transcription text, account status, and engagement indicators. It is suitable for analyzing reporting reasons and viewer reactions by content.

    2) Data Utilization (1) Characteristics of the Dataset from TikTok: • The dataset consists of 12 columns, covering both the reported content type and the video's engagement metadata. (2) The Dataset from TikTok can be used for: • Claim classification model development: using the video transcription text, engagement indicators such as views, likes, shares, and comments, and account verification and sanction information as inputs, a machine learning classification model can automatically determine whether the content contains a "claim." • Optimizing moderation tasks: prioritizing reports based on the classification model's predictions can speed up report processing and reduce the review burden by surfacing the content that moderators most urgently need to review.

  4. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

    • search.dataone.org
    Updated Sep 24, 2024
    Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.7910/DVN/QTJ9HC
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew
    Time period covered
    Jan 1, 2024 - May 31, 2024
    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset: N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: http://arxiv.org/abs/2406.07693

    Abstract: This dataset contains data on 4,011 videos about the ongoing measles outbreak, published on 264 websites between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remaining websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each video, the URL, post title, post description, and publication date are provided as separate attributes. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grained sentiment analysis (using DistilRoBERTa-base) were performed on the video titles and video descriptions. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral; (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated; and (iii) one of the fine-grained sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for training and testing machine learning algorithms for sentiment or subjectivity analysis in this field, as well as for other applications. The paper associated with this dataset (see the citation above) also presents a list of open research questions that may be investigated using this dataset.
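
    The listing names VADER but does not say how its scores were binned into positive, negative, and neutral. A minimal sketch, assuming the cut-offs recommended in the VADER documentation (compound >= 0.05 is positive, compound <= -0.05 is negative, otherwise neutral); whether the dataset authors used exactly these thresholds is an assumption, and the function name `vader_label` is hypothetical. The compound score itself would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package.

```python
def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a three-way sentiment class.

    The +/-0.05 default follows the VADER README convention; it is an
    assumption that the dataset used the same cut-offs.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

    Under these cut-offs, a compound score of 0.6 maps to "positive" and 0.01 to "neutral".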

  5. The Invasion of Ukraine Viewed through TikTok: A Dataset

    • zenodo.org
    bin, csv, text/x-python
    Updated May 13, 2023
    Benjamin Steel; Sara Parker; Derek Ruths (2023). The Invasion of Ukraine Viewed through TikTok: A Dataset [Dataset]. http://doi.org/10.5281/zenodo.7926959
    Available download formats: text/x-python, bin, csv
    Dataset updated
    May 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Steel; Sara Parker; Derek Ruths
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users during 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok and the nature of social interaction on the platform around a major political event.

    The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

    To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or "TikToks"). We then compiled the comments associated with these videos. All of the data captured is publicly available information, but it contains personally identifiable information. In total, we collected approximately 16 thousand videos and 12 million comments from approximately 6 million users. On average, there are approximately 1.9 comments per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the PyTok web scraping library, developed by the author: https://github.com/networkdynamics/pytok.

    Because of the scraping duration, this is only a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Because of TikTok's fuzzy search functionality, the dataset contains videos with varying degrees of relatedness to the invasion.

    We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

    To build this dataset from the IDs here:

    1. Go to https://github.com/networkdynamics/pytok and clone the repo locally
    2. Run pip install -e . in the pytok directory
    3. Run pip install pandas tqdm to install these libraries if not already installed
    4. Run get_videos.py to get the video data
    5. Run video_comments.py to get the comment data
    6. Run user_tiktoks.py to get the video history of the users
    7. Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
    8. Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

    If you get an error about the wrong Chrome version, use the command-line argument get_videos.py --chrome-version YOUR_CHROME_VERSION. Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.

    Please do not hesitate to open an issue in this repo to get our help with this!

    The videos.csv will contain the following columns:

    video_id: Unique video ID

    createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    desc: The full video description from the author

    hashtags: A list of hashtags used in the video description

    share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

    share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty

    share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

    share_type: If the video is sharing another video, this is the type of share (e.g., stitch or duet)

    mentions: A list of users mentioned in the video description, if any
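
    Once load_json_to_csv.py has produced videos.csv, the createtime column can be parsed with Python's standard library. A minimal sketch that counts videos per month; only the column names and the YYYY-MM-DD HH:MM:SS format come from the list above, while the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Format given in the createtime column description (UTC).
CREATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def videos_per_month(csv_file) -> Counter:
    """Count videos per (year, month) from a videos.csv-style file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        created = datetime.strptime(row["createtime"], CREATETIME_FORMAT)
        counts[(created.year, created.month)] += 1
    return counts


# Hypothetical sample rows using a subset of the documented columns.
SAMPLE = """video_id,createtime,author_id
1,2022-02-24 10:00:00,a
2,2022-02-25 12:30:00,b
3,2022-03-01 08:15:00,a
"""

if __name__ == "__main__":
    print(videos_per_month(io.StringIO(SAMPLE)))
```

    With a real export, pass an open file instead, e.g. `open("videos.csv", newline="", encoding="utf-8")`.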

    The comments.csv will contain the following columns:

    comment_id: Unique comment ID

    createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    text: Text of the comment

    mentions: A list of users that are tagged in the comment

    video_id: The ID of the video the comment is on

    comment_language: The language of the comment, as predicted by the TikTok API

    reply_comment_id: If the comment is replying to another comment, this is the ID of that comment

    The data can be compiled into a user interaction network to facilitate the study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
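
    The comments.csv columns listed above are enough to derive a simple reply network. A minimal sketch, independent of the polar-seeds code: it treats each reply (via reply_comment_id) as a weighted, directed edge from the replying user to the author of the parent comment. The function name and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def reply_edges(comments_file) -> Counter:
    """Build weighted, directed reply edges from a comments.csv-style file.

    Each key is (replying_author_id, parent_author_id); its count is the
    number of such replies. Replies whose parent comment is not in the
    file are skipped because the parent's author is unknown.
    """
    rows = list(csv.DictReader(comments_file))
    author_of = {row["comment_id"]: row["author_id"] for row in rows}
    edges = Counter()
    for row in rows:
        parent_id = row.get("reply_comment_id") or ""
        if parent_id and parent_id in author_of:
            edges[(row["author_id"], author_of[parent_id])] += 1
    return edges


# Made-up rows using a subset of the documented comments.csv columns.
SAMPLE_COMMENTS = """comment_id,author_id,video_id,reply_comment_id
c1,u1,v1,
c2,u2,v1,c1
c3,u3,v1,c1
c4,u2,v1,c1
c5,u1,v1,c2
c6,u4,v1,c99
"""

if __name__ == "__main__":
    for (src, dst), weight in reply_edges(io.StringIO(SAMPLE_COMMENTS)).items():
        print(f"{src} -> {dst}: {weight}")
```

    In the sample, u2 replies to u1 twice, so that edge has weight 2; the reply to c99 is dropped because c99 is not in the file.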

  6. TikHarm Dataset

    • gts.ai
    json
    Updated Jul 25, 2024
    GTS (2024). TikHarm Dataset [Dataset]. https://gts.ai/dataset-download/tikharm-dataset/
    Available download formats: json
    Dataset updated
    Jul 25, 2024
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Explore the TikHarm dataset, designed to train models for classifying harmful content on TikTok. This curated collection focuses on videos accessible to children, categorized into harmful content, adult content, safe content, and suicide-related material.

  7. A Labeled Dataset for Sentiment Analysis of Videos on YouTube

    • ieee-dataport.org
    Updated Jun 2, 2025
    Nirmalya Thakur (2025). A Labeled Dataset for Sentiment Analysis of Videos on YouTube [Dataset]. https://ieee-dataport.org/documents/labeled-dataset-sentiment-analysis-videos-youtube-tiktok-and-other-sources-about-2024
    Dataset updated
    Jun 2, 2025
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    USA

  8. Data from: TikTok dataset - Current affairs on TikTok. Virality and...

    • research.science.eus
    • data.niaid.nih.gov
    • +1 more
    Updated 2022
    Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. https://research.science.eus/documentos/668fc45ab9e7c03b01bdae53?lang=ca
    Dataset updated
    2022
    Authors
    Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi
    Description

    TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built with the Gephi network analysis software. Source of: Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

    Abstract: Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
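
    The listing does not say how the links between labels were built. Assuming, purely for illustration, that a link means two labels appear on the same video, the difference between unique and weighted links can be sketched like this: each distinct label pair is one unique link, and its weight counts how often that pair occurs. The function name and the label lists are hypothetical, not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(label_sets) -> Counter:
    """Collapse repeated label pairs into unique, weighted, undirected edges.

    Each distinct pair is one unique link; its count is the link's weight.
    """
    edges = Counter()
    for labels in label_sets:
        # Sorting gives every unordered pair one canonical (a, b) orientation.
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical per-video label lists.
videos = [["news", "politics", "sport"], ["news", "politics"], ["fyp", "news"]]
edges = cooccurrence_edges(videos)
unique_links = len(edges)             # number of distinct label pairs
weighted_links = sum(edges.values())  # total pair occurrences
```

    Here four unique links carry a total weight of five, the same kind of relationship as the 318,986 unique and 790,599 weighted links reported above.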

  9. TikTok Trending Metadata

    • kaggle.com
    Updated Feb 24, 2023
    Brad Culbertson (2023). TikTok Trending Metadata [Dataset]. https://www.kaggle.com/datasets/vbradculbertson/tiktok-trending-metadata
    Dataset updated
    Feb 24, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Brad Culbertson
    Description

    The dataset was originally obtained from TikTok's trending API by a GitHub user named Ivan Tran. It contains metadata on engagement with user-created videos and user profile data. The original create time is in Unix timestamp format and is extracted directly from the video ID number. TikTok's API has recently become much more difficult to access, so more current data is harder to obtain. The hashtags column contains lists.
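
    The description says the create time is extracted from the video ID but not how. A minimal sketch, assuming the commonly reported layout in which the upper 32 bits of a 64-bit TikTok video ID hold the Unix timestamp in seconds; this is not an officially documented TikTok guarantee, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def create_time_from_video_id(video_id) -> datetime:
    """Recover a UTC creation time from a TikTok video ID.

    Assumption: the top 32 bits of the 64-bit ID store Unix seconds.
    Accepts the ID as an int or a numeric string.
    """
    unix_seconds = int(video_id) >> 32
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
```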

  10. TikTokData.xlsx

    • figshare.com
    xlsx
    Updated Jun 14, 2022
    Emily Zawacki (2022). TikTokData.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.20069333.v1
    Available download formats: xlsx
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    figshare
    Authors
    Emily Zawacki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
    We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (the number of unique users who saw the video) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is each engagement parameter expressed as a percentage of total reach (e.g., Likes / Audience Reached × 100).
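
    A minimal sketch of the engagement-rate formula described above, assuming raw counts taken from TikTok's account analytics; the function names are hypothetical.

```python
def engagement_rate(engagement_count: int, audience_reached: int) -> float:
    """One engagement parameter as a percentage of reach (e.g. likes / reached * 100)."""
    if audience_reached <= 0:
        raise ValueError("audience_reached must be positive")
    # Multiply first so whole-number percentages come out exact.
    return engagement_count * 100 / audience_reached


def engagement_rates(metrics: dict, audience_reached: int) -> dict:
    """Apply engagement_rate to each parameter, e.g. likes, comments, shares."""
    return {name: engagement_rate(count, audience_reached) for name, count in metrics.items()}
```

    For example, a video that reached 5,000 unique users and received 250 likes has a likes engagement rate of 5%.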

  11. TikTok Videos Reported Claims

    • kaggle.com
    Updated May 9, 2024
    Murilo Zangari (2024). TikTok Videos Reported Claims [Dataset]. https://www.kaggle.com/datasets/murilozangari/tiktok-claim-analysis/code
    Dataset updated
    May 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Murilo Zangari
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    TikTok users can submit reports that identify videos and comments containing user claims. On a social media platform like TikTok, reporting a claim typically refers to the feature that allows users to report content they believe violates the platform's community guidelines or terms of service. When a user reports a claim about a video, they are flagging the content for review by the platform's content moderation team. The team then assesses the reported content to determine whether it indeed violates the guidelines, and if so, may take actions such as removing the content, issuing a warning to the user who posted it, or even suspending or banning that user's account. Reporting a claim is an important tool for maintaining a safe and respectful environment on social media platforms.

    However, this process generates a large number of reports that are challenging to process in a timely manner. Therefore, TikTok is working on developing a predictive model that can determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently.

    The TikTok data team is developing a machine learning model for classifying claims made in videos submitted to the platform.

    The target variable:

    The data dictionary shows that there is a column called claim_status. This is a binary value that indicates whether a video is a claim or an opinion. This is the target variable. In other words, for each video, the model should predict whether the video is a claim or an opinion. This is a classification task because the model is predicting a binary class.

    To determine which evaluation metric might be best, consider how the model might be wrong. There are two possibilities for bad predictions:

    • False positives: When the model predicts a video is a claim when in fact it is an opinion
    • False negatives: When the model predicts a video is an opinion when in fact it is a claim

    In the given scenario, it's better for the model to predict false positives when it makes a mistake, and worse for it to predict false negatives. It is very important to identify videos that break the terms of service, even if that means some opinion videos are misclassified as claims. The worst case for an opinion misclassified as a claim is that the video goes to human review. The worst case for a claim that is misclassified as an opinion is that the video does not get reviewed and it violates the terms of service.
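
    Because false negatives (claims classified as opinions) are the costlier error here, recall on the claim class is the natural metric to prioritize. A minimal, dependency-free sketch, assuming "claim" and "opinion" are the values of claim_status; the function name is hypothetical. scikit-learn's `recall_score(y_true, y_pred, pos_label="claim")` computes the same recall.

```python
from typing import Sequence


def claim_metrics(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "claim") -> dict:
    """Confusion counts, recall, and precision with "claim" as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # opinion flagged as claim: costs an extra human review
        elif truth == positive:
            fn += 1  # claim labelled opinion: may never be reviewed
        else:
            tn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "recall": recall, "precision": precision}
```

    Tuning the classifier's decision threshold downward typically trades precision for recall, which matches the stated preference for false positives over false negatives.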

  12. ai-tube-tik-tak-tok

    • huggingface.co
    Updated Dec 21, 2023
    Julian Bilcke (2023). ai-tube-tik-tak-tok [Dataset]. https://huggingface.co/datasets/jbilcke-hf/ai-tube-tik-tak-tok
    Dataset updated
    Dec 21, 2023
    Authors
    Julian Bilcke
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Tik Tak Tok - Est. 2023

    Model: HotshotXL
    Voice: Julian
    Orientation: Portrait
    Tags: Short Dancing
    Style: tiktok video, instagram, beautiful, sharp, detailed
    Music: mainstream pop music
    Prompt: A channel generating short vertical videos between 20 and 60 seconds long. Most videos are about people dancing, doing choreography, or talking selfies, filming their cats, daily life (eg. going to a cafe… See the full description on the dataset page: https://huggingface.co/datasets/jbilcke-hf/ai-tube-tik-tak-tok.

  13. Data from: #Coronavirus on TikTok: user engagement with misinformation as a...

    • datacatalog.hshsl.umaryland.edu
    • data.niaid.nih.gov
    • +2 more
    Updated Jul 18, 2024
    Jonathan D. Baghdadi; K.C. Coffey; Rachael Belcher; James Frisbie; Naeemul Hassan; Danielle Sim; Rena D. Malik (2024). #Coronavirus on TikTok: user engagement with misinformation as a potential threat to public health behavior [Dataset]. http://doi.org/10.5061/dryad.bvq83bkdp
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    HS/HSL
    Authors
    Jonathan D. Baghdadi; K.C. Coffey; Rachael Belcher; James Frisbie; Naeemul Hassan; Danielle Sim; Rena D. Malik
    Area covered
    United States
    Description

    A sample of TikTok videos associated with the hashtag #coronavirus was downloaded on September 20, 2020. Misinformation was rated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with the number of views and with the presence of user comments indicating an intention to change behavior. Videos and related metadata were downloaded with a third-party TikTok scraper using the search term #coronavirus. Videos were reviewed for content, and the data were entered into a spreadsheet.

  14. TikTok_Most_Shared_Video_Transcription_Example

    • huggingface.co
    Updated Jul 17, 2025
    MasaAI (2025). TikTok_Most_Shared_Video_Transcription_Example [Dataset]. https://huggingface.co/datasets/MasaFoundation/TikTok_Most_Shared_Video_Transcription_Example
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    MasaAI
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📲 Example Dataset: TikTok Scraper Tool

    👉 Start Scraping TikTok: TikTok Scraper Tool

      ✨ Key Features
    

    ⚡ Instant Transcription – Turn any TikTok video into an AI-ready transcript
    🎯 Metadata – Get the title, language description, and video hashtags
    🔗 URL-Based Access – Just drop in a TikTok video URL to start scraping
    🧩 LLM-Ready Output – Receive clean JSON ready for agents, RAG, or AI tools
    💸 Free Tier – Use up to 100 queries during the beta period
    💫 Easy… See the full description on the dataset page: https://huggingface.co/datasets/MasaFoundation/TikTok_Most_Shared_Video_Transcription_Example.

  15. Video Tiktok Extraction Dataset

    • universe.roboflow.com
    zip
    Updated Mar 1, 2024
    Binokular Media Utama (2024). Video Tiktok Extraction Dataset [Dataset]. https://universe.roboflow.com/binokular-media-utama/video-tiktok-extraction/dataset/1
    Available download formats: zip
    Dataset updated
    Mar 1, 2024
    Dataset authored and provided by
    Binokular Media Utama
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Words Bounding Boxes
    Description

    Video Tiktok Extraction

    ## Overview
    
    Video Tiktok Extraction is a dataset for object detection tasks - it contains Words annotations for 201 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  16. Brazilian TikTok Trending Videos

    • kaggle.com
    Updated May 7, 2021
    Ilan Brik (2021). Brazilian TikTok Trending Videos [Dataset]. https://www.kaggle.com/ilanbrik/brazilian-tiktok-trending-videos
    Dataset updated
    May 7, 2021
    Dataset provided by
    Kaggle
    Authors
    Ilan Brik
    Area covered
    Brazil
    Description

    Context

    US supermarkets have recently seen a shortage of feta cheese due to a TikTok pasta recipe that went viral (https://www.fox5ny.com/news/viral-tiktok-video-recipe-prompts-feta-cheese-shortage).

    The Brazilian music industry is already experiencing huge shifts in its business model: TikTok has changed young people's playlists. Most of the biggest players in this market have recognized the revolution happening in broad daylight and are trying to influence, as much as possible, something many believe to be random: songs going viral.

    Content

    This dataset contains 10,000 rows, each describing a single video, and 14 columns: username, user id, video id, video desc, videotime, video length, video link, n likes, n shares, n comments, n plays, music name, music url

    Acknowledgements

    Thanks to David Teather for developing a nice, easy-to-use API.

  17. 12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social...

    • datarade.ai
    .csv, .xls, .txt
    Updated Jun 17, 2025
    Data Unify (2025). 12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social Media Marketing & Brand Monitoring [Dataset]. https://datarade.ai/data-products/social-media-data-12-5m-tiktok-posts-with-50k-plays-pos-data-unify
    Available download formats: .csv, .xls, .txt
    Dataset updated
    Jun 17, 2025
    Dataset authored and provided by
    Data Unify
    Area covered
    Georgia, Ethiopia, Uruguay, Seychelles, Malawi, Cayman Islands, Albania, Nigeria, Croatia, France
    Description

    Unlock insights into high-performing content with this curated dataset of TikTok posts, each with over 50,000 plays. This collection surfaces the videos that resonate most with audiences, spanning the creators, themes, and formats that drive virality.

    📈 Performance Threshold: Only includes posts that have exceeded 50K views, ensuring a focus on high-engagement, trend-relevant content.

    📱 Detailed Post Data: Captures video captions, play counts, likes, shares, comments, sound IDs, hashtags, and posting timestamps.

    👤 Creator Metadata: Includes usernames, follower counts, bio snippets, and profile metrics to support creator analysis.

    📊 Engagement Benchmarking: Useful for identifying viral content, measuring campaign performance, and refining creative strategies.

    ⚡ Trend Analysis Ready: Track how themes, hashtags, or sounds perform at scale within and across verticals.

    🚀 Structured for Scale: Delivered in clean CSV format, via API, or in a custom format, ready for integration into analytics tools, dashboards, or model-training environments.

    This dataset is designed for marketers, agencies, analysts, and researchers looking to decode the mechanics of virality, identify top-performing content, and inform influencer strategy on TikTok. Whether you're building recommendation engines or planning your next campaign, this dataset offers a high-signal view into TikTok's most impactful content.

  18. tiktok_russian_celebrities_mentions_graph

    • kaggle.com
    Updated May 14, 2021
    Anton Kostin (2021). tiktok_russian_celebrities_mentions_graph [Dataset]. https://www.kaggle.com/visualcomments/tiktok-russian-celebrities-mentions-graph/tasks
    Dataset updated
    May 14, 2021
    Dataset provided by
    Kaggle
    Authors
    Anton Kostin
    License

    GNU LGPL 3.0: http://www.gnu.org/licenses/lgpl-3.0.html

    Area covered
    Russia
    Description

    This graph contains weighted edges between celebrity names (vertices) whose cosine similarity exceeds 60 percent (0.6). The similarities were computed with a Gensim FastText model trained on the text descriptions of TikTok videos.
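
    The thresholding step described above can be sketched as follows: compute cosine similarity for every pair of name vectors and keep a weighted edge only when it exceeds 0.6. The toy vectors are hypothetical stand-ins for FastText embeddings, and "more than 60 percent" is assumed to mean a strict > 0.6 cutoff. With a real trained Gensim model, the pairwise score would instead come from model.wv.similarity(name_a, name_b).

```python
import itertools
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_graph(
    vectors: Dict[str, Vector], threshold: float = 0.6
) -> List[Tuple[str, str, float]]:
    """Weighted edge list (name_a, name_b, similarity) for pairs above threshold."""
    edges = []
    # Visit each unordered pair of names exactly once, in sorted-name order.
    for (a, va), (b, vb) in itertools.combinations(sorted(vectors.items()), 2):
        sim = cosine(va, vb)
        if sim > threshold:  # ASSUMED strict cutoff for "more than 60 percent"
            edges.append((a, b, sim))
    return edges


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, not taken from the dataset.
    toy = {"alice": [1.0, 0.0], "bob": [0.9, 0.1], "carol": [0.0, 1.0]}
    print(similarity_graph(toy))
```

    An edge list like this can be loaded directly into a graph library, using the similarity as the edge weight.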

  19. Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions:...

    • darus.uni-stuttgart.de
    Updated Sep 16, 2024
    Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt (2024). Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions: Effect of Context Switching On Prospective Memory" [Dataset]. http://doi.org/10.18419/DARUS-3327
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    DaRUS
    Authors
    Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    Social media platforms use short, highly engaging videos to catch users’ attention. While the short-form video feeds popularized by TikTok are rapidly spreading to other platforms, we do not yet understand their impact on cognitive functions. We conducted a between-subjects experiment (N = 60) investigating the impact of engaging with TikTok, Twitter, and YouTube while performing a Prospective Memory task (i.e., executing a previously planned action). The study required participants to remember intentions across interruptions. We found that the TikTok condition significantly degraded users’ performance in this task. As none of the other conditions (Twitter, YouTube, no activity) had a similar effect, our results indicate that the combination of short videos and rapid context switching impairs intention recall and execution. We contribute a quantified understanding of the effect of social media feed format on Prospective Memory and outline implications for media technology designers so that they do not harm users’ memory and wellbeing.

    Description of the Dataset

    Data frames: ./data/rt.csv contains the reaction times; ./data/acc.csv contains the accuracy scores; ./data/q.csv contains the questionnaire responses; ./data/ddm.csv contains the DDM features learned with ./appendix2_ddm_fitting.ipynb, which are then used in ./3.ddm_anova.ipynb.

    Figures: All figures that appear in the paper are in ./figures and can be reproduced with the *_vis.ipynb notebooks.
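
    The description does not say which procedure ./appendix2_ddm_fitting.ipynb uses. Assuming DDM refers to the drift diffusion model, one simple closed-form way to obtain comparable per-participant features is the EZ-diffusion method of Wagenmakers et al. (2007), sketched below. This is a swapped-in technique for illustration, not the authors' notebook; its inputs are summary statistics (proportion correct, plus the variance and mean of correct reaction times in seconds) of the kind that could be derived from rt.csv and acc.csv.

```python
import math
from typing import NamedTuple


class EZParams(NamedTuple):
    drift_rate: float         # v
    boundary_sep: float       # a
    non_decision_time: float  # Ter, in seconds


def ez_diffusion(pc: float, vrt: float, mrt: float, s: float = 0.1) -> EZParams:
    """EZ-diffusion estimates (Wagenmakers et al., 2007), used here as a stand-in.

    pc:  proportion of correct responses, in (0, 1) and not exactly 0.5
    vrt: variance of correct reaction times (s^2)
    mrt: mean of correct reaction times (s)
    s:   diffusion scaling constant, conventionally 0.1
    """
    if not 0.0 < pc < 1.0 or pc == 0.5:
        raise ValueError("pc must be in (0, 1) and != 0.5; apply an edge correction first")
    s2 = s * s
    logit = math.log(pc / (1.0 - pc))
    x = logit * (logit * pc * pc - logit * pc + pc - 0.5) / vrt
    v = math.copysign(1.0, pc - 0.5) * s * x ** 0.25    # drift rate
    a = s2 * logit / v                                   # boundary separation
    y = -v * a / s2
    mdt = (a / (2.0 * v)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))  # mean decision time
    return EZParams(v, a, mrt - mdt)                     # Ter = MRT - MDT
```

    For example, ez_diffusion(0.802, 0.112, 0.723) returns approximately v = 0.10, a = 0.14, and Ter = 0.30 s. Full likelihood-based DDM fitting would also use the RT distribution shape, which this closed-form shortcut ignores.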

  20. EVA Dataset

    • paperswithcode.com
    Updated Jan 8, 2022
    Pengpeng Yang; Daniele Baracchi; Massimo Iuliani; Dasara Shullani; Rongrong Ni; Yao Zhao; Alessandro Piva (2022). EVA Dataset [Dataset]. https://paperswithcode.com/dataset/eva
    Dataset updated
    Jan 8, 2022
    Authors
    Pengpeng Yang; Daniele Baracchi; Massimo Iuliani; Dasara Shullani; Rongrong Ni; Yao Zhao; Alessandro Piva
    Description

    The dataset contains 7,000 videos: native videos, altered videos, and videos exchanged through social platforms. The alterations were made with FFmpeg, AVIdemux, Kdenlive, and Adobe Premiere. Both the native and the altered videos were exchanged through Facebook, TikTok, YouTube, and Weibo.

    A detailed description of the dataset is available in the journal paper by Yang, Pengpeng, et al., "Efficient Video Integrity Analysis Through Container Characterization," IEEE Journal of Selected Topics in Signal Processing 14.5 (2020): 947-954, doi: 10.1109/JSTSP.2020.3008088.
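
    The paper title indicates that the analysis works on the video container rather than the decoded pixels. Assuming MP4/MOV files (ISO base media file format), where a file is a sequence of boxes (atoms) that each start with a 4-byte big-endian size and a 4-byte type, the sketch below lists the top-level box sequence: the kind of structural signature one could compare between native, edited, and re-shared files. It is an illustration only, not the feature set used by Yang et al.

```python
import struct
from typing import List, Tuple


def top_level_boxes(data: bytes) -> List[Tuple[str, int]]:
    """List (type, size) of top-level ISO BMFF / MP4 boxes in file order."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # a 64-bit 'largesize' follows the type field
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - offset
        if size < header:  # malformed size; stop instead of looping forever
            break
        boxes.append((raw_type.decode("latin-1"), size))
        offset += size
    return boxes


if __name__ == "__main__":
    with open("example.mp4", "rb") as f:  # hypothetical file name
        print(top_level_boxes(f.read()))
```

    Different encoders and platforms tend to emit different box orders and sizes, which is why even this coarse listing can hint at a file's processing history.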

TikTok Dataset Dataset

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

208 scholarly articles cite this dataset.
Dataset updated
Jun 9, 2021
Authors
Yasamin Jafarian; Hyun Soo Park