100+ datasets found
  1. TikTok User Engagement Data

    • kaggle.com
    zip
    Updated Oct 18, 2023
    Cite
    Yakhyojon (2023). TikTok User Engagement Data [Dataset]. https://www.kaggle.com/datasets/yakhyojon/tiktok
    Available download formats
    zip (813245 bytes)
    Dataset updated
    Oct 18, 2023
    Authors
    Yakhyojon
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    TikTok is the leading destination for short-form mobile video. The platform is built to help imaginations thrive. TikTok's mission is to create a place for inclusive, joyful, and authentic content–where people can safely discover, create, and connect.

    Columns, given as name (type): description

    # (int): TikTok-assigned number for video with claim/opinion.
    claim_status (obj): Whether the published video has been identified as an “opinion” or a “claim.” In this dataset, an “opinion” refers to an individual’s or group’s personal belief or thought. A “claim” refers to information that is either unsourced or from an unverified source.
    video_id (int): Random identifying number assigned to video upon publication on TikTok.
    video_duration_sec (int): How long the published video is, measured in seconds.
    video_transcription_text (obj): Transcribed text of the words spoken in the published video.
    verified_status (obj): Indicates the status of the TikTok user who published the video in terms of their verification, either “verified” or “not verified.”
    author_ban_status (obj): Indicates the status of the TikTok user who published the video in terms of their permissions: “active,” “under scrutiny,” or “banned.”
    video_view_count (float): The total number of times the published video has been viewed.
    video_like_count (float): The total number of times the published video has been liked by other users.
    video_share_count (float): The total number of times the published video has been shared by other users.
    video_download_count (float): The total number of times the published video has been downloaded by other users.
    video_comment_count (float): The total number of comments on the published video.
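
    As an illustration of how the engagement columns listed above might be combined, the following minimal sketch computes the mean likes-per-view ratio for each claim_status value. The sample rows and the helper name like_rate_by_status are hypothetical and are not taken from the dataset; only the column names come from the listing.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the numbers are invented for illustration only.
rows = [
    {"claim_status": "claim", "video_view_count": 1000.0, "video_like_count": 200.0},
    {"claim_status": "claim", "video_view_count": 500.0, "video_like_count": 50.0},
    {"claim_status": "opinion", "video_view_count": 100.0, "video_like_count": 5.0},
    {"claim_status": "opinion", "video_view_count": 0.0, "video_like_count": 0.0},
]


def like_rate_by_status(rows):
    """Return the mean likes-per-view ratio for each claim_status.

    Rows with zero or missing views are skipped to avoid division by zero.
    """
    rates = defaultdict(list)
    for row in rows:
        views = row.get("video_view_count")
        if views:  # skips None and 0.0
            rates[row["claim_status"]].append(row["video_like_count"] / views)
    return {status: mean(values) for status, values in rates.items()}


result = like_rate_by_status(rows)
```

    With real data, the rows could be read from the downloaded CSV using csv.DictReader, converting the count columns to float first.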
  2. Tiktok 2025 Dataset

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    Haziq Halifi (2025). Tiktok 2025 Dataset [Dataset]. https://www.kaggle.com/datasets/haziqhalifi/tiktok-2025-dataset
    Available download formats
    zip (889553 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    Haziq Halifi
    Description

    This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.

    Here's a breakdown of the columns included in this dataset:

    video_id: A unique identifier for each TikTok video.
    author: The username or handle of the TikTok account that posted the video.
    description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
    likes: The number of likes the video has received.
    comments: The number of comments on the video.
    shares: The number of times the video has been shared.
    plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
    hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
    music: Information about the background music or sound used in the video.
    create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
    video_url: The direct URL to the TikTok video.
    fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
    views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
    posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)

    Potential Uses of This Dataset:

    Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
    Trend Identification: Identify trending topics, music, and creators on TikTok.
    Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
    Creator Analysis: Study the posting habits and performance of various TikTok creators.
    Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.

    Notes on Data Quality:

    The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis. The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis. It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics.

    This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
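
    One possible way to act on the data quality notes above is sketched below: prefer plays and create_time, and fall back to the largely empty views and posted_time columns only when the primary value is missing. The fallback itself, the helper name clean_record, the output field names, and the assumption that missing values appear as None or empty strings are illustrative choices, not something the dataset description prescribes.

```python
def clean_record(record):
    """Return a small cleaned view of one row.

    Assumptions (not confirmed by the dataset description):
    - missing values appear as None or empty strings;
    - 'views' and 'posted_time' are acceptable fallbacks because the
      description says they appear to overlap with 'plays' and 'create_time'.
    """

    def present(value):
        return value is not None and value != ""

    plays = record.get("plays")
    if not present(plays):
        plays = record.get("views")  # fallback column, largely empty

    created = record.get("create_time")
    if not present(created):
        created = record.get("posted_time")  # fallback column, largely empty

    return {
        "video_id": record.get("video_id"),
        "play_count": int(float(plays)) if present(plays) else None,
        "created": created if present(created) else None,
    }


example = clean_record(
    {"video_id": "v1", "plays": "", "views": "120",
     "create_time": None, "posted_time": "2025-01-01"}
)
```

    Rows where both the primary and the fallback value are missing keep None, so they can be dropped or imputed later depending on the analysis.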

  3. TikTok Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Sep 9, 2022
    Cite
    Bright Data (2022). TikTok Datasets [Dataset]. https://brightdata.com/products/datasets/tiktok
    Available download formats
    .json, .csv, .xlsx
    Dataset updated
    Sep 9, 2022
    Dataset authored and provided by
    Bright Data: https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, create date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!

  4. TikTok Video Dataset

    • kaggle.com
    zip
    Updated Mar 8, 2025
    Cite
    Wasif Ullah (2025). TikTok Video Dataset [Dataset]. https://www.kaggle.com/datasets/wasifullahcs/tiktok-video-dataset
    Available download formats
    zip (1835515 bytes)
    Dataset updated
    Mar 8, 2025
    Authors
    Wasif Ullah
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains a large collection of TikTok video metadata fetched using the TikTok Scraper API. It includes videos from multiple regions (e.g., US, India) and categories (e.g., fyp, dance, comedy, food, travel, etc.). Each video entry provides detailed information such as:

    Video ID: Unique identifier for the video.
    Region: The region where the video is popular.
    Category: The keyword/category used to fetch the video (e.g., dance, comedy).
    Title: The title of the video.
    Duration: The length of the video in seconds.
    Play URL: Direct link to the video.
    Watermarked URL: Link to the watermarked version of the video.
    Cover Image: URL of the video's cover image.
    Music URL: Link to the music used in the video.
    Timestamp: The date and time when the data was fetched.

    How This Dataset Can Be Helpful

    Trend Analysis: Analyze trending videos across different regions and categories. Identify patterns in video popularity based on region, duration, or category.

    Machine Learning: Train models to predict video popularity based on features like duration, region, and category. Build recommendation systems for TikTok videos.

    Content Moderation: Use the dataset to analyze video content for moderation purposes.

    Sentiment Analysis: Perform sentiment analysis on video titles to understand user preferences.

    Cross-Region Insights: Compare video trends across different regions to understand cultural differences.

    How to Use This Dataset

    Filter by Region: Analyze videos from a specific region (e.g., US or India).

    Filter by Category: Focus on videos from a specific category (e.g., dance or comedy).

    Trend Analysis: Identify trending videos based on timestamp and region.

    Machine Learning: Use the dataset to train models for video popularity prediction or recommendation systems.
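
    The region and category filters described above could be sketched as follows. The lowercase keys (video_id, region, category, duration) and the sample rows are assumptions made for illustration; the actual column names in the downloaded file may differ.

```python
from collections import Counter

# Hypothetical rows; the keys mirror the fields listed in the description.
videos = [
    {"video_id": "a1", "region": "US", "category": "dance", "duration": 15},
    {"video_id": "b2", "region": "India", "category": "comedy", "duration": 30},
    {"video_id": "c3", "region": "US", "category": "comedy", "duration": 45},
]


def filter_videos(videos, region=None, category=None):
    """Keep videos matching the given region and/or category.

    A filter left as None is not applied.
    """
    return [
        v for v in videos
        if (region is None or v["region"] == region)
        and (category is None or v["category"] == category)
    ]


us_videos = filter_videos(videos, region="US")
# Count how many US videos fall into each category.
us_category_counts = Counter(v["category"] for v in us_videos)
```

    The same helper can combine both filters, for example filter_videos(videos, region="US", category="comedy").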

  5. Dataset for the Instagram and TikTok problematic use

    • data.niaid.nih.gov
    Updated Jul 19, 2023
    Cite
    Limniou, Maria; Hendrikse, Calanthe (2023). Dataset for the Instagram and TikTok problematic use [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8159159
    Dataset updated
    Jul 19, 2023
    Dataset provided by
    University of Liverpool
    Authors
    Limniou, Maria; Hendrikse, Calanthe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.

  6. TikTok Dataset

    • kaggle.com
    zip
    Updated Jul 27, 2022
    Cite
    Muhammad Anas Mahmood (2022). TikTok Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadanasmahmood/tiktok-dataset
    Available download formats
    zip (733532 bytes)
    Dataset updated
    Jul 27, 2022
    Authors
    Muhammad Anas Mahmood
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is a dataset of popular hashtags on TikTok. It includes the author name, author ID, author signature, comment count, hashtag details, URL, and share count. The hashtags I scraped include meme, funny, humor, comedy, education, lol, dance, song, music, etc.

  7. TikTok Shop Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 7, 2025
    Cite
    Bright Data (2025). TikTok Shop Datasets [Dataset]. https://brightdata.com/products/datasets/tiktok/shop
    Available download formats
    .json, .csv, .xlsx
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Bright Data: https://brightdata.com/
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!

  8. Data from: #Coronavirus on TikTok: user engagement with misinformation as a...

    • datacatalog.hshsl.umaryland.edu
    • datasetcatalog.nlm.nih.gov
    • +3 more
    Updated Jul 18, 2024
    Cite
    Jonathan D. Baghdadi; K.C. Coffey; Rachael Belcher; James Frisbie; Naeemul Hassan; Danielle Sim; Rena D. Malik (2024). #Coronavirus on TikTok: user engagement with misinformation as a potential threat to public health behavior [Dataset]. http://doi.org/10.5061/dryad.bvq83bkdp
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    HS/HSL
    Authors
    Jonathan D. Baghdadi; K.C. Coffey; Rachael Belcher; James Frisbie; Naeemul Hassan; Danielle Sim; Rena D. Malik
    Area covered
    United States
    Description

    A sample of TikTok videos associated with the hashtag #coronavirus were downloaded on September 20, 2020. Misinformation was evaluated on a scale (low, medium, high) using a codebook developed by experts in infectious diseases. Multivariable modeling was used to evaluate factors associated with number of views and presence of user comments indicating intention to change behavior. Videos and related metadata were downloaded using a third-party TikTok Scraper using the search term #coronavirus. Videos were reviewed for content and data were entered on a spreadsheet.

  9. from TikTok Dataset

    • cubig.ai
    zip
    Updated Jun 12, 2025
    Cite
    CUBIG (2025). from TikTok Dataset [Dataset]. https://cubig.ai/store/products/457/from-tiktok-dataset
    Available download formats
    zip
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Dataset from TikTok contains 19,382 reports that users flagged as including a "claim" in videos or comments, along with video length, transcription text, account status, and engagement indicators. It is suitable for analyzing reporting reasons and viewer reactions by content.

    2) Data Utilization
    (1) Characteristics of the Dataset from TikTok:
    • The dataset consists of 12 columns, providing both the reported content type and the engagement metadata of the video.
    (2) The Dataset from TikTok can be used for:
    • Claim classification model development: Using video transcription text, engagement indicators such as views, likes, shares, and comments, and account verification and ban-status information as inputs, a machine learning classifier can automatically determine whether content contains "claims."
    • Optimizing moderation tasks: Prioritize reports automatically based on the classifier's predictions, which speeds up report processing and reduces the moderation burden by selecting the content that reviewers most urgently need to check.

  10. TikTok-10M

    • huggingface.co
    Cite
    Dataset Company, TikTok-10M [Dataset]. https://huggingface.co/datasets/The-data-company/TikTok-10M
    Dataset authored and provided by
    Dataset Company
    License

    https://choosealicense.com/licenses/other/

    Description

    TikTok-10M Dataset

    Dataset Description

    TikTok-10M is a large-scale dataset containing 10 million short-form posts from TikTok, designed for video understanding, multimodal learning, and social media content analysis. The dataset was curated to bridge the gap between academic video datasets and actual user-generated content, providing researchers with authentic patterns and characteristics of modern short-form video content that dominates social media platforms.… See the full description on the dataset page: https://huggingface.co/datasets/The-data-company/TikTok-10M.

  11. Data from: News on TikTok: An Annotated Dataset of TikTok Videos from...

    • search.gesis.org
    Cite
    Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan, News on TikTok: An Annotated Dataset of TikTok Videos from German-Speaking News Outlets in 2023 [Dataset]. http://doi.org/10.7802/2863
    Dataset provided by
    GESIS search
    GESIS, Köln
    Authors
    Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan
    License

    https://www.gesis.org/en/institute/data-usage-terms

    Area covered
    Germany
    Description

    TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).

    This dataset consists of every second video published in 2023 by major news outlets active on TikTok from Germany, Austria, and Switzerland. The data collection was facilitated with the official TikTok API in January 2024. The manual coding took place between September 2024 and December 2024. For a detailed description of the data collection, validation, annotation and descriptive analysis, please refer to [Forthcoming dataset paper publication].

  12. Tiktok-Videos

    • huggingface.co
    Updated Oct 5, 2025
    Cite
    DataHive AI (2025). Tiktok-Videos [Dataset]. https://huggingface.co/datasets/datahiveai/Tiktok-Videos
    Dataset updated
    Oct 5, 2025
    Dataset authored and provided by
    DataHive AI
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    TikTok Video Analytics Dataset

    Sample TikTok video dataset with comprehensive engagement metrics and metadata. Each row represents a single TikTok video with content and detailed analytics. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.

    Files Included

    train.csv – TikTok video analytics data

    What's included

    Video URLs and identifiers
    Comprehensive engagement… See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Tiktok-Videos.

  13. Daily Social Media Active Users

    • kaggle.com
    zip
    Updated May 5, 2025
    Cite
    Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users
    Available download formats
    zip (126814 bytes)
    Dataset updated
    May 5, 2025
    Authors
    Shaik Barood Mohammed Umar Adnaan Faiz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

    Dataset Breakdown:

    • Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

    • Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

    • Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

    • Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

    • Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

    • Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

    • Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.

    Context and Use Cases:

    • This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

    Researchers, data scientists, and developers can use this dataset to:

    • Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

    • Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

    • Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

    • Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

    • Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.

    • Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

    The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

    Future Considerations:

    As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

    By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...

  14. A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and...

    • search.dataone.org
    Updated Sep 24, 2024
    + more versions
    Cite
    Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew (2024). A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.7910/DVN/QTJ9HC
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Thakur, Nirmalya; Su, Vanessa; Shao, Mingchen; Patel, Kesha A.; Jeong, Hongseok; Knieling, Victoria; Bian, Andrew
    Time period covered
    Jan 1, 2024 - May 31, 2024
    Area covered
    YouTube
    Description

    Please cite the following paper when using this dataset: N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: http://arxiv.org/abs/2406.07693

    Abstract

    This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations.

    For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral.

    These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
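
    The description does not state how sentiment scores were mapped to the positive, negative, and neutral classes. As a rough sketch only, the snippet below uses the ±0.05 cutoff on the compound score that the VADER documentation commonly suggests; the threshold, the function name sentiment_class, and the sample scores are assumptions and may not match the authors' procedure.

```python
from collections import Counter


def sentiment_class(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment class.

    The 0.05 threshold follows the convention suggested in VADER's
    documentation; it is an assumption here, not the dataset's stated method.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores for four video titles.
scores = [0.6, -0.3, 0.0, 0.02]
counts = Counter(sentiment_class(score) for score in scores)
```

    In practice the compound scores would come from a sentiment analyzer run over the title and description attributes; only the thresholding step is shown here.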

  15. Invasion of Ukraine Discourse on TikTok Dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv +1
    Updated May 11, 2023
    + more versions
    Cite
    Benjamin Steel; Sara Parker; Derek Ruths (2023). Invasion of Ukraine Discourse on TikTok Dataset [Dataset]. http://doi.org/10.5281/zenodo.7534952
    Available download formats
    text/x-python, csv, bin
    Dataset updated
    May 11, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Benjamin Steel; Sara Parker; Derek Ruths
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.

    The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7534952 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

    To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or “TikToks”). We then compiled comments associated with these videos. All of the data captured is publicly available information, and it contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the web scraping PyTok library, developed by the author: https://github.com/networkdynamics/pytok.

    Due to scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of TikTok, the dataset contains videos with a range of relatedness to the invasion.

    We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

    To build this dataset from the IDs here:

    1. Go to https://github.com/networkdynamics/pytok and clone the repo locally
    2. Run pip install -e . in the pytok directory
    3. Run pip install pandas tqdm to install these libraries if not already installed
    4. Run get_videos.py to get the video data
    5. Run video_comments.py to get the comment data
    6. Run user_tiktoks.py to get the video history of the users
    7. Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
    8. Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

    If you get an error about the wrong Chrome version, run get_videos.py --chrome-version YOUR_CHROME_VERSION instead. Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.

    Please do not hesitate to make an issue in this repo to get our help with this!
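The build steps above can be chained from a small driver. The following is a minimal sketch, not part of the repository: the script names and the --chrome-version flag come from the steps above, while running every script with no other arguments from the directory that contains them is an assumption, and `build_commands` and `run_pipeline` are hypothetical helper names. The optional step 7 (hashtag_tiktoks.py or search_tiktoks.py) is left out.

```python
import subprocess
import sys

# Script names are taken from the steps above. Running them without any
# further arguments is an assumption; the listing does not show their options.
PIPELINE = [
    "get_videos.py",
    "video_comments.py",
    "user_tiktoks.py",
    "load_json_to_csv.py",
]


def build_commands(chrome_version=None):
    """Return one command (as an argument list) per pipeline step, in order."""
    commands = []
    for script in PIPELINE:
        command = [sys.executable, script]
        # The listing documents --chrome-version only for get_videos.py.
        if chrome_version is not None and script == "get_videos.py":
            command += ["--chrome-version", str(chrome_version)]
        commands.append(command)
    return commands


def run_pipeline(chrome_version=None):
    """Run each step in turn and stop at the first script that fails."""
    for command in build_commands(chrome_version):
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the directory holding the scripts would run the four steps in sequence; `check=True` stops the run when a script exits with an error, so a failed download is not silently followed by the CSV export.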

    The videos.csv will contain the following columns:

    video_id: Unique video ID

    createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    desc: The full video description from the author

    hashtags: A list of hashtags used in the video description

    share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

    share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty

    share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

    share_type: If the video is sharing another video, this is the type of the share, stitch, duet etc.

    mentions: A list of users mentioned in the video description, if any
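Since createtime is stored as a UTC string in YYYY-MM-DD HH:MM:SS format, it usually needs converting before any time-based analysis. The sketch below is one way to do that with the standard library, assuming the CSV has a header row with the column names listed above; `parse_createtime` and `read_videos` are hypothetical helper names, and the sample row is made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone

# Matches the documented YYYY-MM-DD HH:MM:SS layout.
CREATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_createtime(value):
    """Parse a createtime string and mark it explicitly as UTC."""
    return datetime.strptime(value, CREATETIME_FORMAT).replace(tzinfo=timezone.utc)


def read_videos(handle):
    """Read rows from an open videos.csv handle, converting createtime."""
    rows = []
    for row in csv.DictReader(handle):
        row["createtime"] = parse_createtime(row["createtime"])
        rows.append(row)
    return rows


# Made-up sample row; real files would be opened with open("videos.csv", newline="").
sample = io.StringIO(
    "video_id,createtime,author_name\n"
    "1,2022-03-01 12:30:00,example_user\n"
)
videos = read_videos(sample)
```

Attaching the UTC timezone up front avoids mixing naive and aware datetimes later on. The same helper would apply to the createtime column of comments.csv.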

    The comments.csv will contain the following columns:

    comment_id: Unique comment ID

    createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    text: Text of the comment

    mentions: A list of users that are tagged in the comment

    video_id: The ID of the video the comment is on

    comment_language: The language of the comment, as predicted by the TikTok API

    reply_comment_id: If the comment is replying to another comment, this is the ID of that comment

    The data can be compiled into a user interaction network to facilitate study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
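As a rough illustration of what such a network could look like, the sketch below counts directed interactions between users from the documented columns. It is not the polar-seeds code. The edge definition is an assumption: a reply points at the author of the parent comment, and a top-level comment points at the author of the video. `build_interaction_edges` is a hypothetical name and the rows are made up.

```python
from collections import Counter


def build_interaction_edges(videos, comments):
    """Count directed (commenter -> target author) interactions."""
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    comment_author = {c["comment_id"]: c["author_id"] for c in comments}

    edges = Counter()
    for comment in comments:
        parent_id = comment.get("reply_comment_id")
        if parent_id:
            # A reply targets the author of the comment it replies to.
            target = comment_author.get(parent_id)
        else:
            # A top-level comment targets the author of the video.
            target = video_author.get(comment["video_id"])
        # Skip unknown targets and self-interactions (a choice made here).
        if target is not None and target != comment["author_id"]:
            edges[(comment["author_id"], target)] += 1
    return edges


# Made-up rows using the column names documented above.
videos = [{"video_id": "v1", "author_id": "a"}]
comments = [
    {"comment_id": "c1", "author_id": "b", "video_id": "v1", "reply_comment_id": ""},
    {"comment_id": "c2", "author_id": "c", "video_id": "v1", "reply_comment_id": "c1"},
    {"comment_id": "c3", "author_id": "b", "video_id": "v1", "reply_comment_id": ""},
]
edges = build_interaction_edges(videos, comments)
```

The resulting counter maps each (source, target) pair to an edge weight, which can be written out as an edge list for a graph library or a tool such as Gephi.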

  16. Data from: TikTok dataset - Current affairs on TikTok. Virality and...

    • data.niaid.nih.gov
    • research.science.eus
    Updated Aug 28, 2022
    Cite
    Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7024884
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    University of the Basque Country (UPV/EHU)
    Authors
    Peña-Fernández, Simón; Larrondo-Ureta, Ainara; Morales-i-Gras, Jordi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, prepared for use with the Gephi network analysis software.

    Source of:

    Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

    Abstract:

    Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.

  17. TikTok - Google Play Store Review

    • kaggle.com
    zip
    Updated Nov 30, 2022
    Cite
    Shiv Kumar Ganesh (2022). TikTok - Google Play Store Review [Dataset]. https://www.kaggle.com/datasets/shivkumarganesh/tiktok-google-play-store-review
    Available download formats: zip (44829891 bytes)
    Dataset updated
    Nov 30, 2022
    Authors
    Shiv Kumar Ganesh
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    TikTok, known in China as Douyin (Chinese: 抖音; pinyin: Dǒuyīn), is a video-focused social networking service owned by Chinese company ByteDance Ltd. It hosts a variety of short-form user videos, from genres like pranks, stunts, tricks, jokes, dance, and entertainment with durations from 15 seconds to ten minutes. TikTok is an international version of Douyin, which was originally released in the Chinese market in September 2016. TikTok was launched in 2017 for iOS and Android in most markets outside of mainland China; however, it became available worldwide only after merging with another Chinese social media service, Musical.ly, on 2 August 2018.

    TikTok and Douyin have almost the same user interface but no access to each other's content. Their servers are each based in the market where the respective app is available. The two products are similar, but their features are not identical. Douyin includes an in-video search feature that can search by people's faces for more videos of them, and other features such as buying, booking hotels and making geo-tagged reviews. Since their launch in 2016, TikTok and Douyin have rapidly gained popularity in virtually all parts of the world. As of October 2020, TikTok had surpassed 2 billion mobile downloads worldwide. [Source: Wikipedia]

    This dataset belongs to the TikTok app available on the Google Play Store. The Dataset mostly has user reviews and the various comments made by the users.

    Content

    The content of the various columns is listed below. Please find the description for each column.

    userName: Name of a User
    userImage: Profile Image that a user has
    content: This represents the comments made by a user
    score: Scores/Rating between 1 to 5
    thumbsUpCount: Number of Thumbs up received by a person
    reviewCreatedVersion: Version number on which the review is created
    at: Created At
    replyContent: Reply to the comment by the Company
    repliedAt: Date and time of the above reply
    reviewId: unique identifier

    Acknowledgements

    TikTok image - TikTok

  18. TikTokData.xlsx

    • figshare.com
    xlsx
    Updated Jun 14, 2022
    Cite
    Emily Zawacki (2022). TikTokData.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.20069333.v1
    Available download formats: xlsx
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Emily Zawacki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
    We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (number of unique users the video was seen by) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is calculated as the engagement parameter as a percentage of total reach (e.g., Likes / Audience Reached *100).
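The engagement-rate formula above is simple enough to express directly. The sketch below implements it as described (engagement parameter as a percentage of total reach); the function name and the guard against a non-positive reach are additions for illustration, not something stated in the dataset description.

```python
def engagement_rate(engagement_count, audience_reached):
    """Return an engagement parameter as a percentage of total reach.

    engagement_count is any one engagement metric (likes, comments, or
    shares); audience_reached is the number of unique users reached.
    """
    if audience_reached <= 0:
        # A rate is undefined without reach; raising is a choice made here.
        raise ValueError("audience_reached must be positive")
    return engagement_count / audience_reached * 100
```

For example, a video with 50 likes and a reach of 1,000 unique users has a like-based engagement rate of 5 percent.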

  19. TikTok Influencer Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Mar 21, 2025
    Cite
    Bright Data (2025). TikTok Influencer Datasets [Dataset]. https://brightdata.com/products/datasets/tiktok/influencers
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Mar 21, 2025
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Our TikTok Influencer Dataset provides comprehensive insights into influencer profiles, audience engagement, and market impact. This dataset is ideal for brands, marketers, and researchers looking to identify top-performing influencers, analyze engagement metrics, and optimize influencer marketing strategies on TikTok.

    Key Features:
    
      Influencer Profiles: Access detailed influencer data, including profile name, bio, profile picture, and direct profile URL.
      Follower & Engagement Metrics: Track key performance indicators such as follower count, engagement rate, and interaction levels.
      Monetization Insights: Analyze influencer earnings with Gross Merchandise Value (GMV) and currency details.
      Category & Niche Segmentation: Identify influencers based on their associated product categories to match brand campaigns with relevant audiences.
      Contact Information: Retrieve available influencer email addresses for direct outreach and collaboration.
    
    
    Use Cases:
    
      Influencer Discovery & Marketing: Find high-performing TikTok influencers for brand partnerships and sponsored campaigns.
      Competitive Analysis: Compare influencer engagement rates and audience reach to optimize marketing strategies.
      Market Research & Trend Analysis: Identify emerging influencers and track content trends within different product categories.
      Performance Benchmarking: Evaluate influencer success based on GMV, engagement rate, and follower growth.
      Lead Generation & Outreach: Use available contact details to connect with influencers for collaborations and brand promotions.
    
    
    
    Our TikTok Influencer Dataset is available in multiple formats (JSON, CSV, Excel) and can be delivered via API, cloud storage (AWS, Google Cloud, Azure), or direct download. Gain valuable insights into the TikTok influencer landscape and enhance your marketing strategies with high-quality, structured data.
    
  20. Tiktok Reg Dataset

    • universe.roboflow.com
    zip
    Updated Apr 18, 2025
    Cite
    COD (2025). Tiktok Reg Dataset [Dataset]. https://universe.roboflow.com/cod-qamxh/tiktok-reg/dataset/5
    Available download formats: zip
    Dataset updated
    Apr 18, 2025
    Dataset authored and provided by
    COD
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    Tiktok Bounding Boxes
    Description

    Tiktok Reg

    ## Overview
    
    Tiktok Reg is a dataset for object detection tasks - it contains Tiktok annotations for 868 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
    
Cite
Yakhyojon (2023). TikTok User Engagement Data [Dataset]. https://www.kaggle.com/datasets/yakhyojon/tiktok

TikTok User Engagement Data

Classifying claims made in videos submitted to TikTok.

3 scholarly articles cite this dataset
Available download formats: zip (813245 bytes)
Dataset updated
Oct 18, 2023
Authors
Yakhyojon
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

TikTok is the leading destination for short-form mobile video. The platform is built to help imaginations thrive. TikTok's mission is to create a place for inclusive, joyful, and authentic content–where people can safely discover, create, and connect.

Column name (Type): Description
# (int): TikTok assigned number for video with claim/opinion.
claim_status (obj): Whether the published video has been identified as an “opinion” or a “claim.” In this dataset, an “opinion” refers to an individual’s or group’s personal belief or thought. A “claim” refers to information that is either unsourced or from an unverified source.
video_id (int): Random identifying number assigned to video upon publication on TikTok.
video_duration_sec (int): How long the published video is measured in seconds.
video_transcription_text (obj): Transcribed text of the words spoken in the published video.
verified_status (obj): Indicates the status of the TikTok user who published the video in terms of their verification, either “verified” or “not verified.”
author_ban_status (obj): Indicates the status of the TikTok user who published the video in terms of their permissions: “active,” “under scrutiny,” or “banned.”
video_view_count (float): The total number of times the published video has been viewed.
video_like_count (float): The total number of times the published video has been liked by other users.
video_share_count (float): The total number of times the published video has been shared by other users.
video_download_count (float): The total number of times the published video has been downloaded by other users.
video_comment_count (float): The total number of comments on the published video.