Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regional TikTok user statistics differ significantly. Each major region has also experienced growth at different times.
Launched in 2016, TikTok rose to be one of the most popular social apps and video platforms for global users. In 2021, TikTok had approximately 656 million global users. This figure was projected to increase by around 15 percent year-over-year, reaching 755 million users in 2022. TikTok global installs peaked at the end of 2019, with the app amassing over 318 million downloads. During 2020 and 2021, TikTok downloads grew more slowly, with 173 million downloads from users worldwide during the last quarter of 2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files contain the same dataset in different formats. The SPSS output presents the analysis of the differences between Instagram and TikTok users.
https://brightdata.com/license
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, create date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
As of February 2025, it was found that around **** percent of TikTok's global audience were women between the ages of 18 and 24 years, while male users of the same age formed approximately **** percent of the platform's audience. The online audience of the popular social video platform was further composed of **** percent of female users aged between 25 and 34 years, and **** percent of male users in the same age group.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This study focuses on a unique social media user migration phenomenon: a large number of U.S. users shifted to another Chinese social platform, Xiaohongshu, against the backdrop of the U.S. government's push to ban TikTok. By constructing a multidimensional analysis framework, this study systematically analyzes 5,919 user reviews collected during January 2025. The study uses the MediaCrawler crawler to collect data and TextBlob for sentiment analysis, and combines geographic distribution, time trend and text theme analysis methods to explore this user migration pattern in depth. The study finds that despite policy pressure, users have a neutral to positive attitude towards platform migration, with 59.6% of comments neutral and 32.7% positive. The analysis of geographic distribution shows that 88.7% of users in the United States have a significant “digital backlash”. Temporal trend analysis reveals the “bimodal” character of user discussions, reflecting the dynamic change of policy impact and users' continuous attention. Text analysis further shows that users are more concerned about the functional experience of the platform than political factors, reflecting rationality beyond geopolitics. These findings provide new perspectives for understanding social media user behavior in the context of globalization, and have important implications for social media policymaking and platform operation. The study suggests that in the digital era, administrative means have limited influence on users' platform choices, and users' social needs and behavioral choices often transcend geopolitical constraints.
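As a rough illustration of how per-comment polarity scores can be turned into neutral, positive, and negative shares like those reported above, here is a minimal Python sketch. It assumes each review has already been scored with a polarity in [-1, 1] (the range of TextBlob's sentiment polarity). The 0.1 cut-off, the function names, and the sample scores are hypothetical and are not taken from the study.

```python
from collections import Counter

def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse sentiment label.

    The threshold is an assumed cut-off, not the one used in the study.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(polarities, threshold=0.1):
    """Return the percentage of comments that fall under each label."""
    labels = ("negative", "neutral", "positive")
    total = len(polarities)
    if total == 0:
        return {label: 0.0 for label in labels}
    counts = Counter(label_polarity(p, threshold) for p in polarities)
    return {label: 100.0 * counts[label] / total for label in labels}

# Hypothetical polarity scores for five reviews.
example_scores = [0.0, 0.45, -0.3, 0.05, 0.8]
shares = sentiment_shares(example_scores)
```

Changing the threshold shifts comments between the neutral and the polar classes, so any comparison with the published percentages would depend on the cut-off the authors actually used.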
https://brightdata.com/license
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.
The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok
To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or “TikToks”). We then compiled comments associated with these videos. All of the data captured is publicly available information, and contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the web scraping PyTok library, developed by the author: https://github.com/networkdynamics/pytok.
Due to scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of TikTok, the dataset contains videos with a range of relatedness to the invasion.
We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.
To build this dataset from the IDs here:
Run pip install -e . in the pytok directory
Run pip install pandas tqdm to install these libraries if not already installed
Run get_videos.py to get the video data
Run video_comments.py to get the comment data
Run user_tiktoks.py to get the video history of the users
Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv
If you get an error about the wrong chrome version, use the command line argument get_videos.py --chrome-version YOUR_CHROME_VERSION
Please note pulling data from TikTok takes a while! We recommend leaving the scripts running on a server for a while for them to finish downloading everything. Feel free to play around with the delay constants to either speed up the process or avoid TikTok rate limiting.
Please do not hesitate to make an issue in this repo to get our help with this!
The videos.csv will contain the following columns:
video_id: Unique video ID
createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
desc: The full video description from the author
hashtags: A list of hashtags used in the video description
share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty
share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty
share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty
share_type: If the video is sharing another video, this is the type of the share (stitch, duet, etc.)
mentions: A list of users mentioned in the video description, if any
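The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset's own scripts: it parses createtime with the documented YYYY-MM-DD HH:MM:SS format, counts videos per calendar month, and counts how many videos share another video. The sample rows and the summarize_videos helper are hypothetical, and only a subset of the columns is shown.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample in the videos.csv layout (subset of columns).
SAMPLE_VIDEOS_CSV = """video_id,createtime,author_name,author_id,share_video_id,share_type
101,2022-02-24 08:15:00,user_a,1,,
102,2022-02-25 19:40:12,user_b,2,101,duet
103,2022-03-01 11:05:30,user_a,1,,
"""

def summarize_videos(csv_file):
    """Count videos per calendar month and how many share another video."""
    per_month = Counter()
    shared = 0
    for row in csv.DictReader(csv_file):
        created = datetime.strptime(row["createtime"], "%Y-%m-%d %H:%M:%S")
        per_month[created.strftime("%Y-%m")] += 1
        if row["share_video_id"]:  # empty when the video is not a share
            shared += 1
    return per_month, shared

per_month, shared = summarize_videos(io.StringIO(SAMPLE_VIDEOS_CSV))
```

With the real file, the same function could be called on open("videos.csv", newline="", encoding="utf-8"), assuming the header row uses the column names listed above.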
The comments.csv will contain the following columns:
comment_id: Unique comment ID
createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
text: Text of the comment
mentions: A list of users that are tagged in the comment
video_id: The ID of the video the comment is on
comment_language: The language of the comment, as predicted by the TikTok API
reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
The data can be compiled into a user interaction network to facilitate study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
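One plausible way to derive such a network is sketched below. This is not the polar-seeds code: it assumes a directed edge from each commenter to the author of the video they commented on, weighted by the number of comments, and the build_interaction_edges helper and sample rows are hypothetical. Only the column names video_id and author_id come from the CSV descriptions above.

```python
from collections import Counter

def build_interaction_edges(videos, comments):
    """Return weighted directed edges (commenter_id, video_author_id) -> count."""
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    edges = Counter()
    for c in comments:
        target = video_author.get(c["video_id"])
        # Skip comments on unknown videos and comments on one's own video.
        if target is None or target == c["author_id"]:
            continue
        edges[(c["author_id"], target)] += 1
    return edges

# Hypothetical rows using column names from videos.csv and comments.csv.
videos = [
    {"video_id": "101", "author_id": "1"},
    {"video_id": "102", "author_id": "2"},
]
comments = [
    {"comment_id": "c1", "author_id": "3", "video_id": "101"},
    {"comment_id": "c2", "author_id": "3", "video_id": "101"},
    {"comment_id": "c3", "author_id": "1", "video_id": "102"},
    {"comment_id": "c4", "author_id": "2", "video_id": "102"},
]
edges = build_interaction_edges(videos, comments)
```

Reply edges (via reply_comment_id) or mention edges could be added in the same way; which edge definition is appropriate depends on the interaction being studied.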
In 2023, the number of TikTok users in Malaysia was estimated to reach around ** million. The number was forecast to continuously increase between 2024 and 2029. Based on the forecast, the number of TikTok users in Malaysia will reach **** million by 2029. User figures, shown here with regard to the platform TikTok, have been estimated by considering company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
This dataset was initially used in the paper "The use and impact of TikTok in the 2022 Brazilian presidential election". It contains data from the official TikTok accounts of the two main candidates running in the 2022 Brazilian presidential election, Lula (@lulaoficial) and Bolsonaro (@bolsonaromessiasjair). A total of 576 posts by the candidates and more than 540 million interactions with these posts were collected. The data encompass three periods of 2022: (i) Pre-campaign (Jun 30 to Aug 15); (ii) 1st round campaign (Aug 16 to Oct 1); and (iii) 2nd round campaign (Oct 2 - Oct 29). It contains two files. (i) Accounts: How many followers the candidate has, on a day-to-day basis, starting on Sept 5; and (ii) Posts and interactions: Individual data and metrics of each post, including date of the post, text, link for the post, number of plays, likes, comments and shares.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TikTok users have the ability to submit reports that identify videos and comments that contain user claims. On a social media platform like TikTok, reporting a claim typically refers to the feature that allows users to report content that they believe violates the platform's community guidelines or terms of service. When a user reports a claim on a video, they are flagging the content for review by the platform's content moderation team. The team then assesses the reported content to determine whether it indeed violates the guidelines, and if so, they may take actions such as removing the content, issuing a warning to the user who posted it, or even suspending or banning the account of the user who posted the video. Reporting a claim is an important tool for maintaining a safe and respectful environment on social media platforms.
However, this process generates a large number of reports that are challenging to consider in a timely manner. Therefore, TikTok is working on the development of a predictive model that can determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently.
The TikTok data team is developing a machine learning model for classifying claims made in videos submitted to the platform.
The target variable:
The data dictionary shows that there is a column called claim_status. This is a binary value that indicates whether a video is a claim or an opinion. This is the target variable. In other words, for each video, the model should predict whether the video is a claim or an opinion. This is a classification task because the model is predicting a binary class.
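To determine which evaluation metric might be best, consider how the model might be wrong. There are two possibilities for bad predictions:
False positives: the model predicts a video is a claim when in fact it is an opinion.
False negatives: the model predicts a video is an opinion when in fact it is a claim.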
In the given scenario, it's better for the model to predict false positives when it makes a mistake, and worse for it to predict false negatives. It is very important to identify videos that break the terms of service, even if that means some opinion videos are misclassified as claims. The worst case for an opinion misclassified as a claim is that the video goes to human review. The worst case for a claim that is misclassified as an opinion is that the video does not get reviewed and it violates the terms of service.
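Because false negatives are described as the costlier error, recall on the claim class is a natural metric to watch; the description does not name the metric explicitly, so treat that choice as an assumption. The sketch below computes precision and recall from label lists. The precision_recall helper and the sample labels are hypothetical.

```python
def precision_recall(y_true, y_pred, positive="claim"):
    """Compute precision and recall for the positive class from two label lists."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # claim correctly flagged
        elif pred == positive and truth != positive:
            fp += 1  # opinion misclassified as claim (goes to human review)
        elif pred != positive and truth == positive:
            fn += 1  # claim misclassified as opinion (never reviewed)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels: 4 true claims followed by 4 true opinions.
y_true = ["claim"] * 4 + ["opinion"] * 4
y_pred = ["claim", "claim", "claim", "opinion",
          "claim", "claim", "opinion", "opinion"]
precision, recall = precision_recall(y_true, y_pred)
```

In this example the model misses one of four claims (recall 3/4) while sending two opinion videos to review (precision 3/5); under the stated priorities, raising recall would matter more than raising precision.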
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (number of unique users the video was seen by) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is calculated as the engagement parameter as a percentage of total reach (e.g., Likes / Audience Reached *100).
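The engagement-rate formula above can be written out as a short Python sketch. The engagement_rate helper and the sample numbers are hypothetical; only the formula (engagement count as a percentage of audience reached) comes from the description.

```python
def engagement_rate(engagement_count, audience_reached):
    """Return an engagement parameter as a percentage of total reach."""
    if audience_reached <= 0:
        raise ValueError("audience_reached must be positive")
    return 100.0 * engagement_count / audience_reached

# Hypothetical metrics for one video.
video = {"likes": 150, "comments": 20, "shares": 30, "reach": 2000}

like_rate = engagement_rate(video["likes"], video["reach"])
comment_rate = engagement_rate(video["comments"], video["reach"])
share_rate = engagement_rate(video["shares"], video["reach"])
```

Computing the rate per engagement type, as here, keeps likes, comments, and shares comparable across videos with very different reach.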
https://cdla.io/sharing-1-0/
Context: This dataset offers insights into the usage patterns of social media apps for 1,000 users across seven popular platforms: Facebook, Instagram, Twitter, Snapchat, TikTok, LinkedIn, and Pinterest. It tracks various metrics such as daily time spent on the app, number of posts made, likes received, and new followers gained.
Dataset Features:
User_ID: Unique identifier for each user.
App: The social media platform being used.
Daily_Minutes_Spent: Total time a user spends on the app each day, ranging from 5 to 500 minutes.
Posts_Per_Day: Number of posts a user creates per day, ranging from 0 to 20.
Likes_Per_Day: Total number of likes a user receives on their posts each day, ranging from 0 to 200.
Follows_Per_Day: The number of new followers a user gains daily, ranging from 0 to 50.
Context & Use Cases: This dataset could be particularly useful for social media analysts, digital marketers, or researchers interested in understanding user engagement trends across different platforms. It provides insights into how much time users spend, how actively they post, and the level of engagement they receive (in terms of likes and followers).
Conclusion & Outcome: Analyzing this dataset could yield several outcomes:
Engagement Patterns: Identifying which platforms have higher engagement in terms of time spent or likes received.
Active Users: Determining which users are the most active across various platforms based on the number of posts and followers gained.
User Retention: Studying the correlation between time spent and follower growth, providing insight into user retention strategies for different platforms.
Overall, the dataset allows for exploration of social media usage trends and helps drive decision-making for marketing strategies, content creation, and platform engagement.
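Two of the analyses mentioned above (time spent per platform, and the correlation between time spent and follower growth) can be sketched with the Python standard library. The rows below are hypothetical and only reuse the column names from the feature list; the helper names are illustrative.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical rows using the dataset's column names.
rows = [
    {"User_ID": 1, "App": "TikTok", "Daily_Minutes_Spent": 120, "Follows_Per_Day": 10},
    {"User_ID": 2, "App": "TikTok", "Daily_Minutes_Spent": 60, "Follows_Per_Day": 4},
    {"User_ID": 3, "App": "Instagram", "Daily_Minutes_Spent": 90, "Follows_Per_Day": 7},
    {"User_ID": 4, "App": "Instagram", "Daily_Minutes_Spent": 30, "Follows_Per_Day": 1},
]

def mean_minutes_by_app(rows):
    """Average Daily_Minutes_Spent for each platform."""
    minutes = defaultdict(list)
    for row in rows:
        minutes[row["App"]].append(row["Daily_Minutes_Spent"])
    return {app: sum(values) / len(values) for app, values in minutes.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)

means = mean_minutes_by_app(rows)
r = pearson([row["Daily_Minutes_Spent"] for row in rows],
            [row["Follows_Per_Day"] for row in rows])
```

A correlation computed this way describes association only; it does not by itself show that more time spent causes follower growth.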
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Top 1000 TikTok Influencers Ranking’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/prasertk/top-1000-tiktok-influencers-ranking on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Find the top TikTok accounts.
Data source: https://hypeauditor.com/top-tiktok/
--- Original source retains full ownership of the source dataset ---
Unlock insights into high-performing content with this curated dataset of TikTok posts, each with over 50,000 plays. This collection surfaces the videos that resonate most with audiences—spanning creators, themes, and formats that drive virality.
📈 Performance Threshold: Only includes posts that have exceeded 50K views, ensuring a focus on high-engagement, trend-relevant content.
📱 Detailed Post Data: Captures video captions, play counts, likes, shares, comments, sound IDs, hashtags, and posting timestamps.
👤 Creator Metadata: Includes usernames, follower counts, bio snippets, and profile metrics to support creator analysis.
📊 Engagement Benchmarking: Useful for identifying viral content, measuring campaign performance, and refining creative strategies.
⚡ Trend Analysis Ready: Track how themes, hashtags, or sounds perform at scale within and across verticals.
🚀 Structured for Scale: Delivered in clean CSV format, via API, or in a custom format, ready for integration into analytics tools, dashboards, or model training environments.
This dataset is designed for marketers, agencies, analysts, and researchers looking to decode the mechanics of virality, identify top-performing content, and inform influencer strategy on TikTok. Whether you're building recommendation engines or planning your next campaign, this dataset offers a high-signal view into TikTok's most impactful content.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset explores the relationship between digital behavior and mental well-being among 100,000 individuals. It records how much time people spend on screens, how they use social media (including TikTok), and how these habits may influence their sleep, stress, and mood levels.
It includes six numerical features, all clean and ready for analysis, making it ideal for machine learning tasks like regression or classification. The data enables researchers and analysts to investigate how modern digital lifestyles may impact mental health indicators in measurable ways.
https://brightdata.com/license
Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.
Dataset Features
User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.
Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.
Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.
Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.
Customizable Subsets for Specific Needs
Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.
Popular Use Cases
Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.
Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.
Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.
Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.
AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.
Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TikTok has risen through the ranks to become the 5th most popular social media network worldwide.
https://creativecommons.org/publicdomain/zero/1.0/
These reviews are from Apple App Store
This dataset should paint a good picture on what is the public's perception of the apps over the years. Using this dataset, we can do the following
(AND MANY MORE!)
Images generated using Bing Image Generator
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The dataset is organized into three distinct folders, each corresponding to a chapter of the research project that explores gendered and algorithmic representations on three popular Chinese social media platforms: Rednote (Chapter 3), Kuaishou (Chapter 4), and Douyin (Chapter 5). This research investigates how platform affordances shape the visibility of users and influence the performance of gendered subjectivities and socio-economic influence. Each folder contains different types of data collected for the respective platform:
Rednote (Chapter 3) – This folder includes a collection of screenshots from 26 users' posts, comments left by viewers, 12 interview fieldnotes from participants who actively engage with the platform, and fieldnotes from 7 key informants. The creators were selected using digital snowball sampling, and their feedback provides insights into the algorithmic visibility of femininities and the governance of content on Rednote.
Kuaishou (Chapter 4) – The Kuaishou folder consists of screenshots of 40 user posts, viewers' comments, and interview fieldnotes from a group of selected Kuaishou influencers, users, and management officials. Kuaishou is a platform known for its emphasis on short video content, and the dataset here reflects the intersection of local cultures, rural narratives, and algorithmic shaping of visibility and influence. Interviews with users provide nuanced perspectives on how they navigate the platform’s dynamics and align their content with the platform’s affordances.
Douyin (Chapter 5) – This section contains the data for Douyin, also known as the Chinese counterpart of TikTok. The data includes screenshots from 22 creators and user comments on Douyin. The study focuses on how Douyin shapes gender performances and the ways users leverage its features to build online communities and visibility.