https://brightdata.com/license
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, creation date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
In the fourth quarter of 2024, TikTok generated around 186 million downloads from users worldwide. Initially launched in China by ByteDance as Douyin, the short-video format was popularized by TikTok, which took over the global social media environment in 2020. In the first quarter of 2020, TikTok downloads peaked at over 313.5 million worldwide, up 62.3 percent from the first quarter of 2019.
TikTok interactions: is there a magic formula for content success?
In 2024, TikTok registered an engagement rate of approximately 4.64 percent on video content hosted on its platform. In the same year, content on the app recorded over 1,100 interactions per post on average. These interactions were primarily likes; posts averaged fewer than 20 comments each in 2024.
The platform has been actively tackling fake interactions: it removed around 236 million fake likes during the first quarter of 2024. Though there is no secret formula for maximizing these metrics, video length may contribute to the success of content on TikTok.
As of the first quarter of 2024, the recommended length for videos posted by small TikTok accounts with up to 500 followers was around 2.6 minutes, while the ideal video duration for large accounts with over 50,000 followers was 7.28 minutes. By contrast, the average length of TikTok videos posted by creators in 2024 was around 43 seconds.
What's trending on TikTok Shop?
Since its launch in the United States in September 2023, TikTok Shop has become one of the most popular online shopping platforms, offering consumers a wide variety of products. In 2023, TikTok shops featuring beauty and personal care items sold over 370 million products worldwide.
TikTok shops featuring womenswear and underwear, as well as food and beverages, followed with 285 million and 138 million products sold, respectively. Similarly, in the United States, health and beauty products were the best-selling items, accounting for 85 percent of sales made via TikTok Shop during the first month after its launch. In 2023, Indonesia was the market with the largest number of TikTok Shops, hosting over 20 percent of all TikTok Shops. Thailand and Vietnam followed with 18.29 and 17.54 percent of the total shops listed on the short-video platform, respectively.
In 2023, the number of TikTok users in Malaysia was estimated at around ** million. The number was forecast to increase continuously between 2024 and 2029; based on the forecast, the number of TikTok users in Malaysia will reach **** million by 2029. User figures, shown here with regard to the platform TikTok, have been estimated by considering company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
TikTok Video Analytics Dataset
Sample TikTok video dataset with comprehensive engagement metrics and metadata. Each row represents a single TikTok video with content and detailed analytics. This is a sample dataset. To access the full version or request any custom dataset tailored to your needs, contact DataHive at contact@datahive.ai.
Files Included
train.csv: TikTok video analytics data
What's included
Video URLs and identifiers; Comprehensive engagement… See the full description on the dataset page: https://huggingface.co/datasets/datahiveai/Tiktok-Videos.
Please cite the following paper when using this dataset: N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, "A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles," arXiv [cs.CY], 2024. Available: http://arxiv.org/abs/2406.07693
Abstract
This dataset contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral, (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. The paper associated with this dataset (please see the above-mentioned citation) also presents a list of open research questions that may be investigated using this dataset.
https://brightdata.com/license
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was initially used in the paper "The use and impact of TikTok in the 2022 Brazilian presidential election". It contains data from the official TikTok accounts of the two main candidates in the 2022 Brazilian presidential election, Lula (@lulaoficial) and Bolsonaro (@bolsonaromessiasjair). It comprises 576 posts by the candidates and more than 540 million interactions on these posts. The data cover three periods of 2022: (i) pre-campaign (Jun 30 to Aug 15); (ii) 1st round campaign (Aug 16 to Oct 1); and (iii) 2nd round campaign (Oct 2 to Oct 29). It contains two files: (i) Accounts: the number of followers each candidate had, on a day-to-day basis, starting on Sept 5; and (ii) Posts and interactions: individual data and metrics for each post, including the post date, text, link to the post, and numbers of plays, likes, comments, and shares.
During the fourth quarter of 2024, approximately 20.6 million TikTok accounts were removed from the platform on suspicion of being operated by users under the age of 13. During the last measured period, around 185 million fake accounts were removed from TikTok.
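A minimal sketch of how posts might be bucketed into the three campaign periods listed above and summed per candidate. The period bounds come from the description; the per-post dictionary keys (candidate, date, plays, likes, comments, shares) are hypothetical stand-ins for the actual column names in the posts file.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Campaign periods from the dataset description (all in 2022, inclusive bounds).
PERIODS = [
    ("pre-campaign", date(2022, 6, 30), date(2022, 8, 15)),
    ("1st round", date(2022, 8, 16), date(2022, 10, 1)),
    ("2nd round", date(2022, 10, 2), date(2022, 10, 29)),
]


def period_of(post_date: date) -> Optional[str]:
    """Return the campaign period a post date falls in, or None if outside all of them."""
    for name, start, end in PERIODS:
        if start <= post_date <= end:
            return name
    return None


def interactions_by_period(posts):
    """Sum plays, likes, comments, and shares per (candidate, period).

    `posts` is an iterable of dicts with hypothetical keys:
    candidate, date (datetime.date), plays, likes, comments, shares.
    """
    totals = defaultdict(lambda: {"plays": 0, "likes": 0, "comments": 0, "shares": 0})
    for post in posts:
        period = period_of(post["date"])
        if period is None:
            continue  # skip posts outside the three documented windows
        bucket = totals[(post["candidate"], period)]
        for metric in bucket:
            bucket[metric] += post[metric]
    return dict(totals)
```

Posts dated outside the three windows are skipped rather than assigned to a fourth bucket, so the totals line up with the periods the paper analyzes.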
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users during 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.
The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok
To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or "TikToks"). We then compiled comments associated with these videos. All of the data captured is publicly available information, and contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using PyTok, a web scraping library developed by the author: https://github.com/networkdynamics/pytok.
Due to scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to TikTok's fuzzy search functionality, the dataset contains videos with a range of relatedness to the invasion.
We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.
To build this dataset from the IDs here:
1. Run pip install -e . in the pytok directory.
2. Run pip install pandas tqdm to install these libraries if they are not already installed.
3. Run get_videos.py to get the video data.
4. Run video_comments.py to get the comment data.
5. Run user_tiktoks.py to get the video history of the users.
6. Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms.
7. Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv.
If you get an error about the wrong Chrome version, pass the command-line argument: get_videos.py --chrome-version YOUR_CHROME_VERSION
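The build steps above can be chained in one script. This is a sketch only: it assumes the scripts are run from the directory that contains them and need no arguments other than the documented --chrome-version flag for get_videos.py; check the repository for each script's actual options. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List, Optional

# Step order from the README; search_tiktoks.py is the documented alternative
# to hashtag_tiktoks.py for collecting more videos.
PIPELINE = [
    "get_videos.py",
    "video_comments.py",
    "user_tiktoks.py",
    "hashtag_tiktoks.py",
    "load_json_to_csv.py",
]


def build_commands(chrome_version: Optional[str] = None) -> List[List[str]]:
    """Return one argv list per step, run with the current Python interpreter."""
    commands = []
    for script in PIPELINE:
        cmd = [sys.executable, script]
        # --chrome-version is documented only for get_videos.py.
        if chrome_version and script == "get_videos.py":
            cmd += ["--chrome-version", chrome_version]
        commands.append(cmd)
    return commands


def run_pipeline(chrome_version: Optional[str] = None, dry_run: bool = True) -> None:
    for cmd in build_commands(chrome_version):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling run_pipeline(dry_run=False) executes the steps in order; since each step can take a long time, running it on a server as recommended below still applies.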
Please note pulling data from TikTok takes a while! We recommend leaving the scripts running on a server for a while for them to finish downloading everything. Feel free to play around with the delay constants to either speed up the process or avoid TikTok rate limiting.
Please do not hesitate to open an issue in this repo if you need our help with this!
The videos.csv file will contain the following columns:
video_id: Unique video ID
createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
desc: The full video description from the author
hashtags: A list of hashtags used in the video description
share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty
share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty
share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty
share_type: If the video is sharing another video, this is the type of share (e.g., stitch or duet)
mentions: A list of users mentioned in the video description, if any
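A sketch for loading videos.csv with Python's standard library. It assumes list-valued cells (hashtags, mentions) are stored as Python-style list literals such as ['ukraine', 'news'], the usual result when a pandas column of Python lists is written with to_csv; verify this against the actual file.

```python
import ast
import csv
from collections import Counter
from datetime import datetime, timezone


def parse_list(cell: str) -> list:
    """Parse a list-valued cell such as "['a', 'b']"; empty cells become []."""
    if not cell or not cell.strip():
        return []
    value = ast.literal_eval(cell)  # evaluates Python literals only, never arbitrary code
    return list(value) if isinstance(value, (list, tuple)) else [value]


def load_videos(lines):
    """Read videos.csv rows from any iterable of lines (an open file or a list of strings)."""
    videos = []
    for row in csv.DictReader(lines):
        # createtime is documented as UTC in YYYY-MM-DD HH:MM:SS format.
        row["createtime"] = datetime.strptime(
            row["createtime"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
        row["hashtags"] = parse_list(row["hashtags"])
        row["mentions"] = parse_list(row["mentions"])
        videos.append(row)
    return videos


def share_type_counts(videos):
    """Count videos that share another video (stitch, duet, ...) by share_type."""
    return Counter(v["share_type"] for v in videos if v["share_video_id"])
```

For the real file, open it with open("videos.csv", newline="", encoding="utf-8") and pass the file object to load_videos.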
The comments.csv file will contain the following columns:
comment_id: Unique comment ID
createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
text: Text of the comment
mentions: A list of users that are tagged in the comment
video_id: The ID of the video the comment is on
comment_language: The language of the comment, as predicted by the TikTok API
reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
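Because reply_comment_id links a reply to its parent, comments can be grouped into reply threads. A minimal sketch over rows loaded as dictionaries (for example with csv.DictReader), assuming an empty reply_comment_id marks a top-level comment:

```python
from collections import defaultdict


def build_threads(comments):
    """Group comments into reply trees.

    `comments` is a list of dicts with at least comment_id and reply_comment_id
    (empty when the comment is top-level), as in comments.csv.
    Returns (roots, children): top-level comment IDs and a parent -> child-ID map.
    """
    ids = {c["comment_id"] for c in comments}
    children = defaultdict(list)
    roots = []
    for c in comments:
        parent = c["reply_comment_id"]
        if parent and parent in ids:
            children[parent].append(c["comment_id"])
        else:
            # Top-level, or a reply whose parent was not captured in the sample.
            roots.append(c["comment_id"])
    return roots, children


def thread_depth(comment_id, children):
    """Length of the longest reply chain starting at comment_id (1 for a leaf)."""
    return 1 + max(
        (thread_depth(child, children) for child in children.get(comment_id, [])),
        default=0,
    )
```

Treating replies with a missing parent as roots keeps them in the analysis, which matters here because the dataset is only a sample of the discourse.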
The data can be compiled into a user interaction network to facilitate the study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
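One possible way to build such a network, sketched here independently of the polar-seeds code (which may define edges differently): draw a directed edge from each commenter to the author of the video they commented on or, for replies, to the author of the parent comment. Edge weights count repeated interactions.

```python
from collections import Counter


def interaction_edges(videos, comments):
    """Count directed user -> user interactions as a Counter of (source, target).

    An edge (u, v) is added when user u comments on a video authored by v,
    or replies to a comment written by v. Self-interactions are skipped.
    """
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    comment_author = {c["comment_id"]: c["author_id"] for c in comments}
    edges = Counter()
    for c in comments:
        source = c["author_id"]
        parent = c.get("reply_comment_id")
        if parent:
            target = comment_author.get(parent)  # reply -> parent comment's author
        else:
            target = video_author.get(c["video_id"])  # top-level -> video's author
        if target is not None and target != source:
            edges[(source, target)] += 1
    return edges
```

Interactions whose target is not in the captured data (for example, a reply to an uncollected comment) are dropped rather than attached to an unknown node.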
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about TikTok videos, including user interactions and video details. It includes features such as video ID, username, video title, likes, comments, shares, views, and more. This dataset is useful for analyzing video performance and user engagement on TikTok.
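A common way to summarize such columns is an engagement rate. The sketch below assumes the definition (likes + comments + shares) / views, expressed as a percentage; this is one widely used convention, not necessarily the one behind the platform-level figures quoted elsewhere on this page, and the dictionary keys are hypothetical column names.

```python
def engagement_rate(likes: int, comments: int, shares: int, views: int) -> float:
    """(likes + comments + shares) / views as a percentage; 0.0 when there are no views."""
    if views <= 0:
        return 0.0
    return 100.0 * (likes + comments + shares) / views


def top_by_engagement(videos, n=3):
    """Return the IDs of the n videos with the highest engagement rate.

    `videos` is a list of dicts with hypothetical keys:
    video_id, likes, comments, shares, views.
    """
    ranked = sorted(
        videos,
        key=lambda v: engagement_rate(v["likes"], v["comments"], v["shares"], v["views"]),
        reverse=True,
    )
    return [v["video_id"] for v in ranked[:n]]
```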
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide, and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as less developed digital markets catch up with other regions in infrastructure development and the availability of cheap mobile devices. In fact, most of social media's global growth is driven by the increasing usage of mobile devices. The mobile-first market of Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America had the highest average daily time spent on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform's global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram's Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram's audience is almost twice the size of TikTok's on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Example Dataset: TikTok Scraper Tool
Start Scraping TikTok: TikTok Scraper Tool
Key Features
Instant Transcription: Turn any TikTok video into an AI-ready transcript
Metadata: Get the title, language, description, and video hashtags
URL-Based Access: Just drop in a TikTok video URL to start scraping
LLM-Ready Output: Receive clean JSON ready for agents, RAG, or AI tools
Free Tier: Use up to 100 queries during the beta period
Easy… See the full description on the dataset page: https://huggingface.co/datasets/Gopher-Lab/TikTok_Crypto_Sentiment.
The global number of Facebook users was forecast to increase continuously between 2023 and 2027 by a total of 391 million users (+14.36 percent). After the fourth consecutive year of growth, the Facebook user base is estimated to reach a new peak of 3.1 billion users in 2027. Notably, the number of Facebook users has increased continuously over the past years. User figures, shown here regarding the platform Facebook, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Unlock insights into high-performing content with this curated dataset of TikTok posts, each with over 50,000 plays. This collection surfaces the videos that resonate most with audiences, spanning creators, themes, and formats that drive virality.
Performance Threshold: Only includes posts that have exceeded 50K views, ensuring a focus on high-engagement, trend-relevant content.
Detailed Post Data: Captures video captions, play counts, likes, shares, comments, sound IDs, hashtags, and posting timestamps.
Creator Metadata: Includes usernames, follower counts, bio snippets, and profile metrics to support creator analysis.
Engagement Benchmarking: Useful for identifying viral content, measuring campaign performance, and refining creative strategies.
Trend Analysis Ready: Track how themes, hashtags, or sounds perform at scale within and across verticals.
Structured for Scale: Delivered in clean CSV format, via API, or in a custom format, ready for integration into analytics tools, dashboards, or model training environments.
This dataset is designed for marketers, agencies, analysts, and researchers looking to decode the mechanics of virality, identify top-performing content, and inform influencer strategy on TikTok. Whether you're building recommendation engines or planning your next campaign, this dataset offers a high-signal view into TikTok's most impactful content.
The global social media penetration rate in was forecast to increase continuously between 2024 and 2028 by a total of 11.6 percentage points (+18.19 percent). After the ninth consecutive year of growth, the penetration rate is estimated to reach a new peak of 75.31 percent in 2028. Notably, the social media penetration rate of was continuously increasing over the past years.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Example Dataset: TikTok Scraper Tool
Start Scraping TikTok: TikTok Scraper Tool
Key Features
Instant Transcription: Turn any TikTok video into an AI-ready transcript
Metadata: Get the title, language, description, and video hashtags
URL-Based Access: Just drop in a TikTok video URL to start scraping
LLM-Ready Output: Receive clean JSON ready for agents, RAG, or AI tools
Free Tier: Use up to 100 queries during the beta period
Easy… See the full description on the dataset page: https://huggingface.co/datasets/Gopher-Lab/Tiktok_Chatgpt_Prompt_Guide.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, "Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis", Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)
Abstract
The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.
For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.
The Instagram posts in this dataset are present in 161 different languages, of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), and Turkish (4632 posts).
There are 535,021 distinct hashtags in this dataset, with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), and #coronavirusoutbreak (34567 posts).
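Frequency counts like these can be approximated with a regular expression over the post descriptions. This sketch assumes a hashtag is '#' followed by word characters, lowercases tags, and counts each tag at most once per post (to match the "N posts" phrasing); the authors' exact tokenization is not stated, so results may differ.

```python
import re
from collections import Counter

# '#' followed by word characters; for str patterns, \w is Unicode-aware in Python,
# so hashtags in non-Latin scripts are matched too.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(descriptions, n=10):
    """Return (the n most frequent hashtags with post counts, number of distinct hashtags)."""
    counts = Counter()
    for text in descriptions:
        # A set, so a tag repeated within one post is counted once for that post.
        tags = {"#" + tag.lower() for tag in HASHTAG.findall(text)}
        counts.update(tags)
    return counts.most_common(n), len(counts)
```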
The following is a description of the attributes present in this dataset:
Post ID: Unique ID of each Instagram post
Post Description: Complete description of each post in the language in which it was originally published
Date: Date of publication in MM/DD/YYYY format
Language code: Language code (for example: "en") that represents the language of the post as detected using the Google Translate API
Full Language: Full form of the language (for example: "English") that represents the language of the post as detected using the Google Translate API
Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
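A sketch for tallying sentiment labels by language and year, which bears on questions such as how sentiment evolved from 2020 onward. It parses Date using the stated MM/DD/YYYY format; the column names mirror the attribute list above and may not match the file's actual headers.

```python
import csv
from collections import Counter, defaultdict
from datetime import datetime


def sentiment_by_language_and_year(rows):
    """Tally sentiment labels per (language code, year).

    Column names ("Date", "Language code", "Sentiment") follow the attribute
    list; the actual CSV headers may differ.
    """
    table = defaultdict(Counter)
    for row in rows:
        year = datetime.strptime(row["Date"], "%m/%d/%Y").year  # MM/DD/YYYY
        label = row["Sentiment"].strip().lower()
        table[(row["Language code"], year)][label] += 1
    return table


def load_rows(path):
    """Load the dataset CSV as a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```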
Open Research Questions
This dataset is expected to be helpful for the investigation of the following research questions and even beyond:
How does sentiment toward COVID-19 vary across different languages?
How has public sentiment toward COVID-19 evolved from 2020 to the present?
How do cultural differences affect social media discourse about COVID-19 across various languages?
How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
How effective were public health campaigns in shifting public sentiment in different languages?
What patterns of vaccine hesitancy or support are present in different languages?
How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?
All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).
In 2024, Google ranked as the most valuable media and entertainment brand worldwide, with a brand value of 683 billion U.S. dollars. Facebook ranked second, valued at around 167 billion dollars. Part of the Tencent Group, WeChat and v.qq.com (Tencent Video) had a brand value of 56 billion and 17.5 billion dollars, respectively.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The goal of the study is to explore how social media users think in moral and ethical terms about their online participation when they talk about TikTok. Relatively little research has focused on moral and ethical reasoning in the use of social media and no study to date has provided the opportunity to voice a user's own experience with moral issues as they perceive them through their use of TikTok. A thematic analysis of 40 in-depth interviews is applied to explore how young users define the "good" and what significance they attribute to moral principles.