42 datasets found

TikTok Datasets
brightdata.com
.json, .csv, .xlsx
Updated Sep 9, 2022
Bright Data (2022). TikTok Datasets [Dataset]. https://brightdata.com/products/datasets/tiktok
Available download formats: .json, .csv, .xlsx
Dataset updated
Sep 9, 2022
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Use our TikTok profiles dataset to extract business and non-business information from complete public profiles and filter by account name, followers, creation date, or engagement score. You may purchase the entire dataset or a customized subset depending on your needs. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The TikTok dataset includes all major data points: timestamp, account name, nickname, bio, average engagement score, creation date, is_verified, likes, followers, external link in bio, and more. Get your TikTok dataset today!
TikTok Dataset Dataset
paperswithcode.com
Updated Jul 22, 2024
Yasamin Jafarian; Hyun Soo Park (2024). TikTok Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/tiktok-dataset
Dataset updated
Jul 22, 2024
Authors
Yasamin Jafarian; Hyun Soo Park
Description
We learn high-fidelity human depths by leveraging a collection of social media dance videos scraped from the TikTok mobile social networking application. TikTok is by far one of the most popular video-sharing applications across generations, and it hosts short videos (10-15 seconds) of diverse dance challenges. We manually select more than 300 dance videos, each capturing a single person performing dance moves, from monthly TikTok dance challenge compilations, covering a variety of dance types with moderate movements that do not generate excessive motion blur. For each video, we extract RGB images at 30 frames per second, resulting in more than 100K images. We segment these images using the Removebg application and compute UV coordinates with DensePose.
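As a rough consistency check on these figures, the frame count is simply videos × frames per second × seconds per clip. The Python sketch below assumes exactly 300 videos and uniform clip lengths at the 10 and 15 second extremes; real clip lengths vary, so this only brackets the total.

```python
# Rough consistency check of the frame count described above.
# Assumptions: exactly 300 videos (the stated lower bound), every clip
# lasting the same length, and every frame kept at 30 fps.
NUM_VIDEOS = 300
FPS = 30
MIN_SECONDS, MAX_SECONDS = 10, 15


def frame_count(videos: int, fps: int, seconds: int) -> int:
    """Total frames extracted from `videos` clips of equal length."""
    return videos * fps * seconds


low = frame_count(NUM_VIDEOS, FPS, MIN_SECONDS)
high = frame_count(NUM_VIDEOS, FPS, MAX_SECONDS)
print(f"{low:,} to {high:,} frames")  # 90,000 to 135,000 frames
```

The resulting range of 90,000 to 135,000 frames is consistent with the "more than 100K images" reported.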

Download TikTok Dataset:

Please use the dataset for research purposes only.

The dataset can be viewed and downloaded from the Kaggle page (you need a Kaggle account to download the data; it is free!).

The dataset can also be downloaded directly (42 GB). Frame resolution: 1080 x 604.

The original YouTube videos corresponding to each sequence, along with the dance names, can also be downloaded (2.6 GB).
Data from: News on TikTok: An Annotated Dataset of TikTok Videos from...
search.gesis.org
datacatalogue.cessda.eu
Updated Jan 8, 2025
Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan (2025). News on TikTok: An Annotated Dataset of TikTok Videos from German-Speaking News Outlets in 2023 [Dataset]. http://doi.org/10.7802/2863
Unique identifier
https://doi.org/10.7802/2863
Dataset updated
Jan 8, 2025
Dataset provided by
GESIS, Köln
GESIS search
Authors
Wedel, Lion; Mayer, Anna-Theresa; Batzner, Jan; Hendrickx, Jonathan
License
https://www.gesis.org/en/institute/data-usage-terms
Description
TikTok is developing into a key platform for news, advertising, politics, online shopping, and entertainment in Germany, with over 20 million monthly users. Especially among young people, TikTok plays an increasing role in their information environment. We provide a human-coded dataset of over 4,000 TikTok videos from German-speaking news outlets from 2023. The coding includes descriptive variables of the videos (e.g., visual style, text overlays, and audio presence) and theory-derived concepts from the journalism sciences (e.g., news values).

This dataset consists of every second video published in 2023 by major news outlets active on TikTok from Germany, Austria, and Switzerland. The data collection was facilitated with the official TikTok API in January 2024. The manual coding took place between September 2024 and December 2024. For a detailed description of the data collection, validation, annotation and descriptive analysis, please refer to [Forthcoming dataset paper publication].
Data from: TikTok dataset - Current affairs on TikTok. Virality and...
zenodo.org
research.science.eus
Updated Aug 28, 2022
Simón Peña-Fernández; Ainara Larrondo-Ureta; Jordi Morales-i-Gras (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. http://doi.org/10.5281/zenodo.7024885
Unique identifier
https://doi.org/10.5281/zenodo.7024885
Dataset updated
Aug 28, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Simón Peña-Fernández; Ainara Larrondo-Ureta; Jordi Morales-i-Gras
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built with the Gephi network analysis software.
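The distinction between unique and weighted links can be illustrated with a small Python sketch. It assumes (the description does not say) that the graph is undirected, that each observed co-occurrence of two labels adds one unit of weight, and that the weighted total is the sum of edge weights; the label pairs are invented. The authors built their graph in Gephi, not with this code.

```python
from collections import Counter


def build_weighted_graph(pairs):
    """Collapse repeated label pairs into weighted, undirected edges.

    Each unordered pair becomes one unique link; its weight counts how many
    times that pair was observed.
    """
    edges = Counter()
    for a, b in pairs:
        edges[tuple(sorted((a, b)))] += 1
    return edges


# Invented label co-occurrences; ("news", "ukraine") appears twice.
pairs = [("news", "ukraine"), ("ukraine", "news"), ("news", "fyp"), ("fyp", "viral")]
edges = build_weighted_graph(pairs)
nodes = {label for edge in edges for label in edge}
print(len(nodes), len(edges), sum(edges.values()))  # 4 3 4
```

Here 3 unique links carry a total weight of 4, in the same way that 318,986 unique links can stand for up to 790,599 weighted links.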

Source of:

Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

Abstract:

Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.
Dataset for the Instagram and TikTok problematic use
data.niaid.nih.gov
Updated Jul 19, 2023
Limniou, Maria (2023). Dataset for the Instagram and TikTok problematic use [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8159159
Dataset updated
Jul 19, 2023
Dataset provided by
Limniou, Maria
Hendrikse, Calanthe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.
from TikTok Dataset
cubig.ai
Updated Jun 12, 2025
CUBIG (2025). from TikTok Dataset [Dataset]. https://cubig.ai/store/products/457/from-tiktok-dataset
Dataset updated
Jun 12, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction • The Dataset from TikTok contains 19,382 user reports flagging videos or comments as containing a "claim", along with video length, transcription text, account status, and engagement indicators. It is suitable for analyzing reporting reasons and viewer reactions by content.

2) Data Utilization (1) Characteristics of the Dataset from TikTok: • The dataset consists of 12 columns, providing both the reported content type and the video's engagement metadata. (2) The Dataset from TikTok can be used for: • Claim classification model development: given video transcription text, engagement indicators such as views, likes, shares, and comments, and account verification and sanction information, a machine learning classification model can automatically determine whether content contains a "claim." • Optimizing moderation tasks: prioritizing reports automatically based on the classification model's predictions speeds up report processing and reduces the moderation burden by surfacing the content that reviewers most urgently need to examine.
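The report-prioritization idea can be sketched in a few lines of Python. The function name, the 0.5 threshold, and the report IDs and scores are all hypothetical; the scores stand in for claim probabilities produced by whatever classifier is trained on this data.

```python
def prioritize_reports(predictions: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return report IDs whose predicted claim probability meets `threshold`,
    ordered so the most likely claims are reviewed first."""
    flagged = [(prob, report_id) for report_id, prob in predictions.items() if prob >= threshold]
    flagged.sort(key=lambda item: item[0], reverse=True)
    return [report_id for _, report_id in flagged]


# Hypothetical model outputs for four reported videos.
scores = {"r1": 0.35, "r2": 0.92, "r3": 0.71, "r4": 0.50}
print(prioritize_reports(scores))  # ['r2', 'r3', 'r4']
```

Reports below the threshold are not discarded in practice; they would simply wait behind the higher-priority queue.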
Dataset for The use and impact of TikTok in the 2022 Brazilian presidential...
search.dataone.org
Updated Nov 8, 2023
Lima, Juliana; Santana, Maria; Correa, Andreiwid; Brito, Kellyton (2023). Dataset for The use and impact of TikTok in the 2022 Brazilian presidential election [Dataset]. http://doi.org/10.7910/DVN/9L7LEI
Unique identifier
https://doi.org/10.7910/DVN/9L7LEI
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Lima, Juliana; Santana, Maria; Correa, Andreiwid; Brito, Kellyton
Area covered
Brazil
Description
This dataset was initially used in the paper "The use and impact of TikTok in the 2022 Brazilian presidential election". It contains data from the official TikTok accounts of the two main candidates in the 2022 Brazilian presidential election, Lula (@lulaoficial) and Bolsonaro (@bolsonaromessiasjair). It comprises 576 posts by the candidates and more than 540 million interactions on these posts. The data cover three periods of 2022: (i) pre-campaign (Jun 30 to Aug 15); (ii) 1st-round campaign (Aug 16 to Oct 1); and (iii) 2nd-round campaign (Oct 2 to Oct 29). It contains two files: (i) Accounts: each candidate's follower count on a day-to-day basis, starting on Sept 5; and (ii) Posts and interactions: individual data and metrics for each post, including the post date, text, link to the post, and numbers of plays, likes, comments, and shares.
‘Top 1000 TikTok Influencers Ranking’ analyzed by Analyst-2
analyst-2.ai
Updated Dec 2, 2016
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘Top 1000 TikTok Influencers Ranking’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-top-1000-tiktok-influencers-ranking-7722/2169259b/?iid=002-219&v=presentation
Dataset updated
Dec 2, 2016
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Top 1000 TikTok Influencers Ranking’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/prasertk/top-1000-tiktok-influencers-ranking on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

Find the top TikTok accounts.

Content

What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

Acknowledgements

Data source: https://hypeauditor.com/top-tiktok/

--- Original source retains full ownership of the source dataset ---
Number of TikTok users in Malaysia 2018-2029
statista.com
Updated Jun 27, 2025
Statista (2025). Number of TikTok users in Malaysia 2018-2029 [Dataset]. https://www.statista.com/forecasts/1380739/tiktok-users-in-malaysia
Dataset updated
Jun 27, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Malaysia
Description
In 2023, the number of TikTok users in Malaysia was estimated to reach around ** million. The number was forecast to increase continuously between 2024 and 2029. Based on the forecast, the number of TikTok users in Malaysia will reach **** million by 2029. User figures, shown here with regard to the platform TikTok, have been estimated by considering company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
The Invasion of Ukraine Viewed through TikTok: A Dataset
zenodo.org
Updated May 13, 2023
Benjamin Steel; Sara Parker; Derek Ruths (2023). The Invasion of Ukraine Viewed through TikTok: A Dataset [Dataset]. http://doi.org/10.5281/zenodo.7926959
Available download formats: text/x-python, bin, csv
Unique identifier
https://doi.org/10.5281/zenodo.7926959
Dataset updated
May 13, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Benjamin Steel; Sara Parker; Derek Ruths
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ukraine
Description
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users during 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok and of the nature of social interaction on the platform around a major political event.

The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or "TikToks"). We then compiled the comments associated with these videos. All of the data captured is publicly available information and contains personally identifiable information. In total, we collected approximately 16 thousand videos and 12 million comments from approximately 6 million users. On average, there are approximately 1.9 comments per user captured and 1.5 videos per user who posted a video. The author collected this data personally using the PyTok web scraping library, which the author developed: https://github.com/networkdynamics/pytok.

Because scraping takes a long time, this is only a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Because of TikTok's fuzzy search functionality, the dataset contains videos with a range of relatedness to the invasion.

We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

To build this dataset from the IDs here:

Go to https://github.com/networkdynamics/pytok and clone the repo locally

Run pip install -e . in the pytok directory

Run pip install pandas tqdm to install these libraries if not already installed

Run get_videos.py to get the video data

Run video_comments.py to get the comment data

Run user_tiktoks.py to get the video history of the users

Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms

Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

If you get an error about the wrong Chrome version, pass the command-line argument: get_videos.py --chrome-version YOUR_CHROME_VERSION. Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.

Please do not hesitate to make an issue in this repo to get our help with this!

The videos.csv will contain the following columns:

video_id: Unique video ID

createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

author_name: Unique author name

author_id: Unique author ID

desc: The full video description from the author

hashtags: A list of hashtags used in the video description

share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty

share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

share_type: If the video is sharing another video, this is the type of share (e.g., stitch or duet)

mentions: A list of users mentioned in the video description, if any
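A minimal Python sketch for loading videos.csv with the standard library, parsing createtime as UTC and flagging shared videos by a non-empty share_video_id. The sample rows are invented and trimmed to a few of the columns listed above; the sketch also assumes empty cells are written as empty strings, which may not match the exported files exactly.

```python
import csv
import io
from datetime import datetime, timezone


def load_videos(fileobj):
    """Read rows from a videos.csv-style file.

    `createtime` (UTC, 'YYYY-MM-DD HH:MM:SS') becomes a timezone-aware
    datetime, and `is_share` is True when `share_video_id` is non-empty.
    """
    videos = []
    for row in csv.DictReader(fileobj):
        row["createtime"] = datetime.strptime(
            row["createtime"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
        # An empty share_video_id means an original video, not a stitch/duet.
        row["is_share"] = bool(row.get("share_video_id"))
        videos.append(row)
    return videos


# Invented sample rows with a subset of the documented columns.
sample = io.StringIO(
    "video_id,createtime,author_name,share_video_id,share_type\n"
    "1,2022-03-01 12:00:00,alice,,\n"
    "2,2022-03-02 08:30:00,bob,1,stitch\n"
)
videos = load_videos(sample)
print([(v["video_id"], v["is_share"]) for v in videos])  # [('1', False), ('2', True)]
```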

The comments.csv will contain the following columns:

comment_id: Unique comment ID

createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

author_name: Unique author name

author_id: Unique author ID

text: Text of the comment

mentions: A list of users that are tagged in the comment

video_id: The ID of the video the comment is on

comment_language: The language of the comment, as predicted by the TikTok API

reply_comment_id: If the comment is replying to another comment, this is the ID of that comment

The data can be compiled into a user interaction network to facilitate the study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
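As an illustration of what such a network might look like, the Python sketch below counts directed interactions from comments.csv-style rows: a top-level comment links the commenter to the video's author, and a reply links the replier to the author of the comment being replied to. These edge rules, the function name, and the sample rows are assumptions for illustration, not taken from the polar-seeds code; the mapping from video_id to author_id would come from videos.csv.

```python
import csv
import io
from collections import Counter


def interaction_edges(comments_file, video_authors):
    """Count directed (source_user, target_user) interactions.

    A top-level comment adds an edge commenter -> video author; a reply adds
    an edge replier -> author of the replied-to comment. `video_authors` maps
    video_id to author_id (for example, built from videos.csv).
    """
    rows = list(csv.DictReader(comments_file))
    comment_authors = {row["comment_id"]: row["author_id"] for row in rows}
    edges = Counter()
    for row in rows:
        reply_to = row.get("reply_comment_id")
        if reply_to:
            target = comment_authors.get(reply_to)
        else:
            target = video_authors.get(row["video_id"])
        # Skip unknown targets and self-interactions (an assumption).
        if target and target != row["author_id"]:
            edges[(row["author_id"], target)] += 1
    return edges


# Invented sample with a subset of the documented columns.
sample = io.StringIO(
    "comment_id,author_id,video_id,reply_comment_id\n"
    "c1,u2,v1,\n"
    "c2,u3,v1,c1\n"
    "c3,u2,v1,\n"
    "c4,u1,v1,c1\n"
)
edges = interaction_edges(sample, video_authors={"v1": "u1"})
print(dict(edges))  # {('u2', 'u1'): 2, ('u3', 'u2'): 1, ('u1', 'u2'): 1}
```

The resulting weighted edge list can be loaded into any graph tool for further analysis.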
TikTokData.xlsx
figshare.com
xlsx
Updated Jun 14, 2022
Emily Zawacki (2022). TikTokData.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.20069333.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.20069333.v1
Dataset updated
Jun 14, 2022
Dataset provided by
figshare
Authors
Emily Zawacki
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (the number of unique users who saw the video) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is calculated as the engagement parameter as a percentage of total reach (e.g., Likes / Audience Reached * 100).
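The engagement-rate formula translates directly into code. The Python sketch below applies it to each engagement parameter separately; the function name and the video metrics are hypothetical.

```python
def engagement_rates(metrics: dict[str, int], reach: int) -> dict[str, float]:
    """Express each engagement parameter as a percentage of unique reach."""
    if reach <= 0:
        raise ValueError("reach must be a positive number of unique viewers")
    # value / reach * 100, written as value * 100 / reach.
    return {name: value * 100 / reach for name, value in metrics.items()}


# Hypothetical video reaching 2,000 unique users.
rates = engagement_rates({"likes": 120, "comments": 30, "shares": 10}, reach=2000)
print(rates)  # {'likes': 6.0, 'comments': 1.5, 'shares': 0.5}
```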
Data from: DataSet "Political communication on TikTok: from the feminisation...
figshare.com
portalcienciaytecnologia.jcyl.es
xlsx
Updated Nov 21, 2023
Salvador Gómez García; Raquel Quevedo Redondo (2023). DataSet "Political communication on TikTok: from the feminisation of discourse to incivility expressed in emoji form" [Dataset]. http://doi.org/10.6084/m9.figshare.24599562.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.24599562.v1
Dataset updated
Nov 21, 2023
Dataset provided by
figshare
Authors
Salvador Gómez García; Raquel Quevedo Redondo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In a context where there is permanent electoral campaigning, an increasing number of political communication experts are trying to unravel the resources used by government officials and their parties to influence TikTok users. From a broad perspective, the subject matter is not new, but it is topical; nonetheless, this research addresses a gap in the literature by combining the identification of idiosyncratic attributes of the feminisation of political discourse on TikTok with the analysis of the reactions (text and emojis) that audiovisual content shaped by this trend elicits in users. The purpose is to ascertain whether the inclusive tone of the feminised rhetorical style can be extrapolated to TikTok and, if so, whether its particular characteristics mitigate expressions of incivility. To do so, the initial content posted on TikTok (first seven months) by the Spanish political platform Sumar, with its leader Yolanda Díaz featuring prominently in most of the videos, was selected for scrutiny. A mixed-methodology analysis of audiovisual content and comments showed that the anti-polarisation rhetoric and storytelling contributed to neutralising the extreme forms of flaming, although Sumar did not use a strategy tailor-made to suit TikTok.
TikTok Shop Datasets
brightdata.com
.json, .csv, .xlsx
Bright Data, TikTok Shop Datasets [Dataset]. https://brightdata.com/products/datasets/tiktok/shop
Available download formats: .json, .csv, .xlsx
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Use our TikTok Shop dataset to extract detailed e-commerce insights, including product names, prices, discounts, seller details, product descriptions, categories, customer ratings, and reviews. You may purchase the entire dataset or a customized subset tailored to your needs. Popular use cases include trend analysis, pricing optimization, customer behavior studies, and marketing strategy refinement. The TikTok Shop dataset includes key data points: product performance metrics, user engagement, customer reviews, and more. Unlock the potential of TikTok's shopping platform today with our comprehensive dataset!
TikTok Videos Reported Claims
kaggle.com
Updated May 9, 2024
Murilo Zangari (2024). TikTok Videos Reported Claims [Dataset]. https://www.kaggle.com/datasets/murilozangari/tiktok-claim-analysis/code
Available format: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Murilo Zangari
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
TikTok users can submit reports that identify videos and comments containing user claims. On a social media platform like TikTok, "reporting a claim" typically refers to the feature that allows users to report content they believe violates the platform's community guidelines or terms of service. When a user reports a claim on a video, they flag the content for review by the platform's content moderation team. The team then assesses the reported content to determine whether it indeed violates the guidelines and, if so, may take actions such as removing the content, issuing a warning to the user who posted it, or even suspending or banning that user's account. Reporting a claim is an important tool for maintaining a safe and respectful environment on social media platforms.

However, this process generates a large number of reports that are challenging to consider in a timely manner. Therefore, TikTok is working on the development of a predictive model that can determine whether a video contains a claim or offers an opinion. With a successful prediction model, TikTok can reduce the backlog of user reports and prioritize them more efficiently.

The TikTok data team is developing a machine learning model for classifying claims made in videos submitted to the platform.

The target variable:

The data dictionary shows that there is a column called claim_status. This is a binary value that indicates whether a video is a claim or an opinion. This is the target variable. In other words, for each video, the model should predict whether the video is a claim or an opinion. This is a classification task because the model is predicting a binary class.

To determine which evaluation metric might be best, consider how the model might be wrong. There are two possibilities for bad predictions:

False positives: When the model predicts a video is a claim when in fact it is an opinion

False negatives: When the model predicts a video is an opinion when in fact it is a claim

In the given scenario, it's better for the model to predict false positives when it makes a mistake, and worse for it to predict false negatives. It is very important to identify videos that break the terms of service, even if that means some opinion videos are misclassified as claims. The worst case for an opinion misclassified as a claim is that the video goes to human review. The worst case for a claim that is misclassified as an opinion is that the video does not get reviewed and it violates the terms of service.
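The trade-off described above can be made concrete with a short Python sketch: lowering the decision threshold on a model's claim probability turns some false negatives into true positives at the cost of extra false positives, raising recall (the share of real claims that are caught) while lowering precision. Emphasizing recall is our reading of the scenario rather than a metric named in the description, and the labels and scores below are invented.

```python
def confusion_counts(y_true, scores, threshold, positive="claim"):
    """Return (tp, fp, fn) when scores >= threshold are predicted as `positive`."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_claim = score >= threshold
        is_claim = label == positive
        if predicted_claim and is_claim:
            tp += 1
        elif predicted_claim:
            fp += 1  # opinion sent to human review
        elif is_claim:
            fn += 1  # claim that escapes review
    return tp, fp, fn


def recall_precision(tp, fp, fn):
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


y_true = ["claim", "claim", "claim", "opinion", "opinion", "opinion"]
scores = [0.9, 0.6, 0.4, 0.35, 0.2, 0.1]  # hypothetical claim probabilities
for threshold in (0.5, 0.3):
    r, p = recall_precision(*confusion_counts(y_true, scores, threshold))
    print(f"threshold={threshold} recall={r:.2f} precision={p:.2f}")
# threshold=0.5 recall=0.67 precision=1.00
# threshold=0.3 recall=1.00 precision=0.75
```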
TikTok post-lockdown migration: Xiaohongshu commen
kaggle.com
Updated Feb 12, 2025
YuanChunHong (2025). TikTok post-lockdown migration: Xiaohongshu commen [Dataset]. http://doi.org/10.34740/kaggle/dsv/10735086
Available format: Croissant
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10735086
Dataset updated
Feb 12, 2025
Dataset provided by
Kaggle
Authors
YuanChunHong
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
This study focuses on a unique social media user migration phenomenon: against the backdrop of the U.S. government's push to ban TikTok, a large number of U.S. users shifted to another Chinese social platform, Xiaohongshu. By constructing a multidimensional analysis framework, this study systematically analyzes 5,919 user comments collected during January 2025. The study uses the MediaCrawler crawler to collect data and TextBlob for sentiment analysis, and combines geographic distribution, temporal trend, and text topic analysis methods to explore this migration pattern in depth. The study finds that despite policy pressure, users have a neutral to positive attitude towards the platform migration, with 59.6% of comments neutral and 32.7% positive. The analysis of geographic distribution shows that 88.7% of users in the United States have a significant “digital backlash”. Temporal trend analysis reveals the “bimodal” character of user discussions, reflecting the dynamic change of policy impact and users' continuous attention. Text analysis further shows that users are more concerned about the platform's functional experience than about political factors, reflecting a rationality beyond geopolitics. These findings provide new perspectives for understanding social media user behavior in the context of globalization and have important implications for social media policymaking and platform operation. The study suggests that in the digital era, administrative means have limited influence on users' platform choices, and that users' social needs and behavioral choices often transcend geopolitical constraints.
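TextBlob exposes a polarity score in the range [-1.0, 1.0] via TextBlob(text).sentiment.polarity. The description does not say how scores were binned into neutral, positive, and negative, so the Python sketch below assumes the simplest convention (exactly zero is neutral, above zero positive, below zero negative) and works on precomputed, invented polarity values so that it runs without TextBlob installed. If the three categories are exhaustive, the reported 59.6% neutral and 32.7% positive would leave 7.7% negative.

```python
from collections import Counter


def label_polarity(polarity: float) -> str:
    """Map a polarity score to a label (assumed cut-offs, not the study's)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def sentiment_shares(polarities: list[float]) -> dict[str, float]:
    """Percentage of comments in each sentiment class, rounded to 0.1."""
    counts = Counter(label_polarity(p) for p in polarities)
    total = len(polarities)
    return {
        label: round(100 * counts[label] / total, 1)
        for label in ("neutral", "positive", "negative")
    }


# Invented polarity scores for five comments.
print(sentiment_shares([0.0, 0.0, 0.5, -0.2, 0.0]))
# {'neutral': 60.0, 'positive': 20.0, 'negative': 20.0}
```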
Impact of Digital Habits on Mental Health
kaggle.com
Updated Jun 14, 2025
Shahzad Aslam (2025). Impact of Digital Habits on Mental Health [Dataset]. https://www.kaggle.com/datasets/zeesolver/mental-health
Available format: Croissant
Dataset updated
Jun 14, 2025
Dataset provided by
Kaggle
Authors
Shahzad Aslam
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset explores the relationship between digital behavior and mental well-being among 100,000 individuals. It records how much time people spend on screens, use of social media (including TikTok), and how these habits may influence their sleep, stress, and mood levels.

It includes six numerical features, all clean and ready for analysis, making it ideal for machine learning tasks like regression or classification. The data enables researchers and analysts to investigate how modern digital lifestyles may impact mental health indicators in measurable ways.

Dataset Applications

Quantify how screen‑time, TikTok use, or multi‑platform engagement statistically relate to stress, sleep loss, and mood.

Train regression or classification models that forecast stress level or mood score from real‑time digital‑usage metrics.

Feed user‑specific data into recommender systems that suggest screen‑time caps or bedtime routines to improve mental health.

Provide evidence for guidelines on youth screen‑time limits and platform moderation based on observed stress‑sleep trade‑offs.

Serve as a teaching dataset for EDA, feature engineering, and model evaluation in data‑science or psychology curricula.

Evaluate app interventions (e.g., screen‑time nudges) by comparing predicted versus actual post‑intervention stress or mood shifts.

Cluster individuals into digital‑behavior personas (e.g., “heavy late‑night scrollers”) to tailor mental‑health resources.

Generate synthetic time‑series scenarios (what‑if reductions in TikTok hours) to estimate downstream impacts on sleep and stress.

Use engineered features (ratio of TikTok hours to total screen‑time, etc.) in broader wellbeing models that include diet or exercise data.

Assess whether mental‑health prediction models remain accurate and unbiased across different screen‑time or platform‑use segments.

Column Descriptions

screen_time_hours – Daily total screen usage in hours across all devices.

social_media_platforms_used – Number of different social media platforms used per day.

hours_on_TikTok – Time spent on TikTok daily, in hours.

sleep_hours – Average number of sleep hours per night.

stress_level – Stress intensity reported on a scale from 1 (low) to 10 (high).

mood_score – Self-rated mood on a scale from 2 (poor) to 10 (excellent).

Inspiration

This dataset was inspired by growing concerns about how screen time and social media affect mental health. It enables analysis of the links between digital habits, stress, sleep, and mood, encouraging data-driven solutions for healthier online behavior and emotional well-being.

Ethically Mined Data

This dataset has been ethically mined and synthetically generated without collecting any personally identifiable information. All values are artificial but statistically realistic, allowing safe use in academic, research, and public health projects while fully respecting user privacy and data ethics.
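As a minimal sketch of the feature-engineering and regression use cases listed above, the Python below parses rows with the six documented columns, derives a TikTok-to-screen-time ratio, flags values outside the documented scales, and fits a one-variable least-squares line of stress on screen time. The helper names (`load_rows`, `add_tiktok_share`, `out_of_range`, `fit_simple_ols`) and the `tiktok_share` feature are hypothetical, as is the assumption that TikTok hours never exceed total screen time; the CSV file name on Kaggle is not given here.

```python
import csv
import io
from statistics import mean

# The six numerical columns documented for this dataset.
COLUMNS = [
    "screen_time_hours",
    "social_media_platforms_used",
    "hours_on_TikTok",
    "sleep_hours",
    "stress_level",
    "mood_score",
]


def load_rows(csv_text):
    """Parse CSV text into dicts of floats, keeping only the documented columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: float(row[col]) for col in COLUMNS} for row in reader]


def add_tiktok_share(row):
    """Add a hypothetical engineered feature: TikTok hours / total screen hours."""
    screen = row["screen_time_hours"]
    row["tiktok_share"] = row["hours_on_TikTok"] / screen if screen > 0 else 0.0
    return row


def out_of_range(row):
    """Return the names of columns whose values fall outside the documented scales."""
    problems = []
    if not 1 <= row["stress_level"] <= 10:
        problems.append("stress_level")
    if not 2 <= row["mood_score"] <= 10:
        problems.append("mood_score")
    # Assumption: TikTok time is part of total screen time, so it cannot exceed it.
    if row["hours_on_TikTok"] > row["screen_time_hours"]:
        problems.append("hours_on_TikTok")
    return problems


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

In practice the CSV text would be read from the downloaded file and passed through `load_rows`, then `add_tiktok_share`; a one-variable fit like this is only a starting point before multi-feature regression or classification models.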
A Labelled Dataset for Sentiment Analysis of videos on YouTube, TikTok, and...
figshare.com
data.niaid.nih.gov
+2 more
application/csv
Updated Jun 24, 2024
Cite
Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian (2024). A Labelled Dataset for Sentiment Analysis of videos on YouTube, TikTok, and other sources about the 2024 Outbreak of Measles [Dataset]. http://doi.org/10.6084/m9.figshare.26086492.v1
Available download formats: application/csv
Unique identifier
https://doi.org/10.6084/m9.figshare.26086492.v1
Dataset updated
Jun 24, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nirmalya Thakur; Vanessa Su; Mingchen Shao; Kesha A. Patel; Hongseok Jeong; Victoria Knieling; Andrew Bian
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
YouTube
Description
Please cite the following paper when using this dataset:

N. Thakur, V. Su, M. Shao, K. Patel, H. Jeong, V. Knieling, and A. Bian, “A labelled dataset for sentiment analysis of videos on YouTube, TikTok, and other sources about the 2024 outbreak of measles,” arXiv [cs.CY], 2024. Available: https://doi.org/10.48550/arXiv.2406.07693

Abstract

This dataset contains data on 4011 videos about the ongoing outbreak of measles, published on 264 websites between January 1, 2024, and May 31, 2024. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remaining websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each video, the URL, the title of the post, the description of the post, and the date of publication are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes, i.e., positive, negative, or neutral; (ii) one of the subjectivity classes, i.e., highly opinionated, neutral opinionated, or least opinionated; and (iii) one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for training and testing machine learning algorithms for sentiment analysis or subjectivity analysis in this field, as well as for other applications. The paper associated with this dataset (see the citation above) also presents a list of open research questions that may be investigated using this dataset.
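The three-way sentiment labels described above can be approximated from VADER's compound score. The sketch below assumes the widely used ±0.05 cut-offs suggested in the VADER documentation; the thresholds the authors actually used are not stated here, and the helper names (`vader_label`, `label_texts`, `class_shares`) are hypothetical. `label_texts` imports the third-party `vaderSentiment` package only when called, so the label mapping itself runs without it.

```python
from collections import Counter


def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to positive / negative / neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def label_texts(texts, threshold=0.05):
    """Score each text with VADER and map it to a sentiment class.

    Requires the third-party `vaderSentiment` package (imported lazily here).
    """
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [vader_label(analyzer.polarity_scores(t)["compound"], threshold) for t in texts]


def class_shares(labels):
    """Fraction of items in each sentiment class, e.g. for comparing platforms."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Because the dataset already ships the sentiment labels as attributes, a sketch like this is mainly useful for checking how sensitive the class balance is to the chosen threshold.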
Social Media Accounts (TikTok, YouTube, X/Twitter) of the Candidates in the...
datacatalogue.cessda.eu
search.gesis.org
Updated Apr 2, 2025
Cite
Steup, Johannes Maximilian; Kielbassa, Pauline; Neumeier, Andreas; Riedl, Jasmin (2025). Social Media Accounts (TikTok, YouTube, X/Twitter) of the Candidates in the 2025 German Federal Election [Dataset]. http://doi.org/10.7802/2862
Unique identifier
https://doi.org/10.7802/2862
Dataset updated
Apr 2, 2025
Dataset provided by
Universität der Bundeswehr München
Authors
Steup, Johannes Maximilian; Kielbassa, Pauline; Neumeier, Andreas; Riedl, Jasmin
Area covered
Germany
Description
The research project SPARTA (Social Media Analysis for Everyone), funded by dtec.bw (which is funded by the European Union – NextGenerationEU), monitors the 2025 German federal election live as it unfolds on TikTok, YouTube, and X/Twitter. Since November 7, 2024, the day the "traffic light" coalition collapsed, we have been collecting and analyzing all German-language posts and reposts on X (formerly Twitter) related to the federal election. Simultaneously, we gather data from TikTok and YouTube, focusing on the accounts of political parties, including those of candidates and current members of the Bundestag, over the same period. Our analysis covers, among other things, the stances expressed toward political parties and leading candidates, the most discussed issues and hashtags, the outreach of political parties across platforms, the visibility of female candidates, the occurrence of negative campaigning, the rise of toxic language, and the activity of various actors across platforms. We publish the results in real time on our publicly accessible dashboard (https://dtecbw.de/sparta/), which provides interactive and customizable graphics, making it relevant to a broad audience from politics, academia, journalism, and society. To facilitate real-time analysis of the election campaign, we compiled a dataset based on data from the Federal Returning Officer (Bundeswahlleiterin) containing the TikTok, YouTube, and X/Twitter handles of all candidates running for a seat in the Bundestag. This dataset includes the handles as well as additional information about the candidates from eight political parties: AfD, BSW, Buendnis 90/Die Gruenen, CDU, CSU, Die Linke, FDP, and SPD.
MASH
huggingface.co
Updated Jun 1, 2025
Cite
Ruichen Yao (2025). MASH [Dataset]. https://huggingface.co/datasets/YRC10/MASH
Dataset updated
Jun 1, 2025
Authors
Ruichen Yao
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a Multiplatform Annotated Dataset for Societal Impact of Hurricane (MASH) that includes 98,662 relevant social media posts from Reddit, X, TikTok, and YouTube. In addition, all relevant posts are annotated on three dimensions (Humanitarian Classes, Bias Classes, and Information Integrity Classes) using a multimodal approach that considers both textual and visual content, providing a rich labeled dataset for in-depth analysis. The dataset is also complemented by an Online Analytics Platform that not only lets users view hurricane-related posts and articles but also lets them explore high-frequency keywords, user sentiment, and the locations where posts were made. To the best of our knowledge, MASH is the first large-scale, multi-platform, multimodal, and multi-dimensionally annotated hurricane dataset. We envision that MASH can contribute to the study of hurricanes' impact on society, such as disaster severity classification, event detection, public sentiment analysis, and bias identification.
12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social...
datarade.ai
.csv, .xls, .txt
Updated Jun 17, 2025
Cite
Data Unify (2025). 12.5M+ Tiktok Posts with 50K+ Plays | Global User Profiles Data | Social Media Marketing & Brand Monitoring [Dataset]. https://datarade.ai/data-products/social-media-data-12-5m-tiktok-posts-with-50k-plays-pos-data-unify
Available download formats: .csv, .xls, .txt
Dataset updated
Jun 17, 2025
Dataset authored and provided by
Data Unify
Area covered
Georgia, Ethiopia, Uruguay, Malawi, Seychelles, Cayman Islands, Croatia, Albania, France, Nigeria
Description
Unlock insights into high-performing content with this curated dataset of TikTok posts, each with over 50,000 plays. This collection surfaces the videos that resonate most with audiences, spanning the creators, themes, and formats that drive virality.

📈 Performance Threshold: Includes only posts that have exceeded 50K plays, ensuring a focus on high-engagement, trend-relevant content.

📱 Detailed Post Data: Captures video captions, play counts, likes, shares, comments, sound IDs, hashtags, and posting timestamps.

👤 Creator Metadata: Includes usernames, follower counts, bio snippets, and profile metrics to support creator analysis.

📊 Engagement Benchmarking: Useful for identifying viral content, measuring campaign performance, and refining creative strategies.

⚡ Trend Analysis Ready: Track how themes, hashtags, or sounds perform at scale within and across verticals.

🚀 Structured for Scale: Delivered as clean CSV, via API, or in a custom format, ready for integration into analytics tools, dashboards, or model-training environments.

This dataset is designed for marketers, agencies, analysts, and researchers looking to decode the mechanics of virality, identify top-performing content, and inform influencer strategy on TikTok. Whether you're building recommendation engines or planning your next campaign, this dataset offers a high-signal view into TikTok's most impactful content.
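To illustrate the engagement-benchmarking use case, here is a minimal Python sketch that ranks posts by engagement rate, defined here as (likes + comments + shares) / plays. That formula is a common convention rather than the vendor's documented metric, and the `Post` field names and helper functions are hypothetical, since the actual column names are not listed.

```python
from dataclasses import dataclass


# Hypothetical record layout; the vendor's real column names are not documented here.
@dataclass
class Post:
    post_id: str
    plays: int
    likes: int
    comments: int
    shares: int


PLAY_THRESHOLD = 50_000  # the dataset's stated inclusion threshold


def engagement_rate(post):
    """Interactions per play: (likes + comments + shares) / plays (assumed definition)."""
    if post.plays == 0:
        return 0.0
    return (post.likes + post.comments + post.shares) / post.plays


def top_by_engagement(posts, k=10):
    """Keep posts above the play threshold and return the k with the highest rate."""
    eligible = [p for p in posts if p.plays > PLAY_THRESHOLD]
    return sorted(eligible, key=engagement_rate, reverse=True)[:k]
```

The play-count filter is redundant for this dataset, which already includes only posts above 50K plays, but it keeps the function safe to reuse on unfiltered exports.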

