29 datasets found
  1. The Invasion of Ukraine Viewed through TikTok: A Dataset

    • zenodo.org
    bin, csv +1
    Updated May 13, 2023
    + more versions
    Cite
    Benjamin Steel; Sara Parker; Derek Ruths (2023). The Invasion of Ukraine Viewed through TikTok: A Dataset [Dataset]. http://doi.org/10.5281/zenodo.7926959
    Available download formats
    text/x-python, bin, csv
    Dataset updated
    May 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Steel; Sara Parker; Derek Ruths
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.

    The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

    To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or “TikToks”). We then compiled comments associated with these videos. All of the data captured is publicly available information, and it contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using PyTok, a web scraping library developed by the author: https://github.com/networkdynamics/pytok.

    Due to scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of TikTok, the dataset contains videos with a range of relatedness to the invasion.

    We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

    To build this dataset from the IDs here:

    1. Go to https://github.com/networkdynamics/pytok and clone the repo locally
    2. Run pip install -e . in the pytok directory
    3. Run pip install pandas tqdm to install these libraries if not already installed
    4. Run get_videos.py to get the video data
    5. Run video_comments.py to get the comment data
    6. Run user_tiktoks.py to get the video history of the users
    7. Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
    8. Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

    If you get an error about the wrong Chrome version, use the command line argument get_videos.py --chrome-version YOUR_CHROME_VERSION.

    Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.

    Please do not hesitate to make an issue in this repo to get our help with this!

    The videos.csv will contain the following columns:

    video_id: Unique video ID

    createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    desc: The full video description from the author

    hashtags: A list of hashtags used in the video description

    share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

    share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty

    share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

    share_type: If the video is sharing another video, this is the type of the share (stitch, duet, etc.)

    mentions: A list of users mentioned in the video description, if any

    The comments.csv will contain the following columns:

    comment_id: Unique comment ID

    createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    text: Text of the comment

    mentions: A list of users that are tagged in the comment

    video_id: The ID of the video the comment is on

    comment_language: The language of the comment, as predicted by the TikTok API

    reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
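
    As a minimal sketch of how the comments.csv columns listed above might be read, the following uses only the Python standard library. The sample rows are hypothetical, and the way list-valued fields such as mentions are serialised is an assumption, so they are left as plain strings here.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample rows; real files are produced by load_json_to_csv.py.
SAMPLE_COMMENTS_CSV = """comment_id,createtime,author_name,author_id,text,mentions,video_id,comment_language,reply_comment_id
c1,2022-03-01 12:00:00,alice,u1,first comment,[],v1,en,
c2,2022-03-01 12:05:30,bob,u2,a reply,[],v1,en,c1
"""


def parse_createtime(value):
    """Parse the documented YYYY-MM-DD HH:MM:SS format as a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def read_comments(handle):
    """Read comment rows, converting createtime and empty reply IDs."""
    rows = []
    for row in csv.DictReader(handle):
        row["createtime"] = parse_createtime(row["createtime"])
        # An empty reply_comment_id means the comment is not a reply.
        row["reply_comment_id"] = row["reply_comment_id"] or None
        rows.append(row)
    return rows


comments = read_comments(io.StringIO(SAMPLE_COMMENTS_CSV))
for comment in comments:
    print(comment["comment_id"], comment["createtime"].isoformat(), comment["reply_comment_id"])
```

    For a real file, the same read_comments function could be passed an open file handle (opened with newline="" as the csv module recommends) instead of the in-memory sample.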

    The data can be compiled into a user interaction network to facilitate the study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
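
    One way such an interaction network could be assembled from the two CSV files is sketched below. This is not the polar-seeds implementation; it is an illustrative assumption in which a reply points at the author of the parent comment and a top-level comment points at the author of the video. The sample records are hypothetical.

```python
from collections import Counter


def build_interaction_edges(videos, comments):
    """Count directed (source_author_id, target_author_id) interactions."""
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    comment_author = {c["comment_id"]: c["author_id"] for c in comments}

    edges = Counter()
    for comment in comments:
        source = comment["author_id"]
        parent_id = comment.get("reply_comment_id")
        if parent_id:
            # A reply interacts with the author of the parent comment.
            target = comment_author.get(parent_id)
        else:
            # A top-level comment interacts with the author of the video.
            target = video_author.get(comment["video_id"])
        # Skip unknown targets and self-interactions.
        if target is not None and target != source:
            edges[(source, target)] += 1
    return edges


# Hypothetical records using the column names documented above.
videos = [{"video_id": "v1", "author_id": "u0"}]
comments = [
    {"comment_id": "c1", "author_id": "u1", "video_id": "v1", "reply_comment_id": None},
    {"comment_id": "c2", "author_id": "u2", "video_id": "v1", "reply_comment_id": "c1"},
    {"comment_id": "c3", "author_id": "u1", "video_id": "v1", "reply_comment_id": "c2"},
]

edges = build_interaction_edges(videos, comments)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

    The resulting weighted edge list can be handed to any graph library; keeping it as a plain Counter avoids assuming a particular one.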

  2. TikTok-10M

    • huggingface.co
    Cite
    Dataset Company, TikTok-10M [Dataset]. https://huggingface.co/datasets/The-data-company/TikTok-10M
    Dataset authored and provided by
    Dataset Company
    License

    https://choosealicense.com/licenses/other/

    Description

    TikTok-10M Dataset

      Dataset Description
    

    TikTok-10M is a large-scale dataset containing 10 million short-form posts from TikTok, designed for video understanding, multimodal learning, and social media content analysis. The dataset was curated to bridge the gap between academic video datasets and actual user-generated content, providing researchers with authentic patterns and characteristics of modern short-form video content that dominates social media platforms.… See the full description on the dataset page: https://huggingface.co/datasets/The-data-company/TikTok-10M.

  3. TikTokData.xlsx

    • figshare.com
    xlsx
    Updated Jun 14, 2022
    Cite
    Emily Zawacki (2022). TikTokData.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.20069333.v1
    Available download formats
    xlsx
    Dataset updated
    Jun 14, 2022
    Dataset provided by
    figshare
    Authors
    Emily Zawacki
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We used TikTok’s built-in account analytics to download and record video and account metrics for the period between 10/8/2021 and 2/6/2022. We collected the following summary data for each individual video: video views, likes, comments, shares, total cumulative play time, average duration the video was watched, percentage of viewers who watched the full video, unique reached audience, and the percentage of video views by section (For You, personal profile, Following, hashtags).
    We evaluated the “success” of videos based on reach and engagement metrics, as well as viewer retention (how long a video is watched). We used metrics of reach (number of unique users the video was seen by) and engagement (likes, comments, and shares) to calculate the engagement rate of each video. The engagement rate is calculated as the engagement parameter as a percentage of total reach (e.g., Likes / Audience Reached *100).
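
    The engagement-rate formula described above can be sketched as follows. The metric values are hypothetical and only illustrate the arithmetic (engagement parameter as a percentage of audience reached).

```python
def engagement_rate(engagement_count, audience_reached):
    """Return an engagement parameter as a percentage of unique reach."""
    if audience_reached <= 0:
        raise ValueError("audience_reached must be positive")
    return 100.0 * engagement_count / audience_reached


# Hypothetical metrics for a single video.
video = {"likes": 250, "comments": 40, "shares": 10, "audience_reached": 5000}

rates = {
    metric: engagement_rate(video[metric], video["audience_reached"])
    for metric in ("likes", "comments", "shares")
}
for metric, rate in rates.items():
    print(f"{metric}: {rate:.2f}%")
```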

  4. TikHarm Dataset

    • kaggle.com
    Updated Jun 29, 2024
    + more versions
    Cite
    An Hoang Vo (2024). TikHarm Dataset [Dataset]. https://www.kaggle.com/datasets/anhoangvo/tikharm-dataset/versions/1
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 29, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    An Hoang Vo
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The TikHarm dataset is a curated collection of TikTok videos designed to train models for classifying harmful content. The dataset is in the format of UCF101, and it is specifically focused on content accessible to children, with the aim of distinguishing between different types of potentially harmful material.

    Data Collection:

    Data was gathered from TikTok, targeting videos that are accessible to children to ensure the dataset reflects the type of content they are likely to encounter.

    Data Labeling:

    Collected videos were manually labeled into four predefined categories:
    • Harmful Content: Videos that depict violence, dangerous actions that children might imitate, or other harmful behavior.
    • Adult Content: Videos containing sexual content or other material deemed inappropriate for children.
    • Safe: Videos that are appropriate and safe for children to view (popular cartoons, etc.).
    • Suicide: Videos that depict, suggest, or discuss suicidal behavior or ideation.

    Dataset Statistics:

    Subset   Samples   Min Duration (s)   Max Duration (s)   Avg Duration (s)   Total Duration (h)
    Train    2762      3.88               600                38.71              29.71
    Dev      790       5.04               600                38.57              4.24
    Test     396       1.95               600                38.77              8.51


    Class     Samples   Min Duration (s)   Max Duration (s)   Avg Duration (s)   Total Duration (h)
    Safe      997       5.04               568.8              65.36              18.1
    Adult     977       1.95               600                36.25              9.84
    Harmful   990       4.8                600                35.92              9.88
    Suicide   984       3.88               181.23             16.96              4.63

    These tables present the duration statistics for each subset and class within the TikHarm dataset.
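
    As a quick sanity check on how the columns relate, the total duration in hours should be roughly the sample count times the average duration in seconds, divided by 3600. The sketch below applies this to the per-class figures reported above; the 0.01-hour tolerance is an assumption chosen to absorb rounding in the published values.

```python
# Per-class rows taken from the class statistics above:
# (class, samples, average duration in seconds, reported total duration in hours)
CLASS_STATS = [
    ("Safe", 997, 65.36, 18.1),
    ("Adult", 977, 36.25, 9.84),
    ("Harmful", 990, 35.92, 9.88),
    ("Suicide", 984, 16.96, 4.63),
]


def total_hours(samples, avg_seconds):
    """Total duration implied by a sample count and an average duration."""
    return samples * avg_seconds / 3600.0


def check_totals(rows, tolerance_hours=0.01):
    """Return (class_name, computed_hours, reported_hours, within_tolerance) per row."""
    results = []
    for name, samples, avg_seconds, reported in rows:
        computed = total_hours(samples, avg_seconds)
        results.append((name, computed, reported, abs(computed - reported) <= tolerance_hours))
    return results


results = check_totals(CLASS_STATS)
total_samples = sum(samples for _, samples, _, _ in CLASS_STATS)

for name, computed, reported, ok in results:
    print(f"{name}: computed {computed:.2f} h, reported {reported} h, consistent={ok}")
print("total samples:", total_samples)
```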

    This comprehensive dataset is invaluable for developing robust video classification models to automatically detect and categorize harmful content on social media platforms.

  5. ai-tube-tik-tak-tok

    • huggingface.co
    Updated Dec 21, 2023
    Cite
    Julian Bilcke (2023). ai-tube-tik-tak-tok [Dataset]. https://huggingface.co/datasets/jbilcke-hf/ai-tube-tik-tak-tok
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 21, 2023
    Authors
    Julian Bilcke
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Tik Tak Tok - Est. 2023

    Model: HotshotXL
    Voice: Julian
    Orientation: Portrait
    Tags: Short Dancing
    Style: tiktok video, instagram, beautiful, sharp, detailed
    Music: mainstream pop music
    Prompt: A channel generating short vertical videos, between 20 seconds and 60 seconds Most videos are about people dancing, doing choregraphy, or talking selfies, filming their cats, daily life (eg. going to a cafe… See the full description on the dataset page: https://huggingface.co/datasets/jbilcke-hf/ai-tube-tik-tak-tok.

  6. Top 10 Most Viral TikTok Videos of 2024

    • learningrevolution.net
    html
    Updated Jun 24, 2025
    Cite
    Jawad Khan (2025). Top 10 Most Viral TikTok Videos of 2024 [Dataset]. https://www.learningrevolution.net/viral-on-tiktok/
    Available download formats
    html
    Dataset updated
    Jun 24, 2025
    Authors
    Jawad Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A ranked dataset of the most viral TikTok videos in 2024, based on total views and creator engagement.

  7. TikTok_Most_Shared_Video_Transcription_Example

    • huggingface.co
    Updated Jul 17, 2025
    + more versions
    Cite
    Masa (2025). TikTok_Most_Shared_Video_Transcription_Example [Dataset]. https://huggingface.co/datasets/MasaFoundation/TikTok_Most_Shared_Video_Transcription_Example
    Dataset updated
    Jul 17, 2025
    Dataset authored and provided by
    Masa
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📲 Example Dataset: TikTok Scraper Tool

    👉 Start Scraping TikTok: TikTok Scraper Tool

      ✨ Key Features
    

    ⚡ Instant Transcription – Turn any TikTok video into an AI-ready transcript
    🎯 Metadata – Get the title, language description, and video hashtags
    🔗 URL-Based Access – Just drop in a TikTok video URL to start scraping
    🧩 LLM-Ready Output – Receive clean JSON ready for agents, RAG, or AI tools
    💸 Free Tier – Use up to 100 queries during the beta period
    💫 Easy… See the full description on the dataset page: https://huggingface.co/datasets/MasaFoundation/TikTok_Most_Shared_Video_Transcription_Example.

  8. Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions:...

    • darus.uni-stuttgart.de
    • b2find.eudat.eu
    Updated Sep 16, 2024
    Cite
    Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt (2024). Dataset for "Short-Form Videos Degrade Our Capacity to Retain Intentions: Effect of Context Switching On Prospective Memory" [Dataset]. http://doi.org/10.18419/DARUS-3327
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    DaRUS
    Authors
    Francesco Chiossi; Luke Haliburton; Changkun Ou; Andreas Butz; Albrecht Schmidt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    DFG
    Description

    Social media platforms use short, highly engaging videos to catch users’ attention. While the short-form video feeds popularized by TikTok are rapidly spreading to other platforms, we do not yet understand their impact on cognitive functions. We conducted a between-subjects experiment (N = 60) investigating the impact of engaging with TikTok, Twitter, and YouTube while performing a Prospective Memory task (i.e., executing a previously planned action). The study required participants to remember intentions over interruptions. We found that the TikTok condition significantly degraded the users’ performance in this task. As none of the other conditions (Twitter, YouTube, no activity) had a similar effect, our results indicate that the combination of short videos and rapid context-switching impairs intention recall and execution. We contribute a quantified understanding of the effect of social media feed format on Prospective Memory and outline consequences for media technology designers not to harm the users’ memory and wellbeing.

    Description of the Dataset

    Data frame:
    • ./data/rt.csv provides the data frame of reaction times.
    • ./data/acc.csv provides the data frame of reaction accuracy scores.
    • ./data/q.csv provides the data frame collected from questionnaires.
    • ./data/ddm.csv contains the DDM features learned using ./appendix2_ddm_fitting.ipynb, which are then used in ./3.ddm_anova.ipynb.

    Figures: All figures that appear in the paper are placed in ./figures and can be reproduced using the *_vis.ipynb files.

  9. MovingFashion

    • huggingface.co
    Updated Jul 27, 2025
    Cite
    Christian Joppi (2025). MovingFashion [Dataset]. https://huggingface.co/datasets/christianjoppi/MovingFashion
    Dataset updated
    Jul 27, 2025
    Authors
    Christian Joppi
    Description

    MovingFashion Dataset

    MovingFashion is the first publicly available benchmark designed to address the video-to-shop challenge in computer vision, where the goal is to retrieve fashion items worn in social media videos (e.g., Instagram, TikTok) by matching them to corresponding e-commerce product images. GitHub Repository license: cc-by-nc-4.0

      Overview
    

    Total Videos: 14,855 social videos
    Source Platforms: Instagram, TikTok, and Net-A-Porter
    Associated Shop Images:… See the full description on the dataset page: https://huggingface.co/datasets/christianjoppi/MovingFashion.

  10. myanmar_cele_voices

    • huggingface.co
    Cite
    Wynn, myanmar_cele_voices [Dataset]. https://huggingface.co/datasets/freococo/myanmar_cele_voices
    Authors
    Wynn
    License

    https://choosealicense.com/licenses/other/

    Area covered
    Myanmar (Burma)
    Description

    Myanmar Celebrity Voices

    A high-quality speech dataset extracted from the official TikTok channel of Myanmar Celebrity TV.

    Myanmar Celebrity Voices is a collection of 69,781 short audio segments (≈46 hours total) derived from public TikTok videos by The Official TikTok Channel of Myanmar Celebrity TV — one of the most popular digital media platforms in Myanmar. The source channel regularly publishes:

    Interviews with Myanmar’s top movie actors and actresses Behind-the-scenes… See the full description on the dataset page: https://huggingface.co/datasets/freococo/myanmar_cele_voices.

  11. LMVD

    • figshare.com
    bin
    Updated Apr 26, 2024
    Cite
    Lang He (2024). LMVD [Dataset]. http://doi.org/10.6084/m9.figshare.25698351.v1
    Available download formats
    bin
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lang He
    License

    https://www.apache.org/licenses/LICENSE-2.0.html

    Description

    We propose a large-scale multimodal video log database (LMVD) for identifying depression in the wild. LMVD contains 1823 samples, capturing 214 hours from 1475 participants across four multimedia platforms (Sina Weibo, Bilibili, TikTok, and YouTube). For all collected data, we extract video features and audio features separately. For audio features, we use a pre-trained VGGish model. For visual features, we use FAU, facial markers, eye gaze, and head posture features. It is worth mentioning that LMVD is the largest dataset for identifying visual and auditory depression in an individual's daily life, which is a positive contribution to the field of emotional computing.

  12. Most Streamed Spotify Songs 2024

    • kaggle.com
    Updated Jun 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nidula Elgiriyewithana ⚡ (2024). Most Streamed Spotify Songs 2024 [Dataset]. http://doi.org/10.34740/kaggle/dsv/8700156
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nidula Elgiriyewithana ⚡
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset presents a comprehensive compilation of the most streamed songs on Spotify in 2024. It provides extensive insights into each track's attributes, popularity, and presence on various music platforms, offering a valuable resource for music analysts, enthusiasts, and industry professionals. The dataset includes information such as track name, artist, release date, ISRC, streaming statistics, and presence on platforms like YouTube, TikTok, and more.

    Here is the link for the 2023 data: Most Streamed Spotify Songs 2023 (https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023) 🟢

    Key Features

    • Track Name: Name of the song.
    • Album Name: Name of the album the song belongs to.
    • Artist: Name of the artist(s) of the song.
    • Release Date: Date when the song was released.
    • ISRC: International Standard Recording Code for the song.
    • All Time Rank: Ranking of the song based on its all-time popularity.
    • Track Score: Score assigned to the track based on various factors.
    • Spotify Streams: Total number of streams on Spotify.
    • Spotify Playlist Count: Number of Spotify playlists the song is included in.
    • Spotify Playlist Reach: Reach of the song across Spotify playlists.
    • Spotify Popularity: Popularity score of the song on Spotify.
    • YouTube Views: Total views of the song's official video on YouTube.
    • YouTube Likes: Total likes on the song's official video on YouTube.
    • TikTok Posts: Number of TikTok posts featuring the song.
    • TikTok Likes: Total likes on TikTok posts featuring the song.
    • TikTok Views: Total views on TikTok posts featuring the song.
    • YouTube Playlist Reach: Reach of the song across YouTube playlists.
    • Apple Music Playlist Count: Number of Apple Music playlists the song is included in.
    • AirPlay Spins: Number of times the song has been played on radio stations.
    • SiriusXM Spins: Number of times the song has been played on SiriusXM.
    • Deezer Playlist Count: Number of Deezer playlists the song is included in.
    • Deezer Playlist Reach: Reach of the song across Deezer playlists.
    • Amazon Playlist Count: Number of Amazon Music playlists the song is included in.
    • Pandora Streams: Total number of streams on Pandora.
    • Pandora Track Stations: Number of Pandora stations featuring the song.
    • Soundcloud Streams: Total number of streams on Soundcloud.
    • Shazam Counts: Total number of times the song has been Shazamed.
    • TIDAL Popularity: Popularity score of the song on TIDAL.
    • Explicit Track: Indicates whether the song contains explicit content.

    Potential Use Cases

    • Music Analysis: Analyze trends in audio features to understand popular song characteristics.
    • Platform Comparison: Compare song popularity across different music platforms.
    • Artist Impact: Study the relationship between artist attributes and song success.
    • Temporal Trends: Identify changes in music attributes and preferences over time.
    • Cross-Platform Presence: Investigate song performance across various streaming services.

    Your support through an upvote would be greatly appreciated if you find this dataset useful! ❤️🙂 Thank you.

  13. Viral Views by Platform – How Many Views Is Viral (2025)

    • learningrevolution.net
    html
    Updated Jun 23, 2025
    Cite
    Jawad Khan (2025). Viral Views by Platform – How Many Views Is Viral (2025) [Dataset]. https://www.learningrevolution.net/how-many-views-is-viral/
    Available download formats
    html
    Dataset updated
    Jun 23, 2025
    Authors
    Jawad Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Platform, Time to Go Viral, Viral Views Threshold
    Description

    A structured dataset comparing viral view thresholds and timeframes across major platforms, including TikTok, YouTube (long-form & Shorts), Instagram Reels, Facebook, Twitter (X), LinkedIn Video, and LinkedIn Posts.

  14. Social Media Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Sep 18, 2024
    Cite
    Bright Data (2024). Social Media Datasets [Dataset]. https://brightdata.com/products/datasets/social-media
    Available download formats
    .json, .csv, .xlsx
    Dataset updated
    Sep 18, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.

    Dataset Features

    • User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.
    • Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.
    • Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.
    • Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.

    Customizable Subsets for Specific Needs

    Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.

    Popular Use Cases

    • Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.
    • Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.
    • Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.
    • Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.
    • AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.

    Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  15. Original data set used for the current study.

    • plos.figshare.com
    xlsx
    Updated Feb 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Genyan Jiang; Lei Chen; Lan Geng; Yuhan Zhang; Zhiqi Chen; Yaqi Zhu; Shuangshuang Ma; Mei Zhao (2025). Original data set used for the current study. [Dataset]. http://doi.org/10.1371/journal.pone.0316242.s001
    Available download formats
    xlsx
    Dataset updated
    Feb 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Genyan Jiang; Lei Chen; Lan Geng; Yuhan Zhang; Zhiqi Chen; Yaqi Zhu; Shuangshuang Ma; Mei Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: TikTok is an important channel for consumers to obtain and adopt health information. However, misinformation on TikTok could potentially impact public health. Currently, the quality of content related to GDM on TikTok has not been thoroughly reviewed.

    Objective: This study aims to explore the information quality of GDM videos on TikTok.

    Methods: A comprehensive cross-sectional study was conducted on TikTok videos related to GDM. The quality of the videos was assessed using three standardized evaluation tools: DISCERN, the Journal of the American Medical Association (JAMA) benchmarks, and the Global Quality Scale (GQS). The comprehensiveness of the content was evaluated through six questions covering definitions, signs/symptoms, risk factors, evaluation, management, and outcomes. Additionally, a correlational analysis was conducted between video quality and the characteristics of the uploaders and the videos themselves.

    Results: A total of 216 videos were included in the final analysis, with 162 uploaded by health professionals, 40 by general users, and the remaining videos contributed by individual science communicators, for-profit organizations, and news agencies. The average DISCERN, JAMA, and GQS scores for all videos were 48.87, 1.86, and 2.06, respectively. The videos uploaded by health professionals scored the highest in DISCERN, while the videos uploaded by individual science communicators scored significantly higher in JAMA and GQS than those from other sources. Correlation analysis between video quality and video features showed DISCERN scores, JAMA scores and GQS scores were positively correlated with video duration (P

  16. UGC-VideoCap

    • huggingface.co
    Updated Jul 16, 2025
    Cite
    Memories.ai Research (2025). UGC-VideoCap [Dataset]. https://huggingface.co/datasets/openinterx/UGC-VideoCap
    Dataset updated
    Jul 16, 2025
    Dataset authored and provided by
    Memories.ai Research
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    UGC-VideoCaptioner Dataset

    Real-world user-generated videos, especially on platforms like TikTok, often feature rich and intertwined audio-visual content. However, existing video captioning benchmarks and models remain predominantly visual-centric, overlooking the crucial role of audio in conveying scene dynamics, speaker intent, and narrative context. This lack of full-modality datasets and lightweight, capable models hampers progress in fine-grained, multimodal video… See the full description on the dataset page: https://huggingface.co/datasets/openinterx/UGC-VideoCap.

  17. Original data set used for the current study.

    • plos.figshare.com
    xlsx
    Updated Mar 8, 2024
    + more versions
    Cite
    Juanjuan Zhang; Jun Yuan; Danqin Zhang; Yi Yang; Chaoyun Wang; Zhiqian Dou; Yan Li (2024). Original data set used for the current study. [Dataset]. http://doi.org/10.1371/journal.pone.0300180.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Mar 8, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Juanjuan Zhang; Jun Yuan; Danqin Zhang; Yi Yang; Chaoyun Wang; Zhiqian Dou; Yan Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The development of short popular science video platforms helps people obtain health information, but no research has evaluated the information characteristics and quality of short videos related to cervical cancer. The purpose of this study was to evaluate the quality and reliability of short cervical cancer-related videos on TikTok and Kwai. Methods: The Chinese keyword "cervical cancer" was used to search for related videos on TikTok and Kwai, and a total of 163 videos were ultimately included. The overall quality of these videos was evaluated by the Global Quality Score (GQS) and the modified DISCERN tool. Results: A total of 163 videos were included in this study; TikTok and Kwai contributed 82 and 81 videos, respectively. Overall, these videos received much attention; the median number of likes received was 1360 (403–6867), the median number of comments was 147 (40–601), and the median number of collections was 282 (71–1296). In terms of video content, the etiology of cervical cancer was the most frequently discussed topic. Short videos posted on TikTok received more attention than did those posted on Kwai, and the GQS and DISCERN scores of videos posted on TikTok were significantly better than those of videos posted on Kwai. In addition, the videos posted by specialists were of the highest quality, with GQS and DISCERN scores of 3 (2–3) and 2 (2–3), respectively. Correlation analysis showed that GQS was significantly correlated with the modified DISCERN scores (p

  18. Instagram: distribution of global audiences 2024, by gender

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Instagram: distribution of global audiences 2024, by gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista: http://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.

    Instagram’s Global Audience

    As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
    As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.

    Who is winning over the generations?

    Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
    
  19. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista: http://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    How much time do people spend on social media?

    As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes.

    Global social media usage

    Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively.
    People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

    Global impact of social media

    Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general.
    During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics, and heightened everyday distractions.
    
  20. Instagram: distribution of global audiences 2024, by age group

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Instagram: distribution of global audiences 2024, by age group [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista: http://statista.com/
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.

    Instagram users

    With roughly one billion monthly active users, Instagram is among the most popular social networks worldwide. The social photo sharing app is especially popular in India and in the United States, which have 362.9 million and 169.7 million Instagram users, respectively.

    Instagram features

    One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo sharing app that initially became famous due to its “vanishing photos” feature. As of the second quarter of 2021, Snapchat had 293 million daily active users.
    
Benjamin Steel; Sara Parker; Derek Ruths (2023). The Invasion of Ukraine Viewed through TikTok: A Dataset [Dataset]. http://doi.org/10.5281/zenodo.7926959

The Invasion of Ukraine Viewed through TikTok: A Dataset

Explore at:
10 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: text/x-python, bin, csv
Dataset updated
May 13, 2023
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Benjamin Steel; Sara Parker; Derek Ruths
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered
Ukraine
Description

This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.

The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on GitHub: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or “TikToks”). We then compiled comments associated with these videos. All of the data captured is publicly available information, and it contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the web scraping PyTok library, developed by the author: https://github.com/networkdynamics/pytok.

Due to scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of TikTok, the dataset contains videos with a range of relatedness to the invasion.

We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

To build this dataset from the IDs here:

  1. Go to https://github.com/networkdynamics/pytok and clone the repo locally
  2. Run pip install -e . in the pytok directory
  3. Run pip install pandas tqdm to install these libraries if not already installed
  4. Run get_videos.py to get the video data
  5. Run video_comments.py to get the comment data
  6. Run user_tiktoks.py to get the video history of the users
  7. Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
  8. Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

If you get an error about the wrong Chrome version, use the command line argument get_videos.py --chrome-version YOUR_CHROME_VERSION.

Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to adjust the delay constants to either speed up the process or avoid TikTok rate limiting.
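The script-running steps above can be chained from Python. The following is only a sketch, assuming the working directory is the folder that contains the scripts and that the pytok installation steps are already done. The names `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical helpers, not part of the repository. The optional hashtag and search step is left out, and `--chrome-version` is passed only to get_videos.py, because that is the only script the instructions show it with.

```python
import subprocess
import sys

# Scripts named in the build instructions, in the order they are listed.
# The optional hashtag/search step is omitted from this sketch.
PIPELINE = [
    "get_videos.py",
    "video_comments.py",
    "user_tiktoks.py",
    "load_json_to_csv.py",
]


def build_commands(chrome_version=None):
    """Return one argument list per script (hypothetical helper)."""
    commands = []
    for script in PIPELINE:
        cmd = [sys.executable, script]
        # Assumption: only get_videos.py is given --chrome-version here.
        if script == "get_videos.py" and chrome_version is not None:
            cmd += ["--chrome-version", str(chrome_version)]
        commands.append(cmd)
    return commands


def run_pipeline(chrome_version=None):
    """Run each script in turn and stop at the first failure."""
    for cmd in build_commands(chrome_version):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` starts the downloads, so it is best left running on a long-lived machine, as recommended above.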

Please do not hesitate to make an issue in this repo to get our help with this!

The videos.csv will contain the following columns:

video_id: Unique video ID

createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

author_name: Unique author name

author_id: Unique author ID

desc: The full video description from the author

hashtags: A list of hashtags used in the video description

share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty

share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

share_type: If the video is sharing another video, this is the type of the share (stitch, duet, etc.)

mentions: A list of users mentioned in the video description, if any
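As a rough illustration of the videos.csv layout, the sketch below reads rows with the standard library and converts the typed columns. The sample rows are invented for the example. It assumes that list-valued columns (hashtags, mentions) are stored as Python-style list literals, which the description does not confirm. `parse_list` and `load_videos` are hypothetical helper names.

```python
import ast
import csv
import io
from datetime import datetime, timezone

# Hypothetical two-row sample in the documented column layout; values are made up.
SAMPLE_VIDEOS_CSV = '''video_id,createtime,author_name,author_id,desc,hashtags,share_video_id,share_video_user_id,share_video_user_name,share_type,mentions
1001,2022-03-01 12:30:00,user_a,501,Example description #ukraine,"['ukraine']",,,,,[]
1002,2022-03-02 08:15:00,user_b,502,Reply to first clip,"['ukraine', 'news']",1001,501,user_a,duet,"['user_a']"
'''


def parse_list(cell):
    """Parse a list-valued cell; assumes a Python-style list literal."""
    return ast.literal_eval(cell) if cell else []


def load_videos(handle):
    """Read videos.csv rows into dicts with a UTC datetime and list fields."""
    videos = []
    for row in csv.DictReader(handle):
        # createtime is documented as UTC in YYYY-MM-DD HH:MM:SS format.
        row["createtime"] = datetime.strptime(
            row["createtime"], "%Y-%m-%d %H:%M:%S"
        ).replace(tzinfo=timezone.utc)
        row["hashtags"] = parse_list(row["hashtags"])
        row["mentions"] = parse_list(row["mentions"])
        videos.append(row)
    return videos


videos = load_videos(io.StringIO(SAMPLE_VIDEOS_CSV))
# Videos that share (stitch, duet, ...) another video have a non-empty share_video_id.
shared = [v for v in videos if v["share_video_id"]]
```

To use it on the real file, pass `open("videos.csv", newline="", encoding="utf-8")` instead of the in-memory sample.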

The comments.csv will contain the following columns:

comment_id: Unique comment ID

createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

author_name: Unique author name

author_id: Unique author ID

text: Text of the comment

mentions: A list of users that are tagged in the comment

video_id: The ID of the video the comment is on

comment_language: The language of the comment, as predicted by the TikTok API

reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
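A similar sketch for comments.csv counts comments per predicted language and groups replies under their parent comment. The sample rows are invented, an empty reply_comment_id is assumed to mean a top-level comment, and `summarize_comments` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the documented column layout; values are made up.
SAMPLE_COMMENTS_CSV = '''comment_id,createtime,author_name,author_id,text,mentions,video_id,comment_language,reply_comment_id
9001,2022-03-01 13:00:00,user_c,503,First comment,[],1001,en,
9002,2022-03-01 13:05:00,user_d,504,A reply,[],1001,en,9001
9003,2022-03-01 13:10:00,user_e,505,Ein Kommentar,[],1001,de,
'''


def summarize_comments(handle):
    """Return (comments per language, reply IDs grouped by parent comment ID)."""
    language_counts = Counter()
    replies_by_parent = {}
    for row in csv.DictReader(handle):
        language_counts[row["comment_language"]] += 1
        parent = row["reply_comment_id"]
        # Assumption: an empty reply_comment_id marks a top-level comment.
        if parent:
            replies_by_parent.setdefault(parent, []).append(row["comment_id"])
    return language_counts, replies_by_parent


language_counts, replies_by_parent = summarize_comments(
    io.StringIO(SAMPLE_COMMENTS_CSV)
)
```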

The data can be compiled into a user interaction network to facilitate study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
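One simple way to derive such a network from the two CSV files is a weighted, directed edge list. This is an illustrative sketch, not the polar-seeds implementation. It assumes a reply links the commenter to the author of the parent comment, and any other comment links the commenter to the video's author. `build_interaction_edges` is a hypothetical name, and the sample rows are invented.

```python
from collections import Counter


def build_interaction_edges(videos, comments):
    """Count directed (source_user, target_user) interactions."""
    video_author = {v["video_id"]: v["author_id"] for v in videos}
    comment_author = {c["comment_id"]: c["author_id"] for c in comments}
    edges = Counter()
    for c in comments:
        source = c["author_id"]
        parent = c.get("reply_comment_id")
        if parent and parent in comment_author:
            # Reply: interaction with the author of the parent comment.
            target = comment_author[parent]
        else:
            # Top-level comment (or unknown parent): interaction with the video author.
            target = video_author.get(c["video_id"])
        # Skip self-interactions and comments on videos missing from the data.
        if target is not None and target != source:
            edges[(source, target)] += 1
    return edges


# Hypothetical rows using only the columns the function needs.
videos = [{"video_id": "1001", "author_id": "501"}]
comments = [
    {"comment_id": "9001", "author_id": "503", "video_id": "1001", "reply_comment_id": ""},
    {"comment_id": "9002", "author_id": "504", "video_id": "1001", "reply_comment_id": "9001"},
    {"comment_id": "9003", "author_id": "503", "video_id": "1001", "reply_comment_id": ""},
]
edges = build_interaction_edges(videos, comments)
```

The resulting counts can be loaded into a graph library of choice as edge weights.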
