21 datasets found
  1. TikTok Dataset

    • kaggle.com
    zip
    Updated Jul 27, 2022
    Cite
    Muhammad Anas Mahmood (2022). TikTok Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadanasmahmood/tiktok-dataset
    Available download formats: zip (733532 bytes)
    Dataset updated
    Jul 27, 2022
    Authors
    Muhammad Anas Mahmood
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset covers popular hashtags on TikTok. It includes the author name, author ID, author signature, comment count, hashtag details, URL, and share count. The scraped hashtags include meme, funny, humor, comedy, education, lol, dance, song, and music, among others.

  2. Tiktok 2025 Dataset

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    Haziq Halifi (2025). Tiktok 2025 Dataset [Dataset]. https://www.kaggle.com/datasets/haziqhalifi/tiktok-2025-dataset
    Available download formats: zip (889553 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    Haziq Halifi
    Description

    This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.

    Here's a breakdown of the columns included in this dataset:

    • video_id: A unique identifier for each TikTok video.
    • author: The username or handle of the TikTok account that posted the video.
    • description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
    • likes: The number of likes the video has received.
    • comments: The number of comments on the video.
    • shares: The number of times the video has been shared.
    • plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
    • hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
    • music: Information about the background music or sound used in the video.
    • create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
    • video_url: The direct URL to the TikTok video.
    • fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
    • views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
    • posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)

    Potential Uses of This Dataset:

    • Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
    • Trend Identification: Identify trending topics, music, and creators on TikTok.
    • Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
    • Creator Analysis: Study the posting habits and performance of various TikTok creators.
    • Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.

    Notes on Data Quality:

    • The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis.
    • The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis.
    • It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics.

    This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
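
    The data-quality notes above suggest treating plays and create_time as the primary columns. A minimal sketch of that idea follows, using only the Python standard library. The sample rows are invented for illustration, and falling back to views and posted_time when the primary column is empty is an assumption based on the note that those columns "appear to overlap", not something the dataset author prescribes.

```python
import csv
import io

# Hypothetical sample rows: the column names follow the description above,
# but the values are invented for illustration.
SAMPLE = """video_id,plays,views,create_time,posted_time
1,1500,,2025-06-01 10:00:00,
2,,2300,,2025-06-02 12:30:00
3,,,,
"""

def coalesce(primary, fallback):
    """Return the primary value unless it is empty, else the fallback (or None)."""
    return primary or fallback or None

rows = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    play_count = coalesce(row["plays"], row["views"])
    rows.append({
        "video_id": row["video_id"],
        # Assumption: `views` is an acceptable stand-in when `plays` is missing.
        "plays": int(play_count) if play_count else None,
        # Assumption: `posted_time` is an acceptable stand-in for `create_time`.
        "created": coalesce(row["create_time"], row["posted_time"]),
    })

# Keep only rows that ended up with a usable play count (removal, not imputation).
usable = [r for r in rows if r["plays"] is not None]
```

    Rows with no usable play count are dropped here; imputation would be an equally valid choice depending on the analysis.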

  3. 🚀 Viral Social Media Trends & Engagement Analysis

    • kaggle.com
    zip
    Updated May 23, 2025
    Cite
    Atharva Soundankar (2025). 🚀 Viral Social Media Trends & Engagement Analysis [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/viral-social-media-trends-and-engagement-analysis
    Available download formats: zip (230834 bytes)
    Dataset updated
    May 23, 2025
    Authors
    Atharva Soundankar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:

    • Trend analysis 🔍
    • Sentiment modeling 💭
    • Understanding influencer marketing 📈

    Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍

  4. Tiktok Trending Hashtags

    • kaggle.com
    zip
    Updated Dec 1, 2025
    + more versions
    Cite
    Ronan Takizawa (2025). Tiktok Trending Hashtags [Dataset]. https://www.kaggle.com/datasets/ronantakizawa/tiktok-trending-hashtags
    Available download formats: zip (18358 bytes)
    Dataset updated
    Dec 1, 2025
    Authors
    Ronan Takizawa
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TikTok Trending Hashtags (2022-2025)

    A comprehensive dataset of trending hashtags on TikTok from 2022 to 2025, containing 1,830 unique hashtag entries across multiple years, languages, and cultural contexts.

    📊 Dataset Description

    This dataset captures trending hashtags from TikTok's Creative Center, providing insights into viral content, cultural moments, and global events from 2022 to 2025.

    Data Source: TikTok Creative Center - Popular Hashtags

    Dataset Structure

    tag,year,rank,posts
    2024,2025,1,3000000
    2025,2025,2,2000000
    valentinesday,2025,3,1000000
    ...
    

    Columns:

    • tag (string): The hashtag name without the # symbol
    • year (integer): The year the hashtag was trending (2022-2025)
    • rank (integer): Rank within that year based on post count (1 = highest)
    • posts (integer): Total number of posts using this hashtag
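
    As a rough illustration of the structure above, the sketch below parses the three sample rows from the description and picks the rank-1 tag per year. The helper names are hypothetical. The one design point worth noting is that tag is kept as a string, since some tags (such as 2024) look like numbers.

```python
import csv
import io

# The three sample rows shown in the dataset description.
SAMPLE = """tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
"""

def load_hashtags(text):
    """Parse rows, keeping `tag` as a string because some tags are numeric (e.g. "2024")."""
    return [
        {
            "tag": r["tag"],
            "year": int(r["year"]),
            "rank": int(r["rank"]),
            "posts": int(r["posts"]),
        }
        for r in csv.DictReader(io.StringIO(text))
    ]

def top_by_year(rows):
    """Return a {year: tag} mapping built from the rank-1 row of each year."""
    return {r["year"]: r["tag"] for r in rows if r["rank"] == 1}

rows = load_hashtags(SAMPLE)
print(top_by_year(rows))
```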

    Dataset Statistics

    • Total Entries: 1,830 hashtags
    • Years Covered: 2022-2025
    • Languages: 10+ (English, Spanish, Arabic, Thai, Vietnamese, Portuguese, Chinese, Russian, Korean, and more)
    • Categories: Sports, Entertainment, News, Games, Cultural Events, Politics, Holidays

    Breakdown by Year:

    • 2025: 586 hashtags (most recent data)
    • 2024: 909 hashtags (most comprehensive)
    • 2023: 329 hashtags
    • 2022: 6 hashtags (limited early data)

    🔍 Key Insights

    Top Trending Hashtags by Year

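    | Year | #1 Hashtag | Posts | Theme |
    |:-----|:-----------|:------|:------|
    | 2025 | #2024 | 3,000,000 | Year-in-review |
    | 2024 | #christmas | 3,000,000 | Holiday season |
    | 2023 | #2024 | 2,000,000 | New year anticipation |
    | 2022 | #newyear | 286,000 | New year celebration |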

    Trends

    Hashtags appearing in multiple years (evergreen content):

    • #happynewyear - Present in 5 different contexts
    • #mondaymotivation - Consistent weekly trend across 5 instances
    • #benfica - Sports team trending across 5 periods
    • #newyear - 4 years of coverage
    • #valentinesday - Annual romantic holiday
    • #superbowl - Annual sports event

    2024 Highlights:

    • Elections: #trump (267K), #election2024 (136K), #kamalaharris (97K)
    • Sports: #copaamerica (362K), #olympics (25K), #messi (489K)
    • Entertainment: #squidgame (1M), #deadpool (32K), #billieeilish (199K)
    • Holidays: #christmas (3M), #valentinesday (1M), #diademuertos (956K)

    2023 Highlights:

    • Disney Centennial: #disney100 (829K)
    • Gaming: #fnaf (788K)
    • Cultural: #recuerdame (776K)

    2022 Highlights:

    • Soccer Legend: #pele (117.7K)
    • Viral Trends: #facechange (69.2K)

    Most Popular Categories:

    1. Holidays & Celebrations (30%+): Christmas, New Year, Valentine's Day, Halloween
    2. Sports & Outdoor (20%+): Soccer, NFL, Olympics, Basketball
    3. Entertainment & News (25%+): Movies, TV shows, Celebrity news
    4. Gaming (10%): Squid Game, FNAF, Fortnite, Mobile Legends
    5. Cultural Events (10%): Dia de Muertos, Ramadan, Lunar New Year
    6. Politics & Social (5%): Elections, protests, social movements

    Post Count Distribution:

    • Million+ posts: 8 hashtags (mega-viral content)
    • 500K-1M posts: 15 hashtags (highly viral)
    • 100K-500K posts: 250+ hashtags (popular trends)
    • Under 100K: Majority (niche or emerging trends)
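
    The post-count tiers above can be reproduced with a small bucketing function. This is only a sketch: the description does not say how boundary values are handled, so treating exactly 1,000,000 as "Million+" (and likewise for the other lower bounds) is an assumption. The counts used in the example are the rounded figures quoted in the highlights.

```python
from collections import Counter

def post_tier(posts):
    """Map a post count to the tier labels used in the distribution summary.

    Assumption: each tier includes its lower bound.
    """
    if posts >= 1_000_000:
        return "Million+"
    if posts >= 500_000:
        return "500K-1M"
    if posts >= 100_000:
        return "100K-500K"
    return "Under 100K"

# Rounded post counts quoted in the highlights above.
quoted = {"christmas": 3_000_000, "disney100": 829_000, "trump": 267_000, "olympics": 25_000}

# Count how many of the quoted hashtags fall into each tier.
tiers = Counter(post_tier(n) for n in quoted.values())
```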

  5. TikTok Video Performance Dataset

    • kaggle.com
    zip
    Updated Aug 17, 2024
    Cite
    Muhammad Haseeb (2024). TikTok Video Performance Dataset [Dataset]. https://www.kaggle.com/datasets/haseebindata/tiktok-video-performance-dataset
    Available download formats: zip (2362 bytes)
    Dataset updated
    Aug 17, 2024
    Authors
    Muhammad Haseeb
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains information about TikTok videos, including user interactions and video details. It includes features such as video ID, username, video title, likes, comments, shares, views, and more. This dataset is useful for analyzing video performance and user engagement on TikTok.

    File Information:

    • Format: .csv
    • Rows: 5
    • Columns: 15
    • Size: 1.97 KB

    Columns:

    • Video_ID: Unique identifier for each video.
    • User_ID: Unique identifier for the user who posted the video.
    • Username: Username of the user.
    • Video_Title: Title or description of the video.
    • Category: Category or type of the video.
    • Likes: Number of likes the video received.
    • Comments: Number of comments on the video.
    • Shares: Number of shares of the video.
    • Views: Number of views the video received.
    • Upload_Date: Date when the video was uploaded.
    • Video_Length: Length of the video in seconds.
    • Hashtags: List of hashtags used in the video.
    • User_Followers: Number of followers the user has.
    • User_Following: Number of accounts the user is following.
    • User_Likes: Number of likes the user has given.

    This dataset provides valuable insights into video performance and user engagement, making it useful for various analytical and predictive tasks.

  6. TikTok Viral Trends 2025

    • kaggle.com
    zip
    Updated Sep 16, 2025
    Cite
    Imaad Mahmood (2025). TikTok Viral Trends 2025 [Dataset]. https://www.kaggle.com/datasets/imaadmahmood/tiktok-viral-trends-2025
    Available download formats: zip (2940 bytes)
    Dataset updated
    Sep 16, 2025
    Authors
    Imaad Mahmood
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    TikTok Viral Trends 2025

    September 2025 Viral Video Insights

    Overview

    This dataset, titled TikTok Viral Trends 2025, provides a curated snapshot of 50 trending TikTok videos from September 2025, capturing the platform's dynamic content landscape. Sourced from real-time web analyses and social media insights (e.g., X posts, trend reports from reputable sources like Ramdam, NapoleonCat, and Tokchart), it focuses on viral videos across diverse categories such as Entertainment, Music, Comedy, Lifestyle, Beauty, Sustainability, and Technology. The dataset is designed for data scientists, researchers, and enthusiasts interested in analyzing social media trends, predicting virality, or exploring multimodal machine learning applications (e.g., NLP, time-series, or clustering). It stands out from existing Kaggle datasets by offering fresh, 2025-specific data with rich metadata, including engagement metrics, hashtags, and sound/trend associations.

    Dataset Description

    • Size: 50 records, each representing a trending TikTok video or aggregated trend data from September 2025.
    • Format: CSV (tiktok_data.csv).
    • Source: Aggregated from public web sources and social media posts, ensuring authenticity and compliance with data-sharing guidelines. Specific sources are cited per record (e.g., post:72, web:65).
    • Update: Reflects trends as of September 16, 2025, making it more current than 2023-2024 TikTok datasets on Kaggle.

    Columns

    The dataset contains the following 12 columns:

    • video_id: Unique identifier for each video or trend (integer or hashtag-based).
    • author: Creator username or group (anonymized as "Unknown" where not specified).
    • description: Brief summary of the video content or trend, derived from source context.
    • upload_date: Approximate or exact posting date (YYYY-MM-DD).
    • views: Reported view count (e.g., millions, billions for hashtag aggregates; "N/A" if unavailable).
    • likes: Reported like count (e.g., thousands, millions; "N/A" if unavailable).
    • shares: Share count (often "N/A" due to limited public data).
    • comments: Comment count (often "N/A" due to limited public data).
    • hashtags: Key hashtags associated with the video or trend (e.g., #Kpop, #Viral).
    • category: Inferred content category (e.g., Entertainment, Music, Comedy, Lifestyle, Sustainability, Tech).
    • sound_or_trend: Associated audio track or challenge name driving the trend (e.g., "Soda Pop dance", "JUMP").
    • source: Citation of data origin (e.g., post:72 for X post ID, web:65 for web source ID).
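
    Because views and likes may be reported in abbreviated form or as "N/A", numeric analysis needs a normalization step first. The sketch below assumes the CSV stores such values as strings like "29.4M", "39.3B", or "686.4K" (the forms quoted in this description). The actual file format is not confirmed here, and parse_count is a hypothetical helper.

```python
def parse_count(text):
    """Convert strings such as "29.4M" or "39.3B" to an integer; return None for "N/A"."""
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    value = text.strip().upper()
    if not value or value == "N/A":
        return None
    suffix = value[-1]
    if suffix in multipliers:
        # round() absorbs floating-point noise from the multiplication.
        return round(float(value[:-1]) * multipliers[suffix])
    # Plain numbers, optionally with thousands separators.
    return int(value.replace(",", ""))
```

    For example, parse_count("29.4M") gives 29400000 and parse_count("N/A") gives None, so unavailable metrics can be filtered out before modeling.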

    Key Features

    • Diverse Categories: Includes K-pop (e.g., BLACKPINK, SEVENTEEN), dance challenges (e.g., Espresso Dance), AI-driven content (e.g., Identity Swap), comedy, lifestyle (e.g., SustainableSeptember), and beauty trends, reflecting TikTok's global appeal.
    • High Engagement: Videos with reported metrics show millions of views (e.g., 29.4M for BLACKPINK’s JUMP) and likes, with hashtag trends like #Perfume reaching 39.3B views.
    • Multimodal Potential: Supports text analysis (descriptions, hashtags), numerical analysis (views, likes), and categorical analysis (categories, sounds).
    • Timeliness: Captures September 2025 trends, including seasonal (e.g., Autumn Cozy Challenge) and cultural moments (e.g., K-pop releases, viral memes).

    Potential Use Cases

    This dataset is ideal for a variety of machine learning and data analysis tasks on Kaggle, including but not limited to:

    • Virality Prediction: Use views, likes, and hashtags to train regression or classification models (e.g., XGBoost, neural networks) to predict video success.
    • Trend Analysis: Apply clustering (e.g., K-means) or topic modeling (e.g., LDA) to identify emerging content themes or regional differences.
    • NLP Applications: Analyze descriptions and hashtags with BERT or word embeddings to study sentiment, cultural trends, or influencer impact.
    • Time-Series Forecasting: Leverage upload_date and engagement metrics for temporal analysis of trend lifecycles.
    • Recommendation Systems: Build content recommendation models based on category, sound, or hashtag similarities.
    • Social Media Ethics: Explore AI-driven trends (e.g., deepfake Identity Swaps) for studies on misinformation or content authenticity.

    Data Collection

    • Methodology: Data was aggregated from public web sources (e.g., trend reports, news snippets) and X posts discussing viral TikTok content. No private or restricted data was used, ensuring ethical sourcing.
    • Limitations: Some metrics (e.g., shares, comments) are "N/A" due to limited public availability. View and like counts are reported where available, with aggregates for trends (e.g., 686.4K videos for #Ominous). Exact metrics may vary slightly due to real-time fluctuations.
    • Verification: All entries ...
  7. Popular TikTok Videos, Authors, and Musics

    • kaggle.com
    zip
    Updated Nov 21, 2022
    Cite
    The Devastator (2022). Popular TikTok Videos, Authors, and Musics [Dataset]. https://www.kaggle.com/datasets/thedevastator/popular-tiktok-videos-authors-and-musics
    Available download formats: zip (73379 bytes)
    Dataset updated
    Nov 21, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Popular TikTok Videos, Authors, and Musics

    A Comprehensive Dataset for performing Trending Analysis

    About this dataset

    TikTok is one of the hottest social media platforms out there, and it's only getting bigger. If you're looking to get in on the action, this dataset is for you!

    This dataset contains a collection of videos from TikTok, including information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description. With this data, you can see what types of videos are popular on TikTok and start planning your own viral content!

    How to use the dataset

    1. The dataset contains a collection of videos from the social media platform TikTok.
    2. The videos include information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description.
    3. The dataset also contains information on popular TikTok authors, including their unique ID, nickname, avatar thumbnail, signature, and whether or not their account is verified or private.
    4. Additionally, the dataset includes a list of trending videos on TikTok, as well as the number of likes, shares, comments, and plays each video has received

    Research Ideas

    • Identifying popular TikTok authors to target for scraping videos and liked videos
    • Finding trending videos on TikTok for further analysis
    • Generating a list of videos from the TikTok app that are tagged with the #funny hashtag

    Acknowledgements

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: tiktok_collected_liked_videos.csv

    | Column name | Description |
    |:------------|:------------|
    | user_name | The name of the user who posted the video. (String) |
    | n_likes | The number of likes the video has received. (Integer) |
    | n_shares | The number of shares the video has received. (Integer) |
    | n_comments | The number of comments the video has received. (Integer) |
    | n_plays | The number of times the video has been played. (Integer) |

    File: tiktok_collected_videos.csv

    | Column name | Description |
    |:------------|:------------|
    | user_name | The name of the user who posted the video. (String) |
    | n_likes | The number of likes the video has received. (Integer) |
    | n_shares | The number of shares the video has received. (Integer) |
    | n_comments | The number of comments the video has received. (Integer) |
    | n_plays | The number of times the video has been played. (Integer) |

    File: tiktok_funny_hashtag_videos.csv

    | Column name | Description |
    |:------------|:------------|
    | author_nickname | The author's nickname. (String) |
    | author_avatarThumb | The author's avatar thumbnail. (String) |
    | author_signature | The author's signature. (String) |
    | author_verification | Whether or not the author's account is verified. (Boolean) |
    | author_privateAccount | Whether or not the author's account is private. (Boolean) |
    | author_followingCount | The number of people the author is following. (Integer) |
    | author_followerCount | The number of people following the author. (Integer) |
    | author_heartCount | The number of hearts the author has. (Integer) |
    | author_diggCount | The number of diggs the author has. (Integer) |
    | music_title | The title of the music. (String) |
    | music_playUrl | The play url of the music. (String) |
    | music_coverThumb | The cover thumbnail of the music. (String) |
    | music_authorName | The author name of the music. (String) |
    | music_originality | The originality of the music. (String) |
    | music_duration | The duration of the music. (String) |

    File: trending_authors.csv | Column name | Description ...

  8. hashtag tik tok

    • kaggle.com
    zip
    Updated Feb 17, 2025
    Cite
    Phụng Trương Thu (2025). hashtag tik tok [Dataset]. https://www.kaggle.com/datasets/phngtrngthu/hashtag-tik-tok
    Available download formats: zip (2979 bytes)
    Dataset updated
    Feb 17, 2025
    Authors
    Phụng Trương Thu
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Phụng Trương Thu

    Released under CC0: Public Domain

    Contents

  9. Social Media Viral Content & Engagement Metrics

    • kaggle.com
    zip
    Updated Jan 18, 2026
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ali Hussain (2026). Social Media Viral Content & Engagement Metrics [Dataset]. https://www.kaggle.com/datasets/aliiihussain/social-media-viral-content-and-engagement-metrics
    Available download formats: zip (70865 bytes)
    Dataset updated
    Jan 18, 2026
    Authors
    Ali Hussain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🔥 What Makes Content Go Viral?

    This dataset is designed to help data scientists, analysts, and researchers understand, analyze, and predict viral content across major social media platforms. It captures realistic engagement behavior, sentiment signals, and content attributes that influence virality in today’s digital ecosystem.

    🌐 Platforms Covered

    The dataset includes multi-platform data from:

    • TikTok
    • Instagram
    • X (Twitter)
    • YouTube Shorts

    Each platform is represented with consistent metrics, making cross-platform comparison easy and reliable.

    🧠 Dataset Features (Columns Explained)

    🆔 Post Metadata

    • post_id – Unique identifier for each post
    • platform – Social media platform name
    • content_type – Video, image, carousel, or text
    • topic – Content category (Entertainment, Tech, Sports, etc.)
    • language – Post language (EN, UR, HI, ES, FR)
    • region – Geographic region of the post

    ⏰ Time & Trend Signals

    • post_datetime – Date and time of posting

    Useful for time-series analysis, peak engagement detection, and trend forecasting.

    #️⃣ Hashtags & Sentiment

    • hashtags – Multiple trending hashtags per post
    • sentiment_score – Emotional tone score (-1 = negative, +1 = positive)

    Ideal for NLP tasks, sentiment analysis, and hashtag impact studies.

    📈 Engagement Metrics

    • views – Total views
    • likes – Likes received
    • comments – Number of comments
    • shares – Number of shares

    These metrics allow deep analysis of user interaction patterns.

    ⚙️ Engineered Features

    • engagement_rate – Combined engagement normalized by views
    • is_viral – Binary label indicating viral content

    Perfect for machine learning models and classification tasks.
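
    The description does not give the formulas behind the two engineered features. One plausible reading, sketched below, treats engagement_rate as (likes + comments + shares) / views and derives is_viral from thresholds. Both the formula and the threshold values (min_views, min_rate) are assumptions for illustration, not the dataset author's definitions.

```python
def engagement_rate(views, likes, comments, shares):
    """Combined engagement normalized by views; 0.0 when there are no views."""
    if views <= 0:
        return 0.0
    return (likes + comments + shares) / views

def is_viral(views, rate, min_views=100_000, min_rate=0.05):
    """Hypothetical rule: viral means enough reach and a high enough engagement rate."""
    return views >= min_views and rate >= min_rate

# Example with invented numbers.
rate = engagement_rate(views=200_000, likes=16_000, comments=3_000, shares=1_000)
label = is_viral(200_000, rate)
```

    Comparing values computed this way against the published engagement_rate column is a quick way to check whether the assumed formula matches the data.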

  10. Tik Tok creator by hashtag

    • kaggle.com
    zip
    Updated Apr 11, 2022
    Cite
    Lai Wing Ho (2022). Tik Tok creator by hashtag [Dataset]. https://www.kaggle.com/datasets/laiwingho/tik-tok-creator-by-hashtag
    Available download formats: zip (590 bytes)
    Dataset updated
    Apr 11, 2022
    Authors
    Lai Wing Ho
    Description

    As of January 2022, the hashtag "fyp," which stands for "for you page," was the most used hashtag on TikTok, amassing over 18.57 trillion views across posts using it. The hashtag "viral" ranked second, with approximately 6.3 trillion views on TikTok short-video posts using the hashtag. Posts using the hashtag "duet," which refers to TikTok videos that can be shared, mirrored, and commented on by creators, collected around 2.4 trillion views as of January 2022.

  11. TikTok Video Metadata

    • kaggle.com
    zip
    Updated Jan 22, 2026
    Cite
    Marat Saratov (2026). TikTok Video Metadata [Dataset]. https://www.kaggle.com/datasets/maratsaratov/tiktok-data
    Available download formats: zip (50928 bytes)
    Dataset updated
    Jan 22, 2026
    Authors
    Marat Saratov
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    TikTok Video Metadata Dataset – 700+ Entries

    This dataset contains metadata for over 700 TikTok videos, designed for training and testing machine learning models aimed at predicting video popularity, engagement, and virality. It includes key features such as video duration, text descriptions, hashtags, timestamps, author statistics, sound IDs, and engagement metrics (views and likes).

    Key Features:

    Video Metadata: id_video, duration_seconds, text_part, hashtags

    Author Stats: author_followers, author_likes

    Engagement Metrics: views, likes

    Sound & Time: id_sound, human_time

    Hashtags & Descriptions: Provided as comma-separated strings for easy parsing

    Possible Use Cases:

    Engagement Prediction: Build regression or classification models to predict views and likes.

    Content & Hashtag Analysis: Identify which hashtags and text content correlate with higher engagement.

    Author Influence Study: Explore how author popularity impacts video performance.

    Time-based Analysis: Investigate posting time patterns.

    NLP Applications: Perform text mining on video captions and hashtags.

    Data Notes:

    Contains real TikTok video metadata from various topics and regions.

    Some fields may be empty (e.g., missing text or hashtags).

    Suitable for educational projects, hackathons, and initial research in social media analytics.

    Suggested Tasks:

    Predict likes or views using regression models.

    Classify videos into "viral" vs. "non-viral" based on a views threshold.

    Cluster videos based on hashtags or content themes.

    Analyze the impact of video length and posting time on engagement.
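
    Since hashtags are provided as comma-separated strings, the content and hashtag analysis mentioned above could start with something like the sketch below, which averages views per hashtag. The sample records are invented, and the helper names are hypothetical. Empty or missing hashtag fields are treated as "no hashtags", in line with the note that some fields may be empty.

```python
from collections import defaultdict

def split_hashtags(raw):
    """Split a comma-separated hashtag string into a clean list; empty input gives []."""
    return [tag.strip().lstrip("#").lower() for tag in (raw or "").split(",") if tag.strip()]

def mean_views_by_hashtag(records):
    """Average the `views` of all videos that carry each hashtag."""
    totals = defaultdict(lambda: [0, 0])  # tag -> [sum of views, video count]
    for rec in records:
        # set() avoids counting a video twice if it repeats a hashtag.
        for tag in set(split_hashtags(rec.get("hashtags"))):
            totals[tag][0] += rec["views"]
            totals[tag][1] += 1
    return {tag: total / count for tag, (total, count) in totals.items()}

# Invented records for illustration only.
records = [
    {"hashtags": "dance, fyp", "views": 1000},
    {"hashtags": "fyp", "views": 3000},
    {"hashtags": "", "views": 500},
]
averages = mean_views_by_hashtag(records)
```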

  12. TikTok Trending Metadata

    • kaggle.com
    zip
    Updated Feb 24, 2023
    Cite
    Brad Culbertson (2023). TikTok Trending Metadata [Dataset]. https://www.kaggle.com/datasets/vbradculbertson/tiktok-trending-metadata
    Available download formats: zip (4067303 bytes)
    Dataset updated
    Feb 24, 2023
    Authors
    Brad Culbertson
    Description

    The dataset was originally obtained from TikTok's trending API by a GitHub user named Ivan Tran. It contains metadata on engagement with user-created videos and user profile data. The original create time is in Unix timecode format and is extracted directly from the video id number. TikTok's API has become much more difficult to access recently, so more current data is harder to obtain. The hashtags column contains lists.
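
    The description says the create time comes from the video ID but does not say how. A commonly reported convention is that the upper 32 bits of the numeric ID hold a Unix timestamp. The sketch below relies on that assumption, and also assumes the hashtags column is serialized as a Python-style list literal; neither detail is confirmed by the dataset page. The ID used is synthetic, built from a known timestamp.

```python
import ast
from datetime import datetime, timezone

def created_at_from_video_id(video_id):
    """Read the upper 32 bits of the numeric ID as a Unix timestamp (UTC).

    Assumption: the ID follows the commonly reported timestamp-prefix layout.
    """
    return datetime.fromtimestamp(int(video_id) >> 32, tz=timezone.utc)

def parse_hashtag_list(cell):
    """Turn a stringified list such as "['fyp', 'dance']" back into a Python list."""
    value = ast.literal_eval(cell) if cell else []
    return list(value)

# Synthetic ID built from a known timestamp, not a real video.
ts = int(datetime(2023, 2, 24, tzinfo=timezone.utc).timestamp())
fake_id = (ts << 32) | 12345

created = created_at_from_video_id(fake_id)
tags = parse_hashtag_list("['fyp', 'dance']")
```

    ast.literal_eval is used rather than eval because it only accepts literals, which is safer for untrusted CSV cells.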

  13. MTikGuard Dataset

    • kaggle.com
    zip
    Updated Jun 30, 2025
    Cite
    KusNguyen (2025). MTikGuard Dataset [Dataset]. https://www.kaggle.com/datasets/kusnguyen/extra-dataset
    Available download formats: zip (2137777416 bytes)
    Dataset updated
    Jun 30, 2025
    Authors
    KusNguyen
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset is an extension of the TikHarm dataset, created to enhance multimodal harmful content detection on TikTok. It was developed as part of the MTikGuard system, a real-time moderation pipeline designed to protect young audiences from unsafe TikTok videos.

    🔹 Purpose

    The dataset supplements TikHarm with 775 additional annotated videos, collected from TikTok trending and targeted hashtag queries. These videos were selected to address class imbalance and content diversity gaps in the original dataset, improving model generalization for real-world deployment.

    🔹 Content

    Each video is labeled into one of four categories:

    • Safe
    • Adult Content
    • Harmful Content (e.g., dangerous challenges, graphic violence)
    • Suicide / Self-harm

    🔹 Data Collection & Annotation

    Collection: Automated crawling using Selenium and TikTok Content Scraper, coordinated via Apache Airflow and Apache Kafka.

    Annotation: Conducted via a custom web-based tool, following detailed guidelines to ensure consistency and reliability. Multiple annotators reviewed each video, with disagreements resolved via majority voting.

    Class balance: Oversampling of underrepresented categories (e.g., Suicide, Harmful Content) during collection.
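
    The majority-voting step described under Annotation can be sketched as follows. This is not the authors' tool: the tie rule (return None so the video can be escalated to another reviewer) is a hypothetical choice, since the description does not say how ties were handled.

```python
from collections import Counter

# The four categories listed in the dataset description.
LABELS = ("Safe", "Adult Content", "Harmful Content", "Suicide / Self-harm")

def majority_label(votes):
    """Return the most common label, or None when there are no votes or the top labels tie."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: hypothetical rule, escalate to another reviewer
    return counts[0][0]

final = majority_label(["Safe", "Harmful Content", "Safe"])
```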

    🔹 Applications

    Training and evaluating multimodal classification models for harmful content detection.

    Benchmarking real-time content moderation pipelines.

    Research on multimodal fusion strategies and multi-label classification.

  14. Social Media Sponsorship & Engagement Dataset

    • kaggle.com
    zip
    Updated May 28, 2025
    Cite
    OmenKj (2025). Social Media Sponsorship & Engagement Dataset [Dataset]. https://www.kaggle.com/datasets/omenkj/social-media-sponsorship-and-engagement-dataset
    Available download formats: zip (8047768 bytes)
    Dataset updated
    May 28, 2025
    Authors
    OmenKj
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This social media content dataset simulates realistic influencer posts across multiple popular platforms, reflecting diverse content types, sponsorship details, audience demographics, and engagement metrics. The dataset contains over 52,000 rows representing individual content posts generated over the past two years. It includes a balanced distribution of sponsored and non-sponsored content, with detailed disclosure information to support transparency studies and analyses. The variety of platforms, languages, content categories, and audience demographics makes this dataset ideal for exploring influencer marketing dynamics, content performance analytics, disclosure practices, and audience segmentation in social media research.

    Dataset Features

    id: Unique identifier for each content post (starting from 1).

    platform: The social media platform where the content was posted. Values: YouTube, TikTok, Instagram, Bilibili, RedNote.

    content_id: Unique ID for each content piece (e.g., content_0, content_1, …).

    creator_id: Unique identifier for the content creator, cycling through 5000 distinct creators.

    creator_name: Username of the content creator.

    content_url: URL pointing to the content.

    content_type: Format of the content. Values: video, image, text, mixed.

    content_category: The main theme or niche of the content. Values: beauty, lifestyle, tech.

    post_date: Timestamp of the post, randomly distributed over the past two years.

    language: Language of the content, with probabilities favoring English. Values: English, Chinese, Spanish, Hindi, Japanese.

    content_length: Length of the content in seconds (for video) or word count (for text), varying by content type.

    content_description: Textual description or caption of the content.

    hashtags: A comma-separated string of hashtags used in the post (0 to 5 tags).

    views: Number of views (simulated via a Poisson distribution).

    likes: Number of likes received.

    shares: Number of shares.

    comments_count: Count of comments on the post.

    comments_text: Aggregated text of comments (0 to 5 comments concatenated).

    follower_count: Number of followers the creator had at the time of posting.

    is_sponsored: Boolean indicating whether the post is sponsored.

    disclosure_type: Disclosure type regarding sponsorship for sponsored posts. Values: explicit, implicit, none (non-sponsored always 'none').

    sponsor_name: Name of the sponsoring company if sponsored, else 'Not sponsors'.

    sponsor_category: Sponsorship industry category. Values: cosmetics, electronics, fashion, food, gaming, travel or 'Not sponsors'.

    disclosure_location: Where sponsorship disclosure appears in the post. Values: video, caption, hashtags, none (non-sponsored always 'none').

    audience_age_distribution: Predominant age group of the audience. Values: 13-18, 19-25, 26-35, 36-50, 50+.

    audience_gender_distribution: Predominant gender of the audience. Values: male, female, non-binary, unknown.

    audience_location: Primary geographic location of the audience. Values: USA, China, India, Japan, Brazil, Germany, UK, Russia.
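
    The feature list above states several consistency rules for non-sponsored posts (disclosure fields are always 'none', sponsor fields are 'Not sponsors'). A minimal sketch of checking them is shown below. It assumes each row has already been loaded as a dict and that is_sponsored is a Python bool; the function name and the example rows (including the sponsor name "Acme") are hypothetical.

```python
NOT_SPONSORED = "Not sponsors"  # literal placeholder quoted in the feature list


def sponsorship_issues(row):
    """Return a list of rule violations for one post given as a dict."""
    issues = []
    if row["is_sponsored"]:
        # A sponsored post should carry a real sponsor name.
        if row["sponsor_name"] == NOT_SPONSORED:
            issues.append("sponsored post has no sponsor_name")
    else:
        # Non-sponsored posts: disclosure fields are always 'none'.
        for column in ("disclosure_type", "disclosure_location"):
            if row[column] != "none":
                issues.append(f"non-sponsored post has {column}={row[column]!r}")
        # Non-sponsored posts: sponsor fields hold the placeholder value.
        for column in ("sponsor_name", "sponsor_category"):
            if row[column] != NOT_SPONSORED:
                issues.append(f"non-sponsored post has {column}={row[column]!r}")
    return issues


# Hypothetical rows for illustration only.
consistent_row = {
    "is_sponsored": False,
    "disclosure_type": "none",
    "disclosure_location": "none",
    "sponsor_name": NOT_SPONSORED,
    "sponsor_category": NOT_SPONSORED,
}
inconsistent_row = dict(consistent_row, disclosure_type="explicit", sponsor_name="Acme")
```

    Running sponsorship_issues over every row before analysis gives a quick count of records that break the documented rules.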

  15. YouTube/TikTok Trends Dataset

    • kaggle.com
    zip
    Updated Sep 16, 2025
    Cite
    Tarek Masryo (2025). YouTube/TikTok Trends Dataset [Dataset]. https://www.kaggle.com/datasets/tarekmasryo/youtube-shorts-and-tiktok-trends-2025/code
    Explore at:
    Available download formats: zip (14982241 bytes)
    Dataset updated
    Sep 16, 2025
    Authors
    Tarek Masryo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube Shorts & TikTok Trends 2025

    Overview

    A global dataset capturing short-form video performance across YouTube Shorts and TikTok in 2025.
    It includes over 50,000 video records, available in both raw and machine learning–ready formats.
    Designed for reproducible EDA, dashboarding, and baseline ML modeling on social media engagement dynamics.

    Files Included

    File | Description | Shape
    youtube_shorts_tiktok_trends_2025.csv | Raw video-level data with full feature set | ~48k × ~58
    youtube_shorts_tiktok_trends_2025_ml.csv | ML-ready, cleaned and engineered version | ~50k × 32
    monthly_trends_2025.csv | Monthly aggregates (Jan–Aug 2025) | ~480 × 8
    country_platform_summary_2025.csv | Country × platform summary statistics | ~60 × 14
    top_hashtags_2025.csv | Ranked list of top trending hashtags | ~82 × 18
    top_creators_impact_2025.csv | Creator-level impact and influence metrics | ~1,000 × 20
    DATA_DICTIONARY.csv | Column names and definitions | ~58 × 2

    All files are UTF-8 encoded, cleaned, and schema-aligned for direct analysis.

    Key Columns (ML-Ready File)

    • Identifiers: video_id, platform, country, category, creator_tier
    • Engagement Metrics: views, likes, comments, shares, saves, completions
    • Derived Ratios: engagement_rate = (likes + comments + shares) / views, plus save_rate, share_rate, comment_rate
    • Signals: velocity indicators, rolling statistics, seasonality flags
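
    The derived ratios listed above can be recomputed from the raw counts, for example to verify the ML-ready file. The sketch below uses the engagement_rate formula given in the list; the definitions of save_rate, share_rate, and comment_rate as "metric divided by views" are assumptions, as is returning 0.0 when views is zero. The function names and the example record are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def derived_ratios(video):
    """Compute per-video ratios from raw engagement counts."""
    views = video["views"]
    return {
        # Formula stated in the dataset description.
        "engagement_rate": safe_ratio(
            video["likes"] + video["comments"] + video["shares"], views
        ),
        # Assumed definitions: each metric divided by views.
        "save_rate": safe_ratio(video["saves"], views),
        "share_rate": safe_ratio(video["shares"], views),
        "comment_rate": safe_ratio(video["comments"], views),
    }


# Hypothetical record for illustration only.
example = {"views": 1000, "likes": 80, "comments": 15, "shares": 5, "saves": 20}
ratios = derived_ratios(example)
```

    For this record, engagement_rate is (80 + 15 + 5) / 1000 = 0.1.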

    Recommended Uses

    • EDA: Analyze short-form engagement trends by country, platform, or content type
    • ML Modeling: Classify trend_label or predict engagement_rate and views
    • Dashboarding: Visualize global video trends and creator performance
    • Market Research: Study cultural and regional patterns of viral content

    Notes

    • trend_label is a snapshot trend proxy; baseline models typically reach 25–35% accuracy without temporal features.
    • publish_date_approx is derived and coarse — for trend direction only.
    • The dataset contains metadata only (no media content).

    If you find this dataset helpful, supporting it with an upvote helps others discover it too ✨

  16. books_challenge _tiktok

    • kaggle.com
    zip
    Updated Dec 8, 2021
    Cite
    ayoub chaoui (2021). books_challenge _tiktok [Dataset]. https://www.kaggle.com/datasets/ayoubchaoui/books-challenge-tiktok
    Explore at:
    Available download formats: zip (41161295 bytes)
    Dataset updated
    Dec 8, 2021
    Authors
    ayoub chaoui
    Description

    Context

    TikTok's platform is mostly fueled by viral videos of users doing outlandish, scary, or funny things. On the platform, these trend and meme videos typically come with a hashtag that includes the word challenge. But what is a TikTok challenge and how do you find or create them? Here's everything you need to know.

    This TikTok book challenge was made by @haleyisfearless. It asks you to show your prettiest book, your tiniest book, a book you highly suggest, a book you're currently reading, and one of your favorite books. In the most basic sense, these challenges originate from viral videos, and a TikTok challenge isn't complete without its defining hashtag in the video's description.

    These TikTok challenges are the perfect way to ease into what can be an intimidating social media platform and help you find your fellow book lovers.

    Acknowledgements

    This dataset is generated entirely from TikTok, so we want to thank @haleyisfearless for creating this challenge video.

    Inspiration

    The goal of this project is to make a Python script that takes a video as input and returns all text visible in the video. The videos are TikTok videos, so text can appear anywhere on screen, with different backgrounds, font sizes, etc.

  17. TikTok Data - Amber Heard - Social Media 2022

    • kaggle.com
    zip
    Updated Jul 23, 2022
    Cite
    Amber Heard - Data Social Media Analysis (2022). TikTok Data - Amber Heard - Social Media 2022 [Dataset]. https://www.kaggle.com/datasets/amberhearddata/tiktok-data-amber-heard-social-media-2022
    Explore at:
    Available download formats: zip (660350769 bytes)
    Dataset updated
    Jul 23, 2022
    Authors
    Amber Heard - Data Social Media Analysis
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Amber Heard TikTok data from 2022, collected under 57 hashtags. Videos come with full metrics and information fields. The data concerns the disinformation operation harming human rights activist Amber Heard. Comments on each post are included in the scrape.

    TikTok hashtags: 57 in total, grouped as positive and neutral, and negative.

    Positive and Neutral:

    1. amberheard
    2. amberheardmera
    3. amberheardisinnocent
    4. amberheardaquaman
    5. amberheardisasurvivor
    6. amberheardisavictim
    7. ibelieveamberheard
    8. darvodepp
    9. istandwithamber
    10. istandwithamberheard
    11. loveamberheard
    12. wearewithyouamberheard
    13. westandwithamberheard
    14. standwithamberheard
    15. teamah
    16. teamamberheard
    17. justiceforamberheard
    18. johnnydeppisawifebeater
    19. johnnydeppisguilty

    Negative:

    1. aclusupportsabusers
    2. amberhearddoesnotspeakforme
    3. amberheardforjail
    4. amberheardforprison
    5. amberheardisacriminal
    6. amberheardisafraud
    7. amberheardisanabuser
    8. amberheardisapsycopath
    9. amberheardisguilty
    10. amberheardisoverparty
    11. amberheardjohnnydepp
    12. amberheardperjury
    13. amberheardslawyersucks
    14. amberheardtrial
    15. amberheard💩
    16. amberheard🤡
    17. amberheard🤮
    18. amberpoop
    19. amberturd
    20. boycottaquaman2
    21. boycottloreal
    22. boycottwarnerbros
    23. boycottwarnerbrothers
    24. deppheardtrial
    25. deppvheardtrial
    26. deppvsheard
    27. fireamberheard
    28. istandbyjohnnydepp
    29. johnnydepp
    30. johnnydeppamberheard
    31. johnnydeppisinnocent
    32. johnnydepptrial
    33. johnnydeppvsamberheard
    34. justiceforjohnnydepp
    35. putamberheardinjail
    36. recastmera
    37. teamjd
    38. teamjohnnydepp

    Each hashtag feed shows 1,000 videos per day of collection.

    From Public Research Study: https://github.com/RescueSocialTech/Amber-Heard_Disinformation_Operations_Bots

  18. socialmedia

    • kaggle.com
    zip
    Updated Jul 30, 2023
    Cite
    Anoop Johny (2023). socialmedia [Dataset]. https://www.kaggle.com/datasets/anoopjohny/socialmedia
    Explore at:
    Available download formats: zip (4736 bytes)
    Dataset updated
    Jul 30, 2023
    Authors
    Anoop Johny
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset provides a comprehensive and diverse snapshot of social media users and their engagements across various popular platforms such as Instagram, Twitter, Facebook, YouTube, Pinterest, TikTok, and Spotify. With 100 rows of anonymized data, it offers valuable insights into the dynamic world of social media usage. 😀

    Each row in the dataset represents a unique user with a designated User ID and Username to ensure anonymity. Alongside user-specific details, the dataset captures essential information, including the platform being used, the post's content, timestamp, and media type (text, image, or video). Additionally, it tracks engagement metrics such as likes, comments, shares/retweets, and user interactions, providing an overview of the user's popularity and social impact. 💬

    The dataset also includes pertinent user attributes, such as account creation date, privacy settings, number of followers, and following. The users' profiles are further enriched with demographic characteristics, including anonymized representations of their age group and gender. 🗨️

    Hashtags, mentions, media URLs, post URLs, and self-reported location contribute to understanding user interests, content themes, and geographic distribution. Moreover, users' bios and language preferences offer insights into their passions, activities, and linguistic communication on the platforms.

  19. Social Media Engagement Dataset

    • kaggle.com
    zip
    Updated Jan 30, 2026
    Cite
    Aviral Trivedi (2026). Social Media Engagement Dataset [Dataset]. https://www.kaggle.com/datasets/aviral342/social-media-engagement-dataset/data
    Explore at:
    Available download formats: zip (188589 bytes)
    Dataset updated
    Jan 30, 2026
    Authors
    Aviral Trivedi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📱 About Dataset

    Overview

    This Social Media Engagement Dataset contains comprehensive engagement metrics from 5,000 social media posts across six major platforms: Instagram, Twitter, Facebook, LinkedIn, TikTok, and YouTube. The dataset spans over 2 years (2024-2025) and provides valuable insights into content performance, audience engagement patterns, and influencer analytics.

    Dataset Contents

    The dataset includes 20 detailed features covering various aspects of social media engagement:

    Post Information

    • Post_ID: Unique identifier for each post
    • Timestamp: Date and time when the post was published
    • Platform: Social media platform (Instagram, Twitter, Facebook, LinkedIn, TikTok, YouTube)
    • Content_Type: Type of content (Photo, Video, Reel, Tweet, Story, etc.)
    • Category: Content category (Technology, Fashion, Food, Travel, Fitness, Education, Entertainment, Business, Lifestyle, Gaming, Health, Sports)

    Engagement Metrics

    • Likes: Number of likes/reactions received
    • Comments: Number of comments on the post
    • Shares: Number of shares/retweets/reposts
    • Views: Total number of views
    • Saves: Number of bookmarks/saves
    • Engagement_Rate: Calculated engagement rate percentage

    Account Information

    • Follower_Count: Number of followers of the account
    • Influencer_Tier: Classification (Nano, Micro, Mid-tier, Macro)
    • Is_Verified: Whether the account is verified (True/False)

    Content Characteristics

    • Hashtag_Count: Number of hashtags used
    • Content_Length: Length in characters (text) or seconds (video)
    • Sentiment: Sentiment analysis (Positive, Neutral, Negative)
    • Has_Media: Whether post contains media (True/False)

    Temporal Features

    • Hour_of_Day: Hour when the post was published (0-23)
    • Day_of_Week: Day of the week (Monday-Sunday)

    Use Cases

    This dataset is perfect for:

    • 📊 Predictive Analytics: Build ML models to predict engagement rates
    • 📈 Data Visualization: Create insightful dashboards and charts
    • 🤖 Machine Learning: Classification, regression, and clustering tasks
    • ⏰ Time Series Analysis: Analyze posting patterns and optimal timing
    • 🎯 Content Strategy: Optimize content strategy based on data insights
    • 🔍 Sentiment Analysis: Study correlation between sentiment and engagement
    • 📱 Platform Comparison: Compare performance across different platforms
    • 💼 Influencer Marketing: Analyze influencer tier performance

    Technical Details

    • Format: CSV
    • Size: ~651 KB
    • Rows: 5,000
    • Columns: 20
    • Time Period: January 2024 - December 2025
    • Missing Values: None

    Potential Research Questions

    • What time of day generates the most engagement?
    • Which platform has the highest engagement rates?
    • How does content type affect performance?
    • Does verified status impact engagement?
    • What's the optimal hashtag count?
    • How does sentiment correlate with engagement?

    Notes

    • Engagement metrics are platform-realistic and proportional
    • All data is synthetically generated for educational and research purposes
    • Suitable for beginners and advanced data scientists

  20. Movie Dataset - 800 movies

    • kaggle.com
    zip
    Updated Apr 13, 2025
    Cite
    Seniru Hasith (2025). Movie Dataset - 800 movies [Dataset]. https://www.kaggle.com/datasets/seniruhasith/movie-dataset-800-movies/code
    Explore at:
    Available download formats: zip (96241 bytes)
    Dataset updated
    Apr 13, 2025
    Authors
    Seniru Hasith
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🎬 Movie Success Prediction Dataset

    This dataset was curated to support machine learning models that predict movie success based on a wide range of multi-modal features, including cast popularity, sentiment analysis, audio-visual cues, social media engagement, and metadata such as budget and IMDb rating.

    📦 Dataset Overview

    The dataset consists of 36 engineered features extracted from various sources:

    • Cast and Crew Insights (e.g., popularity trends, number of cast members)
    • Sentiment Analysis from YouTube Comments using VADER
    • Audio Features from movie trailers using VGGish 3
    • Video Features using ResNet-based frame analysis
    • TikTok Popularity Signals (hashtags, views, engagement rate)
    • Movie Metadata (e.g., budget, IMDb rating)

    Each row represents one movie. The dataset is ideal for classification or regression tasks related to box office success, revenue prediction, or audience engagement forecasting.

    📊 Feature Mapping

    Feature Code | Feature Name
    Feature_1 | cast_trend_1
    Feature_2 | cast_trend_2
    Feature_3 | cast_trend_3
    Feature_4 | avg_cast_popularity
    Feature_5 | top_cast_popularity
    Feature_6 | genre_score
    Feature_7 | positive_sentiment
    Feature_8 | neutral_sentiment
    Feature_9 | negative_sentiment
    Feature_10 | num_youtube_comments
    Feature_11 | num_cast_members
    Feature_12 | num_upcoming_movies
    Feature_13 | avg_upcoming_popularity
    Feature_14 | max_upcoming_popularity
    Feature_15 | tiktok_hashtag_views
    Feature_16 | tiktok_video_count
    Feature_17 | tiktok_total_likes
    Feature_18 | tiktok_total_comments
    Feature_19 | tiktok_total_shares
    Feature_20 | tiktok_engagement_rate
    Feature_21 | audio_tempo
    Feature_22 | audio_energy_mean
    Feature_23 | audio_energy_variance
    Feature_24 | audio_spectral_centroid_mean
    Feature_25 | audio_spectral_rolloff_mean
    Feature_26 | video_brightness_mean
    Feature_27 | video_colorfulness_mean
    Feature_28 | video_scene_change_rate
    Feature_29 | video_emotion_happy
    Feature_30 | video_emotion_sad
    Feature_31 | imdb_rating
    Feature_32 | budget
    Feature_33 | log_budget
    Feature_34 | sqrt_budget
    Feature_35 | budget_squared
    Feature_36 | budget_rating_interaction

    🛠️ Feature Engineering Highlights

    • Audio features were extracted using the VGGish 3 model, widely used in speech emotion recognition tasks.
    • Video features were obtained from a ResNet-based model analyzing brightness, scene change rate, colorfulness, and emotion cues.
    • Sentiment scores were derived from YouTube comments using VADER, capturing positive, neutral, and negative sentiment proportions.
    • TikTok engagement metrics were collected using hashtag data, capturing likes, views, shares, and overall engagement rate.
    • Budget transformations such as log, square root, and squared values are included, along with an interaction feature with IMDb rating.
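
    The budget transformations in the last bullet could be reproduced along these lines. This is a sketch under stated assumptions, not the dataset author's code: the source does not say which logarithm was used (log1p is chosen here so a zero budget stays defined), and the interaction feature is assumed to be the plain product of budget and IMDb rating. The function name and the example values are hypothetical.

```python
import math


def budget_features(budget, imdb_rating):
    """Derive the budget-based features named in the feature mapping."""
    return {
        "budget": budget,
        # Assumption: natural log of (1 + budget); the source only says "log".
        "log_budget": math.log1p(budget),
        "sqrt_budget": math.sqrt(budget),
        "budget_squared": budget ** 2,
        # Assumption: interaction is the product of budget and rating.
        "budget_rating_interaction": budget * imdb_rating,
    }


# Hypothetical movie: a budget of 1,000,000 and an IMDb rating of 7.5.
features = budget_features(1_000_000, 7.5)
```

    Because budget, its square, and the interaction term differ by many orders of magnitude, scaling these columns before fitting a linear model is usually worthwhile.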

    💡 Potential Use-Cases

    • Predict box office revenue or success labels
    • Analyze which audio-visual cues correlate with public interest
    • Build early-stage predictors of movie success using trailers and social signals
    • Inform marketing strategies using real-time sentiment and TikTok trends

    📥 Data Sources

    • IMDb for metadata
    • YouTube (comments and trailers) for sentiment and audio/visual analysis
    • TikTok for hashtag popularity and engagement stats
    • In-house processing for video/audio feature extraction using ResNet and VGGish 3

    🚀 Whether you're working on predictive modeling, multimedia analysis, or social signal correlation, this dataset provides a diverse feature set to explore what makes a movie successful.

Cite
Muhammad Anas Mahmood (2022). TikTok Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadanasmahmood/tiktok-dataset

TikTok Dataset

TikTok popular hashtags dataset

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (733532 bytes)
Dataset updated
Jul 27, 2022
Authors
Muhammad Anas Mahmood
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

This dataset covers popular hashtags on TikTok. It includes the author name, author ID, author signature, comment count, hashtag details, URL, and share count. The scraped hashtags include meme, funny, humor, comedy, education, lol, dance, song, and music, among others.
