13 datasets found
  1. TikTok Dataset

    • kaggle.com
    zip
    Updated Jul 27, 2022
    Cite
    Muhammad Anas Mahmood (2022). TikTok Dataset [Dataset]. https://www.kaggle.com/datasets/muhammadanasmahmood/tiktok-dataset
    Explore at:
    Available download formats: zip (733532 bytes)
    Dataset updated
    Jul 27, 2022
    Authors
    Muhammad Anas Mahmood
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset covers popular hashtags on TikTok. It includes the author name, author ID, author signature, comment count, hashtag details, URL, and share count. The scraped hashtags are meme, funny, humor, comedy, education, lol, dance, song, music, etc.

  2. Tiktok 2025 Dataset

    • kaggle.com
    zip
    Updated Jun 13, 2025
    Cite
    Haziq Halifi (2025). Tiktok 2025 Dataset [Dataset]. https://www.kaggle.com/datasets/haziqhalifi/tiktok-2025-dataset
    Explore at:
    Available download formats: zip (889553 bytes)
    Dataset updated
    Jun 13, 2025
    Authors
    Haziq Halifi
    Description

    This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.

    Here's a breakdown of the columns included in this dataset:

    • video_id: A unique identifier for each TikTok video.
    • author: The username or handle of the TikTok account that posted the video.
    • description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
    • likes: The number of likes the video has received.
    • comments: The number of comments on the video.
    • shares: The number of times the video has been shared.
    • plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
    • hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
    • music: Information about the background music or sound used in the video.
    • create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
    • video_url: The direct URL to the TikTok video.
    • fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
    • views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
    • posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)

    Potential Uses of This Dataset:

    • Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
    • Trend Identification: Identify trending topics, music, and creators on TikTok.
    • Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
    • Creator Analysis: Study the posting habits and performance of various TikTok creators.
    • Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.

    Notes on Data Quality:

    The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis. The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis. It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics.

    This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
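
    The data-quality notes above recommend relying on plays and create_time, while views and posted_time appear to overlap with them. A minimal sketch of one way to fall back on the overlapping column, assuming rows have already been read into dictionaries keyed by the column names above and that missing values arrive as empty strings or None. The helpers coalesce and normalize are hypothetical and are not part of the dataset; the sample rows are made up.

```python
from typing import Optional


def coalesce(row: dict, primary: str, fallback: str) -> Optional[str]:
    """Return row[primary] unless it is missing, else row[fallback], else None."""
    for key in (primary, fallback):
        value = row.get(key)
        if value is not None and str(value).strip() != "":
            return value
    return None


def normalize(row: dict) -> dict:
    """Collapse the overlapping columns into one play count and one timestamp."""
    return {
        "video_id": row.get("video_id"),
        "plays": coalesce(row, "plays", "views"),
        "create_time": coalesce(row, "create_time", "posted_time"),
    }


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"video_id": "a1", "plays": "1200", "views": "", "create_time": "2025-06-01", "posted_time": ""},
    {"video_id": "b2", "plays": "", "views": "87", "create_time": "", "posted_time": "2025-06-02"},
    {"video_id": "c3", "plays": None, "views": None, "create_time": None, "posted_time": None},
]
cleaned = [normalize(r) for r in rows]
```

    Whether views is a trustworthy substitute for plays depends on how the two were collected, so the fallback is worth checking on rows where both are present before using it.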

  3. 🚀 Viral Social Media Trends & Engagement Analysis

    • kaggle.com
    zip
    Updated May 23, 2025
    Cite
    Atharva Soundankar (2025). 🚀 Viral Social Media Trends & Engagement Analysis [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/viral-social-media-trends-and-engagement-analysis
    Explore at:
    Available download formats: zip (230834 bytes)
    Dataset updated
    May 23, 2025
    Authors
    Atharva Soundankar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:

    • Trend analysis 🔍
    • Sentiment modeling 💭
    • Understanding influencer marketing 📈

    Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍

  4. TikTok Viral Trends 2025

    • kaggle.com
    zip
    Updated Sep 16, 2025
    Cite
    Imaad Mahmood (2025). TikTok Viral Trends 2025 [Dataset]. https://www.kaggle.com/datasets/imaadmahmood/tiktok-viral-trends-2025
    Explore at:
    Available download formats: zip (2940 bytes)
    Dataset updated
    Sep 16, 2025
    Authors
    Imaad Mahmood
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    TikTok Viral Trends 2025

    September 2025 Viral Video Insights

    Overview

    This dataset, titled TikTok Viral Trends 2025, provides a curated snapshot of 50 trending TikTok videos from September 2025, capturing the platform's dynamic content landscape. Sourced from real-time web analyses and social media insights (e.g., X posts, trend reports from reputable sources like Ramdam, NapoleonCat, and Tokchart), it focuses on viral videos across diverse categories such as Entertainment, Music, Comedy, Lifestyle, Beauty, Sustainability, and Technology. The dataset is designed for data scientists, researchers, and enthusiasts interested in analyzing social media trends, predicting virality, or exploring multimodal machine learning applications (e.g., NLP, time-series, or clustering). It stands out from existing Kaggle datasets by offering fresh, 2025-specific data with rich metadata, including engagement metrics, hashtags, and sound/trend associations.

    Dataset Description

    • Size: 50 records, each representing a trending TikTok video or aggregated trend data from September 2025.
    • Format: CSV (tiktok_data.csv).
    • Source: Aggregated from public web sources and social media posts, ensuring authenticity and compliance with data-sharing guidelines. Specific sources are cited per record (e.g., post:72, web:65).
    • Update: Reflects trends as of September 16, 2025, making it more current than 2023-2024 TikTok datasets on Kaggle.

    Columns

    The dataset contains the following 12 columns:

    • video_id: Unique identifier for each video or trend (integer or hashtag-based).
    • author: Creator username or group (anonymized as "Unknown" where not specified).
    • description: Brief summary of the video content or trend, derived from source context.
    • upload_date: Approximate or exact posting date (YYYY-MM-DD).
    • views: Reported view count (e.g., millions, billions for hashtag aggregates; "N/A" if unavailable).
    • likes: Reported like count (e.g., thousands, millions; "N/A" if unavailable).
    • shares: Share count (often "N/A" due to limited public data).
    • comments: Comment count (often "N/A" due to limited public data).
    • hashtags: Key hashtags associated with the video or trend (e.g., #Kpop, #Viral).
    • category: Inferred content category (e.g., Entertainment, Music, Comedy, Lifestyle, Sustainability, Tech).
    • sound_or_trend: Associated audio track or challenge name driving the trend (e.g., "Soda Pop dance", "JUMP").
    • source: Citation of data origin (e.g., post:72 for X post ID, web:65 for web source ID).
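
    Because views and likes are reported in mixed units with "N/A" for unavailable values, numeric analysis needs a normalization step first. A minimal sketch, assuming the counts are stored as strings with K, M, or B suffixes such as the 29.4M and 39.3B figures quoted in this description. The exact string format in tiktok_data.csv is an assumption, and parse_count is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

# Multipliers for the K/M/B suffixes assumed to appear in the reported counts.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> Optional[int]:
    """Convert strings such as '29.4M' or '39.3B' to integers; 'N/A' becomes None."""
    cleaned = text.strip().replace(",", "")
    if not cleaned or cleaned.upper() == "N/A":
        return None
    multiplier = 1
    suffix = cleaned[-1].upper()
    if suffix in SUFFIXES:
        multiplier = SUFFIXES[suffix]
        cleaned = cleaned[:-1]
    try:
        # Decimal avoids binary floating-point rounding on values like 29.4.
        return int(Decimal(cleaned) * multiplier)
    except (InvalidOperation, ValueError, OverflowError):
        # Anything that is not a finite number is treated as missing.
        return None
```

    Returning None for unparseable values keeps missing metrics distinguishable from a genuine zero.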

    Key Features

    • Diverse Categories: Includes K-pop (e.g., BLACKPINK, SEVENTEEN), dance challenges (e.g., Espresso Dance), AI-driven content (e.g., Identity Swap), comedy, lifestyle (e.g., SustainableSeptember), and beauty trends, reflecting TikTok's global appeal.
    • High Engagement: Videos with reported metrics show millions of views (e.g., 29.4M for BLACKPINK’s JUMP) and likes, with hashtag trends like #Perfume reaching 39.3B views.
    • Multimodal Potential: Supports text analysis (descriptions, hashtags), numerical analysis (views, likes), and categorical analysis (categories, sounds).
    • Timeliness: Captures September 2025 trends, including seasonal (e.g., Autumn Cozy Challenge) and cultural moments (e.g., K-pop releases, viral memes).

    Potential Use Cases

    This dataset is ideal for a variety of machine learning and data analysis tasks on Kaggle, including but not limited to:

    • Virality Prediction: Use views, likes, and hashtags to train regression or classification models (e.g., XGBoost, neural networks) to predict video success.
    • Trend Analysis: Apply clustering (e.g., K-means) or topic modeling (e.g., LDA) to identify emerging content themes or regional differences.
    • NLP Applications: Analyze descriptions and hashtags with BERT or word embeddings to study sentiment, cultural trends, or influencer impact.
    • Time-Series Forecasting: Leverage upload_date and engagement metrics for temporal analysis of trend lifecycles.
    • Recommendation Systems: Build content recommendation models based on category, sound, or hashtag similarities.
    • Social Media Ethics: Explore AI-driven trends (e.g., deepfake Identity Swaps) for studies on misinformation or content authenticity.

    Data Collection

    • Methodology: Data was aggregated from public web sources (e.g., trend reports, news snippets) and X posts discussing viral TikTok content. No private or restricted data was used, ensuring ethical sourcing.
    • Limitations: Some metrics (e.g., shares, comments) are "N/A" due to limited public availability. View and like counts are reported where available, with aggregates for trends (e.g., 686.4K videos for #Ominous). Exact metrics may vary slightly due to real-time fluctuations.
    • Verification: All entries ...
  5. Tiktok Trending Hashtags

    • kaggle.com
    zip
    Updated Dec 1, 2025
    Cite
    Ronan Takizawa (2025). Tiktok Trending Hashtags [Dataset]. https://www.kaggle.com/datasets/ronantakizawa/tiktok-trending-hashtags
    Explore at:
    Available download formats: zip (18358 bytes)
    Dataset updated
    Dec 1, 2025
    Authors
    Ronan Takizawa
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    TikTok Trending Hashtags (2022-2025)

    A comprehensive dataset of trending hashtags on TikTok from 2022 to 2025, containing 1,830 unique hashtag entries across multiple years, languages, and cultural contexts.

    📊 Dataset Description

    This dataset captures trending hashtags from TikTok's Creative Center, providing insights into viral content, cultural moments, and global events from 2022 to 2025.

    Data Source: TikTok Creative Center - Popular Hashtags

    Dataset Structure

    tag,year,rank,posts
    2024,2025,1,3000000
    2025,2025,2,2000000
    valentinesday,2025,3,1000000
    ...
    

    Columns:

    • tag (string): The hashtag name without the # symbol
    • year (integer): The year the hashtag was trending (2022-2025)
    • rank (integer): Rank within that year based on post count (1 = highest)
    • posts (integer): Total number of posts using this hashtag
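
    The structure above can be read with Python's standard csv module. A minimal sketch using only the three sample rows shown in this description; a real run would open the downloaded CSV file, whose name is not given here. load_rows and top_tag_per_year are hypothetical helpers.

```python
import csv
import io

# The three sample rows shown in the dataset description.
SAMPLE = """tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
"""


def load_rows(handle):
    """Parse rows, converting the numeric columns but leaving tag as a string."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "tag": record["tag"],  # keep '2024' as text, not the integer 2024
            "year": int(record["year"]),
            "rank": int(record["rank"]),
            "posts": int(record["posts"]),
        })
    return rows


def top_tag_per_year(rows):
    """Return {year: tag} for the best-ranked hashtag of each year."""
    best = {}
    for row in rows:
        current = best.get(row["year"])
        if current is None or row["rank"] < current["rank"]:
            best[row["year"]] = row
    return {year: row["tag"] for year, row in best.items()}


rows = load_rows(io.StringIO(SAMPLE))
top = top_tag_per_year(rows)
```

    Some tags, such as 2024 and 2025, look numeric, so tools that infer column types may need to be told to read tag as text.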

    Dataset Statistics

    • Total Entries: 1,830 hashtags
    • Years Covered: 2022-2025
    • Languages: 10+ (English, Spanish, Arabic, Thai, Vietnamese, Portuguese, Chinese, Russian, Korean, and more)
    • Categories: Sports, Entertainment, News, Games, Cultural Events, Politics, Holidays

    Breakdown by Year:

    • 2025: 586 hashtags (most recent data)
    • 2024: 909 hashtags (most comprehensive)
    • 2023: 329 hashtags
    • 2022: 6 hashtags (limited early data)

    🔍 Key Insights

    Top Trending Hashtags by Year

    Year  #1 Hashtag  Posts      Theme
    2025  #2024       3,000,000  Year-in-review
    2024  #christmas  3,000,000  Holiday season
    2023  #2024       2,000,000  New year anticipation
    2022  #newyear    286,000    New year celebration

    Trends

    Hashtags appearing in multiple years (evergreen content):

    • #happynewyear - Present in 5 different contexts
    • #mondaymotivation - Consistent weekly trend across 5 instances
    • #benfica - Sports team trending across 5 periods
    • #newyear - 4 years of coverage
    • #valentinesday - Annual romantic holiday
    • #superbowl - Annual sports event

    2024 Highlights:

    • Elections: #trump (267K), #election2024 (136K), #kamalaharris (97K)
    • Sports: #copaamerica (362K), #olympics (25K), #messi (489K)
    • Entertainment: #squidgame (1M), #deadpool (32K), #billieeilish (199K)
    • Holidays: #christmas (3M), #valentinesday (1M), #diademuertos (956K)

    2023 Highlights:

    • Disney Centennial: #disney100 (829K)
    • Gaming: #fnaf (788K)
    • Cultural: #recuerdame (776K)

    2022 Highlights:

    • Soccer Legend: #pele (117.7K)
    • Viral Trends: #facechange (69.2K)

    Most Popular Categories:

    1. Holidays & Celebrations (30%+): Christmas, New Year, Valentine's Day, Halloween
    2. Sports & Outdoor (20%+): Soccer, NFL, Olympics, Basketball
    3. Entertainment & News (25%+): Movies, TV shows, Celebrity news
    4. Gaming (10%): Squid Game, FNAF, Fortnite, Mobile Legends
    5. Cultural Events (10%): Dia de Muertos, Ramadan, Lunar New Year
    6. Politics & Social (5%): Elections, protests, social movements

    Post Count Distribution:

    • Million+ posts: 8 hashtags (mega-viral content)
    • 500K-1M posts: 15 hashtags (highly viral)
    • 100K-500K posts: 250+ hashtags (popular trends)
    • Under 100K: Majority (niche or emerging trends)

  6. Tik Tok creator by hashtag

    • kaggle.com
    zip
    Updated Apr 11, 2022
    Cite
    Lai Wing Ho (2022). Tik Tok creator by hashtag [Dataset]. https://www.kaggle.com/datasets/laiwingho/tik-tok-creator-by-hashtag
    Explore at:
    Available download formats: zip (590 bytes)
    Dataset updated
    Apr 11, 2022
    Authors
    Lai Wing Ho
    Description

    As of January 2022, the hashtag "fyp," which stands for "for you page," was the most used hashtag on TikTok, amassing over 18.57 trillion views across posts using it. The hashtag "viral" ranked second, with approximately 6.3 trillion views on TikTok short-video posts using the hashtag. Posts using the hashtag "duet," which refers to TikTok videos that can be shared, mirrored, and commented on by creators, collected around 2.4 trillion views as of January 2022.

  7. books_challenge _tiktok

    • kaggle.com
    zip
    Updated Dec 8, 2021
    Cite
    ayoub chaoui (2021). books_challenge _tiktok [Dataset]. https://www.kaggle.com/datasets/ayoubchaoui/books-challenge-tiktok
    Explore at:
    Available download formats: zip (41161295 bytes)
    Dataset updated
    Dec 8, 2021
    Authors
    ayoub chaoui
    Description

    Context

    TikTok's platform is mostly fueled by viral videos of users doing outlandish, scary, or funny things. On the platform, these trend and meme videos typically come with a hashtag that includes the word challenge. But what is a TikTok challenge and how do you find or create them? Here's everything you need to know.

    This TikTok book challenge was created by @haleyisfearless. It asks you to show your prettiest book, your tiniest book, a book you highly suggest, a book you're currently reading, and one of your favorite books. In the most basic sense, these challenges originate from viral TikTok videos, and a challenge isn't complete without its defining hashtag in the video's description.

    These TikTok challenges are the perfect way to ease into what can be an intimidating social media platform and help you find your fellow book lovers.

    Acknowledgements

    This dataset is generated entirely from TikTok, so we want to thank @haleyisfearless for creating this challenge video.

    Inspiration

    The goal of this project is to build a Python script that takes a video as input and returns all text visible in the video. The videos are TikTok videos, so text can appear anywhere on screen, with varying backgrounds, font sizes, etc.

  8. socialmedia

    • kaggle.com
    zip
    Updated Jul 30, 2023
    Cite
    Anoop Johny (2023). socialmedia [Dataset]. https://www.kaggle.com/datasets/anoopjohny/socialmedia
    Explore at:
    Available download formats: zip (4736 bytes)
    Dataset updated
    Jul 30, 2023
    Authors
    Anoop Johny
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset provides a comprehensive and diverse snapshot of social media users and their engagements across various popular platforms such as Instagram, Twitter, Facebook, YouTube, Pinterest, TikTok, and Spotify. With 100 rows of anonymized data, it offers valuable insights into the dynamic world of social media usage. 😀

    Each row in the dataset represents a unique user with a designated User ID and Username to ensure anonymity. Alongside user-specific details, the dataset captures essential information, including the platform being used, the post's content, timestamp, and media type (text, image, or video). Additionally, it tracks engagement metrics such as likes, comments, shares/retweets, and user interactions, providing an overview of the user's popularity and social impact. 💬

    The dataset also includes pertinent user attributes, such as account creation date, privacy settings, number of followers, and following. The users' profiles are further enriched with demographic characteristics, including anonymized representations of their age group and gender. 🗨️

    Hashtags, mentions, media URLs, post URLs, and self-reported location contribute to understanding user interests, content themes, and geographic distribution. Moreover, users' bios and language preferences offer insights into their passions, activities, and linguistic communication on the platforms.

  9. COVID-19 Sentiment: 500K Instagram Posts (2020-24)

    • kaggle.com
    zip
    Updated Oct 21, 2024
    Cite
    Nirmalya Thakur, PhD (2024). COVID-19 Sentiment: 500K Instagram Posts (2020-24) [Dataset]. https://www.kaggle.com/datasets/thakurnirmalya/covid-19-sentiment-500k-instagram-posts-2020-24
    Explore at:
    Available download formats: zip (118444389 bytes)
    Dataset updated
    Oct 21, 2024
    Authors
    Nirmalya Thakur, PhD
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are present in 161 different languages out of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), Turkish (4632 posts)

    There are 535,021 distinct hashtags in this dataset with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), #coronavirusoutbreak (34567 posts)

    The following is a description of the attributes present in this dataset:

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
    • Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
    • Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
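
    The Date and Sentiment attributes above are enough for a first look at how sentiment changes over time. A minimal sketch that tallies sentiment labels per publication year, assuming the file is a CSV whose header matches the attribute names listed above (the exact header spelling is an assumption). The sample rows are made up and sentiment_by_year is a hypothetical helper.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical rows in the attribute layout described above; not real dataset content.
SAMPLE = """Post ID,Post Description,Date,Language code,Full Language,Sentiment
1,example text,03/15/2020,en,English,negative
2,example text,07/04/2021,es,Spanish,positive
3,example text,11/30/2021,en,English,neutral
4,example text,01/02/2021,en,English,positive
"""


def sentiment_by_year(handle):
    """Count sentiment labels per publication year."""
    counts = Counter()
    for record in csv.DictReader(handle):
        # Dates are documented as MM/DD/YYYY.
        year = datetime.strptime(record["Date"], "%m/%d/%Y").year
        counts[(year, record["Sentiment"])] += 1
    return counts


counts = sentiment_by_year(io.StringIO(SAMPLE))
```

    Adding record["Language code"] to the counter key gives the per-language breakdown that several of the research questions ask about.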

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    • How does sentiment toward COVID-19 vary across different languages?
    • How has public sentiment toward COVID-19 evolved from 2020 to the present?
    • How do cultural differences affect social media discourse about COVID-19 across various languages?
    • How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
    • How effective were public health campaigns in shifting public sentiment in different languages?
    • What patterns of vaccine hesitancy or support are present in different languages?
    • How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
    • What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
    • How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
    • What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  10. MTikGuard Dataset

    • kaggle.com
    zip
    Updated Jun 30, 2025
    Cite
    KusNguyen (2025). MTikGuard Dataset [Dataset]. https://www.kaggle.com/datasets/kusnguyen/extra-dataset
    Explore at:
    Available download formats: zip (2137777416 bytes)
    Dataset updated
    Jun 30, 2025
    Authors
    KusNguyen
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset is an extension of the TikHarm dataset, created to enhance multimodal harmful content detection on TikTok. It was developed as part of the MTikGuard system, a real-time moderation pipeline designed to protect young audiences from unsafe TikTok videos.

    🔹 Purpose

    The dataset supplements TikHarm with 775 additional annotated videos, collected from TikTok trending and targeted hashtag queries. These videos were selected to address class imbalance and content diversity gaps in the original dataset, improving model generalization for real-world deployment.

    🔹 Content

    Each video is labeled into one of four categories:

    • Safe
    • Adult Content
    • Harmful Content (e.g., dangerous challenges, graphic violence)
    • Suicide / Self-harm

    🔹 Data Collection & Annotation

    Collection: Automated crawling using Selenium and TikTok Content Scraper, coordinated via Apache Airflow and Apache Kafka.

    Annotation: Conducted via a custom web-based tool, following detailed guidelines to ensure consistency and reliability. Multiple annotators reviewed each video, with disagreements resolved via majority voting.

    Class balance: Oversampling of underrepresented categories (e.g., Suicide, Harmful Content) during collection.
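
    The majority-voting step mentioned under Annotation can be illustrated as follows. This is a sketch only, not the authors' annotation tool: the strict-majority rule and the choice to return None on ties are assumptions, since the description does not say how ties were handled. majority_label is a hypothetical helper.

```python
from collections import Counter
from typing import Optional, Sequence

# The four categories listed in the dataset description.
LABELS = ("Safe", "Adult Content", "Harmful Content", "Suicide / Self-harm")


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    unknown = [v for v in votes if v not in LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Strict majority: more than half of all votes. Ties return None so the
    # video can be sent back for review (an assumed policy).
    return label if count * 2 > len(votes) else None
```

    With three annotators, two matching votes decide the label; with an even number of annotators, a split vote yields None.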

    🔹 Applications

    Training and evaluating multimodal classification models for harmful content detection.

    Benchmarking real-time content moderation pipelines.

    Research on multimodal fusion strategies and multi-label classification.

  11. YouTube/TikTok Trends Dataset

    • kaggle.com
    zip
    Updated Sep 16, 2025
    Cite
    Tarek Elmasry (2025). YouTube/TikTok Trends Dataset [Dataset]. https://www.kaggle.com/datasets/tarekmasryo/youtube-shorts-and-tiktok-trends-2025/code
    Explore at:
    Available download formats: zip (14982241 bytes)
    Dataset updated
    Sep 16, 2025
    Authors
    Tarek Elmasry
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    YouTube Shorts & TikTok Trends 2025

    Overview

    A global dataset capturing short-form video performance across YouTube Shorts and TikTok in 2025.
    It includes over 50,000 video records, available in both raw and machine learning–ready formats.
    Designed for reproducible EDA, dashboarding, and baseline ML modeling on social media engagement dynamics.

    Files Included

    File                                      Description                                 Shape
    youtube_shorts_tiktok_trends_2025.csv     Raw video-level data with full feature set  ~48k × ~58
    youtube_shorts_tiktok_trends_2025_ml.csv  ML-ready, cleaned and engineered version    ~50k × 32
    monthly_trends_2025.csv                   Monthly aggregates (Jan–Aug 2025)           ~480 × 8
    country_platform_summary_2025.csv         Country × platform summary statistics       ~60 × 14
    top_hashtags_2025.csv                     Ranked list of top trending hashtags        ~82 × 18
    top_creators_impact_2025.csv              Creator-level impact and influence metrics  ~1,000 × 20
    DATA_DICTIONARY.csv                       Column names and definitions                ~58 × 2

    All files are UTF-8 encoded, cleaned, and schema-aligned for direct analysis.

    Key Columns (ML-Ready File)

    • Identifiers: video_id, platform, country, category, creator_tier
    • Engagement Metrics: views, likes, comments, shares, saves, completions
    • Derived Ratios: engagement_rate = (likes + comments + shares) / views, plus save_rate, share_rate, comment_rate
    • Signals: velocity indicators, rolling statistics, seasonality flags
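
    The engagement_rate formula above translates directly into code. A minimal sketch: only engagement_rate follows the definition given in the list; the per-view formulas for save_rate, share_rate, and comment_rate are assumptions, since the description names those columns without defining them. safe_ratio and derived_ratios are hypothetical helpers, and the input numbers are made up.

```python
def safe_ratio(numerator: float, views: float) -> float:
    """Divide by views, returning 0.0 when views is zero to avoid ZeroDivisionError."""
    return numerator / views if views else 0.0


def derived_ratios(likes: int, comments: int, shares: int, saves: int, views: int) -> dict:
    """Compute engagement_rate as defined above, plus assumed per-view rates."""
    return {
        "engagement_rate": safe_ratio(likes + comments + shares, views),
        # The three definitions below are assumptions: the source names
        # these columns but does not give their formulas.
        "save_rate": safe_ratio(saves, views),
        "share_rate": safe_ratio(shares, views),
        "comment_rate": safe_ratio(comments, views),
    }


# Made-up numbers for illustration only.
ratios = derived_ratios(likes=800, comments=150, shares=50, saves=100, views=10_000)
```

    Returning 0.0 for videos with zero views is one possible convention; treating such rows as missing is equally defensible.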

    Recommended Uses

    • EDA: Analyze short-form engagement trends by country, platform, or content type
    • ML Modeling: Classify trend_label or predict engagement_rate and views
    • Dashboarding: Visualize global video trends and creator performance
    • Market Research: Study cultural and regional patterns of viral content

    Notes

    • trend_label is a snapshot trend proxy; baseline models typically reach 25–35% accuracy without temporal features.
    • publish_date_approx is derived and coarse — for trend direction only.
    • The dataset contains metadata only (no media content).

    If you find this dataset helpful, supporting it with an upvote helps others discover it too ✨

  12. TikTok Data - Amber Heard - Social Media 2022

    • kaggle.com
    zip
    Updated Jul 23, 2022
    Cite
    Amber Heard - Data Social Media Analysis (2022). TikTok Data - Amber Heard - Social Media 2022 [Dataset]. https://www.kaggle.com/datasets/amberhearddata/tiktok-data-amber-heard-social-media-2022
    Explore at:
    Available download formats: zip (660350769 bytes)
    Dataset updated
    Jul 23, 2022
    Authors
    Amber Heard - Data Social Media Analysis
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Amber Heard TikTok data from 2022, collected under 57 hashtags. Videos include full metrics and information fields. The data concerns the disinformation operation harming human rights activist Amber Heard. Comments on each post are included in the scrape.

    TikTok Hashtags: Positive, Neutral, and Negative, 57 hashtags in total.

    Positive and Neutral: 1. amberheard 2. amberheardmera 3. amberheardisinnocent 4. amberheardaquaman 5. amberheardisasurvivor 6. amberheardisavictim 7. ibelieveamberheard 8. darvodepp 9. istandwithamber 10. istandwithamberheard 11. loveamberheard 12. wearewithyouamberheard 13. westandwithamberheard 14. standwithamberheard 15. teamah 16. teamamberheard 17. justiceforamberheard 18. johnnydeppisawifebeater 19. johnnydeppisguilty

    Negative: 1. aclusupportsabusers 2. amberhearddoesnotspeakforme 3. amberheardforjail 4. amberheardforprison 5. amberheardisacriminal 6. amberheardisafraud 7. amberheardisanabuser 8. amberheardisapsycopath 9. amberheardisguilty 10. amberheardisoverparty 11. amberheardjohnnydepp 12. amberheardperjury 13. amberheardslawyersucks 14. amberheardtrial 15. amberheard💩 16. amberheard🤡 17. amberheard🤮 18. amberpoop 19. amberturd 20. boycottaquaman2 21. boycottloreal 22. boycottwarnerbros 23. boycottwarnerbrothers 24. deppheardtrial 25. deppvheardtrial 26. deppvsheard 27. fireamberheard 28. istandbyjohnnydepp 29. johnnydepp 30. johnnydeppamberheard 31. johnnydeppisinnocent 32. johnnydepptrial 33. johnnydeppvsamberheard 34. justiceforjohnnydepp 35. putamberheardinjail 36. recastmera 37. teamjd 38. teamjohnnydepp

    Each hashtag feed returns 1,000 videos per day of collection.

    From Public Research Study: https://github.com/RescueSocialTech/Amber-Heard_Disinformation_Operations_Bots

  13. Movie Dataset - 800 movies

    • kaggle.com
    zip
    Updated Apr 13, 2025
    Cite
    Seniru Hasith (2025). Movie Dataset - 800 movies [Dataset]. https://www.kaggle.com/datasets/seniruhasith/movie-dataset-800-movies/data
    Explore at:
    Available download formats: zip (96241 bytes)
    Dataset updated
    Apr 13, 2025
    Authors
    Seniru Hasith
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    🎬 Movie Success Prediction Dataset

    This dataset was curated to support machine learning models that predict movie success based on a wide range of multi-modal features, including cast popularity, sentiment analysis, audio-visual cues, social media engagement, and metadata such as budget and IMDb rating.

    📦 Dataset Overview

    The dataset consists of 36 engineered features extracted from various sources:

    • Cast and Crew Insights (e.g., popularity trends, number of cast members)
    • Sentiment Analysis from YouTube Comments using VADER
    • Audio Features from movie trailers using VGGish 3
    • Video Features using ResNet-based frame analysis
    • TikTok Popularity Signals (hashtags, views, engagement rate)
    • Movie Metadata (e.g., budget, IMDb rating)

    Each row represents one movie. The dataset is ideal for classification or regression tasks related to box office success, revenue prediction, or audience engagement forecasting.

    📊 Feature Mapping

    Feature Code    Feature Name
    Feature_1       cast_trend_1
    Feature_2       cast_trend_2
    Feature_3       cast_trend_3
    Feature_4       avg_cast_popularity
    Feature_5       top_cast_popularity
    Feature_6       genre_score
    Feature_7       positive_sentiment
    Feature_8       neutral_sentiment
    Feature_9       negative_sentiment
    Feature_10      num_youtube_comments
    Feature_11      num_cast_members
    Feature_12      num_upcoming_movies
    Feature_13      avg_upcoming_popularity
    Feature_14      max_upcoming_popularity
    Feature_15      tiktok_hashtag_views
    Feature_16      tiktok_video_count
    Feature_17      tiktok_total_likes
    Feature_18      tiktok_total_comments
    Feature_19      tiktok_total_shares
    Feature_20      tiktok_engagement_rate
    Feature_21      audio_tempo
    Feature_22      audio_energy_mean
    Feature_23      audio_energy_variance
    Feature_24      audio_spectral_centroid_mean
    Feature_25      audio_spectral_rolloff_mean
    Feature_26      video_brightness_mean
    Feature_27      video_colorfulness_mean
    Feature_28      video_scene_change_rate
    Feature_29      video_emotion_happy
    Feature_30      video_emotion_sad
    Feature_31      imdb_rating
    Feature_32      budget
    Feature_33      log_budget
    Feature_34      sqrt_budget
    Feature_35      budget_squared
    Feature_36      budget_rating_interaction
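
    The mapping above can be applied in code to swap the opaque feature codes for readable names. The following is a minimal sketch, assuming the file's column headers use the Feature_N codes exactly as listed; the helper name rename_header is hypothetical and not part of the dataset.

```python
# Feature names in the order given by the feature mapping (Feature_1 .. Feature_36).
FEATURE_NAMES = [
    "cast_trend_1", "cast_trend_2", "cast_trend_3",
    "avg_cast_popularity", "top_cast_popularity", "genre_score",
    "positive_sentiment", "neutral_sentiment", "negative_sentiment",
    "num_youtube_comments", "num_cast_members", "num_upcoming_movies",
    "avg_upcoming_popularity", "max_upcoming_popularity",
    "tiktok_hashtag_views", "tiktok_video_count", "tiktok_total_likes",
    "tiktok_total_comments", "tiktok_total_shares", "tiktok_engagement_rate",
    "audio_tempo", "audio_energy_mean", "audio_energy_variance",
    "audio_spectral_centroid_mean", "audio_spectral_rolloff_mean",
    "video_brightness_mean", "video_colorfulness_mean",
    "video_scene_change_rate", "video_emotion_happy", "video_emotion_sad",
    "imdb_rating", "budget", "log_budget", "sqrt_budget",
    "budget_squared", "budget_rating_interaction",
]

# Build the code -> name lookup, e.g. "Feature_15" -> "tiktok_hashtag_views".
CODE_TO_NAME = {
    f"Feature_{i}": name for i, name in enumerate(FEATURE_NAMES, start=1)
}


def rename_header(header: list[str]) -> list[str]:
    """Replace known feature codes with readable names; leave other columns as-is."""
    return [CODE_TO_NAME.get(column, column) for column in header]
```

    Columns that are not in the mapping (a target label, for example) pass through unchanged, so the helper is safe to run on a header whose exact layout is not known in advance.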

    🛠️ Feature Engineering Highlights

    • Audio features were extracted using the VGGish 3 model, widely used in speech emotion recognition tasks.
    • Video features were obtained from a ResNet-based model analyzing brightness, scene change rate, colorfulness, and emotion cues.
    • Sentiment scores were derived from YouTube comments using VADER, capturing positive, neutral, and negative sentiment proportions.
    • TikTok engagement metrics were collected using hashtag data, capturing likes, views, shares, and overall engagement rate.
    • Budget transformations such as log, square root, and squared values are included, along with an interaction feature with IMDb rating.
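
    The budget transformations in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not the author's actual pipeline: the description does not say whether the log uses log(x) or log(1 + x), nor how the interaction feature is defined, so log1p and a plain product are assumed here. The function name budget_features is hypothetical.

```python
import math


def budget_features(budget: float, imdb_rating: float) -> dict[str, float]:
    """Derive the budget-based columns named in the feature mapping."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return {
        "budget": budget,
        # Assumption: log1p keeps a zero budget finite; the dataset may use plain log.
        "log_budget": math.log1p(budget),
        "sqrt_budget": math.sqrt(budget),
        "budget_squared": budget ** 2,
        # Assumption: the interaction is a simple product of budget and rating.
        "budget_rating_interaction": budget * imdb_rating,
    }
```

    Comparing the output of such a function against the shipped columns is a quick way to check which definitions the dataset actually uses before relying on them.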

    💡 Potential Use-Cases

    • Predict box office revenue or success labels
    • Analyze which audio-visual cues correlate with public interest
    • Build early-stage predictors of movie success using trailers and social signals
    • Inform marketing strategies using real-time sentiment and TikTok trends

    📥 Data Sources

    • IMDb for metadata
    • YouTube (comments and trailers) for sentiment and audio/visual analysis
    • TikTok for hashtag popularity and engagement stats
    • In-house processing for video/audio feature extraction using ResNet and VGGish 3

    🚀 Whether you're working on predictive modeling, multimedia analysis, or social signal correlation, this dataset provides a diverse feature set to explore what makes a movie successful.
