MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A comprehensive dataset of trending hashtags on TikTok from 2022 to 2025, containing 1,830 unique hashtag entries across multiple years, languages, and cultural contexts.
This dataset captures trending hashtags from TikTok's Creative Center, providing insights into viral content, cultural moments, and global events from 2022 to 2025.
Data Source: TikTok Creative Center - Popular Hashtags
tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
...
Columns:
- tag (string): The hashtag name without the # symbol
- year (integer): The year the hashtag was trending (2022-2025)
- rank (integer): Rank within that year based on post count (1 = highest)
- posts (integer): Total number of posts using this hashtag
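Because some tags are purely numeric (for example `2024`), a loader that infers types may turn them into integers. The following is a minimal sketch, using only the Python standard library, that keeps `tag` as a string and checks that `rank` follows `posts` within each year. The function names `load_hashtags` and `ranks_follow_posts` are hypothetical, and the sample text is just the three rows shown in the excerpt above.

```python
import csv
import io

# Sample rows copied from the excerpt above; the real file has more rows.
SAMPLE = """tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
"""


def load_hashtags(text):
    """Parse CSV text into dicts, keeping `tag` a string and casting the rest to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "tag": row["tag"],  # stays a string, so the tag "2024" is not mistaken for a year
            "year": int(row["year"]),
            "rank": int(row["rank"]),
            "posts": int(row["posts"]),
        })
    return rows


def ranks_follow_posts(rows):
    """Return True if, within each year, a better rank never has fewer posts."""
    by_year = {}
    for r in rows:
        by_year.setdefault(r["year"], []).append(r)
    for group in by_year.values():
        group.sort(key=lambda r: r["rank"])
        posts = [r["posts"] for r in group]
        if posts != sorted(posts, reverse=True):
            return False
    return True


rows = load_hashtags(SAMPLE)
print(rows[0])
print(ranks_follow_posts(rows))
```

With a dataframe library the same concern applies: the `tag` column would need an explicit string type.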
Breakdown by Year:
- 2025: 586 hashtags (most recent data)
- 2024: 909 hashtags (most comprehensive)
- 2023: 329 hashtags
- 2022: 6 hashtags (limited early data)
| Year | #1 Hashtag | Posts | Theme |
|---|---|---|---|
| 2025 | #2024 | 3,000,000 | Year-in-review |
| 2024 | #christmas | 3,000,000 | Holiday season |
| 2023 | #2024 | 2,000,000 | New year anticipation |
| 2022 | #newyear | 286,000 | New year celebration |
Hashtags appearing in multiple years (evergreen content):
- #happynewyear - Present in 5 different contexts
- #mondaymotivation - Consistent weekly trend across 5 instances
- #benfica - Sports team trending across 5 periods
- #newyear - 4 years of coverage
- #valentinesday - Annual romantic holiday
- #superbowl - Annual sports event
2024 Highlights:
- Elections: #trump (267K), #election2024 (136K), #kamalaharris (97K)
- Sports: #copaamerica (362K), #olympics (25K), #messi (489K)
- Entertainment: #squidgame (1M), #deadpool (32K), #billieeilish (199K)
- Holidays: #christmas (3M), #valentinesday (1M), #diademuertos (956K)
2023 Highlights:
- Disney Centennial: #disney100 (829K)
- Gaming: #fnaf (788K)
- Cultural: #recuerdame (776K)
2022 Highlights:
- Soccer Legend: #pele (117.7K)
- Viral Trends: #facechange (69.2K)
Most Popular Categories:
1. Holidays & Celebrations (30%+): Christmas, New Year, Valentine's Day, Halloween
2. Sports & Outdoor (20%+): Soccer, NFL, Olympics, Basketball
3. Entertainment & News (25%+): Movies, TV shows, Celebrity news
4. Gaming (10%): Squid Game, FNAF, Fortnite, Mobile Legends
5. Cultural Events (10%): Dia de Muertos, Ramadan, Lunar New Year
6. Politics & Social (5%): Elections, protests, social movements
Post Count Distribution:
- Million+ posts: 8 hashtags (mega-viral content)
- 500K-1M posts: 15 hashtags (highly viral)
- 100K-500K posts: 250+ hashtags (popular trends)
- Under 100K: Majority (niche or emerging trends)
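The post-count tiers above can be reproduced from the `posts` column. This is a rough sketch, assuming each tier includes its lower bound and excludes its upper bound (the source does not say how boundary values are treated); `post_tier` is a hypothetical helper name.

```python
def post_tier(posts):
    """Map a post count to one of the four tiers listed above.

    Assumption: lower bounds are inclusive, upper bounds exclusive.
    """
    if posts >= 1_000_000:
        return "Million+"
    if posts >= 500_000:
        return "500K-1M"
    if posts >= 100_000:
        return "100K-500K"
    return "Under 100K"


# Post counts taken from the highlights listed above.
for tag, posts in [("christmas", 3_000_000), ("diademuertos", 956_000),
                   ("trump", 267_000), ("olympics", 25_000)]:
    print(tag, post_tier(posts))
```

Counting rows per tier with this function is one way to check the published distribution against the file itself.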
As of January 2022, the hashtag "fyp," which stands for "for you page," was the most used hashtag on TikTok, amassing over 18.57 trillion views across posts using it. The hashtag "viral" ranked second, with approximately 6.3 trillion views on TikTok short-video posts using the hashtag. Posts using the hashtag "duet," which refers to TikTok videos that can be shared, mirrored, and commented on by creators, collected around 2.4 trillion views as of January 2022.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists, analysts, and researchers understand, analyze, and predict viral content across major social media platforms. It captures realistic engagement behavior, sentiment signals, and content attributes that influence virality in today's digital ecosystem.
The dataset includes multi-platform data from:
- TikTok
- Instagram
- X (Twitter)
- YouTube Shorts
Each platform is represented with consistent metrics, making cross-platform comparison easy and reliable.
Ideal for NLP tasks, sentiment analysis, and hashtag impact studies.
These metrics allow deep analysis of user interaction patterns.
Perfect for machine learning models and classification tasks.
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
TikTok is one of the hottest social media platforms out there, and it's only getting bigger. If you're looking to get in on the action, this dataset is for you!
This dataset contains a collection of videos from TikTok, including information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description. With this data, you can see what types of videos are popular on TikTok and start planning your own viral content!
- The dataset contains a collection of videos from the social media platform TikTok.
- The videos include information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description.
- The dataset also contains information on popular TikTok authors, including their unique ID, nickname, avatar thumbnail, signature, and whether or not their account is verified or private.
- Additionally, the dataset includes a list of trending videos on TikTok, as well as the number of likes, shares, comments, and plays each video has received.
- Identifying popular TikTok authors to target for scraping videos and liked videos
- Finding trending videos on TikTok for further analysis
- Generating a list of videos from the TikTok app that are tagged with the #funny hashtag
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: tiktok_collected_liked_videos.csv
| Column name | Description |
|:---|:---|
| user_name | The name of the user who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of shares the video has received. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |
File: tiktok_collected_videos.csv
| Column name | Description |
|:---|:---|
| user_name | The name of the user who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of shares the video has received. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |
File: tiktok_funny_hashtag_videos.csv
| Column name | Description |
|:---|:---|
| author_nickname | The author's nickname. (String) |
| author_avatarThumb | The author's avatar thumbnail. (String) |
| author_signature | The author's signature. (String) |
| author_verification | Whether or not the author's account is verified. (Boolean) |
| author_privateAccount | Whether or not the author's account is private. (Boolean) |
| author_followingCount | The number of people the author is following. (Integer) |
| author_followerCount | The number of people following the author. (Integer) |
| author_heartCount | The number of hearts the author has. (Integer) |
| author_diggCount | The number of diggs the author has. (Integer) |
| music_title | The title of the music. (String) |
| music_playUrl | The play url of the music. (String) |
| music_coverThumb | The cover thumbnail of the music. (String) |
| music_authorName | The author name of the music. (String) |
| music_originality | The originality of the music. (String) |
| music_duration | The duration of the music. (String) |
File: trending_authors.csv | Column name | Description ...
The dataset was originally obtained from TikTok's trending API by a GitHub user named Ivan Tran. It contains metadata on engagement with user-created videos and user profile data. The original create time is in Unix timecode format and is extracted directly from the video id number. TikTok's API has become much more difficult to access recently, so more current data is harder to obtain. The hashtags column contains lists.
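The description says the create time is taken from the video id but does not give the method. A commonly reported convention is that the upper 32 bits of the 64-bit id hold the Unix timestamp in seconds; the sketch below assumes that layout, which the source does not confirm. It also assumes the hashtags column is stored as a stringified Python list. Both helper names are hypothetical, and the id used is synthetic.

```python
import ast
from datetime import datetime, timezone


def create_time_from_video_id(video_id):
    """Return a UTC datetime from the upper 32 bits of the id (assumed layout)."""
    unix_seconds = int(video_id) >> 32
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def parse_hashtags(cell):
    """Turn a stringified list such as "['a', 'b']" back into a Python list."""
    value = ast.literal_eval(cell)
    return list(value) if isinstance(value, (list, tuple)) else [value]


# Synthetic id built for illustration: timestamp in the high bits, arbitrary low bits.
example_ts = 1_600_000_000
synthetic_id = (example_ts << 32) | 12345

print(create_time_from_video_id(synthetic_id).isoformat())
print(parse_hashtags("['fyp', 'funny']"))
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for cells read from a file.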
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset is an extension of the TikHarm dataset, created to enhance multimodal harmful content detection on TikTok. It was developed as part of the MTikGuard system, a real-time moderation pipeline designed to protect young audiences from unsafe TikTok videos.
🔹 Purpose
The dataset supplements TikHarm with 775 additional annotated videos, collected from TikTok trending and targeted hashtag queries. These videos were selected to address class imbalance and content diversity gaps in the original dataset, improving model generalization for real-world deployment.
🔹 Content
Each video is labeled into one of four categories:
- Safe
- Adult Content
- Harmful Content (e.g., dangerous challenges, graphic violence)
- Suicide / Self-harm
🔹 Data Collection & Annotation
Collection: Automated crawling using Selenium and TikTok Content Scraper, coordinated via Apache Airflow and Apache Kafka.
Annotation: Conducted via a custom web-based tool, following detailed guidelines to ensure consistency and reliability. Multiple annotators reviewed each video, with disagreements resolved via majority voting.
Class balance: Oversampling of underrepresented categories (e.g., Suicide, Harmful Content) during collection.
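The majority-voting step in the annotation process can be sketched as follows. This is only an illustration: the source does not say how ties are handled, so returning `None` for a tie (to flag the video for further review) is an assumption, and `majority_label` is a hypothetical name.

```python
from collections import Counter

# The four categories listed in the dataset description.
LABELS = ("Safe", "Adult Content", "Harmful Content", "Suicide / Self-harm")


def majority_label(votes):
    """Return the label with the most votes, or None if there is no clear winner."""
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    ranked = counts.most_common()
    if not ranked:
        return None  # no votes at all
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: assumed to need another review pass
    return ranked[0][0]


print(majority_label(["Safe", "Safe", "Adult Content"]))
print(majority_label(["Safe", "Harmful Content"]))
```

An odd number of annotators per video reduces ties but does not remove them, since three annotators can still choose three different labels.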
🔹 Applications
Training and evaluating multimodal classification models for harmful content detection.
Benchmarking real-time content moderation pipelines.
Research on multimodal fusion strategies and multi-label classification.
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a comprehensive and diverse snapshot of social media users and their engagements across various popular platforms such as Instagram, Twitter, Facebook, YouTube, Pinterest, TikTok, and Spotify. With 100 rows of anonymized data, it offers valuable insights into the dynamic world of social media usage.
Each row in the dataset represents a unique user with a designated User ID and Username to ensure anonymity. Alongside user-specific details, the dataset captures essential information, including the platform being used, the post's content, timestamp, and media type (text, image, or video). Additionally, it tracks engagement metrics such as likes, comments, shares/retweets, and user interactions, providing an overview of the user's popularity and social impact.
The dataset also includes pertinent user attributes, such as account creation date, privacy settings, number of followers, and following. The users' profiles are further enriched with demographic characteristics, including anonymized representations of their age group and gender.
Hashtags, mentions, media URLs, post URLs, and self-reported location contribute to understanding user interests, content themes, and geographic distribution. Moreover, users' bios and language preferences offer insights into their passions, activities, and linguistic communication on the platforms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A global dataset capturing short-form video performance across YouTube Shorts and TikTok in 2025.
It includes over 50,000 video records, available in both raw and machine learning-ready formats.
Designed for reproducible EDA, dashboarding, and baseline ML modeling on social media engagement dynamics.
| File | Description | Shape |
|---|---|---|
| youtube_shorts_tiktok_trends_2025.csv | Raw video-level data with full feature set | ~48k × ~58 |
| youtube_shorts_tiktok_trends_2025_ml.csv | ML-ready, cleaned and engineered version | ~50k × 32 |
| monthly_trends_2025.csv | Monthly aggregates (Jan-Aug 2025) | ~480 × 8 |
| country_platform_summary_2025.csv | Country × platform summary statistics | ~60 × 14 |
| top_hashtags_2025.csv | Ranked list of top trending hashtags | ~82 × 18 |
| top_creators_impact_2025.csv | Creator-level impact and influence metrics | ~1,000 × 20 |
| DATA_DICTIONARY.csv | Column names and definitions | ~58 × 2 |
All files are UTF-8 encoded, cleaned, and schema-aligned for direct analysis.
- video_id, platform, country, category, creator_tier
- views, likes, comments, shares, saves, completions
- engagement_rate = (likes + comments + shares) / views, plus save_rate, share_rate, comment_rate
- trend_label or predict engagement_rate and views
trend_label is a snapshot trend proxy; baseline models typically reach 25-35% accuracy without temporal features.
publish_date_approx is derived and coarse, for trend direction only.
If you find this dataset helpful, supporting it with an upvote helps others discover it too.
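The `engagement_rate` definition given for this dataset can be written out directly. In the sketch below only `engagement_rate` follows a formula stated in the source; computing `save_rate`, `share_rate`, and `comment_rate` as the respective count divided by `views` is an assumption, as is returning `None` when `views` is zero. The function name `rates` is hypothetical.

```python
def rates(views, likes, comments, shares, saves):
    """Compute per-video rates; returns None for each rate when views is not positive."""
    if views <= 0:
        # Assumption: a video with no views has undefined rates.
        return {"engagement_rate": None, "save_rate": None,
                "share_rate": None, "comment_rate": None}
    return {
        # Stated in the dataset description.
        "engagement_rate": (likes + comments + shares) / views,
        # Assumed definitions: count divided by views.
        "save_rate": saves / views,
        "share_rate": shares / views,
        "comment_rate": comments / views,
    }


example = rates(views=1000, likes=50, comments=10, shares=40, saves=20)
print(example["engagement_rate"])
```

Note that `saves` does not enter `engagement_rate` under the stated formula, so the two can be analyzed separately.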
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was curated to support machine learning models that predict movie success based on a wide range of multi-modal features, including cast popularity, sentiment analysis, audio-visual cues, social media engagement, and metadata such as budget and IMDb rating.
The dataset consists of 36 engineered features extracted from various sources:
Each row represents one movie. The dataset is ideal for classification or regression tasks related to box office success, revenue prediction, or audience engagement forecasting.
| Feature Code | Feature Name |
|---|---|
| Feature_1 | cast_trend_1 |
| Feature_2 | cast_trend_2 |
| Feature_3 | cast_trend_3 |
| Feature_4 | avg_cast_popularity |
| Feature_5 | top_cast_popularity |
| Feature_6 | genre_score |
| Feature_7 | positive_sentiment |
| Feature_8 | neutral_sentiment |
| Feature_9 | negative_sentiment |
| Feature_10 | num_youtube_comments |
| Feature_11 | num_cast_members |
| Feature_12 | num_upcoming_movies |
| Feature_13 | avg_upcoming_popularity |
| Feature_14 | max_upcoming_popularity |
| Feature_15 | tiktok_hashtag_views |
| Feature_16 | tiktok_video_count |
| Feature_17 | tiktok_total_likes |
| Feature_18 | tiktok_total_comments |
| Feature_19 | tiktok_total_shares |
| Feature_20 | tiktok_engagement_rate |
| Feature_21 | audio_tempo |
| Feature_22 | audio_energy_mean |
| Feature_23 | audio_energy_variance |
| Feature_24 | audio_spectral_centroid_mean |
| Feature_25 | audio_spectral_rolloff_mean |
| Feature_26 | video_brightness_mean |
| Feature_27 | video_colorfulness_mean |
| Feature_28 | video_scene_change_rate |
| Feature_29 | video_emotion_happy |
| Feature_30 | video_emotion_sad |
| Feature_31 | imdb_rating |
| Feature_32 | budget |
| Feature_33 | log_budget |
| Feature_34 | sqrt_budget |
| Feature_35 | budget_squared |
| Feature_36 | budget_rating_interaction |
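Features 33 to 36 appear to be transforms of `budget` and `imdb_rating`. The source lists only their names, so the formulas below are guesses made for illustration: `log_budget` as `log(1 + budget)` (the actual base and offset are unknown) and `budget_rating_interaction` as the product of the two inputs. `budget_features` is a hypothetical helper.

```python
import math


def budget_features(budget, imdb_rating):
    """Derive budget-based features under the assumptions stated above."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    return {
        "budget": budget,
        "log_budget": math.log1p(budget),  # assumed: natural log of (1 + budget)
        "sqrt_budget": math.sqrt(budget),
        "budget_squared": budget ** 2,
        "budget_rating_interaction": budget * imdb_rating,  # assumed: simple product
    }


features = budget_features(budget=100.0, imdb_rating=7.0)
print(features)
```

Because all four are deterministic functions of two other columns, they add no new information for tree-based models and mainly help linear ones.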
Whether you're working on predictive modeling, multimedia analysis, or social signal correlation, this dataset provides a diverse feature set to explore what makes a movie successful.