License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset covers popular hashtags on TikTok. It includes the author name, author ID, author signature, comment count, hashtag details, URL, and share count. The scraped hashtags include meme, funny, humor, comedy, education, lol, dance, song, and music, among others.
This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.
Here's a breakdown of the columns included in this dataset:
- video_id: A unique identifier for each TikTok video.
- author: The username or handle of the TikTok account that posted the video.
- description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
- likes: The number of likes the video has received.
- comments: The number of comments on the video.
- shares: The number of times the video has been shared.
- plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
- hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
- music: Information about the background music or sound used in the video.
- create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
- video_url: The direct URL to the TikTok video.
- fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
- views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
- posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)
Potential Uses of This Dataset:
- Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
- Trend Identification: Identify trending topics, music, and creators on TikTok.
- Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
- Creator Analysis: Study the posting habits and performance of various TikTok creators.
- Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.
Notes on Data Quality:
- The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis.
- The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis.
- It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics.
This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
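The data-quality notes above could be applied along the following lines. This is a minimal sketch using only the Python standard library: the sample rows are hypothetical, and falling back to views and posted_time when plays or create_time are empty is an assumption of this example, not something the dataset description prescribes.

```python
import csv
import io

# Hypothetical sample rows using the column names described above;
# the real file's exact formatting is an assumption.
SAMPLE = """video_id,plays,views,create_time,posted_time
1,1200,,2024-01-05,
2,,900,,2024-01-06
3,,,,
"""

def first_present(*values):
    """Return the first non-empty value, or None if all are missing."""
    for value in values:
        if value is not None and value.strip() != "":
            return value.strip()
    return None

def clean_row(row):
    """Prefer plays over views and create_time over posted_time."""
    plays = first_present(row.get("plays"), row.get("views"))
    return {
        "video_id": row["video_id"],
        "plays": int(plays) if plays is not None else None,
        "timestamp": first_present(row.get("create_time"), row.get("posted_time")),
    }

rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
# Keep only rows that still have both a play count and a timestamp.
usable = [r for r in rows if r["plays"] is not None and r["timestamp"] is not None]
```

Rows with neither value are dropped here; imputation would be an equally valid choice depending on the analysis.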
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:
Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset, titled TikTok Viral Trends 2025, provides a curated snapshot of 50 trending TikTok videos from September 2025, capturing the platform's dynamic content landscape. Sourced from real-time web analyses and social media insights (e.g., X posts, trend reports from reputable sources like Ramdam, NapoleonCat, and Tokchart), it focuses on viral videos across diverse categories such as Entertainment, Music, Comedy, Lifestyle, Beauty, Sustainability, and Technology. The dataset is designed for data scientists, researchers, and enthusiasts interested in analyzing social media trends, predicting virality, or exploring multimodal machine learning applications (e.g., NLP, time-series, or clustering). It stands out from existing Kaggle datasets by offering fresh, 2025-specific data with rich metadata, including engagement metrics, hashtags, and sound/trend associations.
… tiktok_data.csv).
… post:72, web:65).
The dataset contains the following 12 columns:
- video_id: Unique identifier for each video or trend (integer or hashtag-based).
- author: Creator username or group (anonymized as "Unknown" where not specified).
- description: Brief summary of the video content or trend, derived from source context.
- upload_date: Approximate or exact posting date (YYYY-MM-DD).
- views: Reported view count (e.g., millions, billions for hashtag aggregates; "N/A" if unavailable).
- likes: Reported like count (e.g., thousands, millions; "N/A" if unavailable).
- shares: Share count (often "N/A" due to limited public data).
- comments: Comment count (often "N/A" due to limited public data).
- hashtags: Key hashtags associated with the video or trend (e.g., #Kpop, #Viral).
- category: Inferred content category (e.g., Entertainment, Music, Comedy, Lifestyle, Sustainability, Tech).
- sound_or_trend: Associated audio track or challenge name driving the trend (e.g., "Soda Pop dance", "JUMP").
- source: Citation of data origin (e.g., post:72 for X post ID, web:65 for web source ID).
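Because views, likes, shares, and comments are reported as text (abbreviated magnitudes or "N/A"), they usually need converting before modeling. The sketch below assumes values look like "39.3B", "2.5M", or "1,200"; the exact formatting in the file is not stated here, so `parse_count` is a hypothetical helper and may need adjusting.

```python
import re

_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_count(text):
    """Convert strings such as '39.3B' or '1,200' to an int; return None for 'N/A'."""
    cleaned = text.strip().replace(",", "").upper()
    if cleaned in {"", "N/A"}:
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMB]?)", cleaned)
    if match is None:
        return None  # unrecognized format; treat as missing rather than guess
    number, suffix = match.groups()
    return round(float(number) * _SUFFIXES.get(suffix, 1))
```

Returning None for anything unrecognized keeps malformed values out of numeric columns instead of silently coercing them.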
… #Perfume reaching 39.3B views.
This dataset is ideal for a variety of machine learning and data analysis tasks on Kaggle, including but not limited to:
- Virality Prediction: Use views, likes, and hashtags to train regression or classification models (e.g., XGBoost, neural networks) to predict video success.
- Trend Analysis: Apply clustering (e.g., K-means) or topic modeling (e.g., LDA) to identify emerging content themes or regional differences.
- NLP Applications: Analyze descriptions and hashtags with BERT or word embeddings to study sentiment, cultural trends, or influencer impact.
- Time-Series Forecasting: Leverage upload_date and engagement metrics for temporal analysis of trend lifecycles.
- Recommendation Systems: Build content recommendation models based on category, sound, or hashtag similarities.
- Social Media Ethics: Explore AI-driven trends (e.g., deepfake Identity Swaps) for studies on misinformation or content authenticity.
… #Ominous). Exact metrics may vary slightly due to real-time fluctuations.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A comprehensive dataset of trending hashtags on TikTok from 2022 to 2025, containing 1,830 unique hashtag entries across multiple years, languages, and cultural contexts.
This dataset captures trending hashtags from TikTok's Creative Center, providing insights into viral content, cultural moments, and global events from 2022 to 2025.
Data Source: TikTok Creative Center - Popular Hashtags
tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
...
Columns:
- tag (string): The hashtag name without the # symbol
- year (integer): The year the hashtag was trending (2022-2025)
- rank (integer): Rank within that year based on post count (1 = highest)
- posts (integer): Total number of posts using this hashtag
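A minimal sketch of loading this layout with the Python standard library follows. It reuses the three sample rows shown above. Note that tags such as 2024 look numeric, so the example keeps `tag` as text and casts only the other columns; the helper names are hypothetical.

```python
import csv
import io

# The three sample rows shown above; the real file has 1,830 entries.
SAMPLE = """tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
"""

def load_hashtags(handle):
    """Parse rows, keeping `tag` as text and casting the numeric columns."""
    return [
        {
            "tag": row["tag"],
            "year": int(row["year"]),
            "rank": int(row["rank"]),
            "posts": int(row["posts"]),
        }
        for row in csv.DictReader(handle)
    ]

def top_by_year(rows):
    """Return the best-ranked row for each year (rank 1 = highest)."""
    best = {}
    for row in rows:
        current = best.get(row["year"])
        if current is None or row["rank"] < current["rank"]:
            best[row["year"]] = row
    return best

rows = load_hashtags(io.StringIO(SAMPLE))
top = top_by_year(rows)
```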
Breakdown by Year:
- 2025: 586 hashtags (most recent data)
- 2024: 909 hashtags (most comprehensive)
- 2023: 329 hashtags
- 2022: 6 hashtags (limited early data)
| Year | #1 Hashtag | Posts | Theme |
|---|---|---|---|
| 2025 | #2024 | 3,000,000 | Year-in-review |
| 2024 | #christmas | 3,000,000 | Holiday season |
| 2023 | #2024 | 2,000,000 | New year anticipation |
| 2022 | #newyear | 286,000 | New year celebration |
Hashtags appearing in multiple years (evergreen content):
- #happynewyear: Present in 5 different contexts
- #mondaymotivation: Consistent weekly trend across 5 instances
- #benfica: Sports team trending across 5 periods
- #newyear: 4 years of coverage
- #valentinesday: Annual romantic holiday
- #superbowl: Annual sports event
2024 Highlights:
- Elections: #trump (267K), #election2024 (136K), #kamalaharris (97K)
- Sports: #copaamerica (362K), #olympics (25K), #messi (489K)
- Entertainment: #squidgame (1M), #deadpool (32K), #billieeilish (199K)
- Holidays: #christmas (3M), #valentinesday (1M), #diademuertos (956K)
2023 Highlights:
- Disney Centennial: #disney100 (829K)
- Gaming: #fnaf (788K)
- Cultural: #recuerdame (776K)
2022 Highlights:
- Soccer Legend: #pele (117.7K)
- Viral Trends: #facechange (69.2K)
Most Popular Categories:
1. Holidays & Celebrations (30%+): Christmas, New Year, Valentine's Day, Halloween
2. Sports & Outdoor (20%+): Soccer, NFL, Olympics, Basketball
3. Entertainment & News (25%+): Movies, TV shows, Celebrity news
4. Gaming (10%): Squid Game, FNAF, Fortnite, Mobile Legends
5. Cultural Events (10%): Dia de Muertos, Ramadan, Lunar New Year
6. Politics & Social (5%): Elections, protests, social movements
Post Count Distribution:
- Million+ posts: 8 hashtags (mega-viral content)
- 500K-1M posts: 15 hashtags (highly viral)
- 100K-500K posts: 250+ hashtags (popular trends)
- Under 100K: Majority (niche or emerging trends)
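The post-count tiers above can be reproduced with a simple bucketing function. This sketch assumes each tier includes its lower bound (so exactly 1,000,000 counts as "Million+"), which the description does not specify. The four counts are a small illustrative subset taken from the 2024 highlights.

```python
from collections import Counter

def post_bucket(posts):
    """Map a post count to one of the four tiers (lower bounds inclusive, by assumption)."""
    if posts >= 1_000_000:
        return "Million+"
    if posts >= 500_000:
        return "500K-1M"
    if posts >= 100_000:
        return "100K-500K"
    return "Under 100K"

# Post counts quoted in the 2024 highlights (illustrative subset only).
counts = {
    "christmas": 3_000_000,
    "diademuertos": 956_000,
    "messi": 489_000,
    "olympics": 25_000,
}
distribution = Counter(post_bucket(n) for n in counts.values())
```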
As of January 2022, the hashtag "fyp," which stands for "for you page," was the most used hashtag on TikTok, amassing over 18.57 trillion views across posts using it. The hashtag "viral" ranked second, with approximately 6.3 trillion views on TikTok short-video posts using the hashtag. Posts using the hashtag "duet," which refers to TikTok videos that can be shared, mirrored, and commented on by creators, collected around 2.4 trillion views as of January 2022.
TikTok's platform is mostly fueled by viral videos of users doing outlandish, scary, or funny things. On the platform, these trend and meme videos typically come with a hashtag that includes the word challenge. But what is a TikTok challenge and how do you find or create them? Here's everything you need to know.
This TikTok book challenge was created by @haleyisfearless. It asks you to show your prettiest book, your tiniest book, a book you highly recommend, a book you're currently reading, and one of your favorite books. In the most basic sense, these challenges originate from viral videos, and a TikTok challenge isn't complete without its defining hashtag in the video's description.
These TikTok challenges are the perfect way to ease into what can be an intimidating social media platform and help you find your fellow book lovers.
This dataset is generated entirely from TikTok, so we want to thank @haleyisfearless for creating this challenge video.
The goal of this project is to write a Python script that takes a video as input and returns all text visible in it. The videos are TikTok videos, so text can appear anywhere on screen, with varying backgrounds, font sizes, etc.
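One possible outline for such a script is sketched below. Frame decoding and text recognition are deliberately left out: `ocr` is a hypothetical callable (for example, a wrapper around an OCR engine such as Tesseract) that takes one frame and returns its text. The sketch only covers which frames to inspect and how to merge the recognized lines without duplicates.

```python
def sample_indices(total_frames, fps, every_seconds=1.0):
    """Indices of frames to inspect, one per `every_seconds` of video."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))

def collect_texts(frames, ocr):
    """Run `ocr` on each frame and return distinct text lines in first-seen order."""
    seen = set()
    ordered = []
    for frame in frames:
        for line in ocr(frame).splitlines():
            text = " ".join(line.split())  # collapse runs of whitespace
            key = text.casefold()          # compare case-insensitively
            if text and key not in seen:
                seen.add(key)
                ordered.append(text)
    return ordered
```

Sampling roughly one frame per second is an arbitrary starting point; captions that flash briefly would need a smaller interval.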
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a comprehensive and diverse snapshot of social media users and their engagements across various popular platforms such as Instagram, Twitter, Facebook, YouTube, Pinterest, TikTok, and Spotify. With 100 rows of anonymized data, it offers valuable insights into the dynamic world of social media usage. 😀
Each row in the dataset represents a unique user with a designated User ID and Username to ensure anonymity. Alongside user-specific details, the dataset captures essential information, including the platform being used, the post's content, timestamp, and media type (text, image, or video). Additionally, it tracks engagement metrics such as likes, comments, shares/retweets, and user interactions, providing an overview of the user's popularity and social impact. 💬
The dataset also includes pertinent user attributes, such as account creation date, privacy settings, number of followers, and following. The users' profiles are further enriched with demographic characteristics, including anonymized representations of their age group and gender. 🗨️
Hashtags, mentions, media URLs, post URLs, and self-reported location contribute to understanding user interests, content themes, and geographic distribution. Moreover, users' bios and language preferences offer insights into their passions, activities, and linguistic communication on the platforms.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Please cite the following paper when using this dataset:
N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)
Abstract
The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.
For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.
The Instagram posts in this dataset are present in 161 different languages, out of which the top 10 languages in terms of frequency are English (343,041 posts), Spanish (30,220 posts), Hindi (15,832 posts), Portuguese (15,779 posts), Indonesian (11,491 posts), Tamil (9,592 posts), Arabic (9,416 posts), German (7,822 posts), Italian (5,162 posts), and Turkish (4,632 posts).
There are 535,021 distinct hashtags in this dataset, with the top 10 hashtags in terms of frequency being #covid19 (169,865 posts), #covid (132,485 posts), #coronavirus (117,518 posts), #covid_19 (104,069 posts), #covidtesting (95,095 posts), #coronavirusupdates (75,439 posts), #corona (39,416 posts), #healthcare (38,975 posts), #staysafe (36,740 posts), and #coronavirusoutbreak (34,567 posts).
The following is a description of the attributes present in this dataset:
- Post ID: Unique ID of each Instagram post
- Post Description: Complete description of each post in the language in which it was originally published
- Date: Date of publication in MM/DD/YYYY format
- Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
- Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
- Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
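As an illustration of working with these attributes, the sketch below tallies sentiment labels per year after parsing the MM/DD/YYYY dates. The three records are hypothetical, and the exact column headers in the published file are assumed to match the attribute names listed above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records using the attribute names described above.
posts = [
    {"Post ID": "1", "Date": "03/15/2020", "Language code": "en", "Sentiment": "negative"},
    {"Post ID": "2", "Date": "11/02/2021", "Language code": "es", "Sentiment": "neutral"},
    {"Post ID": "3", "Date": "07/09/2020", "Language code": "en", "Sentiment": "positive"},
]

def sentiment_by_year(records):
    """Count (year, sentiment) pairs, parsing Date as MM/DD/YYYY."""
    tally = Counter()
    for record in records:
        year = datetime.strptime(record["Date"], "%m/%d/%Y").year
        tally[(year, record["Sentiment"])] += 1
    return tally

tally = sentiment_by_year(posts)
```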
Open Research Questions
This dataset is expected to be helpful for the investigation of the following research questions and even beyond:
All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).
License: http://opendatacommons.org/licenses/dbcl/1.0/
This dataset is an extension of the TikHarm dataset, created to enhance multimodal harmful content detection on TikTok. It was developed as part of the MTikGuard system, a real-time moderation pipeline designed to protect young audiences from unsafe TikTok videos.
🔹 Purpose
The dataset supplements TikHarm with 775 additional annotated videos, collected from TikTok trending and targeted hashtag queries. These videos were selected to address class imbalance and content diversity gaps in the original dataset, improving model generalization for real-world deployment.
🔹 Content
Each video is labeled into one of four categories:
- Safe
- Adult Content
- Harmful Content (e.g., dangerous challenges, graphic violence)
- Suicide / Self-harm
🔹 Data Collection & Annotation
Collection: Automated crawling using Selenium and TikTok Content Scraper, coordinated via Apache Airflow and Apache Kafka.
Annotation: Conducted via a custom web-based tool, following detailed guidelines to ensure consistency and reliability. Multiple annotators reviewed each video, with disagreements resolved via majority voting.
Class balance: Oversampling of underrepresented categories (e.g., Suicide, Harmful Content) during collection.
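The majority-voting step in the annotation process could look roughly like this. It is a sketch under stated assumptions: "majority" is read as strictly more than half of the votes, and returning None when there is no majority (so the video can be re-reviewed) is a choice of this example, since the description does not say how ties are handled.

```python
from collections import Counter

LABELS = ("Safe", "Adult Content", "Harmful Content", "Suicide / Self-harm")

def majority_label(votes):
    """Return the label chosen by more than half of the annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None
```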
🔹 Applications
Training and evaluating multimodal classification models for harmful content detection.
Benchmarking real-time content moderation pipelines.
Research on multimodal fusion strategies and multi-label classification.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A global dataset capturing short-form video performance across YouTube Shorts and TikTok in 2025.
It includes over 50,000 video records, available in both raw and machine learning–ready formats.
Designed for reproducible EDA, dashboarding, and baseline ML modeling on social media engagement dynamics.
| File | Description | Shape |
|---|---|---|
| youtube_shorts_tiktok_trends_2025.csv | Raw video-level data with full feature set | ~48k × ~58 |
| youtube_shorts_tiktok_trends_2025_ml.csv | ML-ready, cleaned and engineered version | ~50k × 32 |
| monthly_trends_2025.csv | Monthly aggregates (Jan–Aug 2025) | ~480 × 8 |
| country_platform_summary_2025.csv | Country × platform summary statistics | ~60 × 14 |
| top_hashtags_2025.csv | Ranked list of top trending hashtags | ~82 × 18 |
| top_creators_impact_2025.csv | Creator-level impact and influence metrics | ~1,000 × 20 |
| DATA_DICTIONARY.csv | Column names and definitions | ~58 × 2 |
All files are UTF-8 encoded, cleaned, and schema-aligned for direct analysis.
- video_id, platform, country, category, creator_tier
- views, likes, comments, shares, saves, completions
- engagement_rate = (likes + comments + shares) / views, plus save_rate, share_rate, comment_rate
- … trend_label or predict engagement_rate and views
- trend_label is a snapshot trend proxy; baseline models typically reach 25–35% accuracy without temporal features.
- publish_date_approx is derived and coarse; use it for trend direction only.
If you find this dataset helpful, supporting it with an upvote helps others discover it too ✨
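The engagement_rate formula given above can be computed as follows. Only engagement_rate is defined in the description; the per-view denominators used here for save_rate, share_rate, and comment_rate are an assumption, as is returning 0.0 for videos with zero views.

```python
def rate(numerator, views):
    """Safe ratio: returns 0.0 when views is zero."""
    return numerator / views if views else 0.0

def engagement_metrics(likes, comments, shares, saves, views):
    """Compute engagement_rate as defined above, plus assumed per-view rates."""
    return {
        "engagement_rate": rate(likes + comments + shares, views),
        # The three rates below assume a per-view denominator (not stated in the source).
        "save_rate": rate(saves, views),
        "share_rate": rate(shares, views),
        "comment_rate": rate(comments, views),
    }

metrics = engagement_metrics(likes=120, comments=30, shares=50, saves=20, views=1000)
```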
License: https://creativecommons.org/publicdomain/zero/1.0/
Amber Heard TikTok Data from 2022 under 57 hashtags. Videos with full Metrics and information fields. On the Disinformation Operation harming Human Rights Activist Amber Heard. Comments of each post are included in the scraper.
TikTok Hashtags: - Positive, Neutral, and Negative of 57 hashtags. Positive and Neutral: 1. amberheard 2. amberheardmera 3. amberheardisinnocent 4. amberheardaquaman 5. amberheardisasurvivor 6. amberheardisavictim 7. ibelieveamberheard 8. darvodepp 9. istandwithamber 10. istandwithamberheard 11. loveamberheard 12. wearewithyouamberheard 13. westandwithamberheard 14. standwithamberheard 15. teamah 16. teamamberheard 17. justiceforamberheard 18. johnnydeppisawifebeater 19. johnnydeppisguilty
Negative: 1. aclusupportsabusers 2. amberhearddoesnotspeakforme 3. amberheardforjail 4. amberheardforprison 5. amberheardisacriminal 6. amberheardisafraud 7. amberheardisanabuser 8. amberheardisapsycopath 9. amberheardisguilty 10. amberheardisoverparty 11. amberheardjohnnydepp 12. amberheardperjury 13. amberheardslawyersucks 14. amberheardtrial 15. amberheard💩 16. amberheard🤡 17. amberheard🤮 18. amberpoop 19. amberturd 20. boycottaquaman2 21. boycottloreal 22. boycottwarnerbros 23. boycottwarnerbrothers 24. deppheardtrial 25. deppvheardtrial 26. deppvsheard 27. fireamberheard 28. istandbyjohnnydepp 29. johnnydepp 30. johnnydeppamberheard 31. johnnydeppisinnocent 32. johnnydepptrial 33. johnnydeppvsamberheard 34. justiceforjohnnydepp 35. putamberheardinjail 36. recastmera 37. teamjd 38. teamjohnnydepp
Each Hashtag Feed shows 1000 videos per day of collections.
From Public Research Study: https://github.com/RescueSocialTech/Amber-Heard_Disinformation_Operations_Bots
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was curated to support machine learning models that predict movie success based on a wide range of multi-modal features, including cast popularity, sentiment analysis, audio-visual cues, social media engagement, and metadata such as budget and IMDb rating.
The dataset consists of 36 engineered features extracted from various sources:
Each row represents one movie. The dataset is ideal for classification or regression tasks related to box office success, revenue prediction, or audience engagement forecasting.
| Feature Code | Feature Name |
|---|---|
| Feature_1 | cast_trend_1 |
| Feature_2 | cast_trend_2 |
| Feature_3 | cast_trend_3 |
| Feature_4 | avg_cast_popularity |
| Feature_5 | top_cast_popularity |
| Feature_6 | genre_score |
| Feature_7 | positive_sentiment |
| Feature_8 | neutral_sentiment |
| Feature_9 | negative_sentiment |
| Feature_10 | num_youtube_comments |
| Feature_11 | num_cast_members |
| Feature_12 | num_upcoming_movies |
| Feature_13 | avg_upcoming_popularity |
| Feature_14 | max_upcoming_popularity |
| Feature_15 | tiktok_hashtag_views |
| Feature_16 | tiktok_video_count |
| Feature_17 | tiktok_total_likes |
| Feature_18 | tiktok_total_comments |
| Feature_19 | tiktok_total_shares |
| Feature_20 | tiktok_engagement_rate |
| Feature_21 | audio_tempo |
| Feature_22 | audio_energy_mean |
| Feature_23 | audio_energy_variance |
| Feature_24 | audio_spectral_centroid_mean |
| Feature_25 | audio_spectral_rolloff_mean |
| Feature_26 | video_brightness_mean |
| Feature_27 | video_colorfulness_mean |
| Feature_28 | video_scene_change_rate |
| Feature_29 | video_emotion_happy |
| Feature_30 | video_emotion_sad |
| Feature_31 | imdb_rating |
| Feature_32 | budget |
| Feature_33 | log_budget |
| Feature_34 | sqrt_budget |
| Feature_35 | budget_squared |
| Feature_36 | budget_rating_interaction |
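The budget-derived features (Feature_33 to Feature_36) are named but not defined in the table. The sketch below shows one plausible way to derive them, assuming log_budget is log(1 + budget) and budget_rating_interaction is the product of budget and imdb_rating. Both are assumptions of this example, and the dataset's actual transforms may differ.

```python
import math

def budget_features(budget, imdb_rating):
    """Derive the budget transforms named in Feature_33 to Feature_36."""
    return {
        "log_budget": math.log1p(budget),  # assumption: log(1 + budget)
        "sqrt_budget": math.sqrt(budget),
        "budget_squared": budget ** 2,
        "budget_rating_interaction": budget * imdb_rating,  # assumption: simple product
    }

features = budget_features(100.0, 7.5)
```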
🚀 Whether you're working on predictive modeling, multimedia analysis, or social signal correlation, this dataset provides a diverse feature set to explore what makes a movie successful.