https://creativecommons.org/publicdomain/zero/1.0/
This dataset covers popular hashtags on TikTok. It includes the author name, author ID, author signature, comment count, hashtag details, URL, and share count. The scraped hashtags include meme, funny, humor, comedy, education, lol, dance, song, and music, among others.
This dataset contains comprehensive information about TikTok posts, originally fetched from RapidAPI. It provides valuable insights into various aspects of TikTok content, including details about the videos, their creators, and audience engagement metrics.
Here's a breakdown of the columns included in this dataset:
- video_id: A unique identifier for each TikTok video.
- author: The username or handle of the TikTok account that posted the video.
- description: The textual description or caption provided by the creator for the video. (Note: This column contains some missing values.)
- likes: The number of likes the video has received.
- comments: The number of comments on the video.
- shares: The number of times the video has been shared.
- plays: The total number of plays or views the video has accumulated. (Note: This column contains some missing values.)
- hashtags: A list of hashtags used in the video's description, which helps categorize content and improve discoverability. (Note: This column contains some missing values.)
- music: Information about the background music or sound used in the video.
- create_time: The timestamp indicating when the video was created or published. (Note: This column contains some missing values.)
- video_url: The direct URL to the TikTok video.
- fetch_time: The timestamp when the data for the video was fetched from the API. (Note: This column has a high number of missing values.)
- views: Another metric for the number of views. (Note: This column has a high number of missing values and appears to overlap with plays.)
- posted_time: The time the video was posted. (Note: This column has a high number of missing values and appears to overlap with create_time.)
Potential Uses of This Dataset:
- Content Analysis: Analyze popular TikTok content by examining descriptions, hashtags, and engagement metrics.
- Trend Identification: Identify trending topics, music, and creators on TikTok.
- Audience Engagement Studies: Understand how different types of content generate likes, comments, shares, and plays.
- Creator Analysis: Study the posting habits and performance of various TikTok creators.
- Social Media Research: Conduct research on the dynamics of content dissemination and user interaction on short-form video platforms.
Notes on Data Quality:
The description, plays, hashtags, and create_time columns have some missing values, which may require handling (e.g., imputation or removal) depending on your analysis.
The fetch_time, views, and posted_time columns are largely empty, suggesting they may not be reliable for comprehensive analysis. It is recommended to primarily rely on create_time for timestamps and plays for engagement metrics.
This dataset can be a valuable resource for anyone looking to explore the vast and dynamic world of TikTok content and user engagement.
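Because views appears to overlap with plays, and posted_time with create_time, one possible way to handle the gaps is to fall back to the sparse column only when the primary one is empty. The following is a minimal standard-library sketch, not the dataset author's method. It assumes missing values appear as empty strings in the CSV (the description does not say how they are encoded), and the sample rows are invented for illustration.

```python
import csv
import io

# Hypothetical sample rows; empty fields stand in for missing values.
SAMPLE = """video_id,plays,views,create_time,posted_time
1,1500,,2024-01-05 10:00:00,
2,,900,,2024-02-01 08:30:00
3,,,,
"""

def coalesce(*values):
    """Return the first non-empty value, or None if all are empty."""
    for value in values:
        if value not in (None, ""):
            return value
    return None

def clean_rows(text):
    """Prefer plays and create_time; fall back to views and posted_time."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        plays = coalesce(row["plays"], row["views"])
        rows.append({
            "video_id": row["video_id"],
            "plays": int(plays) if plays is not None else None,
            "create_time": coalesce(row["create_time"], row["posted_time"]),
        })
    return rows

cleaned = clean_rows(SAMPLE)
# Keep only rows that have both a play count and a timestamp.
usable = [r for r in cleaned if r["plays"] is not None and r["create_time"]]
```

Whether falling back is appropriate depends on whether the two columns really measure the same thing, which the description only suggests.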
https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:
Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
A comprehensive dataset of trending hashtags on TikTok from 2022 to 2025, containing 1,830 unique hashtag entries across multiple years, languages, and cultural contexts.
This dataset captures trending hashtags from TikTok's Creative Center, providing insights into viral content, cultural moments, and global events from 2022 to 2025.
Data Source: TikTok Creative Center - Popular Hashtags
tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
...
Columns:
- tag (string): The hashtag name without the # symbol
- year (integer): The year the hashtag was trending (2022-2025)
- rank (integer): Rank within that year based on post count (1 = highest)
- posts (integer): Total number of posts using this hashtag
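Some tags in the sample rows are purely numeric (2024, 2025), so a type-inferring CSV reader may silently turn them into integers. The sketch below uses Python's `csv` module, which keeps every field as a string until it is converted explicitly. The function name `load_hashtags` is a hypothetical helper, and the data is just the three sample rows shown above.

```python
import csv
import io

SAMPLE = """tag,year,rank,posts
2024,2025,1,3000000
2025,2025,2,2000000
valentinesday,2025,3,1000000
"""

def load_hashtags(text):
    """Parse rows, keeping `tag` as a string and casting the numeric columns."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "tag": row["tag"],          # stays a string, even for "2024"
            "year": int(row["year"]),
            "rank": int(row["rank"]),
            "posts": int(row["posts"]),
        })
    return records

hashtags = load_hashtags(SAMPLE)
top = min(hashtags, key=lambda r: r["rank"])
print(f"#{top['tag']} ranked {top['rank']} in {top['year']} with {top['posts']:,} posts")
```

If pandas is used instead, passing `dtype={"tag": str}` to `read_csv` should have the same effect.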
Breakdown by Year:
- 2025: 586 hashtags (most recent data)
- 2024: 909 hashtags (most comprehensive)
- 2023: 329 hashtags
- 2022: 6 hashtags (limited early data)
| Year | #1 Hashtag | Posts | Theme |
|---|---|---|---|
| 2025 | #2024 | 3,000,000 | Year-in-review |
| 2024 | #christmas | 3,000,000 | Holiday season |
| 2023 | #2024 | 2,000,000 | New year anticipation |
| 2022 | #newyear | 286,000 | New year celebration |
Hashtags appearing in multiple years (evergreen content):
- #happynewyear: Present in 5 different contexts
- #mondaymotivation: Consistent weekly trend across 5 instances
- #benfica: Sports team trending across 5 periods
- #newyear: 4 years of coverage
- #valentinesday: Annual romantic holiday
- #superbowl: Annual sports event
2024 Highlights:
- Elections: #trump (267K), #election2024 (136K), #kamalaharris (97K)
- Sports: #copaamerica (362K), #olympics (25K), #messi (489K)
- Entertainment: #squidgame (1M), #deadpool (32K), #billieeilish (199K)
- Holidays: #christmas (3M), #valentinesday (1M), #diademuertos (956K)
2023 Highlights:
- Disney Centennial: #disney100 (829K)
- Gaming: #fnaf (788K)
- Cultural: #recuerdame (776K)
2022 Highlights:
- Soccer Legend: #pele (117.7K)
- Viral Trends: #facechange (69.2K)
Most Popular Categories:
1. Holidays & Celebrations (30%+): Christmas, New Year, Valentine's Day, Halloween
2. Sports & Outdoor (20%+): Soccer, NFL, Olympics, Basketball
3. Entertainment & News (25%+): Movies, TV shows, Celebrity news
4. Gaming (10%): Squid Game, FNAF, Fortnite, Mobile Legends
5. Cultural Events (10%): Dia de Muertos, Ramadan, Lunar New Year
6. Politics & Social (5%): Elections, protests, social movements
Post Count Distribution:
- Million+ posts: 8 hashtags (mega-viral content)
- 500K-1M posts: 15 hashtags (highly viral)
- 100K-500K posts: 250+ hashtags (popular trends)
- Under 100K: Majority (niche or emerging trends)
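The post count distribution can be reproduced by bucketing the posts column. The sketch below is one possible way to do it; it assumes the bucket edges are inclusive at the lower bound (so exactly 1,000,000 posts counts as "Million+"), which the description does not state. The counts used are a handful of the figures quoted in the highlights, not the full dataset.

```python
from collections import Counter

def post_bucket(posts):
    """Map a post count to the bucket labels used in the distribution above."""
    if posts >= 1_000_000:
        return "Million+"
    if posts >= 500_000:
        return "500K-1M"
    if posts >= 100_000:
        return "100K-500K"
    return "Under 100K"

# A few post counts quoted in the highlights (not the full dataset).
quoted = {
    "christmas": 3_000_000,
    "disney100": 829_000,
    "messi": 489_000,
    "trump": 267_000,
    "olympics": 25_000,
}
distribution = Counter(post_bucket(n) for n in quoted.values())
```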
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains information about TikTok videos, including user interactions and video details. It includes features such as video ID, username, video title, likes, comments, shares, views, and more. This dataset is useful for analyzing video performance and user engagement on TikTok.
Columns:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset, titled TikTok Viral Trends 2025, provides a curated snapshot of 50 trending TikTok videos from September 2025, capturing the platform's dynamic content landscape. Sourced from real-time web analyses and social media insights (e.g., X posts, trend reports from reputable sources like Ramdam, NapoleonCat, and Tokchart), it focuses on viral videos across diverse categories such as Entertainment, Music, Comedy, Lifestyle, Beauty, Sustainability, and Technology. The dataset is designed for data scientists, researchers, and enthusiasts interested in analyzing social media trends, predicting virality, or exploring multimodal machine learning applications (e.g., NLP, time-series, or clustering). It stands out from existing Kaggle datasets by offering fresh, 2025-specific data with rich metadata, including engagement metrics, hashtags, and sound/trend associations.
tiktok_data.csv). post:72, web:65).
The dataset contains the following 12 columns:
- video_id: Unique identifier for each video or trend (integer or hashtag-based).
- author: Creator username or group (anonymized as "Unknown" where not specified).
- description: Brief summary of the video content or trend, derived from source context.
- upload_date: Approximate or exact posting date (YYYY-MM-DD).
- views: Reported view count (e.g., millions, billions for hashtag aggregates; "N/A" if unavailable).
- likes: Reported like count (e.g., thousands, millions; "N/A" if unavailable).
- shares: Share count (often "N/A" due to limited public data).
- comments: Comment count (often "N/A" due to limited public data).
- hashtags: Key hashtags associated with the video or trend (e.g., #Kpop, #Viral).
- category: Inferred content category (e.g., Entertainment, Music, Comedy, Lifestyle, Sustainability, Tech).
- sound_or_trend: Associated audio track or challenge name driving the trend (e.g., "Soda Pop dance", "JUMP").
- source: Citation of data origin (e.g., post:72 for X post ID, web:65 for web source ID).
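Because views, likes, shares, and comments are reported as text (millions, billions, or "N/A"), they need to be converted before any numeric analysis. The helper below is a hedged sketch. It assumes values use K/M/B suffixes such as "39.3B", which matches the figures quoted in this description but is not confirmed for every row of the file; `parse_count` is a hypothetical name.

```python
# Assumed suffix convention; the actual file may format values differently.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_count(raw):
    """Convert strings such as '39.3B' or '1,200' to an int; 'N/A' becomes None."""
    text = raw.strip().replace(",", "")
    if not text or text.upper() == "N/A":
        return None
    multiplier = SUFFIXES.get(text[-1].upper())
    if multiplier is not None:
        return round(float(text[:-1]) * multiplier)
    return round(float(text))

examples = ["39.3B", "2.5M", "1,200", "N/A"]
parsed = [parse_count(value) for value in examples]
```

Returning None for unavailable values keeps them distinguishable from genuine zeros.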
#Perfume reaching 39.3B views.
This dataset is ideal for a variety of machine learning and data analysis tasks on Kaggle, including but not limited to:
- Virality Prediction: Use views, likes, and hashtags to train regression or classification models (e.g., XGBoost, neural networks) to predict video success.
- Trend Analysis: Apply clustering (e.g., K-means) or topic modeling (e.g., LDA) to identify emerging content themes or regional differences.
- NLP Applications: Analyze descriptions and hashtags with BERT or word embeddings to study sentiment, cultural trends, or influencer impact.
- Time-Series Forecasting: Leverage upload_date and engagement metrics for temporal analysis of trend lifecycles.
- Recommendation Systems: Build content recommendation models based on category, sound, or hashtag similarities.
- Social Media Ethics: Explore AI-driven trends (e.g., deepfake Identity Swaps) for studies on misinformation or content authenticity.
#Ominous). Exact metrics may vary slightly due to real-time fluctuations.
https://creativecommons.org/publicdomain/zero/1.0/
TikTok is one of the hottest social media platforms out there, and it's only getting bigger. If you're looking to get in on the action, this dataset is for you!
This dataset contains a collection of videos from TikTok, including information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description. With this data, you can see what types of videos are popular on TikTok and start planning your own viral content!
- The dataset contains a collection of videos from the social media platform TikTok.
- The videos include information on the user who posted the video, the number of likes, shares, and comments the video received, as well as the video's length and description.
- The dataset also contains information on popular TikTok authors, including their unique ID, nickname, avatar thumbnail, signature, and whether or not their account is verified or private.
- Additionally, the dataset includes a list of trending videos on TikTok, as well as the number of likes, shares, comments, and plays each video has received
- Identifying popular TikTok authors to target for scraping videos and liked videos
- Finding trending videos on TikTok for further analysis
- Generating a list of videos from the TikTok app that are tagged with the #funny hashtag
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: tiktok_collected_liked_videos.csv
| Column name | Description |
|:---|:---|
| user_name | The name of the user who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of shares the video has received. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |
File: tiktok_collected_videos.csv
| Column name | Description |
|:---|:---|
| user_name | The name of the user who posted the video. (String) |
| n_likes | The number of likes the video has received. (Integer) |
| n_shares | The number of shares the video has received. (Integer) |
| n_comments | The number of comments the video has received. (Integer) |
| n_plays | The number of times the video has been played. (Integer) |
File: tiktok_funny_hashtag_videos.csv
| Column name | Description |
|:---|:---|
| author_nickname | The author's nickname. (String) |
| author_avatarThumb | The author's avatar thumbnail. (String) |
| author_signature | The author's signature. (String) |
| author_verification | Whether or not the author's account is verified. (Boolean) |
| author_privateAccount | Whether or not the author's account is private. (Boolean) |
| author_followingCount | The number of people the author is following. (Integer) |
| author_followerCount | The number of people following the author. (Integer) |
| author_heartCount | The number of hearts the author has. (Integer) |
| author_diggCount | The number of diggs the author has. (Integer) |
| music_title | The title of the music. (String) |
| music_playUrl | The play url of the music. (String) |
| music_coverThumb | The cover thumbnail of the music. (String) |
| music_authorName | The author name of the music. (String) |
| music_originality | The originality of the music. (String) |
| music_duration | The duration of the music. (String) |
File: trending_authors.csv | Column name | Description ...
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Phụng Trương Thu
Released under CC0: Public Domain
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists, analysts, and researchers understand, analyze, and predict viral content across major social media platforms. It captures realistic engagement behavior, sentiment signals, and content attributes that influence virality in today’s digital ecosystem.
The dataset includes multi-platform data from: - TikTok - Instagram - X (Twitter) - YouTube Shorts
Each platform is represented with consistent metrics, making cross-platform comparison easy and reliable.
Ideal for NLP tasks, sentiment analysis, and hashtag impact studies.
These metrics allow deep analysis of user interaction patterns.
Perfect for machine learning models and classification tasks.
As of January 2022, the hashtag "fyp," which stands for "for you page," was the most used hashtag on TikTok, amassing over 18.57 trillion views across posts using it. The hashtag "viral" ranked second, with approximately 6.3 trillion views on TikTok short-video posts using the hashtag. Posts using the hashtag "duet," which refers to TikTok videos that can be shared, mirrored, and commented on by creators, collected around 2.4 trillion views as of January 2022.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TikTok Video Metadata Dataset – 700+ Entries
This dataset contains metadata for over 700 TikTok videos, designed for training and testing machine learning models aimed at predicting video popularity, engagement, and virality. It includes key features such as video duration, text descriptions, hashtags, timestamps, author statistics, sound IDs, and engagement metrics (views and likes).
Key Features:
Video Metadata: id_video, duration_seconds, text_part, hashtags
Author Stats: author_followers, author_likes
Engagement Metrics: views, likes
Sound & Time: id_sound, human_time
Hashtags & Descriptions: Provided as comma-separated strings for easy parsing
Possible Use Cases:
Engagement Prediction: Build regression or classification models to predict views and likes.
Content & Hashtag Analysis: Identify which hashtags and text content correlate with higher engagement.
Author Influence Study: Explore how author popularity impacts video performance.
Time-based Analysis: Investigate posting time patterns.
NLP Applications: Perform text mining on video captions and hashtags.
Data Notes:
Contains real TikTok video metadata from various topics and regions.
Some fields may be empty (e.g., missing text or hashtags).
Suitable for educational projects, hackathons, and initial research in social media analytics.
Suggested Tasks:
Predict likes or views using regression models.
Classify videos into "viral" vs. "non-viral" based on a views threshold.
Cluster videos based on hashtags or content themes.
Analyze the impact of video length and posting time on engagement.
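Two of the suggested tasks depend on small preprocessing steps: splitting the comma-separated hashtag strings and turning views into a binary "viral" label. The sketch below shows one way to do both. The 1,000,000-view threshold is an arbitrary placeholder rather than a value from the dataset, and the two sample records are invented.

```python
def split_hashtags(raw):
    """Split a comma-separated hashtag string into a clean lowercase list."""
    if not raw:
        return []
    return [tag.strip().lstrip("#").lower() for tag in raw.split(",") if tag.strip()]

def label_viral(views, threshold=1_000_000):
    """Binary label; the default threshold is a placeholder assumption."""
    return views >= threshold

# Invented records using the column names listed above.
videos = [
    {"id_video": "a1", "hashtags": "#fyp, #Dance", "views": 2_400_000},
    {"id_video": "b2", "hashtags": "", "views": 15_000},
]
for video in videos:
    video["tags"] = split_hashtags(video["hashtags"])
    video["viral"] = label_viral(video["views"])
```

In practice the threshold would more sensibly be derived from the data, for example a high percentile of views.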
The dataset was originally obtained from TikTok's trending API by a GitHub user named Ivan Tran. It contains metadata on engagement with user-created videos and user profile data. The original create time is in Unix timecode format and is extracted directly from the video id number. TikTok's API has become much more difficult to access recently, so more current data is harder to obtain. The hashtags column contains lists.
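The description does not say how the create time is extracted from the video id. A commonly reported layout is that the upper 32 bits of the 64-bit id hold the Unix timestamp in seconds; the sketch below assumes that layout and should be checked against known posting dates before being relied on. The id used here is synthetic, built from a known timestamp only to demonstrate the round trip.

```python
from datetime import datetime, timezone

def create_time_from_id(video_id):
    """Read the upper 32 bits of a 64-bit video id as a Unix timestamp (UTC).

    Assumes the commonly reported id layout; not confirmed by the dataset.
    """
    seconds = int(video_id) >> 32
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Round-trip check with a synthetic id built from a known timestamp.
known = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())
synthetic_id = (known << 32) | 12345  # low bits are arbitrary filler
decoded = create_time_from_id(synthetic_id)
```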
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset is an extension of the TikHarm dataset, created to enhance multimodal harmful content detection on TikTok. It was developed as part of the MTikGuard system, a real-time moderation pipeline designed to protect young audiences from unsafe TikTok videos.
🔹 Purpose
The dataset supplements TikHarm with 775 additional annotated videos, collected from TikTok trending and targeted hashtag queries. These videos were selected to address class imbalance and content diversity gaps in the original dataset, improving model generalization for real-world deployment.
🔹 Content
Each video is assigned to one of four categories:
- Safe
- Adult Content
- Harmful Content (e.g., dangerous challenges, graphic violence)
- Suicide / Self-harm
🔹 Data Collection & Annotation
Collection: Automated crawling using Selenium and TikTok Content Scraper, coordinated via Apache Airflow and Apache Kafka.
Annotation: Conducted via a custom web-based tool, following detailed guidelines to ensure consistency and reliability. Multiple annotators reviewed each video, with disagreements resolved via majority voting.
Class balance: Oversampling of underrepresented categories (e.g., Suicide, Harmful Content) during collection.
🔹 Applications
Training and evaluating multimodal classification models for harmful content detection.
Benchmarking real-time content moderation pipelines.
Research on multimodal fusion strategies and multi-label classification.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This social media content dataset simulates realistic influencer posts across multiple popular platforms, reflecting diverse content types, sponsorship details, audience demographics, and engagement metrics. The dataset contains over 52,000 rows representing individual content posts generated over the past two years. It includes a balanced distribution of sponsored and non-sponsored content, with detailed disclosure information to support transparency studies and analyses. The variety of platforms, languages, content categories, and audience demographics makes this dataset ideal for exploring influencer marketing dynamics, content performance analytics, disclosure practices, and audience segmentation in social media research.
Dataset Features
id: Unique identifier for each content post (starting from 1).
platform: The social media platform where the content was posted. Values: YouTube, TikTok, Instagram, Bilibili, RedNote.
content_id: Unique ID for each content piece (e.g., content_0, content_1, …).
creator_id: Unique identifier for the content creator, cycling through 5000 distinct creators.
creator_name: Username of the content creator.
content_url: URL pointing to the content.
content_type: Format of the content. Values: video, image, text, mixed.
content_category: The main theme or niche of the content. Values: beauty, lifestyle, tech.
post_date: Timestamp of the post, randomly distributed over the past two years.
language: Language of the content, with probabilities favoring English. Values: English, Chinese, Spanish, Hindi, Japanese.
content_length: Length of the content in seconds (for video) or word count (for text), varying by content type.
content_description: Textual description or caption of the content.
hashtags: A comma-separated string of hashtags used in the post (0 to 5 tags).
views: Number of views (simulated via a Poisson distribution).
likes: Number of likes received.
shares: Number of shares.
comments_count: Count of comments on the post.
comments_text: Aggregated text of comments (0 to 5 comments concatenated).
follower_count: Number of followers the creator had at the time of posting.
is_sponsored: Boolean indicating whether the post is sponsored.
disclosure_type: Disclosure type regarding sponsorship for sponsored posts. Values: explicit, implicit, none (non-sponsored always 'none').
sponsor_name: Name of the sponsoring company if sponsored, else 'Not sponsors'.
sponsor_category: Sponsorship industry category. Values: cosmetics, electronics, fashion, food, gaming, travel or 'Not sponsors'.
disclosure_location: Where sponsorship disclosure appears in the post. Values: video, caption, hashtags, none (non-sponsored always 'none').
audience_age_distribution: Predominant age group of the audience. Values: 13-18, 19-25, 26-35, 36-50, 50+.
audience_gender_distribution: Predominant gender of the audience. Values: male, female, non-binary, unknown.
audience_location: Primary geographic location of the audience. Values: USA, China, India, Japan, Brazil, Germany, UK, Russia.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A global dataset capturing short-form video performance across YouTube Shorts and TikTok in 2025.
It includes over 50,000 video records, available in both raw and machine learning–ready formats.
Designed for reproducible EDA, dashboarding, and baseline ML modeling on social media engagement dynamics.
| File | Description | Shape |
|---|---|---|
| youtube_shorts_tiktok_trends_2025.csv | Raw video-level data with full feature set | ~48k × ~58 |
| youtube_shorts_tiktok_trends_2025_ml.csv | ML-ready, cleaned and engineered version | ~50k × 32 |
| monthly_trends_2025.csv | Monthly aggregates (Jan–Aug 2025) | ~480 × 8 |
| country_platform_summary_2025.csv | Country × platform summary statistics | ~60 × 14 |
| top_hashtags_2025.csv | Ranked list of top trending hashtags | ~82 × 18 |
| top_creators_impact_2025.csv | Creator-level impact and influence metrics | ~1,000 × 20 |
| DATA_DICTIONARY.csv | Column names and definitions | ~58 × 2 |
All files are UTF-8 encoded, cleaned, and schema-aligned for direct analysis.
- video_id, platform, country, category, creator_tier
- views, likes, comments, shares, saves, completions
- engagement_rate = (likes + comments + shares) / views, plus save_rate, share_rate, comment_rate
- trend_label or predict engagement_rate and views
- trend_label is a snapshot trend proxy; baseline models typically reach 25–35% accuracy without temporal features.
- publish_date_approx is derived and coarse; use it for trend direction only.
If you find this dataset helpful, supporting it with an upvote helps others discover it too ✨
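The engagement_rate formula given above can be recomputed from the raw columns as a sanity check. The sketch below implements that formula directly. The definition used for save_rate (saves divided by views) is an assumption, since only the name is given, and the sample row is invented.

```python
def engagement_rate(likes, comments, shares, views):
    """(likes + comments + shares) / views, or None when views is zero."""
    if views <= 0:
        return None
    return (likes + comments + shares) / views

def per_view_rate(count, views):
    """Assumed definition for save_rate, share_rate, comment_rate: count / views."""
    return count / views if views > 0 else None

# Invented row using the column names listed above.
row = {"views": 1000, "likes": 80, "comments": 15, "shares": 5, "saves": 20}
row["engagement_rate"] = engagement_rate(
    row["likes"], row["comments"], row["shares"], row["views"]
)
row["save_rate"] = per_view_rate(row["saves"], row["views"])
```

Guarding against zero views avoids a division error on records with no recorded plays.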
TikTok's platform is mostly fueled by viral videos of users doing outlandish, scary, or funny things. On the platform, these trend and meme videos typically come with a hashtag that includes the word challenge. But what is a TikTok challenge and how do you find or create them? Here's everything you need to know.
This TikTok book challenge was created by @haleyisfearless. It asks you to show your prettiest book, your tiniest book, a book you highly recommend, a book you're currently reading, and one of your favorite books. In the most basic sense, these challenges originate from viral videos, and a TikTok challenge isn't complete without its defining hashtag in the video's description.
These TikTok challenges are the perfect way to ease into what can be an intimidating social media platform and help you find your fellow book lovers.
This dataset is generated entirely from TikTok, so we want to thank @haleyisfearless for creating this challenge video.
The goal of this project is to build a Python script that takes a video as input and returns all text visible in the video. The videos are TikTok videos, so text can appear anywhere on screen, with varying backgrounds, font sizes, and so on.
https://creativecommons.org/publicdomain/zero/1.0/
Amber Heard TikTok data from 2022, collected under 57 hashtags. Videos come with full metrics and information fields. The data concerns the disinformation operation harming human rights activist Amber Heard. Comments on each post are included in the scraper.
TikTok Hashtags: - Positive, Neutral, and Negative of 57 hashtags. Positive and Neutral: 1. amberheard 2. amberheardmera 3. amberheardisinnocent 4. amberheardaquaman 5. amberheardisasurvivor 6. amberheardisavictim 7. ibelieveamberheard 8. darvodepp 9. istandwithamber 10. istandwithamberheard 11. loveamberheard 12. wearewithyouamberheard 13. westandwithamberheard 14. standwithamberheard 15. teamah 16. teamamberheard 17. justiceforamberheard 18. johnnydeppisawifebeater 19. johnnydeppisguilty
Negative: 1. aclusupportsabusers 2. amberhearddoesnotspeakforme 3. amberheardforjail 4. amberheardforprison 5. amberheardisacriminal 6. amberheardisafraud 7. amberheardisanabuser 8. amberheardisapsycopath 9. amberheardisguilty 10. amberheardisoverparty 11. amberheardjohnnydepp 12. amberheardperjury 13. amberheardslawyersucks 14. amberheardtrial 15. amberheard💩 16. amberheard🤡 17. amberheard🤮 18. amberpoop 19. amberturd 20. boycottaquaman2 21. boycottloreal 22. boycottwarnerbros 23. boycottwarnerbrothers 24. deppheardtrial 25. deppvheardtrial 26. deppvsheard 27. fireamberheard 28. istandbyjohnnydepp 29. johnnydepp 30. johnnydeppamberheard 31. johnnydeppisinnocent 32. johnnydepptrial 33. johnnydeppvsamberheard 34. justiceforjohnnydepp 35. putamberheardinjail 36. recastmera 37. teamjd 38. teamjohnnydepp
Each hashtag feed shows 1,000 videos per day of collection.
From Public Research Study: https://github.com/RescueSocialTech/Amber-Heard_Disinformation_Operations_Bots
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a comprehensive and diverse snapshot of social media users and their engagements across various popular platforms such as Instagram, Twitter, Facebook, YouTube, Pinterest, TikTok, and Spotify. With 100 rows of anonymized data, it offers valuable insights into the dynamic world of social media usage. 😀
Each row in the dataset represents a unique user with a designated User ID and Username to ensure anonymity. Alongside user-specific details, the dataset captures essential information, including the platform being used, the post's content, timestamp, and media type (text, image, or video). Additionally, it tracks engagement metrics such as likes, comments, shares/retweets, and user interactions, providing an overview of the user's popularity and social impact. 💬
The dataset also includes pertinent user attributes, such as account creation date, privacy settings, number of followers, and following. The users' profiles are further enriched with demographic characteristics, including anonymized representations of their age group and gender. 🗨️
Hashtags, mentions, media URLs, post URLs, and self-reported location contribute to understanding user interests, content themes, and geographic distribution. Moreover, users' bios and language preferences offer insights into their passions, activities, and linguistic communication on the platforms.
https://creativecommons.org/publicdomain/zero/1.0/
📱 About Dataset
Overview
This Social Media Engagement Dataset contains comprehensive engagement metrics from 5,000 social media posts across six major platforms: Instagram, Twitter, Facebook, LinkedIn, TikTok, and YouTube. The dataset spans over 2 years (2024-2025) and provides valuable insights into content performance, audience engagement patterns, and influencer analytics.
Dataset Contents
The dataset includes 20 detailed features covering various aspects of social media engagement:
Post Information
- Post_ID: Unique identifier for each post
- Timestamp: Date and time when the post was published
- Platform: Social media platform (Instagram, Twitter, Facebook, LinkedIn, TikTok, YouTube)
- Content_Type: Type of content (Photo, Video, Reel, Tweet, Story, etc.)
- Category: Content category (Technology, Fashion, Food, Travel, Fitness, Education, Entertainment, Business, Lifestyle, Gaming, Health, Sports)
Engagement Metrics
- Likes: Number of likes/reactions received
- Comments: Number of comments on the post
- Shares: Number of shares/retweets/reposts
- Views: Total number of views
- Saves: Number of bookmarks/saves
- Engagement_Rate: Calculated engagement rate percentage
Account Information
- Follower_Count: Number of followers of the account
- Influencer_Tier: Classification (Nano, Micro, Mid-tier, Macro)
- Is_Verified: Whether the account is verified (True/False)
Content Characteristics
- Hashtag_Count: Number of hashtags used
- Content_Length: Length in characters (text) or seconds (video)
- Sentiment: Sentiment analysis (Positive, Neutral, Negative)
- Has_Media: Whether post contains media (True/False)
Temporal Features
- Hour_of_Day: Hour when the post was published (0-23)
- Day_of_Week: Day of the week (Monday-Sunday)
Use Cases
This dataset is perfect for:
- 📊 Predictive Analytics: Build ML models to predict engagement rates
- 📈 Data Visualization: Create insightful dashboards and charts
- 🤖 Machine Learning: Classification, regression, and clustering tasks
- ⏰ Time Series Analysis: Analyze posting patterns and optimal timing
- 🎯 Content Strategy: Optimize content strategy based on data insights
- 🔍 Sentiment Analysis: Study correlation between sentiment and engagement
- 📱 Platform Comparison: Compare performance across different platforms
- 💼 Influencer Marketing: Analyze influencer tier performance
Technical Details
- Format: CSV
- Size: ~651 KB
- Rows: 5,000
- Columns: 20
- Time Period: January 2024 - December 2025
- Missing Values: None
Potential Research Questions
- What time of day generates the most engagement?
- Which platform has the highest engagement rates?
- How does content type affect performance?
- Does verified status impact engagement?
- What's the optimal hashtag count?
- How does sentiment correlate with engagement?
Notes
- Engagement metrics are platform-realistic and proportional
- All data is synthetically generated for educational and research purposes
- Suitable for beginners and advanced data scientists
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was curated to support machine learning models that predict movie success based on a wide range of multi-modal features, including cast popularity, sentiment analysis, audio-visual cues, social media engagement, and metadata such as budget and IMDb rating.
The dataset consists of 36 engineered features extracted from various sources:
Each row represents one movie. The dataset is ideal for classification or regression tasks related to box office success, revenue prediction, or audience engagement forecasting.
| Feature Code | Feature Name |
|---|---|
| Feature_1 | cast_trend_1 |
| Feature_2 | cast_trend_2 |
| Feature_3 | cast_trend_3 |
| Feature_4 | avg_cast_popularity |
| Feature_5 | top_cast_popularity |
| Feature_6 | genre_score |
| Feature_7 | positive_sentiment |
| Feature_8 | neutral_sentiment |
| Feature_9 | negative_sentiment |
| Feature_10 | num_youtube_comments |
| Feature_11 | num_cast_members |
| Feature_12 | num_upcoming_movies |
| Feature_13 | avg_upcoming_popularity |
| Feature_14 | max_upcoming_popularity |
| Feature_15 | tiktok_hashtag_views |
| Feature_16 | tiktok_video_count |
| Feature_17 | tiktok_total_likes |
| Feature_18 | tiktok_total_comments |
| Feature_19 | tiktok_total_shares |
| Feature_20 | tiktok_engagement_rate |
| Feature_21 | audio_tempo |
| Feature_22 | audio_energy_mean |
| Feature_23 | audio_energy_variance |
| Feature_24 | audio_spectral_centroid_mean |
| Feature_25 | audio_spectral_rolloff_mean |
| Feature_26 | video_brightness_mean |
| Feature_27 | video_colorfulness_mean |
| Feature_28 | video_scene_change_rate |
| Feature_29 | video_emotion_happy |
| Feature_30 | video_emotion_sad |
| Feature_31 | imdb_rating |
| Feature_32 | budget |
| Feature_33 | log_budget |
| Feature_34 | sqrt_budget |
| Feature_35 | budget_squared |
| Feature_36 | budget_rating_interaction |
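Features 33 to 36 appear to be transformations of budget and imdb_rating. The sketch below shows how such features could be derived. The exact formulas are assumptions, because the description only gives the names: here log_budget uses log(1 + budget) so that a zero budget stays defined, and budget_rating_interaction is taken to be the product of budget and IMDb rating.

```python
import math

def budget_features(budget, imdb_rating):
    """Derive budget-based features; formulas are assumed from the names only."""
    return {
        "log_budget": math.log1p(budget),              # assumed: ln(1 + budget)
        "sqrt_budget": math.sqrt(budget),
        "budget_squared": budget ** 2,
        "budget_rating_interaction": budget * imdb_rating,  # assumed: product
    }

# Invented example values.
features = budget_features(1_000_000, 7.5)
```

Comparing these outputs with the values stored in the dataset would show whether the assumed formulas match.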
🚀 Whether you're working on predictive modeling, multimedia analysis, or social signal correlation, this dataset provides a diverse feature set to explore what makes a movie successful.