License: https://creativecommons.org/publicdomain/zero/1.0/
Influencers are categorized by the number of followers they have on social media. They range from celebrities with large followings to niche content creators with a loyal following on social-media platforms such as YouTube, Instagram, Facebook, and Twitter. Their follower counts range from hundreds of millions down to 1,000. Influencers may be grouped into tiers (mega-, macro-, micro-, and nano-influencers) based on their number of followers.
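As a rough illustration of tiering by follower count, the sketch below maps a follower count to a tier name. The description names the tiers but gives no cut-offs, so the boundaries here (1,000 / 10,000 / 100,000 / 1,000,000) and the function name `influencer_tier` are assumptions chosen for demonstration only.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in followers) for each tier. The source names the
# tiers but does not define cut-offs, so these values are assumptions.
TIER_BOUNDS = [1_000, 10_000, 100_000, 1_000_000]
TIER_NAMES = ["below nano", "nano", "micro", "macro", "mega"]


def influencer_tier(followers: int) -> str:
    """Return the tier name for a follower count under the assumed cut-offs."""
    if followers < 0:
        raise ValueError("followers must be non-negative")
    # bisect_right counts how many lower bounds the follower count has reached.
    return TIER_NAMES[bisect_right(TIER_BOUNDS, followers)]


if __name__ == "__main__":
    for count in (500, 5_000, 50_000, 500_000, 5_000_000):
        print(count, influencer_tier(count))
```

Keeping the bounds in a single list makes it easy to swap in whichever cut-offs a given campaign or data provider actually uses.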
Businesses pursue people who aim to lessen their consumption of advertisements, and are willing to pay their influencers more. Targeting influencers is seen as increasing marketing's reach, counteracting a growing tendency by prospective customers to ignore marketing.
Marketing researchers Kapitan and Silvera find that influencer selection extends into product personality. This matching of product and benefit is key. A shampoo brand, for example, should use an influencer with good hair. Likewise, a flashy product may use bold colors to convey its brand; an influencer who is not flashy will clash with it. Matching an influencer with the product's purpose and mood is important.
License: http://opendatacommons.org/licenses/dbcl/1.0/
The idea is to figure out the success ratio of YouTube content creators and answer some basic questions, such as how many videos it takes for a channel to become successful, what language to choose, and what type of content works, and to establish proof of success with data that helps creators make a decision.
Hence the entire team of Business Analyst Interns at KultureHire took responsibility for collecting and cleaning the data and brought it into decent shape.
The dataset has 22 fields/columns and over 900 rows, each representing a video from one of various YouTube channels.
Preferred file format is Xlsx or CSV.
License: https://creativecommons.org/publicdomain/zero/1.0/
YouTube is an American online video sharing and social media platform headquartered in San Bruno, California. It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim. It is owned by Google, and is the second most visited website, after Google Search. YouTube has more than 2.5 billion monthly users who collectively watch more than one billion hours of videos each day. As of May 2019, videos were being uploaded at a rate of more than 500 hours of content per minute.
In October 2006, 18 months after posting its first video and 10 months after its official launch, YouTube was bought by Google for $1.65 billion. Google's ownership of YouTube expanded the site's business model from generating revenue through advertisements alone to offering paid content such as movies and exclusive content produced by YouTube. It also offers YouTube Premium, a paid subscription option for watching content without ads. YouTube and approved creators participate in Google's AdSense program, which seeks to generate more revenue for both parties. YouTube reported revenue of $19.8 billion in 2020. In 2021, YouTube's annual advertising revenue increased to $28.8 billion.
This dataset contains details on the top 1000 influencers worldwide.
License: Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset provides structured information about the top 100 influencers from various countries globally. Each entry represents an influencer and includes the following attributes:
License: https://brightdata.com/license
Use our YouTube profiles dataset to extract both business and non-business information from public channels and filter by channel name, views, creation date, or subscribers. Datapoints include URL, handle, banner image, profile image, name, subscribers, description, video count, create date, views, details, and more. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases for this dataset include sentiment analysis, brand monitoring, influencer marketing, and more.
License: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains anonymized YouTube comment data associated with the 2019 online controversy known as Dramageddon, involving beauty influencers James Charles, Tati Westbrook, and Jeffree Star. The dataset was created for research on online hostility, cancel culture, and competitive communication dynamics among influencers.
The dataset includes public user comments collected from 14 YouTube videos posted during May–June 2019, including primary source videos from the influencers involved and reaction videos from commentary channels. A total of ~15,000 comments were collected using the YouTube Data API v3. All comments are anonymized and contain no personally identifiable information.
Each comment record is enriched with metadata and derived variables, including:
- Sentiment score (range −1 to +1)
- Toxicity score (probability 0–1)
- Cancel behavior classification (cold, cool, hot)
- Moral language category
- Engagement metrics (likes, reply depth)
- Time of posting
- Video-level metadata (creator, phase of controversy)
This dataset supports research in computational social science, communication studies, digital sociology, and platform governance. It has been used in studies on cancel culture, moral contagion, algorithmic amplification, and influencer reputation dynamics. This dataset contains only publicly available YouTube comments retrieved in accordance with the YouTube Terms of Service. All usernames, channel IDs, and profile references were hashed or removed during preprocessing to ensure anonymization. No attempts were made to identify or contact any YouTube users. The dataset is provided strictly for research purposes. Users must agree to comply with ethical guidelines for internet research (AoIR 2019) and cite the dataset appropriately.
YouTube is a global online video sharing and social media platform headquartered in San Bruno, California. It was launched on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim. It is owned by Google, and is the second most visited website, after Google Search. YouTube has more than 2.5 billion monthly users who collectively watch more than one billion hours of videos each day.
The file contains two datasets covering the top 100 YouTube channels worldwide and in India, ranked by subscribers. Both datasets contain 6 columns. The columns include ranking, channel_name, category, subscribers, and average view.
url="https://www.noxinfluencer.com/youtube-channel-rank/top-100-all-all-youtuber-sorted-by-subs-weekly"
License: https://brightdata.com/license
Use our YouTube Videos dataset to extract detailed information from public videos and filter by video title, views, upload date, or likes. Data points include video URL, title, description, thumbnail, upload date, view count, like count, comment count, tags, and more. You can purchase the entire dataset or a customized subset, tailored to your needs. Popular use cases for this dataset include trend analysis, content performance tracking, brand monitoring, and influencer campaign optimization.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
My own work, created with the YouTube API. Over 50,000 entries crawled circa 10/2020.
Primarily contains product review influencers and other influencers.
Not at all exhaustive!
Babbl Labs' YouTube Public Company & Brand Mentions dataset enables enterprise-level intelligence from unstructured YouTube video content, transformed into actionable insights for brands, PR consultancies, investment firms, and more.
With over 30,000 curated channels and more than 1 million videos per month, this dataset provides unprecedented visibility into how products, executives and messaging resonate with consumers across the world's largest video platform.
Our proprietary platform combines advanced AI/ML technologies to deliver real-time brand monitoring and influencer tracking. The core innovation is our proprietary voice-print technology that identifies and tracks 50,000+ executives, experts, analysts, and influencers with unprecedented accuracy across channels and appearances.
Advanced NLP maps brand mentions, product references, and competitor comparisons across millions of hours of content. Multi-dimensional sentiment analysis algorithms detect brand perception, purchase intent, and viral conversation trends, delivering structured insights through enterprise-grade dashboards and S3/API access.
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset captures the pulse of viral social media trends across TikTok, Instagram, Twitter, and YouTube. It provides insights into the most popular hashtags, content types, and user engagement levels, offering a comprehensive view of how trends unfold across platforms. With regional data and influencer-driven content, this dataset is perfect for:
Dive in to explore what makes content go viral, the behaviors that drive engagement, and how trends evolve on a global scale! 🌍
To help brands and agencies running influencer marketing campaigns analyze the trustworthiness of influencers across India, we at YourExcelguy took the initiative to collect and analyze data on influencers (micro and macro influencers).
File Format is Xlsx
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive look at Dhruv Rathee's YouTube channel for 2024, including key metrics like video views, likes, comments, and engagement rates. With Dhruv Rathee's focus on political, social, and educational content, this dataset is ideal for analyzing content trends, audience engagement, and the impact of influencer-driven education. Whether for data science projects, trend analysis, or social media insights, this dataset offers valuable information on one of India's prominent YouTube creators.
License: MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This social media content dataset simulates realistic influencer posts across multiple popular platforms, reflecting diverse content types, sponsorship details, audience demographics, and engagement metrics. The dataset contains over 52,000 rows representing individual content posts generated over the past two years. It includes a balanced distribution of sponsored and non-sponsored content, with detailed disclosure information to support transparency studies and analyses. The variety of platforms, languages, content categories, and audience demographics makes this dataset ideal for exploring influencer marketing dynamics, content performance analytics, disclosure practices, and audience segmentation in social media research.
Dataset Features
id: Unique identifier for each content post (starting from 1).
platform: The social media platform where the content was posted. Values: YouTube, TikTok, Instagram, Bilibili, RedNote.
content_id: Unique ID for each content piece (e.g., content_0, content_1, …).
creator_id: Unique identifier for the content creator, cycling through 5000 distinct creators.
creator_name: Username of the content creator.
content_url: URL pointing to the content.
content_type: Format of the content. Values: video, image, text, mixed.
content_category: The main theme or niche of the content. Values: beauty, lifestyle, tech.
post_date: Timestamp of the post, randomly distributed over the past two years.
language: Language of the content, with probabilities favoring English. Values: English, Chinese, Spanish, Hindi, Japanese.
content_length: Length of the content in seconds (for video) or word count (for text), varying by content type.
content_description: Textual description or caption of the content.
hashtags: A comma-separated string of hashtags used in the post (0 to 5 tags).
views: Number of views (simulated via a Poisson distribution).
likes: Number of likes received.
shares: Number of shares.
comments_count: Count of comments on the post.
comments_text: Aggregated text of comments (0 to 5 comments concatenated).
follower_count: Number of followers the creator had at the time of posting.
is_sponsored: Boolean indicating whether the post is sponsored.
disclosure_type: Disclosure type regarding sponsorship for sponsored posts. Values: explicit, implicit, none (non-sponsored always 'none').
sponsor_name: Name of the sponsoring company if sponsored, else 'Not sponsors'.
sponsor_category: Sponsorship industry category. Values: cosmetics, electronics, fashion, food, gaming, travel or 'Not sponsors'.
disclosure_location: Where sponsorship disclosure appears in the post. Values: video, caption, hashtags, none (non-sponsored always 'none').
audience_age_distribution: Predominant age group of the audience. Values: 13-18, 19-25, 26-35, 36-50, 50+.
audience_gender_distribution: Predominant gender of the audience. Values: male, female, non-binary, unknown.
audience_location: Primary geographic location of the audience. Values: USA, China, India, Japan, Brazil, Germany, UK, Russia.
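One way the features listed above might be combined is an engagement rate per post, compared between sponsored and non-sponsored content. The sketch below uses a tiny hand-made frame with the column names from the feature list; the values are invented, and the definition of engagement rate as (likes + shares + comments_count) / views is an assumption, not something the dataset prescribes.

```python
import pandas as pd

# Tiny hand-made sample using column names from the feature list above.
# The values are invented for illustration and are not taken from the dataset.
posts = pd.DataFrame(
    {
        "platform": ["YouTube", "TikTok", "Instagram", "YouTube"],
        "views": [1000, 2000, 500, 4000],
        "likes": [100, 100, 25, 200],
        "shares": [10, 20, 5, 40],
        "comments_count": [10, 30, 0, 80],
        "is_sponsored": [True, False, True, False],
        "disclosure_type": ["explicit", "none", "implicit", "none"],
    }
)


def add_engagement_rate(df: pd.DataFrame) -> pd.DataFrame:
    """Add engagement_rate = (likes + shares + comments_count) / views."""
    out = df.copy()
    interactions = out["likes"] + out["shares"] + out["comments_count"]
    # Guard against division by zero: posts with no views get a missing rate.
    out["engagement_rate"] = interactions / out["views"].where(out["views"] > 0)
    return out


posts = add_engagement_rate(posts)

# Mean engagement rate for sponsored vs. non-sponsored posts.
by_sponsorship = posts.groupby("is_sponsored")["engagement_rate"].mean()
print(by_sponsorship)
```

Because views are described as simulated via a Poisson distribution, zero-view rows are possible in principle, which is why the sketch masks them rather than dividing directly.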
License: Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset tracks influencer marketing campaigns across major social media platforms, providing a robust foundation for analyzing campaign effectiveness, engagement, reach, and sales outcomes. Each record represents a unique campaign and includes details such as the campaign’s platform (Instagram, YouTube, TikTok, Twitter), influencer category (e.g., Fashion, Tech, Fitness), campaign type (Product Launch, Brand Awareness, Giveaway, etc.), start and end dates, total user engagements, estimated reach, product sales, and campaign duration. The dataset structure supports diverse analyses, including ROI calculation, campaign benchmarking, and influencer performance comparison.
Columns:
- campaign_id: Unique identifier for each campaign
- platform: Social media platform where the campaign ran
- influencer_category: Niche or industry focus of the influencer
- campaign_type: Objective or style of the campaign
- start_date, end_date: Campaign time frame
- engagements: Total user interactions (likes, comments, shares, etc.)
- estimated_reach: Estimated number of unique users exposed to the campaign
- product_sales: Number of products sold as a result of the campaign
- campaign_duration_days: Duration of the campaign in days
import pandas as pd
df = pd.read_csv('influencer_marketing_roi_dataset.csv', parse_dates=['start_date', 'end_date'])
print(df.head())
df.info()  # info() prints its summary directly and returns None
# Overview of campaign types and platforms
print(df['campaign_type'].value_counts())
print(df['platform'].value_counts())
# Summary statistics
print(df[['engagements', 'estimated_reach', 'product_sales']].describe())
# Average engagements and sales by platform
platform_stats = df.groupby('platform')[['engagements', 'product_sales']].mean()
print(platform_stats)
# Top influencer categories by product sales
top_categories = df.groupby('influencer_category')['product_sales'].sum().sort_values(ascending=False)
print(top_categories)
# Assume a hypothetical campaign cost for demonstration: a base fee plus a reach-based component
df['campaign_cost'] = 500 + df['estimated_reach'] * 0.01 # Example formula
# Calculate ROI: (Revenue - Cost) / Cost
# Assume each product sold yields $40 revenue
df['revenue'] = df['product_sales'] * 40
df['roi'] = (df['revenue'] - df['campaign_cost']) / df['campaign_cost']
# View campaigns with highest ROI
top_roi = df.sort_values('roi', ascending=False).head(10)
print(top_roi[['campaign_id', 'platform', 'roi']])
import matplotlib.pyplot as plt
import seaborn as sns
# Engagements vs. Product Sales scatter plot
plt.figure(figsize=(8,6))
sns.scatterplot(data=df, x='engagements', y='product_sales', hue='platform', alpha=0.6)
plt.title('Engagements vs. Product Sales by Platform')
plt.xlabel('Engagements')
plt.ylabel('Product Sales')
plt.legend()
plt.show()
# Average ROI by Influencer Category
category_roi = df.groupby('influencer_category')['roi'].mean().sort_values()
category_roi.plot(kind='barh', color='teal')
plt.title('Average ROI by Influencer Category')
plt.xlabel('Average ROI')
plt.show()
# Campaigns over time
df['month'] = df['start_date'].dt.to_period('M')
monthly_sales = df.groupby('month')['product_sales'].sum()
monthly_sales.plot(figsize=(10,4), marker='o', title='Monthly Product Sales from Influencer Campaigns')
plt.ylabel('Product Sales')
plt.show()
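Before relying on campaign_duration_days in an analysis, it may be worth checking it against the start and end dates. The sketch below is one possible consistency check on a small invented frame that reuses the column names listed above. It assumes the duration is meant to equal end_date minus start_date in whole days; the dataset description does not say whether the end date is counted inclusively, so adjust the comparison if needed. The helper name `find_duration_issues` is hypothetical.

```python
import pandas as pd

# Invented sample rows that reuse the dataset's column names.
campaigns = pd.DataFrame(
    {
        "campaign_id": [1, 2, 3],
        "start_date": pd.to_datetime(["2023-01-01", "2023-02-10", "2023-03-05"]),
        "end_date": pd.to_datetime(["2023-01-15", "2023-02-20", "2023-03-04"]),
        "campaign_duration_days": [14, 12, 7],
    }
)


def find_duration_issues(df: pd.DataFrame) -> list:
    """Return campaign_ids whose dates and stated duration disagree.

    Assumes duration = end_date - start_date in whole days (end date exclusive).
    """
    computed = (df["end_date"] - df["start_date"]).dt.days
    ends_before_start = computed < 0
    mismatch = computed != df["campaign_duration_days"]
    return df.loc[ends_before_start | mismatch, "campaign_id"].tolist()


issues = find_duration_issues(campaigns)
print(issues)
```

In this sample, campaign 2 spans 10 days but reports 12, and campaign 3 ends before it starts, so both are flagged.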
License: https://creativecommons.org/publicdomain/zero/1.0/
https://www.youtube.com/watch?v=pLF0bhcY5l8
From Youtube channel - Half Moroccan / Half Filipina - Actress in the Philippines
Channel details - www.youtube.com/@IvanaAlawi - 17.4M subscribers - 198 videos - 1,439,294,300 views - Joined Jun 1, 2018 - Philippines
Regarded as one of the biggest social media influencers of her time, Alawi is the most subscribed Filipino celebrity on YouTube, having been honored by Google as the "Top YouTube Content Creator" in the Philippines for two consecutive years. In 2019, she won "Best New Female TV Personality" at the PMPC Star Awards for Television. In 2021, Alawi was ranked fourth on the "100 Most Beautiful Faces in the World" list by TC Candler.
From Official Youtube Channel https://www.youtube.com/@IvanaAlawi
Some videos may be missing, especially if the channel has more than 600 videos, because the API itself doesn't return all the videos, as explained in this Stack Overflow post.
This dataset covers local (Saudi Arabia) social media influencers and was built by web scraping influencer information from https://influence.co/category/riyadh. It focuses on Instagram influencers in Saudi Arabia and contains 5 attributes and 243 rows. In particular, the dataset includes each influencer's Instagram ID, number of followers, the category they belong to, and their level of impact on Instagram, measured as the average engagement rate.
Data source : https://influence.co/category/riyadh
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This data set contains combined on-court performance data for NBA players in the 2016-2017 season, alongside salary, Twitter engagement, and Wikipedia traffic data.
Further information can be found in a series of articles for IBM Developerworks: "Explore valuation and attendance using data science and machine learning" and "Exploring the individual NBA players".
A talk about this dataset has slides from March, 2018, Strata:
Further reading on this dataset is available in Chapter 6 of the book Pragmatic AI: An introduction to Cloud-based Machine Learning, and in lesson 9 of Essential Machine Learning and AI with Python and Jupyter Notebook.
You can watch a breakdown of using cluster analysis on the Pragmatic AI YouTube channel
Learn to deploy a Kaggle project into a production Machine Learning sklearn + flask + container by reading Python for Devops: Learn Ruthlessly Effective Automation, Chapter 14: MLOps and Machine learning engineering
Use social media to predict a winning season with this notebook: https://github.com/noahgift/core-stats-datascience/blob/master/Lesson2_7_Trends_Supervized_Learning.ipynb
Learn to use the cloud for data analysis.
Data sources include ESPN, Basketball-Reference, Twitter, FiveThirtyEight, and Wikipedia. The source code for this dataset (in Python and R) can be found on GitHub. Links to more writing can be found at noahgift.com.