https://brightdata.com/license
Use our Instagram dataset (public data) to extract business and non-business information from complete public profiles and filter by hashtags, followers, account type, or engagement score. Depending on your needs, you may purchase the entire dataset or a customized subset. Popular use cases include sentiment analysis, brand monitoring, influencer marketing, and more. The dataset includes all major data points: # of followers, verified status, account type (business / non-business), links, posts, comments, location, engagement score, hashtags, and much more.
The Top Instagram Accounts Dataset is a collection of 200 rows of data that provides valuable insights into the most popular Instagram accounts across different categories. The dataset contains several columns that provide comprehensive information on each account's performance, engagement rate, and audience size.
1. The "rank" column lists the accounts in order of their popularity on Instagram, starting from the most followed account.
2. The "name" column displays the Instagram handle of the account, which can be used to locate and follow the account on Instagram.
3. The "channel_info" column provides a brief description of the account, such as the type of content it features or the products and services it offers.
4. The "Category" column categorizes the account based on its primary theme or subject matter, such as fashion, sports, entertainment, or food.
5. The "posts" column displays the total number of posts on the account. This column helps to understand the account's level of activity and the amount of content it has produced over time.
6. The "followers" column indicates the number of people who follow the account on Instagram.
7. The "avg likes" column displays the average number of likes that the account's posts receive per post.
8. The "eng rate" column gives the account's engagement rate, calculated by dividing the total number of likes and comments received by the total number of followers and expressed as a percentage.
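The engagement-rate definition in item 8 can be sketched in a few lines of Python. This is only an illustration of the stated formula: the function name, parameter names, and the sample numbers are assumptions made for the example and are not taken from the dataset.

```python
def engagement_rate(total_likes: int, total_comments: int, followers: int) -> float:
    """Return (likes + comments) / followers, expressed as a percentage."""
    if followers <= 0:
        raise ValueError("followers must be a positive number")
    return 100.0 * (total_likes + total_comments) / followers


# Hypothetical account: 45,000 likes and 5,000 comments, 1,000,000 followers.
rate = engagement_rate(total_likes=45_000, total_comments=5_000, followers=1_000_000)
print(f"{rate:.2f}%")  # 5.00%
```

Whether the dataset totals likes and comments over all posts or over a recent window is not stated, so the inputs here should be read as "whatever totals the publisher used".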
The Top Instagram Accounts Dataset can be used in a variety of ways to gain insights into the performance and engagement levels of popular Instagram accounts. Here are a few examples of what you can do with this dataset:
1. Conduct category analysis: The dataset provides information on the category of each Instagram account. You can use this information to conduct a category analysis and identify the most popular categories on Instagram.
2. Identify top influencers: The dataset ranks Instagram accounts based on their follower count. You can use this information to identify the top influencers in different categories and use them for influencer marketing campaigns.
3. Analyze engagement levels: The dataset includes columns such as "avg likes" and "eng rate" that provide insights into the engagement levels of Instagram accounts. You can use this information to understand what type of content resonates with Instagram users and create more engaging content for your own account.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Instagram data-download example dataset
In this repository you can find a data-set consisting of 11 personal Instagram archives, or Data-Download Packages (DDPs).
How the data was generated
These Instagram accounts were all new and were created by a group of researchers who wanted to examine in detail the structure, and the variety in structure, of these Instagram DDPs. The participants used the Instagram accounts extensively for approximately a week. The participants also communicated intensively with each other so that the data can be used as an example of a network.
The data was primarily generated to evaluate the performance of de-identification software. Therefore, the text in the DDPs contains many randomly chosen (Dutch) first names, phone numbers, e-mail addresses and URLs. In addition, the images in the DDPs contain many faces and text. The DDPs also contain faces and text (usernames) of third parties. However, only content from so-called `professional accounts' is shared, such as accounts of famous individuals or institutions who self-consciously and actively seek publicity, and these sources are easily publicly available. Furthermore, the DDPs do not contain sensitive personal data of these individuals.
Obtaining your Instagram DDP
After using the Instagram accounts intensively for approximately a week, the participants requested their personal Instagram DDPs by using the following steps. You can follow these steps yourself if you are interested in your personal Instagram DDP.
Instagram then delivered the data in a compressed zip folder with the format username_YYYYMMDD.zip (i.e., Instagram handle and date of download) to the participant, and the participants shared these DDPs with us.
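As a small illustration of the username_YYYYMMDD.zip naming format described above, the following Python sketch splits a DDP filename into handle and download date. The function name and the example filename are hypothetical; the only thing taken from the text is the naming pattern. Because Instagram handles may themselves contain underscores, the sketch treats the last underscore-separated block of eight digits as the date.

```python
import re
from datetime import date, datetime

# Greedy username, then "_", then eight digits, then ".zip".
DDP_NAME = re.compile(r"^(?P<username>.+)_(?P<date>\d{8})\.zip$")


def parse_ddp_filename(filename: str) -> tuple[str, date]:
    """Split 'username_YYYYMMDD.zip' into (username, download date)."""
    match = DDP_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a DDP filename: {filename!r}")
    # strptime also rejects impossible dates such as 20201345.
    download_date = datetime.strptime(match.group("date"), "%Y%m%d").date()
    return match.group("username"), download_date


# Hypothetical filename, used only to show the call.
print(parse_ddp_filename("example_user_20200131.zip"))
```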
Data cleaning
To comply with the Instagram user agreement, participants shared their full name, phone number and e-mail address. In addition, Instagram logged the IP addresses the participants used during their active period on Instagram. After collecting the DDPs, we manually replaced such information with random replacements so that the DDPs shared here do not contain any personal data of the participants.
How this data-set can be used
This data-set was generated with the intention to evaluate the performance of the de-identification software. We invite other researchers to use this data-set for example to investigate what type of data can be found in Instagram DDPs or to investigate the structure of Instagram DDPs. The packages can also be used for example data-analyses, although no substantive research questions can be answered using this data as the data does not reflect how research subjects behave `in the wild'.
Authors
The data collection is executed by Laura Boeschoten, Ruben van den Goorbergh and Daniel Oberski of Utrecht University. For questions, please contact l.boeschoten@uu.nl.
Acknowledgments
The researchers would like to thank everyone who participated in this data-generation project.
In 2021, there were 1.21 billion monthly active users of Meta's Instagram, making up over 28 percent of the world's internet users. By 2025, it has been forecast that there will be 1.44 billion monthly active users of the social media platform, which would account for 31.2 percent of global internet users.
How popular is Instagram?
Instagram, as of January 2022, was the fourth most popular social media platform in the world in terms of user numbers. YouTube and WhatsApp ranked in second and third place, respectively, whilst Facebook remained the most popular, with almost three billion monthly active users worldwide.
India had the largest number of Instagram users as of January 2022, with a total of over 230 million users in the country. The second-largest Instagram audience could be found in the United States, with almost 160 million people subscribing to the photo and video sharing app.
Gen Z and Instagram
As of September 2021, Gen Z users in the United States spent an average of five hours per week on Instagram. Although Instagram ranked third in terms of hours per week spent on the platform, Gen Z users spent considerably more time on TikTok, amounting to a weekly average of over 10 hours being spent on the mobile-first video app.
Most followed accounts on Instagram
As of May 2022, Instagram’s own account had 504.37 million followers. In terms of celebrities, Portuguese footballer Cristiano Ronaldo (@cristiano) had over 440.41 million followers on the social network. Moreover, the average media value of an Instagram post by Ronaldo was over 985,000 U.S. dollars.
The most liked post on Instagram as of May 2022 was Photo of an Egg, which was posted in 2019 by the account @world_record_egg. Photo of an Egg has not only exceeded 55 million likes on the platform, but it also has nearly 3.5 million comments, and the account itself has over 4.5 million Instagram followers. After mysterious posts published by the account, World Record Egg revealed itself as part of a mental health campaign aimed at the difficulties and demands of using social media.
Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.
The Portuguese footballer is the most-followed person on the photo-sharing platform, with 628 million followers. Instagram's own account ranked first overall, with roughly 672 million followers.
How popular is Instagram?
Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.
Who uses Instagram?
Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.
Celebrity influencers on Instagram
Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
Context Fakes and spammers are a major problem on all social media platforms, including Instagram. This is the subject of my final-year project in which I set out to find ways of detecting them using machine learning. In this dataset fake and spammer are interchangeable terms.
Content I have personally identified the spammer/fake accounts included in this dataset after carefully examining each instance, and as such the dataset has a high level of accuracy, though there might be a couple of misidentified accounts in the spammers list as well. The dataset was collected using a crawler from 15 to 19 March 2019.
Inspiration This dataset could be further improved in quantity and quality measures, but how much accuracy can it achieve? Possible ways of using the models to tackle the problem?
As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.
Instagram users
With roughly one billion monthly active users, Instagram is one of the most popular social networks worldwide. The social photo sharing app is especially popular in India and in the United States, which have 362.9 million and 169.7 million Instagram users, respectively.
Instagram features
One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo sharing app that initially became famous due to its “vanishing photos” feature.
As of the second quarter of 2021, Snapchat had 293 million daily active users.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Kindly refer to my paper for more information. Please cite my work if you use my dataset in any work : K. R. Purba, D. Asirvatham and R. K. Murugesan, "Classification of instagram fake users using supervised machine learning algorithms," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 3, pp. 2763-2772, 2020.
The dataset was collected using web scraping from third-party Instagram websites, to capture their metadata and up to 12 latest media posts from each user. The collection process was executed from September 1st, 2019, until September 20th, 2019. The dataset contains authentic users and fake users, which were filtered using human annotators. The authentic users were taken from followers of 24 private university pages (8 Indonesian, 8 Malaysian, 8 Australian) on Instagram. To reduce the number of users, they are picked using proportional random sampling based on their source university. All private users were removed, which is a total of 31,335 out of 63,795 users (49.11%). The final number of public users used in this research was 32,460 users.
Var name | Feature name | Description
pos | Num posts | Number of total posts that the user has ever posted.
flg | Num following | Number of following
flr | Num followers | Number of followers
bl | Biography length | Length (number of characters) of the user's biography
pic | Picture availability | Value 0 if the user has no profile picture, or 1 if has
lin | Link availability | Value 0 if the user has no external URL, or 1 if has
cl | Average caption length | The average number of character of captions in media
cz | Caption zero | Percentage (0.0 to 1.0) of captions that has almost zero (<=3) length
ni | Non image percentage | Percentage (0.0 to 1.0) of non-image media. There are three types of media on an Instagram post, i.e. image, video, carousel
erl | Engagement rate (Like) | Engagement rate (ER) is commonly defined as (num likes) divide by (num media) divide by (num followers)
erc | Engagement rate (Comm.) | Similar to ER like, but it is for comments
lt | Location tag percentage | Percentage (0.0 to 1.0) of posts tagged with location
hc | Average hashtag count | Average number of hashtags used in a post
pr | Promotional keywords | Average use of promotional keywords in hashtag, i.e. {regrann, contest, repost, giveaway, mention, share, give away, quiz}
fo | Followers keywords | Average use of followers hunter keywords in hashtag, i.e. {follow, like, folback, follback, f4f}
cs | Cosine similarity | Average cosine similarity of between all pair of two posts a user has
pi | Post interval | Average interval between posts (in hours)
Output:
2-class User classes: r (real/authentic user), f (fake user / bought followers)
4-class User classes: r (authentic/real user), a (active fake user), i (inactive fake user), s (spammer fake user)
Note that the 3 fake user classes (a, i, s) were judged by human annotators.
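To make two of the feature definitions above concrete, here is a minimal Python sketch of erl (engagement rate by likes) and cz (caption zero). It is not the authors' extraction code: the function names and the sample values are assumptions for illustration, and "num likes" is read here as the total likes over the captured media.

```python
import math


def engagement_rate_like(likes_per_post: list[int], followers: int) -> float:
    """erl: (num likes) / (num media) / (num followers)."""
    if not likes_per_post or followers <= 0:
        return 0.0  # assumption: undefined cases are reported as 0.0
    return sum(likes_per_post) / len(likes_per_post) / followers


def caption_zero(captions: list[str]) -> float:
    """cz: fraction (0.0 to 1.0) of captions with length <= 3 characters."""
    if not captions:
        return 0.0
    return sum(1 for c in captions if len(c) <= 3) / len(captions)


# Hypothetical user with 200 followers and three captured posts.
erl = engagement_rate_like([10, 20, 30], followers=200)
cz = caption_zero(["", "hi!", "A longer caption"])
print(erl, cz)
```

The erc feature would follow the same shape with comment counts in place of like counts.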
Instagram’s most popular post
As of April 2024, the most popular post on Instagram was a photo of Lionel Messi and his teammates after winning the 2022 FIFA World Cup with Argentina, posted by the account @leomessi. Messi's post, which racked up over 61 million likes within a day, dethroned the previous record holder, 'Photo of an Egg'. Originally posted in January 2019, 'Photo of an Egg' had surpassed the world’s most popular Instagram post at that time, a photo of Kylie Jenner’s daughter that totaled 18 million likes.
After several cryptic posts published by the account, World Record Egg revealed itself to be a part of a mental health campaign aimed at the pressures of social media use.
Instagram’s most popular accounts
As of April 2024, the official Instagram account @instagram had the most followers of any account on the platform, with 672 million followers. Portuguese footballer Cristiano Ronaldo (@cristiano) was the most followed individual with 628 million followers, while Selena Gomez (@selenagomez) was the most followed woman on the platform with 429 million. Additionally, Inter Miami CF striker Lionel Messi (@leomessi) had a total of 502 million. Celebrities such as The Rock, Kylie Jenner, and Ariana Grande all had over 380 million followers each.
Instagram influencers
In the United States, the leading content category of Instagram influencers was lifestyle, with 15.25 percent of influencers creating lifestyle content in 2021. Music ranked in second place with 10.96 percent, followed by family with 8.24 percent. Having a large audience can be very lucrative: Instagram influencers in the United States, Canada and the United Kingdom with over 90,000 followers made around 1,221 US dollars per post.
Instagram around the globe
Instagram’s worldwide popularity continues to grow, and India is the leading country in terms of number of users, with over 362.9 million users as of January 2024. The United States had 169.65 million Instagram users and Brazil had 134.6 million users. The social media platform was also very popular in Indonesia and Turkey, with 100.9 million and 57.1 million users, respectively. As of January 2024, Instagram was the fourth most popular social network in the world, behind Facebook, YouTube and WhatsApp.
Who are leading Pakistan on Instagram?
The dataset contains the top 25 Instagram accounts from Pakistan (plus 2 additional accounts because of a tie), with category and follower count. All accounts have more than 2 million followers.
Can you find out what kind of content Pakistanis are interested in on Instagram?
As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.
Instagram’s Global Audience
As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.
Who is winning over the generations?
Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These four datasets are gathered from Instagram users who were chosen randomly.
The MainDataset encompasses data for 818 users. The TestDataset encompasses data for 78 users.
Data gathered for each user includes :
1- number of posts
2- number of followers
3- number of followings
4- number of likes for the tenth previous post
5- number of likes for the eleventh previous post
6- number of likes for the twelfth previous post
7- number of self-presenting posts from nine previous posts
8- gender
The MainDataset_after_150_days and TestDataset_after_150_days contain data for the same users as the MainDataset and the TestDataset, respectively, collected 150 days later. For example, User_1 has 486 posts in the MainDataset and 562 posts in the MainDataset_after_150_days, which means that over the course of 150 days he published 76 posts.
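The User_1 example above is a simple difference between the two snapshots, which can be sketched in Python as follows. The function name is hypothetical, and the per-day figure is an extra derived quantity that the dataset description does not itself report.

```python
def posts_published(posts_before: int, posts_after: int) -> int:
    """Posts added between the two snapshots (negative if posts were deleted)."""
    return posts_after - posts_before


# User_1 from the description: 486 posts initially, 562 posts 150 days later.
new_posts = posts_published(486, 562)
print(new_posts)        # 76
print(new_posts / 150)  # average posts per day over the 150-day window
```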
As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of the global Instagram population was aged 34 years or younger.
Teens and social media
As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.
Social media can also have negative effects on teens, and these are much more pronounced for those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, while only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There were 392 465 000 Instagram users in India in June 2024, which accounted for 26.7% of its entire population. The majority of them were men - 66.9%. People aged 18 to 24 were the largest user group (172 600 000). The highest difference between men and women occurs within people aged 25 to 34, where men lead by 99 300 000.
🔍 NOTE: We can provide data on any hashtag or word.
Dive into fashion culture on Instagram with this curated dataset of posts tagged with fashion-related hashtags. It includes millions of real-time and historical posts from creators across the style spectrum—featuring content from influencers, brands, and users worldwide.
Key Features:
📱 Post-Level Detail: Captures caption text, hashtags, image URLs, timestamps, like counts, comment counts, and engagement metrics.
👗 Fashion-Centric Filtering: Every entry includes at least one fashion-related hashtag (e.g., fashion, ootd, style).
👤 Creator Metadata: Includes username, follower count, bio, and account type where available.
⚡ Insight-Ready: Ideal for trend spotting, campaign benchmarking, sentiment analysis, and brand tracking within the fashion space.
🚀 Scalable Format: Delivered in structured CSV, ready for analysis or model training.
This dataset is perfect for brands, agencies, researchers, and AI teams looking to analyze how fashion is represented, consumed, and engaged with on Instagram at scale. Post data: By default the dataset provides the latest 10 posts per profile. This can be expanded at request.
Problem Statement
A global consumer goods company struggled to understand customer sentiment across various social media platforms. With millions of posts, reviews, and comments generated daily, manually tracking and analyzing public opinion was inefficient. The company needed an automated solution to monitor brand perception, address negative feedback promptly, and leverage insights for marketing strategies.
Challenge
Analyzing social media sentiment posed the following challenges:
Processing vast amounts of unstructured text data from multiple platforms like Twitter, Facebook, and Instagram.
Accurately interpreting slang, emojis, and nuanced language used by social media users.
Identifying trends and actionable insights in real-time to respond to potential crises or opportunities effectively.
Solution Provided
An advanced sentiment analysis system was developed using Natural Language Processing (NLP) and sentiment analysis algorithms. The solution was designed to:
Classify social media posts into positive, negative, and neutral sentiments.
Extract key topics and trends related to the brand and its products.
Provide real-time dashboards for monitoring customer sentiment and identifying areas of improvement.
Development Steps
Data Collection
Aggregated data from major social media platforms using APIs, focusing on brand mentions, hashtags, and product keywords.
Preprocessing
Cleaned and normalized text data, including handling slang, emojis, and misspellings, to prepare it for analysis.
Model Training
Trained NLP models for sentiment classification using supervised learning. Implemented topic modeling algorithms to identify recurring themes and discussions.
Validation
Tested the sentiment analysis models on labeled datasets to ensure high accuracy and relevance in classifying social media posts.
Deployment
Integrated the sentiment analysis system with a real-time analytics dashboard, enabling the marketing and customer support teams to track trends and respond proactively.
Monitoring & Improvement
Established a continuous feedback mechanism to refine models based on evolving language patterns and new social media trends.
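The Preprocessing step above (normalizing slang, emojis, and noisy text) can be illustrated with a minimal Python sketch. This is not the system described in the case study: the slang and emoji tables are tiny hypothetical stand-ins, and a production pipeline would use much larger resources and a trained model downstream.

```python
import re

# Hypothetical lookup tables, for illustration only.
SLANG = {"gr8": "great", "luv": "love", "u": "you"}
EMOJI = {"😍": " :positive_emoji: ", "😡": " :negative_emoji: "}

# Emoji placeholders, hashtags, mentions, then plain words.
TOKEN = re.compile(r":\w+:|#\w+|@\w+|\w+")


def normalize(text: str) -> list[str]:
    """Lowercase, map emojis to placeholder tokens, drop URLs, expand slang."""
    text = text.lower()
    for emoji, token in EMOJI.items():
        text = text.replace(emoji, token)
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    return [SLANG.get(tok, tok) for tok in TOKEN.findall(text)]


print(normalize("Luv this brand 😍 gr8 service! https://example.com #Happy"))
```

Keeping emojis as explicit tokens rather than deleting them preserves a sentiment signal that a downstream classifier can learn from.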
Results
Gained Actionable Insights
The system provided detailed insights into customer opinions, helping the company identify strengths and areas for improvement.
Improved Brand Reputation Management
Real-time monitoring enabled swift responses to negative feedback, mitigating potential reputation risks.
Informed Marketing Strategies
Insights from sentiment analysis guided targeted marketing campaigns, resulting in higher engagement and ROI.
Enhanced Customer Relationships
Proactive engagement with customers based on sentiment analysis improved customer satisfaction and loyalty.
Scalable Monitoring Solution
The system scaled efficiently to analyze data across multiple languages and platforms, broadening the company’s reach and understanding.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how engagement with social media (Instagram and TikTok) was related to problematic social media use (PSMU) and mental well-being. There are three different files. The SPSS and Excel spreadsheet files include the same dataset but in a different format. The SPSS output presents the data analysis in regard to the difference between Instagram and TikTok users.
Automatically describing images using natural sentences is an essential task to visually impaired people's inclusion on the Internet. Although there are many datasets in the literature, most of them contain only English captions, whereas datasets with captions described in other languages are scarce.
PraCegoVer arose on the Internet, stimulating users from social media to publish images, tag #PraCegoVer and add a short description of their content. Inspired by this movement, we have proposed the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images.
#PraCegoVer has 533,523 pairs with images and captions described in Portuguese collected from more than 14 thousand different profiles. Also, the average caption length in #PraCegoVer is 39.3 words and the standard deviation is 29.7.
Dataset Structure
The #PraCegoVer dataset is composed of the main file dataset.json and a collection of compressed files named images.tar.gz.partX containing the images. The file dataset.json comprises a list of JSON objects with the attributes:
Each instance in dataset.json is associated with exactly one image in the images directory whose filename is pointed by the attribute filename. Also, we provide a sample with five instances, so the users can download the sample to get an overview of the dataset before downloading it completely.
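The link between dataset.json and the images directory can be sketched in Python as below. Only the filename attribute and the file name dataset.json come from the description; the helper names, the directory name "images", and the sample entry are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_instances(json_path: str) -> list[dict]:
    """Read dataset.json, which holds a list of JSON objects."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)


def image_paths(instances: list[dict], images_dir: str) -> list[Path]:
    """Resolve each instance to its image file via the 'filename' attribute."""
    return [Path(images_dir) / instance["filename"] for instance in instances]


# Hypothetical in-memory sample standing in for load_instances("dataset.json").
sample = [{"filename": "example-image.jpg"}]
for path in image_paths(sample, "images"):
    print(path)
```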
Download Instructions
If you just want to have an overview of the dataset structure, you can download sample.tar.gz. But, if you want to use the dataset, or any of its subsets (63k and 173k), you must download all the files and run the following commands to uncompress and join the files:
cat images.tar.gz.part* > images.tar.gz
tar -xzvf images.tar.gz
Alternatively, you can download the entire dataset from the terminal using the python script download_dataset.py available in PraCegoVer repository. In this case, first, you have to download the script and create an access token here. Then, you can run the following command to download and uncompress the image files:
python download_dataset.py --access_token=
This table includes platform data for Facebook participants in the Deactivation experiment. Each row of the dataset corresponds to data from a participant’s Facebook user account. Each column contains a value, or set of values, that aggregates log data for this specific participant over a certain period of time.
As of April 2024, Bahrain was the country with the highest Instagram audience reach with 95.6 percent. Kazakhstan also had a high Instagram audience penetration rate, with 90.8 percent of the population using the social network. In the United Arab Emirates, Turkey, and Brunei, the photo-sharing platform was used by more than 85 percent of each country's population.