https://brightdata.com/license
Access detailed insights with our Instagram datasets, featuring follower counts, verified status, account types, and engagement scores. Explore post information including URLs, descriptions, hashtags, comments, likes, media, posting dates, locations, and reel URLs. Perfect for understanding user engagement and content trends to drive informed decisions and optimize your social media strategies. Over 750M records available. Price starts at $250/100K records. Data formats available: JSON, NDJSON, CSV, XLSX, and Parquet. 100% ethical and compliant data collection. Included datapoints:
Account, Fbid, Id, Followers, Posts Count, Is Business Account, Is Professional Account, Is Verified, Avg Engagement, External Url, Biography, Business Category Name, Category Name, Post Hashtags, Following, Posts, Profile Image Link, Profile URL, Profile Name, Highlights Count, Highlights, Full Name, Is Private, Bio Hashtags, URL, Is Joined Recently, and much more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports research on how Instagram influencers affect female consumers' purchasing behaviour, and on the role of factors such as envy, scepticism towards advertising, satisfaction with life, social comparison and maternalism in consumer behaviour. There are two files: the SPSS and CSV files contain the same dataset in different formats.
This is a dataset of audience comments for each KOL (key opinion leader) that uses Instagram as a campaign platform. The comments were scraped and exported as CSV through apify.com.
This dataset provides a collection of user reviews for the Threads mobile application from both the Google Play Store and the Apple App Store. It is designed to offer insights into user satisfaction, app performance, and to help identify emerging user patterns and sentiments. The data was gathered by scraping reviews from the respective app marketplaces.
The dataset is typically provided in a CSV file format. Specific row or record counts are not available for the entire dataset, but review counts are detailed for various rating ranges and daily periods. For instance, 15,559 reviews are rated between 4.80 and 5.00, while 11,338 reviews were recorded between 5th and 6th July 2023.
This dataset is ideal for:
* Sentiment analysis to understand overall user sentiment towards the Threads app.
* Investigating factors that lead to 1-star and 5-star ratings, offering insights into user satisfaction and dissatisfaction.
* Evaluating the application's performance and identifying recurring themes in user feedback.
The dataset's geographic scope is global, collecting reviews from users worldwide. The time range for the reviews spans from 6th July 2023 to 25th July 2023. The dataset was last updated on 26th July 2023. It captures feedback from users across two major mobile platforms, Google Play (92% of reviews) and Apple App Store (8% of reviews).
CC-BY-NC
Original Data Source: Threads, an Instagram app Reviews
🔍 NOTE: We can provide data on any hashtag or word 🔍
Dive into fashion culture on Instagram with this curated dataset of posts tagged with fashion-related hashtags. It includes millions of real-time and historical posts from creators across the style spectrum—featuring content from influencers, brands, and users worldwide.
Key Features:
📱 Post-Level Detail: Captures caption text, hashtags, image URLs, timestamps, like counts, comment counts, and engagement metrics.
👗 Fashion-Centric Filtering: Every entry includes at least one fashion-related hashtag (e.g., fashion, ootd, style).
👤 Creator Metadata: Includes username, follower count, bio, and account type where available.
⚡ Insight-Ready: Ideal for trend spotting, campaign benchmarking, sentiment analysis, and brand tracking within the fashion space.
🚀 Scalable Format: Delivered in structured CSV, ready for analysis or model training.
This dataset is perfect for brands, agencies, researchers, and AI teams looking to analyze how fashion is represented, consumed, and engaged with on Instagram at scale. Post data: By default the dataset provides the latest 10 posts per profile. This can be expanded at request.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Instagram Influencer and Brand Dataset
Description
This dataset contains data on Instagram influencers, brands, captions, comments, and related labels, intended for data analysis, classification, and data science research in digital marketing and social media.
Dataset Structure
File Structure Explanation
instagram_influencers.csv: Contains Instagram influencer profile data. Main columns: username, followers_tier, demographics (e.g. age, gender, location), psychographics… See the full description on the dataset page: https://huggingface.co/datasets/AzrilFahmiardi/instagram_influencer_and_brand.
A dataset of Instagram and Twitter post data concerning the Arc de Triomphe and the Arc de Triomphe Wrapped, an art installation by Christo and Jeanne-Claude. It includes four CSV files: two with information about Instagram posts (comments, likes, media, etc.) for the Arc de Triomphe unwrapped and wrapped, respectively, and two for Twitter posts. Related publication: Vlachou, S.; Panagopoulos, M. The Arc de Triomphe, Wrapped: Measuring Public Installation Art Engagement and Popularity through Social Media Data Analysis. Informatics 2022, 9, 41. https://doi.org/10.3390/informatics9020041
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Data for a Brief Report/Short Communication published in Body Image (2021). Details of the study are included below via the abstract from the manuscript. The dataset includes online experimental data from 167 women who were recruited via social media and institutional participant pools. The experiment was completed in Qualtrics. Women viewed either neutral travel images (control), body positivity posts with an average-sized model (e.g., ~ UK size 14), or body positivity posts with a larger model (e.g., UK size 18+); which images women viewed is shown in the ‘condition’ variable in the data. The data includes the age range, height, weight, calculated BMI, and Instagram use of participants. After viewing the images, women responded to the Positive and Negative Affect Schedule (PANAS), a state version of the Body Satisfaction Scale (BSS), and reported their immediate social comparison with the images (SAC items). Women then selected a lunch for themselves from a hypothetical menu; these selections are detailed in the data, as are the total calories calculated from this and the proportion of their picks which were (provided as a percentage, and as a categorical variable [as used in the paper analyses]). Women also reported whether they were on a special diet (e.g., vegan or vegetarian), had food intolerances, when they last ate, and how hungry they were.
Women also completed trait measures of Body Appreciation (BAS-2) and social comparison (PACS-R). Women also were asked to comment on what they thought the experiment was about. Items and computed scales are included within the dataset. This item includes the dataset collected for the manuscript (in SPSS and CSV formats), the variable list for the CSV file (for users working with the CSV datafile; the variable list and details are contained within the .sav file for the SPSS version), and the SPSS syntax for our analyses (.sps). Also included are the information and consent form (collected via Qualtrics) and the questions as completed by participants (both in pdf format). Please note that the survey order in the PDF is not the same as in the datafiles; users should utilise the variable list (either in CSV or SPSS formats) to identify the items in the data. The SPSS syntax can be used to replicate the analyses reported in the Results section of the paper. Annotations within the syntax file guide the user through these.
A copy of SPSS Statistics is needed to open the .sav and .sps files.
Manuscript abstract:
Body Positivity (or ‘BoPo’) social media content may be beneficial for women’s mood and body image, but concerns have been raised that it may reduce motivation for healthy behaviours. This study examines differences in women’s mood, body satisfaction, and hypothetical food choices after viewing BoPo posts (featuring average or larger women) or a neutral travel control. Women (N = 167, 81.8% aged 18-29) were randomly assigned in an online experiment to one of three conditions (BoPo-average, BoPo-larger, or Travel/Control) and viewed three Instagram posts for two minutes, before reporting their mood and body satisfaction, and selecting a meal from a hypothetical menu. Women who viewed the BoPo posts featuring average-size women reported more positive mood than the control group; women who viewed posts featuring larger women did not. There were no effects of condition on negative mood or body satisfaction. Women did not make less healthy food choices than the control in either BoPo condition; women who viewed the BoPo images of larger women showed a stronger association between hunger and calories selected. These findings suggest that concerns over BoPo promoting unhealthy behaviours may be misplaced, but further research is needed regarding women’s responses to different body sizes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data is used to examine how sustainable tourism stakeholders use Instagram to engage eco-conscious travelers. Using data mining and content analysis, the study analyzes visuals, hashtags, and captions to uncover effective strategies for promoting sustainability. The findings offer insights for tourism marketers seeking to bridge the gap between environmental awareness and consumer action.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The DANKMEMES Task A Dataset is comprised of 2,000 images, half memes and half not, automatically extracted from Instagram through a Python script aimed at the hashtag related to the Italian government crisis (“#crisidigoverno”). It was created and used in the context of the DankMemes (https://dankmemes2020.fileli.unipi.it), a shared task proposed for the 2020 EVALITA campaign (http://www.evalita.it/2020), focusing on the automatic classification of Internet memes.
The dataset is split into training and test sets, in a proportion of 80-20% of items. The test dataset has been provided without gold labels, i.e. without the “Meme” attribute; the gold labels are provided in a separate file.
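As a rough illustration of the 80-20 proportion, the sketch below splits 2,000 items into training and test sets. The file names, the seed, and the shuffle-then-cut approach are assumptions made for the example; the description does not say how the task organisers actually drew the split.

```python
import random

def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and cut it 80/20 (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 2,000 images.
image_names = [f"{i}.jpg" for i in range(2000)]
train, test = split_80_20(image_names)
print(len(train), len(test))  # 1600 400
```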
The dataset consists of:
- a folder with images in .jpg format
- a .csv file with the associated image embeddings, computed employing ResNet (He et al., 2016), a state-of-the-art model for image recognition based on Deep Residual Learning
- a .csv file with the associated variables.
The variables provided for this task are:
- File: the name of the image file associated with the variables;
- Engagement: the number of comments and likes of the image;
- Date: when the image has first been posted on Instagram;
- Picture manipulation: entails the degree of visual modification of the images. Non-manipulated or low impact changes are labeled 0 (e.g. addition of text, or logo). Heavily manipulated, impactful changes (e.g. images altered to include political actors) are labeled 1;
- Visual actors: the political actors (i.e. politicians, parties’ logos) portrayed visually, as edited into the picture or portrayed in the original image;
- Text: the textual content of the image has been extracted through optical character recognition (OCR) using Google’s Tesseract-OCR Engine, and further manually corrected;
- Meme: binary feature, where 0 represents non meme images and 1 meme images. This is the target label for the first subtask.
This dataset contains information on application install interactions of users in the Myket android application market. The dataset was created for the purpose of evaluating interaction prediction models, requiring user and item identifiers along with timestamps of the interactions. Hence, the dataset can be used for interaction prediction and building a recommendation system. Furthermore, the data forms a dynamic network of interactions, and we can also perform network representation learning on the nodes in the network, which are users and applications.
Data Creation
The dataset was initially generated by the Myket data team and later cleaned and subsampled by Erfan Loghmani, a master's student at Sharif University of Technology at the time. The data team focused on a two-week period and randomly sampled 1/3 of the users with interactions during that period. They then selected install and update interactions for three months before and after the two-week period, resulting in interactions spanning about 6 months and two weeks.
We further subsampled and cleaned the data to focus on application download interactions. We identified the top 8000 most installed applications and selected interactions related to them. We retained users with more than 32 interactions, resulting in 280,391 users. From this group, we randomly selected 10,000 users, and the data was filtered to include only interactions for these users. The detailed procedure can be found here.
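The subsampling steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name, the tuple layout, the seed, and the choice to count a user's interactions after the top-application filter are all assumptions.

```python
import random
from collections import Counter

def subsample(interactions, top_k_apps=8000, min_interactions=32,
              n_users=10_000, seed=0):
    """interactions: list of (user_id, app_name, timestamp) tuples."""
    # 1. Keep only interactions with the top-k most installed applications.
    app_counts = Counter(app for _, app, _ in interactions)
    top_apps = {app for app, _ in app_counts.most_common(top_k_apps)}
    kept = [row for row in interactions if row[1] in top_apps]

    # 2. Retain users with MORE than `min_interactions` interactions
    #    (assumption: counted after step 1).
    user_counts = Counter(user for user, _, _ in kept)
    eligible = sorted(u for u, c in user_counts.items() if c > min_interactions)

    # 3. Randomly select up to `n_users` of the eligible users.
    rng = random.Random(seed)
    chosen = set(rng.sample(eligible, min(n_users, len(eligible))))

    # 4. Filter the interactions down to the chosen users.
    return [row for row in kept if row[0] in chosen]

# Toy data with small thresholds so the effect is visible.
toy = (
    [("u1", "app_a", t) for t in range(3)]
    + [("u2", "app_a", 0), ("u2", "app_b", 1), ("u2", "app_c", 2)]
    + [("u3", "app_c", 0)]
)
result = subsample(toy, top_k_apps=2, min_interactions=2, n_users=1)
print(result)  # only u1's three interactions remain
```

In the toy run, app_b falls outside the top two applications, which leaves u2 with two interactions and u3 with one, so only u1 passes the "more than 2" threshold.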
Data Structure
The dataset has two main files.
- myket.csv: This file contains the interaction information and follows the same format as the datasets used in the "JODIE: Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks" (ACM SIGKDD 2019) project. However, this data does not contain state labels and interaction features, resulting in associated columns being all zero.
- app_info_sample.csv: This file comprises features associated with applications present in the sample. For each individual application, information such as the approximate number of installs, average rating, count of ratings, and category are included. These features provide insights into the applications present in the dataset.
Dataset Details
- Total Instances: 694,121 install interaction instances
- Instances Format: Triplets of user_id, app_name, timestamp
- 10,000 users and 7,988 android applications
- Item features for 7,606 applications
For a detailed summary of the data's statistics, including information on users, applications, and interactions, please refer to the Python notebook available at summary-stats.ipynb. The notebook provides an overview of the dataset's characteristics and can be helpful for understanding the data's structure before using it for research or analysis.
Top 20 Most Installed Applications
| Package Name | Count of Interactions |
| ---------------------------------- | --------------------- |
| com.instagram.android | 15292 |
| ir.resaneh1.iptv | 12143 |
| com.tencent.ig | 7919 |
| com.ForgeGames.SpecialForcesGroup2 | 7797 |
| ir.nomogame.ClutchGame | 6193 |
| com.dts.freefireth | 6041 |
| com.whatsapp | 5876 |
| com.supercell.clashofclans | 5817 |
| com.mojang.minecraftpe | 5649 |
| com.lenovo.anyshare.gps | 5076 |
| ir.medu.shad | 4673 |
| com.firsttouchgames.dls3 | 4641 |
| com.activision.callofduty.shooter | 4357 |
| com.tencent.iglite | 4126 |
| com.aparat | 3598 |
| com.kiloo.subwaysurf | 3135 |
| com.supercell.clashroyale | 2793 |
| co.palang.QuizOfKings | 2589 |
| com.nazdika.app | 2436 |
| com.digikala | 2413 |
Comparison with SNAP Datasets
The Myket dataset introduced in this repository exhibits distinct characteristics compared to the real-world datasets used by the project. The table below provides a comparative overview of the key dataset characteristics:
Dataset | #Users | #Items | #Interactions | Average Interactions per User | Average Unique Items per User |
---|---|---|---|---|---|
Myket | 10,000 | 7,988 | 694,121 | 69.4 | 54.6 |
LastFM | 980 | 1,000 | 1,293,103 | 1,319.5 | 158.2 |
Reddit | 10,000 | 984 | 672,447 | 67.2 | 7.9 |
Wikipedia | 8,227 | 1,000 | 157,474 | 19.1 | 2.2 |
MOOC | 7,047 | 97 | 411,749 | 58.4 | 25.3 |
The Myket dataset stands out by having an ample number of both users and items, highlighting its relevance for real-world, large-scale applications. Unlike LastFM, Reddit, and Wikipedia datasets, where users exhibit repetitive item interactions, the Myket dataset contains a comparatively lower amount of repetitive interactions. This unique characteristic reflects the diverse nature of user behaviors in the Android application market environment.
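The two per-user averages in the table can be computed from the interaction triplets, as in the sketch below. The function name and the toy triplets are assumptions for illustration. The second part recomputes "Average Interactions per User" from the user and interaction totals listed in the table; "Average Unique Items per User" cannot be checked this way, because it needs the raw interactions.

```python
from collections import defaultdict

def interaction_stats(triplets):
    """Return (avg interactions per user, avg unique items per user)."""
    per_user_count = defaultdict(int)
    per_user_items = defaultdict(set)
    for user_id, item, _timestamp in triplets:
        per_user_count[user_id] += 1
        per_user_items[user_id].add(item)
    n_users = len(per_user_count)
    avg_interactions = sum(per_user_count.values()) / n_users
    avg_unique = sum(len(items) for items in per_user_items.values()) / n_users
    return avg_interactions, avg_unique

# Toy triplets: u1 interacts three times with two apps, u2 once.
toy = [("u1", "a", 1), ("u1", "a", 2), ("u1", "b", 3), ("u2", "a", 1)]
print(interaction_stats(toy))  # (2.0, 1.5)

# (users, interactions) totals as listed in the comparison table.
table_totals = {
    "Myket": (10_000, 694_121),
    "LastFM": (980, 1_293_103),
    "Reddit": (10_000, 672_447),
    "Wikipedia": (8_227, 157_474),
    "MOOC": (7_047, 411_749),
}
ratios = {name: round(n_inter / n_users, 1)
          for name, (n_users, n_inter) in table_totals.items()}
print(ratios)
```

A small gap between the two averages, as for Myket (69.4 vs 54.6), means users seldom repeat an item, whereas LastFM's large gap (1,319.5 vs 158.2) reflects heavy repetition.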
Citation
If you use this dataset in your research, please cite the following preprint:
@misc{loghmani2023effect,
  title={Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks},
  author={Erfan Loghmani and MohammadAmin Fazli},
  year={2023},
  eprint={2308.06862},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
As part of its heritage mission of legal deposit of the internet, the Bibliothèque nationale de France (BnF) regularly collects a sample of the French web, built from broad crawls and targeted crawls. The targeted crawls comprise "ongoing" crawls (for reference sites in a given disciplinary field) and "project" crawls (covering a particular event or theme). This dataset contains the URLs of sites related to the Covid-19 epidemic that were collected through targeted crawls between 1 February and 31 July 2020.
The dataset consists of a CSV file gathering nearly 4,600 URLs of websites, blogs, social networks, and videos. This content relating to the Covid-19 epidemic was collected as part of the Actualité éphémère (ephemeral news) crawl between 1 February and 31 July 2020, that is, from the virus taking hold on French soil until its remission, which corresponds to the end of the state of health emergency (10 July 2020). The CSV file also includes URLs of sites collected as part of the Vidéos and Instagram crawls, which were carried out in June and July 2020 respectively. These URLs serve as the starting point for building the internet archives, which researchers can consult in the research rooms at the BnF's various sites, as well as remotely in the regional legal deposit libraries for printers (bibliothèques de dépôt légal imprimeur, BDLI). The Covid-19 epidemic collection, available in the Archives de l'internet Labs, brings together, in addition to the content collected in the three crawls mentioned above, content gathered during the Presse payante and Actualités crawls. Each URL is accompanied by descriptive information (theme of the record used to carry out the crawl, keywords entered) and technical information (crawl frequency, history of the collected URL) about the crawl. Note, however, that the crawl frequency given in the CSV file is the last frequency associated with the URL of the site to be collected. This column therefore does not reflect any changes in frequency that may have occurred during the crawl.
Given the unpredictable nature of the Covid-19 epidemic in France, this collection was not carried out as a project crawl following a set schedule. The content was therefore initially selected during the Actualité éphémère crawl. Fifty-two correspondents took a direct part in this large-scale collection. Some of them belong to the BnF's network of internal correspondents, while the others are attached to the network of regional correspondents (who work in fifteen partner institutions in the regions). Two further crawls, Vidéos and Instagram, were subsequently carried out in June and July 2020.