100+ datasets found
  1. Social Media Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Sep 7, 2022
    Cite
    Bright Data (2022). Social Media Datasets [Dataset]. https://brightdata.com/products/datasets/social-media
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Sep 7, 2022
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.

    Dataset Features

    • User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.
    • Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.
    • Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.
    • Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.

    Customizable Subsets for Specific Needs

    Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.

    Popular Use Cases

    • Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.
    • Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.
    • Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.
    • Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.
    • AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.

    Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  2. Instagram accounts with the most followers worldwide 2024

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

    The Portuguese footballer is the most-followed person on the photo-sharing platform, with 628 million followers. Instagram's own account ranked first overall, with roughly 672 million followers.

    How popular is Instagram?

    Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States, and experts project this figure to surpass 127 million users in 2023.

    Who uses Instagram?

    Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.

    Celebrity influencers on Instagram

    Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
    
  3. Social Media Influencers in 2022

    • kaggle.com
    Updated Dec 27, 2022
    Cite
    Ram Jas (2022). Social Media Influencers in 2022 [Dataset]. https://www.kaggle.com/datasets/ramjasmaurya/top-1000-social-media-channels/code
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Kaggle
    Authors
    Ram Jas
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Important: the data is captured at three-month intervals, from March 2022 to December 2022.

    Influencers are categorized by the number of followers they have on social media. They range from celebrities with large followings to niche content creators with a loyal following on social media platforms such as YouTube, Instagram, Facebook, and Twitter. Their followers range in number from hundreds of millions down to 1,000. Influencers may be categorized in tiers (mega-, macro-, micro-, and nano-influencers) based on their number of followers.

    Businesses pursue people who aim to lessen their consumption of advertisements, and are willing to pay their influencers more. Targeting influencers is seen as increasing marketing's reach, counteracting a growing tendency by prospective customers to ignore marketing.

    Marketing researchers Kapitan and Silvera find that influencer selection extends into product personality. This product and benefit matching is key. For a shampoo, it should use an influencer with good hair. Likewise, a flashy product may use bold colors to convey its brand. If an influencer is not flashy, they will clash with the brand. Matching an influencer with the product's purpose and mood is important.


  4. Instagram: distribution of global audiences 2024, by gender

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.

    Instagram’s Global Audience

    As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.
    As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.

    Who is winning over the generations?

    Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
    
  5. Instagram: distribution of global audiences 2024, by age and gender

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of the global Instagram population was aged 34 years or younger.

    Teens and social media

    As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.
    Social media can also have negative effects on teens, and these are much more pronounced for those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, while in comparison only five percent of teenagers with high social-emotional well-being stated the same. As such, social media can have a big impact on already fragile states of mind.
    
  6. Data from: The State of Social Media in Canada 2022

    • dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Mai, Philip; Gruzd, Anatoliy (2023). The State of Social Media in Canada 2022 [Dataset]. http://doi.org/10.5683/SP3/BDFE7S
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Mai, Philip; Gruzd, Anatoliy
    Area covered
    Canada
    Description

    The report provides a snapshot of the social media usage trends amongst online Canadian adults based on an online survey of 1500 participants. Canada continues to be one of the most connected countries in the world. An overwhelming majority of online Canadian adults (94%) have an account on at least one social media platform. However, the 2022 survey results show that the COVID-19 pandemic has ushered in some changes in how and where Canadians are spending their time on social media. Dominant platforms such as Facebook, messaging apps and YouTube are still on top but are losing ground to newer platforms such as TikTok and more niche platforms such as Reddit and Twitch.

  7. Instagram: distribution of global audiences 2024, by age group

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age group [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, almost 32 percent of global Instagram audiences were aged between 18 and 24 years, and 30.6 percent of users were aged between 25 and 34 years. Overall, 16 percent of users belonged to the 35 to 44 year age group.

    Instagram users

    With roughly one billion monthly active users, Instagram is among the most popular social networks worldwide. The social photo-sharing app is especially popular in India and in the United States, which have 362.9 million and 169.7 million Instagram users, respectively.

    Instagram features

    One of the most popular features of Instagram is Stories. Users can post photos and videos to their Stories stream and the content is live for others to view for 24 hours before it disappears. In January 2019, the company reported that there were 500 million daily active Instagram Stories users. Instagram Stories directly competes with Snapchat, another photo-sharing app that initially became famous due to its “vanishing photos” feature.
    As of the second quarter of 2021, Snapchat had 293 million daily active users.
    
  8. Twitter users in the United States 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jun 13, 2024
    Cite
    Statista Research Department (2024). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users has increased continuously over the past years.

    User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.

    The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

    Find more key insights for the number of Twitter users in countries like Canada and Mexico.
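    The forecast figures above can be cross-checked with a little arithmetic. The sketch below assumes that the quoted percentage is measured against the 2024 user base, which the description does not state explicitly; the variable names are illustrative only.

```python
# Figures quoted in the description above.
users_2028_millions = 85.08  # forecast user base in 2028
increase_millions = 4.3      # total forecast increase between 2024 and 2028

# Assumption: the percentage is relative to the implied 2024 base.
users_2024_millions = users_2028_millions - increase_millions
growth_percent = increase_millions / users_2024_millions * 100

print(f"Implied 2024 base: {users_2024_millions:.2f} million")
print(f"Growth 2024-2028: {growth_percent:.2f} percent")
```

    Under that assumption, the implied base and the growth rate agree with the quoted +5.32 percent to two decimal places.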

  9. Social Media Sentiments Analysis Dataset 📊

    • opendatabay.com
    .csv
    Updated Jun 7, 2025
    Cite
    Datasimple (2025). Social Media Sentiments Analysis Dataset 📊 [Dataset]. https://www.opendatabay.com/data/dataset/840edf8a-202c-42ce-815a-45c7cbc1c364
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    The Social Media Sentiments Analysis Dataset captures a vibrant tapestry of emotions, trends, and interactions across various social media platforms. This dataset provides a snapshot of user-generated content, encompassing text, timestamps, hashtags, countries, likes, and retweets. Each entry unveils unique stories—moments of surprise, excitement, admiration, thrill, contentment, and more—shared by individuals worldwide.

    Key Features

    Feature     Description
    Text        User-generated content showcasing sentiments
    Sentiment   Categorized emotions
    Timestamp   Date and time information
    User        Unique identifiers of users contributing
    Platform    Social media platform where the content originated
    Hashtags    Identifies trending topics and themes
    Likes       Quantifies user engagement (likes)
    Retweets    Reflects content popularity (retweets)
    Country     Geographical origin of each post
    Year        Year of the post
    Month       Month of the post
    Day         Day of the post
    Hour        Hour of the post

    How to Use The Social Media Sentiments Analysis Dataset 📊

    The Social Media Sentiments Analysis Dataset is a rich source of information that can be leveraged for various analytical purposes. Below are key ways to make the most of this dataset:

    Sentiment Analysis:

    Explore the emotional landscape by conducting sentiment analysis on the "Text" column. Classify user-generated content into categories such as surprise, excitement, admiration, thrill, contentment, and more.

    Temporal Analysis:

    Investigate trends over time using the "Timestamp" column. Identify patterns, fluctuations, or recurring themes in social media content.

    User Behavior Insights:

    Analyze user engagement through the "Likes" and "Retweets" columns. Discover popular content and user preferences.

    Platform-Specific Analysis:

    Examine variations in content across different social media platforms using the "Platform" column. Understand how sentiments vary across platforms.

    Hashtag Trends:

    Identify trending topics and themes by analyzing the "Hashtags" column. Uncover popular or recurring hashtags.

    Geographical Analysis:

    Explore content distribution based on the "Country" column. Understand regional variations in sentiment and topic preferences.

    User Identification:

    Use the "User" column to track specific users and their contributions. Analyze the impact of influential users on sentiment trends.

    Cross-Analysis:

    Combine multiple features for in-depth insights. For example, analyze sentiment trends over time or across different platforms and countries.
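    As a rough illustration of the cross-analysis idea, the following sketch counts sentiments per platform and averages likes per platform using only the Python standard library. The column names come from the feature list above; the three sample rows, the timestamp layout, and the `summarize` helper are invented for illustration and are not taken from the actual file.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; only the column names follow the dataset description.
SAMPLE_CSV = """Text,Sentiment,Timestamp,User,Platform,Hashtags,Likes,Retweets,Country
Loving the new park,Positive,2023-01-15 12:30:00,user_a,Twitter,#Nature #Park,30,15,USA
Traffic was terrible,Negative,2023-01-15 08:45:00,user_b,Twitter,#Traffic,10,5,Canada
Great workout today,Positive,2023-01-16 15:45:00,user_c,Instagram,#Fitness,40,20,USA
"""


def summarize(csv_text):
    """Return (sentiment counts per platform, mean likes per platform)."""
    sentiment_by_platform = defaultdict(Counter)
    likes_by_platform = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        platform = row["Platform"].strip()
        sentiment_by_platform[platform][row["Sentiment"].strip()] += 1
        # float() tolerates values such as "30.0" in case likes are stored as decimals.
        likes_by_platform[platform].append(float(row["Likes"]))
    mean_likes = {p: sum(v) / len(v) for p, v in likes_by_platform.items()}
    return sentiment_by_platform, mean_likes


sentiments, mean_likes = summarize(SAMPLE_CSV)
for platform in sentiments:
    print(platform, dict(sentiments[platform]), f"mean likes: {mean_likes[platform]:.1f}")
```

    To run this against the real download, replace `io.StringIO(csv_text)` with an open file handle; the `.strip()` calls are a precaution in case category values carry stray whitespace.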

    Original Data Source: Social Media Sentiments Analysis Dataset 📊

  10. Social Media Usage By Country

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Social Media Usage By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The results might surprise you when looking at internet users that are active on social media in each country.

  11. Social Media Worldwide Usage Statistics

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Social Media Worldwide Usage Statistics [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    56.8% of the world’s total population is active on social media.

  12. Which Gender Uses Social Media More By Platform?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Which Gender Uses Social Media More By Platform? [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The results of which gender uses which platforms are in.

  13. DeepCube: Post-processing and annotated datasets of social media data

    • data.niaid.nih.gov
    Updated Mar 15, 2024
    Cite
    Alexandros Mokas (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7732930
    Explore at:
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Eleni Kamateri
    Giannis Tsampoulatidis
    Alexandros Mokas
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Researcher(s): Alexandros Mokas, Eleni Kamateri

    Supervisor: Ioannis Tsampoulatidis

    This repository contains 3 social media datasets:

    2 post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the DeepCube project. More specifically, these include:

    The UC2 dataset, containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 5,695,253 social media posts collected from the Twitter platform, based on the initial version of the search criteria relevant to UC2 defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia and covering the period from 26 June 2021 to March 2023.

    The UC5 dataset, containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. This dataset contains a total of 58,143 social media posts collected from the Twitter and Instagram platforms (12,881 collected from Twitter and 45,262 collected from Instagram), based on the initial version of the search criteria relevant to UC5 defined by MURMURATION SAS, focused on Brazil and covering the period from 26 June 2021 to March 2023.

    1 annotated dataset: An additional annotated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

    The UC2 dataset, containing the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. This dataset contains a total of 1,721 annotated social media posts (412 relevant and 1,309 irrelevant) collected from the Twitter platform, focused on the region of Somalia and covering the period from 1 January 2010 to 31 December 2022.
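    The post counts quoted above are internally consistent, as the short check below shows. It also derives the share of relevant posts in the annotated set, which the description does not state; that ratio is computed here, not quoted from the source.

```python
# Counts quoted in the dataset description.
uc5_twitter, uc5_instagram = 12_881, 45_262
uc5_total = uc5_twitter + uc5_instagram          # stated total: 58,143

relevant, irrelevant = 412, 1_309
annotated_total = relevant + irrelevant          # stated total: 1,721

# Derived figure (not in the source): class balance of the annotated set.
relevant_share = relevant / annotated_total

print(uc5_total, annotated_total, f"{relevant_share:.1%} relevant")
```

    With roughly one relevant post in four, anyone training a relevance classifier on the annotated set would probably want to account for the class imbalance.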

    For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.
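    The three-step analysis described above can be sketched as a small pipeline. The actual DeepCube web services are not documented here, so the services are passed in as plain functions; the `Post` and `ProcessedPost` types, the `preprocess` function, and the stub services are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class Post:
    text: str
    image_urls: List[str] = field(default_factory=list)


@dataclass
class ProcessedPost:
    post: Post
    location: Optional[str]
    image_concepts: List[List[str]]  # one concept list per image
    sentiment: str


def preprocess(
    post: Post,
    extract_location: Callable[[str], Optional[str]],
    extract_concepts: Callable[[str], List[str]],
    classify_sentiment: Callable[[str], str],
) -> ProcessedPost:
    # Step 1: extract the location from the post text.
    location = extract_location(post.text)
    # Step 2: keep at most the top ten concepts for each image.
    concepts = [extract_concepts(url)[:10] for url in post.image_urls]
    # Step 3: classify the sentiment of the text.
    sentiment = classify_sentiment(post.text)
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return ProcessedPost(post, location, concepts, sentiment)


# Stub services standing in for the real (undocumented) web services.
def fake_location(text: str) -> Optional[str]:
    return "Somalia" if "Somalia" in text else None


def fake_concepts(image_url: str) -> List[str]:
    return ["person", "drought", "sun"]


def fake_sentiment(text: str) -> str:
    return "negative" if "drought" in text.lower() else "neutral"


result = preprocess(
    Post("Drought worsens in Somalia", ["https://example.org/a.jpg"]),
    fake_location,
    fake_concepts,
    fake_sentiment,
)
print(result.location, result.image_concepts, result.sentiment)
```

    Passing the services in as callables keeps the pipeline testable without network access; a real integration would swap the stubs for HTTP clients.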

    After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

    The dataset is provided by INFALIA. INFALIA, a spin-off of the CERTH institute and a partner in an EU research project, releases this dataset containing Tweet IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube project. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.

  14. What Are The Most Used Social Media Platforms?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). What Are The Most Used Social Media Platforms? [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Explore at:
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Facebook and YouTube are still the most used social media platforms today.

  15. COVID-19 Sentiment: 500K Instagram Posts (2020-24)

    • kaggle.com
    Updated Oct 21, 2024
    Cite
    Nirmalya Thakur, PhD (2024). COVID-19 Sentiment: 500K Instagram Posts (2020-24) [Dataset]. http://doi.org/10.34740/kaggle/dsv/9687126
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nirmalya Thakur, PhD
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.
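    For readers unfamiliar with how a VADER score becomes one of the three labels, the sketch below applies the cut-offs commonly recommended in the VADER documentation (a compound score of at least 0.05 is positive, at most -0.05 is negative, anything in between is neutral). The abstract does not say which thresholds the author used, so treat these values, and the `label_from_compound` helper, as assumptions.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER-style compound score in [-1, 1] to a three-way label.

    The 0.05 threshold is the conventional default, not a value confirmed
    by the dataset's author.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.62, 0.0, -0.31):
    print(score, label_from_compound(score))
```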

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are present in 161 different languages out of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), Turkish (4632 posts)
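    To put the language counts in proportion, the sketch below converts them to shares of the 500,153 posts reported in the abstract. The counts are copied from the text above; the percentages are derived here and do not appear in the source.

```python
total_posts = 500_153  # total reported in the abstract

# Top 10 languages by number of posts, as listed above.
top_languages = {
    "English": 343_041,
    "Spanish": 30_220,
    "Hindi": 15_832,
    "Portuguese": 15_779,
    "Indonesian": 11_491,
    "Tamil": 9_592,
    "Arabic": 9_416,
    "German": 7_822,
    "Italian": 5_162,
    "Turkish": 4_632,
}

shares = {lang: count / total_posts * 100 for lang, count in top_languages.items()}
top10_total = sum(top_languages.values())
remaining_posts = total_posts - top10_total  # posts in the other 151 languages

for lang, share in shares.items():
    print(f"{lang:<11}{share:5.1f}%")
print(f"Other languages: {remaining_posts} posts")
```

    English alone accounts for a little over two thirds of the posts, which is worth keeping in mind when comparing sentiment across languages.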

    There are 535,021 distinct hashtags in this dataset with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), #coronavirusoutbreak (34567 posts)

    The following is a description of the attributes present in this dataset:

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
    • Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
    • Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
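    A minimal sketch of working with these attributes: it parses the MM/DD/YYYY date and tallies sentiment labels per language and year. The three records are invented placeholders, and the exact column headers in the downloadable file may differ from the attribute names used as dictionary keys here.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; keys follow the attribute list, values are made up.
records = [
    {"Post ID": "1", "Post Description": "Stay safe everyone", "Date": "03/15/2020",
     "Language code": "en", "Full Language": "English", "Sentiment": "positive"},
    {"Post ID": "2", "Post Description": "Otro día de cuarentena", "Date": "04/02/2020",
     "Language code": "es", "Full Language": "Spanish", "Sentiment": "neutral"},
    {"Post ID": "3", "Post Description": "Tired of lockdowns", "Date": "01/20/2021",
     "Language code": "en", "Full Language": "English", "Sentiment": "negative"},
]


def publication_year(record):
    # The description states that dates use the MM/DD/YYYY format.
    return datetime.strptime(record["Date"], "%m/%d/%Y").year


counts = Counter(
    (r["Language code"], publication_year(r), r["Sentiment"]) for r in records
)

for (language, year, sentiment), n in sorted(counts.items()):
    print(language, year, sentiment, n)
```

    Grouping by language and year in this way is one possible starting point for the research questions listed under "Open Research Questions".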

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    • How does sentiment toward COVID-19 vary across different languages?
    • How has public sentiment toward COVID-19 evolved from 2020 to the present?
    • How do cultural differences affect social media discourse about COVID-19 across various languages?
    • How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
    • How effective were public health campaigns in shifting public sentiment in different languages?
    • What patterns of vaccine hesitancy or support are present in different languages?
    • How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
    • What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
    • How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
    • What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  16. Data from: Five Years of COVID-19 Discourse on Instagram: A Labeled...

    • data.niaid.nih.gov
    Updated Oct 21, 2024
    Cite
    Thakur, Ph.D., Nirmalya (2024). Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13896352
    Dataset updated
    Oct 21, 2024
    Dataset authored and provided by
    Thakur, Ph.D., Nirmalya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are present in 161 different languages, out of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), and Turkish (4632 posts).

    There are 535,021 distinct hashtags in this dataset with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), #coronavirusoutbreak (34567 posts).

    The following is a description of the attributes present in this dataset

    Post ID: Unique ID of each Instagram post

    Post Description: Complete description of each post in the language in which it was originally published

    Date: Date of publication in MM/DD/YYYY format

    Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API

    Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API

    Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
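
    As a minimal sketch of how these attributes might be consumed, the Python snippet below parses rows carrying the six fields described above, tallies sentiment labels per language code, and parses the MM/DD/YYYY dates. The header strings and the sample rows are assumptions made for illustration only; the actual file may use different column names or a different layout.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample rows. The header names mirror the attribute list above,
# but they are an assumption about the real file, not taken from it.
SAMPLE = """Post ID,Post Description,Date,Language code,Full Language,Sentiment
1,Stay safe everyone #covid19,03/15/2020,en,English,positive
2,Otro dia de cuarentena,04/02/2020,es,Spanish,neutral
3,Testing lines are too long,01/10/2022,en,English,negative
"""


def sentiment_by_language(rows):
    """Count sentiment labels for each language code."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Language code"]][row["Sentiment"]] += 1
    return counts


def parse_date(value):
    """Parse the MM/DD/YYYY format described for the Date attribute."""
    return datetime.strptime(value, "%m/%d/%Y").date()


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
counts = sentiment_by_language(rows)
years = sorted({parse_date(r["Date"]).year for r in rows})
```

    Grouping by language code first keeps the per-language comparison cheap; the same counts could then be split by year to follow sentiment over time.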

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    How does sentiment toward COVID-19 vary across different languages?

    How has public sentiment toward COVID-19 evolved from 2020 to the present?

    How do cultural differences affect social media discourse about COVID-19 across various languages?

    How has COVID-19 impacted mental health, as reflected in social media posts across different languages?

    How effective were public health campaigns in shifting public sentiment in different languages?

    What patterns of vaccine hesitancy or support are present in different languages?

    How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?

    What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?

    How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?

    What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  17. Data from: TikTok dataset - Current affairs on TikTok. Virality and...

    • data.niaid.nih.gov
    • ekoizpen-zientifikoa.ehu.eus
    • +1 more
    Updated Aug 28, 2022
    Cite
    Peña-Fernández, Simón (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7024884
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Peña-Fernández, Simón
    Larrondo-Ureta, Ainara
    Morales-i-Gras, Jordi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built using Gephi network analysis software.
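
    To illustrate the difference between unique and weighted links, here is a small Python sketch that collapses repeated label pairs into unique undirected edges whose weight is the number of repetitions. This is an assumption about how such counts typically relate, not a description of how the authors built their graph, and the label pairs are invented for the example.

```python
from collections import Counter

# Hypothetical label co-occurrence pairs; not taken from the dataset.
pairs = [
    ("news", "viral"),
    ("viral", "news"),
    ("news", "humor"),
    ("news", "viral"),
]


def weighted_edges(pairs):
    """Collapse repeated undirected pairs into unique edges with weights."""
    return Counter(tuple(sorted(pair)) for pair in pairs)


edges = weighted_edges(pairs)
unique_links = len(edges)            # number of distinct edges
total_weight = sum(edges.values())   # sum of all edge weights
nodes = {label for edge in edges for label in edge}
```

    Under this reading, the number of unique links is the number of distinct edges, while the weighted total counts every repetition.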

    Source of:

    Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

    Abstract:

    Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.

  18. Short Jokes Dataset

    • kaggle.com
    Updated Dec 5, 2023
    Cite
    The Devastator (2023). Short Jokes Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/short-jokes-dataset
    Dataset updated
    Dec 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Short Jokes Dataset

    Humorous Short Jokes

    By Fraser Greenlee (From Huggingface)

    About this dataset

    This dataset offers a valuable resource for various applications such as natural language processing, sentiment analysis, joke generation algorithms, or simply for entertainment purposes. Whether you're a data scientist looking to analyze humor patterns or an individual seeking some quick comedic relief, this dataset has got you covered.

    By utilizing this dataset, researchers can explore different aspects of humor and study the linguistic features that make these short jokes amusing. Moreover, it provides an opportunity for developing computer models capable of generating similar humorous content based on learned patterns.

    How to use the dataset

    • Understanding the Columns:

      • text: This column contains the text of the short joke. It is the only column in the file.
    • Exploring the Jokes:

      • Start by exploring the text column, which contains the actual jokes. You can read through them and have a good laugh!
    • Analyzing the Jokes:

      • To gain insights from this dataset, you can perform various analyses:
        • Sentiment Analysis: Use Natural Language Processing techniques to analyze the sentiment of each joke.
        • Categorization: Group jokes based on common themes or subjects, such as animals, professions, etc.
        • Length Distribution: Analyze and visualize the distribution of joke lengths.
    • Creating New Content or Applications: Since this dataset provides a large collection of short jokes, you can utilize it creatively:

      • Generating Random Jokes: Develop an algorithm that generates new jokes based on patterns found in this dataset.
      • Humor Classification: Build a model that predicts if a given piece of text is funny or not using machine learning techniques.
    • Sharing Your Findings: If you make interesting discoveries or create unique applications using this dataset, consider sharing them with others in the Kaggle community.
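
    The length-distribution analysis mentioned above could be sketched as follows. This is only an illustrative example: the sample jokes are made up, the only thing assumed about train.csv is its single text column, and bucketing by word count in bins of five is an arbitrary choice.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for train.csv, which has a single `text` column.
SAMPLE = """text
Why did the chicken cross the road? To get to the other side.
I told my computer a joke. It did not laugh.
Short one.
"""


def length_histogram(texts, bin_size=5):
    """Bucket jokes by word count into bins of `bin_size` words."""
    hist = Counter()
    for text in texts:
        n_words = len(text.split())
        # Key each bin by its lower bound, e.g. 10 covers 10 to 14 words.
        hist[(n_words // bin_size) * bin_size] += 1
    return dict(sorted(hist.items()))


jokes = [row["text"] for row in csv.DictReader(io.StringIO(SAMPLE))]
hist = length_histogram(jokes)
```

    For the real file, the in-memory sample would be replaced by an opened train.csv, and the resulting dictionary can be passed to any plotting tool.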

    Please note that no information regarding dates is available in train.csv; therefore, any temporal analysis or date-based insights won't be feasible with this specific file.

    Research Ideas

    • Analyzing humor patterns: This dataset can be used to analyze different types of humor and identify patterns or common elements in jokes that make them funny. Researchers and linguists can use this dataset to gain insights into the structure, wordplay, or comedic techniques used in short jokes.
    • Natural language processing: With the text data available in this dataset, it can be used for training models in natural language processing (NLP) tasks such as sentiment analysis, joke generation, or understanding humor from written text. NLP researchers and developers can utilize this dataset to build and improve algorithms for detecting or generating funny content.
    • Social media analysis: Short jokes are popular on social media platforms like Twitter or Reddit where users frequently share humorous content. This dataset can be valuable for analyzing the reception and impact of these jokes on social media platforms. By examining trends, engagement metrics, or user reactions to specific jokes from the dataset, marketers or social media analysts can gain insights into what type of humor resonates with different online communities.

    Overall, this dataset provides a rich resource for exploring various aspects related to humor analysis and NLP tasks while offering opportunities for sociocultural studies related to online comedy culture.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0), Public Domain Dedication. No copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: train.csv

    | Column name | Description |
    |:--------------|:----------------------------------------------|
    | text | The actual content of the short jokes. (Text) |

    Acknowledgements

    If you use this dataset in your research, please credit Fraser Greenlee (From Huggingface).

  19. Instagram: countries with the highest audience reach 2024

    • statista.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: countries with the highest audience reach 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, Bahrain was the country with the highest Instagram audience reach with 95.6 percent. Kazakhstan also had a high Instagram audience penetration rate, with 90.8 percent of the population using the social network. In the United Arab Emirates, Turkey, and Brunei, the photo-sharing platform was used by more than 85 percent of each country's population.

  20. Social Media Usage By Age

    • searchlogistics.com
    Updated Apr 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Social Media Usage By Age [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gen Z and Millennials are the biggest social media users of all age groups.

Cite
Bright Data (2022). Social Media Datasets [Dataset]. https://brightdata.com/products/datasets/social-media

Social Media Datasets

Available download formats: .json, .csv, .xlsx
Dataset updated
Sep 7, 2022
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License

https://brightdata.com/license

Area covered
Worldwide
Description

Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.

Dataset Features

• User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.
• Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.
• Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.
• Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.

Customizable Subsets for Specific Needs

Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.

Popular Use Cases

• Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.
• Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.
• Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.
• Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.
• AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.

Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
