100+ datasets found
  1. Social Media Channels and Statistics at the National Archives

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Nov 7, 2024
    Cite
    National Archives and Records Administration (2024). Social Media Channels and Statistics at the National Archives [Dataset]. https://catalog.data.gov/dataset/social-media-channels-and-statistics-at-the-national-archives
    Dataset updated
    Nov 7, 2024
    Dataset provided by
    National Archives and Records Administration (http://www.archives.gov/)
    Description

    More than 100 social media channels and statistics for the National Archives and Records Administration.

  2. Instagram accounts with the most followers worldwide 2024

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram accounts with the most followers worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    Cristiano Ronaldo has one of the most popular Instagram accounts as of April 2024.

    The Portuguese footballer is the most-followed person on the photo-sharing platform, with 628 million followers. Instagram's own account was ranked first with roughly 672 million followers.

    How popular is Instagram?

    Instagram is a photo-sharing social networking service that enables users to take pictures and edit them with filters. The platform allows users to post and share their images online and directly with their friends and followers on the social network. The cross-platform app reached one billion monthly active users in mid-2018. In 2020, there were over 114 million Instagram users in the United States and experts project this figure to surpass 127 million users in 2023.

    Who uses Instagram?

    Instagram audiences are predominantly young: recent data states that almost 60 percent of U.S. Instagram users are aged 34 years or younger. Fall 2020 data reveals that Instagram is also one of the most popular social media platforms for teens and one of the social networks with the biggest reach among teens in the United States.

    Celebrity influencers on Instagram

    Many celebrities and athletes are brand spokespeople and generate additional income with social media advertising and sponsored content. Unsurprisingly, Ronaldo ranked first again, as the average media value of one of his Instagram posts was 985,441 U.S. dollars.
    
  3. Instagram: distribution of global audiences 2024, by age and gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, around 16.5 percent of global active Instagram users were men between the ages of 18 and 24 years. More than half of Instagram users worldwide were aged 34 years or younger.

    Teens and social media

    As one of the biggest social networks worldwide, Instagram is especially popular with teenagers. As of fall 2020, the photo-sharing app ranked third in terms of preferred social network among teenagers in the United States, behind Snapchat and TikTok. Instagram was one of the most influential advertising channels among female Gen Z users when making purchasing decisions. Teens report feeling more confident, popular, and better about themselves when using social media, and less lonely, depressed and anxious.

    Social media can also have negative effects on teens, and these effects are much more pronounced among those with low emotional well-being. It was found that 35 percent of teenagers with low social-emotional well-being reported having experienced cyberbullying when using social media, compared with only five percent of teenagers with high social-emotional well-being. As such, social media can have a big impact on already fragile states of mind.
    
  4. Abbreviated FOMO and social media dataset

    • figshare.mq.edu.au
    • researchdata.edu.au
    txt
    Updated May 30, 2023
    Cite
    Danielle Einstein; Carol Dabb; Madeleine Ferrari; Anne McMaugh; Peter McEvoy; Ron Rapee; Eyal Karin; Maree J. Abbott (2023). Abbreviated FOMO and social media dataset [Dataset]. http://doi.org/10.25949/20188298.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Macquarie University
    Authors
    Danielle Einstein; Carol Dabb; Madeleine Ferrari; Anne McMaugh; Peter McEvoy; Ron Rapee; Eyal Karin; Maree J. Abbott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database comprises 951 participants who provided self-report data online in their school classrooms. The data were collected in 2016 and 2017. The dataset comprises 509 males (54%) and 442 females (46%). Their ages ranged from 12 to 16 years (M = 13.69, SD = 0.72). Seven participants did not report their age. The majority were born in Australia (N = 849, 89%). The next most common countries of birth were China (N = 24, 2.5%), the UK (N = 23, 2.4%), and the USA (N = 9, 0.9%). Data were drawn from students at five Australian independent secondary schools.

    The data contain item responses for the Spence Children’s Anxiety Scale (SCAS; Spence, 1998), which comprises 44 items. The social media question asked about frequency of use: “How often do you use social media?” The response options ranged from constantly to once a week or less. Items measuring Fear of Missing Out incorporated the following five questions based on the APS Stress and Wellbeing in Australia Survey (APS, 2015): “When I have a good time it is important for me to share the details online”; “I am afraid that I will miss out on something if I don’t stay connected to my online social networks”; “I feel worried and uncomfortable when I can’t access my social media accounts”; “I find it difficult to relax or sleep after spending time on social networking sites”; “I feel my brain burnout with the constant connectivity of social media”. Internal consistency for this measure was α = .81. Self-compassion was measured using the 12-item short form of the Self-Compassion Scale (SCS-SF; Raes et al., 2011).

    The data set can be downloaded as an Excel file (composed of two worksheet tabs) or as CSV files: 1) Data and 2) Variable labels.

    References:

    Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4

    Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250-255. https://doi.org/10.1002/cpp.702

    Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545-566. https://doi.org/10.1016/S0005-7967(98)00034-5

  5. Data from: TikTok dataset - Current affairs on TikTok. Virality and...

    • zenodo.org
    • research.science.eus
    Updated Aug 28, 2022
    Cite
    Simón Peña-Fernández; Ainara Larrondo-Ureta; Jordi Morales-i-Gras (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. http://doi.org/10.5281/zenodo.7024885
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Simón Peña-Fernández; Ainara Larrondo-Ureta; Jordi Morales-i-Gras
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built using Gephi network analysis software.

    Source of:

    Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

    Abstract:

    Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.

  6. social media impact dataset

    • kaggle.com
    Updated Oct 18, 2024
    Cite
    DILSHANA SHERIN (2024). social media impact dataset [Dataset]. https://www.kaggle.com/datasets/dilshanasherin/social-media-impact-dataset/versions/1
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DILSHANA SHERIN
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by thickangel

    Released under Apache 2.0


  7. Data from: MuMiN Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 22, 2022
    Cite
    Dan Saattrup Nielsen; Ryan McConville (2022). MuMiN Dataset [Dataset]. https://paperswithcode.com/dataset/mumin
    Dataset updated
    Feb 22, 2022
    Authors
    Dan Saattrup Nielsen; Ryan McConville
    Description

    MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags), spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade.

    MuMiN fills a gap in the existing misinformation datasets in multiple ways:

    • By having a large amount of social media information which has been semantically linked to fact-checked claims on an individual basis.
    • By featuring 41 languages, enabling evaluation of multilingual misinformation detection models.
    • By featuring tweets, articles, images, social connections and hashtags, enabling multimodal approaches to misinformation detection.

    MuMiN features two node classification tasks, related to the veracity of a claim:

    • Claim classification: Determine the veracity of a claim, given its social network context.
    • Tweet classification: Determine the likelihood that a social media post to be fact-checked is discussing a misleading claim, given its social network context.

    To use the dataset, see the "Getting Started" guide and tutorial at the MuMiN website.

  8. Social Media Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Sep 7, 2022
    Cite
    Bright Data (2022). Social Media Datasets [Dataset]. https://brightdata.com/products/datasets/social-media
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Sep 7, 2022
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Gain valuable insights with our comprehensive Social Media Dataset, designed to help businesses, marketers, and analysts track trends, monitor engagement, and optimize strategies. This dataset provides structured and reliable social media data from multiple platforms.

    Dataset Features

    • User Profiles: Access public social media profiles, including usernames, bios, follower counts, engagement metrics, and more. Ideal for audience analysis, influencer marketing, and competitive research.
    • Posts & Content: Extract posts, captions, hashtags, media (images/videos), timestamps, and engagement metrics such as likes, shares, and comments. Useful for trend analysis, sentiment tracking, and content strategy optimization.
    • Comments & Interactions: Analyze user interactions, including replies, mentions, and discussions. This data helps brands understand audience sentiment and engagement patterns.
    • Hashtag & Trend Tracking: Monitor trending hashtags, topics, and viral content across platforms to stay ahead of industry trends and consumer interests.

    Customizable Subsets for Specific Needs

    Our Social Media Dataset is fully customizable, allowing you to filter data based on platform, region, keywords, engagement levels, or specific user profiles. Whether you need a broad dataset for market research or a focused subset for brand monitoring, we tailor the dataset to your needs.

    Popular Use Cases

    • Brand Monitoring & Reputation Management: Track brand mentions, customer feedback, and sentiment analysis to manage online reputation effectively.
    • Influencer Marketing & Audience Analysis: Identify key influencers, analyze engagement metrics, and optimize influencer partnerships.
    • Competitive Intelligence: Monitor competitor activity, content performance, and audience engagement to refine marketing strategies.
    • Market Research & Consumer Insights: Analyze social media trends, customer preferences, and emerging topics to inform business decisions.
    • AI & Predictive Analytics: Leverage structured social media data for AI-driven trend forecasting, sentiment analysis, and automated content recommendations.

    Whether you're tracking brand sentiment, analyzing audience engagement, or monitoring industry trends, our Social Media Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  9. Data from: Datasets of Twitter mentions and publications in Information...

    • zenodo.org
    • produccioncientifica.ugr.es
    tsv
    Updated Nov 19, 2021
    Cite
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Nicolás Robinson-García (2021). Datasets of Twitter mentions and publications in Information Science & Library Science and Microbiology [Dataset]. http://doi.org/10.5281/zenodo.4148941
    Explore at:
    Available download formats: tsv
    Dataset updated
    Nov 19, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Wenceslao Arroyo-Machado; Daniel Torres-Salinas; Nicolás Robinson-García
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets used in the study 'Identifying and characterizing social media communities: a socio-semantic network approach to altmetrics'.

    Microbiology publications (mic_publiccations.tsv). Dataset of 101,206 Microbiology publications with their author keywords.

    Microbiology mentions (mic_mentions.tsv). Dataset of 328,110 Twitter mentions to Microbiology publications.

    Information Science & Library Science publications (lis_publications.tsv). Dataset of 8452 Information Science & Library Science publications with their author keywords.

    Information Science & Library Science mentions (lis_mentions.tsv). Dataset of 35,411 Twitter mentions to Information Science & Library Science publications.

  10. Graph-Based Social Media Data on Mental Health Topics

    • data.mendeley.com
    Updated Nov 4, 2024
    Cite
    Samuel Ady Sanjaya (2024). Graph-Based Social Media Data on Mental Health Topics [Dataset]. http://doi.org/10.17632/z45txpdp7f.2
    Dataset updated
    Nov 4, 2024
    Authors
    Samuel Ady Sanjaya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is structured as a graph, where nodes represent users and edges capture their interactions, including tweets, retweets, replies, and mentions. Each node provides detailed user attributes, such as unique ID, follower and following counts, and verification status, offering insights into each user's identity, role, and influence in the mental health discourse. The edges illustrate user interactions, highlighting engagement patterns and types of content that drive responses, such as tweet impressions. This interconnected structure enables sentiment analysis and public reaction studies, allowing researchers to explore engagement trends and identify the mental health topics that resonate most with users.

    The dataset consists of three files:

    1. Edges Data: Contains graph data essential for social network analysis, including fields for UserID (Source), UserID (Destination), Post/Tweet ID, and Date of Relationship. This file enables analysis of user connections without including tweet content, maintaining compliance with Twitter/X’s data-sharing policies.
    2. Nodes Data: Offers user-specific details relevant to network analysis, including UserID, Account Creation Date, Follower and Following counts, Verified Status, and Date Joined Twitter. This file allows researchers to examine user behavior (e.g., identifying influential users or spam-like accounts) without direct reference to tweet content.
    3. Twitter/X Content Data: This file contains only the raw tweet text as a single-column dataset, without associated user identifiers or metadata. By isolating the text, we ensure alignment with anonymization standards observed in similar published datasets, safeguarding user privacy in compliance with Twitter/X's data guidelines. This content is crucial for addressing the research focus on mental health discourse in social media. (References to prior Data in Brief publications involving Twitter/X data informed the dataset's structure.)
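
    As a rough illustration of how the edges file described above might be used, the sketch below counts outgoing and incoming interactions per user from an edge list. The header names (source, destination, post_id, date) and the sample rows are hypothetical stand-ins; the real file may label its columns differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape the description suggests
# (source user, destination user, post ID, date). The real
# header names are an assumption and may differ.
SAMPLE_EDGES = """source,destination,post_id,date
u1,u2,p1,2024-01-01
u1,u3,p2,2024-01-02
u2,u1,p3,2024-01-03
"""


def interaction_counts(edge_file):
    """Return (out_degree, in_degree) Counters keyed by user ID."""
    out_deg, in_deg = Counter(), Counter()
    for row in csv.DictReader(edge_file):
        out_deg[row["source"]] += 1       # interactions started by this user
        in_deg[row["destination"]] += 1   # interactions received by this user
    return out_deg, in_deg


out_deg, in_deg = interaction_counts(io.StringIO(SAMPLE_EDGES))
```

    Users with a high in-degree receive many interactions, which is one simple proxy for influence in this kind of graph.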

  11. Top 10 social media by active users

    • kaggle.com
    Updated Aug 15, 2024
    Cite
    Mahmoud Gamil (2024). Top 10 social media by active users [Dataset]. https://www.kaggle.com/datasets/mahmoudredagamail/number-of-monthly-active-users-worldwide
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 15, 2024
    Dataset provided by
    Kaggle
    Authors
    Mahmoud Gamil
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Social media has become a part of our day-to-day routine, keeping users across the world connected through digital platforms. Social media is evolving rapidly, and the number of users grows with each passing year. Reports also suggest the number of social media users will reach a milestone of 5.85 billion in 2027.

    In 2024, 62.6% of the world’s population will access social media, which clearly indicates the dominance of social media platforms in today’s world. In this article, we will examine social media statistics for 2024, uncovering monthly active users, daily time spent by users, most downloaded social media apps, etc.

  12. Instagram: distribution of global audiences 2024, by gender

    • statista.com
    • davegsmith.com
    Updated Jun 17, 2025
    Cite
    Stacy Jo Dixon (2025). Instagram: distribution of global audiences 2024, by gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of January 2024, Instagram was slightly more popular with men than women, with men accounting for 50.6 percent of the platform’s global users. Additionally, the social media app was most popular amongst younger audiences, with almost 32 percent of users aged between 18 and 24 years.

    Instagram’s Global Audience

    As of January 2024, Instagram was the fourth most popular social media platform globally, reaching two billion monthly active users (MAU). This number is projected to keep growing with no signs of slowing down, which is not a surprise as the global online social penetration rate across all regions is constantly increasing.

    As of January 2024, the country with the largest Instagram audience was India with 362.9 million users, followed by the United States with 169.7 million users.

    Who is winning over the generations?

    Even though Instagram’s audience is almost twice the size of TikTok’s on a global scale, TikTok has shown itself to be a fierce competitor, particularly amongst younger audiences. TikTok was the most downloaded mobile app globally in 2022, generating 672 million downloads. As of 2022, Generation Z in the United States spent more time on TikTok than on Instagram monthly.
    
  13. Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...

    • zenodo.org
    • data.niaid.nih.gov
    xz
    Updated Oct 13, 2022
    Cite
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. http://doi.org/10.5281/zenodo.7065064
    Explore at:
    Available download formats: xz
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giulia Boato; Cecilia Pasquini; Antonio Luigi Stefani; Sebastiano Verde; Daniele Miorandi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TrueFace is a first dataset of real and synthetic faces processed by social media platforms. The synthetic faces were obtained with the successful StyleGAN generative models, and the images were shared on Facebook, Twitter and Telegram.

    Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the possibility to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To counter this fake news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. These platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.

  14. Data related to the paper "Studying social unrest through the lens of social...

    • data.4tu.nl
    zip
    Updated Dec 12, 2024
    Cite
    Lucas Spierenburg; O. (Oded) Cats; Sander van Cranenburgh (2024). Data related to the paper "Studying social unrest through the lens of social media" [Dataset]. http://doi.org/10.4121/649e8f5d-8e40-4ab7-9d07-b5ef53d810f0.v2
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 12, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Lucas Spierenburg; O. (Oded) Cats; Sander van Cranenburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description


    Dataset corresponding to the paper "Studying social unrest through the lens of social media".


    107,674 geolocated visual posts from a social media platform were collected during and after the 'Nahel Merzouk' riots in summer 2023 in 7 French cities. These posts were fed to a computer vision model with the objective of identifying riot-related posts. This dataset contains the metadata (date, time, and location) of those posts along with the label of each post (according to the model). Riot-related posts are then clustered into "events", based on their spatiotemporal proximity (see paper for more details).


    Columns:

    "timestamp" (TIMESTAMP): Date and time of the posts

    "latitude" (REAL): Latitude at which the post was published

    "longitude" (REAL): Longitude at which the post was published

    "pred_class" (INTEGER): Binary variable with value 1 if it represents a riot, 0 otherwise

    "event" (TEXT): Event associated to the post, structured as follows:

    "No event" if the post is not marked as riot-related

    "day_city_id" with "day" being the day of the month associated to the event, such as "2", "city" being the city in which the event happened, such as "Paris", "id" being an integer. "29_Marseille_0" corresponds to event "0" happening in Marseille on June 29th 2023. If the value of the id is "-1", the post could not be associated to any event.
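
    The "event" column format described above can be unpacked with a few lines of Python. This is only a sketch: the names EventLabel and parse_event are hypothetical, and splitting from both ends is an assumption made so that a city name containing an underscore would stay intact.

```python
from typing import NamedTuple, Optional


class EventLabel(NamedTuple):
    day: int        # day of the month associated with the event
    city: str       # city in which the event happened
    event_id: int   # -1 means the post could not be tied to an event


def parse_event(value: str) -> Optional[EventLabel]:
    """Parse a "day_city_id" label; return None for "No event"."""
    if value == "No event":
        return None
    # Take the day from the left and the id from the right, so any
    # underscores inside the city name are preserved (an assumption).
    day, rest = value.split("_", 1)
    city, event_id = rest.rsplit("_", 1)
    return EventLabel(int(day), city, int(event_id))


example = parse_event("29_Marseille_0")
```

    Here example holds day 29, city "Marseille" and event id 0, matching the example given in the column description.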

  15. Social Media Usage Survey

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    SIDDHI PRIYA (2025). Social Media Usage Survey [Dataset]. https://www.kaggle.com/datasets/siddhipriya/social-media-usage-survey/suggestions
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SIDDHI PRIYA
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset captures insights from a survey on social media usage across diverse age groups and genders. It includes data on the most used platforms, daily screen time, reasons for usage, preferred content types, and how social media influences buying decisions. Additionally, it reflects users' concerns about privacy and their willingness to reduce usage. The dataset is useful for analyzing digital behavior, content preferences, and the social impact of online platforms. It can support research in marketing, psychology, and digital well-being, offering a snapshot of how people interact with and perceive social media in their daily lives.

  16. Data from: Mpox Narrative on Instagram: A Labeled Multilingual Dataset of...

    • figshare.com
    xlsx
    Updated Oct 12, 2024
    Cite
    Nirmalya Thakur (2024). Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.27072247.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    figshare
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite this paper when using this dataset: N. Thakur, “Mpox narrative on Instagram: A labeled multilingual dataset of Instagram posts on mpox for sentiment, hate speech, and anxiety analysis,” arXiv [cs.LG], 2024, URL: https://arxiv.org/abs/2409.05292

    Abstract: The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by WHO. During recent virus outbreaks, social media platforms have played a crucial role in keeping the global population informed and updated regarding various aspects of the outbreaks. As a result, in the last few years, researchers from different disciplines have focused on the development of social media datasets focusing on different virus outbreaks. No prior work in this field has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper (stated above) aims to address this research gap. It presents this multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. This dataset contains Instagram posts about mpox in 52 languages.

    For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were also performed. This process included classifying each post into:

    • one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral
    • hate or not hate
    • anxiety/stress detected or no anxiety/stress detected

    These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for sentiment, hate speech, and anxiety or stress detection, as well as for other applications.

    The 52 distinct languages in which Instagram posts are present in the dataset are English, Portuguese, Indonesian, Spanish, Korean, French, Hindi, Finnish, Turkish, Italian, German, Tamil, Urdu, Thai, Arabic, Persian, Tagalog, Dutch, Catalan, Bengali, Marathi, Malayalam, Swahili, Afrikaans, Panjabi, Gujarati, Somali, Lithuanian, Norwegian, Estonian, Swedish, Telugu, Russian, Danish, Slovak, Japanese, Kannada, Polish, Vietnamese, Hebrew, Romanian, Nepali, Czech, Modern Greek, Albanian, Croatian, Slovenian, Bulgarian, Ukrainian, Welsh, Hungarian, and Latvian.

    The following is a description of the attributes present in this dataset:

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language: Language of the post as detected using the Google Translate API
    • Translated Post Description: Translated version of the post description. All posts which were not in English were translated into English using the Google Translate API. No language translation was performed for English posts.
    • Sentiment: Results of sentiment analysis (using the preprocessed version of the translated Post Description) where each post was classified into one of the sentiment classes: fear, surprise, joy, sadness, anger, disgust, and neutral
    • Hate: Results of hate speech detection (using the preprocessed version of the translated Post Description) where each post was classified as hate or not hate
    • Anxiety or Stress: Results of anxiety or stress detection (using the preprocessed version of the translated Post Description) where each post was classified as stress/anxiety detected or no stress/anxiety detected.

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).
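    As a rough illustration of the attributes described above, the sketch below reads rows that carry those column names and parses the Date field in MM/DD/YYYY format. It assumes the xlsx sheet has been exported to CSV with the attribute names as headers; the sample rows, the post IDs, and the exact label spellings are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample; values are invented, not real dataset rows.
# Assumption: the column headers match the attribute names in the description.
SAMPLE = """Post ID,Post Description,Date,Language,Translated Post Description,Sentiment,Hate,Anxiety or Stress
p1,Ejemplo de publicación,08/15/2024,Spanish,Example post,fear,not hate,anxiety/stress detected
p2,Example post,09/01/2024,English,Example post,neutral,not hate,no anxiety/stress detected
"""


def load_posts(text):
    """Read CSV text into dict rows and convert Date (MM/DD/YYYY) to a date."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Date"] = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        rows.append(row)
    return rows


posts = load_posts(SAMPLE)

# Example filter: keep only posts labeled with the "fear" sentiment class.
fear_posts = [p for p in posts if p["Sentiment"] == "fear"]
```

    The same filtering pattern would apply to the Hate and Anxiety or Stress columns once the actual label strings in the file are confirmed.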

  17. Dataset Explaining social media attention in Clinical Medicine research...

    • figshare.com
    xlsx
    Updated Jun 3, 2023
    Cite
    Pablo Dorta-gonzález (2023). Dataset Explaining social media attention in Clinical Medicine research through social mentions and bibliometric factors [Dataset]. http://doi.org/10.6084/m9.figshare.20411592.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    figshare
    Authors
    Pablo Dorta-gonzález
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The unit of study: articles and reviews in disciplinary journals in the field of Clinical Medicine. The time frame covers 2018-2020 and the country analyzed is Spain.

    The random sample: N=4895. The years 2018 and 2019 correspond to a simple random sample, while the year 2020 is a compendium of a random sample and the complete COVID-19 subgroup.

    Variables: Twitter is nowadays the most used social media platform by the general population (and by researchers in particular) to disseminate and comment on the results of scientific research. There are many and varied factors that may influence social media attention of research. These factors include the mainstream news coverage, the treated topic (COVID-19, for example), and some characteristics of the research such as its proximity to social issues (impact on public policy) and business (impact on patents), their contribution to the consolidation of knowledge (in the format of review or mention on Wikipedia), and the recommendation of experts (Faculty Opinions).

  18. Data from: Early prediction and characterization of high-impact world events...

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Mauricio Quezada; jkalyana@ucsd.edu; bpoblete@dcc.uchile.cl; gert@ece.ucsd.edu (2023). Early prediction and characterization of high-impact world events using social media [Dataset]. http://doi.org/10.6084/m9.figshare.3465974.v4
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Mauricio Quezada; jkalyana@ucsd.edu; bpoblete@dcc.uchile.cl; gert@ece.ucsd.edu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World
    Description

    This dataset consists of 5234 news events obtained from Twitter. The file tweets.csv.gz (available upon request via email to the authors) contains a CSV file, called tweets.csv, with all the tweet IDs corresponding to each event in events.csv. The format of each line of the file is the following:

    tweet_id, event_id

    Where:

    tweet_id is a long number indicating the Twitter ID of the given tweet. Using the Twitter REST API it is possible to retrieve all the information about the given tweet.
    event_id corresponds to the event ID of the given tweet.

    The file events.csv.gz contains a CSV file, called events.csv, with all the news events captured from Twitter from August 2013 until June 2014. The format of each line of the file is the following:

    event_ID,date,total_keywords,total_tweets,keywords

    Where:

    event_ID is an integer which identifies the corresponding event. There are 5234 events, so event_ID ranges from 1 to 5234.
    date is the date of the event or connected component. The format is YYYY-MM-DD.
    total_keywords is an integer indicating how many keywords are in the event or connected component.
    total_tweets is an integer indicating how many tweets belong to this event.
    keywords is a string containing total_keywords keywords. There is a semicolon between two keywords.
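    The line format described above could be parsed along these lines. This is a minimal sketch rather than the authors' own tooling: the Event class, the parse_event_line helper, and the sample line are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List


@dataclass
class Event:
    """One row of events.csv (hypothetical container for illustration)."""
    event_id: int
    day: date
    total_keywords: int
    total_tweets: int
    keywords: List[str]


def parse_event_line(line):
    """Parse 'event_ID,date,total_keywords,total_tweets,keywords'."""
    # Split on the first four commas only, so the keywords field stays whole.
    event_id, day, n_keywords, n_tweets, keywords = line.rstrip("\n").split(",", 4)
    return Event(
        event_id=int(event_id),
        day=datetime.strptime(day, "%Y-%m-%d").date(),
        total_keywords=int(n_keywords),
        total_tweets=int(n_tweets),
        keywords=keywords.split(";"),  # keywords are separated by semicolons
    )


# Invented sample line, not taken from the dataset.
event = parse_event_line("1,2013-08-01,3,120,earthquake;chile;tsunami")
```

    Comparing len(event.keywords) with event.total_keywords gives a cheap consistency check on each parsed row.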

    The files cluster_labels.txt and time_resolutions.txt contain the cluster labels for each event and the time resolutions learned from all events, respectively.

    cluster_labels.txt contains one integer number per line, from 0 to 19. The cluster label in line i corresponds to the event with ID i.

    time_resolutions.txt contains one floating point number per line, indicating the time resolution learned for all events, in minutes. There are 20 numbers in the file, one per line, in increasing order, with at most 13 decimal digits after the point.
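    The line-to-event correspondence described above can be sketched as follows. The helper names are hypothetical, the sample lines are invented, and the sketch assumes that line numbering starts at 1 so that it matches the stated event_ID range of 1 to 5234.

```python
def read_cluster_labels(lines):
    """Map event ID -> cluster label, where line i holds the label of event i.

    Assumption: lines are numbered from 1, matching event_ID values 1..5234.
    """
    labels = {}
    for event_id, line in enumerate(lines, start=1):
        label = int(line.strip())
        if not 0 <= label <= 19:
            raise ValueError(f"label {label} on line {event_id} is outside 0..19")
        labels[event_id] = label
    return labels


def read_time_resolutions(lines):
    """Read one float per line (minutes) and check the increasing order."""
    values = [float(line) for line in lines if line.strip()]
    if any(a >= b for a, b in zip(values, values[1:])):
        raise ValueError("time resolutions are expected in increasing order")
    return values


# Invented sample content standing in for the two text files.
labels = read_cluster_labels(["4\n", "0\n", "19\n"])
resolutions = read_time_resolutions(["0.5\n", "2.25\n", "10.0\n"])
```

    With real files, the same functions could be given an open file handle, since iterating over a text file yields its lines.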

  19. Predicting National Suicide Numbers with Social Media Data

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Hong-Hee Won; Woojae Myung; Gil-Young Song; Won-Hee Lee; Jong-Won Kim; Bernard J. Carroll; Doh Kwan Kim (2023). Predicting National Suicide Numbers with Social Media Data [Dataset]. http://doi.org/10.1371/journal.pone.0061809
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Hong-Hee Won; Woojae Myung; Gil-Young Song; Won-Hee Lee; Jong-Won Kim; Bernard J. Carroll; Doh Kwan Kim
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Suicide is not only an individual phenomenon, but it is also influenced by social and environmental factors. With the high suicide rate and the abundance of social media data in South Korea, we have studied the potential of this new medium for predicting completed suicide at the population level. We tested two social media variables (suicide-related and dysphoria-related weblog entries) along with classical social, economic and meteorological variables as predictors of suicide over 3 years (2008 through 2010). Both social media variables were powerfully associated with suicide frequency. The suicide variable displayed high variability and was reactive to celebrity suicide events, while the dysphoria variable showed longer secular trends, with lower variability. We interpret these as reflections of social affect and social mood, respectively. In the final multivariate model, the two social media variables, especially the dysphoria variable, displaced two classical economic predictors – consumer price index and unemployment rate. The prediction model developed with the 2-year training data set (2008 through 2009) was validated in the data for 2010 and was robust in a sensitivity analysis controlling for celebrity suicide effects. These results indicate that social media data may be of value in national suicide forecasting and prevention.

  20. My Digital Footprint

    • kaggle.com
    zip
    Updated Jun 29, 2023
    Cite
    Girish (2023). My Digital Footprint [Dataset]. https://www.kaggle.com/datasets/girish17019/my-digital-footprint
    Explore at:
    Available download formats: zip (874430159 bytes)
    Dataset updated
    Jun 29, 2023
    Authors
    Girish
    Description

    Dataset Info:

    MyDigitalFootprint (MDF) is a novel large-scale dataset composed of smartphone embedded sensors data, physical proximity information, and Online Social Networks interactions aimed at supporting multimodal context-recognition and social relationships modelling in mobile environments. The dataset includes two months of measurements and information collected from the personal mobile devices of 31 volunteer users by following the in-the-wild data collection approach: the data has been collected in the users' natural environment, without limiting their usual behaviour. Existing public datasets generally consist of a limited set of context data, aimed at optimising specific application domains (human activity recognition is the most common example). On the contrary, the dataset contains a comprehensive set of information describing the user context in the mobile environment.

    The complete analysis of the data contained in MDF has been presented in the following publication:

    https://www.sciencedirect.com/science/article/abs/pii/S1574119220301383?via%3Dihub

    The full anonymised dataset is contained in the folder MDF. Moreover, in order to demonstrate the efficacy of MDF, there are three proof of concept context-aware applications based on different machine learning tasks:

    1. A social link prediction algorithm based on physical proximity data,
    2. The recognition of daily-life activities based on smartphone-embedded sensors data,
    3. A pervasive context-aware recommender system.

    For the sake of reproducibility, the data used to evaluate the proof-of-concept applications are contained in the folders link-prediction, context-recognition, and cars, respectively.
