100+ datasets found
  1. Social Media Influencers in 2022

    • kaggle.com
    Updated Dec 27, 2022
    Cite
    Ram Jas (2022). Social Media Influencers in 2022 [Dataset]. https://www.kaggle.com/datasets/ramjasmaurya/top-1000-social-media-channels/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 27, 2022
    Dataset provided by
    Kaggle
    Authors
    Ram Jas
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Important: this data has a 3-month gap between snapshots, starting from March 2022 and ending in December 2022.

    Influencers are categorized by the number of followers they have on social media. They range from celebrities with large followings to niche content creators with a loyal following on social-media platforms such as YouTube, Instagram, Facebook, and Twitter. Their followers range in number from hundreds of millions to 1,000. Influencers may be categorized in tiers (mega-, macro-, micro-, and nano-influencers), based on their number of followers.

    Businesses pursue people who aim to lessen their consumption of advertisements, and are willing to pay their influencers more. Targeting influencers is seen as increasing marketing's reach, counteracting a growing tendency by prospective customers to ignore marketing.

    Marketing researchers Kapitan and Silvera find that influencer selection extends into product personality. This product and benefit matching is key. A shampoo brand, for example, should use an influencer with good hair. Likewise, a flashy product may use bold colors to convey its brand. If an influencer is not flashy, they will clash with the brand. Matching an influencer with the product's purpose and mood is important.


  2. Social Media Usage Survey

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    SIDDHI PRIYA (2025). Social Media Usage Survey [Dataset]. https://www.kaggle.com/datasets/siddhipriya/social-media-usage-survey/suggestions
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SIDDHI PRIYA
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset captures insights from a survey on social media usage across diverse age groups and genders. It includes data on the most used platforms, daily screen time, reasons for usage, preferred content types, and how social media influences buying decisions. Additionally, it reflects users' concerns about privacy and their willingness to reduce usage. The dataset is useful for analyzing digital behavior, content preferences, and the social impact of online platforms. It can support research in marketing, psychology, and digital well-being, offering a snapshot of how people interact with and perceive social media in their daily lives.

  3. Social Media Sentiments Analysis Dataset 📊

    • opendatabay.com
    .csv
    Updated Jun 7, 2025
    Cite
    Datasimple (2025). Social Media Sentiments Analysis Dataset 📊 [Dataset]. https://www.opendatabay.com/data/dataset/840edf8a-202c-42ce-815a-45c7cbc1c364
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jun 7, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Social Media and Networking
    Description

    The Social Media Sentiments Analysis Dataset captures a vibrant tapestry of emotions, trends, and interactions across various social media platforms. This dataset provides a snapshot of user-generated content, encompassing text, timestamps, hashtags, countries, likes, and retweets. Each entry unveils unique stories—moments of surprise, excitement, admiration, thrill, contentment, and more—shared by individuals worldwide.

    Key Features

    Feature     Description
    Text        User-generated content showcasing sentiments
    Sentiment   Categorized emotions
    Timestamp   Date and time information
    User        Unique identifiers of users contributing
    Platform    Social media platform where the content originated
    Hashtags    Identifies trending topics and themes
    Likes       Quantifies user engagement (likes)
    Retweets    Reflects content popularity (retweets)
    Country     Geographical origin of each post
    Year        Year of the post
    Month       Month of the post
    Day         Day of the post
    Hour        Hour of the post

    How to Use The Social Media Sentiments Analysis Dataset 📊

    The Social Media Sentiments Analysis Dataset is a rich source of information that can be leveraged for various analytical purposes. Below are key ways to make the most of this dataset:

    Sentiment Analysis:

    Explore the emotional landscape by conducting sentiment analysis on the "Text" column. Classify user-generated content into categories such as surprise, excitement, admiration, thrill, contentment, and more.

    Temporal Analysis:

    Investigate trends over time using the "Timestamp" column. Identify patterns, fluctuations, or recurring themes in social media content.

    User Behavior Insights:

    Analyze user engagement through the "Likes" and "Retweets" columns. Discover popular content and user preferences.

    Platform-Specific Analysis:

    Examine variations in content across different social media platforms using the "Platform" column. Understand how sentiments vary across platforms.

    Hashtag Trends:

    Identify trending topics and themes by analyzing the "Hashtags" column. Uncover popular or recurring hashtags.

    Geographical Analysis:

    Explore content distribution based on the "Country" column. Understand regional variations in sentiment and topic preferences.

    User Identification:

    Use the "User" column to track specific users and their contributions. Analyze the impact of influential users on sentiment trends.

    Cross-Analysis:

    Combine multiple features for in-depth insights. For example, analyze sentiment trends over time or across different platforms and countries.
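
    As a rough illustration of the cross-analysis idea, the sketch below averages "Likes" per combination of "Platform" and "Sentiment" using only the Python standard library. The column names come from the Key Features list; the sample rows and the helper name mean_likes_by are made up for this example and are not part of the dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the column names come from the Key Features list.
SAMPLE_CSV = """Text,Sentiment,Platform,Country,Likes,Retweets
Great day at the park,Joy,Instagram,USA,30,15
Traffic was awful,Anger,Twitter,Canada,10,5
New album is out,Excitement,Twitter,UK,40,20
Loved the sunset,Joy,Twitter,USA,20,10
"""

def mean_likes_by(rows, *keys):
    """Average the Likes column for each combination of the given key columns."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of likes, row count]
    for row in rows:
        group = tuple(row[k].strip() for k in keys)
        totals[group][0] += float(row["Likes"])
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_platform_sentiment = mean_likes_by(rows, "Platform", "Sentiment")
for group, avg in sorted(by_platform_sentiment.items()):
    print(group, round(avg, 1))
```

    Passing other key columns (for example "Country" and "Sentiment") gives the regional variant of the same analysis; with the real file, the in-memory string would be replaced by an opened CSV file.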

    Original Data Source: Social Media Sentiments Analysis Dataset 📊

  4. Abbreviated FOMO and social media dataset

    • figshare.mq.edu.au
    • researchdata.edu.au
    txt
    Updated May 30, 2023
    Cite
    Danielle Einstein; Carol Dabb; Madeleine Ferrari; Anne McMaugh; Peter McEvoy; Ron Rapee; Eyal Karin; Maree J. Abbott (2023). Abbreviated FOMO and social media dataset [Dataset]. http://doi.org/10.25949/20188298.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Macquarie University
    Authors
    Danielle Einstein; Carol Dabb; Madeleine Ferrari; Anne McMaugh; Peter McEvoy; Ron Rapee; Eyal Karin; Maree J. Abbott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This database comprises 951 participants who provided self-report data online in their school classrooms. The data were collected in 2016 and 2017. The dataset comprises 509 males (54%) and 442 females (46%). Their ages ranged from 12 to 16 years (M = 13.69, SD = 0.72). Seven participants did not report their age. The majority were born in Australia (N = 849, 89%). The next most common countries of birth were China (N = 24, 2.5%), the UK (N = 23, 2.4%), and the USA (N = 9, 0.9%). Data were drawn from students at five Australian independent secondary schools.

    The data contain item responses for the Spence Children’s Anxiety Scale (SCAS; Spence, 1998), which comprises 44 items. The social media question asked about frequency of use: “How often do you use social media?” The response options ranged from constantly to once a week or less. Fear of Missing Out was measured with five items based on the APS Stress and Wellbeing in Australia Survey (APS, 2015): “When I have a good time it is important for me to share the details online”; “I am afraid that I will miss out on something if I don’t stay connected to my online social networks”; “I feel worried and uncomfortable when I can’t access my social media accounts”; “I find it difficult to relax or sleep after spending time on social networking sites”; “I feel my brain burnout with the constant connectivity of social media”. Internal consistency for this measure was α = .81. Self-compassion was measured using the 12-item short form of the Self-Compassion Scale (SCS-SF; Raes et al., 2011). The data set can be downloaded as an Excel file (composed of two worksheet tabs) or as CSV files: 1) Data and 2) Variable labels.

    References:

    Australian Psychological Society. (2015). Stress and wellbeing in Australia survey. https://www.headsup.org.au/docs/default-source/default-document-library/stress-and-wellbeing-in-australia-report.pdf?sfvrsn=7f08274d_4
    Raes, F., Pommier, E., Neff, K. D., & Van Gucht, D. (2011). Construction and factorial validation of a short form of the self-compassion scale. Clinical Psychology and Psychotherapy, 18(3), 250-255. https://doi.org/10.1002/cpp.702
    Spence, S. H. (1998). A measure of anxiety symptoms among children. Behaviour Research and Therapy, 36(5), 545-566. https://doi.org/10.1016/S0005-7967(98)00034-5

  5. Data from: The State of Social Media in Canada 2022

    • dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Mai, Philip; Gruzd, Anatoliy (2023). The State of Social Media in Canada 2022 [Dataset]. http://doi.org/10.5683/SP3/BDFE7S
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Mai, Philip; Gruzd, Anatoliy
    Area covered
    Canada
    Description

    The report provides a snapshot of the social media usage trends amongst online Canadian adults based on an online survey of 1500 participants. Canada continues to be one of the most connected countries in the world. An overwhelming majority of online Canadian adults (94%) have an account on at least one social media platform. However, the 2022 survey results show that the COVID-19 pandemic has ushered in some changes in how and where Canadians are spending their time on social media. Dominant platforms such as Facebook, messaging apps and YouTube are still on top but are losing ground to newer platforms such as TikTok and more niche platforms such as Reddit and Twitch.

  6. News Title Sentiment Dataset

    • zenodo.org
    bin
    Updated Mar 24, 2021
    Cite
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb (2021). News Title Sentiment Dataset [Dataset]. http://doi.org/10.5281/zenodo.3902726
    Explore at:
    Available download formats: bin
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Chang Wei Tan; Christoph Bergmeir; Francois Petitjean; Geoffrey I Webb
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the Monash, UEA & UCR time series regression repository. http://tseregression.org/

    The goal of this dataset is to predict the sentiment score of a news title. It contains 83164 time series obtained from the News Popularity in Multiple Social Media Platforms dataset in the UCI repository, a large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn. The collected data cover a period of 8 months, between November 2015 and July 2016, and account for about 100,000 news items on four different topics: economy, microsoft, obama and palestine. The data set is tailored for evaluative comparisons in predictive analytics tasks, while also allowing for tasks in other research areas such as topic detection and tracking, sentiment analysis in short text, first story detection or news recommendation. Each time series has 3 dimensions.

    Please refer to https://archive.ics.uci.edu/ml/datasets/News+Popularity+in+Multiple+Social+Media+Platforms for more details

    Citation request
    Nuno Moniz and Luis Torgo (2018), Multi-Source Social Feedback of Online News Feeds, CoRR

  7. Dataset Political Personalism in Social Media

    • figshare.com
    pdf
    Updated Aug 27, 2024
    Cite
    shahaf zamir (2024). Dataset Political Personalism in Social Media [Dataset]. http://doi.org/10.6084/m9.figshare.14073692.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Aug 27, 2024
    Dataset provided by
    figshare
    Authors
    shahaf zamir
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset covers aspects of online politics in 25 democracies: 15 relatively old established European democracies (Austria, Belgium, Denmark, Finland, France, Germany, Iceland, Ireland, Italy, Luxembourg, Netherlands, Norway, Sweden, Switzerland, United Kingdom); five non-European veteran democracies (Australia, Canada, Israel, Japan, New Zealand); two early (Portugal, Spain) and three late (Czech Republic, Hungary, Poland) third-wave (young) European democracies. The research population includes, in each country, parties that won 4% or more of the votes in two consecutive elections before April 2019 (a total of 141 parties and 145 leaders). The dataset includes external party-level information such as performance in the last national elections, governmental status, party age, populism affiliation and leadership selection method. It also includes information related to the party leaders, such as their term in leadership office and other formal positions. In addition, it includes information about online activity, mainly the consumption (user-related activities) of the parties and their leaders on Facebook and Twitter, two of the most used social media platforms for political purposes.

  8. A dataset of Covid-related misinformation videos and their spread on social...

    • explore.openaire.eu
    • data.niaid.nih.gov
    Updated Feb 23, 2021
    Cite
    Aleksi Knuutila (2021). A dataset of Covid-related misinformation videos and their spread on social media [Dataset]. http://doi.org/10.5281/zenodo.4557827
    Dataset updated
    Feb 23, 2021
    Authors
    Aleksi Knuutila
    Description

    This dataset contains metadata about all Covid-related YouTube videos which circulated on public social media, but which YouTube eventually removed because they contained false information. It describes 8,122 videos that were shared between November 2019 and June 2020. The dataset contains unique identifiers for the videos and social media accounts that shared the videos, statistics on social media engagement and metadata such as video titles and view counts where they were recoverable. We publish the data alongside the code used to produce it on GitHub. The dataset has reuse potential for research studying narratives related to the coronavirus, the impact of social media on knowledge about health and the politics of social media platforms.

  9. What Are The Most Used Social Media Platforms?

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). What Are The Most Used Social Media Platforms? [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Facebook and YouTube are still the most used social media platforms today.

  10. Data from: Mpox Narrative on Instagram: A Labeled Multilingual Dataset of...

    • figshare.com
    xlsx
    Updated Oct 12, 2024
    Cite
    Nirmalya Thakur (2024). Mpox Narrative on Instagram: A Labeled Multilingual Dataset of Instagram Posts on Mpox for Sentiment, Hate Speech, and Anxiety Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.27072247.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Oct 12, 2024
    Dataset provided by
    figshare
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite this paper when using this dataset: N. Thakur, “Mpox narrative on Instagram: A labeled multilingual dataset of Instagram posts on mpox for sentiment, hate speech, and anxiety analysis,” arXiv [cs.LG], 2024, URL: https://arxiv.org/abs/2409.05292

    Abstract: The world is currently experiencing an outbreak of mpox, which has been declared a Public Health Emergency of International Concern by WHO. During recent virus outbreaks, social media platforms have played a crucial role in keeping the global population informed and updated regarding various aspects of the outbreaks. As a result, in the last few years, researchers from different disciplines have focused on the development of social media datasets focusing on different virus outbreaks. No prior work in this field has focused on the development of a dataset of Instagram posts about the mpox outbreak. The work presented in this paper (stated above) aims to address this research gap. It presents this multilingual dataset of 60,127 Instagram posts about mpox, published between July 23, 2022, and September 5, 2024. This dataset contains Instagram posts about mpox in 52 languages. For each of these posts, the Post ID, Post Description, Date of publication, language, and translated version of the post (translation to English was performed using the Google Translate API) are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis, hate speech detection, and anxiety or stress detection were also performed. This process included classifying each post into:

    1. one of the fine-grain sentiment classes, i.e., fear, surprise, joy, sadness, anger, disgust, or neutral
    2. hate or not hate
    3. anxiety/stress detected or no anxiety/stress detected

    These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for sentiment, hate speech, and anxiety or stress detection, as well as for other applications.

    The 52 distinct languages in which Instagram posts are present in the dataset are English, Portuguese, Indonesian, Spanish, Korean, French, Hindi, Finnish, Turkish, Italian, German, Tamil, Urdu, Thai, Arabic, Persian, Tagalog, Dutch, Catalan, Bengali, Marathi, Malayalam, Swahili, Afrikaans, Panjabi, Gujarati, Somali, Lithuanian, Norwegian, Estonian, Swedish, Telugu, Russian, Danish, Slovak, Japanese, Kannada, Polish, Vietnamese, Hebrew, Romanian, Nepali, Czech, Modern Greek, Albanian, Croatian, Slovenian, Bulgarian, Ukrainian, Welsh, Hungarian, and Latvian.

    The following is a description of the attributes present in this dataset:

    Post ID: Unique ID of each Instagram post
    Post Description: Complete description of each post in the language in which it was originally published
    Date: Date of publication in MM/DD/YYYY format
    Language: Language of the post as detected using the Google Translate API
    Translated Post Description: Translated version of the post description. All posts which were not in English were translated into English using the Google Translate API. No language translation was performed for English posts.
    Sentiment: Results of sentiment analysis (using the preprocessed version of the translated Post Description) where each post was classified into one of the sentiment classes: fear, surprise, joy, sadness, anger, disgust, and neutral
    Hate: Results of hate speech detection (using the preprocessed version of the translated Post Description) where each post was classified as hate or not hate
    Anxiety or Stress: Results of anxiety or stress detection (using the preprocessed version of the translated Post Description) where each post was classified as stress/anxiety detected or no stress/anxiety detected.

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).
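
    A minimal sketch of how the Date attribute (MM/DD/YYYY) and the Sentiment attribute could be combined to count posts per year and sentiment class. The records below are made up, and the dictionary keys mirror the attribute names in the description; the exact header strings in the published xlsx file are an assumption.

```python
from collections import Counter
from datetime import datetime

# Made-up records; the keys mirror the attribute names in the description,
# but the exact header strings in the published file are an assumption.
posts = [
    {"Post ID": "a1", "Date": "07/23/2022", "Language": "English", "Sentiment": "fear"},
    {"Post ID": "b2", "Date": "08/15/2024", "Language": "Spanish", "Sentiment": "neutral"},
    {"Post ID": "c3", "Date": "09/05/2024", "Language": "English", "Sentiment": "fear"},
]

def parse_date(text):
    """Parse the MM/DD/YYYY format described for the Date attribute."""
    return datetime.strptime(text, "%m/%d/%Y").date()

def sentiment_counts_by_year(records):
    """Count posts per (year, sentiment) pair."""
    return Counter((parse_date(r["Date"]).year, r["Sentiment"]) for r in records)

counts = sentiment_counts_by_year(posts)
print(counts)
```

    Parsing with an explicit format string avoids the day/month ambiguity that automatic date detection can introduce for values such as 09/05/2024.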

  11. 🌟 Emoji Trends Dataset

    • kaggle.com
    Updated Jul 31, 2024
    Cite
    Waqar Ali (2024). 🌟 Emoji Trends Dataset [Dataset]. https://www.kaggle.com/datasets/waqi786/emoji-trends-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 31, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Waqar Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a detailed analysis of emoji usage across various social media platforms. It captures how different emojis are used in different contexts, reflecting emotions, trends, and user demographics.

    With emojis becoming a universal digital language, this dataset helps researchers, marketers, and data analysts explore how people express emotions online and identify patterns in social media communication.

    📌 Key Features:

    😊 Emoji Details:
    Emoji 🎭: The specific emoji used in a post, comment, or message.
    Context 💬: The meaning or emotion associated with the emoji (e.g., Happy, Love, Funny, Sad).
    Platform 🌐: The social media platform where the emoji was used (e.g., Facebook, Instagram, Twitter).

    👤 User Demographics:
    User Age 🎂: Age of the user who posted the emoji (ranges from 13 to 65 years).
    User Gender 🚻: Gender of the user (Male/Female).

    📈 Additional Insights:
    Emoji Popularity 🔥: Frequency of each emoji’s usage across platforms.
    Trends Over Time 📅: How emoji usage changes based on trends or events.
    Regional Usage Patterns 🌍: How different cultures and regions use emojis differently.

    📊 Use Cases & Applications:
    🔹 Understanding emoji trends across social media
    🔹 Analyzing emotional expression through digital communication
    🔹 Exploring demographic differences in emoji usage
    🔹 Identifying platform-specific emoji preferences
    🔹 Enhancing sentiment analysis models with emoji insights

    ⚠️ Important Note: This dataset is synthetically generated for educational and analytical purposes. It does not contain real user data but is designed to reflect real-world trends in emoji usage.

  12. DeepCube: Post-processing dataset of social media data

    • zenodo.org
    Updated Mar 1, 2024
    Cite
    Aris Bozas; Eleni Kamateri; Giannis Tsampoulatidis (2024). DeepCube: Post-processing dataset of social media data [Dataset]. http://doi.org/10.5281/zenodo.7736979
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Aris Bozas; Eleni Kamateri; Giannis Tsampoulatidis
    Description

    This dataset contains the post-processing of the social media data collected for two different use cases during the first two years of the DeepCube project.

    More specifically, it contains two sub-datasets, including:

    1. The UC2 dataset contains the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. It contains a total of 5,695,253 social media posts collected from the Twitter platform, based on the initial version of the search criteria relevant to UC2 (defined by Universitat De Valencia), focused on the regions of Ethiopia and Somalia, and collected from 26 June 2021 until March 2023.
    2. The UC5 dataset contains the post-processing of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to sustainable and environmentally friendly tourism. It contains a total of 58,143 social media posts collected from the Twitter and Instagram platforms (12,881 from Twitter and 45,262 from Instagram), based on the initial version of the search criteria relevant to UC5 (defined by MURMURATION SAS), focused on the regions of Brazil, and collected from 26 June 2021 until March 2023.

    Additionally, an annotated dataset was created from Twitter historical data for UC2 for the years 2010-20220.

    1. The UC2 historical annotated dataset contains the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with climate-induced migration in Africa. It contains a total of 1721 annotated social media posts (412 relevant and 1309 irrelevant) collected from the Twitter platform, focused on the region of Somalia.

    INFALIA, being a spin-off of the CERTH institute (link) and a partner of a research EU project, releases this dataset containing an unlimited number of Tweet IDs for the sole purpose of enabling the validation of the research conducted within DeepCube. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (link - https://developer.twitter.com/en/developer-terms) before receiving this download.

  13. Social Media Usage By Country

    • searchlogistics.com
    Updated Apr 1, 2025
    Cite
    (2025). Social Media Usage By Country [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-addiction-statistics/
    Dataset updated
    Apr 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The results might surprise you when looking at internet users that are active on social media in each country.

  14. Data from: TikTok dataset - Current affairs on TikTok. Virality and...

    • data.niaid.nih.gov
    • ekoizpen-zientifikoa.ehu.eus
    • +1more
    Updated Aug 28, 2022
    Cite
    Peña-Fernández, Simón (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7024884
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Larrondo-Ureta, Ainara
    Peña-Fernández, Simón
    Morales-i-Gras, Jordi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TikTok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, built using Gephi network analysis software.
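
    To make the distinction between unique links and weighted links concrete, the sketch below collapses repeated label pairs into unique links whose weight is the number of repetitions. The pairs are made up, and treating the graph as undirected is an assumption; the authors built their graph in Gephi, not with this code.

```python
from collections import Counter

# Made-up co-occurrence pairs between labels; not taken from the dataset.
raw_links = [
    ("news", "fyp"), ("fyp", "news"), ("news", "viral"),
    ("news", "fyp"), ("viral", "humor"),
]

def weighted_undirected(links):
    """Collapse repeated pairs into unique undirected links with a weight."""
    return Counter(tuple(sorted(pair)) for pair in links)

weights = weighted_undirected(raw_links)
nodes = {label for pair in weights for label in pair}
print(len(nodes), "nodes,", len(weights), "unique links,",
      sum(weights.values()), "weighted links")
```

    In the same way, the 318,986 unique links in the description would be the distinct pairs, and the 790,599 figure the sum of their weights.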

    Source of:

    Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

    Abstract:

    Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.

  15. Definitions of the symbols used in this paper.

    • plos.figshare.com
    xls
    Updated Oct 16, 2023
    Cite
    Daiki Suzuki; Sho Tsugawa; Keiichiro Tsukamoto; Shintaro Igari (2023). Definitions of the symbols used in this paper. [Dataset]. http://doi.org/10.1371/journal.pone.0293032.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 16, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Daiki Suzuki; Sho Tsugawa; Keiichiro Tsukamoto; Shintaro Igari
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analyzing the dynamics of information diffusion cascades and accurately predicting their behavior holds significant importance in various applications. In this paper, we concentrate specifically on a recently introduced contrastive cascade graph learning framework, for the task of predicting cascade popularity. This framework follows a pre-training and fine-tuning paradigm to address cascade prediction tasks. In a previous study, the transferability of pre-trained models within the contrastive cascade graph learning framework was examined solely between two social media datasets. However, in our present study, we comprehensively evaluate the transferability of pre-trained models across 13 real datasets and six synthetic datasets. We construct several pre-trained models using real cascades and synthetic cascades generated by the independent cascade model and the Profile model. Then, we fine-tune these pre-trained models on real cascade datasets and evaluate their prediction accuracy based on the mean squared logarithmic error. The main findings derived from our results are as follows. (1) The pre-trained models exhibit transferability across diverse types of real datasets in different domains, encompassing different languages, social media platforms, and diffusion time scales. (2) Synthetic cascade data prove effective for pre-training purposes. The pre-trained models constructed with synthetic cascade data demonstrate comparable effectiveness to those constructed using real data. (3) Synthetic cascade data prove beneficial for fine-tuning the contrastive cascade graph learning models and training other state-of-the-art popularity prediction models. Models trained using a combination of real and synthetic cascades yield significantly lower mean squared logarithmic error compared to those trained solely on real cascades. Our findings affirm the effectiveness of synthetic cascade data in enhancing the accuracy of cascade popularity prediction.
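
    For readers unfamiliar with the evaluation metric, the sketch below computes mean squared logarithmic error using log(1 + x), which is one common convention. The paper's exact logarithm base and offset are not stated in this description, so this should be read as an assumption, and the cascade sizes are made up.

```python
import math

def msle(y_true, y_pred):
    """Mean squared logarithmic error using log(1 + x), one common convention."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)

# Made-up final cascade sizes and predictions.
true_sizes = [10, 200, 3000]
predicted = [12, 150, 3500]
print(msle(true_sizes, predicted))
```

    Because the error is taken on a log scale, a miss of 50 on a cascade of size 200 counts far more than a miss of 500 on a cascade of size 3000, which suits heavy-tailed popularity data.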

  16. Twitter users in the United States 2019-2028

    • statista.com
    • ai-chatbox.pro
    Updated Jun 13, 2024
    Cite
    Statista Research Department (2024). Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Twitter users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 4.3 million users (+5.32 percent). After the ninth consecutive year of growth, the Twitter user base is estimated to reach 85.08 million users, a new peak, in 2028. Notably, the number of Twitter users had been increasing continuously over the preceding years. User figures, shown here for the platform Twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Twitter users in countries like Canada and Mexico.
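    The figures quoted above can be cross-checked with a little arithmetic. A small sketch, assuming the percentage is measured against the 2024 user count (the description does not state the baseline explicitly):

```python
# Figures quoted in the description, in millions of users.
increase_millions = 4.3
users_2028_millions = 85.08

# Assumption: the +5.32 percent is relative to the 2024 value.
implied_2024_millions = users_2028_millions - increase_millions
percent_increase = increase_millions / implied_2024_millions * 100

print(round(implied_2024_millions, 2), round(percent_increase, 2))
```

    The implied 2024 baseline is about 80.78 million users, and the resulting growth rate rounds to the quoted 5.32 percent.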

  17. CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis

    • zenodo.org
    Updated May 11, 2025
    Cite
    Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li (2025). CMFeed: A Benchmark Dataset for Controllable Multimodal Feedback Synthesis [Dataset]. http://doi.org/10.5281/zenodo.11409612
    Explore at:
    Dataset updated
    May 11, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Puneet Kumar; Sarthak Malik; Balasubramanian Raman; Xiaobai Li
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 1, 2024
    Description

    Overview
    The Controllable Multimodal Feedback Synthesis (CMFeed) Dataset is designed to enable the generation of sentiment-controlled feedback from multimodal inputs, including text and images. This dataset can be used to train feedback synthesis models in both uncontrolled and sentiment-controlled manners. Serving a crucial role in advancing research, the CMFeed dataset supports the development of human-like feedback synthesis, a novel task defined by the dataset's authors. Additionally, the corresponding feedback synthesis models and benchmark results are presented in the associated code and research publication.

    Task Uniqueness: The task of controllable multimodal feedback synthesis is unique, distinct from LLMs and tasks like VisDial, and not addressed by multi-modal LLMs. LLMs often exhibit errors and hallucinations, as evidenced by their auto-regressive and black-box nature, which can obscure the influence of different modalities on the generated responses [Ref1; Ref2]. Our approach includes an interpretability mechanism, as detailed in the supplementary material of the corresponding research publication, demonstrating how metadata and multimodal features shape responses and learn sentiments. This controllability and interpretability aim to inspire new methodologies in related fields.

    Data Collection and Annotation
    Data was collected by crawling Facebook posts from major news outlets, adhering to ethical and legal standards. The comments were annotated using four sentiment analysis models: FLAIR, SentimentR, RoBERTa, and DistilBERT. Facebook was chosen for dataset construction for the following reasons:
    • Facebook uniquely provides metadata such as news article link, post shares, post reaction, comment like, comment rank, comment reaction rank, and relevance scores, which are not available on other platforms.
    • Facebook is the most used social media platform, with 3.07 billion monthly users, compared to 550 million Twitter and 500 million Reddit users. [Ref]
    • Facebook is popular across all age groups (18-29, 30-49, 50-64, 65+), with at least 58% usage, compared to 6% for Twitter and 3% for Reddit. [Ref]. Trends are similar for gender, race, ethnicity, income, education, community, and political affiliation [Ref]
    • The male-to-female user ratio on Facebook is 56.3% to 43.7%; on Twitter, it's 66.72% to 23.28%; Reddit does not report this data. [Ref]

    Filtering Process: To ensure high-quality and reliable data, the dataset underwent two levels of filtering:
    a) Model Agreement Filtering: Retained only comments where at least three out of the four models agreed on the sentiment.
    b) Probability Range Safety Margin: Comments with a sentiment probability between 0.49 and 0.51, indicating low confidence in sentiment classification, were excluded.
    After filtering, 4,512 samples were marked as XX. Though these samples have been released for the reader's understanding, they were not used in training the feedback synthesis model proposed in the corresponding research paper.
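    The two filtering levels above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each model is assumed to output a positive-class probability, a model's vote is taken as positive at 0.5 or above, and the safety margin is applied to the mean probability. The function name `filter_comment` is hypothetical.

```python
def filter_comment(positive_probs, min_agreement=3, low=0.49, high=0.51):
    """Return 1 (positive), 0 (negative), or None if the comment is filtered out.

    positive_probs: one positive-class probability per sentiment model (assumed).
    """
    votes = [1 if p >= 0.5 else 0 for p in positive_probs]
    positives = sum(votes)
    negatives = len(votes) - positives

    # a) Model agreement filtering: require at least `min_agreement` matching votes.
    if max(positives, negatives) < min_agreement:
        return None

    # b) Probability safety margin: drop low-confidence comments.
    #    Assumption: the margin is checked against the mean probability.
    mean_prob = sum(positive_probs) / len(positive_probs)
    if low <= mean_prob <= high:
        return None

    return 1 if positives > negatives else 0
```

    For example, `filter_comment([0.9, 0.8, 0.3, 0.2])` returns `None` because the four models split two against two.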

    Dataset Description
    • Total Samples: 61,734
    • Total Samples Annotated: 57,222 after filtering.
    • Total Posts: 3,646
    • Average Likes per Post: 65.1
    • Average Likes per Comment: 10.5
    • Average Length of News Text: 655 words
    • Average Number of Images per Post: 3.7

    Components of the Dataset
    The dataset comprises two main components:
    CMFeed.csv File: Contains metadata, comment, and reaction details related to each post.
    Images Folder: Contains folders with images corresponding to each post.

    Data Format and Fields of the CSV File
    The dataset is structured in the CMFeed.csv file, with the corresponding images in related folders. This CSV file includes the following fields:
    Id: Unique identifier
    Post: The heading of the news article.
    News_text: The text of the news article.
    News_link: URL link to the original news article.
    News_Images: A path to the folder containing images related to the post.
    Post_shares: Number of times the post has been shared.
    Post_reaction: A JSON object capturing reactions (like, love, etc.) to the post and their counts.
    Comment: Text of the user comment.
    Comment_like: Number of likes on the comment.
    Comment_reaction_rank: A JSON object detailing the type and count of reactions the comment received.
    Comment_link: URL link to the original comment on Facebook.
    Comment_rank: Rank of the comment based on engagement and relevance.
    Score: Sentiment score computed based on the consensus of sentiment analysis models.
    Agreement: Indicates the consensus level among the sentiment models, ranging from -4 (all negative) to 4 (all positive). For example, three negative votes and one positive vote result in -2, and three positive votes and one negative vote result in +2.
    Sentiment_class: Categorizes the sentiment of the comment into 1 (positive) or 0 (negative).
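    The field descriptions above suggest how a row might be read and how the Agreement value relates to the individual model votes. A minimal sketch using only the standard library; the sample row, the reaction keys ("like", "love"), and the exact serialization of the JSON fields are assumptions for illustration, not taken from CMFeed.csv:

```python
import csv
import io
import json


def agreement_score(model_labels):
    """Sum of votes: +1 per positive label (1), -1 per negative label (0)."""
    return sum(1 if label == 1 else -1 for label in model_labels)


def parse_row(row):
    """Convert selected string fields of a CSV row into typed values."""
    parsed = dict(row)
    parsed["Post_reaction"] = json.loads(row["Post_reaction"])
    parsed["Comment_like"] = int(row["Comment_like"])
    parsed["Sentiment_class"] = int(row["Sentiment_class"])
    return parsed


# Build a tiny in-memory CSV with one hypothetical row.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["Id", "Post_reaction", "Comment_like", "Sentiment_class"]
)
writer.writeheader()
writer.writerow({
    "Id": "1",
    "Post_reaction": json.dumps({"like": 12, "love": 3}),
    "Comment_like": "4",
    "Sentiment_class": "1",
})
buffer.seek(0)

rows = [parse_row(row) for row in csv.DictReader(buffer)]
print(rows[0]["Post_reaction"]["like"], agreement_score([0, 0, 0, 1]))
```

    With four models, the score is always even, which matches the examples of -2 and +2 given for three-to-one splits.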

    More Considerations During Dataset Construction
    We thoroughly considered issues such as the choice of social media platform for data collection, bias and generalizability of the data, selection of news handles/websites, ethical protocols, privacy and potential misuse before beginning data collection. While achieving completely unbiased and fair data is unattainable, we endeavored to minimize biases and ensure as much generalizability as possible. Building on these considerations, we made the following decisions about data sources and handling to ensure the integrity and utility of the dataset:

    • Why not merge data from different social media platforms?
    We chose not to merge data from platforms such as Reddit and Twitter with Facebook data because those platforms lack comprehensive metadata, clear ethical guidelines, and control mechanisms (such as who can comment and whether users' anonymity is maintained). These factors are critical for our analysis. Focusing on Facebook alone was crucial to ensure consistency in data quality and format.

    • Choice of four news handles: We selected four news handles (BBC News, Sky News, Fox News, and NY Daily News) to ensure diversity and comprehensive regional coverage. These news outlets were chosen for their distinct regional focuses and editorial perspectives: BBC News is known for its global coverage with a centrist view, Sky News offers geographically targeted and politically varied content leaning center/right in the UK/EU/US, Fox News is recognized for its right-leaning content in the US, and NY Daily News provides left-leaning coverage in New York. Many other news handles such as NDTV, The Hindu, Xinhua, and SCMP are also large-scale but may contain content in regional languages (for example, Indian languages and Chinese), so they were not selected. This selection ensures a broad spectrum of political discourse and audience engagement.

    • Dataset Generalizability and Bias: With 3.07 billion of the total 5 billion social media users, the extensive user base of Facebook, reflective of broader social media engagement patterns, ensures that the insights gained are applicable across various platforms, reducing bias and strengthening the generalizability of our findings. Additionally, the geographic and political diversity of these news sources, ranging from local (NY Daily News) to international (BBC News), and spanning political spectra from left (NY Daily News) to right (Fox News), ensures a balanced representation of global and political viewpoints in our dataset. This approach not only mitigates regional and ideological biases but also enriches the dataset with a wide array of perspectives, further solidifying the robustness and applicability of our research.

    • Dataset size and diversity: Facebook prohibits the automatic scraping of its users' personal data. In compliance with this policy, we manually scraped publicly available data. This labor-intensive process, requiring around 800 hours of manual effort, limited our data volume but allowed for precise selection. We followed ethical protocols for scraping Facebook data, selecting 1000 posts from each of the four news handles to enhance diversity and reduce bias. Initially, 4000 posts were collected; after preprocessing (detailed in Section 3.1), 3646 posts remained. We then processed all associated comments, resulting in a total of 61734 comments. This manual method ensures adherence to Facebook’s policies and the integrity of our dataset.

    Ethical considerations, data privacy and misuse prevention
    The data collection adheres to Facebook’s ethical guidelines [https://developers.facebook.com/terms/]

  18. Analysis of social media and organizational learning

    • researchdata.up.ac.za
    pdf
    Updated Feb 4, 2023
    Cite
    Harry Moongela; Marie Hattingh (2023). Analysis of social media and organizational learning [Dataset]. http://doi.org/10.25403/UPresearchdata.21952859.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Feb 4, 2023
    Dataset provided by
    University of Pretoria
    Authors
    Harry Moongela; Marie Hattingh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These datasets consist of qualitative data collected through semi-structured in-depth interviews as well as a focus group from three different companies with seven industry experts. The data were collected to address the use of social media to enhance organisational learning (OL), the gap in the integration of OL and social media, and the lack of guidelines for organisations that would like to use social media to facilitate OL. The data were triangulated by comparing the results from the three companies.

  19. Bluesky Social Dataset

    • zenodo.org
    Updated Dec 2, 2024
    Cite
    Andrea Failla; Giulio Rossetti (2024). Bluesky Social Dataset [Dataset]. http://doi.org/10.5281/zenodo.11082879
    Explore at:
    Dataset updated
    Dec 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Failla; Giulio Rossetti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bluesky Social Dataset

    1st Dec 2024. This version of the dataset has been superseded and is now restricted. Please refer to the most recent release.

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.

    The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

    Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.

    Dataset

    Here is a description of the dataset files.

    • followers.csv.gz. This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers u, v, representing a directed following relation (i.e., user u follows user v).
    • posts.tar.gz. This compressed folder contains data on the individual posts collected. Decompressing this file results in 100 files, each containing the full posts of up to 50,000 users. Each post is stored as a JSON-formatted line.
    • interactions.csv.gz. This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers, and represents a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
    • graphs.tar.gz. This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
    • feed_posts.tar.gz. This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Each post is stored as a JSON-formatted line. Fields correspond to those in posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score) and reposts (repost_from, reposted_author).
    • feed_bookmarks.csv. This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values, namely the feed name, the user id, and the timestamp.
    • feed_post_likes.tar.gz. This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the "liker", the id of the post's author, the id of the liked post, and the like timestamp.
    • scripts.tar.gz. A collection of Python scripts, including the ones originally used to crawl the data, and to perform experiments. These scripts are detailed in a document released within the folder.
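
    The file descriptions above translate directly into a few lines of parsing code. A minimal sketch for the follower edge list, assuming followers.csv.gz has no header row and that each row is exactly `u,v` as described; the function name is hypothetical, and the sample bytes stand in for the real file:

```python
import csv
import gzip
import io
from collections import Counter


def follower_counts(gz_bytes):
    """Count followers per user from a gzip-compressed `u,v` edge list.

    Each row `u,v` means user u follows user v, so v gains one follower.
    Assumption: no header row.
    """
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        for follower, followed in csv.reader(handle):
            counts[int(followed)] += 1
    return counts


# Tiny in-memory stand-in for followers.csv.gz.
sample = gzip.compress(b"1,2\n3,2\n2,1\n")
counts = follower_counts(sample)
print(counts.most_common(1))
```

    For the real file, passing a path to `gzip.open` instead of an in-memory buffer streams the rows without decompressing the whole edge list first.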

    Citation

    If used for research purposes, please cite the following paper describing the dataset details:

    Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year Worth of Social Data". PLOS ONE (2024). https://doi.org/10.1371/journal.pone.0310330

    Right to Erasure (Right to be forgotten)

    Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.

    Users included in the Bluesky dataset have the right to opt out and request the removal of their data, in accordance with GDPR provisions (Article 17). It should be noted, however, that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides derogations (Article 17(3)(d) and Article 89).

    We emphasize that, in compliance with GDPR (Article 4(5)), the released data has been thoroughly pseudonymized. Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to further protect individual privacy.

    If you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with subject "Removal request: [username]").
    We will process your request within a reasonable timeframe.

    Acknowledgments:

    This work is supported by:

    • the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”,
      Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu);
    • SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021;
    • EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).

  20. Twitter dataset

    • figshare.com
    txt
    Updated Dec 20, 2024
    Cite
    mehdi khalil (2024). Twitter dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28069163.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Dec 20, 2024
    Dataset provided by
    figshare
    Authors
    mehdi khalil
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Truth Seeker Dataset is designed to support research in the detection and classification of misinformation on social media platforms, particularly focusing on Twitter. This dataset is part of a broader initiative to enhance the understanding of how machine learning (ML) and natural language processing (NLP) can be leveraged to identify fake news and misleading content in real time.

    Dataset Composition
    The Truth Seeker Dataset comprises a substantial collection of social media posts that have been meticulously labeled as either real or fake. It was constructed using advanced ML algorithms and NLP techniques to analyze the language patterns in social media communications. The dataset includes:
    • Raw Social Media Posts: A diverse range of tweets that reflect various topics and sentiments.
    • Labeling: Each post is annotated with binary labels indicating its authenticity (real or fake).
    • Feature Sets: Two distinct subsets of the dataset have been created using different NLP vectorization methods, Word2Vec and TF-IDF. This allows researchers to explore how different feature representations impact model performance.

    Research Applications
    The primary aim of the Truth Seeker Dataset is to facilitate the development and validation of models that can accurately classify social media content. Key applications include:
    • Fake News Detection: Utilizing various ML algorithms, including Random Forest and AdaBoost, which have demonstrated high F1 scores in preliminary evaluations.
    • Model Comparison: Researchers can compare the effectiveness of different ML approaches on the same dataset, enabling a clearer understanding of which methods yield the best results in detecting misinformation.
    • Algorithm Development: The dataset serves as a benchmark for developing new algorithms aimed at improving accuracy in fake news detection.
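
    To make the TF-IDF feature representation mentioned above concrete, here is a minimal standard-library sketch. It uses the plain, unsmoothed formula (term frequency times log of inverse document frequency) with whitespace tokenization; both choices are assumptions for illustration and may differ from the pipeline used to build the dataset's feature sets. The example posts are invented.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Invented example posts, not taken from the dataset.
posts = ["breaking news today", "fake news alert"]
vectors = tfidf(posts)
print(vectors[0])
```

    A term that appears in every post, such as "news" here, receives a weight of zero, which is why TF-IDF emphasizes the words that distinguish one post from another.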

Ram Jas (2022). Social Media Influencers in 2022 [Dataset]. https://www.kaggle.com/datasets/ramjasmaurya/top-1000-social-media-channels/code

Social Media Influencers in 2022

The top 1,000 social media influencers on each of Instagram, YouTube, and TikTok in 2022

