81 datasets found
  1. Twitter users in the United States 2019-2028

    • statista.com
    Updated Jun 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Twitter users in the United States was forecast to continuously increase between 2024 and 2028 by in total 4.3 million users (+5.32 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 85.08 million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of Twitter users in countries like Canada and Mexico.

  2. Social media users in the United States 2020-2029

    • statista.com
    Updated Dec 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Social media users in the United States 2020-2029 [Dataset]. https://www.statista.com/statistics/278409/number-of-social-network-users-in-the-united-states/
    Explore at:
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    United States
    Description

    The number of social media users in the United States was forecast to continuously increase between 2024 and 2029 by in total 26 million users (+8.55 percent). After the ninth consecutive increasing year, the social media user base is estimated to reach 330.07 million users and therefore a new peak in 2029. Notably, the number of social media users of was continuously increasing over the past years.The shown figures regarding social media users have been derived from survey data that has been processed to estimate missing demographics.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  3. P

    Sentiment Analysis for Social Media Monitoring Dataset

    • paperswithcode.com
    Updated Mar 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Sentiment Analysis for Social Media Monitoring Dataset [Dataset]. https://paperswithcode.com/dataset/sentiment-analysis-for-social-media
    Explore at:
    Dataset updated
    Mar 6, 2025
    Description

    Problem Statement

    👉 Download the case studies here

    A global consumer goods company struggled to understand customer sentiment across various social media platforms. With millions of posts, reviews, and comments generated daily, manually tracking and analyzing public opinion was inefficient. The company needed an automated solution to monitor brand perception, address negative feedback promptly, and leverage insights for marketing strategies.

    Challenge

    Analyzing social media sentiment posed the following challenges:

    Processing vast amounts of unstructured text data from multiple platforms like Twitter, Facebook, and Instagram.

    Accurately interpreting slang, emojis, and nuanced language used by social media users.

    Identifying trends and actionable insights in real-time to respond to potential crises or opportunities effectively.

    Solution Provided

    An advanced sentiment analysis system was developed using Natural Language Processing (NLP) and sentiment analysis algorithms. The solution was designed to:

    Classify social media posts into positive, negative, and neutral sentiments.

    Extract key topics and trends related to the brand and its products.

    Provide real-time dashboards for monitoring customer sentiment and identifying areas of improvement.

    Development Steps

    Data Collection

    Aggregated data from major social media platforms using APIs, focusing on brand mentions, hashtags, and product keywords.

    Preprocessing

    Cleaned and normalized text data, including handling slang, emojis, and misspellings, to prepare it for analysis.

    Model Training

    Trained NLP models for sentiment classification using supervised learning. Implemented topic modeling algorithms to identify recurring themes and discussions.

    Validation

    Tested the sentiment analysis models on labeled datasets to ensure high accuracy and relevance in classifying social media posts.

    Deployment

    Integrated the sentiment analysis system with a real-time analytics dashboard, enabling the marketing and customer support teams to track trends and respond proactively.

    Monitoring & Improvement

    Established a continuous feedback mechanism to refine models based on evolving language patterns and new social media trends.

    Results

    Gained Actionable Insights

    The system provided detailed insights into customer opinions, helping the company identify strengths and areas for improvement.

    Improved Brand Reputation Management

    Real-time monitoring enabled swift responses to negative feedback, mitigating potential reputation risks.

    Informed Marketing Strategies

    Insights from sentiment analysis guided targeted marketing campaigns, resulting in higher engagement and ROI.

    Enhanced Customer Relationships

    Proactive engagement with customers based on sentiment analysis improved customer satisfaction and loyalty.

    Scalable Monitoring Solution

    The system scaled efficiently to analyze data across multiple languages and platforms, broadening the company’s reach and understanding.

  4. c

    Provenance of social media: survey data, 2016

    • datacatalogue.cessda.eu
    Updated Mar 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Edwards, P; Corsar, D; Markovic, M (2025). Provenance of social media: survey data, 2016 [Dataset]. http://doi.org/10.5255/UKDA-SN-852507
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    University of Aberdeen
    Authors
    Edwards, P; Corsar, D; Markovic, M
    Time period covered
    Aug 1, 2016 - Aug 31, 2016
    Area covered
    United Kingdom
    Variables measured
    Individual
    Measurement technique
    This dataset was created via an online survey. Respondents were self selecting from emails to social science researchers and Tweets requesting participants to complete the form. The sample population were social science researchers that have used or plan to use data from social media services in their research.
    Description

    Survey instrument and anonymised responses collected as part of Sub-Project B4 “Provenance of Social Media” of the larger Social Media - Developing Understanding, Infrastructure & Engagement (Social Media Enhancement) award (ES/M001628/1). The survey aimed to further our understanding of the current practices and attitudes towards the provenance of data collected from social media platforms and its analysis by researchers in the social sciences. This includes all forms of social media, such as Twitter, Facebook, Wikipedia, Quora, blogs, discussion forums, etc. The survey was conducted as an online-survey using Google Forms. Findings from this survey influenced the work of the sub-project, and the development of tools to support researchers who wish to increase the transparency of their research using social media data.

    Dataset of collected survey responses, and pdf versions of the Google Forms online survey instrument. Each PDF file denotes one possible survey path that depended on the response of a participant to the question “What level of experience do you have using data from a social media platforms as part of your research?” The three paths are:

    (1) SurveyInstrument-Path-1.pdf - is used if the participant selected the option "I have used/am currently using social media data as part of my research."

    (2) SurveyInstrument-Path-2.pdf - is used if the participant selected the option "I am aware of others using social media data as part of their research and may consider using it within mine."

    (3) SurveyInstrument-Path-3.pdf - is used if the participant selected the option "Neither of the above."

    There is now a broad consensus that new forms of social data emerging from people’s day-to-day activities on the web have the potential to transform the social sciences. However, there is also agreement that current analytical techniques fall short of the methodological standards required for academic research and policymaking and that conclusions drawn from social media data have much greater utility when combined with results drawn from other datasets (including various public sector resources made available through open data initiatives). In this proposal we outline the case for further investigations into the challenges surrounding social media data and the social sciences. Aspects of the work will involve analysis of social media data in a number of contexts, including: -transport disruption around the 2014 Commonwealth Games (Glasgow) - news stories about Scottish independence and UK-EU relations - island communities in the Western Isles. Guided by insights from these case studies we will: - develop a suite of software tools to support various aspects of data analysis and curation; - provide guidance on ethical considerations surrounding analysis of social media data; - deliver training workshops for social science researchers; - engage with the public on this important topic through a series of festivals (food, music, science).

  5. News Popularity in Multiple Social Media Platforms

    • kaggle.com
    zip
    Updated Sep 30, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    soroush khandouzi (2021). News Popularity in Multiple Social Media Platforms [Dataset]. https://www.kaggle.com/soroushkhandouzi/news-popularity-in-multiple-social-media-platforms
    Explore at:
    zip(10865230 bytes)Available download formats
    Dataset updated
    Sep 30, 2021
    Authors
    soroush khandouzi
    Description

    Dataset

    This dataset was created by soroush khandouzi

    Contents

  6. Reddit users in the United States 2019-2028

    • statista.com
    Updated Jun 13, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2024). Reddit users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
    Explore at:
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    United States
    Description

    The number of Reddit users in the United States was forecast to continuously increase between 2024 and 2028 by in total 10.3 million users (+5.21 percent). After the ninth consecutive increasing year, the Reddit user base is estimated to reach 208.12 million users and therefore a new peak in 2028. Notably, the number of Reddit users of was continuously increasing over the past years.User figures, shown here with regards to the platform reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. Reddit users encompass both users that are logged in and those that are not.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of Reddit users in countries like Mexico and Canada.

  7. DeepCube: Post-processing and annotated datasets of social media data

    • zenodo.org
    • data.niaid.nih.gov
    Updated Mar 15, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis; Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis (2024). DeepCube: Post-processing and annotated datasets of social media data [Dataset]. http://doi.org/10.5281/zenodo.10731637
    Explore at:
    Dataset updated
    Mar 15, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis; Alexandros Mokas; Eleni Kamateri; Giannis Tsampoulatidis
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Researcher(s): Alexandros Mokas, Eleni Kamateri

    Supervisor: Ioannis Tsampoulatidis

    This repository contains 3 social media datasets:

    2 Post-processing datasets: These datasets contain post-processing data extracted from the analysis of social media posts collected for two different use cases during the first two years of the Deepcube project. More specifically, these include:

    • The UC2 dataset containing the post-processing analysis of the Twitter data collected for the DeepCube use case (UC2) dealing with the climate induced migration in Africa. This dataset contains in total 5,695,253 social media posts collected from the Twitter platform, based on the initial version of search criteria relevant to UC2 defined by Universitat De Valencia, focused on the regions of Ethiopia and Somalia and started from 26 June, 2021 till March, 2023.
    • The UC5 dataset containing the post-processing analysis of the Twitter and Instagram data collected for the DeepCube use case (UC5) related to the sustainable and environmentally-friendly tourism. This dataset contains in total 58,143 social media posts collected from the Twitter and Instagram platform (12,881 collected from Twitter and 45,262 collected from Instagram), based on the initial version of search criteria relevant to UC5 defined by MURMURATION SAS, focused on the regions of Brasil and started from 26 June, 2021 till March, 2023.

    1 Annotated dataset: An additional anottated dataset was created that contains post-processing data along with annotations of Twitter posts collected for UC2 for the years 2010-2022. More specifically, it includes:

    • The UC2 dataset contain the post-processing of the Twitter data collected for the DeepCube use case (UC2) dealing with the climate induced migration in Africa. This dataset contains in total 1721 annotated (412 relevant and 1309 irrelevant) by social media posts collected from the Twitter platform, focused on the region of Somalia and started from 1 January, 2010 till 31 December, 2022.

    For every social media post retrieved from Twitter and Instagram, a preprocessing step was performed. This involved a three-step analysis of each post using the appropriate web service. First, the location of the post was automatically extracted from the text using a location extraction service. Second, the images included in the post were analyzed using a concept extraction service, which identified and provided the top ten concepts that best described the image. These concepts included items such as "person," "building," "drought," "sun," and so on. Finally, the sentiment expressed in the post's text was determined by using a sentiment analysis service. The sentiment was classified as either positive, negative, or neutral.

    After the social media posts were preprocessed, they were visualized using the Social Media Web Application. This intuitive, user-friendly online application was designed for both expert and non-expert users and offers a web-based user interface for filtering and visualizing the collected social media data. The application provides various filtering options, an interactive map, a timeline, and a collection of graphs to help users analyze the data. Moreover, this application provides users with the option to download aggregated data for specific periods by applying filters and clicking the "Download Posts" button. This feature allows users to easily extract and analyze social media data outside of the web application, providing greater flexibility and control over data analysis.

    The dataset is provided by INFALIA.

    INFALIA, being a spin-off of the CERTH institute and a partner of a research EU project, releases this dataset containing Tweets IDs and post pre-processing data for the sole purpose of enabling the validation of the research conducted within the DeepCube. Moreover, Twitter Content provided in this dataset to third parties remains subject to the Twitter Policy, and those third parties must agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy (https://developer.twitter.com/en/developer-terms) before receiving this download.

  8. Dataset of the paper "Why and how to use enterprise social media platforms:...

    • zenodo.org
    Updated Jan 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rachel Yee; Maria-Jose Miquel-Romero; Sonia Cruz-Ros; Rachel Yee; Maria-Jose Miquel-Romero; Sonia Cruz-Ros (2025). Dataset of the paper "Why and how to use enterprise social media platforms: The employee's perspective" [Dataset]. http://doi.org/10.5281/zenodo.14775726
    Explore at:
    Dataset updated
    Jan 31, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Rachel Yee; Maria-Jose Miquel-Romero; Sonia Cruz-Ros; Rachel Yee; Maria-Jose Miquel-Romero; Sonia Cruz-Ros
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 30, 2025
    Description

    This dataset was used in the paper:

    Yee, R. W., Miquel-Romero, M. J., & Cruz-Ros, S. (2021). Why and how to use enterprise social media platforms: The employee’s perspective. Journal of Business Research, 137, 517-526. https://doi.org/10.1016/j.jbusres.2021.08.057.

  9. P

    Data from: iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform...

    • paperswithcode.com
    Updated May 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jay Patel; Pujan Paudel; Emiliano De Cristofaro; Gianluca Stringhini; Jeremy Blackburn (2024). iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023 Dataset [Dataset]. https://paperswithcode.com/dataset/idrama-scored-2024
    Explore at:
    Dataset updated
    May 15, 2024
    Authors
    Jay Patel; Pujan Paudel; Emiliano De Cristofaro; Gianluca Stringhini; Jeremy Blackburn
    Description

    Online web communities often face bans for violating platform policies, encouraging their migration to alternative platforms. This migration, however, can result in increased toxicity and unforeseen consequences on the new platform. In recent years, researchers have collected data from many alternative platforms, indicating coordinated efforts leading to offline events, conspiracy movements, hate speech propagation, and harassment. Thus, it becomes crucial to characterize and understand these alternative platforms. To advance research in this direction, we collect and release a large-scale dataset from Scored -- an alternative Reddit platform that sheltered banned fringe communities, for example, c/TheDonald (a prominent right-wing community) and c/GreatAwakening (a conspiratorial community). Over four years, we collected approximately 57M posts from Scored, with at least 58 communities identified as migrating from Reddit and over 950 communities created since the platform's inception. Furthermore, we provide sentence embeddings of all posts in our dataset, generated through a state-of-the-art model, to further advance the field in characterizing the discussions within these communities. We aim to provide these resources to facilitate their investigations without the need for extensive data collection and processing efforts.

  10. s

    Data from: Facebook Users

    • searchlogistics.com
    Updated Mar 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Facebook Users [Dataset]. https://www.searchlogistics.com/learn/statistics/social-media-user-statistics/
    Explore at:
    Dataset updated
    Mar 17, 2025
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Facebook is fast approaching 3 billion monthly active users. That’s about 36% of the world’s entire population that log in and use Facebook at least once a month.

  11. Instagram users in Central & Western Europe 2019-2028

    • statista.com
    Updated Nov 4, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2024). Instagram users in Central & Western Europe 2019-2028 [Dataset]. https://www.statista.com/topics/4106/social-media-usage-in-europe/
    Explore at:
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    Statistahttp://statista.com/
    Authors
    Statista Research Department
    Area covered
    Western Europe
    Description

    The number of Instagram users in Central & Western Europe was forecast to increase between 2024 and 2028 by in total 4.5 million users (+3.52 percent). This overall increase does not happen continuously, notably not in 2028. The Instagram user base is estimated to amount to 132.21 million users in 2028. Notably, the number of Instagram users of was continuously increasing over the past years.User figures, shown here with regards to the platform instagram, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of Instagram users in countries like Eastern Europe and Northern Europe.

  12. News Title Sentiment Dataset

    • zenodo.org
    bin
    Updated Mar 24, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chang Wei Tan; Chang Wei Tan; Christoph Bergmeir; Christoph Bergmeir; Francois Petitjean; Francois Petitjean; Geoffrey I Webb; Geoffrey I Webb (2021). News Title Sentiment Dataset [Dataset]. http://doi.org/10.5281/zenodo.3902726
    Explore at:
    binAvailable download formats
    Dataset updated
    Mar 24, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Chang Wei Tan; Chang Wei Tan; Christoph Bergmeir; Christoph Bergmeir; Francois Petitjean; Francois Petitjean; Geoffrey I Webb; Geoffrey I Webb
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is part of the Monash, UEA & UCR time series regression repository. http://tseregression.org/

    The goal of this dataset is to predict sentiment score for news title. This dataset contains 83164 time series obtained from the News Popularity in Multiple Social Media Platforms dataset from the UCI repository. This is a large data set of news items and their respective social feedback on multiple platforms: Facebook, Google+ and LinkedIn. The collected data relates to a period of 8 months, between November 2015 and July 2016, accounting for about 100,000 news items on four different topics: economy, microsoft, obama and palestine. This data set is tailored for evaluative comparisons in predictive analytics tasks, although allowing for tasks in other research areas such as topic detection and tracking, sentiment analysis in short text, first story detection or news recommendation. The time series has 3 dimensions.

    Please refer to https://archive.ics.uci.edu/ml/datasets/News+Popularity+in+Multiple+Social+Media+Platforms for more details

    Citation request
    Nuno Moniz and Luis Torgo (2018), Multi-Source Social Feedback of Online News Feeds, CoRR

  13. d

    Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...

    • datarade.ai
    .json, .csv
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dataplex, Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-global-social-media-data-1-1m-mill-dataplex
    Explore at:
    .json, .csvAvailable download formats
    Dataset authored and provided by
    Dataplex
    Area covered
    Chile, Botswana, Martinique, Macao, Holy See, Christmas Island, Gambia, Mexico, Côte d'Ivoire, Jersey
    Description

    The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

    Dataset Overview:

    This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

    2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide: - Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions. - Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions. - Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

    Sourced Directly from Reddit:

    All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

    Key Features:

    • Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.
    • User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.
    • Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.
    • AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

    Use Cases:

    • Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.
    • Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.
    • Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.
    • Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

    Data Quality and Reliability:

    The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

    Integration and Usability:

    The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

    User-Friendly Structure and Metadata:

    The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

    Ideal For:

    • Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.
    • Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.
    • Researchers: Explore the social dynamics of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

    This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...

  14. Z

    Data from: A Large-Scale Dataset of Twitter Chatter about Online Learning...

    • data.niaid.nih.gov
    • dataverse.harvard.edu
    • +1more
    Updated Aug 10, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nirmalya Thakur (2022). A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6624080
    Explore at:
    Dataset updated
    Aug 10, 2022
    Dataset authored and provided by
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “A Large-Scale Dataset of Twitter Chatter about Online Learning during the Current COVID-19 Omicron Wave,” Journal of Data, vol. 7, no. 8, p. 109, Aug. 2022, doi: 10.3390/data7080109

    Abstract

    The COVID-19 Omicron variant, reported to be the most immune evasive variant of COVID-19, is resulting in a surge of COVID-19 cases globally. This has caused schools, colleges, and universities in different parts of the world to transition to online learning. As a result, social media platforms such as Twitter are seeing an increase in conversations, centered around information seeking and sharing, related to online learning. Mining such conversations, such as Tweets, to develop a dataset can serve as a data resource for interdisciplinary research related to the analysis of interest, views, opinions, perspectives, attitudes, and feedback towards online learning during the current surge of COVID-19 cases caused by the Omicron variant. Therefore this work presents a large-scale public Twitter dataset of conversations about online learning since the first detected case of the COVID-19 Omicron variant in November 2021. The dataset is compliant with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter and the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management.

    Data Description

    The dataset comprises a total of 52,984 Tweet IDs (that correspond to the same number of Tweets) about online learning that were posted on Twitter from 9th November 2021 to 13th July 2022. The earliest date was selected as 9th November 2021, as the Omicron variant was detected for the first time in a sample that was collected on this date. 13th July 2022 was the most recent date as per the time of data collection and publication of this dataset.

    The dataset consists of 9 .txt files. An overview of these dataset files along with the number of Tweet IDs and the date range of the associated tweets is as follows. Table 1 shows the list of all the synonyms or terms that were used for the dataset development.

    Filename: TweetIDs_November_2021.txt (No. of Tweet IDs: 1283, Date Range of the associated Tweet IDs: November 1, 2021 to November 30, 2021)

    Filename: TweetIDs_December_2021.txt (No. of Tweet IDs: 10545, Date Range of the associated Tweet IDs: December 1, 2021 to December 31, 2021)

    Filename: TweetIDs_January_2022.txt (No. of Tweet IDs: 23078, Date Range of the associated Tweet IDs: January 1, 2022 to January 31, 2022)

    Filename: TweetIDs_February_2022.txt (No. of Tweet IDs: 4751, Date Range of the associated Tweet IDs: February 1, 2022 to February 28, 2022)

    Filename: TweetIDs_March_2022.txt (No. of Tweet IDs: 3434, Date Range of the associated Tweet IDs: March 1, 2022 to March 31, 2022)

    Filename: TweetIDs_April_2022.txt (No. of Tweet IDs: 3355, Date Range of the associated Tweet IDs: April 1, 2022 to April 30, 2022)

    Filename: TweetIDs_May_2022.txt (No. of Tweet IDs: 3120, Date Range of the associated Tweet IDs: May 1, 2022 to May 31, 2022)

    Filename: TweetIDs_June_2022.txt (No. of Tweet IDs: 2361, Date Range of the associated Tweet IDs: June 1, 2022 to June 30, 2022)

    Filename: TweetIDs_July_2022.txt (No. of Tweet IDs: 1057, Date Range of the associated Tweet IDs: July 1, 2022 to July 13, 2022)

    The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset the Hydrator application (link to download and a step-by-step tutorial on how to use Hydrator) may be used.

    Table 1. List of commonly used synonyms, terms, and phrases for online learning and COVID-19 that were used for the dataset development

    Terminology

    List of synonyms and terms

    COVID-19

    Omicron, COVID, COVID19, coronavirus, coronaviruspandemic, COVID-19, corona, coronaoutbreak, omicron variant, SARS CoV-2, corona virus

    online learning

    online education, online learning, remote education, remote learning, e-learning, elearning, distance learning, distance education, virtual learning, virtual education, online teaching, remote teaching, virtual teaching, online class, online classes, remote class, remote classes, distance class, distance classes, virtual class, virtual classes, online course, online courses, remote course, remote courses, distance course, distance courses, virtual course, virtual courses, online school, virtual school, remote school, online college, online university, virtual college, virtual university, remote college, remote university, online lecture, virtual lecture, remote lecture, online lectures, virtual lectures, remote lectures

  15. i

    Data from: Twitter Big Data as a Resource for Exoskeleton Research: A...

    • ieee-dataport.org
    Updated Oct 22, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nirmalya Thakur (2022). Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets and 100 Research Questions [Dataset]. http://doi.org/10.21227/r5mv-ax79
    Explore at:
    Dataset updated
    Oct 22, 2022
    Dataset provided by
    IEEE Dataport
    Authors
    Nirmalya Thakur
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:N. Thakur, "Twitter Big Data as a Resource for Exoskeleton Research: A Large-Scale Dataset of about 140,000 Tweets from 2017–2022 and 100 Research Questions", Journal of Analytics, Volume 1, Issue 2, 2022, pp. 72-97, DOI: https://doi.org/10.3390/analytics1020007AbstractThe exoskeleton technology has been rapidly advancing in the recent past due to its multitude of applications and diverse use cases in assisted living, military, healthcare, firefighting, and industry 4.0. The exoskeleton market is projected to increase by multiple times its current value within the next two years. Therefore, it is crucial to study the degree and trends of user interest, views, opinions, perspectives, attitudes, acceptance, feedback, engagement, buying behavior, and satisfaction, towards exoskeletons, for which the availability of Big Data of conversations about exoskeletons is necessary. The Internet of Everything style of today’s living, characterized by people spending more time on the internet than ever before, with a specific focus on social media platforms, holds the potential for the development of such a dataset by the mining of relevant social media conversations. Twitter, one such social media platform, is highly popular amongst all age groups, where the topics found in the conversation paradigms include emerging technologies such as exoskeletons. To address this research challenge, this work makes two scientific contributions to this field. First, it presents an open-access dataset of about 140,000 Tweets about exoskeletons that were posted in a 5-year period from 21 May 2017 to 21 May 2022. Second, based on a comprehensive review of the recent works in the fields of Big Data, Natural Language Processing, Information Retrieval, Data Mining, Pattern Recognition, and Artificial Intelligence that may be applied to relevant Twitter data for advancing research, innovation, and discovery in the field of exoskeleton research, a total of 100 Research Questions are presented for researchers to study, analyze, evaluate, ideate, and investigate based on this dataset.

  16. Data from: IA Tweets Analysis Dataset (Spanish)

    • zenodo.org
    • produccioncientifica.uca.es
    • +1more
    csv
    Updated Aug 3, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gabriel Guerrero-Contreras; Gabriel Guerrero-Contreras; Sara Balderas-Díaz; Sara Balderas-Díaz; Alejandro Serrano-Fernández; Andrés Muñoz; Andrés Muñoz; Alejandro Serrano-Fernández (2024). IA Tweets Analysis Dataset (Spanish) [Dataset]. http://doi.org/10.5281/zenodo.10821485
    Explore at:
    csvAvailable download formats
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Gabriel Guerrero-Contreras; Gabriel Guerrero-Contreras; Sara Balderas-Díaz; Sara Balderas-Díaz; Alejandro Serrano-Fernández; Andrés Muñoz; Andrés Muñoz; Alejandro Serrano-Fernández
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General Description

    This dataset comprises 4,038 tweets in Spanish, related to discussions about artificial intelligence (AI), and was created and utilized in the publication "Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights," (10.1109/IE61493.2024.10599899) presented at the 20th International Conference on Intelligent Environments. It is designed to support research on public perception, sentiment, and engagement with AI topics on social media from a Spanish-speaking perspective. Each entry includes detailed annotations covering sentiment analysis, user engagement metrics, and user profile characteristics, among others.

    Data Collection Method

    Tweets were gathered through the Twitter API v1.1 by targeting keywords and hashtags associated with artificial intelligence, focusing specifically on content in Spanish. The dataset captures a wide array of discussions, offering a holistic view of the Spanish-speaking public's sentiment towards AI.

    Dataset Content

    • ID: A unique identifier for each tweet.
    • text: The textual content of the tweet. It is a string with a maximum allowed length of 280 characters.
    • polarity: The tweet's sentiment polarity (e.g., Positive, Negative, Neutral).
    • favorite_count: Indicates how many times the tweet has been liked by Twitter users. It is a non-negative integer.
    • retweet_count: The number of times this tweet has been retweeted. It is a non-negative integer.
    • user_verified: When true, indicates that the user has a verified account, which helps the public recognize the authenticity of accounts of public interest. It is a boolean data type with two allowed values: True or False.
    • user_default_profile: When true, indicates that the user has not altered the theme or background of their user profile. It is a boolean data type with two allowed values: True or False.
    • user_has_extended_profile: When true, indicates that the user has an extended profile. An extended profile on Twitter allows users to provide more detailed information about themselves, such as an extended biography, a header image, details about their location, website, and other additional data. It is a boolean data type with two allowed values: True or False.
    • user_followers_count: The current number of followers the account has. It is a non-negative integer.
    • user_friends_count: The number of users that the account is following. It is a non-negative integer.
    • user_favourites_count: The number of tweets this user has liked since the account was created. It is a non-negative integer.
    • user_statuses_count: The number of tweets (including retweets) posted by the user. It is a non-negative integer.
    • user_protected: When true, indicates that this user has chosen to protect their tweets, meaning their tweets are not publicly visible without their permission. It is a boolean data type with two allowed values: True or False.
    • user_is_translator: When true, indicates that the user posting the tweet is a verified translator on Twitter. This means they have been recognized and validated by the platform as translators of content in different languages. It is a boolean data type with two allowed values: True or False.

    Cite as

    Guerrero-Contreras, G., Balderas-Díaz, S., Serrano-Fernández, A., & Muñoz, A. (2024, June). Enhancing Sentiment Analysis on Social Media: Integrating Text and Metadata for Refined Insights. In 2024 International Conference on Intelligent Environments (IE) (pp. 62-69). IEEE.

    Potential Use Cases

    This dataset is aimed at academic researchers and practitioners with interests in:

    • Sentiment analysis and natural language processing (NLP) with a focus on AI discussions in the Spanish language.
    • Social media analysis on public engagement and perception of artificial intelligence among Spanish speakers.
    • Exploring correlations between user engagement metrics and sentiment in discussions about AI.

    Data Format and File Type

    The dataset is provided in CSV format, ensuring compatibility with a wide range of data analysis tools and programming environments.

    License

    The dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, permitting sharing, copying, distribution, transmission, and adaptation of the work for any purpose, including commercial, provided proper attribution is given.

  17. Z

    Data from: TrueFace: a Dataset for the Detection of Synthetic Face Images...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 13, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Miorandi, Daniele (2022). TrueFace: a Dataset for the Detection of Synthetic Face Images from Social Networks [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7065063
    Explore at:
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Miorandi, Daniele
    Stefani, Antonio Luigi
    Pasquini, Cecilia
    Verde, Sebastiano
    Boato, Giulia
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TrueFace is a first dataset of social media processed real and synthetic faces, obtained by the successful StyleGAN generative models, and shared on Facebook, Twitter and Telegram.

    Images have historically been a universal and cross-cultural communication medium, capable of reaching people of any social background, status or education. Unsurprisingly though, their social impact has often been exploited for malicious purposes, like spreading misinformation and manipulating public opinion. With today's technologies, the possibility to generate highly realistic fakes is within everyone's reach. A major threat derives in particular from the use of synthetically generated faces, which are able to deceive even the most experienced observer. To contrast this fake news phenomenon, researchers have employed artificial intelligence to detect synthetic images by analysing patterns and artifacts introduced by the generative models. However, most online images are subject to repeated sharing operations by social media platforms. Said platforms process uploaded images by applying operations (like compression) that progressively degrade those useful forensic traces, compromising the effectiveness of the developed detectors. To solve the synthetic-vs-real problem "in the wild", more realistic image databases, like TrueFace, are needed to train specialised detectors.

  18. Z

    Data from: TikTok dataset - Current affairs on TikTok. Virality and...

    • data.niaid.nih.gov
    • ekoizpen-zientifikoa.ehu.eus
    • +1more
    Updated Aug 28, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Morales-i-Gras, Jordi (2022). TikTok dataset - Current affairs on TikTok. Virality and entertainment for digital natives [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7024884
    Explore at:
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Morales-i-Gras, Jordi
    Peña-Fernández, Simón
    Larrondo-Ureta, Ainara
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Tiktok network graph with 5,638 nodes and 318,986 unique links, representing up to 790,599 weighted links between labels, using Gephi network analysis software.

    Source of:

    Peña-Fernández, Simón, Larrondo-Ureta, Ainara, & Morales-i-Gras, Jordi. (2022). Current affairs on TikTok. Virality and entertainment for digital natives. Profesional De La Información, 31(1), 1–12. https://doi.org/10.5281/zenodo.5962655

    Abstract:

    Since its appearance in 2018, TikTok has become one of the most popular social media platforms among digital natives because of its algorithm-based engagement strategies, a policy of public accounts, and a simple, colorful, and intuitive content interface. As happened in the past with other platforms such as Facebook, Twitter, and Instagram, various media are currently seeking ways to adapt to TikTok and its particular characteristics to attract a younger audience less accustomed to the consumption of journalistic material. Against this background, the aim of this study is to identify the presence of the media and journalists on TikTok, measure the virality and engagement of the content they generate, describe the communities created around them, and identify the presence of journalistic use of these accounts. For this, 23,174 videos from 143 accounts belonging to media from 25 countries were analyzed. The results indicate that, in general, the presence and impact of the media in this social network are low and that most of their content is oriented towards the creation of user communities based on viral content and entertainment. However, albeit with a lesser presence, one can also identify accounts and messages that adapt their content to the specific characteristics of TikTok. Their virality and engagement figures illustrate that there is indeed a niche for current affairs on this social network.

  19. Average daily time spent on social media worldwide 2012-2024

    • statista.com
    • wwwexpressvpn.online
    • +1more
    Updated Apr 10, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2024). Average daily time spent on social media worldwide 2012-2024 [Dataset]. https://www.statista.com/statistics/433871/daily-social-media-usage-worldwide/
    Explore at:
    Dataset updated
    Apr 10, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Worldwide
    Description

    How much time do people spend on social media? As of 2024, the average daily social media usage of internet users worldwide amounted to 143 minutes per day, down from 151 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of three hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just two hours and 16 minutes. Global social media usageCurrently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events friends. Global impact of social mediaSocial media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased a polarization in politics and heightened everyday distractions.

  20. P

    3MASSIV Dataset

    • paperswithcode.com
    Updated May 13, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vikram Gupta; Trisha Mittal; Puneet Mathur; Vaibhav Mishra; Mayank Maheshwari; Aniket Bera; Debdoot Mukherjee; Dinesh Manocha (2022). 3MASSIV Dataset [Dataset]. https://paperswithcode.com/dataset/3massiv
    Explore at:
    Dataset updated
    May 13, 2022
    Authors
    Vikram Gupta; Trisha Mittal; Puneet Mathur; Vaibhav Mishra; Mayank Maheshwari; Aniket Bera; Debdoot Mukherjee; Dinesh Manocha
    Description

    A multilingual, multimodal and multi-aspect, expertly-annotated dataset of diverse short videos extracted from short-video social media platform - Moj. 3MASSIV comprises of 50k short videos (~20 seconds average duration) and 100K unlabeled videos in 11 different languages and captures popular short video trends like pranks, fails, romance, comedy expressed via unique audio-visual formats like self-shot videos, reaction videos, lip-synching, self-sung songs, etc.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Twitter users in the United States 2019-2028 [Dataset]. https://www.statista.com/topics/3196/social-media-usage-in-the-united-states/
Organization logo

Twitter users in the United States 2019-2028

Explore at:
74 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 13, 2024
Dataset provided by
Statistahttp://statista.com/
Authors
Statista Research Department
Area covered
United States
Description

The number of Twitter users in the United States was forecast to continuously increase between 2024 and 2028 by in total 4.3 million users (+5.32 percent). After the ninth consecutive increasing year, the Twitter user base is estimated to reach 85.08 million users and therefore a new peak in 2028. Notably, the number of Twitter users of was continuously increasing over the past years.User figures, shown here regarding the platform twitter, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period.The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).Find more key insights for the number of Twitter users in countries like Canada and Mexico.

Search
Clear search
Close search
Google apps
Main menu