100+ datasets found
  1. Reddit usage reach in the United States 2025, by age group

    • statista.com
    Updated Nov 25, 2025
    Cite
    Statista (2025). Reddit usage reach in the United States 2025, by age group [Dataset]. https://www.statista.com/statistics/261766/share-of-us-internet-users-who-use-reddit-by-age-group/
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 5, 2025 - Jun 18, 2025
    Area covered
    United States
    Description

    According to a survey of adults in the United States in 2025, ** percent of respondents who used Reddit were aged between 18 and 29 years. Reddit usage tends to be affected by users’ age, with older users reporting lower levels of engagement.

    Reddit engagement in numbers

    Reddit is one of the most popular websites in the forum category, allowing users to interact in multiple close-knit communities organized in sub-threads and divided by topics. In March 2024, Reddit.com registered an average of *** billion monthly visits from desktop and mobile combined. Reddit users are mostly based in North America, with the United States accounting for the biggest share of traffic worldwide by far.

    The future of Reddit

    Reddit was created in 2005 and was redesigned for the very first time in 2018 to make it more appealing to new users and increase engagement from non-participating guests (jokingly called “lurkers”) who nonetheless enjoy the content. In February 2024, the company announced it was entering the public market by releasing its S-1 registration statement. In 2025, the company generated around *** billion U.S. dollars worldwide in revenues. This translated into an average revenue per user (ARPU) of around **** dollars in the last quarter of 2024.

  2. Distribution of Reddit.com traffic 2024, by country

    • statista.com
    Updated Nov 25, 2025
    Cite
    Statista (2025). Distribution of Reddit.com traffic 2024, by country [Dataset]. https://www.statista.com/statistics/325144/reddit-global-active-user-distribution/
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the six months ending March 2024, the United States accounted for 48.46 percent of traffic to the online forum Reddit.com. The United Kingdom was ranked second, accounting for 7.16 percent of web visits to the social media platform.

    Reddit in the United States

    In August 2023, Reddit accounted for slightly over 1.6 percent of social media website traffic in the United States. Founded in 2005, Reddit is a discussion website which enables users to aggregate news by posting links and letting other users vote and comment on them. There are thousands of subforums, called subreddits, available on a wide range of topics. One of the most popular subreddits is the AMA (“Ask Me Anything”), where celebrities, public figures or people in unique positions post threads that allow other Reddit users to ask them anything. In 2022, Nicolas Cage's AMA post generated over 238.5 thousand upvotes, making it the most popular AMA of the year.

    Reddit users in the United States

    Reddit use in the United States is more prevalent among younger online audiences. During a February 2021 survey, it was found that 36 percent of internet users aged 18 to 29 years and 22 percent of users aged 30 to 49 years used Reddit. However, the reach of the social platform strongly declines with age. Also, whilst around 23 percent of male adults in the U.S. access Reddit, only 12 percent of women do the same.

  3. Reddit Revenue and Usage Statistics (2026)

    • businessofapps.com
    Updated Oct 2, 2020
    Cite
    Business of Apps (2020). Reddit Revenue and Usage Statistics (2026) [Dataset]. https://www.businessofapps.com/data/reddit-statistics/
    Explore at:
    Dataset updated
    Oct 2, 2020
    Dataset authored and provided by
    Business of Apps
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    In a second attempt to be one of the first inductees of the startup incubator Y Combinator, University of Virginia roommates Steve Huffman and Alexis Ohanian built Reddit, which they labelled “the...

  4. Reddit Statistics 2026: Traffic, Users, and More

    • sqmagazine.co.uk
    Updated Oct 3, 2025
    Cite
    SQ Magazine (2025). Reddit Statistics 2026: Traffic, Users, and More [Dataset]. https://sqmagazine.co.uk/reddit-statistics/
    Explore at:
    Dataset updated
    Oct 3, 2025
    Dataset authored and provided by
    SQ Magazine
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2024 - Dec 31, 2026
    Area covered
    Worldwide, Earth
    Description

    Discover key Reddit statistics, including user growth, subreddit activity, engagement rates, content trends, and traffic insights!

  5. Reddit user worldwide 2024, by country

    • statista.com
    Cite
    Statista, Reddit user worldwide 2024, by country [Dataset]. https://www.statista.com/forecasts/1174696/reddit-user-by-country
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024 - Dec 31, 2024
    Area covered
    Albania
    Description

    Comparing the *** selected regions regarding the number of Reddit users, the United States is leading the ranking (****** million users) and is followed by the United Kingdom with ***** million users. At the other end of the spectrum is Gabon with **** million users, indicating a difference of ****** million users to the United States. User figures, shown here with regard to the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts by persons only once. Reddit users encompass both users that are logged in and those that are not.

    The shown data are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press and they are processed to generate comparable data sets (see supplementary notes under details for more information).

  6. Reddit usage reach in the United States 2025, by ethnicity

    • statista.com
    Updated Nov 25, 2025
    Cite
    Statista (2025). Reddit usage reach in the United States 2025, by ethnicity [Dataset]. https://www.statista.com/statistics/261770/share-of-us-internet-users-who-use-reddit-by-ethnicity/
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 5, 2025 - Jun 18, 2025
    Area covered
    United States
    Description

    According to a survey of internet users conducted in the United States in 2025, ** percent of Black Americans reported having ever used Reddit. Asian Americans appeared to be more likely than both Black and White Americans to have ever used the social media and community forum, with ** percent of users in the demographic reporting to have used the popular forum and social media.

  7. Reddit Top Posts: 50-Subreddit Analysis 2011-2024

    • kaggle.com
    zip
    Updated Sep 14, 2024
    Cite
    Sachin Kanchan (2024). Reddit Top Posts: 50-Subreddit Analysis 2011-2024 [Dataset]. https://www.kaggle.com/datasets/sachinkanchan92/reddit-top-posts-50-subreddit-analysis-2011-2024
    Explore at:
    Available download formats: zip (10894944 bytes)
    Dataset updated
    Sep 14, 2024
    Authors
    Sachin Kanchan
    License

    https://www.reddit.com/wiki/api

    Description

    Dataset Overview

    This dataset presents a comprehensive collection of top-performing posts - top 1000 posts of All time - from 50 of Reddit's most popular subreddits by subscriber count, offering a view of the content trends and user engagement patterns across diverse online communities.

    The data spans from September 2011 to September 2024, providing a wide range of historical perspective on digital content evolution and user interaction dynamics with about 50,000 records in the dataset overall.

    Key Dataset Characteristics

    • Data Source: Extracted via Reddit's official API
    • Date Range: September 2011 to September 2024
    • Scope: Top posts from 50 high-traffic subreddits
    • Structure: 50 individual CSV files, each representing a distinct subreddit
    • Sample Size: Up to 1000 posts per subreddit (constrained by API limitations)

    File Structure

    The dataset comprises 50 separate CSV files, each named after its corresponding subreddit (e.g., "askreddit_top_1000_posts.csv"). This structure facilitates both focused subreddit-specific analysis and broader cross-subreddit comparative studies.
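The naming convention described above can be used to map file names back to subreddits. The following is a minimal sketch, assuming every file follows the `<subreddit>_top_1000_posts.csv` pattern shown in the example and that names are lowercase; both helper functions are hypothetical and not part of the dataset.

```python
from pathlib import Path
from typing import Optional

# Suffix taken from the example file name "askreddit_top_1000_posts.csv".
SUFFIX = "_top_1000_posts.csv"


def subreddit_from_filename(filename: str) -> Optional[str]:
    """Return the subreddit part of a file name, or None if the pattern does not match."""
    name = Path(filename).name  # drop any directory components
    if not name.endswith(SUFFIX):
        return None
    return name[: -len(SUFFIX)]


def filename_for(subreddit: str) -> str:
    """Build the expected file name (assumes lowercase names, as in the example)."""
    return subreddit.lower() + SUFFIX
```

Files that do not match the pattern return `None`, so a directory scan can skip unrelated files instead of failing.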

    Data Schema

    Each CSV file contains the following fields:

    1. id: Unique post identifier
    2. title: Post title
    3. score: Net vote count (upvotes minus downvotes)
    4. upvote_ratio: Proportion of upvotes to total votes
    5. num_comments: Comment count
    6. created_utc: Post creation timestamp
    7. subreddit: Subreddit name
    8. subscribers: Subreddit subscriber count at data collection time
    9. permalink: Reddit URL for the post
    10. url: External content URL (if applicable)
    11. domain: Source domain for external links
    12. num_awards: Count of Reddit awards received
    13. num_crossposts: Frequency of post sharing to other subreddits
    14. crosspost_subreddits: List of subreddits where the post was cross-posted
    15. post_type: Content category (e.g., text, link, image, video)
    16. is_nsfw: Boolean for Not Safe For Work content
    17. is_bot: Boolean indicating bot-generated posts
    18. is_megathread: Boolean for large discussion threads
    19. body: Full text content for text-posts
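Taking the schema definitions at face value (score = upvotes minus downvotes, upvote_ratio = upvotes divided by total votes), approximate raw vote counts can be recovered: with score s and ratio r, total votes = s / (2r - 1). The sketch below illustrates that arithmetic only; the function name is hypothetical, and the results are estimates because the reported values may be rounded.

```python
from typing import Optional, Tuple


def estimate_votes(score: int, upvote_ratio: float) -> Optional[Tuple[int, int]]:
    """Estimate (upvotes, downvotes) from score = up - down and ratio = up / (up + down)."""
    denom = 2.0 * upvote_ratio - 1.0
    if abs(denom) < 1e-9:
        # A ratio of exactly 0.5 implies score == 0; the total cannot be recovered.
        return None
    total = score / denom
    upvotes = round(upvote_ratio * total)
    downvotes = round((1.0 - upvote_ratio) * total)
    return upvotes, downvotes


# Illustrative values, not taken from the dataset.
print(estimate_votes(600, 0.8))
```

Ratios close to 0.5 make the estimate unstable, since a small rounding error in upvote_ratio is divided by a number near zero.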

    Initial Data Check

    An initial data quality check was done to verify data integrity and to gauge whether intensive cleaning might be required.

    Observations:

    • Some minimal cleaning is required, which is left to the user to explore.
    • body: The post body sometimes contains extremely long texts with new lines and bullet points. These appear wonky in CSV, often overflowing onto new lines due to limitations of Excel's CSV handling.
    • Exporting the file as xlsx seems to format the text properly. There are no issues in pandas either, though pandas by default limits the number of characters displayed per field; the column width can be adjusted to display all contents (not advisable for extremely long texts).
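The multi-line body issue noted above is a display problem rather than a data problem, provided the fields are quoted. The sketch below uses Python's standard `csv` module on a small synthetic sample (the rows and the reduced column set are invented for illustration) to show that a quoted body containing line breaks is still read as a single record.

```python
import csv
import io

# Synthetic two-record sample; the second body spans several lines inside quotes.
SAMPLE = (
    "id,title,body\n"
    'a1,"First post","single line"\n'
    'b2,"Second post","line one\n- bullet\n- another bullet"\n'
)


def read_posts(text: str) -> list:
    """Parse CSV text into a list of dicts, keeping embedded newlines intact."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_posts(SAMPLE)
print(len(rows))                      # number of records, not number of physical lines
print(repr(rows[1]["body"]))
```

When reading the real files from disk, the `csv` documentation recommends opening them with `newline=""` so that embedded line breaks are passed through unchanged.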

    Data Considerations

    1. Text Encoding: Some special characters, apostrophes, and emojis in post titles may require cleaning or standardization.
    2. Boolean Fields: Values are True, False, or Null. Null values typically indicate archived posts where data retrieval was limited.
    3. Awards Data: The num_awards field may be null for older, archived posts due to data availability constraints.
    4. API Limitations: Each file contains up to 1000 posts, reflecting Reddit's API constraints, though the initial plan was to fetch more records.
    5. Author Information: The post 'author' (user who made the reddit post) column was dropped during data preprocessing for privacy reasons.
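The three-valued boolean fields and the nullable num_awards field need explicit handling when parsed by hand. The following is a minimal sketch; how nulls are actually spelled in the CSV files (empty string, "Null", "nan") is an assumption, and both helper names are hypothetical.

```python
from typing import Optional


def parse_tristate(value: Optional[str]) -> Optional[bool]:
    """Map 'True'/'False' to bool; treat anything else as a missing value (None)."""
    if value is None:
        return None
    text = value.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None  # assumed null spellings: "", "null", "nan", or anything unexpected


def parse_optional_int(value: Optional[str]) -> Optional[int]:
    """Parse an integer count such as num_awards; return None when it is missing."""
    if value is None:
        return None
    try:
        # Going through float() also accepts "3.0", a common spelling for nullable integers.
        return int(float(value.strip()))
    except (ValueError, OverflowError):
        return None
```

Keeping `None` distinct from `False` matters here, since the description says nulls typically mark archived posts rather than a negative answer.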

    Data Source and Collection

    This dataset was collected using the official Reddit API, in compliance with Reddit's Data API Terms.

    Ownership and Rights

    • The content in this dataset is owned by the original Reddit users who created it, not by the dataset creator.
    • This dataset is provided under Reddit's API Terms and is subject to those terms.

    Usage Guidelines

    1. By using this dataset, you agree to comply with Reddit's Data API Terms: https://www.reddit.com/wiki/api-terms
    2. This data should be used for research and analysis purposes only.
    3. Do not use this data to train machine learning or AI models without express permission from the original content creators.
    4. Any publication or presentation using this data should properly attribute Reddit and the original content creators.

    Data Processing

    The post 'author' (user who made the reddit post) field was dropped during data preprocessing to respect users' privacy and as a standard procedure to remove personally identifiable information.

    Ethical Considerations

    • Respect user privacy: Although this data is publicly available, please use it responsibly.
    • Be aware of potential biases in the data, which may reflect the demographics and opinions of specific Reddit communities.

    Additional Notes

    • This dataset does not include any personally identifiable informa...
  8. Reddit's /r/funny Subreddit

    • kaggle.com
    zip
    Updated Dec 15, 2022
    Cite
    The Devastator (2022). Reddit's /r/funny Subreddit [Dataset]. https://www.kaggle.com/datasets/thedevastator/explore-reddit-s-funny-subreddit-analyze-communi
    Explore at:
    Available download formats: zip (93052 bytes)
    Dataset updated
    Dec 15, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Explore Reddit's Funny Subreddit & Analyze Community Engagement!

    Quantifying Community Interaction Through Reddit Posts

    By Reddit [source]

    About this dataset

    This dataset offers an insightful analysis into one of the most talked-about online communities today: Reddit. Specifically, we are focusing on the funny subreddit, a subsection of the main forum that enjoys the highest engagement across all Reddit users. Not only does this dataset include post titles, scores and other details regarding post creation and engagement; it also includes powerful metrics to measure active community interaction such as comment numbers and timestamps. By diving deep into this data, we can paint a fuller picture in terms of what people find funny in our digital age - how well do certain topics draw responses? How does sentiment change over time? And how can community managers use these insights to grow their platforms and better engage their userbase for lasting success? With this comprehensive dataset at your fingertips, you'll be able to answer each question - and more

    How to use the dataset

    Introduction

    Welcome to the Reddit's Funny Subreddit Kaggle Dataset. In this dataset you will explore and analyze posts from the popular subreddit to gain insights into community engagement. With this dataset, you can understand user engagement trends and learn how people interact with content from different topics. This guide will provide further information about how to use this dataset for your data analysis projects.

    Important Columns

    This dataset contains columns such as: title, score, url, comms_num (number of comments), created (date of post), body (content of post) and timestamp. All these columns are important in understanding user interactions with each post on Reddit’s Funny Subreddit.

    Exploratory Data Analysis

    To get a better understanding of user engagement on the subreddit, some initial exploration is necessary. Graphical tools such as histograms or boxplots show how basic values like scores or comment counts are distributed across posts, over time, or by other parameters (for example, type of joke).

    Dimensionality reduction

    For more advanced analytics, it is recommended to apply a dimensionality reduction technique such as PCA before tackling any real analysis tasks. This allows similar posts to be grouped together and leaves out conflicting or irrelevant variables that could otherwise cloud later data-driven decisions.

    Further Guidance

    If further assistance with this dataset is required, further reading on topics such as text mining, natural language processing, and machine learning is highly recommended. These explain the steps that could help unlock greater value from Reddit's funny subreddit and may give readers or researchers ideas about which approaches to take when analyzing text-based online platforms such as Reddit.

    Research Ideas

    • Analyzing post title length vs. engagement (i.e., score, comments).
    • Comparing sentiment of post bodies between posts that have high/low scores and comments.
    • Comparing topics within the posts that have high/low scores and comments to look for any differences in content or style of writing based on engagement level
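The first research idea above (title length versus engagement) can be sketched with a plain Pearson correlation. The example below uses the title, score, and comms_num column names from the dataset description; the sample posts are synthetic, and the helper functions are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


def title_length_vs(posts, column):
    """Correlate title length (in characters) with a numeric engagement column."""
    lengths = [len(p["title"]) for p in posts]
    values = [float(p[column]) for p in posts]
    return pearson(lengths, values)


# Synthetic posts for illustration only.
posts = [
    {"title": "Cat", "score": 10, "comms_num": 2},
    {"title": "Cat on a roof", "score": 40, "comms_num": 5},
    {"title": "Cat on a roof refuses to come down", "score": 90, "comms_num": 30},
]
print(title_length_vs(posts, "score"))
```

A correlation on a handful of posts says little; on the real file, scores are likely heavily skewed, so a rank-based measure may be a more robust choice.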

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: funny.csv | Column name | Description | |:--------------|:------------------------...

  9. Reddit: distribution of global audiences 2025, by gender

    • statista.com
    Updated Nov 25, 2025
    Cite
    Statista (2025). Reddit: distribution of global audiences 2025, by gender [Dataset]. https://www.statista.com/statistics/1255182/distribution-of-users-on-reddit-worldwide-gender/
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    As of the second quarter of 2025, the majority of Reddit users were male, accounting for ** percent of its audience base. Additionally, most of Reddit's desktop users were based in the United States.

  10. All Subreddits and Relations between them

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    The Devastator (2022). All Subreddits and Relations between them [Dataset]. https://www.kaggle.com/datasets/thedevastator/all-subreddits-and-relations-between-them
    Explore at:
    Available download formats: zip (17648915 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    The Devastator
    License

    https://www.reddit.com/wiki/api

    Description

    Reddit Graph Dataset

    This dataset aims to build a graph of subreddit links based on how they reference each other. The original database dump can be found here.

    Subreddits Columns

    • name (str): name of the subreddit.
      • between 2 and 21 characters (lowercase letters, digits and underscores).
    • type (str): type of the subreddit.
    • title (str): title of the subreddit
    • description (str): short description of the subreddit.
    • subscribers (int?): number of subscribers at the moment.
    • nsfw (bool?): indicator of whether it is flagged as not safe for work 🔞.
    • quarantined (bool?): indicator if it has been quarantined 😷.
    • color (str): key color of the subreddit.
    • img_banner (str?): url of the image used as the banner.
    • img_icon (str?): url of the image used as the icon (snoo).
    • created_at (datetime): utc timestamp of when the subreddit was created.
    • updated_at (datetime): utc timestamp of when the information of the subreddit was last updated.

    note: the '?' indicates that the value can be null under certain conditions.

    Subreddits Stats

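    | TYPE | AMOUNT |
    |:--------------------------------|-------:|
    | TOTAL | 127800 |
    | public | 59227 |
    | banned | 31473 |
    | restricted | 14601 |
    | public [nsfw] | 14244 |
    | private | 5139 |
    | restricted [nsfw] | 3014 |
    | public [quarantined] | 29 |
    | restricted [quarantined] | 21 |
    | archived | 17 |
    | premium | 12 |
    | public [nsfw] [quarantined] | 11 |
    | user [nsfw] | 6 |
    | user | 4 |
    | restricted [nsfw] [quarantined] | 1 |
    | employees | 1 |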
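As a quick consistency check, the per-type counts listed above can be summed and compared with the stated total, and each type's share computed. This sketch only restates the figures from the table; the variable and function names are illustrative.

```python
# Counts copied from the subreddit type table above.
SUBREDDIT_TYPES = {
    "public": 59227,
    "banned": 31473,
    "restricted": 14601,
    "public [nsfw]": 14244,
    "private": 5139,
    "restricted [nsfw]": 3014,
    "public [quarantined]": 29,
    "restricted [quarantined]": 21,
    "archived": 17,
    "premium": 12,
    "public [nsfw] [quarantined]": 11,
    "user [nsfw]": 6,
    "user": 4,
    "restricted [nsfw] [quarantined]": 1,
    "employees": 1,
}
STATED_TOTAL = 127800


def shares(counts):
    """Return each type's share of the total, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


computed_total = sum(SUBREDDIT_TYPES.values())
type_shares = shares(SUBREDDIT_TYPES)
print(computed_total == STATED_TOTAL)
print(round(type_shares["public"], 1), round(type_shares["banned"], 1))
```

The rows add up to the stated total, with public subreddits making up a little under half of all entries and banned ones roughly a quarter.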

    Links Columns

    • source (str): name of the subreddit where the link was found.
    • target (str): name of the linked subreddit.
    • type (str): place where the reference from source to target was found.
    • updated_at (datetime): utc timestamp of when the information of the link was last updated.

    Links Stats

    | TYPE | AMOUNT |
    |:------------|-------:|
    | TOTAL | 349744 |
    | wiki | 214206 |
    | sidebar | 123650 |
    | topbar | 7291 |
    | description | 4597 |
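The links columns described above (source, target, type) map naturally onto a directed graph. The following sketch builds an adjacency map and counts how many distinct subreddits link to each target; the sample rows and subreddit names are synthetic, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Synthetic (source, target, type) rows; "type" uses values from the links table.
LINKS = [
    ("python", "learnpython", "sidebar"),
    ("python", "programming", "wiki"),
    ("learnpython", "python", "sidebar"),
    ("programming", "python", "wiki"),
    ("programming", "python", "sidebar"),  # same pair found in a second place
]


def build_graph(links):
    """Adjacency map: source subreddit -> set of subreddits it links to."""
    graph = defaultdict(set)
    for source, target, _link_type in links:
        graph[source].add(target)
    return graph


def in_degree(links):
    """Number of distinct source subreddits linking to each target."""
    pairs = {(source, target) for source, target, _link_type in links}
    return Counter(target for _source, target in pairs)


graph = build_graph(LINKS)
print(sorted(graph["python"]))
print(in_degree(LINKS).most_common(1))
```

Deduplicating on the (source, target) pair keeps a subreddit that references another in both its wiki and its sidebar from being counted twice.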
  11. The Reddit /r/Place Dataset

    • kaggle.com
    zip
    Updated Apr 5, 2022
    Cite
    Lexyr (2022). The Reddit /r/Place Dataset [Dataset]. https://www.kaggle.com/datasets/pavellexyr/the-reddit-place-dataset
    Explore at:
    Available download formats: zip (55288851 bytes)
    Dataset updated
    Apr 5, 2022
    Authors
    Lexyr
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    For a moment, starting April 1st, 2022, Reddit became the talk of the entire internet as they reopened /r/Place - a collaborative pixel canvas stemming all the way from 2017.

    Join us in following the recorded history of /r/Place and its related discussion, from the first pixels placed to its inevitable erasure.

    Content

    This dataset contains all the posts and comments made on /r/Place from the first of January, 2022, to the fourth of April, 2022.

    The dataset was procured using SocialGrep.

    To preserve users' anonymity and to prevent targeted harassment, the data does not include usernames.

    Acknowledgements

    We would like to thank the Reddit user /u/prosto_sanja for recording the timelapse data. We would also like to thank all Redditors who participated in the canvas. Here's to another good April Fools.

    Inspiration

    The place raises many questions, some of them surprisingly serious.

    Which artpieces were most frequently accused of botting? Do the accusations match their behaviour - can we infer a "botted" picture somehow, even without looking at its users' Reddit histories?

    The "amogus" is one of the smallest units of individual contribution, surprisingly resilient due to its cheeky, non-intrusive nature. How did their numbers change over time - does this match any shift in the players' moods as they feel more empowered or disenfranchised to create together?

    Which areas ended up the most talked about?

  12. Reddit: quarterly number of DAU 2021-2025, by online status

    • statista.com
    Updated Feb 6, 2026
    Cite
    Statista (2026). Reddit: quarterly number of DAU 2021-2025, by online status [Dataset]. https://www.statista.com/statistics/1453133/reddit-quarterly-dau-by-online-status/
    Explore at:
    Dataset updated
    Feb 6, 2026
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    During the fourth quarter of 2025, online forum and news aggregator Reddit saw approximately 121.4 million daily active users (DAU) engaging with its platform. Of these, over 70.7 million users were not logged in and accessed the platform's content without proving they registered to Reddit. This represents an increase compared to the previous quarter, when Reddit saw 65.8 million logged-off DAU.
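The figures quoted above imply a few derived values: the logged-in DAU, the logged-out share of the total, and the quarter-over-quarter change in logged-out DAU. The sketch below works through that arithmetic; the inputs are the rounded figures from the description (the source says "approximately" and "over"), so the results are approximate too.

```python
# Figures from the description, in millions of daily active users.
total_dau = 121.4          # Q4 2025, all users
logged_out_dau = 70.7      # Q4 2025, not logged in
prev_logged_out_dau = 65.8 # previous quarter, not logged in

# Logged-in users are the remainder of the total.
logged_in_dau = round(total_dau - logged_out_dau, 1)

# Share of daily users who were not logged in, in percent.
logged_out_share = round(100.0 * logged_out_dau / total_dau, 1)

# Quarter-over-quarter change in logged-out DAU, absolute and in percent.
logged_out_change = round(logged_out_dau - prev_logged_out_dau, 1)
logged_out_growth = round(100.0 * logged_out_change / prev_logged_out_dau, 1)

print(logged_in_dau, logged_out_share, logged_out_change, logged_out_growth)
```

On these figures, logged-out users account for a clear majority of daily traffic.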

  13. Reddit usage reach in the United States 2025, by gender

    • statista.com
    Updated Nov 25, 2025
    Cite
    Statista (2025). Reddit usage reach in the United States 2025, by gender [Dataset]. https://www.statista.com/statistics/261765/share-of-us-internet-users-who-use-reddit-by-gender/
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 5, 2025 - Jun 18, 2025
    Area covered
    United States
    Description

    As of 2025, 29 percent of male respondents in the United States stated that they used Reddit, compared to 23 percent of female respondents. Reddit is a social networking and online forum company. The platform is organized in thematic groups, also called subreddits.

  14. Reddit: /r/news

    • kaggle.com
    zip
    Updated Dec 17, 2022
    Cite
    The Devastator (2022). Reddit: /r/news [Dataset]. https://www.kaggle.com/datasets/thedevastator/uncovering-popularity-and-user-engagement-trends/discussion
    Explore at:
    Available download formats: zip (146481 bytes)
    Dataset updated
    Dec 17, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Reddit: /r/news

    Exploring Topics, Scores, and Engagement

    By Reddit [source]

    About this dataset

    This dataset provides an in-depth look into learning what communities find important and engaging in the news. With this data, researchers can discover trends related to user engagement and popular topics within subreddits. By examining the “score” and “comms_num” columns, our researchers will be able to pinpoint which topics are most liked, discussed or shared within the various subreddits. Researchers may also gain insights into not only how popular a topic is but how it is growing over time. Additionally, by exploring the body column of our dataset, researchers can understand more about which types of news stories drive conversation within particular subreddits—providing an opportunity for deeper analysis of that subreddit’s diverse community dynamics

    The dataset includes eight columns: title, score, id, url, comms_num, created, body and timestamp, which can help us identify key insights into user engagement among popular subreddits. With this data we may also determine relationships between topics of discussion and their impact on user engagement, allowing us to build a better understanding of issue-based conversations online as well as uncover emerging trends in online news consumption habits.

    How to use the dataset

    This dataset is useful for those who are looking to gain insight into the popularity and user engagement of specific subreddits. The data includes 8 different columns including title, score, id, url, comms_num, created, body and timestamp. This can provide valuable information about how users view and interact with particular topics across various subreddits.

    In this guide we’ll look at how you can use this dataset to uncover trends in user engagement on topics within specific subreddits as well as measure the overall popularity of these topics within a subreddit.

    1) Analyzing Score: By analyzing the “score” column you can determine which news stories are popular in a particular subreddit and which ones aren't by looking at how many upvotes each story has received. With this data you will be able to determine trends in what types of stories users preferred within a particular subreddit over time.

    2) Analyzing Comms_Num: Similarly, you can analyze the “comms_num” column to see which news stories drew more engagement by tracking the number of comments each post received. This can show which types of stories tend to draw more comment activity in certain subreddits, whether on a single day or over an extended period, such as tracking post activity across multiple weeks or months.

    3) Analyzing Body: Additionally, by looking at the “body” column for each post, researchers can gain a better understanding of which kinds of topics and news draw attention among specific Reddit communities. With that complete picture, researchers have access not only to data measuring Reddit buzz but also to the discussion content itself, helping generate further insights into why certain posts might be popular or receive more comments than others.

    Overall, this dataset provides valuable insights into user engagement with topics trending across subreddits, giving anyone interested in researching such things an easier way to access those insights in one place.
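The score and comms_num analyses described in this guide come down to ranking posts by a numeric column. The sketch below does this with the standard `csv` module, using the eight column names listed for the dataset; the sample rows are synthetic, the created and timestamp fields are left empty because their format is not documented here, and `top_by` is a hypothetical helper.

```python
import csv
import io

# Synthetic rows using the dataset's eight column names.
SAMPLE = (
    "title,score,id,url,comms_num,created,body,timestamp\n"
    "Story A,120,a1,https://example.com/a,45,,,\n"
    "Story B,300,b2,https://example.com/b,10,,,\n"
    "Story C,50,c3,https://example.com/c,80,,,\n"
)


def top_by(rows, column, n):
    """Return the n rows with the highest integer value in the given column."""
    return sorted(rows, key=lambda row: int(row[column]), reverse=True)[:n]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
most_upvoted = [r["title"] for r in top_by(rows, "score", 2)]
most_discussed = [r["title"] for r in top_by(rows, "comms_num", 1)]
print(most_upvoted, most_discussed)
```

Ranking by the two columns separately is deliberate: as the sample shows, the most upvoted story is not necessarily the most discussed one.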

    Research Ideas

    • Grouping news topics within particular subreddits and assessing the overall popularity of those topics in terms of scores/user engagement.
    • Correlating user engagement with certain news topics to understand how they influence discussion or reactions on a subreddit.
    • Examining the potential correlation between score and the actual body content of a given post to assess what types of content are most successful in gaining interest from users and creating positive engagement for posts

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: news.csv | Column name | Description ...

  15. Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends,...

    • datarade.ai
    .json, .csv
    Updated Aug 7, 2024
    + more versions
    Cite
    Dataplex (2024). Dataplex: Reddit Data | Consumer Behavior Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-consumer-behavior-data-2-1m-subred-dataplex
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Dataplex
    Area covered
    Netherlands, Lithuania, Cuba, Saint Barthélemy, Togo, Cocos (Keeling) Islands, Tunisia, Belize, Croatia, Burkina Faso
    Description

    The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

    Dataset Overview:

    This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

    2.1 Million Subreddits with Enhanced AI Insights:

    The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:

    • Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
    • Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
    • Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

    Sourced Directly from Reddit:

    All data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

    Key Features:

    • Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.
    • User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.
    • Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.
    • AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

    Use Cases:

    • Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.
    • Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.
    • Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.
    • Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

    Data Quality and Reliability:

    The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

    Integration and Usability:

    The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

    User-Friendly Structure and Metadata:

    The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

    Ideal For:

    • Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.
    • Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.
    • Researchers: Explore consumer behavior data of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

    This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...

  16. Data from: WikiReddit: Tracing Information and Attention Flows Between...

    • zenodo.org
    bin
    Updated May 4, 2025
    Cite
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms [Dataset]. http://doi.org/10.5281/zenodo.14653265
    Explore at:
    Available download formats: bin
    Dataset updated
    May 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Gildersleve; Anna Beers; Viviane Ito; Agustin Orozco; Francesca Tripodi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 15, 2025
    Description

    Preprint

    Gildersleve, P., Beers, A., Ito, V., Orozco, A., & Tripodi, F. (2025). WikiReddit: Tracing Information and Attention Flows Between Online Platforms. arXiv [Cs.CY]. https://doi.org/10.48550/arXiv.2502.04942
    Accepted at the International AAAI Conference on Web and Social Media (ICWSM) 2025

    Abstract

    The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.

    Datasheet

    Motivation

    The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.

    Composition

    WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.

    Collection Process

    Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.

    Preprocessing/cleaning/labeling

    Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
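
    The two preprocessing steps named above, URL extraction with regex and SHA-256 hashing of IDs, might look roughly like the sketch below. The pattern, the function names, and the sample ID are hypothetical; the datasheet does not give the authors' actual expressions or say whether a salt was used, so the unsalted hash here is an assumption.

```python
import hashlib
import re

# Hypothetical pattern: http(s) links to any <lang>.wikipedia.org/wiki/ page,
# including the mobile <lang>.m.wikipedia.org form. Not the authors' actual regex.
WIKI_URL = re.compile(
    r"https?://[a-z\-]+(?:\.m)?\.wikipedia\.org/wiki/[^\s)\]>]+",
    re.IGNORECASE,
)

def extract_wikipedia_urls(text):
    """Return all Wikipedia article URLs found in a post or comment body."""
    return WIKI_URL.findall(text)

def hash_id(reddit_id):
    """Return the hex SHA-256 digest of an identifier (unsalted; an assumption)."""
    return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()

comment = (
    "See https://en.wikipedia.org/wiki/Reddit and "
    "(https://de.wikipedia.org/wiki/Wikipedia) for background"
)
urls = extract_wikipedia_urls(comment)
hashed = hash_id("t3_abc123")  # invented example ID
print(urls)
print(hashed)
```

    A simple pattern like this one would keep trailing punctuation such as a final period as part of the URL, which is one reason real extraction pipelines need more extensive processing.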

    Uses

    We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.

    Distribution

    The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942

    Maintenance

    Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.


    SQL Database Schema

    Table: posts

    Column Name | Type | Description
    subreddit_id | TEXT | The unique identifier for the subreddit.
    crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost.
    post_id | TEXT | Unique identifier for the Reddit post.
    created_at | TIMESTAMP | The timestamp when the post was created.
    updated_at | TIMESTAMP | The timestamp when the post was last updated.
    language_code | TEXT | The language code of the post.
    score | INTEGER | The score (upvotes minus downvotes) of the post.
    upvote_ratio | REAL | The ratio of upvotes to total votes.
    gildings | INTEGER | Number of awards (gildings) received by the post.
    num_comments | INTEGER | Number of comments on the post.

    Table: comments

    Column Name | Type | Description
    subreddit_id | TEXT | The unique identifier for the subreddit.
    post_id | TEXT | The ID of the Reddit post the comment belongs to.
    parent_id | TEXT | The ID of the parent comment (if a reply).
    comment_id | TEXT | Unique identifier for the comment.
    created_at | TIMESTAMP | The timestamp when the comment was created.
    last_modified_at | TIMESTAMP | The timestamp when the comment was last modified.
    score | INTEGER | The score (upvotes minus downvotes) of the comment.
    upvote_ratio | REAL | The ratio of upvotes to total votes for the comment.
    gilded | INTEGER | Number of awards (gildings) received by the comment.
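
    To show how the posts and comments tables relate through post_id, here is a small sketch that builds a subset of both tables in memory and aggregates comments per post. The use of SQLite is an assumption made to keep the example self-contained (the listing only says "SQL database"), the rows are invented, and real IDs in the dataset are SHA-256 hashes rather than short strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Subset of the documented columns; the full schema has more fields.
conn.executescript("""
CREATE TABLE posts (
    subreddit_id TEXT, post_id TEXT, created_at TIMESTAMP,
    score INTEGER, num_comments INTEGER
);
CREATE TABLE comments (
    subreddit_id TEXT, post_id TEXT, parent_id TEXT,
    comment_id TEXT, created_at TIMESTAMP, score INTEGER
);
""")

# Invented rows for illustration only.
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", [
    ("s1", "p1", "2021-03-01 12:00:00", 10, 2),
    ("s1", "p2", "2021-03-02 08:30:00", 4, 1),
])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?)", [
    ("s1", "p1", None, "c1", "2021-03-01 12:05:00", 3),
    ("s1", "p1", "c1", "c2", "2021-03-01 12:09:00", 1),
    ("s1", "p2", None, "c3", "2021-03-02 09:00:00", 7),
])

# Count comments and sum comment scores for each post.
rows = conn.execute("""
    SELECT p.post_id,
           COUNT(c.comment_id) AS n_comments,
           SUM(c.score) AS total_comment_score
    FROM posts AS p
    LEFT JOIN comments AS c ON c.post_id = p.post_id
    GROUP BY p.post_id
    ORDER BY p.post_id
""").fetchall()
print(rows)
```

    The LEFT JOIN keeps posts that have no matching comment rows, which matters here because the dataset only includes comments that mention Wikipedia.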

    Table: postlinks

    Column Name | Type | Description
    post_id | TEXT | Unique identifier for the Reddit post.
    end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL.
    end_processed_url | TEXT | The extracted URL from the Reddit post.
    final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections.
    final_status | INTEGER | HTTP status code of the final URL.
    final_url | TEXT | The final URL after redirections.
    redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0).
    in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0).

    Table: commentlinks

    Column Name | Type | Description
    comment_id | TEXT | Unique identifier for the Reddit comment.
    end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL.
    end_processed_url | TEXT | The extracted URL from the comment.
    final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections.
    final_status | INTEGER | HTTP status code of the final

  17. Reddit Datasets

    • promptcloud.com
    csv
    Updated Feb 12, 2026
    Cite
    PromptCloud (2026). Reddit Datasets [Dataset]. https://www.promptcloud.com/dataset/reddit/
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 12, 2026
    Dataset authored and provided by
    PromptCloud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Extracting Insights from Online Discussions

    Reddit is one of the largest social discussion platforms, making it a valuable source for real-time opinions, trends, sentiment analysis, and user interactions across various industries. Scraping Reddit data allows businesses, researchers, and analysts to explore public discussions, track sentiment, and gain actionable insights from user-generated content. Benefits and Impact: Trend […]

  18. reddit user posting behavior (mid-2013)

    • figshare.com
    application/gzip
    Updated May 31, 2023
    Cite
    Randy Olson (2023). reddit user posting behavior (mid-2013) [Dataset]. http://doi.org/10.6084/m9.figshare.874101.v2
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Randy Olson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the posting preferences for over 850,000 active reddit users. This sample was taken in mid-2013. This data was used to generate the interactive visualization, "redditviz," and will be analyzed in detail in an upcoming research article. Please cite our paper "Navigating the massive world of reddit" if you use this data in your work. URL: http://arxiv.org/abs/1312.3387 The file is organized as follows: Each line is an entry for an anonymous user. Each user was randomly assigned a unique ID, which is what shows in the first entry of each line. Following the user ID, separated by commas, are the subreddits (i.e., interests) that the user regularly posts in. In order for a user to be considered "active" in that subreddit, they had to post or comment there at least 10 times in their last 1,000 posts and comments.
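
    The line layout described above (a user ID followed by comma-separated subreddits) can be read with a few lines of Python. The sketch below uses an invented three-line sample and hypothetical helper names; the activity check simply restates the "at least 10 times in their last 1,000 posts and comments" rule from the description.

```python
import io
from collections import Counter

# Invented sample in the described layout: user ID first, then the
# subreddits that user is active in, all separated by commas.
SAMPLE = """1,askscience,python,news
2,python
3,news,askscience
"""

def parse_line(line):
    """Split one line into (user_id, list_of_subreddits)."""
    user_id, *subreddits = line.strip().split(",")
    return user_id, subreddits

def is_active(times_posted, threshold=10):
    """Activity rule from the description: at least 10 posts or comments."""
    return times_posted >= threshold

users = dict(parse_line(line) for line in io.StringIO(SAMPLE) if line.strip())

# Number of active users per subreddit.
subreddit_sizes = Counter(s for subs in users.values() for s in subs)
print(users)
print(subreddit_sizes)
```

    Because the download is gzip-compressed, the real file would typically be opened in text mode with gzip.open(path, "rt") in place of the in-memory sample.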

  19. Reddit r/AskScience Flair Dataset

    • data.mendeley.com
    Updated May 23, 2022
    + more versions
    Cite
    Sumit Mishra (2022). Reddit r/AskScience Flair Dataset [Dataset]. http://doi.org/10.17632/k9r2d9z999.3
    Explore at:
    Dataset updated
    May 23, 2022
    Authors
    Sumit Mishra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reddit is a social news, content rating, and discussion website, and one of the most popular sites on the internet. It has 52 million daily active users and approximately 430 million monthly users. Reddit is divided into subreddits; this dataset uses r/AskScience.

    The dataset is extracted from the subreddit /r/AskScience on Reddit. The data was collected between 01-01-2016 and 20-05-2022. It contains 612,668 data points and 25 columns. The dataset holds information about the questions asked on the subreddit, the description of the submission, the flair of the question, NSFW or SFW status, the year of the submission, and more. The data was extracted using Python and Pushshift's API, with light cleaning done using NumPy and pandas (see the descriptions of individual columns below).

    The dataset contains the following columns and descriptions:

    • author - Redditor Name
    • author_fullname - Redditor Full name
    • contest_mode - Contest mode [implement obscured scores and randomized sorting].
    • created_utc - Time the submission was created, represented in Unix Time.
    • domain - Domain of submission.
    • edited - If the post is edited or not.
    • full_link - Link of the post on the subreddit.
    • id - ID of the submission.
    • is_self - Whether or not the submission is a self post (text-only).
    • link_flair_css_class - CSS Class used to identify the flair.
    • link_flair_text - Flair on the post or The link flair’s text content.
    • locked - Whether or not the submission has been locked.
    • num_comments - The number of comments on the submission.
    • over_18 - Whether or not the submission has been marked as NSFW.
    • permalink - A permalink for the submission.
    • retrieved_on - time ingested.
    • score - The number of upvotes for the submission.
    • description - Description of the Submission.
    • spoiler - Whether or not the submission has been marked as a spoiler.
    • stickied - Whether or not the submission is stickied.
    • thumbnail - Thumbnail of Submission.
    • question - Question Asked in the Submission.
    • url - The URL the submission links to, or the permalink if a self post.
    • year - Year of the Submission.
    • banned - Banned by the moderator or not.

    This dataset can be used for flair prediction, NSFW classification, and various text mining/NLP tasks. Exploratory data analysis can also be used to find insights, trends, and patterns over the years.
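
    As a starting point for the exploratory analysis mentioned above, the sketch below converts created_utc from Unix time to a calendar year and counts submissions per flair. The three records are invented; only the column names (id, created_utc, link_flair_text, over_18) come from the column descriptions, and the helper name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Invented records using documented column names.
records = [
    {"id": "a1", "created_utc": 1451606400, "link_flair_text": "Physics", "over_18": False},
    {"id": "a2", "created_utc": 1653004800, "link_flair_text": "Biology", "over_18": False},
    {"id": "a3", "created_utc": 1609459200, "link_flair_text": "Physics", "over_18": True},
]

def submission_year(created_utc):
    """Convert a Unix timestamp (seconds) to its UTC calendar year."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).year

years = [submission_year(r["created_utc"]) for r in records]
flair_counts = Counter(r["link_flair_text"] for r in records)
nsfw_ids = [r["id"] for r in records if r["over_18"]]

print(years)
print(flair_counts)
print(nsfw_ids)
```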

  20. Reddit users worldwide 2019-2028

    • statista.com
    Updated Oct 22, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2019). Reddit users worldwide 2019-2028 [Dataset]. https://www.statista.com/study/67973/reddit/
    Explore at:
    Dataset updated
    Oct 22, 2019
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The global number of Reddit users was forecast to increase continuously between 2024 and 2028 by a total of 52.1 million users (+10.33 percent). After the ninth consecutive year of growth, the Reddit user base is estimated to reach 556.59 million users, a new peak, in 2028. Notably, the number of Reddit users has increased continuously over the past years. User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. Reddit users encompass both users that are logged in and those that are not. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Reddit users in regions like Asia and Europe.
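
    The figures quoted above are mutually consistent, which a short calculation can confirm. The sketch assumes the 52.1 million increase is measured from the 2024 value, so the 2024 baseline is not stated in the description but derived here.

```python
increase_millions = 52.1       # forecast increase, 2024 to 2028
users_2028_millions = 556.59   # forecast user base in 2028

# Implied 2024 baseline (derived, not stated in the description).
baseline_2024_millions = users_2028_millions - increase_millions

# Relative growth over the forecast period, in percent.
growth_percent = increase_millions / baseline_2024_millions * 100

print(round(baseline_2024_millions, 2))
print(round(growth_percent, 2))
```

    The result rounds to the +10.33 percent given in the description, from an implied 2024 base of about 504.49 million users.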

Reddit usage reach in the United States 2025, by age group

47 scholarly articles cite this dataset (View in Google Scholar)
