This statistic shows the number of logged-in Reddit users in the United States from 2017 to 2025. In 2020, it was reported that ** million users accessed their account on the social news aggregation website. This figure is projected to rise to ** million users in 2025.
In the six months ending March 2024, the United States accounted for 48.46 percent of traffic to the online forum Reddit.com. The United Kingdom ranked second, accounting for 7.16 percent of web visits to the social media platform.
Reddit in the United States
In August 2023, Reddit accounted for slightly over 1.6 percent of social media website traffic in the United States. Founded in 2005, Reddit is a discussion website that enables users to aggregate news by posting links and letting other users vote and comment on them. There are thousands of subforums, called subreddits, covering a wide range of topics. One of the most popular subreddits is the AMA (“Ask Me Anything”), where celebrities, public figures or people in unique positions post threads that allow other Reddit users to ask them anything. In 2022, Nicolas Cage's AMA post generated over 238.5 thousand upvotes, making it the most popular AMA of the year.
Reddit users in the United States
Reddit use in the United States is more prevalent among younger online audiences. A February 2021 survey found that 36 percent of internet users aged 18 to 29 years and 22 percent of users aged 30 to 49 years used Reddit. However, the reach of the social platform declines strongly with age. Also, whilst around 23 percent of male adults in the U.S. access Reddit, only 12 percent of women do the same.
According to a survey of adults in the United States in 2024, 46 percent of respondents who used Reddit were aged between 19 and 29 years. Reddit usage tends to be affected by users’ age, with older users reporting lower levels of engagement.
Reddit engagement in numbers
Reddit is one of the most popular websites in the forum category, allowing users to interact in multiple close-knit communities organized in sub-threads and divided by topics. In March 2024, Reddit.com registered an average of 2.2 billion monthly visits from desktop and mobile combined. Reddit users are mostly based in North America, with the United States accounting for by far the biggest share of traffic worldwide.
The future of Reddit
Reddit was created in 2005 and was redesigned for the very first time in 2018 to make it more appealing to new users and to increase engagement from non-participating guests (jokingly called “lurkers”) who nonetheless enjoy the content. In February 2024, the company announced it was entering the public market by releasing its S-1 registration statement. In 2024, the company generated around 1.3 billion U.S. dollars worldwide in revenues. This translated into an average revenue per user (ARPU) of around 4.21 dollars in the last quarter of 2024.
As of March 2021, users in their twenties and thirties accounted for almost two-thirds of Reddit active user accounts in the United States. According to recent data, users aged 20 to 29 years accounted for 28.1 percent of the social news app's user base on the Android platform.
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.
Dataset Overview:
This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.
2.1 Million Subreddits with Enhanced AI Insights:
The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:
- Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
- Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
- Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.
Sourced Directly from Reddit:
All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.
Key Features:
Use Cases:
Data Quality and Reliability:
The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.
Integration and Usability:
The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.
User-Friendly Structure and Metadata:
The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.
Ideal For:
This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...
As of the third quarter of 2024, the majority of Reddit users were male, accounting for 59.8 percent of its audience base. Overall, women accounted for roughly 39.1 percent of the website users. Additionally, most of Reddit's desktop users were based in the United States.
The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. The first 20 days are used for training and the remaining days for testing (with 30% used for validation). For features, off-the-shelf 300-dimensional GloVe CommonCrawl word vectors are used.
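The graph construction described above (posts become nodes, and two posts are linked when the same user comments on both) can be sketched in a few lines. This is only an illustration under stated assumptions: the `(user, post_id)` input format, the function names, and the toy data are hypothetical and are not taken from the dataset's actual build pipeline.

```python
from collections import defaultdict
from itertools import combinations


def build_post_graph(comments):
    """Build undirected post-to-post edges from (user, post_id) pairs.

    Two posts are connected if at least one user commented on both.
    The input format is an assumption made for this sketch.
    """
    posts_by_user = defaultdict(set)
    for user, post_id in comments:
        posts_by_user[user].add(post_id)

    edges = set()
    for posts in posts_by_user.values():
        # Sort so each undirected edge is stored once as (smaller, larger).
        for a, b in combinations(sorted(posts), 2):
            edges.add((a, b))
    return edges


def average_degree(edges, num_nodes):
    """Average node degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edges) / num_nodes if num_nodes else 0.0


# Hypothetical toy data: three users commenting on three posts.
comments = [
    ("u1", "p1"), ("u1", "p2"),
    ("u2", "p2"), ("u2", "p3"),
    ("u3", "p1"), ("u3", "p2"),
]
edges = build_post_graph(comments)
```

On the toy data, `p1`-`p2` is linked twice (by `u1` and `u3`) but stored as a single edge, which is why a set is used. Prolific commenters create many pairwise links, which is consistent with the high average degree reported for the real graph.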
The number of Reddit users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 10.3 million users (+5.21 percent). After the ninth consecutive year of growth, the Reddit user base is estimated to reach 208.12 million users, and therefore a new peak, in 2028. Notably, the number of Reddit users has increased continuously over the past years. User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. Reddit users encompass both users that are logged in and those that are not. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Reddit users in countries like Mexico and Canada.
According to a survey of internet users conducted in the United States between February and June 2024, 14 percent of Black Americans reported having ever used Reddit. Asian Americans appeared to be more likely than both Black and white Americans to have ever used the social media and community forum, with 36 percent of users in this demographic reporting that they had used the platform.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Reddit is a massive platform for news, content, and discussions, hosting millions of active users daily. Among its vast number of subreddits, we focus on the r/AskScience community, where users engage in science-related discussions and questions.
This dataset is derived from the r/AskScience subreddit, collected between January 1, 2016, and May 20, 2022. It includes 612,668 datapoints across 22 columns, featuring diverse information such as the content of the questions, submission descriptions, associated flairs, NSFW/SFW status, year of submission, and more. The data was extracted using Python and Pushshift's API, followed by some cleaning with NumPy and pandas. Detailed column descriptions are available for clarity.
This dataset provides an in-depth corpus of posts and comments from the Reddit board /r/datasets, covering its entire history up to 1st March 2022. Its primary purpose is to serve as a collection of datasets related to Reddit content, enabling analysts and data scientists to explore online community data. The data was acquired using SocialGrep. To safeguard user privacy, usernames have been excluded from this dataset, preventing targeted harassment and preserving anonymity. It includes details such as comment body text, sentiment analysis, and comment scores, offering a rich resource for various analytical tasks.
The dataset is structured as a table containing all comments. The file format is typically CSV. Key columns such as id, subreddit.id, created_utc, permalink, body, sentiment, and score each contain 54,848 values. For the subreddit.nsfw column, all 54,848 values are 'false', meaning no NSFW subreddits are included in this count. The body column shows that 5% of comments are '[deleted]', 2% are '[removed]', and the remaining 93% consist of other content. Sentiment scores range from -1.00 to 1.00, with varying distributions across different ranges. Comment scores range from -65 to 195, also with varying frequencies across score bands.
This dataset is ideally suited for data science and analytics projects. It can be used for:
* Natural Language Processing (NLP) tasks, such as text analysis and sentiment classification.
* Studying the dynamics of online communities and social networks.
* Analyzing user sentiment towards various topics discussed on Reddit.
* Exploring the factors influencing comment scores and engagement.
* Developing models for content moderation or recommendation based on Reddit data.
The dataset spans a significant time range, including all posts and comments from the inception of the /r/datasets board up to 1st March 2022. Its geographic scope is global, representing activity across Reddit's platform without specific regional limitations. The demographic scope primarily focuses on the users interacting within the /r/datasets community on Reddit. As mentioned, usernames are specifically excluded to ensure user anonymity.
CC-BY
This dataset is valuable for a wide range of users, including:
* Data scientists and analysts looking for real-world social media data for their projects.
* Researchers in fields such as computer science, social networks, and linguistics, for studying online behaviour and communication patterns.
* Developers creating applications that involve text analysis or sentiment prediction.
* Anyone interested in gaining insights into Reddit communities and their discussions.
Original Data Source: The Reddit Dataset Dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reddit is a social news, content rating and discussion website. It is one of the most popular sites on the internet. Reddit has 52 million daily active users and approximately 430 million users who use it once a month. Reddit has many different subreddits; here we'll use the r/AskScience subreddit.
The dataset is extracted from the subreddit /r/AskScience on Reddit. The data was collected between 01-01-2016 and 20-05-2022. It contains 612,668 datapoints and 25 columns. The dataset contains information about the questions asked on the subreddit, the description of the submission, the flair of the question, NSFW or SFW status, the year of the submission, and more. The data was extracted using Python and Pushshift's API, with a little cleaning done using NumPy and pandas (see the descriptions of individual columns below).
The dataset contains the following columns and descriptions:
author - Redditor name.
author_fullname - Redditor full name.
contest_mode - Contest mode [implement obscured scores and randomized sorting].
created_utc - Time the submission was created, represented in Unix time.
domain - Domain of the submission.
edited - Whether or not the post has been edited.
full_link - Link of the post on the subreddit.
id - ID of the submission.
is_self - Whether or not the submission is a self post (text-only).
link_flair_css_class - CSS class used to identify the flair.
link_flair_text - The link flair's text content.
locked - Whether or not the submission has been locked.
num_comments - The number of comments on the submission.
over_18 - Whether or not the submission has been marked as NSFW.
permalink - A permalink for the submission.
retrieved_on - Time ingested.
score - The number of upvotes for the submission.
description - Description of the submission.
spoiler - Whether or not the submission has been marked as a spoiler.
stickied - Whether or not the submission is stickied.
thumbnail - Thumbnail of the submission.
question - Question asked in the submission.
url - The URL the submission links to, or the permalink if a self post.
year - Year of the submission.
banned - Whether or not the submission was banned by a moderator.
This dataset can be used for Flair Prediction, NSFW Classification, and different Text Mining/NLP tasks. Exploratory Data Analysis can also be done to get the insights and see the trend and patterns over the years.
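As a small illustration of the exploratory analysis mentioned above, the sketch below derives a submission year from the `created_utc` Unix timestamp and counts submissions per year and flair. The column names `created_utc` and `link_flair_text` come from the column list; the sample rows, the function names, and the assumption that values arrive as strings (as they would from a CSV reader) are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def submission_year(created_utc):
    """Convert a Unix timestamp (seconds, UTC) to a calendar year."""
    return datetime.fromtimestamp(int(created_utc), tz=timezone.utc).year


def flair_counts_by_year(rows):
    """Count submissions per (year, flair) pair.

    `rows` is any iterable of dict-like records, e.g. from csv.DictReader.
    """
    counts = Counter()
    for row in rows:
        key = (submission_year(row["created_utc"]), row["link_flair_text"])
        counts[key] += 1
    return counts


# Hypothetical sample rows, not taken from the real dataset.
rows = [
    {"created_utc": "1451606400", "link_flair_text": "Physics"},  # 2016-01-01 UTC
    {"created_utc": "1483228800", "link_flair_text": "Biology"},  # 2017-01-01 UTC
    {"created_utc": "1483228801", "link_flair_text": "Biology"},
]
counts = flair_counts_by_year(rows)
```

Using an explicit UTC timezone keeps the derived year independent of the machine's local timezone, which matters for submissions created close to midnight on 31 December.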
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the posting preferences for over 850,000 active reddit users. This sample was taken in mid-2013. This data was used to generate the interactive visualization, "redditviz," and will be analyzed in detail in an upcoming research article. Please cite our paper "Navigating the massive world of reddit" if you use this data in your work. URL: http://arxiv.org/abs/1312.3387 The file is organized as follows: Each line is an entry for an anonymous user. Each user was randomly assigned a unique ID, which is what shows in the first entry of each line. Following the user ID, separated by commas, are the subreddits (i.e., interests) that the user regularly posts in. In order for a user to be considered "active" in that subreddit, they had to post or comment there at least 10 times in their last 1,000 posts and comments.
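The line format described above (a user ID followed by comma-separated subreddits) can be read with a short parser. This is a minimal sketch: the function names and the sample lines are hypothetical, and it assumes that subreddit names contain no commas and that the file has no header row.

```python
from collections import defaultdict


def parse_user_line(line):
    """Split one line into (user_id, [subreddit, ...])."""
    fields = [field.strip() for field in line.strip().split(",")]
    user_id = fields[0]
    subreddits = [name for name in fields[1:] if name]
    return user_id, subreddits


def users_by_subreddit(lines):
    """Invert the file: map each subreddit to the set of user IDs active in it."""
    index = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user_id, subreddits = parse_user_line(line)
        for name in subreddits:
            index[name].add(user_id)
    return dict(index)


# Hypothetical sample lines in the described format.
sample_lines = [
    "0,askscience,datasets\n",
    "1,datasets,food\n",
    "2,food\n",
]
index = users_by_subreddit(sample_lines)
```

The inverted index is a convenient starting point for measuring subreddit overlap, for example by intersecting the user sets of two subreddits.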
Comparing the *** selected regions regarding the number of Reddit users, the United States leads the ranking (****** million users) and is followed by the United Kingdom with ***** million users. At the other end of the spectrum is Gabon with **** million users, indicating a difference of ****** million users to the United States. User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. Reddit users encompass both users that are logged in and those that are not. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
The number of Reddit users in Israel was forecast to increase between 2024 and 2028 by a total of 0.01 million users (+0.76 percent). This overall increase does not happen continuously, notably not in 2027. The Reddit user base is estimated to amount to 1.32 million users in 2028. Notably, the number of Reddit users has increased continuously over the past years. User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. Reddit users encompass both users that are logged in and those that are not. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information). Find more key insights for the number of Reddit users in countries like Bahrain and Kuwait.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Extracting Insights from Online Discussions
Reddit is one of the largest social discussion platforms, making it a valuable source for real-time opinions, trends, sentiment analysis, and user interactions across various industries. Scraping Reddit data allows businesses, researchers, and analysts to explore public discussions, track sentiment, and gain actionable insights from user-generated content. Benefits and Impact: Trend […]
https://creativecommons.org/publicdomain/zero/1.0/
This data is an extract from a bigger Reddit dataset (all Reddit comments from May 2019, 157 GB of data uncompressed) that contains both more comments and more associated information (timestamps, author, flairs, etc.).
For ease of use, I picked the first 25,000 comments for each of the 40 most frequented subreddits (May 2019). This way, if anyone wants to use the subreddit as categorical data, the volumes are balanced.
I also excluded any removed comments / comments whose author got deleted and comments deemed too short (less than 4 tokens) and changed the format (json -> csv).
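The selection and filtering rules described above can be sketched roughly as follows. This is not the author's actual script: the field names (`subreddit`, `author`, `body`), the placeholder strings, whitespace tokenization, and the choice to filter before counting toward the per-subreddit cap are all assumptions made for illustration.

```python
from collections import defaultdict

# Placeholder bodies assumed to mark removed or deleted comments.
PLACEHOLDERS = {"[removed]", "[deleted]"}


def keep_comment(comment, min_tokens=4):
    """Apply the exclusion rules to one comment record (a dict)."""
    if comment["author"] == "[deleted]":
        return False  # author account was deleted
    body = comment["body"].strip()
    if body in PLACEHOLDERS:
        return False  # comment was removed or deleted
    # Whitespace tokenization is an assumption; the original tokenizer is unknown.
    return len(body.split()) >= min_tokens


def first_n_per_subreddit(comments, n=25_000):
    """Keep the first n qualifying comments for each subreddit, in input order."""
    kept = defaultdict(list)
    for comment in comments:
        bucket = kept[comment["subreddit"]]
        if len(bucket) < n and keep_comment(comment):
            bucket.append(comment)
    return dict(kept)


# Hypothetical records, capped at one comment per subreddit for the demo.
comments = [
    {"subreddit": "AskReddit", "author": "[deleted]", "body": "the author account was deleted"},
    {"subreddit": "AskReddit", "author": "b", "body": "[removed]"},
    {"subreddit": "AskReddit", "author": "c", "body": "too short"},
    {"subreddit": "AskReddit", "author": "a", "body": "this comment is long enough"},
    {"subreddit": "funny", "author": "d", "body": "another comment with enough tokens"},
    {"subreddit": "funny", "author": "e", "body": "yet another sufficiently long comment"},
]
sample = first_n_per_subreddit(comments, n=1)
```

Capping every subreddit at the same count is what keeps the classes balanced when the subreddit is later used as a categorical label.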
This is primarily an NLP dataset, but in addition to the comments I added the 3 features I deemed the most important. I also aimed for feature type variety.
The information kept here is:
The data is but a small extract of what is being collected by pushshift.io on a monthly basis. You can easily find the full information if you want to work with more features and more data.
Have fun! The variety of feature types should allow you to gain a few interesting insights or build some simple models.
If you think the License (CC0: Public Domain) should be different, contact me
https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This dataset offers a comprehensive look into the conversations and trends within the Reddit Food subreddit. It contains user-generated posts, with columns such as title, score, id, url, comms_num, created, body and timestamp. This source of data provides an opportunity to dive into Reddit’s vibrant food community and explore key insights related to user interactions with their peers or favorite dishes. Exploring this dataset with techniques such as sentiment analysis or predictive analytics can reveal how users talk about food-related issues or what type of recipes get the most appreciation.
The present dataset gives a comprehensive collection of Reddit posts from the Food subreddit, presenting a great resource to study trends and user sentiments on the topic of food. With this data, you can uncover interesting patterns and insights related to user-generated content in food-related subreddits.
- Developing an artificial intelligence model to predict future trends in food recipes and menu items.
- Analyzing user sentiment on certain ingredients or meals through natural language processing to detect the best meals and dishes in terms of consumer opinion.
- Generating sentiment analysis graphs of different posts over time to observe changes in the public's opinion on staple food topics such as veganism, clean eating and more niche subjects like Korean cuisine or soul food
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: food.csv

| Column name | Description |
|:--------------|:--------------------------------------------------------|
| title | The title of the post. (String) |
| score | The number of upvotes the post has received. (Integer) |
| url | The URL of the post. (String) |
| comms_num | The number of comments the post has received. (Integer) |
| created | The date and time the post was created. (DateTime) |
| body | The body of the post. (String) |
| timestamp | The timestamp of the post. (Integer) |
If you use this dataset in your research, please credit Reddit.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for one-million-reddit-questions
Dataset Summary
This corpus contains a million posts on /r/AskReddit, annotated with their score.
Languages
Mainly English.
Dataset Structure
Data Instances
A data point is a Reddit post.
Data Fields
'type': the type of the data point. Can be 'post' or 'comment'.
'id': the base-36 Reddit ID of the data point. Unique when combined with type.
'subreddit.id': the base-36 Reddit ID of…
See the full description on the dataset page: https://huggingface.co/datasets/SocialGrep/one-million-reddit-questions.
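Because the 'id' field is a base-36 string that is only unique in combination with 'type', a small helper for decoding IDs and building a composite key can be handy. The sketch below assumes lowercase base-36 digits (0-9, a-z); the function names and the sample records are hypothetical.

```python
import string

# Base-36 digit alphabet: 0-9 followed by a-z.
ALPHABET = string.digits + string.ascii_lowercase


def reddit_id_to_int(base36_id):
    """Decode a base-36 ID string into an integer."""
    return int(base36_id, 36)


def int_to_reddit_id(number):
    """Encode a non-negative integer as a lowercase base-36 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def unique_key(record):
    """Composite key: an ID is only unique together with its type."""
    return (record["type"], record["id"])


# Hypothetical records: a post and a comment that share the same ID string.
records = [
    {"type": "post", "id": "abc123"},
    {"type": "comment", "id": "abc123"},
]
keys = {unique_key(record) for record in records}
```

Deduplicating on `id` alone would wrongly merge the two sample records, whereas the composite key keeps both.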
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data 1: Dataset with articles posted in the r/Liberal and r/Conservative subreddits. In total, we collected a corpus of 226,010 articles. We collected these news articles to understand political expression through shared news articles.
Data 2: Dataset with articles posted in the Liberal, Conservative, and Restricted (private or banned) subreddits. In total, we collected a corpus of 1.3 million articles. We collected these news articles to understand radicalized communities through shared news articles.
Part 1 has Data 1 (all) and Data 2 (Raw and Labeled Data - Restricted.json).
Part 2 has Data 2 (Raw and Labeled Data - Liberal.json and Conservative.json) and Data 2 (Raw and Unlabeled Data - first 40 of the 76 .json files).
Part 3 has Data 2 (Raw and Unlabeled Data - remaining 36 of the 76 .json files).