According to a survey of adults in the United States in 2025, ** percent of respondents who used Reddit were aged between 18 and 29 years. Reddit usage tends to vary with age, with older users reporting lower levels of engagement.
Reddit engagement in numbers
Reddit is one of the most popular websites in the forum category, allowing users to interact in multiple close-knit communities organized in sub-threads and divided by topic. In March 2024, Reddit.com registered an average of *** billion monthly visits from desktop and mobile combined. Reddit users are mostly based in North America, with the United States accounting for the biggest share of traffic worldwide by far.
The future of Reddit
Reddit was created in 2005 and was redesigned for the very first time in 2018 to make it more appealing to new users and increase engagement from non-participating guests (jokingly called “lurkers”) who nonetheless enjoy the content. In February 2024, the company announced it was entering the public market by releasing its S-1 registration statement. In 2025, the company generated around *** billion U.S. dollars worldwide in revenues. This translated into an average revenue per user (ARPU) of around **** dollars in the last quarter of 2024.
In the six months ending March 2024, the United States accounted for 48.46 percent of traffic to the online forum Reddit.com. The United Kingdom ranked second, accounting for 7.16 percent of web visits to the social media platform.
Reddit in the United States
In August 2023, Reddit accounted for slightly over 1.6 percent of social media website traffic in the United States. Founded in 2005, Reddit is a discussion website that lets users aggregate news by posting links, which other users can vote and comment on. There are thousands of subforums, called subreddits, covering a wide range of topics. One of the most popular subreddits is the AMA (“Ask Me Anything”) subreddit, where celebrities, public figures or people in unique positions post threads that allow other Reddit users to ask them anything. In 2022, Nicolas Cage's AMA post generated over 238.5 thousand upvotes, making it the most popular AMA of the year.
Reddit users in the United States
Reddit use in the United States is more prevalent among younger online audiences. A February 2021 survey found that 36 percent of internet users aged 18 to 29 years and 22 percent of users aged 30 to 49 years used Reddit. However, the reach of the social platform declines strongly with age. Also, whilst around 23 percent of male adults in the U.S. access Reddit, only 12 percent of women do the same.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
In a second attempt to be one of the first inductees of the startup incubator Y Combinator, University of Virginia roommates Steve Huffman and Alexis Ohanian built Reddit, which they labelled “the...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Discover key Reddit statistics, including user growth, subreddit activity, engagement rates, content trends, and traffic insights!
Comparing the *** selected regions by number of Reddit users, the United States leads the ranking (****** million users), followed by the United Kingdom with ***** million users. At the other end of the spectrum is Gabon with **** million users, a difference of ****** million users to the United States. User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count a person with multiple accounts only once. Reddit users encompass both users who are logged in and those who are not. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic and technological environment in up to *** countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
According to a survey of internet users conducted in the United States in 2025, ** percent of Black Americans reported having ever used Reddit. Asian Americans appeared more likely than both Black and White Americans to have ever used the social media platform and community forum, with ** percent of respondents in that demographic reporting having used it.
https://www.reddit.com/wiki/api
This dataset presents a comprehensive collection of top-performing posts - top 1000 posts of All time - from 50 of Reddit's most popular subreddits by subscriber count, offering a view of the content trends and user engagement patterns across diverse online communities.
The data spans from September 2011 to September 2024, giving a long historical perspective on digital content evolution and user interaction dynamics, with about 50,000 records in the dataset overall.
The dataset comprises 50 separate CSV files, each named after its corresponding subreddit (e.g., "askreddit_top_1000_posts.csv"). This structure facilitates both focused subreddit-specific analysis and broader cross-subreddit comparative studies.
Each CSV file contains the following fields:
- id: Unique post identifier
- title: Post title
- score: Net vote count (upvotes minus downvotes)
- upvote_ratio: Proportion of upvotes to total votes
- num_comments: Comment count
- created_utc: Post creation timestamp
- subreddit: Subreddit name
- subscribers: Subreddit subscriber count at data collection time
- permalink: Reddit URL for the post
- url: External content URL (if applicable)
- domain: Source domain for external links
- num_awards: Count of Reddit awards received
- num_crossposts: Frequency of post sharing to other subreddits
- crosspost_subreddits: List of subreddits where the post was cross-posted
- post_type: Content category (e.g., text, link, image, video)
- is_nsfw: Boolean for Not Safe For Work content
- is_bot: Boolean indicating bot-generated posts
- is_megathread: Boolean for large discussion threads
- body: Full text content for text posts
An initial data quality check was done to ensure data integrity and to gauge whether intensive cleaning might be required.
Observations:
- A minimal amount of cleaning is required, which is left to the user to explore.
- body: The post body sometimes contains extremely long texts with new lines and bullet points. These can look broken when the CSV is opened in Excel, often overflowing onto new lines.
- Exporting the file as xlsx seems to format the text properly. pandas also reads the text without issues, although by default it truncates the characters displayed per field; the column display width can be increased to show everything, but this is not advisable for extremely long texts.
- num_awards: This field may be null for older, archived posts due to data availability constraints.
This dataset was collected using the official Reddit API, in compliance with Reddit's Data API Terms.
The post 'author' (user who made the reddit post) field was dropped during data preprocessing to respect users' privacy and as a standard procedure to remove personally identifiable information.
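The overflow described for the body field is a display issue, not data corruption: a parser that honours quoted fields reads a multi-line body as one value. The following is a minimal sketch using Python's standard csv module; the sample rows and the `preview` helper are hypothetical and only borrow a few of the field names listed above.

```python
import csv
import io

# Hypothetical two-row sample in the shape described above; the second
# record's body spans several lines inside one quoted field.
SAMPLE = (
    "id,title,score,body\n"
    'a1,Short post,120,"One line."\n'
    'b2,Long post,4500,"First line\n- bullet one\n- bullet two"\n'
)


def read_posts(handle):
    """Parse posts, keeping embedded newlines inside quoted fields."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["score"] = int(row["score"])
    return rows


def preview(text, width=40):
    """Collapse whitespace and truncate long bodies for display (hypothetical helper)."""
    flat = " ".join(text.split())
    return flat if len(flat) <= width else flat[: width - 3] + "..."


posts = read_posts(io.StringIO(SAMPLE))
for post in posts:
    print(post["id"], post["score"], preview(post["body"]))
```

For a real file, opening it with `open(path, newline="", encoding="utf-8")` lets the csv module handle embedded line breaks itself; the encoding is an assumption about how the files were exported.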
https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This dataset offers an insightful analysis into one of the most talked-about online communities today: Reddit. Specifically, we are focusing on the funny subreddit, a subsection of the main forum that enjoys the highest engagement across all Reddit users. Not only does this dataset include post titles, scores and other details regarding post creation and engagement; it also includes powerful metrics to measure active community interaction such as comment numbers and timestamps. By diving deep into this data, we can paint a fuller picture in terms of what people find funny in our digital age - how well do certain topics draw responses? How does sentiment change over time? And how can community managers use these insights to grow their platforms and better engage their userbase for lasting success? With this comprehensive dataset at your fingertips, you'll be able to answer each question - and more
Introduction
Welcome to Reddit's Funny Subreddit Kaggle Dataset. With this dataset you can explore and analyze posts from the popular subreddit to gain insights into community engagement, understand engagement trends and learn how people interact with content on different topics. This guide provides further information on how to use the dataset in your data analysis projects.
Important Columns
This dataset contains columns such as title, score, url, comms_num (number of comments), created (date of post), body (content of post) and timestamp. All of these columns are important for understanding user interactions with each post on Reddit's Funny subreddit.
Exploratory Data Analysis
To get a better understanding of user engagement on the subreddit, some initial exploration is necessary. Graphical tools such as histograms or boxplots show how basic values like scores or comment counts are distributed across posts, over time or by another parameter (for example, type of joke).
Dimensionality reduction
For more advanced analytics, consider applying a dimensionality reduction technique such as PCA before the main analysis, so that similar posts can be grouped together and conflicting or irrelevant variables are left out early, before they can cloud later data-driven decisions.
Further Guidance
If you need further help with this dataset, reading up on topics such as text mining, natural language processing and machine learning is highly recommended. These fields explain in detail the steps that can unlock greater value from Reddit's funny subreddit and should give readers and researchers ideas about which approaches to take when analyzing text-based online platforms such as Reddit.
- Analyzing post title length vs. engagement (i.e., score, comments).
- Comparing sentiment of post bodies between posts that have high/low scores and comments.
- Comparing topics within the posts that have high/low scores and comments to look for any differences in content or style of writing based on engagement level
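As a rough illustration of the first idea above (title length versus engagement), the sketch below computes a Pearson correlation by hand. The rows are invented for demonstration and only reuse the column names title, score and comms_num mentioned earlier; real conclusions would need the actual funny.csv data.

```python
import math

# Hypothetical rows using the column names listed above (title, score, comms_num).
posts = [
    {"title": "Cat", "score": 10, "comms_num": 2},
    {"title": "A slightly longer title", "score": 40, "comms_num": 9},
    {"title": "An even longer title about a very funny dog", "score": 55, "comms_num": 12},
    {"title": "Medium title here", "score": 30, "comms_num": 4},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


title_lengths = [len(p["title"]) for p in posts]
r_score = pearson(title_lengths, [p["score"] for p in posts])
r_comments = pearson(title_lengths, [p["comms_num"] for p in posts])
print(f"title length vs score:    r = {r_score:.2f}")
print(f"title length vs comments: r = {r_comments:.2f}")
```

A correlation near zero on the real data would suggest that title length alone says little about engagement; the function raises a division error if either input is constant, which a production version would need to guard against.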
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: funny.csv | Column name | Description | |:--------------|:------------------------...
As of the second quarter of 2025, the majority of Reddit users were male, accounting for ** percent of its audience base. Additionally, most of Reddit's desktop users were based in the United States.
https://www.reddit.com/wiki/api
This dataset aims to build a graph of subreddit links based on how they reference each other. The original database dump can be found here.
- name (str): name of the subreddit.
- type (str): type of the subreddit.
- title (str): title of the subreddit.
- description (str): short description of the subreddit.
- subscribers (int?): number of subscribers at the moment.
- nsfw (bool?): indicator of whether it is flagged as not safe for work 🔞.
- quarantined (bool?): indicator of whether it has been quarantined 😷.
- color (str): key color of the subreddit.
- img_banner (str?): url of the image used as the banner.
- img_icon (str?): url of the image used as the icon (snoo).
- created_at (datetime): utc timestamp of when the subreddit was created.
- updated_at (datetime): utc timestamp of when the information of the subreddit was last updated.
note: the '?' indicates that the value can be null under certain conditions.
| TYPE | AMOUNT |
|---|---|
| TOTAL | 127800 |
| public | 59227 |
| banned | 31473 |
| restricted | 14601 |
| public [nsfw] | 14244 |
| private | 5139 |
| restricted [nsfw] | 3014 |
| public [quarantined] | 29 |
| restricted [quarantined] | 21 |
| archived | 17 |
| premium | 12 |
| public [nsfw] [quarantined] | 11 |
| user [nsfw] | 6 |
| user | 4 |
| restricted [nsfw] [quarantined] | 1 |
| employees | 1 |
- source (str): name of the subreddit where the link was found.
- target (str): name of the linked subreddit.
- type (str): place where the reference from source to target was found.
- updated_at (datetime): utc timestamp of when the information of the link was last updated.
| TYPE | AMOUNT |
|---|---|
| TOTAL | 349744 |
| wiki | 214206 |
| sidebar | 123650 |
| topbar | 7291 |
| description | 4597 |
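The link records described above map naturally onto a directed graph. The sketch below is one possible way to build it with plain dictionaries; the sample records are hypothetical and only use the source, target and type fields from the description.

```python
from collections import defaultdict

# Hypothetical link records using the fields described above (source, target, type).
links = [
    {"source": "askscience", "target": "science", "type": "sidebar"},
    {"source": "askscience", "target": "physics", "type": "wiki"},
    {"source": "science", "target": "askscience", "type": "sidebar"},
    {"source": "physics", "target": "science", "type": "wiki"},
]


def build_graph(records):
    """Return a directed adjacency map: source -> set of targets."""
    graph = defaultdict(set)
    for rec in records:
        graph[rec["source"]].add(rec["target"])
    return graph


def in_degree(graph):
    """Count how many distinct subreddits link to each target."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


graph = build_graph(links)
referenced = in_degree(graph)
# Most-referenced subreddits first, ties broken alphabetically.
print(sorted(referenced.items(), key=lambda kv: (-kv[1], kv[0])))
```

Using sets collapses repeated references between the same pair of subreddits (for example one in the wiki and one in the sidebar) into a single edge; keeping the type field on each edge would be a reasonable alternative if the reference location matters.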
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For a moment, starting April 1st, 2022, Reddit became the talk of the entire internet as they reopened /r/Place - a collaborative pixel canvas stemming all the way from 2017.
Join us in following the recorded history of /r/Place and its related discussion, from the first pixels placed to its inevitable erasure.
This dataset contains all the posts and comments made on /r/Place from the first of January, 2022, to the fourth of April, 2022.
The dataset was procured using SocialGrep.
To preserve users' anonymity and to prevent targeted harassment, the data does not include usernames.
We would like to thank the Reddit user /u/prosto_sanja for recording the timelapse data. We would also like to thank all Redditors who participated in the canvas. Here's to another good April Fools.
The place raises many questions, some of them surprisingly serious.
Which artpieces were most frequently accused of botting? Do the accusations match their behaviour - can we infer a "botted" picture somehow, even without looking at its users' Reddit histories?
The "amogus" is one of the smallest units of individual contribution, surprisingly resilient due to its cheeky, non-intrusive nature. How did their numbers change over time - does this match any shift in the players' moods as they feel more empowered or disenfranchised to create together?
Which areas ended up the most talked about?
During the fourth quarter of 2025, online forum and news aggregator Reddit saw approximately 121.4 million daily active users (DAU) engaging with its platform. Of these, over 70.7 million were logged out, accessing the platform's content without signing in to a registered account. This represents an increase compared to the previous quarter, when Reddit saw 65.8 million logged-out DAU.
As of 2025, 29 percent of male respondents in the United States stated that they used Reddit, compared to 23 percent of female respondents. Reddit is a social networking and online forum company. The platform is organized in thematic groups, called subreddits.
https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This dataset provides an in-depth look into learning what communities find important and engaging in the news. With this data, researchers can discover trends related to user engagement and popular topics within subreddits. By examining the “score” and “comms_num” columns, our researchers will be able to pinpoint which topics are most liked, discussed or shared within the various subreddits. Researchers may also gain insights into not only how popular a topic is but how it is growing over time. Additionally, by exploring the body column of our dataset, researchers can understand more about which types of news stories drive conversation within particular subreddits—providing an opportunity for deeper analysis of that subreddit’s diverse community dynamics
The dataset includes eight columns (title, score, id, url, comms_num, created, body and timestamp), which can help identify key insights into user engagement among popular subreddits. With this data we may also determine relationships between topics of discussion and their impact on user engagement, building a better understanding of issue-based conversations online and uncovering emerging trends in online news consumption habits.
This dataset is useful for those who are looking to gain insight into the popularity and user engagement of specific subreddits. The data includes 8 different columns including title, score, id, url, comms_num, created, body and timestamp. This can provide valuable information about how users view and interact with particular topics across various subreddits.
In this guide we’ll look at how you can use this dataset to uncover trends in user engagement on topics within specific subreddits as well as measure the overall popularity of these topics within a subreddit.
1) Analyzing Score: By analyzing the “score” column you can determine which news stories are popular in a particular subreddit and which ones aren't by looking at how many upvotes each story has received. With this data you will be able to determine trends in what types of stories users preferred within a particular subreddit over time.
2) Analyzing Comms_Num: As with the score column, you can analyze the “comms_num” column to see which news stories drew more engagement by tracking the number of comments on each post. This can show which types of stories tend to draw more comment activity in certain subreddits, whether on a single day or over an extended period, for example by tracking post activity across multiple weeks or months.
3) Analyzing Body: By looking at the “body” column for each post, researchers can better understand which kinds of topics and news draw attention in specific Reddit communities. With that complete picture, researchers have access not only to data measuring Reddit buzz but also to the discussion text itself, which helps explain why certain posts are popular or receive more comments than others.
Overall, this dataset provides valuable insights into user engagement with topics trending across subreddits, giving anyone researching them easy access to those insights in one place.
- Grouping news topics within particular subreddits and assessing the overall popularity of those topics in terms of scores/user engagement.
- Correlating user engagement with certain news topics to understand how they influence discussion or reactions on a subreddit.
- Examining the potential correlation between score and the actual body content of a given post to assess what types of content are most successful in gaining interest from users and creating positive engagement for posts
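The idea of tracking comment activity over weeks can be sketched as follows. The rows are invented, and the `created` column is assumed to hold Unix epoch seconds in UTC, which the description above does not confirm.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical rows; `created` is ASSUMED to be Unix epoch seconds (UTC).
posts = [
    {"title": "Story A", "score": 120, "comms_num": 30, "created": 1609459200},  # 2021-01-01
    {"title": "Story B", "score": 80, "comms_num": 55, "created": 1609545600},   # 2021-01-02
    {"title": "Story C", "score": 300, "comms_num": 10, "created": 1610236800},  # 2021-01-10
]


def weekly_comments(rows):
    """Sum comms_num per (ISO year, ISO week) of the post's creation time."""
    totals = defaultdict(int)
    for row in rows:
        moment = datetime.fromtimestamp(row["created"], tz=timezone.utc)
        iso = moment.isocalendar()
        totals[(iso[0], iso[1])] += row["comms_num"]
    return dict(totals)


# The highest-scoring story and the most-discussed story need not coincide.
top_by_score = max(posts, key=lambda p: p["score"])["title"]
top_by_comments = max(posts, key=lambda p: p["comms_num"])["title"]
weekly = weekly_comments(posts)
print(top_by_score, top_by_comments, weekly)
```

Grouping by ISO week keeps week boundaries consistent across year ends, which is why the first two sample posts fall into week 53 of 2020 rather than week 1 of 2021.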
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: news.csv | Column name | Description ...
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.
Dataset Overview:
This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.
2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:
- Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
- Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
- Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.
Sourced Directly from Reddit:
All data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.
Key Features:
Use Cases:
Data Quality and Reliability:
The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.
Integration and Usability:
The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.
User-Friendly Structure and Metadata:
The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.
Ideal For:
This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
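The two processing steps named above, URL extraction by regex and SHA-256 hashing of IDs, might look roughly like the sketch below. The pattern is a simplified illustration and not the authors' actual expression, the sample ID is hypothetical, and whether the real pipeline adds a salt before hashing is not stated in the source.

```python
import hashlib
import re

# Simplified pattern for illustration only; the authors' actual
# expressions are not given in the source.
WIKI_URL = re.compile(r"https?://([a-z\-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s)\]>]+)")


def extract_wikipedia_links(text):
    """Return (language_subdomain, article_title) pairs found in text."""
    return [(m.group(1), m.group(2)) for m in WIKI_URL.finditer(text)]


def anonymize(reddit_id):
    """Hash an identifier with SHA-256 (unsalted here, which is an assumption)."""
    return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()


comment = (
    "See https://en.wikipedia.org/wiki/Reddit and the mobile page "
    "(https://de.m.wikipedia.org/wiki/Wikipedia) for details."
)
links = extract_wikipedia_links(comment)
hashed = anonymize("t1_abc123")  # hypothetical comment ID
print(links)
print(hashed[:16], "...")
```

This pattern keeps trailing punctuation such as a full stop if it directly follows a URL, so a real pipeline would need additional cleanup and the redirect resolution the description mentions.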
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia. Our dataset can help extend that analysis into the disparities in what types of external communities Wikipedia is used in, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine if homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts
| Column Name | Type | Description |
|---|---|---|
| subreddit_id | TEXT | The unique identifier for the subreddit. |
| crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
| post_id | TEXT | Unique identifier for the Reddit post. |
| created_at | TIMESTAMP | The timestamp when the post was created. |
| updated_at | TIMESTAMP | The timestamp when the post was last updated. |
| language_code | TEXT | The language code of the post. |
| score | INTEGER | The score (upvotes minus downvotes) of the post. |
| upvote_ratio | REAL | The ratio of upvotes to total votes. |
| gildings | INTEGER | Number of awards (gildings) received by the post. |
| num_comments | INTEGER | Number of comments on the post. |
comments
| Column Name | Type | Description |
|---|---|---|
| subreddit_id | TEXT | The unique identifier for the subreddit. |
| post_id | TEXT | The ID of the Reddit post the comment belongs to. |
| parent_id | TEXT | The ID of the parent comment (if a reply). |
| comment_id | TEXT | Unique identifier for the comment. |
| created_at | TIMESTAMP | The timestamp when the comment was created. |
| last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
| score | INTEGER | The score (upvotes minus downvotes) of the comment. |
| upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
| gilded | INTEGER | Number of awards (gildings) received by the comment. |
postlinks
| Column Name | Type | Description |
|---|---|---|
| post_id | TEXT | Unique identifier for the Reddit post. |
| end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
| end_processed_url | TEXT | The extracted URL from the Reddit post. |
| final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
| final_status | INTEGER | HTTP status code of the final URL. |
| final_url | TEXT | The final URL after redirections. |
| redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
| in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |
commentlinks
| Column Name | Type | Description |
|---|---|---|
| comment_id | TEXT | Unique identifier for the Reddit comment. |
| end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
| end_processed_url | TEXT | The extracted URL from the comment. |
| final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
| final_status | INTEGER | HTTP status code of the final |
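Because the data ships as an SQL database, the posts and postlinks tables can be joined on post_id. The sketch below recreates a small subset of the documented columns in an in-memory SQLite database purely for illustration; the rows are invented, the real identifiers are SHA-256 hashes, and the actual database engine and full schema may differ.

```python
import sqlite3

# In-memory stand-in with a SUBSET of the documented columns; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (post_id TEXT, subreddit_id TEXT, score INTEGER, num_comments INTEGER);
    CREATE TABLE postlinks (post_id TEXT, final_url TEXT, final_valid INTEGER, in_title INTEGER);
    """
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [("p1", "s1", 150, 12), ("p2", "s1", 20, 3), ("p3", "s2", 75, 40)],
)
conn.executemany(
    "INSERT INTO postlinks VALUES (?, ?, ?, ?)",
    [
        ("p1", "https://en.wikipedia.org/wiki/Reddit", 1, 0),
        ("p2", "https://en.wikipedia.org/wiki/Missing", 0, 0),
        ("p3", "https://en.wikipedia.org/wiki/Wikipedia", 1, 1),
    ],
)

# Posts whose Wikipedia link resolved to a valid final URL, highest score first.
rows = conn.execute(
    """
    SELECT p.post_id, p.score, l.final_url
    FROM posts AS p
    JOIN postlinks AS l ON l.post_id = p.post_id
    WHERE l.final_valid = 1
    ORDER BY p.score DESC
    """
).fetchall()
for row in rows:
    print(row)
```

The same join pattern should apply to comments and commentlinks via comment_id.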
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Extracting Insights from Online Discussions
Reddit is one of the largest social discussion platforms, making it a valuable source for real-time opinions, trends, sentiment analysis, and user interactions across various industries. Scraping Reddit data allows businesses, researchers, and analysts to explore public discussions, track sentiment, and gain actionable insights from user-generated content. Benefits and Impact: Trend […]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the posting preferences for over 850,000 active reddit users. This sample was taken in mid-2013. This data was used to generate the interactive visualization, "redditviz," and will be analyzed in detail in an upcoming research article. Please cite our paper "Navigating the massive world of reddit" if you use this data in your work. URL: http://arxiv.org/abs/1312.3387 The file is organized as follows: Each line is an entry for an anonymous user. Each user was randomly assigned a unique ID, which is what shows in the first entry of each line. Following the user ID, separated by commas, are the subreddits (i.e., interests) that the user regularly posts in. In order for a user to be considered "active" in that subreddit, they had to post or comment there at least 10 times in their last 1,000 posts and comments.
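A file in the layout described above (one user per line, an ID followed by comma-separated subreddits) can be parsed with a few lines of Python. The sample lines and the `is_active` helper are hypothetical; the helper only restates the stated rule of at least 10 of the last 1,000 posts and comments.

```python
from collections import Counter

# Hypothetical lines in the described format: user ID, then subreddits.
SAMPLE_LINES = [
    "1,askscience,funny,news",
    "2,funny",
    "3,news,funny,place",
]


def parse_line(line):
    """Split one line into (user_id, list of subreddits)."""
    user_id, *subreddits = line.strip().split(",")
    return user_id, subreddits


def is_active(times_in_subreddit, threshold=10):
    """Activity rule from the description: at least 10 of the last 1,000 items."""
    return times_in_subreddit >= threshold


interests = dict(parse_line(line) for line in SAMPLE_LINES)
popularity = Counter(sub for subs in interests.values() for sub in subs)
print(popularity.most_common(2))
```

For the full file, iterating over the open file handle line by line in the same way avoids loading all 850,000 entries into memory at once.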
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reddit is a social news, content rating, and discussion website, and one of the most popular sites on the internet. It has 52 million daily active users and approximately 430 million monthly users. Reddit is organized into subreddits; this dataset uses the r/AskScience subreddit.
The dataset is extracted from the r/AskScience subreddit. The data was collected between 01-01-2016 and 20-05-2022 and contains 612,668 data points and 25 columns. It includes information about the questions asked on the subreddit, such as the description of the submission, the flair of the question, the NSFW or SFW status, and the year of the submission. The data was extracted using Python and Pushshift's API, with light cleaning done using NumPy and pandas (see the descriptions of individual columns below).
The dataset contains the following columns and descriptions:
author - Redditor name.
author_fullname - Redditor full name.
contest_mode - Contest mode [implement obscured scores and randomized sorting].
created_utc - Time the submission was created, represented in Unix time.
domain - Domain of the submission.
edited - Whether or not the post has been edited.
full_link - Link to the post on the subreddit.
id - ID of the submission.
is_self - Whether or not the submission is a self post (text-only).
link_flair_css_class - CSS class used to identify the flair.
link_flair_text - The link flair's text content.
locked - Whether or not the submission has been locked.
num_comments - The number of comments on the submission.
over_18 - Whether or not the submission has been marked as NSFW.
permalink - A permalink for the submission.
retrieved_on - Time ingested.
score - The number of upvotes for the submission.
description - Description of the submission.
spoiler - Whether or not the submission has been marked as a spoiler.
stickied - Whether or not the submission is stickied.
thumbnail - Thumbnail of the submission.
question - Question asked in the submission.
url - The URL the submission links to, or the permalink if a self post.
year - Year of the submission.
banned - Whether or not the submission was banned by a moderator.
This dataset can be used for flair prediction, NSFW classification, and other text mining/NLP tasks. Exploratory data analysis can also reveal trends and patterns over the years.
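As a starting point for that kind of exploratory analysis, the sketch below counts submissions per year and flairs among SFW posts using only the Python standard library. It is an illustration under stated assumptions: the sample rows are hypothetical, only a subset of the documented columns is used, and booleans are assumed to be stored as the strings `True`/`False` in a CSV export.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical rows using a subset of the documented columns (not real data).
SAMPLE_CSV = """id,created_utc,link_flair_text,over_18,score
a1,1451606400,Physics,False,12
a2,1483228800,Biology,False,5
a3,1483315200,Physics,True,7
"""


def load_rows(text):
    """Parse CSV text and convert a few columns to native Python types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # created_utc is documented as Unix time; convert it to an aware datetime.
        row["created"] = datetime.fromtimestamp(int(row["created_utc"]), tz=timezone.utc)
        # Assumption: booleans are serialized as "True"/"False" strings.
        row["over_18"] = row["over_18"].strip().lower() == "true"
        row["score"] = int(row["score"])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)

# Submissions per year, derived from created_utc.
posts_per_year = Counter(row["created"].year for row in rows)

# Flair distribution among submissions not marked NSFW.
sfw_flairs = Counter(row["link_flair_text"] for row in rows if not row["over_18"])

print(dict(posts_per_year))
print(dict(sfw_flairs))
```

For the full 612,668-row dataset, a dataframe library such as pandas would likely be more convenient; the standard library is used here only to keep the sketch dependency-free.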
The global number of Reddit users was forecast to increase continuously between 2024 and 2028 by a total of 52.1 million users (+10.33 percent). After the ninth consecutive year of growth, the Reddit user base is estimated to reach 556.59 million users in 2028, a new peak. Notably, the number of Reddit users has increased continuously over the past years.
User figures, shown here for the platform Reddit, have been estimated by taking into account company filings or press material, secondary research, app downloads, and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. Reddit users encompass both users who are logged in and those who are not.
The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
Find more key insights for the number of Reddit users in regions like Asia and Europe.