100+ datasets found
  1. Reddit Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Jan 11, 2023
    Cite
    Bright Data (2023). Reddit Datasets [Dataset]. https://brightdata.com/products/datasets/reddit
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Jan 11, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Access our extensive Reddit datasets that provide detailed information on posts, communities (subreddits), and user engagement. Gain insights into post performance, user comments, community statistics, and content trends with our ethically sourced data. Free samples are available for evaluation.

    • 3M+ records available
    • Price starts at $250/100K records
    • Data formats are available in JSON, NDJSON, CSV, XLSX and Parquet
    • 100% ethical and compliant data collection

    Included datapoints:

    • Post ID, Title & URL
    • Post Description & Date
    • Username of Poster
    • Upvotes & Comment Count
    • Community Name, URL & Description
    • Community Member Count
    • Attached Photos & Videos
    • Full Post Comments
    • Related Posts
    • Post Karma
    • Post Tags
    • And more

  2. U.S. number of logged-in Reddit users 2017-2025

    • statista.com
    Updated Jun 24, 2025
    Cite
    Statista (2025). U.S. number of logged-in Reddit users 2017-2025 [Dataset]. https://www.statista.com/statistics/1018997/number-of-reddit-users-usa/
    Explore at:
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    This statistic shows the number of logged-in Reddit users in the United States from 2017 to 2025. In 2020, it was reported that ** million users accessed their account on the social news aggregation website. This figure is projected to rise to ** million users in 2025.

  3. Reddit Statistics 2025: Traffic, Users, and More

    • sqmagazine.co.uk
    Updated Jul 22, 2025
    Cite
    SQ Magazine (2025). Reddit Statistics 2025: Traffic, Users, and More [Dataset]. https://sqmagazine.co.uk/reddit-statistics/
    Explore at:
    Dataset updated
    Jul 22, 2025
    Dataset authored and provided by
    SQ Magazine
    License

    https://sqmagazine.co.uk/privacy-policy/

    Time period covered
    Jan 1, 2024 - Dec 31, 2025
    Area covered
    Global
    Description

    It started in a Harvard dorm room in 2005. What began as a simple idea for "the front page of the internet" has now transformed into one of the most active and uniquely engaging platforms on the web. Reddit has evolved far beyond its early roots as a niche forum...

  4. Total global visitor traffic to Reddit.com 2024

    • statista.com
    Updated Aug 20, 2025
    Cite
    Statista (2025). Total global visitor traffic to Reddit.com 2024 [Dataset]. https://www.statista.com/statistics/443332/reddit-monthly-visitors/
    Explore at:
    Dataset updated
    Aug 20, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2023 - Mar 2024
    Area covered
    Worldwide
    Description

    Reddit is a web traffic powerhouse: in March 2024, approximately 2.2 billion visits to the online forum were measured, making it one of the most-visited websites online.

    The front page of the internet

    Formerly known as “the front page of the internet”, Reddit is an online forum platform with over 130,000 sub-forums and communities. The platform allows registered users, called Redditors, to post content. Each post is open to the entire Reddit community to vote on, with either upvotes or downvotes. The most popular posts are featured directly on the front page. Subreddits are available by category, and Redditors can follow selected subreddits relevant to their interests and control what content they see on their custom front page. Some of the most popular subreddits are r/AskReddit and r/AMA, the “Ask Me Anything” format. According to the company, Reddit hosted 1,800 AMAs in 2018, with a wide range of topics and hosts. One of the most popular Reddit AMAs of 2022 by number of upvotes was by actor Nicolas Cage, with more than 238.5 thousand upvotes.

    Reddit usage

    The United States accounts for the biggest share of Reddit's desktop traffic, followed by the UK and Canada. As of March 2023, Reddit ranked among the most popular social media websites in the United States.

  5. Reddit Users Statistics and Facts

    • market.biz
    Updated Sep 4, 2025
    Cite
    Market.biz (2025). Reddit Users Statistics and Facts [Dataset]. https://market.biz/reddit-users-statistics/
    Explore at:
    Dataset updated
    Sep 4, 2025
    Dataset provided by
    Market.biz
    License

    https://market.biz/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Asia, South America, North America, Africa, Europe, Australia
    Description

    Introduction

    Reddit Users Statistics: In 2024, the platform exceeded 1 billion monthly active users, surpassing Snapchat’s 800 million and Twitter’s 335 million. Following its IPO this year, Reddit is already demonstrating significant progress. In 2025, Reddit attracted 1.1 billion monthly users and recorded 3.80 billion visits in the U.S. in January alone.

    With an annual revenue of $1.3 billion, it serves as a global center for niche communities, humor, and affluent users. Reddit has established itself as a prominent social media platform that appeals to a diverse array of demographics. The platform fosters enthusiastic communities across a multitude of interests, markets, and target audiences.

    However, in contrast to other social media platforms, where engagement may be more direct, Reddit’s community-centric culture necessitates that brands earn the trust of users by delivering genuine value.

  6. Reddit Advertising Statistics and Facts

    • market.biz
    Updated Sep 3, 2025
    Cite
    Market.biz (2025). Reddit Advertising Statistics and Facts [Dataset]. https://market.biz/reddit-advertising-statistics/
    Explore at:
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    Market.biz
    License

    https://market.biz/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    South America, Europe, Australia, North America, Asia, Africa
    Description

    Introduction

    Reddit Advertising Statistics: Reddit is a social sharing platform that revolves around users submitting text, images, and videos, which can be voted on by others. Content that gains popularity ascends to the top, whereas content that receives downvotes (i.e., less favored) becomes less prominent. Reddit advertisements can facilitate various actions to achieve your objectives.

    To assess how effectively your Reddit ads are generating purchases and sales, it is advisable to share conversion data via the Reddit Pixel or Conversions API. In the United States alone, Reddit garnered approximately 3.31 billion visits in January 2025, highlighting its substantial presence in the digital landscape.

    With an annual revenue of $1.3 billion in 2024, predominantly from advertising, Reddit has firmly established itself among the leading social media platforms.

  7. Reddit: content created H2 2024

    • statista.com
    Updated Jul 23, 2025
    Cite
    Statista (2025). Reddit: content created H2 2024 [Dataset]. https://www.statista.com/statistics/1319008/reddit-content-created/
    Explore at:
    Dataset updated
    Jul 23, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jul 2024 - Dec 2024
    Area covered
    Worldwide
    Description

    In the second half of 2024, a total of 5.96 billion pieces of content were created on Reddit. Of these, over 1.7 billion were comments left by registered users under posted content. Over 3.4 billion chats were exchanged during the examined period, while private messages on the platform had a volume of approximately 522 million pieces of content.

  8. Reddit Statistics 2025

    • podbase.com
    html
    Updated Aug 28, 2025
    Cite
    Podbase (2025). Reddit Statistics 2025 [Dataset]. https://www.podbase.com/blogs/reddit-statistics
    Explore at:
    Available download formats: html
    Dataset updated
    Aug 28, 2025
    Dataset authored and provided by
    Podbase
    Variables measured
    100,000+ active Reddit communities, r/funny has about 66 million members, 22+ billion Reddit posts and comments, 46% of U.S. Reddit users are ages 18-29, 59.8% male share of Reddit users (late 2024), 108.1 million daily active Reddit users in Q1 2025, 2.2 billion monthly Reddit visitors in January 2025, Reddit ads reach nearly 11% of global internet users, U.S. mobile users spend 25-30 minutes per day on Reddit
    Measurement technique
    Platform-reported usage metrics, Industry reporting, Third-party analytics
    Description

    Key Reddit statistics for 2025 tailored to print-on-demand and ecommerce founders, including usage, demographics, ad reach, and engagement.

  9. Distribution of Reddit.com traffic 2024, by country

    • statista.com
    Updated Aug 20, 2025
    Cite
    Statista (2025). Distribution of Reddit.com traffic 2024, by country [Dataset]. https://www.statista.com/statistics/325144/reddit-global-active-user-distribution/
    Explore at:
    Dataset updated
    Aug 20, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the six months ending March 2024, the United States accounted for 48.46 percent of traffic to the online forum Reddit.com. The United Kingdom ranked second, accounting for 7.16 percent of web visits to the social media platform.

    Reddit in the United States

    In August 2023, Reddit accounted for slightly over 1.6 percent of social media website traffic in the United States. Founded in 2005, Reddit is a discussion website that enables users to aggregate news by posting links and lets other users vote and comment on them. There are thousands of subforums, called subreddits, on a wide range of topics. One of the most popular subreddits is the AMA (“Ask Me Anything”), where celebrities, public figures, or people in unique positions post threads that allow other Reddit users to ask them anything. In 2022, Nicolas Cage's AMA post generated over 238.5 thousand upvotes, making it the most popular AMA of the year.

    Reddit users in the United States

    Reddit use in the United States is more prevalent among younger online audiences. A February 2021 survey found that 36 percent of internet users aged 18 to 29 years and 22 percent of users aged 30 to 49 years used Reddit. However, the reach of the social platform declines strongly with age. Also, whilst around 23 percent of male adults in the U.S. access Reddit, only 12 percent of women do the same.

  10. Reddit Statistics and Facts

    • market.biz
    Updated Aug 4, 2025
    Cite
    Market.biz (2025). Reddit Statistics and Facts [Dataset]. https://market.biz/reddit-statistics/
    Explore at:
    Dataset updated
    Aug 4, 2025
    Dataset provided by
    Market.biz
    License

    https://market.biz/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    Europe, North America, Africa, Australia, South America, Asia
    Description

    Introduction

    Reddit Statistics: According to Similarweb, as of 2025, Reddit has approximately 3.793 billion monthly active users, highlighting its continuously increasing popularity. In the United States alone, Reddit had around 3.31 billion visits in January 2025, highlighting its substantial influence in the digital landscape.

    With an annual revenue of $1.3 billion in 2024, primarily resulting from advertising, Reddit has established itself as one of the leading social media platforms. A year following its launch, Reddit was acquired by Conde Nast Publications for $20 million.

    Then, Conde Nast's parent company, Advance Publications, transitioned Reddit to its own subsidiary a few years later. It continues to be the largest shareholder, although a recent investment from Tencent may have reduced its ownership percentage.

  11. Reddit Datasets

    • promptcloud.com
    csv
    Updated Mar 28, 2025
    Cite
    PromptCloud (2025). Reddit Datasets [Dataset]. https://www.promptcloud.com/dataset/reddit/
    Explore at:
    Available download formats: csv
    Dataset updated
    Mar 28, 2025
    Dataset authored and provided by
    PromptCloud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Extracting Insights from Online Discussions

    Reddit is one of the largest social discussion platforms, making it a valuable source for real-time opinions, trends, sentiment analysis, and user interactions across various industries. Scraping Reddit data allows businesses, researchers, and analysts to explore public discussions, track sentiment, and gain actionable insights from user-generated content. Benefits and Impact: Trend […]

  12. Data from: Hybrid Approaches to Detect Comments Violating Macro Norms on...

    • zenodo.org
    csv
    Updated Jan 24, 2020
    Cite
    Eshwar Chandrasekharan; Mattia Samory; Eric Gilbert (2020). Hybrid Approaches to Detect Comments Violating Macro Norms on Reddit [Dataset]. http://doi.org/10.5281/zenodo.3338698
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eshwar Chandrasekharan; Mattia Samory; Eric Gilbert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    [Content warning: Files may contain instances of highly inflammatory and offensive content.]


    This dataset was generated as an extension of our CSCW 2018 paper:

    Eshwar Chandrasekharan, Mattia Samory, Shagun Jhaver, Hunter Charvat, Amy Bruckman, Cliff Lampe, Jacob Eisenstein, and Eric Gilbert. 2018. The Internet’s Hidden Rules: An Empirical Study of Reddit Norm Violations at Micro, Meso, and Macro Scales. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 32.

    Description:

    Working with over 2M removed comments collected from 100 different communities on Reddit (subreddit names listed in data/study-subreddits.csv), we identified 8 macro norms, i.e., norms that are widely enforced on most parts of Reddit. We extracted these macro norms by employing a hybrid approach (classification, topic modeling, and open-coding) on comments identified to be norm violations within at least 85 out of the 100 study subreddits. Finally, we labelled over 40K Reddit comments removed by moderators according to the specific type of macro norm being violated, and make this dataset publicly available (also available on GitHub).

    For each of the labeled topics, we identified the top 5000 removed comments that were best fit by the LDA topic model. In this way, we identified over 5000 removed comments that are examples of each type of macro norm violation described in the paper. The removed comments were sorted by their topic fit, stored into respective files based on the type of norm violation they represent, and are made available on this repo.

    Here we make the following datasets publicly available:

    * 1 file containing the log of over 2M removed comments obtained from the top 100 subreddits between May 2016 and March 2017, after filtering out the following comments: 1) comments by u/AutoModerator, 2) replies to removed comments (i.e., children of the poisoned tree; refer to the paper for more information), and 3) non-readable comments (not UTF-8 encoded).

    * 8 files, each containing 5000+ removed comments obtained from Reddit, are stored in data/macro-norm-violations/ and are split into different files based on the macro norm they violated. Each line in the files represents a comment that was posted on Reddit between May 2016 and March 2017 and subsequently removed by subreddit moderators for violating community norms. All comments were preprocessed using the script in code/preprocessing-reddit-comments.py in order to: 1. remove new lines, 2. convert text to lowercase, and 3. strip numbers and punctuation from comments.
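
    The three preprocessing steps named in this entry (remove new lines, lowercase, strip numbers and punctuation) could be sketched roughly as follows. This is an illustration only, not the authors' code/preprocessing-reddit-comments.py script: the function name preprocess_comment is hypothetical, and joining lines with a space and collapsing leftover whitespace are assumptions.

```python
import re
import string

# Hypothetical helper; not taken from the dataset's own script.
# Matches any ASCII punctuation character or any digit.
_STRIP = re.compile("[" + re.escape(string.punctuation) + r"\d]")


def preprocess_comment(text: str) -> str:
    """Apply the three described steps to a single comment."""
    # 1. Remove new lines (joined with a space so words do not fuse).
    text = " ".join(text.splitlines())
    # 2. Convert text to lowercase.
    text = text.lower()
    # 3. Strip numbers and punctuation.
    text = _STRIP.sub("", text)
    # Collapse whitespace left behind by the removals (an assumption).
    return " ".join(text.split())


cleaned = preprocess_comment("Hello,\nWorld! 123 times")
```

    Applying the same function to every row of one of the CSV files would yield text comparable in form, though not necessarily identical, to the released files.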

    Description of 1 file containing over 2M removed comments from 100 subreddits.

    • "reddit-removal-log.csv" - all comments that were removed from the 100 study subreddits during the study period described above (post-filtering).

    Descriptions of each file containing 5059 comments (that were removed from Reddit, and preprocessed) violating macro norms present in data/macro-norm-violations/:

    • "macro-norm-violations-n10-t0-misogynistic-slurs.csv" - Comments that use misogynistic slurs.
    • "macro-norm-violations-n15-t2-hatespeech-racist-homophobic.csv" - Comments containing hate speech that is racist or homophobic.
    • "macro-norm-violations-n10-t3-opposing-political-views-trump.csv", "macro-norm-violations-n15-t10-opposing-political-views-trump.csv" - Comments with opposing political views around Trump (depends on originating sub).
    • "macro-norm-violations-n10-t4-verbal-attacks-on-Reddit.csv" - Comments containing verbal attacks on Reddit or specific subreddits.
    • "macro-norm-violations-n10-t5-porno-links.csv" - Comments with pornographic links.
    • "macro-norm-violations-n10-t8-personal-attacks.csv", "macro-norm-violations-n10-t9-personal-attacks.csv"- Comments containing personal attacks.
    • "macro-norm-violations-n15-t3-abusing-and-criticisizing-mods.csv" - Comments abusing and criticizing moderators.
    • "macro-norm-violations-n15-t9-namecalling-claiming-other-too-sensitive.csv" - Comments with name-calling, or claiming that the other person is too sensitive.

    More details about the dataset can be found on arXiv: https://arxiv.org/abs/1904.03596

  13. Reddit: quarterly number of DAUs 2021-2025, by region

    • statista.com
    Updated Jul 28, 2025
    Cite
    Statista (2025). Reddit: quarterly number of DAUs 2021-2025, by region [Dataset]. https://www.statista.com/statistics/1453149/reddit-quarterly-dau-by-region/
    Explore at:
    Dataset updated
    Jul 28, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States, Worldwide
    Description

    During the first quarter of 2025, Reddit saw approximately 50 million daily active users (DAU) in the United States. In comparison, 58 million online users in the rest of the global market engaged with Reddit content daily in the last examined quarter. Between the beginning of 2021 and the end of 2024, the number of global daily active users engaging with Reddit experienced a moderate but constant increase.

  14. Reddit Mental Health Dataset

    • zenodo.org
    csv
    Updated Oct 16, 2020
    Cite
    Daniel M. Low; Laurie Rumker; Tanya Talkar; John Torous; Guillermo Cecchi; Satrajit S. Ghosh (2020). Reddit Mental Health Dataset [Dataset]. http://doi.org/10.17605/osf.io/7peyq
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 16, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniel M. Low; Laurie Rumker; Tanya Talkar; John Torous; Guillermo Cecchi; Satrajit S. Ghosh
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    This dataset contains posts from 28 subreddits (15 mental health support groups) from 2018-2020. We used this dataset to understand the impact of COVID-19 on mental health support groups from January to April, 2020 and included older timeframes to obtain baseline posts before COVID-19.

    Please cite if you use this dataset:

    Low, D. M., Rumker, L., Torous, J., Cecchi, G., Ghosh, S. S., & Talkar, T. (2020). Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. Journal of medical Internet research, 22(10), e22635.

    @article{low2020natural,
     title={Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study},
     author={Low, Daniel M and Rumker, Laurie and Torous, John and Cecchi, Guillermo and Ghosh, Satrajit S and Talkar, Tanya},
     journal={Journal of medical Internet research},
     volume={22},
     number={10},
     pages={e22635},
     year={2020},
     publisher={JMIR Publications Inc., Toronto, Canada}
    }


    License

    This dataset is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/

    It was downloaded using the Pushshift API. Re-use of this data is subject to Reddit API terms.

    Reddit Mental Health Dataset

    Contains posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:

    • 15 specific mental health support groups (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)
    • 2 broad mental health subreddits (r/mentalhealth, r/COVID19_support)
    • 11 non-mental health subreddits (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).

    filenames and corresponding timeframes:

    • post: Jan 1 to April 20, 2020 (called "mid-pandemic" in manuscript; r/COVID19_support appears). Unique users: 320,364.
    • pre: Dec 2018 to Dec 2019. A full year which provides more data for a baseline of Reddit posts. Unique users: 327,289.
    • 2019: Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match post data. Unique users: 282,560.
    • 2018: Jan 1 to April 20, 2018. A control for seasonal fluctuations to match post data. Unique users: 177,089.

    Unique users across all time windows (pre and 2019 overlap): 826,961.

    See manuscript Supplementary Materials (https://doi.org/10.31234/osf.io/xvwcy) for more information.

    Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.
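
    One way to read the bootstrapping recommendation is a percentile bootstrap over posts. The sketch below is an assumption about what such an analysis might look like, not the authors' procedure: the function bootstrap_mean_ci, its parameters, and the word_counts values are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, same size as the input, and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Take symmetric percentiles of the resampled means.
    lo_idx = int(n_resamples * alpha / 2)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


# Made-up per-post feature values (e.g. word counts) from one subreddit.
word_counts = [120, 85, 240, 60, 310, 95, 150, 70, 200, 130]
low, high = bootstrap_mean_ci(word_counts)
```

    Reporting the interval (low, high) rather than a single subsample mean makes the result less dependent on which posts happened to be drawn.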

  15. Data from: The Reddit Politosphere: A Large-Scale Text and Network Resource...

    • zenodo.org
    • data.niaid.nih.gov
    bz2, csv, json
    Updated Jan 16, 2022
    Cite
    Valentin Hofmann; Hinrich Schütze; Janet B. Pierrehumbert (2022). The Reddit Politosphere: A Large-Scale Text and Network Resource of Online Political Discourse [Dataset]. http://doi.org/10.5281/zenodo.5851729
    Explore at:
    Available download formats: bz2, csv, json
    Dataset updated
    Jan 16, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Valentin Hofmann; Hinrich Schütze; Janet B. Pierrehumbert
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Reddit Politosphere is a large-scale resource of online political discourse covering more than 600 political discussion groups over a period of 12 years. Based on the Pushshift Reddit Dataset, it is to the best of our knowledge the largest and ideologically most comprehensive dataset of its type now available. One key feature of the Reddit Politosphere is that it consists of both text and network data. We also release annotated metadata for subreddits and users.

    Documentation and scripts for easy data access are provided in an associated repository on GitHub.

  16. May 2015 Reddit Comments

    • kaggle.com
    zip
    Updated Jun 4, 2019
    Cite
    Kaggle (2019). May 2015 Reddit Comments [Dataset]. https://www.kaggle.com/datasets/kaggle/reddit-comments-may-2015
    Explore at:
    Available download formats: zip (21429083286 bytes)
    Dataset updated
    Jun 4, 2019
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    https://www.reddit.com/wiki/api

    Description

    Recently Reddit released an enormous dataset containing all ~1.7 billion of their publicly available comments. The full dataset is an unwieldy 1+ terabyte uncompressed, so we've decided to host a small portion of the comments here for Kagglers to explore. (You don't even need to leave your browser!)

    You can find all the comments from May 2015 on scripts for your natural language processing pleasure. What had redditors laughing, bickering, and NSFW-ing this spring?

    Who knows? Top visualizations may just end up on Reddit.

    Data Description

    The database has one table, May2015, with the following fields:

    • created_utc
    • ups
    • subreddit_id
    • link_id
    • name
    • score_hidden
    • author_flair_css_class
    • author_flair_text
    • subreddit
    • id
    • removal_reason
    • gilded
    • downs
    • archived
    • author
    • score
    • retrieved_on
    • body
    • distinguished
    • edited
    • controversiality
    • parent_id
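
    A query against a table shaped like May2015 might look like the sketch below, which uses Python's built-in sqlite3 module. It builds a toy in-memory table with only five of the listed fields; the column types and all row values are made up for illustration, and the helper top_subreddits is hypothetical.

```python
import sqlite3

# Toy stand-in for the real database: five of the listed fields only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE May2015 "
    "(id TEXT, subreddit TEXT, author TEXT, score INTEGER, body TEXT)"
)
# Made-up rows, not taken from the dataset.
rows = [
    ("c1", "AskReddit", "alice", 3, "first comment"),
    ("c2", "AskReddit", "bob", 5, "second comment"),
    ("c3", "science", "carol", 10, "third comment"),
]
conn.executemany("INSERT INTO May2015 VALUES (?, ?, ?, ?, ?)", rows)


def top_subreddits(connection, limit=10):
    """Return (subreddit, comment count, mean score), busiest first."""
    query = """
        SELECT subreddit, COUNT(*) AS n_comments, AVG(score) AS mean_score
        FROM May2015
        GROUP BY subreddit
        ORDER BY n_comments DESC, subreddit ASC
        LIMIT ?
    """
    return connection.execute(query, (limit,)).fetchall()


busiest = top_subreddits(conn, limit=2)
```

    Against the real file, the same query would only need the connection to point at the downloaded database instead of ":memory:".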

  17. Reddit Overview

    • bullfincher.io
    Updated Aug 11, 2024
    Cite
    Bullfincher (2024). Reddit Overview [Dataset]. https://bullfincher.io/companies/reddit/overview
    Explore at:
    Dataset updated
    Aug 11, 2024
    Dataset authored and provided by
    Bullfincher
    License

    https://bullfincher.io/privacy-policy

    Description

    Reddit, Inc. operates a website that organizes digital communities. It organizes communities based on specific interests that enable users to engage in conversations by sharing experiences, submitting links, uploading images and videos, and replying to one another. The company was founded in 2005 and is headquartered in San Francisco, California. Reddit, Inc. operates as a subsidiary of Advance Publications, Inc.

  18. Reddit r/AskScience Flair Dataset

    • data.mendeley.com
    Updated May 23, 2022
    Cite
    Sumit Mishra (2022). Reddit r/AskScience Flair Dataset [Dataset]. http://doi.org/10.17632/k9r2d9z999.3
    Explore at:
    Dataset updated
    May 23, 2022
    Authors
    Sumit Mishra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reddit is a social news, content rating, and discussion website, and one of the most popular sites on the internet. Reddit has 52 million daily active users and approximately 430 million users who use it once a month. Reddit has many subreddits; this dataset uses the r/AskScience subreddit.

    The dataset is extracted from the subreddit /r/AskScience on Reddit. The data was collected between 01-01-2016 and 20-05-2022. It contains 612,668 datapoints and 25 columns. The dataset includes information about the questions asked on the subreddit, the description of the submission, the flair of the question, NSFW or SFW status, the year of the submission, and more. The data was extracted using Python and Pushshift's API, with light cleaning done using NumPy and pandas.

    The dataset contains the following columns:

    • author - Redditor name.
    • author_fullname - Redditor full name.
    • contest_mode - Contest mode [implement obscured scores and randomized sorting].
    • created_utc - Time the submission was created, represented in Unix time.
    • domain - Domain of submission.
    • edited - Whether or not the post is edited.
    • full_link - Link of the post on the subreddit.
    • id - ID of the submission.
    • is_self - Whether or not the submission is a self post (text-only).
    • link_flair_css_class - CSS class used to identify the flair.
    • link_flair_text - Flair on the post, i.e. the link flair's text content.
    • locked - Whether or not the submission has been locked.
    • num_comments - The number of comments on the submission.
    • over_18 - Whether or not the submission has been marked as NSFW.
    • permalink - A permalink for the submission.
    • retrieved_on - Time ingested.
    • score - The number of upvotes for the submission.
    • description - Description of the submission.
    • spoiler - Whether or not the submission has been marked as a spoiler.
    • stickied - Whether or not the submission is stickied.
    • thumbnail - Thumbnail of the submission.
    • question - Question asked in the submission.
    • url - The URL the submission links to, or the permalink if a self post.
    • year - Year of the submission.
    • banned - Whether or not the submission was banned by the moderator.
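
    Since created_utc is stored as Unix time, a year value like the one in the year column can be derived from it. The sketch below is a minimal illustration using only the Python standard library; the function name submission_year is hypothetical, and whether the dataset's own year column was computed in UTC is an assumption.

```python
from datetime import datetime, timezone


def submission_year(created_utc: int) -> int:
    """Convert a Unix timestamp (seconds) to its calendar year in UTC."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).year


# 1451606400 is 2016-01-01 00:00:00 UTC, the start of the collection window.
first_year = submission_year(1451606400)
```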

    This dataset can be used for Flair Prediction, NSFW Classification, and different Text Mining/NLP tasks. Exploratory Data Analysis can also be done to get the insights and see the trend and patterns over the years.

  19. reddit-r-bitcoin-data-for-jun-2022

    • huggingface.co
    • opendatalab.com
    Updated Jun 15, 2022
    Cite
    SocialGrep (2022). reddit-r-bitcoin-data-for-jun-2022 [Dataset]. https://huggingface.co/datasets/SocialGrep/reddit-r-bitcoin-data-for-jun-2022
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 15, 2022
    Authors
    SocialGrep
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Lite version of our Reddit /r/Bitcoin dataset: a CSV of all posts and comments on the /r/Bitcoin subreddit during June 2022.

  20. Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends,...

    • datarade.ai
    .json, .csv
    Updated Aug 12, 2024
    + more versions
    Cite
    Dataplex (2024). Dataplex: Reddit Data | Global Social Media Data | 2.1M+ subreddits: trends, audience insights + more | Ideal for Interest-Based Segmentation [Dataset]. https://datarade.ai/data-products/dataplex-reddit-data-global-social-media-data-1-1m-mill-dataplex
    Explore at:
    .json, .csv (available download formats)
    Dataset updated
    Aug 12, 2024
    Dataset authored and provided by
    Dataplex
    Area covered
    Mexico, Chile, Gambia, Jersey, Macao, Botswana, Holy See, Côte d'Ivoire, Christmas Island, Martinique
    Description

    The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.

    Dataset Overview:

    This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.

    2.1 Million Subreddits with Enhanced AI Insights:

    The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:

    • Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
    • Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
    • Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.

    Sourced Directly from Reddit:

    All social media data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.

    Key Features:

    • Subreddit Metrics: Detailed data on subreddit activity, including the number of posts, comments, votes, and user participation.
    • User Engagement: Insights into how users interact with content, including comment threads, upvotes/downvotes, and participation rates.
    • Trending Topics: Track emerging trends and viral content across the platform, helping you stay ahead of the curve in understanding social media dynamics.
    • AI-Enhanced Analysis: Utilize AI-generated columns for sentiment analysis, topic categorization, and predictive insights, providing a deeper understanding of the data.

    Use Cases:

    • Social Media Analysis: Researchers and analysts can use this dataset to study online behavior, track the spread of information, and understand how content resonates with different audiences.
    • Market Research: Marketers can leverage the dataset to identify target audiences, understand consumer preferences, and tailor campaigns to specific communities.
    • Content Strategy: Content creators and strategists can use insights from the dataset to craft content that aligns with trending topics and user interests, maximizing engagement.
    • Academic Research: Academics can explore the dynamics of online communities, studying everything from the spread of misinformation to the formation of online subcultures.

    Data Quality and Reliability:

    The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.

    Integration and Usability:

    The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.

    User-Friendly Structure and Metadata:

    The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.

    Ideal For:

    • Data Analysts: Conduct in-depth analyses of subreddit trends, user engagement, and content virality. The dataset’s extensive coverage and AI-enhanced insights make it an invaluable tool for data-driven research.
    • Marketers: Use the dataset to better understand your target audience, tailor campaigns to specific interests, and track the effectiveness of marketing efforts across Reddit.
    • Researchers: Explore the social dynamics of online communities, analyze the spread of ideas and information, and study the impact of digital media on public discourse, all while leveraging AI-generated insights.

    This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conduc...
