The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.
Dataset Overview:
This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.
2.1 Million Subreddits with Enhanced AI Insights: The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:
- Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
- Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
- Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.
Sourced Directly from Reddit:
All data in this dataset is sourced directly from Reddit, ensuring accuracy and authenticity. The dataset is updated regularly, reflecting the latest trends and user interactions on the platform. This ensures that users have access to the most current and relevant data for their analyses.
Data Quality and Reliability:
The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.
Integration and Usability:
The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.
User-Friendly Structure and Metadata:
The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.
Ideal For:
This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overall, this project was meant to test the relationship between social media posts and their short-term effect on stock prices. We used Reddit posts from finance-specific subreddits such as r/wallstreetbets, r/investing, and r/stocks to see how the market changed in association with a variety of posts made by users. The idea came from the GameStop short squeeze, which showed the power of social media in the market. In principle, a stock price should reflect only the present value of the company's expected future cash flows, but the question we are asking is whether social media can affect that intrinsic value. Our research question was fixed from the start: do Reddit posts for or against a certain stock provide insight into how the market will move in a short window? To answer it, we selected five large tech companies: Apple, Tesla, Amazon, Microsoft, and Google. These companies were likely to appear more often in the subreddits, giving us more data, and to be less volatile day to day, which made the experiment easier to simulate. Because they trade at very high valuations, any change caused by a Reddit post would have to be significant, so an observed effect would be stronger evidence.
Next, we had to choose our data sources. First, we tried to collect the Reddit data through the Reddit API, but because Reddit requires approval for access to its data, we switched to a Kaggle dataset containing Reddit metadata. For our second dataset we had planned to use Yahoo Finance through yfinance, but because of the large amount of data we were pulling from this public API, our IP address was temporarily blocked, so we switched to Alpha Vantage. Although this meant changing data providers, it was only a minor roadblock: once the finance-pulling section was fixed, everything else continued to work in sequence. Once both datasets were pulled programmatically into our local VS Code environment, we implemented a pipeline to clean, merge, and analyze the data. We used TextBlob to label each Reddit post as positive, negative, or neutral, which gave us a sentiment value to correlate against. We then matched the time of each post with the stock data, computed the resulting price changes, calculated a correlation coefficient, and graphed our findings. Finally, we implemented a Snakemake workflow to make the project easily reproducible.
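A minimal sketch of the labeling and correlation steps described above, in plain Python. It assumes each post already carries a polarity score in [-1, 1] (the range TextBlob's `sentiment.polarity` returns) and that prices are a sorted list of `(timestamp, close)` pairs. The neutral band `eps=0.05`, the look-ahead `window`, and all function names are hypothetical choices for illustration; the description does not state the thresholds or window the project actually used.

```python
from bisect import bisect_left
from math import sqrt


def label_sentiment(polarity: float, eps: float = 0.05) -> str:
    """Map a polarity in [-1, 1] to a label; the +/-eps neutral band is hypothetical."""
    if polarity > eps:
        return "positive"
    if polarity < -eps:
        return "negative"
    return "neutral"


def pct_change_after(prices, t, window):
    """Percent change from the first close at/after t to the first close at/after t + window.

    prices: list of (timestamp, close) sorted by timestamp.
    Returns None when either point falls past the end of the price series.
    """
    times = [ts for ts, _ in prices]
    i = bisect_left(times, t)
    j = bisect_left(times, t + window)
    if i >= len(prices) or j >= len(prices):
        return None
    start, end = prices[i][1], prices[j][1]
    return (end - start) / start * 100.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def sentiment_price_correlation(posts, prices, window):
    """posts: iterable of (timestamp, polarity); pairs each post with the following price move."""
    xs, ys = [], []
    for t, polarity in posts:
        change = pct_change_after(prices, t, window)
        if change is not None:
            xs.append(polarity)
            ys.append(change)
    return pearson(xs, ys)
```

Posts whose window runs past the end of the price data are simply skipped, which keeps the two sequences aligned before the correlation is computed.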
In conclusion, we found little or no correlation across the companies as a whole, but Microsoft and Google do show stronger correlations when analyzed on their own. However, this may be due to other factors, such as why a post was made or whether the market was already trending on those dates. A larger analysis with more data, including data from other social media platforms, would be needed to support our hypothesis of a strong correlation.
The global number of Reddit users was forecast to increase continuously between 2024 and 2028 by a total of 52.1 million users (+10.33 percent). With 2028 marking the ninth consecutive year of growth, the Reddit user base is estimated to reach 556.59 million users, a new peak. Notably, the number of Reddit users has increased continuously over the past years. User figures for the platform Reddit have been estimated by taking into account company filings or press material, secondary research, app downloads and traffic data. They refer to the average monthly active users over the period and count multiple accounts held by one person only once. Reddit users include both logged-in and logged-out users. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macroeconomic, demographic and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).
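The forecast figures can be cross-checked with a little arithmetic: subtracting the total increase from the 2028 value gives the implied 2024 base, from which the percentage change and an average annual growth rate follow. A short Python sketch; the function names are illustrative, and the annual rate assumes four compounding steps from 2024 to 2028, which the source does not state explicitly.

```python
def implied_start(end_value: float, total_increase: float) -> float:
    """Start-of-period value implied by the end value and the total change."""
    return end_value - total_increase


def percent_change(start: float, end: float) -> float:
    """Total change between two values, in percent."""
    return (end - start) / start * 100.0


def average_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.025 means 2.5 percent)."""
    return (end / start) ** (1.0 / years) - 1.0


# Global figures from the forecast above, in millions of users.
end_2028 = 556.59
increase = 52.1
start_2024 = implied_start(end_2028, increase)
print(f"2024 base: {start_2024:.2f} million")
print(f"total change: {percent_change(start_2024, end_2028):+.2f} percent")
print(f"average per year: {average_annual_growth(start_2024, end_2028, 4):.2%}")
```

For the global figures this gives a 2024 base of about 504.49 million users, a total change of about +10.33 percent (matching the stated figure), and an implied average growth of roughly 2.5 percent per year. The same check can be applied to the per-country entries.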
https://www.archivemarketresearch.com/privacy-policy
The U.S. AI Training Dataset Market size was valued at USD 590.4 million in 2023 and is projected to reach USD 1880.70 million by 2032, exhibiting a CAGR of 18.0% during the forecast period. The U.S. AI training dataset market covers the generation, selection, and organization of datasets used to train artificial intelligence systems. These datasets contain the information that machine learning algorithms learn and draw inferences from. Activities include developing and improving AI solutions in business fields such as transportation, medical analysis, language processing, and financial measurement. Applications include training models for tasks such as image classification, predictive modeling, and natural language interfaces. Emerging trends include a shift toward larger, higher-quality, more diverse, and annotated datasets to improve model efficiency; synthetic data generation to address data shortages; and growing attention to data confidentiality and ethics in dataset management. Furthermore, advances in artificial intelligence and machine learning are driving noticeable growth in how datasets are built and used. Recent developments include: In February 2024, Google struck a deal worth USD 60 million per year with Reddit that gives Google real-time access to Reddit's data and uses Google AI to enhance Reddit's search capabilities. In February 2024, Microsoft announced an investment of around USD 2.1 billion in Mistral AI to expedite the growth and deployment of large language models. The U.S. giant is expected to support Mistral AI with Azure AI supercomputing infrastructure to provide top-notch scale and performance for AI training and inference workloads.
The number of Reddit users in the United States was forecast to increase continuously between 2024 and 2028 by a total of 10.3 million users (+5.21 percent). With 2028 marking the ninth consecutive year of growth, the Reddit user base is estimated to reach 208.12 million users, a new peak. Notably, the number of Reddit users has increased continuously over the past years.
The number of Reddit users in France was forecast to increase continuously between 2024 and 2028 by a total of *** million users (+**** percent). With 2028 marking the eighth consecutive year of growth, the Reddit user base is estimated to reach ***** million users, a new peak. Notably, the number of Reddit users has increased continuously over the past years.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Datasets for the case study on the spread of the vernacular term "based" across 4chan/pol/, Reddit, and Twitter. Data was gathered in November 2021. All files are anonymised as much as possible. They contain:
Table 2 below details the queries we used to collect the initial datasets. For all platforms, we retained posts in languages other than English, since the diffusion of the term in other languages was also deemed relevant.
source | query | query type |
Twitter | (#based OR (based (pilled OR pill OR redpilled OR redpill OR chad OR virgin OR cringe OR cringy OR triggered OR trigger OR tbh OR lol OR lmao OR wtf OR swag OR nigga OR finna OR bitch OR rare) ) OR " is based" OR "that\'s based" OR "based as fuck" OR "based af" OR "too based" OR "fucking based" "extremely based" OR "totally based" OR "incredibly based" OR "very based" OR "so based" OR "pretty based" OR "quite based" OR "kinda based" OR "kind of based" OR "fairly based" OR "based ngl" OR "as based as" OR "thank you based " OR "stay based" OR "based god") -"based in"-"based off"-"based * off"-"based around"-"based * around"-"based on"-"based * on"-"based out of"-"based upon"-"based * upon"-"based at"-"based from"-"is based by"-"is based of"-"on which * is based"-"upon which * is based"-"which is based there"-"is based all over"-"based more on"-"plant based"-"text based"-"turn based"-"need based"-"evidence based"-"community based" -"web based" -is:retweet -is:nullcast | Twitter v2 API |
Reddit | based -"based in" -"based off" -"based around" -"based on" -"based them on" -"based it on" -"evidence based" | Pushshift API |
4chan/pol/ | lower(body) LIKE '%based%' AND lower(body) NOT SIMILAR TO '%(-based|debased|based in |based off |based around |based on |based them on|based it on|based her on|based him on|based only on|based completely on|based solely on|based purely on|based entirely on|based not on |based not simply on|based entirely around|based out of|based upon |based at |is based by |is based of|on which it is based|on which this is based|which is based there|is based all over|which it is based|is based of |based firmly on|based off |based solely off|based more on|plant based|text based|turn based|need based|evidence based|community based|home based|internet based|web based|physics based)%' | PostgreSQL |
There were some data gaps for 4chan/pol/ and Reddit. /pol/ data was missing because of gaps in the archives (mostly due to outages). The following time periods are incomplete or missing entirely:
15 - 16 April 2019
14 - 15 December 2019
3 - 10 December 2020
29 March 2021
10 - 12 April 2021
16 - 18 August 2021
11 October 2021
The 4plebs archive moreover only started in November 2013, meaning the first two years of /pol/’s existence are missing.
The Pushshift API did not return posts for certain dates. We partly mitigated this by also retrieving data through the new Beta endpoint. However, the following time periods were still missing data:
1 - 30 September 2017
1 February - 31 March 2018
5 - 6 November 2020
23 - 27 March 2021
10 - 13 April 2021
After the initial data collection, we carried out several rounds of filtering to remove remaining false positives. For 4chan/pol/, a single round of filtering was enough (0.95 precision), while for Twitter we carried out eight rounds (0.92 precision). For Reddit, we formulated nearly 500 exclusions but could not achieve a precision above 0.9, so we had to filter more rigorously. We observed that longer comments were more likely to be false positives, so we removed all comments over 350 characters long. We settled on this number on the basis of our first sample, in which almost no true positives were over 350 characters long. Furthermore, we removed all comments except those in which “based” was used as a standalone word (thus excluding e.g. “plant-based”), at the start or end of a sentence, in capitals, or in conjunction with certain keywords or in certain phrases (e.g. “kinda based”). We also (rather crudely) removed posts by bot accounts by deleting posts from usernames containing ‘bot’ or ‘auto’. This finally led to a precision of 0.9.
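The Reddit filtering rules above can be sketched as a single predicate. The 350-character cutoff and the ‘bot’/‘auto’ username markers come from the text; everything else is an interpretation: the sketch reads the rule as requiring “based” to be a standalone word (not glued to another word, so “plant-based” fails) and, in addition, to appear at the start or end of a sentence, in capitals, or inside a listed phrase. The phrase list is hypothetical apart from “kinda based”, which the text names.

```python
import re

MAX_CHARS = 350                # from the text: almost no true positives were longer
BOT_MARKERS = ("bot", "auto")  # from the text: crude username-based bot filter
PHRASES = ("kinda based",)     # hypothetical list; only "kinda based" is named in the text

# "based" as its own word: not preceded or followed by a letter, digit, underscore or hyphen.
STANDALONE = re.compile(r"(?<![\w-])based(?![\w-])", re.IGNORECASE)
# At the start of the comment or right after sentence-ending punctuation.
SENTENCE_START = re.compile(r"(?:^|[.!?]\s+)based(?![\w-])", re.IGNORECASE)
# Followed only by optional whitespace and then punctuation or the end of the text.
SENTENCE_END = re.compile(r"(?<![\w-])based\s*(?:[.!?]|$)", re.IGNORECASE)
# Written in capitals (deliberately case-sensitive).
ALL_CAPS = re.compile(r"(?<![\w-])BASED(?![\w-])")


def is_probable_bot(username: str) -> bool:
    """Crude bot check: the username contains 'bot' or 'auto' (case-insensitive)."""
    name = username.lower()
    return any(marker in name for marker in BOT_MARKERS)


def keep_comment(body: str, username: str) -> bool:
    """Return True if a Reddit comment survives this interpreted filtering step."""
    if len(body) > MAX_CHARS or is_probable_bot(username):
        return False
    text = body.strip()
    if not STANDALONE.search(text):
        return False
    lowered = text.lower()
    return bool(
        SENTENCE_START.search(text)
        or SENTENCE_END.search(text)
        or ALL_CAPS.search(text)
        or any(phrase in lowered for phrase in PHRASES)
    )
```

Under this reading, a comment such as “the office is based downtown” is dropped even though “based” is a standalone word, because it matches none of the positional, capitalization, or phrase conditions.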
-based|location based |
@-mentions with “based” "on which "where "wherever #based #customer| alkaline based| anime based | are based near | astrology based | at the based of| b0Iuip5wnA| based economy| based game | based locally| based my name | based near | based not upon| based points| based purely off| based quite near | based solely off| based soy source| based upstairs| blast based| class based| clearly based of this| combat based| condition based| dos based| emotional based| eth based| fact based| gender based| he based his | he's based in | indian based| is based for fans| is based lies| is based near | is based not around | is based not on | is based once again on | is based there| is based within| issue based| jersey based| listen to 01 we rare| music based| oil based| on which it's based| page based 1000| paper based| park based | pc based| pic based| pill based regimen| puzzle based| sex based | she based her | she's based in | skill based| story based| they based their | they're based in| toronto based| trigger on a new yoga 2| u.s. based| universal press| us based| value based| we're based in | where you based?| you're based in |#alkaline #based|#apps #based|#based #acidic|#flash #based|#home #based|#miami #based|#piano #based|#value #based|american based|australia based|australian based|based my decision|based entirely around|based entirely on|based exactly on |based her announcement|based her decision|based her off|based him off|based his announcement|based his decision|based largely on|based less on|based mostly on|based my guess|based only around|based only on|based partly on|based partly upon|based purely on |based solely around|based solely on|based strictly on|based the announcement|based the decision|based their announcement|based their decision|based, not upon|battery based|behavior based|behaviour based|blockchain based|book based series|canon based|character based|cloud based|commision based|component based|computer based|confusion based|content based|depression based|dev based|dnd based|factually based|faith based|fear based|flash based|flintstones based|flour based|home based|homin based|i based my|interaction based|is based circa|is based competely on|is based entirely off|is based here|is based more on|is based outta|is based totally on |is based up here|is based way more on|live conferences with r3|living based of|london based|luck based|malex based|market based|miami based|needs based|nyc based|on which the film is based|opinion based|piano based|point based|potato based|premise is based|region based|religious based|science based|she is based there|slavery based show|softball based|thanks richard clark|u.k. based|uk based|vendor based|vodka based|volunteer based|water based|where he is based|where the disney film is based|where the military is based|who are based there|who is based there|wordpress cms |
Allowed all posts:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
We develop a new Data-Driven Phasic Word Identification (DDPWI) methodology to determine which words matter as the bitcoin pricing dynamic changes from one phase to another. With Google search volumes as a baseline, and using Spearman's rho, we find that Reddit submissions are both correlated with Google and have a comparable relationship with a variety of bitcoin metrics. Reddit provides complete access to the text of submissions. Rather than associating sentiment with market activity, we describe the DDPWI method for finding specific 'price dynamic' words associated with changes in the bitcoin pricing pattern through 2017 and 2018. We assess the significance of these changes using Wilcoxon rank-sum tests with Bonferroni corrections. These price dynamic words are then used to extract associated words from the submissions, thereby providing the context of their use. For example, the price dynamic word 'ban', which became significantly more frequent as prices fell, occurred in the context of both government regulation and internet companies banning cryptocurrency adverts. This approach could be used more generally to examine social media and discussion forums at a granular level, identifying specific words that impact the metric under investigation rather than overall sentiment.
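The significance step mentioned above can be illustrated with a two-sided Wilcoxon rank-sum test (normal approximation, with a tie correction) comparing a word's frequencies in two price phases, followed by a Bonferroni adjustment across all words tested. This is a generic standard-library sketch for illustration, not the authors' DDPWI code; the function names are hypothetical, and in practice a library routine such as SciPy's rank-sum test would usually be used instead.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (W, z, p), where W is the rank sum of sample x.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    w = sum(average_ranks(combined)[:n1])
    mean_w = n1 * (n + 1) / 2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        return w, 0.0, 1.0  # every value tied: no evidence of a shift
    z = (w - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Each candidate word would get one `rank_sum_test` call on its per-period frequencies in the two phases, and the resulting list of p-values would then be passed through `bonferroni` before comparing against the chosen significance level.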
The number of Reddit users in Israel was forecast to increase between 2024 and 2028 by a total of 0.01 million users (+0.76 percent). This increase is not continuous; notably, there is no growth in 2027. The Reddit user base is estimated to amount to 1.32 million users in 2028. Over the past years, however, the number of Reddit users increased continuously.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
Actually, I prepared this dataset for the students in my Deep Learning and NLP course.
But I am also very happy to see Kagglers play around with it.
Have fun!
Description:
There are two channels of data provided in this dataset:
News data: I crawled historical news headlines from the Reddit WorldNews channel (/r/worldnews). They are ranked by Reddit users' votes, and only the top 25 headlines are considered for each date. (Range: 2008-06-08 to 2016-07-01)
Stock data: The Dow Jones Industrial Average (DJIA) is used to "prove the concept". (Range: 2008-08-08 to 2016-07-01)
I provide three data files in .csv format:
RedditNews.csv: two columns. The first column is the "date" and the second column is the "news headlines". All news items are ranked from top to bottom based on how hot they are; hence, there are 25 lines for each date.
DJIA_table.csv: Downloaded directly from Yahoo Finance: check out the web page for more info.
Combined_News_DJIA.csv: To make things easier for my students, I provide this combined dataset with 27 columns. The first column is "Date", the second is "Label", and the following ones are news headlines ranging from "Top1" to "Top25".
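A minimal sketch for loading these files with Python's standard `csv` module. The `Date`, `Label`, and `Top1` to `Top25` column names come from the description above. The `Date` and `News` headers assumed for RedditNews.csv are a hypothetical choice, since the description only calls those columns "date" and "news headlines". The label is kept as a raw string because its encoding is not described here.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, NamedTuple

N_HEADLINES = 25  # columns "Top1" ... "Top25", per the description


class DayRecord(NamedTuple):
    date: str
    label: str            # kept as the raw string from the file
    headlines: List[str]  # ranked, Top1 first


def read_combined(lines: Iterable[str]) -> Iterator[DayRecord]:
    """Parse Combined_News_DJIA.csv (pass an open file or any iterable of lines)."""
    for row in csv.DictReader(lines):
        headlines = [row.get(f"Top{i}") or "" for i in range(1, N_HEADLINES + 1)]
        yield DayRecord(row["Date"], row["Label"], headlines)


def group_reddit_news(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group RedditNews.csv rows by date, preserving the file's top-to-bottom rank order.

    Assumes a "Date,News" header (hypothetical; not stated in the description).
    """
    by_date: Dict[str, List[str]] = {}
    for row in csv.DictReader(lines):
        by_date.setdefault(row["Date"], []).append(row["News"])
    return by_date


if __name__ == "__main__":
    # Tiny in-memory example standing in for RedditNews.csv.
    sample = "Date,News\n2016-07-01,first headline\n2016-07-01,second headline\n"
    print(group_reddit_news(io.StringIO(sample)))
```

With the real files, pass a handle such as `open("Combined_News_DJIA.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption.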
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides insights into public opinion regarding Bitcoin, derived from comments posted on the /r/Bitcoin subreddit during June 2022 [1, 2]. It is designed to help users track current trends and developments within the cryptocurrency world [2]. The data includes the actual body text of the comments, alongside their assigned sentiment, making it a valuable resource for understanding the evolving landscape of Bitcoin [1, 2].
The dataset includes several key columns for each comment:
* type: Describes the type of post, stored as a String [1-3].
* subreddit.name: The name of the subreddit, which is "/r/Bitcoin" in this case, stored as a String [1-3].
* subreddit.nsfw: Indicates whether the subreddit is Not Safe For Work (NSFW), a Boolean value [1-4]. The sources indicate that almost all entries (170,032 out of 170,036) are marked as 'false' for NSFW [4].
* created_utc: The timestamp when the post was created, in Unix epoch seconds (UTC), allowing for chronological analysis [1-8].
* permalink: The permanent link to the original post or comment on Reddit, a String [1-3].
* score: The score of the post, an Integer value reflecting its upvotes and downvotes [1, 2].
* body: The main text content of the comment, stored as a String [1-3]. Notably, about 7% of comments are "[removed]" and 3% are "[deleted]" [8].
* sentiment: The assigned sentiment of the post, a String. This column also appears to contain numerical values ranging from -1.00 (most negative) to 1.00 (most positive), with detailed label counts across various ranges [1, 3, 8-10]. A significant portion of comments, 32,903, fall into the -0.04 to 0.00 sentiment range [9].
This dataset focuses on comments from the /r/Bitcoin subreddit from June 2022 [1, 2]. It contains approximately 170,035 unique comment entries [4]. The created_utc timestamps are distributed across June 2022, with varying numbers of comments per time interval; for example, 12,392 comments were recorded between 1655544958.04 and 1655596797.80 (roughly 09:36 to 23:59 UTC on 18 June 2022) [6]. The sentiment analysis is detailed across numerous bins, showing a wide spread of positive, negative, and neutral sentiments [8-10].
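Because `created_utc` values such as 1655544958.04 are Unix timestamps in seconds, they convert directly to UTC dates. A short sketch that computes per-day comment counts and mean sentiment while skipping "[removed]" and "[deleted]" bodies. It assumes the `sentiment` column holds numeric strings in [-1, 1], as the value ranges above suggest; the row layout (a tuple of created_utc, body, sentiment) is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

PLACEHOLDERS = {"[removed]", "[deleted]"}


def to_utc(created_utc: float) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc)


def daily_summary(rows):
    """rows: iterable of (created_utc, body, sentiment) tuples.

    Returns {ISO date: (comment_count, mean_sentiment)}, ordered by date,
    counting only comments whose body was not removed or deleted.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for created_utc, body, sentiment in rows:
        if body.strip() in PLACEHOLDERS:
            continue
        day = to_utc(float(created_utc)).date().isoformat()
        totals[day][0] += 1
        totals[day][1] += float(sentiment)
    return {day: (n, s / n) for day, (n, s) in sorted(totals.items())}
```

Grouping by UTC date keeps the buckets consistent with the `created_utc` field; converting to a local time zone first would shift comments near midnight into neighbouring days.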
This dataset is ideal for data science and analytics [2]. Potential uses include:
* Tracking cryptocurrency trends: Staying up-to-date with the latest developments in Bitcoin [2].
* Sentiment analysis: Analysing public opinion and sentiment towards Bitcoin over time [1].
* Natural Language Processing (NLP) research: Utilising the comment body text for linguistic analysis [2].
* Market research: Understanding community discussions and concerns related to Bitcoin.
* Time-series analysis: Observing how sentiment and discussion volume change over the month of June 2022.
The dataset covers content from the Reddit /r/Bitcoin subreddit [1, 2].
* Time Range: Specifically the month of June 2022 [1, 2].
* Geographic Scope: While Reddit is global, the specific geographic origin of users is not detailed in the dataset columns. However, it can be considered a global snapshot of online discussion [11].
* Demographic Scope: Reflects the opinions and discussions of Reddit users who actively participate in the /r/Bitcoin subreddit.
CC0
Original Data Source: Viral Fads and Cryptocurrency
Comparing the *** selected regions regarding the number of Reddit users, the United States leads the ranking (****** million users), followed by the United Kingdom with ***** million users. At the other end of the spectrum is Gabon with **** million users, a difference of ****** million users from the United States.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The recent extreme volatility in cryptocurrency prices occurred in the setting of popular social media forums devoted to the discussion of cryptocurrencies. We develop a framework that discovers potential causes of phasic shifts in the price movement, as captured by social media discussions. This draws on principles developed in healthcare epidemiology where, similarly, only observational data are available. Such causes may have a major, one-off effect or recurring effects on the trend in the price series. We find a one-off effect of regulatory bans on bitcoin, the repeated effects of rival innovations on ether, and the influence of technical traders, captured through discussion of market price, on both cryptocurrencies. The results for Bitcoin differ from those for Ethereum, which is consistent with the observed differences in the timing of the highest price and the price phases. This framework could be applied to a wide range of cryptocurrency price series where a relevant social media text source exists. Identified causes with a recurring effect may have value in predictive modelling, whilst one-off causes may provide insight into unpredictable black swan events that can have a major impact on a system.
The number of Reddit users in Kenya was forecast to increase continuously between 2024 and 2028 by a total of 0.4 million users (+102.56 percent). With 2028 marking the eighth consecutive year of growth, the Reddit user base is estimated to reach 0.83 million users, a new peak. Notably, the number of Reddit users has increased continuously over the past years.
The number of Reddit users in Brazil was forecast to increase continuously between 2024 and 2028 by a total of ************ users (+***** percent). With 2028 marking the ****** consecutive year of growth, the Reddit user base is estimated to reach ************ users, a new peak. Notably, the number of Reddit users has increased continuously over the past years.
The number of Reddit users in Germany was forecast to increase continuously between 2024 and 2028 by a total of one million users (+4.33 percent). With 2028 marking the eighth consecutive year of growth, the Reddit user base is estimated to reach 24.04 million users, a new peak. Notably, the number of Reddit users has increased continuously over the past years.
The number of Reddit users in Latvia was forecast to decrease continuously between 2024 and 2028 by a total of 0.01 million users (-2.33 percent). After an earlier period of growth, the Reddit user base declined and is forecast to reach 0.42 million users in 2028. Notably, the number of Reddit users was continuously increasing over the past years.
The number of Reddit users in Lithuania was forecast to increase continuously between 2024 and 2028 by a total of 0.01 million users (+1.19 percent). The Reddit user base is estimated to amount to 0.85 million users in 2028. Notably, the number of Reddit users was continuously increasing over the past years.
The number of Reddit users in Estonia was forecast to decrease continuously between 2024 and 2028 by a total of 0.03 million users (-5.77 percent). After an earlier period of growth, the Reddit user base declined and is forecast to reach 0.49 million users in 2028.
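Because each blurb gives both the 2028 forecast and the total percentage change, the implied 2024 starting value can be backed out from end = start × (1 + p/100). The sketch below does this for the Germany and Baltic figures stated above; the helper name `implied_base` is hypothetical and not part of Statista's KMI, and it assumes the percentage is the total change over the whole 2024 to 2028 period.

```python
def implied_base(end_value: float, pct_change: float) -> float:
    """Return the starting value implied by an end value and a total percent change.

    Hypothetical helper: assumes end_value = start * (1 + pct_change / 100),
    i.e. pct_change is the total change over the whole forecast period.
    """
    return end_value / (1 + pct_change / 100)


# 2028 forecasts (million users) and total 2024-2028 changes (percent), as stated in the text.
forecasts = {
    "Germany": (24.04, 4.33),
    "Latvia": (0.42, -2.33),
    "Lithuania": (0.85, 1.19),
    "Estonia": (0.49, -5.77),
}

for country, (end_2028, pct) in forecasts.items():
    start_2024 = implied_base(end_2028, pct)
    change = end_2028 - start_2024  # absolute change in million users
    print(f"{country}: implied 2024 base {start_2024:.2f}M, change {change:+.2f}M")
```

The implied absolute changes round to the totals given in the text (about +1.00, -0.01, +0.01, and -0.03 million users), which is a quick consistency check. Since the published figures are themselves rounded, the back-calculated 2024 values are approximate.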
The number of Reddit users in Romania was forecast to increase between 2024 and 2028 by a total of *** million users (+**** percent). This overall increase is not continuous: the user base does not grow in 2027 and 2028. The Reddit user base is estimated to amount to **** million users in 2028. Over the past years, however, the number of Reddit users was continuously increasing.
The Reddit Subreddit Dataset by Dataplex offers a comprehensive and detailed view of Reddit’s vast ecosystem, now enhanced with appended AI-generated columns that provide additional insights and categorization. This dataset includes data from over 2.1 million subreddits, making it an invaluable resource for a wide range of analytical applications, from social media analysis to market research.
Dataset Overview:
This dataset includes detailed information on subreddit activities, user interactions, post frequency, comment data, and more. The inclusion of AI-generated columns adds an extra layer of analysis, offering sentiment analysis, topic categorization, and predictive insights that help users better understand the dynamics of each subreddit.
2.1 Million Subreddits with Enhanced AI Insights:
The dataset covers over 2.1 million subreddits and now includes AI-enhanced columns that provide:
- Sentiment Analysis: AI-driven sentiment scores for posts and comments, allowing users to gauge community mood and reactions.
- Topic Categorization: Automated categorization of subreddit content into relevant topics, making it easier to filter and analyze specific types of discussions.
- Predictive Insights: AI models that predict trends, content virality, and user engagement, helping users anticipate future developments within subreddits.
Sourced Directly from Reddit:
All data in this dataset is sourced directly from Reddit. The dataset is updated regularly to reflect the latest trends and user interactions on the platform, so users can base their analyses on current data.
Key Features:
Use Cases:
Data Quality and Reliability:
The Reddit Subreddit Dataset emphasizes data quality and reliability. Each record is carefully compiled from Reddit’s vast database, ensuring that the information is both accurate and up-to-date. The AI-generated columns further enhance the dataset's value, providing automated insights that help users quickly identify key trends and sentiments.
Integration and Usability:
The dataset is provided in a format that is compatible with most data analysis tools and platforms, making it easy to integrate into existing workflows. Users can quickly import, analyze, and utilize the data for various applications, from market research to academic studies.
User-Friendly Structure and Metadata:
The data is organized for easy navigation and analysis, with metadata files included to help users identify relevant subreddits and data points. The AI-enhanced columns are clearly labeled and structured, allowing users to efficiently incorporate these insights into their analyses.
Ideal For:
This dataset is an essential resource for anyone looking to understand the intricacies of Reddit's vast ecosystem, offering the data and AI-enhanced insights needed to drive informed decisions and strategies across various fields. Whether you’re tracking emerging trends, analyzing user behavior, or conducting acade...