Consumers in Africa, Asia, and South America were the most likely to own cryptocurrencies, such as Bitcoin, in 2025. This conclusion comes from combining ** different surveys from Statista's Consumer Insights conducted over the course of that year. Nearly one in three respondents to Statista's survey in Nigeria, for instance, said they either owned or used a digital coin, compared with *** out of 100 respondents in the United States. This differs markedly from a list of Bitcoin (BTC) trading volume in ** countries: there, the United States and Russia were said to have traded the highest amounts of this particular virtual coin. Nevertheless, African and Latin American countries feature prominently in that list too.

Daily use, or an investment tool?
The survey asked whether consumers either owned or used cryptocurrencies but did not specify their exact use or purpose. Some countries, however, are more likely to use digital currencies on a day-to-day basis. Consumers in Nigeria increasingly use mobile money operations to pay in stores or to send money to family and friends. Polish consumers could buy several types of products with a cryptocurrency in 2019. Vietnam stands in contrast: there, the use of Bitcoin and other cryptocurrencies as a payment method is forbidden, although owning some form of cryptocurrency as an investment is allowed.

Which countries are more likely to invest in cryptocurrencies?
Professional investors looking for a cryptocurrency-themed ETF were more often found in Europe than in the United States or China, according to a survey in early 2020. Most of the largest crypto hedge fund managers with a location in Europe in 2020 were either from the United Kingdom or Switzerland - the country with the highest cryptocurrency adoption rate in Europe according to Statista's Global Consumer Survey. Whether this had changed by 2025 was not yet clear.
https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This dataset contains detailed information on posts, scores and comments from the Reddit subreddit ‘CryptoCurrency’ - a fascinating online community devoted to discussion and analysis of the latest developments in blockchain investments, digital currencies, and other associated topics. Dive into the data to see what ultimate insights cryptocurrency enthusiasts are offering each other - their post titles, scores (the net upvotes a post has received), comment counts, created dates and timestamps are all laid out here for easy exploration. By taking advantage of this unique snapshot into crypto discussions and trends you can gain a better understanding not only of what topics have been popular over time but also how they're being discussed across this passionate community. Are there particular trends or patterns that emerge? It's up to you to uncover them!
This dataset contains posts and comments from the subreddit ‘CryptoCurrency’, which is a widely-followed discussion board devoted to discussing cryptocurrencies, blockchain investments, and other related topics. The dataset contains a large number of posts from the subreddit and their associated scores, comment counts and creation timestamps. This dataset can be used in numerous ways for both research and practical business applications.
First, let's look at the columns in this dataset: title, score, url, comms_num (number of comments), created (date and time the post was created), body (the content of the post), and timestamp. With this information you can begin answering key questions such as: Which topics attract the most attention? Which topics are unpopular? Is there a correlation between a post's score (upvotes) and its number of comments?
To explore these questions, several tools can be applied to the data, including Natural Language Processing techniques such as TF-IDF vectorizers or Latent Dirichlet Allocation to identify the themes that dominate these conversations. Unsupervised machine learning methods, such as clustering (for example k-means) or dimensionality reduction with Principal Component Analysis, could also help uncover structure in the data. For example, to find out which words in titles are associated with higher scores, titles could be clustered by their wording and the score distribution of each cluster compared, giving an overview of how related words relate to scores before the post body or other fields are analyzed.
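As a rough illustration of the TF-IDF idea mentioned above, here is a minimal sketch using only the Python standard library. The sample rows are hypothetical stand-ins for the `title` and `score` columns, and the helper names (`tokenize`, `tf_idf`, `mean_score_for`) are made up for this example; a real analysis would more likely rely on an established NLP library.

```python
import math
from collections import Counter

# Hypothetical sample rows; real data would come from the dataset's
# `title` and `score` columns.
posts = [
    ("Bitcoin hits new all time high", 950),
    ("Ethereum upgrade delayed again", 120),
    ("Bitcoin ETF approved by regulators", 780),
    ("Daily discussion thread", 45),
]


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many titles does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def mean_score_for(term):
    """Average score of the posts whose title contains `term`."""
    scores = [score for title, score in posts if term in tokenize(title)]
    return sum(scores) / len(scores) if scores else None


titles = [title for title, _ in posts]
weights = tf_idf(titles)
```

Terms that appear in many titles receive a lower weight than rare ones, and `mean_score_for` gives a first, very crude view of how a word relates to post scores.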
Reddit users also engage actively with posts, so comment counts offer another view of popularity. For example, one may observe that whenever new coin values arise, the related posts tend to attract more comments than usual. This indicates high user engagement at particular moments compared with regular periods, which could be useful when comparing individual coins.
Overall, the value of this data depends on its use case, whether pure research or applied analytics aimed at predicting prices, engagement, or user sentiment. Used properly, the cryptocurrency discussion found on Reddit can serve multiple purposes.
- This dataset can be used to create a sentiment analysis of the comments and posts on CryptoCurrency topics and how these conversations have changed over time. This can help ascertain how different events within the crypto market have been received by investors, speculators, and other users on the subreddit.
- The dataset can also be utilized to identify trends in successful topics of conversation (in terms of post scores) and give insight into what types of topics are popular among Redditors in the CryptoCurrency space.
- Furthermore, this dataset could provide insight into user behavior on CryptoCurrency subreddits by enabling analysis of peak times for certain conversations, of post popularity, and of which users tend to comment or post more frequently than others.
If you use this dataset in your research, please credit the original authors.
This dataset was created by AHMED ALY1
Released under Data files © Original Authors
https://creativecommons.org/publicdomain/zero/1.0/
Ethereum (ETH-USD) Historical Dataset from 2015 to 2021
Date: Represents the date at which the share is traded in the stock market.
Open: Represents the opening price of the stock at a particular date. It is the price at which a stock started trading when the opening bell rang.
Close: Represents the closing price of the stock at a particular date. It is the last buy-sell order executed between two traders. The closing price is the raw price, which is just the cash value of the last transacted price before the market closes.
High: The high is the highest price at which a stock is traded during a period. Here the period is a day.
Low: The low is the lowest price at which a stock is traded during a period. Here the period is a day.
Adj Close: The adjusted closing price amends a stock's closing price to reflect that stock's value after accounting for any corporate actions. The adjusted closing price factors in corporate actions, such as stock splits, dividends, and rights offerings.
Volume: Volume is the number of shares of security traded during a given period of time. Here the security is stock and the period of time is a day.
Sources: Investopedia
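To make the column definitions above concrete, the following is a small sketch of two quantities commonly derived from them: the close-to-close daily return and the intraday range. The rows are hypothetical values that only mimic the Date, Open, High, Low and Close columns; they are not taken from the dataset, and the function names are illustrative.

```python
# Hypothetical rows mimicking the Date/Open/High/Low/Close columns.
rows = [
    {"Date": "2021-01-01", "Open": 100.0, "High": 110.0, "Low": 95.0, "Close": 105.0},
    {"Date": "2021-01-02", "Open": 105.0, "High": 120.0, "Low": 104.0, "Close": 115.5},
    {"Date": "2021-01-03", "Open": 115.5, "High": 116.0, "Low": 100.0, "Close": 103.95},
]


def daily_returns(rows):
    """Simple close-to-close return for each day after the first."""
    out = []
    for prev, cur in zip(rows, rows[1:]):
        out.append((cur["Date"], cur["Close"] / prev["Close"] - 1.0))
    return out


def daily_range(row):
    """Intraday High-Low range as a fraction of the opening price."""
    return (row["High"] - row["Low"]) / row["Open"]


returns = daily_returns(rows)
```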
This dataset was created by k.khubiev
This is an auto-updating [up to today] extra dataset for the G-Research Crypto forecasting competition.
This is some extra data collected by me for the cryptocurrency prediction competition. This specific file is the dataset of Binance Coin
This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.
This is a daily updated dataset that automatically collects market data for the G-Research crypto forecasting competition. The data has 1-minute resolution and is collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.
For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.
1. **timestamp** - A timestamp for the minute covered by the row.
2. **Asset_ID** - An ID code for the cryptoasset.
3. **Count** - The number of trades that took place this minute.
4. **Open** - The USD price at the beginning of the minute.
5. **High** - The highest USD price during the minute.
6. **Low** - The lowest USD price during the minute.
7. **Close** - The USD price at the end of the minute.
8. **Volume** - The number of cryptoasset units traded during the minute.
9. **VWAP** - The volume-weighted average price for the minute.
10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
12. **Asset_Name** - Human readable Asset name.
The dataframe is indexed by timestamp and sorted from oldest to newest.
The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
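Because the rows are 1-minute candles with the fields listed above, coarser candles can be derived by combining consecutive rows. The sketch below shows one way to do that, assuming each row is a plain dict keyed by the field names; the sample values and the `aggregate` helper are hypothetical and are not part of the dataset's own tooling.

```python
def aggregate(candles):
    """Combine consecutive 1-minute rows (oldest first) into one candle."""
    total_volume = sum(c["Volume"] for c in candles)
    return {
        "timestamp": candles[0]["timestamp"],
        "Count": sum(c["Count"] for c in candles),
        "Open": candles[0]["Open"],             # first price of the window
        "High": max(c["High"] for c in candles),
        "Low": min(c["Low"] for c in candles),
        "Close": candles[-1]["Close"],          # last price of the window
        "Volume": total_volume,
        # Volume-weighted average of the per-minute VWAPs.
        "VWAP": (sum(c["VWAP"] * c["Volume"] for c in candles) / total_volume
                 if total_volume else float("nan")),
    }


# Hypothetical 1-minute rows, 60 seconds apart.
minute_rows = [
    {"timestamp": 1514764860, "Count": 5, "Open": 100.0, "High": 101.0,
     "Low": 99.0, "Close": 100.5, "Volume": 2.0, "VWAP": 100.0},
    {"timestamp": 1514764920, "Count": 3, "Open": 100.5, "High": 104.0,
     "Low": 100.0, "Close": 103.0, "Volume": 1.0, "VWAP": 103.0},
    {"timestamp": 1514764980, "Count": 2, "Open": 103.0, "High": 103.5,
     "Low": 98.0, "Close": 101.0, "Volume": 1.0, "VWAP": 101.0},
]

three_minute = aggregate(minute_rows)
```

Weighting each minute's VWAP by its volume keeps the aggregated VWAP consistent with its definition; a plain mean of the per-minute VWAPs would over-weight quiet minutes.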
The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed them into your model too.
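The notebooks' own PurgedTimeSeries code is not reproduced here. As a rough sketch of the general idea behind purged time-series validation, the function below yields walk-forward splits in which a gap of rows between the training and test ranges is dropped, so that labels looking ahead in time (such as a 15-minute target) do not leak across the boundary. The function name, its parameters, and the gap of 15 are assumptions made for this illustration.

```python
def purged_walk_forward(n_samples, n_splits, gap):
    """Yield (train_indices, test_indices) with `gap` rows purged between them.

    Rows are assumed to be sorted from oldest to newest.
    """
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * fold_size
        # The last fold absorbs any remainder rows.
        test_end = test_start + fold_size if k < n_splits else n_samples
        # Training data stops `gap` rows before the test window begins.
        train_end = max(0, test_start - gap)
        yield list(range(0, train_end)), list(range(test_start, test_end))


splits = list(purged_walk_forward(n_samples=100, n_splits=3, gap=15))
```

Each training range always precedes its test range, and the purged gap grows the distance between the last training row and the first test row to more than `gap` rows.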
These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here
This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:
Opening price with an added indicator (MA50):
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media
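For readers unfamiliar with the MA50 indicator shown in the chart, it is a 50-period simple moving average. A minimal sketch of how such an indicator could be computed from a list of opening prices follows; the function name and the choice to return `None` before a full window is available are assumptions of this example, not the author's plotting code.

```python
def simple_moving_average(values, window=50):
    """Return a list aligned with `values`; entries before a full window are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


# Small window used here only to keep the example readable.
example = simple_moving_average([1, 2, 3, 4, 5], window=3)
```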
Volume and number of trades:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media
This data is being collected automatically from the crypto exchange Binance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Other-Long-Term-Assets Time Series for Robinhood Markets Inc. Robinhood Markets, Inc. operates a financial services platform in the United States. Its platform allows users to invest in stocks, exchange-traded funds (ETFs), American depository receipts, options, gold, and cryptocurrencies. The company offers fractional trading, recurring investments, fully-paid securities lending, access to investing on margin, cash sweep, instant withdrawals, a retirement program, around-the-clock trading, joint investing accounts, event contracts, and futures contract services. It also provides various learning and education solutions, comprising Snacks, an accessible digest of business news stories for a new generation of investors; Learn, an online collection of guides, feature tutorials, and a financial dictionary; and Newsfeeds, which offer access to free, premium news from various sites, such as Barron's, Reuters, and Dow Jones. In addition, the company offers In-App Education, a resource that covers investing fundamentals, including why people invest, a stock market overview, and tips on how to define investing goals, and that allows customers to understand the basics of investing before their first trade; and Crypto Learn and Earn, an educational module available to various crypto customers through Robinhood Learn that teaches the basics of cryptocurrency. Further, it provides Robinhood credit cards, cash card and spending accounts, and wallets. The company also owns and operates a digital currency marketplace that allows companies and individuals from all around the world to buy and sell bitcoin, litecoin, ethereum, ripple, and bitcoin cash. Robinhood Markets, Inc. was incorporated in 2013 and is headquartered in Menlo Park, California.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 severely affected the world's economy and increased the inflation rate in both developed and developing countries. It also significantly affected the financial and crypto markets; however, some crypto markets flourished and reached their peak during the pandemic. This study analyzes the impact of COVID-19 on public opinion and sentiment regarding the financial and crypto markets. It conducts sentiment analysis on tweets about these markets posted during the peak days of COVID-19 in order to investigate people's sentiment toward investing in them. In addition, it carries out a damage analysis in terms of market value and identifies the worst period for the financial and crypto markets. The data is extracted from Twitter using the SNSscraper library. The study proposes a hybrid CNN-LSTM (convolutional neural network and long short-term memory) model for sentiment classification, which performs best with F1 scores of 0.89 and 0.92 for the crypto and financial markets, respectively. Topic extraction from the tweets is also performed, along with the sentiments related to each topic.
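Since the results above are reported as F1 scores, a short sketch of how that metric is computed for a binary sentiment label may be helpful. The labels below are hypothetical and unrelated to the study's data; the function is a plain-Python illustration of the standard definition (harmonic mean of precision and recall), not the study's evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```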
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cash-and-Equivalents Time Series for Robinhood Markets Inc.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Stock-Based-Compensation Time Series for Robinhood Markets Inc.
This dataset was created by ernestoeperez88
https://www.kappasignal.com/p/legal-disclaimer.html
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.
Historical daily stock prices (open, high, low, close, volume)
Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)
Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR indicator, Bollinger Bands indicators, Fibonacci, Williams percent range, commodity channel index)
Feature engineering based on financial data and technical indicators
Sentiment analysis data from social media and news articles
Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)
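As an example of how one of the technical indicators listed above can be derived from closing prices, here is a minimal sketch of the relative strength index (RSI). It uses plain averages of gains and losses over the last `period` changes, which is a simplification: the classic formulation smooths these averages over time, so values will differ from those of charting tools. The function name and sample prices are hypothetical.

```python
def rsi(closes, period=14):
    """RSI over the last `period` price changes, using simple averages."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losing periods in the window: RSI is at its maximum.
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Hypothetical closing prices; a short period keeps the arithmetic visible.
value = rsi([10, 11, 12, 11, 12], period=4)
```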
Stock price prediction
Portfolio optimization
Algorithmic trading
Market sentiment analysis
Risk management
Researchers investigating the effectiveness of machine learning in stock market prediction
Analysts developing quantitative trading Buy/Sell strategies
Individuals interested in building their own stock market prediction models
Students learning about machine learning and financial applications
The dataset may include different levels of granularity (e.g., daily, hourly)
Data cleaning and preprocessing are essential before model training
Regular updates are recommended to maintain the accuracy and relevance of the data
The market for blockchain technology is forecast to grow to nearly 1,000 billion U.S. dollars by 2032, although this is lower than a previous forecast. This is according to a market research forecast focusing on blockchain with cloud applications for specific business segments. The numbers do not include decentralized applications such as blockchain gaming. Originally, a forecast from June 2022 predicted "blockchain technology" would reach 1,235 billion U.S. dollars by 2030, at a CAGR of 82.8 percent. A newer forecast from December 2023 predicts a value of 943 billion U.S. dollars in 2032 with a CAGR of 56.1 percent. The source does not explain this difference.
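For reference, a compound annual growth rate (CAGR) relates a start value, an end value, and a number of years. The sketch below shows the standard formula and its inverse. The source does not state the base-year values behind either forecast, so the numbers used here are purely hypothetical and do not reproduce the forecasts above.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value reached after growing `start_value` at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


# Hypothetical figures: a market that quadruples in two years grows at 100% per year.
example_rate = cagr(100.0, 400.0, 2)
example_value = project(100.0, 0.5, 2)
```

Because growth compounds, small differences in the assumed rate or horizon produce large differences in the projected end value, which is one reason forecasts of this kind can diverge widely.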
This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.
This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.
For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.
1. **timestamp** - A timestamp for the minute covered by the row.
2. **Asset_ID** - An ID code for the cryptoasset.
3. **Count** - The number of trades that took place this minute.
4. **Open** - The USD price at the beginning of the minute.
5. **High** - The highest USD price during the minute.
6. **Low** - The lowest USD price during the minute.
7. **Close** - The USD price at the end of the minute.
8. **Volume** - The number of cryptoasset units traded during the minute.
9. **VWAP** - The volume-weighted average price for the minute.
10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
12. **Asset_Name** - Human-readable asset name.
The dataframe is indexed by timestamp and sorted from oldest to newest.
The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
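Because the rows are 1-minute candles, coarser bars can be derived from them. The sketch below rolls the fields described above into N-minute bars for a single asset. It assumes that each row is a dict keyed by the field names, that `timestamp` is expressed in Unix seconds, and that rows are already sorted from oldest to newest; the function name `aggregate_candles` is hypothetical and not part of the dataset or the competition code.

```python
from itertools import groupby


def aggregate_candles(rows, minutes=15):
    """Roll 1-minute candles for ONE asset into `minutes`-minute bars.

    Assumes `rows` is sorted by timestamp and that timestamps are Unix seconds.
    """
    bucket = minutes * 60

    def bar_start(row):
        # Floor the timestamp to the start of its N-minute bucket.
        return row["timestamp"] // bucket * bucket

    bars = []
    for start, group in groupby(rows, key=bar_start):
        g = list(group)
        volume = sum(r["Volume"] for r in g)
        bars.append({
            "timestamp": start,
            "Count": sum(r["Count"] for r in g),    # trades add up
            "Open": g[0]["Open"],                   # first minute's open
            "High": max(r["High"] for r in g),
            "Low": min(r["Low"] for r in g),
            "Close": g[-1]["Close"],                # last minute's close
            "Volume": volume,
            # Volume-weighted mean of the per-minute VWAPs; None if no volume.
            "VWAP": sum(r["VWAP"] * r["Volume"] for r in g) / volume if volume else None,
        })
    return bars
```

Missing minutes are not filled in: a bucket that contains fewer rows than expected simply produces a bar built from the rows that exist.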
The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.
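To give a rough idea of what a purged time-series split does, here is a simplified, index-based sketch. It is not the implementation used in the notebooks: the function name `purged_time_series_split`, the equal-sized folds, and the default gap of 15 rows (chosen here because the target looks 15 minutes ahead) are all assumptions made for illustration.

```python
def purged_time_series_split(n_samples, n_splits=5, gap=15):
    """Yield (train_indices, test_indices) for walk-forward validation.

    Training data always precedes the test fold, and the last `gap` rows
    before each test fold are dropped ("purged") so that training targets
    computed from future prices do not overlap the test period.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be >= 1")
    if n_samples < n_splits + 1:
        raise ValueError("not enough samples for the requested splits")

    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * fold_size
        # The final fold absorbs any remainder rows.
        test_end = n_samples if k == n_splits else test_start + fold_size
        train_end = max(test_start - gap, 0)  # purge the rows right before the test fold
        yield list(range(train_end)), list(range(test_start, test_end))
```

With rows sorted from oldest to newest, each successive fold trains on a longer history and validates on the block that follows the purged gap.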
These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on the SIIM-ISIC melanoma detection competition here.
This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:
Opening price with an added indicator (MA50):
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media
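An indicator such as the MA50 shown in this chart can be derived directly from the price column. The following is a minimal sketch of a simple (unweighted) moving average over the last 50 values; whether the chart uses a simple or another type of moving average is not stated, so the simple form is an assumption, and `simple_moving_average` is a hypothetical helper name.

```python
from collections import deque


def simple_moving_average(values, window=50):
    """Trailing simple moving average.

    Returns a list as long as `values`; entries are None until a full
    window of `window` observations is available.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    recent = deque()   # the most recent `window` values
    running = 0.0      # sum of the values currently in `recent`
    for v in values:
        recent.append(v)
        running += v
        if len(recent) > window:
            running -= recent.popleft()  # drop the value that left the window
        out.append(running / window if len(recent) == window else None)
    return out
```

Applied to the Open column of one asset, `simple_moving_average(opens, window=50)` yields a series that can be plotted on top of the raw opening prices.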
Volume and number of trades:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media
This data is being collected automatically from the crypto exchange Binance.
https://www.kappasignal.com/p/legal-disclaimer.html