https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical price data for Bitcoin (BTC/USDT) from January 1, 2018, to the present. The data is sourced using the Binance API, providing granular candlestick data in four timeframes:
- 15-minute (15M)
- 1-hour (1H)
- 4-hour (4H)
- 1-day (1D)
This dataset includes the following fields for each timeframe:
- Open time: The timestamp for when the interval began.
- Open: The price of Bitcoin at the beginning of the interval.
- High: The highest price during the interval.
- Low: The lowest price during the interval.
- Close: The price of Bitcoin at the end of the interval.
- Volume: The trading volume during the interval.
- Close time: The timestamp for when the interval closed.
- Quote asset volume: The total quote asset volume traded during the interval.
- Number of trades: The number of trades executed within the interval.
- Taker buy base asset volume: The volume of the base asset bought by takers.
- Taker buy quote asset volume: The volume of the quote asset spent by takers.
- Ignore: A placeholder column from Binance API, not used in analysis.
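As a rough illustration of how the twelve fields listed above might be handled in code, the sketch below maps one candlestick row onto named, typed values. The snake_case field names, the `parse_kline` helper, and the sample row are all made up for this example; it also assumes the row keeps the order of the list above and that timestamps are Unix epoch milliseconds, which may not hold for the CSV files in this dataset.

```python
from datetime import datetime, timezone

# Hypothetical field names, in the same order as the field list above.
KLINE_FIELDS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_volume", "taker_buy_quote_volume", "ignore",
]

FLOAT_FIELDS = (
    "open", "high", "low", "close", "volume", "quote_asset_volume",
    "taker_buy_base_volume", "taker_buy_quote_volume",
)


def parse_kline(row):
    """Turn one 12-element candlestick row into a dict with typed values."""
    if len(row) != len(KLINE_FIELDS):
        raise ValueError(f"expected {len(KLINE_FIELDS)} values, got {len(row)}")
    rec = dict(zip(KLINE_FIELDS, row))
    # Assumption: timestamps are Unix epoch milliseconds (UTC).
    for key in ("open_time", "close_time"):
        rec[key] = datetime.fromtimestamp(int(rec[key]) / 1000, tz=timezone.utc)
    for key in FLOAT_FIELDS:
        rec[key] = float(rec[key])
    rec["number_of_trades"] = int(rec["number_of_trades"])
    del rec["ignore"]  # placeholder column, not used in analysis
    return rec


# Made-up sample row, for illustration only.
sample = [1514764800000, "13715.65", "13715.65", "13400.01", "13529.01",
          "443.36", 1514765699999, "5993910.89", 5228, "228.52",
          "3090541.42", "0"]
candle = parse_kline(sample)
```

Dropping the `ignore` column at parse time keeps downstream code from accidentally treating it as data.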
Binance API: Used for retrieving 15-minute, 1-hour, 4-hour, and 1-day candlestick data from 2018 to the present.
This dataset is automatically updated every day using a custom Python program.
The source code for the update script is available on GitHub:
🔗 Bitcoin Dataset Kaggle Auto Updater
This dataset is provided under the CC0 Public Domain Dedication. It is free to use for any purpose, with no restrictions on usage or redistribution.
DOGE started it. SHIB took it mainstream. BONK and PEPE brought in the crowds. Now what?
Stay on top of the entire meme coin ecosystem through CoinAPI's comprehensive data feeds. We've connected to 350+ exchanges so you don't have to, bringing together every significant market into one unified API that actually works when you need it. Dig into historical patterns that shaped today's meme coin landscape. Compare volume spikes across different tokens during viral moments. Track institutional entry points that transformed joke coins into serious market movers.
From quick price checks to in-depth research projects, our institutional-grade precision helps you navigate this volatile but opportunity-rich corner of the crypto market. With complete Digital Asset Data market coverage, you'll never miss a beat. Serious data for not-so-serious coins. That's the CoinAPI difference.
➡️ Why choose us?
📊 Market Coverage & Data Types:
◦ Real-time and historical data since 2010 (for chosen assets)
◦ Full order book depth (L2/L3)
◦ Trade-by-trade data
◦ OHLCV across multiple timeframes
◦ Market indexes (VWAP, PRIMKT)
◦ Exchange rates with fiat pairs
◦ Spot, futures, options, and perpetual contracts
◦ Coverage of 90%+ global trading volume
◦ Full Crypto Trade Data
🔧 Technical Excellence:
◦ 99.9% uptime guarantee
◦ Multiple delivery methods: REST, WebSocket, FIX, S3
◦ Standardized data format across exchanges
◦ Ultra-low latency data streaming
◦ Detailed documentation
◦ Custom integration assistance
CoinAPI represents the gold standard in cryptocurrency data, trusted by leading financial institutions, technology providers, and market makers worldwide. By combining technology with rigorous data validation protocols, we provide the foundation upon which many financial products are being built.
CoinAPI gives you comprehensive crypto futures data and derivatives data from exchanges around the world. We track futures contracts, perpetual swaps, and options markets, delivering both real-time updates and historical information for complete market analysis.
Our system captures all the key metrics derivative traders care about - from open interest and funding rates to trading volumes and order book depth. This detailed market intelligence helps you spot opportunities and manage risk more effectively.
Getting connected is straightforward - choose from REST API for flexible queries, WebSocket for instant market updates, or FIX protocol for institutional-grade integration. We support both real-time streaming for active trading and historical data access for backtesting and research.
Why work with us?
Market Coverage & Data Types:
- Real-time and historical data since 2010 (for chosen assets)
- Full order book depth (L2/L3)
- Tick-by-tick data
- OHLCV across multiple timeframes
- Market indexes (VWAP, PRIMKT)
- Exchange rates with fiat pairs
- Spot, futures, options, and perpetual contracts (Crypto Derivatives Data)
- Coverage of 90%+ global trading volume
Technical Excellence:
- 99.9% uptime guarantee
- Multiple delivery methods: REST, WebSocket, FIX, S3
- Standardized data format across exchanges
- Ultra-low latency data streaming
- Detailed documentation
- Custom integration assistance
Whether you're building automated trading systems, conducting market research, or managing investment risk, our reliable data provides the solid foundation you need. Professional traders trust our information because it delivers the accuracy and consistency required for serious derivatives trading.
CoinAPI delivers complete crypto market data with full price history and trading volumes. Access in-depth analytics and historical insights through simple export options via flat files and S3 API. Our extensive trading data integrates easily with your analytics tools for better market understanding.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cryptocurrency historical datasets from January 2012 (if available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, CryptoDataDownload, CoinMarketCap, various Kaggle datasets, and multiple APIs. Because these datasets used different time resolutions (e.g., minutes, hours, days), a daily format was used in this research study in order to integrate them. The integrated historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more, were uploaded to this online Mendeley data repository. Although the primary criterion for including these cryptocurrencies was Market Capitalization, a subject matter expert (a professional trader) also guided the initial selection by analyzing various indicators such as Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, Stochastic Oscillator and Ichimoku Cloud. The primary features of this dataset that were used as the decision-making criteria of the CLUS-MCDA II approach are Timestamps, Open, High, Low, Closed, Volume (Currency), % Change (7 days and 24 hours), Market Cap and Weighted Price values.
The Excel and CSV files available in this dataset are only part of the integrated data. The other databases, datasets and API references that were used in this study are as follows:
[1] https://finance.yahoo.com/
[2] https://coinmarketcap.com/historical/
[3] https://cryptodatadownload.com/
[4] https://kaggle.com/philmohun/cryptocurrency-financial-data
[5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
[6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
[7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
[8] https://min-api.cryptocompare.com/
[9] https://p.nomics.com/cryptocurrency-bitcoin-api
[10] https://www.coinapi.io/
[11] https://www.coingecko.com/en/api
[12] https://cryptowat.ch/
[13] https://www.alphavantage.co/
This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDA II project:
https://aimaghsoodi.github.io/CLUSMCDA-R-Package/
https://github.com/Aimaghsoodi/CLUS-MCDA-II
https://github.com/azadkavian/CLUS-MCDA
CoinAPI's Level 1 Crypto Quote Data delivers essential digital asset market intelligence, capturing real-time bid/ask prices and volumes across 350+ exchanges including both CEX and DEX platforms.
This comprehensive data stream provides precise market snapshots with microsecond-accurate timestamps, perfect for applications demanding rapid price discovery and effective market monitoring.
Designed for minimal latency and maximum update frequency, our feed powers everything from sophisticated trading algorithms and responsive price widgets to in-depth market analysis tools.
You can access data through FIX or WebSocket for instant streaming or REST API for historical analysis and backtesting.
Why work with us?
Market Coverage & Data Types:
- Real-time and historical data since 2010 (for chosen assets)
- Full order book depth (L2/L3)
- Tick-by-tick data
- OHLCV across multiple timeframes
- Market indexes (VWAP, PRIMKT)
- Exchange rates with fiat pairs
- Spot, futures, options, and perpetual contracts
- Coverage of 90%+ global trading volume
- Full Cryptocurrency Investor Data
Technical Excellence:
- 99.9% uptime guarantee
- Multiple delivery methods: REST, WebSocket, FIX, S3
- Standardized data format across exchanges
- Ultra-low latency data streaming
- Detailed documentation
- Custom integration assistance
CoinAPI is trusted by financial institutions, trading firms, hedge funds, researchers, and technology developers worldwide. We provide reliable cryptocurrency market data through our commitment to quality and technical performance.
This is a supplementary, regularly updated dataset for the G-Research Crypto Forecasting competition.
The dataset is updated daily and automatically collects market data for the G-Research Crypto Forecasting competition. The data has 1-minute resolution and is collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.
For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.
1. **timestamp** - A timestamp for the minute covered by the row.
2. **Asset_ID** - An ID code for the cryptoasset.
3. **Count** - The number of trades that took place this minute.
4. **Open** - The USD price at the beginning of the minute.
5. **High** - The highest USD price during the minute.
6. **Low** - The lowest USD price during the minute.
7. **Close** - The USD price at the end of the minute.
8. **Volume** - The number of cryptoasset units traded during the minute.
9. **VWAP** - The volume-weighted average price for the minute.
10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
12. **Asset_Name** - Human readable Asset name.
The dataframe is indexed by timestamp and sorted from oldest to newest.
The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
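Because the rows are 1-minute candles sorted from oldest to newest, coarser candles can be derived by combining consecutive rows. The following is a minimal sketch under stated assumptions: each row is a plain dict keyed by the field names listed above, the rows passed in are contiguous, and the combined VWAP is taken as the volume-weighted mean of the per-minute VWAPs. The `aggregate_candles` helper and the sample values are made up for illustration and are not part of the dataset's tooling.

```python
def aggregate_candles(rows):
    """Combine consecutive 1-minute rows (oldest first) into one candle."""
    if not rows:
        raise ValueError("need at least one row")
    volume = sum(r["Volume"] for r in rows)
    # Volume-weighted mean of the per-minute VWAPs; undefined with no volume.
    if volume > 0:
        vwap = sum(r["VWAP"] * r["Volume"] for r in rows) / volume
    else:
        vwap = float("nan")
    return {
        "timestamp": rows[0]["timestamp"],   # start of the combined interval
        "Count": sum(r["Count"] for r in rows),
        "Open": rows[0]["Open"],             # first open
        "High": max(r["High"] for r in rows),
        "Low": min(r["Low"] for r in rows),
        "Close": rows[-1]["Close"],          # last close
        "Volume": volume,
        "VWAP": vwap,
    }


# Made-up rows, for illustration only.
minute_rows = [
    {"timestamp": 0, "Count": 10, "Open": 100.0, "High": 101.0,
     "Low": 99.0, "Close": 100.5, "Volume": 1.0, "VWAP": 100.0},
    {"timestamp": 60, "Count": 30, "Open": 100.5, "High": 105.0,
     "Low": 102.0, "Close": 104.0, "Volume": 3.0, "VWAP": 104.0},
]
combined = aggregate_candles(minute_rows)
```

The Target and Weight columns are deliberately left out, since they are defined by the competition hosts and cannot be recombined this way.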
The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.
These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on the SIIM ISIC melanoma detection competition here.
This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:
Opening price with an added indicator (MA50):
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media
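For readers who want to reproduce an indicator like the MA50 in the chart above, here is a minimal sketch of a simple moving average over a list of prices. It assumes MA50 means a 50-period simple (unweighted) moving average, which the source does not state explicitly; the function name is chosen for this example.

```python
def simple_moving_average(values, window=50):
    """Return the trailing simple moving average of `values`.

    Positions with fewer than `window` observations are filled with None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= values[i - window]
        averages.append(running / window if i >= window - 1 else None)
    return averages


# Small window so the result is easy to check by hand.
ma3 = simple_moving_average([1, 2, 3, 4, 5], window=3)
```

With the default `window=50`, the first 49 entries are `None` because a full window is not yet available.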
Volume and number of trades:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media
This data is being collected automatically from the crypto exchange Binance.
Cryptocurrency options markets have grown increasingly sophisticated, requiring reliable data infrastructure to support trading and analysis. Our platform gives you direct access to comprehensive crypto options data through straightforward API connections.
We capture the complete options chain across major crypto derivatives exchanges, delivering real-time and historical cryptocurrency market data that shows exactly what's happening in these complex markets. Each options contract is tracked with precision - strikes, expiration dates, premiums, open interest, and volume metrics all accessible through our standardized data feeds.
The data is available through multiple integration methods depending on your needs. Use our REST API for flexible queries and historical analysis, WebSocket for real-time market monitoring, or FIX protocol for institutional-grade connectivity with minimal latency.
Why work with us?
Market Coverage & Data Types:
- Real-time and historical data since 2010 (for chosen assets)
- Full order book depth (L2/L3)
- Tick-by-tick data
- OHLCV across multiple timeframes
- Market indexes (VWAP, PRIMKT)
- Exchange rates with fiat pairs
- Spot, futures, options, and perpetual contracts
- Coverage of 90%+ global trading volume
Technical Excellence:
- 99% uptime guarantee
- Multiple delivery methods: REST, WebSocket, FIX, S3
- Standardized data format across exchanges
- Ultra-low latency data streaming
- Detailed documentation
- Custom integration assistance
When options traders need reliable market intelligence, they don't leave it to chance. That's why trading desks across five continents, quantitative hedge funds managing billions, and fintech innovators building tomorrow's trading platforms all rely on our data infrastructure. We've established ourselves as a dependable source in a market where accuracy isn't just preferred - it's essential. While others promise comprehensive coverage, we deliver it consistently, trade after trade, day after day.
https://creativecommons.org/publicdomain/zero/1.0/
On a quest to compare different cryptoexchanges, I came up with the idea to compare metrics across multiple platforms (at the moment just two). CoinGecko and CoinMarketCap are two of the biggest websites for monitoring both exchanges and cryptoprojects. In response to over-inflated volumes faked by crypto exchanges, both websites came up with independent metrics for assessing the worth of a given exchange.
Collected on May 10, 2020
CoinGecko's data is a bit more holistic, containing metrics across a multitude of areas (you can read more in the original blog post here). The data from CoinGecko consists of the following:
- Exchange Name
- Trust Score (on a scale of N/A-10)
- Type (centralized/decentralized)
- AML (risk: How well prepared are they to handle financial crime?)
- API Coverage (blanket measure that includes: (1) Tickers Data (2) Historical Trades Data (3) Order Book Data (4) Candlestick/OHLC (5) WebSocket API (6) API Trading (7) Public Documentation)
- API Last Updated (When was the API last updated?)
- Bid Ask Spread (Average buy/sell spread across all pairs)
- Candlestick (Available/Not)
- Combined Orderbook Percentile (See above link)
- Estimated_Reserves (estimated holdings of major crypto)
- Grade_Score (Overall API score)
- Historical Data (available/not)
- Jurisdiction Risk (risk: risk of Terrorist activity/bribery/corruption?)
- KYC Procedures (risk: Know Your Customer?)
- License and Authorization (risk: has exchange sought regulatory approval?)
- Liquidity (don't confuse with "CMC Liquidity". THIS column is a combo of (1) Web traffic & Reported Volume (2) Order book spread (3) Trading Activity (4) Trust Score on Trading Pairs)
- Negative News (risk: any bad news?)
- Normalized Trading Volume (Trading Volume normalized to web traffic)
- Normalized Volume Percentile (see above blog link)
- Orderbook (available/not)
- Public Documentation (got well documented API available to everyone?)
- Regulatory Compliance (risk rating from compliance perspective)
- Regulatory last updated (last time regulatory metrics were updated)
- Reported Trading Volume (volume as listed by the exchange)
- Reported Normalized Trading Volume (Ratio of normalized to reported volume [0-1])
- Sanctions (risk: risk of sanctions?)
- Scale (based on: (1) Normalized Trading Volume Percentile (2) Normalized Order Book Depth Percentile)
- Senior Public Figure (risk: does exchange have transparent public relations? etc)
- Tickers (tick tick tick...)
- Trading via API (can data be traded through the API?)
- Websocket (got websockets?)
- Green Pairs (Percentage of trading pairs deemed to have good liquidity)
- Yellow Pairs (Percentage of trading pairs deemed to have fair liquidity)
- Red Pairs (Percentage of trading pairs deemed to have poor liquidity)
- Unknown Pairs (percentage of trading pairs that do not have sufficient order book data)
Again, CoinMarketCap only has one metric (it was recently updated and scales from 1 to 1000, with 1000 being very liquid and 1 not). You can go check the article out for yourself. In the dataset, this is the "CMC Liquidity" column, not to be confused with the "Liquidity" column, which refers to the CoinGecko metric!
Thanks to coingecko and cmc for making their data scrapable :)
[CMC, you should try to give us a little more access to the figures that define your metric. Thanks!]
CoinAPI's crypto OHLCV and trade data give you the complete picture of market activity across more than 350 exchanges worldwide. Our candlestick data covers everything from 1-second intervals for scalping to monthly timeframes for trend analysis, ensuring you have the right level of detail for your trading approach.
Each candlestick provides the essential price information traders rely on - open, high, low, and close prices - along with corresponding volume data that shows the market strength behind each move. This combination of price action and trading volume creates the foundation for effective technical analysis and trading decisions.
Getting this data is straightforward - use our WebSocket streams for real-time market monitoring when every second counts, or access historical candlesticks through our REST API when you're conducting deeper market research or backtesting strategies. We maintain comprehensive historical records, giving you the ability to analyze patterns across different market cycles.
Why work with us?
Market Coverage & Data Types:
- Full Cryptocurrency Data
- Real-time and historical data since 2010 (for chosen assets)
- Full order book depth (L2/L3)
- Tick-by-tick data
- OHLCV across multiple timeframes
- Market indexes (VWAP, PRIMKT)
- Exchange rates with fiat pairs
- Spot, futures, options, and perpetual contracts
- Coverage of 90%+ global trading volume
Technical Excellence:
- 99% uptime guarantee
- Multiple delivery methods: REST, WebSocket, FIX, S3
- Standardized data format across exchanges
- Ultra-low latency data streaming
- Detailed documentation
- Custom integration assistance
Whether you're building algorithmic trading systems, conducting research, or creating visualization tools, our real-time and historical candlesticks from exchanges worldwide provide the reliable market data you need.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Crypto Fear and Greed Index’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/adelsondias/crypto-fear-and-greed-index on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Each day, the website https://alternative.me/crypto/fear-and-greed-index/ publishes this index based on analysis of emotions and sentiments from different sources crunched into one simple number: The Fear & Greed Index for Bitcoin and other large cryptocurrencies.
The crypto market behaviour is very emotional. People tend to get greedy when the market is rising, which results in FOMO (fear of missing out). Also, people often sell their coins in an irrational reaction to seeing red numbers. With our Fear and Greed Index, we try to save you from your own emotional overreactions. There are two simple assumptions:
Therefore, we analyze the current sentiment of the Bitcoin market and crunch the numbers into a simple meter from 0 to 100. Zero means "Extreme Fear", while 100 means "Extreme Greed". See below for further information on our data sources.
We are gathering data from the five following sources. Each data point is valued the same as the day before in order to visualize a meaningful progress in sentiment change of the crypto market.
First of all, the current index is for bitcoin only (we will offer separate indices for large alt coins soon), because a big part of it is the volatility of the coin price.
But let’s list all the different factors we’re including in the current index:
We’re measuring the current volatility and max. drawdowns of bitcoin and comparing them with the corresponding average values of the last 30 days and 90 days. We argue that an unusual rise in volatility is a sign of a fearful market.
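The two quantities named in this factor can be computed from a price series in a few lines. The sketch below is only an illustration under assumptions: volatility is taken as the sample standard deviation of simple period-over-period returns, and maximum drawdown as the largest peak-to-trough decline expressed as a fraction of the peak. The source does not publish the exact definitions or weights the index uses, so these helpers are hypothetical.

```python
import statistics


def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not prices:
        raise ValueError("need at least one price")
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


def volatility(prices):
    """Sample standard deviation of simple period-over-period returns."""
    if len(prices) < 3:
        raise ValueError("need at least three prices (two returns)")
    returns = [later / earlier - 1 for earlier, later in zip(prices, prices[1:])]
    return statistics.stdev(returns)


# Made-up prices, for illustration only.
drawdown = max_drawdown([100.0, 120.0, 90.0, 110.0])
steady = volatility([100.0, 110.0, 121.0])
```

To mirror the comparison described above, one would evaluate both functions on the current window and on the trailing 30-day and 90-day windows and look at the ratios.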
Also, we’re measuring the current volume and market momentum (again in comparison with the last 30/90 day average values) and put those two values together. Generally, when we see high buying volumes in a positive market on a daily basis, we conclude that the market acts overly greedy / too bullish.
While our Reddit sentiment analysis is still not in the live index (we’re still experimenting with some market-related keywords in the text processing algorithm), our Twitter analysis is running. There, we gather and count posts on various hashtags for each coin (publicly, we show only those for Bitcoin) and check how fast and how many interactions they receive in certain time frames. An unusually high interaction rate indicates growing public interest in the coin and, in our eyes, corresponds to greedy market behaviour.
Together with strawpoll.com (disclaimer: we own this site, too), quite a large public polling platform, we’re conducting weekly crypto polls and ask people how they see the market. Usually, we’re seeing 2,000 - 3,000 votes on each poll, so we do get a picture of the sentiment of a group of crypto investors. We don’t give those results too much attention, but it was quite useful in the beginning of our studies. You can see some recent results here.
The dominance of a coin represents its share of the market cap of the whole crypto market. Especially for Bitcoin, we think that a rise in Bitcoin dominance is caused by a fear of (and thus a reduction of) too speculative alt-coin investments, since Bitcoin is becoming more and more the safe haven of crypto. On the other hand, when Bitcoin dominance shrinks, people are getting more greedy by investing in riskier alt-coins, dreaming of their chance in the next big bull run. However, when analyzing the dominance of a coin other than Bitcoin, you could argue the other way round, since more interest in an alt-coin may indicate bullish/greedy behaviour for that specific coin.
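Dominance as described here is simply one coin's market cap divided by the total market cap. A minimal sketch follows; the `dominance` helper, the ticker keys, and the market cap figures are invented for illustration.

```python
def dominance(market_caps, symbol):
    """Share of total market capitalization held by `symbol` (0 to 1)."""
    total = sum(market_caps.values())
    if total <= 0:
        raise ValueError("total market cap must be positive")
    return market_caps[symbol] / total


# Made-up market caps, in arbitrary units.
caps = {"BTC": 600.0, "ETH": 300.0, "OTHER": 100.0}
btc_share = dominance(caps, "BTC")
```

Tracking how `btc_share` changes over time, rather than its level on a single day, is what the paragraph above relates to fear or greed.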
We pull Google Trends data for various Bitcoin-related search queries and crunch those numbers, especially the change in search volumes as well as other recommended, currently popular searches. For example, if you check Google Trends for "Bitcoin", you can’t get much information from the search volume. But you can see that there is currently a +1,550% rise in the query "bitcoin price manipulation" in the box of related search queries (as of 05/29/2018). This is clearly a sign of fear in the market, and we use that for our index.
This dataset is produced and maintained by the administrators of https://alternative.me/crypto/fear-and-greed-index/.
This published version is an unofficial copy of their data, which can be also collected using their API (e.g., GET https://api.alternative.me/fng/?limit=10&format=csv&date_format=us).
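The request quoted above can be assembled programmatically. The sketch below only uses the endpoint and the three query parameters that appear in the example URL; the helper names are made up, and the layout of the returned text is not described in the source, so `fetch_fng` returns it unparsed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.alternative.me/fng/"


def build_fng_url(limit=10, fmt="csv", date_format="us"):
    """Build the query URL using the parameters shown in the example above."""
    query = urlencode({"limit": limit, "format": fmt, "date_format": date_format})
    return f"{BASE_URL}?{query}"


def fetch_fng(limit=10):
    """Download the raw response text (performs a network request)."""
    with urlopen(build_fng_url(limit=limit), timeout=30) as response:
        return response.read().decode("utf-8")


url = build_fng_url()
```

Calling `fetch_fng()` requires network access; inspect the returned text before deciding how to parse it.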
--- Original source retains full ownership of the source dataset ---
Our Market Data covers historical and real-time data. For CEXs, our data goes back to 2015, and for DEXs, coverage starts from the genesis trade. We cover every instrument on any exchange, so if it's traded, we cover it.
We understand you need to access the data you want, when and where you need it. With this in mind, we built our Market Data with several delivery options, including a robust streaming service offering the most advanced live data distribution in the cryptocurrency industry, as well as REST API, CSV via cloud services, and BigQuery.
Our Market Data empowers traders, analysts, and financial institutions with the insights needed to navigate the complex derivatives market effectively.
Use Cases: Backtesting, Hedging, Trading Strategies, Risk Management, Regulatory Compliance
Why work with us?
A proven enterprise-grade solution
We prioritize the needs of enterprises in our product development, ensuring our solutions meet the requirements of larger organizations seeking best-in-class crypto data.
A UI-free approach to crypto data
We recognize the importance of flexibility when it comes to crypto data, and so we offer you complete freedom by taking a UI-free approach to data delivery. This gives you total control over how you use and interpret the data, reducing friction and streamlining workflows.
Flexible to meet your needs
Flexibility lies at the heart of our product and is fundamental to how crypto data can deliver value across industries and use cases. Living this philosophy, we’re always building custom options that can help you achieve your specific objectives. Whether it’s tailoring a package to meet your requirements, or adapting infrastructure to support your use case, our data and product teams are on-hand to help you find the best way to achieve your priority outcomes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3MEth Dataset Overview
Section 1: Token Transactions
This section provides 303 million transaction records from 3,880 tokens and 35 million users on the Ethereum blockchain. The data is stored in 3,880 CSV files, each representing a specific token. Each transaction includes the following information:
- Sender and receiver wallet addresses: Enables network analysis and user behavior studies.
- Token address: Links transactions to specific tokens for token-specific analysis.
- Transaction value: Reflects the number of tokens transferred, essential for liquidity studies.
- Blockchain timestamp: Captures transaction timing for temporal analysis.
Apart from the large dataset, we also provide a smaller CSV file containing 267,242 transaction records from 29,164 wallet addresses. This smaller dataset involves a total of 1,194 tokens, covering the time period September 2016 to November 2023. This detailed transaction data is critical for studying user behavior, liquidity patterns, and tasks such as link prediction and fraud detection.
Section 2: Token Information
This section offers metadata for 3,880 tokens, stored in corresponding CSV files. Each file contains:
- Timestamp: Marks the time of data update.
- Token price: Useful for price prediction and volatility studies.
- Market capitalization: Reflects the token's market size and dominance.
- 24-hour trading volume: Indicates liquidity and trading activity.
Section 3: Global Market Indices
This section provides macro-level data to contextualize token transactions, stored in separate CSV files. Key indicators include:
- Bitcoin dominance: Tracks Bitcoin's share of the cryptocurrency market.
- Total market capitalization: Measures the overall market's value, with breakdowns by token type.
- Stablecoin market capitalization: Highlights stablecoin liquidity and stability.
- 24-hour trading volume: A key measure of market activity.
These indices are essential for integrating global market trends into predictive models for volatility and risk-adjusted returns.
Section 4: Textual Indices
This section contains sentiment data from Reddit's Ethereum community, covering 7,800 top posts from 2014 to 2024. Each post includes:
- Post score (net upvotes): Reflects engagement and sentiment strength.
- Timestamp: Aligns sentiment with price movements.
- Number of comments: Gauges sentiment intensity.
- Sentiment indices: Sentiment scores computed using methods detailed in the data preprocessing section.
The full Reddit textual dataset is available upon request; please contact us for access. Alternatively, our open-source repository includes a tool to guide users in collecting Reddit data. Researchers are encouraged to apply for a Reddit API Key and adhere to Reddit's policies. This data is valuable for understanding social dynamics in the market and enhancing sentiment analysis models that can explain market movements and improve behavioral predictions.
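As a starting point for the network analysis mentioned in Section 1, the sketch below counts how many transfers each wallet sent and received in one token's transaction file. The column names (`from_address`, `to_address`, `token_address`, `value`, `timestamp`) and the sample rows are hypothetical; the actual CSV headers in the dataset may differ and should be checked first.

```python
import csv
import io
from collections import Counter

# Made-up sample in an assumed column layout, for illustration only.
SAMPLE_CSV = """from_address,to_address,token_address,value,timestamp
0xaaa,0xbbb,0xt1,10,1700000000
0xaaa,0xccc,0xt1,5,1700000100
0xbbb,0xccc,0xt1,2,1700000200
"""


def wallet_degrees(csv_file):
    """Count outgoing and incoming transfers per wallet address."""
    out_degree = Counter()
    in_degree = Counter()
    for row in csv.DictReader(csv_file):
        out_degree[row["from_address"]] += 1
        in_degree[row["to_address"]] += 1
    return out_degree, in_degree


out_degree, in_degree = wallet_degrees(io.StringIO(SAMPLE_CSV))
```

For a real file, pass an open file handle instead of the `io.StringIO` sample; the row-by-row loop avoids loading a multi-million-row file into memory at once.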
Get complete Meme Coin Market Data with CoinAPI. Track DOGE, SHIB, BONK, PEPE, and more across 350+ exchanges through our unified API. Explore historical volumes and trades with institutional-grade precision. Discover our Digital Asset Data landscape.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The authors collected this database through the Telegram API over four months. The data are comments from more than eight professional Telegram channels about cryptocurrencies, covering December 2023 to March 2024. Behavioral economics theory holds that people's opinions, especially those of experts, can affect stock market trends (here, cryptocurrency markets). Existing databases often cover tweets or Telegram comments on one or more cryptocurrencies. These databases also pay no attention to the user's expertise, and most of their data is extracted using hashtags. Ignoring user expertise considerably increases the volume of irrelevant data and the share of neutral-polarity content. This database has a main table with eight columns, which are explained in the attached document. Researchers can use this dataset in various machine learning tasks, such as sentiment analysis and deep transfer learning with sentiment analysis. The data can also be used to examine the impact of influencers' opinions on cryptocurrency market trends. Use of this database is permitted provided the source is cited. Furthermore, we have added Python code to extract Telegram comments. For sentiment analysis, we used the HDRB model (https://ieeexplore.ieee.org/document/10292644), which is based on the pre-trained RoBERTa deep neural network and a BiGRU deep neural network with an attention layer.
https://www.archivemarketresearch.com/privacy-policy
Market Analysis for Blockchain API Management Platforms
The global blockchain API management platform market is projected to reach USD XXX million by 2033, exhibiting a CAGR of XX% over the forecast period (2025-2033). The surge in demand for blockchain technology in various industries, such as supply chain, data security, and the Internet of Things, is driving market growth. Additionally, the increasing need for enhanced security and data management capabilities in blockchain applications is fueling the adoption of API management platforms. The market is segmented by type (on-premise and cloud-based) and application (supply chain, data security, smart contracts, and Internet of Things). The cloud-based segment is anticipated to dominate due to its scalability, flexibility, and cost-effectiveness. Key industry players include IBM, Microsoft, Amazon, Oracle, and BlockCypher. Regional analysis reveals that North America currently holds the largest market share, followed by Europe and Asia Pacific. The growing adoption of blockchain in sectors like healthcare, finance, and government is contributing to market expansion in these regions. However, challenges such as data privacy concerns and regulatory uncertainties may hinder market growth in certain regions.
https://www.statsndata.org/how-to-order
The Blockchain API Management Platform market is emerging as a pivotal sector within the rapidly evolving landscape of digital transformation, leveraging blockchain technology to provide secure, efficient, and scalable solutions for various industries. As businesses increasingly seek to integrate blockchain capabili
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The authors collected this database through the Twitter API over eight months. The data are tweets from more than 50 experts on the market analysis of 40 cryptocurrencies. These experts are known as influencers on social networks such as Twitter. Behavioral economics theory holds that people's opinions, especially those of experts, can affect stock market trends (here, cryptocurrency markets). Existing databases often cover tweets related to one or more cryptocurrencies. These databases also pay no attention to the user's expertise, and most of their data is extracted using hashtags. Ignoring user expertise considerably increases the volume of irrelevant data and the share of neutral-polarity content. This database has a main table named "Tweets1" with 11 columns and 40 tables that separate the comments related to each cryptocurrency. The columns of the main table and the cryptocurrency tables are explained in the attached document. Researchers can use this dataset in various machine learning tasks, such as sentiment analysis and deep transfer learning with sentiment analysis. The data can also be used to examine the impact of influencers' opinions on cryptocurrency market trends. Use of this database is permitted provided the source is cited. This version also adds an Excel version of the database and Python code to extract the names of influencers and tweets.
Version 3: In the new version, three datasets of historical prices and sentiments for Bitcoin, Ethereum, and Binance have been added as Excel files, covering January 1, 2023, to June 12, 2023. In addition, two datasets of 52 influential tweets in cryptocurrencies have been published, along with sentiment scores and polarity for more than 300 cryptocurrencies from February 2021 to June 2023. Two Python scripts implementing the tweet sentiment analysis algorithm have also been published.
This algorithm combines the pre-trained RoBERTa deep neural network and a BiGRU deep neural network with an attention layer (see the code Preprocessing_and_sentiment_analysis with python).
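To make the role of an attention layer more concrete, here is a minimal, dependency-free sketch of attention-style pooling: per-token scores are turned into softmax weights, and the token vectors are averaged with those weights. This is not the authors' published code. The function name `attention_pool`, the toy vectors, and the hand-picked scores are assumptions for illustration; in a trained model the vectors would come from the BiGRU and the scores from a learned layer, and the published model may define its attention differently.

```python
import math


def attention_pool(hidden_states, scores):
    """Softmax the scores, then return (weights, weighted sum of hidden_states)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled


# Toy stand-ins for three per-token vectors and their (normally learned) scores.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]
weights, pooled = attention_pool(states, scores)
print(weights)
print(pooled)
```

The pooled vector would then feed a classifier that outputs the sentiment polarity; tokens with higher scores contribute more to that vector.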
Algorithmic trading demands data that's both comprehensive and precise. CoinAPI delivers exactly this - institutional-grade cryptocurrency data spanning 350+ global exchanges through a unified API infrastructure that scales with your trading operation.
For high-frequency strategies where microseconds matter, our trade feeds provide the timestamp precision and delivery consistency required for effective execution. Our platform captures Bitcoin price data alongside 800+ other cryptocurrencies, ensuring complete market coverage for both established and emerging digital assets.
➡️ Why choose us?
📊 Market Coverage & Data Types:
◦ Real-time and historical data since 2010 (for chosen assets)
◦ Full order book depth (L2/L3)
◦ Trade-by-trade data
◦ OHLCV across multiple timeframes
◦ Market indexes (VWAP, PRIMKT)
◦ Exchange rates with fiat pairs
◦ Spot, futures, options, and perpetual contracts
◦ Coverage of 90%+ global trading volume
🔧 Technical Excellence:
◦ 99.9% uptime guarantee
◦ Multiple delivery methods: REST, WebSocket, FIX, S3
◦ Standardized data format across exchanges
◦ Ultra-low latency data streaming
◦ Detailed documentation
◦ Custom integration assistance
Whether you're deploying latency-sensitive algorithms or developing longer-term systematic strategies, CoinAPI provides the reliable data foundation that professional cryptocurrency trading requires. From market microstructure analysis to strategy backtesting, our unified historical and real-time feeds support the complete algorithmic trading lifecycle.