Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In March 2024, Bitcoin (BTC) reached a new all-time high, with prices exceeding 73,000 USD, a milestone for the cryptocurrency market. The surge followed the approval of spot Bitcoin exchange-traded funds (ETFs) in the United States in January 2024, which let investors gain exposure to Bitcoin without holding it directly. This development increased Bitcoin’s credibility and brought fresh demand from institutional investors, echoing the 2021 price surge, when Tesla announced a 1.5 billion USD investment in Bitcoin and Coinbase was listed on the Nasdaq. In late 2022, Bitcoin fell sharply, to roughly 15,000 USD, after the collapse and bankruptcy of the cryptocurrency exchange FTX shook confidence in the market. By August 2024, Bitcoin had rebounded to approximately 64,178 USD, but it remained volatile amid inflation and interest rate hikes.

Unlike fiat currencies such as the US dollar, Bitcoin has a finite supply: at most 21 million coins will ever exist. By September 2024, over 92 percent of all bitcoins had been mined. Bitcoin’s value is tied to this scarcity, and issuance is governed by halving events, which cut the mining reward in half roughly every four years (every 210,000 blocks), so new coins enter circulation ever more slowly. The April 2024 halving reduced the block reward from 6.25 BTC to 3.125 BTC. The final bitcoin is expected to be mined around 2140. The energy required to mine Bitcoin has drawn criticism for its environmental impact; estimates from 2021 suggested that the Bitcoin network consumed about as much electricity per year as Argentina.

Bitcoin’s future price is difficult to predict, partly because of the influence of large holders known as whales, who reportedly own about 92 percent of all Bitcoin. Whales can cause dramatic market swings with large trades, although retail investors still dominate the market: institutional interest has grown, but it remains a small fraction compared with retail. Bitcoin is also vulnerable to external factors such as regulatory changes and economic crises, leading some to believe it is in a speculative bubble. Others argue that Bitcoin is still in its early stages of adoption and will grow further as more institutions and governments recognize its potential as a hedge against inflation and a store of value.

2024 has also seen growing use of Bitcoin Layer 2 technologies such as the Lightning Network, which improve scalability by enabling faster and cheaper transactions. These innovations are crucial for Bitcoin’s wider adoption, especially for day-to-day payments and cross-border remittances. At the same time, central bank digital currencies (CBDCs) are gaining traction: several governments, including China and the European Union, have accelerated development of their own state-controlled digital currencies, whereas Bitcoin remains decentralized and offers financial sovereignty to those who prefer independence from government control. The rise of CBDCs is expected to increase interest in Bitcoin as a hedge against these centralized currencies.

Bitcoin’s journey in 2024 highlights its growing institutional acceptance alongside its inherent market volatility. While the approval of Bitcoin ETFs has significantly boosted interest, the market remains sensitive to events such as exchange collapses and regulatory decisions. Given its limited supply and improvements in transaction efficiency, Bitcoin is expected to remain a key player in the financial world for years to come. Whether it is currently in a speculative bubble or on a sustainable path to greater adoption will only become clear over time.
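The halving schedule described above is simple integer arithmetic. The sketch below mirrors the subsidy rule used by the Bitcoin reference client (a 50 BTC initial reward, counted in satoshis and right-shifted once every 210,000 blocks). The helper names are illustrative, and the sketch ignores transaction fees and the few blocks whose rewards were never claimed or are unspendable, so the computed cap is a theoretical upper bound rather than the actual circulating supply.

```python
# Bitcoin issuance arithmetic, in satoshis (1 BTC = 100,000,000 satoshis).
INITIAL_SUBSIDY_SATS = 50 * 100_000_000  # 50 BTC per block at launch
HALVING_INTERVAL = 210_000               # blocks between halvings (~4 years at ~10 min/block)


def block_subsidy_sats(height: int) -> int:
    """New-coin reward for the block at `height`, halved by an integer right shift."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:  # mirrors the reference client's guard against oversized shifts
        return 0
    return INITIAL_SUBSIDY_SATS >> halvings


def total_supply_sats() -> int:
    """Upper bound on all coins ever issued: sum the subsidy over every halving era."""
    total = 0
    era = 0
    while (subsidy := INITIAL_SUBSIDY_SATS >> era) > 0:
        total += subsidy * HALVING_INTERVAL
        era += 1
    return total


if __name__ == "__main__":
    for height in (0, 210_000, 420_000, 630_000, 840_000):
        print(height, block_subsidy_sats(height) / 1e8, "BTC")
    print(total_supply_sats() / 1e8, "BTC maximum")
```

Height 840,000 (the April 2024 halving) yields 312,500,000 satoshis, or 3.125 BTC. Summing every era gives 2,099,999,997,690,000 satoshis, slightly under 21 million BTC, because each right shift rounds the subsidy down.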
https://creativecommons.org/publicdomain/zero/1.0/
Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger that records transactions. It allows secure peer-to-peer communication by linking blocks, each of which contains a hash pointer to the previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) that uses the blockchain to store transactions in a distributed manner in order to mitigate flaws in the financial industry.
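The hash-pointer linking described above can be illustrated with a toy chain. This is a simplified sketch, not Bitcoin's actual format: Bitcoin hashes an 80-byte header (including a Merkle root of the transactions) with double SHA-256 and adds proof of work, whereas the hypothetical `Block` class here hashes a JSON serialization with a single SHA-256.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    prev_hash: str  # hash pointer to the previous block
    timestamp: int  # UNIX seconds
    transactions: list = field(default_factory=list)

    def digest(self) -> str:
        # Deterministic serialization so identical contents always hash identically.
        payload = json.dumps(
            {"prev_hash": self.prev_hash, "timestamp": self.timestamp, "transactions": self.transactions},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def build_chain(batches, start_time=1_231_006_505, interval=600):
    """Link each batch of transactions to the digest of the block before it.

    start_time is the genesis block's timestamp, used here only as an example value.
    """
    chain, prev = [], "0" * 64  # the first block has no predecessor
    for i, txs in enumerate(batches):
        block = Block(prev, start_time + i * interval, list(txs))
        chain.append(block)
        prev = block.digest()
    return chain


def is_consistent(chain) -> bool:
    """True if every block's stored pointer matches its predecessor's current digest."""
    return all(cur.prev_hash == prev.digest() for prev, cur in zip(chain, chain[1:]))
```

Editing a transaction in any block except the last changes that block's digest, so the next block's stored pointer no longer matches and `is_consistent` returns `False`. Tampering with the newest block is not caught by this check alone; in Bitcoin, proof of work and network consensus cover that case.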
Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.
In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset, which is updated every 10 minutes. The data can be joined with historical prices in kernels. See similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
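A minimal sketch of such a query with the google-cloud-bigquery client, counting transactions per day. The table and column names (`blocks`, `timestamp`, `transaction_count`) are assumed from the blockchain-etl schema that crypto_bitcoin appears to use; verify them against the table schema before relying on this. Running the query requires the google-cloud-bigquery and pandas packages (recent client versions also need db-dtypes) and Google Cloud credentials.

```python
import datetime as dt

# Assumed table/column names; check the crypto_bitcoin schema before use.
DAILY_TX_SQL = """
SELECT DATE(timestamp) AS day, SUM(transaction_count) AS tx_count
FROM `bigquery-public-data.crypto_bitcoin.blocks`
WHERE DATE(timestamp) BETWEEN @start_date AND @end_date
GROUP BY day
ORDER BY day
"""


def daily_transaction_counts(start_date: dt.date, end_date: dt.date):
    """Run DAILY_TX_SQL and return a pandas DataFrame with columns day, tx_count."""
    # Imported lazily so this module can be loaded without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", start_date),
            bigquery.ScalarQueryParameter("end_date", "DATE", end_date),
        ]
    )
    return client.query(DAILY_TX_SQL, job_config=job_config).to_dataframe()
```

For example, `daily_transaction_counts(dt.date(2024, 1, 1), dt.date(2024, 1, 31))` returns one row per day. Named query parameters (`@start_date`, `@end_date`) are used instead of string formatting so that dates are passed safely.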
Allen Day, Google Cloud Developer Advocate, and Colin Bookman, Google Cloud Customer Engineer, retrieve data from the Bitcoin network using a custom client, available on GitHub, that they built with the bitcoinj Java library. Historical data from the genesis block to 2018-01-31 were loaded in bulk into two BigQuery tables, blocks_raw and transactions. These tables stay fresh because new blocks are appended as they are broadcast to the Bitcoin network. For additional information, see the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".
Photo by Andre Francois on Unsplash.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains daily historical market data for Bitcoin (BTC) priced in USD, spanning about 10 years from the start of the record to 2024-05-01. It includes key financial metrics such as Open, High, Low, Close, Adjusted Close, and Volume. The dataset is well suited for economic analysis, time series modelling, and cryptocurrency research.
This dataset is ideal for:
1. Financial Analysis: analyzing Bitcoin price trends, volatility, and market behaviour over a decade.
2. Time Series Analysis: using historical data to build predictive models for Bitcoin prices.
3. Algorithmic Trading: developing trading strategies and backtesting them.
4. Cryptocurrency Research: studying the adoption and market dynamics of Bitcoin.
5. Data Visualization: creating charts and graphs of Bitcoin’s price history.
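For the volatility analysis in item 1, a common starting point is daily log returns and a rolling, annualized standard deviation. This sketch assumes the CSV has been loaded into a pandas DataFrame with `Date` and `Close` columns; the `Date` column name, the 30-day window, and the √365 annualization factor (crypto markets trade every day) are illustrative assumptions, not properties of the dataset.

```python
import numpy as np
import pandas as pd


def add_return_features(prices: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Add daily log returns and a rolling annualized volatility column."""
    out = prices.sort_values("Date").reset_index(drop=True)
    # ln(C_t / C_{t-1}); the first row has no previous close and becomes NaN.
    out["log_return"] = np.log(out["Close"]).diff()
    # Sample std of the last `window` returns, scaled by sqrt(365) because BTC trades daily.
    out["vol_annualized"] = out["log_return"].rolling(window).std() * np.sqrt(365)
    return out
```

Log returns are used because they add up across days, which makes multi-day aggregation and rolling statistics straightforward.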
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cryptocurrency historical datasets from January 2012 (where available) to October 2021 were obtained and integrated from various sources and application programming interfaces (APIs), including Yahoo Finance, CryptoDataDownload, CoinMarketCap, various Kaggle datasets, and multiple APIs. Because these sources used different time granularities (e.g., minutes, hours, days), the data were integrated at a daily granularity for this research study. The integrated historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more cryptocurrencies, were uploaded to this online Mendeley Data repository. Although market capitalization was the primary criterion for including these cryptocurrencies, a subject matter expert (a professional trader) also guided the initial selection by analyzing indicators such as the Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, the Stochastic Oscillator, and the Ichimoku Cloud. The primary features of this dataset, used as the decision-making criteria of the CLUS-MCDA II approach, are Timestamp, Open, High, Low, Close, Volume (Currency), % Change (7 days and 24 hours), Market Cap, and Weighted Price.
The Excel and CSV files available in this dataset are only part of the integrated data. The other databases, datasets, and API references used in this study are:
[1] https://finance.yahoo.com/
[2] https://coinmarketcap.com/historical/
[3] https://cryptodatadownload.com/
[4] https://kaggle.com/philmohun/cryptocurrency-financial-data
[5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
[6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
[7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
[8] https://min-api.cryptocompare.com/
[9] https://p.nomics.com/cryptocurrency-bitcoin-api
[10] https://www.coinapi.io/
[11] https://www.coingecko.com/en/api
[12] https://cryptowat.ch/
[13] https://www.alphavantage.co/
This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDA II projects:
https://aimaghsoodi.github.io/CLUSMCDA-R-Package/
https://github.com/Aimaghsoodi/CLUS-MCDA-II
https://github.com/azadkavian/CLUS-MCDA
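Integrating sources recorded at minute or hourly granularity into daily rows amounts to OHLCV resampling. The study's own integration code is not described, so this is only a minimal pandas sketch: it assumes the intraday data sit in a DataFrame with a DatetimeIndex and hypothetical `Open`, `High`, `Low`, `Close`, `Volume` columns.

```python
import pandas as pd

# How each OHLCV field combines when several intraday bars collapse into one day.
DAILY_AGG = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}


def to_daily_ohlcv(intraday: pd.DataFrame) -> pd.DataFrame:
    """Resample intraday OHLCV bars (DatetimeIndex) to one row per calendar day."""
    if not isinstance(intraday.index, pd.DatetimeIndex):
        raise TypeError("expected a DatetimeIndex")
    daily = intraday.sort_index().resample("1D").agg(DAILY_AGG)
    # Days with no intraday rows get NaN prices; drop them rather than invent values.
    return daily.dropna(subset=["Open"])
```

Day boundaries follow the index's time zone, so converting every source to UTC first keeps daily rows comparable across APIs.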
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Construction
This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution, the UNIX timestamp. It is built from the blockchain covering the period from 3 January 2009 to 25 January 2021. The blockchain was extracted with the bitcoin-etl Python package (https://github.com/blockchain-etl/bitcoin-etl). The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1], together with the addresses of popular Bitcoin users provided by https://www.walletexplorer.com/.
[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.
Dataset Description
Bitcoin Activity Temporal Coverage: From 3 January 2009 to 25 January 2021
Overview: This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a long time span, from the inception of Bitcoin to recent years. It offers several temporal resolutions and representations to support analysis of the Bitcoin transaction network as a temporal graph. All dates are derived from block UNIX timestamps, in the GMT time zone.
Contents: The dataset is distributed across the following compressed archives. All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries, and can be read with the pyspark Python package.
orbitaal-stream_graph.tar.gz: The root directory is STREAM_GRAPH/. Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (approximately 10 minutes on average). The stream graph is divided into 13 files, one for each year. File format: Parquet. Name format: orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year. These files are in the subdirectory STREAM_GRAPH/EDGES/.
orbitaal-snapshot-all.tar.gz: The root directory is SNAPSHOT/. Contains the snapshot network of all transactions aggregated over the whole dataset period (January 2009 to January 2021). File format: Parquet. Name format: orbitaal-snapshot-all.snappy.parquet. This file is in the subdirectory SNAPSHOT/EDGES/ALL/.
orbitaal-snapshot-year.tar.gz: The root directory is SNAPSHOT/. Contains the yearly-resolution snapshot networks. File format: Parquet. Name format: orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year. These files are in the subdirectory SNAPSHOT/EDGES/year/.
orbitaal-snapshot-month.tar.gz: The root directory is SNAPSHOT/. Contains the monthly-resolution snapshot networks. File format: Parquet. Name format: orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] are the corresponding year and month and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year and month. These files are in the subdirectory SNAPSHOT/EDGES/month/.
orbitaal-snapshot-day.tar.gz: The root directory is SNAPSHOT/. Contains the daily-resolution snapshot networks. File format: Parquet. Name format: orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] are the corresponding year, month, and day and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day. These files are in the subdirectory SNAPSHOT/EDGES/day/.
orbitaal-snapshot-hour.tar.gz: The root directory is SNAPSHOT/. Contains the hourly-resolution snapshot networks. File format: Parquet. Name format: orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] are the corresponding year, month, day, and hour and [ID] is an integer from 1 to N (the number of files), such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour. These files are in the subdirectory SNAPSHOT/EDGES/hour/.
orbitaal-nodetable.tar.gz: The root directory is NODE_TABLE/. Contains two Parquet files: the first gives information about the nodes present in the stream graphs and snapshots, such as their period of activity and associated global Bitcoin balance; the second lists all associated Bitcoin addresses.
Small samples in CSV format: orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv. These two CSV files are stream graph representations covering the Bitcoin halving of 9 July 2016.
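Because the file order is encoded in the names, a small helper can parse, sort, and load the dated files. This is a sketch assuming the archives have been extracted locally and pandas with a Parquet engine (such as pyarrow) is installed; `parse_orbitaal_name`, `sorted_parquet_files`, and `load_edges` are hypothetical helpers, and no edge column names are assumed.

```python
import re
from pathlib import Path

import pandas as pd

# Matches the dated name formats above, e.g.
#   orbitaal-stream_graph-date-2016-file-id-8.snappy.parquet
#   orbitaal-snapshot-date-2016-07-09-13-file-id-1.snappy.parquet
NAME_RE = re.compile(
    r"^orbitaal-(?P<kind>stream_graph|snapshot)-date-"
    r"(?P<date>\d{4}(?:-\d{2}){0,3})-file-id-(?P<id>\d+)\.snappy\.parquet$"
)


def parse_orbitaal_name(name: str) -> tuple:
    """Return (kind, date string, file id) for a dated ORBITAAL Parquet file name."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match["kind"], match["date"], int(match["id"])


def sorted_parquet_files(directory) -> list:
    """Dated Parquet files in `directory`, in increasing [ID] (i.e. chronological) order."""
    files = [p for p in Path(directory).glob("*.snappy.parquet") if NAME_RE.match(p.name)]
    return sorted(files, key=lambda p: parse_orbitaal_name(p.name)[2])


def load_edges(directory) -> pd.DataFrame:
    """Concatenate all dated files in a directory, e.g. SNAPSHOT/EDGES/month/."""
    return pd.concat((pd.read_parquet(p) for p in sorted_parquet_files(directory)), ignore_index=True)
```

For the larger archives (hourly snapshots, the full stream graph), reading with pyspark as suggested above avoids loading everything into memory at once.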
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains historical price data for the top global cryptocurrencies, sourced from Yahoo Finance. The data spans the following time frames for each cryptocurrency:
BTC-USD (Bitcoin): From 2014 to December 2024
ETH-USD (Ethereum): From 2017 to December 2024
XRP-USD (Ripple): From 2017 to December 2024
USDT-USD (Tether): From 2017 to December 2024
SOL-USD (Solana): From 2020 to December 2024
BNB-USD (Binance Coin): From 2017 to December 2024
DOGE-USD (Dogecoin): From 2017 to December 2024
USDC-USD (USD Coin): From 2018 to December 2024
ADA-USD (Cardano): From 2017 to December 2024
STETH-USD (Staked Ethereum): From 2020 to December 2024
Key Features:
Date: The date of the record.
Open: The opening price of the cryptocurrency on that day.
High: The highest price during the day.
Low: The lowest price during the day.
Close: The closing price of the cryptocurrency on that day.
Adj Close: The adjusted closing price, which accounts for stock splits and dividends. Cryptocurrencies have neither, so this value should equal the closing price for every asset in the dataset, not only for stablecoins such as USDT and USDC.
Volume: The trading volume for that day.
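Before modelling, it is worth checking that each row is internally consistent with the field definitions above. A sketch of such checks, assuming a ticker's CSV has been loaded into a pandas DataFrame with the listed column names; `ohlc_issue_counts` is a hypothetical helper and the tolerance is an arbitrary choice.

```python
import pandas as pd


def ohlc_issue_counts(df: pd.DataFrame, tol: float = 1e-9) -> dict:
    """Count rows that violate basic OHLCV invariants."""
    body_high = df[["Open", "Close"]].max(axis=1)
    body_low = df[["Open", "Close"]].min(axis=1)
    checks = {
        # The daily high must be at least the open and the close.
        "high_below_open_or_close": df["High"] < body_high - tol,
        # The daily low must be at most the open and the close.
        "low_above_open_or_close": df["Low"] > body_low + tol,
        # With no splits or dividends, Adj Close should match Close.
        "adj_close_differs_from_close": (df["Adj Close"] - df["Close"]).abs() > tol,
        "negative_volume": df["Volume"] < 0,
        "duplicate_date": df["Date"].duplicated(),
    }
    return {name: int(mask.sum()) for name, mask in checks.items()}
```

A result of all zeros means the rows are consistent; non-zero counts point to rows worth inspecting before computing returns.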
Data Source:
The dataset is sourced from Yahoo Finance and spans daily data from 2014 to December 2024, offering a rich set of data points for cryptocurrency analysis.
Use Cases:
Market Analysis: Analyze price trends and historical market behavior of leading cryptocurrencies.
Price Prediction: Use the data to build predictive models, such as time-series forecasts of future price movements.
Backtesting: Test trading strategies and financial models on historical data.
Volatility Analysis: Assess the volatility of top cryptocurrencies to gauge market risk.
Overview of the Cryptocurrencies in the Dataset:
Bitcoin (BTC): The pioneer cryptocurrency, often referred to as digital gold and used as a store of value.
Ethereum (ETH): A decentralized platform for building smart contracts and decentralized applications (DApps).
Ripple (XRP): A payment protocol focused on fast, low-cost international transfers.
Tether (USDT): A popular stablecoin pegged to the US dollar, providing price stability for trading and transactions.
Solana (SOL): A high-speed blockchain known for low transaction fees and scalability, often seen as a competitor to Ethereum.
Binance Coin (BNB): The native token of Binance, the world's largest cryptocurrency exchange, used for various purposes within the Binance ecosystem.
Dogecoin (DOGE): Initially a meme-inspired coin, Dogecoin has gained a strong community and mainstream popularity.
USD Coin (USDC): A fully backed stablecoin pegged to the US dollar, commonly used in decentralized finance (DeFi) applications.
Cardano (ADA): A proof-of-stake blockchain focused on scalability, sustainability, and security.
Staked Ethereum (STETH): A liquid staking token, issued by Lido, that represents ether staked on Ethereum's proof-of-stake network and accrues staking rewards.
This dataset provides a comprehensive overview of key cryptocurrencies that have shaped and continue to influence the digital asset market. Whether you're conducting research, building prediction models, or analyzing trends, this dataset is an essential resource for understanding the evolution of cryptocurrencies from 2014 to December 2024.
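To compare assets side by side (for example, how closely other coins track BTC), one option is to align the closing prices on date and correlate daily log returns. This sketch assumes each ticker's CSV has been loaded as a pandas Series of `Close` prices indexed by date; `return_correlations` is a hypothetical helper. Because the tickers start in different years, the default keeps only the dates that all assets share, so every pair is compared over the same window.

```python
import numpy as np
import pandas as pd


def return_correlations(closes: dict, common_window: bool = True) -> pd.DataFrame:
    """Pearson correlation of daily log returns for {ticker: Close Series indexed by date}."""
    prices = pd.DataFrame(closes).sort_index()  # outer join on dates
    if common_window:
        prices = prices.dropna()  # keep only dates where every ticker has a price
    log_returns = np.log(prices).diff()
    return log_returns.corr()  # pairwise correlations, skipping NaN rows
```

Stablecoins such as USDT and USDC should show returns close to zero and little correlation with the other assets, which also makes them a quick sanity check on the data.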
Bitcoin's blockchain size was close to 652.93 gigabytes in June 2025, with the database growing by nearly one gigabyte every few days. The Bitcoin blockchain contains a continuously growing, tamper-evident list of all Bitcoin transactions and records since its initial release in January 2009. Bitcoin has a set limit of 21 million coins, the last of which will be mined around 2140, according to a forecast made in 2017.
Bitcoin mining: A somewhat uncharted world
Despite interest in the topic, there are few accurate figures on the size of Bitcoin mining on a country-by-country basis. Bitcoin's design philosophy is at the heart of this. Created as a protest against governments and central banks, Bitcoin's blockchain effectively hides both the country of origin and the destination country of a (mining) transaction. Research based on IP addresses placed the United States as the largest Bitcoin mining country in 2022, but the source admits that IP addresses can easily be masked with a VPN. Note that mining figures differ from figures on Bitcoin trading: Africa and Latin America were more interested in buying and selling BTC than some of the world's developed economies.
Bitcoin developments
Bitcoin's trade volume slowed in the second quarter of 2023, after noticeable growth at the beginning of the year. The coin outperformed most of the market, which some attribute to the June 2023 announcement that BlackRock had filed for a Bitcoin ETF. This iShares Bitcoin Trust was to use Coinbase Custody as its custodian. At that time, regulators in the United States had not yet approved any applications for spot Bitcoin ETFs.
Bitcoin's circulating supply has grown steadily since its inception in 2009, reaching over 19.9 million coins by late July 2025. This gradual increase reflects the cryptocurrency's design, which limits the total number of bitcoins that can ever exist to 21 million. This affects the Bitcoin price to some extent, as scarcity can contribute to market volatility.
Maximum supply and scarcity
Bitcoin differs from many other cryptocurrencies in that its maximum supply is drawing close. By July 2025, more than 90 percent of all possible bitcoins had been created. Even so, the circulating supply is not expected to reach its maximum until around 2140, because the block reward keeps halving. Meanwhile, mining has become increasingly difficult and energy-intensive.
Institutional investors
In 2025, countries such as the United States openly began discussing the possibility of buying bitcoins to hold in reserve. At the time of writing, it was unclear whether this would happen. Nevertheless, institutional investors showed more interest in the cryptocurrency than before; some companies held several thousand bitcoins in 2025, for example. This, together with Bitcoin's limited supply, may further fuel price volatility.
Bitcoin News Daily (2015-2023) - 10 Articles Per Day Sampled
This dataset contains Bitcoin-related news articles collected between 2015 and 2023. Each day has a sample of 10 news articles to ensure coverage and balance across the years.
Dataset Details
Source: The dataset is sourced from various news outlets and covers multiple topics related to Bitcoin, including price, regulation, market trends, and technology. Time Span: The dataset includes articles from 2015 to… See the full description on the dataset page: https://huggingface.co/datasets/tahamajs/bitcoin_news_daily_10.
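The description does not say how the 10 daily articles were chosen. If you want to build a comparable balanced sample from your own article collection, one simple approach is to shuffle the articles and keep at most 10 per calendar day. The column name `published_at` and the helper `sample_per_day` are hypothetical, not fields or tools of this dataset.

```python
import pandas as pd


def sample_per_day(articles: pd.DataFrame, per_day: int = 10,
                   date_col: str = "published_at", seed: int = 0) -> pd.DataFrame:
    """Keep at most `per_day` randomly chosen articles for each calendar day."""
    day = pd.to_datetime(articles[date_col]).dt.normalize()
    # Shuffle once with a fixed seed, then take the first rows of each day.
    shuffled = articles.assign(_day=day).sample(frac=1.0, random_state=seed)
    sampled = shuffled.groupby("_day", sort=True).head(per_day)
    return sampled.sort_values("_day").drop(columns="_day").reset_index(drop=True)
```

Days with fewer than `per_day` articles keep all of them, so the sample stays balanced without inventing or duplicating rows.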
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the Pagerank values and rankings of Bitcoin addresses and transaction IDs (TXID). It contains a total of 1,608,748,675 addresses or TXIDs.
Part 2 is available at https://zenodo.org/deposit/6077428
File format
The dataset is compressed with bzip2 and can be decompressed with the bunzip2 command. Because of its size, it is divided into multiple files. The files are space-delimited plain text files with the following five fields:
Label: An alphanumeric Bitcoin address (e.g. 1DzTCMmWABEDM1rYFL1RgdLyE59jXMzEHV) or a 64-character hexadecimal transaction ID (e.g. 000000000fdf0c619cd8e0d512c7e2c0da5a5808e60f12f1e0d01522d2986a51) Type: String
Label type: Its value is 0 if the label is a transaction ID and 1 if the label is a Bitcoin address. Type: Integer
Rank: Unique Pagerank rank where the ties (addresses having the same Pagerank value) are resolved by sorting the addresses. Type: Integer
Rank with ties: Pagerank rank where the ties (addresses having the same Pagerank value) have the same rank. Type: Integer
Pagerank value: Pagerank of the address and transaction IDs calculated using Pagerank algorithm. Type: Floating-point number
Sample lines:
000000000fdf0c619cd8e0d512c7e2c0da5a5808e60f12f1e0d01522d2986a51 0 427225664 266976712 0.979246
1DzTCMmWABEDM1rYFL1RgdLyE59jXMzEHV 1 1114666798 508037940 0.877961
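A minimal reader for the five space-delimited fields described above, streaming a .bz2 file without decompressing it to disk first. `PagerankRow`, `parse_row`, and `iter_rows` are hypothetical names chosen for this sketch.

```python
import bz2
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class PagerankRow:
    label: str           # Bitcoin address or 64-character TXID
    is_address: bool     # label type 1 = address, 0 = transaction ID
    rank: int            # unique rank (ties broken by sorting)
    rank_with_ties: int  # tied values share a rank
    pagerank: float


def parse_row(line: str) -> PagerankRow:
    label, label_type, rank, rank_ties, value = line.split()
    if label_type not in ("0", "1"):
        raise ValueError(f"unexpected label type: {label_type!r}")
    return PagerankRow(label, label_type == "1", int(rank), int(rank_ties), float(value))


def iter_rows(path) -> Iterator[PagerankRow]:
    """Stream rows from one compressed part file."""
    with bz2.open(path, "rt", encoding="ascii") as fh:
        for line in fh:
            if line.strip():
                yield parse_row(line)
```

Streaming keeps memory use flat, which matters for a dataset of over 1.6 billion rows split across many files.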
"head.txt" contains the first 10 lines of each file. "tail.txt" contains the last 10 lines of each file.
Dataset Generation
The Bitcoin transactions between block 0 (mined on 3 January 2009) and block 713,999 (mined on 13 December 2021) are extracted. A transaction graph is constructed in which Bitcoin addresses and transaction IDs are the nodes and transaction inputs and outputs are the edges. Pagerank is then applied to this transaction graph. The computation is performed with the system presented in the paper 'Parallel analysis of Ethereum blockchain transaction data using cluster computing'.
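The paper's parallel cluster system is not reproduced here. For intuition, here is a single-machine power-iteration PageRank over the same kind of graph (address → transaction for inputs, transaction → address for outputs). The edge directions, the damping factor of 0.85, and the uniform redistribution of rank from dangling nodes are standard textbook choices assumed for this sketch, not details taken from the paper. The scores here sum to 1, so they are not on the same scale as the values in the dataset.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank for a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new[w] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Toy graph: addrA funds tx1, which pays addrB and addrC; addrB then funds tx2, paying addrC.
EDGES = [("addrA", "tx1"), ("tx1", "addrB"), ("tx1", "addrC"), ("addrB", "tx2"), ("tx2", "addrC")]
```

In the toy graph, `addrC` receives value both directly from `tx1` and through `addrB`, so it ends up with a higher score than `addrB`.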
Note
If you use our dataset in your research, please cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0
@article{kilic2022parallel,
title={Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing},
journal={Cluster Computing},
author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and Sen, Alper},
year={2022},
month={Jan}
}
Other Datasets
If you are interested, please also check out our Pagerank Dataset for Ethereum Blockchain.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Litecoin is a cryptocurrency and distributed ledger system (blockchain) that is nearly identical to Bitcoin. Litecoin uses the scrypt algorithm (vs. Bitcoin’s SHA-256) for proof-of-work and has a 2.5-minute block time (vs. Bitcoin’s 10-minute block time); apart from these differences, the two are nearly identical. This dataset contains the blockchain data in their entirety, pre-processed to be human-friendly and to support common use cases such as auditing, investigating, and researching the economic and financial properties of the system. This dataset is part of a larger effort to make cryptocurrency data available in BigQuery through the Google Cloud Public Datasets program. The program hosts several cryptocurrency datasets, with plans both to expand its offerings to additional cryptocurrencies and to reduce the latency of updates. You can find these datasets by searching for "cryptocurrency" in GCP Marketplace. For analytics interoperability, we designed a unified schema that allows all Bitcoin-like datasets to share queries. Interested in learning more about how the data from these blockchains were brought into BigQuery? Looking for more ways to analyze the data? Check out the Google Cloud Big Data blog post and try the sample queries to get started. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier of 1 TB of query processing per month: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
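Because of the unified schema, one query template can run against several Bitcoin-like datasets by changing only the dataset name. As an example, this sketch measures the average block interval, which should come out near 2.5 minutes for Litecoin and near 10 minutes for Bitcoin. The dataset names, the `blocks` table, and its `number` and `timestamp` columns are assumptions about the shared schema; `avg_block_interval_sql` and `run_query` are hypothetical helpers. Running the query requires google-cloud-bigquery, pandas, and Google Cloud credentials.

```python
import datetime as dt

# Datasets assumed to share the unified schema; extend after checking the BigQuery console.
BITCOIN_LIKE_DATASETS = {"crypto_bitcoin", "crypto_litecoin"}


def avg_block_interval_sql(dataset: str, start_date: str) -> str:
    """Build SQL for the mean gap (in minutes) between consecutive blocks since start_date."""
    if dataset not in BITCOIN_LIKE_DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    start = dt.date.fromisoformat(start_date).isoformat()  # rejects malformed dates
    return f"""
SELECT AVG(TIMESTAMP_DIFF(timestamp, prev_timestamp, SECOND)) / 60 AS avg_block_minutes
FROM (
  SELECT timestamp, LAG(timestamp) OVER (ORDER BY number) AS prev_timestamp
  FROM `bigquery-public-data.{dataset}.blocks`
  WHERE DATE(timestamp) >= '{start}'
)
WHERE prev_timestamp IS NOT NULL
"""


def run_query(sql: str):
    """Execute SQL with default credentials and return a pandas DataFrame."""
    from google.cloud import bigquery  # lazy import: only needed when actually querying

    return bigquery.Client().query(sql).to_dataframe()
```

The dataset name is checked against an allow-list and the date is validated before being placed in the SQL string, since table names cannot be passed as query parameters.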
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: This version simply updates the dataset with a new Readme - Supplementary Material file.
The paper describing this dataset can be cited as:
Zilius, K., Spiliotopoulos, T., & van Moorsel, A. (2023). A Dataset of Coordinated Cryptocurrency-Related Social Media Campaigns. Proceedings of the International AAAI Conference on Web and Social Media, 17(1), 1112-1121. https://doi.org/10.1609/icwsm.v17i1.22219
Abstract
The rise in adoption of cryptoassets has brought many new and inexperienced investors into the cryptocurrency space. These investors can be disproportionately influenced by information they receive online, particularly from social media. This paper presents a dataset of crypto-related bounty events and the users who participate in them. These events coordinate social media campaigns to create artificial "hype" around a crypto project in order to influence the price of its token. The dataset consists of information about 15.8K cross-media bounty events, 185K participants, 10M forum comments, and 82M social media URLs collected from the Bounties (Altcoins) subforum of the BitcoinTalk online forum from May 2014 to December 2022. We describe the data collection and data processing methods employed and present a basic characterization of the dataset. Furthermore, we discuss potential research opportunities afforded by the dataset across many disciplines and highlight potential novel insights into how the cryptocurrency industry operates and how it interacts with its audience.
Bitcoin's transaction volume was at its highest in December 2023, when the network processed over ******* coins in a single day. Bitcoin generally has higher on-chain transaction activity than other cryptocurrencies, with the exception of Ethereum, and is often transacted more than *********** times per day. Note that transaction volume here refers to transactions recorded on the Bitcoin blockchain. It should not be confused with Bitcoin's 24-hour trade volume, a metric associated with crypto exchanges.
The more Bitcoin transactions, the more it is used in B2C payments?
A Bitcoin transaction recorded on the blockchain can be any kind of transfer, including B2C but also P2P. While the blockchain shows which address sent Bitcoin to which, details on who the owners are and where they are from are typically missing. Bitcoin was designed to go against monetary authorities and prides itself on anonymity. An important argument against Bitcoin replacing cash or cards in payments is that the cryptocurrency was not built for such a task: Bitcoin ranks among the slowest cryptocurrencies in terms of transaction speed.
Are cryptocurrencies taking over payments?
Cryptocurrency payments are set to grow at a CAGR of nearly ** percent between 2022 and 2029, although the market is relatively small. The forecast comes from a market estimate made in early 2023, based on the conditions and sources available at that time. Research across ** countries during the same period suggested that the market share of cryptocurrency in e-commerce transactions was "less than *** percent" in all surveyed countries, with predictions that this would not change in the future.
https://creativecommons.org/publicdomain/zero/1.0/
This data set contains Bitcoin data for the years 2009-2011. For the years 2011-2018 (~45 GB), please see https://github.com/cakcora/CoinWorks/blob/master/data.MD
We provide input and output edges of transactions. This data is divided into yearly and monthly files. Each year's data is zipped together and contains 12 input edge files and 12 output edge files of transactions that were mined in the blocks of that year/month.
Each line in the input edge file is tab separated with the format:
Unix time of transaction\thash of transaction\thash of first input transaction\tindex of output from first input transaction\thash of second input transaction\tindex of output from second input transaction\t(additional inputs, if exist)\r
Each line in the output edge file is tab separated with the format:
Unix time of transaction\thash of transaction\thash of first output address\tamount of first output bitcoins\thash of second output address\tamount of second output bitcoins\t(additional outputs, if exist)\r
[Figure: Bitcoin Graph, https://user-images.githubusercontent.com/6596905/38154759-80cbf57a-3439-11e8-8d84-9706e5825d5c.png]
Consider the Bitcoin graph in the figure above, where transactions and addresses are shown with rectangles and circles, respectively. This graph would be given in two files: inputsYear_Month.txt and outputsYear_Month.txt. Files would include these lines:
-- inputsYear_Month.txt
UnixTimeOft_1 HashOft_1 HashOft_x1 0 HashOft_x2 8
UnixTimeOft_2 HashOft_2 HashOft_x3 1 HashOft_x4 3 HashOft_x5 0
UnixTimeOft_3 HashOft_3 HashOft_1 1
UnixTimeOft_4 HashOft_4 HashOft_3 2 HashOft_2 0
-- outputsYear_Month.txt
UnixTimeOft_1 HashOft_1 HashOfa_6 10^8 HashOfa_7 0.8*10^8
UnixTimeOft_2 HashOft_2 HashOfa_8 3.8*10^8
UnixTimeOft_3 HashOft_3 HashOfa_9 0.2*10^8 HashOfa_10 0.2*10^8 HashOfa_11 0.3*10^8
UnixTimeOft_4 HashOft_4 HashOfa_12 3.7*10^8 HashOfa_13 0.3*10^8
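The two line formats can be parsed into pairs, which also makes it possible to recompute fees from the example: t3 spends output 1 of t1 (a_7, 0.8 BTC) and creates 0.2 + 0.2 + 0.3 BTC, leaving a 0.1 BTC fee. This sketch assumes amounts are integer satoshis (1 BTC = 10^8, as in the example) and that an output's index is its position in the output line; both are assumptions to check against the real files. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TxInputs:
    time: int
    tx_hash: str
    spent: list  # [(previous tx hash, output index), ...]


@dataclass
class TxOutputs:
    time: int
    tx_hash: str
    outputs: list  # [(address hash, amount in satoshis), ...], in output-index order


def _split(line: str) -> list:
    # Fields are tab separated; lines end with "\r" (plus "\n" when read line by line).
    return line.rstrip("\r\n").split("\t")


def _pairs(fields: list) -> list:
    if len(fields) % 2:
        raise ValueError(f"odd number of pair fields: {fields}")
    return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]


def parse_input_line(line: str) -> TxInputs:
    fields = _split(line)
    return TxInputs(int(fields[0]), fields[1], _pairs(fields[2:]))


def parse_output_line(line: str) -> TxOutputs:
    fields = _split(line)
    return TxOutputs(int(fields[0]), fields[1], _pairs(fields[2:]))


def transaction_fee(tx_hash: str, inputs: dict, outputs: dict) -> int:
    """Spent amounts minus created amounts; KeyError if a spent transaction is not loaded."""
    spent = sum(outputs[prev].outputs[index][1] for prev, index in inputs[tx_hash].spent)
    created = sum(amount for _, amount in outputs[tx_hash].outputs)
    return spent - created
```

Because inputs only reference earlier transactions, computing fees across month boundaries requires loading the output files of previous months as well.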
Please visit the full dataset page for your data related questions.
Consumers in countries in Africa, Asia, and South America were the most likely to own cryptocurrencies such as Bitcoin in 2025. This conclusion comes from combining ** different surveys from Statista's Consumer Insights over the course of that year. In Nigeria, for instance, nearly one in three respondents to Statista's survey said they owned or used a digital coin, compared with *** out of 100 respondents in the United States. This contrasts with a ranking of Bitcoin (BTC) trading volume in ** countries, in which the United States and Russia were said to have traded the largest amounts of this virtual coin. Nevertheless, African and Latin American countries are also notable entries in that ranking.
Daily use, or an investment tool?
The survey asked whether consumers owned or used cryptocurrencies but did not specify their exact use or purpose. Some countries, however, are more likely to use digital currencies on a day-to-day basis. Nigeria increasingly uses mobile money to pay in stores or to send money to family and friends. Polish consumers could buy several types of products with a cryptocurrency in 2019. Vietnam is the opposite case: there, using Bitcoin and other cryptocurrencies as a payment method is forbidden, although owning some form of cryptocurrency as an investment is allowed.
Which countries are more likely to invest in cryptocurrencies?
Professional investors looking for a cryptocurrency-themed ETF were more often found in Europe than in the United States or China, according to a survey in early 2020. Most of the largest crypto hedge fund managers based in Europe in 2020 were located in either the United Kingdom or Switzerland, the country with the highest cryptocurrency adoption rate in Europe according to Statista's Global Consumer Survey. Whether this had changed by 2025 was not yet clear.
This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.
This dataset is updated daily and automatically collects market data for the G-Research crypto forecasting competition. The data has 1-minute resolution and covers all competition assets; both retrieval and uploading are fully automated. See the discussion topic.
For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.
1. **timestamp** - A timestamp for the minute covered by the row.
2. **Asset_ID** - An ID code for the cryptoasset.
3. **Count** - The number of trades that took place this minute.
4. **Open** - The USD price at the beginning of the minute.
5. **High** - The highest USD price during the minute.
6. **Low** - The lowest USD price during the minute.
7. **Close** - The USD price at the end of the minute.
8. **Volume** - The number of cryptoasset units traded during the minute.
9. **VWAP** - The volume-weighted average price for the minute.
10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
12. **Asset_Name** - Human-readable asset name.
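The Target field is only summarized above as "15-minute residualized returns". The sketch below shows one way to compute such a target, assuming our reading of the hosts' tutorial linked in the Weight field: a forward log return R^a(t) = log(P^a(t+16) / P^a(t+1)), a weight-averaged market return M(t), a per-asset beta = ⟨M·R^a⟩ / ⟨M²⟩, and Target = R^a(t) − beta^a·M(t). The hosts estimate beta over a rolling window; for brevity this sketch uses the whole sample instead. All function names are hypothetical, and the definition should be checked against the tutorial before relying on it.

```python
import math
from typing import Dict, List, Optional

HORIZON = 15  # minutes; R(t) = log(P(t+16) / P(t+1)), assumed from the hosts' tutorial


def forward_log_returns(close: List[float]) -> List[Optional[float]]:
    """Forward 15-minute log return per row; None where t+16 runs past the data."""
    out: List[Optional[float]] = []
    for t in range(len(close)):
        if t + HORIZON + 1 < len(close):
            out.append(math.log(close[t + HORIZON + 1] / close[t + 1]))
        else:
            out.append(None)
    return out


def residualized_targets(
    closes: Dict[str, List[float]], weights: Dict[str, float]
) -> Dict[str, List[Optional[float]]]:
    """Target^a(t) = R^a(t) - beta^a * M(t), with a full-sample beta (simplification)."""
    returns = {a: forward_log_returns(c) for a, c in closes.items()}
    n = min(len(r) for r in returns.values())
    total_weight = sum(weights[a] for a in returns)

    # Weighted market return M(t); None wherever any asset lacks a return.
    market: List[Optional[float]] = []
    for t in range(n):
        if any(returns[a][t] is None for a in returns):
            market.append(None)
        else:
            market.append(
                sum(weights[a] * returns[a][t] for a in returns) / total_weight
            )

    targets: Dict[str, List[Optional[float]]] = {}
    for a, r in returns.items():
        pairs = [(market[t], r[t]) for t in range(n) if market[t] is not None]
        denom = sum(m * m for m, _ in pairs)
        # beta^a = <M * R^a> / <M^2>; fall back to 0 if the market never moves.
        beta = sum(m * x for m, x in pairs) / denom if denom > 0 else 0.0
        targets[a] = [
            r[t] - beta * market[t] if market[t] is not None else None
            for t in range(n)
        ]
    return targets
```

If every asset moves identically, each asset's return is fully explained by the market, so its residualized target is (up to floating-point error) zero. That is a quick sanity check for any implementation.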
The dataframe is indexed by timestamp and sorted from oldest to newest.
The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
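Because rows are indexed by timestamp and sorted from oldest to newest, missing minutes for a single asset show up as jumps of more than one minute between consecutive rows. A minimal check is sketched below, assuming timestamps are Unix epoch seconds aligned to whole minutes (as in the competition's training data; verify this for the extra dataset). `find_gaps` is a hypothetical helper, not part of the dataset tooling.

```python
from typing import List, Tuple

MINUTE = 60  # seconds; assumes Unix-second timestamps aligned to whole minutes


def find_gaps(timestamps: List[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) runs for one asset's timestamps.

    Raises ValueError if the series is not strictly increasing, i.e. not
    sorted from oldest to newest or containing duplicate minutes.
    """
    gaps: List[Tuple[int, int]] = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur <= prev:
            raise ValueError(f"timestamps not strictly increasing at {prev} -> {cur}")
        if cur - prev > MINUTE:
            gaps.append((prev + MINUTE, cur - MINUTE))
    return gaps
```

For example, `find_gaps([0, 60, 240, 300])` reports the single missing run `(120, 180)`. Whether to forward-fill such gaps or leave them empty is a modeling choice.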
The following is a collection of simple starter notebooks for Kaggle's crypto competition, showing PurgedTimeSeries in use with the collected dataset. Purged TimeSeries is explained here. There are many configuration variables below to allow you to experiment: you can use either a GPU or a TPU, and control which years are loaded, which neural networks are used, and whether to use feature engineering. You can also experiment with different data preprocessing, model architectures, losses, optimizers, and learning-rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed them into your model too.
These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great (great) notebook series on the SIIM-ISIC melanoma detection competition here.
This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:
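The notebooks rely on a purged time-series split. As a rough illustration of the idea only (this is not the notebooks' implementation), the splitter below walks forward through time-ordered groups, such as days, and drops `gap` groups between each training set and its test block, so that labels whose horizon overlaps the test period (like the 15-minute Target) do not leak into training. `purged_walk_forward_splits` and its parameters are hypothetical.

```python
from typing import Iterator, List, Tuple


def purged_walk_forward_splits(
    n_groups: int, n_splits: int, gap: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_groups, test_groups) over time-ordered group indices.

    Test folds are consecutive, equal-size blocks at the end of the series.
    The `gap` groups immediately before each test block are purged from
    training, and training never includes anything after the test block.
    """
    test_size = n_groups // (n_splits + 1)
    if test_size == 0:
        raise ValueError("not enough groups for the requested number of splits")
    for k in range(n_splits):
        test_start = n_groups - (n_splits - k) * test_size
        test = list(range(test_start, test_start + test_size))
        train_end = max(0, test_start - gap)  # exclusive bound after purging
        train = list(range(train_end))
        yield train, test
```

With minute-level groups, a gap at least as long as the target horizon is the natural minimum; with daily groups, a one-group gap already covers it. scikit-learn's `TimeSeriesSplit` offers a similar `gap` parameter if you prefer a library implementation.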
[Figure: opening price with an added indicator (MA50)]
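The plot above overlays an MA50 indicator on the opening price. Assuming MA50 means a simple (unweighted) moving average over the last 50 rows, which with 1-minute rows is a 50-minute window (the original does not say which kind of moving average is used), it can be computed with a running sum. `simple_moving_average` is a hypothetical helper; pandas users would typically reach for `Series.rolling(50).mean()` instead.

```python
from typing import List, Optional


def simple_moving_average(values: List[float], window: int = 50) -> List[Optional[float]]:
    """Trailing mean over `window` points; None until a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out
```

For example, `simple_moving_average([1, 2, 3, 4, 5], window=3)` gives `[None, None, 2.0, 3.0, 4.0]`.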
[Figure: volume and number of trades]
This data is being collected automatically from the crypto exchange Binance.
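The fields listed above map fairly directly onto a Binance kline (candlestick) record. The sketch below converts one element of a `GET /api/v3/klines` response into a row of this dataset. Two assumptions are labeled in the code: the timestamp is the kline's open time converted from milliseconds to seconds, and VWAP is derived as quote-asset volume divided by base-asset volume, treating a USDT quote as USD. The dataset's collector may compute these differently; `kline_to_row` is a hypothetical name, and Target, Weight, and Asset_Name are not part of the kline and are omitted.

```python
from typing import Dict, List, Union

# Field order of one element of the Binance GET /api/v3/klines response.
KLINE_FIELDS = [
    "open_time", "open", "high", "low", "close", "volume", "close_time",
    "quote_asset_volume", "number_of_trades", "taker_buy_base_volume",
    "taker_buy_quote_volume", "ignore",
]


def kline_to_row(kline: List[Union[int, str]], asset_id: int) -> Dict[str, Union[int, float]]:
    """Map one 1-minute kline to the dataset's columns (sketch, see assumptions)."""
    k = dict(zip(KLINE_FIELDS, kline))
    volume = float(k["volume"])                   # base-asset units traded
    quote_volume = float(k["quote_asset_volume"])  # quote units (e.g. USDT) traded
    return {
        "timestamp": int(k["open_time"]) // 1000,  # assumption: ms -> s, minute start
        "Asset_ID": asset_id,
        "Count": int(k["number_of_trades"]),
        "Open": float(k["open"]),
        "High": float(k["high"]),
        "Low": float(k["low"]),
        "Close": float(k["close"]),
        "Volume": volume,
        # Assumption: VWAP = quote volume / base volume; NaN for minutes with no trades.
        "VWAP": quote_volume / volume if volume > 0 else float("nan"),
    }
```

Binance returns prices and volumes as strings, hence the explicit `float` conversions.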
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Take an in-depth look at Bitcoin Cash addresses with this dataset. It goes beyond the addresses themselves, adding connected transactions, historical balance records, and activity timestamps that help untangle the complexities of the blockchain. The resource is continuously updated and is designed to serve financial analysts, researchers, and blockchain enthusiasts alike.
For further details or inquiries about this dataset, contact us at info@blockchair.com. Our team is available to help you make full use of the data.
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Coinbase Bitcoin (CBBTCUSD) from 2014-12-01 to 2025-08-17 about cryptocurrency and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In March 2024, Bitcoin (BTC) reached a new all-time high, with prices exceeding 73,000 USD, marking a milestone for the cryptocurrency market. The surge was largely attributed to the approval of Bitcoin exchange-traded funds (ETFs) in the United States, which allow investors to gain exposure to Bitcoin without holding it directly. This development increased Bitcoin's credibility and brought fresh demand from institutional investors, echoing earlier price surges in 2021, when Tesla announced its 1.5 billion USD investment in Bitcoin and Coinbase was listed on the Nasdaq. By the end of 2022, Bitcoin had dropped sharply to around 15,000 USD following the collapse and bankruptcy of the cryptocurrency exchange FTX, which caused a loss of confidence in the market. By August 2024, Bitcoin had rebounded to approximately 64,178 USD, but it remained volatile amid inflation and interest rate hikes.
Unlike fiat currencies such as the US dollar, Bitcoin has a finite supply, capped at 21 million coins, and by September 2024 over 92 percent of all Bitcoin had been mined. Bitcoin's value is tied to this scarcity. Its issuance is governed by halving events, which cut the mining reward in half roughly every four years (every 210,000 blocks), making each new coin more costly and energy-intensive to mine. The April 2024 halving reduced the block reward from 6.25 BTC to 3.125 BTC, and the final Bitcoin is expected to be mined around 2140. The energy required to mine Bitcoin has drawn criticism for its environmental impact, with 2021 estimates suggesting that the Bitcoin network consumed as much electricity annually as Argentina.
Bitcoin's future price is difficult to predict, partly because of the influence of large holders known as whales, who own about 92 percent of all Bitcoin. These whales can cause dramatic market swings with large trades, while retail investors still dominate trading activity; although institutional interest has grown, it remains a small fraction compared with retail participation. Bitcoin is also vulnerable to external factors such as regulatory changes and economic crises, leading some to believe it is in a speculative bubble. Others, however, argue that Bitcoin is still in the early stages of adoption and will grow further as more institutions and governments recognize its potential as a hedge against inflation and a store of value.
2024 has also seen the growth of Bitcoin Layer 2 technologies such as the Lightning Network, which improve scalability by enabling faster and cheaper transactions. These innovations are important for Bitcoin's wider adoption, especially for day-to-day payments and cross-border remittances. At the same time, central bank digital currencies (CBDCs) are gaining traction: several governments and blocs, including China and the European Union, have accelerated development of their own state-controlled digital currencies. Bitcoin, by contrast, remains decentralized, offering financial sovereignty to those who prefer independence from government control, and some expect the rise of CBDCs to increase interest in Bitcoin as a hedge against these centralized currencies.
Bitcoin's journey in 2024 highlights both its growing institutional acceptance and its inherent market volatility. While the approval of Bitcoin ETFs has significantly boosted interest, the market remains sensitive to events such as exchange collapses and regulatory decisions. Given its limited supply and improvements in transaction efficiency, many expect Bitcoin to remain a key player in the financial world for years to come. Whether it is currently in a speculative bubble or on a sustainable path to greater adoption will only become clear over time.
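The halving arithmetic described above can be checked with a short script modeled on the block-subsidy rule in Bitcoin Core (`GetBlockSubsidy`): the reward starts at 50 BTC, is halved by a right shift (rounding down to whole satoshis) every 210,000 blocks, and is forced to zero after 64 halvings. Summing the subsidy over every halving era gives the supply cap that is usually rounded to 21 million BTC. The function names below are our own, not Bitcoin Core's.

```python
HALVING_INTERVAL = 210_000          # blocks between halvings
INITIAL_SUBSIDY = 50 * 100_000_000  # 50 BTC expressed in satoshis


def block_subsidy(height: int) -> int:
    """Mining reward in satoshis for a block at the given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:  # guard against undefined shifts, as Bitcoin Core does
        return 0
    return INITIAL_SUBSIDY >> halvings


def total_supply() -> int:
    """Sum of all block subsidies in satoshis (the theoretical supply cap)."""
    total = 0
    height = 0
    while True:
        subsidy = block_subsidy(height)
        if subsidy == 0:
            break
        total += subsidy * HALVING_INTERVAL  # every block in this era pays the same
        height += HALVING_INTERVAL
    return total
```

`block_subsidy(840_000)` returns 312,500,000 satoshis (3.125 BTC), the reward after the April 2024 halving, and `total_supply()` returns 2,099,999,997,690,000 satoshis, just under 21 million BTC because each halving rounds down.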