41 datasets found
  1. Crypto Fear and Greed Index

    • kaggle.com
    zip
    Updated Sep 7, 2022
    Cite
    Adelson de Araujo (2022). Crypto Fear and Greed Index [Dataset]. https://www.kaggle.com/datasets/adelsondias/crypto-fear-and-greed-index
    Available download formats: zip (6461 bytes)
    Authors
    Adelson de Araujo
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Crypto Fear and Greed Index

    Each day, the website https://alternative.me/crypto/fear-and-greed-index/ publishes this index based on analysis of emotions and sentiments from different sources crunched into one simple number: The Fear & Greed Index for Bitcoin and other large cryptocurrencies.

    Why Measure Fear and Greed?

    Crypto market behaviour is very emotional. People tend to get greedy when the market is rising, which results in FOMO (fear of missing out). People also often sell their coins in an irrational reaction to seeing red numbers. With our Fear and Greed Index, we try to save you from your own emotional overreactions. There are two simple assumptions:

    • Extreme fear can be a sign that investors are too worried. That could be a buying opportunity.
    • When investors are getting too greedy, that means the market is due for a correction.

    Therefore, we analyze the current sentiment of the Bitcoin market and crunch the numbers into a simple meter from 0 to 100. Zero means "Extreme Fear", while 100 means "Extreme Greed". See below for further information on our data sources.

    Data Sources

    We are gathering data from the five following sources. Each data point is valued the same as the day before in order to visualize meaningful progress in the sentiment change of the crypto market.

    First of all, the current index is for Bitcoin only (we will offer separate indices for large altcoins soon), because a big part of it is the volatility of the coin price.

    But let’s list all the different factors we’re including in the current index:

    Volatility (25 %)

    We measure the current volatility and maximum drawdowns of Bitcoin and compare them with the corresponding average values of the last 30 and 90 days. We argue that an unusual rise in volatility is a sign of a fearful market.

    Market Momentum/Volume (25%)

    Also, we’re measuring the current volume and market momentum (again in comparison with the last 30/90 day average values) and put those two values together. Generally, when we see high buying volumes in a positive market on a daily basis, we conclude that the market acts overly greedy / too bullish.

    Social Media (15%)

    While our Reddit sentiment analysis is not yet in the live index (we are still experimenting with market-related keywords in the text-processing algorithm), our Twitter analysis is running. There, we gather and count posts on various hashtags for each coin (publicly, we show only those for Bitcoin) and check how fast and how many interactions they receive in certain time frames. An unusually high interaction rate indicates growing public interest in the coin and, in our eyes, corresponds to greedy market behaviour.

    Surveys (15%) currently paused

    Together with strawpoll.com (disclaimer: we own this site, too), quite a large public polling platform, we conduct weekly crypto polls and ask people how they see the market. Usually, we see 2,000 - 3,000 votes on each poll, so we do get a picture of the sentiment of a group of crypto investors. We don't give those results too much attention, but they were quite useful in the beginning of our studies. You can see some recent results here.

    Dominance (10%)

    The dominance of a coin is its market-cap share of the whole crypto market. Especially for Bitcoin, we think that a rise in Bitcoin dominance is caused by a fear of (and thus a reduction in) too-speculative altcoin investments, since Bitcoin is increasingly becoming the safe haven of crypto. On the other side, when Bitcoin dominance shrinks, people are getting greedier, investing in riskier altcoins and dreaming of their chance in the next big bull run. In any case, when analyzing the dominance of a coin other than Bitcoin, you could argue the other way round, since more interest in an altcoin may imply bullish/greedy behaviour for that specific coin.

    Trends (10%)

    We pull Google Trends data for various Bitcoin-related search queries and crunch those numbers, especially the change of search volumes as well as other currently recommended popular searches. For example, if you check Google Trends for "Bitcoin", you can't get much information from the search volume alone. But you can see that there is currently a +1,550% rise in the query "bitcoin price manipulation" in the box of related search queries (as of 05/29/2018). This is clearly a sign of fear in the market, and we use that for our index.
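    The listing gives per-factor weights (25/25/15/15/10/10) but not the aggregation method. The sketch below shows one plausible way the six factor scores could be combined into the single 0-100 meter: a plain weighted average. This is only a guess for illustration; the factor names and the `combine` helper are not part of the published index.

```python
# Hedged sketch: combining six factor scores (each on the 0-100 scale) into
# one 0-100 reading using the weights listed above. The real aggregation
# method is not published; a weighted average is only the obvious candidate.
WEIGHTS = {
    "volatility": 0.25,
    "market_momentum_volume": 0.25,
    "social_media": 0.15,
    "surveys": 0.15,  # currently paused per the listing, but still weighted here
    "dominance": 0.10,
    "trends": 0.10,
}

def combine(scores):
    """Weighted average of per-factor scores; requires every factor present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing factor scores: {missing}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

    With all factors at 50 the meter reads neutral (50.0); with all at 100 it reads "Extreme Greed" (100.0).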


    Copyright disclaimer

    This dataset is produced and maintained by the administrators of https://alternative.me/crypto/fear-and-greed-index/.

    This published version is an unofficial copy of their data, which can be also collected using their API (e.g., GET https://api.alternative.me/fng/?limit=10&format=csv&date_format=us).
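    The example GET request above can be reproduced with a few lines of Python. This is a minimal sketch built only from the endpoint and parameters shown in the listing; the shape of the JSON response is an assumption and should be checked against the live API.

```python
# Sketch: fetching the Fear & Greed history from the API endpoint quoted
# above. Only the URL and query parameters come from the listing; the
# response structure is not documented here and is left unparsed.
from urllib.parse import urlencode
import urllib.request
import json

BASE_URL = "https://api.alternative.me/fng/"

def fng_url(limit=10, date_format="us"):
    """Build the request URL; parameter names mirror the example GET above."""
    return BASE_URL + "?" + urlencode({"limit": limit, "date_format": date_format})

def fetch_fng(limit=10):
    """Download and parse the index history (requires network access)."""
    with urllib.request.urlopen(fng_url(limit)) as resp:
        return json.load(resp)

# Example (network required):
# history = fetch_fng(limit=10)
```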

  2. Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    (2024). Integrated Cryptocurrency Historical Data for a Predictive Data-Driven Decision-Making Algorithm - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/integrated-cryptocurrency-historical-data-for-a-predictive-data-driven-decision-making-algorithm
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cryptocurrency historical datasets from January 2012 (where available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, CryptoDataDownload, CoinMarketCap, various Kaggle datasets, and multiple other APIs. While these datasets used various time resolutions (e.g., minutes, hours, days), a daily format was used in this research study in order to integrate them. The integrated cryptocurrency historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more, were uploaded to this online Mendeley data repository. Although the primary criterion for including the mentioned cryptocurrencies was market capitalization, a subject-matter expert, i.e., a professional trader, also guided the initial selection by analyzing various indicators such as the Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, the Stochastic Oscillator, and the Ichimoku Cloud. The primary features of this dataset, used as the decision-making criteria of the CLUS-MCDA II approach, are Timestamp, Open, High, Low, Close, Volume (Currency), % Change (7 days and 24 hours), Market Cap, and Weighted Price values.
    The available Excel and CSV files in this dataset are just part of the integrated data; the other databases, datasets, and API references used in this study are as follows:
    [1] https://finance.yahoo.com/
    [2] https://coinmarketcap.com/historical/
    [3] https://cryptodatadownload.com/
    [4] https://kaggle.com/philmohun/cryptocurrency-financial-data
    [5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
    [6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
    [7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
    [8] https://min-api.cryptocompare.com/
    [9] https://p.nomics.com/cryptocurrency-bitcoin-api
    [10] https://www.coinapi.io/
    [11] https://www.coingecko.com/en/api
    [12] https://cryptowat.ch/
    [13] https://www.alphavantage.co/
    This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDA II project:
    https://aimaghsoodi.github.io/CLUSMCDA-R-Package/
    https://github.com/Aimaghsoodi/CLUS-MCDA-II
    https://github.com/azadkavian/CLUS-MCDA

  3. Websites using Cryptocurrency Widgets Using Coingecko Api

    • webtechsurvey.com
    csv
    Updated Oct 13, 2025
    Cite
    WebTechSurvey (2025). Websites using Cryptocurrency Widgets Using Coingecko Api [Dataset]. https://webtechsurvey.com/technology/cryptocurrency-widgets-using-coingecko-api
    Available download formats: csv
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Cryptocurrency Widgets Using Coingecko Api technology, compiled through global website indexing conducted by WebTechSurvey.

  4. Daily Updated Global Financial Data(Crypto,Stocks)

    • kaggle.com
    zip
    Updated Oct 6, 2025
    Cite
    Aniket Aher (2025). Daily Updated Global Financial Data(Crypto,Stocks) [Dataset]. https://www.kaggle.com/datasets/theaniketaher/daily-updated-global-financial-datacryptostocks/suggestions
    Available download formats: zip (221672 bytes)
    Authors
    Aniket Aher
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset provides daily snapshots of cryptocurrency, stock market, and forex data.

    Sources

    Yahoo Finance (via yfinance)

    Features

    • Automated daily updates
    • Covers major global indices and top cryptocurrencies
    • Includes sentiment analysis for financial news

    Use Cases

    • Financial market analysis
    • Machine learning for price prediction
    • Trading strategy research

    License

    Data compiled from public APIs for educational and analytical use.

  5. Dataset for Multivariate Bitcoin Price Forecasting.

    • figshare.com
    txt
    Updated Apr 22, 2023
    Cite
    Anny Mardjo; Chidchanok Choksuchat (2023). Dataset for Multivariate Bitcoin Price Forecasting. [Dataset]. http://doi.org/10.6084/m9.figshare.22678540.v1
    Available download formats: txt
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anny Mardjo; Chidchanok Choksuchat
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset was collected for the period between 01/07/2019 and 31/12/2022. The historical Twitter volume was retrieved from bitinfocharts.com using "Bitcoin" (case-insensitive) as the keyword. Google search volume was retrieved using the Gtrends library. 2,000 tweets per day, crawled at four intervals per day, were collected via the Twitter API with the keyword "Bitcoin". The daily closing prices of Bitcoin, the oil price, the gold price, and U.S. stock market indexes (S&P 500, NASDAQ, and Dow Jones Industrial Average) were collected using the R libraries Quantmod or Quandl.

  6. Cryptocurrency extra data - Bitcoin

    • kaggle.com
    zip
    Updated Dec 22, 2021
    + more versions
    Cite
    Yam Peleg (2021). Cryptocurrency extra data - Bitcoin [Dataset]. http://doi.org/10.34740/kaggle/dsv/2957358
    Available download formats: zip (1293027802 bytes)
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset, automatically collecting market data for the G-Research crypto forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human-readable asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
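    As a sketch, restoring the timestamp index described above from one downloaded file might look like this. The file name and the epoch-seconds timestamp unit are assumptions about the uploaded files, not documented facts.

```python
# Hedged sketch: load one asset's candlestick file and rebuild the
# timestamp index (oldest to newest) described in the listing above.
# "btc.csv" and unit="s" are assumptions, not documented facts.
import pandas as pd

def load_asset(path="btc.csv"):
    """Read a candlestick CSV and return it indexed and sorted by timestamp."""
    df = pd.read_csv(path)
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
    return df.set_index("timestamp").sort_index()  # oldest first
```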

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's crypto competition showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

    • VWAP: At the moment the VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There exist some mismatches with the original target provided by the hosts at some time intervals. At all the others it is the same. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of 0-volume data has taken place.
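    The listing says VWAP is approximated from OHLCV candlesticks without giving the formula. One common stand-in, shown below purely as an illustration, is the volume-weighted average of per-minute "typical prices" (H+L+C)/3; this is not the dataset's actual code.

```python
# Hedged sketch of an OHLCV-based VWAP approximation. The dataset's exact
# formula is stated to be unclear; (H+L+C)/3 weighted by volume is a common
# stand-in, used here only for illustration.
def approx_vwap(high, low, close, volume):
    """Volume-weighted average of per-minute typical prices."""
    typical = [(h + l + c) / 3 for h, l, c in zip(high, low, close)]
    total = sum(volume)
    if total == 0:
        return float("nan")  # 0-volume rows are not filtered out upstream
    return sum(t * v for t, v in zip(typical, volume)) / total
```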

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  7. Data Set: Python Crypto Misuses in the Wild

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Anna-Katharina Wickert; Lars Baumgärtner; Florian Breitfelder; Mira Mezini (2023). Data Set: Python Crypto Misuses in the Wild [Dataset]. http://doi.org/10.6084/m9.figshare.16499085.v1
    Available download formats: zip
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Anna-Katharina Wickert; Lars Baumgärtner; Florian Breitfelder; Mira Mezini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Study results and scripts to obtain the results for our paper "Python Crypto Misuses in the Wild" [@akwick @gh0st42 @Breitfelder @miramezini]. The archives in this folder contain the following:
    • evaluations.tar.gz contains the evaluation folder from the GitHub project linked in References.
    • tools.tar.gz contains the tools folder from the GitHub project linked in References.
    • repos-py-with-dep-only-src-files.zip contains the source files and their dependencies of the Python projects analyzed.
    • repos-micropy-with-dep-only-src-files.zip contains the source files and their dependencies of the MicroPython projects analyzed.

  8. BlockDB Canonical Raw Logs (Lineage-Verified) | Ethereum & EVM Chains |...

    • datarade.ai
    Updated Nov 6, 2025
    Cite
    BlockDB (2025). BlockDB Canonical Raw Logs (Lineage-Verified) | Ethereum & EVM Chains | Historical, EOD, Real-Time | Cryptocurrency Data [Dataset]. https://datarade.ai/data-products/blockdb-canonical-raw-logs-lineage-verified-ethereum-ev-blockdb
    Available download formats: .json, .csv, .xls, .parquet
    Dataset authored and provided by
    BlockDB
    Area covered
    Sri Lanka, Estonia, Italy, South Africa, Timor-Leste, Bosnia and Herzegovina, Kazakhstan, Saint Martin (French part), Gambia, Western Sahara
    Description

    Dataset Overview

    Each row represents a unique log emitted during transaction execution:
    • Canonical positioning: (block_number, tx_index, log_index)
    • Emitting contract address
    • Primary event topic (topic_zero)
    • Additional topics (data_topics)
    • Raw event data payload

    All fields are stored exactly as produced by the node, with direct RLP verifiability for topics, data, and contract address.

    Every log includes a deterministic _tracing_id that links the record to its genesis event and upstream transaction, forming the foundation for decoded events, swaps, liquidity, NFT events, and custom protocol decoders in downstream BlockDB products.

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates.

    Schema

    List of columns exactly as delivered:
    • block_number BIGINT – Block number that contains the emitting transaction
    • tx_index INTEGER – Zero-based index of the transaction within the block
    • log_index INTEGER – Zero-based position of the log within the transaction
    • contract_address BYTEA – 20-byte address of the contract that emitted the log
    • topic_zero BYTEA – 32-byte primary topic hash identifying the event type (NULL for anonymous events)
    • data_topics BYTEA[] – Array of additional topics (topics[1..n]), as raw bytes
    • data BYTEA – Raw event data payload as emitted on-chain
    • _tracing_id BYTEA – Deterministic lineage identifier of this log record
    • _created_at TIMESTAMPTZ – Record creation timestamp
    • _updated_at TIMESTAMPTZ – Record last update timestamp

    Notes

    • Primary key: (block_number, tx_index, log_index) guarantees canonical ordering and uniqueness.
    • Foreign key: (block_number, tx_index) links each log directly to its canonical transaction record.
    • Indexes on contract_address, topic_zero, and (contract_address, topic_zero) support fast protocol- or event-specific scans.
    • Binary values can be rendered as hex via encode(column, 'hex') in SQL for display or downstream joins.
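    The hex-rendering tip above has a direct client-side equivalent. A minimal sketch, assuming the database driver returns BYTEA columns as Python bytes:

```python
# Sketch: client-side equivalent of SQL encode(column, 'hex') for BYTEA
# values, with the conventional 0x prefix used for EVM addresses/topics.
# Assumes the driver hands back raw Python bytes.
def to_hex(raw: bytes) -> str:
    """Render raw BYTEA bytes as a 0x-prefixed lowercase hex string."""
    return "0x" + raw.hex()

# Example: a 20-byte contract address renders as 42 characters (0x + 40 hex digits).
```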

    Lineage & Integrity Direct RLP-verifiable fields: contract_address, topic_zero, data_topics, data, and log_index are all directly or indirectly validated against node RLP.

    _tracing_id provides a deterministic, cryptographic handle for each log row, enabling: • Provenance tracking from raw logs to decoded events and higher-level features • Reproducible analytics and signal extraction • Cross-system consistency checks (RPC vs. indexers vs. internal warehouses)

    Common Use Cases • Building decoded event layers (swaps, LP actions, mints, burns, governance events, NFT activity) • Reconstructing DEX activity and liquidity flows directly from raw logs • Protocol-specific analytics (AMMs, lending, perpetuals, bridges, staking) from first principles • Detecting MEV patterns, liquidations, and arbitrage events at log-level resolution

    Quality • Verifiable lineage: deterministic cryptographic hashes per row • Reorg-aware ingestion: continuity and consistency across forks • Complete historical coverage: from chain genesis to present

  9. Cryptocurrency extra data - Ethereum Classic

    • kaggle.com
    zip
    Updated Jan 19, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Ethereum Classic [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-ethereum-classic
    Available download formats: zip (1259913408 bytes)
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition. Its description (introduction, field list, indexing, usage examples, and known issues) is identical to that of the "Cryptocurrency extra data - Bitcoin" entry (dataset 6) above.

    License

    This data is being collected automatically from the crypto exchange Binance.

  10. BlockDB Coins Tokens Details | Ethereum & EVM Chains | Historical, EOD,...

    • datarade.ai
    Updated Jul 14, 2017
    Cite
    BlockDB (2017). BlockDB Coins Tokens Details | Ethereum & EVM Chains | Historical, EOD, Real-Time | Crypto Token Data [Dataset]. https://datarade.ai/data-products/erc20-tokens-details-ethereum-evm-chains-historical-eo-blockdb
    Available download formats: .json, .csv, .xls, .parquet
    Dataset authored and provided by
    BlockDB
    Area covered
    Ascension and Tristan da Cunha, Mauritius, Mali, Finland, Suriname, Portugal, Guinea, Hong Kong, Saint Martin (French part), Timor-Leste
    Description

    Dataset Overview Canonical on-chain token reference for fungible and non-fungible assets, providing unified structure and lineage for every recognized contract. Each row represents a unique token or collection, traceable to its genesis event and ABI-decoded metadata.

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates. Includes both native coins (ETH, BNB, AVAX, etc.) and token contracts (ERC-20, ERC-721, ERC-1155, ERC-4626, custom standards).

    Schema

    List of columns exactly as delivered:
    • contract_address BYTEA - PK; 20-byte contract address
    • block_number BIGINT - first block where the token was recognized
    • block_time TIMESTAMPTZ - UTC timestamp when the block was mined
    • tx_index INTEGER - tx index for that event
    • log_index INTEGER - log index for that event
    • name TEXT - asset name (from ABI or native coin registry)
    • symbol TEXT - token symbol or ticker
    • decimals SMALLINT - number of decimal places for fungible tokens (NULL for NFTs)
    • metadata_uri TEXT - optional field for NFT metadata base URI (if applicable)
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    Notes

    • Use encode(contract_address, 'hex') for hex presentation.
    • Metadata for each token type is retrieved deterministically via ABI decoding or registry sources.
    • If the ABI read was unsuccessful, the token is not present in this table.

    Lineage Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph: • _tracing_id - this row’s identity • _parent_tracing_ids - immediate sources • _genesis_tracing_ids - original on-chain sources This supports audits and exact reprocessing to source transactions/logs/function calls.

    Common Use Cases • Canonical token registry for normalization across DeFi datasets • Symbol, name, decimals lookups for accurate unit scaling in analytics • Cross-chain asset identity resolution • Foundation for NFT, LP token, and vault datasets • Integration layer for pricing engines, wallets, and indexers
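    The "unit scaling" lookup this table supports can be sketched in a few lines: divide a raw on-chain integer amount by 10 to the power of the token's `decimals` value. The values below are illustrative only; using `Decimal` avoids floating-point error for 18-decimal tokens.

```python
# Sketch: using the `decimals` column above to convert a raw integer token
# amount into human-readable units. Example values are hypothetical.
from decimal import Decimal

def scale_amount(raw_amount: int, decimals: int) -> Decimal:
    """Divide the raw amount by 10**decimals without floating-point error."""
    return Decimal(raw_amount) / (Decimal(10) ** decimals)

# For a 6-decimal token, a raw amount of 1_500_000 is 1.5 human units.
```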

    Quality • Each row includes a cryptographic hash linking back to raw on-chain events for auditability. • Tick-level resolution for precision. • Reorg-aware ingestion ensuring data integrity. • Complete backfills to chain genesis for consistency.

  11. Cryptocurrency extra data - Bitcoin Cash

    • kaggle.com
    zip
    Updated Jan 19, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Bitcoin Cash [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-bitcoin-cash
    Available download formats: zip (1253909016 bytes)
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset u units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
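    As a quick illustration of that layout, the sketch below builds a tiny synthetic frame with a subset of the dataset's columns and indexes it by timestamp. Column names follow the field list above; the sample values are invented:

```python
import pandas as pd

# Tiny synthetic frame in the dataset's column layout (values are illustrative only).
rows = [
    {"timestamp": 1514764860, "Asset_ID": 1, "Count": 5,
     "Open": 2.0, "High": 2.2, "Low": 1.9, "Close": 2.1, "Volume": 100.0},
    {"timestamp": 1514764800, "Asset_ID": 1, "Count": 3,
     "Open": 1.9, "High": 2.1, "Low": 1.8, "Close": 2.0, "Volume": 80.0},
]
df = pd.DataFrame(rows)

# Index by timestamp and sort oldest-to-newest, matching the dataset's layout.
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="s")
df = df.set_index("timestamp").sort_index()
print(df.index.is_monotonic_increasing)  # True
```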

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's crypto competition showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. The dataset currently uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There are some mismatches with the original target provided by the hosts at some time intervals; all other intervals match. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.
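    On the VWAP loose-end: one common OHLCV-based approximation is the "typical price" (High + Low + Close) / 3. Whether this matches the dataset's actual formula is unconfirmed, so treat this sketch as illustrative only:

```python
# One common per-minute VWAP approximation from candlesticks alone: the
# "typical price" (High + Low + Close) / 3. This is NOT confirmed to be the
# formula used by the dataset (see the VWAP loose-end above).
def approx_vwap(high: float, low: float, close: float) -> float:
    return (high + low + close) / 3.0

print(approx_vwap(2.2, 1.8, 2.0))  # 2.0
```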

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  12. BlockDB Stablecoins Prices | Level 3 Tick-by-Tick | Ethereum & EVM Chains |...

    • datarade.ai
    BlockDB, BlockDB Stablecoins Prices | Level 3 Tick-by-Tick | Ethereum & EVM Chains | Historical, EOD, Real-Time | Stablecoins Data [Dataset]. https://datarade.ai/data-products/blockdb-stablecoins-prices-level-3-tick-by-tick-ethereum-blockdb
    Explore at:
    .json, .csv, .xls, .parquet
    Dataset authored and provided by
    BlockDB
    Area covered
    Congo, Jersey, Costa Rica, Anguilla, Estonia, Cabo Verde, Kuwait, Switzerland, Guam, Indonesia
    Description

    Dataset Overview The Level-3 Stablecoin dataset provides complete tick-level liquidity grids for all Uniswap V3-style pools containing stablecoin pairs (e.g., USDC/USDT, DAI/USDC). Each snapshot represents the entire liquidity surface, showing the exact executable size, slippage, and price impact across the full tick range (−887272 to +887272).

    Key traits • Focused on USD and major fiat-pegged tokens (USDC, USDT, DAI, FRAX, LUSD, crvUSD, etc.) • Schema-stable, versioned, and audit-ready • Real-time (WSS) and historical/EOD delivery • Built for deep-liquidity and peg-stability analytics

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates. Coverage includes all major DEX protocols holding stablecoin pairs: • Uniswap V2, V3, V4 • Curve, Balancer, Aerodrome, Solidly, Maverick, Pancake, and others

    Schema (columns listed exactly as delivered):
    • snapshot_id BIGINT - unique identifier for each grid snapshot.
    • pool_uid BIGINT - reference to the liquidity pool (liquidity_pools.uid).
    • block_number BIGINT - block number of the originating event.
    • tx_index INTEGER - transaction index within that block.
    • log_index INTEGER - log index within the transaction.
    • token_in BYTEA - 20-byte ERC-20 address of the input token.
    • token_out BYTEA - 20-byte ERC-20 address of the output token.
    • current_price NUMERIC(78,18) - mid-price (token_out per 1 token_in, decimals-adjusted).
    • grid_step_bps SMALLINT - spacing between grid points, in basis points.
    • grid_radius_bps INTEGER - total radius of the grid window, in basis points.
    • grid_points SMALLINT - number of grid points; must equal radius/step + 1.
    • reserves_in_adj NUMERIC(78,18) - adjusted reserve amount of the input token (decimals-normalized).
    • reserves_out_adj NUMERIC(78,18) - adjusted reserve amount of the output token (decimals-normalized).
    • _tracing_id BYTEA - deterministic row-level hash.
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph.
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path).
    • _created_at TIMESTAMPTZ - record creation timestamp.
    • _updated_at TIMESTAMPTZ - record last update timestamp.

    token_to_token_l3_points:
    • snapshot_id BIGINT - reference to the parent snapshot (token_to_token_l3_snapshots.snapshot_id).
    • point_index SMALLINT - sequential index (0 … grid_points − 1).
    • offset_bps_abs INTEGER - absolute offset from the mid-price, in basis points.
    • size_in NUMERIC(78,18) - executable input amount required to reach this offset.
    • size_out NUMERIC(78,18) - corresponding output amount at that offset.
    • price_at_point NUMERIC(78,18) - average price (out / in) including impact.
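    The grid invariant above (grid_points must equal radius/step + 1) can be sanity-checked, and the per-point absolute offsets reconstructed, with a short sketch. The snapshot values here are invented:

```python
# Sanity-check sketch for the grid invariant: grid_points == radius/step + 1.
# Field names follow the schema above; the values are illustrative only.
snapshot = {"grid_step_bps": 5, "grid_radius_bps": 100, "grid_points": 21}

# The radius must be an exact multiple of the step for the grid to close.
assert snapshot["grid_radius_bps"] % snapshot["grid_step_bps"] == 0
expected = snapshot["grid_radius_bps"] // snapshot["grid_step_bps"] + 1
assert snapshot["grid_points"] == expected

# Reconstruct the offset_bps_abs value each grid point should carry.
offsets = [i * snapshot["grid_step_bps"] for i in range(snapshot["grid_points"])]
print(offsets[:4])  # [0, 5, 10, 15]
```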

    Lineage Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph: • _tracing_id - this row’s identity • _parent_tracing_ids - immediate sources • _genesis_tracing_ids - original on-chain sources This supports audits and exact reprocessing to source transactions/logs/function calls.

    Common Use Cases • Peg-stability and liquidity-depth analysis for stablecoins • Execution cost and slippage modeling at tick-level precision • Liquidity surface visualizations for Uniswap V3/V4 pools • Cross-pool liquidity comparisons and routing research • Quantitative modeling for arbitrage, MEV, and impact forecasting

    Quality • Each row includes a cryptographic hash linking back to raw on-chain events for auditability. • Tick-level resolution for precision. • Reorg-aware ingestion ensuring data integrity. • Complete backfills to chain genesis for consistency.

  13. Cryptocurrency extra data - IOTA

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Yam Peleg (2022). Cryptocurrency extra data - IOTA [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-iota
    Explore at:
    zip (1196411839 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily-updated dataset that automatically collects market data for the G-Research Crypto Forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's crypto competition showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. The dataset currently uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There are some mismatches with the original target provided by the hosts at some time intervals; all other intervals match. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  14. BlockDB Liquidity Pools Details & Fees | Ethereum & EVM Chains | Historical,...

    • datarade.ai
    Updated Oct 11, 2025
    BlockDB (2025). BlockDB Liquidity Pools Details & Fees | Ethereum & EVM Chains | Historical, EOD, Real-Time | Decentralized Finance (DeFi) Data [Dataset]. https://datarade.ai/data-products/liquidity-pools-details-ethereum-evm-chains-historical-blockdb
    Explore at:
    .json, .csv, .xls, .parquet
    Dataset updated
    Oct 11, 2025
    Dataset authored and provided by
    BlockDB
    Area covered
    Sint Maarten (Dutch part), Bouvet Island, Mexico, Serbia, Cameroon, Western Sahara, Jamaica, Bhutan, Lebanon, Lithuania
    Description

    Dataset Overview Canonical registry of all DEX liquidity pools, normalized into the BlockDB schema for deterministic cross-chain analysis. Fees are modeled separately as versioned terms, so fee changes over time are tracked precisely and are joinable back to the pool.

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates. Coverage includes: • Uniswap V2, V3, V4 • Balancer V2, PancakeSwap, Solidly, Maverick, Aerodrome, and others

    Schema (columns listed exactly as delivered):

    liquidity_pools (base registry)
    • uid BIGINT NOT NULL - stable pool identifier (derived from address or pool_id)
    • block_number BIGINT - first block where the pool was recognized
    • block_time TIMESTAMPTZ - UTC timestamp when the block was mined
    • tx_index INTEGER - tx index for that event
    • log_index INTEGER - log index for that event
    • contract_address BYTEA NULL - 20-byte pool address (v2/v3-style)
    • pool_id BYTEA NULL - 32-byte pool id (v4-style / manager-based)
    • factory BYTEA NOT NULL - DEX factory / pool manager address
    • type_id INTEGER NOT NULL - pool type FK (constant-product, concentrated, stable/weighted, etc.)
    • pairnum NUMERIC(6) NULL - optional pair ordinal/descriptor
    • tokens BYTEA[] NOT NULL - array of 20-byte token addresses (order matches protocol convention)
    • asset_managers BYTEA[] NULL - per-token managers (e.g., Balancer)
    • amp NUMERIC(6) NULL - amplification for stable/weighted math
    • pool_type TEXT NULL - optional human-readable type label
    • weights NUMERIC(6,5)[] NULL - per-token weights in 0..1 (5 dp)
    • tick_spacing SMALLINT NULL - grid size for concentrated liquidity
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    liquidity_pool_fee_terms (versioned, non-overlapping)
    • pool_uid BIGINT NOT NULL - FK → liquidity_pools(uid)
    • block_number BIGINT - first block at which these fee terms apply
    • block_time TIMESTAMPTZ - UTC timestamp when the block was mined
    • tx_index INTEGER - tx index for that event
    • log_index INTEGER - log index for that event
    • pool_fee NUMERIC(18,18) NOT NULL - total fee fraction (e.g. 0.003 = 0.30%)
    • user_fee_bps SMALLINT NULL - optional user-side fee share (0–10000)
    • protocol_fee_bps SMALLINT NULL - optional protocol-side fee share (0–10000)
    • fee_source TEXT NOT NULL - provenance of fee rate (e.g. onchain:event)
    • fee_share_source TEXT NOT NULL - provenance of fee split (e.g. onchain:param, docs)
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    Checks • At least one identifier present (contract_address or pool_id) and lengths enforced (20B/32B).

    Notes • Fee terms are non-overlapping; each record defines a valid block-range. • Use liquidity_pool_fee_terms for historical fee reconstruction or to obtain the active fee at a given block.
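    A minimal sketch of that lookup, assuming fee terms are versioned by block_number and non-overlapping as the notes state: the fee active at a given block is the most recent term at or before that block. The records below are invented:

```python
# Illustrative fee-term records (pool_uid, block_number, pool_fee are schema
# fields; the values are made up for demonstration).
fee_terms = [
    {"pool_uid": 42, "block_number": 12_000_000, "pool_fee": 0.003},
    {"pool_uid": 42, "block_number": 15_500_000, "pool_fee": 0.0005},
]

def active_fee(terms, block):
    """Return the pool fee in force at `block`: the latest term at or before it."""
    eligible = [t for t in terms if t["block_number"] <= block]
    if not eligible:
        return None  # the pool had no fee terms yet at this block
    return max(eligible, key=lambda t: t["block_number"])["pool_fee"]

print(active_fee(fee_terms, 14_000_000))  # 0.003
print(active_fee(fee_terms, 16_000_000))  # 0.0005
```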

    Lineage Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph: • _tracing_id - this row’s identity • _parent_tracing_ids - immediate sources • _genesis_tracing_ids - original on-chain sources This supports audits and exact reprocessing to source transactions/logs/function calls.

    Common Use Cases • Building the complete DEX pool registry for routing and analytics • Filtering pools by fee, type, or token pair • Integrating with reserves, price, and swap datasets for liquidity intelligence • MEV routing, arbitrage path optimization, and chain-wide pool analytics • Constructing pool-level AI or quantitative features

    Quality • Each row includes a cryptographic hash linking back to raw on-chain events for auditability. • Tick-level resolution for precision. • Reorg-aware ingestion ensuring data integrity. • Complete backfills to chain genesis for consistency.

  15. BlockDB Canonical Raw Transactions (Lineage-Verified) | Ethereum & EVM...

    • datarade.ai
    Updated Nov 6, 2025
    BlockDB (2025). BlockDB Canonical Raw Transactions (Lineage-Verified) | Ethereum & EVM Chains | Historical, EOD, Real-Time | Cryptocurrency Data [Dataset]. https://datarade.ai/data-products/blockdb-canonical-raw-transactions-lineage-verified-ether-blockdb
    Explore at:
    .json, .csv, .xls, .parquet
    Dataset updated
    Nov 6, 2025
    Dataset authored and provided by
    BlockDB
    Area covered
    Guinea, Tunisia, Turks and Caicos Islands, Antarctica, Fiji, Jamaica, Isle of Man, Bhutan, Netherlands, Portugal
    Description

    Dataset Overview Each transaction entry reflects the exact canonical form of the transaction as included in a block, including sender/recipient addresses, gas usage, execution status, transaction type, and (when applicable) created contract addresses.

    All records include a deterministic _tracing_id that links each row to its genesis source event, enabling full reconstruction of execution flow and downstream derivations.

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates.

    Schema (columns listed exactly as delivered):
    • tx_hash BYTEA - 32-byte Keccak-256 transaction hash
    • from_address BYTEA - 20-byte sender address
    • to_address BYTEA - 20-byte recipient address (NULL for contract creation)
    • block_number BIGINT - block number containing the transaction
    • tx_index INTEGER - zero-based index of the transaction within the block
    • created_contract_address BYTEA - 20-byte contract address created by this transaction (if applicable)
    • gas_used BIGINT - gas consumed during transaction execution
    • effective_gas_price_wei NUMERIC(38,0) - final gas price paid in wei
    • status_success BOOLEAN - TRUE if successful, FALSE if reverted, NULL for pre-Byzantium blocks
    • root BYTEA - post-transaction state root for pre-Byzantium blocks (NULL otherwise)
    • tx_type SMALLINT - transaction type (e.g., 0 = legacy, 2 = EIP-1559)
    • _tracing_id BYTEA - deterministic lineage identifier for this transaction
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    Notes • All binary values can be converted to hex via encode(column, 'hex') for readability. • Unique constraint ensures canonical (block_number, tx_index) ordering. • tx_index and block inclusion can be directly verified with canonical RLP.
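    For completeness, the Postgres encode(column, 'hex') conversion mentioned above corresponds to bytes.hex() when BYTEA columns are read into Python. The hash below is an arbitrary illustrative value, not a real transaction:

```python
# A BYTEA tx_hash arrives in Python as bytes; bytes.hex() mirrors Postgres's
# encode(column, 'hex'). The value here is arbitrary, not a real tx hash.
tx_hash = bytes.fromhex("a9" * 32)  # 32-byte value, like the tx_hash column

assert len(tx_hash) == 32  # Keccak-256 hashes are always 32 bytes
print("0x" + tx_hash.hex()[:8] + "...")  # 0xa9a9a9a9...
```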

    Lineage Each tx record includes a deterministic _tracing_id, forming the root lineage reference for all derived BlockDB datasets (swaps, liquidity, and token prices). This ensures verifiable traceability, reproducibility, and proof-of-derivation for every downstream record.

    Common Use Cases • Ground-truth layer for swaps, liquidity, and token analytics • Backtesting models using canonical transaction flow • Gas market analysis (EIP-1559, congestion modeling, MEV impact) • Contract deployment tracking and behavior analysis

    Quality • Verifiable lineage: deterministic cryptographic hashes per row • Reorg-aware ingestion: continuity and consistency across forks • Complete historical coverage: from chain genesis to present

  16. Cryptocurrency extra data - Cardano

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Yam Peleg (2022). Cryptocurrency extra data - Cardano [Dataset]. https://www.kaggle.com/datasets/yamqwe/cryptocurrency-extra-data-cardano/code
    Explore at:
    zip (1254179058 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily-updated dataset that automatically collects market data for the G-Research Crypto Forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's crypto competition showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. The dataset currently uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There are some mismatches with the original target provided by the hosts at some time intervals; all other intervals match. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  17. Cryptocurrency extra data - Monero

    • kaggle.com
    zip
    Updated Jan 20, 2022
    Yam Peleg (2022). Cryptocurrency extra data - Monero [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-monero
    Explore at:
    zip (1204684577 bytes)
    Dataset updated
    Jan 20, 2022
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily-updated dataset that automatically collects market data for the G-Research Crypto Forecasting competition. The data is at 1-minute resolution, collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's crypto competition showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architectures, losses, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition data, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris's great notebook series on the SIIM-ISIC Melanoma Detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: The exact VWAP calculation formula is still unclear. The dataset currently uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: There are some mismatches with the original target provided by the hosts at some time intervals; all other intervals match. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data is performed.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  18. BlockDB Liquidity Pools Details & Fees | Aerodrome Slipstream | Ethereum &...

    • datarade.ai
    Updated Oct 11, 2025
    BlockDB (2025). BlockDB Liquidity Pools Details & Fees | Aerodrome Slipstream | Ethereum & EVM Chains | Historical, EOD, Real-Time | Decentralized Finance (DeFi) Data [Dataset]. https://datarade.ai/data-products/blockdb-liquidity-pools-details-fees-aerodrome-slipstream-blockdb
    Explore at:
    .json, .csv, .xls, .parquet
    Dataset updated
    Oct 11, 2025
    Dataset authored and provided by
    BlockDB
    Area covered
    Kenya, Kiribati, Macao, Saint Martin (French part), Mozambique, Panama, Dominican Republic, Saint Kitts and Nevis, United States of America, Saint Pierre and Miquelon
    Description

    Dataset Overview Canonical registry of all verified Aerodrome Slipstream liquidity pools, normalized into the BlockDB schema for deterministic cross-chain analysis. Fees are modeled separately as versioned terms, so fee changes over time are tracked precisely and are joinable back to the pool.

    This dataset provides the structural backbone for connecting Aerodrome Slipstream pool states with liquidity, price, and swap datasets.

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full backfill to chain genesis with reorg-aware real-time ingestion. Covers all verified Aerodrome Slipstream deployments across supported EVM networks.

    Schema (columns listed exactly as delivered):

    liquidity_pools (base registry)
    • uid BIGINT NOT NULL - stable pool identifier (derived from address or pool_id)
    • block_number BIGINT - first block where the pool was recognized
    • block_time TIMESTAMPTZ - UTC timestamp when the block was mined
    • tx_index INTEGER - tx index for that event
    • log_index INTEGER - log index for that event
    • contract_address BYTEA NULL - 20-byte pool address (v2/v3-style)
    • pool_id BYTEA NULL - 32-byte pool id (v4-style / manager-based)
    • factory BYTEA NOT NULL - DEX factory / pool manager address
    • type_id INTEGER NOT NULL - pool type FK (constant-product, concentrated, stable/weighted, etc.)
    • pairnum NUMERIC(6) NULL - optional pair ordinal/descriptor
    • tokens BYTEA[] NOT NULL - array of 20-byte token addresses (order matches protocol convention)
    • asset_managers BYTEA[] NULL - per-token managers (e.g., Balancer)
    • amp NUMERIC(6) NULL - amplification for stable/weighted math
    • pool_type TEXT NULL - optional human-readable type label
    • weights NUMERIC(6,5)[] NULL - per-token weights in 0..1 (5 dp)
    • tick_spacing SMALLINT NULL - grid size for concentrated liquidity
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    liquidity_pool_fee_terms (versioned, non-overlapping)
    • pool_uid BIGINT NOT NULL - FK → liquidity_pools(uid)
    • block_number BIGINT - first block where these fee terms took effect
    • block_time TIMESTAMPTZ - UTC timestamp when the block was mined
    • tx_index INTEGER - tx index for that event
    • log_index INTEGER - log index for that event
    • pool_fee NUMERIC(18,18) NOT NULL - total fee fraction (e.g. 0.003 = 0.30%)
    • user_fee_bps SMALLINT NULL - optional user-side fee share (0–10,000)
    • protocol_fee_bps SMALLINT NULL - optional protocol-side fee share (0–10,000)
    • fee_source TEXT NOT NULL - provenance of the fee rate (e.g. onchain:event)
    • fee_share_source TEXT NOT NULL - provenance of the fee split (e.g. onchain:param, docs)
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    Checks • At least one identifier present (contract_address or pool_id) and lengths enforced (20B/32B).

    Notes • Fee terms are non-overlapping; each record defines a valid block-range. • Use liquidity_pool_fee_terms for historical fee reconstruction or to obtain the active fee at a given block.
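    Because fee-term records are non-overlapping and each one is valid from its block until the next record begins, the active fee at any block can be found with a point-in-time lookup. A minimal sketch, using hypothetical fee values and block numbers (not real Aerodrome data):

```python
from bisect import bisect_right

# Hypothetical fee-term history for one pool: (start_block, pool_fee) tuples,
# sorted by block number. Each record is valid until the next one begins,
# mirroring the non-overlapping ranges in liquidity_pool_fee_terms.
FEE_TERMS = [
    (17_000_000, 0.003),   # 0.30% starting at block 17,000,000
    (18_250_000, 0.0005),  # lowered to 0.05% later
]

def active_fee(block_number: int) -> float:
    """Return the fee fraction in force at the given block."""
    starts = [b for b, _ in FEE_TERMS]
    i = bisect_right(starts, block_number) - 1
    if i < 0:
        raise ValueError("block precedes the first known fee term")
    return FEE_TERMS[i][1]
```

    The same lookup translates directly into SQL as a range join on block_number against the versioned table.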

    Lineage Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph: • _tracing_id - this row’s identity • _parent_tracing_ids - immediate sources • _genesis_tracing_ids - original on-chain sources This supports audits and exact reprocessing to source transactions/logs/function calls.

    Common Use Cases • Building the complete DEX pool registry for routing and analytics • Filtering pools by fee, type, or token pair • Integrating with reserves, price, and swap datasets for liquidity intelligence • MEV routing, arbitrage path optimization, and chain-wide pool analytics • Constructing pool-level AI or quantitative features

    Quality • Each row includes a cryptographic hash linking back to raw on-chain events for auditability. • Tick-level resolution for precision. • Reorg-aware ingestion ensuring data integrity. • Complete backfills to chain genesis for consistency.

  19.

    BlockDB ERC20 Tokens Details | Ethereum & EVM Chains | Historical, EOD,...

    • datarade.ai
    Updated Jul 14, 2017
    BlockDB (2017). BlockDB ERC20 Tokens Details | Ethereum & EVM Chains | Historical, EOD, Real-Time | Crypto Token Data [Dataset]. https://datarade.ai/data-products/blockdb-erc20-tokens-details-ethereum-evm-chains-histor-blockdb
    Explore at:
    .json, .csv, .xls, .parquet. Available download formats
    Dataset updated
    Jul 14, 2017
    Dataset authored and provided by
    BlockDB
    Area covered
    Guyana, Peru, Sri Lanka, Sweden, Cuba, Kosovo, Suriname, Uganda, Holy See, Mauritius
    Description

    Dataset Overview Canonical ERC-20 token reference with deterministic tracing at the row level. One row per token contract, with audit-grade lineage to the first recognition event and to parent/genesis derivations.

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates.

    Schema Columns are listed exactly as delivered.
    • contract_address BYTEA - PK; 20-byte ERC-20 contract address
    • block_number BIGINT - first block where the token was recognized
    • block_time TIMESTAMPTZ - UTC timestamp when the block was mined
    • tx_index INTEGER - tx index for that event
    • log_index INTEGER - log index for that event
    • name TEXT - asset name
    • symbol TEXT - token symbol or ticker
    • decimals SMALLINT - number of decimal places
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    Notes • Use encode(contract_address,'hex') for hex presentation. • Metadata for each token type is retrieved deterministically via ABI decoding or registry sources. • If the ABI read was unsuccessful, the token is not present in this table.

    Lineage Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph: • _tracing_id - this row’s identity • _parent_tracing_ids - immediate sources • _genesis_tracing_ids - original on-chain sources This supports audits and exact reprocessing to source transactions/logs/function calls.

    Common Use Cases • Token registry to normalize joins for swaps, transfers, pools, and prices • Amount scaling via decimals for analytics, PnL, and model features • App backends: display names/symbols and validate token addresses
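    Amount scaling works the same way for every token in the registry: divide the raw on-chain integer by 10 to the power of the token's decimals column. A minimal sketch (the 6-decimals example value is illustrative, not from the dataset):

```python
from decimal import Decimal

def scale_amount(raw_amount: int, decimals: int) -> Decimal:
    """Convert a raw on-chain integer amount into human-readable token units.

    Decimal avoids the precision loss a float division would introduce
    for 18-decimals tokens.
    """
    return Decimal(raw_amount) / (Decimal(10) ** decimals)

# Example: 1_500_000 raw units of a 6-decimals token correspond to 1.5 tokens.
```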

    Quality • Each row includes a cryptographic hash linking back to raw on-chain events for auditability. • Tick-level resolution for precision. • Reorg-aware ingestion ensuring data integrity. • Complete backfills to chain genesis for consistency.

  20.

    BlockDB Stablecoins Prices | Level 2 Tick-by-Tick | Ethereum & EVM Chains |...

    • datarade.ai
    Updated Oct 10, 2025
    BlockDB (2025). BlockDB Stablecoins Prices | Level 2 Tick-by-Tick | Ethereum & EVM Chains | Historical, EOD, Real-Time | Stablecoins Data [Dataset]. https://datarade.ai/data-products/blockdb-stablecoins-prices-level-2-tick-by-tick-ethereum-blockdb
    Explore at:
    .json, .csv, .xls, .parquet. Available download formats
    Dataset updated
    Oct 10, 2025
    Dataset authored and provided by
    BlockDB
    Area covered
    Ukraine, United States of America, Japan, Gabon, Togo, Saint Kitts and Nevis, Sierra Leone, Philippines, Netherlands, Jersey
    Description

    Dataset Overview The Level-2 (L2) stablecoin liquidity impact dataset provides quantized liquidity grids around the mid-price (typically ±10%) for all major USD-pegged stablecoin pairs. Each row represents a depth snapshot at a defined basis-point spacing (e.g. 10 bps), describing how much stablecoin can be executed before the price deviates by a certain percentage.

    These grids allow analysts and developers to model depth curves, slippage, and peg resilience with high precision, fully reproducible from on-chain data.

    Key traits • Focused on USD and major fiat-pegged tokens (USDC, USDT, DAI, FRAX, LUSD, crvUSD, etc.) • Schema-stable, versioned, and audit-ready • Real-time (WSS) and historical/EOD delivery • Fully verifiable lineage back to pool events and raw on-chain logs

    Chains and Coverage ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request). Full history from chain genesis; reorg-aware real-time ingestion and updates. Coverage includes all major DEX protocols holding stablecoin pairs: • Uniswap V2, V3, V4 • Curve, Balancer, Aerodrome, Solidly, Maverick, Pancake, and others

    Schema Columns are listed exactly as delivered.

    token_to_token_l2_snapshots
    • snapshot_id BIGINT - unique identifier for each grid snapshot
    • pool_uid BIGINT - reference to the liquidity pool (liquidity_pools.uid)
    • block_number BIGINT - block number of the originating event
    • tx_index INTEGER - transaction index within that block
    • log_index INTEGER - log index within the transaction
    • token_in BYTEA - 20-byte ERC-20 address of the input token
    • token_out BYTEA - 20-byte ERC-20 address of the output token
    • current_price NUMERIC(78,18) - mid-price (token_out per 1 token_in, decimals-adjusted)
    • grid_step_bps SMALLINT - spacing between grid points, in basis points
    • grid_radius_bps INTEGER - total radius of the grid window, in basis points
    • grid_points SMALLINT - number of grid points; must equal radius/step + 1
    • _tracing_id BYTEA - deterministic row-level hash
    • _parent_tracing_ids BYTEA[] - hash(es) of immediate parent rows in the derivation graph
    • _genesis_tracing_ids BYTEA[] - hash(es) of original sources (genesis of the derivation path)
    • _created_at TIMESTAMPTZ - record creation timestamp
    • _updated_at TIMESTAMPTZ - record last update timestamp

    token_to_token_l2_points
    • snapshot_id BIGINT - reference to the parent snapshot (token_to_token_l2_snapshots.snapshot_id)
    • point_index SMALLINT - sequential index (0 … grid_points − 1)
    • offset_bps_abs INTEGER - absolute offset from the mid-price, in basis points
    • size_in NUMERIC(78,18) - executable input amount required to reach this offset
    • size_out NUMERIC(78,18) - corresponding output amount at that offset
    • price_at_point NUMERIC(78,18) - average price (out / in) including impact

    (For hex display: encode(token_in,'hex'), encode(token_out,'hex').)
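    The grid relationships above can be sketched directly from the column definitions: the point count follows from radius and step, price_at_point is the average execution price out/in, and the deviation of that average from the mid-price is the impact in basis points. A minimal sketch with made-up numbers, not dataset values:

```python
def expected_grid_points(grid_radius_bps: int, grid_step_bps: int) -> int:
    """Invariant from the schema: grid_points must equal radius/step + 1."""
    assert grid_radius_bps % grid_step_bps == 0, "radius must be a multiple of step"
    return grid_radius_bps // grid_step_bps + 1

def price_at_point(size_in: float, size_out: float) -> float:
    """Average execution price (out / in) including impact."""
    return size_out / size_in

def impact_bps(current_price: float, avg_price: float) -> float:
    """Deviation of a fill's average price from the mid-price, in basis points."""
    return (current_price - avg_price) / current_price * 10_000

# A ±10% window at 10 bps spacing yields 101 points per side direction.
```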

    Lineage Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph: • _tracing_id - this row’s identity • _parent_tracing_ids - immediate sources • _genesis_tracing_ids - original on-chain sources This supports audits and exact reprocessing to source transactions/logs/function calls.

    Common Use Cases • Stablecoin peg analysis through liquidity depth modeling • Execution algorithm calibration (impact-aware order sizing) • Market-making and slippage profiling • Cross-chain peg risk and liquidity fragmentation studies • MEV and arbitrage signal extraction • AI/ML feature generation for depeg and volatility prediction

    Quality • Each row includes a cryptographic hash linking back to raw on-chain events for auditability. • Tick-level resolution for precision. • Reorg-aware ingestion ensuring data integrity. • Complete backfills to chain genesis for consistency.

Adelson de Araujo (2022). Crypto Fear and Greed Index [Dataset]. https://www.kaggle.com/datasets/adelsondias/crypto-fear-and-greed-index

Crypto Fear and Greed Index

The Fear & Greed Index for Bitcoin and other cryptocurrencies (alternative.me).

Explore at:
121 scholarly articles cite this dataset (Google Scholar).
zip (6461 bytes). Available download formats
Dataset updated
Sep 7, 2022
Authors
Adelson de Araujo
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Crypto Fear and Greed Index

Each day, the website https://alternative.me/crypto/fear-and-greed-index/ publishes this index based on analysis of emotions and sentiments from different sources crunched into one simple number: The Fear & Greed Index for Bitcoin and other large cryptocurrencies.

Why Measure Fear and Greed?

Crypto market behaviour is highly emotional. People tend to get greedy when the market is rising, which results in FOMO (fear of missing out), and they often sell their coins in an irrational reaction to seeing red numbers. With our Fear and Greed Index, we try to save you from your own emotional overreactions. There are two simple assumptions:

  • Extreme fear can be a sign that investors are too worried. That could be a buying opportunity.
  • When investors are getting too greedy, the market is due for a correction.

Therefore, we analyze the current sentiment of the Bitcoin market and crunch the numbers into a simple meter from 0 to 100. Zero means "Extreme Fear", while 100 means "Extreme Greed". See below for further information on our data sources.

Data Sources

We gather data from the following sources. Each data point is weighted the same as the day before, in order to visualize meaningful progress in the sentiment change of the crypto market.

First of all, note that the current index is for Bitcoin only (we will offer separate indices for large altcoins soon), because a big part of it is the volatility of the coin price.

But let’s list all the different factors we’re including in the current index:

Volatility (25 %)

We measure the current volatility and maximum drawdowns of Bitcoin and compare them with the corresponding average values of the last 30 and 90 days. We argue that an unusual rise in volatility is a sign of a fearful market.

Market Momentum/Volume (25%)

We also measure current volume and market momentum (again compared with the 30/90-day averages) and combine the two values. Generally, when we see high buying volumes in a positive market on a daily basis, we conclude that the market is acting overly greedy / too bullish.

Social Media (15%)

While our Reddit sentiment analysis is still not in the live index (we're still experimenting with market-related keywords in the text-processing algorithm), our Twitter analysis is running. There, we gather and count posts on various hashtags for each coin (publicly, we show only those for Bitcoin) and check how fast and how many interactions they receive in certain time frames. An unusually high interaction rate indicates growing public interest in the coin and, in our eyes, corresponds to greedy market behaviour.

Surveys (15%) currently paused

Together with strawpoll.com (disclaimer: we own this site, too), quite a large public polling platform, we conduct weekly crypto polls and ask people how they see the market. Usually, we see 2,000–3,000 votes on each poll, so we do get a picture of the sentiment of a group of crypto investors. We don't give those results too much weight, but they were quite useful in the beginning of our studies.

Dominance (10%)

The dominance of a coin represents its share of the total crypto market cap. Especially for Bitcoin, we think that a rise in Bitcoin dominance is caused by fear of (and thus a reduction in) overly speculative alt-coin investments, since Bitcoin is increasingly becoming the safe haven of crypto. On the other hand, when Bitcoin dominance shrinks, people are getting greedier, investing in riskier alt-coins and dreaming of their chance in the next big bull run. That said, when analyzing the dominance of a coin other than Bitcoin, you could argue the other way round, since rising interest in an alt-coin may signal bullish/greedy behaviour for that specific coin.

Trends (10%)

We pull Google Trends data for various Bitcoin-related search queries and crunch those numbers, looking especially at changes in search volume as well as other currently popular related searches. For example, if you check Google Trends for "Bitcoin", you can't get much information from the search volume alone. But you can see that there is currently a +1,550% rise in the query "bitcoin price manipulation" in the box of related search queries (as of 05/29/2018). This is clearly a sign of fear in the market, and we use that for our index.
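The factor weights above (25% + 25% + 15% + 15% + 10% + 10%) sum to one, so the index is a plain weighted average of the component scores. A minimal sketch, using made-up component scores; the actual alternative.me scoring of each component is not published:

```python
# Weights as stated in the factor list above; each component score is 0-100.
WEIGHTS = {
    "volatility": 0.25,
    "momentum_volume": 0.25,
    "social_media": 0.15,
    "surveys": 0.15,  # currently paused
    "dominance": 0.10,
    "trends": 0.10,
}

def composite_index(scores: dict[str, float]) -> float:
    """Weighted average of component scores, yielding one 0-100 meter value."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
```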


Copyright disclaimer

This dataset is produced and maintained by the administrators of https://alternative.me/crypto/fear-and-greed-index/.

This published version is an unofficial copy of their data, which can be also collected using their API (e.g., GET https://api.alternative.me/fng/?limit=10&format=csv&date_format=us).
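The same request can be constructed programmatically; the endpoint and query parameters below are taken verbatim from the example GET call above:

```python
from urllib.parse import urlencode

def fng_url(limit: int = 10, fmt: str = "csv", date_format: str = "us") -> str:
    """Build the alternative.me Fear & Greed API URL for the given parameters."""
    base = "https://api.alternative.me/fng/"
    return base + "?" + urlencode(
        {"limit": limit, "format": fmt, "date_format": date_format}
    )
```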
