4 datasets found
  1. Cryptocurrency extra data - Litecoin

    • kaggle.com
    Updated Jan 20, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Litecoin [Dataset]. http://doi.org/10.34740/kaggle/dsv/3066229
    Dataset updated
    Jan 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset that automatically collects market data for the G-Research Crypto Forecasting competition. The data has 1-minute resolution and is collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition).
    12. **Asset_Name** - Human-readable asset name.
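
As a rough illustration of the raw quantity behind the Target field, the sketch below computes a 15-minute forward log return from a per-minute Close series. The offsets used (price at t+16 divided by price at t+1) are an assumption based on the competition tutorial, the function name `forward_log_return` is hypothetical, and the residualization against the weighted market return is deliberately left out. It also assumes the rows are contiguous minutes with no gaps.

```python
import numpy as np
import pandas as pd


def forward_log_return(close: pd.Series, horizon: int = 15) -> pd.Series:
    """Return log(P(t + horizon + 1) / P(t + 1)) for a per-minute price series.

    Assumption: rows are contiguous minutes, so shifting by rows equals
    shifting by minutes. The residualization step is NOT performed here.
    """
    # shift(-k) pulls the value from k rows ahead; trailing rows become NaN.
    future = close.shift(-(horizon + 1))
    base = close.shift(-1)
    return np.log(future / base)


# Hypothetical prices growing 1% per minute.
close = pd.Series([100.0 * 1.01 ** i for i in range(20)])
ret = forward_log_return(close)
```

With 20 rows, only the first four rows have a price 16 minutes ahead, so the remaining returns are missing.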
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.
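
A minimal sketch of the indexing described above, assuming the `timestamp` column holds Unix seconds (an assumption; the description does not state the unit). The sample rows and the helper name `index_by_time` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows, deliberately out of order.
raw = pd.DataFrame({
    "timestamp": [1514764920, 1514764860, 1514764980],
    "Asset_ID": [9, 9, 9],
    "Close": [225.2, 225.3, 225.1],
})


def index_by_time(df: pd.DataFrame) -> pd.DataFrame:
    """Convert Unix-second timestamps to UTC datetimes, index by them, sort ascending."""
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"], unit="s", utc=True)
    return out.set_index("timestamp").sort_index()


frame = index_by_time(raw)
```

After this step the oldest minute is the first row, which matches the ordering stated for the dataset.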

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on the SIIM ISIC melanoma detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: At the moment the VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: At some time intervals there are mismatches with the original target provided by the hosts. At all other intervals it is the same. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data takes place.
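
Two of the loose ends above can be sketched in a few lines. The description does not say which approximation the dataset uses for VWAP, so the typical price (High + Low + Close) / 3 below is only a common stand-in and an assumption, not the author's formula. The zero-volume filter is likewise something a user might apply themselves, since the dataset does not. The helper names and the sample bars are hypothetical.

```python
import pandas as pd


def approx_vwap(df: pd.DataFrame) -> pd.Series:
    """Typical-price stand-in for per-minute VWAP (assumed, not the dataset's formula)."""
    return (df["High"] + df["Low"] + df["Close"]) / 3.0


def drop_zero_volume(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only minutes in which at least some volume traded."""
    return df[df["Volume"] > 0]


# Hypothetical candlesticks; the middle row has no trades.
bars = pd.DataFrame({
    "Open": [10.0, 10.5, 10.5],
    "High": [11.0, 10.5, 12.0],
    "Low": [9.0, 10.5, 10.0],
    "Close": [10.0, 10.5, 11.0],
    "Volume": [5.0, 0.0, 2.0],
})
bars["VWAP_approx"] = approx_vwap(bars)
traded = drop_zero_volume(bars)
```

Whether to drop zero-volume minutes depends on the model: removing them breaks the regular one-minute spacing, which matters for any feature computed by shifting rows.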

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media
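
For readers who want to reproduce an indicator like MA50, here is a minimal sketch that adds a 50-row simple moving average of the Open column. It assumes MA50 means a simple (unweighted) moving average over 50 rows; the description does not say which variant or time unit was plotted. The helper name and sample prices are hypothetical.

```python
import pandas as pd


def add_moving_average(df: pd.DataFrame, column: str = "Open", window: int = 50) -> pd.DataFrame:
    """Add a simple moving average column named MA<window> computed over `column`."""
    out = df.copy()
    # rolling(...).mean() yields NaN until a full window of rows is available.
    out[f"MA{window}"] = out[column].rolling(window=window).mean()
    return out


# Hypothetical opening prices 1.0, 2.0, ..., 60.0.
prices = pd.DataFrame({"Open": [float(i) for i in range(1, 61)]})
with_ma = add_moving_average(prices)
```

The first 49 rows have no average because the window is not yet full.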

    License

    This data is being collected automatically from the crypto exchange Binance.

  2. Cryptocurrency extra data - TRON

    • kaggle.com
    Updated Jan 20, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - TRON [Dataset]. http://doi.org/10.34740/kaggle/dsv/3066485
    Dataset updated
    Jan 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset that automatically collects market data for the G-Research Crypto Forecasting competition. The data has 1-minute resolution and is collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition).
    12. **Asset_Name** - Human-readable asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on the SIIM ISIC melanoma detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: At the moment the VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: At some time intervals there are mismatches with the original target provided by the hosts. At all other intervals it is the same. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data takes place.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  3. Cryptocurrency extra data - Ethereum Classic

    • kaggle.com
    Updated Jan 19, 2022
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Ethereum Classic [Dataset]. http://doi.org/10.34740/kaggle/dsv/3066021
    Dataset updated
    Jan 19, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset that automatically collects market data for the G-Research Crypto Forecasting competition. The data has 1-minute resolution and is collected for all competition assets; both retrieval and uploading are fully automated. See the discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15-minute residualized returns. See the 'Prediction and Evaluation' section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition).
    12. **Asset_Name** - Human-readable asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. PurgedTimeSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can feed that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on the SIIM ISIC melanoma detection competition here.

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still need to be addressed:

    • VWAP: At the moment the VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, and Volume candlesticks. [Waiting for competition hosts' input]
    • Target Labeling: At some time intervals there are mismatches with the original target provided by the hosts. At all other intervals it is the same. The labeling code can be seen here. [Waiting for competition hosts' input]
    • Filtering: No filtering of zero-volume data takes place.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media

    License

    This data is being collected automatically from the crypto exchange Binance.

  4. Bitcoin daily (Jul 2010-Mar 2024)

    • kaggle.com
    Updated Mar 20, 2024
    Cite
    Krairy (2024). Bitcoin daily (Jul 2010-Mar 2024) [Dataset]. https://www.kaggle.com/Datasets/Krairy/Bitcoin-Daily-Price-and-Vol-Jul-2010-Mar-2024/discussion
    Dataset updated
    Mar 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Krairy
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The longest Bitcoin price series on Kaggle. Collected from various sources - so you don't have to.

    Open, High, Low, Close prices (in US Dollars) and trading Volume data.

    Is bitcoin a scam or the new gold? Is it a good asset for investments? Can you mine the seasonality patterns? Can you predict the price of bitcoin next year? Would it help to augment this series with exogenous data, for instance, summaries of SEC conferences or Elon Musk's tweets? Can the bitcoin price be handy to predict other events, for instance, the sentiment in the news? Let's find out!

    Sources:

    • 07.2010-09.2014: Investing.com
    • 09.2014-03.2024: Yahoo Finance API (with yfinance)


Cryptocurrency extra data - Litecoin

[Auto Updating] Market data collection for G-Research Crypto forecasting comp

