2 datasets found
  1. Cryptocurrency extra data - Ethereum

    • kaggle.com
    Updated Jan 19, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yam Peleg (2022). Cryptocurrency extra data - Ethereum [Dataset]. http://doi.org/10.34740/kaggle/dsv/3066125
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 19, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset u units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

    • VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]
    • Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]
    • Filtering: No filtration of 0 volume data is taken place.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

    License

    This data is being collected automatically from the crypto exchange Binance.

  2. Cryptocurrency extra data - Litecoin

    • kaggle.com
    zip
    Updated Nov 12, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yam Peleg (2021). Cryptocurrency extra data - Litecoin [Dataset]. https://www.kaggle.com/yamqwe/cryptocurrency-extra-data-litecoin
    Explore at:
    zip(246580187 bytes)Available download formats
    Dataset updated
    Nov 12, 2021
    Authors
    Yam Peleg
    Description

    Context:

    This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

    Introduction

    This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

    The Data

    For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.

    
    1. **timestamp** - A timestamp for the minute covered by the row.
    2. **Asset_ID** - An ID code for the cryptoasset.
    3. **Count** - The number of trades that took place this minute.
    4. **Open** - The USD price at the beginning of the minute.
    5. **High** - The highest USD price during the minute.
    6. **Low** - The lowest USD price during the minute.
    7. **Close** - The USD price at the end of the minute.
    8. **Volume** - The number of cryptoasset u units traded during the minute.
    9. **VWAP** - The volume-weighted average price for the minute.
    10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated.
    11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
    12. **Asset_Name** - Human readable Asset name.
    

    Indexing

    The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

    Usage Example

    The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

    Baseline Example Notebooks:

    These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

    Loose-ends:

    This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

    • VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]
    • Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]
    • Filtering: No filtration of 0 volume data is taken place.

    Example Visualisations

    Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

    Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

    License

    This data is being collected automatically from the crypto exchange Binance.

  3. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Yam Peleg (2022). Cryptocurrency extra data - Ethereum [Dataset]. http://doi.org/10.34740/kaggle/dsv/3066125
Organization logo

Cryptocurrency extra data - Ethereum

[Auto Updating] Market data collection for G-Research Crypto forecasting comp

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 19, 2022
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Yam Peleg
Description

Context:

This dataset is an extra updating dataset for the G-Research Crypto Forecasting competition.

Introduction

This is a daily updated dataset, automaticlly collecting market data for G-Research crypto forecasting competition. The data is of the 1-minute resolution, collected for all competition assets and both retrieval and uploading are fully automated. see discussion topic.

The Data

For every asset in the competition, the following fields from Binance's official API endpoint for historical candlestick data are collected, saved, and processed.


1. **timestamp** - A timestamp for the minute covered by the row.
2. **Asset_ID** - An ID code for the cryptoasset.
3. **Count** - The number of trades that took place this minute.
4. **Open** - The USD price at the beginning of the minute.
5. **High** - The highest USD price during the minute.
6. **Low** - The lowest USD price during the minute.
7. **Close** - The USD price at the end of the minute.
8. **Volume** - The number of cryptoasset u units traded during the minute.
9. **VWAP** - The volume-weighted average price for the minute.
10. **Target** - 15 minute residualized returns. See the 'Prediction and Evaluation section of this notebook for details of how the target is calculated.
11. **Weight** - Weight, defined by the competition hosts [here](https://www.kaggle.com/cstein06/tutorial-to-the-g-research-crypto-competition)
12. **Asset_Name** - Human readable Asset name.

Indexing

The dataframe is indexed by timestamp and sorted from oldest to newest. The first row starts at the first timestamp available on the exchange, which is July 2017 for the longest-running pairs.

Usage Example

The following is a collection of simple starter notebooks for Kaggle's Crypto Comp showing PurgedTimeSeries in use with the collected dataset. Purged TimesSeries is explained here. There are many configuration variables below to allow you to experiment. Use either GPU or TPU. You can control which years are loaded, which neural networks are used, and whether to use feature engineering. You can experiment with different data preprocessing, model architecture, loss, optimizers, and learning rate schedules. The extra datasets contain the full history of the assets in the same format as the competition, so you can input that into your model too.

Baseline Example Notebooks:

These notebooks follow the ideas presented in my "Initial Thoughts" here. Some code sections have been reused from Chris' great (great) notebook series on SIIM ISIC melanoma detection competition here

Loose-ends:

This is a work in progress and will be updated constantly throughout the competition. At the moment, there are some known issues that still needed to be addressed:

  • VWAP: - At the moment VWAP calculation formula is still unclear. Currently the dataset uses an approximation calculated from the Open, High, Low, Close, Volume candlesticks. [Waiting for competition hosts input]
  • Target Labeling: There exist some mismatches to the original target provided by the hosts at some time intervals. On all the others - it is the same. The labeling code can be seen here. [Waiting for competition hosts] input]
  • Filtering: No filtration of 0 volume data is taken place.

Example Visualisations

Opening price with an added indicator (MA50): https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fb8664e6f26dc84e9a40d5a3d915c9640%2Fdownload.png?generation=1582053879538546&alt=media" alt="">

Volume and number of trades: https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F2234678%2Fcd04ed586b08c1576a7b67d163ad9889%2Fdownload-1.png?generation=1582053899082078&alt=media" alt="">

License

This data is being collected automatically from the crypto exchange Binance.

Search
Clear search
Close search
Google apps
Main menu