6 datasets found
  1. Encrypted Stock Market Data from Numerai

    • kaggle.com
    zip
    Updated Dec 2, 2016
    Cite
    Numerai (2016). Encrypted Stock Market Data from Numerai [Dataset]. https://www.kaggle.com/numerai/encrypted-stock-market-data-from-numerai
    Available download formats
    zip (16033675 bytes)
    Dataset updated
    Dec 2, 2016
    Authors
    Numerai
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This is a sample of the training data used in the Numerai machine learning competition. https://numer.ai/about

    Content

    The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
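As a minimal sketch of the layout described above, the following splits each row into the 21 feature columns and the binary target using only the standard library. The two sample rows and their values are made up for illustration; the column names follow the description (feature1 to feature21, target), and the name of the real file inside the zip is not stated here.

```python
import csv
import io

FEATURE_COLUMNS = [f"feature{i}" for i in range(1, 22)]  # feature1 .. feature21
TARGET_COLUMN = "target"


def split_features_target(csv_file):
    """Read rows and return (X, y): 21 floats and one 0/1 label per row."""
    X, y = [], []
    for row in csv.DictReader(csv_file):
        X.append([float(row[name]) for name in FEATURE_COLUMNS])
        y.append(int(row[TARGET_COLUMN]))
    return X, y


# Hypothetical two-row sample standing in for the real training file.
header = ",".join(FEATURE_COLUMNS + [TARGET_COLUMN])
rows = [
    ",".join(["0.5"] * 21 + ["1"]),
    ",".join(["0.25"] * 21 + ["0"]),
]
sample = io.StringIO("\n".join([header] + rows))

X, y = split_features_target(sample)
print(len(X), len(X[0]), y)
```

To use it on the real data, pass an open file handle for the extracted CSV instead of the in-memory sample.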

    Goal

    We want to see what the Kaggle community will produce with this dataset using Kernels.

  2. Global Stock Market Data 2003-2023 Numerai Signals

    • kaggle.com
    zip
    Updated Jun 6, 2023
    Cite
    Joakim Arvidsson (2023). Global Stock Market Data 2003-2023 Numerai Signals [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/yfinance-global-stock-data-2003-23-numerai-signals
    Available download formats
    zip (188001740 bytes)
    Dataset updated
    Jun 6, 2023
    Authors
    Joakim Arvidsson
    License

    https://cdla.io/permissive-1-0/

    Description

    20 years of Yahoo Finance Open, High, Low, Close, Adjusted Close, and Volume data, plus generated technical features (RSI, SMA), for close to 5000 global equities. Includes various targets such as 20-day raw returns, residual returns, etc. Use it to build predictive models for the Numerai Signals tournament and stake to earn/burn $NMR.
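For readers unfamiliar with the technical features mentioned, here is a minimal sketch of a simple moving average (SMA) and a relative strength index (RSI) over a list of closing prices. The window lengths and the plain-average RSI variant (rather than Wilder's smoothing) are assumptions; the dataset's own feature definitions are not given here.

```python
def sma(closes, window):
    """Simple moving average; None until `window` prices are available."""
    out = []
    for i in range(len(closes)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(closes[i + 1 - window : i + 1]) / window)
    return out


def rsi(closes, window):
    """RSI from plain averages of gains and losses over the last `window` changes."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    out = [None] * len(closes)
    for i in range(window, len(closes)):
        recent = changes[i - window : i]  # the `window` changes ending at close i
        avg_gain = sum(c for c in recent if c > 0) / window
        avg_loss = sum(-c for c in recent if c < 0) / window
        if avg_loss == 0:
            out[i] = 100.0  # no losses in the window (flat prices also land here)
        else:
            rs = avg_gain / avg_loss
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out


closes = [10.0, 11.0, 12.0, 11.0, 13.0]  # made-up prices
sma_3 = sma(closes, 3)
rsi_3 = rsi(closes, 3)
print(sma_3)
print(rsi_3)
```

Both functions return lists aligned with the input, so the values can be attached to the same rows as the prices they were computed from.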

  3. Numerai Signals - Latest Version (daily updated)

    • kaggle.com
    zip
    Updated Nov 8, 2025
    Cite
    Joakim Arvidsson (2025). Numerai Signals - Latest Version (daily updated) [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/numerai-signals-v1-0-data-daily-updated/code
    Available download formats
    zip (6812036379 bytes)
    Dataset updated
    Nov 8, 2025
    Authors
    Joakim Arvidsson
    License

    https://cdla.io/permissive-1-0/

    Description

    Daily updated data for Numerai Signals. Data is from 2003 till today and includes over 5000 stocks from 26 markets, as well as 57 basic factors (e.g. growth, value, momentum, etc).

    This dataset is now refreshed daily with all available files of the latest version (v2.1 as of 3 August 2025), which now includes the neutralisation matrix, latest targets, etc.

    See Code tab for a starter notebook.

  4. YFinance Stock Price Data for Numerai Signals

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Cite
    katsu1110 (2025). YFinance Stock Price Data for Numerai Signals [Dataset]. https://www.kaggle.com/code1110/yfinance-stock-price-data-for-numerai-signals
    Available download formats
    zip (457271215 bytes)
    Dataset updated
    Nov 6, 2025
    Authors
    katsu1110
    Description

    This stock price OHLCV data is updated daily for use in weekly submissions to Numerai Signals. Note that there are some very strange values, especially in the (adjusted) close and volume data, which are a known issue with the Yahoo! Finance API. When you use this data, make sure that you deal with these unrealistic values.
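One possible way to deal with such values is to drop bars whose close is non-positive, whose volume is negative, or whose close jumps implausibly relative to the previous kept bar. This is a sketch only: the 50% threshold, the tuple layout, and the function name are assumptions for illustration, not part of the dataset or of the Yahoo! Finance API.

```python
def drop_suspect_bars(bars, max_abs_return=0.5):
    """Keep (date, close, volume) bars that look plausible.

    A bar is dropped if close <= 0, volume < 0, or the close moves more than
    `max_abs_return` (as a fraction) from the previously kept close.
    """
    kept = []
    for date, close, volume in bars:
        if close <= 0 or volume < 0:
            continue
        if kept:
            prev_close = kept[-1][1]
            if abs(close / prev_close - 1.0) > max_abs_return:
                continue
        kept.append((date, close, volume))
    return kept


# Made-up sample: the third bar has a price spike, the fourth a negative volume.
bars = [
    ("2025-01-02", 100.0, 1_000),
    ("2025-01-03", 101.0, 1_200),
    ("2025-01-06", 10_100.0, 1_100),
    ("2025-01-07", 102.0, -5),
    ("2025-01-08", 103.0, 900),
]
clean = drop_suspect_bars(bars)
print([date for date, _, _ in clean])
```

A filter like this trusts the first bar of each series and will also discard genuine moves larger than the threshold, so the cutoff should be checked against the data before relying on it.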

  5. valid_numerai_signals_tickers_stocknewsapi

    • kaggle.com
    zip
    Updated Feb 17, 2021
    Cite
    Carlo Lepelaars (2021). valid_numerai_signals_tickers_stocknewsapi [Dataset]. https://www.kaggle.com/carlolepelaars/valid-numerai-signals-tickers-stocknewsapi
    Available download formats
    zip (11938 bytes)
    Dataset updated
    Feb 17, 2021
    Authors
    Carlo Lepelaars
    Description

    Content

    Pickle file denoting intersection of valid Numerai Signals tickers and tickers available on stocknewsapi.com

    File is a dictionary with one key ("tickers") which points to a list of stock tickers.
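The structure described above can be read with the standard `pickle` module. The sketch below writes a stand-in file first so that it is runnable; the file name and ticker values are hypothetical, since the actual file name is not given here. Only unpickle files from sources you trust.

```python
import os
import pickle
import tempfile

# Stand-in for the real file: a dict with a single "tickers" key.
example = {"tickers": ["AAPL", "MSFT", "TSLA"]}  # made-up values
path = os.path.join(tempfile.mkdtemp(), "tickers_example.pkl")  # hypothetical name
with open(path, "wb") as f:
    pickle.dump(example, f)

# Loading: the same pattern should apply to the dataset's pickle file.
with open(path, "rb") as f:
    data = pickle.load(f)

tickers = data["tickers"]
print(len(tickers), tickers[:2])
```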

    Acknowledgements

  6. Numerai All history

    • kaggle.com
    zip
    Updated Sep 9, 2021
    Cite
    Mathurin Aché (2021). Numerai All history [Dataset]. https://www.kaggle.com/datasets/mathurinache/numerai-all-history/code
    Available download formats
    zip (1644605768 bytes)
    Dataset updated
    Sep 9, 2021
    Authors
    Mathurin Aché
    Description

    Highlights

    • We have just released the biggest upgrade to Numerai’s dataset ever.
    • The new dataset has 4x the number of rows, more than 3x the number of features, and 20 optional targets.
    • The fastest way to get started with the new dataset is to run through the new example scripts.
    • You can continue to use the old dataset in the same way, but models on the new dataset have much higher scores in historical tests.
    • The website’s “Download Data” button will only download new data. The legacy data can still be downloaded via the API (GraphQL or NumerAPI).
    • The website’s “Upload Predictions” button will only work for predictions made on the new data. Submissions using the legacy data can still be made via the API.

    New Data

    The new data has both more features and more eras. There are now 1050 features instead of 310, and a total of 679 training and validation eras with targets provided instead of 142.

    The eras are now weekly instead of monthly. This means that eras match the tournament more precisely, however they are now “overlapping”. This means that nearby eras are correlated with one another because their targets are generated from stock market performance from a shared, or “overlapping”, period of time.

    The new “training” period covers the same time period as eras #1-132 in the old data, but is now weekly rather than monthly.

    The new “test” period is the same as the previous “test” period.

    The new “validation” period covers the same time period as eras #197-212 in the old data plus an additional time period, and is now weekly rather than monthly.

    The new “live” period functions just like the “live” period in the old data.

    training_data
    • One continuous period of historical data
    • Has targets provided

    tournament_data
    • Consists of “test” and “live”
    • All of these rows must be predicted for a submission to be valid
    • No targets provided
    • Test is used for internal testing, but is not part of the tournament scoring and payouts
    • Live is what users stake on and are scored on in the tournament

    validation_data
    • A separate file. Predictions on these rows are not required for submission
    • It can be submitted at any time to receive diagnostics on your predictions
    • Has targets provided
    • This is the most recent data that we provide, far removed from training data. This makes it particularly useful for seeing how your models’ performance declines over time, and how it would have been performing lately.

    New Targets

    The final major change is that there are now many different targets in the dataset. The tournament target, which is the one you are scored on, is always called “target”. Currently “target” corresponds to “target_nomi_20”, but this may change in the future. However, you will also find 20 more targets that you are not scored on but may find useful for training. The 20 targets consist of 10 different types of targets constructed using 2 different time periods, 20 and 60 days. Additional targets may also be added in the future.

    Be aware that some of the new targets have different binning distributions than what you see with Nomi, i.e. 7 bins rather than 5, with less rigid constraints on samples per bin. Training models to be good at multiple targets and/or ensembling models trained on different targets is a great way to improve generalization performance and increase the uniqueness of your model.

    The new targets are regularized in different ways and exhibit a range of correlations with each other from around ~0.3 to ~0.9. Due to this regularization you may find that models trained on some of the new targets generalize to predict “target” better than models trained on “target”. Other targets may yield models that appear to generalize poorly to “target” but end up helping in an ensemble.

    You may also find that training on the 60 day targets, e.g. “target_nomi_60” yields more stable models when scored on the 20 day “target”. But beware: the eras are even more overlapped when using 60 day targets! You need to sample every 4th era to get non-overlapping eras with the 20 day targets, but every 12th era to get non-overlapping eras with the 60 day targets. If you choose not to subsample in this way, you instead need to be very careful about purging overlapping eras from your cross-validation folds. With great power comes great responsibility!
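The subsampling rule above can be sketched as follows: keep every 4th weekly era for 20 day targets and every 12th for 60 day targets. The era numbering and the helper name are hypothetical; only the step sizes come from the text.

```python
def subsample_eras(eras, target_horizon_days):
    """Return every 4th era for 20 day targets, every 12th for 60 day targets."""
    step_by_horizon = {20: 4, 60: 12}
    step = step_by_horizon[target_horizon_days]
    ordered = sorted(set(eras))
    return ordered[::step]


eras = list(range(1, 25))  # 24 consecutive weekly eras, numbered for illustration
eras_20 = subsample_eras(eras, 20)
eras_60 = subsample_eras(eras, 60)
print(eras_20)
print(eras_60)
```

Starting the slice at a different offset gives a different non-overlapping subset, which is one way to get several training sets out of the same history.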

    Finally, be careful about just selecting a target that does well on Validation. Target selection is yet another way to overfit. When in doubt, cross-validate!
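If you keep all eras instead of subsampling, one way to cross-validate is to split by era and purge the training eras closest to each validation block. The sketch below assumes consecutive integer era numbers, contiguous validation blocks, and no more folds than eras; the function name and the `min_gap` values (4 eras for 20 day targets, 12 for 60 day targets, following the non-overlap spacing given earlier) are assumptions, not an official Numerai procedure.

```python
def purged_era_folds(eras, n_folds, min_gap):
    """Yield (train_eras, val_eras) with contiguous validation blocks.

    Training eras closer than `min_gap` eras to the validation block are
    purged, so train and validation targets do not share a time window.
    """
    ordered = sorted(set(eras))
    fold_size, remainder = divmod(len(ordered), n_folds)
    start = 0
    for k in range(n_folds):
        stop = start + fold_size + (1 if k < remainder else 0)
        val = ordered[start:stop]
        lo, hi = val[0], val[-1]
        train = [e for e in ordered if e <= lo - min_gap or e >= hi + min_gap]
        yield train, val
        start = stop


eras = list(range(1, 21))  # 20 consecutive weekly eras, numbered for illustration
folds = list(purged_era_folds(eras, n_folds=4, min_gap=4))
first_train, first_val = folds[0]
second_train, second_val = folds[1]
print(first_val, first_train)
print(second_val, second_train)
```

With `min_gap=4`, the three eras on each side of a validation block are dropped from training, which matches the spacing needed for 20 day targets.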

    API

    The new data can be accessed either through the “Download Data” button in the leaderboard sidebar or through s3 links returned by the dataset API using the filename argument; a list of valid filenames can be retrieved through the new list_datasets API query. The new training_data and validation_data files will be the same every week, while the tournament_data file will be updated with the latest live er...



Encrypted Stock Market Data from Numerai

~100,000 rows of cleaned, regularized and encrypted equities data.

