License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This is a sample of the training data used in the Numerai machine learning competition. https://numer.ai/about
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
We want to see what the Kaggle community will produce with this dataset using Kernels.
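For a quick start, here is a minimal sketch of loading the sample and fitting a baseline classifier. The filename is an assumption; point it at the CSV shipped with this dataset.

```python
# Minimal sketch: load the sample and fit a baseline classifier.
# NOTE: the filename is an assumption -- use the CSV shipped with this dataset.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

df = pd.read_csv("numerai_training_data.csv")

feature_cols = [f"feature{i}" for i in range(1, 22)]  # feature1 .. feature21
X, y = df[feature_cols], df["target"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation log loss:", log_loss(y_val, model.predict_proba(X_val)[:, 1]))
```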
License: CDLA-Permissive-1.0 (https://cdla.io/permissive-1-0/)
20 years of Yahoo Finance Open, High, Low, Close, Adjusted Close, and Volume data, plus generated technical features (RSI, SMA), on close to 5000 global equities. Various targets are included, such as 20-day raw returns and residual returns. Use it to build predictive models for the Numerai Signals tournament, where you stake $NMR to earn (or burn) it.
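As an illustration of how technical features like these are typically derived, here is a minimal sketch computing an SMA and a Wilder-style RSI from a close-price series. The column name and window lengths are assumptions, not a description of how this dataset's columns were generated.

```python
# Sketch: simple moving average and Wilder-style RSI from close prices.
# Assumes a DataFrame with a "close" column; window lengths are illustrative.
import pandas as pd

def sma(close: pd.Series, window: int = 20) -> pd.Series:
    return close.rolling(window).mean()

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / window, min_periods=window).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / window, min_periods=window).mean()
    return 100 - 100 / (1 + gain / loss)

prices = pd.DataFrame({"close": [100, 101, 99, 102, 104, 103, 105, 107, 106, 108,
                                 110, 109, 111, 112, 114, 113, 115, 116, 118, 117]})
prices["sma"] = sma(prices["close"], window=5)  # short windows for the toy series
prices["rsi"] = rsi(prices["close"], window=5)
print(prices.tail())
```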
License: CDLA-Permissive-1.0 (https://cdla.io/permissive-1-0/)
Daily updated data for Numerai Signals. The data runs from 2003 to today and includes over 5000 stocks from 26 markets, as well as 57 basic factors (e.g. growth, value, momentum).
This dataset now downloads all available files of the latest version daily (v2.1 as of 3 August 2025), which now includes the neutralisation matrix, the latest targets, and more.
See the Code tab for a starter notebook.
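If you prefer to skip the starter notebook, a minimal sketch of loading and joining the files with pandas is below. The filenames and join keys are assumptions; check the dataset's file listing for the actual names.

```python
# Sketch: load the daily Signals files and join factors to targets.
# NOTE: filenames and column names are assumptions -- check the file listing.
import pandas as pd

factors = pd.read_parquet("signals_factors.parquet")  # hypothetical factor file
targets = pd.read_parquet("signals_targets.parquet")  # hypothetical targets file

train = factors.merge(targets, on=["ticker", "date"], how="inner")
print(train.shape)
```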
This stock price OHLCV data is updated daily for use in the weekly submission to Numerai Signals. Note that there are some very strange values, especially in the (adjusted) close and volume data, which are a known issue with the Yahoo! Finance API. When you use this data, make sure that you deal with these unrealistic values.
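One simple way to deal with them is to flag implausible rows before training; a minimal sketch follows, where the column names and the 50% daily-move threshold are illustrative assumptions.

```python
# Sketch: flag implausible Yahoo! Finance rows before training.
# Assumes rows are sorted by date within each ticker; column names
# ("ticker", "adj_close", "volume") and thresholds are illustrative.
import pandas as pd

def flag_unrealistic(df: pd.DataFrame, max_daily_move: float = 0.5) -> pd.Series:
    bad = (df["adj_close"] <= 0) | (df["volume"] < 0)
    returns = df.groupby("ticker")["adj_close"].pct_change()
    return bad | (returns.abs() > max_daily_move)  # moves beyond +/-50% are suspect

# Usage: df_clean = df[~flag_unrealistic(df)]
```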
Pickle file denoting the intersection of valid Numerai Signals tickers and tickers available on stocknewsapi.com.
The file is a dictionary with one key ("tickers") which points to a list of stock tickers.
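Reading it back is a one-liner with the standard library; a minimal sketch, assuming the file is named tickers.pkl:

```python
# Sketch: read the ticker list from the pickle file.
# NOTE: the filename is an assumption -- use the file shipped with this dataset.
import pickle

with open("tickers.pkl", "rb") as f:
    data = pickle.load(f)

tickers = data["tickers"]  # single key "tickers" -> list of stock tickers
print(len(tickers), tickers[:5])
```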
Highlights

We have just released the biggest upgrade to Numerai’s dataset ever. The new dataset has 4x the number of rows, more than 3x the number of features, and 20 optional targets. The fastest way to get started with the new dataset is to run through the new example scripts.

You can continue to use the old dataset in the same way, but models on the new dataset have much higher scores in historical tests.

The website’s “Download Data” button will only download new data. The legacy data can still be downloaded via the API (GraphQL or NumerAPI).

The website’s “Upload Predictions” button will only work for predictions made on the new data. Submissions using the legacy data can still be made via the API.

New Data

The new data has both more features and more eras. There are now 1050 features instead of 310, and a total of 679 training and validation eras with targets provided, instead of 142.
The eras are now weekly instead of monthly. This means that eras match the tournament more precisely; however, they are now “overlapping”: nearby eras are correlated with one another because their targets are generated from stock market performance over a shared, or “overlapping”, period of time.
The new “training” period covers the same time period as eras #1-132 in the old data, but is now weekly rather than monthly.
The new “test” period is the same as the previous “test” period.
The new “validation” period covers the same time period as eras #197-212 in the old data plus an additional time period, and is now weekly rather than monthly.
The new “live” period functions just like the “live” period in the old data.
training_data
- One continuous period of historical data
- Has targets provided

tournament_data
- Consists of “test” and “live”
- All of these rows must be predicted for a submission to be valid
- No targets provided
- Test is used for internal testing, but is not part of the tournament scoring and payouts
- Live is what users stake on and are scored on in the tournament

validation_data
- A separate file; predictions on these rows are not required for submission
- It can be submitted at any time to receive diagnostics on your predictions
- Has targets provided
- This is the most recent data that we provide, far removed from the training data, which makes it particularly useful for seeing how your model’s performance declines over time, and how it would have been performing lately
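For the submission and diagnostics routes above, the NumerAPI Python client exposes upload endpoints; here is a hedged sketch, where the credentials, model id, and file names are placeholders you must supply, and upload_diagnostics assumes a reasonably recent numerapi version.

```python
# Sketch: submit live predictions, and separately upload validation predictions
# for diagnostics. Requires API keys with upload permission; all values below
# are placeholders.
from numerapi import NumerAPI

napi = NumerAPI(public_id="YOUR_PUBLIC_ID", secret_key="YOUR_SECRET_KEY")
MODEL_ID = "your-model-uuid"

napi.upload_predictions("tournament_predictions.csv", model_id=MODEL_ID)
napi.upload_diagnostics("validation_predictions.csv", model_id=MODEL_ID)
```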
New Targets

The final major change is that there are now many different targets in the dataset. The tournament target, which is the one you are scored on, is always called “target”. Currently “target” corresponds to “target_nomi_20”, but this may change in the future. You will also find 20 more targets which are not scored on, but which you may find useful for training. The 20 targets consist of 10 different types of targets constructed over 2 different time periods: 20 and 60 days. Additional targets may also be added in the future.
Be aware that some of the new targets have different binning distributions from what you see with Nomi, e.g. 7 bins rather than 5, with less rigid constraints on samples per bin. Training models to be good at multiple targets and/or ensembling models trained on different targets is a great way to improve generalization performance and increase the uniqueness of your model.
The new targets are regularized in different ways and exhibit correlations with each other ranging from roughly 0.3 to 0.9. Due to this regularization, you may find that models trained on some of the new targets generalize to predicting “target” better than models trained on “target” itself. Other targets may yield models that appear to generalize poorly to “target” but end up helping in an ensemble.
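A minimal sketch of that multi-target ensembling idea follows. The parquet filenames and the second target name are assumptions following the naming scheme above, and the LightGBM hyperparameters are illustrative rather than a recommendation.

```python
# Sketch: train one model per target, then ensemble by averaging ranked predictions.
# Filenames, target names, and hyperparameters are illustrative assumptions.
import pandas as pd
import lightgbm as lgb

train = pd.read_parquet("numerai_training_data.parquet")
valid = pd.read_parquet("numerai_validation_data.parquet")
features = [c for c in train.columns if c.startswith("feature")]

ranked_preds = []
for target in ["target_nomi_20", "target_jerome_20"]:
    rows = train.dropna(subset=[target])  # some auxiliary targets may have gaps
    model = lgb.LGBMRegressor(n_estimators=2000, learning_rate=0.01,
                              max_depth=5, colsample_bytree=0.1)
    model.fit(rows[features], rows[target])
    # Rank predictions so differently scaled targets combine fairly.
    ranked_preds.append(pd.Series(model.predict(valid[features])).rank(pct=True))

valid["prediction"] = pd.concat(ranked_preds, axis=1).mean(axis=1).to_numpy()
```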
You may also find that training on the 60-day targets, e.g. “target_nomi_60”, yields more stable models when scored on the 20-day “target”. But beware: the eras are even more overlapped when using 60-day targets! You need to sample every 4th era to get non-overlapping eras with the 20-day targets, but every 12th era to get non-overlapping eras with the 60-day targets. If you choose not to subsample in this way, you instead need to be very careful about purging overlapping eras from your cross-validation folds. With great power comes great responsibility!
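Here is a minimal sketch of that subsampling, assuming era labels are integer-like strings as in the new data files:

```python
# Sketch: keep every 4th era so 20-day targets no longer overlap;
# use step=12 for the 60-day targets.
import pandas as pd

def subsample_eras(df: pd.DataFrame, step: int = 4) -> pd.DataFrame:
    eras = sorted(df["era"].unique(), key=int)  # era labels sort as integers
    keep = set(eras[::step])
    return df[df["era"].isin(keep)]

# Usage: train_nonoverlap = subsample_eras(train, step=4)
```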
Finally, be careful about just selecting a target that does well on Validation. Target selection is yet another way to overfit. When in doubt, cross-validate!
API

The new data can be accessed either through the “Download Data” button in the leaderboard sidebar or through the S3 links returned by the dataset API using the filename argument; a list of valid filenames can be retrieved through the new list_datasets API query. The new training_data and validation_data files will be the same every week, while the tournament_data file will be updated with the latest live era.
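In code, that flow looks like the following sketch with the NumerAPI client (numerapi on PyPI); the filename passed to download_dataset should come from the list_datasets output.

```python
# Sketch: discover and download the new dataset files with NumerAPI.
from numerapi import NumerAPI

napi = NumerAPI()  # public data downloads need no API keys

print(napi.list_datasets())  # valid filenames for download_dataset

# Download one file by name; the name below is an example from the listing.
napi.download_dataset("numerai_training_data.parquet",
                      "numerai_training_data.parquet")
```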