The algorithmic trading space is buzzing with new strategies. Companies have spent billions in infrastructures and R&D to be able to jump ahead of the competition and beat the market. Still, it is well acknowledged that the buy & hold strategy is able to outperform many of the algorithmic strategies, especially in the long-run. However, finding value in stocks is an art that very few mastered, can a computer do that?
This Data repo contains two datasets:
Example_2019_price_var.csv. I built this dataset thanks to Financial Modeling Prep API and to pandas_datareader. Each row is a stock from the technology sector of the US stock market (that is available from the aforementioned API, which is free and highly recommended). The column contains the percent price variation of each stock for the year 2019. In other words, it collects the percent price variation of each stock from the first trading day on Jan 2019 to the last trading day of Dec 2019. To compute this price variation I decided to consider the Adjusted Close Price.
Example_DATASET.csv. I built this dataset thanks to Financial Modeling Prep API. Each row is a stock from the technology sector of the US stock market (that is available from the aforementioned API). Each column is a financial indicator that can be found in the 2018 10-K filings of each company. There are no Nans or empty cells. Furthermore, the last column is the CLASS of each stock, where:
In other words, the last column is used to classify each stock in buy-worthy or not, and this relationship is what should allow a machine learning model to learn to recognize stocks that will increase their value from those that won't.
NOTE: the number of stocks does not match between the two datasets because the API did not have all the required financial indicators for some stocks. It is possible to remove from Example_2019_price_var.csv those rows that do not appear in Example_DATASET.csv.
I built this dataset during the 2019 winter holidays period, because I wanted to answer a simple question: is it possible to have a machine learning model learn the differences between stocks that perform well and those that don't, and then leverage this knowledge in order to predict which stock will be worth buying? Moreover, is it possible to achieve this simply by looking at financial indicators found in the 10-K filings?
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
NASDAQ: Index: NASDAQ 100 Technology Sector Index data was reported at 9,723.190 NA in Apr 2025. This records an increase from the previous number of 9,472.590 NA for Mar 2025. NASDAQ: Index: NASDAQ 100 Technology Sector Index data is updated monthly, averaging 4,219.390 NA from Jan 2012 (Median) to Apr 2025, with 160 observations. The data reached an all-time high of 10,862.950 NA in Jan 2025 and a record low of 1,306.370 NA in May 2012. NASDAQ: Index: NASDAQ 100 Technology Sector Index data remains active status in CEIC and is reported by Exchange Data International Limited. The data is categorized under Global Database’s United States – Table US.EDI.SE: NASDAQ: Monthly.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan's main stock market index, the JP225, rose to 40839 points on July 30, 2025, gaining 0.40% from the previous session. Over the past month, the index has climbed 2.13% and is up 4.44% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from Japan. Japan Stock Market Index (JP225) - values, historical data, forecasts and news - updated on July of 2025.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Canada's main stock market index, the TSX, fell to 27370 points on July 30, 2025, losing 0.62% from the previous session. Over the past month, the index has climbed 1.86% and is up 18.43% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from Canada. Canada Stock Market Index (TSX) - values, historical data, forecasts and news - updated on July of 2025.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
The algorithmic trading space is buzzing with new strategies. Companies have spent billions in infrastructures and R&D to be able to jump ahead of the competition and beat the market. Still, it is well acknowledged that the buy & hold strategy is able to outperform many of the algorithmic strategies, especially in the long-run. However, finding value in stocks is an art that very few mastered, can a computer do that?
This Data repo contains two datasets:
Example_2019_price_var.csv. I built this dataset thanks to Financial Modeling Prep API and to pandas_datareader. Each row is a stock from the technology sector of the US stock market (that is available from the aforementioned API, which is free and highly recommended). The column contains the percent price variation of each stock for the year 2019. In other words, it collects the percent price variation of each stock from the first trading day on Jan 2019 to the last trading day of Dec 2019. To compute this price variation I decided to consider the Adjusted Close Price.
Example_DATASET.csv. I built this dataset thanks to Financial Modeling Prep API. Each row is a stock from the technology sector of the US stock market (that is available from the aforementioned API). Each column is a financial indicator that can be found in the 2018 10-K filings of each company. There are no Nans or empty cells. Furthermore, the last column is the CLASS of each stock, where:
In other words, the last column is used to classify each stock in buy-worthy or not, and this relationship is what should allow a machine learning model to learn to recognize stocks that will increase their value from those that won't.
NOTE: the number of stocks does not match between the two datasets because the API did not have all the required financial indicators for some stocks. It is possible to remove from Example_2019_price_var.csv those rows that do not appear in Example_DATASET.csv.
I built this dataset during the 2019 winter holidays period, because I wanted to answer a simple question: is it possible to have a machine learning model learn the differences between stocks that perform well and those that don't, and then leverage this knowledge in order to predict which stock will be worth buying? Moreover, is it possible to achieve this simply by looking at financial indicators found in the 10-K filings?