Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
All data compiled from Yahoo Finance
If you have questions, e-mail me: jiunyyen@gmail.com
Happy mining!
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains almost all the stocks listed on these exchanges as of the date shown in the file name. Some of the symbols cannot be found on Yahoo Finance; I plan to scrape those from CNN Money. Other symbols have different classes and require some modification before I can make them queryable; I have yet to decide on the best course of action. If you want to know what these excluded symbols are, see excluded_symbols.txt.
Note: some tickers used to be missing because of a poor connection. That has now been solved.
I've also been asked why I don't put everything into one table, and here's my rationale (copy/pasted from my email):
It is possible, and I've debated this before, but I've decided to go with individual files for a number of reasons, and I highly recommend you consider these before combining them:
1) I don't need to load everything into memory or search for the right rows if I only want to work with particular sets.
2) It is easier and faster to manipulate (append, remove, or whatever) when all the data of a ticker is in the same place.
3) I don't need to repeat ticker names for each row just to know which row belongs to which ticker.
4) It reduces risk, latency, and waits during parallel processing of different ticker data.
5) In case of any unforeseen bad writes or termination, this reduces the chances of affecting the entire dataset and allows for a restart anytime without the need to back things up every 5 minutes.
I get all these benefits only at the cost of a slightly larger compressed file and a few more lines of code. To me it's worth it. I can understand if you are frustrated, but it is still possible to concatenate everything.
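For readers who do want a single table, the per-ticker files can be concatenated in a few lines. The sketch below is an illustration, not the author's code. It assumes one CSV file per ticker, named `<TICKER>.csv`, with every file sharing the same header row. The function name `combine_ticker_files` and the added `ticker` column are hypothetical.

```python
import csv
from pathlib import Path


def combine_ticker_files(data_dir, out_path):
    """Concatenate per-ticker CSV files into one CSV with a leading 'ticker' column.

    Assumes every file in data_dir is named <TICKER>.csv and has the same header.
    Returns the number of data rows written.
    """
    out_resolved = Path(out_path).resolve()
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sorted(Path(data_dir).glob("*.csv")):
            if path.resolve() == out_resolved:
                continue  # never read the output file back in
            ticker = path.stem  # file name without the .csv extension
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(["ticker"] + header)
                    header_written = True
                for row in reader:
                    writer.writerow([ticker] + row)
                    rows_written += 1
    return rows_written
```

Rows are streamed one file at a time, so memory use stays small. The repeated ticker name in every row is the cost the author mentions in point 3.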
https://github.com/qks1lver/redtide
Listing files (e.g. NYSE.txt) are from http://eoddata.com/symbols.aspx
Daily historical data compiled from Yahoo Finance
If you have questions, e-mail me: jiunyyen@gmail.com
Happy mining!
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.
It contains prices up to 1 April 2020. If you need more recent data, fork and re-run the data collection script, which is also available from Kaggle.
The data for every symbol is saved in CSV format with common fields:
All that ticker data is then stored in either the ETFs or the stocks folder, depending on its type. Each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains some additional metadata for each ticker, such as its full name.
NYSE Integrated is a proprietary data feed that disseminates full order book updates from the New York Stock Exchange (XNYS). It delivers every quote and order at each price level, along with any event that updates the order book after an order is placed, such as trade executions, modifications, or cancellations.
NYSE is the leading venue for listing blue-chip companies and large-cap stocks. Powered by NYSE's Pillar platform, its hybrid market model of floor-based auction and electronic trading allows it to capture a significant portion of trading activity during the US equity market open and close. As of January 2025, the NYSE represented approximately 6.31% of the average daily volume (ADV) across all exchange-listed US securities, including those listed on Nasdaq, other NYSE venues, and Cboe exchanges.
NYSE is also the only exchange to offer Designated Market Maker (DMM) privileges, allowing the floor to send D-Quote Orders, short for Discretionary Orders, throughout the day. Most D-Quote Orders execute in the closing auction, where they're known as Closing D Orders and allow traders to access the NYSE closing auction after 3:50 PM. This creates significant price discovery during the NYSE Closing Auction, where interest represented via the floor contributes more than 40% of total volume.
NYSE is also unique for being the only exchange with a Parity/Priority Allocation model for matching. This resembles a mixed FIFO and pro-rata matching algorithm, where the participant who sets the best price is matched first, and then the remaining shares are allocated to other orders entered by floor brokers at that price (parity allocation). Floor brokers may utilize e-Quotes to receive such parity allocation of incoming executions.
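The priority-then-parity idea described above can be sketched as a toy model. This is only an illustration of the paragraph's description, not NYSE's actual allocation rules. It assumes a single price-setting order that is filled first and an even split of the remainder among the other resting orders, capped at each order's size. The function name `allocate` and its parameters are hypothetical.

```python
def allocate(incoming_qty, setter_qty, parity_qtys):
    """Toy priority/parity allocation at a single price level.

    incoming_qty: shares of the incoming (aggressing) order
    setter_qty:   resting shares of the order that set the best price
    parity_qtys:  resting shares of the other orders at that price
    Returns (setter_fill, parity_fills).
    """
    # Priority: the price setter is matched first.
    setter_fill = min(incoming_qty, setter_qty)
    remaining = incoming_qty - setter_fill

    # Parity: split what is left evenly among orders that still have room.
    fills = [0] * len(parity_qtys)
    while remaining > 0:
        active = [i for i, qty in enumerate(parity_qtys) if fills[i] < qty]
        if not active:
            break  # every resting order is fully filled
        share = max(remaining // len(active), 1)
        for i in active:
            if remaining == 0:
                break
            give = min(share, parity_qtys[i] - fills[i], remaining)
            fills[i] += give
            remaining -= give
    return setter_fill, fills
```

For instance, `allocate(1000, 300, [500, 100, 400])` fills the setter's 300 shares first and then spreads the remaining 700 shares across the three other orders, with the 100-share order capping out early.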
With L3 granularity, NYSE Integrated captures information beyond the L1, top-of-book data available through SIP feeds, enabling accurate modeling of the book imbalances, queue dynamics, and the auction process. This data includes explicit trade aggressor side, odd lots, and imbalances. Auction imbalances offer valuable insights into NYSE’s opening and closing auctions by providing details like imbalance quantity, paired quantity, imbalance reference price, and book clearing price.
Historical data is available for usage-based rates or with any Databento US Equities subscription. Visit our pricing page for more details or to upgrade your plan.
Asset class: Equities
Origin: Directly captured at Equinix NY4 (Secaucus, NJ) with an FPGA-based network card and hardware timestamping. Synchronized to UTC with PTP.
Supported data encodings: DBN, CSV, JSON
Supported market data schemas: MBO, MBP-1, MBP-10, TBBO, Trades, BBO-1s, BBO-1m, OHLCV-1s, OHLCV-1m, OHLCV-1h, OHLCV-1d, Definition, Imbalance, Statistics, Status
Resolution: Immediate publication, nanosecond-resolution timestamps
https://fred.stlouisfed.org/legal/#copyright-pre-approval
View data of the S&P 500, an index of the stocks of 500 leading companies in the US economy, which provides a gauge of the U.S. equity market.
This dataset offers both live (delayed) prices and End Of Day time series on equity options.
1/ Live (delayed) prices for options on European stocks and indices including:
Reference spot price, bid/ask screen price, fair value price (based on surface calibration), implied volatility, forward
Greeks: delta, vega
Canari.dev computes AI-generated forecast signals indicating which option is over/underpriced, based on the holder's strategy (buy and hold until maturity, 1 hour to 2 days holding horizon...). From these signals is derived a "Canari price", which is also available in these live tables.
Visit our website (canari.dev) for more details about our forecast signals.
The delay ranges from 15 to 40 minutes depending on the underlying.
2/ Historical time series:
Implied vol
Realized vol
Smile
Forward
See a full API presentation here: https://youtu.be/qitPO-SFmY4
These data are also readily accessible in Excel thanks to the provided Add-in available on GitHub: https://github.com/canari-dev/Excel-macro-to-consume-Canari-API
If you need help, contact us at: contact@canari.dev
User Guide: You can get a preview of the API by typing "data.canari.dev" in your web browser. This will show you a free version of this API with limited data.
Here are examples of possible syntaxes:
For live options prices:
data.canari.dev/OPT/DAI
data.canari.dev/OPT/OESX/0923
Add the "csv" suffix to get CSV rather than HTML formatting, for example:
data.canari.dev/OPT/DB1/1223/csv
For historical parameters:
Implied vol:
data.canari.dev/IV/BMW
data.canari.dev/IV/ALV/1224
data.canari.dev/IV/DTE/1224/csv
Realized vol (intraday, maturity expressed as EWM, span in business days): data.canari.dev/RV/IFX ...
Implied dividend flow: data.canari.dev/DIV/IBE ...
Smile (vol spread between ATM strike and 90% strike, normalized to 1Y with factor 1/√T): data.canari.dev/SMI/DTE ...
Forward: data.canari.dev/FWD/BNP ...
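The URL patterns above follow a regular shape (data type, underlying code, optional maturity, optional "csv" suffix), so they can be generated programmatically. The helper below is a hypothetical sketch built only from the examples shown here, not an official client. It assumes the maturity is always a four-digit code as in the examples, and it refuses to build a "csv" URL without a maturity because no such example is given. The function name `canari_path` is made up for illustration, and the result omits the URL scheme, exactly as the examples do.

```python
# Data-type prefixes that appear in the examples above.
KINDS = {"OPT", "IV", "RV", "DIV", "SMI", "FWD"}


def canari_path(kind, underlying, maturity=None, as_csv=False):
    """Build a scheme-less Canari address such as 'data.canari.dev/IV/ALV/1224'."""
    if kind not in KINDS:
        raise ValueError(f"unknown data type: {kind!r}")
    parts = ["data.canari.dev", kind, underlying]
    if maturity is not None:
        # Assumption: maturities are four digits, as in 0923, 1223, 1224.
        if not (len(maturity) == 4 and maturity.isdigit()):
            raise ValueError(f"maturity should be four digits, got {maturity!r}")
        parts.append(maturity)
    if as_csv:
        if maturity is None:
            # The examples only show the csv suffix after a maturity.
            raise ValueError("no example shows 'csv' without a maturity")
        parts.append("csv")
    return "/".join(parts)


print(canari_path("OPT", "DB1", "1223", as_csv=True))
```

Whether other combinations are accepted by the service is not stated in the description, so any path beyond the documented examples should be checked against the free preview first.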
List of available underlyings:
Code    Name
OESX    Eurostoxx50
ODAX    DAX
OSMI    SMI (Swiss index)
OESB    Eurostoxx Banks
OVS2    VSTOXX
ITK     AB Inbev
ABBN    ABB
ASM     ASML
ADS     Adidas
AIR     Air Liquide
EAD     Airbus
ALV     Allianz
AXA     Axa
BAS     BASF
BBVD    BBVA
BMW     BMW
BNP     BNP
BAY     Bayer
DBK     Deutsche Bank
DB1     Deutsche Boerse
DPW     Deutsche Post
DTE     Deutsche Telekom
EOA     E.ON
ENL5    Enel
INN     ING
IBE     Iberdrola
IFX     Infineon
IES5    Intesa Sanpaolo
PPX     Kering
LOR     L Oreal
MOH     LVMH
LIN     Linde
DAI     Mercedes-Benz
MUV2    Munich Re
NESN    Nestle
NOVN    Novartis
PHI1    Philips
REP     Repsol
ROG     Roche
SAP     SAP
SNW     Sanofi
BSD2    Santander
SND     Schneider
SIE     Siemens
SGE     Société Générale
SREN    Swiss Re
TNE5    Telefonica
TOTB    TotalEnergies
UBSN    UBS
CRI5    Unicredito
SQU     Vinci
VO3     Volkswagen
ANN     Vonovia
ZURN    Zurich Insurance Group
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Stock market prediction remains an active research area in a quest to inform investors on how to trade (buy/sell) at the most opportune time. The prevalent methods used by stock market players in trying to predict the likely future trade prices are either technical, fundamental or time series analysis. This research wanted to try out machine learning methods, in contrast to the existing prevalent methods. Artificial neural networks (ANNs) tend to be the preferred machine learning method for this type of application. However, ANNs require some historical data to learn from in order to make predictions. The research used an ANN model to test the hypothesis that the next day price (prediction) can be determined from the stock prices of the immediate last five days.
The final ANN model used for the tests was a feedforward multi-layer perceptron (MLP) with error backpropagation, using sigmoid activation function, with network configuration 5:21:21:1. The data period used was a 5-year dataset (2008 to 2012), with 80% of the data (4-year data) used for training and the balance 20% used for testing (last 1-year data).
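The five-day input window and the 80/20 chronological split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the researchers' code. It assumes the input is a single stock's adjusted prices in date order, and that the split is applied to the windowed samples without shuffling. The function names `make_windows` and `chronological_split` are hypothetical, and the MLP itself is not shown.

```python
def make_windows(prices, window=5):
    """Pair each run of `window` consecutive prices with the price that follows it."""
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])  # last five days
        targets.append(prices[start + window])       # next-day price
    return inputs, targets


def chronological_split(inputs, targets, train_fraction=0.8):
    """Split samples in time order: earliest part for training, the rest for testing."""
    cut = int(len(inputs) * train_fraction)
    train = (inputs[:cut], targets[:cut])
    test = (inputs[cut:], targets[cut:])
    return train, test


# Usage with made-up prices (not taken from the dataset).
example_prices = [float(p) for p in range(1, 16)]
samples, next_day = make_windows(example_prices)
(train_x, train_y), (test_x, test_y) = chronological_split(samples, next_day)
```

Keeping the split chronological means the test samples always come after the training samples, which matches the description of training on the first four years and testing on the last one.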
The original raw data for the Nairobi Securities Exchange (NSE) was scraped from a publicly available and accessible website of a stock market analysis company in Kenya (Synergy, 2020). This data was first exported to a spreadsheet, then cleaned of headers and other redundant information, leaving only the data with stock name, date of trade and the related data such as volumes, low prices, high prices and adjusted prices. The data was further sorted by the stock names and then the trading dates. The data dimension was finally reduced to only what was needed for the research, which was the stock name, the date of trade and the adjusted price (average trade price). This final dataset was in CSV format, as hereby presented.
The research tested three NSE stocks, with the mean absolute percentage error (MAPE) ranging between 0.77% and 1.91% over the 3-month testing period, while the root mean squared error (RMSE) ranged between 1.83 and 3.07.
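For anyone reproducing these figures, the two error measures can be computed as below. This is a sketch using the standard textbook definitions of MAPE and RMSE. It assumes the research used the same definitions, which the description does not spell out.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))
```

MAPE is scale-free, which makes it comparable across stocks with different price levels, while RMSE is expressed in price units and penalizes large misses more heavily.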
This raw data can be used to train and test any machine learning model that requires training and testing data. The data can also be used to validate and reproduce the results already presented in this research. There could be slight variance between what is obtained when reproducing the results, due to the differences in the final exact weights that the trained ANN model eventually achieves. However, these differences should not be significant.
List of data files on this dataset:
stock01_NSE_01jan2008_to_31dec2012_Kakuzi.csv
stock02_NSE_01jan2008_to_31dec2012_StandardBank.csv
stock03_NSE_01jan2008_to_31dec2012_KenyaAirways.csv
stock04_NSE_01jan2008_to_31dec2012_BamburiCement.csv
stock05_NSE_01jan2008_to_31dec2012_Kengen.csv
stock06_NSE_01jan2008_to_31dec2012_BAT.csv
References: Synergy Systems Ltd. (2020). MyStocks. Retrieved March 9, 2020, from http://live.mystocks.co.ke/
NIFTY 500 is India’s first broad-based stock market index. It contains the top 500 listed companies on the NSE. The NIFTY 500 index represents about 96.1% of free-float market capitalization and 96.5% of the total turnover on the National Stock Exchange (NSE).
NIFTY 500 companies are disaggregated into 72 industry indices. Industry weights in the index reflect industry weights in the market. For example, if the banking sector has a 5% weight in the universe of stocks traded on the NSE, banking stocks in the index would also have an approximate representation of 5% in the index. NIFTY 500 can be used for a variety of purposes such as benchmarking fund portfolios, launching index funds, ETFs, and other structured products.
The dataset comprises various parameters and features for each of the NIFTY 500 Stocks, including Company Name, Symbol, Industry, Series, Open, High, Low, Previous Close, Last Traded Price, Change, Percentage Change, Share Volume, Value in Indian Rupee, 52 Week High, 52 Week Low, 365 Day Percentage Change, and 30 Day Percentage Change.
Company Name: Name of the Company.
Symbol: A stock symbol is a unique series of letters assigned to a security for trading purposes.
Industry: Name of the industry to which the stock belongs.
Series: EQ stands for Equity; in this series, intraday trading is possible in addition to delivery. BE stands for Book Entry; shares falling in the Trade-to-Trade or T-segment are traded in this series, and no intraday trading is allowed, which means trades can only be settled by accepting or giving delivery of shares.
Open: It is the price at which the financial security opens in the market when trading begins. It may or may not be different from the previous day's closing price. The security may open at a higher price than the closing price due to excess demand for the security.
High: The highest price at which a stock traded during the course of the trading day. It is always at least as high as the opening and closing prices.
Low: Today's low is a security's intraday low trading price. Today's low is the lowest price at which a stock trades over the course of a trading day.
Previous Close: The previous close almost always refers to the prior day's final price of a security when the market officially closes for the day. It can apply to a stock, bond, commodity, futures or option contract, market index, or any other security.
Last Traded Price: The last traded price (LTP) usually differs from the closing price of the day. This is because the closing price of the day on NSE is the weighted average price of the last 30 minutes of trading, whereas the LTP is the price of the final trade of the day.
Change: For a stock or bond quote, change is the difference between the current price and the last trade of the previous day. For interest rates, change is benchmarked against a major market rate (e.g., LIBOR) and may only be updated as infrequently as once a quarter.
Percentage Change: Take the selling price and subtract the initial purchase price. The result is the gain or loss. Take the gain or loss from the investment and divide it by the original amount or purchase price of the investment. Finally, multiply the result by 100 to arrive at the percentage change in the investment.
Share Volume: Volume is the total number of shares that have been bought or sold in a specific period of time or during the trading day. Every share bought and sold during the period counts toward it.
Value (Indian Rupee): Market value—also known as market cap—is calculated by multiplying a company's outstanding shares by its current market price.
52-Week High: A 52-week high is the highest share price that a stock has traded at during a passing year. Many market aficionados view the 52-week high as an important factor in determining a stock's current value and predicting future price movement. 52-week High prices are adjusted for Bonus, Split & Rights Corporate actions.
52-Week Low: A 52-week low is the lowest ...
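The Change and Percentage Change fields defined above reduce to simple arithmetic. The sketch below assumes both are measured from the Previous Close to the Last Traded Price. The field definitions suggest this, but the dataset description does not state it explicitly. The function names are hypothetical.

```python
def price_change(previous_close, last_price):
    """Absolute change relative to the previous close."""
    return last_price - previous_close


def percentage_change(previous_close, last_price):
    """Change expressed as a percentage of the previous close."""
    if previous_close == 0:
        raise ValueError("previous_close must be non-zero")
    return (last_price - previous_close) / previous_close * 100.0
```

The same formula, applied to the price 30 or 365 days earlier instead of the previous close, would give the longer-horizon percentage changes listed among the dataset's columns, again assuming those columns follow the usual definition.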
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains the files required for training and testing, split accordingly.
There are two groups of features that you can use for prediction:
Files found in the Fundamentals folder are a processed form of the files found in the raw folder. Ratios and other values are stretched to match the length of the closing price column, so that, for example, the value in the pe_ratio column is the PE ratio from the most recent quarter; the same applies to every column.
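The "stretching" of quarterly values onto daily rows amounts to a forward fill: each trading day takes the value from the latest quarter reported on or before that day. The sketch below illustrates the idea and is not the dataset's own preprocessing code. It assumes dates are ISO-formatted strings (YYYY-MM-DD), so that string comparison matches date order. The function name `stretch_to_daily` is hypothetical.

```python
from bisect import bisect_right


def stretch_to_daily(daily_dates, quarterly):
    """Forward-fill quarterly values onto daily dates.

    daily_dates: list of ISO date strings, one per price row
    quarterly:   list of (report_date, value) tuples sorted by report_date
    Returns one value per daily date (None before the first report).
    """
    report_dates = [report_date for report_date, _ in quarterly]
    stretched = []
    for day in daily_dates:
        # Index of the most recent report dated on or before this day.
        idx = bisect_right(report_dates, day) - 1
        stretched.append(quarterly[idx][1] if idx >= 0 else None)
    return stretched


# Usage with made-up values (not taken from the dataset).
days = ["2020-03-30", "2020-03-31", "2020-04-01", "2020-06-30", "2020-07-01"]
pe_by_quarter = [("2020-03-31", 10.0), ("2020-06-30", 12.0)]
pe_ratio_column = stretch_to_daily(days, pe_by_quarter)
```

Days before the first available report get `None` here. How the dataset itself handles that case is not described.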
Technical indicators are calculated with the default parameters used in the Pandas_TA package.
Data is collected from finance.yahoo.com and macrotrends.net. The timeframe of the data differs from one ticker to another because some stocks are unavailable for a given time frame on either website.
All code required to collect the data and perform preprocessing and feature engineering to get the data in the given format can be found in the following notebooks:
Column names are supposed to be self-explanatory, assuming you are familiar with the stock market. Some acronyms you may encounter:
http://opendatacommons.org/licenses/dbcl/1.0/
This is a dataset of Twitter stock prices over a range of 9 years, from November 2013 to October 2022. The data is in CSV format, which is tabular and can be loaded quickly.
The dataset can be used for:
There are 7 columns in this dataset.
Note: The currency is USD ($).
https://creativecommons.org/publicdomain/zero/1.0/
Stock market data is widely analyzed for educational, business and personal interests.
The data is the price history and trading volumes of the fifty stocks in the NIFTY 50 index from NSE (National Stock Exchange) India. All datasets are at a day level, with pricing and trading values split across .csv files for each stock, along with a metadata file with some macro-information about the stocks themselves. The data spans from 1st January, 2000 to 30th April, 2021.
Since new stock market data is generated and made available every day, in order to have the latest and most useful information, the dataset will be updated once a month.
NSE India: https://www.nseindia.com/
Thanks to NSE for providing all the data publicly.
Various machine learning techniques can be applied to stock market data, especially for developing trading algorithms and learning time series models.