https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.
It contains prices up to April 1, 2020. If you need more recent data, fork and re-run the data collection script, which is also available on Kaggle.
The data for every symbol is saved in CSV format with common fields:
The ticker data is stored in either the ETFs or the stocks folder, depending on its type. Each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains additional metadata for each ticker, such as its full name.
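The folder layout described above can be navigated with a small helper. This is only a sketch: the lowercase folder names `stocks` and `etfs` and the `.csv` extension are assumptions based on the description, and `find_ticker_file` is a hypothetical name, not part of the dataset's own tooling.

```python
from pathlib import Path
from typing import Optional

# Assumed folder names; the description only says "ETFs or stocks folder".
CANDIDATE_FOLDERS = ("stocks", "etfs")


def find_ticker_file(root: Path, symbol: str) -> Optional[Path]:
    """Return the CSV path for `symbol`, or None if it is in neither folder."""
    for folder in CANDIDATE_FOLDERS:
        candidate = root / folder / f"{symbol}.csv"
        if candidate.exists():
            return candidate
    return None
```

Calling `find_ticker_file(Path("data"), "AAPL")` would return the path to the file if it exists under either folder; adjust the folder names if the archive uses different casing.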
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical stock price data for major banks from the year 2014 to 2024. The dataset includes daily stock prices, trading volume, and other relevant financial metrics for prominent banks. The stock prices are provided in IDR (Indonesian Rupiah) currency.
PT Bank Central Asia Tbk (BBCA.JK) is more commonly known as Bank Central Asia (BCA). One of Indonesia's largest privately owned banks, BCA was founded in 1955 and provides a range of banking services, including consumer banking, corporate banking, investment banking, and asset management. It has a widespread presence throughout Indonesia, with numerous branches and ATMs, and is known for its strong financial results, innovative banking products, and focus on customer satisfaction.
Dataset Variables:
Data Sources: The dataset is compiled from reliable financial sources, including stock exchanges, financial news websites, and reputable financial data providers. Data cleaning and preprocessing techniques have been applied to ensure accuracy and consistency. More info: https://finance.yahoo.com/quote/BBCA.JK/history/
Use Case: This dataset can be utilized for various purposes, including financial analysis, stock market forecasting, algorithmic trading strategies, and academic research. Researchers, analysts, and data scientists can explore the trends, patterns, and relationships within the data to derive valuable insights into the performance of the banking sector over the specified period. Additionally, this dataset can serve as a benchmark for evaluating the performance of machine learning models and quantitative trading strategies in the banking industry.
https://creativecommons.org/publicdomain/zero/1.0/
I built this dataset with SQL queries and Python code. It covers all stocks on the Indian stock market, 2,435 in total, over a one-year period. Each row is a stock, each column is a date, and each cell holds the closing price. Enjoy, and try some stock price predictions.
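Most time-series tools expect one observation per row, so a table with one row per stock and one column per date usually needs reshaping first. The sketch below assumes the first column holds the stock name and the remaining header cells are dates; the sample values are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from typing import Iterator, Tuple


def wide_to_long(csv_text: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (stock, date, close) from a table with one row per stock."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[1:]  # assumed: first column holds the stock name
    for row in reader:
        stock, cells = row[0], row[1:]
        for date, cell in zip(dates, cells):
            if cell.strip():  # skip empty cells (no price for that day)
                yield stock, date, float(cell)


# Hypothetical sample, not real prices.
sample = (
    "stock,2021-01-04,2021-01-05\n"
    "AAA,100.5,101.0\n"
    "BBB,,20.25\n"
)
long_rows = list(wide_to_long(sample))
```

Each tuple in `long_rows` is one closing price for one stock on one date, which is easier to sort, filter, and join than the wide layout.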
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The US_Stock_Data.csv dataset offers a comprehensive view of the US stock market and related financial instruments, spanning from January 2, 2020, to February 2, 2024. This dataset includes 39 columns, covering a broad spectrum of financial data points such as prices and volumes of major stocks, indices, commodities, and cryptocurrencies. The data is presented in a structured CSV file format, making it easily accessible and usable for various financial analyses, market research, and predictive modeling. This dataset is ideal for anyone looking to gain insights into the trends and movements within the US financial markets during this period, including the impact of major global events.
The dataset captures daily financial data across multiple assets, providing a well-rounded perspective of market dynamics. Key features include:
The dataset’s structure is designed for straightforward integration into various analytical tools and platforms. Each column is dedicated to a specific asset's daily price or volume, enabling users to perform a wide range of analyses, from simple trend observations to complex predictive models. The inclusion of intraday data for Bitcoin provides a detailed view of market movements.
This dataset is highly versatile and can be utilized for various financial research purposes:
The dataset’s daily updates ensure that users have access to the most current data, which is crucial for real-time analysis and decision-making. Whether for academic research, market analysis, or financial modeling, the US_Stock_Data.csv dataset provides a valuable foundation for exploring the complexities of financial markets over the specified period.
This dataset would not be possible without the contributions of Dhaval Patel, who initially curated the US stock market data spanning from 2020 to 2024. Full credit goes to Dhaval Patel for creating and maintaining the dataset. You can find the original dataset here: US Stock Market 2020 to 2024.
This dataset was created by Tolga Kaplan
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Hridvi Saluja
Released under Apache 2.0
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 4,987 daily records of financial market behavior. It includes stock price metrics, macroeconomic indicators, sentiment scores, and event flags.
Key highlights:
Time span: 4,987 days
Financial indicators: Open, High, Low, Close, Adjusted Close, Volume
Macroeconomic variables: GDP, Inflation, Unemployment, Interest Rate, CPI
Sentiment analysis: News and Social Sentiment scores
Event tagging: Binary event flag (e.g., market shocks)
Target label: Market condition — Stable, Volatile, or Crash
This dataset was created by AlukoSayo
China Stock Market Public Opinion Analysis Data Set. It can be used for deep learning projects.
Context
The StockNet dataset, introduced by Xu and Cohen at ACL 2018, is a benchmark for measuring the effectiveness of textual information in stock market prediction. While the original dataset provides valuable price and news data, it requires significant pre-processing and feature engineering to be used effectively in advanced machine learning models.
This dataset was created to bridge that gap. We have taken the original data for 87 stocks and performed extensive feature engineering, creating a rich, multi-modal feature repository.
A key contribution of this work is a preliminary statistical analysis of the news data for each stock. Based on the consistency and volume of news, we have categorized the 87 stocks into two distinct groups, allowing researchers to choose the most appropriate modeling strategy:
joint_prediction_model_set: Stocks with rich and consistent news data, ideal for building complex, single models that analyze all stocks jointly.
panel_data_model_set: Stocks with less consistent news data, which are better suited for traditional panel data analysis.
Content and File Structure
The dataset is organized into two main directories, corresponding to the two stock categories mentioned above.
1. joint_prediction_model_set
This directory contains stocks suitable for sophisticated, news-aware joint modeling.
-Directory Structure: This directory contains a separate sub-directory for each stock suitable for joint modeling (e.g., AAPL/, MSFT/, etc.).
-Folder Contents: Inside each stock's folder, you will find a set of files, each corresponding to a different category of engineered features. These files include:
-News Graph Embeddings: A NumPy tensor file (.npy) containing the encoded graph embeddings from daily news. Its shape is (Days, N, 128), where N is the number of daily articles.
-Engineered Features: A CSV file containing fundamental features derived directly from OHLCV data (e.g., intraday_range, log_return).
-Technical Indicators: A CSV file with a wide array of popular technical indicators (e.g., SMA, EMA, MACD, RSI, Bollinger Bands).
-Statistical & Time Features: A CSV file with rolling statistical features (e.g., volatility, skew, kurtosis) over an optimized window, plus cyclical time-based features.
-Advanced & Transformational Features: A CSV file with complex features like lagged variables, wavelet transform coefficients, and the Hurst Exponent.
2. panel_data_model_set
This directory contains stocks that are more suitable for panel data models, based on the statistical properties of their associated news data.
-Directory Structure: Similar to the joint prediction set, this directory also contains a separate sub-directory for each stock in this category.
-Folder Contents: Inside each stock's folder, you will find the cleaned and structured price and news text data. This facilitates the application of econometric models or machine learning techniques designed for panel data, where observations are tracked for the same subjects (stocks) over a period of time.
-Further Information: For a detailed breakdown of the statistical analysis used to separate the stocks into these two groups, please refer to the data_preview.ipynb notebook located in the TRACE_ACL18_raw_data directory.
Methodology
The features for the joint_prediction_model_set were generated systematically for each stock:
-News-to-Graph Pipeline: Daily news headlines were processed to extract named entities. These entities were then used to query Wikidata and build knowledge subgraphs. A Graph Convolutional Network (GCN) model encoded these graphs into dense vectors.
-Feature Engineering: All other features were generated from the raw price and volume data. The process included basic calculations, technical analysis via pandas-ta, generation of statistical and time-based features, and advanced transformations like wavelet analysis.
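As a rough illustration of the "basic calculations" step, the sketch below derives two of the features named earlier (log_return and intraday_range) from price series. The exact definitions used by the dataset authors are not given in this description, so the formulas here (natural log of consecutive closes, and high minus low) are assumptions based on common usage.

```python
import math
from typing import List, Optional


def log_returns(closes: List[float]) -> List[Optional[float]]:
    """log(close_t / close_{t-1}); the first day has no previous close."""
    if not closes:
        return []
    out: List[Optional[float]] = [None]
    for prev, cur in zip(closes, closes[1:]):
        out.append(math.log(cur / prev))
    return out


def intraday_range(highs: List[float], lows: List[float]) -> List[float]:
    """Assumed definition: daily high minus daily low."""
    return [h - l for h, l in zip(highs, lows)]
```

One convenient property of log returns is that they add up over time: the sum of daily log returns over a period equals the log of the last close divided by the first close.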
Acknowledgements
This dataset is an extension and transformation of the original StockNet dataset. We extend our sincere gratitude to the original authors for their contribution to the field.
Original Paper: "Stock Movement Prediction from Tweets and Historical Prices" by Yumo Xu and Shay B. Cohen (ACL 2018).
Original Data Repository: https://github.com/yumoxu/stocknet-dataset
Inspiration
This dataset opens the door to numerous exciting research questions:
-Can you build a single, powerful joint model using the joint_prediction_model_set to predict movements for all stocks simultaneously?
-How does a sophisticated joint model compare against a traditional panel data model trained on the panel_data_model_set?
-What is the lift in predictive power from using news-based graph embeddings versus using only technical indicators?
-Can you apply transfer learning or multi-task learning, using the feature-rich joint set to improve predictions for the panel set?
The dataset contains prices and volumes for different stocks.
Here is an example:
cat 201801_Amsterdam_AALB_NoExpiry.txt
01/02/2018,09:01:00, 42.39, 42.39, 42.21, 42.21, 737
01/02/2018,09:02:00, 42.28, 42.28, 42.27, 42.27, 277
01/02/2018,09:04:00, 42.24, 42.24, 42.24, 42.24, 177
01/02/2018,09:05:00, 42.23, 42.23, 42.22, 42.22, 1543
01/02/2018,09:06:00, 42.23, 42.23, 42.23, 42.23, 241
The dataset contains trading data for 2182 unique stocks on 40 unique stock exchanges. The data is provided per stock and per month; each stock is associated with a specific stock exchange, and the files are initially stored in .txt format. Each file contains the trading history of one stock in a particular month and has the following schema.
The dataset is a zipped file of stocks from many stock markets and forex. It covers the whole of 2018. Note the following:
1. All timestamps are in CET.
2. There are missing records and irregularities in the updates (see the previous example). You need to decide how to handle the missing values/records.
3. Different stocks have different update frequencies.
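The missing 09:03 record in the sample rows shown earlier can be detected programmatically. The sketch below assumes the columns are date, time, open, high, low, close, and volume, that dates are written MM/DD/YYYY, and that bars are expected once per minute; none of this is confirmed by the description, and the one-minute spacing in particular will not hold for every stock.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Bar(NamedTuple):
    ts: datetime
    open: float
    high: float
    low: float
    close: float
    volume: int


def parse_bar(line: str) -> Bar:
    """Parse one comma-separated row (assumed column order, see above)."""
    date_s, time_s, o, h, l, c, v = [p.strip() for p in line.split(",")]
    ts = datetime.strptime(f"{date_s} {time_s}", "%m/%d/%Y %H:%M:%S")
    return Bar(ts, float(o), float(h), float(l), float(c), int(v))


def missing_minutes(bars: List[Bar]) -> List[datetime]:
    """List the minutes between consecutive bars that have no record."""
    gaps = []
    for prev, cur in zip(bars, bars[1:]):
        t = prev.ts + timedelta(minutes=1)
        while t < cur.ts:
            gaps.append(t)
            t += timedelta(minutes=1)
    return gaps


lines = [
    "01/02/2018,09:01:00, 42.39, 42.39, 42.21, 42.21, 737",
    "01/02/2018,09:02:00, 42.28, 42.28, 42.27, 42.27, 277",
    "01/02/2018,09:04:00, 42.24, 42.24, 42.24, 42.24, 177",
]
bars = [parse_bar(x) for x in lines]
gaps = missing_minutes(bars)
```

Here `gaps` contains the single timestamp 2018-01-02 09:03. Whether to forward-fill such gaps, drop them, or treat them as minutes with no trades is a modeling decision the dataset leaves to the user.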
This dataset was created by Yashaswi Upmon
It contains the following files:
ADBE Stock Prices Dataset
This dataset contains historical stock price data for Adobe Inc. (ticker: ADBE). The dataset provides valuable insights into the stock performance of Adobe Inc., making it useful for financial analysis, stock market prediction, and machine learning applications related to stock price forecasting.
This dataset was created by Ashkan Forootan
Largest UK companies by market cap
The largest UK companies by market cap are those listed on the UK stock exchange with the highest total value of all shares, representing their perceived worth by investors. These companies, such as BP, Shell, Unilever, HSBC Holdings, and GlaxoSmithKline, are considered some of the most valuable and powerful in the country, with a significant impact on the global economy. AstraZeneca, Rio Tinto, and Reckitt Benckiser are also notable high-market cap companies in the UK, reflecting their strong foothold in their respective markets.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides historical stock market performance data for specific companies. It enables users to analyze and understand the past trends and fluctuations in stock prices over time. This information can be utilized for various purposes such as investment analysis, financial research, and market trend forecasting.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains the files required for training and testing, split accordingly.
There are two groups of features that you can use for prediction:
Files in the Fundamentals folder are a processed version of the files in the raw folder. Ratios and other values are stretched to match the length of the closing price column: for example, the value in the pe_ratio column is the P/E ratio from the most recent quarter, and the same applies to every column.
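The "stretching" described above amounts to carrying each quarterly value forward until the next report. The sketch below shows one way to do that; the function name, the ISO date strings, and the sample numbers are hypothetical, and the dataset's own notebooks may implement it differently.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def stretch_to_daily(
    daily_dates: List[str],
    quarterly: List[Tuple[str, float]],
) -> List[Optional[float]]:
    """For each day, return the latest quarterly value dated on or before it.

    Dates are ISO strings (YYYY-MM-DD), which sort correctly as text.
    `quarterly` must be sorted by date in ascending order.
    """
    report_dates = [d for d, _ in quarterly]
    out: List[Optional[float]] = []
    for day in daily_dates:
        i = bisect_right(report_dates, day)
        # i == 0 means no report exists yet for this day.
        out.append(quarterly[i - 1][1] if i else None)
    return out


# Hypothetical P/E values, not taken from the dataset.
days = ["2020-03-30", "2020-03-31", "2020-04-01", "2020-06-30"]
pe_by_quarter = [("2020-03-31", 15.0), ("2020-06-30", 18.0)]
pe_daily = stretch_to_daily(days, pe_by_quarter)
```

Days before the first report get `None` rather than a back-filled value, which avoids leaking future information into earlier rows.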
Technical indicators are calculated with the default parameters of the Pandas_TA package.
Data is collected from finance.yahoo.com and macrotrends.net. The time frame differs from one ticker to another because some stocks are unavailable for a given period on one of the websites.
All code required to collect the data and perform preprocessing and feature engineering to get the data in the given format can be found in the following notebooks:
Column names are meant to be self-explanatory if you are familiar with the stock market. Some acronyms you may encounter:
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical stock market data for MercadoLibre (MELI) from August 10, 2007, to March 16, 2025. It provides key financial indicators such as Open, High, Low, Close (OHLC) prices, Adjusted Close prices, and Trading Volume for each trading day.
The dataset includes the following columns:
Column Name | Description |
---|---|
date | Trading date (YYYY-MM-DD format) |
open | Opening stock price for the day |
high | Highest stock price of the day |
low | Lowest stock price of the day |
close | Closing stock price of the day |
adj_close | Adjusted closing price (accounting for splits & dividends) |
volume | Number of shares traded on that day |
This dataset is valuable for:
- Stock Market Analysis: Analyze trends in MercadoLibre's stock performance over time.
- Time Series Forecasting: Build machine learning models to predict future stock prices.
- Technical Analysis: Identify patterns using OHLC data for trading strategies.
- Financial Research: Study the impact of macroeconomic factors on stock prices.
The dataset is compiled from stock market historical data sources and is updated weekly.
You can download the dataset and use it for research, trading analysis, and machine learning models. If you find this dataset useful, consider giving it a ⭐ on Kaggle!
Contact info:
You can contact me for more datasets
-X
📢 Note: This dataset is for educational and research purposes only. It should not be considered financial advice.
https://creativecommons.org/publicdomain/zero/1.0/
Real-time, up-to-date market data from cryptocurrency exchanges can be quite expensive and hard to get. However, historical financial data is the starting point for developing algorithms that analyze market trends and, why not, beat the market by predicting market movements.
The data in this dataset is historical data from the start of the GBP-USD pair market on the Kraken exchange up to the present (December 2021). It comes from real trades on one of the most popular cryptocurrency exchanges.
Historical market data, also known as trading history, time and sales or tick data, provides a detailed record of every trade that happens on Kraken exchange, and includes the following information:
- Timestamp - The exact date and time of each trade.
- Price - The price at which each trade occurred.
- Volume - The amount of volume that was traded.
In addition, OHLCVT data are provided for the most common period interval: 1 min, 5 min, 15 min, 1 hour, 12 hours and 1 day. OHLCVT stands for Open, High, Low, Close, Volume and Trades and represents the following trading information for each time period:
- Open - The first traded price
- High - The highest traded price
- Low - The lowest traded price
- Close - The final traded price
- Volume - The total volume traded by all trades
- Trades - The number of individual trades
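If an interval other than the provided ones is needed, OHLCVT bars can be rebuilt from the trade records. The following is a minimal sketch, assuming each trade is a tuple of Unix timestamp in seconds, price, and volume; the actual file layout is not specified here, and the sample trades are made up.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed trade layout: (unix timestamp in seconds, price, volume).
Trade = Tuple[int, float, float]


def to_ohlcvt(trades: Iterable[Trade], interval_s: int) -> List[dict]:
    """Group trades into fixed-width bars keyed by the bar's start time."""
    bars: Dict[int, dict] = {}
    # Sort by time so "open" and "close" are the first and last trade.
    for ts, price, volume in sorted(trades, key=lambda t: t[0]):
        start = ts - ts % interval_s
        bar = bars.get(start)
        if bar is None:
            bars[start] = {
                "time": start, "open": price, "high": price, "low": price,
                "close": price, "volume": volume, "trades": 1,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += volume
            bar["trades"] += 1
    return [bars[k] for k in sorted(bars)]


# Made-up trades, not taken from the dataset.
sample_trades = [(60, 10.0, 1.0), (75, 12.0, 0.5), (119, 9.0, 2.0), (120, 11.0, 1.0)]
one_minute_bars = to_ohlcvt(sample_trades, 60)
```

Intervals in which no trade occurred produce no bar in this sketch, so downstream code should not assume evenly spaced rows.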
Don't hesitate to tell me if you need other period intervals 😉 ...
This dataset will be updated every quarter to add new, up-to-date market data. Let me know if you need more frequent updates.
Can you beat the market? Let's see what you can do with this data!
https://creativecommons.org/publicdomain/zero/1.0/
Stock Price Prediction
A stock market, equity market or share market is the aggregation of buyers and sellers of stocks (also called shares), which represent ownership claims on businesses; these may include securities listed on a public stock exchange, as well as stock that is only traded privately, such as shares of private companies which are sold to investors through equity crowdfunding platforms.
The secret of a successful stock trader is being able to look into the future of the stocks and make wise decisions. Accurate prediction of stock market returns is a very challenging task due to the volatile and non-linear nature of financial stock markets. With the introduction of artificial intelligence and increased computational capabilities, programmed methods of prediction have proved to be more efficient in predicting stock prices.
Here, you are provided with a dataset of a public stock market for 104 stocks. Can you use your data science skills to forecast the closing prices of these stocks for the next 2 months?