This dataset includes daily historical price data for Bitcoin (BTC-USD) from 2014 to 2025, obtained through web scraping from the Yahoo Finance page using Selenium. The primary data source can be accessed at Yahoo Finance - Bitcoin Historical Data. The dataset contains daily information such as opening price (Open), highest price (High), lowest price (Low), closing price (Close), adjusted closing price (Adj Close), and trading volume (Volume).
About Bitcoin: Bitcoin (BTC) is the world's first decentralized digital currency, introduced in 2009 by an anonymous creator known as Satoshi Nakamoto. It operates on a peer-to-peer network powered by blockchain technology, enabling secure, transparent, and trustless transactions without the need for intermediaries like banks. Bitcoin's limited supply of 21 million coins and its growing adoption have made it a popular asset for investment, trading, and as a hedge against inflation.
We are excited to share this dataset and look forward to seeing the insights it can provide. We hope it will inspire collaboration and innovation within the community. By leveraging this daily data, we can explore trends, develop predictive models, and design innovative trading strategies that deepen our understanding of Bitcoin's market behavior. Together, we can unlock new opportunities and contribute to the collective advancement of cryptocurrency research and analysis.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.
It contains prices up to 1 April 2020. If you need more recent data, just fork and re-run the data collection script, which is also available from Kaggle.
The data for every symbol is saved in CSV format with common fields:
All ticker data is then stored in either the ETFs or the stocks folder, depending on its type. Each filename is the corresponding ticker symbol. Finally, symbols_valid_meta.csv contains some additional metadata for each ticker, such as its full name.
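The folder layout described above can be sketched as a small path helper. This is only an illustration: the folder names `etfs` and `stocks`, the `.csv` extension, and the function name `ticker_csv_path` are assumptions, so check them against the actual download.

```python
from pathlib import Path


def ticker_csv_path(root, symbol, is_etf):
    """Return the expected CSV path for one ticker symbol."""
    # Assumed folder names; the description only says "ETFs or stocks folder".
    folder = "etfs" if is_etf else "stocks"
    # Each file is named after its ticker symbol.
    return Path(root) / folder / f"{symbol}.csv"


# Hypothetical root directory, used only for illustration.
print(ticker_csv_path("data", "AAPL", is_etf=False))
```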
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Time Series Forecasting with Yahoo Stock Price’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/arashnic/time-series-forecasting-with-yahoo-stock-price on 30 September 2021.
--- Dataset description provided by original source is as follows ---
Trading in stocks and financial instruments is a lucrative proposition. Stock markets across the world facilitate such trades, and thus wealth changes hands. Stock prices move up and down all the time, and the ability to predict their movement has immense potential to make one rich. Stock price prediction has kept people interested for a long time. There are hypotheses such as the Efficient Market Hypothesis, which says that it is almost impossible to beat the market consistently, and there are others that disagree with it.
There are a number of known approaches, and new research is ongoing, to find the magic formula to make you rich. One of the traditional methods is time series forecasting. Fundamental analysis is another method, where numerous performance ratios are analyzed to assess a given stock. On the emerging front, there are neural networks, genetic algorithms, and ensembling techniques.
Another challenging problem in stock price prediction is the black swan event: an unpredictable event that causes stock market turbulence. Such events occur from time to time and often come with little or no warning.
A black swan event is an event that is completely unexpected and cannot be predicted. Unexpected events are generally referred to as black swans when they have significant consequences, though an event with few consequences might also be a black swan event. It may or may not be possible to provide explanations for the occurrence after the fact – but not before. In complex systems, like economies, markets and weather systems, there are often several causes. After such an event, many of the explanations for its occurrence will be overly simplistic.
Image: black swan events (https://www.visualcapitalist.com/wp-content/uploads/2020/03/mm3_black_swan_events_shareable.jpg)
New bleeding-edge, state-of-the-art deep learning models for stock prediction, e.g. "Transformer and Time Embeddings", are overcoming such obstacles. The objective is to apply these novel models to forecast stock prices.
Stock price prediction is the task of forecasting the future value of a given stock. Given the historical daily close price for the S&P 500 Index, prepare and compare forecasting solutions. The S&P 500, or Standard and Poor's 500 index, is an index comprising 500 stocks from different sectors of the US economy and is an indicator of US equities. Other such indices are the Dow 30, NIFTY 50, Nikkei 225, etc. For the purpose of understanding, we use the S&P 500 index; the concepts and knowledge can be applied to other stocks as well.
The historical stock price information is also publicly available. For our current use case, we will utilize the pandas_datareader library to get the required S&P 500 index history using Yahoo Finance databases. We utilize the closing price information from the dataset, though other information such as opening price, adjusted closing price, etc. is also available. We prepare a utility function get_raw_data() to extract the required information into a pandas dataframe. The function takes the index ticker name as input. For the S&P 500 index, the ticker name is ^GSPC. The following snippet uses the utility function to get the required data. (See Simple LSTM Regression)
Features and Terminology: In stock trading, the high and low refer to the maximum and minimum prices in a given time period. Open and close are the prices at which a stock began and ended trading in the same period. Volume is the total amount of trading activity. Adjusted values factor in corporate actions such as dividends, stock splits, and new share issuance.
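The definitions above imply a few relationships that every daily bar should satisfy, which makes for a cheap sanity check on scraped data. The sketch below is an illustration only; the function name `is_consistent_bar` and the sample values are hypothetical.

```python
def is_consistent_bar(open_, high, low, close, volume):
    """Check one daily bar against the definitions of high, low, open, close."""
    return (
        low <= min(open_, close)       # low is the minimum price of the period
        and high >= max(open_, close)  # high is the maximum price of the period
        and volume >= 0                # trading activity cannot be negative
    )


# Hypothetical bars: the second one has a close above its high.
print(is_consistent_bar(100.0, 105.0, 99.0, 104.0, 1_000_000))
print(is_consistent_bar(100.0, 105.0, 99.0, 106.0, 1_000_000))
```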
Mining and updating of this dataset will depend upon Yahoo Finance.
Variations of sequence modeling and bleeding-edge techniques, e.g. attention, can be applied for research and forecasting.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides historical daily trading data for Meta Platforms, Inc. (formerly known as Facebook, Inc.), with the stock ticker symbol META. The data was collected from Yahoo Finance and consists of richly detailed records suitable for a wide variety of financial analyses and data science projects.
The dataset is structured as a CSV file, with each row corresponding to a single trading day. It covers a comprehensive time range, allowing users to examine Meta’s stock price evolution, volatility, and trading activity over time.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Actually, I prepared this dataset for students in my Deep Learning and NLP course.
But I am also very happy to see kagglers play around with it.
Have fun!
Description:
There are two channels of data provided in this dataset:
News data: I crawled historical news headlines from Reddit WorldNews Channel (/r/worldnews). They are ranked by reddit users' votes, and only the top 25 headlines are considered for a single date. (Range: 2008-06-08 to 2016-07-01)
Stock data: Dow Jones Industrial Average (DJIA) is used to "prove the concept". (Range: 2008-08-08 to 2016-07-01)
I provided three data files in .csv format:
RedditNews.csv: two columns. The first column is the "date", and the second column is the "news headlines". All news items are ranked from top to bottom based on how hot they are. Hence, there are 25 lines for each date.
DJIA_table.csv: Downloaded directly from Yahoo Finance: check out the web page for more info.
Combined_News_DJIA.csv: To make things easier for my students, I provide this combined dataset with 27 columns. The first column is "Date", the second is "Label", and the following ones are news headlines ranging from "Top1" to "Top25".
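As a rough illustration of the 27-column layout of Combined_News_DJIA.csv, the sketch below builds the expected column names and joins one day's headlines into a single string. The helper names `combined_columns` and `join_headlines` are hypothetical, and a row is assumed to behave like a dict keyed by column name.

```python
def combined_columns(n_headlines=25):
    """Column names as described: Date, Label, then Top1..Top25."""
    return ["Date", "Label"] + [f"Top{i}" for i in range(1, n_headlines + 1)]


def join_headlines(row, n_headlines=25):
    """Concatenate one day's headlines, skipping missing entries."""
    parts = [row.get(f"Top{i}") for i in range(1, n_headlines + 1)]
    return " ".join(p for p in parts if p)


print(len(combined_columns()))
```

Joining the headlines per day is only one possible preprocessing choice; keeping the 25 columns separate preserves the ranking information.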
=========================================
To my students:
I made this a binary classification task. Hence, there are only two labels:
"1" when DJIA Adj Close value rose or stayed the same;
"0" when DJIA Adj Close value decreased.
For task evaluation, please use data from 2008-08-08 to 2014-12-31 as the Training Set; the Test Set is then the following two years of data (from 2015-01-02 to 2016-07-01). This is roughly an 80%/20% split.
And, of course, use AUC as the evaluation metric.
=========================================
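The labeling rule and the date-based split described for students can be sketched with pandas as follows. This is a sketch under assumptions: it assumes the label compares each day's Adj Close with the previous trading day's value (the description does not state the reference point), that the data is indexed by a sorted DatetimeIndex, and the function names are hypothetical.

```python
import pandas as pd


def make_labels(adj_close: pd.Series) -> pd.Series:
    """1 if Adj Close rose or stayed the same versus the previous row, else 0."""
    change = adj_close.diff()
    labels = (change >= 0).astype(int)
    # The first row has no previous value to compare with, so drop it.
    return labels.iloc[1:]


def split_by_date(df: pd.DataFrame, train_end="2014-12-31", test_start="2015-01-02"):
    """Split a date-indexed frame into the training and test periods."""
    return df.loc[:train_end], df.loc[test_start:]


# Hypothetical prices, used only for illustration.
idx = pd.to_datetime(["2014-12-30", "2014-12-31", "2015-01-02", "2015-01-05"])
demo = pd.DataFrame({"Adj Close": [10.0, 10.0, 9.0, 11.0]}, index=idx)
print(make_labels(demo["Adj Close"]).tolist())
train, test = split_by_date(demo)
print(len(train), len(test))
```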
+++++++++++++++++++++++++++++++++++++++++
To all kagglers:
Please upvote this dataset if you like this idea for market prediction.
If you think you coded an amazing trading algorithm,
friendly advice
do play safe with your own money :)
+++++++++++++++++++++++++++++++++++++++++
Feel free to contact me if there is any question~
And, remember me when you become a millionaire :P
Note: If you'd like to cite this dataset in your publications, please use:
Sun, J. (2016, August). Daily News for Stock Market Prediction, Version 1. Retrieved [Date You Retrieved This Data] from https://www.kaggle.com/aaron7sun/stocknews.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides historical stock price data for The Coca-Cola Company (NYSE: KO) from September 6, 1919, to January 31, 2025. Extracted from Yahoo Finance, this dataset is valuable for stock market analysis, long-term trend evaluation, and financial modeling.
Date: The trading date in YYYY-MM-DD format.
Open: Opening price of Coca-Cola stock on the respective day.
High: Highest price recorded during the trading session.
Low: Lowest price recorded during the trading session.
Close: Closing price of the stock at the end of the trading session.
Adj Close: Adjusted closing price, accounting for stock splits and dividends.
Volume: Total number of shares traded on that day.
Long-Term Market Trend Analysis: Analyze Coca-Cola’s stock performance over a century.
Financial Forecasting: Train machine learning models to predict future stock prices.
Volatility Analysis: Assess price fluctuations over different market cycles.
Investment Strategy Development: Backtest various trading strategies.
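For the volatility use case, one common approach is the rolling standard deviation of daily log returns, scaled to a yearly figure. The sketch below assumes a 21-day window and 252 trading days per year; both are conventions chosen for illustration, not values taken from the dataset, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_volatility(close: pd.Series, window: int = 21,
                       periods_per_year: int = 252) -> pd.Series:
    """Annualized rolling volatility computed from daily log returns."""
    log_returns = np.log(close / close.shift(1))
    # Standard deviation over the window, scaled by sqrt(time).
    return log_returns.rolling(window).std() * np.sqrt(periods_per_year)


# Hypothetical series growing by exactly 1% per day: volatility is ~0.
close = pd.Series([100 * 1.01 ** i for i in range(30)])
vol = rolling_volatility(close, window=5)
print(vol.dropna().abs().max() < 1e-6)
```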
This dataset has been extracted from Yahoo Finance.
This dataset is publicly available for educational and research purposes. Please cite Yahoo Finance and Muhammad Atif Latif when using it in any analysis.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
The dataset of INR to Dollar exchange rates from 2003 to 2024 downloaded from Yahoo Finance likely contains historical exchange rate data for the Indian Rupee (INR) against the US Dollar (USD) over the specified time period. Here's a general description of what you might find in such a dataset:
Date: Each entry in the dataset likely includes a date or timestamp indicating when the exchange rate was recorded.
Exchange Rate: The dataset should include the exchange rate value, representing the number of Indian Rupees equivalent to one US Dollar on the corresponding date.
Time Period: The dataset should cover exchange rate data for each trading day or a specified frequency (e.g., weekly, monthly) from 2003 to 2024.
Additional Information: Depending on the source and format of the dataset, it may include additional information such as opening, high, low, and closing exchange rates for each day, as well as volume and adjusted closing prices.
Currency Pair: The dataset focuses specifically on the exchange rate between the Indian Rupee (INR) and the US Dollar (USD), allowing users to analyze trends and fluctuations in the value of the Indian Rupee relative to the US Dollar over time.
Data Quality: It's important to consider the reliability and accuracy of the data. Ensure that the dataset is sourced from a reputable financial data provider like Yahoo Finance and that any missing or erroneous data points are appropriately handled.
Overall, this dataset can be used for various analytical purposes, including trend analysis, forecasting, and risk management in the context of currency exchange markets and international finance.
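One way to handle the missing data points mentioned above is to reindex the series to every business day and carry the last known rate forward. This is a minimal sketch, assuming a pandas Series indexed by date; forward-filling is only one of several reasonable policies, and the rates below are made up for illustration.

```python
import pandas as pd


def fill_missing_days(rates: pd.Series) -> pd.Series:
    """Reindex to all business days in range and forward-fill the gaps."""
    full_index = pd.bdate_range(rates.index.min(), rates.index.max())
    return rates.reindex(full_index).ffill()


# Hypothetical INR-per-USD rates with one business day missing.
rates = pd.Series(
    [83.0, 83.2],
    index=pd.to_datetime(["2024-01-01", "2024-01-03"]),
)
filled = fill_missing_days(rates)
print(filled)
```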
The largest reported data leakage as of January 2025 was the Cam4 data breach in March 2020, which exposed more than 10 billion data records. The second-largest data breach in history so far, the Yahoo data breach, occurred in 2013. The company initially reported about one billion exposed data records, but after an investigation, the company updated the number, revealing that three billion accounts were affected. The National Public Data Breach was announced in August 2024. The incident became public when personally identifiable information of individuals became available for sale on the dark web. Overall, the security professionals estimate the leakage of nearly three billion personal records. The next significant data leakage was the March 2018 security breach of India's national ID database, Aadhaar, with over 1.1 billion records exposed. This included biometric information such as identification numbers and fingerprint scans, which could be used to open bank accounts and receive financial aid, among other government services.
Cybercrime - the dark side of digitalization
As the world continues its journey into the digital age, corporations and governments across the globe have been increasing their reliance on technology to collect, analyze and store personal data. This, in turn, has led to a rise in the number of cyber crimes, ranging from minor breaches to global-scale attacks impacting billions of users, such as in the case of Yahoo. Within the U.S. alone, 1,802 cases of data compromise were reported in 2022. This was a marked increase from the 447 cases reported a decade prior.
The high price of data protection
As of 2022, the average cost of a single data breach across all industries worldwide stood at around 4.35 million U.S. dollars. This was found to be most costly in the healthcare sector, with each leak reported to have cost the affected party a hefty 10.1 million U.S. dollars. The financial segment followed closely behind. Here, each breach resulted in a loss of approximately 6 million U.S. dollars - 1.5 million more than the global average.
https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains the files required for training and testing, split accordingly.
There are two groups of features that you can use for prediction:
The files in the Fundamentals folder are a processed version of the files in the raw folder. Ratios and other values are stretched to match the length of the closing price column, so that, for example, the value in the pe_ratio column is the PE ratio from the most recent quarter; this applies to every column.
Technical indicators are calculated with the default parameters used in the Pandas_TA package.
Data is collected from finance.yahoo.com and macrotrends.net. The timeframe differs from one ticker to another because some stocks are unavailable for a given timeframe on one of the websites.
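The "stretching" of quarterly values onto daily rows can be illustrated with `pandas.merge_asof`, which attaches to each daily row the most recent quarterly value on or before that date. This is a sketch of the general idea, not the dataset author's actual preprocessing; the column names `date` and `close` and the sample values are assumptions, and only `pe_ratio` comes from the description.

```python
import pandas as pd


def stretch_to_daily(daily: pd.DataFrame, quarterly: pd.DataFrame) -> pd.DataFrame:
    """Attach the most recent quarterly values to every daily row."""
    daily = daily.sort_values("date")
    quarterly = quarterly.sort_values("date")
    # direction="backward" picks the last quarterly row at or before each day.
    return pd.merge_asof(daily, quarterly, on="date", direction="backward")


# Hypothetical data, used only for illustration.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-30", "2021-03-31", "2021-04-01", "2021-07-01"]),
    "close": [10.0, 10.5, 10.2, 11.0],
})
quarterly = pd.DataFrame({
    "date": pd.to_datetime(["2021-03-31", "2021-06-30"]),
    "pe_ratio": [20.0, 25.0],
})
out = stretch_to_daily(daily, quarterly)
print(out)
```

Days before the first quarterly report have no value to carry forward, so they stay empty (NaN) in this sketch.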
All code required to collect the data and perform preprocessing and feature engineering to get the data in the given format can be found in the following notebooks:
Column names are supposed to be self-explanatory, assuming you are familiar with the stock market. Some acronyms you may encounter:
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
I scraped this data from Yahoo Finance for my own project work. Thank you to the data science community: without the help of various people and their ideas, I could not have published this dataset on Kaggle. This is a small contribution from me.
Daily price data for indexes tracking stock exchanges from all over the world (United States, China, Canada, Germany, Japan, and more). The data was all collected from Yahoo Finance, which had several decades of data available for most exchanges.
The data was scraped from Yahoo Finance and covers the last 20 years.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains the historical stock prices and related financial information for five major technology companies: Apple (AAPL), Microsoft (MSFT), Amazon (AMZN), Google (GOOGL), and Tesla (TSLA). The dataset spans a five-year period from January 1, 2019, to January 1, 2024. It includes key stock metrics such as Open, High, Low, Close, Adjusted Close, and Volume for each trading day.
The data was sourced using the yfinance library in Python, which provides convenient access to historical market data from Yahoo Finance.
The dataset contains the following columns:
Date: The trading date.
Open: The opening price of the stock on that date.
High: The highest price of the stock on that date.
Low: The lowest price of the stock on that date.
Close: The closing price of the stock on that date.
Adj Close: The adjusted closing price, accounting for dividends and splits.
Volume: The number of shares traded on that date.
Ticker: The stock ticker symbol representing each company.
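Because all five companies share one table and are distinguished by the Ticker column, a common first step is to pivot the data so that each ticker gets its own column. The sketch below assumes the column names listed above; the function name and the prices are hypothetical.

```python
import pandas as pd


def closes_by_ticker(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format rows into one Close column per ticker."""
    return df.pivot(index="Date", columns="Ticker", values="Close")


# Hypothetical prices, used only for illustration.
long_df = pd.DataFrame({
    "Date": ["2019-01-02", "2019-01-02", "2019-01-03", "2019-01-03"],
    "Ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "Close": [10.0, 20.0, 11.0, 19.0],
})
wide = closes_by_ticker(long_df)
print(wide)
```

The wide layout makes per-ticker comparisons straightforward, for example `wide.pct_change()` for daily returns.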
https://creativecommons.org/publicdomain/zero/1.0/
This dataset presents the comprehensive stock price history of PT Telkom Indonesia (TLKM.JK), a multinational telecommunications conglomerate, from 2001 to 2023. The dataset includes daily stock prices, trading volume, and other relevant financial metrics. The stock prices are provided in Indonesian Rupiah (IDR). Telkom has major business lines in fixed-line telephony, internet, and data communications. It operates as the parent company of the Telkom Group, which is engaged in a broad range of businesses, including telecommunication, multimedia, property, and financial services.
Dataset Variables:
Date: The date of the stock price data.
Open Price: The opening price of the company's stock on the given date.
Close Price: The closing price of the company's stock on the given date.
High Price: The highest price reached by the company's stock during the trading day.
Low Price: The lowest price reached by the company's stock during the trading day.
Adjusted Close Price: The closing price on a given trading day, adjusted to reflect any corporate actions, such as stock splits, dividends, rights offerings, or other adjustments that may affect the stock price.
Volume: The number of shares traded on the given date.
Data Sources: The dataset is compiled from reliable financial sources, including stock exchanges, financial news websites, and reputable financial data providers. Data cleaning and preprocessing techniques have been applied to ensure accuracy and consistency. More info: https://finance.yahoo.com/quote/TLKM.JK/history/
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains daily OHLCV data for ~2000 Indian stocks listed on the National Stock Exchange, covering their full available history. The columns are multi-index columns, so this needs to be taken into account when reading and using the data.
Source: Yahoo Finance
Type: All files are in CSV format.
Currency: INR
All the tickers have been collected from https://www.nseindia.com/market-data/securities-available-for-trading
If using pandas, the following function is a utility to read any of the CSV files:
```
import pandas as pd


def read_ohlcv(filename):
    """Read a given OHLCV data file downloaded from yfinance."""
    return pd.read_csv(
        filename,
        skiprows=[0, 1, 2],  # skip the multi-index header rows that cause trouble
        names=["Date", "Close", "High", "Low", "Open", "Volume"],
        index_col="Date",
        parse_dates=["Date"],
    )
```
https://creativecommons.org/publicdomain/zero/1.0/
In recent years, technology companies have become a dominant driver of economic growth, consumer tastes, and the financial markets. The biggest tech stocks as a group, for example, have dramatically outpaced the broader market in the past decade.
That's because technology has reshaped in a major way how people communicate, consume information, shop, socialize, and work.
Broadly speaking, companies in the technology sector engage in the research, development, and manufacture of technologically based goods and services. They create software, and design and manufacture computers, mobile devices, and home appliances. They also provide products and services related to information technology.
This dataset contains 3 files with the daily stock price and volume of the three companies: Google, Apple, and Facebook from 07/09/2017 to 07/09/2022. Source: Yahoo! Finance
Apple Inc. (AAPL) One Apple Park Way Cupertino, CA 95014 United States 408 996 1010 https://www.apple.com
Sector(s): Technology Industry: Consumer Electronics Full Time Employees: 154,000
Total Revenue (2021): $365,817,000 (in thousands)
Net Income (2021): $94,680,000 (in thousands)
Exchange: Nasdaq
Alphabet Inc. (GOOG) 1600 Amphitheatre Parkway Mountain View, CA 94043 United States 650 253 0000 https://www.abc.xyz
Sector(s): Communication Services Industry: Internet Content & Information Full Time Employees: 174,014
Total Revenue (2021): $257,637,000 (in thousands)
Net Income (2021): $76,033,000 (in thousands)
Exchange: Nasdaq
Meta Platforms, Inc. (META) 1601 Willow Road Menlo Park, CA 94025 United States 650 543 4800 https://investor.fb.com
Sector(s): Communication Services Industry: Internet Content & Information Full Time Employees: 83,553
Total Revenue (2021): $117,929,000 (in thousands)
Net Income (2021): $39,370,000 (in thousands)
Exchange: Nasdaq
Sources: Yahoo! Finance, Investopedia, Nasdaq