I always wanted a program that fetches data for the whole stock market at once, without having to worry about which companies went public recently. So, here it is.
This dataset contains two Python scripts that anyone can use to fetch the data on their own machine, with no special requirements, simply by running collect.py. I ran it on May 21, 2021 (Version 2), so the data extends up to that date. To extend the period, run collect.py again.
tickers.csv contains the ticker names along with additional data such as the company name, sector, industry, and country.
Each CSV file in the stocksData folder is named after the company's ticker. Each file has 8 columns:
- Date: used as the index.
- Open, Close, High, Low: prices in dollars.
- Volume: the number of shares traded on that date.
- Stock Splits: the split ratio if a stock split occurred on that day.
- Dividends: in dollars. If a company doesn’t pay dividends to its shareholders, this column can be dropped.
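A minimal sketch for loading one of these files with only the Python standard library. It assumes the header row uses the column names listed above and that every price, volume, dividend, and split cell is numeric; the function name `load_ticker_csv` and the sample rows are made up for illustration and are not part of collect.py.

```python
import csv
import io

# Assumption: these header names match the columns described above.
NUMERIC_COLUMNS = ("Open", "High", "Low", "Close", "Volume", "Dividends", "Stock Splits")


def load_ticker_csv(text, drop_empty_dividends=True):
    """Parse the contents of one stocksData/<TICKER>.csv file into a list of dicts."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for col in NUMERIC_COLUMNS:
            if col in row:
                row[col] = float(row[col])
    # Drop Dividends when every value is zero (the company paid no dividends).
    if drop_empty_dividends and rows and all(row.get("Dividends", 0.0) == 0.0 for row in rows):
        for row in rows:
            row.pop("Dividends", None)
    return rows


# Made-up sample rows for a company that pays no dividends.
sample = (
    "Date,Open,High,Low,Close,Volume,Dividends,Stock Splits\n"
    "2021-05-20,10.0,10.5,9.8,10.2,12000,0.0,0.0\n"
    "2021-05-21,10.2,10.9,10.1,10.8,15000,0.0,0.0\n"
)
rows = load_ticker_csv(sample)
print(list(rows[0]))
```

For a real file, pass the file contents, e.g. `load_ticker_csv(open(path).read())` with `path` pointing at one of the CSVs in stocksData.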
I used the Finviz website and the yfinance package to gather this data.
I hope you find this helpful and interesting. If you have any questions, don't hesitate to contact me at milad@miladtabrizi.com.
http://www.gnu.org/licenses/lgpl-3.0.html
The National Stock Exchange of India Ltd. (NSE) is an Indian stock exchange located in Mumbai, Maharashtra, India. NSE was established in 1992 as a demutualized electronic exchange. It was promoted by leading financial institutions at the request of the Government of India, and it is India’s largest exchange by turnover. In 1994, it launched electronic screen-based trading. It went on to launch index futures and internet trading in 2000, the first of their kind in the country.
With the help of NSE, you can trade in the following segments:
Equities
Indices
Mutual Funds
Exchange Traded Funds
Initial Public Offerings
Security Lending and Borrowing Scheme
After a successful IPO, a company's stock is traded on one or more stock exchange platforms, and NSE is one of the most important platforms in India. Thousands of companies' stocks trade on NSE, but I have chosen two popular, highly rated Indian IT services companies, TCS and INFOSYS, plus a third series that serves as the benchmark for Indian IT companies, NIFTY_IT_INDEX.
The dataset contains three CSV files, one each for INFOSYS, NIFTY_IT_INDEX, and TCS. Each file is named after the series it contains.
Time period of recorded data: 1-1-2015 to 31-12-2015.
Source of Data : Official NSE website.
Method: I used the NSEpy API to fetch the data from the NSE site. I have also described my approach in the kernel "**WebScraper to download data for NSE**". Please go through it to better understand the nature of this dataset.
Shape (rows x columns): INFOSYS - 248 x 15 || NIFTY_IT_INDEX - 248 x 7 || TCS - 248 x 15
Column descriptors:
Date
: date on which data is recorded
Symbol
: NSE symbol of the stock
Series
: Series of that stock | EQ - Equity
Other series are:
EQ: It stands for Equity. In this series intraday trading is possible in addition to delivery.
BE: It stands for Book Entry. Shares falling in the Trade-to-Trade or T-segment are traded in this series and no intraday is allowed. This means trades can only be settled by accepting or giving the delivery of shares.
BL: This series facilitates block deals. A block deal is a trade with a minimum quantity of 5 lakh shares or a minimum value of Rs. 5 crore, executed through a single transaction in the special “Block Deal window”. The window is open for only 35 minutes in the morning, from 9:15 to 9:50 AM.
BT: This series provides an exit route to small investors having shares in the physical form with a cap of maximum 500 shares.
GC: Government Securities and Treasury Bills are traded under this series.
IL: This series allows only FIIs to trade among themselves. It is permitted only in securities where the maximum permissible FII limit has not been breached.
Prev Close
: previous day's closing point
Open
: current day open point
High
: current day highest point
Low
: current day lowest point
Last
: the final quoted trading price for a particular stock, or stock-market index, during the most recent day of trading.
Close
: Closing point for the current day
VWAP
: volume-weighted average price is the ratio of the value traded to total volume traded over a particular time horizon
Volume
: the amount of a security that was traded during a given period of time. For every buyer, there is a seller, and each transaction contributes to the count of total volume.
Turnover
: total turnover (value traded) of the stock on that day
Trades
: number of trades (buy/sell transactions) executed in the stock on that day
Deliverable Volume
: the quantity of shares that actually move from one set of people (who held those shares in their demat account before today and are selling today) to another set of people (who bought those shares and will receive them in their demat account by T+2 days).
%Deliverble
: percentage of the day's traded volume that was deliverable (Deliverable Volume / Volume)
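The VWAP and %Deliverble definitions above can be checked with a few lines of Python. This is a sketch with hypothetical function names: the turnover-based form assumes Turnover and the price columns use the same currency unit (check this in your copy before relying on it), and the deliverable share is returned as a fraction, so multiply by 100 if the file stores a percentage.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def vwap_from_turnover(turnover, volume):
    """Daily VWAP as value traded / quantity traded (same currency unit assumed)."""
    return turnover / volume


def deliverable_fraction(deliverable_volume, volume):
    """Share of the day's traded quantity that went to delivery (0.45 means 45%)."""
    return deliverable_volume / volume


# Toy example: two fills of 10 and 30 shares at 100 and 102.
print(vwap([100.0, 102.0], [10, 30]))    # 101.5
print(vwap_from_turnover(4060.0, 40))    # 101.5
print(deliverable_fraction(450, 1000))   # 0.45
```

Both VWAP functions agree on the toy data because the turnover of the two fills is 100 x 10 + 102 x 30 = 4060.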
I would like to extend my sincere thanks to the people behind the NSEpy API, in particular SWAPNIL JARIWALA, who also maintains an excellent open-source GitHub repository for it.
I have also built a starter kernel for this dataset. You can find it right here.
I am so excited to see your magical approaches for the same dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Google Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varpit94/google-stock-data on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is considered one of the Big Five companies in the American information technology industry, along with Amazon, Facebook, Apple, and Microsoft. Google was founded on September 4, 1998, by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14% of its publicly-listed shares and control 56% of the stockholder voting power through super-voting stock. The company went public via an initial public offering (IPO) in 2004. In 2015, Google was reorganized as a wholly-owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary and is a holding company for Alphabet's Internet properties and interests. Sundar Pichai was appointed CEO of Google on October 24, 2015, replacing Larry Page, who became the CEO of Alphabet. On December 3, 2019, Pichai also became the CEO of Alphabet.
This dataset provides historical data of Alphabet Inc. (GOOG). The data is available at a daily level. Currency is USD.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
A novel dataset for bankruptcy prediction related to American public companies listed on the New York Stock Exchange and NASDAQ is provided. The dataset comprises accounting data from 8,262 distinct companies recorded during the period spanning from 1999 to 2018.
According to the Securities and Exchange Commission (SEC), a company in the American market is deemed bankrupt under two circumstances. First, the firm's management files for Chapter 11 of the Bankruptcy Code, indicating an intention to "reorganize" its business. In this case, management continues to oversee day-to-day operations, but significant business decisions require approval from a bankruptcy court. Second, the firm's management files for Chapter 7 of the Bankruptcy Code, indicating a complete cessation of operations, with the company going out of business entirely.
In this dataset, the fiscal year before a company files for bankruptcy under either Chapter 11 or Chapter 7 is labeled "Bankruptcy" (1), so the label flags bankruptcy in the subsequent year. Otherwise, the company is considered to be operating normally (0). The dataset is complete, with no missing values, synthetic entries, or imputed values.
The resulting dataset comprises a total of 78,682 observations of firm-year combinations. To facilitate model training and evaluation, the dataset is divided into three subsets based on time periods. The training set consists of data from 1999 to 2011, the validation set comprises data from 2012 to 2014, and the test set encompasses the years 2015 to 2018. The test set serves as a means to assess the predictive capability of models in real-world scenarios involving unseen cases.
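The time-based split described above can be sketched as follows. The `FirmYear` record and its field names are hypothetical, since the actual column names of the released files are not given here; records outside 1999 to 2018 are simply dropped.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    company_id: str   # hypothetical field names, not the dataset's real columns
    fiscal_year: int
    label: int        # 1 = "Bankruptcy", 0 = operating normally


def split_by_period(records):
    """Train: 1999-2011, validation: 2012-2014, test: 2015-2018."""
    train, validation, test = [], [], []
    for rec in records:
        if 1999 <= rec.fiscal_year <= 2011:
            train.append(rec)
        elif 2012 <= rec.fiscal_year <= 2014:
            validation.append(rec)
        elif 2015 <= rec.fiscal_year <= 2018:
            test.append(rec)
    return train, validation, test


records = [FirmYear("C1", y, 0) for y in (1999, 2011, 2012, 2014, 2015, 2018)]
train, validation, test = split_by_period(records)
print(len(train), len(validation), len(test))
```

Splitting by year rather than at random keeps later years out of the training set, which matches the stated goal of evaluating models on unseen, future cases.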
Variable Name | Description |
---|---|
X1 | Current assets - All the assets of a company that are expected to be sold or used as a result of standard business operations over the next year |
X2 | Cost of goods sold - The total amount a company paid as a cost directly related to the sale of products |
X3 | Depreciation and amortization - Depreciation refers to the loss of value of a tangible fixed asset over time (such as property, machinery, buildings, and plant). Amortization refers to the loss of value of intangible assets over time. |
X4 | EBITDA - Earnings before interest, taxes, depreciation, and amortization. It is a measure of a company's overall financial performance, serving as an alternative to net income. |
X5 | Inventory - The accounting of items and raw materials that a company either uses in production or sells. |
X6 | Net Income - The overall profitability of a company after all expenses and costs have been deducted from total revenue. |
X7 | Total Receivables - The balance of money due to a firm for goods or services delivered or used but not yet paid for by customers. |
X8 | Market value - The price of an asset in a marketplace. In this dataset, it refers to the market capitalization since companies are publicly traded in the stock market. |
X9 | Net sales - The sum of a company's gross sales minus its returns, allowances, and discounts. |
X10 | Total assets - All the assets, or items of value, a business owns. |
X11 | Total Long-term debt - A company's loans and other liabilities that will not become due within one year of the balance sheet date. |
X12 | EBIT - Earnings before interest and taxes. |
X13 | Gross Profit - The profit a business makes after subtracting all the costs that are related to manufacturi... |
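As an illustration of how the X-variables combine, a few standard financial ratios can be derived from a single firm-year row. These ratios are not columns in the dataset; the dict-based row format is an assumption, and a zero or missing denominator returns None.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when either value is missing or the denominator is zero."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def example_ratios(row):
    """Illustrative ratios built from the X-variables of one firm-year row."""
    return {
        "gross_margin": safe_ratio(row.get("X13"), row.get("X9")),               # gross profit / net sales
        "net_margin": safe_ratio(row.get("X6"), row.get("X9")),                  # net income / net sales
        "return_on_assets": safe_ratio(row.get("X6"), row.get("X10")),           # net income / total assets
        "long_term_debt_to_assets": safe_ratio(row.get("X11"), row.get("X10")),  # long-term debt / total assets
    }


# Made-up firm-year values, for illustration only.
ratios = example_ratios({"X6": 10.0, "X9": 100.0, "X10": 200.0, "X11": 50.0, "X13": 40.0})
print(ratios)
```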
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains a dataset of companies in the information and communication industry listed on the Japanese stock market from 2009 to 2015. The dataset describes the name of the company, the name of the listed market, the date of listing, the date of real establishment, the number of years listed, the type of startup, the real establishment (yes = 1, no = 0), the data source, and the date the data source was viewed.
In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database, and publishes quarterly updates going back to January 2009. To aid analysis, a quick summary view of the data, which is not available in the original dataset, has been created. It pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/month free tier of processing: each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Corporate profits in the United States decreased to USD 3,203.60 billion in the first quarter of 2025 from USD 3,312 billion in the fourth quarter of 2024. This dataset provides the latest reported value for United States Corporate Profits, plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus, and news.
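As a quick worked check of the two figures quoted above, the quarter-over-quarter change works out to a decline of roughly 3.3%:

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100.0


q4_2024 = 3312.0    # USD billion
q1_2025 = 3203.60   # USD billion
change = pct_change(q4_2024, q1_2025)
print(f"{change:.2f}%")  # -3.27%
```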
Xverum’s Company Data delivers comprehensive insights into over 50 million global businesses, from fast-growing startups to established private companies. This dataset is a trusted source for investors, analysts, and B2B teams seeking reliable firmographic data, company registry attributes, and organizational details across industries and geographies.
Whether you’re researching potential clients, running B2B campaigns, or building smarter go-to-market strategies, this company dataset gives you the full picture—updated every 30 days.
What’s Included:
✅ 50M+ Verified Company Records across 249 countries
✅ 40+ Firmographic Attributes, including:
✔️ Company Name, Industry
✔️ Employee Count, HQ Location, Founding Year
✔️ Company Domain, Company Profile URL, Registry Type
✅ Private, Public & Startup Coverage across businesses of any size
✅ Custom Region Delivery: segment by country, region, or worldwide
✅ 30-Day Refresh Cycle to keep your data fresh and investment-ready
✅ Available in CSV, JSON, or via API & S3
Use Cases:
➡️ Company Research & Competitive Benchmarking: Analyze growth metrics and benchmarks across industries and private company peers.
➡️ B2B Lead Generation & Outreach: Fuel CRM and outbound sales platforms with firmographic-enriched startup and SMB records.
➡️ Investor Intelligence & Deal Sourcing: Spot high-growth startups by tracking employee expansion, market entry, and location-based clusters.
➡️ Market Mapping & Go-To-Market Planning: Build total addressable market (TAM) maps using verified business registry records and firmographics.
Why Choose Xverum’s Company Dataset?
✅ Global Reach: 50M companies, with data on startups, SMEs, and private firms in emerging and developed markets
✅ Flexible Formats: Delivered via API, bulk export, or cloud delivery
✅ GDPR & CCPA Compliant: Ethically sourced and privacy-focused
Ready to enrich your CRM or power your next B2B campaign? Request a free sample today or contact us to dive deeper into your data needs.
In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database, and publishes quarterly updates going back to January 2009. For more information, please see this site.
To aid analysis, a quick summary view of the data, which is not available in the original dataset, has been created. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience.
DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years, up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept, though: just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
https://www.lseg.com/en/policies/website-disclaimer
Access historical and point-in-time financial statements, ratios, multiples, and press releases, with LSEG's S&P Compustat Database.
Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculates their geo locations
Our categorization systems clean up and standardize the datasets
Our data pipeline API publishes the datasets on our data store
A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly on social media and in pictures, websites, and press releases. Soon after that, our systems will pick it up.
POI data is in constant flux. Every minute, worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have some kind of public online presence that reflects such changes: when a business changes, its website and social media presence change too. We then extract and merge the new information, creating the most accurate and up-to-date business information dataset across the globe.
We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.
Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously refreshed, so subscription plans are recommended for those who need the most up-to-date data. What differentiates us from the competition is our flexible licensing terms and our data freshness.
Data samples may be downloaded at https://store.poidata.xyz/us
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan's main stock market index, the JP225, rose to 40790 points on July 23, 2025, gaining 2.55% from the previous session. Over the past month, the index has climbed 5.15% and is up 4.18% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from Japan. Japan Stock Market Index (JP225) - values, historical data, forecasts and news - updated on July of 2025.
https://fred.stlouisfed.org/legal/#copyright-pre-approval
View data of the S&P 500, an index of the stocks of 500 leading companies in the US economy, which provides a gauge of the U.S. equity market.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overall, this project was meant to test the relationship between social media posts and their short-term effect on stock prices. We used Reddit posts from finance-specific subreddits such as r/wallstreetbets, r/investing, and r/stocks to look at the market changes associated with a variety of posts made by users. The idea came from the GameStop short squeeze, which showed the power of social media in the market. In theory, a stock's price should purely reflect the present value of all the company's future value, but we asked whether social media can affect that intrinsic value. Our research question was set from the start: do Reddit posts for or against a certain stock provide insight into how the market will move in a short window? To answer it, we selected five large tech companies: Apple, Tesla, Amazon, Microsoft, and Google. We expected these companies to have more posts in the subreddits and less day-to-day volatility, making the experiment easier to carry out. Because they trade at very high values, a price change caused by a Reddit post would have to be significant, giving us stronger evidence of an effect.
Next, we had to choose data sources to test with. First, we tried to obtain the Reddit data through a Reddit API, but because Reddit requires approval to use its data, we switched to a Kaggle dataset containing Reddit metadata. For the second dataset we had planned to use Yahoo Finance through yfinance, but because of the large amount of data we were pulling from this public API, our IP address was temporarily blocked, so we switched to pulling the stock data from Alpha Vantage. While this was a large switch, it was only a minor roadblock: fixing the finance-pulling step allowed everything downstream to keep working in sequence. Once both datasets were pulled programmatically into our local VS Code environment, we implemented a pipeline to clean, merge, and analyze all the data, and at the end we implemented a Snakemake workflow to make the project easily reproducible. We used TextBlob to label each Reddit post with a sentiment of positive, negative, or neutral, giving us a sentiment value to correlate with price changes. We then matched the time of each post with the stock data, computed the resulting price changes, calculated a correlation coefficient, and graphed our findings.
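The labeling and correlation steps can be sketched as below. This is a hedged sketch, not the project's code: it assumes a polarity score in [-1, 1] such as the one TextBlob returns from `TextBlob(text).sentiment.polarity`, a neutral band of ±0.05 chosen only for illustration, and a price change measured between the closes just before and just after each post. The sample posts are made up, and the Pearson coefficient is computed by hand to avoid extra dependencies.

```python
import math

NEUTRAL_BAND = 0.05  # assumption: polarities within ±0.05 count as neutral


def label_sentiment(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to positive / negative / neutral.

    In practice the score could come from TextBlob:
        from textblob import TextBlob
        polarity = TextBlob(post_text).sentiment.polarity
    """
    if polarity > NEUTRAL_BAND:
        return "positive"
    if polarity < -NEUTRAL_BAND:
        return "negative"
    return "neutral"


def short_term_change(close_before: float, close_after: float) -> float:
    """Fractional price change over the window that contains the post."""
    return (close_after - close_before) / close_before


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up example posts: (polarity, close before the post, close after the post)
posts = [(0.60, 100.0, 101.5), (-0.40, 50.0, 49.0), (0.00, 20.0, 20.1), (0.30, 10.0, 10.2)]
labels = [label_sentiment(p) for p, _, _ in posts]
changes = [short_term_change(before, after) for _, before, after in posts]
r = pearson_r([p for p, _, _ in posts], changes)
print(labels, round(r, 3))
```

Correlating the raw polarity rather than the three-way label keeps more information; the label is still useful for grouping posts in plots.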
To conclude the data analysis, we found relatively small or no correlation across all the companies combined, although Microsoft and Google do show stronger correlations when analyzed on their own. However, this may be due to other factors, such as why a post was made or whether the market was already trending on those dates. A larger analysis with more data from other social media platforms would be needed to support our hypothesis of a strong correlation.
https://spdx.org/licenses/CC0-1.0.html
Public spending is often a contentious subject because different political parties have different agendas as to what the current national priorities should be. Of course, the same is true for the public in general. It is thus of interest to determine whether public spending is indeed as biased and capricious as it is often perceived, or whether there nevertheless exist some fundamental principles that guide it. We use data from the Commission for the Prevention of Corruption of the Republic of Slovenia, detailing every transfer of public money to the private sector from January 2003 to May 2020. During this time, Slovenia did business with no fewer than 248,989 companies. We find that the cumulative distribution of money received per company can be reasonably well explained by a power-law or a log-normal fit. We also show evidence of a first-mover advantage, and determine that the attachment rate of public spending to companies over time is roughly linear. These results indicate that Slovenian public spending is to a large extent guided by self-organizing principles that, against all odds, go beyond nefarious interests and lobbying.
Methods
Being a (relatively) small nation, Slovenia keeps excellent records of its public spending via the Commission for the Prevention of Corruption of the Republic of Slovenia (CPC). It is an independent agency with a broad mandate to prevent and investigate corruption and other breaches of ethics and integrity, with a special focus on transparency of public spending. CPC gathers the data from nine different public institutions in Slovenia, including the Ministry of Finance, the Public Procurement Portal, and the Public Payments Administration. In particular, all private companies registered in Slovenia are legally obliged to report the exact information on any business done using public funds. Hence, CPC keeps track of all transactions in which public money is spent on business with the private sector.
For transparency, all this data is publicly available on the CPC website. Upon signing the appropriate contract, we received this data set from the CPC. It includes all public-to-private transactions from January 2003 to May 2020. During these 209 months, it turns out, the Republic of Slovenia did business (ordering services or buying goods) with exactly 248,989 companies. To avoid noise and uncertainties, we excluded from further analysis companies that received less than 10,000 EUR in total over this period. This cut-off in total spending per company leaves 105,086 companies, on which we focus the analysis. We arrange the data as a matrix in which each element is the amount of money (in EUR) that a given company received from all public bodies during one of these 209 months. In other words, for each company we have a time series of 209 values, each representing the volume of business done using public funds in that month. This data is possibly unique in the world. Its completeness and precision allow us to examine the presence of self-organization, to which we devote the rest of this paper. Specifically, we find a heavy-tailed distribution of total public spending per company versus company rank that can be fitted reasonably well by a power law.
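The matrix construction, the 10,000 EUR cut-off, and a power-law fit can be sketched in plain Python. This is an illustration under assumptions, not the authors' code: transactions are taken as hypothetical `(company, year, month, amount_eur)` tuples, and the exponent is estimated with the standard continuous maximum-likelihood estimator for a power law above a chosen `xmin`, which may differ from the fitting procedure used in the paper.

```python
import math
import random
from collections import defaultdict

START_YEAR, START_MONTH = 2003, 1
N_MONTHS = 209  # January 2003 through May 2020


def month_index(year: int, month: int) -> int:
    """Zero-based position of a calendar month in the 209-month window."""
    idx = (year - START_YEAR) * 12 + (month - START_MONTH)
    if not 0 <= idx < N_MONTHS:
        raise ValueError(f"{year}-{month:02d} is outside the study window")
    return idx


def build_monthly_series(transactions):
    """Sum amounts into one 209-value time series per company (one row of the matrix)."""
    series = defaultdict(lambda: [0.0] * N_MONTHS)
    for company, year, month, amount_eur in transactions:
        series[company][month_index(year, month)] += amount_eur
    return dict(series)


def apply_cutoff(series, min_total_eur=10_000.0):
    """Keep only companies that received at least min_total_eur over the whole period."""
    return {c: s for c, s in series.items() if sum(s) >= min_total_eur}


def powerlaw_alpha_mle(values, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin)) for x >= xmin."""
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need tail values strictly above xmin")
    return 1.0 + len(tail) / log_sum


# Sanity check on synthetic data: random.paretovariate(1.5) draws from a
# density proportional to x**-2.5 for x >= 1, so the estimate should be near 2.5.
rng = random.Random(0)
synthetic = [rng.paretovariate(1.5) for _ in range(20_000)]
alpha_hat = powerlaw_alpha_mle(synthetic, xmin=1.0)
print(round(alpha_hat, 2))
```

On the real data, the fit would be applied to the per-company totals, e.g. `[sum(s) for s in apply_cutoff(series).values()]`, with `xmin` chosen by the analyst.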
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
France's main stock market index, the FR40, fell to 7744 points on July 22, 2025, losing 0.69% from the previous session. Over the past month, the index has climbed 2.74% and is up 1.92% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from France. France Stock Market Index (FR40) - values, historical data, forecasts and news - updated on July of 2025.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
SUMMARY
This dataset includes the locations of businesses that pay taxes to the City and County of San Francisco. Each registered business may have multiple locations, and each location is a single row. The Treasurer & Tax Collector’s Office (TTX) collects this data through business registration applications, account update/closure forms, and taxpayer filings. Business locations marked as “Administratively Closed” have not filed or communicated with TTX for 3 years, or were marked as closed following a notification from another City and County department.
The data is collected to help enforce the Business and Tax Regulations Code including, but not limited to: Article 6, Article 12, Article 12-A, and Article 12-A-1. http://sftreasurer.org/registration.
HOW TO USE THIS DATASET
To learn more about using this dataset, watch this video. To update your listing or look up your BAN (Business Account Number), see this FAQ: Registered Business Locations Explainer
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Bankruptcies in the United States increased to 23309 Companies in the first quarter of 2025 from 23107 Companies in the fourth quarter of 2024. This dataset provides - United States Bankruptcies - actual values, historical data, forecast, chart, statistics, economic calendar and news.
The Project Green Light Locations data set documents private businesses and other organizations that participate in Project Green Light Detroit, a program started on January 1, 2016 as a partnership between local businesses, the City of Detroit and community groups. Local businesses and organizations that participate in this program have installed real-time camera connections with Detroit Police Department headquarters and visibly communicate their participation to the public through project-specific green lights and signage at each participating location. PGL locations have grown in number and diversity from eight gas stations at the inception of the program in 2016 to include a growing number of retail and service-based businesses, residential facilities such as apartment complexes and nursing homes, and community organizations such as houses of worship. The PGLL dataset records location-level data and includes the business or organization name, address, business type, corresponding police precinct, and the date a location went live with a real-time camera connection to DPD. Records in the data set have been enriched through geocoding to enable us to map address locations. If a business or organization has multiple participating locations, a record for each location is included in the data set.