47 datasets found

US Stock Market
kaggle.com
Updated May 26, 2021
Cite
Milad (2021). US Stock Market [Dataset]. https://www.kaggle.com/mryder/us-stock-market-historical-data/code
Dataset updated
May 26, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Milad
Description
Context

I always wanted to have a program that fetches the whole stock market data at once, without worrying about new companies that went public recently. So, here it is.

Content

This dataset contains 2 Python scripts with which one can fetch the data on their own machine, without any special requirements, by just running collect.py. I did this on May 21, 2021 (Version 2), so the data is available up to that date. To extend that period, run collect.py again.

Columns Description

tickers.csv contains ticker names along with some additional data such as name of the company, sector, industry, and the country of the company.

Each CSV file in the stocksData folder is named after the company's ticker. Each file has 8 columns:

- Date: used as the index.
- Open, Close, High, Low: prices in dollars.
- Volume: the number of shares traded on that date.
- Stock Splits: the split ratio, if a stock split occurred on that day.
- Dividends: in dollars. If a company doesn't pay dividends to its shareholders, this column can be dropped.
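As a rough illustration of the file layout described above, here is a minimal sketch that parses one per-ticker file using only the standard library. The sample rows, the column order, and the helper names `load_prices` and `pays_dividends` are assumptions made for this example; they are not taken from the dataset's own scripts.

```python
import csv
import io

# Hypothetical sample in the described layout; real files would be read
# from the stocksData folder, one CSV per ticker.
SAMPLE = """Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2021-05-19,10.0,10.5,9.8,10.2,1000,0,0
2021-05-20,10.2,10.9,10.1,10.8,1500,0.25,0
2021-05-21,5.4,5.6,5.3,5.5,3200,0,2
"""

FLOAT_COLUMNS = ("Open", "High", "Low", "Close", "Dividends", "Stock Splits")


def load_prices(handle):
    """Parse one ticker file into a list of dicts with numeric fields."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"Date": record["Date"]}
        for column in FLOAT_COLUMNS:
            row[column] = float(record[column])
        # Volume is a share count; go through float in case it is stored as "1000.0".
        row["Volume"] = int(float(record["Volume"]))
        rows.append(row)
    return rows


def pays_dividends(rows):
    """Return True if any row records a non-zero dividend."""
    return any(row["Dividends"] != 0 for row in rows)


rows = load_prices(io.StringIO(SAMPLE))
split_days = [row["Date"] for row in rows if row["Stock Splits"] != 0]
```

If `pays_dividends(rows)` is false for a ticker, the Dividends column carries no information and can be dropped, as the description suggests.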

Acknowledgements

I used the finviz site and the yfinance package to gather this rich data.

Inspiration

I hope you find this helpful and interesting. If you have any questions, don't hesitate to contact me at milad@miladtabrizi.com.
National Stock Exchange : Time Series
kaggle.com
Updated Dec 4, 2019
Cite
Atul Anand {Jha} (2019). National Stock Exchange : Time Series [Dataset]. https://www.kaggle.com/atulanandjha/national-stock-exchange-time-series/code
Dataset updated
Dec 4, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atul Anand {Jha}
License
http://www.gnu.org/licenses/lgpl-3.0.html
Description
Context

The National Stock Exchange of India Ltd. (NSE) is an Indian stock exchange located at Mumbai, Maharashtra, India. National Stock Exchange (NSE) was established in 1992 as a demutualized electronic exchange. It was promoted by leading financial institutions on request of the Government of India. It is India’s largest exchange by turnover. In 1994, it launched electronic screen-based trading. Thereafter, it went on to launch index futures and internet trading in 2000, which were the first of its kind in the country.

With the help of NSE, you can trade in the following segments:

Equities

Indices

Mutual Funds

Exchange Traded Funds

Initial Public Offerings

Security Lending and Borrowing Scheme


Companies with successful IPOs get their stocks traded on different stock exchange platforms. NSE is one important platform in India. There are thousands of companies trading their stocks on NSE, but I have chosen two popular and highly rated IT service companies of India: TCS and INFOSYS. The third file is the benchmark for Indian IT companies, i.e. NIFTY_IT_INDEX.

Content

The dataset contains three CSV files, corresponding to INFOSYS, NIFTY_IT_INDEX, and TCS, respectively. Each can easily be identified by the name of its CSV file.

Timeline of Data recording : 1-1-2015 to 31-12-2015.

Source of Data : Official NSE website.

Method : We used the NSEpy API to fetch the data from the NSE site. I have also described my approach in this Kernel: "WebScraper to download data for NSE". Please go through it to better understand the nature of this dataset.

Shape of Dataset:

INFOSYS - 248 x 15 || NIFTY_IT_INDEX - 248 x 7 || TCS - 248 x 15

Column Descriptors:

Date: date on which data is recorded

Symbol: NSE symbol of the stock

Series: Series of that stock | EQ - Equity

The series codes are:

EQ: It stands for Equity. In this series intraday trading is possible in addition to delivery.

BE: It stands for Book Entry. Shares falling in the Trade-to-Trade or T-segment are traded in this series and no intraday is allowed. This means trades can only be settled by accepting or giving the delivery of shares.

BL: This series is for facilitating block deals. Block deal is a trade, with a minimum quantity of 5 lakh shares or minimum value of Rs. 5 crore, executed through a single transaction, on the special “Block Deal window”. The window is opened for only 35 minutes in the morning from 9:15 to 9:50AM.

BT: This series provides an exit route to small investors having shares in the physical form with a cap of maximum 500 shares.

GC: This series allows Government Securities and Treasury Bills to be traded under this category.

IL: This series allows only FIIs to trade among themselves. Permissible only in those securities where maximum permissible limit for FIIs is not breached.

Prev Close: Last day close point

Open: current day open point

High: current day highest point

Low: current day lowest point

Last: the final quoted trading price for a particular stock, or stock-market index, during the most recent day of trading.

Close: Closing point for the current day

VWAP: volume-weighted average price is the ratio of the value traded to total volume traded over a particular time horizon

Volume: the amount of a security that was traded during a given period of time. For every buyer, there is a seller, and each transaction contributes to the count of total volume.

Turnover: Total Turnover of the stock till that day

Trades: Number of buy or sell trades of the stock.

Deliverable Volume: the quantity of shares which actually move from one set of people (who had those shares in their demat account before today and are selling today) to another set of people (who have purchased those shares and will get those shares by T+2 days in their demat account).

%Deliverble: percentage deliverables of that stock
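The VWAP and deliverable-percentage descriptors above reduce to simple ratios. The sketch below is only illustrative: it assumes per-trade (price, quantity) pairs are available, whereas the dataset stores daily aggregates, and it assumes the deliverable percentage is deliverable volume divided by total volume. The function names are hypothetical.

```python
def vwap(trades):
    """Volume-weighted average price: total value traded / total volume traded.

    `trades` is an iterable of (price, quantity) pairs.
    """
    trades = list(trades)
    total_value = sum(price * quantity for price, quantity in trades)
    total_volume = sum(quantity for _, quantity in trades)
    if total_volume == 0:
        raise ValueError("VWAP is undefined when no volume was traded")
    return total_value / total_volume


def deliverable_pct(deliverable_volume, volume):
    """Share of traded volume that was actually delivered, in percent (assumed definition)."""
    if volume == 0:
        raise ValueError("percentage is undefined when no volume was traded")
    return 100.0 * deliverable_volume / volume


# Hypothetical trades: 10 shares at 100, 30 at 102, 60 at 101.
example_vwap = vwap([(100.0, 10), (102.0, 30), (101.0, 60)])
example_pct = deliverable_pct(40, 100)
```

Here the value traded is 1000 + 3060 + 6060 = 10120 over 100 shares, so the VWAP sits between the lowest and highest trade prices and is pulled toward the prices with the most volume.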

Acknowledgements

I would like to offer my sincere thanks to the brains behind the NSEpy API, and in particular SWAPNIL JARIWALA, who also maintains an amazing open-source GitHub repo for this API.

Inspiration

I have also built a starter kernel for this dataset. You can find that right here .

I am so excited to see your magical approaches for the same dataset.

THANKS!
‘Google Stock Data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Google Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-google-stock-data-1a5f/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Google Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varpit94/google-stock-data on 28 January 2022.

--- Dataset description provided by original source is as follows ---

What is Google?

Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is considered one of the Big Five companies in the American information technology industry, along with Amazon, Facebook, Apple, and Microsoft. Google was founded on September 4, 1998, by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14% of its publicly-listed shares and control 56% of the stockholder voting power through super-voting stock. The company went public via an initial public offering (IPO) in 2004. In 2015, Google was reorganized as a wholly-owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary and is a holding company for Alphabet's Internet properties and interests. Sundar Pichai was appointed CEO of Google on October 24, 2015, replacing Larry Page, who became the CEO of Alphabet. On December 3, 2019, Pichai also became the CEO of Alphabet.

Information about this dataset

This dataset provides historical data of Alphabet Inc. (GOOG). The data is available at a daily level. Currency is USD.

--- Original source retains full ownership of the source dataset ---

US Company Bankruptcy Prediction Dataset

kaggle.com

Updated Jun 1, 2023

Cite

Utkarsh Singh (2023). US Company Bankruptcy Prediction Dataset [Dataset]. https://www.kaggle.com/utkarshx27/american-companies-bankruptcy-prediction-dataset/discussion

Dataset updated

Jun 1, 2023

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Utkarsh Singh

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

United States

Description

A novel dataset for bankruptcy prediction related to American public companies listed on the New York Stock Exchange and NASDAQ is provided. The dataset comprises accounting data from 8,262 distinct companies recorded during the period spanning from 1999 to 2018.

According to the Securities and Exchange Commission (SEC), a company in the American market is deemed bankrupt under two circumstances. Firstly, if the firm's management files for Chapter 11 of the Bankruptcy Code, indicating an intention to "reorganize" its business. In this case, the company's management continues to oversee day-to-day operations, but significant business decisions necessitate approval from a bankruptcy court. Secondly, if the firm's management files for Chapter 7 of the Bankruptcy Code, indicating a complete cessation of operations and the company going out of business entirely.

In this dataset, the fiscal year prior to the filing of bankruptcy under either Chapter 11 or Chapter 7 is labeled as "Bankruptcy" (1) for the subsequent year. Conversely, if the company does not experience these bankruptcy events, it is considered to be operating normally (0). The dataset is complete, without any missing values, synthetic entries, or imputed added values.

The resulting dataset comprises a total of 78,682 observations of firm-year combinations. To facilitate model training and evaluation, the dataset is divided into three subsets based on time periods. The training set consists of data from 1999 to 2011, the validation set comprises data from 2012 to 2014, and the test set encompasses the years 2015 to 2018. The test set serves as a means to assess the predictive capability of models in real-world scenarios involving unseen cases.
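The time-based split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout (dicts with a `year` key and a `status` label) and the function name `split_by_year` are hypothetical, and only the year boundaries come from the description.

```python
# Year boundaries as stated in the description (inclusive on both ends).
TRAIN_YEARS = (1999, 2011)
VALIDATION_YEARS = (2012, 2014)
TEST_YEARS = (2015, 2018)


def split_by_year(rows, year_key="year"):
    """Partition firm-year rows into train/validation/test by fiscal year."""
    train, validation, test = [], [], []
    for row in rows:
        year = row[year_key]
        if TRAIN_YEARS[0] <= year <= TRAIN_YEARS[1]:
            train.append(row)
        elif VALIDATION_YEARS[0] <= year <= VALIDATION_YEARS[1]:
            validation.append(row)
        elif TEST_YEARS[0] <= year <= TEST_YEARS[1]:
            test.append(row)
        else:
            raise ValueError(f"year {year} is outside the 1999-2018 range")
    return train, validation, test


# Hypothetical firm-year observations; status 1 = bankruptcy, 0 = operating normally.
sample_rows = [
    {"company": "C_1", "year": 1999, "status": 0},
    {"company": "C_1", "year": 2011, "status": 0},
    {"company": "C_2", "year": 2013, "status": 1},
    {"company": "C_3", "year": 2015, "status": 0},
    {"company": "C_3", "year": 2018, "status": 1},
]
train_rows, validation_rows, test_rows = split_by_year(sample_rows)
```

Splitting on the year rather than at random keeps every test observation later in time than every training observation, which matches the stated goal of evaluating on unseen cases.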

Variable Name	Description
X1	Current assets - All the assets of a company that are expected to be sold or used as a result of standard business operations over the next year
X2	Cost of goods sold - The total amount a company paid as a cost directly related to the sale of products
X3	Depreciation and amortization - Depreciation refers to the loss of value of a tangible fixed asset over time (such as property, machinery, buildings, and plant). Amortization refers to the loss of value of intangible assets over time.
X4	EBITDA - Earnings before interest, taxes, depreciation, and amortization. It is a measure of a company's overall financial performance, serving as an alternative to net income.
X5	Inventory - The accounting of items and raw materials that a company either uses in production or sells.
X6	Net Income - The overall profitability of a company after all expenses and costs have been deducted from total revenue.
X7	Total Receivables - The balance of money due to a firm for goods or services delivered or used but not yet paid for by customers.
X8	Market value - The price of an asset in a marketplace. In this dataset, it refers to the market capitalization since companies are publicly traded in the stock market.
X9	Net sales - The sum of a company's gross sales minus its returns, allowances, and discounts.
X10	Total assets - All the assets, or items of value, a business owns.
X11	Total Long-term debt - A company's loans and other liabilities that will not become due within one year of the balance sheet date.
X12	EBIT - Earnings before interest and taxes.
X13	Gross Profit - The profit a business makes after subtracting all the costs that are related to manufacturi...

Data from: IPO Database in Japanese IT industry sector 2009-2015
jstagedata.jst.go.jp
xlsx
Updated Jul 27, 2023
Cite
Takeyasu Ichikohji; Koji Nakano; Masamichi Ogami (2023). IPO Database in Japanese IT industry sector 2009-2015 [Dataset]. http://doi.org/10.50895/data.abas.21293715.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.50895/data.abas.21293715.v1
Dataset updated
Jul 27, 2023
Dataset provided by
Global Business Research Center
Authors
Takeyasu Ichikohji; Koji Nakano; Masamichi Ogami
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This file contains a dataset of companies in the information and communication industry listed on the Japanese stock market from 2009 to 2015. The dataset describes the name of the company, the name of the listed market, the date of listing, the date of real establishment, the number of years listed, the type of startup, the real establishment (yes = 1, no = 0), the data source, and the date the data source was viewed.
SEC Public Dataset
console.cloud.google.com
Updated Aug 18, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Securities%20and%20Exchange%20Commission&hl=zh-TW&inv=1&invt=Ab2c-Q (2023). SEC Public Dataset [Dataset]. https://console.cloud.google.com/marketplace/product/sec-public-data-bq/sec-public-dataset?hl=zh-TW
Dataset updated
Aug 18, 2023
Dataset provided by
Google (http://google.com/)
Description
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables and enables a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
United States Corporate Profits
tradingeconomics.com
jp.tradingeconomics.com
csv, excel, json, xml
Updated Jun 26, 2025
Cite
TRADING ECONOMICS (2025). United States Corporate Profits [Dataset]. https://tradingeconomics.com/united-states/corporate-profits
Available download formats: excel, xml, json, csv
Dataset updated
Jun 26, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 31, 1947 - Mar 31, 2025
Area covered
United States
Description
Corporate Profits in the United States decreased to 3203.60 USD Billion in the first quarter of 2025 from 3312 USD Billion in the fourth quarter of 2024. This dataset provides the latest reported value for - United States Corporate Profits - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.
Global Company Data | 50M+ Private, Public & Startup Profiles, Verified...
datarade.ai
.csv, .json
Cite
Xverum, Global Company Data | 50M+ Private, Public & Startup Profiles, Verified Firmographics & Registry-Level Business Info [Dataset]. https://datarade.ai/data-products/xverum-apac-commerce-data-individual-company-data-apac-b-xverum
Available download formats: .csv, .json
Dataset authored and provided by
Xverum
Area covered
Timor-Leste, Palestine, Bahrain, Tokelau, Japan, Wallis and Futuna, Hong Kong, Yemen, Oman, Kiribati
Description
Xverum’s Company Data delivers comprehensive insights into over 50 million global businesses, from fast-growing startups to established private companies. This dataset is a trusted source for investors, analysts, and B2B teams seeking reliable firmographic data, company registry attributes, and organizational details across industries and geographies.

Whether you’re researching potential clients, running B2B campaigns, or building smarter go-to-market strategies, this company dataset gives you the full picture—updated every 30 days.

What’s Included:

✅ 50M+ Verified Company Records across 249 countries
✅ 40+ Firmographic Attributes, including:
✔️ Company Name, Industry
✔️ Employee Count, HQ Location, Founding Year
✔️ Company Domain, Company Profile URL, Registry Type
✅ Private, Public & Startup Coverage with a focus on any business size.
✅ Custom Region Delivery – segment by country, region or worldwide.
✅ 30-Day Refresh Cycle to keep your data fresh and investment-ready
✅ Available in CSV, JSON, or via API & S3

Use Cases: ➡️ Company Research & Competitive Benchmarking Analyze growth metrics and benchmarks across industries and private company peers.

➡️ B2B Lead Generation & Outreach Fuel CRM and outbound sales platforms with firmographic-enriched startup and SMB records.

➡️ Investor Intelligence & Deal Sourcing Spot high-growth startups by tracking employee expansion, market entry, and location-based clusters.

➡️ Market Mapping & Go-To-Market Planning Build total addressable market (TAM) maps using verified business registry records and firmographics.

Why Choose Xverum’s Company Dataset?

✅ Global Reach: 50M companies, with data on startups, SMEs, and private firms in emerging and developed markets
✅ Flexible Formats: Delivered via API, bulk export, or cloud delivery
✅ GDPR & CCPA Compliant: Ethically sourced and privacy-focused

Ready to enrich your CRM or power your next B2B campaign? Request a free sample today or contact us to dive deeper into your data needs.
Data from: SEC Filings
kaggle.com
zip
Updated Jun 5, 2020
Cite
Google BigQuery (2020). SEC Filings [Dataset]. https://www.kaggle.com/datasets/bigquery/sec-filings
Available download formats: zip (0 bytes)
Dataset updated
Jun 5, 2020
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
Description
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter going back to January 2009. For more information please see this site.

To aid analysis a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables and enables a more streamlined user experience.

DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.
Amount of data created, consumed, and stored 2010-2023, with forecasts to...
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Amount of data created, consumed, and stored 2010-2023, with forecasts to 2028 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
May 2024
Area covered
Worldwide
Description
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
S&P Compustat Database
lseg.com
sql
Updated Nov 25, 2024
Cite
LSEG (2024). S&P Compustat Database [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/company-data/fundamentals-data/standardized-fundamentals/sp-compustat-database
Available download formats: sql
Dataset updated
Nov 25, 2024
Dataset provided by
London Stock Exchange Group (http://www.londonstockexchangegroup.com/)
Authors
LSEG
License
https://www.lseg.com/en/policies/website-disclaimer
Description
Access historical and point-in-time financial statements, ratios, multiples, and press releases, with LSEG's S&P Compustat Database.
US Restaurant POI dataset with metadata
datarade.ai
.csv
Updated Jul 30, 2022
Cite
Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
Available download formats: .csv
Dataset updated
Jul 30, 2022
Dataset authored and provided by
Geolytica
Area covered
United States of America
Description
Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.

This is our process flow:

Our machine learning systems continuously crawl for new POI data

Our geoparsing and geocoding calculates their geo locations

Our categorization systems cleanup and standardize the datasets

Our data pipeline API publishes the datasets on our data store

A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

POI Data is in constant flux. Every minute worldwide over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist. And over 94% of all businesses have a public online presence of some kind tracking such changes. When a business changes, their website and social media presence will change too. We'll then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via our data update pipeline.

Customers requiring regularly updated datasets may subscribe to our Annual subscription plans. Our data is continuously being refreshed, therefore subscription plans are recommended for those who need the most up to date data. The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.

Data samples may be downloaded at https://store.poidata.xyz/us
Japan Stock Market Index (JP225) Data
tradingeconomics.com
ko.tradingeconomics.com
csv, excel, json, xml
Updated Feb 1, 2024
Cite
TRADING ECONOMICS (2024). Japan Stock Market Index (JP225) Data [Dataset]. https://tradingeconomics.com/japan/stock-market
Available download formats: excel, csv, xml, json
Dataset updated
Feb 1, 2024
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 5, 1965 - Jul 23, 2025
Area covered
Japan
Description
Japan's main stock market index, the JP225, rose to 40790 points on July 23, 2025, gaining 2.55% from the previous session. Over the past month, the index has climbed 5.15% and is up 4.18% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from Japan. Japan Stock Market Index (JP225) - values, historical data, forecasts and news - updated in July 2025.
S&P 500
fred.stlouisfed.org
json
Updated Jul 23, 2025
Cite
(2025). S&P 500 [Dataset]. https://fred.stlouisfed.org/series/SP500
Available download formats: json
Dataset updated
Jul 23, 2025
License
https://fred.stlouisfed.org/legal/#copyright-pre-approval
Description
View data of the S&P 500, an index of the stocks of 500 leading companies in the US economy, which provides a gauge of the U.S. equity market.
Reddit Sentiment VS Stock Price
zenodo.org
bin, csv, json, png +2
Updated May 8, 2025
Cite
Will Baysingar; Will Baysingar (2025). Reddit Sentiment VS Stock Price [Dataset]. http://doi.org/10.5281/zenodo.15367306
Available download formats: csv, bin, png, text/x-python, txt, json
Unique identifier
https://doi.org/10.5281/zenodo.15367306
Dataset updated
May 8, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Will Baysingar; Will Baysingar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overall, this project was meant to test the relationship between social media posts and their short-term effect on stock prices. We decided to use Reddit posts from finance-specific subreddit communities like r/wallstreetbets, r/investing, and r/stocks to see the changes in the market associated with a variety of posts made by users. This idea came to light because of the GameStop short squeeze, which showed the power of social media in the market. Typically, stock prices should purely represent the total present value of all the future value of the company, but the question we are asking is whether social media can impact that intrinsic value. Our research question was known from the start: do Reddit posts for or against a certain stock provide insight into how the market will move in a short window? To answer it, we selected five large tech companies: Apple, Tesla, Amazon, Microsoft, and Google. These companies would likely give us more data in the subreddits and would have less day-to-day volatility, making it easier to simulate an experiment. They trade at very high values, so a change following a Reddit post would have to be significant, giving us evidence that there is an effect.

Next, we had to choose our data sources. First, we tried to obtain the Reddit data using a Reddit API, but because Reddit requires approval to use its data, we switched to a Kaggle dataset that contained metadata from Reddit. For our second dataset we had planned to use Yahoo Finance through yfinance, but due to the large amount of data we were pulling from this public API, our IP address was temporarily blocked. This caused us to switch to pulling our second dataset from Alpha Vantage. While this was a large switch of public API, it was a minor roadblock, and fixing the finance-pulling section allowed everything else to continue to work in succession. Once we had both of our datasets programmatically pulled into our local VS Code, we implemented a pipeline to clean, merge, and analyze all the data. At the end, we implemented a Snakemake workflow to ensure the project was easily reproducible. We then used TextBlob to label our Reddit posts with a sentiment value of positive, negative, or neutral and to provide us with a value to use in correlation analysis. We then matched the time frame of each post with the stock data and computed any possible changes, found a correlation coefficient, and graphed our findings.
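The labeling and correlation steps described above could look roughly like the sketch below. It is not the authors' pipeline: it assumes a polarity score in [-1, 1] has already been computed for each post (for example by a sentiment library), the 0.05 neutrality threshold is an arbitrary choice for illustration, and the sample numbers are made up.

```python
from math import sqrt


def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to positive / negative / neutral.

    The threshold is an assumed cut-off, not a value from the project.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical data: one polarity score per post, paired with the stock's
# price change over the short window following that post.
polarities = [0.6, -0.4, 0.0, 0.3]
price_changes = [0.012, -0.008, 0.001, 0.004]

labels = [label_sentiment(p) for p in polarities]
correlation = pearson(polarities, price_changes)
```

A coefficient near zero would correspond to the "relatively small or no correlation" outcome, while values closer to 1 or -1 would indicate that sentiment and subsequent price changes move together or in opposition.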

To conclude the data analysis, we found that there is relatively small or no correlation across the companies as a whole, but Microsoft and Google do show stronger correlations when analyzed on their own. However, this may be due to other circumstances, such as why the post was made or whether the market already had other trends on those dates. A larger analysis with more data from other social media platforms would be needed to confirm our hypothesis that there is a strong correlation.
Data from Commission for the Prevention of Corruption of the Republic of...
data.niaid.nih.gov
datadryad.org
zip
Updated Aug 22, 2023
Cite
Jelena Joksimovic; Matjaz Perc; Zoran Levnajic (2023). Data from Commission for the Prevention of Corruption of the Republic of Slovenia: public-to-private transactions [Dataset]. http://doi.org/10.5061/dryad.5x69p8d6x
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.5x69p8d6x
Dataset updated
Aug 22, 2023
Dataset provided by
University of Maribor
Univerzitetno Središče Novo mesto
Authors
Jelena Joksimovic; Matjaz Perc; Zoran Levnajic
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Slovenia
Description
Public spending is often a contentious subject because different political parties have different agendas as to what the current national priorities should be. Of course, the same is true for the public in general. It is thus of interest to determine whether public spending is indeed as biased and capricious as it is often perceived, or whether there nevertheless exist some fundamental principles that guide it. We use data from the Commission for the Prevention of Corruption of the Republic of Slovenia, detailing every transfer of public money to the private sector from January 2003 to May 2020. During this time Slovenia has done business with no fewer than 248,989 companies. We find that the cumulative distribution of money received per company can be reasonably well explained by means of a power-law or a log-normal fit. We also show evidence for the first-mover advantage, and determine that the attachment rate of public spending to companies over time is roughly linear. These results indicate that Slovenian public spending is to a large extent guided by self-organizing principles that, against all odds, go beyond nefarious interests and lobbying.
Methods
Being a (relatively) small nation, Slovenia keeps excellent records of its public spending via the Commission for the Prevention of Corruption of the Republic of Slovenia (CPC). It is an independent agency with a broad mandate to prevent and investigate corruption and other breaches of ethics and integrity, with a special focus on transparency of public spending. CPC gathers the data from nine different public institutions in Slovenia, including the Ministry of Finance, the Public Procurement Portal, and the Public Payments Administration. In particular, all private companies registered in Slovenia are under legal obligation to report the exact information on any business done using public funds. Hence, CPC keeps track of all transactions where public money is spent on business with the private sector.
For transparency reasons, all this data is publicly available on the CPC website. Upon signing the appropriate contract, we received this data set from the CPC. The received data set includes all public-to-private transactions from January 2003 to May 2020. During these 209 months, the Republic of Slovenia did business (ordering services or buying goods) with exactly 248,989 companies. To avoid noise and uncertainties, we excluded from further analysis companies that received less than 10,000 EUR in this period. This cut-off in total spending per company leaves 105,086 companies, on which we focus in the analysis. The data can be arranged as a matrix of companies by months, in which each element is the amount of money (in EUR) that some company received from all public bodies during one of these 209 months. In other words, for any given company we have a time series with 209 values, each value representing the volume of business done using public funds over that month. This data is possibly unique in the world. Its completeness and precision allow for examining the presence of self-organization, which is what we devote the rest of this paper to. Specifically, we find a heavy-tailed distribution of total public spending per company versus company rank that can be fitted reasonably well by a power law.
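The cut-off and rank-based fit described above can be illustrated with a short sketch. It assumes the data are held as a mapping from a company identifier to its list of monthly amounts in EUR; the function names and that layout are hypothetical. The slope is estimated by ordinary least squares on log(total) against log(rank), which is only a quick diagnostic and not necessarily the fitting procedure the authors used.

```python
from math import log


def totals_above_cutoff(monthly, cutoff=10_000.0):
    """Sum each company's monthly amounts and keep totals of at least `cutoff` EUR."""
    totals = {company: sum(amounts) for company, amounts in monthly.items()}
    return {company: total for company, total in totals.items() if total >= cutoff}


def rank_size_slope(totals):
    """Least-squares slope of log(total) versus log(rank), with rank 1 the largest total."""
    ordered = sorted(totals.values(), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two companies to fit a slope")
    xs = [log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [log(total) for total in ordered]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

A roughly constant negative slope on the log-log rank plot is consistent with a power law, although a log-normal distribution can look similar over a limited range, which matches the description's statement that either fit explains the data reasonably well.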
France Stock Market Index (FR40) Data
tradingeconomics.com
pl.tradingeconomics.com
+13 more
csv, excel, json, xml
Cite
TRADING ECONOMICS, France Stock Market Index (FR40) Data [Dataset]. https://tradingeconomics.com/france/stock-market
Explore at:
Available download formats: json, xml, csv, excel
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 9, 1987 - Jul 22, 2025
Area covered
France
Description
France's main stock market index, the FR40, fell to 7744 points on July 22, 2025, losing 0.69% from the previous session. Over the past month, the index has climbed 2.74% and is up 1.92% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from France. France Stock Market Index (FR40) - values, historical data, forecasts and news - updated in July 2025.
Registered Business Locations - San Francisco
data.sfgov.org
datadiscoverystudio.org
+3 more
Updated Jul 23, 2025
Cite
City and County of San Francisco (2025). Registered Business Locations - San Francisco [Dataset]. https://data.sfgov.org/widgets/g8m3-pdis
Explore at:
Available download formats: kmz, tsv, csv, xml, application/geo+json, application/rssxml, application/rdfxml, kml
Dataset updated
Jul 23, 2025
Dataset authored and provided by
City and County of San Francisco
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Area covered
San Francisco
Description
NEW!: Use the new Business Account Number lookup tool.

SUMMARY This dataset includes the locations of businesses that pay taxes to the City and County of San Francisco. Each registered business may have multiple locations and each location is a single row. The Treasurer & Tax Collector’s Office collects this data through business registration applications, account update/closure forms, and taxpayer filings. Business locations marked as “Administratively Closed” have not filed or communicated with TTX for 3 years, or were marked as closed following a notification from another City and County Department.

The data is collected to help enforce the Business and Tax Regulations Code including, but not limited to: Article 6, Article 12, Article 12-A, and Article 12-A-1. http://sftreasurer.org/registration.

HOW TO USE THIS DATASET
System migration in 2014: When the City transitioned to a new system in 2014, only active business accounts were migrated. As a result, any businesses that had already closed by that point were not included in the current dataset.
2018 account cleanup: In 2018, TTX did a major cleanup of dormant and unresponsive accounts and closed approximately 40,000 inactive businesses.

To learn more about using this dataset watch this video. To update your listing or look up your BAN see this FAQ: Registered Business Locations Explainer

United States Bankruptcies

tradingeconomics.com
jp.tradingeconomics.com
+13 more

csv, excel, json, xml

Cite

TRADING ECONOMICS, United States Bankruptcies [Dataset]. https://tradingeconomics.com/united-states/bankruptcies

Explore at:

Available download formats: json, xml, csv, excel

Dataset authored and provided by

TRADING ECONOMICS

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Time period covered

Dec 31, 1980 - Mar 31, 2025

Area covered

United States

Description

Bankruptcies in the United States increased to 23309 Companies in the first quarter of 2025 from 23107 Companies in the fourth quarter of 2024. This dataset provides - United States Bankruptcies - actual values, historical data, forecast, chart, statistics, economic calendar and news.

Project Green Light Locations

data.ferndalemi.gov
detroitdata.org
+2 more

Updated Oct 17, 2017

Cite

City of Detroit (2017). Project Green Light Locations [Dataset]. https://data.ferndalemi.gov/items/b827c82731294708b500f7c10b3240b1

Explore at:

Dataset updated

Oct 17, 2017

Dataset authored and provided by

City of Detroit

Area covered

Description

The Project Green Light Locations data set documents private businesses and other organizations that participate in Project Green Light Detroit, a program started on January 1, 2016 as a partnership between local businesses, the City of Detroit and community groups. Local businesses and organizations that participate in this program have installed real-time camera connections with Detroit Police Department headquarters and visibly communicate their participation to the public through project-specific green lights and signage at each participating location. PGL locations have grown in number and diversity from eight gas stations at the inception of the program in 2016 to include a growing number of retail and service-based businesses, residential facilities such as apartment complexes and nursing homes, and community organizations such as houses of worship. The PGLL dataset records location-level data and includes the business or organization name, address, business type, corresponding police precinct, and the date a location went live with a real-time camera connection to DPD. Records in the data set have been enriched through geocoding to enable us to map address locations. If a business or organization has multiple participating locations, a record for each location is included in the data set.

US Stock Market

Historical data of US stock market

Context

Content

Columns Description

Acknowledgements

Inspiration