https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.
It contains prices up to April 1, 2020. If you need more recent data, just fork and re-run the data collection script, which is also available from Kaggle.
The data for every symbol is saved in CSV format with common fields:
All ticker data is stored in either the ETFs or the stocks folder, depending on the instrument type, and each file is named after the corresponding ticker symbol. Finally, symbols_valid_meta.csv
contains additional metadata for each ticker, such as its full name.
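As a quick sketch of how a single ticker might be loaded (the folder names follow the layout described above; AAPL and the Date column name are illustrative assumptions based on the usual Yahoo Finance export):

```python
import pandas as pd

# One CSV per ticker, stored under the stocks/ or ETFs/ folder depending on type.
# Column names are assumed to follow the usual Yahoo Finance export.
aapl = pd.read_csv("stocks/AAPL.csv", parse_dates=["Date"])

# Additional per-ticker metadata, such as the full security name.
meta = pd.read_csv("symbols_valid_meta.csv")

print(aapl.tail())
print(meta.head())
```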
Context
The StockNet dataset, introduced by Xu and Cohen at ACL 2018, is a benchmark for measuring the effectiveness of textual information in stock market prediction. While the original dataset provides valuable price and news data, it requires significant pre-processing and feature engineering to be used effectively in advanced machine learning models.
This dataset was created to bridge that gap. We have taken the original data for 87 stocks and performed extensive feature engineering, creating a rich, multi-modal feature repository.
A key contribution of this work is a preliminary statistical analysis of the news data for each stock. Based on the consistency and volume of news, we have categorized the 87 stocks into two distinct groups, allowing researchers to choose the most appropriate modeling strategy:
joint_prediction_model_set: Stocks with rich and consistent news data, ideal for building complex, single models that analyze all stocks jointly.
panel_data_model_set: Stocks with less consistent news data, which are better suited for traditional panel data analysis.
Content and File Structure
The dataset is organized into two main directories, corresponding to the two stock categories mentioned above.
1. joint_prediction_model_set
This directory contains stocks suitable for sophisticated, news-aware joint modeling.
-Directory Structure: This directory contains a separate sub-directory for each stock suitable for joint modeling (e.g., AAPL/, MSFT/, etc.).
-Folder Contents: Inside each stock's folder, you will find a set of files, each corresponding to a different category of engineered features. These files include:
-News Graph Embeddings: A NumPy tensor file (.npy) containing the encoded graph embeddings from daily news. Its shape is (Days, N, 128), where N is the number of daily articles.
-Engineered Features: A CSV file containing fundamental features derived directly from OHLCV data (e.g., intraday_range, log_return).
-Technical Indicators: A CSV file with a wide array of popular technical indicators (e.g., SMA, EMA, MACD, RSI, Bollinger Bands).
-Statistical & Time Features: A CSV file with rolling statistical features (e.g., volatility, skew, kurtosis) over an optimized window, plus cyclical time-based features.
-Advanced & Transformational Features: A CSV file with complex features like lagged variables, wavelet transform coefficients, and the Hurst Exponent.
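As an illustrative sketch of how the files listed above might be loaded for one stock (the exact file names inside each stock folder are assumptions, not part of the original release):

```python
import numpy as np
import pandas as pd

stock_dir = "joint_prediction_model_set/AAPL"  # hypothetical folder layout

# News graph embeddings: tensor of shape (Days, N, 128), N = daily articles.
news_emb = np.load(f"{stock_dir}/news_graph_embeddings.npy")  # assumed file name

# One simple way to collapse the per-article axis into a single daily vector.
daily_news = news_emb.mean(axis=1)  # -> (Days, 128)

# Tabular feature files (assumed names, one CSV per feature category).
engineered = pd.read_csv(f"{stock_dir}/engineered_features.csv")
technical = pd.read_csv(f"{stock_dir}/technical_indicators.csv")

print(news_emb.shape, daily_news.shape, engineered.shape, technical.shape)
```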
2. panel_data_model_set
This directory contains stocks that are more suitable for panel data models, based on the statistical properties of their associated news data.
-Directory Structure: Similar to the joint prediction set, this directory also contains a separate sub-directory for each stock in this category.
-Folder Contents: Inside each stock's folder, you will find the cleaned and structured price and news text data. This facilitates the application of econometric models or machine learning techniques designed for panel data, where observations are tracked for the same subjects (stocks) over a period of time.
-Further Information: For a detailed breakdown of the statistical analysis used to separate the stocks into these two groups, please refer to the data_preview.ipynb notebook located in the TRACE_ACL18_raw_data directory.
Methodology
The features for the joint_prediction_model_set were generated systematically for each stock:
-News-to-Graph Pipeline: Daily news headlines were processed to extract named entities. These entities were then used to query Wikidata and build knowledge subgraphs. A Graph Convolutional Network (GCN) model encoded these graphs into dense vectors.
-Feature Engineering: All other features were generated from the raw price and volume data. The process included basic calculations, technical analysis via pandas-ta, generation of statistical and time-based features, and advanced transformations like wavelet analysis.
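For orientation, a rough sketch of that style of price-feature pipeline with pandas and pandas-ta is shown below; it is not the exact code used to build the dataset, and the input file name and window length are placeholders:

```python
import numpy as np
import pandas as pd
import pandas_ta as ta  # technical indicators

df = pd.read_csv("AAPL_ohlcv.csv", parse_dates=["Date"])  # placeholder OHLCV file

# Fundamental engineered features derived from OHLCV.
df["log_return"] = np.log(df["Close"] / df["Close"].shift(1))
df["intraday_range"] = (df["High"] - df["Low"]) / df["Open"]

# A few of the popular indicators mentioned above, via the pandas-ta accessor.
df.ta.sma(length=20, append=True)
df.ta.rsi(length=14, append=True)
df.ta.macd(append=True)

# Rolling statistical features (the dataset uses an optimized window; 21 is illustrative).
window = 21
df["volatility"] = df["log_return"].rolling(window).std()
df["skew"] = df["log_return"].rolling(window).skew()
df["kurtosis"] = df["log_return"].rolling(window).kurt()
```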
Acknowledgements
This dataset is an extension and transformation of the original StockNet dataset. We extend our sincere gratitude to the original authors for their contribution to the field.
Original Paper: "Stock Movement Prediction from Tweets and Historical Prices" by Yumo Xu and Shay B. Cohen (ACL 2018).
Original Data Repository: https://github.com/yumoxu/stocknet-dataset
Inspiration
This dataset opens the door to numerous exciting research questions:
-Can you build a single, powerful joint model using the joint_prediction_model_set to predict movements for all stocks simultaneously?
-How does a sophisticated joint model compare against a traditional panel data model trained on the panel_data_model_set?
-What is the lift in predictive power from using news-based graph embeddings versus using only technical indicators?
-Can you apply transfer learning or multi-task learning, using the feature-rich joint set to improve predictions for the panel set?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SUMMARY & CONTEXT
This dataset aims to provide a comprehensive, rolling 20-year history of the constituent stocks and their corresponding weights in India's Nifty 50 index. The data begins on January 31, 2008, and is actively maintained with monthly updates. After hitting the 20-year mark, as new monthly data is added, the oldest month's data will be removed to maintain a consistent 20-year window. This dataset was developed as a foundational feature for a graph-based model analyzing the market structure of the Indian stock market. Unlike typical snapshots that only show the current 50 stocks, this dataset is a survivorship-bias-free compilation that includes all stocks that have been part of the Nifty 50 index during this period. The data has been meticulously cleaned and adjusted for corporate actions, making it a robust feature set for financial analysis and quantitative modeling.

DATA SOURCE & FREQUENCY
Primary Source: All raw data is sourced from the official historical data reports published by Nifty Indices (niftyindices.com), ensuring the highest level of accuracy.
Data Frequency: The data is recorded on a monthly and event-driven basis. It includes end-of-month (EOM) weights but also captures intra-month data points for any date on which the Nifty 50 index was reshuffled or rebalanced. For periods between these data points, the weights can be considered static.

METHODOLOGY & DATA INTEGRITY
The dataset was constructed based on official Nifty 50 rebalancing announcements. It relies on the observed assumption that on most reshuffles, the weights of stocks that are not being reshuffled stay almost the same before and after the change. Significant effort was made to handle exceptions and complex corporate actions:
- Corporate Actions: Adjustments were systematically made for major events like mergers (HDFC/HDFCBANK), demergers (Reliance/JIOFIN, ITC/ITCHOTELS), and dual listings (TATAMOTORS/TATAMTRDVR).
- Rebalancing Extrapolation: In cases where EOM weights did not align with beginning-of-month (BOM) realities post-reshuffle, a logarithmic-linear extrapolation method was used to estimate the weights of incoming/outgoing stocks.
- 2013 Rebalancing Exception: For the second-half rebalancing of 2013, due to significant discrepancies, all 50 stocks' weights were recalculated using the extrapolation method instead of carrying over previous values.
- Weight Normalization: On any given date, the sum of all 50 constituent weights is normalized to equal 100%. The weights are provided with a precision of up to 5 decimal places, and the sum for all observations is validated to a strict tolerance of 1e-6.

TICKER & NAMING CONVENTIONS
For consistency across the time series, several historical stock tickers have been mapped to their modern or unified equivalents:
- INFOSYSTCH -> INFY
- HEROHONDA -> HEROMOTOCO
- BAJAJ-AUTO -> BAJAUTO
- SSTL -> VEDL
- REL -> RELINFRA
- ZOMATO -> ETERNAL

CONTENTS & FILE STRUCTURE
This dataset is distributed as a collection of files. The primary data is contained in weights.csv, with several supplementary files provided for context, validation, and analysis.
- weights.csv: The main data file. Layout: a standard CSV where the first row contains the headers, with DATE in the first column and stock tickers in the subsequent columns; each row corresponds to a specific date. Values: the cells contain the stock's weight (as a percentage) in the Nifty 50 index on a given date. A value of 0 indicates the stock was not an index constituent at that time.
- sectors.csv: A helper file that maps each stock ticker to its corresponding industry sector.
- summary.csv: A simple summary file containing the first and last observed dates for each stock, along with a count of its non-zero weight observations.
- validate.py: A Python script to check weights.csv for data integrity issues (e.g., ensuring daily weights sum to 100).
- validation_report.txt: The output report generated by validate.py, showing the results of the latest data validation checks.
- analysis.ipynb: A Jupyter Notebook providing sample analyses that can be performed using this dataset, such as visualizing sector rotation and calculating the HHI score over time.
- README.md: This file, containing the complete documentation for the dataset.
- CHANGELOG.md: A file for tracking all updates and changes made to the dataset over time.
- LICENSE.txt: The full legal text of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license, which is applicable to this dataset.

POTENTIAL USE CASES
- Analyzing historical sector rotation and weight concentration in the Indian market.
- Building features for quantitative models that aim to predict market movements.
- Backtesting investment strategies benchmarked against the Nifty 50.

ACKNOWLEDGEMENTS & CITATION
This dataset was created by Sukrit Bera. A permanent, versioned archive of this dataset is available on Figshare. If you use this dataset in your research, please use the following official citation, which includes the permanent DOI:
Bera, S. (2025). Historical Nifty 50 Constituent Weights (Rolling 20-Year Window) [Data set]. figshare. https://doi.org/10.6084/m9.figshare.30217915

LICENSING
This dataset is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. The license selected in the metadata dropdown (CC BY 4.0) is the closest available option on this platform. The full terms of the applicable CC BY-NC-SA 4.0 license are available HERE, as well as in the uploaded LICENSE.txt file in the dataset. The CC BY-NC-SA 4.0 license DOES NOT permit commercial use. This dataset is FREE for academic and non-commercial research with attribution. If you wish to use this dataset for commercial purposes, please contact Sukrit Bera at sukritb2005@gmail.com to negotiate a separate, commercial license.

DATA DICTIONARY
- Column Name: DATE. Data Type: Date. Description: The date of the weight recording. This is the first column.
- Column Name: [Stock Ticker]. Data Type: float. Description: The percentage weight of the stock (e.g., 'RELIANCE', 'TCS') in the Nifty 50 index. A value of 0 indicates it was not an index constituent on that date.
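A minimal sketch of the weight-sum check described above, assuming the weights.csv layout (a DATE column followed by one column per ticker):

```python
import pandas as pd

weights = pd.read_csv("weights.csv", parse_dates=["DATE"])
tickers = weights.drop(columns=["DATE"])

# On every observation date the constituent weights should sum to 100%
# within the stated tolerance of 1e-6.
row_sums = tickers.sum(axis=1)
bad_dates = weights.loc[(row_sums - 100).abs() > 1e-6, "DATE"]
print("Dates failing the 100% check:", bad_dates.tolist())

# A value of 0 means the stock was not a constituent; count constituents per date.
print((tickers > 0).sum(axis=1).describe())
```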
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Nowadays, new branches of research propose the use of non-traditional data sources for the study of migration trends, in order to find original methodologies for answering open questions about cross-border human mobility. The Multi-aspect Integrated Migration Indicators (MIMI) dataset is a new dataset to be exploited in migration studies as a concrete example of this approach. It includes official data about bidirectional human migration (traditional flow and stock data) together with multidisciplinary variables and original indicators, including economic, demographic, cultural and geographic indicators, as well as the Facebook Social Connectedness Index (SCI). It was built by gathering, embedding and integrating traditional and novel variables, resulting in a new multidisciplinary dataset that could significantly contribute to nowcasting/forecasting bilateral migration trends and to understanding migration drivers.
Thanks to this variety of knowledge, experts from several research fields (demographers, sociologists, economists) could exploit MIMI to investigate the trends in the various indicators, and the relationship among them. Moreover, it could be possible to develop complex models based on these data, able to assess human migration by evaluating related interdisciplinary drivers, as well as models able to nowcast and predict traditional migration indicators in accordance with original variables, such as the strength of social connectivity. Here, the SCI could have an important role. It measures the relative probability that two individuals across two countries are friends with each other on Facebook, therefore it could be employed as a proxy of social connections across borders, to be studied as a possible driver of migration.
All in all, the motivation for building and releasing the MIMI dataset lies in the need for new perspectives, methods and analyses that take into account a variety of new factors. The heterogeneous and multidimensional sets of data in MIMI offer an all-encompassing overview of the characteristics of human migration, enabling a better understanding, and an original exploration, of the relationship between migration and non-traditional sources of data.
The MIMI dataset is a single CSV file with 28,821 rows (records/entries) and 876 columns (variables/features/indicators). Each row is uniquely identified by a pair of countries, formed by joining the two ISO-3166 alpha-2 codes of the origin and destination country, respectively. Its main features are the country-to-country bilateral migration flows and stocks, accompanied by multidisciplinary variables measuring cultural, demographic, geographic and economic characteristics of the two countries, as well as the Facebook strength of connectedness for each pair.
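As a small sketch of working with the pair identifier (the CSV file name and the assumption that the pair code is the first column are illustrative, not taken from the release):

```python
import pandas as pd

mimi = pd.read_csv("mimi.csv")  # placeholder name for the single-CSV release

# Each record is keyed by a country pair built by joining two ISO-3166 alpha-2
# codes (origin followed by destination); split the key back into its parts.
pair_col = mimi.columns[0]  # assuming the pair identifier is the first column
mimi["origin"] = mimi[pair_col].str[:2]
mimi["destination"] = mimi[pair_col].str[2:4]

print(mimi.shape)  # expected: 28,821 rows and 876 original columns
```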
Related paper: Goglia, D., Pollacci, L., Sirbu, A. (2022). Dataset of Multi-aspect Integrated Migration Indicators. https://doi.org/10.5281/zenodo.6500885
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Global Industry Classification Standard (GICS) is an industry taxonomy developed in 1999 by MSCI and Standard & Poor's (S&P) for use by the global financial community. The GICS structure consists of
The system is similar to ICB (Industry Classification Benchmark), a classification structure maintained by FTSE Group.
GICS is used as a basis for S&P and MSCI financial market indexes in which each company is assigned to a sub-industry, and to an industry, industry group, and sector, by its principal business activity.
"GICS" is a registered trademark of McGraw Hill Financial and MSCI Inc.
The GICS schema follows this hierarchy:
- Sector
- Industry Group
- Industry
- Sub-industry
That is, a sector is composed of industry groups, which are composed of industries, which in turn are composed of sub-industries.
Each item in the hierarchy has an ID. Each ID is prefixed by the ID of its parent in the hierarchy, and IDs generally increase in steps of 5 or 10. For example, the sector Industrials has the ID 20, and the industry group Capital Goods has an ID prefixed by that 20, resulting in 2010.
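Since each ID embeds its parent's ID as a prefix (2 digits for the sector, 4 for the industry group, 6 for the industry, 8 for the sub-industry), the full hierarchy can be recovered from a sub-industry code alone; a small sketch:

```python
# Recover every level of the GICS hierarchy from an 8-digit sub-industry code.
def gics_levels(sub_industry_id: str) -> dict:
    return {
        "sector": sub_industry_id[:2],
        "industry_group": sub_industry_id[:4],
        "industry": sub_industry_id[:6],
        "sub_industry": sub_industry_id[:8],
    }

# Example: a sub-industry under Capital Goods (2010), within Industrials (20).
print(gics_levels("20101010"))
# {'sector': '20', 'industry_group': '2010', 'industry': '201010', 'sub_industry': '20101010'}
```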
The dataset is composed of CSV files (currently two), each representing a different version of the GICS classification.
For each file the columns are:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Actually, I prepared this dataset for students on my Deep Learning and NLP course.
But I am also very happy to see kagglers play around with it.
Have fun!
Description:
There are two channels of data provided in this dataset:
News data: I crawled historical news headlines from Reddit WorldNews Channel (/r/worldnews). They are ranked by reddit users' votes, and only the top 25 headlines are considered for a single date. (Range: 2008-06-08 to 2016-07-01)
Stock data: Dow Jones Industrial Average (DJIA) is used to "prove the concept". (Range: 2008-08-08 to 2016-07-01)
I provided three data files in .csv format:
RedditNews.csv: two columns. The first column is the "date", and the second column is the "news headlines". All news items are ranked from top to bottom based on how hot they are. Hence, there are 25 lines for each date.
DJIA_table.csv: Downloaded directly from Yahoo Finance: check out the web page for more info.
Combined_News_DJIA.csv: To make things easier for my students, I provide this combined dataset with 27 columns. The first column is "Date", the second is "Label", and the following ones are news headlines ranging from "Top1" to "Top25".
=========================================
To my students:
I made this a binary classification task. Hence, there are only two labels:
"1" when DJIA Adj Close value rose or stayed as the same;
"0" when DJIA Adj Close value decreased.
For task evaluation, please use data from 2008-08-08 to 2014-12-31 as Training Set, and Test Set is then the following two years data (from 2015-01-02 to 2016-07-01). This is roughly a 80%/20% split.
And, of course, use AUC as the evaluation metric.
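A minimal baseline sketch of the suggested split and AUC evaluation (the bag-of-words model here is just an illustration, not a recommended solution):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("Combined_News_DJIA.csv", parse_dates=["Date"])

# Concatenate the Top1..Top25 headlines into one text field per day.
text = df[[f"Top{i}" for i in range(1, 26)]].astype(str).agg(" ".join, axis=1)

# Suggested split: train through 2014-12-31, test from 2015-01-02 onward.
train = df["Date"] <= "2014-12-31"

vec = TfidfVectorizer(min_df=5)
X_train, X_test = vec.fit_transform(text[train]), vec.transform(text[~train])

clf = LogisticRegression(max_iter=1000).fit(X_train, df.loc[train, "Label"])
auc = roc_auc_score(df.loc[~train, "Label"], clf.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")
```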
=========================================
+++++++++++++++++++++++++++++++++++++++++
To all kagglers:
Please upvote this dataset if you like this idea for market prediction.
If you think you coded an amazing trading algorithm, a friendly piece of advice:
do play safe with your own money :)
+++++++++++++++++++++++++++++++++++++++++
Feel free to contact me if there is any question~
And, remember me when you become a millionaire :P
Note: If you'd like to cite this dataset in your publications, please use:
Sun, J. (2016, August). Daily News for Stock Market Prediction, Version 1. Retrieved [Date You Retrieved This Data] from https://www.kaggle.com/aaron7sun/stocknews.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Introduction
In the course of researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset used by many researchers and frequently accessed via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers who have uncovered problems, specifically fixing a lack of records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to separately create a dataset of 13(f) holdings from the source documents, which are all public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. However, the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

The process
We gathered all 13(f) filings from 1999-2017 here. The corpus is over 318,000 filings and occupies ~25GB of space if unzipped. (We do not include the raw filings here as they can be downloaded from EDGAR.) We wrote code to parse the filings and extract holding information using regular expressions in Perl. Our target list of holdings was all public firms with a market capitalization of at least $10M. From the header of each file, we first extract the filing date, reporting date, and reporting entity (Central Index Key, or CIK, and CIKNAME). Beginning with the September 30, 2013 filing date, all filings were in XML format, which made parsing fairly straightforward, as all values are contained in tags. Prior to that date, the filings are remarkable for the heterogeneity in formatting; several examples are linked to below. Our approach was to look for any lines containing a CUSIP code that we were interested in, and then attempt to determine the "number of shares" field and the "value" field. To help validate the values we extracted, we downloaded stock price data from CRSP for the filing date, as that allows for a logic check of (price * shares) = value. We do not claim that this will exhaustively extract all holding information; we can provide examples of filings that are formatted in such a way that we are not able to extract the relevant information. In both XML and non-XML filings, we attempt to remove any derivative holdings by looking for phrases such as OPT, CALL, PUT, WARR, etc. We then perform some final data cleaning: in the case of amended filings, we keep the amended level of holdings if the amended report a) occurred within 90 days of the reporting date and b) the initial filing fails our logic check described above. The resulting dataset has around 48M reported holdings (CIK-CUSIP) for all 76 quarters, with between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, there are often errors in the actual source documents as well. However, our method seemed to produce more reliable data in several cases than the TR dataset, as shown in Online Appendix B of the related paper linked above.

Included Files
- Perl parsing code (find_holdings_snp.pl). For reference, only needed if you wish to re-parse the original filings.
- Investor holdings for 1999-2017: lightly cleaned. Each CIK-CUSIP-rdate is unique. Over 47M records. The fields are:
  - CIK: the central index key assigned by the SEC for this investor. Mapping to names is available below.
  - CUSIP: the identity of the holdings. Consult the SEC's 13(f) listings to identify your CUSIPs of interest.
  - shares: the number of shares reportedly held. Merging in CRSP data on shares outstanding at the CUSIP-month level allows one to construct \beta. We make no distinction for the sole/shared/none voting discretion fields; if a researcher is interested, we did collect that starting in mid-2013, when filings are in XML format.
  - rdate: reporting date (end of quarter). 8 digits, YYYYMMDD.
  - fdate: filing date. 8 digits, YYYYMMDD.
  - ftype: the form name.
  Notes: we did not consolidate separate BlackRock entities (or any other possibly related entities). If one wants to do so, use the CIK-CIKname mapping file below. We drop any CUSIP-rdate observation where any investor in that CUSIP reports owning greater than 50% of shares outstanding (even though legitimate cases exist - see, for example, Diamond Offshore and Loews Corporation). We also drop any CUSIP-rdate observation where greater than 120% of shares outstanding are reported to be held by 13(f) investors. Cases where the shares held are listed as zero likely mean the investor filing lists a holding for the firm but our code could not find the number of shares due to the formatting of the file. We leave these in the data so that any researchers who find a zero know to go back to that source filing to manually gather the holdings for the securities they are interested in.
- Processed 13(f) holdings (airlines.parquet, cereal.parquet, out_scrape.parquet). These are used in our related AEJ:Microeconomics paper. The files contain all firms within the airline industry, the RTE cereal industry, and all large-cap firms (a superset of the S&P 500), respectively. They are a merged version of the scrape_parsed.csv file described above and include the shares outstanding and percent ownership used to calculate measures of common ownership. They are distributed as brotli-compressed Apache Parquet (binary) files, which preserves date information correctly. The fields are:
  - mgrno: manager number (which is actually CIK in the scraped data)
  - rdate: reporting date
  - ncusip: CUSIP
  - rrdate: reporting date in Stata format
  - mgrname: manager name
  - shares: shares
  - sole: shares with sole authority
  - shared: shares with shared authority
  - none: shares with no authority
  - isbr/isfi/iss/isba/isvg: whether the investor is BlackRock, State Street, Vanguard, Barclays, or Fidelity
  - numowners: how many owners
  - prc: price at reporting date
  - shares_out: shares outstanding at reporting date
  - value: reported value in the 13(f)
  - beta: shares/shares_out
  - permno: PERMNO
- Profit weight values (i.e. \kappa) for all firms in the sample (public_scrape_kappas_XXXX.parquet). Each file represents one year of data, is around 200MB, and is distributed as a compressed (brotli) parquet file. Fields are simply CUSIP_FROM, CUSIP_TO, KAPPA, QUARTER. Note that these have not been adjusted for multi-class share firms, insider holdings, etc. If looking at a particular market, some additional data cleaning on the investor holdings (above) followed by recomputing profit weights is recommended. For this, we did merge the separate BlackRock entities prior to computing \kappa.
- CIKmap.csv (~250K observations). Mapping is from CIK-rdate to CIKname. Use this if you want to consolidate holdings across reporting entities or explore the identities of reporting firms. In the case of amended filings that use different names than the original ones, we keep the earliest name.

Example of Parsing Challenges
Prior to the XML era, filings were far from uniform, which creates a notable challenge in parsing them for holdings. In the examples directory we include several example text files of raw 13(f) filings.
- Example 1 is a "well behaved" filing, with CUSIP, followed by value, followed by number of shares, as recommended by the SEC.
- Example 2 shows a case where the ordering is changed: CUSIP, then shares, then value. The column headers show "item 5" coming before "item 4".
- Example 3 shows a fixed-width table, which in principle could be parsed very easily using the tags at the top, although not all filings consistently use these tags.
- Example 4 shows a fixed-width table with no tag for the CUSIP column. Also, notice that if the filer holds more than 10M shares of a firm, that number occupies the entire width of the column and there is no longer a column separator (i.e. Cisco Systems on line 374).
- Example 5 shows a comma-separated table format.
- Example 6 shows a case where the column ordering is changed, and an (unrequired) column for share price is added.
- Example 7 shows a case where the table is split across subsequent pages, so the CUSIP appears on a different line than the number of shares.
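For orientation, a sketch of the kind of checks described above using the processed parquet files (the field names follow the list above; the 10% tolerance is an arbitrary illustration, and note that 13(f) values are typically reported in thousands of dollars, which may require rescaling):

```python
import pandas as pd

# Processed holdings (brotli-compressed parquet; requires pyarrow or fastparquet).
holdings = pd.read_parquet("out_scrape.parquet")

# Logic check used during parsing: reported value should be close to price * shares.
# (Reported values may be in thousands of dollars; rescale if needed.)
implied = holdings["prc"] * holdings["shares"]
holdings["value_ok"] = (implied - holdings["value"]).abs() <= 0.10 * implied.abs()

# Ownership share per investor-security-quarter (beta = shares / shares outstanding).
holdings["beta_check"] = holdings["shares"] / holdings["shares_out"]

print(holdings[["mgrno", "ncusip", "rdate", "beta", "beta_check", "value_ok"]].head())
```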
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Stock market prediction remains an active research area, in a quest to inform investors on how to trade (buy/sell) at the most opportune time. The prevalent methods used by stock market players when trying to predict likely future trade prices are technical, fundamental or time series analysis. This research set out to try machine learning methods, in contrast to the existing prevalent methods. Artificial neural networks (ANNs) tend to be the preferred machine learning method for this type of application. However, ANNs require historical data to learn from in order to make predictions. The research used an ANN model to test the hypothesis that the next-day price (prediction) can be determined from the stock prices of the immediately preceding five days.
The final ANN model used for the tests was a feedforward multi-layer perceptron (MLP) with error backpropagation, using the sigmoid activation function, with network configuration 5:21:21:1. The data period was a 5-year dataset (2008 to 2012), with 80% of the data (4 years) used for training and the remaining 20% (the last year) used for testing.
The original raw data for the Nairobi Securities Exchange (NSE) was scraped from a publicly available and accessible website of a stock market analysis company in Kenya (Synergy, 2020). This data was first exported to a spreadsheet, then cleaned of headers and other redundant information, leaving only the data with stock name, date of trade and related fields such as volumes, low prices, high prices and adjusted prices. The data was further sorted by stock name and then by trading date. The data dimension was finally reduced to only what was needed for the research: the stock name, the date of trade and the adjusted price (average trade price). This final dataset is presented here in CSV format.
The research tested three NSE stocks, with the mean absolute percentage error (MAPE) ranging between 0.77% and 1.91% over the 3-month testing period, while the root mean squared error (RMSE) ranged between 1.83 and 3.07.
This raw data can be used to train and test any machine learning model that requires training and testing data. The data can also be used to validate and reproduce the results already presented in this research. There could be slight variance between what is obtained when reproducing the results, due to the differences in the final exact weights that the trained ANN model eventually achieves. However, these differences should not be significant.
List of data files in this dataset:
- stock01_NSE_01jan2008_to_31dec2012_Kakuzi.csv
- stock02_NSE_01jan2008_to_31dec2012_StandardBank.csv
- stock03_NSE_01jan2008_to_31dec2012_KenyaAirways.csv
- stock04_NSE_01jan2008_to_31dec2012_BamburiCement.csv
- stock05_NSE_01jan2008_to_31dec2012_Kengen.csv
- stock06_NSE_01jan2008_to_31dec2012_BAT.csv
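Using the files listed above, a comparable network can be sketched with scikit-learn; this is an illustration of the 5-input, two-hidden-layer design, not the exact implementation used in the research, and the column position of the adjusted price is an assumption:

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

prices = pd.read_csv("stock01_NSE_01jan2008_to_31dec2012_Kakuzi.csv")
adj = prices.iloc[:, -1].to_numpy(dtype=float)  # assuming adjusted price is the last column

# Build (last 5 days -> next day) samples, matching the 5-input design.
X = np.array([adj[i:i + 5] for i in range(len(adj) - 5)])
y = adj[5:]

# Chronological 80%/20% split, as in the study (4 years train, 1 year test).
split = int(0.8 * len(X))
model = MLPRegressor(hidden_layer_sizes=(21, 21), activation="logistic", max_iter=2000)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
mape = np.mean(np.abs((y[split:] - pred) / y[split:])) * 100
print(f"MAPE: {mape:.2f}%")
```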
References: Synergy Systems Ltd. (2020). MyStocks. Retrieved March 9, 2020, from http://live.mystocks.co.ke/
Maseyk et al_BiodivConserv_Data&RScripts:
1. R code DataPrep (R script for data compilation and file preparation);
2. R code LMM and graphs (R script for Linear Mixed Models and plotting);
3. Masterfile.csv (raw data file);
4. Abandoned.csv, Mowed.csv and Grazed.csv (input data by management type);
5. Count.csv, Cover.csv, Evar.csv, InvSimpson.csv (input data by metric).
Final Data and R code.zip
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
List of companies in the S&P 500 (Standard and Poor's 500). The S&P 500 is a free-float, capitalization-weighted index of the top 500 publicly listed stocks in the US (top 500 by market cap). The ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: This repository/dataset provides a suite of Python scripts to generate a simulated relational database for inventory management processes and transform this data into object-centric event logs (OCEL) suitable for advanced process mining analysis. The primary goal is to offer a synthetic yet realistic dataset that facilitates research, development, and application of object-centric process mining techniques in the domain of inventory control and supply chain management. The generated event logs capture common inventory operations, track stock level changes, and are enriched with key inventory management parameters (like EOQ, Safety Stock, Reorder Point) and status-based activity labels (e.g., indicating understock or overstock situations).
Overview: Inventory management is a critical business process characterized by the interaction of various entities such as materials, purchase orders, sales orders, plants, suppliers, and customers. Traditional process mining often struggles to capture these complex interactions. Object-Centric Process Mining (OCPM) offers a more suitable paradigm. This project provides the tools to create and explore such data.
The workflow involves simulating a relational database of inventory management processes and then transforming it into object-centric event logs with the pm4py library.

Contents:
The repository contains the following Python scripts:
- 01_generate_simulation.py: Generates the simulated relational database inventory_management.db, including the tables Materials, SalesOrderDocuments, SalesOrderItems, PurchaseOrderDocuments, PurchaseOrderItems, PurchaseRequisitions, GoodsReceiptsAndIssues, MaterialStocks, MaterialDocuments, SalesDocumentFlows, and OrderSuggestions.
- 02_database_to_ocel_csv.py: Reads inventory_management.db and builds the object-centric event log ocel_inventory_management.csv. Object types include MAT (Material), PLA (Plant), PO_ITEM (Purchase Order Item), SO_ITEM (Sales Order Item), CUSTOMER, and SUPPLIER, with the standard OCEL columns (ocel:activity, ocel:timestamp, ocel:type:...).
- 03_ocel_csv_to_ocel.py: Reads ocel_inventory_management.csv and uses pm4py to convert the CSV event log into the standard OCEL XML format (ocel_inventory_management.xml).
- 04_postprocess_activities.py: Uses inventory_management.db to calculate inventory parameters such as Economic Order Quantity (EOQ), Safety Stock, and Reorder Point, and enriches ocel_inventory_management.csv with them. Stock status information is appended to each ocel:activity label (e.g., "Goods Issue (Understock)"), and a MAT_PLA (Material-Plant combination) object type is added for easier status tracking. The result is saved as post_ocel_inventory_management.csv.
- 05_ocel_csv_to_ocel.py: Reads post_ocel_inventory_management.csv and uses pm4py to convert this enriched CSV event log into the standard OCEL XML format (post_ocel_inventory_management.xml).

Generated Dataset Files (if included, or can be generated using the scripts):
- inventory_management.db: The SQLite database containing the simulated raw data.
- ocel_inventory_management.csv: The initial OCEL in CSV format.
- ocel_inventory_management.xml: The initial OCEL in standard OCEL XML format.
- post_ocel_inventory_management.csv: The post-processed and enriched OCEL in CSV format.
- post_ocel_inventory_management.xml: The post-processed and enriched OCEL in standard OCEL XML format.

How to Use:
Required Python packages: sqlite3 (standard library), pandas, numpy, pm4py. Run the scripts in order:
1. python 01_generate_simulation.py (generates inventory_management.db)
2. python 02_database_to_ocel_csv.py (generates ocel_inventory_management.csv from the database)
3. python 03_ocel_csv_to_ocel.py (generates ocel_inventory_management.xml)
4. python 04_postprocess_activities.py (generates post_ocel_inventory_management.csv using the database and the initial CSV OCEL)
5. python 05_ocel_csv_to_ocel.py (generates post_ocel_inventory_management.xml)

Potential Applications and Research: This dataset and the accompanying scripts can be used for:
Keywords: Object-Centric Event Log, OCEL, Process Mining, Inventory Management, Supply Chain, Simulation, Synthetic Data, SQLite, Python, pandas, pm4py, Economic Order Quantity (EOQ), Safety Stock (SS), Reorder Point (ROP), Stock Status Analysis.
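A quick-start sketch for inspecting the generated logs (the CSV inspection relies only on pandas; pm4py's OCEL reader may expect the .xmlocel extension depending on the installed version):

```python
import pandas as pd
import pm4py

# Flat CSV event log produced by 02_database_to_ocel_csv.py.
log = pd.read_csv("ocel_inventory_management.csv")
print(log["ocel:activity"].value_counts().head(10))

# Standard OCEL XML produced by 03_ocel_csv_to_ocel.py; depending on the pm4py
# version, the file may need to carry the .xmlocel extension instead of .xml.
ocel = pm4py.read_ocel("ocel_inventory_management.xml")
print(ocel.objects["ocel:type"].value_counts())  # objects per type (MAT, PLA, ...)
```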
https://creativecommons.org/publicdomain/zero/1.0/
I got all these .csv files using pandas-datareader, but getting every single KOSPI series through pandas-datareader is annoying, so I decided to share these files.
kospi.csv contains the average KOSPI price; you can use it to check whether a given day was a Korean market holiday. Each xxxxxx.csv contains the price records for a single stock, where xxxxxx is its unique ticker.
The columns and their formats are:
- Date: \d{4}-\d{2}-\d{2}
- Open: \d{1,}\.\d{1}
- High: \d{1,}\.\d{1}
- Low: \d{1,}\.\d{1}
- Close: \d{1,}\.\d{1}
- Adj Close: \d{1,}\.\d{1}
- Volume: \d+
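A small sketch of reading these files with pandas (005930, Samsung Electronics, is used purely as an example ticker):

```python
import pandas as pd

# Market-wide file: handy for checking whether the Korean market traded on a given day.
kospi = pd.read_csv("kospi.csv", parse_dates=["Date"])

# Per-stock file, named after its ticker (005930 is just an example).
stock = pd.read_csv("005930.csv", parse_dates=["Date"])
print(stock[["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]].tail())
```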
There is a blog post describing how I got these data; you might need it to update the CSV files.
git repository
Good luck.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The folder "code and data" contains the code for data processing and empirical results. It includes two folders, data is used to store data, and model is used to store running python and R code.
1.Data Description: 1.1.The folder "TENET network data at each time point" stores the adjacency matrix and other data of each time node in the TENET network. It is called in "Network topology analysis.R". 1.2.Ping An Bank Investor Sentiment (Bayesian Machine Learning).csv is Ping An Bank's investor sentiment data based on machine learning methods 1.3.Ping An Bank Investor Sentiment (Financial Dictionary).csv is Ping An Bank's investor sentiment data based on Financial Dictionary methods 1.4.Ping An Bank Investor Sentiment (Pre-trained Deep Learning (ERNIE)).csv is Ping An Bank's investor sentiment data based on ERNIE model. 1.5aligned_sentiment_indices.csv stores variables related to market sentiment, among which ISI, CICSI and Confidence index are derived from the CSMAR database, and BI is the investor sentiment index calculated by ERNIE based on Baidu AI platform. 1.6 The IIC.csv file contains data on tail risk spillovers within the financial sector. 1.7 The DS.csv file contains data on tail risk spillovers between any financial sector of a financial institution and any other financial sector. 1.8 The BIC.csv file contains data on how much risk each sector spillsover to others. 1.9 The BIC_receive.csv contains data on how much risk each sector receives from others. 1.10 The three files HHI.csv, NAS.csv, and AS.csv store network topology indicator data. 1.11 The code number.xlsx store the stock codes and abbreviations of all financial institutions. 1.12 The Stock Market Value.csv is the market value data of financial institutions, which is used to identify Systemically Important Financial Institutions (Härdle et al. (2016)).
2.Figure: 2.1Figure 1 can be obtained through the ''Sentiment Comparison of Three Approaches for Individual Financial Institutions.py''. 2.2Figure 2 can be obtained via ''Comparison of Market sentiment.py''. 2.3Figures 3 can be obtained through ''Change in average λ for systematic risk (compare to inclusion of sentiment variables).py''. 2.4Figure 4 requires you to choose to run ''Comparison of elemental standardisation treatments for TENET.py''. 2.5Figure 5 requires you to choose to run ''Comparison of average λ and spillover intensity.py''. 2.6Figure 6-11 are obtained by running ''Network topology analysis.R''.The same procedure is also run for Tables 5 and 6 concerning the rankings of risk emitters and receivers. 2.7Figure 12 is obtained by running ''Evolution of Cross-Sector Tail Risk Spillovers and Spill-Ins.py''. 2.8Figure 13 is obtained by running ''Tail risk spillovers between any financial sector of a financial institution and any other financial sector.py''. 2.9Figure 14 is obtained by running ''Tail Risk Spillovers within the Financial Sector.py''.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data consists of transaction data for 10 equities from the Johannesburg Stock Exchange, covering five trading days from 2019-06-24 to 2019-06-28. The data has been processed to contain only transactions. Furthermore, transactions with the same time stamp have been aggregated using a volume-weighted average, so that there is only one trade per time stamp. Missing data is indicated with NaNs.
The 10 equities included are: FirstRand Limited (FSR), Shoprite Holdings Ltd (SHP), Absa Group Ltd (ABG), Nedbank Group Ltd (NED), Standard Bank Group Ltd (SBK), Sasol Ltd (SOL), Mondi Plc (MNP), Anglo American Plc (AGL), Naspers Ltd (NPN) and British American Tobacco Plc (BTI).
The data structure in each CSV file is 10 columns containing the trading information for the assets traded, with transactions in chronological order. The three files have exactly the same structure, each containing information for the transaction tuple: price, time and volume.
The data should only be used to aid the reproducibility of the paper "The Epps effect under alternative sampling schemes". The steps to reproduce the results can be found on our GitHub site: https://github.com/CHNPAT005/PCRBTG-VT. The research focuses on investigating the Epps effect under different definitions of time. The work is funded by the South African Statistical Association. The original data was sourced from Bloomberg Pro. The code for the research is written in Julia Pro.
List of companies listed on the NYSE and other exchanges.
Data and documentation are available on NASDAQ's official webpage. Data is updated regularly on the FTP site.
The file used in this repository: ...
This dataset provides allometrically-estimated carbon stocks of 9,947,310,221 tree crowns derived from 50-cm resolution satellite images within the 0 to 1000 mm/year precipitation zone of Africa north of the equator and south of the Sahara Desert. These data are presented in GeoPackage (.gpkg) format and are summarized in Cloud-Optimized GeoTIFF (COG) format. An interactive viewer application developed to display these carbon estimates at the individual tree level across the study area is available at: https://trees.pgc.umn.edu/app. The analysis utilized 326,523 Maxar multispectral satellite images collected between 2002 to 2021 for the early dry season months of November to March to identify tree crowns. Metadata from satellite image processing across the study area are presented in Shapefile (.shp) format. Additionally, field measurements from destructive harvests used to derive allometry equations are contained in comma-separated values (*.csv) files. These data demonstrate a new tool for studying discrete semi-arid carbon stocks at the tree level with immediate applications provided by the viewer application. Uncertainty of carbon estimates are +/- 19.8%.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets used in the paper "Production scheduling with stock- and staff-related restrictions". The folder "Instances/" stores all instance files for both low- and high-demand instances as per the description in the associated paper. The folder "Solutions/" stores 10 solution files per instance obtained by means of a special-purpose Late Acceptance Hill Climbing Metaheuristic. Meanwhile, the folder "Validator/" contains a ".jar" file which can be executed to validate solutions to the instances in this dataset. All folders also contain an associated "README.txt" file explaining how to use the files inside them.
The file "table_avgs.txt" is a CSV containing the complete average results per instance which were summarized in the corresponding paper. Meanwhile, the file "table_costs.txt" is a CSV with the cost of each solution in the "Solutions/" folder for each execution.
Instance names are formatted as T_D_R_B, where T is either the letter "L" or "H", standing for "low-" and "high-demand" instances, respectively; D is the number of days in the time horizon of the instance; R is the number of requests to be served within the time horizon (it is not necessarily true that all R can be served in a feasible solution); and B is the length, in minutes, of a block (a micro-period within a day in which production of one item type at full capacity may take place). Each day is formed by a series of micro-periods.
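A small parsing helper for this naming convention might look like the following sketch (the example name is illustrative only):

```python
def parse_instance_name(name: str) -> dict:
    """Parse an instance name of the form T_D_R_B, e.g. 'H_10_200_15' (illustrative)."""
    t, d, r, b = name.split("_")
    return {
        "demand": "high" if t == "H" else "low",
        "days": int(d),
        "requests": int(r),
        "block_minutes": int(b),
    }

print(parse_instance_name("L_10_200_15"))
```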
For further details concerning the instances, the interested reader is referred to the paper "Production scheduling with stock- and staff-related restrictions".
Detailed data for long- and short-term debt stocks and service payments. Data are available for major country groups, individual countries, and territories. The Creditor Reporting System (CRS) is an information system comprised of data on Official Development Assistance (ODA) and Official Aid (OA). The system has been in existence since 1967 and is sponsored jointly by the OECD and the World Bank, and operated by the OECD. A subset of the CRS consists of individual grant and loan commitments (between 6,000 and 30,000 transactions a year) submitted by DAC donors (22 members) and multilateral institutions on a regular basis. Reporters are asked to supply (in their national currency) detailed financial information on the commitment (to the developing country), such as terms of repayment (for loans), tying status and sector allocation. The secretariat converts the amounts of the projects into US dollars using annual average exchange rates. 11 data files (number of logical records varies; CSV (comma-separated) format); accompanying documentation (1 PDF file).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the historical stock price data for Amazon.com, Inc. (AMZN), one of the largest and most influential technology companies in the world. The data has been sourced directly from Yahoo Finance, a widely trusted provider of financial market data. It spans a significant time range, enabling users to analyze Amazon’s market performance over the years, observe long-term trends, and identify key events in the company’s history.
The dataset is structured as a CSV file, with each row representing a single trading day. The following columns are included:
This dataset is suitable for a wide range of financial, academic, and data science projects, such as:
Open Access. Plant and soil data from the last year of the biodiversity experiment.
Data from: Wen-feng Cong, Jasper van Ruijven, Liesje Mommer, Gerlinde De Deyn, Frank Berendse and Ellis Hoffland. (2014) Plant species richness promotes soil carbon and nitrogen stocks in grasslands without legumes. Data were collected in the 11-year grassland biodiversity experiment in Wageningen, the Netherlands, in 2010 and 2011. Abbreviated column headings are as follows: "BLK" = block; "PT" = plot; "SR" = plant species richness; "MI" = monoculture identity (Ac = Agrostis capillaris; Ao = Anthoxanthum odoratum; Cj = Centaurea jacea; Fr = Festuca rubra; Hl = Holcus lanatus; Lv = Leucanthemum vulgare; Pl = Plantago lanceolata; Ra = Rumex acetosa); "AAB" = average aboveground biomass from 2000 to 2010 (g m-2); "RB" = standing root biomass (g fresh weight m-2) up to 50 cm depth in June 2010; "CS" = soil carbon stocks (g C m-2) in April 2011; "NS" = soil nitrogen stocks (g N m-2) in April 2011; "CD" = soil organic carbon decomposition (mg CO2-C kg-1 soil) measured in soil collected in April 2011; "NM" = potential net N mineralization rate (µg N kg-1 soil day-1) measured in soil collected in April 2011.
data file.csv