Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The "yahoo_finance_dataset(2018-2023)" dataset is a financial dataset containing daily stock market data for multiple assets such as equities, ETFs, and indexes. It spans from April 1, 2018 to March 31, 2023, and contains 1257 rows and 7 columns. The data was sourced from Yahoo Finance, and the purpose of the dataset is to provide researchers, analysts, and investors with a comprehensive dataset that they can use to analyze stock market trends, identify patterns, and develop investment strategies. The dataset can be used for various tasks, including stock price prediction, trend analysis, portfolio optimization, and risk management. The dataset is provided in XLSX format, which makes it easy to import into various data analysis tools, including Python, R, and Excel.
The dataset includes the following columns:
Date: The date on which the stock market data was recorded.
Open: The opening price of the asset on the given date.
High: The highest price of the asset on the given date.
Low: The lowest price of the asset on the given date.
Close*: The closing price of the asset on the given date. This price does not take into account any after-hours trading that may have occurred after the market officially closed.
Adj Close**: The adjusted closing price of the asset on the given date. This price takes into account dividends, stock splits, and other corporate actions that can affect the stock price.
Volume: The total number of shares of the asset that were traded on the given date.
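As a rough illustration of how the seven columns fit together, the sketch below models one row, checks that the prices are internally consistent, and derives simple daily returns from the adjusted close. The class and function names (`DailyBar`, `is_consistent`, `daily_returns`) are hypothetical, and the sample values are invented for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyBar:
    """One row of the dataset, using the seven columns described above."""
    day: date
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: int


def is_consistent(bar: DailyBar) -> bool:
    """Check that Low <= Open, Close <= High and that Volume is non-negative."""
    return (
        bar.low <= min(bar.open, bar.close)
        and max(bar.open, bar.close) <= bar.high
        and bar.volume >= 0
    )


def daily_returns(bars: list[DailyBar]) -> list[float]:
    """Simple day-over-day returns computed from the adjusted close."""
    ordered = sorted(bars, key=lambda b: b.day)
    return [
        curr.adj_close / prev.adj_close - 1.0
        for prev, curr in zip(ordered, ordered[1:])
    ]


# Hypothetical values for illustration only; they are not taken from the dataset.
sample = [
    DailyBar(date(2023, 3, 29), 100.0, 102.0, 99.0, 101.0, 100.0, 1_000_000),
    DailyBar(date(2023, 3, 30), 101.0, 111.0, 100.5, 110.0, 110.0, 1_200_000),
]
```

Returns are taken from Adj Close rather than Close because the adjusted series already accounts for dividends and splits, so a split does not show up as a spurious price drop.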
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description 📊🔍
The Sujet-Finance-QA-Vision-100k is a comprehensive dataset containing over 100,000 question-answer pairs derived from more than 9,800 financial document images. This dataset is designed to support research and development in the field of financial document analysis and visual question answering.
Key Features:
🖼️ 9,801 unique financial document images
❓ 107,050 question-answer pairs
🇬🇧 English language
📄 Diverse financial document types…
See the full description on the dataset page: https://huggingface.co/datasets/sujet-ai/Sujet-Finance-QA-Vision-100k.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository contains a meticulously scraped dataset from various financial websites. The data extraction process ensures high-quality and accurate text, including content from both the websites and their embedded PDFs.
We applied the Mixtral 8x7B model to generate the following additional fields:
The prompt used to generate the additional fields was highly effective, thanks to extensive discussions and collaboration with the Mistral AI team. This ensures that the dataset provides valuable insights and is ready for further analysis and model training.
This dataset can be used for various applications, including but not limited to:
The dataset captures 20,985 projects across 165 low- and middle-income countries supported by loans and grants from official sector institutions in China worth $1.34 trillion. It tracks projects over 22 commitment years (2000-2021) and provides details on the timing of project implementation over a 24-year period (2000-2023).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
🛠️ Usage
First, download the MMfin.tsv and MMfin_CN.tsv files, as well as the relevant financial images. The folder structure is as follows:
    ├─ datasets
        ├─ images
            ├─ MMfin
                ...
            ├─ MMfin_CN
                ...
        │ MMfin.tsv
        │ MMfin_CN.tsv
The inference and evaluation process is as follows (using Qwen2-VL-2B-Instruct as an example):
    export LMUData="The path of the datasets"
    python…
See the full description on the dataset page: https://huggingface.co/datasets/hithink-ai/MME-Finance.
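Before running inference, it can help to confirm that the downloaded files match the expected layout. The following is a minimal sketch, assuming that `LMUData` points at the `datasets` folder itself and that the images sit under `images/MMfin` and `images/MMfin_CN`; the helper name `missing_entries` is hypothetical and not part of the MME-Finance tooling.

```python
import os
from pathlib import Path

# Entries expected under the datasets folder, per the layout described above.
EXPECTED_DIRS = ("images/MMfin", "images/MMfin_CN")
EXPECTED_FILES = ("MMfin.tsv", "MMfin_CN.tsv")


def missing_entries(root: Path) -> list[str]:
    """Return the expected relative paths that are absent under `root`."""
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # LMUData is the variable named in the usage notes; the "datasets"
    # fallback is an assumption made for this sketch.
    root = Path(os.environ.get("LMUData", "datasets"))
    problems = missing_entries(root)
    print("layout looks complete" if not problems else f"missing: {problems}")
```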
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
💼 📊 Synthetic Financial Domain Documents with PII Labels
gretelai/synthetic_pii_finance_multilingual is a dataset of full length synthetic financial documents containing Personally Identifiable Information (PII), generated using Gretel Navigator and released under Apache 2.0. This dataset is designed to assist with the following use cases:
🏷️ Training NER (Named Entity Recognition) models to detect and label PII in… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_pii_finance_multilingual.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset, titled "Financial-QA-10k", contains 10,000 question-answer pairs derived from company financial reports, specifically the 10-K filings. The questions are designed to cover a wide range of topics relevant to financial analysis, company operations, and strategic insights, making it a valuable resource for researchers, data scientists, and finance professionals. Each entry includes the question, the corresponding answer, the context from which the answer is derived, the company's stock ticker, and the specific filing year. The dataset aims to facilitate the development and evaluation of natural language processing models in the financial domain.
About the Dataset
Dataset Structure:
Sample Data:
Question: What area did NVIDIA initially focus on before expanding into other markets?
Answer: NVIDIA initially focused on PC graphics.
Context: Since our original focus on PC graphics, we have expanded into various markets.
Ticker: NVDA
Filing: 2023_10K
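The sample entry can be turned into a typed record with a few lines of Python. This is only a sketch: the `QAEntry` class, the `parse_entry` helper, and the lowercase field names are hypothetical, and the `<year>_10K` filing format is inferred from this single sample rather than documented.

```python
import re
from dataclasses import dataclass

# Field labels as they appear in the sample entry above.
LABELS = ("Question", "Answer", "Context", "Ticker", "Filing")


@dataclass(frozen=True)
class QAEntry:
    question: str
    answer: str
    context: str
    ticker: str
    filing: str

    @property
    def filing_year(self) -> int:
        """Year prefix of a filing tag such as '2023_10K' (format assumed from the sample)."""
        return int(self.filing.split("_", 1)[0])


def parse_entry(text: str) -> QAEntry:
    """Split a 'Label: value Label: value ...' string into its five fields."""
    pattern = r"\b(" + "|".join(LABELS) + r"):\s*"
    parts = re.split(pattern, text)
    # With one capture group, re.split yields [prefix, label, value, label, value, ...].
    fields = {
        label.lower(): value.strip()
        for label, value in zip(parts[1::2], parts[2::2])
    }
    return QAEntry(**fields)


sample = (
    "Question: What area did NVIDIA initially focus on before expanding into other markets? "
    "Answer: NVIDIA initially focused on PC graphics. "
    "Context: Since our original focus on PC graphics, we have expanded into various markets. "
    "Ticker: NVDA Filing: 2023_10K"
)
entry = parse_entry(sample)
```

The label-based split is fragile if a question or context itself contains text such as "Answer:", so a structured export of the dataset is preferable to parsing flattened text whenever one is available.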
Potential Uses:
Natural Language Processing (NLP): Develop and test NLP models for question answering, context understanding, and information retrieval.
Financial Analysis: Extract and analyze specific financial and operational insights from large volumes of textual data.
Educational Purposes: Serve as a training and testing resource for students and researchers in finance and data science.
VOSA financial system incorporating General Ledger, Accounts Payable, Accounts Receivable, and Cash Management.
This (financial and personal) data is required to be kept as part of the auditing process of the co-ordinating country. It is required to be retained for several years after the ESSnet is completed.
All financial transactions made by the Intellectual Property Office as part of the Government’s commitment to transparency in expenditure.
Finance Datasets
Historical stock and cryptocurrency price data.
Contents
Stocks (5 years of daily OHLCV data)
AAPL - Apple Inc.
GOOGL - Alphabet Inc.
MSFT - Microsoft Corp.
AMZN - Amazon.com Inc.
TSLA - Tesla Inc.
META - Meta Platforms
NVDA - NVIDIA Corp.
AMD - Advanced Micro Devices
INTC - Intel Corp.
NFLX - Netflix Inc.
Cryptocurrencies (full history)
BTC_USD - Bitcoin
ETH_USD - Ethereum
SOL_USD - Solana
ADA_USD - Cardano
DOT_USD - Polkadot…
See the full description on the dataset page: https://huggingface.co/datasets/misterdonn/finance-datasets.
This data set contains a summary of information about candidate campaigns and political committees by election year. For candidate campaigns and single-year/election committees, a single record is provided that covers all activity of the campaign for the given election year. Information for continuing political committees is summarized by calendar/reporting year. The data set covers the prior 16 years plus the current election year. The data are compiled from the campaign reports deposit (C3), campaign summary reports (C4), campaign registrations (C1/C1pc) and candidate declarations and elections data provided to the PDC by the Washington Secretary of State. Records are updated in near real-time, typically less than 2 minutes from the time the campaign submits new data.
This dataset is a best-effort by the PDC to provide a complete set of records as described herewith. The PDC provides access to the original reports for the purpose of record verification.
Descriptions attached to this dataset do not constitute legal definitions; please consult RCW 42.17A and WAC Title 390 for legal definitions and additional information regarding political finance disclosure requirements.
CONDITION OF RELEASE: This publication and/or referenced documents constitute a list of individuals prepared by the Washington State Public Disclosure Commission and may not be used for commercial purposes. This list is provided on the condition and with the understanding that the persons receiving it agree to this statutorily imposed limitation on its use. See RCW 42.56.070(9) and AGO 1975 No. 15.
https://www.lseg.com/en/policies/website-disclaimer
Explore LSEG's Project Finance Deals Data, providing loan information and league tables to the global deal-making community.
Revenue and invoicing
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Quantitative Finance Fine-Tuning Dataset
A dataset of 24 Q&A examples designed to fine-tune large language models (LLMs) for quantitative finance.
📂 Categories
| Category | Topics | Examples |
| --- | --- | --- |
| Volatility Models | SABR (corrected), Bergomi, rBergomi, Heston | 5 |
| Derivatives Pricing | Dupire, VIX, Black-Scholes Greeks, CVaR | 5 |
| Interest Rates & Credit | HJM, Hull-White, Merton, CDS | 4 |
| Numerical Methods | Crank-Nicolson, Monte Carlo, FFT, LSM | 5 |
| Quant Strategies | Momentum, Pairs… | |
See the full description on the dataset page: https://huggingface.co/datasets/mo35/quant-finance-dataset.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Finance Companies; Equity Capital, Level (BOGZ1FL615080003Q) from Q4 1945 to Q1 2026 about finance companies, companies, equity, finance, capital, financial, and USA.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Domestic Finance Companies, All Other Assets and Accounts and Notes Receivable, Flow (STFAFOXDFBANA) from Q2 1984 to Q1 2026 about notes, flow, finance companies, accounting, companies, finance, financial, domestic, assets, and USA.
https://www.mordorintelligence.com/privacy-policy
The Embedded Finance Market is Segmented by Type (Payments, Insurance, Lending, Investments, Other Service Types), End-Use Industry (IT & Telecommunication, Manufacturing, and More), Business Model (Retail Consumers, and Businesses), and Region (North America, South America, and More). The Market Forecasts are Provided in Terms of Value (USD).
AnonymousLLMer/finance-corpus-aihub-wiki dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.mordorintelligence.com/privacy-policy
The Finance Cloud Market Report is Segmented by Solution (Core Accounting and GL, Financial Forecasting and Planning, and More), Deployment Model (Public Cloud, Private Cloud, and Hybrid / Multi-Cloud), End-User (Banking, Insurance, Capital Markets, and More), Organization Size (Large Enterprises and Small and Medium Enterprises (SMEs)), and Geography. The Market Forecasts are Provided in Terms of Value (USD).