Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository contains a meticulously scraped dataset from various financial websites. The data extraction process ensures high-quality and accurate text, including content from both the websites and their embedded PDFs.
We applied the Mixtral 8x7B model to generate the following additional fields:
The prompt used to generate the additional fields was highly effective, thanks to extensive discussions and collaboration with the Mistral AI team. This ensures that the dataset provides valuable insights and is ready for further analysis and model training.
This dataset can be used for various applications, including but not limited to:
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset, titled "Financial-QA-10k", contains 10,000 question-answer pairs derived from company financial reports, specifically the 10-K filings. The questions are designed to cover a wide range of topics relevant to financial analysis, company operations, and strategic insights, making it a valuable resource for researchers, data scientists, and finance professionals. Each entry includes the question, the corresponding answer, the context from which the answer is derived, the company's stock ticker, and the specific filing year. The dataset aims to facilitate the development and evaluation of natural language processing models in the financial domain.
About the Dataset
Dataset Structure:
Sample Data:
Question: What area did NVIDIA initially focus on before expanding into other markets?
Answer: NVIDIA initially focused on PC graphics.
Context: Since our original focus on PC graphics, we have expanded into various markets.
Ticker: NVDA
Filing: 2023_10K
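The five fields described above can be modeled as a small typed record. The following is a minimal sketch, not the dataset's official schema: the class name, the attribute names, and the `filing_year` helper are assumptions for illustration, and the helper assumes every filing label follows the `YYYY_10K` pattern seen in the sample row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinancialQARecord:
    """One question-answer pair (attribute names are hypothetical)."""

    question: str
    answer: str
    context: str
    ticker: str
    filing: str

    @property
    def filing_year(self) -> int:
        # Assumes the "YYYY_10K" pattern shown in the sample row.
        return int(self.filing.split("_", 1)[0])


# The sample row from the dataset description.
sample = FinancialQARecord(
    question="What area did NVIDIA initially focus on before expanding into other markets?",
    answer="NVIDIA initially focused on PC graphics.",
    context="Since our original focus on PC graphics, we have expanded into various markets.",
    ticker="NVDA",
    filing="2023_10K",
)
```

Keeping the record immutable (`frozen=True`) makes it safe to use rows as dictionary keys or set members when deduplicating.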
Potential Uses:
Natural Language Processing (NLP): Develop and test NLP models for question answering, context understanding, and information retrieval.
Financial Analysis: Extract and analyze specific financial and operational insights from large volumes of textual data.
Educational Purposes: Serve as a training and testing resource for students and researchers in finance and data science.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description 📊🔍
The Sujet-Finance-QA-Vision-100k is a comprehensive dataset containing over 100,000 question-answer pairs derived from more than 9,800 financial document images. This dataset is designed to support research and development in the field of financial document analysis and visual question answering.
Key Features:
🖼️ 9,801 unique financial document images
❓ 107,050 question-answer pairs
🇬🇧 English language
📄 Diverse financial document types…
See the full description on the dataset page: https://huggingface.co/datasets/sujet-ai/Sujet-Finance-QA-Vision-100k.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These three datasets provide closing price information for the following assets: Google, Apple, Microsoft, Netflix, Amazon, Pfizer, AstraZeneca, Johnson & Johnson, ETH, BTC, and LTC. The time period spans from 2012 to the end of 2020.
This dataset was created by DanishJavedCodes
The data sets provide the text and detailed numeric information in all financial statements and their notes extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
💼 📊 Synthetic Financial Domain Documents with PII Labels
gretelai/synthetic_pii_finance_multilingual is a dataset of full-length synthetic financial documents containing Personally Identifiable Information (PII), generated using Gretel Navigator and released under Apache 2.0. This dataset is designed to assist with the following use cases:
🏷️ Training NER (Named Entity Recognition) models to detect and label PII in… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_pii_finance_multilingual.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Under the current IDBG Corporate Results Framework (CRF) 2020-2023 (https://crf.iadb.org/en), the IDB committed to having climate finance reach 30% of the total amount approved (including all lending operations) during this period. In 2022, the IDB Group, composed of the IDB, IDB Lab (formerly the Multilateral Investment Fund), and IDB Invest, approved US$7.8 billion in climate finance as per the MDB climate finance tracking methodology. This resource is aimed at development activities carried out by the public and private sectors that reduce greenhouse gas (GHG) emissions and thus mitigate climate change, and/or that reduce vulnerability to climate change and contribute to an adaptation process. The IDB approved US$6.1 billion in climate finance (45.3% of total approvals). The IDB Group is composed of two separate legal entities: the IDB and the Inter-American Investment Corporation (IIC), which was rebranded as IDB Invest in 2017. The IDB Lab is a trust fund administered by the IDB and serves a unique function as the IDB Group's innovation laboratory. This dataset pertains to the IDB. Climate finance for the entire IDB Group (IDB, IDB Lab, and IDB Invest) in 2023 was US$8.3 billion.
All financial transactions made by the Intellectual Property Office as part of the Government’s commitment to transparency in expenditure.
The first table of the G.20 shows seasonally adjusted data for the flows and levels of finance company receivables outstanding. These data include simple annual percent changes of total, consumer, real estate, and business receivables. The percent change in a given period is calculated as the flow of receivables in the current period divided by the level in the previous period. Percent changes and levels are calculated from unrounded data. The second and third pages of the G.20 show data that are not seasonally adjusted. The second page contains levels of outstanding receivables by receivable type, while the third page contains flow of receivables by type.
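The percent-change rule described above (current-period flow divided by previous-period level) can be sketched as a one-line calculation. This is an illustration only: the function name and the input figures are hypothetical, not values from the G.20, and the sketch does not apply any annualization or seasonal adjustment that the release itself may use.

```python
def percent_change(flow_current: float, level_previous: float) -> float:
    """Percent change = flow in the current period / level in the previous period.

    Inputs should be unrounded, as the description notes.
    """
    if level_previous == 0:
        raise ValueError("previous-period level must be non-zero")
    return 100.0 * flow_current / level_previous


# Hypothetical figures: a flow of 5.0 against a prior level of 1000.0.
example = percent_change(5.0, 1000.0)
```

With these made-up inputs the result is 0.5, that is, a 0.5 percent change for the period.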
This (financial and personal) data is required to be kept as part of the auditing process of the co-ordinating country. It is required to be retained for several years after the ESSnet is completed.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Description
The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. This dataset is used to classify finance-related tweets by topic.
The dataset holds 21,107 documents annotated with 20 labels:
topics = { "LABEL_0": "Analyst Update", "LABEL_1": "Fed | Central Banks", "LABEL_2": "Company | Product News", "LABEL_3": "Treasuries | Corporate Debt", "LABEL_4": "Dividend"… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-topic.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Finance-Instruct-500k Dataset
Overview
Finance-Instruct-500k is a comprehensive and meticulously curated dataset designed to train advanced language models for financial tasks, reasoning, and multi-turn conversations. Combining data from numerous high-quality financial datasets, this corpus provides over 500,000 entries, offering unparalleled depth and versatility for finance-related instruction tuning and fine-tuning. The dataset includes content tailored for financial… See the full description on the dataset page: https://huggingface.co/datasets/oieieio/Finance-Instruct-500k.
All financial transactions made by Companies House as part of the Government’s commitment to transparency in expenditure.
This data is from a crash course on Davidson.
It contains personal credit and debit transactions.
Thanks to DAVIDSON.
This data can be analysed to answer questions such as the total expenses incurred, the total income, and so on.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Domestic Finance Companies, All Other Assets and Accounts and Notes Receivable, Flow (STFAFOXDFBANA) from Q2 1984 to Q2 2025 about notes, flow, finance companies, accounting, companies, finance, financial, domestic, assets, and USA.
The FR 3033p is the first part of a two-stage survey series, which has been conducted at regular five-year intervals since 1955. It is a census survey designed to identify the universe of finance companies eligible for potential inclusion in the FR 3033s. It gathers limited information including total assets, areas of specialization, and information on the corporate structure of such companies. The second part of these information collections, the FR 3033s, collects balance sheet data on major categories of consumer and business credit receivables and major liabilities, along with income and expenses, and is used to gather information on the scope of a company's operations and loan and lease servicing activities. In addition, questions were added to collect lending information related to the impacts of COVID-19.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Our data sheds light on the distribution of Finance stores across different online platforms. WooCommerce leads with a substantial number of stores, holding 25.47K stores, which accounts for 49.97% of the total in this category. Custom Cart follows with 7.73K stores, making up 15.17% of the Finance market. Meanwhile, Shopify offers a significant presence as well, with 6.03K stores, or 11.84% of the total. This chart gives a clear picture of how stores within the Finance sector are spread across these key platforms.
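The store counts and percentages quoted above can be cross-checked against each other: dividing each count by its share should give roughly the same category total, and subtracting the three shares from 100 gives the share left for all other platforms. The sketch below uses only the rounded figures from the paragraph; the variable names are illustrative, and because the inputs are rounded the implied totals agree only approximately.

```python
# (store count, share of category in percent), as quoted in the description.
platforms = {
    "WooCommerce": (25_470, 49.97),
    "Custom Cart": (7_730, 15.17),
    "Shopify": (6_030, 11.84),
}

# Implied category total from each platform: count / (share / 100).
implied_totals = {
    name: count / (share / 100.0) for name, (count, share) in platforms.items()
}

# Share of Finance stores on platforms not named in the description.
remaining_share = 100.0 - sum(share for _, share in platforms.values())

for name, total in implied_totals.items():
    print(f"{name}: implied total of about {total:,.0f} stores")
print(f"Other platforms: {remaining_share:.2f}% of stores")
```

All three implied totals land near 51 thousand stores, which suggests the quoted counts and percentages are mutually consistent.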
AnonymousLLMer/finance-corpus-aihub-wiki dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data contains the latest State and Local Government Finance data from the U.S. Census. A detailed description of the project can be found in: Pierson K., Hand M., and Thompson F. (2015). The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis. PLoS ONE doi: 10.1371/journal.pone.0130119