100+ datasets found
  1. High-Quality Financial News Dataset for NLP Tasks

    Financial Dataset for SFT Task

    • kaggle.com
    zip
    Updated Oct 21, 2025
    Citation: Sayel Abualigah (2025). High-Quality Financial News Dataset for NLP Tasks [Dataset]. https://www.kaggle.com/datasets/sayelabualigah/high-quality-financial-news-dataset-for-nlp-tasks
    Available download formats: zip (1,566,953 bytes)
    Authors: Sayel Abualigah
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0; license information was derived automatically

    Description

    This repository contains a dataset scraped from various financial websites. The extraction process was designed to produce high-quality, accurate text and captures content from both the websites and their embedded PDFs.

    Dataset Features

    • Date: The date of the announcement.
    • Subject: The subject of the financial news.
    • Content: The full content of the announcement, including text from the website and PDFs.

    Additional Processed Fields

    We used the Mixtral 8x7B model to generate the following additional fields:

    • ParaphrasedSubject: A paraphrased version of the original subject.
    • CompactedSummary: A concise summary limited to 1.5 lines.
    • DetailedSummary: A detailed summary of the content.
    • Impact: The impact of the announcement, summarized in 2 lines.

    Methodology

    The prompt used to generate the additional fields was refined through extensive discussions and collaboration with the Mistral AI team, with the aim of producing fields that are ready for further analysis and model training.

    Usage

    This dataset can be used for various applications, including but not limited to:

    • Financial news analysis
    • Abstractive/extractive summarization tasks
    • Machine learning model training
    • Natural language processing tasks
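    As a minimal sketch of the summarization use case, the Python function below pairs each announcement's `Content` with one of the model-generated summary fields. It assumes, hypothetically, that the archive holds a CSV file whose headers match the field names listed above; the actual file name and layout inside the zip are not stated here, and `summarization_pairs` is an illustrative helper, not part of the dataset.

```python
import csv
import io

# Summary fields named in the dataset description; the CSV layout is an assumption.
SUMMARY_FIELDS = ("CompactedSummary", "DetailedSummary")


def summarization_pairs(csv_text, target="CompactedSummary"):
    """Return (source, target) pairs for abstractive summarization.

    The source is the full announcement text in `Content`; the target is one
    of the generated summary fields. Rows missing either side are skipped.
    """
    if target not in SUMMARY_FIELDS:
        raise ValueError(f"target must be one of {SUMMARY_FIELDS}")
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        source = (row.get("Content") or "").strip()
        summary = (row.get(target) or "").strip()
        if source and summary:
            pairs.append((source, summary))
    return pairs
```

    For a real file, read its text (for example with `zipfile.ZipFile(path).read(name).decode("utf-8")`) and pass it in. `Impact` or `ParaphrasedSubject` could be used as targets in the same way by extending `SUMMARY_FIELDS`.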
  2. Financial Q&A - 10k

    • kaggle.com
    zip
    Updated Jun 17, 2024
    Citation: Yousef Saeedian (2024). Financial Q&A - 10k [Dataset]. https://www.kaggle.com/datasets/yousefsaeedian/financial-q-and-a-10k
    Available download formats: zip (753,665 bytes)
    Authors: Yousef Saeedian
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0; license information was derived automatically

    Description

    This dataset, titled "Financial-QA-10k", contains 10,000 question-answer pairs derived from company financial reports, specifically the 10-K filings. The questions are designed to cover a wide range of topics relevant to financial analysis, company operations, and strategic insights, making it a valuable resource for researchers, data scientists, and finance professionals. Each entry includes the question, the corresponding answer, the context from which the answer is derived, the company's stock ticker, and the specific filing year. The dataset aims to facilitate the development and evaluation of natural language processing models in the financial domain.

    Dataset Structure:

    • Rows: 7000
    • Columns: 5
    • question: The financial or operational question asked.
    • answer: The specific answer to the question.
    • context: The textual context extracted from the 10-K filing, providing additional information.
    • ticker: The stock ticker symbol of the company.
    • filing: The year of the 10-K filing from which the question and answer are derived.

    Sample Data:

    • Question: What area did NVIDIA initially focus on before expanding into other markets?
    • Answer: NVIDIA initially focused on PC graphics.
    • Context: Since our original focus on PC graphics, we have expanded into various markets.
    • Ticker: NVDA
    • Filing: 2023_10K
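    A record like the sample above can be loaded into a typed structure. This is a sketch under stated assumptions: the file is taken to be a CSV whose headers are the five column names listed in the structure section, and the filing year is parsed on the hypothesis that every `filing` value follows the `YYYY_10K` pattern seen in the sample. `QARecord`, `load_records`, and `by_ticker` are illustrative names.

```python
import csv
import io
from dataclasses import dataclass

# Column names as given in the dataset structure above.
FIELDS = ("question", "answer", "context", "ticker", "filing")


@dataclass
class QARecord:
    question: str
    answer: str
    context: str
    ticker: str
    filing: str

    @property
    def filing_year(self):
        # Assumes the "YYYY_10K" pattern from the sample row, e.g. "2023_10K".
        return int(self.filing.split("_", 1)[0])


def load_records(csv_text):
    """Parse CSV text into QARecord objects, ignoring any extra columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [QARecord(**{name: row[name] for name in FIELDS}) for row in reader]


def by_ticker(records, ticker):
    """Select the question-answer pairs for one company."""
    return [r for r in records if r.ticker == ticker]
```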

    Potential Uses:

    • Natural Language Processing (NLP): Develop and test NLP models for question answering, context understanding, and information retrieval.
    • Financial Analysis: Extract and analyze specific financial and operational insights from large volumes of textual data.
    • Educational Purposes: Serve as a training and testing resource for students and researchers in finance and data science.

  3. Sujet-Finance-QA-Vision-100k

    • huggingface.co
    Updated Jul 14, 2024
    + more versions
    Citation: Sujet AI (2024). Sujet-Finance-QA-Vision-100k [Dataset]. https://huggingface.co/datasets/sujet-ai/Sujet-Finance-QA-Vision-100k
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by: Sujet AI
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0; license information was derived automatically

    Description

    Dataset Description 📊🔍

    The Sujet-Finance-QA-Vision-100k is a comprehensive dataset containing over 100,000 question-answer pairs derived from more than 9,800 financial document images. This dataset is designed to support research and development in the field of financial document analysis and visual question answering.

    Key Features:

    • 🖼️ 9,801 unique financial document images
    • ❓ 107,050 question-answer pairs
    • 🇬🇧 English language
    • 📄 Diverse financial document types… See the full description on the dataset page: https://huggingface.co/datasets/sujet-ai/Sujet-Finance-QA-Vision-100k.

  4. Historical financial datasets for Financial Analysis with Spyder workshop

    • figshare.com
    txt
    Updated Jul 16, 2021
    Citation: Spyder IDE (2021). Historical financial datasets for Financial Analysis with Spyder workshop [Dataset]. http://doi.org/10.6084/m9.figshare.14995215.v1
    Available download formats: txt
    Dataset provided by: figshare (http://figshare.com/)
    Authors: Spyder IDE
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/; license information was derived automatically

    Description

    These three datasets provide closing prices for the following assets: Google, Apple, Microsoft, Netflix, Amazon, Pfizer, AstraZeneca, Johnson & Johnson, ETH, BTC, and LTC. The time period spans from 2012 to the end of 2020.

  5. Financial raw data

    • kaggle.com
    zip
    Updated Feb 14, 2024
    Citation: DanishJavedCodes (2024). Financial raw data [Dataset]. https://www.kaggle.com/datasets/danishjavedcodes/financial-raw-data
    Available download formats: zip (1,765 bytes)
    Authors: DanishJavedCodes
    Description

    Dataset

    This dataset was created by DanishJavedCodes

  6. Financial Statement and Notes Data Sets

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Nov 22, 2025
    + more versions
    Citation: Economic and Risk Analysis (2025). Financial Statement and Notes Data Sets [Dataset]. https://catalog.data.gov/dataset/financial-statement-and-notes-data-sets
    Dataset provided by: Economic and Risk Analysis
    Description

    The data sets provide the text and detailed numeric information from all financial statements and their notes, extracted from exhibits to corporate financial reports filed with the U.S. Securities and Exchange Commission (SEC) using eXtensible Business Reporting Language (XBRL).
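    As a hedged sketch of working with the numeric part of these data sets: the example assumes a tab-delimited text table with a header row containing at least the columns `adsh` (accession number), `tag`, `ddate`, `uom`, and `value`. Those column names reflect the SEC's published file layout as best understood here and should be checked against the data sets' own technical documentation before use; `values_for_tag` is a hypothetical helper.

```python
import csv
import io
from decimal import Decimal


def values_for_tag(num_tsv, tag):
    """Collect (adsh, ddate, uom, value) tuples for one XBRL tag.

    Assumes a tab-delimited table with a header row; rows with an empty
    value are skipped, and values are parsed as Decimal to avoid float rounding.
    """
    out = []
    for row in csv.DictReader(io.StringIO(num_tsv), delimiter="\t"):
        if row["tag"] == tag and row["value"]:
            out.append((row["adsh"], row["ddate"], row["uom"], Decimal(row["value"])))
    return out
```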

  7. synthetic_pii_finance_multilingual

    • huggingface.co
    Updated Jun 11, 2024
    Citation: Gretel.ai (2024). synthetic_pii_finance_multilingual [Dataset]. https://huggingface.co/datasets/gretelai/synthetic_pii_finance_multilingual
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by: Gretel.ai
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0; license information was derived automatically

    Description

    💼 📊 Synthetic Financial Domain Documents with PII Labels

    gretelai/synthetic_pii_finance_multilingual is a dataset of full-length synthetic financial documents containing Personally Identifiable Information (PII), generated using Gretel Navigator and released under Apache 2.0. It is designed to support the following use cases:

    🏷️ Training NER (Named Entity Recognition) models to detect and label PII in… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_pii_finance_multilingual.
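    Training a token-level NER model usually requires turning span annotations into per-token tags. The dataset's exact annotation format is in the truncated description, so this sketch assumes character-offset spans of the form `(start, end, label)` with an exclusive end, uses simple whitespace tokenization, and converts them to the common BIO scheme. The labels `NAME` and `EMAIL` in the example are hypothetical.

```python
import re


def spans_to_bio(text, spans):
    """Convert character-level PII spans to token-level BIO tags.

    spans: iterable of (start, end, label), end exclusive. Tokens are
    whitespace-delimited; a token overlapping a span gets B- if it contains
    the span's first character, otherwise I-.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"
        for s_start, s_end, label in spans:
            if start < s_end and end > s_start:  # token overlaps this span
                tag = ("B-" if start <= s_start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


tokens, tags = spans_to_bio("Contact John Smith at john@x.com",
                            [(8, 18, "NAME"), (22, 32, "EMAIL")])
```

    Whitespace tokenization is a simplification; a model's own subword tokenizer with offset mappings would be the more faithful choice in practice.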

  8. 2023 IDB Climate Finance Database

    • data.iadb.org
    csv, docx, xlsx
    Updated Apr 10, 2025
    + more versions
    Citation: IDB Datasets (2025). 2023 IDB Climate Finance Database [Dataset]. http://doi.org/10.60966/s10a-j762
    Available download formats: docx (26365), csv (146350), xlsx (434341), csv (1363)
    Dataset provided by: Inter-American Development Bank (http://www.iadb.org/)
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/; license information was derived automatically

    Time period covered: Jan 1, 2023

    Description

    Under the current IDBG Corporate Results Framework (CRF) 2020-2023 (https://crf.iadb.org/en), the IDB committed to climate finance reaching 30% of its total approvals (including all lending operations) during this period. In 2022, the IDB Group, composed of the IDB, IDB Lab (formerly the Multilateral Investment Fund), and IDB Invest, approved US$7.8 billion in climate finance according to the MDB climate finance tracking methodology. Climate finance supports public- and private-sector development activities that reduce greenhouse gas (GHG) emissions and thus mitigate climate change, and/or that reduce vulnerability to climate change and contribute to an adaptation process. The IDB approved US$6.1 billion in climate finance (45.3% of total approvals). The IDB Group is composed of two separate legal entities: the IDB and the Inter-American Investment Corporation (IIC), which was rebranded as IDB Invest in 2017. IDB Lab is a trust fund administered by the IDB and serves a unique function as the IDB Group's innovation laboratory. This dataset pertains to the IDB. Climate finance for the entire IDB Group (IDB, IDB Lab, and IDB Invest) in 2023 was US$8.3 billion.
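    The shares quoted above can be cross-checked with simple arithmetic. If US$6.1 billion is 45.3% of the IDB's total approvals, total approvals are about 6.1 / 0.453, or roughly US$13.5 billion, and a 45.3% share exceeds the 30% CRF commitment. The sketch below reproduces that calculation; because the published inputs are rounded, the result is approximate, and `total_from_share` is an illustrative helper.

```python
def total_from_share(part, share):
    """Total implied by a part and its share of that total (share as a fraction)."""
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction in (0, 1]")
    return part / share


IDB_CLIMATE_FINANCE_BN = 6.1  # US$ billion approved in climate finance (from the text)
IDB_CLIMATE_SHARE = 0.453     # 45.3% of total approvals (from the text)
CRF_TARGET_SHARE = 0.30       # CRF 2020-2023 commitment (from the text)

total_approvals_bn = total_from_share(IDB_CLIMATE_FINANCE_BN, IDB_CLIMATE_SHARE)
meets_target = IDB_CLIMATE_SHARE >= CRF_TARGET_SHARE
print(f"Implied total approvals: about US${total_approvals_bn:.1f} billion")
print(f"Climate share meets the 30% commitment: {meets_target}")
```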

  9. Finance Dataset - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 30, 2013
    + more versions
    Citation: ckan.publishing.service.gov.uk (2013). Finance Dataset - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/finance-dataset_1
    Dataset provided by: CKAN (https://ckan.org/)
    Description

    All financial transactions made by the Intellectual Property Office, published as part of the Government’s commitment to transparency in expenditure.

  10. Data from: Finance Companies

    • catalog.data.gov
    • s.cnmilf.com
    Updated Dec 18, 2024
    + more versions
    Citation: Board of Governors of the Federal Reserve System (2024). Finance Companies [Dataset]. https://catalog.data.gov/dataset/finance-companies
    Dataset provided by: Federal Reserve Board of Governors; Federal Reserve System (http://www.federalreserve.gov/)
    Description

    The first table of the G.20 shows seasonally adjusted data for the flows and levels of finance company receivables outstanding. These data include simple annual percent changes of total, consumer, real estate, and business receivables. The percent change in a given period is calculated as the flow of receivables in the current period divided by the level in the previous period. Percent changes and levels are calculated from unrounded data. The second and third pages of the G.20 show data that are not seasonally adjusted. The second page contains levels of outstanding receivables by receivable type, while the third page contains flow of receivables by type.
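    The percent-change rule stated above (flow in the current period divided by the level in the previous period) can be written as a small function. This is a minimal sketch: the text describes the changes as "simple annual" but does not spell out how, or whether, sub-annual periods are annualized, so no annualization is applied here. Following the note that figures come from unrounded data, the sketch rounds nothing; round only for display. The function names are illustrative.

```python
def percent_change(flow_current, level_previous):
    """Percent change as described for the G.20: current-period flow / previous-period level."""
    if level_previous == 0:
        raise ZeroDivisionError("previous-period level must be nonzero")
    return 100.0 * flow_current / level_previous


def percent_changes(levels, flows):
    """Period-by-period changes from aligned series: levels[t] and flows[t] refer to period t.

    The first period has no previous level, so the result has len(levels) - 1 entries.
    """
    if len(levels) != len(flows):
        raise ValueError("levels and flows must cover the same periods")
    return [percent_change(flows[t], levels[t - 1]) for t in range(1, len(levels))]
```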

  11. ESSnet finance - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Aug 30, 2013
    + more versions
    Citation: ckan.publishing.service.gov.uk (2013). ESSnet finance - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/essnet-finance
    Dataset provided by: CKAN (https://ckan.org/)
    Description

    These financial and personal data must be kept as part of the co-ordinating country's auditing process and retained for several years after the ESSnet is completed.

  12. twitter-financial-news-topic

    • huggingface.co
    Updated Dec 4, 2022
    Citation: not a (2022). twitter-financial-news-topic [Dataset]. https://huggingface.co/datasets/zeroshot/twitter-financial-news-topic
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors: not a
    License: MIT License, https://opensource.org/licenses/MIT; license information was derived automatically

    Description

    Dataset Description

    The Twitter Financial News dataset is an English-language dataset containing an annotated corpus of finance-related tweets. It is used to classify finance-related tweets by topic.

    The dataset holds 21,107 documents annotated with 20 labels:

    topics = { "LABEL_0": "Analyst Update", "LABEL_1": "Fed | Central Banks", "LABEL_2": "Company | Product News", "LABEL_3": "Treasuries | Corporate Debt", "LABEL_4": "Dividend"… See the full description on the dataset page: https://huggingface.co/datasets/zeroshot/twitter-financial-news-topic.
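    Classifiers trained on this dataset typically emit ids such as `LABEL_2`, which then need mapping back to topic names. The sketch below includes only the five labels visible in the truncated description above; the dataset card defines 20 in total, so unlisted ids fall back to the raw label rather than being guessed. `decode` is an illustrative helper.

```python
# Only the five labels shown in the (truncated) description; 20 exist in total.
TOPICS = {
    "LABEL_0": "Analyst Update",
    "LABEL_1": "Fed | Central Banks",
    "LABEL_2": "Company | Product News",
    "LABEL_3": "Treasuries | Corporate Debt",
    "LABEL_4": "Dividend",
}


def decode(label):
    """Map a model output like "LABEL_2" (or the integer id 2) to a readable topic name."""
    key = label if isinstance(label, str) else f"LABEL_{label}"
    return TOPICS.get(key, key)  # unlisted ids are returned unchanged
```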

  13. Finance-Instruct-500k

    • huggingface.co
    Updated Nov 8, 2025
    + more versions
    Citation: Jorge Alonso (2025). Finance-Instruct-500k [Dataset]. https://huggingface.co/datasets/oieieio/Finance-Instruct-500k
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors: Jorge Alonso
    License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0; license information was derived automatically

    Description

    Finance-Instruct-500k Dataset

    Overview

    Finance-Instruct-500k is a curated dataset designed to train language models for financial tasks, reasoning, and multi-turn conversations. Combining data from numerous financial datasets, it provides over 500,000 entries for finance-related instruction tuning and fine-tuning. The dataset includes content tailored for financial… See the full description on the dataset page: https://huggingface.co/datasets/oieieio/Finance-Instruct-500k.

  14. Finance Dataset

    • data.wu.ac.at
    • data.europa.eu
    Updated Dec 12, 2013
    + more versions
    Citation: Companies House (2013). Finance Dataset [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/Y2M4MmEwNmItMjU5Ni00ZDE1LWExZDEtNGMxNzRjOGM4ZTRk
    Dataset provided by: Companies House (http://companieshouse.gov.uk/)
    Description

    All financial transactions made by Companies House, published as part of the Government’s commitment to transparency in expenditure.

  15. Personal Finance

    • kaggle.com
    zip
    Updated Dec 17, 2020
    Citation: bukola Fatunde (2020). Personal Finance [Dataset]. https://www.kaggle.com/bukolafatunde/personal-finance
    Available download formats: zip (7,341 bytes)
    Authors: bukola Fatunde
    Description

    Context

    This data is from a crash course by Davidson.

    Content

    It contains personal credit and debit transactions.

    Acknowledgements

    Thanks to DAVIDSON

    Inspiration

    This data can be analysed to answer questions such as the total expenses incurred and the total income.
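    The two example questions reduce to summing credits (income) and debits (expenses). The column names here are not given in the description, so this sketch assumes, hypothetically, a CSV with an `Amount` column and a `Transaction Type` column holding `credit` or `debit`; both names are parameters so they can be adjusted to the real file. Amounts are parsed as `Decimal` to avoid floating-point drift in currency sums.

```python
import csv
import io
from decimal import Decimal


def totals(csv_text, amount_col="Amount", type_col="Transaction Type"):
    """Sum credits as income and debits as expenses.

    The default column names are assumptions, not taken from the dataset.
    Rows whose type is neither "credit" nor "debit" are ignored.
    """
    income = expenses = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = Decimal(row[amount_col])
        kind = row[type_col].strip().lower()
        if kind == "credit":
            income += amount
        elif kind == "debit":
            expenses += amount
    return {"income": income, "expenses": expenses, "net": income - expenses}
```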

  16. Domestic Finance Companies, All Other Assets and Accounts and Notes...

    • fred.stlouisfed.org
    json
    Updated Sep 29, 2025
    + more versions
    Citation: (2025). Domestic Finance Companies, All Other Assets and Accounts and Notes Receivable, Flow [Dataset]. https://fred.stlouisfed.org/series/STFAFOXDFBANA
    Available download formats: json
    License: see https://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Domestic Finance Companies, All Other Assets and Accounts and Notes Receivable, Flow (STFAFOXDFBANA) from Q2 1984 to Q2 2025 about notes, flow, finance companies, accounting, companies, finance, financial, domestic, assets, and USA.
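    The series can also be retrieved programmatically. As a hedged sketch, the function below only builds a request URL for the FRED API's `fred/series/observations` endpoint with `file_type=json`, matching the json format listed above. A FRED API key (registration required) is needed, shown here as the placeholder `YOUR_API_KEY`; `observations_url` is a hypothetical helper name, and extra query parameters such as `observation_start` are passed through unchanged.

```python
from urllib.parse import urlencode

FRED_OBSERVATIONS = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id, api_key, **params):
    """Build a FRED observations request URL that asks for a JSON response."""
    query = {"series_id": series_id, "api_key": api_key, "file_type": "json", **params}
    return f"{FRED_OBSERVATIONS}?{urlencode(query)}"


# Q2 1984 is the first quarter of the series described above.
url = observations_url("STFAFOXDFBANA", "YOUR_API_KEY", observation_start="1984-04-01")
```

    Fetching the URL with `urllib.request.urlopen` and parsing the body with `json.load` should yield an object with an `observations` list of dated values; check the FRED API documentation for the exact response fields.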

  17. Census of Finance Companies and Other Lenders; Survey of Finance Companies

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Dec 18, 2024
    + more versions
    Citation: Board of Governors of the Federal Reserve System (2024). Census of Finance Companies and Other Lenders; Survey of Finance Companies [Dataset]. https://catalog.data.gov/dataset/census-of-finance-companies-and-other-lenders-survey-of-finance-companies
    Dataset provided by: Federal Reserve Board of Governors; Federal Reserve System (http://www.federalreserve.gov/)
    Description

    The FR 3033p is the first part of a two-stage survey series, which has been conducted at regular five-year intervals since 1955. It is a census survey designed to identify the universe of finance companies eligible for potential inclusion in the FR 3033s. It gathers limited information, including total assets, areas of specialization, and the corporate structure of such companies. The second part of these information collections, the FR 3033s, collects balance sheet data on major categories of consumer and business credit receivables and major liabilities, along with income and expenses, and is used to gather information on the scope of a company's operations and its loan and lease servicing activities. In addition, questions were added to collect lending information related to the impacts of COVID-19.

  18. Finance Stores Count by Platforms

    • aftership.com
    Updated Feb 7, 2024
    Citation: AfterShip (2024). Finance Stores Count by Platforms [Dataset]. https://www.aftership.com/ecommerce/statistics/stores/finance
    Dataset authored and provided by: AfterShip (https://www.aftership.com/)
    License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/; license information was derived automatically

    Description

    Our data shows how Finance stores are distributed across online platforms. WooCommerce leads with 25.47K stores, 49.97% of the total in this category. Custom Cart follows with 7.73K stores (15.17% of the Finance market), and Shopify has 6.03K stores (11.84% of the total).
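    The published counts and shares can be checked against each other. This sketch backs out the category total from the WooCommerce figures (25.47K stores at 49.97%) and recomputes the other two platforms' shares from that total. Only the inputs come from the text; the helper names are illustrative.

```python
def total_from_share(count, share_pct):
    """Total implied by one platform's count and its percentage share."""
    return count / (share_pct / 100)


def share_of(count, total):
    """Percentage share of `count` in `total`."""
    return 100 * count / total


# Counts in thousands of stores and shares in percent, as published above.
total_k = total_from_share(25.47, 49.97)  # derived from the WooCommerce figures
custom_cart_pct = share_of(7.73, total_k)
shopify_pct = share_of(6.03, total_k)
print(f"total about {total_k:.2f}K, Custom Cart {custom_cart_pct:.2f}%, Shopify {shopify_pct:.2f}%")
```

    With these inputs the implied total is roughly 50.97K stores, and the recomputed shares come out at about 15.17% and 11.83%. The small gap from the stated 11.84% is consistent with the store counts being rounded to 0.01K.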

  19. finance-corpus-aihub-wiki

    • huggingface.co
    Updated Dec 9, 2024
    Citation: AnonymousLLMer (2024). finance-corpus-aihub-wiki [Dataset]. https://huggingface.co/datasets/AnonymousLLMer/finance-corpus-aihub-wiki
    Authors: AnonymousLLMer
    Description

    AnonymousLLMer/finance-corpus-aihub-wiki dataset hosted on Hugging Face and contributed by the HF Datasets community

  20. The Government Finance Database

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 7, 2018
    Citation: Kawika Pierson; Michael L. Hand; Fred Thompson (2018). The Government Finance Database [Dataset]. http://doi.org/10.7910/DVN/LMS8NT
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by: Harvard Dataverse
    Authors: Kawika Pierson; Michael L. Hand; Fred Thompson
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/; license information was derived automatically

    Description

    This data contains the latest State and Local Government Finance data from the U.S. Census. A detailed description of the project can be found in: Pierson K., Hand M., and Thompson F. (2015). The Government Finance Database: A Common Resource for Quantitative Research in Public Financial Analysis. PLoS ONE doi: 10.1371/journal.pone.0130119
