https://creativecommons.org/publicdomain/zero/1.0/
This dataset offers a detailed collection of US-GAAP financial data extracted from the financial statements of exchange-listed U.S. companies, as submitted to the U.S. Securities and Exchange Commission (SEC) via the EDGAR database. Covering filings from January 2009 onwards, this dataset provides key financial figures reported by companies in accordance with U.S. Generally Accepted Accounting Principles (GAAP).
This dataset primarily relies on the SEC's Financial Statement Data Sets and EDGAR APIs:
- SEC Financial Statement Data Sets
- EDGAR Application Programming Interfaces
In instances where specific figures were missing from these sources, data was directly extracted from the companies' financial statements to ensure completeness.
Please note that the dataset presents financial figures exactly as reported by the companies, which may occasionally include errors. A common issue involves incorrect reporting of scaling factors in the XBRL format. XBRL supports two tag attributes related to scaling: 'decimals' and 'scale'. The 'decimals' attribute states how many decimal places the figure is accurate to but does not affect the actual value of the figure, while the 'scale' attribute adjusts the value by a specific factor.
However, there are thousands of instances where companies have incorrectly used the 'decimals' attribute (e.g., 'decimals="-6"') under the mistaken assumption that it controls scaling. It does not, and as a result some figures may be inaccurately scaled. This dataset does not attempt to detect or correct such errors; it aims to reflect the data precisely as reported by the companies. A future version of the dataset may be introduced to address and correct these issues.
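To make the distinction concrete, here is a minimal sketch. It assumes the usual Inline XBRL reading in which the displayed number is multiplied by 10 to the power of 'scale', while 'decimals' only states accuracy. The function names and the sample figure are hypothetical and are not taken from the dataset or its extraction code.

```python
from decimal import Decimal


def reported_value(displayed: str, scale: int = 0) -> Decimal:
    """Value of a fact: the displayed number times 10**scale.

    'decimals' is deliberately not a parameter, because it describes
    accuracy and never changes the value.
    """
    return Decimal(displayed.replace(",", "")) * (Decimal(10) ** scale)


def accurate_to(decimals: int) -> Decimal:
    """Smallest unit the filer claims accuracy to, e.g. -6 -> 1000000."""
    return Decimal(10) ** (-decimals)


# Hypothetical filer shows "1,234" in a table labelled "in millions".
correct = reported_value("1,234", scale=6)  # scale was set as intended
mistaken = reported_value("1,234")          # only decimals="-6" was set
```

In the second case the figure stays at 1,234 instead of 1,234 million, which is the kind of mis-scaled value the description warns about.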
The source code for data extraction is available here
The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset needs a lot of preprocessing and rewards it with useful EDA insights into a company. It consists of sales and profit data organized by market segment and country/region.
Tips for pre-processing:
1. Check the column names first; some errors are already there.
2. Remove the '$' sign and '-' from every column where they appear.
3. After the two steps above, change the datatype from object to int.
4. Challenge: try removing the ',' (comma) from all numeric values.
5. Try plotting sales and profit over time.
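Tips 1 to 4 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the dataset author's method: it assumes a bare dash stands for zero and that parentheses mark a negative amount, and it returns a float rather than an int because amounts may carry cents. The helper names are hypothetical.

```python
import re


def clean_header(name: str) -> str:
    """Tip 1: strip stray whitespace around a column name."""
    return name.strip()


def clean_amount(raw: str) -> float:
    """Tips 2-4: turn a cell such as ' $1,618.50 ' into a number."""
    text = raw.strip()
    # Assumption: parentheses mark a negative amount, e.g. "$(4,533.75)".
    negative = "(" in text
    # Drop currency signs, thousands separators, spaces, and parentheses.
    text = re.sub(r"[$,\s()]", "", text)
    # Assumption: a bare dash is a placeholder for zero.
    if text in ("", "-"):
        return 0.0
    value = float(text)
    return -value if negative else value
```

Treating only a bare dash as a placeholder, instead of deleting every '-' character, keeps a genuine leading minus sign intact.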
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a structured and machine-readable collection of financial statements filed with the Companies Registration Office (CRO) in Ireland. It currently includes financial statements for the year 2022, with additional years to be added as they become available. The dataset aligns with the European Union’s Open Data Directive (Directive (EU) 2019/1024) and the Implementing Regulation (EU) 2023/138, which designates company and company ownership data as a high-value dataset. It is available for bulk download and API access under the Creative Commons Attribution 4.0 (CC BY 4.0) licence, allowing unrestricted reuse with appropriate attribution. By increasing transparency and enabling data-driven insights, this dataset supports public sector initiatives, financial analysis, and digital services development.
The API endpoints can be accessed using these links:
- Query: https://opendata.cro.ie/api/3/action/datastore_search
- Query (via SQL): https://opendata.cro.ie/api/3/action/datastore_search_sql
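The endpoint paths follow the pattern of the CKAN DataStore API, so the sketch below assumes the standard CKAN parameters (`resource_id`, `limit`, `q`, `sql`) and the standard `result.records` response shape. The resource id is not given in the description, so the one used in the usage note is a hypothetical placeholder.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://opendata.cro.ie/api/3/action"


def search_url(resource_id: str, limit: int = 5, q: Optional[str] = None) -> str:
    """Build a datastore_search URL (assumes CKAN-style parameters)."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        params["q"] = q  # free-text search term
    return f"{BASE}/datastore_search?{urlencode(params)}"


def sql_url(sql: str) -> str:
    """Build a datastore_search_sql URL for a read-only SQL query."""
    return f"{BASE}/datastore_search_sql?{urlencode({'sql': sql})}"


def fetch_records(url: str) -> List[Dict]:
    """Fetch a URL and return the records (assumes CKAN response shape)."""
    with urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload["result"]["records"]
```

A call would look like `fetch_records(search_url("RESOURCE-ID-PLACEHOLDER", limit=10))`, with the placeholder replaced by a real resource id from the portal.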
http://opendatacommons.org/licenses/dbcl/1.0/
This is a compiled dataset of figures from various companies' 10-K annual reports and balance sheets. It is longitudinal (panel) data covering the years 2009-2022(/23), and it includes a few bankrupt companies to help investigate contributing factors. Companies are named by their stock symbols and are divided into specific categories.
Success.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.
Key Features of Success.ai's Company Financial Data:
Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.
Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.
Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.
Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.
Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.
Why Choose Success.ai for Company Financial Data?
Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.
AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.
Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.
Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.
Comprehensive Use Cases for Financial Data:
Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.
Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.
Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.
Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.
Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.
APIs to Power Your Financial Strategies:
Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.
Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns.
Tailored Solutions for Industry Professionals:
Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.
Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.
Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.
Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.
What Sets Success.ai Apart?
Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.
Ethical Practices: Our data collection and processing methods are fully comp...
Our Financial API provides access to a vast collection of historical financial statements for more than 50,000 companies listed on major exchanges. With this tool, you can retrieve balance sheets, income statements, and cash flow statements for any company in our database. Stay informed about the financial health of various organizations and make data-driven decisions with confidence. Our API is designed to deliver accurate and up-to-date financial information, enabling you to gain valuable insights and streamline your analysis process. Experience the convenience and reliability of our company financial API today.
https://www.aiceltech.com/terms
Korean Companies’ Financial Data provides important information to analyze a company’s financial status and performance. This data includes financial indicators such as revenue, expenses, assets, and liabilities. Collected from corporate financial reports and stock market data, it helps investors evaluate financial health and discover investment opportunities, essential for valuing Korean companies.
At CompanyData.com (BoldData), we specialize in delivering high-quality company data sourced directly from official trade registers. Our extensive dataset includes historical financial records for over 230 million companies worldwide, enabling deeper insight into business performance over time. Whether you're benchmarking companies, training AI models, or building risk profiles, our financial data equips you with the long-term perspective you need.
Our financial database includes multi-year balance sheets, profit and loss statements, and key performance indicators such as revenue, net income, assets, liabilities, and equity. We provide standardized and structured data—backed by rigorous validation processes—to ensure consistency and accuracy across jurisdictions. Each financial profile can be enriched with hierarchical data, firmographics, contact details, and industry classifications to support complex analyses.
This historical financial data supports a wide range of use cases including KYC and AML compliance, credit risk assessment, M&A research, financial modeling, competitive benchmarking, AI/ML training, and market segmentation. Whether you’re building a predictive scoring model or assessing long-term financial health, our data gives you the clarity and depth required for smarter decisions.
Delivery is flexible to suit your needs: access files in Excel or CSV, browse through our self-service platform, integrate via real-time API, or enhance your existing datasets through custom enrichment services. With access to 380 million verified companies across all industries and geographies, CompanyData.com (BoldData) provides the scale, precision, and historical context to power your next move—globally.
Every public company publishes a financial report to declare its financial activities and position. This financial statement contains many tables to present the information. We classify these tables into the predefined categories listed below.
1) Income Statements
2) Balance Sheets
3) Cash Flows
4) Notes
5) Others
Datasets: Within the given dataset you will find 5 folders with the above category names. Every folder contains .html files with respective tabular data.
Expected task: group the documents so that the files are separated according to their category. The category labels should only be used afterwards, as a benchmark for evaluation.
Data extracted: The data has been taken from the publicly available Hexaware Technologies annual financial reports, which you can find at https://hexaware.com/investors/
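A minimal loading sketch for the folder layout described above. It assumes the five folders are spelled exactly like the category names and that the files are UTF-8 HTML; both are assumptions, and the function names are hypothetical. The folder name is returned as a label so it can be held back for evaluation only.

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumption: folder names match the category names exactly.
CATEGORIES = ["Income Statements", "Balance Sheets", "Cash Flows", "Notes", "Others"]


class _TextExtractor(HTMLParser):
    """Collects the visible text fragments of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup: str) -> str:
    """Flatten an HTML table into a space-separated string of cell text."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


def load_tables(root):
    """Return (documents, labels) for every .html file under root."""
    docs, labels = [], []
    for category in CATEGORIES:
        folder = Path(root) / category
        if not folder.is_dir():
            continue  # skip categories that are missing on disk
        for path in sorted(folder.glob("*.html")):
            markup = path.read_text(encoding="utf-8", errors="ignore")
            docs.append(html_to_text(markup))
            labels.append(category)
    return docs, labels
```

The `docs` list can be fed to any text clustering approach, and `labels` compared against the resulting clusters afterwards.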
Thank you for your Patience, Enjoy the dataset and Explore and learn more. Peace out✌️
The Financial Statements of Holding Companies (FR Y-9 Reports) collects standardized financial statements from domestic holding companies (HCs). This is pursuant to the Bank Holding Company Act of 1956, as amended (BHC Act), and the Home Owners Loan Act (HOLA). The FR Y-9C is used to identify emerging financial risks and monitor the safety and soundness of HC operations. HCs file the FR Y-9C and FR Y-9LP quarterly, the FR Y-9SP semiannually, the FR Y-9ES annually, and the FR Y-9CS on a schedule that is determined when this supplement is used.
Comprehensive database of over 100,000 financial filings from 8,000+ European companies
Problem Statement
A financial services firm faced inefficiencies in generating accurate and timely financial reports. The manual reporting process was labor-intensive, prone to errors, and delayed decision-making. With increasing data complexity and regulatory requirements, the firm sought an automated solution to streamline financial reporting while maintaining high accuracy.
Challenge
Implementing an automated financial reporting system involved addressing… See the full description on the dataset page: https://huggingface.co/datasets/globosetechnology12/Automated-Financial-Reporting.
https://brightdata.com/license
Stay informed with our comprehensive Financial News Dataset, designed for investors, analysts, and businesses to track market trends, monitor financial events, and make data-driven decisions.
Dataset Features
- Financial News Articles: Access structured financial news data, including headlines, summaries, full articles, publication dates, and source details.
- Market & Economic Indicators: Track financial reports, stock market updates, economic forecasts, and corporate earnings announcements.
- Sentiment & Trend Analysis: Analyze news sentiment, categorize articles by financial topics, and monitor emerging trends in global markets.
- Historical & Real-Time Data: Retrieve historical financial news archives or access continuously updated feeds for real-time insights.
Customizable Subsets for Specific Needs
Our Financial News Dataset is fully customizable, allowing you to filter data based on publication date, region, financial topics, sentiment, or specific news sources. Whether you need broad coverage for market research or focused data for investment analysis, we tailor the dataset to your needs.
Popular Use Cases
- Investment Strategy & Risk Management: Monitor financial news to assess market risks, identify investment opportunities, and optimize trading strategies.
- Market & Competitive Intelligence: Track industry trends, competitor financial performance, and economic developments.
- AI & Machine Learning Training: Use structured financial news data to train AI models for sentiment analysis, stock prediction, and automated trading.
- Regulatory & Compliance Monitoring: Stay updated on financial regulations, policy changes, and corporate governance news.
- Economic Research & Forecasting: Analyze financial news trends to predict economic shifts and market movements.
Whether you're tracking stock market trends, analyzing financial sentiment, or training AI models, our Financial News Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
FinanceQA Dataset
📌 Overview
FinanceQA is a curated dataset of financial question-answer pairs extracted from company annual reports, balance sheets, and financial statements. It is designed to support Question Answering (QA), Retrieval-Augmented Generation (RAG), and other NLP applications in financial analysis. The dataset contains ~4,000 entries across multiple companies and years, with structured fields for queries, answers, and contextual excerpts.
📂… See the full description on the dataset page: https://huggingface.co/datasets/sweatSmile/FinanceQA.
https://www.lseg.com/en/policies/website-disclaimer
Browse LSEG's US Company Filings Database, and find a range of filings content and history including annual reports, municipal bonds, and more.
https://www.lseg.com/en/policies/website-disclaimer
Company fundamentals data provides the user with a company's current financial health and when combined historically, the financial 'life-story' of the company.
The Corporate Financial Fraud project is a study of company and top-executive characteristics of firms that ultimately violated Securities and Exchange Commission (SEC) financial accounting and securities fraud provisions compared to a sample of public companies that did not. The fraud firm sample was identified through systematic review of SEC accounting enforcement releases from 2005-2010, which included administrative and civil actions, and referrals for criminal prosecution that were identified through mentions in enforcement releases, indictments, and news searches. The non-fraud firms were randomly selected from among nearly 10,000 US public companies censused and active during at least one year between 2005-2010 in Standard and Poor's Compustat data. The Company and Top-Executive (CEO) databases combine information from numerous publicly available sources, many in raw form that were hand-coded (e.g., for fraud firms: Accounting and Auditing Enforcement Releases (AAER), investigation summaries, SEC-filed complaints, litigation proceedings and case outcomes). Financial and structural information on companies for the year leading up to the financial fraud (or around year 2000 for non-fraud firms) was collected from Compustat financial statement data on Form 10-Ks, and supplemented by hand-collected data from original company 10-Ks, proxy statements, or other financial reports accessed via Electronic Data Gathering, Analysis, and Retrieval (EDGAR), SEC's data-gathering search tool. For CEOs, data on personal background characteristics were collected from Execucomp and BoardEx databases, supplemented by hand-collection from proxy-statement biographies.
https://www.usa.gov/government-works/
This dataset is from the SEC's Financial Statements and Notes Data Set.
It was a personal project to see if I could make the queries efficient.
It's just been collecting dust ever since, maybe someone will make good use of it.
Data is up to about early-2024.
It doesn't differ from the source, other than it's compiled - so maybe you can try it out, then compile your own (with the link below).
Dataset was created using SEC Files and SQL Server on Docker.
For details on the SQL Server database this came from, see: "dataset-previous-life-info" folder, which will contain:
- Row Counts
- Primary/Foreign Keys
- SQL Statements to recreate database tables
- Example queries on how to join the data tables.
- A pretty picture of the table associations.
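As an illustration of the kind of join mentioned in the list above, here is a small sketch using Python's built-in sqlite3 instead of SQL Server. It assumes the SEC's published layout, in which the `num` table links to the `sub` table through the accession number `adsh`; the actual tables in this dataset may carry more columns or different names, and the two data rows are invented purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sub (adsh TEXT PRIMARY KEY, cik INTEGER, name TEXT,
                      form TEXT, fy INTEGER);
    CREATE TABLE num (adsh TEXT REFERENCES sub(adsh), tag TEXT, ddate TEXT,
                      qtrs INTEGER, uom TEXT, value REAL);
    """
)

# Invented rows, only so the query below has something to return.
conn.execute(
    "INSERT INTO sub VALUES ('0000000000-24-000001', 1, 'EXAMPLE CO', '10-K', 2023)"
)
conn.executemany(
    "INSERT INTO num VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("0000000000-24-000001", "Assets", "20231231", 0, "USD", 500.0),
        ("0000000000-24-000001", "Revenues", "20231231", 4, "USD", 120.0),
    ],
)

# Join each numeric fact to its filing; qtrs = 0 selects point-in-time values.
rows = conn.execute(
    """
    SELECT s.name, n.tag, n.ddate, n.value
    FROM num AS n
    JOIN sub AS s ON s.adsh = n.adsh
    WHERE s.form = '10-K' AND n.tag = 'Assets' AND n.qtrs = 0
    ORDER BY n.ddate
    """
).fetchall()
```

The same SELECT should carry over to SQL Server with little change, but the recreate statements in the "dataset-previous-life-info" folder are the authority on the real schema.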
Source: https://www.sec.gov/data-research/financial-statement-notes-data-sets
Happy coding!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ArGiMI Ardian datasets: Text only
The ArGiMi project is committed to open-source principles and data sharing. Thanks to our generous partners, we are releasing several valuable datasets to the public.
Dataset description
This dataset comprises 11,000 financial annual reports, written in English, meticulously extracted from their original PDF format to provide a valuable resource for researchers and developers in financial analysis and natural language… See the full description on the dataset page: https://huggingface.co/datasets/artefactory/Argimi-Ardian-Finance-10k-text.