100+ datasets found

Financial Statements - Dataset - CRO
opendata.cro.ie
Updated Feb 13, 2025
Cite
opendata.cro.ie (2025). Financial Statements - Dataset - CRO [Dataset]. https://opendata.cro.ie/dataset/financial-statements
Explore at:
Dataset updated
Feb 13, 2025
Dataset provided by
Companies Registration Office
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset provides a structured and machine-readable collection of financial statements filed with the Companies Registration Office (CRO) in Ireland. It currently includes financial statements for the year 2022, with additional years to be added as they become available. The dataset aligns with the European Union’s Open Data Directive (Directive (EU) 2019/1024) and the Implementing Regulation (EU) 2023/138, which designates company and company ownership data as a high-value dataset. It is available for bulk download and API access under the Creative Commons Attribution 4.0 (CC BY 4.0) licence, allowing unrestricted reuse with appropriate attribution. By increasing transparency and enabling data-driven insights, this dataset supports public sector initiatives, financial analysis, and digital services development. The API endpoints are:

Query - https://opendata.cro.ie/api/3/action/datastore_search
Query (via SQL) - https://opendata.cro.ie/api/3/action/datastore_search_sql
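
As a rough illustration of the query endpoint listed above, the sketch below builds a request URL using only the Python standard library. The endpoint path matches the CKAN DataStore API, so the parameter names (resource_id, limit, offset, q) assume a standard CKAN instance. The resource ID is a placeholder rather than a real CRO resource, and the helper names build_search_url and fetch_records are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://opendata.cro.ie/api/3/action/datastore_search"

# Placeholder only: the real resource ID must be taken from the dataset page.
EXAMPLE_RESOURCE_ID = "00000000-0000-0000-0000-000000000000"


def build_search_url(resource_id, limit=5, offset=0, query=None):
    """Build a datastore_search URL (assumes standard CKAN parameters)."""
    params = {"resource_id": resource_id, "limit": limit, "offset": offset}
    if query is not None:
        params["q"] = query  # full-text search term
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_records(url, timeout=30):
    """Fetch one page of records; requires network access."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    # CKAN wraps results as {"success": ..., "result": {"records": [...]}}
    return payload["result"]["records"]


url = build_search_url(EXAMPLE_RESOURCE_ID, limit=10)
print(url)
```

Calling fetch_records(url) with a real resource ID would return one page of rows; paging through a full year of filings would mean increasing offset in steps of limit.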
Financial Statement Data Sets
catalog.data.gov
s.cnmilf.com
Updated Jul 9, 2025
Cite
Economic and Risk Analysis (2025). Financial Statement Data Sets [Dataset]. https://catalog.data.gov/dataset/financial-statement-data-sets
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
Economic and Risk Analysis
Description
The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
Financial Sheets Dataset
kaggle.com
Updated Nov 23, 2024
Cite
Prashant Kumar Mishra (2024). Financial Sheets Dataset [Dataset]. https://www.kaggle.com/datasets/pacificrm/financial-sheets
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Nov 23, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prashant Kumar Mishra
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset offers a detailed and organized set of financial data, enabling users to analyze company performance, conduct stock market research, and develop predictive models. It spans multiple financial aspects, such as annual and quarterly profit and loss statements, balance sheets, cash flow data, financial ratios, and market prices.

The data is structured to support time-series analysis, with datasets covering financial metrics at T0 (financial statements) and T1 (market prices).

This makes it particularly useful for applications requiring cross-temporal insights or forecasting.
financial-reports-sec
huggingface.co
Updated Sep 15, 2023
Cite
Aman Khan (2023). financial-reports-sec [Dataset]. https://huggingface.co/datasets/JanosAudran/financial-reports-sec
Explore at:
Dataset updated
Sep 15, 2023
Authors
Aman Khan
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The dataset contains the annual reports of US public firms filing with the SEC EDGAR system. Each annual report (10-K filing) is broken into 20 sections. Each section is split into individual sentences. Sentiment labels are provided on a per-filing basis from the market reaction around the filing date. Additional metadata for each filing is included in the dataset.
SEC Financial Statement Data Sets
kaggle.com
Updated Nov 12, 2024
Cite
Dominic Malouf (2024). SEC Financial Statement Data Sets [Dataset]. https://www.kaggle.com/datasets/dominicmalouf/sec-financial-statement-data-sets
Explore at:
Dataset updated
Nov 12, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dominic Malouf
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset contains information found in the 10-K annual reports filed by companies in the US. It comes from the SEC official website found here. I scraped the data in a Jupyter notebook and kept only a few of the important financial line items (there are 300+ for some 10-K reports). No 10-K/A amendments were taken into account, so some information could be incorrect. In other words, don't bet the farm on a trading model built with this data. The price data was collected from the yfinance Python API.
Consolidated Financial Statements for Bank Holding Companies, Parent Company...
catalog.data.gov
s.cnmilf.com
Updated Dec 18, 2024
Cite
Board of Governors of the Federal Reserve System (2024). Consolidated Financial Statements for Bank Holding Companies, Parent Company Only Financial Statements for Large Holding Companies, Parent Company Only Financial Statements for Small Holding Companies, Financial Statements Employee Stock Ownership Plan Holding Companies, Supplement to the Consolidated Financial Statements for Bank Holding Companies [Dataset]. https://catalog.data.gov/dataset/consolidated-financial-statements-for-bank-holding-companies-parent-company-only-financial
Explore at:
Dataset updated
Dec 18, 2024
Dataset provided by
Federal Reserve System (http://www.federalreserve.gov/)
Federal Reserve Board of Governors
Description
The Financial Statements of Holding Companies (FR Y-9 Reports) collects standardized financial statements from domestic holding companies (HCs). This is pursuant to the Bank Holding Company Act of 1956, as amended (BHC Act), and the Home Owners Loan Act (HOLA). The FR Y-9C is used to identify emerging financial risks and monitor the safety and soundness of HC operations. HCs file the FR Y-9C and FR Y-9LP quarterly, the FR Y-9SP semiannually, the FR Y-9ES annually, and the FR Y-9CS on a schedule that is determined when this supplement is used.
FinanceQA
huggingface.co
Updated Aug 21, 2025
Cite
amitk17 (2025). FinanceQA [Dataset]. https://huggingface.co/datasets/sweatSmile/FinanceQA
Explore at:
Dataset updated
Aug 21, 2025
Authors
amitk17
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
FinanceQA Dataset

📌 Overview

FinanceQA is a curated dataset of financial question-answer pairs extracted from company annual reports, balance sheets, and financial statements. It is designed to support Question Answering (QA), Retrieval-Augmented Generation (RAG), and other NLP applications in financial analysis. The dataset contains ~4,000 entries across multiple companies and years, with structured fields for queries, answers, and contextual excerpts.

📂… See the full description on the dataset page: https://huggingface.co/datasets/sweatSmile/FinanceQA.
S.Korea Financial statements datasets
aiceltech.com
Updated Dec 30, 2024
Cite
KED Aicel (2024). S.Korea Financial statements datasets [Dataset]. https://www.aiceltech.com/datasets/financial-statements
Explore at:
Dataset updated
Dec 30, 2024
Dataset authored and provided by
KED Aicel
License
https://www.aiceltech.com/terms
Time period covered
2016 - 2024
Area covered
South Korea
Description
Korean Companies’ Financial Data provides important information to analyze a company’s financial status and performance. This data includes financial indicators such as revenue, expenses, assets, and liabilities. Collected from corporate financial reports and stock market data, it helps investors evaluate financial health and discover investment opportunities, essential for valuing Korean companies.
Company Financial Data | Private & Public Companies | Verified Profiles &...
datarade.ai
Cite
Success.ai, Company Financial Data | Private & Public Companies | Verified Profiles & Contact Data | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/b2b-contact-data-premium-us-contact-data-us-b2b-contact-d-success-ai
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset provided by
Area covered
United Kingdom, Dominican Republic, Suriname, Antigua and Barbuda, Togo, Guam, Korea (Democratic People's Republic of), Montserrat, Iceland, Georgia
Description
Success.ai offers a cutting-edge solution for businesses and organizations seeking Company Financial Data on private and public companies. Our comprehensive database is meticulously crafted to provide verified profiles, including contact details for financial decision-makers such as CFOs, financial analysts, corporate treasurers, and other key stakeholders. This robust dataset is continuously updated and validated using AI technology to ensure accuracy and relevance, empowering businesses to make informed decisions and optimize their financial strategies.

Key Features of Success.ai's Company Financial Data:

Global Coverage: Access data from over 70 million businesses worldwide, including public and private companies across all major industries and regions. Our datasets span 250+ countries, offering extensive reach for your financial analysis and market research.

Detailed Financial Profiles: Gain insights into company financials, including revenue, profit margins, funding rounds, and operational costs. Profiles are enriched with key contact details, including work emails, phone numbers, and physical addresses, ensuring direct access to decision-makers.

Industry-Specific Data: Tailored datasets for sectors such as financial services, manufacturing, technology, healthcare, and energy, among others. Each dataset is customized to meet the unique needs of industry professionals and analysts.

Real-Time Accuracy: With continuous updates powered by AI-driven validation, our financial data maintains a 99% accuracy rate, ensuring you have access to the most reliable and up-to-date information available.

Compliance and Security: All data is collected and processed in strict adherence to global compliance standards, including GDPR, ensuring ethical and lawful usage.

Why Choose Success.ai for Company Financial Data?

Best Price Guarantee: We pride ourselves on offering the most competitive pricing in the industry, ensuring you receive unparalleled value for comprehensive financial data.

AI-Validated Accuracy: Our advanced AI algorithms meticulously verify every data point to ensure precision and reliability, helping you avoid costly errors in your financial decision-making.

Customized Data Solutions: Whether you need data for a specific region, industry, or type of business, we tailor our datasets to align perfectly with your requirements.

Scalable Data Access: From small startups to global enterprises, our platform caters to businesses of all sizes, delivering scalable solutions to suit your operational needs.

Comprehensive Use Cases for Financial Data:

Strategic Financial Planning:

Leverage our detailed financial profiles to create accurate budgets, forecasts, and strategic plans. Gain insights into competitors’ financial health and market positions to make data-driven decisions.

Mergers and Acquisitions (M&A):

Access key financial details and contact information to streamline your M&A processes. Identify potential acquisition targets or partners with verified profiles and financial data.

Investment Analysis:

Evaluate the financial performance of public and private companies for informed investment decisions. Use our data to identify growth opportunities and assess risk factors.

Lead Generation and Sales:

Enhance your sales outreach by targeting CFOs, financial analysts, and other decision-makers with verified contact details. Utilize accurate email and phone data to increase conversion rates.

Market Research:

Understand market trends and financial benchmarks with our industry-specific datasets. Use the data for competitive analysis, benchmarking, and identifying market gaps.

APIs to Power Your Financial Strategies:

Enrichment API: Integrate real-time updates into your systems with our Enrichment API. Keep your financial data accurate and current to drive dynamic decision-making and maintain a competitive edge.

Lead Generation API: Supercharge your lead generation efforts with access to verified contact details for key financial decision-makers. Perfect for personalized outreach and targeted campaigns.

Tailored Solutions for Industry Professionals:

Financial Services Firms: Gain detailed insights into revenue streams, funding rounds, and operational costs for competitor analysis and client acquisition.

Corporate Finance Teams: Enhance decision-making with precise data on industry trends and benchmarks.

Consulting Firms: Deliver informed recommendations to clients with access to detailed financial datasets and key stakeholder profiles.

Investment Firms: Identify potential investment opportunities with verified data on financial performance and market positioning.

What Sets Success.ai Apart?

Extensive Database: Access detailed financial data for 70M+ companies worldwide, including small businesses, startups, and large corporations.

Ethical Practices: Our data collection and processing methods are fully comp...
Data from: SEC Filings
kaggle.com
zip
Updated Jun 5, 2020
Cite
Google BigQuery (2020). SEC Filings [Dataset]. https://www.kaggle.com/datasets/bigquery/sec-filings
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Jun 5, 2020
Dataset provided by
BigQuery (https://cloud.google.com/bigquery)
Authors
Google BigQuery
Description
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via their Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter going back to January 2009. For more information please see this site.

To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables and enables a more streamlined user experience.

DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.
S&P Compustat Database
lseg.com
sql
Updated Nov 25, 2024
Cite
LSEG (2024). S&P Compustat Database [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/company-data/fundamentals-data/standardized-fundamentals/sp-compustat-database
Explore at:
Available download formats: sql
Dataset updated
Nov 25, 2024
Dataset provided by
London Stock Exchange Group (http://www.londonstockexchangegroup.com/)
Authors
LSEG
License
https://www.lseg.com/en/policies/website-disclaimer
Description
Access historical and point-in-time financial statements, ratios, multiples, and press releases, with LSEG's S&P Compustat Database.
CompanyData.com (BoldData) - Historical Financial Data For 230M Companies...
datarade.ai
Updated Apr 15, 2021
Cite
CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) - Historical Financial Data For 230M Companies Worldwide [Dataset]. https://datarade.ai/data-products/custom-made-historical-financial-data-for-230m-companies-worldwide-bolddata
Explore at:
Available download formats: .json, .csv, .xls, .sql, .txt
Dataset updated
Apr 15, 2021
Dataset authored and provided by
CompanyData.com (BoldData)
Area covered
Ascension and Tristan da Cunha, Slovakia, Turkey, Algeria, Angola, Russian Federation, French Polynesia, Cook Islands, Solomon Islands, Tonga
Description
At CompanyData.com (BoldData), we specialize in delivering high-quality company data sourced directly from official trade registers. Our extensive dataset includes historical financial records for over 230 million companies worldwide, enabling deeper insight into business performance over time. Whether you're benchmarking companies, training AI models, or building risk profiles, our financial data equips you with the long-term perspective you need.

Our financial database includes multi-year balance sheets, profit and loss statements, and key performance indicators such as revenue, net income, assets, liabilities, and equity. We provide standardized and structured data—backed by rigorous validation processes—to ensure consistency and accuracy across jurisdictions. Each financial profile can be enriched with hierarchical data, firmographics, contact details, and industry classifications to support complex analyses.

This historical financial data supports a wide range of use cases including KYC and AML compliance, credit risk assessment, M&A research, financial modeling, competitive benchmarking, AI/ML training, and market segmentation. Whether you’re building a predictive scoring model or assessing long-term financial health, our data gives you the clarity and depth required for smarter decisions.

Delivery is flexible to suit your needs: access files in Excel or CSV, browse through our self-service platform, integrate via real-time API, or enhance your existing datasets through custom enrichment services. With access to 380 million verified companies across all industries and geographies, CompanyData.com (BoldData) provides the scale, precision, and historical context to power your next move—globally.
Taiwan Finance Report VS Stock (台灣財報與股票)
kaggle.com
Updated Dec 27, 2023
Cite
Othern (2023). Taiwan Finance Report VS Stock (台灣財報與股票) [Dataset]. https://www.kaggle.com/datasets/othern/mda-final-project
Explore at:
Dataset updated
Dec 27, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Othern
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Area covered
Taiwan
Description
English Description

Kaggle Dataset Data Card Update

New Datasets Added:

training_data.csv: This dataset consists of key financial metrics such as ROE, turnover rate, and R&D investment. The target variables (y) include SeasonReturn(%), TwoMonthReturn(%), MonthReturn(%), and WeekReturn(%). It is designed to facilitate the analysis of financial performance and short-term returns.

training_data2.csv: Features last year's and last quarter's financial statement metrics, with the target variable (y) being this quarter's NetOperatingRevenue. This dataset is particularly useful for year-over-year and quarter-over-quarter financial analysis.

semiconductor.csv: A specialized dataset focusing on Taiwan's key semiconductor industry. It categorizes companies into upstream, midstream, and downstream sectors, providing a comprehensive overview of this critical industry segment.

Existing Datasets:

fsdata2.csv: A comprehensive dataset covering the financial metrics of various Taiwanese companies. It includes detailed financial statements with 146 columns featuring cash flow, asset valuation, and other key financial indicators. The dataset spans 2010-03 ~ 2023-09, providing a deep dive into the financial health and performance of companies listed in Taiwan.

spdata.csv: This dataset captures daily stock market data for Taiwanese companies. It includes opening, highest, lowest, and closing prices, along with trading volume and value. The dataset offers insights into the stock market trends and performance of key players in the Taiwanese market over 2010-01-04 ~ 2023-11-28.

Language Note:

The datasets are primarily in Traditional Chinese with some financial terms in English, suitable for analysts and researchers focusing on the Taiwanese market.
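
The layout described for training_data2.csv, where last year's and last quarter's metrics are paired with this quarter's NetOperatingRevenue, can be sketched as a simple lag construction. The quarterly figures, the helper name build_lagged_rows, and the feature names last_quarter and last_year are invented for illustration and do not come from the dataset; only the target name NetOperatingRevenue is taken from the description.

```python
def build_lagged_rows(revenue_by_quarter):
    """Pair each quarter's value with the previous quarter and the same quarter last year.

    revenue_by_quarter: list of (period, value) tuples in chronological order,
    one entry per quarter with no gaps (an assumption of this sketch).
    """
    rows = []
    # Start at index 4 so that a value four quarters back always exists.
    for i in range(4, len(revenue_by_quarter)):
        period, target = revenue_by_quarter[i]
        rows.append({
            "period": period,
            "last_quarter": revenue_by_quarter[i - 1][1],
            "last_year": revenue_by_quarter[i - 4][1],
            "NetOperatingRevenue": target,  # target variable (y)
        })
    return rows


# Made-up quarterly revenue for one hypothetical company.
series = [
    ("2022Q1", 100.0), ("2022Q2", 110.0), ("2022Q3", 120.0), ("2022Q4", 130.0),
    ("2023Q1", 105.0), ("2023Q2", 121.0),
]

for row in build_lagged_rows(series):
    qoq = row["NetOperatingRevenue"] / row["last_quarter"] - 1
    yoy = row["NetOperatingRevenue"] / row["last_year"] - 1
    print(row["period"], f"QoQ={qoq:+.1%}", f"YoY={yoy:+.1%}")
```

The first four quarters produce no rows because no year-ago value exists for them, which is one reason a lagged training table is shorter than the underlying statement history.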

Financial Statements of Foreign Subsidiaries of U.S. Banking Organizations
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
datasets.ai
+2more
Updated Dec 18, 2024
Cite
Board of Governors of the Federal Reserve System (2024). Financial Statements of Foreign Subsidiaries of U.S. Banking Organizations [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/financial-statements-of-foreign-subsidiaries-of-u-s-banking-organizations
Explore at:
Dataset updated
Dec 18, 2024
Dataset provided by
Federal Reserve System (http://www.federalreserve.gov/)
Federal Reserve Board of Governors
Description
These reports collect selected financial information for direct or indirect foreign subsidiaries of U.S. state member banks (SMBs), Edge and agreement corporations, and bank holding companies (BHCs). The FR 2314 consists of a balance sheet and income statement; information on changes in equity capital, changes in the allowance for loan and lease losses, off-balance-sheet items, and loans; and a memoranda section. The FR 2314S collects four financial data items for smaller, less complex subsidiaries. (Note: The Report of Condition for Foreign Subsidiaries of U.S. Banking Organizations, FR 2314a and FR 2314c, have been replaced by the FR 2314 and FR 2314S, and the FR 2314b has been discontinued.)
Financial Report Data of 437 Company in Indonesia
dataverse.telkomuniversity.ac.id
tsv
Updated Apr 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Telkom University Dataverse (2024). Financial Report Data of 437 Company in Indonesia [Dataset]. http://doi.org/10.34820/FK2/ZT2PEC
Explore at:
Available download formats: tsv (28773)
Unique identifier
https://doi.org/10.34820/FK2/ZT2PEC
Dataset updated
Apr 6, 2024
Dataset provided by
Telkom University Dataverse
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Dataset Financial Report of 437 Company in Indonesia
Financial Statement Data Sets Archive | gimi9.com
gimi9.com
Cite
Financial Statement Data Sets Archive | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_financial-statement-data-sets-archive/
Explore at:
Description
The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
Public sector company financial statements - Dataset - Publications |...
publications.qld.gov.au
Updated Jun 16, 2025
Cite
www.publications.qld.gov.au (2025). Public sector company financial statements - Dataset - Publications | Queensland Government [Dataset]. https://www.publications.qld.gov.au/dataset/public-sector-company-financial-statements
Explore at:
Dataset updated
Jun 16, 2025
Dataset provided by
Queensland Government (http://qld.gov.au/)
Area covered
Queensland Government, Queensland
Description
Under the Company Financial Reporting in the Queensland Public Sector policy, public sector companies without their own websites must publish their statements on the site of their controlling entity. The following financial statements are for public sector companies controlled by Queensland Treasury.
Data from: Russian Financial Statements Database: A firm-level collection of...
data.niaid.nih.gov
Updated Mar 14, 2025
Cite
Ledenev, Victor (2025). Russian Financial Statements Database: A firm-level collection of the universe of financial statements [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14622208
Explore at:
Dataset updated
Mar 14, 2025
Dataset provided by
Ledenev, Victor
Bondarkov, Sergey
Skougarevskiy, Dmitriy
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Russia
Description
The Russian Financial Statements Database (RFSD) is an open, harmonized collection of annual unconsolidated financial statements of the universe of Russian firms:

🔓 First open data set with information on every active firm in Russia.

🗂️ First open financial statements data set that includes non-filing firms.

🏛️ Sourced from two official data providers: the Rosstat and the Federal Tax Service.

📅 Covers 2011-2023 initially, will be continuously updated.

🏗️ Restores as much data as possible through non-invasive data imputation, statement articulation, and harmonization.

The RFSD is hosted on 🤗 Hugging Face and Zenodo and is stored in a structured, column-oriented, compressed binary format Apache Parquet with yearly partitioning scheme, enabling end-users to query only variables of interest at scale.

The accompanying paper provides internal and external validation of the data: http://arxiv.org/abs/2501.05841.

Here we present the instructions for importing the data into an R or Python environment. Please consult the project repository for more information: http://github.com/irlcode/RFSD.

Importing The Data

You have two options to ingest the data: download the .parquet files manually from Hugging Face or Zenodo, or rely on the 🤗 Hugging Face Datasets library.

Python

🤗 Hugging Face Datasets

It is as easy as:

from datasets import load_dataset
import polars as pl

# This line will download 6.6GB+ of all RFSD data and store it in a 🤗 cache folder
RFSD = load_dataset('irlspbru/RFSD')

# Alternatively, this will download ~540MB with all financial statements for 2023
# to a Polars DataFrame (requires about 8GB of RAM)
RFSD_2023 = pl.read_parquet('hf://datasets/irlspbru/RFSD/RFSD/year=2023/*.parquet')

Please note that the data is not shuffled within year, meaning that streaming first n rows will not yield a random sample.
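
Because rows are not shuffled within a year, taking the first n rows of a stream gives a biased sample. One common workaround is reservoir sampling, which draws a uniform random sample from a stream of unknown length in a single pass. The sketch below is a generic illustration (Algorithm R) and is not part of the RFSD tooling; a stream of integers stands in for real rows.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace an existing item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir


# Stand-in for a stream of rows ordered by some key rather than shuffled.
rows = iter(range(100_000))
sample = reservoir_sample(rows, k=5, seed=42)
print(sample)
```

The trade-off is that the whole stream still has to be read once; the memory footprint, however, stays at k rows.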

Local File Import

Importing in Python requires the pyarrow package to be installed.

import pyarrow.dataset as ds
import polars as pl

# Read RFSD metadata from local file
RFSD = ds.dataset("local/path/to/RFSD")

# Use RFSD.schema to glimpse the data structure and columns' classes
print(RFSD.schema)

# Load full dataset into memory
RFSD_full = pl.from_arrow(RFSD.to_table())

# Load only 2019 data into memory
RFSD_2019 = pl.from_arrow(RFSD.to_table(filter=ds.field('year') == 2019))

# Load only revenue for firms in 2019, identified by taxpayer id
RFSD_2019_revenue = pl.from_arrow(
    RFSD.to_table(
        filter=ds.field('year') == 2019,
        columns=['inn', 'line_2110']
    )
)

# Give suggested descriptive names to variables
renaming_df = pl.read_csv('local/path/to/descriptive_names_dict.csv')
RFSD_full = RFSD_full.rename({item[0]: item[1] for item in zip(renaming_df['original'], renaming_df['descriptive'])})

R

Local File Import

Importing in R requires the arrow package to be installed.

library(arrow)
library(data.table)

# Read RFSD metadata from local file
RFSD <- open_dataset("local/path/to/RFSD")

# Use schema() to glimpse into the data structure and column classes
schema(RFSD)

# Load full dataset into memory
scanner <- Scanner$create(RFSD)
RFSD_full <- as.data.table(scanner$ToTable())

# Load only 2019 data into memory
scan_builder <- RFSD$NewScan()
scan_builder$Filter(Expression$field_ref("year") == 2019)
scanner <- scan_builder$Finish()
RFSD_2019 <- as.data.table(scanner$ToTable())

# Load only revenue for firms in 2019, identified by taxpayer id
scan_builder <- RFSD$NewScan()
scan_builder$Filter(Expression$field_ref("year") == 2019)
scan_builder$Project(cols = c("inn", "line_2110"))
scanner <- scan_builder$Finish()
RFSD_2019_revenue <- as.data.table(scanner$ToTable())

# Give suggested descriptive names to variables
renaming_dt <- fread("local/path/to/descriptive_names_dict.csv")
setnames(RFSD_full, old = renaming_dt$original, new = renaming_dt$descriptive)

Use Cases

🌍 For macroeconomists: Replication of a Bank of Russia study of the cost channel of monetary policy in Russia by Mogiliat et al. (2024) — interest_payments.md

🏭 For IO: Replication of the total factor productivity estimation by Kaukin and Zhemkova (2023) — tfp.md

🗺️ For economic geographers: A novel model-less house-level GDP spatialization that capitalizes on geocoding of firm addresses — spatialization.md

FAQ

Why should I use this data instead of Interfax's SPARK, Moody's Ruslana, or Kontur's Focus?

To the best of our knowledge, the RFSD is the only open data set with up-to-date financial statements of Russian companies published under a permissive licence. Apart from being free-to-use, the RFSD benefits from data harmonization and error detection procedures unavailable in commercial sources. Finally, the data can be easily ingested in any statistical package with minimal effort.

What is the data period?

We provide financials for Russian firms in 2011-2023. We will add the data for 2024 by July, 2025 (see Version and Update Policy below).

Why are there no data for firm X in year Y?

Although the RFSD strives to be an all-encompassing database of financial statements, end users will encounter data gaps:

We do not include financials for firms that we considered ineligible to submit financial statements to the Rosstat/Federal Tax Service by law: financial, religious, or state organizations (state-owned commercial firms are still in the data).

Eligible firms may enjoy the right not to disclose under certain conditions. For instance, Gazprom did not file in 2022 and we had to impute its 2022 data from 2023 filings. Sibur filed only in 2023, and Novatek only in 2020 and 2021. Commercial data providers such as Interfax's SPARK enjoy dedicated access to the Federal Tax Service data and therefore are able to source this information elsewhere.

A firm may have submitted its annual statement but, according to the Uniform State Register of Legal Entities (EGRUL), it was not active in that year. We remove those filings.

Why is the geolocation of firm X incorrect?

We use Nominatim to geocode structured addresses of incorporation of legal entities from the EGRUL. There may be errors in the original addresses that prevent us from geocoding firms to a particular house. Gazprom, for instance, is geocoded to house level in 2014 and 2021-2023, but only to street level in 2015-2020 due to improper handling of the house number by Nominatim. In such cases we have fallen back to street-level geocoding. Additionally, streets in different districts of one city may share identical names. We have ignored those problems in our geocoding and invite your submissions. Finally, the address of incorporation may not correspond to plant locations. For instance, Rosneft has 62 field offices in addition to the central office in Moscow. We ignore the location of such offices in our geocoding, but subsidiaries set up as separate legal entities are still geocoded.
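The house-to-street fallback described above can be sketched as follows. This is an illustration only, not the authors' pipeline: the function name `geocode_with_fallback`, the query string layout, and the `geocode` callable (standing in for a Nominatim lookup) are all assumptions, and the addresses and coordinates in the usage note are made up.

```python
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]
# Hypothetical geocoder interface: takes a query string, returns coordinates or None.
Geocoder = Callable[[str], Optional[Coordinates]]


def geocode_with_fallback(
    city: str, street: str, house: Optional[str], geocode: Geocoder
) -> Tuple[Optional[Coordinates], str]:
    """Try a house-level query first; if it fails, retry at street level.

    Returns the coordinates (or None) and the precision that was achieved:
    "house", "street", or "none".
    """
    if house:
        hit = geocode(f"{house} {street}, {city}")
        if hit is not None:
            return hit, "house"
    # House number missing or not resolved: fall back to the street.
    hit = geocode(f"{street}, {city}")
    if hit is not None:
        return hit, "street"
    return None, "none"
```

With a dictionary lookup standing in for the geocoder, `geocode_with_fallback("Springfield", "Main Street", "99", fake.get)` returns street-level coordinates whenever only `"Main Street, Springfield"` is a known key. Recording the achieved precision next to the coordinates lets users drop street-level matches when house-level accuracy matters.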

Why is the data for firm X different from https://bo.nalog.ru/?

Many firms submit correcting statements after the initial filing. Although we downloaded the data well past the April 2024 deadline for 2023 filings, firms may have continued to submit correcting statements afterwards. We will capture them in future releases.

Why is the data for firm X unrealistic?

We provide the source data as is, with minimal changes. Consider a relatively unknown LLC Banknota. It reported 3.7 trillion rubles in revenue in 2023, or 2% of Russia's GDP. This is obviously an outlier firm with unrealistic financials. We manually reviewed the data and flagged such firms for user consideration (variable outlier), keeping the source data intact.

Why is the data for groups of companies different from their IFRS statements?

We should stress that we provide unconsolidated financial statements filed according to the Russian accounting standards, meaning that it would be wrong to infer financials for corporate groups with this data. Gazprom, for instance, had over 800 affiliated entities and to study this corporate group in its entirety it is not enough to consider financials of the parent company.

Why is the data not in CSV?

The data is provided in Apache Parquet format. This is a structured, column-oriented, compressed binary format allowing for conditional subsetting of columns and rows. In other words, you can easily query financials of companies of interest, keeping only variables of interest in memory, greatly reducing data footprint.

Version and Update Policy

Version (SemVer): 1.0.0.

We intend to update the RFSD annually as the data becomes available, in other words when most of the firms have filed their statements with the Federal Tax Service. The official deadline for filing the previous year's statements is April 1. However, every year a portion of firms either fails to meet the deadline or submits corrections afterwards. Filing continues up to the very end of the year, but after the end of April this stream quickly thins out. Nevertheless, there is an obvious trade-off between data completeness and version availability. We find it a reasonable compromise to query new data in early June, since on average 96.7% of statements are already filed by the end of May, including 86.4% of all the correcting filings. We plan to make a new version of the RFSD available by July.

Licence

Creative Commons License Attribution 4.0 International (CC BY 4.0).

Copyright © the respective contributors.

Citation

Please cite as:

@unpublished{bondarkov2025rfsd,
  title={{R}ussian {F}inancial {S}tatements {D}atabase},
  author={Bondarkov, Sergey and Ledenev, Victor and Skougarevskiy, Dmitriy},
  note={arXiv preprint arXiv:2501.05841},
  doi={https://doi.org/10.48550/arXiv.2501.05841},
  year={2025}
}

Acknowledgments and Contacts

Data collection and processing: Sergey Bondarkov, sbondarkov@eu.spb.ru, Viktor Ledenev, vledenev@eu.spb.ru

Project conception, data validation, and use cases: Dmitriy Skougarevskiy, Ph.D.,
21st Century Corporate Financial Fraud, United States, 2005-2010
catalog.data.gov
Updated Mar 12, 2025
Cite
National Institute of Justice (2025). 21st Century Corporate Financial Fraud, United States, 2005-2010 [Dataset]. https://catalog.data.gov/dataset/21st-century-corporate-financial-fraud-united-states-2005-2010-22a9e
Explore at:
Dataset updated
Mar 12, 2025
Dataset provided by
National Institute of Justice (http://nij.ojp.gov/)
Area covered
United States
Description
The Corporate Financial Fraud project is a study of company and top-executive characteristics of firms that ultimately violated Securities and Exchange Commission (SEC) financial accounting and securities fraud provisions compared to a sample of public companies that did not. The fraud firm sample was identified through systematic review of SEC accounting enforcement releases from 2005-2010, which included administrative and civil actions, and referrals for criminal prosecution that were identified through mentions in enforcement release, indictments, and news searches. The non-fraud firms were randomly selected from among nearly 10,000 US public companies censused and active during at least one year between 2005-2010 in Standard and Poor's Compustat data. The Company and Top-Executive (CEO) databases combine information from numerous publicly available sources, many in raw form that were hand-coded (e.g., for fraud firms: Accounting and Auditing Enforcement Releases (AAER) enforcement releases, investigation summaries, SEC-filed complaints, litigation proceedings and case outcomes). Financial and structural information on companies for the year leading up to the financial fraud (or around year 2000 for non-fraud firms) was collected from Compustat financial statement data on Form 10-Ks, and supplemented by hand-collected data from original company 10-Ks, proxy statements, or other financial reports accessed via Electronic Data Gathering, Analysis, and Retrieval (EDGAR), SEC's data-gathering search tool. For CEOs, data on personal background characteristics were collected from Execucomp and BoardEx databases, supplemented by hand-collection from proxy-statement biographies.
EDGAR XBRL
stanfordgsb.redivis.com
stanford.redivis.com
Updated Sep 9, 2025
Cite
Stanford Graduate School of Business Library (2025). EDGAR XBRL [Dataset]. https://stanfordgsb.redivis.com/datasets/6rpv-9nmqw5tg2
Explore at:
Available download formats: application/jsonl, spss, sas, stata, parquet, arrow, csv, avro
Dataset updated
Sep 9, 2025
Dataset provided by
Redivis Inc.
Authors
Stanford Graduate School of Business Library
Time period covered
Apr 15, 2009 - Jul 31, 2025
Description
Abstract

This dataset is a mirror of the Financial Statement and Notes Data Set (https://www.sec.gov/dera/data/financial-statement-and-notes-data-set.html) hosted by the SEC and is updated monthly.

Methodology

From this page:

> The Financial Statement and Notes Data Sets provide the text and detailed numeric information from all financial statements and their notes. This data is extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL). As compared to the more compact Financial Statement Data Sets which provide only the numeric information from face financials, the Financial Statement and Notes Data Sets provide significantly more disclosure data. The information is presented without change from the "as filed" financial reports submitted by each registrant. The data is presented in a flattened format to help users analyze and compare corporate disclosure information over time and across registrants. The data sets also contain additional fields such as a company's Standard Industrial Classification to facilitate the data's use.

> DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.

Once a month, the second-to-latest dump of data is downloaded from the page (e.g., the August 2022 dump is downloaded in October 2022), and its tables are extracted and appended to the existing ones in this Redivis dataset.
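The two-month lag in this schedule can be made concrete with a small helper. This is only a sketch of the date arithmetic implied by the example above; the function name `dump_month_to_fetch` and the `lag_months` parameter are assumptions, not part of the Redivis or SEC tooling.

```python
from datetime import date
from typing import Tuple


def dump_month_to_fetch(today: date, lag_months: int = 2) -> Tuple[int, int]:
    """Return (year, month) of the dump that lags `today` by `lag_months` months."""
    # Count months since year 0 so that year boundaries are handled uniformly.
    index = today.year * 12 + (today.month - 1) - lag_months
    return index // 12, index % 12 + 1
```

For a run in October 2022, `dump_month_to_fetch(date(2022, 10, 15))` gives `(2022, 8)`, matching the August 2022 example; a run in January 2023 would give `(2022, 11)`.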

Usage

Please refer to this documentation file created by the SEC, which provides documentation of scope, organization, file formats and table definitions.

