100+ datasets found

Financial Statement Data Sets
catalog.data.gov
s.cnmilf.com
Updated Jul 9, 2025
+ more versions
Cite
Economic and Risk Analysis (2025). Financial Statement Data Sets [Dataset]. https://catalog.data.gov/dataset/financial-statement-data-sets
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
Economic and Risk Analysis
Description
The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).
Historical financial datasets for Financial Analysis with Spyder workshop
figshare.com
txt
Updated Jul 16, 2021
Cite
Spyder IDE (2021). Historical financial datasets for Financial Analysis with Spyder workshop [Dataset]. http://doi.org/10.6084/m9.figshare.14995215.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.14995215.v1
Dataset updated
Jul 16, 2021
Dataset provided by
Figshare (http://figshare.com/)
Authors
Spyder IDE
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These three datasets provide closing price information for the following assets: Google, Apple, Microsoft, Netflix, Amazon, Pfizer, AstraZeneca, Johnson & Johnson, ETH, BTC and LTC. The time period spans from 2012 to the end of 2020.
financial sentiment analysis dataset
kaggle.com
Updated Nov 17, 2022
Cite
Ujjwal Chowdhury (2022). financial sentiment analysis dataset [Dataset]. https://www.kaggle.com/datasets/ujjwalchowdhury/financial-sentiment-analysis-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 17, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ujjwal Chowdhury
Description
Dataset

This dataset was created by Ujjwal Chowdhury

Contents
financial-qa-dataset
huggingface.co
Cite
Aditya Rane, financial-qa-dataset [Dataset]. https://huggingface.co/datasets/adityarane/financial-qa-dataset
Explore at:
Croissant
Authors
Aditya Rane
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
financial-qa-dataset

This dataset consists of question-answer-context pairs. It also includes metadata for filtering the records.

Repo Structure

financial-qa-dataset
├── financial-qa-dataset.csv
├── metadata.csv
├── notebooks
│   ├── loading_dataset.ipynb
│   ├── Loading_dataset_huggingface.ipynb
│   ├── basic_rag_langchain_vertexai.ipynb
│   ├── basic_rag_with_evaluation.ipynb
├── data
    ├── Statements
    ├── Reports…

See the full description on the dataset page: https://huggingface.co/datasets/adityarane/financial-qa-dataset.
Financial_Risk
kaggle.com
Updated Jul 23, 2024
Cite
Preetham Gouda (2024). Financial_Risk [Dataset]. https://www.kaggle.com/datasets/preethamgouda/financial-risk
Explore at:
Croissant
Dataset updated
Jul 23, 2024
Dataset provided by
Kaggle
Authors
Preetham Gouda
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
The Financial Risk Assessment Dataset provides detailed information on individual financial profiles. It includes demographic, financial, and behavioral data to assess financial risk. The dataset features various columns such as income, credit score, and risk rating, with intentional imbalances and missing values to simulate real-world scenarios.
Data from: Company Financials Dataset
kaggle.com
Updated Aug 1, 2023
Cite
Atharva Arya (2023). Company Financials Dataset [Dataset]. https://www.kaggle.com/datasets/atharvaarya25/financials
Explore at:
Croissant
Dataset updated
Aug 1, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atharva Arya
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset requires a lot of preprocessing and offers useful EDA insights for a company. It consists of sales and profit data sorted by market segment and country/region.

Tips for pre-processing:
1. Check the column names and fix any errors there first.
2. Remove the '$' sign and '-' from all columns where they are present.
3. After the two steps above, change the datatype from object to int.
4. Challenge: try removing ',' (comma) from all numerical values.
5. Try plotting sales and profit against the timeline.
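The first four tips can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's method: the sample rows are hypothetical, and the handling of a bare '-' as zero and of parentheses as a negative sign are assumptions based on common accounting exports.

```python
import csv
import io


def clean_header(name: str) -> str:
    """Strip stray whitespace from a column name (tip 1)."""
    return name.strip()


def parse_amount(raw: str) -> int:
    """Convert a currency-formatted string to an integer (tips 2-4).

    Assumptions: a bare '-' stands for zero, and parentheses mark a
    negative value.
    """
    text = raw.strip().replace("$", "").replace(",", "").strip()
    if text in ("", "-"):
        return 0
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1].strip()
    value = int(round(float(text)))
    return -value if negative else value


# Hypothetical sample in the style the tips describe; not taken from the dataset.
SAMPLE = """ Segment ,Country, Sales , Profit
Government,Canada," $32,370.00 "," $16,185.00 "
Midmarket,France," $-   "," $(4,533.75)"
"""


def load_rows(text: str) -> list:
    """Parse CSV text, clean the header, and convert the money columns."""
    reader = csv.reader(io.StringIO(text))
    header = [clean_header(h) for h in next(reader)]
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        row = dict(zip(header, (cell.strip() for cell in record)))
        for column in ("Sales", "Profit"):
            row[column] = parse_amount(row[column])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
```

With the values stored as integers, tip 5 (plotting sales and profit over time) becomes a matter of grouping rows by date and passing the totals to any plotting library.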
Financial Statements of Major Companies(2009-2023)
kaggle.com
Updated Dec 1, 2023
Cite
Rishabh Patil (2023). Financial Statements of Major Companies(2009-2023) [Dataset]. https://www.kaggle.com/datasets/rish59/financial-statements-of-major-companies2009-2023
Explore at:
Croissant
Dataset updated
Dec 1, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rishabh Patil
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This is a compiled dataset comprising data from various companies' 10-K annual reports and balance sheets. The data is longitudinal (panel) data covering 2009-2022(/23) and also includes a few bankrupt companies to help investigate contributing factors. The companies are named according to their stocks and are divided into specific categories.
Dataset .csv
figshare.com
Updated Oct 2, 2021
Cite
Roy Lee (2021). Dataset .csv [Dataset]. http://doi.org/10.6084/m9.figshare.14870139.v1
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.14870139.v1
Dataset updated
Oct 2, 2021
Dataset provided by
figshare
Authors
Roy Lee
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset accompanies the paper Improved Back Test of Magic Formula in Malaysian Stock Market Using Applied Programming and Online Quantitative Platform. It includes the monthly returns for all portfolios, along with return and risk statistics.
finance-alpaca-1k-train
huggingface.co
Updated Apr 26, 2024
Cite
Poornima SS (2024). finance-alpaca-1k-train [Dataset]. https://huggingface.co/datasets/poornima9348/finance-alpaca-1k-train
Explore at:
Croissant
Dataset updated
Apr 26, 2024
Authors
Poornima SS
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
These are the first 1k rows of the finance alpaca dataset in CSV format. You may use them as training data. Another pruned dataset with the next 1k rows is uploaded under poornima9348/finance-alpaca-1k-test.
CompanyData.com (BoldData) - Historical Financial Data For 230M Companies...
datarade.ai
Updated Apr 15, 2021
Cite
CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) - Historical Financial Data For 230M Companies Worldwide [Dataset]. https://datarade.ai/data-products/custom-made-historical-financial-data-for-230m-companies-worldwide-bolddata
Explore at:
Available download formats: .json, .csv, .xls, .sql, .txt
Dataset updated
Apr 15, 2021
Dataset authored and provided by
CompanyData.com (BoldData)
Area covered
Ascension and Tristan da Cunha, Slovakia, Algeria, Russian Federation, Angola, Turkey, Tonga, French Polynesia, Cook Islands, Solomon Islands
Description
At CompanyData.com (BoldData), we specialize in delivering high-quality company data sourced directly from official trade registers. Our extensive dataset includes historical financial records for over 230 million companies worldwide, enabling deeper insight into business performance over time. Whether you're benchmarking companies, training AI models, or building risk profiles, our financial data equips you with the long-term perspective you need.

Our financial database includes multi-year balance sheets, profit and loss statements, and key performance indicators such as revenue, net income, assets, liabilities, and equity. We provide standardized and structured data—backed by rigorous validation processes—to ensure consistency and accuracy across jurisdictions. Each financial profile can be enriched with hierarchical data, firmographics, contact details, and industry classifications to support complex analyses.

This historical financial data supports a wide range of use cases including KYC and AML compliance, credit risk assessment, M&A research, financial modeling, competitive benchmarking, AI/ML training, and market segmentation. Whether you’re building a predictive scoring model or assessing long-term financial health, our data gives you the clarity and depth required for smarter decisions.

Delivery is flexible to suit your needs: access files in Excel or CSV, browse through our self-service platform, integrate via real-time API, or enhance your existing datasets through custom enrichment services. With access to 380 million verified companies across all industries and geographies, CompanyData.com (BoldData) provides the scale, precision, and historical context to power your next move—globally.
Yahoo Finance Dataset
kaggle.com
Updated Jul 26, 2020
Cite
Yasir Hussein Shakir (2020). Yahoo Finance Dataset [Dataset]. https://www.kaggle.com/datasets/yasserhessein/yahoo-finance-dataset
Explore at:
Croissant
Dataset updated
Jul 26, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Yasir Hussein Shakir
Description
Dataset

This dataset was created by Yasir Hussein Shakir

Contents
Hospital Annual Financial Data - Selected Data & Pivot Tables
data.chhs.ca.gov
data.ca.gov
+6more
csv, data, doc, html +4
Updated Apr 23, 2025
Cite
Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
Explore at:
Available download formats: xlsx, pdf(383996), html, xlsx(750199), pdf(121968), xlsx(756356), xls(16002048), xlsx(768036), xlsx(754073), xlsx(769128), xlsx(763636), xls(920576), xls(44967936), xlsx(14714368), xlsx(758089), data, xls(18301440), pdf(333268), xls(51424256), pdf(310420), xlsx(765216), xls, xls(44933632), pdf(303198), csv(205488092), xlsx(752914), xls(14657536), doc, xls(51554816), pdf(258239), xlsx(770931), xlsx(771275), xls(19625472), zip, xls(19599360), xlsx(779866), xlsx(758376), xls(18445312), xlsx(777616), xlsx(782546), xls(19650048), xls(19577856), xlsx(790979)
Dataset updated
Apr 23, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

There are two groups of data contained in this dataset:
1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here.
2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
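The difference between the two groupings can be illustrated with a short sketch. The hospital names and dates below are hypothetical, and the "2020-21" label style for a July-June fiscal year is an assumption, not something the dataset page specifies.

```python
from collections import defaultdict
from datetime import date


def calendar_year(report_end: date) -> int:
    """Calendar year in which the report period ends."""
    return report_end.year


def fiscal_year(report_end: date) -> str:
    """July-June fiscal year containing the report period end, e.g. '2020-21'."""
    start = report_end.year if report_end.month >= 7 else report_end.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


# Hypothetical report-period end dates, for illustration only.
report_ends = {
    "Hospital A": date(2020, 6, 30),
    "Hospital B": date(2020, 9, 30),
    "Hospital C": date(2020, 12, 31),
}

by_calendar_year = defaultdict(list)
by_fiscal_year = defaultdict(list)
for hospital, end in report_ends.items():
    by_calendar_year[calendar_year(end)].append(hospital)
    by_fiscal_year[fiscal_year(end)].append(hospital)
```

All three reports fall in calendar year 2020, but the June report belongs to a different fiscal year than the September and December reports, which is why the two groupings can place the same report in different comparison sets.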
Test Data Dummy CSV
figshare.com
txt
Updated Nov 6, 2023
Cite
Tori Duckworth (2023). Test Data Dummy CSV [Dataset]. http://doi.org/10.6084/m9.figshare.24500965.v2
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24500965.v2
Dataset updated
Nov 6, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Tori Duckworth
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This CSV represents a dummy dataset to test the functionality of trusted repository search capabilities and of research data governance practices. The associated dummy dissertation is entitled Financial Econometrics Dummy Dissertation. The dummy file is a 7KB CSV containing 5000 rows of notional demographic tabular data.
Computational Finance Research Dataset (1996-2020)
data.mendeley.com
narcis.nl
Updated Jul 1, 2021
+ more versions
Cite
Agung Purnomo (2021). Computational Finance Research Dataset (1996-2020) [Dataset]. http://doi.org/10.17632/d7k2852xnm.1
Explore at:
Unique identifier
https://doi.org/10.17632/d7k2852xnm.1
Dataset updated
Jul 1, 2021
Authors
Agung Purnomo
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Computational Finance research and publication dataset covers publications indexed by Scopus from 1996 to 2020. The dataset consists of 503 publication records in CSV format. It contains authors, Scopus author IDs, title, year, source title, volume, issue, article number in Scopus, DOI, link, affiliation, abstract, index keywords, references, correspondence address, editors, publisher, conference name, conference date, conference code, ISSN, language, document type, access type, and EID.
Central Bank of Brazil data of foreign capital transfers, 2000-2011
su.figshare.com
researchdata.se
+1more
txt
Updated May 30, 2023
+ more versions
Cite
Alice Dauriach; Emma Sundström; Beatrice Crona; Victor Galaz (2023). Central Bank of Brazil data of foreign capital transfers, 2000-2011 [Dataset]. http://doi.org/10.17045/sthlmuni.5857716.v4
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.17045/sthlmuni.5857716.v4
Dataset updated
May 30, 2023
Dataset provided by
Stockholm University
Authors
Alice Dauriach; Emma Sundström; Beatrice Crona; Victor Galaz
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Area covered
Brazil
Description
This data set is a subset of the "Records of foreign capital" ("Registros de capitais estrangeiros", RCE) published by the Central Bank of Brazil (CBB) on their website.

The data set consists of three data files and three corresponding metadata files. All files are in openly accessible .csv or .txt formats. See detailed outline below for data contained in each. Data files contain transaction-specific data such as unique identifier, currency, cancelled status and amount. Metadata files outline variables in the corresponding data file.

RCE_Unclean_full_dataset.csv - all transactions published to the Central Bank website from the four main categories outlined below
Metadata_Unclean_full_dataset.csv
RCE_Unclean_cancelled_dataset.csv - data extracted from the RCE_Unclean_full_dataset.csv where transactions were registered then cancelled
Metadata_Unclean_cancelled_dataset.csv
RCE_Clean_selection_dataset.csv - transaction data extracted from RCE_Unclean_full_dataset.csv and RCE_Unclean_cancelled_dataset.csv for the nine companies and criteria identified below
Metadata_Clean_selection_dataset.csv

The data include the period between October 2000 and July 2011. This is the only time span for the data provided by the Central Bank of Brazil at this stage. The records were published monthly by the Central Bank of Brazil as required by Art. 66 in Decree nº 55.762 of 17 February 1965, modified by Decree nº 4.842 of 17 September 2003. The records were published on the bank’s website starting October 2000, as per communique nº 011489 of 7 October 2003. This remained the case until August 2011, after which the amount of each transaction was no longer disclosed (and publication of these stopped altogether after October 2011). The disclosure of the records was suspended in order to review their legal and technical aspects, and ensure their suitability to the requirements of the rules governing the confidentiality of the information (Law nº 12.527 of 18 November 2011 and Decree nº 7724 of May 2012) (pers. comm. Central Bank of Brazil, 2016. Name of contact available upon request to Authors).

The records track transfers of foreign capital made from abroad to companies domiciled in Brazil, with information on the foreign company (name and country) transferring the money, and on the company receiving the capital (name and federative unit). For the purpose of this study, we consider the four categories of foreign capital transactions which are published with their amount and currency in the Central Bank’s data, and which are all part of the “Register of financial transactions” (abbreviated RDE-ROF): loans, leasing, financed import and cash in advance (see below for a detailed description). Additional categories exist, such as foreign direct investment (RDE-IED) and External Investment in Portfolio (RDE-Portfólio), for which no amount is published and which are therefore not included.

We used the data posted online as PDFs on the bank’s website, and created a script to extract the data automatically from these four categories into the RCE_Unclean_full_dataset.csv file. This data set has not been double-checked manually and may contain errors. We used a similar script to extract rows from the "cancelled transactions" sections of the PDFs into the RCE_Unclean_cancelled_dataset.csv file. This is useful to identify transactions that have been registered to the Central Bank but later cancelled. This data set has not been double-checked manually and may contain errors.

From these raw data sets, we conducted the following selections and calculations in order to create the RCE_Clean_selection_dataset.csv file. This data set has been double-checked manually to secure that no errors have been made in the extraction process.

We selected all transactions whose recipient company name corresponds to one of these nine companies, or to one of their known subsidiaries in Brazil, according to the list of subsidiaries recorded in the Orbis database, maintained by Bureau Van Dijk. Transactions are included if the recipient company name matches one of the following:
- the current or former name of one of the nine companies in our sample (former names are identified using Orbis, Bloomberg’s company profiles or the company website);
- the name of a known subsidiary of one of the nine companies, if and only if we find evidence (in Orbis, Bloomberg’s company profiles or on the company website) that this subsidiary was owned at some point during the period 2000-2011, and that it operated in a sector related to the soy or beef industry (including fertilizers and trading activities).

For each transaction, we extracted the name of the company sending capital and when possible, attributed the transaction to the known ultimate owner. The name of the countries of origin sometimes comes with typos or different denominations: we harmonized them.

A manual check of all the selected data unveiled that a few transactions (n=14), appear twice in the database while bearing the same unique identification number. According to the Central Bank of Brazil (pers. comm., November 2016), this is due to errors in their routine of data extraction. We therefore deleted duplicates in our database, keeping only the latest occurrence of each unique transaction. Six (6) transactions recorded with an amount of zero were also deleted. Two (2) transactions registered in August 2003 with incoherent currencies (Deutsche Mark and Dutch guilder, which were demonetised in early 2002) were also deleted.

To secure that the import of data from PDF to the database did not contain any systematic errors, for instance due to mistakes in coding, data were checked in two ways. First, because the script identifies the end of the row in the PDF using the amount of the transaction, which can sometimes fail if the amount is not entered correctly, we went through the extracted raw data (2798 rows) and cleaned all rows whose end had not been correctly identified by the script. Next, we manually double-checked the 486 largest transactions representing 90% of the total amount of capital inflows, as well as 140 randomly selected additional rows representing 5% of the total rows, compared the extracted data to the original PDFs, and found no mistakes.

Transfers recorded in the database have been made in different currencies, including US dollars, Euros, Japanese Yens, Brazilian Reais, and more. The conversion to US dollars of all amounts denominated in other currencies was done using the average monthly exchange rate as published by the International Monetary Fund (International Financial Statistics: Exchange rates, national currency per US dollar, period average). Due to the limited time period, we have not corrected for inflation but aggregated nominal amounts in USD over the period 2000-2011.

The categories loans, cash in advance (anticipated payment for exports), financed import, and leasing/rental, are those used by the Central Bank of Brazil in their published data. They are denominated respectively:

“Loans” (“emprestimos” in original source): includes all loans, either contracted directly with creditors or indirectly through the issuance of securities, brokered by foreign agents.

“Anticipated payment for exports” (“pagamento/renovacao pagamento antecipado de exportacao” in original source): defined as a type of loan (used in trade finance).

“Financed import” (“importacao financiada” in original source): comprises all import financing transactions either direct (contracted by the importer with a foreign bank or with a foreign supplier), or indirect (contracted by Brazilian banks with foreign banks on behalf of Brazilian importers). They must be declared to the Central Bank if their term of payment is superior to 360 days.

“Leasing/rental” (“arrendamento mercantil, leasing e aluguel” in original source): concerns all types of external leasing operations consented by a Brazilian entity to a foreign one. They must be declared if the term of payment is superior to 360 days.

More information about the different categories can be found through the Central Bank online.

(Research Data Support provided by Springer Nature)
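The cleaning and conversion steps described above (dropping duplicate identifiers while keeping the latest occurrence, deleting zero-amount rows, and converting to US dollars with a monthly average rate quoted as national currency per US dollar) can be sketched as follows. This is an illustration rather than the authors' script: the `Transaction` fields, the sample rows, and the exchange rates are all hypothetical, and "latest occurrence" is assumed to mean the last row in file order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    month: str      # "YYYY-MM", used to look up the monthly average rate
    currency: str
    amount: float


def clean(transactions):
    """Keep the latest occurrence of each ID and drop zero-amount rows."""
    latest = {}
    for tx in transactions:  # later rows overwrite earlier ones with the same ID
        latest[tx.transaction_id] = tx
    return [tx for tx in latest.values() if tx.amount != 0]


def to_usd(tx, rates):
    """Convert an amount to USD.

    Rates are quoted as national currency per US dollar, so the amount
    is divided by the rate for the transaction's currency and month.
    """
    if tx.currency == "USD":
        return tx.amount
    return tx.amount / rates[(tx.currency, tx.month)]


# Hypothetical rates and transactions, for illustration only.
rates = {("BRL", "2005-03"): 2.5, ("JPY", "2005-04"): 125.0}
raw = [
    Transaction("A1", "2005-03", "USD", 100.0),
    Transaction("A2", "2005-03", "BRL", 500.0),
    Transaction("A2", "2005-03", "BRL", 500.0),  # duplicate identifier
    Transaction("A3", "2005-04", "JPY", 0.0),    # zero amount, dropped
    Transaction("A4", "2005-04", "JPY", 1000.0),
]

cleaned = clean(raw)
total_usd = sum(to_usd(tx, rates) for tx in cleaned)
```

Summing the converted values gives a nominal USD total, matching the description's note that amounts were aggregated without an inflation correction.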
Low- and High-Dimensional Asset Prices Data
data.mendeley.com
Updated Oct 18, 2017
+ more versions
Cite
Chi Seng Pun (2017). Low- and High-Dimensional Asset Prices Data [Dataset]. http://doi.org/10.17632/ndxfrshm74.2
Explore at:
Unique identifier
https://doi.org/10.17632/ndxfrshm74.2
Dataset updated
Oct 18, 2017
Authors
Chi Seng Pun
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data files contain seven low-dimensional financial research data sets (in .txt format) and four high-dimensional daily stock price data sets (in .csv format). The low-dimensional data sets are provided by Lorenzo Garlappi on his website, while the high-dimensional data sets were downloaded from Yahoo! Finance by the contributor. The description of the low-dimensional data sets can be found in DeMiguel et al. (2009, RFS).
Sentiment Analysis on Financial Tweets
kaggle.com
zip
Updated Sep 5, 2019
Cite
Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
Explore at:
Available download formats: zip (2538259 bytes)
Dataset updated
Sep 5, 2019
Authors
Vivek Rathi
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curiosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset were created by me.

Everything else in the description was written by David Wallach; using this information, I performed my first-ever sentiment analysis.

"I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries).

I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."

Content

This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith']

Acknowledgements

The data used here was gathered from a project I developed: https://github.com/dwallach1/StockerBot

Inspiration

I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)
Forex News Annotated Dataset for Sentiment Analysis
zenodo.org
data.niaid.nih.gov
csv
Updated Nov 11, 2023
Cite
Georgios Fatouros; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.7976208
Dataset updated
Nov 11, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Georgios Fatouros; Kalliopi Kouroumali
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

Examples of Annotated Headlines

Forex Pair | Headline | Sentiment | Explanation
GBPUSD | Diminishing bets for a move to 12400 | Neutral | Lack of strong sentiment in either direction
GBPUSD | No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft | Positive | Positive sentiment towards GBPUSD (Cable) in the near term
GBPUSD | When are the UK jobs and how could they affect GBPUSD | Neutral | Poses a question and does not express a clear sentiment
JPYUSD | Appropriate to continue monetary easing to achieve 2% inflation target with wage growth | Positive | Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply
USDJPY | Dollar rebounds despite US data. Yen gains amid lower yields | Neutral | Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other
USDJPY | USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains | Negative | USDJPY is expected to reach a lower value, with the USD losing value against the JPY
AUDUSD | RBA Governor Lowe’s Testimony High inflation is damaging and corrosive | Positive | Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.

Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.
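The score computation described above is simple enough to sketch directly. The probabilities below are hypothetical and no FinBERT model is called; choosing the predicted class as the highest-probability label is an assumption, since the description only says the probabilities are "used to determine" it.

```python
def score_headline(probabilities):
    """Return (predicted_class, sentiment_score) from class probabilities.

    The sentiment score is the positive-class probability minus the
    negative-class probability, as stated in the dataset description.
    The predicted class is taken as the most probable label (assumption).
    """
    predicted = max(probabilities, key=probabilities.get)
    score = probabilities["positive"] - probabilities["negative"]
    return predicted, score


# Hypothetical model output for one headline.
example = {"positive": 0.75, "negative": 0.125, "neutral": 0.125}
predicted_class, sentiment_score = score_headline(example)
```

Under this definition the score lies in [-1, 1]: values near 1 indicate a confidently positive (bullish) prediction, values near -1 a confidently negative (bearish) one, and values near 0 either a neutral prediction or balanced positive and negative probabilities.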
Integrando Google Colab e Yahoo Finance (compactação e download de cotações...
data.mendeley.com
Updated Aug 26, 2021
Cite
Bernardo Mendes (2021). Integrando Google Colab e Yahoo Finance (compactação e download de cotações em formato CSV) [Integrating Google Colab and Yahoo Finance (compressing and downloading quotes in CSV format)], published at the "Open Code Community" [Dataset]. http://doi.org/10.17632/r58pyjyvbx.1
Unique identifier
https://doi.org/10.17632/r58pyjyvbx.1
Dataset updated
Aug 26, 2021
Authors
Bernardo Mendes
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Material published at "https://opencodecom.net/post/2021-07-22-como-baixar-e-zipar-csv-utilizando-python/"
Redfin usa properties dataset
crawlfeeds.com
csv, zip
Updated Jun 13, 2025
Crawl Feeds (2025). Redfin usa properties dataset [Dataset]. https://crawlfeeds.com/datasets/redfin-usa-properties-dataset
Available download formats: zip, csv
Dataset updated
Jun 13, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Area covered
United States
Description
Explore the Redfin USA Properties Dataset, available in CSV format. This extensive dataset provides valuable insights into the U.S. real estate market, including detailed property listings, prices, property types, and more across various states and cities. Perfect for those looking to conduct in-depth market analysis, real estate investment research, or financial forecasting.

Key Features:

Comprehensive Property Data: Includes essential details such as listing prices, property types, square footage, and the number of bedrooms and bathrooms.

Geographic Coverage: Encompasses a wide range of U.S. states and cities, providing a broad view of the national real estate market.

Historical Trends: Analyze past market data to understand price movements, regional differences, and market trends over time.

Geo-Location Details: Enables spatial analysis and mapping by including precise geographical coordinates of properties.

Who Can Benefit From This Dataset:

Real Estate Investors: Identify lucrative opportunities by analyzing property values, market trends, and regional price variations.

Market Analysts: Gain a deeper understanding of the U.S. housing market dynamics to inform research and reporting.

Data Scientists and Researchers: Leverage detailed real estate data for modeling, urban studies, or economic analysis.

Financial Analysts: Utilize the dataset for financial modeling, helping to predict market behavior and assess investment risks.

Download the Redfin USA Properties Dataset to access essential information on the U.S. housing market, ideal for professionals in real estate, finance, and data analytics. Unlock key insights to make informed decisions in a dynamic market environment.

