100+ datasets found
  1. d

    Financial Statement Data Sets

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jul 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Economic and Risk Analysis (2025). Financial Statement Data Sets [Dataset]. https://catalog.data.gov/dataset/financial-statement-data-sets
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    Economic and Risk Analysis
    Description

    The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).

  2. Historical financial datasets for Financial Analysis with Spyder workshop

    • figshare.com
    txt
    Updated Jul 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Spyder IDE (2021). Historical financial datasets for Financial Analysis with Spyder workshop [Dataset]. http://doi.org/10.6084/m9.figshare.14995215.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jul 16, 2021
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Spyder IDE
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These three datasets provide closing price information for the following assets: Google, Apple, Microsoft, Netflix, Amazon, Pfizer, Astra Zeneca, Johnson & Johnson, ETH, BTC and LTC.The time period spans from 2012 to the end of 2020.

  3. financial sentiment analysis dataset

    • kaggle.com
    Updated Nov 17, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ujjwal Chowdhury (2022). financial sentiment analysis dataset [Dataset]. https://www.kaggle.com/datasets/ujjwalchowdhury/financial-sentiment-analysis-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 17, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Ujjwal Chowdhury
    Description

    Dataset

    This dataset was created by Ujjwal Chowdhury

    Contents

  4. h

    financial-qa-dataset

    • huggingface.co
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Aditya Rane, financial-qa-dataset [Dataset]. https://huggingface.co/datasets/adityarane/financial-qa-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Aditya Rane
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    financial-qa-dataset

    This dataset consists of Question-Answer_Context Pairs. It also consists of metadata for filtering the records.

      Repo Structure
    

    financial-qa-dataset ├── financial-qa-dataset.csv ├── metadata.csv ├── notebooks │ |── loading_dataset.ipynb │ |── Loading_dataset_huggingface.ipynb │ |── basic_rag_langchain_vertexai.ipynb │ |── basic_rag_with_evaluation.ipynb | ├── data |── Statements |── Reports… See the full description on the dataset page: https://huggingface.co/datasets/adityarane/financial-qa-dataset.

  5. Financial_Risk

    • kaggle.com
    Updated Jul 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Preetham Gouda (2024). Financial_Risk [Dataset]. https://www.kaggle.com/datasets/preethamgouda/financial-risk
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Kaggle
    Authors
    Preetham Gouda
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Financial Risk Assessment Dataset provides detailed information on individual financial profiles. It includes demographic, financial, and behavioral data to assess financial risk. The dataset features various columns such as income, credit score, and risk rating, with intentional imbalances and missing values to simulate real-world scenarios.

  6. Data from: Company Financials Dataset

    • kaggle.com
    Updated Aug 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Atharva Arya (2023). Company Financials Dataset [Dataset]. https://www.kaggle.com/datasets/atharvaarya25/financials
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Atharva Arya
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This is a dataset that requires a lot of preprocessing with amazing EDA insights for a company. A dataset consisting of sales and profit data sorted by market segment and country/region.

    Tips for pre-processing: 1. Check for column names and find error there itself!! 2. Remove '$' sign and '-' from all columns where they are present 3. Change datatype from objects to int after the above two. 4. Challenge: Try removing " , " (comma) from all numerical numbers. 5. Try plotting sales and profit with respect to timeline

  7. Financial Statements of Major Companies(2009-2023)

    • kaggle.com
    Updated Dec 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rishabh Patil (2023). Financial Statements of Major Companies(2009-2023) [Dataset]. https://www.kaggle.com/datasets/rish59/financial-statements-of-major-companies2009-2023
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 1, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Rishabh Patil
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This is a compiled datasets comprising of data from various companies' 10-K annual reports and balance sheets. The data is a longitudinal or panel data, from year 2009-2022(/23) and also consists of a few bankrupt companies to help for investigating factors. The names of the companies are given according to their Stocks. Companies divided into specific categories.

  8. Dataset .csv

    • figshare.com
    Updated Oct 2, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roy Lee (2021). Dataset .csv [Dataset]. http://doi.org/10.6084/m9.figshare.14870139.v1
    Explore at:
    Dataset updated
    Oct 2, 2021
    Dataset provided by
    figshare
    Authors
    Roy Lee
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a dataset link to paper Improved Back Test of Magic Formula in Malaysian Stock Market Using Applied Programming and Online Quantitative Platform. Include the monthly return for all portfolio and return and risk statistics.

  9. h

    finance-alpaca-1k-train

    • huggingface.co
    Updated Apr 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Poornima SS (2024). finance-alpaca-1k-train [Dataset]. https://huggingface.co/datasets/poornima9348/finance-alpaca-1k-train
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 26, 2024
    Authors
    Poornima SS
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This is the first 1k rows of the finance alpaca dataset in csv format. You may use this as train data. Another such pruned dataset with the next 1k rows is uploaded under poornima9348/finance-alpaca-1k-test.

  10. d

    CompanyData.com (BoldData) - Historical Financial Data For 230M Companies...

    • datarade.ai
    Updated Apr 15, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CompanyData.com (BoldData) (2021). CompanyData.com (BoldData) - Historical Financial Data For 230M Companies Worldwide [Dataset]. https://datarade.ai/data-products/custom-made-historical-financial-data-for-230m-companies-worldwide-bolddata
    Explore at:
    .json, .csv, .xls, .sql, .txtAvailable download formats
    Dataset updated
    Apr 15, 2021
    Dataset authored and provided by
    CompanyData.com (BoldData)
    Area covered
    Ascension and Tristan da Cunha, Slovakia, Algeria, Russian Federation, Angola, Turkey, Tonga, French Polynesia, Cook Islands, Solomon Islands
    Description

    At CompanyData.com (BoldData), we specialize in delivering high-quality company data sourced directly from official trade registers. Our extensive dataset includes historical financial records for over 230 million companies worldwide, enabling deeper insight into business performance over time. Whether you're benchmarking companies, training AI models, or building risk profiles, our financial data equips you with the long-term perspective you need.

    Our financial database includes multi-year balance sheets, profit and loss statements, and key performance indicators such as revenue, net income, assets, liabilities, and equity. We provide standardized and structured data—backed by rigorous validation processes—to ensure consistency and accuracy across jurisdictions. Each financial profile can be enriched with hierarchical data, firmographics, contact details, and industry classifications to support complex analyses.

    This historical financial data supports a wide range of use cases including KYC and AML compliance, credit risk assessment, M&A research, financial modeling, competitive benchmarking, AI/ML training, and market segmentation. Whether you’re building a predictive scoring model or assessing long-term financial health, our data gives you the clarity and depth required for smarter decisions.

    Delivery is flexible to suit your needs: access files in Excel or CSV, browse through our self-service platform, integrate via real-time API, or enhance your existing datasets through custom enrichment services. With access to 380 million verified companies across all industries and geographies, CompanyData.com (BoldData) provides the scale, precision, and historical context to power your next move—globally.

  11. Yahoo Finance Dataset

    • kaggle.com
    Updated Jul 26, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yasir Hussein Shakir (2020). Yahoo Finance Dataset [Dataset]. https://www.kaggle.com/datasets/yasserhessein/yahoo-finance-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 26, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Yasir Hussein Shakir
    Description

    Dataset

    This dataset was created by Yasir Hussein Shakir

    Contents

  12. C

    Hospital Annual Financial Data - Selected Data & Pivot Tables

    • data.chhs.ca.gov
    • data.ca.gov
    • +6more
    csv, data, doc, html +4
    Updated Apr 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Health Care Access and Information (2025). Hospital Annual Financial Data - Selected Data & Pivot Tables [Dataset]. https://data.chhs.ca.gov/dataset/hospital-annual-financial-data-selected-data-pivot-tables
    Explore at:
    xlsx, pdf(383996), html, xlsx(750199), pdf(121968), xlsx(756356), xls(16002048), xlsx(768036), xlsx(754073), xlsx(769128), xlsx(763636), xls(920576), xls(44967936), xlsx(14714368), xlsx(758089), data, xls(18301440), pdf(333268), xls(51424256), pdf(310420), xlsx(765216), xls, xls(44933632), pdf(303198), csv(205488092), xlsx(752914), xls(14657536), doc, xls(51554816), pdf(258239), xlsx(770931), xlsx(771275), xls(19625472), zip, xls(19599360), xlsx(779866), xlsx(758376), xls(18445312), xlsx(777616), xlsx(782546), xls(19650048), xls(19577856), xlsx(790979)Available download formats
    Dataset updated
    Apr 23, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.

    Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.

    There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.

  13. Test Data Dummy CSV

    • figshare.com
    txt
    Updated Nov 6, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Tori Duckworth (2023). Test Data Dummy CSV [Dataset]. http://doi.org/10.6084/m9.figshare.24500965.v2
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 6, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Tori Duckworth
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This CSV represents a dummy dataset to test the functionality of trusted repository search capabilities and of research data governance practices. The associated dummy dissertation is entitled Financial Econometrics Dummy Dissertation. The dummy file is a 7KB CSV containing 5000 rows of notional demographic tabular data.

  14. m

    Computational Finance Research Dataset (1996-2020)

    • data.mendeley.com
    • narcis.nl
    Updated Jul 1, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agung Purnomo (2021). Computational Finance Research Dataset (1996-2020) [Dataset]. http://doi.org/10.17632/d7k2852xnm.1
    Explore at:
    Dataset updated
    Jul 1, 2021
    Authors
    Agung Purnomo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Computational Finance research & publication dataset, which was indexed by Scopus from 1996 to 2020. The dataset consist of 503 publication data in CSV format. The dataset contains data authors, authors ID Scopus, title, year, source title, volume, issue, article number in Scopus, DOI, link, affiliation, abstract, index keywords, references, correspondence Address, editors, publisher, conference name, conference date, conference code, ISSN, language, document type, access type, and EID.

  15. f

    Central Bank of Brazil data of foreign capital transfers, 2000-2011

    • su.figshare.com
    • researchdata.se
    • +1more
    txt
    Updated May 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alice Dauriach; Emma Sundström; Beatrice Crona; Victor Galaz (2023). Central Bank of Brazil data of foreign capital transfers, 2000-2011 [Dataset]. http://doi.org/10.17045/sthlmuni.5857716.v4
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Stockholm University
    Authors
    Alice Dauriach; Emma Sundström; Beatrice Crona; Victor Galaz
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Brazil
    Description

    This data set is a subset of the "Records of foreign capital" (Registros de capitais estrangeiros", RCE) published by the Central Bank of Brazil (CBB) on their website.The data set consists of three data files and three corresponding metadata files. All files are in openly accessible .csv or .txt formats. See detailed outline below for data contained in each. Data files contain transaction-specific data such as unique identifier, currency, cancelled status and amount. Metadata files outline variables in the corresponding data file.RCE_Unclean_full_dataset.csv - all transactions published to the Central Bank website from the four main categories outlined belowMetadata_Unclean_full_dataset.csvRCE_Unclean_cancelled_dataset.csv - data extracted from the RCE_Unclean_full_dataset.csv where transactions were registered then cancelledMetadata_Unclean_cancelled_dataset.csvRCE_Clean_selection_dataset.csv - transaction data extracted from RCE_Unclean_full_dataset.csv and RCE_Unclean_cancelled_dataset.csv for the nine companies and criteria identified belowMetadata_Clean_selection_dataset.csvThe data include the period between October 2000 and July 2011. This is the only time span for the data provided by the Central Bank of Brazil at this stage. The records were published monthly by the Central Bank of Brazil as required by Art. 66 in Decree nº 55.762 of 17 February 1965, modified by Decree nº 4.842 of 17 September 2003. The records were published on the bank’s website starting October 2000, as per communique nº 011489 of 7 October 2003. This remained the case until August 2011, after which the amount of each transaction was no longer disclosed (and publication of these stopped altogether after October 2011). The disclosure of the records was suspended in order to review their legal and technical aspects, and ensure their suitability to the requirements of the rules governing the confidentiality of the information (Law nº 12.527 of 18 November 2011 and Decree nº 7724 of May 2012) (pers. comm. Central Bank of Brazil, 2016. Name of contact available upon request to Authors).The records track transfers of foreign capital made from abroad to companies domiciled in Brazil, with information on the foreign company (name and country) transferring the money, and on the company receiving the capital (name and federative unit). For the purpose of this study, we consider the four categories of foreign capital transactions which are published with their amount and currency in the Central Bank’s data, and which are all part of the “Register of financial transactions” (abbreviated RDE-ROF): loans, leasing, financed import and cash in advance (see below for a detailed description). Additional categories exist, such as foreign direct investment (RDE-IED) and External Investment in Portfolio (RDE-Portfólio), for which no amount is published and which are therefore not included.We used the data posted online as PDFs on the bank’s website, and created a script to extract the data automatically from these four categories into the RCE_Unclean_full_dataset.csv file. This data set has not been double-checked manually and may contain errors. We used a similar script to extract rows from the "cancelled transactions" sections of the PDFs into the RCE_Unclean_cancelled_dataset.csv file. This is useful to identify transactions that have been registered to the Central Bank but later cancelled. This data set has not been double-checked manually and may contain errors.From these raw data sets, we conducted the following selections and calculations in order to create the RCE_Clean_selection_dataset.csv file. This data set has been double-checked manually to secure that no errors have been made in the extraction process.We selected all transactions whose recipient company name corresponds to one of these nine companies, or to one of their known subsidiaries in Brazil, according to the list of subsidiaries recorded in the Orbis database, maintained by Bureau Van Dijk. Transactions are included if the recipient company name matches one of the following:- the current or former name of one of the nine companies in our sample (former names are identified using Orbis, Bloomberg’s company profiles or the company website);- the name of a known subsidiary of one of the nine companies, if and only if we find evidence (in Orbis, Bloomberg’s company profiles or on the company website) that this subsidiary was owned at some point during the period 2000-2011, and that it operated in a sector related to the soy or beef industry (including fertilizers and trading activities).For each transaction, we extracted the name of the company sending capital and when possible, attributed the transaction to the known ultimate owner.The name of the countries of origin sometimes comes with typos or different denominations: we harmonized them.A manual check of all the selected data unveiled that a few transactions (n=14), appear twice in the database while bearing the same unique identification number. According to the Central Bank of Brazil (pers. comm., November 2016), this is due to errors in their routine of data extraction. We therefore deleted duplicates in our database, keeping only the latest occurrence of each unique transaction. Six (6) transactions recorded with an amount of zero were also deleted. Two (2) transactions registered in August 2003 with incoherent currencies (Deutsche Mark and Dutch guilder, which were demonetised in early 2002) were also deleted.To secure that the import of data from PDF to the database did not contain any systematic errors, for instance due to mistakes in coding, data were checked in two ways. First, because the script identifies the end of the row in the PDF using the amount of the transaction, which can sometimes fail if the amount is not entered correctly, we went through the extracted raw data (2798 rows) and cleaned all rows whose end had not been correctly identified by the script. Next, we manually double-checked the 486 largest transactions representing 90% of the total amount of capital inflows, as well as 140 randomly selected additional rows representing 5% of the total rows, compared the extracted data to the original PDFs, and found no mistakes.Transfers recorded in the database have been made in different currencies, including US dollars, Euros, Japanese Yens, Brazilian Reais, and more. The conversion to US dollars of all amounts denominated in other currencies was done using the average monthly exchange rate as published by the International Monetary Fund (International Financial Statistics: Exchange rates, national currency per US dollar, period average). Due to the limited time period, we have not corrected for inflation but aggregated nominal amounts in USD over the period 2000-2011.The categories loans, cash in advance (anticipated payment for exports), financed import, and leasing/rental, are those used by the Central Bank of Brazil in their published data. They are denominated respectively: “Loans” (“emprestimos” in original source) - : includes all loans, either contracted directly with creditors or indirectly through the issuance of securities, brokered by foreign agents. “Anticipated payment for exports” (“pagamento/renovacao pagamento antecipado de exportacao” in original source): defined as a type of loan (used in trade finance)“Financed import” (“importacao financiada” in original source): comprises all import financing transactions either direct (contracted by the importer with a foreign bank or with a foreign supplier), or indirect (contracted by Brazilian banks with foreign banks on behalf of Brazilian importers). They must be declared to the Central Bank if their term of payment is superior to 360 days.“Leasing/rental” (“arrendamento mercantil, leasing e aluguel” in original source) : concerns all types of external leasing operations consented by a Brazilian entity to a foreign one. They must be declared if the term of payment is superior to 360 days.More information about the different categories can be found through the Central Bank online.(Research Data Support provided by Springer Nature)

  16. m

    Low- and High-Dimensional Asset Prices Data

    • data.mendeley.com
    Updated Oct 18, 2017
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chi Seng Pun (2017). Low- and High-Dimensional Asset Prices Data [Dataset]. http://doi.org/10.17632/ndxfrshm74.2
    Explore at:
    Dataset updated
    Oct 18, 2017
    Authors
    Chi Seng Pun
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data files contain seven low-dimensional financial research data (in .txt format) and four high-dimensional daily stock prices data (in .csv format). The low-dimensional data sets are provided by Lorenzo Garlappi on his website, while the high-dimensional data sets are downloaded from Yahoo!Finance by the contributor's own efforts. The description of the low-dimensional data sets can be found in DeMiguel et al. (2009, RFS).

  17. Sentiment Analysis on Financial Tweets

    • kaggle.com
    zip
    Updated Sep 5, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vivek Rathi (2019). Sentiment Analysis on Financial Tweets [Dataset]. https://www.kaggle.com/datasets/vivekrathi055/sentiment-analysis-on-financial-tweets
    Explore at:
    zip(2538259 bytes)Available download formats
    Dataset updated
    Sep 5, 2019
    Authors
    Vivek Rathi
    License

    http://opendatacommons.org/licenses/dbcl/1.0/http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Context

    The following information can also be found at https://www.kaggle.com/davidwallach/financial-tweets. Out of curosity, I just cleaned the .csv files to perform a sentiment analysis. So both the .csv files in this dataset are created by me.

    Anything you read in the description is written by David Wallach and using all this information, I happen to perform my first ever sentiment analysis.

    "I have been interested in using public sentiment and journalism to gather sentiment profiles on publicly traded companies. I first developed a Python package (https://github.com/dwallach1/Stocker) that scrapes the web for articles written about companies, and then noticed the abundance of overlap with Twitter. I then developed a NodeJS project that I have been running on my RaspberryPi to monitor Twitter for all tweets coming from those mentioned in the content section. If one of them tweeted about a company in the stocks_cleaned.csv file, then it would write the tweet to the database. Currently, the file is only from earlier today, but after about a month or two, I plan to update the tweets.csv file (hopefully closer to 50,000 entries.

    I am not quite sure how this dataset will be relevant, but I hope to use these tweets and try to generate some sense of public sentiment score."

    Content

    This dataset has all the publicly traded companies (tickers and company names) that were used as input to fill the tweets.csv. The influencers whose tweets were monitored were: ['MarketWatch', 'business', 'YahooFinance', 'TechCrunch', 'WSJ', 'Forbes', 'FT', 'TheEconomist', 'nytimes', 'Reuters', 'GerberKawasaki', 'jimcramer', 'TheStreet', 'TheStalwart', 'TruthGundlach', 'Carl_C_Icahn', 'ReformedBroker', 'benbernanke', 'bespokeinvest', 'BespokeCrypto', 'stlouisfed', 'federalreserve', 'GoldmanSachs', 'ianbremmer', 'MorganStanley', 'AswathDamodaran', 'mcuban', 'muddywatersre', 'StockTwits', 'SeanaNSmith'

    Acknowledgements

    The data used here is gathered from a project I developed : https://github.com/dwallach1/StockerBot

    Inspiration

    I hope to develop a financial sentiment text classifier that would be able to track Twitter's (and the entire public's) feelings about any publicly traded company (and cryptocurrency)

  18. Forex News Annotated Dataset for Sentiment Analysis

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Nov 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Georgios Fatouros; Georgios Fatouros; Kalliopi Kouroumali; Kalliopi Kouroumali (2023). Forex News Annotated Dataset for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.7976208
    Explore at:
    csvAvailable download formats
    Dataset updated
    Nov 11, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Georgios Fatouros; Georgios Fatouros; Kalliopi Kouroumali; Kalliopi Kouroumali
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains news headlines relevant to key forex pairs: AUDUSD, EURCHF, EURUSD, GBPUSD, and USDJPY. The data was extracted from reputable platforms Forex Live and FXstreet over a period of 86 days, from January to May 2023. The dataset comprises 2,291 unique news headlines. Each headline includes an associated forex pair, timestamp, source, author, URL, and the corresponding article text. Data was collected using web scraping techniques executed via a custom service on a virtual machine. This service periodically retrieves the latest news for a specified forex pair (ticker) from each platform, parsing all available information. The collected data is then processed to extract details such as the article's timestamp, author, and URL. The URL is further used to retrieve the full text of each article. This data acquisition process repeats approximately every 15 minutes.

    To ensure the reliability of the dataset, we manually annotated each headline for sentiment. Instead of solely focusing on the textual content, we ascertained sentiment based on the potential short-term impact of the headline on its corresponding forex pair. This method recognizes the currency market's acute sensitivity to economic news, which significantly influences many trading strategies. As such, this dataset could serve as an invaluable resource for fine-tuning sentiment analysis models in the financial realm.

    We used three categories for annotation: 'positive', 'negative', and 'neutral', which correspond to bullish, bearish, and hold sentiments, respectively, for the forex pair linked to each headline. The following Table provides examples of annotated headlines along with brief explanations of the assigned sentiment.

    Examples of Annotated Headlines
    
    
        Forex Pair
        Headline
        Sentiment
        Explanation
    
    
    
    
        GBPUSD 
        Diminishing bets for a move to 12400 
        Neutral
        Lack of strong sentiment in either direction
    
    
        GBPUSD 
        No reasons to dislike Cable in the very near term as long as the Dollar momentum remains soft 
        Positive
        Positive sentiment towards GBPUSD (Cable) in the near term
    
    
        GBPUSD 
        When are the UK jobs and how could they affect GBPUSD 
        Neutral
        Poses a question and does not express a clear sentiment
    
    
        JPYUSD
        Appropriate to continue monetary easing to achieve 2% inflation target with wage growth 
        Positive
        Monetary easing from Bank of Japan (BoJ) could lead to a weaker JPY in the short term due to increased money supply
    
    
        USDJPY
        Dollar rebounds despite US data. Yen gains amid lower yields 
        Neutral
        Since both the USD and JPY are gaining, the effects on the USDJPY forex pair might offset each other
    
    
        USDJPY
        USDJPY to reach 124 by Q4 as the likelihood of a BoJ policy shift should accelerate Yen gains 
        Negative
        USDJPY is expected to reach a lower value, with the USD losing value against the JPY
    
    
        AUDUSD
    
        <p>RBA Governor Lowe’s Testimony High inflation is damaging and corrosive </p>
    
        Positive
        Reserve Bank of Australia (RBA) expresses concerns about inflation. Typically, central banks combat high inflation with higher interest rates, which could strengthen AUD.
    

    Moreover, the dataset includes two columns with the predicted sentiment class and score as predicted by the FinBERT model. Specifically, the FinBERT model outputs a set of probabilities for each sentiment class (positive, negative, and neutral), representing the model's confidence in associating the input headline with each sentiment category. These probabilities are used to determine the predicted class and a sentiment score for each headline. The sentiment score is computed by subtracting the negative class probability from the positive one.

  19. m

    Integrando Google Colab e Yahoo Finance (compactação e download de cotações...

    • data.mendeley.com
    Updated Aug 26, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bernardo Mendes (2021). Integrando Google Colab e Yahoo Finance (compactação e download de cotações em formato CSV) published at the "Open Code Community" [Dataset]. http://doi.org/10.17632/r58pyjyvbx.1
    Explore at:
    Dataset updated
    Aug 26, 2021
    Authors
    Bernardo Mendes
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
  20. c

    Redfin usa properties dataset

    • crawlfeeds.com
    csv, zip
    Updated Jun 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Crawl Feeds (2025). Redfin usa properties dataset [Dataset]. https://crawlfeeds.com/datasets/redfin-usa-properties-dataset
    Explore at:
    zip, csvAvailable download formats
    Dataset updated
    Jun 13, 2025
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policyhttps://crawlfeeds.com/privacy_policy

    Area covered
    United States
    Description

    Explore the Redfin USA Properties Dataset, available in CSV format. This extensive dataset provides valuable insights into the U.S. real estate market, including detailed property listings, prices, property types, and more across various states and cities. Perfect for those looking to conduct in-depth market analysis, real estate investment research, or financial forecasting.

    Key Features:

    • Comprehensive Property Data: Includes essential details such as listing prices, property types, square footage, and the number of bedrooms and bathrooms.
    • Geographic Coverage: Encompasses a wide range of U.S. states and cities, providing a broad view of the national real estate market.
    • Historical Trends: Analyze past market data to understand price movements, regional differences, and market trends over time.
    • Geo-Location Details: Enables spatial analysis and mapping by including precise geographical coordinates of properties.

    Who Can Benefit From This Dataset:

    • Real Estate Investors: Identify lucrative opportunities by analyzing property values, market trends, and regional price variations.
    • Market Analysts: Gain a deeper understanding of the U.S. housing market dynamics to inform research and reporting.
    • Data Scientists and Researchers: Leverage detailed real estate data for modeling, urban studies, or economic analysis.
    • Financial Analysts: Utilize the dataset for financial modeling, helping to predict market behavior and assess investment risks.

    Download the Redfin USA Properties Dataset to access essential information on the U.S. housing market, ideal for professionals in real estate, finance, and data analytics. Unlock key insights to make informed decisions in a dynamic market environment.

    Looking for deeper insights or a custom data pull from Redfin?
    Send a request with just one click and explore detailed property listings, price trends, and housing data.
    đź”— Request Redfin Real Estate Data

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Economic and Risk Analysis (2025). Financial Statement Data Sets [Dataset]. https://catalog.data.gov/dataset/financial-statement-data-sets

Financial Statement Data Sets

Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
Economic and Risk Analysis
Description

The data sets below provide selected information extracted from exhibits to corporate financial reports filed with the Commission using eXtensible Business Reporting Language (XBRL).

Search
Clear search
Close search
Google apps
Main menu