18 datasets found
  1. EDGAR Database of SEC Filings

    • data.wu.ac.at
    Updated Feb 1, 2014
    Cite
    Economics Datasets (2014). EDGAR Database of SEC Filings [Dataset]. https://data.wu.ac.at/odso/datahub_io/ZWIwNTI0NTMtNzFmNS00NWNhLWEyMDQtODBjZmEyYWE4Yzg0
    Dataset provided by
    Economics Datasets
    Description

    The Securities and Exchange Commission (SEC) EDGAR database contains regulatory filings from publicly traded US corporations.

    All companies, foreign and domestic, are required to file registration statements, periodic reports, and other forms electronically through EDGAR. Anyone can access and download this information for free. Here you'll find links to a complete list of filings available through EDGAR and instructions for searching the EDGAR database.

    Human Interface

    See http://www.sec.gov/edgar/searchedgar/companysearch.html

    Bulk Data

    EDGAR provides bulk access via FTP at ftp://ftp.sec.gov/ (see the official documentation at https://www.sec.gov/edgar/searchedgar/ftpusers.htm). We summarize the main points here.

    Each company in EDGAR gets an identifier known as the CIK, which is a 10-digit number. You can find the CIK by searching EDGAR using a company name or stock market ticker.

    For example, searching for IBM by ticker (http://www.sec.gov/cgi-bin/browse-edgar?CIK=ibm) shows us that the CIK is 0000051143.

    Note that leading zeroes are often omitted (e.g. in the FTP access), so this would become 51143.
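
    As a minimal sketch of the two CIK spellings described above (the helper names are illustrative assumptions, not part of any EDGAR tooling), the conversion is just zero stripping and zero padding:

```python
def cik_to_path_form(cik: str) -> str:
    """Drop leading zeroes, as used in the /edgar/data/ paths."""
    return str(int(cik))


def cik_to_padded_form(cik: str) -> str:
    """Left-pad with zeroes to the full 10-digit identifier."""
    return str(int(cik)).zfill(10)


print(cik_to_path_form("0000051143"))   # 51143
print(cik_to_padded_form("51143"))      # 0000051143
```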

    Next, each submission receives an 'Accession Number' (acc-no). For example, IBM's quarterly financial filing (form 10-Q) in October 2013 had accession number: 0000051143-13-000007.

    FTP File Paths

    Given a company with CIK (company ID) XXX (omitting leading zeroes) and document accession number YYY (acc-no on search results), file paths are of the form:

    /edgar/data/XXX/YYY.txt
    

    For example, for the IBM data above it would be:

    ftp://ftp.sec.gov/edgar/data/51143/0000051143-13-000007.txt

    Note, if you are looking for a nice HTML version, you can find it in the Archives section with a similar URL (the folder name is the accession number without dashes, and the file name adds -index.htm):

    http://www.sec.gov/Archives/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm
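
    The two path patterns above can be sketched as small string builders. The function names are hypothetical; the patterns themselves are taken only from the IBM examples shown here, so treat them as an illustration rather than a specification:

```python
def filing_text_path(cik: str, accession: str) -> str:
    """Path of the raw filing text: /edgar/data/XXX/YYY.txt."""
    return f"/edgar/data/{int(cik)}/{accession}.txt"


def filing_index_url(cik: str, accession: str) -> str:
    """URL of the HTML index page in the Archives section."""
    folder = accession.replace("-", "")  # accession number without dashes
    return (
        "http://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{folder}/{accession}-index.htm"
    )


print(filing_text_path("0000051143", "0000051143-13-000007"))
print(filing_index_url("0000051143", "0000051143-13-000007"))
```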

    Indices

    If you want to get a list of all filings you'll want to grab an Index. As the help page explains:

    The EDGAR indices are a helpful resource for FTP retrieval, listing the following information for each filing: Company Name, Form Type, CIK, Date Filed, and File Name (including folder path).
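
    As a rough sketch of reading one index row with those five fields: the code below assumes a pipe-delimited row in the order CIK, company name, form type, date filed, file name. That layout is an assumption about the master index (the text above only lists the fields), and the sample row is made up for illustration, not copied from a real index file.

```python
from typing import NamedTuple


class IndexRow(NamedTuple):
    cik: str
    company_name: str
    form_type: str
    date_filed: str
    file_name: str


def parse_index_row(line: str) -> IndexRow:
    """Split one pipe-delimited index row into its five fields."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    return IndexRow(*parts)


# Hypothetical row for illustration only (the date is invented).
SAMPLE_ROW = (
    "51143|INTERNATIONAL BUSINESS MACHINES CORP|10-Q|2013-10-01|"
    "edgar/data/51143/0000051143-13-000007.txt"
)

row = parse_index_row(SAMPLE_ROW)
print(row.form_type, row.file_name)
```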

    Four types of indexes are available:

    • company — sorted by company name
    • form — sorted by form type
    • master — sorted by CIK number
    • XBRL — list of submissions containing XBRL financial files, sorted by CIK number; these include Voluntary Filer Program submissions

    URLs are like:

    ftp://ftp.sec.gov/edgar/full-index/2008/QTR4/master.gz

    That is, they have the following general form:

    ftp://ftp.sec.gov/edgar/full-index/{YYYY}/QTR{1-4}/{index-name}.[gz|zip]
    

    So for XBRL in the 3rd quarter of 2010 we'd do:

    ftp://ftp.sec.gov/edgar/full-index/2010/QTR3/xbrl.gz
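
    The general form above can be turned into a small URL builder. This is only a sketch: the function name and the validation sets are assumptions based on the four index types and two extensions listed here.

```python
VALID_INDEXES = {"company", "form", "master", "xbrl"}
VALID_EXTENSIONS = {"gz", "zip"}


def index_url(year: int, quarter: int, name: str, ext: str = "gz") -> str:
    """Build ftp://ftp.sec.gov/edgar/full-index/{YYYY}/QTR{1-4}/{name}.{ext}."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    if name not in VALID_INDEXES:
        raise ValueError(f"unknown index name: {name!r}")
    if ext not in VALID_EXTENSIONS:
        raise ValueError(f"unknown extension: {ext!r}")
    return f"ftp://ftp.sec.gov/edgar/full-index/{year}/QTR{quarter}/{name}.{ext}"


print(index_url(2010, 3, "xbrl"))
print(index_url(2008, 4, "master"))
```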

    CIK lists and lookup

    There's a full list of all companies along with their CIK code here: http://www.sec.gov/edgar/NYU/cik.coleft.c

    If you want to look up a CIK or company by its ticker you can do the following query against the normal search system:

    http://www.sec.gov/cgi-bin/browse-edgar?CIK=ibm&Find=Search&owner=exclude&action=getcompany&output=atom

    Then parse the Atom feed to grab the CIK. (If you prefer HTML output, just omit output=atom.)
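
    A minimal sketch of that lookup, using only the Python standard library. The query parameters are the ones in the URL above; the structure of the returned feed is an assumption, so the parser simply looks for any element whose local name is "cik", and the sample document is a hand-written stand-in rather than real EDGAR output.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode


def lookup_url(ticker: str) -> str:
    """Build the company search URL with Atom output."""
    params = {
        "CIK": ticker,
        "Find": "Search",
        "owner": "exclude",
        "action": "getcompany",
        "output": "atom",
    }
    return "http://www.sec.gov/cgi-bin/browse-edgar?" + urlencode(params)


def extract_cik(atom_text: str) -> Optional[str]:
    """Return the text of the first element named 'cik', ignoring namespaces."""
    root = ET.fromstring(atom_text)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name.lower() == "cik" and element.text:
            return element.text.strip()
    return None


# Hypothetical feed fragment; the real feed layout may differ.
SAMPLE_ATOM = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<company-info><cik>0000051143</cik></company-info>"
    "</feed>"
)

print(lookup_url("ibm"))
print(extract_cik(SAMPLE_ATOM))
```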

    There is also a full-text company name to CIK lookup here: http://www.sec.gov/edgar/searchedgar/cik.htm (Note this does a POST to a 'text' API at http://www.sec.gov/cgi-bin/cik.pl.c)

    References

    • CorpWatch have an excellent API and DB dump covering a lot of EDGAR info - see the [CorpWatch DataHub Entry]
  2. SEC EDGAR filings API

    • finazon.io
    json
    Updated Sep 29, 2023
    Cite
    Finazon (2023). SEC EDGAR filings API [Dataset]. https://finazon.io/dataset/sec_edgar_filings
    Available download formats: json
    Dataset authored and provided by
    Finazon
    License

    https://finazon.io/assets/files/Finazon_Terms_of_Service.pdf

    Dataset funded by
    Finazon
    Description

    Leveraging the most comprehensive database of the U.S. Securities and Exchange Commission, this dataset offers real-time and historical access to all forms, filings, and exhibits directly from the SEC's EDGAR system. Covering every publicly traded company in the US, the dataset provides essential corporate data ranging from 10-K and 10-Q reports, 8-K filings, insider transactions (Form 4), to beneficial ownership reports (Forms 13D/G), and more. Notably, data is standardized and tagged using XBRL, offering convenient access and easy integration via RESTful API endpoints. The dataset's unique advantage lies in providing direct insights into the financial health, strategic decisions, and operational changes of corporations.

  3. Data from: SEC Filings

    • kaggle.com
    zip
    Updated Jun 5, 2020
    + more versions
    Cite
    Google BigQuery (2020). SEC Filings [Dataset]. https://www.kaggle.com/bigquery/sec-filings
    Available download formats: zip (0 bytes)
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    Description

    In the U.S., public companies, certain insiders, and broker-dealers are required to file regularly with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. For more information please see this site.

    To aid analysis a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that otherwise would have to be joined from multiple tables and enables a more streamlined user experience.

    DISCLAIMER: The Financial Statement and Notes Data Sets contain information derived from structured data filed with the Commission by individual registrants as well as Commission-generated filing identifiers. Because the data sets are derived from information provided by individual registrants, we cannot guarantee the accuracy of the data sets. In addition, it is possible inaccuracies or other errors were introduced into the data sets during the process of extracting the data and compiling the data sets. Finally, the data sets do not reflect all available information, including certain metadata associated with Commission filings. The data sets are intended to assist the public in analyzing data contained in Commission filings; however, they are not a substitute for such filings. Investors should review the full Commission filings before making any investment decision.

  4. financial-reports-sec

    • huggingface.co
    Updated Sep 15, 2023
    Cite
    Aman Khan (2023). financial-reports-sec [Dataset]. https://huggingface.co/datasets/JanosAudran/financial-reports-sec
    Authors
    Aman Khan
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The dataset contains the annual reports of US public firms filing with the SEC EDGAR system. Each annual report (10-K filing) is broken into 20 sections. Each section is split into individual sentences. Sentiment labels are provided on a per-filing basis from the market reaction around the filing date. Additional metadata for each filing is included in the dataset.

  5. Tradefeeds SEC Filings API - historical financial statements and reports

    • datarade.ai
    .json, .csv
    Updated Sep 18, 2023
    Cite
    Tradefeeds (2023). Tradefeeds SEC Filings API - historical financial statements and reports [Dataset]. https://datarade.ai/data-products/tradefeeds-sec-filings-api-historical-finanicial-statements-tradefeeds
    Available download formats: .json, .csv
    Dataset authored and provided by
    Tradefeeds
    Area covered
    United States of America, Lesotho, Aruba, Netherlands, Falkland Islands (Malvinas), Burkina Faso, Yemen, Anguilla, Guernsey, Seychelles
    Description

    Companies’ filings submitted to the EDGAR system are financial statements and reports that receive a particular form name and form description from the U.S. Securities and Exchange Commission. SEC filings are arranged around six key groupings: annual filings (Form 10-K), quarterly filings (Form 10-Q), current reports (Form 8-K), proxy filings (DEF14A), registration statements (Form 424B2), and Section 16 filings (Form 4), among others. These are the filings customers request most often. Customers are also interested in other common types of SEC filings, such as Form 6-K, Forms 3 and 5, Form S-1, Form S-3, Form S-8, Form 20-F, Form 40-F, Schedule 13, and Form 144. SEC filings data of different companies are searchable by stock ticker symbols and form types in the API URL Path field. The SEC Filings API database is set up in such a way that companies’ filings are arranged according to their company names, stock ticker symbols, ISIN and CIK numbers.

  6. Data for: Relational Foundations of an Unequal Consumer Credit Market:...

    • commons.datacite.org
    Updated 2022
    Cite
    Megan Bea (2022). Data for: Relational Foundations of an Unequal Consumer Credit Market: Symbiotic Ties between Banks and Payday Lenders [Dataset]. http://doi.org/10.5064/f6otuzmq
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Authors
    Megan Bea
    Description

    This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the Publisher's website. You will need to use the Chrome or Edge browser with the Hypothesis extension installed to view the ATI annotations.

    Data Generation. My primary data come from archival financial documents available from the Securities and Exchange Commission (SEC). I examine documents from each publicly traded payday lender for the duration of their requirements to report to the SEC (or through 2014 if filing is ongoing). The SEC requires all publicly traded firms to “disclose meaningful financial and other information to the public” as part of its mission to protect investors (see: https://www.sec.gov/Article/whatwedo.html). As part of this, the SEC makes the financial records publicly available on the EDGAR Database (https://www.sec.gov/edgar.shtml), which permits archival searches of registrations, reports, and forms. There are several types of filings that publicly traded firms must provide that are useful for this study and referred to in analysis:

    • S-1: Registration Statement. This is the initial registration of new securities (i.e. stocks) submitted by companies who are planning to go public.
    • 10-K: Annual Report. This is filed 60-90 days after the end of each fiscal year.
    • 10-Q: Quarterly Report. This is filed 40-45 days after the end of each fiscal quarter.
    • 8-K: Current Report. A Current Report can be filed at any time of the year to notify investors of events that may be important for them. This includes entry into or termination of “a material definitive agreement,” including changes to their commercial credit agreement.

    Exhibits are also included with the above filings. Important for this research are the Exhibit 10 documents. Exhibit 10 filings pertain to material contracts, including the contracts for commercial credit agreements between the companies and banks reviewed here.
    In the EDGAR Database, searching by Exhibits is not possible. Instead, researchers must go to the Company Page, select the filing type (S-1, 10-K, 10-Q, 8-K), and then view related exhibits [see Doherty-Bea_Exhibit Search Process.pdf]. This limitation required that I read the filings first to identify when a new or amended/restated credit agreement was arranged between each payday lender and the banks. This was easiest to do by reading all Annual Reports (10-Ks) first. If a commercial credit agreement was amended or restated during the fiscal year, the date of this arrangement would be referenced in the Annual Report. Sometimes the Exhibit was included in the 10-K filing, but if it was not, I could use the date mentioned in the report and then go back to the 10-Q or 8-K nearest to that date to which the agreement exhibit would be attached.

    Data Analysis. Data analysis occurred in several phases and required multiple reads of the archival documents.

    Within-Company Process. First, I systematically read each Registration (S-1) and Annual Report (10-K) filed by each company in chronological order. I started with the S-1, which would be the initial filing made by the company. I then read through each Annual Report in order from earliest to latest filed. As credit agreements were renegotiated over time, I read each connected agreement in order from earliest to latest. I recorded the banks involved in each credit agreement in order to construct the network (more details in next Analytic Note). I identified themes as I read and color coded excerpts as they related to each theme. Some “themes” were classifications that allowed me to construct the figures presented in this paper: e.g., Company Operations was a theme that noted company practices, total number of storefronts, new acquisitions, and shares of revenue from payday lending. I recorded these details in a separate document to later construct Figures 1 and 4.
    The remaining themes corresponded to different motivations for receipt of bank support (on the part of payday lenders) and motivations for financing payday lenders (on the part of the banks). Different colors were assigned to each theme, and potential excerpts were color-coded by theme.

    Across-Company Process. After reading the within-company documents, I then read across companies’ filings to identify relevant passages to see how language and content varied in key passages relating to their operations, finances, and anticipated business risks. This required that I reread the S-1s and 10-Ks several times as I worked to identify motivations.

    Theme Identification. Four of the five thematic areas regarding motivations for financing emerged from the narratives payday lenders used in their annual and quarterly reports. I read with the following questions in mind: "Why do these companies need bank financing? Do these companies differ in their motivations? How so? What happens if they lose financing?" This deep reading resulted in three key themes that are discussed in the Results Section:

    • Importance of Bank Financing for Maintaining Daily Operations
    • Importance of Bank Financing for Storefront Expansion
    • Mitigating Risks of Loss of Bank Financing

    Themes for banks’ decisions to finance payday lenders emerged primarily in the actual credit agreements that were signed by all banks and the payday lender. Similarly, I read with questions in mind: "Why are banks financing payday lenders, a potential competitor?" "What do banks get out of this financial relationship?" The key theme for banks that emerged from the credit agreements was Asymmetric Information. A second theme for banks, Continued Profits from High Interest Lending, was identified in part from my review of research on the history of banks’ failed attempts to engage in more direct means of high interest lending, and via a conceptualization of how this provides revenue for banks (see Figure 3).
    These motivations are also discussed in detail in the results section.

    Logic of Annotation. I primarily used annotations to provide Analytic Notes, Source Excerpts, and Source Links that provided additional information about my analytic process and choice of quotes in the main text. All sources are publicly accessible through the EDGAR database links provided. Page numbers are included in the annotations if they are not already provided in the main text. Bold and Italics are presented as they appear in the financial documents, unless otherwise noted.

    Analytic Notes aim to provide additional information about 1) how I identified and classified eligible banks and payday lenders to use in my analysis; 2) how I constructed the bank-payday lender network using historical data; 3) how I identified information in the financial reports to create the Figures referred to in the analysis; 4) the nature of the financial documents and the content I analyzed; and/or 5) how payday lending companies were similar (or different) in their motivations around the use of bank financing and the terms of their contracts with the banks. Finally, one Analytic Note emphasizes the standard nature of the terms of the credit arrangements between banks and payday lenders (see p. 23).

    Source Excerpts sought to provide additional context for quotes that appeared in the main text. Sometimes Excerpts were paired with Analytic Notes to further elaborate on the context of the statements. In some cases, I did not provide an additional source excerpt because the full scope of the explanatory content was available in the quote provided. There are a few cases where the main text summarizes a passage rather than providing a direct quote; in these cases I elected to provide the full Source Excerpt so that the reader could view the exact language that I drew my summary from (see, e.g., p. 25).

  7. E-NER Dataset

    • paperswithcode.com
    Cite
    Ting Wai Terence Au; Ingemar J. Cox; Vasileios Lampos, E-NER Dataset [Dataset]. https://paperswithcode.com/dataset/e-ner
    Authors
    Ting Wai Terence Au; Ingemar J. Cox; Vasileios Lampos
    Description

    E-NER is a publicly available legal Named Entity Recognition (NER) data set. It contains 52 filings from the US SEC EDGAR database. The named entity tags are hand annotated.

  8. US Company Filings Database

    • lseg.com
    Updated Dec 8, 2023
    Cite
    Refinitiv (2023). US Company Filings Database [Dataset]. https://www.lseg.com/en/data-analytics/financial-data/filings/company-filings-database
    Available download formats: csv, html, json, pdf, python, text, user interface, xml
    Dataset authored and provided by
    Refinitiv (http://www.refinitiv.com/)
    License

    https://www.refinitiv.com/en/policies/terms-of-use

    Description

    Browse LSEG's US Company Filings Database, and find a range of filings content and history including annual reports, municipal bonds, and more.

  9. MarkupMnA

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated Jun 15, 2023
    + more versions
    Cite
    Sukrit Rao; Pranab Islam; Rohith Bollineni; Shaan Khosla; Tingyi Fei; Qian Wu; Kyunghyun Cho; Vladimir Kobzar (2023). MarkupMnA [Dataset]. http://doi.org/10.5281/zenodo.8034853
    Available download formats: zip
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sukrit Rao; Pranab Islam; Rohith Bollineni; Shaan Khosla; Tingyi Fei; Qian Wu; Kyunghyun Cho; Vladimir Kobzar
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The MarkupMnA dataset is a corpus of 151 merger and acquisition agreements with annotated section titles, section numbers, page numbers, and more, based on HTML filings by US public companies retrieved from the SEC EDGAR database. We consider the task of section title annotation as a sequence labeling task, and to that end, use the BEIOS tagging scheme when generating our annotations. There are over 70,000 labels in the entire dataset excluding outside labels and over 465,000 labels including outside labels.

    We add annotations to the contracts in an already widely used dataset, MAUD, which is an expert-annotated reading comprehension dataset. The broad objective of our work is to make progress toward developing computationally efficient hierarchical representations of long documents, specifically for legal contracts. We hope that our annotations can be used in conjunction with MAUD to advance legal NLP research.
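
    To illustrate the BEIOS tagging scheme mentioned above, here is a minimal sketch that turns token spans into tags, reading BEIOS in its usual sense (Begin, End, Inside, Outside, Single). The function, the span convention (end-exclusive token indices), and the sample tokens are assumptions for illustration; the dataset's own label names and tokenization may differ.

```python
from typing import List, Tuple


def beios_tags(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Tag tokens given non-overlapping (start, end) spans, end exclusive."""
    tags = ["O"] * n_tokens  # everything is Outside unless covered by a span
    for start, end in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span: {(start, end)}")
        if end - start == 1:
            tags[start] = "S"  # single-token span
        else:
            tags[start] = "B"
            for i in range(start + 1, end - 1):
                tags[i] = "I"
            tags[end - 1] = "E"
    return tags


# Hypothetical tokens: a one-token title and a three-token title.
tokens = ["Definitions", "As", "used", "herein", "Closing", "Date", "Adjustments"]
print(list(zip(tokens, beios_tags(len(tokens), [(0, 1), (4, 7)]))))
```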

  10. Common Ownership Data: Scraped SEC form 13F filings for 1999-2017

    • dataverse.harvard.edu
    bin, csv +3
    Updated Aug 17, 2020
    Cite
    Harvard Dataverse (2020). Common Ownership Data: Scraped SEC form 13F filings for 1999-2017 [Dataset]. http://doi.org/10.7910/DVN/ZRH3EU
    Available download formats: bin(271859768), tsv(11192545), bin(2934960), txt(3008286), txt(110929), txt(14847), text/x-perl-script(21999), bin(4653090), txt(25964), csv(2363718396), txt(303881), bin(323182551), txt(156950), txt(196510)
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1999 - Dec 31, 2017
    Description

    Introduction

    In the course of researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset used by many researchers and frequently accessed via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers that have uncovered problems, specifically fixing a lack of records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to separately create a dataset of 13(f) holdings from the source documents, which are all public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. However, the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

    The process

    We gathered all 13(f) filings from 1999-2017 here. The corpus is over 318,000 filings and occupies ~25GB of space if unzipped. (We do not include the raw filings here as they can be downloaded from EDGAR.) We wrote code to parse the filings to extract holding information using regular expressions in Perl. Our target list of holdings was all public firms with a market capitalization of at least $10M. From the header of the file, we first extract the filing date, reporting date, and reporting entity (Central Index Key, or CIK, and CIKNAME). Beginning with the September 30 2013 filing date, all filings were in XML format, which made parsing fairly straightforward, as all values are contained in tags. Prior to that date, the filings are remarkable for the heterogeneity in formatting. Several examples are linked to below.
    Our approach was to look for any lines containing a CUSIP code that we were interested in, and then attempt to determine the "number of shares" field and the "value" field. To help validate the values we extracted, we downloaded stock price data from CRSP for the filing date, as that allows for a logic check of (price * shares) = value. We do not claim that this will exhaustively extract all holding information. We can provide examples of filings that are formatted in such a way that we are not able to extract the relevant information. In both XML and non-XML filings, we attempt to remove any derivative holdings by looking for phrases such as OPT, CALL, PUT, WARR, etc. We then perform some final data cleaning: in the case of amended filings, we keep an amended level of holdings if a) the amended report occurred within 90 days of the reporting date and b) the initial filing fails our logic check described above. The resulting dataset has around 48M reported holdings (CIK-CUSIP) for all 76 quarters and between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, there are often errors in the actual source documents as well. However, our method seemed to produce more reliable data in several cases than the TR dataset, as shown in Online Appendix B of the related paper linked above.

    Included Files

    Perl Parsing Code (find_holdings_snp.pl). For reference, only needed if you wish to re-parse original filings.

    Investor holdings for 1999-2017: lightly cleaned. Each CIK-CUSIP-rdate is unique. Over 47M records. The fields are:

    • CIK: the central index key assigned by the SEC for this investor. Mapping to names is available below.
    • CUSIP: the identity of the holdings. Consult the SEC's 13(f) listings to identify your CUSIPs of interest.
    • shares: the number of shares reportedly held.
      Merging in CRSP data on shares outstanding at the CUSIP-Month level allows one to construct \beta. We make no distinction for the sole/shared/none voting discretion fields. If a researcher is interested, we did collect that starting in mid-2013, when filings are in XML format.
    • rdate: reporting date (end of quarter). 8 digit, YYYYMMDD.
    • fdate: filing date. 8 digit, YYYYMMDD.
    • ftype: the form name.

    Notes: we did not consolidate separate BlackRock entities (or any other possibly related entities). If one wants to do so, use the CIK-CIKname mapping file below. We drop any CUSIP-rdate observation where any investor in that CUSIP reports owning greater than 50% of shares outstanding (even though legitimate cases exist - see, for example, Diamond Offshore and Loews Corporation). We also drop any CUSIP-rdate observation where greater than 120% of shares outstanding are reported to be held by 13(f) investors. Cases where the shares held are listed as zero likely mean the investor filing lists a holding for the firm but that our code could not find the number of shares due to the formatting of the file. We leave these in the data so that any researchers that find a zero know to go back to that source filing to manually gather the holdings for the securities they are interested in.

    Processed 13f holdings (airlines.parquet, cereal.parquet, out_scrape.parquet). These are used in our related AEJ:Microeconomics paper. The files contain all firms within the airline industry, RTE cereal industry, and all large cap firms (a superset of the S&P 500) respectively. These are a merged version of the scrape_parsed.csv file described above, that include the shares outstanding and percent ownership used to calculate measures of common ownership. These are distributed as brotli compressed Apache Parquet (binary) files. This preserves date information correctly. The fields are:
    • mgrno: manager number (which is actually CIK in the scraped data)
    • rdate: reporting date
    • ncusip: cusip
    • rrdate: reporting date in Stata format
    • mgrname: manager name
    • shares: shares
    • sole: shares with sole authority
    • shared: shares with shared authority
    • none: shares with no authority
    • isbr/isfi/iss/isba/isvg: indicator flags for BlackRock (isbr), Fidelity (isfi), State Street (iss), Barclays (isba), and Vanguard (isvg)
    • numowners: how many owners
    • prc: price at reporting date
    • shares_out: shares outstanding at reporting date
    • value: reported value in 13(f)
    • beta: shares/shares_out
    • permno: permno

    Profit weight values (i.e. \kappa) for all firms in the sample (public_scrape_kappas_XXXX.parquet). Each file represents one year of data and is around 200MB and distributed as a compressed (brotli) parquet file. Fields are simply CUSIP_FROM, CUSIP_TO, KAPPA, QUARTER. Note that these have not been adjusted for multi-class share firms, insider holdings, etc. If looking at a particular market, some additional data cleaning on the investor holdings (above) followed by recomputing profit weights is recommended. For this, we did merge the separate BlackRock entities prior to computing \kappa.

    CIKmap.csv (~250K observations). Mapping is from CIK-rdate to CIKname. Use this if you want to consolidate holdings across reporting entities or explore the identities of reporting firms. In the case of amended filings that use different names than original ones, we keep the earliest name.

    Example of Parsing Challenge

    Prior to the XML era, filings were far from uniform, which creates a notable challenge for parsing them for holdings. In the examples directory we include several example text files of raw 13f filings. Example 1 is a "well behaved" filing, with CUSIP, followed by value, followed by number of shares, as recommended by the SEC. Example 2 shows a case where the ordering is changed: CUSIP, then shares, then value. The column headers show "item 5" coming before "item 4".
Example 3 shows a case of a fixed width table, which in principle could be parsed very easily using the tags at the top, although not all filings consistently use these tags. Example 4 shows a case with a fixed width table, with no tag for the CUSIP column. Also, notice that if the firm holds more than 10M shares of a firm, that number occupies the entire width of the column and there is no longer a column separator (i.e. Cisco Systems on line 374). Example 5 shows a comma-separated table format. Example 6 shows a case of changing the column ordering, but also adding an (unrequired) column for share price. Example 7 shows a case where the table is split across subsequent pages, and so the CUSIP appears on a different line than the number of shares.
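
    The validation and cleaning rules described above (the price * shares = value logic check, the 50% single-investor cutoff, and the 120% total cutoff) can be sketched in plain Python. This is not the authors' Perl code: the function names, the 10% tolerance, the dict-based record layout, and the sample CUSIP strings are all assumptions for illustration, and value is assumed to be in the same units as price * shares.

```python
from collections import defaultdict


def passes_logic_check(price, shares, value, tolerance=0.10):
    """True if value is within a relative tolerance of price * shares."""
    expected = price * shares
    if expected == 0:
        return value == 0
    return abs(expected - value) / expected <= tolerance


def filter_holdings(holdings, shares_out):
    """Apply the two drop rules per CUSIP-rdate and attach beta.

    holdings: list of dicts with keys cik, cusip, rdate, shares.
    shares_out: dict mapping (cusip, rdate) to shares outstanding.
    """
    by_key = defaultdict(list)
    for h in holdings:
        by_key[(h["cusip"], h["rdate"])].append(h)

    kept = []
    for key, rows in by_key.items():
        betas = [r["shares"] / shares_out[key] for r in rows]
        # Drop if any investor reports > 50%, or all investors sum to > 120%.
        if max(betas) > 0.5 or sum(betas) > 1.2:
            continue
        kept.extend({**r, "beta": b} for r, b in zip(rows, betas))
    return kept


# Hypothetical records; "AAA", "BBB", "CCC" are placeholder CUSIPs.
holdings = [
    {"cik": 1, "cusip": "AAA", "rdate": 20131231, "shares": 300},
    {"cik": 2, "cusip": "AAA", "rdate": 20131231, "shares": 200},
    {"cik": 1, "cusip": "BBB", "rdate": 20131231, "shares": 600},
    {"cik": 1, "cusip": "CCC", "rdate": 20131231, "shares": 450},
    {"cik": 2, "cusip": "CCC", "rdate": 20131231, "shares": 450},
    {"cik": 3, "cusip": "CCC", "rdate": 20131231, "shares": 450},
]
shares_out = {(c, 20131231): 1000 for c in ("AAA", "BBB", "CCC")}
print(filter_holdings(holdings, shares_out))
```

    With these sample records only the two "AAA" rows survive: "BBB" has a single investor above 50%, and "CCC" sums to 135% of shares outstanding.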

  11. Risk Factors | North American Public Companies Risk Data | Datacie

    • datarade.ai
    Updated Mar 22, 2021
    Cite
    Datacie (2021). Risk Factors | North American Public Companies Risk Data | Datacie [Dataset]. https://datarade.ai/data-products/risk-factors-north-american-public-companies-risk-data-datacie-datacie
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset authored and provided by
    Datacie
    Area covered
    United States of America, Canada
    Description

    Datacie’s Risk Factors dataset is a unique text-mined data solution that offers uncharted insights into the risk factors borne by publicly listed companies. Access this unequaled dataset to understand, measure, and make informed decisions about the plenitude of risk factors that surround a business model, a group of companies, or an entire industry from a fresh, qualitative perspective.

    USE CASES & BENEFITS

    Consultants - help your clients realize the competitive advantages of an effective, risk-informed strategy.

    Corporates - obtain an in-depth understanding of the risk factors that your business partners & competitors face and compare your company’s identified risk items to your industry’s standards.

    Hedge Funds - devise winning investment strategies that exploit miscalculated probabilities of a risk occurring or misjudged severity of its consequences.

    Financial product managers - develop investment products that concentrate/exclude distinct risk factors.

    Institutional managers - measure and report on your products’ integrated risk factors and provide confidence to your clients that require greater transparency around risk management.

    Investors - gain full awareness of the risks inherent to your portfolio companies such as client concentration or dependence on a certain product or supplier.

    Risk managers - discover new concentration risk, segment idiosyncratic from widespread risk factors, create risk-mitigation strategies by adjusting your exposure to distinct risk factors and perform comprehensive, cross-sectorial risk analyses using self-reported data.

    METHODOLOGY

    Datacie’s Risk Factors dataset is built from preliminary earnings report (PRELIM), quarterly/annual filings (10-Q/10-K), and from Events or Changes Between Quarterly Reports (8-K) – all the SEC filings being sourced from the SEC EDGAR database.

    As a trusted data partner, Datacie is committed to delivering competitive and error-free data products to its clients. To this end, we are leveraging our proprietary data extraction technology to capture most of the risk items and their summaries automatically from the “Risk Factors” section found in specific SEC filings (see Item 105 of Regulation S-K). By construction, our technology predicts diverse quality indicators for each data point acquired; our risk management experts then review these quality indicators and carry out thorough qualitative investigations of the extracted information, guaranteeing that our risk factor compilations are fully accurate and that no crucial items are missed.

    After inception, the entire dataset is audited and evaluated for trustworthiness. Every data entry passes through hundreds of quality checks that automatically identify outliers and potentially misreported observations. Additional manual checks are performed on low-confidence data points, ensuring that the final data product is free from poor-quality observations.

    STATUS & DELIVERY

    The Risk Factors dataset is currently being compiled and prepared for our clients’ preferred usage; additional coverage (temporal and company-wise) and existing/new columns can be modified/added on demand. Contact us and be among the first to benefit from this data solution to inform your risk-related decisions with the highest quality, cost-effective data of the market.

  12. Historical Financials Data for 3000+ stocks

    • kaggle.com
    zip
    Updated Jul 26, 2020
    Cite
    bot_developer (2020). Historical Financials Data for 3000+ stocks [Dataset]. https://www.kaggle.com/miguelaenlle/parsed-sec-10q-filings-since-2006
    Explore at:
    Available download formats: zip (4553901 bytes)
    Dataset updated
    Jul 26, 2020
    Authors
    bot_developer
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Context

    Getting access to high-quality historical stock market data can be very expensive and/or complicated. Parsing SEC 10-Q filings directly from SEC EDGAR is difficult due to the varying structures of filings, and providers of SEC filing data such as Quandl charge hundreds or thousands of dollars in yearly fees for access. Here, I provide an easy-to-use, straight-from-the-source database of parsed financial information from SEC 10-Q filings for more than 3000 stocks.

    Content

    The quarterly financials are provided in a single .csv file, quarterly_financials.csv. About 50% of the data is NaN, either because the field wasn't detected by my XBRL parsing system or because the field wasn't addressed in the SEC filing.

    Acknowledgements

    All the data is scraped from the SEC from the XBRL files.

  13. 8-K reports database 2015 - 2019 - Dataset - B2FIND

    • b2find.dkrz.de
    Updated Oct 21, 2023
    Cite
    (2023). 8-K reports database 2015 - 2019 - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/3e49535d-2d7f-5368-bc20-aeb7500ed85b
    Explore at:
    Dataset updated
    Oct 21, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We create a dataset which focuses on 8-K reports for the years 2015 - 2019. We restrict ourselves to Standard & Poor's 500 companies. An 8-K is a report of unscheduled material events or corporate changes at a company that could be of importance to the shareholders or the Securities and Exchange Commission (SEC). Also known as a Form 8K, the report notifies the public of events, including acquisitions, bankruptcy, the resignation of directors, or changes in the fiscal year. We have compiled this dataset thanks to the SEC's EDGAR tool. The texts were pre-processed by applying a classical pipeline: removal of non-alphanumeric characters; lemmatisation; removal of rare words and stopwords. The file (K8_data_2015_2019.rds) is a list of two items. The first item contains all information about each 8-K and the extracted texts. The second item is the document-term matrix of the pre-processed texts, with 37238 texts and 70223 words. An example of an 8-K can be found here: https://www.sec.gov/files/form8-k.pdf.
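
    A rough sketch of this kind of pre-processing is given below in Python, using only the standard library. It is not the authors' pipeline (the dataset itself ships as an .rds file): the stopword list, the `min_count` threshold, the function names, and the three sample sentences are all assumptions, and lemmatisation is deliberately left out because it needs an external library. The sketch covers non-alphanumeric removal, stopword and rare-word filtering, and building a document-term matrix.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"the", "a", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase, replace non-alphanumeric characters, drop stopwords."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def document_term_matrix(texts, min_count=2):
    """Return (vocabulary, matrix) with rows = documents, columns = terms.

    Terms seen fewer than min_count times in the whole corpus are
    treated as rare words and dropped.
    """
    docs = [tokenize(text) for text in texts]
    totals = Counter(tok for doc in docs for tok in doc)
    vocab = sorted(tok for tok, n in totals.items() if n >= min_count)
    column = {tok: j for j, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for tok in doc:
            if tok in column:
                matrix[i][column[tok]] += 1
    return vocab, matrix


# Made-up sentences, only to exercise the functions.
texts = [
    "The director resigned.",
    "Resignation of a director; acquisition announced.",
    "Acquisition completed in 2019!",
]
vocab, matrix = document_term_matrix(texts)
```

    Without lemmatisation, "resigned" and "resignation" stay separate terms and are both dropped as rare here, which is exactly the effect a lemmatisation step is meant to reduce.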

  14. A database for blockholders in US-listed firms including all Form 13D and...

    • dataverse.harvard.edu
    rar
    Updated Aug 17, 2021
    Cite
    Harvard Dataverse (2021). A database for blockholders in US-listed firms including all Form 13D and Form 13G filings. [Dataset]. http://doi.org/10.7910/DVN/61Z64Q
    Explore at:
    Available download formats: rar (75655291)
    Dataset updated
    Aug 17, 2021
    Dataset provided by
    Harvard Dataverse
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0) (https://creativecommons.org/licenses/by-nc/3.0/)
    License information was derived automatically

    Description

    The Dataset contains structured, parsed data from 758,666 Form 13D and Form 13G blockholder filings from November 1993 to May 2021, downloaded from SEC Edgar. The data is made available as a RAR-compressed CSV file, in which each row represents a single filing and the 76 columns contain parsed information for each filing. Please see the accompanying paper "Determinants of Blockholdership - A new Dataset for Blockholder Analysis" for more information and cite it when using the data for your research. As stated by the SEC, "Information presented on www.sec.gov is considered public information and may be copied or further distributed by users of the web site without the SEC’s permission." (see https://www.sec.gov/privacy.htm#dissemination).

  15. Firm Database of Emerging Growth Initial Public Offerings (IPOs), 1990-2010...

    • search.gesis.org
    Updated May 1, 2021
    Cite
    Kenney, Martin; Patton, Donald (2021). Firm Database of Emerging Growth Initial Public Offerings (IPOs), 1990-2010 - Version 1 [Dataset]. http://doi.org/10.3886/ICPSR34944.v1
    Explore at:
    Dataset updated
    May 1, 2021
    Dataset provided by
    GESIS search
    ICPSR - Interuniversity Consortium for Political and Social Research
    Authors
    Kenney, Martin; Patton, Donald
    Description

    Abstract (en): This database comprises all emerging growth initial public offerings (IPOs) on American stock exchanges filed with the Securities and Exchange Commission (SEC) from January 1990 through December 2010. The emerging growth status of each firm was established through examination of the prospectus, specifically the prospectus summary, where the firm describes its activities, history, and business. The data were constructed directly from registration statements and prospectuses filed with the SEC and contain variables that pertain to the firm going public and the offering itself. Documents used to collect this data were found on the SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR) Web site. Information regarding IPO registration statements and prospectuses filed from January 1990 through May 1996 was obtained from the Stanford Graduate School Library in either PDF or TIFF format. The variables were extracted from each firm's prospectus (form 424B) or the firm's registration statement (form S-1). The collection does not contain weight variables.

    ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: checked for undocumented or out-of-range codes.

    All emerging growth initial public offerings (IPOs) on American stock exchanges and filed with the Securities and Exchange Commission (SEC) from January 1990 through December 2010. Smallest Geographic Unit: City. Funding institution(s): National Science Foundation. Science of Science Policy Program (NSF 0915257). National Science Foundation. Geography and Regional Sciences (NSF 0647838).

    Multiple string responses for the Company_Auditor variable contain diacritical marks. Subsequently, this variable was exported to a tab-delimited file (qda34944-0001_company_audtior.txt), which is available for download and can be linked to the dataset via the CASEID variable.

  16. Wells Fargo 8-K reports database 2015 - 2019 - Dataset - B2FIND

    • b2find.dkrz.de
    Updated May 9, 2023
    Cite
    (2023). Wells Fargo 8-K reports database 2015 - 2019 - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/60ccf919-becf-58bd-bc45-5e47c425b16c
    Explore at:
    Dataset updated
    May 9, 2023
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    We create a dataset which focuses on 8-K reports for the years 2015 - 2019 for Wells Fargo. An 8-K is a report of unscheduled material events or corporate changes at a company that could be of importance to the shareholders or the Securities and Exchange Commission (SEC). Also known as a Form 8K, the report notifies the public of events, including acquisitions, bankruptcy, the resignation of directors, or changes in the fiscal year. We have compiled this dataset thanks to the SEC's EDGAR tool. The texts were pre-processed by applying a classical pipeline: removal of non-alphanumeric characters; lemmatisation; removal of rare words and stopwords. We obtain a dictionary of 4377 distinct roots for the whole corpus. This company published 672 reports from 2015 to 2019. The file is a list of two items. The first item contains all information about each 8-K and the extracted texts. The second item is the document-term matrix of the pre-processed texts, with 672 texts and 4377 words. An example of an 8-K can be found here: https://www.sec.gov/files/form8-k.pdf.

  17. Board Leadership Database (U.S. Public Firms) + ML Script for Scaling Human...

    • zenodo.org
    bin, csv +1
    Updated Jun 26, 2023
    Cite
    Joseph S Harrison; Matthew Josefy; Matias Kalm; Ryan Krause (2023). Board Leadership Database (U.S. Public Firms) + ML Script for Scaling Human Coded Data [Dataset]. http://doi.org/10.5281/zenodo.7304697
    Explore at:
    Available download formats: csv, text/x-python, bin
    Dataset updated
    Jun 26, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Joseph S Harrison; Matthew Josefy; Matias Kalm; Ryan Krause
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Files include: (1) an open sourced database of CEO duality and board chair orientations developed by scaling human coded data using supervised machine learning techniques (in both .dta and .csv formats), as well as (2) the accompanying training and scoring scripts to scale human coded data.

    Users may apply the scoring script to score the same variables from company proxy statements, or may adapt the training/scoring scripts and retrain models to scale human coded data of other constructs or measures.

    We note that early steps in the process of developing our database and script required web-scraping of company filings from SEC EDGAR and text extraction from the collected filings. We relied on other publicly available scripts to develop our own fetcher and extraction scripts. Users seeking to duplicate those parts of the process may benefit from the following resources from Kai Chen and pypi.org:

    For resources from Kai Chen: see https://www.kaichen.work/?p=681 and https://www.kaichen.work/?p=946

    For resources from pypi.org, see sec-edgar-downloader and sec-api

  18. DEFT Corpus

    • opendatalab.com
    zip
    Updated Mar 24, 2023
    Cite
    Adobe Research (2023). DEFT Corpus [Dataset]. https://opendatalab.com/OpenDataLab/DEFT_Corpus
    Explore at:
    Available download formats: zip (54370260 bytes)
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    University of California
    Adobe
    Adobe Research
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    The DEFT corpus consists of annotated content from two different data sources: 1) 2,443 sentences (5,324,430 tokens) from various 2017 SEC contract filings from the publicly available US Securities and Exchange Commission (SEC) EDGAR database, and 2) 21,303 sentences (409,253 tokens) from the https://cnx.org/ open source textbooks (by various authors, licensed under CC BY 4.0), covering topics in biology, history, physics, psychology, economics, sociology, and government. 22% of SEC sentences contain definitions and 28% of textbook sentences contain definitions. Our entire corpus, including both datasets, is significantly larger and more complex than any existing definition extraction dataset (see Table 1).


Cite
Economics Datasets (2014). EDGAR Database of SEC Filings [Dataset]. https://data.wu.ac.at/odso/datahub_io/ZWIwNTI0NTMtNzFmNS00NWNhLWEyMDQtODBjZmEyYWE4Yzg0

EDGAR Database of SEC Filings

Explore at:
10 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Feb 1, 2014
Dataset provided by
Economics Datasets
Description

Securities and Exchange Commission (SEC) EDGAR database which contains regulatory filings from publicly-traded US corporations.

All companies, foreign and domestic, are required to file registration statements, periodic reports, and other forms electronically through EDGAR. Anyone can access and download this information for free. Here you'll find links to a complete list of filings available through EDGAR and instructions for searching the EDGAR database.

Human Interface

See http://www.sec.gov/edgar/searchedgar/companysearch.html

Bulk Data

EDGAR provides bulk access via FTP: ftp://ftp.sec.gov/ (official documentation: https://www.sec.gov/edgar/searchedgar/ftpusers.htm). We summarize the main points here.

Each company in EDGAR gets an identifier known as the CIK, which is a 10-digit number. You can find the CIK by searching EDGAR using a company name or stock market ticker.

For example, searching for IBM by ticker (http://www.sec.gov/cgi-bin/browse-edgar?CIK=ibm) shows us that the CIK is 0000051143.

Note that leading zeroes are often omitted (e.g. in the FTP access), so this would become 51143.

Next each submission receives an 'Accession Number' (acc-no). For example, IBM's quarterly financial filing (form 10-Q) in October 2013 had accession number: 0000051143-13-000007.

FTP File Paths

Given a company with CIK (company ID) XXX (omitting leading zeroes) and document accession number YYY (acc-no on search results), file paths are of the form:

/edgar/data/XXX/YYY.txt

For example, for the IBM data above it would be:

ftp://ftp.sec.gov/edgar/data/51143/0000051143-13-000007.txt

Note, if you are looking for a nice HTML version, you can find it in the Archives section with a similar URL (add a directory named after the accession number with the dashes removed, and append -index.htm):

http://www.sec.gov/Archives/edgar/data/51143/000005114313000007/0000051143-13-000007-index.htm
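
The path rules above can be written down as a short Python sketch. The function names are made up for illustration; the patterns are simply the ones shown in the IBM examples, and nothing here checks that the resulting location actually exists.

```python
def strip_cik(cik):
    """Drop leading zeroes from a CIK given as a string or int."""
    return str(int(cik))


def filing_txt_path(cik, acc_no):
    """Path of the raw filing text: /edgar/data/XXX/YYY.txt"""
    return "/edgar/data/{}/{}.txt".format(strip_cik(cik), acc_no)


def filing_index_url(cik, acc_no):
    """URL of the HTML index page in the Archives section.

    The extra directory is the accession number without dashes.
    """
    return "http://www.sec.gov/Archives/edgar/data/{}/{}/{}-index.htm".format(
        strip_cik(cik), acc_no.replace("-", ""), acc_no
    )


# The IBM 10-Q from the text above.
ibm_path = filing_txt_path("0000051143", "0000051143-13-000007")
ibm_index = filing_index_url("0000051143", "0000051143-13-000007")
```

Here `ibm_path` is the path part of the FTP example, and `ibm_index` reproduces the Archives URL.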

Indices

If you want to get a list of all filings you'll want to grab an Index. As the help page explains:

The EDGAR indices are a helpful resource for FTP retrieval, listing the following information for each filing: Company Name, Form Type, CIK, Date Filed, and File Name (including folder path).

Four types of indexes are available:

  • company — sorted by company name
  • form — sorted by form type
  • master — sorted by CIK number
  • XBRL — list of submissions containing XBRL financial files, sorted by CIK number; these include Voluntary Filer Program submissions

URLs are like:

ftp://ftp.sec.gov/edgar/full-index/2008/QTR4/master.gz

That is, they have the following general form:

ftp://ftp.sec.gov/edgar/full-index/{YYYY}/QTR{1-4}/{index-name}.[gz|zip]

So for XBRL in the 3rd quarter of 2010 we'd do:

ftp://ftp.sec.gov/edgar/full-index/2010/QTR3/xbrl.gz
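
As a sketch, the general form can be turned into a small Python helper. The function name and the validation are assumptions added for illustration; the lowercase index names follow the two example URLs above.

```python
INDEX_NAMES = ("company", "form", "master", "xbrl")
EXTENSIONS = ("gz", "zip")


def index_url(year, quarter, name, ext="gz"):
    """Build a quarterly full-index URL from the general form above."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    if name not in INDEX_NAMES:
        raise ValueError("unknown index name: {}".format(name))
    if ext not in EXTENSIONS:
        raise ValueError("extension must be gz or zip")
    return "ftp://ftp.sec.gov/edgar/full-index/{}/QTR{}/{}.{}".format(
        year, quarter, name, ext
    )


# The two examples from the text.
master_2008_q4 = index_url(2008, 4, "master")
xbrl_2010_q3 = index_url(2010, 3, "xbrl")
```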

CIK lists and lookup

There's a full list of all companies along with their CIK code here: http://www.sec.gov/edgar/NYU/cik.coleft.c

If you want to look up a CIK or company by its ticker you can do the following query against the normal search system:

http://www.sec.gov/cgi-bin/browse-edgar?CIK=ibm&Find=Search&owner=exclude&action=getcompany&output=atom

Then parse the atom to grab the CIK. (If you prefer HTML output just omit output=atom).
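
A minimal Python sketch for building that lookup query is shown below. The function name is hypothetical, and the parameters are copied from the example URL rather than from any API documentation. Parsing the returned Atom feed is left out because its structure is not described here.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.sec.gov/cgi-bin/browse-edgar"


def company_lookup_url(ticker, atom=True):
    """Build the company lookup query; drop output=atom for HTML."""
    params = [
        ("CIK", ticker),
        ("Find", "Search"),
        ("owner", "exclude"),
        ("action", "getcompany"),
    ]
    if atom:
        params.append(("output", "atom"))
    return BASE_URL + "?" + urlencode(params)


atom_url = company_lookup_url("ibm")
html_url = company_lookup_url("ibm", atom=False)
```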

There is also a full-text company name to CIK lookup here: http://www.sec.gov/edgar/searchedgar/cik.htm (Note this does a POST to a 'text' API at http://www.sec.gov/cgi-bin/cik.pl.c)

References

  • CorpWatch have an excellent API and DB dump covering a lot of EDGAR info - see the [CorpWatch DataHub Entry]