3 datasets found
  1. EDGAR-CORPUS

    • data.niaid.nih.gov
    • explore.openaire.eu
    Updated Oct 15, 2021
    Lefteris Loukas (2021). EDGAR-CORPUS [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5570566
    Dataset updated
    Oct 15, 2021
    Dataset provided by
    Ion Androutsopoulos
    Manos Fergadiotis
    Lefteris Loukas
    Prodromos Malakasiotis
    Description

    EDGAR-CORPUS: Billions of Tokens Make The World Go Round

    In the Proceedings of the Workshop on Economics and Natural Language Processing (ECONLP) - co-located with EMNLP 2021

    We release EDGAR-CORPUS, a novel corpus comprising annual reports from all the publicly traded companies in the US spanning a period of more than 25 years.

    All the reports are downloaded, split into their corresponding items (sections), and provided in a clean, easy-to-use JSON format.

  2. Sec Financial Statement Data in Json

    • kaggle.com
    Updated Apr 13, 2025
    Angular2guy (2025). Sec Financial Statement Data in Json [Dataset]. https://www.kaggle.com/datasets/wbqrmgmcia7lhhq/sec-financial-statement-data-in-json/versions/13
    Dataset updated
    Apr 13, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Angular2guy
    License

    https://www.usa.gov/government-works/

    Description

    Data from 2010 Q1 to 2025 Q1

    The data is created with this Jupyter Notebook:

    The data format is documented in the Readme. The Sec data documentation can be found here.

    Json structure: {"quarter": "Q1", "country": "Italy", "data": {"cf": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}], "bs": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}], "ic": [{"value": 0, "concept": "A", "unit": "USD", "label": "B", "info": "C"}]}, "year": 0, "name": "B", "startDate": "2009-12-31", "endDate": "2010-12-30", "symbol": "GM", "city": "York"}

    An example Json: {"year": 2023, "data": {"cf": [{"value": -1834000000, "concept": "NetCashProvidedByUsedInFinancingActivities", "unit": "USD", "label": "Amount of cash inflow (outflow) from financing … Amount of cash inflow (outflow) from financing …", "info": "Net cash used in financing activities"}], "ic":[{"value": 1000000, "concept": "IncreaseDecreaseInDueFromRelatedParties", "unit": "USD", "label": "The increase (decrease) during the reporting pe… The increase (decrease) during the reporting pe…", "info": "Receivables from related parties"}], "bs": [{"value": 2779000000, "concept": "AccountsPayableCurrent", "unit": "USD", "label": "Carrying value as of the balance sheet date of … Carrying value as of the balance sheet date of …", "info": "Accounts payable"}]}, "quarter": "Q2", "city": "SANTA CLARA", "startDate": "2023-06-30", "name": "ADVANCED MICRO DEVICES INC", "endDate": "2023-09-29", "country": "US", "symbol": "AMD"}
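    The record layout above can be read with a few lines of Python. The following is a minimal sketch rather than the dataset author's own tooling: it parses an abridged copy of the example record (labels and some fields omitted for brevity) and flattens the "data" object into one row per reported value. The helper name flatten_record is hypothetical, and the reading of "cf", "bs" and "ic" as cash flow, balance sheet and income statement is an assumption that the description does not confirm.

```python
import json

# Abridged copy of the example record from the description above.
# The long "label" strings and a few top-level fields are left out for brevity.
record_text = """
{"year": 2023,
 "data": {
   "cf": [{"value": -1834000000,
           "concept": "NetCashProvidedByUsedInFinancingActivities",
           "unit": "USD",
           "info": "Net cash used in financing activities"}],
   "ic": [{"value": 1000000,
           "concept": "IncreaseDecreaseInDueFromRelatedParties",
           "unit": "USD",
           "info": "Receivables from related parties"}],
   "bs": [{"value": 2779000000,
           "concept": "AccountsPayableCurrent",
           "unit": "USD",
           "info": "Accounts payable"}]},
 "quarter": "Q2", "symbol": "AMD", "name": "ADVANCED MICRO DEVICES INC"}
"""


def flatten_record(record):
    """Turn one record into flat rows, one per reported value.

    Hypothetical helper. The keys of record["data"] ("cf", "bs", "ic") are
    assumed to name the statement each entry belongs to.
    """
    rows = []
    for statement, entries in record["data"].items():
        for entry in entries:
            rows.append({
                "symbol": record["symbol"],
                "year": record["year"],
                "quarter": record["quarter"],
                "statement": statement,
                "concept": entry["concept"],
                "value": entry["value"],
                "unit": entry["unit"],
            })
    return rows


record = json.loads(record_text)
rows = flatten_record(record)
for row in rows:
    print(row["statement"], row["concept"], row["value"], row["unit"])
```

    Flat rows of this kind are easier to filter by concept or to load into a table than the nested form, which is the only reason for the reshaping here.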

  3. SEC data 10Q and 10K

    • kaggle.com
    Updated Oct 9, 2024
    Malik_1641 (2024). SEC data 10Q and 10K [Dataset]. https://www.kaggle.com/datasets/malik1641/sec-data-with-market-cap-added/discussion
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Malik_1641
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This data consists of 10-Q and 10-K reports downloaded as JSON files, which I then converted to Parquet files for efficiency. Every data frame has a market cap column that is a measure of the market cap on that day. You can always get the data from the SEC website; the latest update for the data should be here: https://www.sec.gov/Archives/edgar/daily-index/xbrl/companyfacts.zip A.csv is just an example of what the rest of the data looks like. Enjoy it if you can.
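
    For readers who start from the zip archive rather than the Parquet files, one possible first step is to read the JSON members of the archive without unpacking it to disk. The sketch below uses only the Python standard library and a tiny in-memory archive built for illustration; the member names and their contents are made up and say nothing about the real layout of companyfacts.zip. The helper name iter_json_members is hypothetical, and the conversion to Parquet is not shown.

```python
import io
import json
import zipfile


def iter_json_members(archive):
    """Yield (member_name, parsed_json) for each .json file in a zip archive.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                with zf.open(name) as fh:
                    yield name, json.load(fh)


# Build a small in-memory archive so the sketch runs without a download.
# Both member names and the JSON payload are placeholders, not SEC data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example_company.json", json.dumps({"name": "EXAMPLE CO", "facts": {}}))
    zf.writestr("readme.txt", "not JSON, skipped")
buffer.seek(0)

loaded = dict(iter_json_members(buffer))
print(sorted(loaded))
```

    Passing a local path to the downloaded archive instead of the in-memory buffer would iterate over its members in the same way.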

