100+ datasets found
  1. MIMIC-IV ICD-10 Dataset

    • paperswithcode.com
    Updated Apr 20, 2023
    + more versions
    Cite
    Joakim Edin; Alexander Junge; Jakob D. Havtorn; Lasse Borgholt; Maria Maistro; Tuukka Ruotsalo; Lars Maaløe (2023). MIMIC-IV ICD-10 Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-icd-10
    Dataset updated
    Apr 20, 2023
    Authors
    Joakim Edin; Alexander Junge; Jakob D. Havtorn; Lasse Borgholt; Maria Maistro; Tuukka Ruotsalo; Lars Maaløe
    Description

    MIMIC-IV ICD-10 contains 122,279 discharge summaries (free-text medical documents) annotated with ICD-10 diagnosis and procedure codes. It contains data for patients admitted to the Beth Israel Deaconess Medical Center emergency department or ICU between 2008 and 2019. All codes with fewer than ten examples have been removed, and the train-val-test split was created using multi-label stratified sampling. The dataset is described further in Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study, and the code to use the dataset is found here.

    The dataset is intended for medical code prediction and was created using MIMIC-IV v2.2 and MIMIC-IV-NOTE v2.2. Using the two source datasets requires a license from PhysioNet; obtaining it can take a couple of days.
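The rare-code filter mentioned in the description (codes with fewer than ten examples removed) can be illustrated with a minimal sketch. This is not the authors' preprocessing code: the function name `drop_rare_codes`, the `(text, codes)` record shape, and the toy data are all assumptions made for illustration, and the sketch keeps documents even if they end up with no codes, which the source does not specify.

```python
from collections import Counter


def drop_rare_codes(records, min_count=10):
    """Remove codes that label fewer than min_count documents.

    records: list of (text, codes) pairs (hypothetical shape).
    """
    # Count each code once per document, even if it is listed twice.
    counts = Counter(code for _, codes in records for code in set(codes))
    keep = {code for code, n in counts.items() if n >= min_count}
    return [(text, [c for c in codes if c in keep]) for text, codes in records]


# Toy data (invented): "I10" labels ten documents, "Z99.89" only one.
toy = [(f"note {i}", ["I10"]) for i in range(9)] + [("note 9", ["I10", "Z99.89"])]
filtered = drop_rare_codes(toy)
```

With the toy data, the single-occurrence code is dropped and every document keeps only the frequent code.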

  2. synth-ehr-icd10-alpaca-format

    • huggingface.co
    Updated Jun 26, 2024
    Cite
    Generative Technologies, Inc (2024). synth-ehr-icd10-alpaca-format [Dataset]. https://huggingface.co/datasets/generative-technologies/synth-ehr-icd10-alpaca-format
    Dataset updated
    Jun 26, 2024
    Dataset authored and provided by
    Generative Technologies, Inc
    Description

    generative-technologies/synth-ehr-icd10-alpaca-format dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. ICD-10 CM Diagnosis Codes 2020

    • johnsnowlabs.com
    csv
    + more versions
    Cite
    John Snow Labs, ICD-10 CM Diagnosis Codes 2020 [Dataset]. https://www.johnsnowlabs.com/marketplace/icd-10-cm-diagnosis-codes-2020/
    Available download formats: csv
    Dataset authored and provided by
    John Snow Labs
    Time period covered
    Oct 1, 2019 - Nov 5, 2019
    Area covered
    United States
    Description

    This dataset contains the International Classification of Diseases, Clinical Modification, 10th Edition (ICD-10-CM) 2020 files that contain information on the new diagnosis coding system, ICD-10-CM, that is a replacement for ICD-9-CM, Volumes 1 and 2. These 2020 ICD-10-CM codes are to be used for services provided from October 1, 2019 through September 30, 2020.

  4. ICD-10 CM Diagnosis Codes 2023

    • johnsnowlabs.com
    csv
    Cite
    John Snow Labs, ICD-10 CM Diagnosis Codes 2023 [Dataset]. https://www.johnsnowlabs.com/marketplace/icd-10-cm-diagnosis-codes-2023/
    Available download formats: csv
    Dataset authored and provided by
    John Snow Labs
    Time period covered
    Oct 1, 2022 - Sep 30, 2023
    Area covered
    United States
    Description

    This dataset contains the International Classification of Diseases, Clinical Modification, 10th Edition (ICD-10-CM) 2023 files that contain information on the new diagnosis coding system, ICD-10-CM, that is a replacement for ICD-9-CM, Volumes 1 and 2. These 2023 ICD-10-CM codes are to be used for discharges occurring from October 1, 2022 through September 30, 2023 and for patient encounters occurring from October 1, 2022 through September 30, 2023.

  5. ICD-10 CM Diagnosis Codes 2016

    • johnsnowlabs.com
    csv
    Cite
    John Snow Labs, ICD-10 CM Diagnosis Codes 2016 [Dataset]. https://www.johnsnowlabs.com/marketplace/icd-10-cm-diagnosis-codes-2016/
    Available download formats: csv
    Dataset authored and provided by
    John Snow Labs
    Time period covered
    Oct 1, 2015 - Sep 30, 2016
    Area covered
    United States
    Description

    This dataset contains the International Classification of Diseases, Clinical Modification, 10th Edition (ICD-10-CM) 2016 files that contain information on the new diagnosis coding system, ICD-10-CM, that is a replacement for ICD-9-CM, Volumes 1 and 2. These 2016 ICD-10-CM codes are to be used for services provided from October 1, 2015 through September 30, 2016.

  6. ICD10

    • huggingface.co
    Updated Sep 26, 2024
    Cite
    Moudather Chelbi (2024). ICD10 [Dataset]. https://huggingface.co/datasets/chemouda/ICD10
    Dataset updated
    Sep 26, 2024
    Authors
    Moudather Chelbi
    Description

    chemouda/ICD10 dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. ICD9 and ICD10 Comorbid Diagnosis for High Risk Veteran Patients

    • catalog.data.gov
    • data.va.gov
    • +1more
    Updated Feb 28, 2024
    Cite
    Department of Veterans Affairs (2024). ICD9 and ICD10 Comorbid Diagnosis for High Risk Veteran Patients [Dataset]. https://catalog.data.gov/dataset/icd9-and-icd10-comorbid-diagnosis-for-high-risk-veteran-patients
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    United States Department of Veterans Affairs (http://va.gov/)
    Description

    There are 2 datasets of high-risk patient populations; one from calendar year 2014 (N1 = 937,407), for which we used International Classification of Disease Version 9 (ICD9) codes to identify comorbid conditions, and a second, more recent population selected from June 2017 to June 2018 (N2 = 979,607) for use with the newer International Classification of Disease Version 10 (ICD10) codes. DOI: 10.1109/JBHI.2019.2948734

  8. ICD-10

    • catalog.data.gov
    Updated Jun 29, 2025
    Cite
    Unspecified (2025). ICD-10 [Dataset]. https://catalog.data.gov/dataset/icd-10-3bb0c
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    Unspecified
    Description

    The International Classification of Diseases 10th Revision is a medical classification list by the WHO for coding various diseases and conditions

  9. Data from: DSM5-ICD10

    • catalog.data.gov
    Updated Jun 15, 2025
    Cite
    Unspecified (2025). DSM5-ICD10 [Dataset]. https://catalog.data.gov/dataset/dsm5-icd10
    Dataset updated
    Jun 15, 2025
    Dataset provided by
    Unspecified
    Description

    Ontology for use in Phenotyping Natural Language Processing (NLP)

  10. Data from: DSM5-ICD10

    • catalog.data.gov
    Updated Jun 29, 2025
    Cite
    Unspecified (2025). DSM5-ICD10 [Dataset]. https://catalog.data.gov/dataset/dsm5-icd10-bb886
    Dataset updated
    Jun 29, 2025
    Dataset provided by
    Unspecified
    Description

    The Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition, is used by clinicians to diagnose mental disorders

  11. CodiEsp-abstracs: Abstracts from Lilacs and Ibecs with ICD10 codes

    • data.niaid.nih.gov
    Updated May 7, 2021
    Cite
    Rana, Ankush (2021). CodiEsp-abstracs: Abstracts from Lilacs and Ibecs with ICD10 codes [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3606625
    Dataset updated
    May 7, 2021
    Dataset provided by
    Krallinger, Martin
    Rana, Ankush
    Miranda-Escalada, Antonio
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    JSON file with abstracts from Lilacs and Ibecs and the ICD10 codes (ICD10-CM and ICD10-PCS) associated with them (CIE10 in Spanish).

    Please, cite us:

    Miranda-Escalada, A., Gonzalez-Agirre, A., Armengol-Estapé, J., Krallinger, M.: Overview of automatic clinical coding: annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of eHealth CLEF 2020. In: CLEF (Working Notes) (2020)

    @inproceedings{miranda2020overview, title={Overview of automatic clinical coding: annotations, guidelines, and solutions for non-english clinical cases at codiesp track of CLEF eHealth 2020}, author={Miranda-Escalada, Antonio and Gonzalez-Agirre, Aitor and Armengol-Estap{\'e}, Jordi and Krallinger, Martin}, booktitle={Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings}, year={2020} }

    The Lilacs and Ibecs databases have MeSH terms describing some of their documents. Using the UMLS Metathesaurus, those MeSH terms have been translated into ICD10 codes (ICD10-CM and ICD10-PCS). Every abstract has at least one ICD10 code.

    In addition, MeSH codes given by the databases (Lilacs and Ibecs) have a "word" describing them. These "words" have been used to add further ICD10 codes. We have done strict string matching to find whether those "words" were a descriptor of any ICD10 code (in the Spanish version, CIE10).

    The format of the JSON file is the following:

    {'articles': [{'title': 'title', 'pmid': 'pmid', 'abstractText': 'abstract (in Spanish)', 'Mesh': [{'Code': 'MeSHCode', 'Word': 'reference', 'CIE': [CIE10_1, CIE10_2, ...]}, ...] }, ...] }
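A minimal sketch of reading that structure, assuming the file is standard JSON whose keys match the layout shown above and that the CIE10 entries are strings. The function name `codes_by_pmid` and all sample values (pmid, MeSH codes, CIE10 codes) are invented placeholders, not data from the corpus.

```python
import json

# Invented sample that mirrors the documented layout.
sample = json.loads("""
{"articles": [
  {"title": "title", "pmid": "1", "abstractText": "texto del resumen",
   "Mesh": [{"Code": "D000001", "Word": "reference", "CIE": ["a00", "b01"]},
            {"Code": "D000002", "Word": "other", "CIE": ["a00"]}]}
]}
""")


def codes_by_pmid(data):
    """Map each pmid to the sorted, de-duplicated CIE10 codes of its MeSH entries."""
    result = {}
    for article in data["articles"]:
        codes = set()
        for mesh in article.get("Mesh", []):
            codes.update(mesh.get("CIE", []))
        result[article["pmid"]] = sorted(codes)
    return result


mapping = codes_by_pmid(sample)
```

Collecting codes into a set first avoids counting a code twice when several MeSH terms of one abstract map to it.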

    Additionally, the compressed file includes a folder with all the abstracts extracted in individual UTF-8 encoded text files and a tab-separated file with 4 fields:

    pmid label cie10-code word

    Summary statistics:

    number of abstracts: 355 840

    number of abstracts with at least one ICD10 code: 176 294

    Percentage of MeSH codes mapped to ICD10: 10.6% (there were 2 526 772 MeSH codes and 266 949 mapped to ICD10)

    average number of MeSH codes per article: 7.1

    average number of ICD10 codes per article: 2.5

    number of ICD10 codes that have an associated MeSH code in UMLS: 3293

    number of ICD10 codes that have an associated MeSH code in UMLS and appear in this dataset: 3082

  12. ICD-10

    • catalog.data.gov
    Updated Jun 15, 2025
    Cite
    Unspecified (2025). ICD-10 [Dataset]. https://catalog.data.gov/dataset/icd-10
    Dataset updated
    Jun 15, 2025
    Dataset provided by
    Unspecified
    Description

    Ontology for use in Phenotyping Natural Language Processing (NLP)

  13. MeSDiCon subset for CodiEsp: MESH terms in MeSDiCon mapped to ICD10 CM and...

    • live.european-language-grid.eu
    tsv
    Updated Dec 11, 2023
    + more versions
    Cite
    (2023). MeSDiCon subset for CodiEsp: MESH terms in MeSDiCon mapped to ICD10 CM and ICD10 PCS [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/7567
    Available download formats: tsv
    Dataset updated
    Dec 11, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MeSDiCon consists of a list or gazetteer of candidate names of diseases and symptoms mentioned in Spanish clinical texts. Thus MeSDiCon serves as a lexical resource or dictionary for automatic detection of disease/symptom mentions, as well as indexing or classification of medical texts with such concept types. Terms in MeSDiCon were mapped to MESH terminology.

    In this subset, we have mapped MESH codes to ICD10-CM and ICD10-PCS through the UMLS Metathesaurus. As a result, this resource contains disease and symptom terms from Spanish clinical texts mapped to MESH and ICD10.

    File structure: TSV. Data is separated by tabs (\t). Every row of the file has the following fields: terminology identifier translatedTerm termCount documentCount ICD10CM-code ICD10PCS-code

    In case one MESH term is mapped to more than one ICD10 code, they are separated by commas.
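A rough sketch of parsing such a file, under several assumptions that the description does not confirm: the field list is read as seven tab-separated columns, the file has no header row, and an unmapped term has an empty code cell. The names `FIELDS` and `parse_rows` and the sample row are hypothetical.

```python
import csv
import io

# Assumed column order, read off the field list in the description.
FIELDS = ["terminology", "identifier", "translatedTerm", "termCount",
          "documentCount", "ICD10CM-code", "ICD10PCS-code"]


def parse_rows(tsv_text):
    """Parse tab-separated rows and split comma-separated ICD10 code cells into lists."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for raw in reader:
        row = dict(zip(FIELDS, raw))
        for key in ("ICD10CM-code", "ICD10PCS-code"):
            cell = row.get(key, "")
            row[key] = [code.strip() for code in cell.split(",") if code.strip()]
        rows.append(row)
    return rows


# Invented sample row: two ICD10-CM codes, no ICD10-PCS code.
sample = "MESH\tD000001\tejemplo\t3\t2\tA00,A01\t\n"
rows = parse_rows(sample)
```

Splitting on commas only after the tab split keeps multi-code cells intact as single columns until they are expanded into lists.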

  14. ICD-10 CM Age Restriction

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). ICD-10 CM Age Restriction [Dataset]. https://www.johnsnowlabs.com/marketplace/icd-10-cm-age-restriction/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset contains the International Classification of Diseases, Clinical Modification, 10th Edition (ICD-10-CM) Age Restriction Database, which holds information on the age-restricted diagnosis codes of the ICD-10-CM diagnosis coding system.

  15. MIMIC-IV-ICD10-top50 Dataset

    • paperswithcode.com
    Updated Apr 26, 2023
    + more versions
    Cite
    (2023). MIMIC-IV-ICD10-top50 Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-icd10-top50
    Dataset updated
    Apr 26, 2023
    Description

    The MIMIC-IV-ICD10 dataset, featuring the top 50 most frequently occurring labels.

  16. ICD9 CCS Neuro-Neurosurgery

    • redivis.com
    Updated May 30, 2024
    + more versions
    Cite
    Surgical Epi, Trauma Systems, Geographic Inequalities @PGSSC (2024). ICD9 CCS Neuro-Neurosurgery [Dataset]. https://redivis.com/datasets/6v7y-b8rx0vh7z
    Dataset updated
    May 30, 2024
    Dataset authored and provided by
    Surgical Epi, Trauma Systems, Geographic Inequalities @PGSSC
    Description

    The table ICD9 CCS Neuro-Neurosurgery is part of the dataset ICD9 and ICD10 Neuro-Neurosurgery Codes CCS, available at https://redivis.com/datasets/6v7y-b8rx0vh7z. It contains 3948 rows across 8 variables.

  17. ICD10 Descriptions

    • redivis.com
    Updated Apr 15, 2025
    Cite
    Surgical Epi, Trauma Systems, Geographic Inequalities @PGSSC (2025). ICD10 Descriptions [Dataset]. https://redivis.com/datasets/bq13-2xzrxkrw2
    Dataset updated
    Apr 15, 2025
    Dataset authored and provided by
    Surgical Epi, Trauma Systems, Geographic Inequalities @PGSSC
    Description

    The table ICD10 Descriptions is part of the dataset Hospitalarios Secretaria de Salud, 2008-2023, available at https://redivis.com/datasets/bq13-2xzrxkrw2. It contains 14498 rows across 78 variables.

  18. Data from: Non-technical Summaries (NTS) of Animal Experiments Indexed with...

    • openagrar.de
    Updated Jan 18, 2019
    Cite
    Mariana Neves; Daniel Butzke; Antje Dörendahl; Nora Leich; Barbara Grune; Gilbert Schönfelder (2019). Non-technical Summaries (NTS) of Animal Experiments Indexed with ICD-10 Codes (Version 1.0) [Dataset]. http://doi.org/10.17590/20190118-134645-0
    Dataset updated
    Jan 18, 2019
    Authors
    Mariana Neves; Daniel Butzke; Antje Dörendahl; Nora Leich; Barbara Grune; Gilbert Schönfelder
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Dataset containing 8,386 non-technical summaries (NTS) of animal experiments recently carried out in Germany (as of September 19, 2018) and originally available online in the AnimalTestInfo database (http://animaltestinfo.de). Each NTS contains a title, uses (goals) of the experiments, possible harms caused to the animals, and comments about replacement, reduction and refinement (in the scope of the 3R principles). All documents are in German. The dataset includes the ICD-10 codes manually assigned by experts to the NTS. However, some NTSs have no ICD-10 codes assigned to them, as the codes were not applicable to the uses described in the NTS. All codes are chapters or groups from the ICD-10 German Modification 2016 version (https://www.dimdi.de/static/de/klassifikationen/icd/icd-10-gm/kode-suche/htmlgm2016/). Finally, the dataset is split into training and development sets, which are meant for use in CLEF eHealth 2019, Task 1 - Multilingual Information Extraction (https://sites.google.com/view/clefehealth2019/task-1-multilingual-information-extraction-icd10-coding).

  19. ICD-10 Market: U.S Industry Analysis and Opportunity Assessment 2022 to 2032...

    • futuremarketinsights.com
    html, pdf
    Updated Dec 12, 2022
    Cite
    Future Market Insights (2022). ICD-10 Market: U.S Industry Analysis and Opportunity Assessment 2022 to 2032 [Dataset]. https://www.futuremarketinsights.com/reports/icd-10-market
    Available download formats: pdf, html
    Dataset updated
    Dec 12, 2022
    Dataset authored and provided by
    Future Market Insights
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2022 - 2032
    Area covered
    United States, Worldwide
    Description

    The global ICD-10 market was valued at US$ 18.78 billion in 2022 and is expected to grow at a CAGR of 10.0% over the forecast period. By 2032, the global market is expected to be worth US$ 18.78 billion. The growing requirement for a uniform language in medical documentation to streamline hospital billing operations is driving market expansion.

    Attributes                  Details
    ICD-10 Market CAGR          10%
    ICD-10 Market Size 2022     US$ 18.78 billion
    ICD-10 Market Size 2032     US$ 18.78 billion

  20. ICD10

    • data.wu.ac.at
    html
    Updated May 15, 2014
    + more versions
    Cite
    (2014). ICD10 [Dataset]. https://data.wu.ac.at/schema/linkeddatacatalog_dws_informatik_uni-mannheim_de/ZTljYjY1ZjctNTNjMS00Nzk0LWE1NTAtMWQzY2E2MTUyYTBm
    Available download formats: html
    Dataset updated
    May 15, 2014
    Description

    International Statistical Classification of Diseases and Related Health Problems (ICD-10). 10th rev. Geneva

12 scholarly articles cite the MIMIC-IV ICD-10 Dataset (Google Scholar).