13 datasets found
  1. MIMIC-III Clinical Database

    • physionet.org
    Updated Sep 4, 2016
    Alistair Johnson; Tom Pollard; Roger Mark (2016). MIMIC-III Clinical Database [Dataset]. http://doi.org/10.13026/C2XW26
    Authors
    Alistair Johnson; Tom Pollard; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC-III is a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge). MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.

  2. MIMIC-III Clinical Database(Open Access)

    • kaggle.com
    Updated Jun 2, 2025
    Ihssane Ned (2025). MIMIC-III Clinical Database(Open Access) [Dataset]. https://www.kaggle.com/datasets/ihssanened/mimic-iii-clinical-databaseopen-access
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ihssane Ned
    Description

    Dataset Source

    This dataset is a portion of the MIMIC-III Clinical Database, a large, freely available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The MIMIC-III demo provides researchers with an opportunity to review the structure and content of MIMIC-III before deciding whether or not to carry out an analysis on the full dataset. The full dataset is available on PhysioNet.

    Dataset Description:

    This dataset contains only four tables extracted from the original dataset; more information about each table can be found at its corresponding link:

    • admissions.csv
    • d_labitems.csv
    • labevents.csv
    • patient.csv

    A visualization of this dataset is also available from the dataset page.

    Future Perspectives:

    This portion of the dataset will be combined to build a comprehensive dataset of simulated medical reports.

  3. MIMIC-IV Clinical Database Demo

    • physionet.org
    • registry.opendata.aws
    Updated Jan 31, 2023
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Steven Horng; Leo Anthony Celi; Roger Mark (2023). MIMIC-IV Clinical Database Demo [Dataset]. http://doi.org/10.13026/dp1f-ex47
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
    License information was derived automatically

    Description

    The Medical Information Mart for Intensive Care (MIMIC)-IV database comprises deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access request.

  4. Structure Annotations of Assessment and Plan Sections from MIMIC-III

    • data.niaid.nih.gov
    Updated Apr 17, 2022
    Stupp, Doron; Barequet, Ronnie; Lee, I-Ching; Oren, Eyal; Feder, Amir; Benjamini, Ayelet; Hassidim, Avinatan; Matias, Yossi; Ofek, Eran; Rajkomar, Alvin (2022). Structure Annotations of Assessment and Plan Sections from MIMIC-III [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6413404
    Dataset provided by
    Google
    Authors
    Stupp, Doron; Barequet, Ronnie; Lee, I-Ching; Oren, Eyal; Feder, Amir; Benjamini, Ayelet; Hassidim, Avinatan; Matias, Yossi; Ofek, Eran; Rajkomar, Alvin
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Physicians record their detailed thought-processes about diagnoses and treatments as unstructured text in a section of a clinical note called the "assessment and plan". This information is more clinically rich than structured billing codes assigned for an encounter but harder to reliably extract given the complexity of clinical language and documentation habits. To structure these sections we collected a dataset of annotations over assessment and plan sections from the publicly available and de-identified MIMIC-III dataset, and developed deep-learning based models to perform this task, described in the associated paper available as a pre-print at: https://www.medrxiv.org/content/10.1101/2022.04.13.22273438v1

    When using this data please cite our paper:

    @article{Stupp2022.04.13.22273438,
      author = {Stupp, Doron and Barequet, Ronnie and Lee, I-Ching and Oren, Eyal and Feder, Amir and Benjamini, Ayelet and Hassidim, Avinatan and Matias, Yossi and Ofek, Eran and Rajkomar, Alvin},
      title = {Structured Understanding of Assessment and Plans in Clinical Documentation},
      year = {2022},
      doi = {10.1101/2022.04.13.22273438},
      publisher = {Cold Spring Harbor Laboratory Press},
      URL = {https://www.medrxiv.org/content/early/2022/04/17/2022.04.13.22273438},
      journal = {medRxiv}
    }

    The dataset, presented here, contains annotations of assessment and plan sections of notes from the publicly available and de-identified MIMIC-III dataset, marking the active problems, their assessment description, and plan action items. Action items are additionally marked as one of 8 categories (listed below). The dataset contains over 30,000 annotations of 579 notes from distinct patients, annotated by 6 medical residents and students.

    The dataset is divided into 4 partitions - a training set (481 notes), validation set (50 notes), test set (48 notes) and an inter-rater set. The inter-rater set contains the annotations of each of the raters over the test set. Rater 1 in the inter-rater set should be regarded as an intra-rater comparison (details in the paper). The labels underwent automatic normalization to capture entire word boundaries and remove flanking non-alphanumeric characters.

    Code for transforming labels into TensorFlow examples and training models as described in the paper will be made available at GitHub: https://github.com/google-research/google-research/tree/master/assessment_plan_modeling

    To use these annotations, the user additionally needs to obtain the text of the notes, which is found in the NOTEEVENTS table of MIMIC-III; access to MIMIC-III must be acquired independently (https://mimic.mit.edu/).

    Annotations are given as character spans in a CSV file with the following schema:

    Each entry below gives the field name, its type, and its semantics.

    • partition: categorical (one of [train, val, test, interrater]). The set of ratings the span belongs to.
    • rater_id: int. Unique ID for each of the raters.
    • note_id: int. The note's unique note_id; links to the MIMIC-III notes table (as ROW_ID).
    • span_type: categorical (one of [PROBLEM_TITLE, PROBLEM_DESCRIPTION, ACTION_ITEM]). Type of the span as annotated by raters.
    • char_start, char_end: int. Character offsets from note start.
    • action_item_type: categorical (one of [MEDICATIONS, IMAGING, OBSERVATIONS_LABS, CONSULTS, NUTRITION, THERAPEUTIC_PROCEDURES, OTHER_DIAGNOSTIC_PROCEDURES, OTHER]). Type of action item if the span is an action item (empty otherwise), as annotated by raters.
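
As a rough illustration of how the character-span schema above might be consumed, the following Python sketch joins annotation rows to note text and extracts the covered substring. The sample rows, the note text, and the `load_spans` helper are all hypothetical, and the sketch assumes `char_end` is exclusive (the description does not say); real note text has to come from MIMIC-III itself.

```python
import csv
import io

# Hypothetical rows that follow the documented columns; the values are invented.
SAMPLE_CSV = """partition,rater_id,note_id,span_type,char_start,char_end,action_item_type
train,1,101,PROBLEM_TITLE,0,9,
train,1,101,ACTION_ITEM,11,28,MEDICATIONS
"""

# Hypothetical note text keyed by note_id (stands in for text from MIMIC-III).
SAMPLE_NOTES = {101: "Pneumonia: start antibiotics"}


def load_spans(csv_text, notes):
    """Return one dict per annotation row with the covered note text attached.

    Assumption: char_end is exclusive, matching Python slicing. Verify this
    against the real data before relying on it.
    """
    spans = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        note = notes[int(row["note_id"])]
        start, end = int(row["char_start"]), int(row["char_end"])
        spans.append({
            "partition": row["partition"],
            "span_type": row["span_type"],
            # Empty for spans that are not action items.
            "action_item_type": row["action_item_type"] or None,
            "text": note[start:end],
        })
    return spans


spans = load_spans(SAMPLE_CSV, SAMPLE_NOTES)
for span in spans:
    print(span["span_type"], repr(span["text"]), span["action_item_type"])
```

If the offsets turn out to be inclusive in the released file, the slice would need to be `note[start:end + 1]` instead.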
    
  5. Mortality Prediction MIMIC-III

    • scidb.cn
    Updated May 6, 2021
    Yanrong Cai (2021). Mortality Prediction MIMIC-III [Dataset]. http://doi.org/10.11922/sciencedb.00787
    Dataset provided by
    Science Data Bank
    Authors
    Yanrong Cai
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This new dataset was derived from the MIMIC-III dataset, an openly available database developed by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology (MIT), which consists of data from more than 25,000 patients who were admitted to the Beth Israel Deaconess Medical Center (BIDMC) since 2003 and whose records have been de-identified for information safety. Here, we identified patients who were diagnosed with pelvic, acetabular, or combined pelvic and acetabular fractures according to ICD-9 codes and who survived at least 72 hours after ICU admission. All data within the first 72 hours following ICU admission were extracted from the MIMIC-III clinical database version 1.4.

  6. MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization...

    • zenodo.org
    Updated Jan 13, 2025
    ali Azadi; ali Azadi (2025). MIMIC-IV Lab Events Subset - Preprocessed for Data Normalization Analysis.xlsx [Dataset]. http://doi.org/10.5281/zenodo.14641824
    Available download formats: txt, bin, text/x-python
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    ali Azadi; ali Azadi
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This file contains a preprocessed subset of the MIMIC-IV dataset (Medical Information Mart for Intensive Care, Version IV), specifically focusing on laboratory event data related to glucose levels. It has been curated and processed for research on data normalization and integration within Clinical Decision Support Systems (CDSS) to improve Human-Computer Interaction (HCI) elements.

    The dataset includes the following key features:

    • Raw Lab Data: Original values of glucose levels as recorded in the clinical setting.
    • Normalized Data: Glucose levels transformed into a standardized range for comparison and analysis.
    • Demographic Information: Includes patient age and gender to support subgroup analyses.
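
The description does not say which transformation produced the normalized glucose values. As one common possibility only, a min-max rescaling to the range [0, 1] could be sketched as follows; the function name and the sample readings are hypothetical and are not taken from the dataset.

```python
def min_max_normalize(values):
    """Linearly rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw glucose readings in mg/dL (illustrative values only).
raw_glucose = [70.0, 100.0, 130.0]
normalized = min_max_normalize(raw_glucose)
print(normalized)
```

Other schemes, such as z-score standardization, would fit the description equally well, so the actual file should be checked before assuming either.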

    This data has been used to analyze the impact of normalization and integration techniques on improving data accuracy and usability in CDSS environments. The file is provided as part of ongoing research on enhancing clinical decision-making and user interaction in healthcare systems.

    Key Applications:

    • Research on the effects of data normalization on clinical outcomes.
    • Study of demographic variations in laboratory values to support personalized healthcare.
    • Exploration of data integration and its role in reducing cognitive load in CDSS.

    Data Source:

    The data originates from the publicly available MIMIC-IV database, developed and maintained by the Massachusetts Institute of Technology (MIT). Proper ethical guidelines for accessing and preprocessing the dataset have been followed.

    File Content:

    • Filename: MIMIC-IV_LabEvents_Subset_Normalization.xlsx
    • File Format: Microsoft Excel
    • Number of Rows: 100 samples for demonstration purposes.
    • Fields Included: Patient ID, Age, Gender, Raw Glucose Value, Normalized Glucose Value, and additional derived statistics.
  7. mimic-iii

    • kaggle.com
    Updated Feb 5, 2025
    chan hainguyen (2025). mimic-iii [Dataset]. https://www.kaggle.com/datasets/chanhainguyen/mimic-iii
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    chan hainguyen
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by chan hainguyen

    Released under MIT

    Contents

  8. Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II)

    • s.cnmilf.com
    • healthdata.gov
    • +4 more
    Updated Jul 26, 2023
    National Institutes of Health (NIH) (2023). Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/multiparameter-intelligent-monitoring-in-intensive-care-ii-mimic-ii
    Dataset provided by
    National Institutes of Health (NIH)
    Description

    The objective of this Bioengineering Research Partnership is to focus the resources of a powerful interdisciplinary team from academia (MIT), industry (Philips Medical Systems) and clinical medicine (Beth Israel Deaconess Medical Center) to develop and evaluate advanced ICU patient monitoring systems that will substantially improve the efficiency, accuracy and timeliness of clinical decision making in intensive care.

  9. clinical-ie

    • huggingface.co
    Updated Dec 7, 2022
    MIT Clinical Machine Learning Group (2022). clinical-ie [Dataset]. https://huggingface.co/datasets/mitclinicalml/clinical-ie
    Dataset authored and provided by
    MIT Clinical Machine Learning Group
    Description

    Below, we provide access to the datasets used in and created for the EMNLP 2022 paper Large Language Models are Few-Shot Clinical Information Extractors.

    Task #1: Clinical Sense Disambiguation

    For Task #1, we use the original annotations from the Clinical Acronym Sense Inventory (CASI) dataset, described in their paper. As is common, due to noisiness in the label set, we do not evaluate on the entire dataset, but only on a cleaner subset. For consistency, we use the subset defined… See the full description on the dataset page: https://huggingface.co/datasets/mitclinicalml/clinical-ie.

  10. Study design and patients enrollment.

    • plos.figshare.com
    Updated Apr 24, 2025
    Ju Luo; Shifang Zhou; Ning Ding (2025). Study design and patients enrollment. [Dataset]. http://doi.org/10.1371/journal.pone.0321063.s001
    Available download formats: zip
    Dataset provided by
    PLOS ONE
    Authors
    Ju Luo; Shifang Zhou; Ning Ding
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Objective: In this study, we aimed to explore the relationship between serum phosphate and clinical outcomes in sepsis with E. coli infection based on a public database, in order to help physicians make individualized medical decisions.

    Methods: We performed this retrospective study based on the Medical Information Mart for Intensive Care IV (MIMIC-IV) database (https://mimic.mit.edu/iv/). All the patients were hospitalized, and serum phosphate was measured within 24 hours after hospitalization. E. coli infection was confirmed by a positive blood culture for E. coli in the database. Three models were used to investigate the relationship between serum phosphate and mortality in sepsis: a crude model (adjusted for none), model I (adjusted for age and gender), and model II (adjusted for all potential confounders). The smooth fitting curve was produced with a generalized additive model.

    Results: 421 adult sepsis patients with E. coli infection were included. The 28-day mortality was 10.69% (n=45). The median age was 70 and the proportion of males was 47.51% (n=200). The smooth fitting curve showed that the relationship between serum phosphate and 28-day mortality in sepsis with E. coli infection was positive. When serum phosphate was > 2.1 mg/dl, the relationship was significantly positive (OR=1.55, 95% CI: 1.01–2.36, P=0.043).

    Conclusion: A positive relationship between serum phosphate and 28-day mortality in adult sepsis patients with E. coli infection was found based on the MIMIC-IV database.

  11. Data from: MIMIC-CXR-JPG - chest radiographs with structured labels

    • physionet.org
    Updated Mar 12, 2024
    Alistair Johnson; Matthew Lungren; Yifan Peng; Zhiyong Lu; Roger Mark; Seth Berkowitz; Steven Horng (2024). MIMIC-CXR-JPG - chest radiographs with structured labels [Dataset]. http://doi.org/10.13026/jsn5-t979
    Authors
    Alistair Johnson; Matthew Lungren; Yifan Peng; Zhiyong Lu; Roger Mark; Seth Berkowitz; Steven Horng
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    The MIMIC Chest X-ray JPG (MIMIC-CXR-JPG) Database v2.0.0 is a large publicly available dataset of chest radiographs in JPG format with structured labels derived from free-text radiology reports. The MIMIC-CXR-JPG dataset is wholly derived from MIMIC-CXR, providing JPG format files derived from the DICOM images and structured labels derived from the free-text reports. The aim of MIMIC-CXR-JPG is to provide a convenient processed version of MIMIC-CXR, as well as to provide a standard reference for data splits and image labels. The dataset contains 377,110 JPG format images and structured labels derived from the 227,827 free-text radiology reports associated with these images. The dataset is de-identified to satisfy the US Health Insurance Portability and Accountability Act of 1996 (HIPAA) Safe Harbor requirements. Protected health information (PHI) has been removed. The dataset is intended to support a wide body of research in medicine including image understanding, natural language processing, and decision support.

  12. MedQA

    • huggingface.co
    Updated Oct 15, 2025
    Artur Guimarães (2025). MedQA [Dataset]. https://huggingface.co/datasets/araag2/MedQA
    Authors
    Artur Guimarães
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    MedQA-USMLE — A Large-scale Open Domain Question Answering Dataset from Medical Exams

    Dataset Description

    Links

    Homepage: Github.io

    Repository: Github

    Paper: arXiv

    Leaderboard: Papers with Code

    Contact (Original Authors): Di Jin (jindi15@mit.edu)

    Contact (Curator): Artur Guimarães (artur.guimas@gmail.com)

    Dataset Summary

    MedQA is a large-scale multiple-choice question-answering dataset designed to mimic the style of professional… See the full description on the dataset page: https://huggingface.co/datasets/araag2/MedQA.

  13. Transcriptional changes induced by I-BET151-treated cells mimic those...

    • plos.figshare.com
    Updated Jun 1, 2023
    Danae Schulz; Monica R. Mugnier; Eda-Margaret Paulsen; Hee-Sook Kim; Chun-wa W. Chung; David F. Tough; Inmaculada Rioja; Rab K. Prinjha; F. Nina Papavasiliou; Erik W. Debler (2023). Transcriptional changes induced by I-BET151-treated cells mimic those induced by differentiating cells. [Dataset]. http://doi.org/10.1371/journal.pbio.1002316.g002
    Available download formats: tiff
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Danae Schulz; Monica R. Mugnier; Eda-Margaret Paulsen; Hee-Sook Kim; Chun-wa W. Chung; David F. Tough; Inmaculada Rioja; Rab K. Prinjha; F. Nina Papavasiliou; Erik W. Debler
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    (A) Plot showing median RPKM for all genes within the indicated functional groups at each time point after induction with I-BET151 (solid lines). For comparison, median expression values for all genes within the indicated functional groups were derived from data generated in [12] and plotted as dashed lines. Colors between solid and dashed lines are matched for each functional group. All functional groups with a GSEA FDR of 70% of genes were up-regulated or down-regulated at 48 h of I-BET151 treatment are plotted. (B) Examples of 3 functional groups that do not match our criteria of a GSEA FDR of 70% of genes were up-regulated or down-regulated at 48 h. mit F1, F1 ATPase, mit MCP, mitochondrial carrier proteins, mit imp, import of mitochondrial proteins. Numerical data for Fig 2 is in S2 Data.
