https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Abstract The Medical Information Mart for Intensive Care (MIMIC)-IV database is comprised of deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we have provided an openly-available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether the MIMIC-IV is appropriate for a study before making an access request.
Background The increasing adoption of digital electronic health records has led to the existence of large datasets that could be used to carry out important research across many areas of medicine. Research progress has been limited, however, due to limitations in the way that the datasets are curated and made available for research. The MIMIC datasets allow credentialed researchers around the world unprecedented access to real world clinical data, helping to reduce the barriers to conducting important medical research. The public availability of the data allows studies to be reproduced and collaboratively improved in ways that would not otherwise be possible.
Methods First, the set of individuals to include in the demo was chosen. Each person in MIMIC-IV is assigned a unique subject_id. As the subject_id is randomly generated, ordering by subject_id results in a random subset of individuals. We only considered individuals with an anchor_year_group value of 2011 - 2013 or 2014 - 2016 to ensure overlap with MIMIC-CXR v2.0.0. The first 100 subject_id who satisfied the anchor_year_group criteria were selected for the demo dataset.
All tables from MIMIC-IV were included in the demo dataset. Tables containing patient information, such as emar or labevents, were filtered using the list of selected subject_id. Tables which do not contain patient level information were included in their entirety (e.g. d_items or d_labitems). Note that all tables which do not contain patient level information are prefixed with the characters 'd_'.
Deidentification was performed following the same approach as the MIMIC-IV database. Protected health information (PHI) as listed in the HIPAA Safe Harbor provision was removed. Patient identifiers were replaced using a random cipher, resulting in deidentified integer identifiers for patients, hospitalizations, and ICU stays. Stringent rules were applied to structured columns based on the data type. Dates were shifted consistently using a random integer removing seasonality, day of the week, and year information. Text fields were filtered by manually curated allow and block lists, as well as context-specific regular expressions. For example, columns containing dose values were filtered to only contain numeric values. If necessary, a free-text deidentification algorithm was applied to remove PHI from free-text. Results of this algorithm were manually reviewed and verified to remove identified PHI.
Data Description MIMIC-IV is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-IV Clinical Database page [1] or the MIMIC-IV online documentation [2]. The demo shares an identical schema and structure to the equivalent version of MIMIC-IV.
Data files are distributed in comma separated value (CSV) format following the RFC 4180 standard [3]. The dataset is also made available on Google BigQuery. Instructions to accessing the dataset on BigQuery are provided on the online MIMIC-IV documentation, under the cloud page [2].
An additional file is included: demo_subject_id.csv. This is a list of the subject_id used to filter MIMIC-IV to the demo subset.
Usage Notes The MIMIC-IV demo provides researchers with the opportunity to better understand MIMIC-IV data.
CSV files can be opened natively using any text editor or spreadsheet program. However, as some tables are large it may be preferable to navigate the data via a relational database. We suggest either working with the data in Google BigQuery (see the "Files" section for access details) or creating an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number software tools.
Code is made available for use with MIMIC-IV on the MIMIC-IV code repository [4]. Code provided includes derivation of clinical concepts, tutorials, and reproducible analyses.
Release Notes Release notes for the demo follow the release notes for the MIMIC-IV database.
Ethics This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the pr...
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy.
The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
The advent of large, open access text databases has driven advances in state-of-the-art model performance in natural language processing (NLP). The relatively limited amount of clinical data available for NLP has been cited as a significant barrier to the field's progress. Here we describe MIMIC-IV-Note: a collection of deidentified free-text clinical notes for patients included in the MIMIC-IV clinical database. MIMIC-IV-Note contains 331,794 deidentified discharge summaries from 145,915 patients admitted to the hospital and emergency department at the Beth Israel Deaconess Medical Center in Boston, MA, USA. The database also contains 2,321,355 deidentified radiology reports for 237,427 patients. All notes have had protected health information removed in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. All notes are linkable to MIMIC-IV providing important context to the clinical data therein. The database is intended to stimulate research in clinical natural language processing and associated areas.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Antu Saha
Released under CC0: Public Domain
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
MIMIC-IV-ED is a large, freely available database of emergency department (ED) admissions at the Beth Israel Deaconess Medical Center between 2011 and 2019. The database contains ~425,000 ED stays. Vital signs, triage information, medication reconciliation, medication administration, and discharge diagnoses are available. All data are deidentified to comply with the Health Information Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-IV-ED is intended to support a diverse range of education initiatives and research studies.
MIMIC-IV ICD-10 contains 122,279 discharge summaries—free-text medical documents—annotated with ICD-10 diagnosis and procedure codes. It contains data for patients admitted to the Beth Israel Deaconess Medical Center emergency department or ICU between 2008-2019. All codes with fewer than ten examples have been removed, and the train-val-test split was created using multi-label stratified sampling. The dataset is described further in Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study, and the code to use the dataset is found here.
The dataset is intended for medical code prediction and was created using MIMIC-IV v2.2 and MIMIC-IV-NOTE v2.2. Using the two datasets requires a license obtained in Physionet; this can take a couple of days.
The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Fast Healthcare Interoperability Resources (FHIR) has emerged as a robust standard for healthcare data exchange. To explore the use of FHIR for the process of data harmonization, we converted the Medical Information Mart for Intensive Care IV (MIMIC-IV) and MIMIC-IV Emergency Department (MIMIC-IV-ED) databases into FHIR. We extended base FHIR to encode information in MIMIC-IV and aimed to retain the data in FHIR with minimal additional processing, aligning to US Core v4.0.0 where possible. A total of 24 profiles were created for MIMIC-IV data, and an additional 6 profiles were created for MIMIC-IV-ED data. Code systems and value sets were created from MIMIC terminology. We hope MIMIC-IV in FHIR provides a useful restructuring of the data to support applications around data harmonization, interoperability, and other areas of research.
Objective To assess the use of Health Level Seven Fast Healthcare Interoperability Resources (FHIR®) for implementing the Findable, Accessible, Interoperable, and Reusable guiding principles for scientific data (FAIR). Additionally, present a list of FAIR implementation choices for supporting future FAIR implementations that use FHIR. Material and Methods A case study was conducted on the Medical Information Mart for Intensive Care-IV Emergency Department dataset (MIMIC-ED), a deidentified clinical dataset converted into FHIR. The FAIRness of this dataset was assessed using a set of common FAIR assessment indicators. Results The FHIR distribution of MIMIC-ED, comprising an implementation guide and demo data, was more FAIR compared to the non-FHIR distribution. The FAIRness score increased from 60 to 82 out of 95 points, a relative improvement of 37%. The most notable improvements were observed in interoperability, with a score increase from 5 to 19 out of 19 points, and reusability, wit..., The authors of the paper collected the dataset. , Microsoft Word (.docx files) or Microsoft Excel (.csv files) (Open-source alternatives: LibreOffice, OpenOffice) The data files (.csv) can also be opened using any text editor, R, etc., # FAIR Indicator Scores and Qualitative Comments
This dataset belongs as supplementary material to the paper entitled "Assessing the Use of HL7 FHIR for Implementing the FAIR Guiding Principles: A Case Study of the MIMIC-IV Emergency Department Module".
This dataset describes the indicator scores and qualitative comments of the FAIR data assessment of the Medical Information Mart for Intensive Care (MIMIC)-IV Emergency Department Module. Two distributions of the Emergency Department module were assessed, the PhysioNet distribution and the Fast Healthcare Interoperability Resources (FHIR) distribution. This dataset consists of two files: (1) PhysioNet.csv containing the data of the PhysioNet distribution; and (2) FHIR.csv containing the data of the FHIR distribution. Both files share the same structure and fields.
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
The advent of large, open access text databases has driven advances in state-of-the-art model performance in natural language processing (NLP). The relatively limited amount of clinical data available for NLP has been cited as a significant barrier to the field's progress. Here we describe MIMIC-IV-Note: a collection of deidentified free-text clinical notes for patients included in the MIMIC-IV clinical database. MIMIC-IV-Note contains 331,794 deidentified discharge summaries from 145,915 patients admitted to the hospital and emergency department at the Beth Israel Deaconess Medical Center in Boston, MA, USA. The database also contains 2,321,355 deidentified radiology reports for 237,427 patients. All notes have had protected health information removed in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. All notes are linkable to MIMIC-IV providing important context to the clinical data therein. The database is intended to stimulate research in clinical natural language processing and associated areas.
F219091/mimic-iv-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
With the rapid development of generative LLMs (large language models) in the field of natural language processing, their potential in medical applications has become increasingly evident. However, most existing studies rely on exam-style questions or artificially designed cases, lacking validation using real patient data. To address this gap, this study leverages the MIMIC-IV database to construct a subset, MIMIC-IV-Ext Cardiac Disease, which includes 4,761 patients diagnosed with cardiac diseases. The dataset covers all relevant clinical examinations from admission to discharge, as well as the final diagnoses.Combining these data with the multi-turn interaction framework we built can be used to test whether large models can guide patients through in-hospital examinations. Moreover, after modifying the MIMIC-IV dataset, our sub-dataset can greatly facilitate researchers in conducting other studies.
This dataset was created by Mehrnooshazizi
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
The number of publicly available ECG datasets has increased tremendously in the past few years and several of these datasets have developed into widely used benchmarking datasets. However, most of them exhibit a common limitation, namely the reliance on retrospective annotation and a lack of clinical ground truth. This represents a serious limitation compared to closed in-hospital datasets. To circumvent this issue, we propose the MIMIC-IV-ECG-Ext-ICD dataset by linking the samples from the MIMIC-IV-ECG dataset to clinical ground truth from the MIMIC-IV dataset, in the form of ED and hospital discharge diagnoses. We release this derived dataset to foster further research on ECG-based prediction models with clinical ground truth and build a resource for benchmarking clinical ECG prediction models.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IntroductionAcute ischemic stroke (AIS) patients admitted to the intensive care unit (ICU) have a high mortality rate, necessitating the early identification of those at risk of a poor prognosis. This study investigated the association between the blood glucose-to-potassium ratio (GPR) and the prognosis of AIS patients.MethodsWe conducted a retrospective cohort study using data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. The primary outcomes were 28-day, 90-day, and 1-year mortality rates following ICU admission. Multivariate Cox proportional hazards regression models were used to estimate adjusted hazard ratios (HRs) with 95% confidence intervals (CIs). Subgroup analyses, Kaplan–Meier survival curves, and restricted cubic spline models were employed to further evaluate the relationship between the GPR and mortality in AIS patients.ResultsA total of 2,636 AIS patients were included in the study, with a mean age of 69.4 ± 15.6 years. The 1-year mortality rate was 36.8% (n = 969). After adjusting for confounders, compared with the first quartile (Q1, GPR ≤ 1.39), the 1-year mortality risks for the second quartile (Q2, 1.39 < GPR ≤ 1.74), third quartile (Q3, 1.74 < GPR ≤ 2.25), and fourth quartile (Q4, GPR ≥ 2.25) were HR = 1.17 (95% CI: 0.95–1.43, p = 0.132), HR = 1.42 (95% CI: 1.17–1.73, p < 0.001), and HR = 1.61 (95% CI: 1.33–1.96, p < 0.001), respectively. Similar trends were observed for 28-day and 90-day mortality. Kaplan–Meier (KM) analysis revealed that groups with higher GPRs had higher mortality rates at 28 days, 90 days, and 1 year. Non-linear analysis further confirmed the presence of an inflection point in the association between the GPR and 365-day mortality, which was identified at GPR = 2.75. At ratios less than this threshold, the risk of mortality increased significantly with increasing GPR (HR: 1.466; 95% CI: 1.239–1.735; p < 0.001). However, above this ratio, the association plateaued and was no longer statistically significant (HR: 0.899; 95% CI: 0.726–1.113; p = 0.095).ConclusionThe GPR is an independent predictor of poor prognosis in AIS patients admitted to the ICU. Higher GPRs are associated with increased 28-day and 90-day mortality rates, highlighting the potential utility of this ratio in risk stratification and clinical decision-making. A non-linear relationship was observed between the GPR and 365-day mortality, with an inflection point identified at GPR = 2.75.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The original data
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for research.
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.