https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge). MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Abstract The Medical Information Mart for Intensive Care (MIMIC)-IV database comprises deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we provide an openly-available demo of MIMIC-IV containing a subset of 100 patients. The dataset includes similar content to MIMIC-IV, but excludes free-text clinical notes. The demo may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access request.
Background The increasing adoption of digital electronic health records has led to the existence of large datasets that could be used to carry out important research across many areas of medicine. Research progress has been limited, however, due to limitations in the way that the datasets are curated and made available for research. The MIMIC datasets allow credentialed researchers around the world unprecedented access to real world clinical data, helping to reduce the barriers to conducting important medical research. The public availability of the data allows studies to be reproduced and collaboratively improved in ways that would not otherwise be possible.
Methods First, the set of individuals to include in the demo was chosen. Each person in MIMIC-IV is assigned a unique subject_id. As the subject_id is randomly generated, ordering by subject_id results in a random subset of individuals. We only considered individuals with an anchor_year_group value of 2011 - 2013 or 2014 - 2016 to ensure overlap with MIMIC-CXR v2.0.0. The first 100 subject_id who satisfied the anchor_year_group criteria were selected for the demo dataset.
All tables from MIMIC-IV were included in the demo dataset. Tables containing patient information, such as emar or labevents, were filtered using the list of selected subject_id. Tables which do not contain patient level information were included in their entirety (e.g. d_items or d_labitems). Note that all tables which do not contain patient level information are prefixed with the characters 'd_'.
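The selection and filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual pipeline: the inline CSV snippets are invented stand-ins for the patients and labevents tables, the labevent_id and itemid columns are used only for illustration, and the sketch selects 2 subjects rather than 100 to keep the data small.

```python
import csv
import io

# Invented miniature stand-ins for the patients and labevents tables.
PATIENTS_CSV = """subject_id,anchor_year_group
10000032,2014 - 2016
10000048,2008 - 2010
10000068,2011 - 2013
10000084,2017 - 2019
10000102,2011 - 2013
"""

LABEVENTS_CSV = """labevent_id,subject_id,itemid
1,10000032,50931
2,10000048,50931
3,10000068,51221
4,10000102,51221
5,10000032,51221
"""

# The two anchor_year_group values named in the Methods text.
ALLOWED_GROUPS = {"2011 - 2013", "2014 - 2016"}


def select_subjects(patients_csv, n):
    """Return the first n subject_id values (ascending) in the allowed groups."""
    rows = csv.DictReader(io.StringIO(patients_csv))
    eligible = sorted(
        int(row["subject_id"])
        for row in rows
        if row["anchor_year_group"] in ALLOWED_GROUPS
    )
    return eligible[:n]


def filter_table(table_csv, subject_ids):
    """Keep only the rows whose subject_id is in the selected set."""
    keep = set(subject_ids)
    rows = csv.DictReader(io.StringIO(table_csv))
    return [row for row in rows if int(row["subject_id"]) in keep]


demo_ids = select_subjects(PATIENTS_CSV, n=2)
demo_labevents = filter_table(LABEVENTS_CSV, demo_ids)
```

Tables without patient-level information (the 'd_' tables) would simply be copied unchanged, so they need no filtering step.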
Deidentification was performed following the same approach as the MIMIC-IV database. Protected health information (PHI) as listed in the HIPAA Safe Harbor provision was removed. Patient identifiers were replaced using a random cipher, resulting in deidentified integer identifiers for patients, hospitalizations, and ICU stays. Stringent rules were applied to structured columns based on the data type. Dates were shifted consistently using a random integer removing seasonality, day of the week, and year information. Text fields were filtered by manually curated allow and block lists, as well as context-specific regular expressions. For example, columns containing dose values were filtered to only contain numeric values. If necessary, a free-text deidentification algorithm was applied to remove PHI from free-text. Results of this algorithm were manually reviewed and verified to remove identified PHI.
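To illustrate what shifting dates "consistently" can mean in practice, the sketch below applies one random offset per patient to all of that patient's timestamps, so that intervals between events are preserved. This is an assumption-laden illustration rather than the procedure actually used for MIMIC-IV: the function names, the offset range, and the seed are all hypothetical.

```python
import random
from datetime import datetime, timedelta


def make_offsets(subject_ids, seed=0, max_days=36500):
    """Draw one random day offset per subject (the range is an assumption)."""
    rng = random.Random(seed)
    return {sid: timedelta(days=rng.randint(1, max_days)) for sid in subject_ids}


def shift(timestamp, subject_id, offsets):
    """Apply the subject's single offset to one timestamp."""
    return timestamp + offsets[subject_id]


offsets = make_offsets([1, 2])
admit = datetime(2015, 3, 1, 8, 0)
discharge = datetime(2015, 3, 4, 17, 30)

# Both timestamps belong to subject 1, so they move by the same amount.
shifted_admit = shift(admit, 1, offsets)
shifted_discharge = shift(discharge, 1, offsets)
```

Because both timestamps receive the same offset, the length of stay computed from the shifted values equals the original one, even though the absolute dates no longer carry real calendar information.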
Data Description MIMIC-IV is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-IV Clinical Database page [1] or the MIMIC-IV online documentation [2]. The demo shares an identical schema and structure to the equivalent version of MIMIC-IV.
Data files are distributed in comma separated value (CSV) format following the RFC 4180 standard [3]. The dataset is also made available on Google BigQuery. Instructions for accessing the dataset on BigQuery are provided in the online MIMIC-IV documentation, under the cloud page [2].
An additional file is included: demo_subject_id.csv. This is a list of the subject_id used to filter MIMIC-IV to the demo subset.
Usage Notes The MIMIC-IV demo provides researchers with the opportunity to better understand MIMIC-IV data.
CSV files can be opened natively using any text editor or spreadsheet program. However, as some tables are large, it may be preferable to navigate the data via a relational database. We suggest either working with the data in Google BigQuery (see the "Files" section for access details) or creating an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
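One possible way to build such an SQLite database uses only the Python standard library, as sketched below. The helper name load_csv_into_sqlite is hypothetical, the inline CSV is an invented sample, and every column is stored without a declared type; a real import would likely want explicit column types and indexes.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table_name, csv_file):
    """Create table_name from a CSV file object and insert every row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )
    conn.commit()


# An in-memory database and inline CSV keep the sketch self-contained; with
# real files, use sqlite3.connect("some_file.db") and open(path, newline="").
conn = sqlite3.connect(":memory:")
sample = io.StringIO("subject_id,gender\n10000032,F\n10000068,M\n")
load_csv_into_sqlite(conn, "patients", sample)
row_count = conn.execute('SELECT COUNT(*) FROM "patients"').fetchone()[0]
```

Calling the helper once per CSV file against the same connection yields a single database file containing all tables, which can then be queried with ordinary SQL.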
Code is made available for use with MIMIC-IV on the MIMIC-IV code repository [4]. Code provided includes derivation of clinical concepts, tutorials, and reproducible analyses.
Release Notes Release notes for the demo follow the release notes for the MIMIC-IV database.
Ethics This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the pr...
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
MIMIC-II documents a diverse and large population of intensive care unit patient stays and contains comprehensive and detailed clinical data, including physiological waveforms and minute-by-minute trends for a subset of records. It establishes a unique public-access resource for critical care research, supporting a diverse range of analytic studies spanning epidemiology, clinical decision-rule development, and electronic tool development. The MIMIC-II Clinical Database, although de-identified, still contains detailed information regarding the clinical care of patients, and must be treated with appropriate care and respect.
The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified and publicly-available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned 7.6 codes on average. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Abstract MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012 [1]. The MIMIC-III Clinical Database is available on PhysioNet (doi: 10.13026/C2XW26). Though deidentified, MIMIC-III contains detailed information regarding the care of real patients, and as such requires credentialing before access. To allow researchers to ascertain whether the database is suitable for their work, we have manually curated a demo subset, which contains information for 100 patients also present in the MIMIC-III Clinical Database. Notably, the demo dataset does not include free-text notes.
Background In recent years there has been a concerted move towards the adoption of digital health record systems in hospitals. Despite this advance, interoperability of digital systems remains an open issue, leading to challenges in data integration. As a result, the potential that hospital data offers in terms of understanding and improving care is yet to be fully realized.
MIMIC-III integrates deidentified, comprehensive clinical data of patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts, and makes it widely accessible to researchers internationally under a data use agreement. The open nature of the data allows clinical studies to be reproduced and improved in ways that would not otherwise be possible.
The MIMIC-III database was populated with data that had been acquired during routine hospital care, so there was no associated burden on caregivers and no interference with their workflow. For more information on the collection of the data, see the MIMIC-III Clinical Database page.
Methods The demo dataset contains all intensive care unit (ICU) stays for 100 patients. These patients were selected randomly from the subset of patients in the dataset who eventually die. Consequently, all patients will have a date of death (DOD). However, patients do not necessarily die during an individual hospital admission or ICU stay.
This project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). Requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
Data Description MIMIC-III is a relational database consisting of 26 tables. For a detailed description of the database structure, see the MIMIC-III Clinical Database page. The demo shares an identical schema, except all rows in the NOTEEVENTS table have been removed.
The data files are distributed in comma separated value (CSV) format following the RFC 4180 standard. Notably, string fields which contain commas, newlines, and/or double quotes are encapsulated by double quotes ("). Actual double quotes in the data are escaped using an additional double quote. For example, the string she said "the patient was notified at 6pm" would be stored in the CSV as "she said ""the patient was notified at 6pm""". More detail is provided on the RFC 4180 description page: https://tools.ietf.org/html/rfc4180
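The quoting rule can be checked with Python's standard csv module, whose default dialect quotes a field that contains a double quote and doubles each embedded quote. The sketch below writes the example string to an in-memory buffer and reads it back; the variable names are illustrative only.

```python
import csv
import io

original = 'she said "the patient was notified at 6pm"'

# Write the string as a single-field CSV record.
buffer = io.StringIO()
csv.writer(buffer).writerow([original])
encoded = buffer.getvalue()

# Read the record back; the reader undoes the quoting and doubling.
decoded = next(csv.reader(io.StringIO(encoded)))[0]
```

Here encoded holds the quoted form followed by a CRLF line terminator, and decoded equals the original string, so a standard CSV reader recovers the text without any manual unescaping.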
Usage Notes The MIMIC-III demo provides researchers with an opportunity to review the structure and content of MIMIC-III before deciding whether or not to carry out an analysis on the full dataset.
CSV files can be opened natively using any text editor or spreadsheet program. However, some tables are large, and it may be preferable to navigate the data stored in a relational database. One alternative is to create an SQLite database using the CSV files. SQLite is a lightweight database format which stores all constituent tables in a single file, and SQLite databases interoperate well with a number of software tools.
DB Browser for SQLite is a high quality, visual, open source tool to create, design, and edit database files compatible with SQLite. We have found this tool to be useful for navigating SQLite files. Information regarding installation of the software and creation of the database can be found online: https://sqlitebrowser.org/
Release Notes Release notes for the demo follow the release notes for the MIMIC-III database.
Acknowledgements This research and development was supported by grants NIH-R01-EB017205, NIH-R01-EB001659, and NIH-R01-GM104987 from the National Institutes of Health. The authors would also like to thank Philips Healthcare and staff at the Beth Israel Deaconess Medical Center, Boston, for supporting database development, and Ken Pierce for providing ongoing support for the MIMIC research community.
Conflicts of Interest The authors declare no competing financial interests.
References Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L. H., Feng, M., Ghassemi, M., Mo...
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
MIMIC-IV-ED is a large, freely available database of emergency department (ED) admissions at the Beth Israel Deaconess Medical Center between 2011 and 2019. As of MIMIC-ED v1.0, the database contains 448,972 ED stays. Vital signs, triage information, medication reconciliation, medication administration, and discharge diagnoses are available. All data are deidentified to comply with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-ED is intended to support a diverse range of education initiatives and research studies.
This dataset is created from MIMIC-III (Medical Information Mart for Intensive Care III) and contains simulated patient admission notes. The clinical notes contain information about a patient at the time of admission to the ICU and are labelled for four outcome prediction tasks: diagnoses at discharge, procedures performed, in-hospital mortality, and length of stay.
To obtain the data one first has to gain access to the MIMIC-III dataset and then run the scripts introduced in the linked repository.
The advent of large, open access text databases has driven advances in state-of-the-art model performance in natural language processing (NLP). The relatively limited amount of clinical data available for NLP has been cited as a significant barrier to the field's progress. Here we describe MIMIC-IV-Note: a collection of deidentified free-text clinical notes for patients included in the MIMIC-IV clinical database. MIMIC-IV-Note contains 331,794 deidentified discharge summaries from 145,915 patients admitted to the hospital and emergency department at the Beth Israel Deaconess Medical Center in Boston, MA, USA. The database also contains 2,321,355 deidentified radiology reports for 237,427 patients. All notes have had protected health information removed in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. All notes are linkable to MIMIC-IV providing important context to the clinical data therein. The database is intended to stimulate research in clinical natural language processing and associated areas.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
MIMIC_III_IPI - Discharge Summaries from Medical Information Mart for Intensive Care-III with Indirect Personal Identifiers Annotations
The discharge summaries we use for demonstrating our Indirect Personal Identifiers (IPI) schema are randomly sampled from the Medical Information Mart for Intensive Care (MIMIC-III) dataset. MIMIC-III comprises health-related data from over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. Among other types of data, such as patient demographics, the database also includes various types of textual data, such as diagnostic reports and discharge summaries. We chose discharge summaries for our study, since these are richer in information than other notes in MIMIC-III. Details:
This is the Discharge Summaries from MIMIC-III with Indirect Personal Identifiers Annotations dataset, an external resource for the paper accepted at the PrivateNLP workshop at NAACL 2025. A preprint can be found in:
This repository contains the annotations in a CSV file and the annotation guidelines document. Inspecting the exact annotation texts requires access to the MIMIC-III Clinical Database, see https://physionet.org/content/mimiciii/1.4/. Each row in the CSV file has an ID together with a list of the IPI annotated spans, each in the format {"start": ,"end": ,"label": }. The ID in the ipi_annotations.csv table corresponds to the same ROW_ID in the MIMIC-III NOTEEVENTS.csv table and can be used for merging the tables to inspect the original documents and reconstruct the annotations using the offsets.
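The merge described above can be sketched with the standard library as follows. Several details are assumptions made for illustration: the span column is called "annotations" here (the actual column name is not stated), the end offset is treated as exclusive, the label OCCUPATION is hypothetical, and the note text is invented rather than taken from MIMIC-III.

```python
import csv
import io
import json

# Invented note text and span; "pilot" occupies characters 19 to 23.
note_text = "Patient works as a pilot and lives alone."
spans = [{"start": 19, "end": 24, "label": "OCCUPATION"}]


def build_csv(header, rows):
    """Serialise rows to CSV text so embedded quotes are escaped correctly."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    return list(csv.DictReader(io.StringIO(text)))


def annotated_spans(noteevents_csv, annotations_csv):
    """Join annotations to notes on ID == ROW_ID and slice out each span."""
    notes = {row["ROW_ID"]: row["TEXT"] for row in read_rows(noteevents_csv)}
    results = []
    for row in read_rows(annotations_csv):
        text = notes.get(row["ID"])
        if text is None:
            continue  # annotation without a matching note
        for span in json.loads(row["annotations"]):
            results.append(
                (row["ID"], span["label"], text[span["start"]:span["end"]])
            )
    return results


noteevents_csv = build_csv(["ROW_ID", "TEXT"], [[101, note_text]])
annotations_csv = build_csv(["ID", "annotations"], [[101, json.dumps(spans)]])
extracted = annotated_spans(noteevents_csv, annotations_csv)
```

If the real offsets turn out to be inclusive at the end, the slice would need to extend one character further; checking a handful of spans against the guidelines document would settle this.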
Please note that only authenticated users can request access to review and download the annotations and guidelines. If you encounter any issues, feel free to reach out to the contact person.
The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
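The stated recording parameters fix the size of each ECG, which is useful when allocating arrays or sanity-checking a loaded record. The short calculation below assumes every record is exactly 10 seconds long and that a record is laid out as samples by leads; the layout is an assumption, not something stated above.

```python
SAMPLING_RATE_HZ = 500
DURATION_S = 10
N_LEADS = 12

# Samples recorded on each lead over the full 10-second recording.
samples_per_lead = SAMPLING_RATE_HZ * DURATION_S

# Total samples across all leads in one ECG.
samples_per_record = samples_per_lead * N_LEADS

# Assumed (samples, leads) layout for one loaded record.
record_shape = (samples_per_lead, N_LEADS)
```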
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Overview
The MIMIC PERform datasets contain physiological signals recorded from critically-ill patients during routine clinical care. Specifically, the datasets contain the following signals:
electrocardiogram (ECG)
photoplethysmogram (PPG)
impedance pneumography (imp), also known as respiratory (resp)
The datasets were extracted from the MIMIC III Waveform Database. Further details of the datasets are provided in the documentation accompanying the ppg-beats project, which is available at: https://ppg-beats.readthedocs.io/en/latest/
Datasets
The following datasets are available:
MIMIC PERform AF Dataset: Recordings from 35 critically-ill adults during routine clinical care, categorised as either AF (atrial fibrillation, 19 subjects) or non-AF (16 subjects).
Matlab format (AF subjects, non-AF subjects)
WFDB format (AF subjects, non-AF subjects)
CSV format (AF subjects, non-AF subjects)
MIMIC PERform Training Dataset: Recordings from 200 patients during routine clinical care, who are categorised as either adults (100 subjects) or neonates (100 subjects).
Matlab format (all data, adults, neonates)
WFDB format (all data, adults, neonates)
CSV format (all data, adults, neonates)
MIMIC PERform Testing Dataset: Recordings from 200 patients during routine clinical care, who are categorised as either adults (100 subjects) or neonates (100 subjects).
Matlab format (all data, adults, neonates)
WFDB format (all data, adults, neonates)
CSV format (all data, adults, neonates)
Citation
When using these datasets, please cite the following publication:
Charlton PH et al. Detecting beats in the photoplethysmogram: benchmarking open-source algorithms. Physiological Measurement 2022. DOI: 10.1088/1361-6579/ac826d
Acknowledgments
Each dataset is accompanied by a licence which acknowledges the source(s) of the data. Please see the individual licences for these acknowledgements.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is part of the MIMIC database and specifically uses the data for two patients, with IDs 221 and 230.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background Existing research suggests that using statins may reduce the incidence of enteritis caused by C. difficile and improve the prognosis of patients. This study aimed to explore the relation between Clostridium difficile-induced enteritis (CDE) and statin use.
Methods Data were collected from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. Multivariate logistic regression analysis was employed to assess the impact of statin use on CDE incidence in patients in intensive care units (ICUs) and its effect on in-hospital mortality among them. The research findings were validated by performing propensity score matching (PSM), inverse probability of treatment weighting (IPTW), and subgroup analyses.
Results The study enrolled the data of 51,978 individuals to assess the effect of statin usage on the occurrence of CDE in patients admitted to the ICU. The results indicate that statins can decrease the prevalence of CDE in patients in ICU (odds ratio (OR): 0.758, 95% confidence interval (CI): 0.666–0.873, P < 0.05), which was further confirmed through PSM (OR: 0.760, 95% CI: 0.661–0.873, P < 0.05) and IPTW (OR: 0.818, 95% CI: 0.754–0.888, P < 0.05) analyses. For most subgroups, statins’ favorable effect in reducing CDE remained constant. A total of 1,208 patients were included in the study to evaluate whether statins could lower the risk of death in patients in ICU with enteritis caused by C. difficile. Statins did not reduce in-hospital mortality of patients in ICU with CDE (OR: 0.911, 95% CI: 0.667–1.235, P = 0.553). The results were validated following PSM (OR: 0.877, 95% CI: 0.599–1.282, P = 0.499) and IPTW (OR: 0.781, 95% CI: 0.632–1.062, P = 0.071) analyses, and all subgroups demonstrated consistent results.
Conclusion Statin administration can reduce the incidence of CDE in patients in the ICU; however, it does not decrease the in-hospital mortality rate for individuals with CDE.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We conducted our experiments on de-identified EHR data from MIMIC-III. This data set contains various clinical data relating to patient admission to the ICU, such as disease diagnoses in the form of International Classification of Diseases (ICD)-9 codes, and lab test results as detailed in Supplementary Materials. We collected data for 5,956 patients, extracting lab tests every hour from admission. There are a total of 409 unique lab tests and 3,387 unique disease diagnoses observed. The diagnoses were obtained as ICD-9 codes and represented using one-hot encoding, where one indicates patients with the disease and zero indicates those without. We binned the lab test events into windows of 6, 12, 24, and 48 hours prior to patient death or discharge from the ICU. From these data, we performed mortality predictions, evaluated with 10-fold cross-validation.
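The encoding and binning steps can be sketched as below. This is an illustration under stated assumptions, not the code used in the experiments: the patient identifiers, codes, and lab values are invented, and the windows are treated as nested (the 48-hour window contains the 24-hour one, and so on), which the description does not specify.

```python
def one_hot(patient_codes, vocabulary):
    """Map each patient's set of ICD-9 codes to a 0/1 vector over vocabulary."""
    return {
        pid: [1 if code in codes else 0 for code in vocabulary]
        for pid, codes in patient_codes.items()
    }


def window_events(events, end_hour, window_hours):
    """Keep lab events charted within window_hours before end_hour."""
    start = end_hour - window_hours
    return [event for event in events if start <= event[0] <= end_hour]


# Invented example data: ICD-9 codes per patient.
patient_codes = {"p1": {"4019", "25000"}, "p2": {"4280"}}
vocabulary = sorted(set().union(*patient_codes.values()))
encoded = one_hot(patient_codes, vocabulary)

# Invented lab events for one patient: (hour since admission, test, value).
events = [
    (2, "lactate", 1.8),
    (30, "lactate", 2.4),
    (44, "creatinine", 1.1),
    (47, "lactate", 3.0),
]
end_hour = 48  # hour of death or ICU discharge in this toy example
counts = {w: len(window_events(events, end_hour, w)) for w in (6, 12, 24, 48)}
```

With 3,387 diagnoses the real vectors would be long and sparse, so a sparse matrix representation may be preferable to plain lists.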
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
We propose a derivative dataset (derived from MIMIC-III Waveform Database Matched Subset) composed of 380 hours of the most common biomedical signals, including arterial blood pressure, photoplethysmograph, and electrocardiogram for 1,524 de-identified subjects, each having 30 segments of 30 seconds of those signals. For more detailed information, please refer to the scientific article at this link: https://www.nature.com/articles/s41597-024-04041-1
The MIMIC II (Multiparameter Intelligent Monitoring in Intensive Care) Database contains comprehensive clinical data from tens of thousands of Intensive Care Unit (ICU) patients. Data were collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal) in a single tertiary teaching hospital. The database contains clinical data from bedside workstations as well as hospital archives. The database also includes thousands of records of continuous high-resolution physiologic waveforms and minute-by-minute numeric time series (trends) of physiologic measurements.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We provide some annotations of the Medical Information Mart for Intensive Care (MIMIC) III Waveform Database Matched Subset. The annotations are for the electrocardiogram recordings and denote atrial fibrillation status. More annotations will be added in future. Details about the MIMIC III matched subset can be found at PhysioNet: https://archive.physionet.org/physiobank/database/mimic3wdb/matched/
If you use the annotations, please cite the following paper: Bashar, S.K., Ding, E., Walkey, A.J., McManus, D.D. and Chon, K.H., 2019. Noise Detection in Electrocardiogram Signals for Intensive Care Unit Patients. IEEE Access, 7, pp.88357-88368