https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the potential to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner that protects patient privacy. Here we present the Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit (ICU) at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
The Medical Information Mart for Intensive Care (MIMIC)-IV database consists of deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Center. Access to MIMIC-IV is limited to credentialed users. Here, we provide an openly available demo of MIMIC-IV containing a subset of 100 patients. The demo has similar content to MIMIC-IV but excludes free-text clinical notes. It may be useful for running workshops and for assessing whether MIMIC-IV is appropriate for a study before making an access request.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the potential to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner that protects patient privacy. The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified: patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has driven a large body of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III that incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
MIMIC-IV-ED is a large, freely available database of emergency department (ED) admissions at the Beth Israel Deaconess Medical Center between 2011 and 2019. The database contains ~425,000 ED stays. Vital signs, triage information, medication reconciliation, medication administration, and discharge diagnoses are available. All data are deidentified in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-IV-ED is intended to support a diverse range of education initiatives and research studies.
The MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sampled at 500 Hz. This subset contains all of the ECGs for patients who appear in the MIMIC-IV Clinical Database. When a cardiologist report is available for a given ECG, we provide the needed information to link the waveform to the report. The patients in MIMIC-IV-ECG have been matched against the MIMIC-IV Clinical Database, making it possible to link to information across the MIMIC-IV modules.
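The record dimensions above follow from sampling rate times duration: a 10-second ECG sampled at 500 Hz has 5,000 samples per lead. The sketch below is plain Python and makes no assumption about the on-disk file format; it downsamples one lead by an integer factor and cuts it into fixed-length windows. The function names, the 250 Hz target, and the 2-second window are illustrative choices, not part of the MIMIC-IV-ECG release.

```python
FS_HZ = 500        # sampling rate stated for MIMIC-IV-ECG
DURATION_S = 10    # record length stated for MIMIC-IV-ECG
N_LEADS = 12       # leads per diagnostic ECG

def expected_samples(fs_hz: int, duration_s: float) -> int:
    """Samples per lead = sampling rate x duration."""
    return int(round(fs_hz * duration_s))

def block_average_downsample(signal: list[float], factor: int) -> list[float]:
    """Downsample by averaging non-overlapping blocks of `factor` samples.

    Averaging is only a crude low-pass filter; a real pipeline would normally
    apply a proper anti-aliasing filter before decimating.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_blocks = len(signal) // factor  # trailing samples that do not fill a block are dropped
    return [sum(signal[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]

def windows(signal: list[float], size: int) -> list[list[float]]:
    """Split a signal into consecutive, non-overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

lead = [0.0] * expected_samples(FS_HZ, DURATION_S)       # placeholder lead (all zeros)
lead_250 = block_average_downsample(lead, FS_HZ // 250)   # 500 Hz -> 250 Hz
segments = windows(lead_250, expected_samples(250, 2))    # 2 s windows = 500 samples each
print(len(lead), len(lead_250), len(segments))  # 5000 2500 5
```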
Dataset for MIMIC-IV (mimic4) data, configured by default for the Mortality task. Available tasks are Mortality, Length of Stay, Readmission, and Phenotype. The data are extracted from the MIMIC-IV database using this pipeline: 'https://github.com/healthylaife/MIMIC-IV-Data-Pipeline/tree/main'. The MIMIC path should have the form "path/to/mimic4data/from/username/mimiciv/2.2". If you choose a Custom task, provide a configuration file for the time series. Currently works with MIMIC-IV ICU data.
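The path and task rules above can be checked before any extraction runs. The helper below is hypothetical (it is not part of the linked pipeline): it verifies that the path ends in `mimiciv/<version>`, that the task is one of the listed names, and that a Custom task comes with a time-series configuration file.

```python
from pathlib import PurePosixPath
from typing import Optional

# Task names as listed in the description; "Custom" additionally needs a time-series config.
TASKS = {"Mortality", "Length of Stay", "Readmission", "Phenotype", "Custom"}

def check_config(mimic_path: str, task: str = "Mortality",
                 time_series_config: Optional[str] = None) -> str:
    """Hypothetical pre-flight check; returns the MIMIC-IV version component of the path."""
    parts = PurePosixPath(mimic_path).parts
    # Expect .../mimiciv/<version>, e.g. path/to/mimic4data/from/username/mimiciv/2.2
    if len(parts) < 2 or parts[-2] != "mimiciv":
        raise ValueError("expected a path ending in mimiciv/<version>, e.g. .../mimiciv/2.2")
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; choose one of {sorted(TASKS)}")
    if task == "Custom" and not time_series_config:
        raise ValueError("a Custom task requires a time-series configuration file")
    return parts[-1]

print(check_config("path/to/mimic4data/from/username/mimiciv/2.2"))  # 2.2
```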
and the eICU
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains a preprocessed subset of the MIMIC-IV dataset (Medical Information Mart for Intensive Care, Version IV), specifically focusing on laboratory event data related to glucose levels. It has been curated and processed for research on data normalization and integration within Clinical Decision Support Systems (CDSS) to improve Human-Computer Interaction (HCI) elements.
The dataset includes the following key features:
This data has been used to analyze the impact of normalization and integration techniques on improving data accuracy and usability in CDSS environments. The file is provided as part of ongoing research on enhancing clinical decision-making and user interaction in healthcare systems.
The data originates from the publicly available MIMIC-IV database, developed and maintained by the Massachusetts Institute of Technology (MIT). Proper ethical guidelines for accessing and preprocessing the dataset have been followed.
MIMIC-IV_LabEvents_Subset_Normalization.xlsx
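The description does not say which normalization technique was applied to the glucose values. As an assumption, the sketch below shows the two most common choices, min-max scaling to [0, 1] and z-score standardization, in plain Python; the input values are illustrative and are not taken from the file.

```python
from statistics import mean, pstdev

def min_max(values: list[float]) -> list[float]:
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values: list[float]) -> list[float]:
    """Center on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

glucose_mg_dl = [70.0, 100.0, 130.0]  # illustrative values, not taken from the file
print(min_max(glucose_mg_dl))  # [0.0, 0.5, 1.0]
```

Min-max scaling keeps values bounded, which suits display elements; z-scores express each value relative to the cohort, which is more robust when ranges differ between sources.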
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Existing research suggests that statin use may reduce the incidence of enteritis caused by C. difficile and improve patient prognosis. This study aimed to explore the relationship between Clostridium difficile-induced enteritis (CDE) and statin use.
Methods: Data were collected from the Medical Information Mart for Intensive Care-IV (MIMIC-IV) database. Multivariate logistic regression analysis was used to assess the effect of statin use on CDE incidence in patients in intensive care units (ICUs) and on in-hospital mortality among those with CDE. The findings were validated using propensity score matching (PSM), inverse probability of treatment weighting (IPTW), and subgroup analyses.
Results: Data from 51,978 individuals were included to assess the effect of statin use on the occurrence of CDE in ICU patients. Statins decreased the incidence of CDE in ICU patients (odds ratio (OR): 0.758, 95% confidence interval (CI): 0.666–0.873, P < 0.05), which was further confirmed by PSM (OR: 0.760, 95% CI: 0.661–0.873, P < 0.05) and IPTW (OR: 0.818, 95% CI: 0.754–0.888, P < 0.05) analyses. In most subgroups, the favorable effect of statins in reducing CDE remained consistent. A total of 1,208 patients were included to evaluate whether statins could lower the risk of death in ICU patients with CDE. Statins did not reduce in-hospital mortality in ICU patients with CDE (OR: 0.911, 95% CI: 0.667–1.235, P = 0.553). This result was confirmed by PSM (OR: 0.877, 95% CI: 0.599–1.282, P = 0.499) and IPTW (OR: 0.781, 95% CI: 0.632–1.062, P = 0.071) analyses, and all subgroups showed consistent results.
Conclusion: Statin administration can reduce the incidence of CDE in ICU patients; however, it does not decrease the in-hospital mortality rate among patients with CDE.
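IPTW, one of the validation methods named above, reweights each patient by the inverse probability of the treatment actually received. The sketch below assumes propensity scores are already available (in practice they come from a model of statin use on baseline covariates, not shown) and uses unstabilized weights; the study's exact weighting scheme is not stated. For a model with treatment as the only covariate, the weighted logistic-regression OR equals the OR of the weighted 2×2 table computed here.

```python
def iptw_weights(treated: list[int], propensity: list[float]) -> list[float]:
    """IPTW weights: 1/e for treated subjects and 1/(1 - e) for untreated subjects."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if t else 1.0 / (1.0 - e))
    return weights

def weighted_odds_ratio(treated: list[int], outcome: list[int], weights: list[float]) -> float:
    """Odds ratio of the outcome (treated vs untreated) from a weighted 2x2 table."""
    a = b = c = d = 0.0  # a: treated+event, b: treated+no event, c: untreated+event, d: untreated+no event
    for t, y, w in zip(treated, outcome, weights):
        if t and y:
            a += w
        elif t:
            b += w
        elif y:
            c += w
        else:
            d += w
    return (a * d) / (b * c)

# Toy example: statin use (treated) vs CDE (outcome), equal propensity for everyone.
treated = [1, 1, 1, 0, 0, 0]
outcome = [1, 0, 0, 1, 1, 0]
w = iptw_weights(treated, [0.5] * 6)
print(weighted_odds_ratio(treated, outcome, w))  # 0.25
```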
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
The advent of large, open-access text databases has driven advances in state-of-the-art model performance in natural language processing (NLP). The relatively limited amount of clinical data available for NLP has been cited as a significant barrier to the field's progress. Here we describe MIMIC-IV-Note: a collection of deidentified free-text clinical notes for patients included in the MIMIC-IV clinical database. MIMIC-IV-Note contains 331,794 deidentified discharge summaries from 145,915 patients admitted to the hospital and emergency department at the Beth Israel Deaconess Medical Center in Boston, MA, USA. The database also contains 2,321,355 deidentified radiology reports for 237,427 patients. All notes have had protected health information removed in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. All notes are linkable to MIMIC-IV, providing important context for the clinical data therein. The database is intended to stimulate research in clinical natural language processing and associated areas.
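Linking notes to the clinical data amounts to a join on shared identifiers. The sketch below assumes each note row carries the same `hadm_id` (hospital admission) identifier used in the MIMIC-IV clinical tables, with rows represented as plain dicts; notes without a matching admission (for example, a missing `hadm_id`) are returned separately instead of being silently dropped. The function name is hypothetical.

```python
from collections import defaultdict

def attach_notes(admissions: list[dict], notes: list[dict]) -> tuple[list[dict], list[dict]]:
    """Attach notes to admissions by hadm_id; also return notes with no matching admission."""
    by_hadm = defaultdict(list)
    for note in notes:
        by_hadm[note["hadm_id"]].append(note)
    known = {adm["hadm_id"] for adm in admissions}
    # .get on a defaultdict does not create new keys, so admissions without notes get [].
    linked = [{**adm, "notes": by_hadm.get(adm["hadm_id"], [])} for adm in admissions]
    orphans = [n for n in notes if n["hadm_id"] not in known]
    return linked, orphans

admissions = [{"hadm_id": 1, "subject_id": 10}, {"hadm_id": 2, "subject_id": 11}]
notes = [{"note_id": "a", "hadm_id": 1}, {"note_id": "b", "hadm_id": 1},
         {"note_id": "c", "hadm_id": None}]
linked, orphans = attach_notes(admissions, notes)
print([len(a["notes"]) for a in linked], [n["note_id"] for n in orphans])  # [2, 0] ['c']
```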
Objective: To assess the use of Health Level Seven Fast Healthcare Interoperability Resources (FHIR®) for implementing the Findable, Accessible, Interoperable, and Reusable guiding principles for scientific data (FAIR), and to present a list of FAIR implementation choices to support future FAIR implementations that use FHIR.
Material and Methods: A case study was conducted on the Medical Information Mart for Intensive Care-IV Emergency Department dataset (MIMIC-ED), a deidentified clinical dataset converted into FHIR. The FAIRness of this dataset was assessed using a set of common FAIR assessment indicators.
Results: The FHIR distribution of MIMIC-ED, comprising an implementation guide and demo data, was more FAIR than the non-FHIR distribution. The FAIRness score increased from 60 to 82 out of 95 points, a relative improvement of 37%. The most notable improvements were observed in interoperability, with a score increase from 5 to 19 out of 19 points, and reusability, wit...
The authors of the paper collected the dataset.
The files can be opened with Microsoft Word (.docx files) or Microsoft Excel (.csv files) (open-source alternatives: LibreOffice, OpenOffice). The data files (.csv) can also be opened using any text editor, R, etc.
# FAIR Indicator Scores and Qualitative Comments
This dataset belongs as supplementary material to the paper entitled "Assessing the Use of HL7 FHIR for Implementing the FAIR Guiding Principles: A Case Study of the MIMIC-IV Emergency Department Module".
This dataset describes the indicator scores and qualitative comments of the FAIR data assessment of the Medical Information Mart for Intensive Care (MIMIC)-IV Emergency Department Module. Two distributions of the Emergency Department module were assessed: the PhysioNet distribution and the Fast Healthcare Interoperability Resources (FHIR) distribution. The dataset consists of two files: (1) PhysioNet.csv, containing the data for the PhysioNet distribution, and (2) FHIR.csv, containing the data for the FHIR distribution. Both files share the same structure and fields.
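Because both files share one structure, the total score per distribution and the relative improvement can be computed the same way. The sketch assumes a numeric column named `score` (a hypothetical name; check the actual header in PhysioNet.csv and FHIR.csv) and skips blank cells. The arithmetic reproduces the reported figure: (82 − 60) / 60 ≈ 36.7%, i.e. the 37% quoted in the abstract.

```python
import csv
import io

def total_score(csv_text: str, score_column: str = "score") -> float:
    """Sum a numeric score column, skipping blank cells. The column name is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row[score_column]) for row in reader if row[score_column].strip())

def relative_improvement(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0

print(round(relative_improvement(60, 82), 1))  # 36.7
```

With the real files, pass the file contents, e.g. `open("FHIR.csv", newline="").read()`, or hand the open file object directly to `csv.DictReader`.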
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aim: To compare the effects of midazolam, propofol, and dexmedetomidine monotherapy and combination therapy on the prognosis of intensive care unit (ICU) patients receiving continuous mechanical ventilation (MV).
Methods: This retrospective cohort study included 11,491 participants from the Medical Information Mart for Intensive Care (MIMIC)-IV database (2008–2019). The primary outcomes were the incidence of ventilator-associated pneumonia (VAP), in-hospital mortality, and duration of MV. Univariate and multivariate logistic regression analyses were used to evaluate the association between sedation and the incidence of VAP. Univariate and multivariate Cox analyses were performed to investigate the association between sedative therapy and in-hospital mortality. Additionally, univariate and multivariate linear regression analyses were conducted to explore the relationship between sedation and duration of MV.
Results: Compared with patients not receiving these medications, propofol alone, dexmedetomidine alone, midazolam plus dexmedetomidine, propofol plus dexmedetomidine, and midazolam plus propofol plus dexmedetomidine were all associated with an increased risk of VAP. Dexmedetomidine alone, midazolam plus dexmedetomidine, propofol plus dexmedetomidine, and midazolam plus propofol plus dexmedetomidine may be protective factors for in-hospital mortality, whereas propofol alone was a risk factor. All types of sedatives were positively correlated with the duration of MV. With dexmedetomidine alone as the reference, all other drug groups were associated with an increased risk of in-hospital mortality. Propofol alone, midazolam plus dexmedetomidine, propofol plus dexmedetomidine, and midazolam plus propofol plus dexmedetomidine were associated with an increased risk of VAP compared with dexmedetomidine alone.
Conclusion: Dexmedetomidine alone may be a favorable option, in terms of prognosis, for ICU patients receiving MV.
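The exposure groups compared above are defined by which of the three sedatives a patient received, giving eight possible groups (no exposure, three monotherapies, four combinations). The helper names below are hypothetical; the sketch only shows how such labels can be derived consistently so that either "none" or "dexmedetomidine" can serve as the reference category.

```python
from itertools import combinations

SEDATIVES = ("midazolam", "propofol", "dexmedetomidine")

def sedation_group(received: set[str]) -> str:
    """Label a patient's exposure by which of the three sedatives they received."""
    given = [drug for drug in SEDATIVES if drug in received]
    return " + ".join(given) if given else "none"

def all_groups() -> list[str]:
    """Every possible exposure group: none, three monotherapies, and four combinations."""
    groups = ["none"]
    for k in range(1, len(SEDATIVES) + 1):
        groups += [" + ".join(combo) for combo in combinations(SEDATIVES, k)]
    return groups

print(sedation_group({"dexmedetomidine", "propofol"}))  # propofol + dexmedetomidine
```

Fixing the drug order in `SEDATIVES` means the same combination always gets the same label, regardless of the order in which the administrations were recorded.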
F219091/mimic-iv-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Muhamamd Umer
Released under CC0: Public Domain
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
This dataset presents a curated collection of preprocessed and labeled clinical notes derived from the MIMIC-IV-Note database. The primary aim of this resource is to facilitate the development and training of machine learning models focused on summarizing brief hospital courses (BHC) from clinical discharge notes.
The dataset contains 270,033 cleaned and standardized clinical notes with an average length of 2,267 tokens, making them suitable for machine learning (ML) applications. Each clinical note is paired with a corresponding BHC summary, providing a foundation for supervised learning tasks. The preprocessing pipeline uses regular expressions to address common issues in the raw clinical text, such as special characters, extraneous whitespace, inconsistent formatting, and irrelevant text, producing a structured dataset in which clinical note sections are separated under appropriate headings.
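The regex-based cleaning described above can be sketched as follows. The actual patterns used to build the dataset are not given, so every pattern here, including the assumption that section headings appear alone on a line as `Heading:`, is illustrative.

```python
import re

# Assumed heading format: a capitalized phrase alone on its line, ending in a colon.
HEADING = re.compile(r"^([A-Z][A-Za-z /]+):[ \t]*$", re.MULTILINE)

def clean_note(text: str) -> str:
    """Illustrative clean-up: normalize line endings, special characters, and whitespace."""
    text = text.replace("\r\n", "\n")
    # Replace non-printable and non-ASCII characters (including tabs) with a space.
    # Note: this also removes legitimate symbols such as a degree sign.
    text = re.sub(r"[^\x20-\x7E\n]", " ", text)
    text = re.sub(r" {2,}", " ", text)      # collapse runs of spaces
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()

def split_sections(text: str) -> dict[str, str]:
    """Split a cleaned note into {heading: body} using the HEADING pattern."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections

raw = "Chief Complaint:\r\n  chest   pain\n\n\n\nBrief Hospital Course:\nAdmitted\tfor CP.\n"
print(split_sections(clean_note(raw)))
```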
By offering this resource, we aim to support healthcare professionals and researchers in their efforts to enhance patient care through the automation of BHC summarization. This dataset is ideal for exploring various NLP techniques, developing predictive models, and improving the efficiency and accuracy of clinical documentation practices. We invite the research community to utilize this dataset to advance the field of medical informatics and contribute to better health outcomes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The original data
Dataset Details
This is a randomly split, five-fold version of the MIMIC-IV-ECG variant of ECG-QA, split by patient (i.e., no patient overlap between training and test sets). The dataset repository is named ecg-qa-mimic-iv-ecg-250-500, where 250 refers to the sampling frequency (250 Hz) and 500 to the segment length in samples, which corresponds to 2 seconds at 250 Hz. The code to do the splitting is here. The dataset contents are under the same license as the original ECG-QA dataset. For any questions or issues, please do not hesitate… See the full description on the dataset page: https://huggingface.co/datasets/willxxy/ecg-qa-mimic-iv-ecg-250-500.
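A patient-level split guarantees zero overlap by assigning folds to patients rather than to individual samples. The sketch below shuffles patient IDs with a fixed seed and deals them round-robin into five folds; it is not the repository's splitting code (which is not reproduced here), and the function names are hypothetical. scikit-learn's `GroupKFold` implements the same grouping idea.

```python
import random

def patient_folds(patient_ids: list[str], k: int = 5, seed: int = 0) -> list[int]:
    """Assign a fold index to every sample so that each patient falls in exactly one fold.

    Patients (not samples) are shuffled and dealt round-robin into k folds, so folds are
    balanced by patient count; the number of samples per fold can still differ.
    """
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {pid: i % k for i, pid in enumerate(patients)}
    return [fold_of[pid] for pid in patient_ids]

def train_test_indices(folds: list[int], test_fold: int) -> tuple[list[int], list[int]]:
    """Sample indices for training (all other folds) and testing (the held-out fold)."""
    train = [i for i, f in enumerate(folds) if f != test_fold]
    test = [i for i, f in enumerate(folds) if f == test_fold]
    return train, test

ids = ["p1", "p1", "p2", "p3", "p4", "p5", "p5", "p6"]  # one entry per ECG/question sample
folds = patient_folds(ids)
train, test = train_test_indices(folds, test_fold=0)
print(sorted({ids[i] for i in test}))
```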
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: This research evaluated the association between platelet count and sepsis prognosis, including the dose-response relationship, in a cohort of American adults.
Methods: Platelet counts recorded after hospital admission were collected retrospectively for patients admitted to the intensive care unit (ICU) of Beth Israel Deaconess Medical Center between 2008 and 2019. On ICU admission, patients with sepsis were divided into four categories based on platelet count (very low, < 50 × 10⁹/L; intermediate-low, 50–100 × 10⁹/L; low, 100–150 × 10⁹/L; normal, ≥ 150 × 10⁹/L). A multivariate Cox proportional hazards model was used to estimate the 28-day risk of sepsis mortality according to baseline platelet count, and a two-piecewise linear regression model was used to assess the threshold effect.
Results: The risk of 28-day sepsis mortality was more than 2-fold higher in the very low platelet group than in the low group (hazard ratio [HR], 2.24; 95% confidence interval [CI], 1.92–2.6). Further analysis revealed a curvilinear association between platelet count and the risk of sepsis death, with a saturation effect at 100 × 10⁹/L. Below 100 × 10⁹/L, the 28-day risk of sepsis death decreased significantly with increasing platelet count (HR, 0.875; 95% CI, 0.84–0.90).
Conclusion: Below 100 × 10⁹/L, platelet count was a strong predictor of sepsis mortality, with the risk declining by about 13% for every 10 × 10⁹/L increase in platelets. Once platelet counts reach 100 × 10⁹/L, the probability of dying of sepsis within 28 days rises by 1% for every further 10 × 10⁹/L increase.
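The grouping, the two-piece (threshold) coding, and the HR-to-percentage conversion used in this abstract can be made concrete. In the sketch, which group a value exactly on a boundary (50, 100, 150) belongs to is an assumption, since the abstract's ranges overlap at their edges. The two-piece coding gives the model separate slopes below and above the knot at 100 × 10⁹/L. An HR of 0.875 corresponds to a 12.5% lower hazard, which the conclusion rounds to 13%.

```python
def platelet_category(count: float) -> str:
    """Bin a platelet count (x10^9/L) into the study's four groups.

    Assumption: a value exactly on a boundary (50, 100, 150) goes to the higher group.
    """
    if count < 50:
        return "very low"
    if count < 100:
        return "intermediate-low"
    if count < 150:
        return "low"
    return "normal"

def piecewise_terms(count: float, knot: float = 100.0) -> tuple[float, float]:
    """Two-piece linear coding: the first term carries the slope below the knot,
    the second the slope above it."""
    return min(count, knot), max(count - knot, 0.0)

def percent_change(hazard_ratio: float) -> float:
    """Convert a hazard ratio into a percentage change in hazard."""
    return (hazard_ratio - 1.0) * 100.0

print(platelet_category(80.0), piecewise_terms(130.0), percent_change(0.875))
# intermediate-low (100.0, 30.0) -12.5
```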
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Clinical decision making is one of the most impactful parts of a physician's responsibilities and stands to benefit greatly from AI solutions such as large language models (LLMs). However, while many datasets exist to test the performance of AI models on constructed case vignettes, such as medical licensing exams, these tests fail to assess many skills that are necessary for deployment in a realistic clinical decision-making environment. To understand how useful LLMs are in real-world settings, we must evaluate them in the wild, i.e., on real-world data under realistic conditions. To address this need, we have created a curated dataset based on the MIMIC-IV database, spanning 2,400 real patient cases and four common abdominal pathologies: appendicitis, cholecystitis, diverticulitis, and pancreatitis. Each patient case contains the filtered and curated information necessary to arrive at the physician's final diagnosis and can be used interactively to test the information-gathering, synthesis, and diagnostic capabilities of AI models.
Background: Community-acquired pneumonia (CAP) is a common infectious disease characterized by inflammation of the lung parenchyma in individuals who have not recently been hospitalized. It remains a significant cause of morbidity and mortality worldwide. Aspirin is a widely used drug that is often administered to CAP patients; however, its benefits remain controversial.
Objective: We sought to determine whether aspirin treatment has a protective effect on the outcomes of CAP patients.
Methods: We selected patients with CAP from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database. Propensity score matching (PSM) was used to balance baseline differences, and a multivariate Cox regression model was used to assess the relationship between aspirin treatment and 28-day mortality.
Results: A total of 3,595 patients were included, 2,261 of whom received aspirin and 1,334 of whom did not. After PSM, 1,219 pairs were matched. The 28-day mortality rate among aspirin users was 20.46%, lower than that among non-users. Multivariate Cox regression indicated that aspirin use was associated with decreased 28-day mortality (HR 0.75, 95% CI 0.63–0.88, p < 0.001). No significant differences were found between 325 mg/day and 81 mg/day aspirin in 28-day mortality, hospital mortality, 90-day mortality, gastrointestinal hemorrhage, or thrombocytopenia. However, intensive care unit (ICU) stay was longer in the 325 mg/day group than in the 81 mg/day group (4.22 vs. 3.57 days, p = 0.031).
Conclusion: Aspirin is associated with reduced 28-day mortality in CAP patients. However, 325 mg/day aspirin provides no additional benefit over 81 mg/day and may lead to longer ICU stays.
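The 1,219 matched pairs come from pairing aspirin users with non-users who have similar propensity scores. The abstract does not state the matching algorithm or caliper, so the sketch below shows one common variant as an assumption: greedy 1:1 nearest-neighbour matching without replacement, with an illustrative caliper of 0.05 on the score scale (a caliper of 0.2 standard deviations of the logit of the score is another frequent choice). Propensity scores are taken as given.

```python
def greedy_match(treated_ps: dict[str, float], control_ps: dict[str, float],
                 caliper: float = 0.05) -> list[tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching without replacement on the propensity score.

    treated_ps / control_ps map subject id -> propensity score.
    Returns (treated_id, control_id) pairs; treated subjects with no control
    within the caliper stay unmatched.
    """
    available = dict(control_ps)
    pairs = []
    # Process treated subjects from highest to lowest score (one common heuristic).
    for t_id, t_ps in sorted(treated_ps.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control is used at most once
    return pairs

aspirin = {"t1": 0.80, "t2": 0.30}
no_aspirin = {"c1": 0.78, "c2": 0.31, "c3": 0.10}
print(greedy_match(aspirin, no_aspirin))  # [('t1', 'c1'), ('t2', 'c2')]
```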