On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed for predicting patient readmissions within 30 days of discharge. It includes synthetic patient records with a variety of medical features such as age, diagnosis, number of procedures, and discharge destination. The goal is to develop machine learning models that can predict whether a patient will be readmitted within 30 days, which can help hospitals improve patient care and reduce costs.
train.csv - This file contains the training data with a target label (readmitted) indicating whether a patient was readmitted within 30 days.
test.csv - This file contains the test data, which omits the target variable. The task is to predict whether each patient in this dataset will be readmitted.
sample_submission.csv - This file shows the expected format for your submission, with two columns: Patient_ID and readmitted.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is Electronic Health Record Predicting collected from a private Hospital in Indonesia. It contains the patients laboratory test results used to determine next patient treatment whether in care or out care patient. The task embedded to the dataset is classification prediction.
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The Open Database of Healthcare Facilities (ODHF) is a collection of open data containing the names, types, and locations of health facilities across Canada. It is released under the Open Government License - Canada. The ODHF compiles open, publicly available, and directly-provided data on health facilities across Canada. Data sources include regional health authorities, provincial, territorial and municipal governments, and public health and professional healthcare bodies. This database aims to provide enhanced access to a harmonized listing of health facilities across Canada by making them available as open data. This database is a component of the Linkable Open Data Environment (LODE).
USGS Structures from The National Map (TNM) consists of data to include the name, function, location, and other core information and characteristics of selected manmade facilities across all US states and territories. The types of structures collected are largely determined by the needs of disaster planning and emergency response, and homeland security organizations. Structures currently included are: School, School:Elementary, School:Middle, School:High, College/University, Technical/Trade School, Ambulance Service, Fire Station/EMS Station, Law Enforcement, Prison/Correctional Facility, Post Office, Hospital/Medical Center, Cabin, Campground, Cemetery, Historic Site/Point of Interest, Picnic Area, Trailhead, Vistor/Information Center, US Capitol, State Capitol, US Supreme Court, State Supreme Court, Court House, Headquarters, Ranger Station, White House, and City/Town Hall. Structures data are designed to be used in general mapping and in the analysis of structure related activities using geographic information system technology. Included is a feature class of preliminary building polygons provided by FEMA, USA Structures. The National Map structures data is commonly combined with other data themes, such as boundaries, elevation, hydrography, and transportation, to produce general reference base maps. The National Map viewer allows free downloads of public domain structures data in either Esri File Geodatabase or Shapefile formats. For additional information on the structures data model, go to https://www.usgs.gov/ngp-standards-and-specifications/national-map-structures-content.
The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified and publicly-available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical reports records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned to 7.6 codes, on average. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.
The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
This dataset contains Hospital Chargemasters with prices in effect as of June 1 of their reporting year. Chargemasters consists of a list of average charges for 25 common outpatient procedures, and the estimated percentage change in gross revenue due to price changes each July 1.
For more on HCAI Chargemaster Data.
Comprehensive dataset of 41 Hospitals in Free municipal consortium of Agrigento, Italy as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
The Healthcare Cost and Utilization Project (HCUP) Kids' Inpatient Database (KID) is the largest publicly available all-payer pediatric inpatient care database in the United States, containing data from two to three million hospital stays each year. Its large sample size is ideal for developing national and regional estimates and enables analyses of rare conditions, such as congenital anomalies, as well as uncommon treatments, such as organ transplantation. Developed through a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality, HCUP data inform decision making at the national, State, and community levels. The KID is a sample of pediatric discharges from 4,000 U.S. hospitals in the HCUP State Inpatient Databases yielding approximately two to three million unweighted hospital discharges for newborns, children, and adolescents per year. About 10 percent of normal newborns and 80 percent of other neonatal and pediatric stays are selected from each hospital that is sampled for patients younger than 21 years of age. The KID contains clinical and resource use information included in a typical discharge abstract, with safeguards to protect the privacy of individual patients, physicians, and hospitals (as required by data sources). It includes discharge status, diagnoses, procedures, patient demographics (e.g., sex, age), expected source of primary payment (e.g., Medicare, Medicaid, private insurance, self-pay, and other insurance types), and hospital charges and cost. Restricted access data files are available with a data use agreement and brief online security training.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The heart attack datasets were collected at Zheen hospital in Erbil, Iraq, from January 2019 to May 2019. The attributes of this dataset are: age, gender, heart rate, systolic blood pressure, diastolic blood pressure, blood sugar, ck-mb and troponin with negative or positive output. According to the provided information, the medical dataset classifies either heart attack or none. The gender column in the data is normalized: the male is set to 1 and the female to 0. The glucose column is set to 1 if it is > 120; otherwise, 0. As for the output, positive is set to 1 and negative to 0.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This heart disease dataset is acquired from one o f the multispecialty hospitals in India. Over 14 common features which makes it one of the heart disease dataset available so far for research purposes. This dataset consists of 1000 subjects with 12 features. This dataset will be useful for building a early-stage heart disease detection as well as to generate predictive machine learning models.
Comprehensive dataset of 4 General hospitals in Free municipal consortium of Caltanissetta, Italy as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge).MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The construction of diabetes dataset was explained. The data were collected from the Iraqi society, as they data were acquired from the laboratory of Medical City Hospital and (the Specializes Center for Endocrinology and Diabetes-Al-Kindy Teaching Hospital). Patients' files were taken and data extracted from them and entered in to the database to construct the diabetes dataset. The data consist of medical information, laboratory analysis. The data attribute are: The data consist of medical information, laboratory analysis… etc. The data that have been entered initially into the system are: No. of Patient, Sugar Level Blood, Age, Gender, Creatinine ratio(Cr), Body Mass Index (BMI), Urea, Cholesterol (Chol), Fasting lipid profile, including total, LDL, VLDL, Triglycerides(TG) and HDL Cholesterol , HBA1C, Class (the patient's diabetes disease class may be Diabetic, Non-Diabetic, or Predict-Diabetic).
Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This database, collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure medication in order to characterize their seizures and assess their candidacy for surgical intervention. The recordings are grouped into 23 cases and were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages 1.5–19).
We created a dataset of clinical action items annotated over MIMIC-III. This dataset, which we call CLIP, is annotated by physicians and covers 718 discharge summaries, representing 107,494 sentences. Annotations were collected as character-level spans to discharge summaries after applying surrogate generation to fill in the anonymized templates from MIMIC-III text with faked data. We release these spans, their aggregation into sentence-level labels, and the sentence tokenizer used to aggregate the spans and label sentences. We also release the surrogate data generator, and the document IDs used for training, validation, and test splits, to enable reproduction. The spans are annotated with 0 or more labels of 7 different types, representing the different actions that may need to be taken: Appointment, Lab, Procedure, Medication, Imaging, Patient Instructions, and Other. We encourage the community to use this dataset to develop methods for automatically extracting clinical action items from discharge summaries.
https://github.com/MIT-LCP/license-and-dua/tree/master/draftshttps://github.com/MIT-LCP/license-and-dua/tree/master/drafts
HiRID is a freely accessible critical care dataset containing data relating to almost 34 thousand patient admissions to the Department of Intensive Care Medicine of the Bern University Hospital, Switzerland (ICU), an interdisciplinary 60-bed unit admitting >6,500 patients per year. The ICU offers the full range of modern interdisciplinary intensive care medicine for adult patients. The dataset was developed in cooperation between the Swiss Federal Institute of Technology (ETH) Zürich, Switzerland and the ICU.
The dataset contains de-identified demographic information and a total of 681 routinely collected physiological variables, diagnostic test results and treatment parameters from almost 34 thousand admissions during the period from January 2008 to June 2016. Data is stored with a uniquely high time resolution of one entry every two minutes.
Comprehensive dataset of 2 Private hospitals in Free municipal consortium of Caltanissetta, Italy as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
On an annual basis (individual hospital fiscal year), individual hospitals and hospital systems report detailed facility-level data on services capacity, inpatient/outpatient utilization, patients, revenues and expenses by type and payer, balance sheet and income statement.
Due to the large size of the complete dataset, a selected set of data representing a wide range of commonly used data items, has been created that can be easily managed and downloaded. The selected data file includes general hospital information, utilization data by payer, revenue data by payer, expense data by natural expense category, financial ratios, and labor information.
There are two groups of data contained in this dataset: 1) Selected Data - Calendar Year: To make it easier to compare hospitals by year, hospital reports with report periods ending within a given calendar year are grouped together. The Pivot Tables for a specific calendar year are also found here. 2) Selected Data - Fiscal Year: Hospital reports with report periods ending within a given fiscal year (July-June) are grouped together.