CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides comprehensive healthcare patient demographic records, including unique medical identifiers, insurance details, emergency contacts, and appointment histories. It enables efficient patient management, supports clinical workflows, and facilitates analytics for healthcare providers and administrators. The structured schema ensures data integrity and usability for operational and research applications.
Patient demographics and clinical data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains raw data of all laboratory measurements presented in the paper. In addition, the file contains raw demographic data of the patients as summarized in the paper in Table 1.
Patient demographics and baseline characteristics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains counts of inpatient visits that ended in a discharge to hospice care. The counts include visits by individuals aged 18 or over whose discharge disposition was home or facility hospice care. Annual totals can be broken down by patient characteristics, including age group, county of residence, primary payer type, diagnosis category, and patient sex, race, and ethnicity. The diagnosis categories include circulatory conditions, diabetes, malignant/benign neoplasms, malnutrition, neurodegenerative disease, renal failure or other kidney diagnoses, and respiratory conditions. These categories represent common groupings of diagnoses used in other studies of hospice care and were created by grouping relevant MS-DRG codes in the HCAI inpatient data.
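The year-by-characteristic breakdown described above is essentially a filtered group-and-count. The published dataset is already aggregated, so the following is only a minimal Python sketch of how such counts could be derived from visit-level records. It assumes one record per inpatient visit with hypothetical field names (`year`, `age`, `disposition`, `payer`) and hypothetical disposition labels; none of these come from the HCAI data dictionary.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical disposition labels; the real HCAI disposition coding is not reproduced here.
HOSPICE_DISPOSITIONS = {"home hospice", "facility hospice"}


def hospice_counts(visits: Iterable[Mapping], characteristic: str) -> Counter:
    """Count adult hospice discharges per (year, value of `characteristic`)."""
    counts: Counter = Counter()
    for v in visits:
        # Keep only patients aged 18+ discharged to home or facility hospice.
        if v["age"] >= 18 and v["disposition"] in HOSPICE_DISPOSITIONS:
            counts[(v["year"], v[characteristic])] += 1
    return counts


# Made-up example records (not real data).
visits = [
    {"year": 2020, "age": 72, "disposition": "home hospice", "payer": "Medicare"},
    {"year": 2020, "age": 81, "disposition": "facility hospice", "payer": "Medicare"},
    {"year": 2020, "age": 45, "disposition": "home hospice", "payer": "Private"},
    {"year": 2020, "age": 16, "disposition": "home hospice", "payer": "Private"},  # under 18: excluded
    {"year": 2021, "age": 67, "disposition": "home", "payer": "Medicare"},  # not hospice: excluded
]
print(hospice_counts(visits, "payer"))
```

Passing a different key (for example a hypothetical `county` or `category` field) gives the other breakdowns without changing the function.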
Patient demographics and summary findings.
Demographics of the patient population.
CDLA-Sharing-1.0: https://cdla.io/sharing-1-0/
📌 Project Overview
This project analyzes hospital admissions, patient stays, and cost trends using Excel. The dataset contains information on patient demographics, hospital names, insurance providers, and treatment costs. Key insights were derived using PivotTables, charts, and formulas.
📊 Key Insights & Visualizations
✅ Top Hospitals by Admissions → Bar Chart
✅ Insurance Provider with Most Patients → Pie Chart
✅ Cost per Day Trends → Line Chart
✅ Average Length of Stay per Hospital → Bar Chart
🛠 Excel Analysis Techniques Used
PivotTables for summarizing patient data
Conditional Formatting to highlight cost trends
Bar, Pie, and Line Charts for visualization
Statistical Analysis (average length of stay, cost trends)
📂 Files Included
📌 hospital_analysis.xlsx – The full Excel analysis file
📌 hospital_summary.pdf – Summary of key findings
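The "Average Length of Stay per Hospital" and "Cost per Day" summaries can be reproduced outside Excel. Below is a minimal Python sketch of what those PivotTable calculations might look like. It assumes hypothetical row fields (`hospital`, `admit`, `discharge`, `cost`) rather than the workbook's actual columns, and it defines cost per day as total cost divided by total inpatient days. That definition is an assumption, not necessarily the one used in hospital_analysis.xlsx.

```python
from collections import defaultdict
from datetime import date

# Made-up admissions; field names are assumptions, not the workbook's column headers.
admissions = [
    {"hospital": "A", "admit": date(2024, 1, 1), "discharge": date(2024, 1, 5), "cost": 4000.0},
    {"hospital": "A", "admit": date(2024, 1, 3), "discharge": date(2024, 1, 5), "cost": 1000.0},
    {"hospital": "B", "admit": date(2024, 2, 1), "discharge": date(2024, 2, 4), "cost": 3000.0},
]


def summarize(rows):
    """Per-hospital admissions, average length of stay, and cost per inpatient day."""
    stays = defaultdict(list)
    costs = defaultdict(float)
    for r in rows:
        # Count a same-day discharge as a 1-day stay to avoid zero-length stays.
        los = max((r["discharge"] - r["admit"]).days, 1)
        stays[r["hospital"]].append(los)
        costs[r["hospital"]] += r["cost"]
    summary = {}
    for hospital, lengths in stays.items():
        total_days = sum(lengths)
        summary[hospital] = {
            "admissions": len(lengths),
            "avg_los": total_days / len(lengths),
            "cost_per_day": costs[hospital] / total_days,
        }
    return summary


for hospital, stats in summarize(admissions).items():
    print(hospital, stats)
```

Dividing total cost by total days (rather than averaging each stay's cost per day) weights long stays more heavily. Either choice is defensible, but it should match whatever the Excel formulas do.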
By Health [source]
This dataset contains detailed information about 30-day readmission and mortality rates of U.S. hospitals. It is an essential tool for stakeholders aiming to identify opportunities for improving healthcare quality and performance across the country. Providers gain access to readmission and mortality rates, scores, measure start and end dates, comparisons with the national average, and other fields such as ZIP codes, phone numbers, and county names. Use this dataset to evaluate how well hospitals meet industry standards for quality and outcomes, and to make better-informed decisions when designing patient care strategies and policies.
This dataset provides data on 30-day readmission and mortality rates of U.S. hospitals, useful in understanding the quality of healthcare being provided. This data can provide insight into the effectiveness of treatments, patient care, and staff performance at different healthcare facilities throughout the country.
To use this dataset effectively, it is important to understand each column and how to interpret it. 'Hospital Name' is the name of the facility; 'Address' is its street address; 'City' gives its location; 'State' is the two-letter state abbreviation; 'ZIP Code' is the facility's 5-digit ZIP code; 'County Name' is the county in which the hospital is located; 'Phone Number' is a contact number for the facility; 'Measure Name' identifies the measure being recorded (for instance, Elective Delivery Before 39 Weeks); and 'Score' reflects an average score based on patient feedback surveys over the measurement period beginning at 'Measure Start Date'. The 'Lower Estimate' and 'Higher Estimate' columns give lower and upper estimates around the score, which researchers can use to track variability when seeking further answers or designing future studies. Finally, the 'Footnote' column may highlight additional details pertinent to analysis, such as values that are outliers relative to national averages.
Hospitals, research facilities, and other interested parties can use this dataset for insight when making decisions about patient care standards across the United States. It can help reveal patterns in readmission and mortality along county lines, or answer questions about performance differences between hospital locations over an extended period. If you are curious about 30-day readmissions in U.S. hospitals, this dataset is a good place to start.
- Comparing hospitals on a regional or national basis to measure the quality of care provided for readmission and mortality rates.
- Analyzing the effects of technological advancements such as telemedicine, virtual visits, and AI on readmission and mortality rates at different hospitals.
- Using the Lower Estimate and Higher Estimate values to identify systematic problems in readmission or mortality management at hospitals and to inform public health care policy.
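The interval-based comparison in the last use case can be sketched in Python. This assumes a simple rule: a hospital counts as "worse" if its Lower Estimate lies above a reference rate and "better" if its Higher Estimate lies below it. That rule, the reference value of 15.5, the "Readmission rate" measure name, and all sample values are hypothetical. Only the column headers follow the columns described above. Non-numeric placeholders are skipped.

```python
import csv
import io

# Tiny in-memory stand-in for Readmissions_and_Deaths_-_Hospital.csv (values are made up).
SAMPLE = """Hospital Name,State,Measure Name,Score,Lower Estimate,Higher Estimate
Hospital A,AL,Readmission rate,16.6,15.8,17.4
Hospital B,AL,Readmission rate,15.0,14.2,15.9
Hospital C,AK,Readmission rate,13.1,12.0,14.3
Hospital D,AK,Readmission rate,Not Available,Not Available,Not Available
"""


def classify(rows, reference):
    """Label each hospital by where its [Lower, Higher] estimate sits relative to `reference`."""
    result = {}
    for row in rows:
        try:
            low = float(row["Lower Estimate"])
            high = float(row["Higher Estimate"])
        except ValueError:
            # Skip rows whose estimates are non-numeric placeholders.
            continue
        if low > reference:
            label = "worse than reference"
        elif high < reference:
            label = "better than reference"
        else:
            label = "no different from reference"
        result[row["Hospital Name"]] = label
    return result


# 15.5 is a hypothetical reference (e.g. a national rate), not a figure from the dataset.
print(classify(csv.DictReader(io.StringIO(SAMPLE)), reference=15.5))
```

With the real file, `io.StringIO(SAMPLE)` would be replaced by an opened file handle. The comparison should also be done separately for each 'Measure Name', since rates for different measures are not comparable.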
If you use this dataset in your research, please credit the original authors (Data Source).
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.
File: Readmissions_and_Deaths_-_Hospital.csv
| Column name | Description |
|:-------------------------|:---------------------------------------------------------------------------------------------------|
| Hospital Name ...
GNU General Public License (GPL): https://choosealicense.com/licenses/gpl/
This dataset contains clinical and demographic information of patients along with their stroke status. The dataset provides comprehensive medical data for stroke research and machine learning applications.
Dataset Summary
Total Samples: 5,110
Features: 12 (Patient ID, demographic info, clinical measurements, stroke status)
Task: Binary classification (Stroke vs. No Stroke)
Features
Patient Information
id: Unique identifier (integer)
gender: Biological sex…
See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/Africa-stroke-prediction-dataset.
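For the binary stroke task, a stratified train/test split keeps the proportion of stroke and non-stroke rows the same in both parts. This matters if stroke cases are a minority, although the description above does not state the class balance. Below is a minimal standard-library Python sketch, assuming the stroke status has already been extracted as a list of 0/1 labels. The 10/90 labels in the example are made up.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split row indices so each class keeps roughly the same share in train and test."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Made-up labels: 10 positive (stroke = 1) and 90 negative rows.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

In practice, scikit-learn's `train_test_split(..., stratify=y)` provides the same behaviour. The hand-rolled version just makes the per-class sampling explicit.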
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
BOSQUE Test Set: A Dermoscopic Image Dataset from Colombian Patients with Diverse Skin Phototypes
Description: The BOSQUE Test Set is a curated dataset of 151 dermoscopic images of pigmented skin lesions, collected from dermatology consultations and outreach campaigns in Bogotá, Colombia. Each image is accompanied by expert-verified metadata including histological diagnosis, patient demographic details, anatomical site, and skin phototype. The dataset is intended to support machine learning research in dermatology, with a particular focus on skin tone diversity and fairness in diagnostic algorithms. The dataset was developed under the guidance of Universidad El Bosque, whose name inspired the acronym BOSQUE. It responds to the global underrepresentation of darker skin phototypes in existing dermoscopic image collections such as HAM10000, and aims to improve diagnostic equity through inclusive data curation.
Key Features
- 151 dermoscopic images acquired in real-world clinical settings
- Captured using polarized light dermatoscopes (DermLite 4 + iPhone)
- Inclusive population:
  - Sex: 97 female, 54 male
  - Age groups: from 0–29 to 90+, categorized into clinically relevant bins
  - Fitzpatrick skin phototypes, ranging from II to VI:
    - Type II (fair, burns easily): 11 patients
    - Type III (light brown, mild burns): 94 patients
    - Type IV (moderate brown, rarely burns): 34 patients
    - Type V (dark brown, very rarely burns): 7 patients
    - Type VI (deeply pigmented, never burns): 5 patients
- Lesion characteristics:
  - Nature: benign or malignant (histopathologically confirmed)
  - Size: categorized as ≤5mm, 6–10mm, 11–20mm, >20mm
  - Evolution time: grouped into <1y, 1y, 2y, 3–4y, 5–9y, and 10y+ categories
  - Anatomical site: head/neck, trunk, limbs, or acral areas
  - Histopathological diagnosis: 7-class ISIC-style labels (akiec, bcc, bkl, df, mel, nv, vasc)
  - Clinical label: melanocytic vs. non-melanocytic (from clinical diagnosis)
  - Clinical context: includes personal history of NMSC and use of photosensitizing drugs
- Image naming: pseudonymized file names encode diagnosis label and image ID
- Ethics: all data anonymized and collected under an IRB-approved protocol in Colombia
Included Files
- BOSQUE_test_set.zip: folder containing 151 dermoscopic image files (JPG)
- BOSQUE_metadata.csv: metadata for each image, including:
  - Patient sex, age group, skin phototype
  - Anatomical site of the lesion
  - Lesion nature (benign/malignant)
  - Lesion size and evolution time (binned)
  - Histological diagnosis (7-class)
  - Clinical label (melanocytic / non-melanocytic)
Use Cases
This dataset is intended for:
- Benchmarking AI models for dermoscopic image classification
- Fairness analysis across skin tones, sex, and age groups
- Medical education and clinical training on diverse skin phototypes
- Comparison against HAM10000 or ISIC datasets in research
Ethical Statement
All patients provided informed consent for the capture and use of clinical and dermoscopic images, the collection of relevant clinical metadata, and the performance of skin biopsies for diagnostic confirmation. The study protocol was reviewed and approved by the Institutional Ethics Committee at Subred Integrada de Servicios de Salud Norte E.S.E and Universidad El Bosque (Bogotá, Colombia). All data were anonymized in compliance with Colombian health data privacy regulations and international ethical standards (e.g., Declaration of Helsinki). No personally identifiable information is included in the metadata or image files. Access to data was restricted to authorized investigators, and patients were informed about the research and educational use of their anonymized data.
Suggested Citation
[Author(s)]. (2025). BOSQUE Test Set: A Dermoscopic Image Dataset from Colombian Patients with Diverse Skin Phototypes [Data set]. Harvard Dataverse. https://doi.org/xxxxx
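For the fairness-analysis use case, one simple starting point is to compare a model's accuracy within each skin phototype group. Below is a minimal Python sketch, assuming each metadata row has been joined with a model prediction. The keys `phototype`, `diagnosis`, and `prediction` are hypothetical, because the exact column names in BOSQUE_metadata.csv are not given here, and the rows are made up.

```python
from collections import defaultdict


def accuracy_by_group(rows, group_key):
    """Accuracy of predictions within each subgroup (e.g. Fitzpatrick phototype)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in rows:
        g = r[group_key]
        totals[g] += 1
        hits[g] += int(r["prediction"] == r["diagnosis"])
    return {g: hits[g] / totals[g] for g in totals}


# Made-up rows using the 7-class labels; not taken from the dataset.
rows = [
    {"phototype": "III", "diagnosis": "nv", "prediction": "nv"},
    {"phototype": "III", "diagnosis": "mel", "prediction": "mel"},
    {"phototype": "III", "diagnosis": "bkl", "prediction": "nv"},
    {"phototype": "V", "diagnosis": "nv", "prediction": "nv"},
    {"phototype": "V", "diagnosis": "mel", "prediction": "nv"},
]
acc = accuracy_by_group(rows, "phototype")
gap = max(acc.values()) - min(acc.values())  # largest between-group accuracy difference
print(acc, gap)
```

With only 7 Type V and 5 Type VI patients in the set, per-group accuracies for darker phototypes rest on very few images. Any gap should be reported with that small sample size in mind.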
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In August 2018, FSSA’s Office of Healthy Opportunities deployed a social risk assessment survey. The 10-question survey was made available to anyone applying online through FSSA for health coverage, the Supplemental Nutrition Assistance Program (SNAP) or Temporary Assistance for Needy Families (TANF). The aggregated results presented below can help communities better understand the social risk factors affecting the health of those applying for our services. Please review the following information on the use of this data before viewing the tool.
The survey was available only to individuals who applied online; it does not represent anyone who applied in person, by telephone, by mail or by any other method. In 2018, online applications accounted for 79% of those who applied for SNAP, TANF or health coverage. Survey completion is voluntary and does not affect eligibility for SNAP, TANF or health coverage.
Applications are filed at the household level and may represent several individuals. The application process identifies a primary contact person for the household, and that individual’s demographics (for example, gender, race and education level) are represented on the dashboard. An individual who completes more than one application and survey over any given time period is counted once for each instance, and the survey answers and demographic details are based on each application’s responses. For example, an applicant’s age, education level and survey answers can change over time, and the reporting reflects any such changes.
All information is presented in aggregate to protect personally identifiable information. To protect individual privacy, data representing 20 or fewer individuals in any county are not displayed; such values appear blank.
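The small-cell rule above (counts of 20 or fewer individuals per county are shown blank) can be sketched as a simple filter. Below is a minimal Python sketch, assuming county counts arrive as a dict and that `None` stands for a blank cell. The county counts in the example are made up.

```python
from typing import Dict, Optional


def suppress_small_counts(county_counts: Dict[str, int],
                          threshold: int = 20) -> Dict[str, Optional[int]]:
    """Blank out (None) any county whose count is `threshold` or fewer."""
    return {county: (n if n > threshold else None) for county, n in county_counts.items()}


# Made-up counts for illustration only.
print(suppress_small_counts({"Marion": 1520, "Ohio": 20, "Union": 7, "Allen": 21}))
```

The comparison is strict (`n > threshold`), so a county with exactly 20 respondents is suppressed, matching "20 or fewer".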
Open Data Commons Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
Purpose: This dataset is designed to aid in the development of computer vision models for the early detection and classification of brain tumors. It contains medical images, typically MRI or CT scans, of human brains with and without tumors.
Key Characteristics:
- Image Types: MRI or CT scans
- Image Format: commonly JPG
- Labels: associated with each image, indicating the presence or absence of a brain tumor, and possibly the tumor type (e.g., glioma, meningioma, pituitary tumor)
- Metadata: additional information such as patient demographics, acquisition parameters, and clinical notes
Potential Use Cases:
- Training and evaluation of deep learning models for brain tumor detection and classification
- Research into medical image analysis techniques for early diagnosis and treatment planning
- Development of AI-powered tools to assist radiologists in identifying and assessing brain tumors
Important Considerations:
- Data Quality: ensure the images are of high quality, well-annotated, and representative of a diverse patient population.
- Ethical Considerations: adhere to privacy regulations and obtain informed consent from patients.
- Data Preprocessing: images may require preprocessing steps such as normalization, augmentation, and noise reduction before being used for model training.
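Two of the preprocessing steps mentioned above, intensity normalization and augmentation, can be illustrated on a single grayscale image. Below is a minimal dependency-free Python sketch, assuming each image is already decoded into a 2D list of pixel intensities. A real pipeline would more likely use NumPy or an imaging library. Min-max scaling and a left-right flip are illustrative choices only, not methods prescribed by the dataset.

```python
from typing import List

Image = List[List[float]]


def min_max_normalize(img: Image) -> Image:
    """Scale pixel intensities linearly into [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        # Flat image: avoid division by zero and return all zeros.
        return [[0.0 for _ in row] for row in img]
    return [[(p - lo) / (hi - lo) for p in row] for row in img]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left-to-right (a common augmentation)."""
    return [list(reversed(row)) for row in img]


img = [[0, 50], [100, 200]]
print(min_max_normalize(img))
print(horizontal_flip(img))
```

Whether left-right flips are a safe augmentation depends on whether tumor laterality matters for the labels in question. Normalization statistics should also be computed in a way that does not leak information from the test set.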
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset integrates multiple healthcare data sources. It combines:
Electronic Health Records (EHRs): Patient demographics, medical history, diagnoses, and treatment plans.
Medical Imaging (CT & MRI): Cross-modality imaging data for enhanced diagnostic analysis.
UCI Dataset: 20 records and 16 attributes, likely representing features related to diabetes diagnosis or risk factors. All attributes are binary integers (0 or 1) indicating the presence or absence of a particular condition.
Wearable IoT Sensor Data: Real-time physiological monitoring, including heart rate, activity levels, and mental health indicators.
https://www.pioneerdatahub.co.uk/data/data-request-process/
OMOP dataset: Hospital COVID patients: severity, acuity, therapies, outcomes Dataset number 2.0
Coronavirus disease 2019 (COVID-19) was identified in January 2020. Currently, there have been more than 6 million cases & more than 1.5 million deaths worldwide. Some individuals experience severe manifestations of infection, including viral pneumonia, adult respiratory distress syndrome (ARDS) & death. There is a pressing need for tools to stratify patients, to identify those at greatest risk. Acuity scores are composite scores which help identify patients who are more unwell to support & prioritise clinical care. There are no validated acuity scores for COVID-19 & it is unclear whether standard tools are accurate enough to provide this support. This secondary care COVID OMOP dataset contains granular demographic, morbidity, serial acuity and outcome data to inform risk prediction tools in COVID-19.
PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. There is a higher than average percentage of minority ethnic groups. WM has a large number of elderly residents but also has the youngest population in the UK. Each day >100,000 people are treated in hospital, see their GP or are cared for by the NHS. The West Midlands was one of the hardest hit regions for COVID admissions in both wave 1 & 2.
EHR: University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & 100 ITU beds. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”. UHB has cared for >5000 COVID admissions to date. This is a subset of data in OMOP format.
Scope: All swab-confirmed COVID patients hospitalised at UHB from January – August 2020. The dataset includes highly granular patient demographics & co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to care process (timings, staff grades, specialty review, wards), presenting complaint, acuity, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations), all blood results, microbiology, all prescribed & administered treatments (fluids, antibiotics, inotropes, vasopressors, organ support), all outcomes.
Available supplementary data: Health data preceding & following admission event. Matched “non-COVID” controls; ambulance, 111, 999 data, synthetic data. Further OMOP data available as an additional service.
Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
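Because the data are supplied in OMOP format, admissions can be queried with standard SQL against the OMOP Common Data Model tables. Below is a minimal Python sketch using an in-memory SQLite database. It assumes a simplified `visit_occurrence` table with only a few of the real OMOP CDM columns. The visits are made up, and the filter relies on 9201 being the standard OMOP concept for an inpatient visit. Access to the actual PIONEER data goes through their request process, and the schema delivered may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified subset of the OMOP CDM visit_occurrence table (made-up rows).
conn.executescript("""
CREATE TABLE visit_occurrence (
    visit_occurrence_id INTEGER PRIMARY KEY,
    person_id           INTEGER NOT NULL,
    visit_concept_id    INTEGER NOT NULL,
    visit_start_date    TEXT NOT NULL,
    visit_end_date      TEXT NOT NULL
);
INSERT INTO visit_occurrence VALUES
    (1, 101, 9201, '2020-03-20', '2020-03-28'),
    (2, 102, 9201, '2020-04-02', '2020-04-05'),
    (3, 103, 9203, '2020-04-10', '2020-04-10');
""")

# Length of stay in days for inpatient visits only (9201 = Inpatient Visit).
LOS_SQL = """
SELECT person_id,
       julianday(visit_end_date) - julianday(visit_start_date) AS los_days
FROM visit_occurrence
WHERE visit_concept_id = 9201
ORDER BY person_id
"""
rows = conn.execute(LOS_SQL).fetchall()
print(rows)
```

On a production OMOP database, the same query would typically run on PostgreSQL or SQL Server with their own date arithmetic in place of SQLite's `julianday`.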
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
Please note this page provides neighborhood demographic data using 2010 Census tracts. For updated Neighborhood Demographics using 2020 Census tracts consistently across historical years, please refer to the Planning Department Research Division's Exploring Neighborhood Change Tool. The tool visualizes demographic, economic, and housing data for Boston's tracts and neighborhoods from 1950 to 2025 (with projections to 2035) using the most up-to-date 2020 Census tract-based Neighborhood boundaries.
Boston is a city defined by the unique character of its many neighborhoods. The historical tables created by the BPDA Research Division from U.S. Census Decennial data describe demographic changes in Boston’s neighborhoods from 1950 through 2010 using consistent tract-based geographies. For more analysis of these data, please see Historical Trends in Boston's Neighborhoods. The most recent available neighborhood demographic data come from the 5-year American Community Survey (ACS). The ACS tables also present demographic data for Census-tract approximations of Boston’s neighborhoods. For pdf versions of the data presented here plus earlier versions of the analysis, please see Boston in Context.
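Building Census-tract approximations of neighborhoods amounts to mapping each tract to a neighborhood and summing the values. Below is a minimal Python sketch, assuming a hypothetical tract-to-neighborhood lookup. The tract IDs `T1`–`T4` and their population values are made up, and the BPDA's actual assignments are not reproduced here.

```python
from collections import defaultdict
from typing import Dict


def aggregate_to_neighborhoods(tract_values: Dict[str, int],
                               tract_to_neighborhood: Dict[str, str]) -> Dict[str, int]:
    """Sum an additive tract-level count into neighborhood totals; unmapped tracts are dropped."""
    totals: Dict[str, int] = defaultdict(int)
    for tract, value in tract_values.items():
        hood = tract_to_neighborhood.get(tract)
        if hood is not None:
            totals[hood] += value
    return dict(totals)


# Hypothetical tract IDs and populations, for illustration only.
tract_pop = {"T1": 4000, "T2": 3500, "T3": 5200, "T4": 100}
lookup = {"T1": "Allston", "T2": "Allston", "T3": "Roxbury"}
print(aggregate_to_neighborhoods(tract_pop, lookup))
```

Summing only works for additive counts such as population or housing units. Medians and percentages cannot be summed across tracts and must be recomputed from the underlying counts.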
https://www.pioneerdatahub.co.uk/data/data-request-process/
1 in 17 people are born with or develop a rare disease during their lifetime. 80% of rare diseases have an identified genetic component. However, there are usually significant diagnostic delays. The 100k Genome project was established to collect clinical data, genomic sequencing and samples from people with cancer and rare diseases, to better understand disease and find novel treatments and interventions. This includes rare cardiovascular, ciliopathy, endocrine, gastroenterological, haematological, metabolic, neurological, renal, respiratory, skeletal and rheumatological disorders, and cancers.
The PIONEER University Hospitals Birmingham (UHB) secondary care 100k genomics dataset contains granular demographic, morbidity, treatment and outcome data, supplemented with acute care contacts with serial physiology and blood biomarker data from UHB patients recruited to this programme, to better understand the acute healthcare needs of this group of patients.
PIONEER geography: The West Midlands has a population of 5.9M and includes a diverse ethnic and socio-economic mix. There is a higher than average percentage of minority ethnic groups and a higher than average proportion of patients with rare diseases. Birmingham is home to the first Centre for Rare Diseases for adults and children, treating more than 500 rare diseases and 9000 patients per year.
Electronic Health Records: University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services and specialist care across four hospital sites, with 2.2M patient episodes per year, 2750 beds and 100 ITU beds.
Scope: All patients recruited to the 100K genome project from UHB. This includes all routinely collected health data for all these patients, but data is uniquely supplemented with all acute care contacts through UHB. The dataset includes highly granular patient demographics and co-morbidities taken from ICD-10 and SNOMED-CT codes. Serial, structured data pertaining to acute care process (timings, staff grades, specialty review, wards), presenting complaint, acuity, all physiology readings (pulse, blood pressure, respiratory rate, oxygen saturations), all blood results, microbiology, all prescribed and administered treatments (fluids, antibiotics, inotropes, vasopressors, organ support), all outcomes.
Available supplementary data: Matched controls; ambulance, synthetic data. Available supplementary support: Analytics, Model build, validation and refinement; A.I.; Data partner support for ETL process, Clinical expertise, Patient and end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is a synthetic hospital dataset designed for data science and machine learning practice. It contains detailed information about patient admissions, hospital departments, ward types, bed grades, and hospital resources, making it ideal for predictive modeling, exploratory data analysis, and feature engineering exercises.
5000 patient records with 18 columns covering patient demographics, admission details, severity, and stay length.
Fully cleaned and preprocessed: duplicates removed, missing values handled, categorical features encoded, and numeric columns ready for ML.
Suitable for regression tasks (predicting length of stay or admission deposit) and classification tasks (predicting severity of illness, type of admission, or department).
Synthetic and safe: no real patient data included, perfect for learning and experimentation.
This dataset is perfect for students, beginners, and practitioners who want to practice healthcare analytics, ML model building, or data preprocessing.
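For the regression use case (predicting length of stay), it helps to first record how well a trivial model does. Below is a minimal Python sketch of a mean-prediction baseline scored by mean absolute error. It assumes length-of-stay values have already been split into train and test lists. The dataset's actual column names are not given here, and the numbers in the example are made up.

```python
from statistics import mean
from typing import Sequence


def mean_baseline_mae(train_y: Sequence[float], test_y: Sequence[float]) -> float:
    """MAE of always predicting the training-set mean."""
    prediction = mean(train_y)
    return mean(abs(y - prediction) for y in test_y)


# Made-up length-of-stay values in days.
print(mean_baseline_mae([2, 4, 6], [3, 5, 8]))
```

Any trained regressor on this data should clearly beat this number. If it does not, the features are adding little beyond the average stay.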
Patient characteristics and demographics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Patient demographics at the time of the baseline PET/CT (total number of PET/CT examinations n = 101).