54 datasets found

Healthcare Management System
kaggle.com
Updated Dec 23, 2023
Cite
Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
Dataset updated
Dec 23, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Anouska Abhisikta
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Patients Table:

PatientID: Unique identifier for each patient.

firstname: First name of the patient.

lastname: Last name of the patient.

email: Email address of the patient.

This table stores information about individual patients, including their names and contact details.

Doctors Table:

DoctorID: Unique identifier for each doctor.

DoctorName: Full name of the doctor.

Specialization: Area of medical specialization.

DoctorContact: Contact details of the doctor.

This table contains details about healthcare providers, including their names, specializations, and contact information.

Appointments Table:

AppointmentID: Unique identifier for each appointment.

Date: Date of the appointment.

Time: Time of the appointment.

PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.

DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

This table records scheduled appointments, linking patients to doctors.

MedicalProcedure Table:

ProcedureID: Unique identifier for each medical procedure.

ProcedureName: Name or description of the medical procedure.

AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

This table stores details about medical procedures associated with specific appointments.

Billing Table:

InvoiceID: Unique identifier for each billing transaction.

PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.

Items: Description of items or services billed.

Amount: Amount charged for the billing transaction.

This table maintains records of billing transactions, associating them with specific patients.

demo Table:

ID: Primary key, serves as a unique identifier for each record.

Name: Name of the entity.

Hint: Additional information or hint about the entity.

This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
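
The table descriptions above can be sketched as a relational schema. The following is a minimal illustration using Python's built-in `sqlite3` module; the column types (`INTEGER`, `TEXT`, `REAL`) and the helper names `build_db` and `appointments_with_names` are assumptions made for this example, not part of the dataset page. The `demo` table is left out because the description suggests it is unrelated.

```python
import sqlite3

# Hypothetical DDL inferred from the column descriptions above.
# Column types are assumptions; the dataset page does not state them.
SCHEMA = """
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    firstname TEXT,
    lastname  TEXT,
    email     TEXT
);
CREATE TABLE Doctors (
    DoctorID       INTEGER PRIMARY KEY,
    DoctorName     TEXT,
    Specialization TEXT,
    DoctorContact  TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    "Date"    TEXT,
    "Time"    TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID  INTEGER REFERENCES Doctors(DoctorID)
);
CREATE TABLE MedicalProcedure (
    ProcedureID   INTEGER PRIMARY KEY,
    ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID)
);
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items  TEXT,
    Amount REAL
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def appointments_with_names(conn: sqlite3.Connection) -> list:
    """Resolve each appointment's foreign keys to patient and doctor names."""
    return conn.execute(
        """
        SELECT a.AppointmentID, p.firstname, p.lastname, d.DoctorName
        FROM Appointments AS a
        JOIN Patients AS p ON p.PatientID = a.PatientID
        JOIN Doctors  AS d ON d.DoctorID  = a.DoctorID
        ORDER BY a.AppointmentID
        """
    ).fetchall()
```

With foreign keys enabled, inserting an appointment that references a missing patient raises `sqlite3.IntegrityError`, which is one way to check that the linkage described above holds in loaded data.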
Synthetic Healthcare Database for Research (SyH-DR)
catalog.data.gov
healthdata.gov
+2 more
Updated Sep 16, 2023
+ more versions
Cite
Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
Dataset updated
Sep 16, 2023
Dataset provided by
Agency for Healthcare Research and Quality: http://www.ahrq.gov/
Description
The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.
Heart Attack Dataset
data.mendeley.com
kaggle.com
Updated Nov 23, 2022
Cite
Tarik A. Rashid (2022). Heart Attack Dataset [Dataset]. http://doi.org/10.17632/wmhctcrt5v.1
Unique identifier
https://doi.org/10.17632/wmhctcrt5v.1
Dataset updated
Nov 23, 2022
Authors
Tarik A. Rashid
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The heart attack datasets were collected at Zheen hospital in Erbil, Iraq, from January 2019 to May 2019. The attributes are age, gender, heart rate, systolic blood pressure, diastolic blood pressure, blood sugar, CK-MB, and troponin, with a negative or positive output. According to the provided information, each record is classified as heart attack or no heart attack. The gender column is encoded as 1 for male and 0 for female. The glucose column is set to 1 if the value is > 120 and 0 otherwise. The output is set to 1 for positive and 0 for negative.
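
The encoding rules in this description can be sketched in a few lines. This is a minimal illustration only: the raw string labels (`"male"`, `"female"`, `"positive"`, `"negative"`) and the function name `encode_record` are assumptions, since the page does not show the unencoded values.

```python
# Assumed raw labels; the dataset page only states the encoded values.
GENDER_CODES = {"male": 1, "female": 0}
OUTPUT_CODES = {"positive": 1, "negative": 0}


def encode_record(gender: str, blood_sugar: float, result: str) -> dict:
    """Apply the encoding described above to one raw record.

    Unknown gender or result labels raise KeyError rather than being
    silently mapped to 0.
    """
    return {
        "gender": GENDER_CODES[gender.strip().lower()],
        # The description sets glucose to 1 only when the value exceeds 120.
        "glucose": 1 if blood_sugar > 120 else 0,
        "output": OUTPUT_CODES[result.strip().lower()],
    }
```

Note that a blood sugar value of exactly 120 maps to 0 under the stated rule, because the threshold is a strict inequality.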
Open Database of Healthcare Facilities
open.canada.ca
ouvert.canada.ca
zip
Updated Apr 23, 2020
+ more versions
Cite
Statistics Canada (2020). Open Database of Healthcare Facilities [Dataset]. https://open.canada.ca/data/en/dataset/543fe07a-fd79-40e9-a829-ccd697526765
Available download formats: zip
Dataset updated
Apr 23, 2020
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Time period covered
Nov 1, 2019 - Mar 1, 2020
Description
The Open Database of Healthcare Facilities (ODHF) is a listing of health facilities across Canada. Facilities are classified into one of three types: ambulatory health care services, hospitals, and nursing and residential care facilities. The listing contains the names, addresses, and geographic coordinates of facilities, as well as the facility type as assigned in the data source. The ODHF is based on data from authoritative sources, including all levels of government and public health and professional healthcare bodies. The ODHF is released as open data under the Open Government Licence - Canada and provided as a zipped comma-separated values (.csv) file.
MIMIC-III Dataset
paperswithcode.com
opendatalab.com
Updated Apr 20, 2022
Cite
Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark (2022). MIMIC-III Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iii
Dataset updated
Apr 20, 2022
Authors
Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark
Description
The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified, publicly available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned 7.6 codes on average. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.

The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.
Mental Health - Datasets - CTData.org
data.ctdata.org
Updated Jun 24, 2016
+ more versions
Cite
(2016). Mental Health - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/mental-health
Dataset updated
Jun 24, 2016
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Mental Health reports the prevalence of mental illness in the past year by age range.
Health Trends, Comprehensive download file for all geographies
open.canada.ca
ouvert.canada.ca
csv
Updated Mar 9, 2022
+ more versions
Cite
Statistics Canada (2022). Health Trends, Comprehensive download file for all geographies [Dataset]. https://open.canada.ca/data/en/dataset/3ef254aa-519b-47d6-96ec-f0ba2e72e1dd
Available download formats: csv
Dataset updated
Mar 9, 2022
Dataset provided by
Statistics Canada
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
This product presents comparable time-series data for a range of health indicators from a number of sources including the Canadian Community Health Survey, Vital Statistics, and Canadian Cancer Registry.
Health Statistic and Research Database
www-acc.healthinformationportal.eu
healthinformationportal.eu
html
Updated Feb 23, 2023
Cite
Estonian National Institute for Health Development (2023). Health Statistic and Research Database [Dataset]. https://www-acc.healthinformationportal.eu/health-information-sources/health-statistic-and-research-database
Available download formats: html
Dataset updated
Feb 23, 2023
Dataset authored and provided by
Estonian National Institute for Health Development
Variables measured
sex, title, topics, country, language, data_owners, description, contact_name, geo_coverage, contact_email, and 10 more
Measurement technique
Multiple sources
Description
The Health Statistics and Health Research Database is Estonia's largest set of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.

The database consists of eight main areas divided into sub-areas. The data tables included in the sub-areas are assigned unique codes. The data tables can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). A detailed database user manual is available for download (.pdf).

The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.

A contact person for each sub-area is listed under that sub-area's "Definitions and Methodology" link. Contact this person to request additional information about the published data or to make data requests.

Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.
Synthetic Metabolic Syndrome Patient Records Dataset
opendatabay.com
Updated Apr 26, 2025
Cite
Opendatabay Labs (2025). Synthetic Metabolic Syndrome Patient Records Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/7bf17077-77ce-40cc-84e8-05b5e545d5eb
Dataset updated
Apr 26, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Patient Health Records & Digital Health
Description
The Synthetic Metabolic Syndrome Dataset is designed for educational and research purposes in healthcare, focusing on metabolic syndrome and related health parameters. The dataset contains demographic, anthropometric, and biochemical information that can be used to analyze and predict the presence of metabolic syndrome in individuals.

Dataset Features

seqn: A unique identifier for each individual in the dataset.

Age: Age of the individual (in years).

Sex: Gender of the individual (Male/Female).

Marital: Marital status of the individual (e.g., Married, Separated, etc.).

Income: Annual income of the individual (in simulated currency units).

Race: Race or ethnicity of the individual (e.g., White, Black, Mexican American, etc.).

WaistCirc: Waist circumference (in cm), an indicator of central obesity.

BMI: Body Mass Index, a measure of body fat based on height and weight.

Albuminuria: Presence of albumin in urine (binary indicator, 0 for no, 1 for yes).

UrAlbCr: Urinary albumin-to-creatinine ratio, a measure of kidney health.

UricAcid: Uric acid levels (in mg/dL), used to assess gout risk and metabolic health.

BloodGlucose: Blood glucose level (in mg/dL), an indicator of diabetes or prediabetes.

HDL: High-density lipoprotein cholesterol level (in mg/dL), often referred to as "good cholesterol."

Triglycerides: Triglyceride levels (in mg/dL), a measure of fat in the blood.

MetabolicSyndrome: Presence of metabolic syndrome (Yes/No), based on a combination of criteria such as waist circumference, blood pressure, glucose, HDL, and triglycerides.

Distribution

Figure: Synthetic Metabolic Syndrome Patient Records Dataset Distribution (https://storage.googleapis.com/opendatabay_public/7bf17077-77ce-40cc-84e8-05b5e545d5eb/c5f0c7f50ff3_Metabolic_2.png)

Figure: Synthetic Metabolic Syndrome Data (https://storage.googleapis.com/opendatabay_public/7bf17077-77ce-40cc-84e8-05b5e545d5eb/7e880a16ea2c_Metabolic_1.png)

Usage

This dataset is well-suited for applications in healthcare analytics, public health, and data science:

Metabolic Syndrome Prediction: Develop machine learning models to predict the presence of metabolic syndrome based on demographic and biochemical markers.

Risk Factor Analysis: Identify key risk factors for metabolic syndrome, such as obesity, high glucose, or low HDL cholesterol.

Public Health Research: Investigate correlations between socioeconomic status (e.g., income and marital status) and metabolic syndrome prevalence.

Personalized Healthcare: Design intervention strategies tailored to individuals based on their metabolic health profile.

Health Disparities: Explore health disparities among racial and ethnic groups to inform equitable healthcare policies.

Coverage

This synthetic dataset provides a comprehensive representation of metabolic health across different demographic groups. It includes diverse examples of individuals at varying risk levels for metabolic syndrome, ensuring broad applicability in research and education.

License

CC0 (Public Domain)

Who Can Use It

Healthcare Professionals: To study metabolic syndrome trends and tailor interventions.

Data Scientists: For practicing classification, regression, and clustering techniques in healthcare analytics.

Public Health Analysts: To assess population-level metabolic health and inform policies.

Researchers: To simulate the impact of lifestyle changes on metabolic health outcomes.
MIMIC-IV Dataset
paperswithcode.com
physionet.org
+ more versions
Cite
MIMIC-IV Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy.

The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
MIMIC-IV
physionet.org
Updated Oct 11, 2024
Cite
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
Unique identifier
https://doi.org/10.13026/kpb9-mt58
Dataset updated
Oct 11, 2024
Authors
Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
License
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Description
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

Stroke Risk Prediction Dataset based on Literature

kaggle.com

Updated Mar 1, 2025


Cite

Mahatir Ahmed Tusher (2025). Stroke Risk Prediction Dataset based on Literature [Dataset]. http://doi.org/10.34740/kaggle/dsv/10892812


Unique identifier

https://doi.org/10.34740/kaggle/dsv/10892812

Dataset updated

Mar 1, 2025

Dataset provided by

Kaggle: http://kaggle.com/

Authors

Mahatir Ahmed Tusher

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Stroke Risk Prediction Dataset (Version 2)

Medically Validated, Age-Accurate, and Balanced
Samples: 35,000 | Features: 16 | Targets: 2 (Binary + Regression)

📌 Overview

This dataset is designed for predicting stroke risk using symptoms, demographics, and medical literature-inspired risk modeling. Version 2 significantly improves upon Version 1 by incorporating age-dependent symptom probabilities, gender-specific risk modifiers, and medically validated feature engineering.

Key Enhancements in Version 2:

Age-Accurate Risk Modeling:
- Stroke risk now follows a sigmoidal curve (sharp increase after age 50), reflecting real-world epidemiological trends.
- Symptom probabilities (e.g., hypertension, chest pain) scale with age (see Medical Validity).
Gender-Specific Risk:
- Males under 60 have 1.5× higher risk, while females over 60 have 1.8× higher risk (post-menopausal hormonal changes).
Balanced and Expanded Data:
- 35,000 samples (vs. 10,000 in Version 1) to improve model generalizability and capture rare symptom combinations.
- 50% at-risk (stroke risk ≥50%) and 50% not-at-risk (stroke risk <50%).
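
The gender-specific modifiers listed above can be written as a small lookup. This is a hedged sketch, not the generator used to build the dataset: the function name `gender_risk_multiplier`, the baseline multiplier of 1.0, and the treatment of age exactly 60 (no modifier) are all assumptions, because the description does not say how the multipliers are applied.

```python
def gender_risk_multiplier(gender: str, age: int) -> float:
    """Return the risk multiplier described above for a gender/age pair.

    Assumptions: everyone outside the two stated groups gets 1.0, and
    age == 60 falls in neither group.
    """
    g = gender.strip().lower()
    if g == "male" and age < 60:
        return 1.5
    if g == "female" and age > 60:
        return 1.8
    return 1.0
```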

📊 Dataset Statistics

Column	Type	Description
`age`	Integer	Age (18–90)
`gender`	String	Male/Female
`chest_pain`	Binary	1 = Present, 0 = Absent
`shortness_of_breath`	Binary	1 = Present, 0 = Absent
`irregular_heartbeat`	Binary	1 = Present, 0 = Absent
`fatigue_weakness`	Binary	1 = Present, 0 = Absent
`dizziness`	Binary	1 = Present, 0 = Absent
`swelling_edema`	Binary	1 = Present, 0 = Absent
`neck_jaw_pain`	Binary	1 = Present, 0 = Absent
`excessive_sweating`	Binary	1 = Present, 0 = Absent
`persistent_cough`	Binary	1 = Present, 0 = Absent
`nausea_vomiting`	Binary	1 = Present, 0 = Absent
`high_blood_pressure`	Binary	1 = Present, 0 = Absent
`chest_discomfort`	Binary	1 = Present, 0 = Absent
`cold_hands_feet`	Binary	1 = Present, 0 = Absent
`snoring_sleep_apnea`	Binary	1 = Present, 0 = Absent
`anxiety_doom`	Binary	1 = Present, 0 = Absent
`at_risk`	Binary	Target for classification (1 = At Risk, 0 = Not At Risk)
`stroke_risk_percentage`	Float	Target for regression (0–100%)
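
The two targets appear to be linked by the 50% threshold stated under "Balanced and Expanded Data". A minimal sketch of that relationship follows; it assumes `at_risk` is derived directly from `stroke_risk_percentage`, which the page implies but does not state outright.

```python
def at_risk(stroke_risk_percentage: float) -> int:
    """Map the regression target (0-100) to the binary classification target.

    Assumes the >= 50 threshold quoted in the dataset description.
    """
    if not 0 <= stroke_risk_percentage <= 100:
        raise ValueError("stroke_risk_percentage must be within 0-100")
    return 1 if stroke_risk_percentage >= 50 else 0
```

A check like this can be run over the loaded data to confirm that the two target columns agree before training on either one.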

Figure: Age distribution in Version 2 vs. Version 1 (https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F21100322%2F6317df05bc7526268853e24a5ce831ba%2FAge%20Distribution%20Plot.png?generation=1740875866152537&alt=media)

🔬 Medical Validity

This dataset is grounded in peer-reviewed medical literature, with symptom probabilities, risk weights, and demographic relationships directly derived from clinical guidelines and epidemiological studies. Below is a detailed breakdown of how medical knowledge was translated into dataset parameters:

1. Age-Dependent Symptom Probabilities

The prevalence of symptoms increases with age, reflecting real-world clinical observations. Probabilities are calibrated using population-level data from medical literature:

Hypertension (High Blood Pressure)

Probability by Age: 10% (18–30), 25% (31–50), 45% (51–70), 60% (71–90).
Source: WHO Global Report on Stroke (2023) identifies hypertension as the leading modifiable stroke risk factor, with prevalence rising from ~12% in adults <30 to ~65% in adults >70.
Clinical Basis: Arterial stiffness and cumulative vascular damage over time explain the age-dependent increase (Chapter 4, Harrison’s Principles of Internal Medicine).
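
The age-banded hypertension probabilities above amount to a lookup table followed by a Bernoulli draw. The sketch below illustrates that idea only; the names `HYPERTENSION_BANDS`, `hypertension_probability`, and `sample_hypertension` are hypothetical, and the dataset's actual generation code is not shown on the page.

```python
import random

# (min_age, max_age, probability) taken from the bands quoted above.
HYPERTENSION_BANDS = (
    (18, 30, 0.10),
    (31, 50, 0.25),
    (51, 70, 0.45),
    (71, 90, 0.60),
)


def hypertension_probability(age: int) -> float:
    """Look up the stated hypertension probability for an age in 18-90."""
    for low, high, prob in HYPERTENSION_BANDS:
        if low <= age <= high:
            return prob
    raise ValueError("age must be within 18-90")


def sample_hypertension(age: int, rng: random.Random) -> int:
    """Draw a 0/1 symptom flag with the band's probability (an assumed method)."""
    return 1 if rng.random() < hypertension_probability(age) else 0
```

Passing a seeded `random.Random` instance keeps the draw reproducible.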

Chest Pain

Probability by Age: 5% (18–30), 15% (31–50), 25% (51–70), 35% (71–90).
Source: The Stroke Book (Cambridge Medicine) notes that chest pain is rare in young adults but becomes prevalent in older populations due to atherosclerosis and coronary artery disease.
Clinical Basis: Atherosclerotic plaque buildup accelerates after age ...

Datasets for federated learning
kaggle.com
Updated Dec 29, 2022
Cite
wonghoitin (2022). Datasets for federated learning [Dataset]. https://www.kaggle.com/datasets/wonghoitin/datasets-for-federated-learning
Dataset updated
Dec 29, 2022
Dataset provided by
Kaggle: http://kaggle.com/
Authors
wonghoitin
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Federated learning builds machine learning models from data sets that are distributed across multiple devices while preventing data leakage (Q. Yang et al. 2019).

source:

smoking https://www.kaggle.com/datasets/kukuroo3/body-signal-of-smoking license = CC0: Public Domain

heart https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset license = CC0: Public Domain

water https://www.kaggle.com/datasets/adityakadiwal/water-potability license = CC0: Public Domain

customer https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis license = CC0: Public Domain

insurance https://www.kaggle.com/datasets/tejashvi14/travel-insurance-prediction-data license = CC0: Public Domain

credit https://www.kaggle.com/datasets/ajay1735/hmeq-data license = CC0: Public Domain

income https://www.kaggle.com/datasets/mastmustu/income license = CC0: Public Domain

machine https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification license = CC0: Public Domain

skin https://www.kaggle.com/datasets/saurabhshahane/lumpy-skin-disease-dataset license = Attribution 4.0 International (CC BY 4.0)

score https://www.kaggle.com/datasets/parisrohan/credit-score-classification?select=train.csv license = CC0: Public Domain
Heart failure clinical records Data Set
figshare.com
txt
Updated May 30, 2023
Cite
Lukas Heumos (2023). Heart failure clinical records Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.19108337.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.19108337.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare: http://figshare.com/
Authors
Lukas Heumos
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Adaptation of http://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records# (ready for use with ehrapy).
CoAID dataset with multiple extracted features (both sparse and dense)
zenodo.org
data.niaid.nih.gov
csv
Updated Jun 10, 2022
Cite
Guillaume Bernard (2022). CoAID dataset with multiple extracted features (both sparse and dense) [Dataset]. http://doi.org/10.5281/zenodo.6630405
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.6630405
Dataset updated
Jun 10, 2022
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Guillaume Bernard
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a publication of the CoAID dataset, originally dedicated to fake news detection. Here we repurpose the dataset for event tracking in press documents.

Cui, Limeng, and Dongwon Lee. 2020. "CoAID: COVID-19 Healthcare Misinformation Dataset." arXiv:2006.00885 [cs], November. http://arxiv.org/abs/2006.00885.

In this dataset, we provide multiple features extracted from the text itself. Please note the text is missing from the dataset published in the CSV format for copyright reasons. You can download the original datasets and manually add the missing texts from the original publications.

Features are extracted using:

- A corpus of reference articles in multiple languages for TF-IDF weighting. (features_news) [1]

- A corpus of tweets reporting news for TF-IDF weighting. (features_tweets) [1]

- An S-BERT model [2] that uses distiluse-base-multilingual-cased-v1 (called features_use) [3]

- An S-BERT model [2] that uses paraphrase-multilingual-mpnet-base-v2 (called features_mpnet) [4]

References:

[1]: Guillaume Bernard. (2022). Resources to compute TF-IDF weightings on press articles and tweets (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6610406

[2]: Reimers, Nils, and Iryna Gurevych. 2019. "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3982-92. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410.

[3]: https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1

[4]: https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2
Hospital Building Data
data.chhs.ca.gov
csv, zip
Updated Jun 30, 2025
Cite
Department of Health Care Access and Information (2025). Hospital Building Data [Dataset]. https://data.chhs.ca.gov/dataset/hospital-building-data
Available download formats: csv(2534), zip, csv(1470374)
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Department of Health Care Access and Information
Description
Provides basic information for general acute care hospital buildings such as height, number of stories, the building code used to design the building, and the year it was completed. The data is sorted by counties and cities. Structural Performance Categories (SPC ratings) are also provided. SPC ratings range from 1 to 5 with SPC 1 assigned to buildings that may be at risk of collapse during a strong earthquake and SPC 5 assigned to buildings reasonably capable of providing services to the public following a strong earthquake. Where SPC ratings have not been confirmed by the Department of Health Care Access and Information (HCAI) yet, the rating index is followed by 's'. A URL for the building webpage in HCAI/OSHPD eServices Portal is also provided to view projects related to any building.
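
The SPC rating convention described above (a level from 1 to 5, with a trailing 's' when HCAI has not yet confirmed the rating) can be parsed as follows. This is a sketch under assumptions: the exact string layout in the CSV (for example `"3s"`) and the names `SpcRating` and `parse_spc` are hypothetical.

```python
from typing import NamedTuple


class SpcRating(NamedTuple):
    level: int       # 1 (may be at risk of collapse) through 5
    confirmed: bool  # False when the rating carries the 's' suffix


def parse_spc(raw: str) -> SpcRating:
    """Parse an SPC rating such as '4' or '3s' (format assumed, see above)."""
    text = raw.strip()
    confirmed = not text.endswith("s")
    digits = text if confirmed else text[:-1]
    level = int(digits)  # raises ValueError on non-numeric input
    if not 1 <= level <= 5:
        raise ValueError("SPC level must be between 1 and 5")
    return SpcRating(level, confirmed)
```

Keeping the confirmation flag separate from the numeric level lets the rating be filtered or sorted numerically without losing the distinction.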
Synthetic Heart Disease Dataset
opendatabay.com
Updated May 19, 2025
Cite
Opendatabay Labs (2025). Synthetic Heart Disease Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/9969a415-c090-4564-99d6-eca151e9884d
Dataset updated
May 19, 2025
Dataset authored and provided by
Opendatabay Labs
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Clinical Trials & Research
Description
A synthetic heart disease dataset has been generated to serve as an educational resource for data science, machine learning, and data analysis applications in the healthcare industry. It simulates patient records related to heart disease, allowing users to practice data manipulation and develop analytical skills in a healthcare context.

Dataset Features:

Age: Age of the patient at admission (in years).

Country: Country of residence, specified as the USA.

State: Random assignments of U.S. states for geographic analysis.

Blood Pressure: Simulated values reflecting typical hypertension ranges (in mmHg).

Cholesterol: Values adjusted to fall within common cholesterol levels (in mg/dL).

BMI: Calculated to represent healthy to overweight classifications.

Glucose Level: Simulated to represent fasting glucose levels (in mg/dL).

Gender: Randomly assigned to simulate demographic diversity.

Hospital: Randomly assigned hospitals to represent different healthcare facilities.

Treatment Options: Various treatment methods including Physiotherapy, Medication, Surgery, Rehabilitation, and Counseling.

Treatment Date: Randomly generated dates for when treatments were administered.

Heart Disease: A binary indicator (0 = No, 1 = Yes) representing the presence of heart disease.

Data Distribution and Outliers:

Figure: Synthetic Heart Disease Data (https://storage.googleapis.com/opendatabay_public/images/image_88c9876e-c5a3-48be-837e-f1ea77d11693.png)

Figure: Synthetic Heart Disease Patient Records Dataset (https://storage.googleapis.com/opendatabay_public/images/image_041922c7-f3dc-49c9-bfbf-16cdf98d6bd8.png)

Figure: Synthetic Heart Disease Statistics (https://storage.googleapis.com/opendatabay_public/images/hearr_disease_09f51ed4-86d0-4ac4-b6c0-b7b376a9f7f2.png)

Figure: Synthetic Heart Disease Data Distribution (https://storage.googleapis.com/opendatabay_public/images/heart_disease3_abb20b90-1bbd-4e2c-87ce-a47f1e414583.png)

Figure: Synthetic Heart Disease Dataset Heatmap and Correlation (https://storage.googleapis.com/opendatabay_public/images/heart_disease4_64b65bf1-9b53-4ab1-a7ea-3486c050f607.png)

Usage:

This dataset can be used for:

Healthcare research: To explore trends and patterns in cardiovascular health, treatment efficacy, and patient demographics.

Educational training: To teach data cleaning, transformation, and visualisation techniques specific to healthcare data.

Predictive modelling: To develop models that predict heart disease risk based on various patient and demographic factors.

Coverage:

This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real patient privacy.

License:

CC0 (Public Domain)

Who can use it:

Researchers and educators: For studies or teaching purposes in healthcare analytics and data science.

Data science enthusiasts: For learning, practising, and applying healthcare data manipulation and analysis techniques.
Behavioral Risk Factor Surveillance System (BRFSS)
data.mendeley.com
Updated Feb 3, 2025
Kevin Griffith (2025). Behavioral Risk Factor Surveillance System (BRFSS) [Dataset]. http://doi.org/10.17632/shs97w8jtb.1
Explore at:
Unique identifier
https://doi.org/10.17632/shs97w8jtb.1
Dataset updated
Feb 3, 2025
Authors
Kevin Griffith
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These BRFSS datasets were downloaded before they were taken offline on January 31, 2025. Special thanks to James Bailey & Doug Livingston, who made earlier years of BRFSS data available!

Data 2000-2023 are provided in SAS, Stata, and R formats. Data for 1987-1999 are provided in CSV format.

This repository has a DOI assigned if you need to cite it.
MIT-BIH Arrhythmia Database (Simple CSVs)
kaggle.com
Updated Jul 10, 2023
Proto Bioengineering (2023). MIT-BIH Arrhythmia Database (Simple CSVs) [Dataset]. http://doi.org/10.34740/kaggle/dsv/6114424
Explore at:
Unique identifier
https://doi.org/10.34740/kaggle/dsv/6114424
Dataset updated
Jul 10, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Proto Bioengineering
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
A beginner-friendly version of the MIT-BIH Arrhythmia Database, which contains 48 electrocardiograms (EKGs) from 47 patients who were at Beth Israel Deaconess Medical Center in Boston, MA in 1975-1979.

There are 48 CSVs, each of which is a 30-minute electrocardiogram (EKG) from a single patient (records 201 and 202 are from the same patient). Data was collected at 360 Hz, meaning that 360 data points equal 1 second of time.

Banner photo by Joshua Chehov on Unsplash.

How to Analyze the Heart with Python

How to Analyze Heartbeats in 15 Minutes with Python

How the Heart Works (and What is a "QRS" Complex?)

How to Identify and Label the Waves of an EKG

How to Flatten a Wandering EKG

How to Calculate the Heart Rate

What is a 12-lead EKG?

EKGs, or electrocardiograms, measure the heart's function by looking at its electrical activity. The electrical activity in each part of the heart is supposed to happen in a particular order and intensity, creating that classic "heartbeat" line (or "QRS complex") you see on monitors in medical TV shows.

There are a few types of EKGs (4-lead, 5-lead, 12-lead, etc.), which give us varying detail about the heart. A 12-lead is one of the most detailed types of EKGs, as it allows us to get 12 different outputs or graphs, all looking at different, specific parts of the heart muscles.

This dataset only publishes two leads from each patient's 12-lead EKG, since that is all that the original MIT-BIH database provided.

What does each part of the QRS complex mean?

Check out Ninja Nerd's EKG Basics tutorial on YouTube to understand what each part of the QRS complex (or heartbeat) means from an electrical standpoint.

Filenames

Each file's name is the ID of the patient (except for 201 and 202, which are the same person).

Columns

index

calculated elapsed milliseconds (index / 360 * 1000)

the first lead

the second lead

The two leads are often lead MLII and another lead such as V1, V2, or V5, though some records do not use MLII at all. MLII is the lead most often associated with the classic QRS Complex (the medical name for a single heartbeat).

Milliseconds were calculated and added as a secondary index to each dataset. Calculations were made by dividing the index by 360 Hz, then multiplying by 1000. The original index was preserved, since the calculated milliseconds may cause issues with correlating and merging data once digital signal processing (e.g. filtering) is applied. You are encouraged to try whichever index is most suitable for your analysis and/or recalculate a time index with Pandas' to_timedelta().
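The millisecond calculation described above can be reproduced from the sample index alone. The following is a minimal sketch, assuming the 360 Hz sampling rate stated in the description; the names `SAMPLE_RATE_HZ`, `elapsed_ms`, and `elapsed_timedelta` are hypothetical, and it uses Python's standard library rather than Pandas.

```python
from datetime import timedelta

SAMPLE_RATE_HZ = 360  # sampling rate stated in the dataset description


def elapsed_ms(index: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a sample index to elapsed milliseconds (index / 360 * 1000)."""
    return index / sample_rate_hz * 1000


def elapsed_timedelta(index: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> timedelta:
    """Same conversion, returned as a timedelta for use as a time index."""
    return timedelta(milliseconds=elapsed_ms(index, sample_rate_hz))


# A 30-minute recording at 360 Hz contains this many samples.
samples_per_record = 30 * 60 * SAMPLE_RATE_HZ
```

In Pandas, passing the index divided by 360 to `pd.to_timedelta(..., unit="s")` should give a comparable time index; the exact column name to use depends on how the CSV is loaded.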

Patient information

Info about each of the 47 patients is available here, including age, gender, medications, diagnoses, etc.

Getting Started

Physionet has some online tutorials and tips for analyzing EKGs and other time series / digital signals.

Check out our notebook for opening and visualizing the data.

How the CSVs were obtained

A write-up on how the data was converted from .dat to .csv files is available on Medium.com. Data was downloaded from the MIT-BIH Arrhythmia Database then converted to CSV.

Citations

Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng in Med and Biol 20(3):45-50 (May-June 2001). (PMID: 11446209)

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
Historical Stock Data of UnitedHealth
opendatabay.com
Updated Jun 13, 2025
DataDooix LTD (2025). Historical Stock Data of UnitedHealth [Dataset]. https://www.opendatabay.com/data/financial/6bcd7286-60a3-434f-b19a-adbe02ef137a
Dataset updated
Jun 13, 2025
Dataset authored and provided by
DataDooix LTD
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Public Health & Epidemiology
Description
Tracking United HealthCare Stock Performance Since IPO

Dataset Description

This dataset provides historical stock data for UnitedHealth Group (UHG), one of the largest healthcare and insurance companies in the world. It covers stock prices, market capitalization, and trading volumes from the company's IPO to the present. As a Fortune 500 company with a significant market presence, analyzing UHG's stock performance can provide valuable insights into healthcare market trends, investment opportunities, and economic indicators.

Dataset Features

Date – The trading date for the stock data.

Open Price – Stock price at market open.

Close Price – Stock price at market close.

High – Highest stock price during the trading day.

Low – Lowest stock price during the trading day.

Volume – The number of shares traded on that day.

Market Cap – The total market capitalization of UnitedHealth Group.

Dataset Distribution

Data Volume: Number of records depends on trading days from IPO to present.

Format: CSV, Excel, or other structured data formats.

Update Frequency: Weekly.

Usage

This dataset is useful for:

Stock Market Analysis – Analyzing historical stock price trends.

Financial Forecasting – Predicting future stock price movements using machine learning.

Investment Research – Assessing UnitedHealth Group’s stock as part of a portfolio.

Market Trends – Understanding broader trends in the healthcare insurance sector.

Coverage

Geographic Coverage: United States (NYSE).

Time Range: From IPO to present.

Economic Indicators: Healthcare sector, insurance market trends.

License

CC0 (Public Domain) – This dataset is freely available for public and commercial use.

Who Can Use This Dataset?

Investors & Traders – To analyze market trends and make informed decisions.

Economists & Researchers – To study healthcare market impacts.

Data Scientists – To develop predictive stock models.

Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system

Healthcare Management System

Optimizing Healthcare: Comprehensive Management for Seamless Integration.

Dataset updated

Dec 23, 2023

Dataset provided by

Kaggle http://kaggle.com/

Authors

Anouska Abhisikta

License

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Patients Table:

PatientID: Unique identifier for each patient.
firstname: First name of the patient.
lastname: Last name of the patient.
email: Email address of the patient.

This table stores information about individual patients, including their names and contact details.

Doctors Table:

DoctorID: Unique identifier for each doctor.
DoctorName: Full name of the doctor.
Specialization: Area of medical specialization.
DoctorContact: Contact details of the doctor.

This table contains details about healthcare providers, including their names, specializations, and contact information.

Appointments Table:

AppointmentID: Unique identifier for each appointment.
Date: Date of the appointment.
Time: Time of the appointment.
PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

This table records scheduled appointments, linking patients to doctors.

MedicalProcedure Table:

ProcedureID: Unique identifier for each medical procedure.
ProcedureName: Name or description of the medical procedure.
AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

This table stores details about medical procedures associated with specific appointments.

Billing Table:

InvoiceID: Unique identifier for each billing transaction.
PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
Items: Description of items or services billed.
Amount: Amount charged for the billing transaction.

This table maintains records of billing transactions, associating them with specific patients.

demo Table:

ID: Primary key, serves as a unique identifier for each record.
Name: Name of the entity.
Hint: Additional information or hint about the entity.

This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
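The foreign-key relationships described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the dataset's actual DDL: the table and column names come from the description, while the column types, the in-memory database, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database; foreign-key enforcement must be switched on in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Column types are assumed; only names and key relationships follow the description.
conn.executescript("""
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY, DoctorName TEXT,
    Specialization TEXT, DoctorContact TEXT);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY, "Date" TEXT, "Time" TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID INTEGER REFERENCES Doctors(DoctorID));
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY, ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID));
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items TEXT, Amount REAL);
""")

# Made-up sample rows, purely for illustration.
conn.execute("INSERT INTO Patients VALUES (1, 'Ada', 'Example', 'ada@example.com')")
conn.execute("INSERT INTO Doctors VALUES (1, 'Dr. Example', 'Cardiology', '555-0100')")
conn.execute("INSERT INTO Appointments VALUES (1, '2024-01-01', '09:00', 1, 1)")
conn.execute("INSERT INTO MedicalProcedure VALUES (1, 'X-ray', 1)")
conn.execute("INSERT INTO Billing VALUES (1, 1, 'X-ray', 120.0)")

# Follow the foreign keys from a procedure back to its patient and doctor.
rows = conn.execute("""
SELECT p.firstname, d.DoctorName, m.ProcedureName
FROM MedicalProcedure AS m
JOIN Appointments AS a ON a.AppointmentID = m.AppointmentID
JOIN Patients AS p ON p.PatientID = a.PatientID
JOIN Doctors AS d ON d.DoctorID = a.DoctorID
""").fetchall()
```

Procedures link to patients only through Appointments, so any per-patient procedure report needs the two-step join shown here, whereas Billing references Patients directly.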

