54 datasets found
  1. Healthcare Management System

    • kaggle.com
    Updated Dec 23, 2023
    Cite
    Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system
    Dataset updated
    Dec 23, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anouska Abhisikta
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Patients Table:

    • PatientID: Unique identifier for each patient.
    • firstname: First name of the patient.
    • lastname: Last name of the patient.
    • email: Email address of the patient.

    This table stores information about individual patients, including their names and contact details.

    Doctors Table:

    • DoctorID: Unique identifier for each doctor.
    • DoctorName: Full name of the doctor.
    • Specialization: Area of medical specialization.
    • DoctorContact: Contact details of the doctor.

    This table contains details about healthcare providers, including their names, specializations, and contact information.

    Appointments Table:

    • AppointmentID: Unique identifier for each appointment.
    • Date: Date of the appointment.
    • Time: Time of the appointment.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
    • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

    This table records scheduled appointments, linking patients to doctors.

    MedicalProcedure Table:

    • ProcedureID: Unique identifier for each medical procedure.
    • ProcedureName: Name or description of the medical procedure.
    • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

    This table stores details about medical procedures associated with specific appointments.

    Billing Table:

    • InvoiceID: Unique identifier for each billing transaction.
    • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
    • Items: Description of items or services billed.
    • Amount: Amount charged for the billing transaction.

    This table maintains records of billing transactions, associating them with specific patients.

    demo Table:

    • ID: Primary key, serves as a unique identifier for each record.
    • Name: Name of the entity.
    • Hint: Additional information or hint about the entity.

    This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

    This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
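
The table relationships described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the dataset author's DDL: the column names come from the description, while the SQL types, the sample rows, and the helper names `build_demo_db` and `procedures_for_patient` are assumptions made for the example.

```python
import sqlite3

# Column names follow the description; the SQL types are assumptions.
SCHEMA = """
CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    firstname TEXT,
    lastname TEXT,
    email TEXT
);
CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY,
    DoctorName TEXT,
    Specialization TEXT,
    DoctorContact TEXT
);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY,
    "Date" TEXT,
    "Time" TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID INTEGER REFERENCES Doctors(DoctorID)
);
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY,
    ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID)
);
CREATE TABLE Billing (
    InvoiceID TEXT PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items TEXT,
    Amount REAL
);
"""


def build_demo_db():
    """Create an in-memory database with one hypothetical row per table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign keys
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Patients VALUES (1, 'Ada', 'Example', 'ada@example.org')")
    conn.execute("INSERT INTO Doctors VALUES (10, 'Dr. Example', 'Cardiology', '555-0100')")
    conn.execute("INSERT INTO Appointments VALUES (100, '2023-12-01', '09:30', 1, 10)")
    conn.execute("INSERT INTO MedicalProcedure VALUES (1000, 'Blood test', 100)")
    conn.execute("INSERT INTO Billing VALUES ('INV-1', 1, 'Blood test', 40.0)")
    conn.commit()
    return conn


def procedures_for_patient(conn, patient_id):
    """Follow MedicalProcedure -> Appointments -> Doctors for one patient."""
    query = """
        SELECT p.ProcedureName, d.DoctorName
        FROM MedicalProcedure AS p
        JOIN Appointments AS a ON a.AppointmentID = p.AppointmentID
        JOIN Doctors AS d ON d.DoctorID = a.DoctorID
        WHERE a.PatientID = ?
        ORDER BY p.ProcedureID
    """
    return conn.execute(query, (patient_id,)).fetchall()
```

Calling `procedures_for_patient(build_demo_db(), 1)` walks the two foreign keys to list each procedure with the doctor who handled the appointment. The demo table is left out because the description itself marks it as possibly unrelated.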

  2. Synthetic Healthcare Database for Research (SyH-DR)

    • catalog.data.gov
    • healthdata.gov
    Updated Sep 16, 2023
    Cite
    Agency for Healthcare Research and Quality (2023). Synthetic Healthcare Database for Research (SyH-DR) [Dataset]. https://catalog.data.gov/dataset/synthetic-healthcare-database-for-research-syh-dr
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    Agency for Healthcare Research and Quality (http://www.ahrq.gov/)
    Description

    The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016. SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year. SyH-DR uses synthetic data elements at the claim level to resemble the marginal distribution of the original data elements. SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked.

  3. Heart Attack Dataset

    • data.mendeley.com
    • kaggle.com
    Updated Nov 23, 2022
    Cite
    Tarik A. Rashid (2022). Heart Attack Dataset [Dataset]. http://doi.org/10.17632/wmhctcrt5v.1
    Dataset updated
    Nov 23, 2022
    Authors
    Tarik A. Rashid
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The heart attack dataset was collected at Zheen Hospital in Erbil, Iraq, from January 2019 to May 2019. Its attributes are age, gender, heart rate, systolic blood pressure, diastolic blood pressure, blood sugar, CK-MB, and troponin, with a negative or positive output. According to the provided information, each record is classified as either heart attack or no heart attack. The gender column is encoded as 1 for male and 0 for female. The glucose (blood sugar) column is set to 1 if the value is > 120 and to 0 otherwise. The output is encoded as 1 for positive and 0 for negative.
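
The encoding rules stated in the description can be sketched as a small Python function. The function name `encode_record` and the raw input values ("male", "female", "positive", "negative") are hypothetical; the published file may already contain the encoded columns, and the unit of the glucose threshold is not given in the description.

```python
GENDER_CODES = {"male": 1, "female": 0}
OUTPUT_CODES = {"positive": 1, "negative": 0}


def encode_record(gender, glucose, outcome):
    """Apply the stated rules: male=1/female=0, glucose>120 -> 1, positive=1/negative=0."""
    gender_key = gender.strip().lower()
    outcome_key = outcome.strip().lower()
    if gender_key not in GENDER_CODES:
        raise ValueError(f"unexpected gender value: {gender!r}")
    if outcome_key not in OUTPUT_CODES:
        raise ValueError(f"unexpected outcome value: {outcome!r}")
    return {
        "gender": GENDER_CODES[gender_key],
        # Strictly greater than 120, as written in the description.
        "glucose": 1 if glucose > 120 else 0,
        "output": OUTPUT_CODES[outcome_key],
    }
```

Note that a reading of exactly 120 maps to 0 under the strict inequality.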

  4. Open Database of Healthcare Facilities

    • open.canada.ca
    • ouvert.canada.ca
    zip
    Updated Apr 23, 2020
    Cite
    Statistics Canada (2020). Open Database of Healthcare Facilities [Dataset]. https://open.canada.ca/data/en/dataset/543fe07a-fd79-40e9-a829-ccd697526765
    Available download formats: zip
    Dataset updated
    Apr 23, 2020
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Time period covered
    Nov 1, 2019 - Mar 1, 2020
    Description

    The Open Database of Healthcare Facilities (ODHF) is a listing of health facilities across Canada. Facilities are classified into one of three types: ambulatory health care services, hospitals, and nursing and residential care facilities. The listing contains the names, addresses, and geo coordinates of facilities, as well as the facility type as assigned in the data source. The ODHF is based on data from authoritative sources that include among them all levels of government and public health and professional healthcare bodies. The ODHF is released as open data under the Open Government License - Canada and provided as a zipped comma-separated values (.csv) file.

  5. MIMIC-III Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Apr 20, 2022
    Cite
    Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark (2022). MIMIC-III Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iii
    Dataset updated
    Apr 20, 2022
    Authors
    Alistair E.W. Johnson; Tom J. Pollard; Lu Shen; Li-wei H. Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G. Mark
    Description

    The Medical Information Mart for Intensive Care III (MIMIC-III) dataset is a large, de-identified, and publicly available collection of medical records. Each record in the dataset includes ICD-9 codes, which identify diagnoses and procedures performed. Each code is partitioned into sub-codes, which often include specific circumstantial details. The dataset consists of 112,000 clinical report records (average length 709.3 tokens) and 1,159 top-level ICD-9 codes. Each report is assigned 7.6 codes on average. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more.

    The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

  6. Mental Health - Datasets - CTData.org

    • data.ctdata.org
    Updated Jun 24, 2016
    Cite
    (2016). Mental Health - Datasets - CTData.org [Dataset]. http://data.ctdata.org/dataset/mental-health
    Dataset updated
    Jun 24, 2016
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Mental Health reports the prevalence of mental illness in the past year by age range.

  7. Health Trends, Comprehensive download file for all geographies

    • open.canada.ca
    • ouvert.canada.ca
    csv
    Updated Mar 9, 2022
    Cite
    Statistics Canada (2022). Health Trends, Comprehensive download file for all geographies [Dataset]. https://open.canada.ca/data/en/dataset/3ef254aa-519b-47d6-96ec-f0ba2e72e1dd
    Available download formats: csv
    Dataset updated
    Mar 9, 2022
    Dataset provided by
    Statistics Canada
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Description

    This product presents comparable time-series data for a range of health indicators from a number of sources including the Canadian Community Health Survey, Vital Statistics, and Canadian Cancer Registry.

  8. Health Statistic and Research Database

    • www-acc.healthinformationportal.eu
    • healthinformationportal.eu
    html
    Updated Feb 23, 2023
    Cite
    Estonian National Institute for Health Development (2023). Health Statistic and Research Database [Dataset]. https://www-acc.healthinformationportal.eu/health-information-sources/health-statistic-and-research-database
    Available download formats: html
    Dataset updated
    Feb 23, 2023
    Dataset authored and provided by
    Estonian National Institute for Health Development
    Variables measured
    sex, title, topics, country, language, data_owners, description, contact_name, geo_coverage, contact_email, and 10 more
    Measurement technique
    Multiple sources
    Description

    The Health Statistics and Health Research Database is Estonia's largest set of health-related statistics and survey results, administered by the National Institute for Health Development. Use of the database is free of charge.

    The database consists of eight main areas divided into sub-areas. The data tables included in the sub-areas are assigned unique codes. The data tables can be viewed online or downloaded in several file formats (.px, .xlsx, .csv, .json). You can download the detailed database user manual here (.pdf).

    The database is constantly updated with new data. Dates of updating the existing data tables and adding new data are provided in the release calendar. The date of the last update to each table is provided after the title of the table in the list of data tables.

    A contact person for each sub-area is listed under the "Definitions and Methodology" link of that sub-area. Contact this person for additional information about the published data, further questions, and data requests.

    Read more about the publication of health statistics by the National Institute for Health Development in Health Statistics Dissemination Principles.

  9. Synthetic Metabolic Syndrome Patient Records Dataset

    • opendatabay.com
    Updated Apr 26, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Metabolic Syndrome Patient Records Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/7bf17077-77ce-40cc-84e8-05b5e545d5eb
    Dataset updated
    Apr 26, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Patient Health Records & Digital Health
    Description

    The Synthetic Metabolic Syndrome Dataset is designed for educational and research purposes in healthcare, focusing on metabolic syndrome and related health parameters. The dataset contains demographic, anthropometric, and biochemical information that can be used to analyze and predict the presence of metabolic syndrome in individuals.

    Dataset Features

    • seqn: A unique identifier for each individual in the dataset.
    • Age: Age of the individual (in years).
    • Sex: Gender of the individual (Male/Female).
    • Marital: Marital status of the individual (e.g., Married, Separated, etc.).
    • Income: Annual income of the individual (in simulated currency units).
    • Race: Race or ethnicity of the individual (e.g., White, Black, Mexican American, etc.).
    • WaistCirc: Waist circumference (in cm), an indicator of central obesity.
    • BMI: Body Mass Index, a measure of body fat based on height and weight.
    • Albuminuria: Presence of albumin in urine (binary indicator, 0 for no, 1 for yes).
    • UrAlbCr: Urinary albumin-to-creatinine ratio, a measure of kidney health.
    • UricAcid: Uric acid levels (in mg/dL), used to assess gout risk and metabolic health.
    • BloodGlucose: Blood glucose level (in mg/dL), an indicator of diabetes or prediabetes.
    • HDL: High-density lipoprotein cholesterol level (in mg/dL), often referred to as "good cholesterol."
    • Triglycerides: Triglyceride levels (in mg/dL), a measure of fat in the blood.
    • MetabolicSyndrome: Presence of metabolic syndrome (Yes/No), based on a combination of criteria such as waist circumference, blood pressure, glucose, HDL, and triglycerides.

    Distribution

    Figure: Synthetic Metabolic Syndrome Patient Records Dataset Distribution (https://storage.googleapis.com/opendatabay_public/7bf17077-77ce-40cc-84e8-05b5e545d5eb/c5f0c7f50ff3_Metabolic_2.png)

    Figure: Synthetic Metabolic Syndrome Data (https://storage.googleapis.com/opendatabay_public/7bf17077-77ce-40cc-84e8-05b5e545d5eb/7e880a16ea2c_Metabolic_1.png)

    Usage

    This dataset is well-suited for applications in healthcare analytics, public health, and data science:

    • Metabolic Syndrome Prediction: Develop machine learning models to predict the presence of metabolic syndrome based on demographic and biochemical markers.
    • Risk Factor Analysis: Identify key risk factors for metabolic syndrome, such as obesity, high glucose, or low HDL cholesterol.
    • Public Health Research: Investigate correlations between socioeconomic status (e.g., income and marital status) and metabolic syndrome prevalence.
    • Personalized Healthcare: Design intervention strategies tailored to individuals based on their metabolic health profile.
    • Health Disparities: Explore health disparities among racial and ethnic groups to inform equitable healthcare policies.

    Coverage

    This synthetic dataset provides a comprehensive representation of metabolic health across different demographic groups. It includes diverse examples of individuals at varying risk levels for metabolic syndrome, ensuring broad applicability in research and education.

    License

    CC0 (Public Domain)

    Who Can Use It

    • Healthcare Professionals: To study metabolic syndrome trends and tailor interventions.
    • Data Scientists: For practicing classification, regression, and clustering techniques in healthcare analytics.
    • Public Health Analysts: To assess population-level metabolic health and inform policies.
    • Researchers: To simulate the impact of lifestyle changes on metabolic health outcomes.

  10. MIMIC-IV Dataset

    • paperswithcode.com
    • physionet.org
    Cite
    MIMIC-IV Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv
    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy.

    The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  11. MIMIC-IV

    • physionet.org
    Updated Oct 11, 2024
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
    Dataset updated
    Oct 11, 2024
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  12. Stroke Risk Prediction Dataset based on Literature

    • kaggle.com
    Updated Mar 1, 2025
    Cite
    Mahatir Ahmed Tusher (2025). Stroke Risk Prediction Dataset based on Literature [Dataset]. http://doi.org/10.34740/kaggle/dsv/10892812
    Dataset updated
    Mar 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mahatir Ahmed Tusher
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Stroke Risk Prediction Dataset (Version 2)

    Medically Validated, Age-Accurate, and Balanced
    Samples: 35,000 | Features: 16 | Targets: 2 (Binary + Regression)

    📌 Overview

    This dataset is designed for predicting stroke risk using symptoms, demographics, and medical literature-inspired risk modeling. Version 2 significantly improves upon Version 1 by incorporating age-dependent symptom probabilities, gender-specific risk modifiers, and medically validated feature engineering.

    Key Enhancements in Version 2:

    1. Age-Accurate Risk Modeling:

      • Stroke risk now follows a sigmoidal curve (sharp increase after age 50), reflecting real-world epidemiological trends.
      • Symptom probabilities (e.g., hypertension, chest pain) scale with age (see Medical Validity).
    2. Gender-Specific Risk:

      • Males under 60 have 1.5× higher risk, while females over 60 have 1.8× higher risk (post-menopausal hormonal changes).
    3. Balanced and Expanded Data:

      • 35,000 samples (vs. 10,000 in Version 1) to improve model generalizability and capture rare symptom combinations.
      • 50% at-risk (stroke risk ≥50%) and 50% not-at-risk (stroke risk <50%).

    📊 Dataset Statistics

    Column                  Type     Description
    age                     Integer  Age (18–90)
    gender                  String   Male/Female
    chest_pain              Binary   1 = Present, 0 = Absent
    shortness_of_breath     Binary   1 = Present, 0 = Absent
    irregular_heartbeat     Binary   1 = Present, 0 = Absent
    fatigue_weakness        Binary   1 = Present, 0 = Absent
    dizziness               Binary   1 = Present, 0 = Absent
    swelling_edema          Binary   1 = Present, 0 = Absent
    neck_jaw_pain           Binary   1 = Present, 0 = Absent
    excessive_sweating      Binary   1 = Present, 0 = Absent
    persistent_cough        Binary   1 = Present, 0 = Absent
    nausea_vomiting         Binary   1 = Present, 0 = Absent
    high_blood_pressure     Binary   1 = Present, 0 = Absent
    chest_discomfort        Binary   1 = Present, 0 = Absent
    cold_hands_feet         Binary   1 = Present, 0 = Absent
    snoring_sleep_apnea     Binary   1 = Present, 0 = Absent
    anxiety_doom            Binary   1 = Present, 0 = Absent
    at_risk                 Binary   Target for classification (1 = At Risk, 0 = Not At Risk)
    stroke_risk_percentage  Float    Target for regression (0–100%)
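
The two targets are linked by the 50% threshold given under "Balanced and Expanded Data". A minimal Python sketch of that relationship follows; the function names are hypothetical, and the check assumes rows are available as dictionaries keyed by the column names listed above.

```python
def at_risk_label(stroke_risk_percentage):
    """Return 1 when the regression target is at or above 50%, else 0."""
    if not 0 <= stroke_risk_percentage <= 100:
        raise ValueError("stroke_risk_percentage must be within 0-100")
    return 1 if stroke_risk_percentage >= 50 else 0


def inconsistent_rows(rows):
    """Return indices of rows whose at_risk flag disagrees with the threshold."""
    return [
        index
        for index, row in enumerate(rows)
        if row["at_risk"] != at_risk_label(row["stroke_risk_percentage"])
    ]
```

Running `inconsistent_rows` over a loaded copy of the data is a quick way to confirm that the classification target really is a thresholded view of the regression target before training on both.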

    Age distribution in Version 2 vs. Version 1
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F21100322%2F6317df05bc7526268853e24a5ce831ba%2FAge%20Distribution%20Plot.png?generation=1740875866152537&alt=media

    🔬 Medical Validity

    This dataset is grounded in peer-reviewed medical literature, with symptom probabilities, risk weights, and demographic relationships directly derived from clinical guidelines and epidemiological studies. Below is a detailed breakdown of how medical knowledge was translated into dataset parameters:

    1. Age-Dependent Symptom Probabilities

    The prevalence of symptoms increases with age, reflecting real-world clinical observations. Probabilities are calibrated using population-level data from medical literature:

    Hypertension (High Blood Pressure)

    • Probability by Age: 10% (18–30), 25% (31–50), 45% (51–70), 60% (71–90).
    • Source: WHO Global Report on Stroke (2023) identifies hypertension as the leading modifiable stroke risk factor, with prevalence rising from ~12% in adults <30 to ~65% in adults >70.
    • Clinical Basis: Arterial stiffness and cumulative vascular damage over time explain the age-dependent increase (Chapter 4, Harrison’s Principles of Internal Medicine).
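
The age bands for hypertension listed above translate directly into a lookup table. The sketch below is an illustration only: the dataset author's generator is not shown in the description, so the function names and the Bernoulli sampling step are assumptions.

```python
import random

# (minimum age, maximum age, probability) taken from the hypertension bullet.
HYPERTENSION_BANDS = [
    (18, 30, 0.10),
    (31, 50, 0.25),
    (51, 70, 0.45),
    (71, 90, 0.60),
]


def hypertension_probability(age):
    """Look up the stated probability for an age between 18 and 90."""
    for low, high, probability in HYPERTENSION_BANDS:
        if low <= age <= high:
            return probability
    raise ValueError("age must be within 18-90")


def sample_high_blood_pressure(age, rng):
    """Draw a 0/1 high_blood_pressure flag with the age-dependent probability."""
    return 1 if rng.random() < hypertension_probability(age) else 0
```

Passing a seeded generator such as `random.Random(42)` as `rng` keeps the draws reproducible.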

    Chest Pain

    • Probability by Age: 5% (18–30), 15% (31–50), 25% (51–70), 35% (71–90).
    • Source: The Stroke Book (Cambridge Medicine) notes that chest pain is rare in young adults but becomes prevalent in older populations due to atherosclerosis and coronary artery disease.
    • Clinical Basis: Atherosclerotic plaque buildup accelerates after age ...
  13. Datasets for federated learning

    • kaggle.com
    Updated Dec 29, 2022
    Cite
    wonghoitin (2022). Datasets for federated learning [Dataset]. https://www.kaggle.com/datasets/wonghoitin/datasets-for-federated-learning
    Dataset updated
    Dec 29, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    wonghoitin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Federated learning builds machine learning models from datasets that are distributed across multiple devices while preventing data leakage (Q. Yang et al. 2019).

    source:

    1. smoking https://www.kaggle.com/datasets/kukuroo3/body-signal-of-smoking license = CC0: Public Domain

    2. heart https://www.kaggle.com/datasets/rashikrahmanpritom/heart-attack-analysis-prediction-dataset license = CC0: Public Domain

    3. water https://www.kaggle.com/datasets/adityakadiwal/water-potability license = CC0: Public Domain

    4. customer https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis license = CC0: Public Domain

    5. insurance https://www.kaggle.com/datasets/tejashvi14/travel-insurance-prediction-data license = CC0: Public Domain

    6. credit https://www.kaggle.com/datasets/ajay1735/hmeq-data license = CC0: Public Domain

    7. income https://www.kaggle.com/datasets/mastmustu/income license = CC0: Public Domain

    8. machine https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification license = CC0: Public Domain

    9. skin https://www.kaggle.com/datasets/saurabhshahane/lumpy-skin-disease-dataset license = Attribution 4.0 International (CC BY 4.0)

    10. score https://www.kaggle.com/datasets/parisrohan/credit-score-classification?select=train.csv license = CC0: Public Domain

  14. Heart failure clinical records Data Set

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Lukas Heumos (2023). Heart failure clinical records Data Set [Dataset]. http://doi.org/10.6084/m9.figshare.19108337.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lukas Heumos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description
  15. CoAID dataset with multiple extracted features (both sparse and dense)

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 10, 2022
    Cite
    Guillaume Bernard (2022). CoAID dataset with multiple extracted features (both sparse and dense) [Dataset]. http://doi.org/10.5281/zenodo.6630405
    Available download formats: csv
    Dataset updated
    Jun 10, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guillaume Bernard
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is a publication of the CoAID dataset, originally dedicated to fake news detection. Here we repurpose the dataset for event tracking in press documents.

    Cui, Limeng, and Dongwon Lee. 2020. "CoAID: COVID-19 Healthcare Misinformation Dataset." arXiv:2006.00885 [cs], November. http://arxiv.org/abs/2006.00885.

    In this dataset, we provide multiple features extracted from the text itself. Please note the text is missing from the dataset published in the CSV format for copyright reasons. You can download the original datasets and manually add the missing texts from the original publications.

    Features are extracted using:

    - A corpus of reference articles in multiple languages for TF-IDF weighting. (features_news) [1]

    - A corpus of tweets reporting news for TF-IDF weighting. (features_tweets) [1]

    - An S-BERT model [2] that uses distiluse-base-multilingual-cased-v1 (called features_use) [3]

    - An S-BERT model [2] that uses paraphrase-multilingual-mpnet-base-v2 (called features_mpnet) [4]
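
To illustrate what TF-IDF weighting against a reference corpus means, here is a minimal pure-Python sketch. It is not the author's pipeline: the exact weighting variant behind features_news and features_tweets is not stated, so the formula used here (term frequency times log(N / document frequency)), the function name `tfidf_weights`, and the pre-tokenized inputs are all assumptions.

```python
import math
from collections import Counter


def tfidf_weights(reference_corpus, document):
    """Weight each token of `document` by tf * log(N / df) over a reference corpus.

    Both arguments are lists of tokens (the corpus is a list of such lists).
    """
    n_docs = len(reference_corpus)
    doc_freq = Counter()
    for reference_doc in reference_corpus:
        doc_freq.update(set(reference_doc))  # count each token once per document

    counts = Counter(document)
    total = sum(counts.values())
    weights = {}
    for token, count in counts.items():
        df = doc_freq.get(token, 0)
        if df == 0:
            continue  # tokens unseen in the reference corpus are skipped in this sketch
        weights[token] = (count / total) * math.log(n_docs / df)
    return weights
```

The resulting vectors are sparse, since each document only carries weights for its own tokens, whereas the S-BERT features are dense fixed-length embeddings. That contrast is what "both sparse and dense" in the dataset title refers to.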

    References:

    [1]: Guillaume Bernard. (2022). Resources to compute TF-IDF weightings on press articles and tweets (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6610406

    [2]: Reimers, Nils, and Iryna Gurevych. 2019. "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 3982-92. Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410.

    [3]: https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1

    [4]: https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2

  16. Hospital Building Data

    • data.chhs.ca.gov
    csv, zip
    Updated Jun 30, 2025
    Cite
    Department of Health Care Access and Information (2025). Hospital Building Data [Dataset]. https://data.chhs.ca.gov/dataset/hospital-building-data
    Available download formats: csv(2534), zip, csv(1470374)
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    Provides basic information for general acute care hospital buildings such as height, number of stories, the building code used to design the building, and the year it was completed. The data is sorted by counties and cities. Structural Performance Categories (SPC ratings) are also provided. SPC ratings range from 1 to 5 with SPC 1 assigned to buildings that may be at risk of collapse during a strong earthquake and SPC 5 assigned to buildings reasonably capable of providing services to the public following a strong earthquake. Where SPC ratings have not been confirmed by the Department of Health Care Access and Information (HCAI) yet, the rating index is followed by 's'. A URL for the building webpage in HCAI/OSHPD eServices Portal is also provided to view projects related to any building.
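
The SPC rating convention described above (a value from 1 to 5, followed by 's' while unconfirmed) can be parsed as sketched below. The exact string layout in the CSV is an assumption (a single digit optionally followed by a lowercase 's', with no other characters), and the function name `parse_spc_rating` is hypothetical.

```python
import re

# A digit 1-5, optionally followed by 's' for a rating not yet confirmed by HCAI.
_SPC_PATTERN = re.compile(r"([1-5])(s?)")


def parse_spc_rating(raw):
    """Return (rating, confirmed) for strings such as '3' or '3s'."""
    match = _SPC_PATTERN.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized SPC rating: {raw!r}")
    rating = int(match.group(1))
    confirmed = match.group(2) == ""
    return rating, confirmed
```

For example, `parse_spc_rating("3s")` yields a rating of 3 flagged as unconfirmed, which lets an analysis filter on confirmed ratings without losing the numeric value.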

  17. Synthetic Heart Disease Dataset

    • opendatabay.com
    Updated May 19, 2025
    Cite
    Opendatabay Labs (2025). Synthetic Heart Disease Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/9969a415-c090-4564-99d6-eca151e9884d
    Dataset updated
    May 19, 2025
    Dataset authored and provided by
    Opendatabay Labs
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Clinical Trials & Research
    Description

    A synthetic heart disease dataset has been generated to serve as an educational resource for data science, machine learning, and data analysis applications in the healthcare industry. It simulates patient records related to heart disease, allowing users to practice data manipulation and develop analytical skills in a healthcare context.

    Dataset Features:

    • Age: Age of the patient at admission (in years).
    • Country: Country of residence, specified as the USA.
    • State: Random assignments of U.S. states for geographic analysis.
    • Blood Pressure: Simulated values reflecting typical hypertension ranges (in mmHg).
    • Cholesterol: Values adjusted to fall within common cholesterol levels (in mg/dL).
    • BMI: Calculated to represent healthy to overweight classifications.
    • Glucose Level: Simulated to represent fasting glucose levels (in mg/dL).
    • Gender: Randomly assigned to simulate demographic diversity.
    • Hospital: Randomly assigned hospitals to represent different healthcare facilities.
    • Treatment Options: Various treatment methods including Physiotherapy, Medication, Surgery, Rehabilitation, and Counseling.
    • Treatment Date: Randomly generated dates for when treatments were administered.
    • Heart Disease: A binary indicator (0 = No, 1 = Yes) representing the presence of heart disease.

    Data Distribution and Outliers:

    Figure: Synthetic Heart Disease Data (https://storage.googleapis.com/opendatabay_public/images/image_88c9876e-c5a3-48be-837e-f1ea77d11693.png)
    Figure: Synthetic Heart Disease Patient Records Dataset (https://storage.googleapis.com/opendatabay_public/images/image_041922c7-f3dc-49c9-bfbf-16cdf98d6bd8.png)
    Figure: Synthetic Heart Disease Statistics (https://storage.googleapis.com/opendatabay_public/images/hearr_disease_09f51ed4-86d0-4ac4-b6c0-b7b376a9f7f2.png)
    Figure: Synthetic Heart Disease Data Distribution (https://storage.googleapis.com/opendatabay_public/images/heart_disease3_abb20b90-1bbd-4e2c-87ce-a47f1e414583.png)
    Figure: Synthetic Heart Disease Dataset Heatmap and Correlation (https://storage.googleapis.com/opendatabay_public/images/heart_disease4_64b65bf1-9b53-4ab1-a7ea-3486c050f607.png)

    Usage:

    This dataset can be used for:

    • Healthcare research: To explore trends and patterns in cardiovascular health, treatment efficacy, and patient demographics.
    • Educational training: To teach data cleaning, transformation, and visualisation techniques specific to healthcare data.
    • Predictive modelling: To develop models that predict heart disease risk based on various patient and demographic factors.

    Coverage:

    This dataset is synthetic and anonymized, making it a safe tool for experimentation and learning without compromising real patient privacy.

    License:

    CC0 (Public Domain)

    Who can use it:

    • Researchers and educators: For studies or teaching purposes in healthcare analytics and data science.
    • Data science enthusiasts: For learning, practising, and applying healthcare data manipulation and analysis techniques.

  18. Behavioral Risk Factor Surveillance System (BRFSS)

    • data.mendeley.com
    Updated Feb 3, 2025
    Cite
    Kevin Griffith (2025). Behavioral Risk Factor Surveillance System (BRFSS) [Dataset]. http://doi.org/10.17632/shs97w8jtb.1
    Dataset updated
    Feb 3, 2025
    Authors
    Kevin Griffith
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    These BRFSS datasets were downloaded before they were taken offline on January 31st, 2025. Special thanks to James Bailey & Doug Livingston, who made earlier years of BRFSS data available!

    Data for 2000-2023 are provided in SAS, Stata, and R formats. Data for 1987-1999 are provided in CSV format.

    This repository has a DOI assigned if you need to cite it.

  19. MIT-BIH Arrhythmia Database (Simple CSVs)

    • kaggle.com
    Updated Jul 10, 2023
    Cite
    Proto Bioengineering (2023). MIT-BIH Arrhythmia Database (Simple CSVs) [Dataset]. http://doi.org/10.34740/kaggle/dsv/6114424
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 10, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Proto Bioengineering
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    A beginner-friendly version of the MIT-BIH Arrhythmia Database, which contains 48 electrocardiograms (EKGs) recorded from 47 patients at Beth Israel Deaconess Medical Center in Boston, MA between 1975 and 1979.

    There are 48 CSVs, each of which is a 30-minute electrocardiogram (EKG) from a single patient (records 201 and 202 are from the same patient). Data was collected at 360 Hz, meaning that 360 data points correspond to 1 second of recording.

    Banner photo by Joshua Chehov on Unsplash.

    How to Analyze the Heart with Python

    1. How to Analyze Heartbeats in 15 Minutes with Python
    2. How the Heart Works (and What is a "QRS" Complex?)
    3. How to Identify and Label the Waves of an EKG
    4. How to Flatten a Wandering EKG
    5. How to Calculate the Heart Rate

    What is a 12-lead EKG?

    EKGs, or electrocardiograms, measure the heart's function by looking at its electrical activity. The electrical activity in each part of the heart is supposed to happen in a particular order and intensity, creating that classic "heartbeat" line (or "QRS complex") you see on monitors in medical TV shows.

    There are a few types of EKGs (4-lead, 5-lead, 12-lead, etc.), which give us varying detail about the heart. A 12-lead is one of the most detailed types of EKGs, as it allows us to get 12 different outputs or graphs, all looking at different, specific parts of the heart muscles.

    This dataset only publishes two leads from each patient's 12-lead EKG, since that is all that the original MIT-BIH database provided.

    What does each part of the QRS complex mean?

    Check out Ninja Nerd's EKG Basics tutorial on YouTube to understand what each part of the QRS complex (or heartbeat) means from an electrical standpoint.

    Filenames

    Each file's name is the ID of the patient (except for 201 and 202, which are the same person).

    Columns

    • index
    • calculated elapsed milliseconds (index / 360 * 1000)
    • the first lead
    • the second lead

    The two leads are often lead MLII and another lead such as V1, V2, or V5, though some datasets do not use MLII at all. MLII is the lead most often associated with the classic QRS Complex (the medical name for a single heartbeat).

    Milliseconds were calculated and added as a secondary index to each dataset by dividing the index by 360 Hz and then multiplying by 1000. The original index was preserved because the calculated millisecond values may cause issues when correlating and merging data once digital signal processing (e.g. filtering) is applied. You are encouraged to try whichever index is most suitable for your analysis and/or recalculate a time index with Pandas' to_timedelta().
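The index-to-time conversion described above can be sketched with the standard library alone. This is an illustrative example, assuming only the 360 Hz sampling rate stated in the description; the function names are hypothetical and are not part of the dataset or its notebook.

```python
from datetime import timedelta

SAMPLE_RATE_HZ = 360  # sampling rate stated in the dataset description


def elapsed_ms(index: int) -> float:
    """Elapsed milliseconds for a sample index: index / 360 * 1000."""
    return index / SAMPLE_RATE_HZ * 1000


def elapsed_time(index: int) -> timedelta:
    """The same offset as a timedelta, rounded to whole microseconds."""
    return timedelta(microseconds=round(index * 1_000_000 / SAMPLE_RATE_HZ))


def samples_in(seconds: float) -> int:
    """Number of samples that span the given duration at 360 Hz."""
    return round(seconds * SAMPLE_RATE_HZ)
```

For example, `samples_in(30 * 60)` gives the number of rows to expect in one 30-minute recording. Deriving times from the integer index each time, rather than joining on stored floating-point milliseconds, is one way to sidestep the merging issues mentioned above.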

    Patient information

    Info about each of the 47 patients is available here, including age, gender, medications, diagnoses, etc.

    Getting Started

    Physionet has some online tutorials and tips for analyzing EKGs and other time series / digital signals.

    Check out our notebook for opening and visualizing the data.

    How the CSVs were obtained

    A write-up on how the data was converted from .dat to .csv files is available on Medium.com. Data was downloaded from the MIT-BIH Arrhythmia Database then converted to CSV.

    Citations

    Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng in Med and Biol 20(3):45-50 (May-June 2001). (PMID: 11446209)

    Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

  20. Historical Stock Data of UnitedHealth

    • opendatabay.com
    Updated Jun 13, 2025
    Cite
    DataDooix LTD (2025). Historical Stock Data of UnitedHealth [Dataset]. https://www.opendatabay.com/data/financial/6bcd7286-60a3-434f-b19a-adbe02ef137a
    Dataset updated
    Jun 13, 2025
    Dataset authored and provided by
    DataDooix LTD
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Public Health & Epidemiology
    Description

    Tracking United HealthCare Stock Performance Since IPO

    Dataset Description

    This dataset provides historical stock data for UnitedHealth Group (UHG), one of the largest healthcare and insurance companies in the world. It covers stock prices, market capitalization, and trading volumes from the company's IPO to the present. Because UHG is a Fortune 500 company with a significant market presence, analyzing its stock performance can provide valuable insights into healthcare market trends, investment opportunities, and economic indicators.

    Dataset Features

    • Date – The trading date for the stock data.
    • Open Price – Stock price at market open.
    • Close Price – Stock price at market close.
    • High – Highest stock price during the trading day.
    • Low – Lowest stock price during the trading day.
    • Volume – The number of shares traded on that day.
    • Market Cap – The total market capitalization of UnitedHealth Group.

    Dataset Distribution

    • Data Volume: Number of records depends on trading days from IPO to present.
    • Format: CSV, Excel, or other structured data formats.
    • Update Frequency: Weekly.

    Usage

    This dataset is useful for:

    • Stock Market Analysis – Analyzing historical stock price trends.
    • Financial Forecasting – Predicting future stock price movements using machine learning.
    • Investment Research – Assessing UnitedHealth Group’s stock as part of a portfolio.
    • Market Trends – Understanding broader trends in the healthcare insurance sector.

    Coverage

    • Geographic Coverage: United States (NYSE).
    • Time Range: From IPO to present.
    • Economic Indicators: Healthcare sector, insurance market trends.

    License

    CC0 (Public Domain) – This dataset is freely available for public and commercial use.

    Who Can Use This Dataset?

    • Investors & Traders – To analyze market trends and make informed decisions.
    • Economists & Researchers – To study healthcare market impacts.
    • Data Scientists – To develop predictive stock models.
Cite
Anouska Abhisikta (2023). Healthcare Management System [Dataset]. https://www.kaggle.com/datasets/anouskaabhisikta/healthcare-management-system

Healthcare Management System

Optimizing Healthcare: Comprehensive Management for Seamless Integration.

Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 23, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Anouska Abhisikta
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

Patients Table:

  • PatientID: Unique identifier for each patient.
  • firstname: First name of the patient.
  • lastname: Last name of the patient.
  • email: Email address of the patient.

This table stores information about individual patients, including their names and contact details.

Doctors Table:

  • DoctorID: Unique identifier for each doctor.
  • DoctorName: Full name of the doctor.
  • Specialization: Area of medical specialization.
  • DoctorContact: Contact details of the doctor.

This table contains details about healthcare providers, including their names, specializations, and contact information.

Appointments Table:

  • AppointmentID: Unique identifier for each appointment.
  • Date: Date of the appointment.
  • Time: Time of the appointment.
  • PatientID: Foreign key referencing the Patients table, indicating the patient for the appointment.
  • DoctorID: Foreign key referencing the Doctors table, indicating the doctor for the appointment.

This table records scheduled appointments, linking patients to doctors.

MedicalProcedure Table:

  • ProcedureID: Unique identifier for each medical procedure.
  • ProcedureName: Name or description of the medical procedure.
  • AppointmentID: Foreign key referencing the Appointments table, indicating the appointment associated with the procedure.

This table stores details about medical procedures associated with specific appointments.

Billing Table:

  • InvoiceID: Unique identifier for each billing transaction.
  • PatientID: Foreign key referencing the Patients table, indicating the patient for the billing transaction.
  • Items: Description of items or services billed.
  • Amount: Amount charged for the billing transaction.

This table maintains records of billing transactions, associating them with specific patients.

demo Table:

  • ID: Primary key, serves as a unique identifier for each record.
  • Name: Name of the entity.
  • Hint: Additional information or hint about the entity.

This table appears to be a demonstration or testing table, possibly unrelated to the healthcare management system.

This dataset schema is designed to capture comprehensive information about patients, doctors, appointments, medical procedures, and billing transactions in a healthcare management system. Adjustments can be made based on specific requirements, and additional attributes can be included as needed.
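The foreign-key relationships described above can be sketched in an in-memory SQLite database. This is a minimal illustration, not the dataset's own DDL: the column types, the sample rows, and the helper names (`build_demo_db`, `appointment_overview`, `billed_total`) are all assumptions, and the demo table is omitted.

```python
import sqlite3

# Column names follow the description; the SQL types are assumptions.
SCHEMA = """
CREATE TABLE Patients (PatientID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, email TEXT);
CREATE TABLE Doctors (DoctorID INTEGER PRIMARY KEY, DoctorName TEXT, Specialization TEXT, DoctorContact TEXT);
CREATE TABLE Appointments (
    AppointmentID INTEGER PRIMARY KEY, "Date" TEXT, "Time" TEXT,
    PatientID INTEGER REFERENCES Patients(PatientID),
    DoctorID INTEGER REFERENCES Doctors(DoctorID));
CREATE TABLE MedicalProcedure (
    ProcedureID INTEGER PRIMARY KEY, ProcedureName TEXT,
    AppointmentID INTEGER REFERENCES Appointments(AppointmentID));
CREATE TABLE Billing (
    InvoiceID INTEGER PRIMARY KEY,
    PatientID INTEGER REFERENCES Patients(PatientID),
    Items TEXT, Amount REAL);
"""

OVERVIEW_QUERY = """
SELECT p.firstname, d.DoctorName, a."Date", m.ProcedureName
FROM Appointments AS a
JOIN Patients AS p ON p.PatientID = a.PatientID
JOIN Doctors AS d ON d.DoctorID = a.DoctorID
LEFT JOIN MedicalProcedure AS m ON m.AppointmentID = a.AppointmentID
"""


def build_demo_db():
    """Create the tables and insert one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Patients VALUES (1, 'Ada', 'Example', 'ada@example.com')")
    conn.execute("INSERT INTO Doctors VALUES (10, 'Dr. Sample', 'Cardiology', '555-0100')")
    conn.execute("INSERT INTO Appointments VALUES (100, '2024-01-15', '09:30', 1, 10)")
    conn.execute("INSERT INTO MedicalProcedure VALUES (1000, 'ECG', 100)")
    conn.execute("INSERT INTO Billing VALUES (5000, 1, 'ECG', 120.0)")
    conn.commit()
    return conn


def appointment_overview(conn):
    """Each appointment with its patient, doctor, and any associated procedure."""
    return conn.execute(OVERVIEW_QUERY).fetchall()


def billed_total(conn, patient_id):
    """Sum of Billing.Amount for one patient (0 if the patient has no invoices)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(Amount), 0) FROM Billing WHERE PatientID = ?",
        (patient_id,),
    ).fetchone()
    return row[0]
```

The LEFT JOIN keeps appointments that have no recorded procedure, which matches the description's one-directional link from MedicalProcedure to Appointments.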
