95 datasets found
  1. Diabetes Health Indicators Dataset

    • kaggle.com
    Updated Sep 21, 2025
    Cite
    Mohan Krishna Thalla (2025). Diabetes Health Indicators Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/13128284
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mohan Krishna Thalla
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Diabetes Health Indicators Dataset

    Overview

    This dataset contains 100,000 patient records designed for diabetes risk prediction, analysis, and machine learning applications. The dataset is clean, preprocessed, and ready for use in classification, regression, feature engineering, statistical analysis, and data visualization.

    • Rows: 100,000
    • Columns: 35+
    • File: diabetes_dataset.csv

    Dataset Description

    The dataset includes patient profiles with features based on demographics, lifestyle habits, family history, and clinical measurements that are well-established indicators of diabetes risk. All data is generated using statistical distributions inspired by real-world medical research, ensuring privacy preservation while reflecting realistic health patterns.

    Features

    | Column | Type | Description | Values/Range |
    |:--|:--|:--|:--|
    | patient_id | Integer | Unique patient identifier | 1–100000 |
    | age | Integer | Age of patient in years | 18–90 |
    | gender | String | Patient gender | 'Male', 'Female', 'Other' |
    | ethnicity | String | Ethnic background | 'White', 'Hispanic', 'Black', 'Asian', 'Other' |
    | education_level | String | Highest completed education | 'No formal', 'Highschool', 'Graduate', 'Postgraduate' |
    | income_level | String | Income category | 'Low', 'Medium', 'High' |
    | employment_status | String | Employment type | 'Employed', 'Unemployed', 'Retired', 'Student' |
    | smoking_status | String | Smoking behavior | 'Never', 'Former', 'Current' |
    | alcohol_consumption_per_week | Float | Drinks consumed per week | 0–30 |
    | physical_activity_minutes_per_week | Integer | Physical activity (weekly minutes) | 0–600 |
    | diet_score | Integer | Diet quality (higher = healthier) | 0–10 |
    | sleep_hours_per_day | Float | Average daily sleep hours | 3–12 |
    | screen_time_hours_per_day | Float | Average daily screen time hours | 0–12 |
    | family_history_diabetes | Integer | Family history of diabetes | 0 = No, 1 = Yes |
    | hypertension_history | Integer | Hypertension history | 0 = No, 1 = Yes |
    | cardiovascular_history | Integer | Cardiovascular history | 0 = No, 1 = Yes |
    | bmi | Float | Body Mass Index (kg/m²) | 15–45 |
    | waist_to_hip_ratio | Float | Waist-to-hip ratio | 0.7–1.2 |
    | systolic_bp | Integer | Systolic blood pressure (mmHg) | 90–180 |
    | diastolic_bp | Integer | Diastolic blood pressure (mmHg) | 60–120 |
    | heart_rate | Integer | Resting heart rate (bpm) | 50–120 |
    | cholesterol_total | Float | Total cholesterol (mg/dL) | 120–300 |
    | hdl_cholesterol | Float | HDL cholesterol (mg/dL) | 20–100 |
    | ldl_cholesterol | Float | LDL cholesterol (mg/dL) | 50–200 |
    | triglycerides | Float | Triglycerides (mg/dL) | 50–500 |
    | glucose_fasting | Float | Fasting glucose (mg/dL) | 70–250 |
    | glucose_postprandial | Float | Post-meal glucose (mg/dL) | 90–350 |
    | insulin_level | Float | Blood insulin level (µU/mL) | 2–50 |
    | hba1c | Float | HbA1c (%) | 4–14 |
    | diabetes_risk_score | Integer | Risk score (calculated, 0–100) | 0–100 |
    | diabetes_stage | String | Stage of diabetes | 'No Diabetes', 'Pre-Diabetes', 'Type 1', 'Type 2', 'Gestational' |
    | diagnosed_diabetes | Integer | Target: Diabetes diagnosis | 0 = No, 1 = Yes |

    Data Quality

    • Complete: No missing values or duplicates
    • Clean: All values fall within medically realistic ranges
    • Balanced Features: Distribution matches realistic population health patterns
    • Target Distribution: ~20–25% diagnosed cases (balanced for ML classification)
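
    The quality claims above can be spot-checked before modeling. The following is a minimal sketch, not the dataset author's own validation: it assumes the CSV header uses the column names from the feature table, copies only four of the documented ranges, and runs on a small made-up sample rather than on diabetes_dataset.csv itself. The helper name `check_rows` is hypothetical.

```python
import csv
import io

# Ranges copied from the feature table; only a few columns are shown.
DOCUMENTED_RANGES = {
    "age": (18, 90),
    "bmi": (15, 45),
    "glucose_fasting": (70, 250),
    "hba1c": (4, 14),
}


def check_rows(rows):
    """Return (violations, positive_rate) for an iterable of CSV row dicts."""
    violations = []
    positives = 0
    total = 0
    for line_no, row in enumerate(rows, start=1):
        total += 1
        positives += int(row["diagnosed_diabetes"])
        for column, (low, high) in DOCUMENTED_RANGES.items():
            value = float(row[column])
            if not low <= value <= high:
                violations.append((line_no, column, value))
    rate = positives / total if total else 0.0
    return violations, rate


# Tiny in-memory sample standing in for the real file (values are made up).
SAMPLE = """age,bmi,glucose_fasting,hba1c,diagnosed_diabetes
45,27.5,95.0,5.4,0
62,31.2,140.0,7.1,1
17,22.0,88.0,5.0,0
"""

violations, rate = check_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(violations)      # out-of-range cells as (row number, column, value)
print(round(rate, 3))  # share of rows with diagnosed_diabetes == 1
```

    To run the same check on the real file, pass `csv.DictReader(open("diabetes_dataset.csv", newline=""))` to `check_rows`; a diagnosed share outside the stated ~20–25% or any reported violation would contradict the description.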

    Use Cases

    • 🩺 Binary Classification → Predict diagnosed_diabetes (Yes/No)
    • 🧮 Multiclass Classification → Predict diabetes_stage
    • 📊 Regression → Predict glucose_fasting, hba1c, or diabetes_risk_score
    • 🔍 EDA & Visualization → Explore lifestyle and clinical health patterns
    • 🧠 Machine Learning → Train ML/DL models for healthcare prediction tasks
    • 📈 Statistical Testing → Hypothesis testing on health indicators
  2. Community Health Indicators

    • catalog.data.gov
    • data.kingcounty.gov
    • +2 more
    Updated Feb 2, 2024
    Cite
    data.kingcounty.gov (2024). Community Health Indicators [Dataset]. https://catalog.data.gov/dataset/community-health-indicators
    Explore at:
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    data.kingcounty.gov
    Description

    These indicators are presented by Public Health — Seattle & King County, in conjunction with the King County Hospitals for a Healthier Community (HHC). The data offer a comprehensive overview of demographics, health, and health behaviors among King County residents. Users can search by key word or topic area to filter the table of contents displayed below. After clicking on an indicator, a summary tab will open and users can click on additional tabs to explore data analyzed by demographic characteristics, see how rates have changed over time, and view data for cities/neighborhoods. Most indicators are interactive and users can hover over maps or charts to find more information. The data presented on this website may be reproduced without permission. Please use the following citation when reproducing: "Retrieved (date) from Public Health – Seattle & King County, Community Health Indicators. www.kingcounty.gov/chi"

  3. New York State Health Indicators

    • kaggle.com
    zip
    Updated Jan 28, 2023
    Cite
    The Devastator (2023). New York State Health Indicators [Dataset]. https://www.kaggle.com/datasets/thedevastator/new-york-state-health-indicators
    Explore at:
    Available download formats: zip (513327 bytes)
    Dataset updated
    Jan 28, 2023
    Authors
    The Devastator
    Area covered
    New York
    Description

    New York State Health Indicators

    Examination of County and Region-level Data

    By Health Data New York [source]

    About this dataset

    The New York State Community Health Indicator Reports (CHIRS) are a rich resource for analyzing the health of communities across the state. This dataset contains more than 300 indicators across 15 health topics, organized by region and county. The indicators include event counts, percents/rates, confidence intervals, measure units, quartiles, and more. Researchers and policymakers interested in public health issues in the state can use its wealth of data points to build visualizations and inform decisions. Use this dataset to explore factors that could be affecting public health outcomes and to identify public health trends in New York State.

    How to use the dataset

    This dataset contains data on more than 300 health indicators for all 62 New York State counties, 11 regions (including New York City), the State excluding New York City, and New York State. It can be used to analyze different trends in population health from a local and state-level perspective. Here is a guide on how to use this dataset:

    • Familiarize yourself with the data columns: Have an understanding of what each column represents in order to have a better grasp of what type of analyses you will be able to do with this dataset. Additionally, look into other potential features that may not be included within this dataset but could help you with your research or analysis.
    • Clean and prepare the data: Make sure that the data is up-to-date and free of errors by cleaning it up prior to conducting any analysis or research project. Some cleaning steps may include inspecting for accuracy, addressing missing values/outliers, formatting irregularities etc.
    • Generate questions related to public health issues: Brainstorm public health topics or possible implications based on your own curiosity, then use those questions as stepping stones for further research or analysis of this dataset.
    • Visualize key information through plots and charts: Create charts and graphs that present the data in an understandable way, for example by showing correlations between factors or frequency distributions.
    • Develop conclusions from your exploratory findings: Using careful calculation and chart interpretation, draw conclusions that answer the questions you raised, leaving as little ambiguity as possible.

    Research Ideas

    • Comparing health indicators across different New York state counties and regions: This dataset can be used to compare the health indicators of different New York county and region levels, helping identify areas of strength or weakness in an area's public health conditions.
    • Examining changes over time: By analyzing data from multiple years, this dataset can be used to understand patterns in changes of public health outcomes throughout NY state regions since 2012.
    • Generating targeted public health initiatives and interventions: Understanding the geographical distribution of positive or negative public health outcomes could help design policy interventions that are better tailored to local needs than a one-size-fits-all approach.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: community-health-indicator-reports-chirs-latest-data-1.csv

    | Column name | Description |
    |:--|:--|
    | County Name | Name of the county in New York State. (String) |
    | Health Topic Number | Number assigned to each hea... |

  4. Diabetes Health Indicators Dataset

    • cubig.ai
    zip
    Updated Jun 5, 2025
    Cite
    CUBIG (2025). Diabetes Health Indicators Dataset [Dataset]. https://cubig.ai/store/products/399/diabetes-health-indicators-dataset
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 5, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Diabetes Health Indicators Dataset is a large health dataset that collects various health indicators and lifestyle information related to diabetes diagnosis, based on health surveys and medical records of the U.S. population.

    2) Data Utilization
    (1) Characteristics of the Diabetes Health Indicators Dataset:
    • The dataset consists of more than 250,000 samples and contains more than 20 health and demographic variables, including diabetes (binary or triage label), age, gender, BMI, blood pressure, cholesterol, smoking and drinking habits, physical activity, mental health, income, and education level.
    (2) Uses of the Diabetes Health Indicators Dataset:
    • Diabetes prediction model development: It can be used to develop machine learning-based classification models that use health indicators and lifestyle data to predict the risk of developing diabetes.
    • Study of the correlation between lifestyle and diabetes: It can be used in epidemiological and public health studies to analyze the effects of lifestyle and demographic variables such as smoking, drinking, exercise, and eating habits on diabetes incidence.

  5. Mental and Behavioral Health Diagnoses in Emergency Department and Inpatient...

    • data.chhs.ca.gov
    • data.ca.gov
    • +2 more
    csv, pdf, zip
    Updated Nov 7, 2025
    Cite
    Department of Health Care Access and Information (2025). Mental and Behavioral Health Diagnoses in Emergency Department and Inpatient Discharges by Healthy Places Index Ranking [Dataset]. https://data.chhs.ca.gov/dataset/mental-and-behavioral-health-diagnoses-in-emergency-department-and-inpatient-discharges
    Explore at:
    Available download formats: pdf (84837), csv (115833), zip
    Dataset updated
    Nov 7, 2025
    Dataset authored and provided by
    Department of Health Care Access and Information
    Description

    This dataset provides counts and percentages of diagnoses broken down by each patient’s Healthy Places Index percentile ranking (based on ZIP code of residence). Healthcare encounters are categorized into four diagnosis groups: mental health disorders, substance use disorders, co-occurring disorders, and all other diagnoses. To view and interact with a fully functioning version of the HPI map and data used in these HCAI analyses of behavioral health, please click the link to visit https://map.healthyplacesindex.org/.

  6. Service Delivery Indicators Health Survey 2013 - Harmonized Public Use Data...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Apr 1, 2021
    + more versions
    Cite
    Waly Wane (2021). Service Delivery Indicators Health Survey 2013 - Harmonized Public Use Data - Uganda [Dataset]. https://microdata.worldbank.org/index.php/catalog/2750
    Explore at:
    Dataset updated
    Apr 1, 2021
    Dataset authored and provided by
    Waly Wane
    Time period covered
    2013
    Area covered
    Uganda
    Description

    Abstract

    The Service Delivery Indicators (SDI) are a set of health and education indicators that examine the effort and ability of staff and the availability of key inputs and resources that contribute to a functioning school or health facility. The indicators are standardized, allowing comparison between and within countries over time.

    The Health SDIs include healthcare provider effort, knowledge and ability, and the availability of key inputs (for example, basic equipment, medicines and infrastructure, such as toilets and electricity). The indicators provide a snapshot of the health facility and assess the availability of key resources for providing high quality care.

    The Uganda SDI Health survey team visited a sample of 394 health facilities across Uganda between June and October 2013. The survey team collected rosters covering 2,347 workers for absenteeism and assessed 733 health workers for competence using patient case simulations.

    Geographic coverage

    National

    Analysis unit

    Health facilities and healthcare providers

    Universe

    All health facilities providing primary-level care.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling strategy for SDI surveys is designed towards attaining indicators that are accurate and representative at the national level, as this allows for proper cross-country (i.e. international benchmarking) and across time comparisons, when applicable. In addition, other levels of representativeness are sought to allow for further disaggregation (rural/urban areas, public/private facilities, subregions, etc.) during the analysis stage.

    The sampling strategy for SDI surveys follows a multistage sampling approach. The main units of analysis are facilities (schools and health centers) and providers (health and education workers: teachers, doctors, nurses, facility managers, etc.). The multi-stage sampling approach makes sampling procedures more practical by dividing the selection of large populations of sampling units in a step-by-step fashion. After defining the sampling frame and categorizing it by stratum, a first stage selection of sampling units is carried out independently within each stratum. Often, the primary sampling units (PSU) for this stage are cluster locations (e.g. districts, communities, counties, neighborhoods, etc.) which are randomly drawn within each stratum with a probability proportional to the size (PPS) of the cluster (measured by the location’s number of facilities, providers or pupils). Once locations are selected, a second stage takes place by randomly selecting facilities within location (either with equal probability or with PPS) as secondary sampling units. At a third stage, a fixed number of health and education workers and pupils are randomly selected within facilities to provide information for the different questionnaire modules.
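
    The first two stages described above can be sketched in a few lines. This is an illustration under stated assumptions, not the SDI team's procedure: cluster size is taken to be the number of facilities, the first-stage PPS draw is made with replacement (so a location can be selected twice), and the frame, the district names, and the function `pps_two_stage` are all hypothetical. The actual design is documented in the SDI Country Report.

```python
import random


def pps_two_stage(clusters, n_clusters, n_facilities, seed=0):
    """Two-stage sample: clusters by PPS (with replacement), then facilities.

    `clusters` maps a location name to its list of facility ids; cluster size
    is taken to be the number of facilities (an assumption for this sketch).
    """
    rng = random.Random(seed)
    names = sorted(clusters)
    sizes = [len(clusters[name]) for name in names]
    # Stage 1: draw locations with probability proportional to size.
    chosen = rng.choices(names, weights=sizes, k=n_clusters)
    sample = []
    for name in chosen:
        facilities = clusters[name]
        # Stage 2: equal-probability draw of facilities inside the location.
        k = min(n_facilities, len(facilities))
        sample.append((name, rng.sample(facilities, k)))
    return sample


# Hypothetical sampling frame for one stratum.
frame = {
    "district_a": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "district_b": ["b1", "b2"],
    "district_c": ["c1", "c2", "c3"],
}
for location, facilities in pps_two_stage(frame, n_clusters=2, n_facilities=2):
    print(location, facilities)
```

    A third stage (selecting a fixed number of workers within each facility) would repeat the `rng.sample` step one level down. Surveys that need each location at most once typically use a without-replacement PPS scheme instead of the simple draw shown here.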

    Detailed information about the specific sampling process is available in the associated SDI Country Report, included as part of the documentation that accompanies these datasets.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The SDI Health Survey Questionnaire consists of four modules and weights:

    Module 1: General Information - Administered to the health facility manager to collect information on equipment, medicines, infrastructure and other facets of the health facility.

    Module 2: Provider Absence - A roster of healthcare providers is collected and absence measured.

    Module 3: Clinical Vignettes – A selection of providers are given clinical vignettes to measure knowledge of common medical conditions.

    Module 4: Facility finances – Information on facility revenue and expenditures is collected from the health facility manager.

    Weights: Weights for facilities, absentee-related analyses and clinical vignette analyses.

    Cleaning operations

    Quality control was performed in Stata.

  7. WHO Malaysia Health Indicators

    • kaggle.com
    zip
    Updated Jan 28, 2023
    Cite
    The Devastator (2023). WHO Malaysia Health Indicators [Dataset]. https://www.kaggle.com/datasets/thedevastator/who-malaysia-health-indicators
    Explore at:
    Available download formats: zip (752315 bytes)
    Dataset updated
    Jan 28, 2023
    Authors
    The Devastator
    Area covered
    Malaysia
    Description

    WHO Malaysia Health Indicators

    Malaria, HIV/STIs, Suicide, CVD, Mortality, and more

    By Humanitarian Data Exchange [source]

    About this dataset

    This dataset contains a range of indicators related to health, health systems, and sustainable development from the World Health Organization's data portal. It covers topics ranging from mortality and global health estimates to essential health technologies, youth engagement, mental health initiatives, and infectious diseases. With data points including publish-state codes and display values, this dataset provides detailed insight into how healthcare is managed around the globe. From tracking malaria outbreaks to exploring international agreements on public healthcare initiatives, it offers a wide array of information for machine learning projects that aim to improve our understanding of global healthcare trends. Explore the correlations between different countries' universal healthcare coverage measures, or investigate discrepancies between developed and developing nations, using the WHO's extensive data.

    How to use the dataset

    Getting Started: First, download the dataset from Kaggle. Once it is saved on your computer, open it with spreadsheet software such as Excel or Google Sheets.

    Exploring the Data: The dataset contains columns that offer information about indicators related to health in Malaysia, including mortality rates, prevention programs and providers, financing information, human resource information, and more. To explore particular aspects of this data, filter the rows using any of these column values. For example, if you want results for a specific year or region, you can filter by ‘year’ or ‘region’ accordingly. Note that some columns are related to one another (e.g., country code corresponds with country display name).
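
    The same row filtering can be done in code instead of a spreadsheet. The sketch below is only illustrative: the headers "GHO (CODE)" and "YEAR (CODE)" appear in this dataset's Columns listing, but the "Numeric" column, the sample rows, and the helper `filter_rows` are invented for the example.

```python
import csv
import io


def filter_rows(rows, column, wanted):
    """Keep the rows whose `column` equals `wanted` (compared as strings)."""
    return [row for row in rows if row.get(column) == str(wanted)]


# Made-up rows; "Numeric" is a hypothetical value column.
SAMPLE = """GHO (CODE),YEAR (CODE),Numeric
EXAMPLE_1,2012,1.5
EXAMPLE_1,2014,2.0
EXAMPLE_2,2014,7.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in filter_rows(rows, "YEAR (CODE)", 2014):
    print(row["GHO (CODE)"], row["Numeric"])
```

    Because `csv.DictReader` yields every value as a string, the helper compares against `str(wanted)`; convert to numbers explicitly before doing arithmetic on the filtered rows.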

    Data Outputs:
    This dataset can be used to generate visual representations, such as graphs showing trends over time in variables like human-resources funding rates or pregnancy outcomes, as reported on the WHO dashboard for member countries like Malaysia. Such views can highlight areas where risk-mitigation efforts are still lacking.

    Conclusion:

    Research Ideas

    • Analysis of health coverage and services in Malaysia, allowing comparison between different public health organizations and the effect of specific prevention programs.
    • Identification of gaps in existing healthcare access, providing a standardized, data-driven reference point to help ensure equitable access across different regions in the country.
    • Creation of interactive geographical dashboards that display comparisons among relevant indicators, providing a visual representation of how best to target the distribution of resources for optimal impact.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: rsud-service-organization-and-delivery-prevention-programs-and-providers-indicators-for-malaysia-38.csv

    | Column name | Description |
    |:--|:--|
    | GHO (CODE) | The Global Health Observatory code for the indicator. (String) |
    | GHO (DISPLAY) | The name of the indicator. (String) |
    | GHO (URL) | The URL for the indicator. (URL) |
    | PUBLISHSTATE (CODE) | The code for the publishing state of the indicator. (String) |
    | PUBLISHSTATE (DISPLAY) | The name of the publishing state of the indicator. (String) |
    | PUBLISHSTATE (URL) | The URL for the publishing state of the indicator. (URL) |
    | YEAR (CODE) | The code for... |

  8. WHO Health Indicators - Dataset - Data Catalog Armenia

    • data.opendata.am
    Updated May 31, 2023
    + more versions
    Cite
    (2023). WHO Health Indicators - Dataset - Data Catalog Armenia [Dataset]. https://data.opendata.am/dataset/who-health-indicators
    Explore at:
    Dataset updated
    May 31, 2023
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Contains data from World Health Organization's data portal covering the following categories: Mortality and global health estimates, Sustainable development goals, Millennium Development Goals (MDGs), Health systems, Malaria, Tuberculosis, Child health, Infectious diseases, Neglected Tropical Diseases, World Health Statistics, Health financing, Tobacco, Substance use and mental health, Injuries and violence, HIV/AIDS and other STIs, Public health and environment, Nutrition, Urban health, Child mortality, Noncommunicable diseases, Noncommunicable diseases CCS, Negelected tropical diseases, Infrastructure, Essential health technologies, Medical equipment, Demographic and socioeconomic statistics, Health inequality monitor, Health Equity Monitor, Child malnutrition, TOBACCO, Neglected tropical diseases, International Health Regulations (2005) monitoring framework, 0, Insecticide resistance, Oral health, Universal Health Coverage, Global Observatory for eHealth (GOe), RSUD: GOVERNANCE, POLICY AND FINANCING : PREVENTION, RSUD: GOVERNANCE, POLICY AND FINANCING: TREATMENT, RSUD: GOVERNANCE, POLICY AND FINANCING: FINANCING, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT SECTORS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: TREATMENT CAPACITY AND TREATMENT COVERAGE, RSUD: SERVICE ORGANIZATION AND DELIVERY: PHARMACOLOGICAL TREATMENT, RSUD: SERVICE ORGANIZATION AND DELIVERY: SCREENING AND BRIEF INTERVENTIONS, RSUD: SERVICE ORGANIZATION AND DELIVERY: PREVENTION PROGRAMS AND PROVIDERS, RSUD: SERVICE ORGANIZATION AND DELIVERY: SPECIAL PROGRAMMES AND SERVICES, RSUD: HUMAN RESOURCES, RSUD: INFORMATION SYSTEMS, RSUD: YOUTH, FINANCIAL PROTECTION, AMR GLASS, Noncommunicable diseases and mental health, Health workforce, AMR GASP, ICD, SEXUAL AND REPRODUCTIVE HEALTH, Immunization, NLIS, AMC GLASS. For links to individual indicator metadata, see resource descriptions.

  9. Health Index scores, England

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jun 16, 2023
    Cite
    Office for National Statistics (2023). Health Index scores, England [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthandwellbeing/datasets/healthindexscoresengland
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 16, 2023
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Health Index scores at national, regional, and upper- and lower-tier local authority level for England, including indicator details to construct the Index.

  10. Case Mix Index

    • catalog.data.gov
    • healthdata.gov
    • +2 more
    Updated Oct 23, 2025
    + more versions
    Cite
    Department of Health Care Access and Information (2025). Case Mix Index [Dataset]. https://catalog.data.gov/dataset/case-mix-index-bad58
    Explore at:
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    Department of Health Care Access and Information
    Description

    The Case Mix Index (CMI) is the average relative DRG weight of a hospital’s inpatient discharges, calculated by summing the Medicare Severity-Diagnosis Related Group (MS-DRG) weight for each discharge and dividing the total by the number of discharges. The CMI reflects the diversity, clinical complexity, and resource needs of all the patients in the hospital. A higher CMI indicates a more complex and resource-intensive case load. Although the MS-DRG weights, provided by the Centers for Medicare & Medicaid Services (CMS), were designed for the Medicare population, they are applied here to all discharges regardless of payer. Note: It is not meaningful to add the CMI values together.
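
    The calculation described above is a plain average, CMI = (sum of MS-DRG weights) / (number of discharges). The sketch below works through it on hypothetical weights and also illustrates the closing note: to get a CMI for several hospitals together, the discharges are pooled rather than the individual CMI values being added. The function names and sample weights are invented for illustration.

```python
def case_mix_index(drg_weights):
    """Average MS-DRG weight over a hospital's discharges."""
    if not drg_weights:
        raise ValueError("at least one discharge is required")
    return sum(drg_weights) / len(drg_weights)


def pooled_cmi(hospitals):
    """CMI over several hospitals: pool the discharges, do not add the CMIs."""
    all_weights = [w for weights in hospitals for w in weights]
    return case_mix_index(all_weights)


# Hypothetical MS-DRG weights, one per discharge.
hospital_a = [0.5, 1.0, 1.5]
hospital_b = [2.0, 4.0]

print(case_mix_index(hospital_a))            # 1.0
print(case_mix_index(hospital_b))            # 3.0
print(pooled_cmi([hospital_a, hospital_b]))  # 1.8
```

    Adding the two CMIs would give 4.0, which describes no real case load; the pooled value of 1.8 is the discharge-weighted average of the two hospitals.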

  11. Healthy Places Index (3.0)

    • geohub.lacity.org
    • ph-lacounty.hub.arcgis.com
    • +2 more
    Updated Sep 30, 2022
    Cite
    County of Los Angeles (2022). Healthy Places Index (3.0) [Dataset]. https://geohub.lacity.org/datasets/lacounty::healthy-places-index-3-0
    Explore at:
    Dataset updated
    Sep 30, 2022
    Dataset authored and provided by
    County of Los Angeles
    Area covered
    Description

    The California Healthy Places Index 3.0 data file was acquired on 04/25/22 from the Public Health Institute on behalf of the Public Health Alliance of Southern California.

    According to the Public Health Institute, "The HPI tool evaluates the relationship between 23 identified key drivers of health and life expectancy at birth -- which can vary dramatically by neighborhood. Based on that analysis, it produces a score ranking from 1 to 99 that shows the relative impact of conditions in a selected area compared to all other such places in the state." The HPI score is divided across four quartiles. (The Enhanced HPI 3.0: Advancing Health Equity Through High-Quality Data)

    Potential indicators assigned to eight policy action areas (domains):

    • Economics
    • Education
    • Healthcare access
    • Housing
    • Neighborhood Conditions
    • Clean Environment
    • Social Environment
    • Transportation

    An HPI score, domains, and individual indicator values and their percentile rankings are presented in the table.

    For more information, visit the California Healthy Places Index website at https://www.healthyplacesindex.org/

    Process

    Converted the XLSX file received from the Public Health Institute to a file geodatabase table. Filtered the statewide data to Los Angeles County only. The filtered dataset retains the original default HPI score rank, which is based on conditions across statewide census tracts. Edited field alias names for readability. Joined table to CENSUS_TRACTS_2010 from the Los Angeles County eGIS Data Repository. Exported to new file geodatabase feature class.
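
    The county filter in the process notes, and the point that the statewide rank is kept after filtering, can be illustrated with a short sketch. Everything specific here is an assumption: the field names `tract`, `county`, and `hpi_pctile`, the sample records, and the quartile cut points at 25, 50, and 75 are invented for illustration and are not taken from the HPI data file or the original geodatabase workflow.

```python
def hpi_quartile(percentile):
    """Map a statewide HPI percentile (1-99) to a quartile number 1-4.

    The cut points (25, 50, 75) are an assumption for illustration.
    """
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    if percentile <= 25:
        return 1
    if percentile <= 50:
        return 2
    if percentile <= 75:
        return 3
    return 4


def filter_county(tracts, county):
    """Keep one county's tracts; statewide percentiles are left unchanged."""
    return [t for t in tracts if t["county"] == county]


# Hypothetical tract records; field names and values are invented.
tracts = [
    {"tract": "T1", "county": "Los Angeles", "hpi_pctile": 12},
    {"tract": "T2", "county": "Orange", "hpi_pctile": 88},
    {"tract": "T3", "county": "Los Angeles", "hpi_pctile": 76},
]
for t in filter_county(tracts, "Los Angeles"):
    print(t["tract"], t["hpi_pctile"], hpi_quartile(t["hpi_pctile"]))
```

    Because the percentiles are not recomputed after filtering, a county subset need not contain equal numbers of tracts in each quartile.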

  12. Public Health Indicators in Chicago

    • kaggle.com
    zip
    Updated Jan 24, 2023
    Cite
    The Devastator (2023). Public Health Indicators in Chicago [Dataset]. https://www.kaggle.com/datasets/thedevastator/public-health-indicators-in-chicago
    Explore at:
    Available download formats: zip (5864 bytes)
    Dataset updated
    Jan 24, 2023
    Authors
    The Devastator
    Area covered
    Chicago
    Description

    Public Health Indicators in Chicago

    Natality, Mortality, Infectious Disease, Lead Poisoning and Economic Status

    By City of Chicago [source]

    About this dataset

    This public health dataset contains a comprehensive selection of indicators related to natality, mortality, infectious disease, lead poisoning, and economic status from Chicago community areas. It is an invaluable resource for those interested in understanding the current state of public health within each area in order to identify any deficiencies or areas of improvement needed.

    The data includes 27 indicators such as birth and death rates, prenatal care beginning in first trimester percentages, preterm birth rates, breast cancer incidences per hundred thousand female population, all-sites cancer rates per hundred thousand population and more. For each indicator provided it details the geographical region so that analyses can be made regarding trends on a local level. Furthermore this dataset allows various stakeholders to measure performance along these indicators or even compare different community areas side-by-side.

    This dataset provides a valuable tool for those striving toward better public health outcomes for the citizens of Chicago's communities by allowing greater insight into trends specific to geographic regions that could potentially lead to further research and implementation practices based on empirical evidence gathered from this comprehensive yet digestible selection of indicators

    How to use the dataset

    To use this dataset effectively to assess the public health of one or more areas of the city:

    • Understand which data is available: The indicators included in this dataset are listed above. Know what is included, and how each indicator is defined, so that conclusions drawn from the data are accurate.
    • Identify areas of interest: Once you are familiar with the data, identify the community areas you would like to study more closely or compare with one another.
    • Choose your variables: Decide which variables are most relevant to your study, and frame specific research questions around them based on what you want to learn from the dataset.
    • Analyze the data: Once your variables are selected, compare their values across community areas using statistical tests such as t-tests or correlations. This helps answer questions like “Are there significant differences between two outputs?” and shows how Chicago community areas compare on the public health statistics tracked by this dataset.
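    The analysis step above can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's method: the column names come from the column list for this dataset, but the area names and all numbers in `TOY_CSV` are invented, and the helper names (`load_columns`, `pearson_r`, `welch_t`) are hypothetical.

```python
import csv
import io
import math
from statistics import mean, variance

# Toy rows using column names from the dataset's column list.
# The area names and numbers are invented for illustration only.
TOY_CSV = """Community Area Name,Birth Rate,General Fertility Rate
Area A,16.4,62.0
Area B,12.1,48.5
Area C,18.9,75.3
Area D,10.2,41.0
Area E,14.7,58.8
Area F,17.5,70.1
"""


def load_columns(text, x_col, y_col):
    """Read two numeric columns from CSV text, skipping rows with blank cells."""
    xs, ys = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row[x_col] and row[y_col]:
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    return xs, ys


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


birth, fertility = load_columns(TOY_CSV, "Birth Rate", "General Fertility Rate")
r = pearson_r(birth, fertility)
# Compare birth rates between two hypothetical groups of community areas.
t = welch_t(birth[:3], birth[3:])
print(f"r = {r:.3f}, Welch t = {t:.3f}")
```

    To run this against the real file, replace `io.StringIO(text)` with an open file handle. The t statistic alone does not give a p-value; a statistics package would be needed for that.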

    Research Ideas

    • Creating interactive maps that show data on public health indicators by Chicago community area to allow users to explore the data more easily.
    • Designing a machine learning model to predict future variations in public health indicators by Chicago community area such as birth rate, preterm births, and childhood lead poisoning levels.
    • Developing an app that enables users to search for public health information in their own community areas and compare with other areas within the city or across different cities in the US

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: public-health-statistics-selected-public-health-indicators-by-chicago-community-area-1.csv

    | Column name | Description |
    |:--|:--|
    | Community Area | Unique identifier for each community area in Chicago. (Integer) |
    | Community Area Name | Name of the community area in Chicago. (String) |
    | Birth Rate | Number of live births per 1,000 population. (Float) |
    | General Fertility Rate | Number of live births per 1,000 women aged 15-44. (Float) |

    ...

  13. Service Delivery Indicators Health Survey 2014 - Harmonized Public Use Data...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Apr 1, 2021
    Cite
    Ezequiel Molina (2021). Service Delivery Indicators Health Survey 2014 - Harmonized Public Use Data - Mozambique [Dataset]. https://microdata.worldbank.org/index.php/catalog/3876
    Explore at:
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Ezequiel Molina
    Waly Wane
    Time period covered
    2014
    Area covered
    Mozambique
    Description

    Abstract

    The Service Delivery Indicators (SDI) are a set of health and education indicators that examine the effort and ability of staff and the availability of key inputs and resources that contribute to a functioning school or health facility. The indicators are standardized, allowing comparison between and within countries over time.

    The Health SDIs include healthcare provider effort, knowledge and ability, and the availability of key inputs (for example, basic equipment, medicines and infrastructure, such as toilets and electricity). The indicators provide a snapshot of the health facility and assess the availability of key resources for providing high quality care.

    The Mozambique SDI Health survey team visited a sample of 195 health facilities across Mozambique between April and June 2014. The survey team collected rosters covering 2,972 workers for absenteeism and assessed 694 health workers for competence using patient case simulations.

    Geographic coverage

    National

    Analysis unit

    Health facilities and healthcare providers

    Universe

    All health facilities providing primary-level care

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling strategy for SDI surveys is designed towards attaining indicators that are accurate and representative at the national level, as this allows for proper cross-country (i.e. international benchmarking) and across time comparisons, when applicable. In addition, other levels of representativeness are sought to allow for further disaggregation (rural/urban areas, public/private facilities, subregions, etc.) during the analysis stage.

    The sampling strategy for SDI surveys follows a multistage sampling approach. The main units of analysis are facilities (schools and health centers) and providers (health and education workers: teachers, doctors, nurses, facility managers, etc.). The multi-stage sampling approach makes sampling procedures more practical by dividing the selection of large populations of sampling units in a step-by-step fashion. After defining the sampling frame and categorizing it by stratum, a first stage selection of sampling units is carried out independently within each stratum. Often, the primary sampling units (PSU) for this stage are cluster locations (e.g. districts, communities, counties, neighborhoods, etc.) which are randomly drawn within each stratum with a probability proportional to the size (PPS) of the cluster (measured by the location’s number of facilities, providers or pupils). Once locations are selected, a second stage takes place by randomly selecting facilities within location (either with equal probability or with PPS) as secondary sampling units. At a third stage, a fixed number of health and education workers and pupils are randomly selected within facilities to provide information for the different questionnaire modules.

    Detailed information about the specific sampling process is available in the associated SDI Country Report included as part of the documentation that accompany these datasets.
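    The multistage approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the SDI survey's actual procedure: it covers one stratum and two stages only, draws locations with replacement and collapses repeats (which changes inclusion probabilities relative to a strict PPS design), and the function names and the sampling frame are hypothetical.

```python
import random


def pps_draw(units, sizes, k, rng):
    """Draw k units with probability proportional to size, with replacement."""
    return rng.choices(units, weights=sizes, k=k)


def two_stage_sample(frame, n_locations, n_facilities, seed=0):
    """Two-stage sample within one stratum.

    frame maps a location name to the list of facility ids in that location.
    """
    rng = random.Random(seed)
    locations = sorted(frame)
    # Size measure: number of facilities in each location.
    sizes = [len(frame[loc]) for loc in locations]
    # Stage 1: PPS draw of locations; repeated draws collapse to one visit.
    chosen = sorted(set(pps_draw(locations, sizes, n_locations, rng)))
    # Stage 2: equal-probability draw of facilities within each location.
    return {
        loc: rng.sample(frame[loc], min(n_facilities, len(frame[loc])))
        for loc in chosen
    }


# Hypothetical sampling frame for a single stratum.
frame = {
    "district_1": ["f1", "f2", "f3", "f4", "f5", "f6"],
    "district_2": ["f7", "f8"],
    "district_3": ["f9", "f10", "f11"],
}
sample = two_stage_sample(frame, n_locations=2, n_facilities=2, seed=42)
print(sample)
```

    A third stage (selecting workers within each facility) would repeat the stage 2 pattern one level down. Analysis weights would then be the inverse of each unit's overall selection probability.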

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The SDI Health Survey Questionnaire consists of four modules and weights:

    Module 1: General Information - Administered to the health facility manager to collect information on equipment, medicines, infrastructure and other facets of the health facility.

    Module 2: Provider Absence - A roster of healthcare providers is collected and absence measured.

    Module 3: Clinical Vignettes – A selection of providers are given clinical vignettes to measure knowledge of common medical conditions.

    Module 4: Facility finances – Information on facility revenue and expenditures is collected from the health facility manager.

    Weights: Weights for facilities, absentee-related analyses and clinical vignette analyses.

    Cleaning operations

    Quality control was performed in Stata.

  14. Community Health and Equity Index

    • visionzero.geohub.lacity.org
    • geohub.lacity.org
    • +3more
    Updated Feb 7, 2024
    Cite
    GIS@LADCP (2024). Community Health and Equity Index [Dataset]. https://visionzero.geohub.lacity.org/datasets/community-health-and-equity-index-1
    Explore at:
    Dataset updated
    Feb 7, 2024
    Dataset authored and provided by
    GIS@LADCP
    Area covered
    Description

    The Community Health and Equity Index was developed by Raimi + Associates to compare health conditions, vulnerabilities, and cumulative burdens across the City of Los Angeles. The Index standardizes demographic, socio-economic, health conditions, land use, transportation, food environment, crime, and pollution burden variables, and then averages them together, yielding a score on a scale of 0-100. Lower values indicate better community health.

    Variables used in the index include: Hardship Index, Life Expectancy, Health Variables (Heart Disease Mortality, Emergency Department Visits for Heart Attacks, Respiratory Disease Mortality, Diabetes Mortality, Stroke Mortality, Childhood Obesity, Percentage of Low Birth Weight Infants, Number of Emergency Department Visits for Asthma for Under 17 and 18+ age groups), Walkability Index, Complete Communities Index (amenities and establishments serving the community), Transportation Index, Modified Retail Food Environment Index, Crime Rate (Violent Crimes, Property Crimes), and Pollution Burden (Pollution Exposure, Environmental Effects).

    Variables were assigned weights and averaged together. Weights were assigned based on the weights used in the 2013 Health Atlas. For more information, see page 181 of the 2013 Health Atlas, which is available as a PDF on the Los Angeles City Planning website, https://planning.lacity.gov.
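    A standardize, weight, and average index of this kind can be sketched as below. The description does not state how variables are standardized or what the weights are, so this sketch assumes min-max scaling to 0-100 and uses made-up equal weights; the variable names, values, and function names are hypothetical and do not reproduce the Health Atlas method.

```python
def min_max_scale(values):
    """Rescale values to 0-100; a constant column maps to 0 for every area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def composite_index(variables, weights, higher_is_worse):
    """Weighted average of scaled variables, one score per area.

    variables maps a variable name to a list of values (one per area).
    Variables where a higher raw value is better are flipped so that,
    as in the index described above, lower scores mean better health.
    """
    names = list(variables)
    n_areas = len(variables[names[0]])
    total_weight = sum(weights[name] for name in names)
    scores = [0.0] * n_areas
    for name in names:
        scaled = min_max_scale(variables[name])
        if not higher_is_worse[name]:
            scaled = [100.0 - s for s in scaled]
        for i, s in enumerate(scaled):
            scores[i] += weights[name] * s / total_weight
    return scores


# Invented values for three areas; weights are assumed equal.
variables = {
    "hardship_index": [20, 50, 80],
    "life_expectancy": [84, 80, 76],
}
weights = {"hardship_index": 1.0, "life_expectancy": 1.0}
higher_is_worse = {"hardship_index": True, "life_expectancy": False}
scores = composite_index(variables, weights, higher_is_worse)
print(scores)
```

    Swapping in z-scores or percentile ranks for `min_max_scale` would change the scores but not the overall structure.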

  15. Evaluating Health Home Care Quality

    • kaggle.com
    zip
    Updated Jan 23, 2023
    Cite
    The Devastator (2023). Evaluating Health Home Care Quality [Dataset]. https://www.kaggle.com/datasets/thedevastator/evaluating-health-home-care-quality/data
    Explore at:
    Available download formats: zip (52620 bytes)
    Dataset updated
    Jan 23, 2023
    Authors
    The Devastator
    Description

    Evaluating Health Home Care Quality

    CMS Core Set and Health Home SPA Measures

    By Health Data New York [source]

    About this dataset

    This dataset provides comprehensive measures to evaluate the quality of medical services provided to Medicaid beneficiaries by Health Homes, including the Centers for Medicare & Medicaid Services (CMS) Core Set and Health Home State Plan Amendment (SPA) measures. This allows us to gain insight into how well these health homes are performing in terms of delivering high-quality care. Our data sources include the Medicaid Data Mart, QARR Member Level Files, and the New York State Delivery System Reform Incentive Payment (DSRIP) Program Data Warehouse. With this dataset you can explore indicators such as rates for indicators within the scope of Core Set Measures, sub domains, domains and measure descriptions; age categories used; denominators of each measure; level of significance for each indicator; and more. By understanding more about Health Home Quality Measures from this resource, you can help make informed decisions about evidence-based health practices while also promoting better patient outcomes.

    How to use the dataset

    This dataset contains measures that evaluate the quality of care delivered by Health Homes for the Centers for Medicare & Medicaid Services (CMS). With this dataset, you can get an overview of how a health home is performing in terms of quality. You can use this data to compare different health homes and their respective service offerings.

    The data used to create this dataset was collected from the Medicaid Data Mart, QARR Member Level Files, and the New York State Delivery System Reform Incentive Payment (DSRIP) Program Data Warehouse.

    In order to use this dataset effectively, you should start by looking at the columns provided. These include: Measurement Year; Health Home Name; Domain; Sub Domain; Measure Description; Age Category; Denominator; Rate; Level of Significance; Indicator. Each column provides valuable insight into how a particular health home is performing in various measurements of healthcare quality.

    When examining this data, it is important to remember that many variables are included in any given measure and that changes may have occurred over time due to varying factors such as population or financial resources available for healthcare delivery. Furthermore, changes in policy may also affect performance over time so it is important to take these things into account when evaluating the performance of any given health home from one year to the next or when comparing different health homes on a specific measure or set of indicators over time
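    A year-to-year comparison of the kind described above might look like the sketch below. The column names are taken from the column list for this dataset, but the health home names, measure name, and all values in `TOY_CSV` are invented, and the helper names are hypothetical. The sketch only reports the raw change in `Rate`; it does not adjust for population or policy changes.

```python
import csv
import io
from collections import defaultdict

# Invented rows; only the column names come from the dataset's column list.
TOY_CSV = """Measurement Year,Health Home Name,Measure Description,Denominator,Rate
2013,Home A,Measure X,200,50.0
2014,Home A,Measure X,100,80.0
2013,Home B,Measure X,400,65.0
2014,Home B,Measure X,380,61.5
2013,Home B,Measure Y,300,10.0
"""


def rates_by_year(text, measure):
    """Return {health home: {year: rate}} for a single measure."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        if row["Measure Description"] == measure:
            year = int(row["Measurement Year"])
            table[row["Health Home Name"]][year] = float(row["Rate"])
    return dict(table)


def year_over_year(table, home, start, end):
    """Change in rate for one health home between two measurement years."""
    return table[home][end] - table[home][start]


table = rates_by_year(TOY_CSV, "Measure X")
for home in sorted(table):
    print(home, year_over_year(table, home, 2013, 2014))
```

    Restricting each comparison to a single measure avoids mixing rates that are defined on different scales.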

    Research Ideas

    • Using this dataset, state governments can evaluate the effectiveness of their health home programs by comparing the performance across different domains and subdomains.
    • Healthcare providers and organizations can use this data to identify areas for improvement in quality of care provided by health homes and strategies to reduce disparities between individuals receiving care from health homes.
    • Researchers can use this dataset to analyze how variations in cultural context, geography, demographics or other factors impact delivery of quality health home services across different locations

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    File: health-home-quality-measures-beginning-2013-1.csv

    | Column name | Description |
    |:--|:--|
    | Measurement Year | The year in which the data was collected. (Integer) |
    | Health Home Name | The name of the health home. (String) |
    | Domain | The domain of the measure. (String) |
    | Sub Domain | The sub domain of the measure. (String) |
    | Measure Description | A description of the measure. (String) |
    | Age Category | The age category of the patient. (String) |
    | Denominator | The denominator of the measure. (Integer) |
    | Rate | The rate of the measure. (Float) |
    | Level of Significance | The level of significance of the measure. (String) |
    | Indicator | The indicator of the measure. (String) |

    Acknowledgements

    ...

  16. Key Substance Use and Mental Health Indicators in the United States: Results...

    • catalog.data.gov
    • data.virginia.gov
    • +1more
    Updated Sep 6, 2025
    Cite
    Substance Abuse and Mental Health Services Administration (2025). Key Substance Use and Mental Health Indicators in the United States: Results from the 2016 National Survey on Drug Use and Health [Dataset]. https://catalog.data.gov/dataset/key-substance-use-and-mental-health-indicators-in-the-united-states-results-from-the-2016-
    Explore at:
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Substance Abuse and Mental Health Services Administration (https://www.samhsa.gov/)
    Area covered
    United States
    Description

    This national report summarizes key findings from the 2016 National Survey on Drug Use and Health (NSDUH) for indicators of substance use and mental health among people aged 12 years old or older in the civilian, noninstitutionalized population of the United States. Estimates include tobacco use, alcohol use, illicit drug use, opioid use, substance use disorders, major depressive episode, any mental illness, serious mental illness, suicide, co-occurring disorders, and receipt of treatment or services.

  17. Environmental Quality Index

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Environmental Quality Index [Dataset]. https://catalog.data.gov/dataset/environmental-quality-index
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    An Environmental Quality Index (EQI) for all counties in the United States for the time period 2000-2005 was developed which incorporated data from five environmental domains: air, water, land, built, and socio-demographic. The EQI was developed in four parts: domain identification; data source identification and review; variable construction; and data reduction using principal components analysis (PCA). The methods applied provide a reproducible approach that capitalizes almost exclusively on publicly available data sources. The primary goal in creating the EQI is to use it as a composite environmental indicator for research on human health. A series of peer-reviewed manuscripts utilized the EQI in examining health outcomes.

    This dataset is not publicly accessible because: This series of papers are considered Human health research - not to be loaded onto ScienceHub. It can be accessed through the following means: The EQI data can be accessed at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: EQI data, metadata, formats, and data dictionary all available at website.

    This dataset is associated with the following publications:

    • Gray, C., L. Messer, K. Rappazzo, J. Jagai, S. Grabich, and D. Lobdell. The association between physical inactivity and obesity is modified by five domains of environmental quality in U.S. adults: A cross-sectional study. PLoS ONE. Public Library of Science, San Francisco, CA, USA, 13(8): e0203301, (2018).
    • Patel, A., J. Jagai, L. Messer, C. Gray, K. Rappazzo, S. Deflorio-Barker, and D. Lobdell. Associations between environmental quality and infant mortality in the United States, 2000-2005. Archives of Public Health. BioMed Central Ltd, London, UK, 76(60): 1, (2018).
    • Gray, C., D. Lobdell, K. Rappazzo, Y. Jian, J. Jagai, L. Messer, A. Patel, S. Deflorio-Barker, C. Lyttle, J. Solway, and A. Rzhetsky. Associations between environmental quality and adult asthma prevalence in medical claims data. Environmental Research. Elsevier B.V., Amsterdam, Netherlands, 166: 529-536, (2018).

  18. Ocean Health Index

    • samoa-data.sprep.org
    • tuvalu-data.sprep.org
    • +6more
    Updated Oct 17, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (SPREP) (2025). Ocean Health Index [Dataset]. https://samoa-data.sprep.org/dataset/ocean-health-index
    Explore at:
    Dataset updated
    Oct 17, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0 (https://creativecommons.org/publicdomain/mark/1.0/)
    License information was derived automatically

    Area covered
    Worldwide, Pacific Region
    Description

    The global Ocean Health Index measures the state of the world’s oceans. The global OHI score for the 2024 assessment was 69, which was quite a bit lower than last year’s score of 73. This was due to COVID-related declines in tourism and recreation [the 2024 scores reflect 2021 data]. You can explore this and other goals using the interactive map which shows how different countries and goals contribute to the global score, as well as how the score has changed since 2012. Click on colored regions (i.e. EEZs) to see short country summaries.

  19. Indices of Multiple Deprivation 2010, Health Score - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Oct 27, 2014
    Cite
    ckan.publishing.service.gov.uk (2014). Indices of Multiple Deprivation 2010, Health Score - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/indices-of-multiple-deprivation-2010-health-score
    Explore at:
    Dataset updated
    Oct 27, 2014
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Score for each LSOA in the Health Deprivation and Disability domain. The English Indices of Deprivation provide a relative measure of deprivation at small area level across England. Areas are ranked from least deprived to most deprived on seven different dimensions of deprivation and an overall composite measure of multiple deprivation. Most of the data underlying the 2010 indices are for the year 2008. The indices have been constructed by the Social Disadvantage Research Centre at the University of Oxford for the Department for Communities and Local Government. All figures can only be reproduced if the source (Department for Communities and Local Government, Indices of Deprivation 2010) is fully acknowledged.

    The domains used in the Indices of Deprivation 2010 are: income deprivation; employment deprivation; health deprivation and disability; education deprivation; crime deprivation; barriers to housing and services deprivation; and living environment deprivation. Each of these domains has its own scores and ranks, allowing users to focus on specific aspects of deprivation.

    Because the indices give a relative measure, they can tell you if one area is more deprived than another but not by how much. For example, if an area has a rank of 40 it is not half as deprived as a place with a rank of 20. The Index of Multiple Deprivation was constructed by combining scores from the seven domains. When comparing areas, a higher deprivation score indicates a higher proportion of people living there who are classed as deprived. But as for ranks, deprivation scores can only tell you if one area is more deprived than another, but not by how much.

    This dataset was created from a spreadsheet provided by the Department of Communities and Local Government, which can be downloaded here. The method for calculating the IMD score and underlying indicators is detailed in the report 'The English Indices of Deprivation 2010: Technical Report'. The data is represented here as Linked Data, using the Data Cube ontology.

  20. US Small Business Health Index

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Dec 22, 2014
    Cite
    TRADING ECONOMICS (2014). US Small Business Health Index [Dataset]. https://tradingeconomics.com/united-states/small-business-sentiment
    Explore at:
    Available download formats: json, xml, excel, csv
    Dataset updated
    Dec 22, 2014
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Feb 28, 2014 - Feb 28, 2015
    Area covered
    United States
    Description

    Small Business Sentiment in the United States increased to 51.67 in February from 49.03 in January of 2015. This dataset provides the latest reported value for - US Small Business Health Index - plus previous releases, historical high and low, short-term forecast and long-term prediction, economic calendar, survey consensus and news.

Cite
Mohan Krishna Thalla (2025). Diabetes Health Indicators Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/13128284

Diabetes Health Indicators Dataset

A Comprehensive Dataset of 100,000 Patient Records for Diabetes Risk Analysis

Explore at:
346 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 21, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mohan Krishna Thalla
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Diabetes Health Indicators Dataset

Overview

This dataset contains 100,000 patient records designed for diabetes risk prediction, analysis, and machine learning applications. The dataset is clean, preprocessed, and ready for use in classification, regression, feature engineering, statistical analysis, and data visualization.

  • Rows: 100,000
  • Columns: 35+
  • File: diabetes_dataset.csv

Dataset Description

The dataset includes patient profiles with features based on demographics, lifestyle habits, family history, and clinical measurements that are well-established indicators of diabetes risk. All data is generated using statistical distributions inspired by real-world medical research, ensuring privacy preservation while reflecting realistic health patterns.

Features

| Column | Type | Description | Values/Range |
|:--|:--|:--|:--|
| patient_id | Integer | Unique patient identifier | 1–100000 |
| age | Integer | Age of patient in years | 18–90 |
| gender | String | Patient gender | 'Male', 'Female', 'Other' |
| ethnicity | String | Ethnic background | 'White', 'Hispanic', 'Black', 'Asian', 'Other' |
| education_level | String | Highest completed education | 'No formal', 'Highschool', 'Graduate', 'Postgraduate' |
| income_level | String | Income category | 'Low', 'Medium', 'High' |
| employment_status | String | Employment type | 'Employed', 'Unemployed', 'Retired', 'Student' |
| smoking_status | String | Smoking behavior | 'Never', 'Former', 'Current' |
| alcohol_consumption_per_week | Float | Drinks consumed per week | 0–30 |
| physical_activity_minutes_per_week | Integer | Physical activity (weekly minutes) | 0–600 |
| diet_score | Integer | Diet quality (higher = healthier) | 0–10 |
| sleep_hours_per_day | Float | Average daily sleep hours | 3–12 |
| screen_time_hours_per_day | Float | Average daily screen time hours | 0–12 |
| family_history_diabetes | Integer | Family history of diabetes | 0 = No, 1 = Yes |
| hypertension_history | Integer | Hypertension history | 0 = No, 1 = Yes |
| cardiovascular_history | Integer | Cardiovascular history | 0 = No, 1 = Yes |
| bmi | Float | Body Mass Index (kg/m²) | 15–45 |
| waist_to_hip_ratio | Float | Waist-to-hip ratio | 0.7–1.2 |
| systolic_bp | Integer | Systolic blood pressure (mmHg) | 90–180 |
| diastolic_bp | Integer | Diastolic blood pressure (mmHg) | 60–120 |
| heart_rate | Integer | Resting heart rate (bpm) | 50–120 |
| cholesterol_total | Float | Total cholesterol (mg/dL) | 120–300 |
| hdl_cholesterol | Float | HDL cholesterol (mg/dL) | 20–100 |
| ldl_cholesterol | Float | LDL cholesterol (mg/dL) | 50–200 |
| triglycerides | Float | Triglycerides (mg/dL) | 50–500 |
| glucose_fasting | Float | Fasting glucose (mg/dL) | 70–250 |
| glucose_postprandial | Float | Post-meal glucose (mg/dL) | 90–350 |
| insulin_level | Float | Blood insulin level (µU/mL) | 2–50 |
| hba1c | Float | HbA1c (%) | 4–14 |
| diabetes_risk_score | Integer | Risk score (calculated, 0–100) | 0–100 |
| diabetes_stage | String | Stage of diabetes | 'No Diabetes', 'Pre-Diabetes', 'Type 1', 'Type 2', 'Gestational' |
| diagnosed_diabetes | Integer | Target: Diabetes diagnosis | 0 = No, 1 = Yes |

Data Quality

  • Complete: No missing values or duplicates
  • Clean: All values fall within medically realistic ranges
  • Balanced Features: Distribution matches realistic population health patterns
  • Target Distribution: ~20–25% diagnosed cases (balanced for ML classification)
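The completeness and range claims above can be checked before modeling. The sketch below is one possible check, not part of the dataset: the `(min, max)` pairs are copied from the Features table for a handful of numeric columns, while the function name `find_violations` and the two rows in `TOY_CSV` are invented for illustration.

```python
import csv
import io

# (min, max) ranges copied from the Features table for a few numeric columns.
RANGES = {
    "age": (18, 90),
    "bmi": (15, 45),
    "systolic_bp": (90, 180),
    "diastolic_bp": (60, 120),
    "glucose_fasting": (70, 250),
    "hba1c": (4, 14),
}


def find_violations(rows, ranges=RANGES):
    """Return (row_number, column, value) for missing or out-of-range cells.

    A missing or blank cell is reported with value None.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        for column, (lo, hi) in ranges.items():
            raw = row.get(column)
            if raw is None or raw.strip() == "":
                problems.append((number, column, None))
                continue
            value = float(raw)
            if not lo <= value <= hi:
                problems.append((number, column, value))
    return problems


# Two invented rows: the second has an out-of-range bmi and a blank hba1c.
TOY_CSV = """age,bmi,systolic_bp,diastolic_bp,glucose_fasting,hba1c
45,27.3,128,82,101.5,5.6
61,52.0,140,90,130.0,
"""
rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
violations = find_violations(rows)
print(violations)
```

To check the real file, pass `csv.DictReader(open("diabetes_dataset.csv", newline=""))` as `rows`. An empty result would be consistent with the stated data quality for the columns covered.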

Use Cases

  • 🩺 Binary Classification → Predict diagnosed_diabetes (Yes/No)
  • 🧮 Multiclass Classification → Predict diabetes_stage
  • 📊 Regression → Predict glucose_fasting, hba1c, or diabetes_risk_score
  • 🔍 EDA & Visualization → Explore lifestyle and clinical health patterns
  • 🧠 Machine Learning → Train ML/DL models for healthcare prediction tasks
  • 📈 Statistical Testing → Hypothesis testing on health indicators