73 datasets found
  1. United States US: Diabetes Prevalence: % of Population Aged 20-79

    • ceicdata.com
    Updated Mar 15, 2009
    Cite
    CEICdata.com (2009). United States US: Diabetes Prevalence: % of Population Aged 20-79 [Dataset]. https://www.ceicdata.com/en/united-states/health-statistics/us-diabetes-prevalence--of-population-aged-2079
    Explore at:
    Dataset updated
    Mar 15, 2009
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2017
    Area covered
    United States
    Description

    United States US: Diabetes Prevalence: % of Population Aged 20-79 was reported at 10.790 % in 2017. The series is updated yearly and has 1 observation (Dec 2017), so its average and median are both 10.790 %. The data remains in active status in CEIC and is reported by the World Bank. It is categorized under Global Database’s USA – Table US.World Bank: Health Statistics. Diabetes prevalence refers to the percentage of people ages 20-79 who have type 1 or type 2 diabetes. Source: International Diabetes Federation, Diabetes Atlas. Aggregation method: weighted average.

  2. CDC Diabetes Health Indicators

    • kaggle.com
    zip
    Updated Jul 21, 2024
    + more versions
    Cite
    Abdelaziz Sami (2024). CDC Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/cdc-diabetes-health-indicators
    Explore at:
    zip (6324278 bytes)
    Dataset updated
    Jul 21, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Context:

    Diabetes is a widespread chronic disease affecting millions of Americans each year, imposing a substantial financial burden on the economy. It impairs the body's ability to regulate blood glucose levels, leading to a range of health issues such as heart disease, vision loss, limb amputation, and kidney disease. Diabetes occurs when the body either fails to produce sufficient insulin or cannot use the insulin produced effectively. Insulin is crucial for enabling cells to utilize sugars from the bloodstream for energy.

    Though there is no cure for diabetes, lifestyle changes such as weight management, healthy eating, and regular physical activity, along with medical treatments, can help manage the disease. Early detection and intervention are vital, making predictive models for diabetes risk valuable tools for healthcare providers and public health officials.

    As of 2018, the CDC reported that 34.2 million Americans have diabetes, with 88 million having prediabetes. Alarmingly, a significant portion of those affected are unaware of their condition. Type II diabetes, the most prevalent form, varies in prevalence based on age, education, income, location, race, and other social determinants of health. The economic impact is substantial, with diagnosed diabetes costing approximately $327 billion annually, and total costs, including undiagnosed cases and prediabetes, nearing $400 billion.

    Content: The dataset originates from the Behavioral Risk Factor Surveillance System (BRFSS), an annual telephone survey by the CDC since 1984, collecting data on health-related risk behaviors, chronic health conditions, and preventative service usage. For this project, the 2015 BRFSS dataset available on Kaggle was used, featuring responses from 441,455 individuals across 330 features.

    The dataset includes three files:

    1. diabetes_012_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features. The target variable, Diabetes_012, has 3 classes: 0 (no diabetes or only during pregnancy), 1 (prediabetes), and 2 (diabetes). This dataset is imbalanced.

    2. diabetes_binary_5050split_health_indicators_BRFSS2015.csv: Contains 70,692 responses with 21 features, balanced 50-50 between individuals with no diabetes and those with prediabetes or diabetes. The target variable, Diabetes_binary, has 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes).

    3. diabetes_binary_health_indicators_BRFSS2015.csv: Contains 253,680 responses with 21 features, with the target variable Diabetes_binary having 2 classes: 0 (no diabetes) and 1 (prediabetes or diabetes). This dataset is not balanced.
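
    The relationship between the three-class target (Diabetes_012) and the two-class target (Diabetes_binary) described in the file list can be sketched as follows. This is a minimal illustration, assuming the binary label simply merges classes 1 and 2 as the descriptions state; the helper name `to_binary` and the sample labels are hypothetical and are not taken from the CSV files.

```python
from collections import Counter


def to_binary(diabetes_012: int) -> int:
    """Collapse the 3-class label (0, 1, 2) into the binary label (0, 1)."""
    if diabetes_012 not in (0, 1, 2):
        raise ValueError(f"unexpected Diabetes_012 value: {diabetes_012}")
    # 0 = no diabetes (or only during pregnancy); 1 and 2 merge into class 1.
    return 0 if diabetes_012 == 0 else 1


# Made-up labels for illustration only; not real survey responses.
sample_012 = [0, 0, 0, 0, 0, 0, 1, 2, 2, 0]
sample_binary = [to_binary(v) for v in sample_012]

counts_012 = Counter(sample_012)
counts_binary = Counter(sample_binary)
print("3-class counts:", dict(counts_012))
print("binary counts:", dict(counts_binary))
```

    Counting the classes this way is also a quick check of the imbalance mentioned for the first and third files.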

    Research Questions:
    - Can BRFSS survey questions accurately predict diabetes?
    - What risk factors are most indicative of diabetes risk?
    - Can a subset of risk factors effectively predict diabetes risk?
    - Can a shorter questionnaire be developed from the BRFSS using feature selection to predict diabetes risk?

    Acknowledgements: This dataset was not created by me; it is a cleaned and consolidated version of the BRFSS 2015 dataset available on Kaggle. The original dataset and the data cleaning notebook can be found here.

    Inspiration: This work was inspired by Zidian Xie et al.'s study on building risk prediction models for Type 2 diabetes using machine learning techniques on the 2014 BRFSS dataset. The study can be found here.

  3. Diabetes control is associated with environmental quality in the U.S.

    • catalog.data.gov
    Updated Jul 21, 2022
    Cite
    U.S. EPA Office of Research and Development (ORD) (2022). Diabetes control is associated with environmental quality in the U.S. [Dataset]. https://catalog.data.gov/dataset/diabetes-control-is-associated-with-environmental-quality-in-the-u-s
    Explore at:
    Dataset updated
    Jul 21, 2022
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Area covered
    United States
    Description

    Population-based county-level estimates for prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) ≥6.5%, or diabetes diagnosis) but do not currently have high fasting plasma glucose or HbA1c for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1C levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict county-level prevalence of diabetes-related outcomes, including DC (16).

    The EQI was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    Results are reported as prevalence rate differences (PRD) with 95% confidence intervals (CIs) comparing the highest quintile/worst environmental quality to the lowest quintile/best environmental quality exposure metrics. PRDs are representative of the entire period of interest, 2004-2012. Due to availability of DC data and covariate data, not all counties were captured; however, the majority (3134 of 3142) were used in the analysis.

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).

  4. Diabetes Health Indicators

    • kaggle.com
    zip
    Updated Mar 7, 2025
    Cite
    Siamak Tahmasbi (2025). Diabetes Health Indicators [Dataset]. https://www.kaggle.com/datasets/siamaktahmasbi/diabetes-health-indicators
    Explore at:
    zip (4413929 bytes)
    Dataset updated
    Mar 7, 2025
    Authors
    Siamak Tahmasbi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context: Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.

    Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.

    The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.

    Content: The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey conducted annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the dataset available on the CDC website for the year 2023 was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

    I have selected 20 features from this dataset that are suitable for working on the topic of diabetes, and I have saved them in a CSV file without making any changes to the data. The goal of this is to make it easier to work with the data. For more information or to access updated data, you can refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns and features. Therefore, the duplicate cases in this dataset are not due to errors but rather represent individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.

    Explore some of the following research questions:
    - Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
    - What risk factors are most predictive of diabetes risk?
    - Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
    - Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?

    Acknowledgements: It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.

    Inspiration: Zidian Xie et al.'s Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques, which used the 2014 BRFSS, and Alex Teboul's Diabetes Health Indicators dataset, based on the BRFSS 2015, were the inspiration for creating this dataset and exploring the BRFSS in general.

  5. The association between environmental quality and diabetes in the U.S.

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). The association between environmental quality and diabetes in the U.S. [Dataset]. https://catalog.data.gov/dataset/the-association-between-environmental-quality-and-diabetes-in-the-u-s
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Description

    Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016).

    Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as the proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome.

    The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain while the overall EQI was constructed from a second PCA from these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
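
    The three outcome definitions in this description (diagnosed, undiagnosed, and total diabetes) can be expressed as a small decision rule. The sketch below is only an individual-level illustration of those definitions using the stated thresholds; the function name `diabetes_status` and its arguments are hypothetical, and the actual county-level estimates come from imputation and small area models, which this does not attempt to reproduce.

```python
FPG_THRESHOLD_MG_DL = 126.0  # high fasting plasma glucose, per the description
HBA1C_THRESHOLD_PCT = 6.5    # high HbA1C, per the description


def diabetes_status(reported_diagnosis: bool, fpg_mg_dl: float, hba1c_pct: float) -> dict:
    """Classify one adult under the diagnosed / undiagnosed / total definitions."""
    high_marker = fpg_mg_dl >= FPG_THRESHOLD_MG_DL or hba1c_pct >= HBA1C_THRESHOLD_PCT
    return {
        # Reported a previous diabetes diagnosis.
        "diagnosed": reported_diagnosis,
        # High FPG or HbA1C without a reported diagnosis.
        "undiagnosed": high_marker and not reported_diagnosis,
        # Reported diagnosis and/or high FPG or HbA1C.
        "total": reported_diagnosis or high_marker,
    }


# Hypothetical respondent: no reported diagnosis, but FPG above the threshold.
print(diabetes_status(False, 130.0, 5.4))
```

    Under these definitions an adult counted as total diabetes is always either diagnosed or undiagnosed, never both.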

  6. Diabetes

    • catalog.data.gov
    • data.wprdc.org
    • +2more
    Updated Mar 14, 2023
    + more versions
    Cite
    Allegheny County (2023). Diabetes [Dataset]. https://catalog.data.gov/dataset/diabetes
    Explore at:
    Dataset updated
    Mar 14, 2023
    Dataset provided by
    Allegheny County
    Description

    These datasets provide de-identified insurance data for diabetes. The data is provided by three managed care organizations in Allegheny County (Gateway Health Plan, Highmark Health, and UPMC) and represents their insured population for the 2015 and calendar years. Disclaimer: Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time, as data provided were collected for purposes other than surveillance. Limitations of these data include but are not limited to: misclassification, duplicate individuals, exclusion of individuals who did not seek care in past two years and those who are: uninsured, enrolled in plans not represented in the dataset, or were not enrolled in one of the represented plans for at least 90 days.

  7. Diabetes Health Indicators Dataset

    • kaggle.com
    zip
    Updated Nov 27, 2023
    + more versions
    Cite
    Jullien Nazreen (2023). Diabetes Health Indicators Dataset [Dataset]. https://www.kaggle.com/datasets/julnazz/diabetes-health-indicators-dataset/code
    Explore at:
    zip (5555220 bytes)
    Dataset updated
    Nov 27, 2023
    Authors
    Jullien Nazreen
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Diabetes is a chronic health condition that affects how your body turns food into energy. There are three main types of diabetes: type 1, type 2, and gestational diabetes.

    • Type 1 diabetes is an autoimmune disease that causes your body to attack the cells in your pancreas that produce insulin. Insulin is a hormone that helps your body use glucose for energy.

    • Type 2 diabetes is the most common type of diabetes. It occurs when your body doesn't respond normally to insulin, or when your body doesn't produce enough insulin.

    • Gestational diabetes is a type of diabetes that develops during pregnancy. It usually goes away after the baby is born.

    Prevalence of Diabetes

    According to the CDC BRFSS 2021, 34.1 million adults in the United States have diabetes, or 10.5% of the adult population. This number has been increasing over time. In 2010, 29.1 million adults in the United States had diabetes, or 9.3% of the adult population.

    Content

    The Behavioral Risk Factor Surveillance System (BRFSS) is an ongoing, state-based telephone survey that collects data about health-related risk behaviors, chronic health conditions, and the use of preventive services among adults aged 18 years and older residing in the United States. Conducted annually by the Centers for Disease Control and Prevention (CDC), the BRFSS has been providing valuable insights into the health status and behaviors of U.S. adults since its inception in 1984.

    For this dataset, a csv of the 2021 BRFSS dataset available on Kaggle was used. The original dataset contains responses from 438,693 individuals and has 303 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

    This dataset contains 3 files:

    • diabetes_012_health_indicators_BRFSS2021.csv is a clean dataset of 236,378 survey responses to the CDC's BRFSS2021. The target variable Diabetes_012 has 3 classes. 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes. There is class imbalance in this dataset. This dataset has 21 feature variables.

    • diabetes_binary_5050split_health_indicators_BRFSS2021.csv is a clean dataset of 67,136 survey responses to the CDC's BRFSS2021. It has an equal 50-50 split of respondents with no diabetes and with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is balanced.

    • diabetes_binary_health_indicators_BRFSS2021.csv is a clean dataset of 236,378 survey responses to the CDC's BRFSS2021. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is not balanced.
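
    The description does not say how the 50-50 split file was produced from the unbalanced one. One common way to obtain such a split is random undersampling of the majority class, sketched below under that assumption; the function name `balanced_undersample`, the seed, and the toy labels are all hypothetical.

```python
import random


def balanced_undersample(labels, seed=0):
    """Return sorted row indices with an equal number of 0 and 1 labels."""
    rng = random.Random(seed)
    negatives = [i for i, y in enumerate(labels) if y == 0]
    positives = [i for i, y in enumerate(labels) if y == 1]
    # Keep as many rows per class as the smaller class contains.
    n = min(len(negatives), len(positives))
    keep = rng.sample(negatives, n) + rng.sample(positives, n)
    return sorted(keep)


# Toy Diabetes_binary labels: 8 without diabetes, 2 with prediabetes or diabetes.
labels = [0] * 8 + [1] * 2
kept = balanced_undersample(labels)
kept_labels = [labels[i] for i in kept]
print(len(kept), kept_labels.count(0), kept_labels.count(1))
```

    Undersampling discards majority-class rows, which is consistent with the balanced file being much smaller than the unbalanced one.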

    Acknowledgements

    It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2021 dataset already on Kaggle. That dataset can be found here and the notebook I used for the data cleaning can be found here.

    Inspiration

    Alex Teboul's work cleaning the 2015 BRFSS dataset for machine learning use was the inspiration for creating this dataset and exploring the BRFSS in general.

  8. Diabetes Health Indicators Dataset

    • cubig.ai
    zip
    Updated Jun 5, 2025
    Cite
    CUBIG (2025). Diabetes Health Indicators Dataset [Dataset]. https://cubig.ai/store/products/399/diabetes-health-indicators-dataset
    Explore at:
    zip
    Dataset updated
    Jun 5, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Diabetes Health Indicators Dataset is a large health dataset that collects various health indicators and lifestyle information related to diabetes diagnosis based on health surveys and medical records of the U.S. population.

    2) Data Utilization
    (1) Characteristics of the Diabetes Health Indicators Dataset:
    • The dataset consists of more than 250,000 samples and contains more than 20 health and demographic variables, including diabetes (binary or triage label), age, gender, BMI, blood pressure, cholesterol, smoking and drinking habits, physical activity, mental health, income, and education level.
    (2) Uses of the Diabetes Health Indicators Dataset:
    • Diabetes prediction model development: It can be used to develop machine learning-based classification models that use health indicators and lifestyle data to predict the risk of developing diabetes.
    • Study of the correlation between lifestyle and diabetes: It can be used in epidemiological and public health studies to analyze the effects of various lifestyle and demographic variables such as smoking, drinking, exercise, and eating habits on diabetes incidence.

  9. Diabetes Mellitus death rates by county, 2019-2023 - Dataset - Healthy...

    • midb.uspatial.umn.edu
    Updated Oct 24, 2025
    Cite
    (2025). Diabetes Mellitus death rates by county, 2019-2023 - Dataset - Healthy Communities Data Portal [Dataset]. https://midb.uspatial.umn.edu/hcdp/dataset/diabetes-mellitus-death-rates-by-county-2019-2023
    Explore at:
    Dataset updated
    Oct 24, 2025
    Description

    Diabetes Mellitus death rates by county, all races (includes Hispanic/Latino), all sexes, all ages, 2019-2023. Death data were provided by the National Vital Statistics System. Death rates (deaths per 100,000 population per year) are age-adjusted to the 2000 US standard population (20 age groups: <1, 1-4, 5-9, ... , 80-84, 85-89, 90+). Rates were calculated using SEER*Stat. Population counts for denominators are based on Census populations as modified by the National Cancer Institute. The US Population Data File is used for mortality data. The Average Annual Percent Change is based on the APCs calculated by the Joinpoint Regression Program (Version 4.9.0.0). Due to data availability issues, the time period used in the calculation of the joinpoint regression model may differ for selected counties. Counties with a (3) after their name may have their joinpoint regression model calculated using a different time period due to data availability issues.
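
    Age adjustment of the kind described here is usually done by direct standardization: each age group's death rate is weighted by that group's share of a standard population. The sketch below illustrates the arithmetic with three made-up age groups instead of the 20 groups of the 2000 US standard population; the function name `age_adjusted_rate` and every number in it are hypothetical, and it is not a reproduction of the SEER*Stat calculation.

```python
def age_adjusted_rate(deaths, population, standard_population, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates."""
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("all inputs need one entry per age group")
    total_standard = sum(standard_population)
    rate = 0.0
    for d, p, s in zip(deaths, population, standard_population):
        age_specific = d / p * per      # deaths per `per` people in this age group
        weight = s / total_standard     # this group's share of the standard population
        rate += age_specific * weight
    return rate


# Hypothetical county with three age groups (young, middle, old).
deaths = [2, 10, 50]
population = [10_000, 20_000, 10_000]
standard_population = [500, 300, 200]

crude = sum(deaths) / sum(population) * 100_000
adjusted = age_adjusted_rate(deaths, population, standard_population)
print(f"crude: {crude:.1f}, age-adjusted: {adjusted:.1f}")
```

    In this toy case the crude rate is higher than the adjusted rate because the county is older than the standard population, which is the distortion that age adjustment is meant to remove when comparing counties.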

  10. Cost of Diabetes in USA

    • kaggle.com
    zip
    Updated May 12, 2023
    Cite
    Majid Ahmad Khan (2023). Cost of Diabetes in USA [Dataset]. https://www.kaggle.com/datasets/i191796majid/cost-of-diabetes-in-usa
    Explore at:
    zip (931703 bytes)
    Dataset updated
    May 12, 2023
    Authors
    Majid Ahmad Khan
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset is well-structured, hence convenient and time-saving. You need to go through preprocessing and visualization steps necessary for the machine learning process. The dataset description is mentioned in the "Attribute_Description.csv" file.

  11. Indicators of Heart Disease (2022 UPDATE)

    • kaggle.com
    zip
    Updated Oct 12, 2023
    Cite
    Kamil Pytlak (2023). Indicators of Heart Disease (2022 UPDATE) [Dataset]. https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease/discussion
    Explore at:
    zip (22474335 bytes)
    Dataset updated
    Oct 12, 2023
    Authors
    Kamil Pytlak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Key Indicators of Heart Disease

    2022 annual CDC survey data of 400k+ adults related to their health status

    What subject does the dataset cover?

    According to the CDC, heart disease is a leading cause of death for people of most races in the U.S. (African Americans, American Indians and Alaska Natives, and whites). About half of all Americans (47%) have at least 1 of 3 major risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetes status, obesity (high BMI), not getting enough physical activity, or drinking too much alcohol. Identifying and preventing the factors that have the greatest impact on heart disease is very important in healthcare. In turn, developments in computing allow the application of machine learning methods to detect "patterns" in the data that can predict a patient's condition.

    Where did the data set come from and what treatments has it undergone?

    The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to collect data on the health status of U.S. residents. As described by the CDC: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset includes data from 2023. In this dataset, I noticed many factors (questions) that directly or indirectly influence heart disease, so I decided to select the most relevant variables from it. I also decided to share with you two versions of the most recent dataset: with NaNs and without them.

    What can you do with this data set?

    As described above, the original dataset of nearly 300 variables was reduced to 40 variables. In addition to classical EDA, this dataset can be used to apply a number of machine learning methods, especially classifier models (logistic regression, SVM, random forest, etc.). You should treat the variable "HadHeartAttack" as binary ("Yes" - respondent had heart disease; "No" - respondent did not have heart disease). Note, however, that the classes are unbalanced, so the classic approach of applying a model is not advisable. Fixing the weights/undersampling should yield much better results. Based on the data set, I built a logistic regression model and embedded it in an application that might inspire you: https://share.streamlit.io/kamilpytlak/heart-condition-checker/main/app.py. Can you indicate which variables have a significant effect on the likelihood of heart disease?
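
    The remark about fixing the weights for unbalanced classes can be illustrated with inverse-frequency class weights. This is a sketch of one common heuristic (each class weighted by n_samples / (n_classes * class_count)), not necessarily what the author used; the function name `balanced_class_weights` and the toy "HadHeartAttack" labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of the class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy "HadHeartAttack" column: 9 "No" answers and 1 "Yes" answer.
labels = ["No"] * 9 + ["Yes"] * 1
weights = balanced_class_weights(labels)
print(weights)
```

    With these weights the rare "Yes" class counts nine times as much per row as the "No" class, so both classes contribute equally to a weighted loss.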

    What steps did you use to convert the dataset?

    Check out this notebook in my GitHub repository: https://github.com/kamilpytlak/data-science-projects/blob/main/heart-disease-prediction/2022/notebooks/data_processing.ipynb

  12. Diabetes in Adults - CDPHE Community Level Estimates (Census Tracts)

    • data-cdphe.opendata.arcgis.com
    • hub.arcgis.com
    Updated May 12, 2016
    Colorado Department of Public Health and Environment (2016). Diabetes in Adults - CDPHE Community Level Estimates (Census Tracts) [Dataset]. https://data-cdphe.opendata.arcgis.com/datasets/diabetes-in-adults-cdphe-community-level-estimates-census-tracts
    Dataset updated
    May 12, 2016
    Dataset authored and provided by
    Colorado Department of Public Health and Environment (https://cdphe.colorado.gov/)
    Area covered
    Description

    These data represent the predicted (modeled) prevalence of Diabetes among adults (Age 18+) for each census tract in Colorado. Diabetes is defined as ever being diagnosed with Diabetes by a doctor, nurse, or other health professional, and this definition does not include gestational, borderline, or pre-diabetes. The estimate for each census tract represents an average that was derived from multiple years of Colorado Behavioral Risk Factor Surveillance System data (2014-2017). CDPHE used a model-based approach to measure the relationship between age, race, gender, poverty, education, location and health conditions or risk behavior indicators and applied this relationship to predict the number of persons who have the health conditions or risk behavior for each census tract in Colorado. We then applied these probabilities, based on demographic stratification, to the 2013-2017 American Community Survey population estimates and determined the percentage of adults with the health conditions or risk behavior for each census tract in Colorado. The estimates are based on statistical models and are not direct survey estimates. Using the best available data, CDPHE was able to model census tract estimates based on demographic data and background knowledge about the distribution of specific health conditions and risk behaviors. The estimates are displayed in both the map and data table using point estimate values for each census tract and displayed using a Quintile range. The high and low value for each color on the map is calculated based on dividing the total number of census tracts in Colorado (1249) into five groups based on the total range of estimates for all Colorado census tracts. Each Quintile range represents roughly 20% of the census tracts in Colorado. No estimates are provided for census tracts with a known population of less than 50.
    These census tracts are displayed in the map as "No Est, Pop < 50." No estimates are provided for 7 census tracts with a known population of less than 50 or for the 2 census tracts that exclusively contain a federal correctional institution as 100% of their population. These 9 census tracts are displayed in the map as "No Estimate."
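    The quintile grouping described above can be sketched as follows. This is a minimal illustration under assumptions: it reads "each Quintile range represents roughly 20% of the census tracts" as equal-count bins, the function name `quintile_labels` is hypothetical, the sample values are invented, and tracts without an estimate are represented as `None`. CDPHE's actual procedure may differ.

```python
from bisect import bisect_left
from statistics import quantiles


def quintile_labels(estimates):
    """Return a quintile index (1-5) per estimate; None stays None.

    Hypothetical helper: cut points are computed only from tracts that
    have an estimate, so each bin holds roughly 20% of those tracts.
    """
    valid = [e for e in estimates if e is not None]
    cuts = quantiles(valid, n=5, method="inclusive")  # four cut points
    return [
        None if e is None else bisect_left(cuts, e) + 1
        for e in estimates
    ]


# Invented prevalence estimates (%); the last tract has no estimate.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None]
labels = quintile_labels(sample)
print(labels)
```

    An estimate that falls exactly on a cut point is placed in the lower bin here; that tie rule is a design choice of the sketch, not something the description specifies.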

  13. Tandem Diabetes Care Inc - Gross-Profit

    • macro-rankings.com
    csv, excel
    Updated Aug 24, 2025
    macro-rankings (2025). Tandem Diabetes Care Inc - Gross-Profit [Dataset]. https://www.macro-rankings.com/markets/stocks/tndm-nasdaq/income-statement/gross-profit
    Available download formats: excel, csv
    Dataset updated
    Aug 24, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Gross-Profit Time Series for Tandem Diabetes Care Inc. Tandem Diabetes Care, Inc. designs, develops, and commercializes technology solutions for people living with diabetes in the United States and internationally. The company's flagship product is the t:slim X2 insulin delivery system; and Tandem Mobi insulin pump, an automated insulin delivery system. It also sells single-use products, including cartridges for storing and delivering insulin, and infusion sets that connect the insulin pump to the user's body. In addition, the company offers Tandem Device Updater, used to update the pump software from a personal computer; Tandem Source, a web-based data management platform that provides a visual way to display diabetes therapy management data from the pumps and integrated CGMs; and Sugarmate, a mobile app used to help people visualize diabetes therapy data. It has a collaboration agreement with the University of Virginia Center for Diabetes Technology for research and development of fully automated closed-loop insulin delivery systems. The company was formerly known as Phluid Inc. and changed its name to Tandem Diabetes Care, Inc. in January 2008. Tandem Diabetes Care, Inc. was incorporated in 2006 and is headquartered in San Diego, California.

  14. Tandem Diabetes Care Inc - Net-Income-Including-Non-Controlling-Interests

    • macro-rankings.com
    csv, excel
    Updated Aug 23, 2025
    macro-rankings (2025). Tandem Diabetes Care Inc - Net-Income-Including-Non-Controlling-Interests [Dataset]. https://www.macro-rankings.com/markets/stocks/tndm-nasdaq/income-statement/net-income-including-non-controlling-interests
    Available download formats: csv, excel
    Dataset updated
    Aug 23, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Net-Income-Including-Non-Controlling-Interests Time Series for Tandem Diabetes Care Inc. Tandem Diabetes Care, Inc. designs, develops, and commercializes technology solutions for people living with diabetes in the United States and internationally. The company's flagship product is the t:slim X2 insulin delivery system; and Tandem Mobi insulin pump, an automated insulin delivery system. It also sells single-use products, including cartridges for storing and delivering insulin, and infusion sets that connect the insulin pump to the user's body. In addition, the company offers Tandem Device Updater, used to update the pump software from a personal computer; Tandem Source, a web-based data management platform that provides a visual way to display diabetes therapy management data from the pumps and integrated CGMs; and Sugarmate, a mobile app used to help people visualize diabetes therapy data. It has a collaboration agreement with the University of Virginia Center for Diabetes Technology for research and development of fully automated closed-loop insulin delivery systems. The company was formerly known as Phluid Inc. and changed its name to Tandem Diabetes Care, Inc. in January 2008. Tandem Diabetes Care, Inc. was incorporated in 2006 and is headquartered in San Diego, California.

  15. Table_2_Association of HIV-1 Infection and Antiretroviral Therapy With Type...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Jun 10, 2023
    Juan Carlos Lopez-Alvarenga; Dora A. Martinez; Alvaro Diaz-Badillo; Liza D. Morales; Rector Arya; Christopher P. Jenkinson; Joanne E. Curran; Donna M. Lehman; John Blangero; Ravindranath Duggirala; Srinivas Mummidi; Ruben D. Martinez (2023). Table_2_Association of HIV-1 Infection and Antiretroviral Therapy With Type 2 Diabetes in the Hispanic Population of the Rio Grande Valley, Texas, USA.docx [Dataset]. http://doi.org/10.3389/fmed.2021.676979.s003
    Available download formats: docx
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Juan Carlos Lopez-Alvarenga; Dora A. Martinez; Alvaro Diaz-Badillo; Liza D. Morales; Rector Arya; Christopher P. Jenkinson; Joanne E. Curran; Donna M. Lehman; John Blangero; Ravindranath Duggirala; Srinivas Mummidi; Ruben D. Martinez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lower Rio Grande Valley, Texas, United States
    Description

    The Rio Grande Valley (RGV) in South Texas has one of the highest prevalences of obesity and type 2 diabetes (T2D) in the United States (US). We report for the first time the T2D prevalence in persons with HIV (PWH) in the RGV and the interrelationship between T2D, cardiometabolic risk factors, HIV-related indices, and antiretroviral therapies (ART). The PWH in this study received medical care at Valley AIDS Council (VAC) clinic sites located in Harlingen and McAllen, Texas. Henceforth, this cohort will be referred to as Valley AIDS Council Cohort (VACC). Cross-sectional analyses were conducted using retrospective data obtained from 1,827 registries. It included demographic and anthropometric variables, cardiometabolic traits, and HIV-related virological and immunological indices. For descriptive statistics, we used mean values of the quantitative variables from unbalanced visits across 20 months. Robust regression methods were used to determine the associations. For comparisons, we used cardiometabolic trait data obtained from HIV-uninfected San Antonio Mexican American Family Studies (SAMAFS; N = 2,498), and the Mexican American population in the National Health and Nutrition Examination Survey (HHANES; N = 5,989). The prevalence of T2D in VACC was 51% compared to 27% in SAMAFS and 19% in HHANES, respectively. The PWH with T2D in VACC were younger (4.7 years) and had lower BMI (BMI 2.43 units less) when compared to SAMAFS individuals. In contrast, VACC individuals had increased blood pressure and dyslipidemia. The increased T2D prevalence in VACC was independent of BMI. Within the VACC, ART was associated with viral load and CD4+ T cell counts but not with metabolic dysfunction. Notably, we found that individuals with any INSTI combination had higher T2D risk: OR 2.08 (95%CI 1.67, 2.6; p < 0.001). In summary, our results suggest that VACC individuals may develop T2D at younger ages independent of obesity.
    The high burden of T2D in these individuals necessitates rigorously designed longitudinal studies to draw potential causal inferences and develop better treatment regimens.

  16. North America Self-monitoring Blood Glucose Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 26, 2025
    Data Insights Market (2025). North America Self-monitoring Blood Glucose Market Report [Dataset]. https://www.datainsightsmarket.com/reports/north-america-self-monitoring-blood-glucose-market-8037
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jul 26, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    North America
    Variables measured
    Market Size
    Description

    The size of the North America Self-monitoring Blood Glucose Market was valued at USD 8.10 Million in 2023 and is projected to reach USD 12.99 Million by 2032, with an expected CAGR of 6.98% during the forecast period. Recent developments include: May 2023: LifeScan announced positive data from a study of real-world evidence supporting its Bluetooth-connected blood glucose meter. Evidence from more than 55,000 people with diabetes demonstrated sustained improvements in readings in range. The analysis focuses on changes over 180 days. LifeScan published results in the peer-reviewed journal Diabetes Therapy. The company’s OneTouch Bluetooth-connected blood glucose meter and mobile diabetes app provide simplicity, accuracy, and trust. January 2023: LifeScan announced that the peer-reviewed Journal of Diabetes Science and Technology published "Improved Glycemic Control Using a Bluetooth-Connected Blood Glucose Meter and a Mobile Diabetes App: Real-World Evidence from Over 144,000 People With Diabetes", detailing results from a retrospective analysis of real-world data from over 144,000 people with diabetes, one of the largest combined blood glucose meter and mobile diabetes app datasets ever published. Key drivers for this market are: Rising Prevalence of Cancer Worldwide, Technological Advancements in Diagnostic Testing; Increasing Demand for Point-of-care Treatment. Potential restraints include: High Cost of Molecular Diagnostic Tests, Lack of Skilled Workforce and Stringent Regulatory Framework. Notable trends are: Blood Glucose Test Strips Held the Largest Market Share in Current Year.
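    The growth figures quoted above can be checked against the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is illustrative only: the function name `cagr` is hypothetical, and the listing does not say how many compounding years the 6.98% figure assumes, so the year counts passed in are assumptions.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Values from the listing (USD million); the year counts are assumptions.
for years in (7, 9):
    rate = cagr(8.10, 12.99, years)
    print(f"{years} compounding years: {rate:.2%}")
```

    Under these inputs, seven compounding periods give roughly 7.0%, close to the quoted 6.98%, while counting nine years from 2023 to 2032 gives roughly 5.4%. The report's own forecast window is listed as 2025 - 2033, so the quoted rate may rest on a different base year than the 2023 valuation.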

  17. Air Pollution and Health in the Jackson Heart Study: a Cohort of African...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jan 24, 2022
    U.S. EPA Office of Research and Development (ORD) (2022). Air Pollution and Health in the Jackson Heart Study: a Cohort of African Americans in Jackson, Mississippi [Dataset]. https://catalog.data.gov/dataset/air-pollution-and-health-in-the-jackson-heart-study-a-cohort-of-african-americans-in-jacks
    Dataset updated
    Jan 24, 2022
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    Mississippi, Jackson
    Description

    Data include individual-level health data, including results from cardiovascular tests and medical history. This is linked to air quality data at participants' residence. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Data may be requested through the Jackson Heart Study. Format: Data include individual-level health data, including results from cardiovascular tests and medical history. This is linked to air quality data at participants' residence. Since these data contain PII, they cannot be released to ScienceHub. This dataset is associated with the following publications: Weaver, A., A. Bidulescu, G. Wellenius, D. Hickson, M. Sims, A. Vaidyanathan, W. Wu, A. Correa, and Y. Wang. Associations between Air Pollution Indicators and Prevalent and Incident Diabetes among African American Participants in the Jackson Heart Study. Environmental Epidemiology. Wolters Kluwer, Alphen aan den Rijn, NETHERLANDS, 5(3): e140, (2021). Weaver, A., Y. Wang, G. Wellenius, A. Bidulescu, M. Sims, A. Vaidyanathan, D. Hickson, D. Shimbo, M. Abdalla, K. Diaz, and S. Seals. Long-Term Air Pollution and Blood Pressure in an African American Cohort: The Jackson Heart Study. American Journal of Preventive Medicine. Elsevier B.V., Amsterdam, NETHERLANDS, 60(3): 397-405, (2021).

  18. Association of a low-frequency variant in HNF1A with type 2 diabetes in a...

    • omicsdi.org
    xml
    SIGMA Type 2 Diabetes Consortium, Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. [Dataset]. https://www.omicsdi.org/dataset/biostudies/S-EPMC4425850
    Available download formats: xml
    Authors
    SIGMA Type 2 Diabetes Consortium
    Variables measured
    Unknown
    Description

    Latino populations have one of the highest prevalences of type 2 diabetes worldwide. To investigate the association between rare protein-coding genetic variants and prevalence of type 2 diabetes in a large Latino population and to explore potential molecular and physiological mechanisms for the observed relationships. Whole-exome sequencing was performed on DNA samples from 3756 Mexican and US Latino individuals (1794 with type 2 diabetes and 1962 without diabetes) recruited from 1993 to 2013. One variant was further tested for allele frequency and association with type 2 diabetes in large multiethnic data sets of 14,276 participants and characterized in experimental assays. Prevalence of type 2 diabetes. Secondary outcomes included age of onset, body mass index, and effect on protein function. A single rare missense variant (c.1522G>A [p.E508K]) was associated with type 2 diabetes prevalence (odds ratio [OR], 5.48; 95% CI, 2.83-10.61; P = 4.4 × 10^-7) in hepatocyte nuclear factor 1-α (HNF1A), the gene responsible for maturity onset diabetes of the young type 3 (MODY3). This variant was observed in 0.36% of participants without type 2 diabetes and 2.1% of participants with it. In multiethnic replication data sets, the p.E508K variant was seen only in Latino patients (n = 1443 with type 2 diabetes and 1673 without it) and was associated with type 2 diabetes (OR, 4.16; 95% CI, 1.75-9.92; P = .0013). In experimental assays, HNF-1A protein encoding the p.E508K mutant demonstrated reduced transactivation activity of its target promoter compared with a wild-type protein. In our data, carriers and noncarriers of the p.E508K mutation with type 2 diabetes had no significant differences in compared clinical characteristics, including age at onset.
    The mean (SD) age for carriers was 45.3 years (11.2) vs 47.5 years (11.5) for noncarriers (P = .49) and the mean (SD) BMI for carriers was 28.2 (5.5) vs 29.3 (5.3) for noncarriers (P = .19). Using whole-exome sequencing, we identified a single low-frequency variant in the MODY3-causing gene HNF1A that is associated with type 2 diabetes in Latino populations and may affect protein function. This finding may have implications for screening and therapeutic modification in this population, but additional studies are required.

  19. Sample sizes of diabetes patients with COVID-19 hospitalization across...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Sep 28, 2023
    Ni Luh Putu S. P. Paramita; Joseph K. Agor; Maria E. Mayorga; Julie S. Ivy; Kristen E. Miller; Osman Y. Ozaltin (2023). Sample sizes of diabetes patients with COVID-19 hospitalization across different demographic groups. [Dataset]. http://doi.org/10.1371/journal.pone.0286815.t005
    Available download formats: xls
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ni Luh Putu S. P. Paramita; Joseph K. Agor; Maria E. Mayorga; Julie S. Ivy; Kristen E. Miller; Osman Y. Ozaltin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample sizes of diabetes patients with COVID-19 hospitalization across different demographic groups.

  20. Embecta Corp - Research-and-Development

    • macro-rankings.com
    csv, excel
    Updated Nov 13, 2025
    macro-rankings (2025). Embecta Corp - Research-and-Development [Dataset]. https://www.macro-rankings.com/markets/stocks/embc-nasdaq/income-statement/research-and-development
    Available download formats: csv, excel
    Dataset updated
    Nov 13, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Research-and-Development Time Series for Embecta Corp. Embecta Corp., a medical device company, focuses on the provision of various solutions to enhance the health and wellbeing of people living with diabetes in the United States and internationally. The company's products include pen needles, syringes, and safety injection devices, as well as digital applications to assist people with managing their diabetes. It primarily sells its products to wholesalers and distributors. The company was formerly known as Berra Newco, Inc. Embecta Corp. was founded in 1924 and is headquartered in Parsippany, New Jersey.
