Population-based county-level estimates for the prevalence of diabetes control (DC) were obtained from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (16). The DC prevalence rate was defined as the proportion of people within a county who had previously been diagnosed with diabetes (high fasting plasma glucose ≥126 mg/dL, hemoglobin A1c (HbA1c) ≥6.5%, or a diabetes diagnosis) but who do not currently have high fasting plasma glucose or HbA1c, for the period 2004-2012. DC prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or HbA1c levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (16). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1c status for each BRFSS respondent (16). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of diabetes-related outcomes, including DC (16).
The Environmental Quality Index (EQI) was constructed for 2006-2010 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables representing the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural-urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).
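To make the DC definition concrete, here is a minimal per-respondent sketch. It assumes each respondent is described by a prior-diagnosis flag plus current fasting plasma glucose (mg/dL) and HbA1c (%) values; the function names are hypothetical. The IHME county estimates come from the two-stage imputation and small-area models described above, which this sketch does not reproduce.

```python
from typing import Optional

# Diagnostic thresholds quoted in the description above.
FPG_THRESHOLD_MG_DL = 126.0
HBA1C_THRESHOLD_PCT = 6.5


def has_high_glycemia(fpg_mg_dl: Optional[float], hba1c_pct: Optional[float]) -> bool:
    """True if either available measurement meets its threshold; None means not measured."""
    high_fpg = fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL
    high_a1c = hba1c_pct is not None and hba1c_pct >= HBA1C_THRESHOLD_PCT
    return high_fpg or high_a1c


def is_diabetes_controlled(
    previously_diagnosed: bool,
    fpg_mg_dl: Optional[float],
    hba1c_pct: Optional[float],
) -> bool:
    """DC: previously diagnosed, but neither current FPG nor current HbA1c is high."""
    return previously_diagnosed and not has_high_glycemia(fpg_mg_dl, hba1c_pct)


# Hypothetical respondents: (previously diagnosed, FPG, HbA1c)
print(is_diabetes_controlled(True, 110.0, 6.1))   # True: diagnosed, both values below threshold
print(is_diabetes_controlled(True, 140.0, 6.1))   # False: FPG still high
print(is_diabetes_controlled(False, 100.0, 5.5))  # False: never diagnosed
```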
Results are reported as prevalence rate differences (PRDs) with 95% confidence intervals (CIs), comparing the highest quintile (worst environmental quality) to the lowest quintile (best environmental quality) of each exposure metric. PRDs are representative of the entire period of interest, 2004-2012. Because of the availability of DC data and covariate data, not all counties were captured; however, the majority (3,134 of 3,142) were used in the analysis. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, K. Price, D. Lobdell, and R. Sargis. Diabetes control is associated with environmental quality in the USA. Endocrine Connections. BioScientifica Ltd., Bristol, UK, 10(9): 1018-1026, (2021).
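A prevalence rate difference is the worst-quintile prevalence minus the best-quintile prevalence. The sketch below computes an unadjusted version with a normal-approximation 95% CI from county prevalences. It is an illustration under that simplifying assumption, not the adjusted model used in the associated publication, whose specification is not given here; the input values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def prevalence_rate_difference(
    worst: List[float], best: List[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Unadjusted PRD (worst minus best quintile) with a normal-approximation CI.

    `worst` and `best` are county prevalence percentages for counties in the
    highest (worst) and lowest (best) EQI quintiles.
    """
    diff = statistics.fmean(worst) - statistics.fmean(best)
    # Standard error of a difference between two independent means (unequal variances).
    se = math.sqrt(
        statistics.variance(worst) / len(worst) + statistics.variance(best) / len(best)
    )
    return diff, diff - z * se, diff + z * se


# Hypothetical county prevalences (%), for illustration only.
prd, lower, upper = prevalence_rate_difference([10.0, 12.0, 11.0], [8.0, 9.0, 10.0])
print(f"PRD = {prd:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# PRD = 2.00 (95% CI 0.40 to 3.60)
```

A regression-based PRD would additionally adjust for covariates; the unadjusted form only shows what the reported quantity measures.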
Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or HbA1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each diabetes-related outcome (Dwyer-Lindgren, Mackenbach et al. 2016). Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as the proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome. The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables representing the environmental quality of that domain.
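Direct age standardization is the usual way to produce an age-standardized prevalence percentage like the outcome used here: age-specific prevalences are averaged using a fixed standard population's age distribution, so that differences between counties are not driven by age structure alone. A minimal sketch follows; the age groups and weights are hypothetical, and the standard population IHME used is not specified in this description.

```python
from typing import Dict


def age_standardized_prevalence(
    stratum_prevalence: Dict[str, float], standard_weights: Dict[str, float]
) -> float:
    """Direct standardization: weight age-specific prevalence by a standard population.

    Weights need not sum to 1; they are normalized here.
    """
    missing = set(standard_weights) - set(stratum_prevalence)
    if missing:
        raise KeyError(f"no prevalence for age groups: {sorted(missing)}")
    total_weight = sum(standard_weights.values())
    weighted = sum(stratum_prevalence[age] * w for age, w in standard_weights.items())
    return weighted / total_weight


# Hypothetical age groups, prevalences (%), and standard-population weights.
prevalence = {"20-44": 4.0, "45-64": 12.0, "65+": 25.0}
weights = {"20-44": 0.5, "45-64": 0.3, "65+": 0.2}
print(age_standardized_prevalence(prevalence, weights))
```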
Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural-urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015). This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files. This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
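The two-stage PCA used to build the EQI can be sketched with NumPy. Assumptions, none of which are stated above: variables are standardized before each PCA, the first principal component is retained at each stage, and its scores serve as the index. The sign of a principal component is arbitrary, so a real index would need to be oriented (for example, so that higher values mean worse quality). To mirror the RUCC stratification, the same function would be run separately on the counties within each RUCC stratum. Data and domain sizes are hypothetical.

```python
from typing import Dict, Tuple

import numpy as np


def first_pc_scores(x: np.ndarray) -> np.ndarray:
    """Scores on the first principal component of column-standardized data.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0]


def build_eqi(domains: Dict[str, np.ndarray]) -> Tuple[Dict[str, np.ndarray], np.ndarray]:
    """Two-stage PCA: one index per domain, then an overall index from the domain indices."""
    domain_scores = {name: first_pc_scores(x) for name, x in domains.items()}
    overall = first_pc_scores(np.column_stack(list(domain_scores.values())))
    return domain_scores, overall


# Hypothetical data: 100 counties, a few variables per domain.
rng = np.random.default_rng(0)
domains = {
    name: rng.normal(size=(100, k))
    for name, k in [("air", 4), ("water", 3), ("land", 3), ("built", 2), ("sociodemographic", 5)]
}
domain_eqi, overall_eqi = build_eqi(domains)
print(overall_eqi.shape)  # (100,)
```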
License: https://creativecommons.org/publicdomain/zero/1.0/
Context
Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.
Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.
The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans have diabetes, while 88 million have prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.
Content
The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, an XPT file of the 2023 dataset available on the CDC website was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants or calculated variables based on individual participant responses.
I selected 20 features from this dataset that are relevant to diabetes and saved them in a CSV file without making any changes to the data, so that the data are easier to work with. For more information or to access updated data, refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries; that dataset contains 330 columns (features). Because every full record in the original is unique, the duplicate rows in this reduced dataset are not errors: they represent different individuals who gave the same answers to the 20 selected questions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
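The reasoning about duplicates can be checked directly: rows that are unique across all columns can still coincide once only a few columns are kept. A minimal standard-library sketch, with hypothetical column names (respondent_id, high_bp, smoker) standing in for the real BRFSS variables:

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def project(records: Iterable[Dict[str, object]], columns: Sequence[str]) -> List[Tuple[object, ...]]:
    """Keep only the chosen columns, as hashable tuples."""
    return [tuple(record[c] for c in columns) for record in records]


def count_duplicates(rows: Iterable[Tuple[object, ...]]) -> int:
    """Number of rows that repeat an earlier row exactly."""
    return sum(n - 1 for n in Counter(rows).values())


# Hypothetical respondents: unique as full records, identical on two questions.
records = [
    {"respondent_id": 1, "high_bp": 1, "smoker": 0},
    {"respondent_id": 2, "high_bp": 1, "smoker": 0},
    {"respondent_id": 3, "high_bp": 0, "smoker": 1},
]
print(count_duplicates(project(records, ["respondent_id", "high_bp", "smoker"])))  # 0
print(count_duplicates(project(records, ["high_bp", "smoker"])))  # 1
```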
Explore some of the following research questions:
- Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
- What risk factors are most predictive of diabetes risk?
- Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
- Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
Acknowledgements
It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset disclose individuals' identities.
Inspiration
Zidian Xie et al., for "Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques" using the 2014 BRFSS, and Alex Teboul, for building the Diabetes Health Indicators dataset based on BRFSS 2015, were the inspiration for creating this dataset and exploring the BRFSS in general.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Age-standardised incidence rates per 100,000 individuals per year, with 95% confidence intervals. For cells labeled NA, 95% CIs could not be estimated because there was only one data point.
License: https://creativecommons.org/publicdomain/zero/1.0/
Diabetes is among the most prevalent chronic diseases in the United States, impacting millions of Americans each year and exerting a significant financial burden on the economy. Diabetes is a serious chronic disease in which individuals lose the ability to effectively regulate levels of glucose in the blood, and can lead to reduced quality of life and life expectancy. After different foods are broken down into sugars during digestion, the sugars are then released into the bloodstream. This signals the pancreas to release insulin. Insulin helps enable cells within the body to use those sugars in the bloodstream for energy. Diabetes is generally characterized by either the body not making enough insulin or being unable to use the insulin that is made as effectively as needed.
Complications like heart disease, vision loss, lower-limb amputation, and kidney disease are associated with chronically high levels of sugar remaining in the bloodstream for those with diabetes. While there is no cure for diabetes, strategies like losing weight, eating healthily, being active, and receiving medical treatments can mitigate the harms of this disease in many patients. Early diagnosis can lead to lifestyle changes and more effective treatment, making predictive models for diabetes risk important tools for the public and public health officials.
The scale of this problem is also important to recognize. The Centers for Disease Control and Prevention has indicated that as of 2018, 34.2 million Americans have diabetes and 88 million have prediabetes. Furthermore, the CDC estimates that 1 in 5 diabetics and roughly 8 in 10 prediabetics are unaware of their risk. While there are different types of diabetes, type II diabetes is the most common form, and its prevalence varies by age, education, income, location, race, and other social determinants of health. Much of the burden of the disease also falls on those of lower socioeconomic status. Diabetes also places a massive burden on the economy, with diagnosed diabetes costs of roughly $327 billion and total costs, including undiagnosed diabetes and prediabetes, approaching $400 billion annually.
The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, a csv of the dataset available on Kaggle for the year 2015 was used. This original dataset contains responses from 441,455 individuals and has 330 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
This dataset contains 3 files:
1. diabetes_012_health_indicators_BRFSS2015.csv is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_012 has 3 classes: 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes. There is class imbalance in this dataset. This dataset has 21 feature variables.
2. diabetes_binary_5050split_health_indicators_BRFSS2015.csv is a clean dataset of 70,692 survey responses to the CDC's BRFSS2015. It has an equal 50-50 split of respondents with no diabetes and with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes: 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is balanced.
3. diabetes_binary_health_indicators_BRFSS2015.csv is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_binary has 2 classes: 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is not balanced.
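The cleaning notebook is not reproduced here, so the following is only a sketch of how the two binary files relate to the three-class file. It assumes the class mapping stated above (1 and 2 both become 1) and simple random undersampling of the majority class for the 50-50 file; apart from Diabetes_012 and Diabetes_binary, the names are hypothetical.

```python
import random
from typing import Dict, List


def to_binary(diabetes_012: int) -> int:
    """Collapse Diabetes_012 into Diabetes_binary as described: 0 stays 0; 1 and 2 become 1."""
    if diabetes_012 not in (0, 1, 2):
        raise ValueError(f"unexpected Diabetes_012 value: {diabetes_012!r}")
    return 0 if diabetes_012 == 0 else 1


def balanced_undersample(
    rows: List[Dict[str, int]], label: str = "Diabetes_binary", seed: int = 0
) -> List[Dict[str, int]]:
    """Randomly drop rows from the larger class until both classes are the same size."""
    rng = random.Random(seed)
    positives = [r for r in rows if r[label] == 1]
    negatives = [r for r in rows if r[label] == 0]
    n = min(len(positives), len(negatives))
    sample = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(sample)
    return sample


# Hypothetical rows: 7 without diabetes, 2 with prediabetes, 1 with diabetes.
raw = [0] * 7 + [1] * 2 + [2]
rows = [{"Diabetes_binary": to_binary(v)} for v in raw]
balanced = balanced_undersample(rows)
print(len(balanced), sum(r["Diabetes_binary"] for r in balanced))  # 6 3
```

Undersampling discards majority-class rows, which is why the balanced file is much smaller than the unbalanced one.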
Explore some of the following research questions:
1. Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
2. What risk factors are most predictive of diabetes risk?
3. Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
4. Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2015 dataset already on Kaggle. That dataset can be found here, and the notebook I used for the data cleaning can be found here.
Zidian Xie et al fo...
These data represent the predicted (modeled) prevalence of diabetes among adults (age 18+) for each census tract in Colorado. Diabetes is defined as ever being diagnosed with diabetes by a doctor, nurse, or other health professional; this definition does not include gestational, borderline, or pre-diabetes. The estimate for each census tract represents an average derived from multiple years of Colorado Behavioral Risk Factor Surveillance System data (2014-2017). The Colorado Department of Public Health and Environment (CDPHE) used a model-based approach to measure the relationship between age, race, gender, poverty, education, location, and health conditions or risk behavior indicators, and applied this relationship to predict the number of persons who have the health conditions or risk behavior in each census tract in Colorado. We then applied these probabilities, based on demographic stratification, to the 2013-2017 American Community Survey population estimates and determined the percentage of adults with the health conditions or risk behavior in each census tract. The estimates are based on statistical models and are not direct survey estimates. Using the best available data, CDPHE was able to model census tract estimates based on demographic data and background knowledge about the distribution of specific health conditions and risk behaviors.
The estimates are displayed in both the map and the data table as point estimates for each census tract, grouped into quintile ranges. The high and low values for each map color are calculated by dividing the census tracts in Colorado (1,249) into five groups based on the total range of estimates for all Colorado census tracts, so each quintile range represents roughly 20% of the census tracts. No estimates are provided for census tracts with a known population of less than 50; these census tracts are displayed in the map as "No Est, Pop < 50."
No estimates are provided for 7 census tracts with a known population of less than 50 or for the 2 census tracts that exclusively contain a federal correctional institution as 100% of their population. These 9 census tracts are displayed in the map as "No Estimate."
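Two steps in the description above can be sketched in a few lines: applying modeled per-stratum probabilities to population counts (post-stratification), and grouping tract estimates into quintile ranges while suppressing tracts with a known population under 50. This is a hedged illustration: CDPHE's actual model, strata, and break method are not given here, and the strata and tract names are hypothetical. Because each range is said to hold roughly 20% of tracts, the sketch uses quantile breaks (Python's `statistics.quantiles`) rather than equal-width intervals. The two correctional-facility tracts would need an explicit exclusion list, which is not shown.

```python
import bisect
import statistics
from typing import Dict

NO_ESTIMATE = "No Estimate"


def tract_prevalence(stratum_prob: Dict[str, float], stratum_pop: Dict[str, int]) -> float:
    """Post-stratification: apply modeled per-stratum probabilities to population counts.

    Returns the percentage of adults in the tract expected to have the condition.
    """
    total = sum(stratum_pop.values())
    expected_cases = sum(stratum_prob[s] * n for s, n in stratum_pop.items())
    return 100.0 * expected_cases / total


def quintile_classes(
    estimates: Dict[str, float], populations: Dict[str, int], min_pop: int = 50
) -> Dict[str, str]:
    """Assign each tract to Q1-Q5 using quantile breaks; suppress small tracts."""
    eligible = {t: v for t, v in estimates.items() if populations[t] >= min_pop}
    breaks = statistics.quantiles(eligible.values(), n=5)  # four cut points
    classes = {t: NO_ESTIMATE for t in estimates if t not in eligible}
    for tract, value in eligible.items():
        classes[tract] = f"Q{bisect.bisect_right(breaks, value) + 1}"
    return classes


# Hypothetical strata for one tract: modeled probability and adult population count.
print(tract_prevalence({"18-44": 0.03, "45+": 0.12}, {"18-44": 600, "45+": 400}))
```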
License: Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Summary
The Pubmed Diabetes dataset consists of 19,717 scientific publications from the PubMed database pertaining to diabetes, classified into one of three classes. The classes are as follows:
- Experimental Diabetes
- Type 1 Diabetes
- Type 2 Diabetes
Dataset Structure
Data Fields
- paper_id: The PubMed ID.
- title: The PubMed paper title.
- abstract: The PubMed paper abstract.
- label: The class label assigned to the paper.
- predicted_ranked_labels: The most… See the full description on the dataset page: https://huggingface.co/datasets/devanshamin/PubMedDiabetes-LLM-Predictions.
Diabetic kidney disease (DKD) is the most common etiology of chronic kidney disease (CKD) in the industrialized world and accounts for much of the excess mortality in patients with diabetes mellitus. Approximately 45% of U.S. patients with incident end-stage kidney disease (ESKD) have DKD. Independent of glycemic control, DKD aggregates in families and has higher incidence rates in African, Mexican, and American Indian ancestral groups relative to European populations. The Family Investigation of Nephropathy and Diabetes (FIND) performed a genome-wide association study (GWAS) contrasting 6,197 unrelated individuals with advanced DKD with healthy and diabetic individuals lacking nephropathy, of European American, African American, Mexican American, or American Indian ancestry. A large-scale replication and trans-ethnic meta-analysis included 7,539 additional European American, African American, and American Indian DKD cases and non-nephropathy controls. Within-ethnic-group meta-analysis of discovery GWAS and replication set results identified genome-wide significant evidence for association between DKD and rs12523822 on chromosome 6q25.2 in American Indians (P = 5.74 × 10^-9). The strongest signal of association in the trans-ethnic meta-analysis was with a SNP in strong linkage disequilibrium with rs12523822 (rs955333; P = 1.31 × 10^-8), with directionally consistent results across ethnic groups. These 6q25.2 SNPs are located between the SCAF8 and CNKSR3 genes, a region with DKD-relevant changes in gene expression and an eQTL with IPCEF1, a gene co-translated with CNKSR3. Several other SNPs demonstrated suggestive evidence of association with DKD, within and across populations. These data identify a novel DKD susceptibility locus with consistent directions of effect across diverse ancestral groups and provide insight into the genetic architecture of DKD.
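Pooling per-ancestry association results and testing them against a genome-wide threshold can be sketched as follows. This assumes an inverse-variance fixed-effect model and the conventional 5 × 10^-8 genome-wide significance threshold; the description does not state which meta-analysis method or threshold FIND used, so treat both as assumptions. Inputs would be per-group effect estimates (for example, log odds ratios) with their standard errors.

```python
import math
from typing import Sequence, Tuple

# Conventional genome-wide significance threshold (an assumption here).
GENOME_WIDE_ALPHA = 5e-8


def fixed_effect_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect meta-analysis of per-group effect estimates.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided normal p-value; erfc keeps precision for very small p.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


def is_genome_wide_significant(p_value: float) -> bool:
    return p_value < GENOME_WIDE_ALPHA


# The two P values reported above both fall below the 5e-8 threshold.
print(is_genome_wide_significant(5.74e-9), is_genome_wide_significant(1.31e-8))  # True True
```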
Aim: Non-alcoholic fatty liver disease (NAFLD) exhibits a racial disparity. We examined the prevalence of NAFLD and the association between race, gender, and NAFLD in prediabetes and diabetes populations among adults in the United States.
Methods: We analyzed data for 3,190 individuals ≥18 years old from the National Health and Nutrition Examination Survey (NHANES) 2017–2018. NAFLD was diagnosed by FibroScan® using controlled attenuation parameter (CAP) values (dB/m): S0 (none) < 238, S1 (mild) = 238–259, S2 (moderate) = 260–290, S3 (severe) > 290. Data were analyzed using the chi-square test and multinomial logistic regression, adjusting for confounding variables and accounting for the survey design and sample weights.
Results: Of the 3,190 subjects, the prevalence of NAFLD was 82.6%, 56.4%, and 30.5% (p < 0.0001) among the diabetes, prediabetes, and normoglycemia populations, respectively. Mexican American males with prediabetes or diabetes had the highest prevalence of severe NAFLD relative to other racial/ethnic groups (p < 0.05). In the adjusted model, among the total, prediabetes, and diabetes populations, a one-unit increase in HbA1c was associated with higher odds of severe NAFLD [adjusted odds ratio (AOR) = 1.8, 95% confidence interval (CI) = 1.4–2.3, p < 0.0001; AOR = 2.2, 95% CI = 1.1–4.4, p = 0.033; and AOR = 1.5, 95% CI = 1.1–1.9, p = 0.003, respectively].
Conclusion: We found that the prediabetes and diabetes populations had a high prevalence and higher odds of NAFLD relative to the normoglycemic population, and that HbA1c is an independent predictor of NAFLD severity in prediabetes and diabetes populations. Healthcare providers should screen prediabetes and diabetes populations for early detection of NAFLD and initiate treatments, including lifestyle modification, to prevent progression to non-alcoholic steatohepatitis or liver cancer.
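The CAP cut-offs in the Methods translate directly into a grading function. A minimal sketch, assuming CAP values in dB/m (the usual FibroScan unit) and treating the boundaries exactly as stated (S1 = 238-259, S2 = 260-290); the function name is hypothetical.

```python
def cap_steatosis_grade(cap_db_m: float) -> str:
    """Map a FibroScan controlled attenuation parameter (CAP) to steatosis grade S0-S3."""
    if cap_db_m < 238:
        return "S0"  # none
    if cap_db_m < 260:
        return "S1"  # mild: 238-259
    if cap_db_m <= 290:
        return "S2"  # moderate: 260-290
    return "S3"      # severe: > 290


print([cap_steatosis_grade(v) for v in (230, 238, 259, 260, 290, 291)])
# ['S0', 'S1', 'S1', 'S2', 'S2', 'S3']
```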
Genomic data set on Type 2 Diabetes in African Americans derived via admixture mapping, a method for genome-wide association analysis based on admixture-generated linkage disequilibrium. This collaborative group identified 1,478 African Americans with Type 2 Diabetes (T2D) from the Jackson Heart Study and the Multiethnic Cohort Study, as well as 498 controls from the Jackson Heart Study who are normoglycemic despite high body mass index and older age. All samples were genotyped (using the Illumina BeadLab platform) for 1,291 polymorphic markers chosen to be extremely different in frequency between West Africans and European Americans. Evidence for association with diabetes at each marker, as reported by the ANCESTRYMAP software, is given in the downloadable table. The investigators calculate that this study has statistical power to detect loci where African or European ancestry on average confers a multiplicative increase in risk of 1.35-fold or more. The fact that they did not detect a statistically significant signal of association in the scan suggests that any genetic risk factors for T2D do not confer ancestry-related differences in risk of that magnitude. The genome scan results are publicly available (Excel file) prior to publication so that researchers interested in the genetics of T2D can use them to prioritize follow-up of any regions of interest.
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Diabetes Dataset: Exploratory Data Analysis (EDA)
This repository contains a diabetes-related tabular dataset and a complete Exploratory Data Analysis (EDA). The main objective of this project was to learn how to conduct a structured EDA, apply best practices, and extract meaningful insights from real-world health data.
The analysis includes correlations, distributions, group comparisons, class balance exploration, and statistical interpretations that illustrate how different… See the full description on the dataset page: https://huggingface.co/datasets/guyshilo12/diabetes_eda_analysis.
License: https://creativecommons.org/publicdomain/zero/1.0/
Context
Diabetes is a chronic health condition that affects how your body turns food into energy. There are three main types of diabetes: type 1, type 2, and gestational diabetes.
Type 1 diabetes is an autoimmune disease that causes your body to attack the cells in your pancreas that produce insulin. Insulin is a hormone that helps your body use glucose for energy.
Type 2 diabetes is the most common type of diabetes. It occurs when your body doesn't respond normally to insulin, or when your body doesn't produce enough insulin.
Gestational diabetes is a type of diabetes that develops during pregnancy. It usually goes away after the baby is born.
Prevalence of Diabetes
According to the CDC BRFSS 2021, 34.1 million adults in the United States have diabetes, or 10.5% of the adult population. This number has been increasing over time. In 2010, 29.1 million adults in the United States had diabetes, or 9.3% of the adult population.
Content
The Behavioral Risk Factor Surveillance System (BRFSS) is an ongoing, state-based telephone survey that collects data about health-related risk behaviors, chronic health conditions, and the use of preventive services among adults aged 18 years and older residing in the United States. Conducted annually by the Centers for Disease Control and Prevention (CDC), the BRFSS has been providing valuable insights into the health status and behaviors of U.S. adults since its inception in 1984.
For this dataset, a csv of the 2021 BRFSS dataset available on Kaggle was used. The original dataset contains responses from 438,693 individuals and has 303 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
This dataset contains 3 files:
diabetes_012_health_indicators_BRFSS2021.csv is a clean dataset of 236,378 survey responses to the CDC's BRFSS2021. The target variable Diabetes_012 has 3 classes. 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes. There is class imbalance in this dataset. This dataset has 21 feature variables.
diabetes_binary_5050split_health_indicators_BRFSS2021.csv is a clean dataset of 67,136 survey responses to the CDC's BRFSS2021. It has an equal 50-50 split of respondents with no diabetes and with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is balanced.
diabetes_binary_health_indicators_BRFSS2021.csv is a clean dataset of 236,378 survey responses to the CDC's BRFSS2021. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is not balanced.
Acknowledgements
It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2021 dataset already on Kaggle. That dataset can be found here, and the notebook I used for the data cleaning can be found here.
Inspiration
Alex Teboul's work cleaning the 2015 BRFSS dataset for machine learning use was the inspiration for creating this dataset and exploring the BRFSS in general.
License: https://www.datainsightsmarket.com/privacy-policy
The North America self-monitoring blood glucose market was valued at USD 8.10 Million in 2023 and is projected to reach USD 12.99 Million by 2032, with an expected CAGR of 6.98% during the forecast period. Recent developments include:
- May 2023: LifeScan announced positive data from a study of real-world evidence supporting its Bluetooth-connected blood glucose meter. Evidence from more than 55,000 people with diabetes demonstrated sustained improvements in readings in range; the analysis focuses on changes over 180 days. LifeScan published the results in the peer-reviewed journal Diabetes Therapy. The company's OneTouch Bluetooth-connected blood glucose meter and mobile diabetes app provide simplicity, accuracy, and trust.
- January 2023: LifeScan announced that the peer-reviewed Journal of Diabetes Science and Technology published "Improved Glycemic Control Using a Bluetooth-Connected Blood Glucose Meter and a Mobile Diabetes App: Real-World Evidence from Over 144,000 People With Diabetes," detailing results from a retrospective analysis of real-world data from over 144,000 people with diabetes, one of the largest combined blood glucose meter and mobile diabetes app datasets ever published.
Key drivers for this market are: Rising Prevalence of Cancer Worldwide; Technological Advancements in Diagnostic Testing; Increasing Demand for Point-of-care Treatment. Potential restraints include: High Cost of Molecular Diagnostic Tests; Lack of Skilled Workforce and Stringent Regulatory Framework. Notable trends are: Blood Glucose Test Strips Held the Largest Market Share in Current Year.
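The compound annual growth rate (CAGR) relates a start value, an end value, and a number of years by end = start × (1 + CAGR)^years. The sketch below applies that formula to the figures above. Compounding 6.98% over seven years reproduces the 2032 figure (about 12.99), whereas nine years (2023 to 2032) would give roughly 14.9, so the stated rate appears to refer to a shorter forecast window, which the summary does not specify. The function names are illustrative.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding `rate` for `years` periods."""
    return start_value * (1.0 + rate) ** years


print(round(cagr(8.10, 12.99, 7), 4))      # 0.0698
print(round(project(8.10, 0.0698, 9), 2))  # 14.87
```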
Dataset name: Diabetes 130-US hospitals for years 1999-2008 Data Set
Background: Diabetes mellitus (DM) is a chronic disease in which blood sugar levels are high. It can occur when the pancreas does not produce enough insulin, or when the body cannot effectively use the insulin it produces (WHO). Diabetes is a progressive disease that can lead to a significant number of health complications and profoundly reduce quality of life. While many diabetic patients manage the disease with diet and exercise, some require medications to control blood glucose levels. According to the research article "The relationship between diabetes mellitus and 30-day readmission rates", an estimated 9.3% of the population in the United States has diabetes mellitus (DM), 28% of whom are undiagnosed. In recent years, government agencies and healthcare systems have increasingly focused on 30-day readmission rates to determine the complexity of their patient populations and to improve quality. Thirty-day readmission rates for hospitalized patients with DM are reported to be between 14.4% and 22.7%, much higher than the rate for all hospitalized patients (8.5–13.5%).
Problem Statement: To identify the factors that lead to the high rate of readmission within 30 days of discharge among diabetic patients, and correspondingly to predict which high-risk diabetic patients are most likely to be readmitted within 30 days, so that quality of care, patient experience, and population health can be improved and costs reduced by lowering readmission rates. Also, to identify the medicines that are most effective in treating diabetes.
Impact on business: Hospital readmission is an important contributor to total medical expenditures and an emerging indicator of quality of care. Diabetes, like other chronic medical conditions, is associated with an increased risk of hospital readmission. The article "Correction to: Hospital Readmission of Patients with Diabetes" describes hospital readmission, particularly within 30 days of discharge, as a high-priority health care quality measure and a target for cost reduction. The burden of diabetes among hospitalized patients is substantial, growing, and costly, and readmissions contribute a significant portion of this burden. Reducing readmission rates among patients with diabetes has the potential to greatly reduce health care costs while simultaneously improving care. Our aim is to provide insight into the risk factors for readmission and to identify the medicines that are most effective in treating diabetes.
Variable identification:
1. Independent variables (49): encounter_id, patient_nbr, race, gender, age, weight, admission_type_id, discharge_disposition_id, admission_source_id, time_in_hospital, payer_code, medical_specialty, num_lab_procedures, num_procedures, num_medications, number_outpatient, number_emergency, number_inpatient, diag_1, diag_2, diag_3, number_diagnoses, max_glu_serum, A1Cresult, metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, citoglipton, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, metformin-pioglitazone, change, diabetesMed.
2. Dependent variable (1): readmitted (categorical)
Extra Info: Our dataset consists of hospital admissions lasting between 1 and 14 days that did not result in the patient's death. Each encounter corresponds to a patient diagnosed with diabetes, although the primary diagnosis may be different. During each analyzed encounter, lab tests were ordered and medication was administered.
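The problem statement frames the task as a yes/no outcome (readmitted within 30 days or not), while `readmitted` is listed only as categorical. Below is a minimal sketch of deriving that binary target. It assumes `readmitted` is coded as "<30", ">30", or "NO", a coding that is not stated in the description above. It also assumes rows are loaded as dictionaries, for example with Python's `csv.DictReader`. The helper names are hypothetical.

```python
from typing import Iterable, List, Mapping


def label_30_day(readmitted: str) -> int:
    """Return 1 for a readmission within 30 days, else 0.

    Assumes `readmitted` is coded "<30", ">30" or "NO"; ">30" maps to 0
    because it falls outside the 30-day window.
    """
    return 1 if readmitted.strip() == "<30" else 0


def eligible(row: Mapping[str, str]) -> bool:
    """Keep stays of 1 to 14 days, mirroring the inclusion rule in the description."""
    return 1 <= int(row["time_in_hospital"]) <= 14


def readmission_rate_30_day(rows: Iterable[Mapping[str, str]]) -> float:
    """Share of eligible encounters labelled as 30-day readmissions."""
    labels: List[int] = [label_30_day(r["readmitted"]) for r in rows if eligible(r)]
    if not labels:
        raise ValueError("no eligible encounters")
    return sum(labels) / len(labels)


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from the real dataset.
    sample = [
        {"readmitted": "<30", "time_in_hospital": "3"},
        {"readmitted": ">30", "time_in_hospital": "5"},
        {"readmitted": "NO", "time_in_hospital": "2"},
        {"readmitted": "<30", "time_in_hospital": "14"},
    ]
    print(readmission_rate_30_day(sample))  # 0.5
```

The dataset is described as containing only 1 to 14 day stays, so the length filter acts as a sanity check rather than a real reduction. Because `patient_nbr` can repeat across encounters, splitting training and test data by patient rather than by encounter may help avoid leakage. That is a modelling choice, not something the dataset description prescribes.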
https://www.marketreportanalytics.com/privacy-policy
The Latin American diabetes care devices market, valued at approximately $XX million in 2025, is projected to grow robustly, at a compound annual growth rate (CAGR) of 5.64% from 2025 to 2033. Several key factors fuel this expansion. The rising prevalence of diabetes across Brazil, Mexico, and the rest of Latin America, together with growing awareness of effective disease management, is driving demand for both self-monitoring and management devices. Technological advancements are further stimulating market growth, such as more user-friendly continuous glucose monitoring (CGM) systems and improved insulin delivery technologies like insulin pumps. Government initiatives promoting diabetes awareness and affordable access to healthcare are also contributing positively. However, several factors hinder market penetration: the high cost of advanced devices, limited healthcare infrastructure in certain regions, and a lack of patient education in some areas.

The market is segmented by device type and by geography (Brazil, Mexico, and the rest of Latin America). Device types include:
- self-monitoring blood glucose devices, including glucometers, test strips, and lancets;
- CGMs, encompassing sensors, receivers, and transmitters;
- management devices such as insulin pumps, syringes, pens, and jet injectors.

Brazil and Mexico are expected to be the largest markets because of their large populations and relatively high prevalence of diabetes. The competitive landscape includes major global players such as Abbott, Dexcom, Medtronic, and Roche, alongside several regional players. The market's outlook is positive, although further growth hinges on several factors. Addressing affordability concerns through government subsidies and insurance coverage is critical.
Furthermore, improving healthcare infrastructure and expanding patient education programs will be crucial to maximizing market potential. Strategic partnerships between manufacturers, healthcare providers, and government bodies can facilitate wider adoption of advanced technologies. Particular focus should be placed on improving access to CGM technology, which offers significant improvements in diabetes management over traditional self-monitoring methods. The growing adoption of telehealth and remote monitoring will also shape the market's future trajectory. Competitive strategies will likely focus on innovation, product differentiation, and expanding distribution networks to reach diverse patient populations across the region. Recent developments include:
- January 2023: LifeScan announced that the peer-reviewed Journal of Diabetes Science and Technology published "Improved Glycemic Control Using a Bluetooth Connected Blood Glucose Meter and a Mobile Diabetes App: Real-World Evidence From Over 144,000 People With Diabetes". The article details a retrospective analysis of real-world data from over 144,000 people with diabetes, one of the largest combined blood glucose meter and mobile diabetes app datasets ever published.
- October 2022: Becton, Dickinson and Company and Biocorp announced an agreement to use connected technology to track adherence to self-administered drug therapies such as biologics. The aim is to help biopharmaceutical companies improve adherence to, and outcomes of, injectable drugs. The two companies will integrate Biocorp's Injay technology into the BD UltraSafe Plus Passive Needle Guard used with prefillable syringes. Injay is a solution designed to capture and transmit injection events using Near Field Communication technology.

Notable trends are: The Continuous Glucose Monitoring segment is expected to witness the highest growth rate over the forecast period.
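The 2025 base value is redacted ($XX), so the stated 5.64% CAGR can only be turned into a growth multiple. The following is a minimal sketch of compound growth. It assumes annual compounding over the eight intervals from 2025 to 2033 and uses a hypothetical base of 100, which is not a figure from the report. The function names are illustrative.

```python
def project_value(base: float, cagr: float, years: int) -> float:
    """Compound `base` forward `years` times at rate `cagr` (0.0564 means 5.64%)."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Invert the compounding formula: the rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    hypothetical_base = 100.0  # placeholder; the report's 2025 value is not disclosed
    projected = project_value(hypothetical_base, 0.0564, 2033 - 2025)
    print(f"{projected:.1f}")  # 155.1, i.e. roughly a 1.55x multiple
    print(f"{implied_cagr(hypothetical_base, projected, 8):.4f}")  # 0.0564
```

Whether a "2025 to 2033" window means eight or nine compounding steps depends on the report's convention. The sketch assumes eight.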
The Diabetes Prevention Program (DPP) is a clinical trial that investigated whether type 2 diabetes could be prevented or delayed in high-risk individuals with prediabetes by either of two interventions. The first was modest weight loss through dietary changes and increased physical activity. The second was treatment with the oral diabetes drug metformin (Glucophage).
The study enrolled overweight persons with elevated fasting and post-load plasma glucose concentrations. Participants were randomized to placebo, metformin (850 mg twice daily), or a lifestyle-modification program with goals of at least 7 percent weight loss and at least 150 minutes of physical activity per week. The primary outcome measure was development of diabetes. Diabetes was diagnosed on the basis of an annual oral glucose-tolerance test or a semiannual fasting plasma glucose test, according to the 1997 criteria of the American Diabetes Association: a plasma glucose value of 126 mg per deciliter (7.0 mmol per liter) or higher in the fasting state, or 200 mg per deciliter (11.1 mmol per liter) or higher two hours after a 75-g oral glucose load. Participation in the DPP continued after a diagnosis of diabetes. However, study medication was discontinued, and participants were referred to their local primary care provider for diabetes treatment, once fasting glucose exceeded 140 mg/dL.
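The 1997 thresholds above can be expressed as a small check. The following is a minimal sketch. It assumes single glucose readings in mg/dL and ignores any confirmatory retesting the trial protocol may have required. The function and constant names are hypothetical. The mmol/L conversion divides by about 18.016, which follows from glucose's molar mass of roughly 180.16 g/mol.

```python
from typing import Optional

# mg/dL -> mmol/L for glucose: multiply by 10 (dL -> L), divide by 180.16 mg/mmol.
GLUCOSE_MG_DL_PER_MMOL_L = 18.016

FASTING_THRESHOLD_MG_DL = 126.0  # fasting plasma glucose, 1997 ADA criterion
OGTT_2H_THRESHOLD_MG_DL = 200.0  # 2 h after a 75-g oral glucose load


def mg_dl_to_mmol_l(mg_dl: float) -> float:
    """Convert a plasma glucose value from mg/dL to mmol/L."""
    return mg_dl / GLUCOSE_MG_DL_PER_MMOL_L


def meets_1997_ada_criteria(
    fasting_mg_dl: Optional[float] = None,
    ogtt_2h_mg_dl: Optional[float] = None,
) -> bool:
    """True if either available reading is at or above its 1997 ADA threshold."""
    fasting_hit = fasting_mg_dl is not None and fasting_mg_dl >= FASTING_THRESHOLD_MG_DL
    ogtt_hit = ogtt_2h_mg_dl is not None and ogtt_2h_mg_dl >= OGTT_2H_THRESHOLD_MG_DL
    return fasting_hit or ogtt_hit


if __name__ == "__main__":
    print(round(mg_dl_to_mmol_l(126), 1), round(mg_dl_to_mmol_l(200), 1))  # 7.0 11.1
    print(meets_1997_ada_criteria(fasting_mg_dl=118, ogtt_2h_mg_dl=205))  # True
```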
Results showed that both lifestyle changes and treatment with metformin reduced the incidence of diabetes in persons at high risk compared with placebo. Furthermore, the lifestyle intervention was more effective than metformin in preventing the onset of diabetes.
Supplemental measurements were collected using biospecimens that were obtained during the original DPP clinical trial. These measurements included antibodies, biomarkers, hormones, and vitamin D levels to assess the relationships between sex hormones, diabetes risk factors, and the progression to diabetes. The supplemental data showed that sex hormones were associated with diabetes risk in men, but these associations were not found in women. Furthermore, obesity and glycemia were more important predictors of diabetes risk than sex hormones.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of a single data table in .docx (Word document) format, holding aggregated data on cardiovascular risk factors for men in Paramaribo, Suriname. Related studies from 2013-2015 found that the population of Suriname has a high cardiovascular risk factor burden. Around 40% of the general population was hypertensive, 15% had diabetes, and the large majority had one or more risk factors for cardiovascular disease. However, time trends in these risk factors could not be assessed because historical data were lacking.
This dataset holds rediscovered and hitherto unpublished aggregated data from what was apparently the first population study of measured blood pressure, diabetes, and cardiovascular health in men in Suriname, conducted in 1973. These data are presented alongside 2013 data for the same variables. They may help explain how cardiovascular risk factors in the local population have escalated over time, and may aid in projecting future cardiovascular disease in this middle-income country. The variables reported in the data table are: sample size (%), sampling method, African ancestry (%), Regular leisure exercise (%), Ever smoked tobacco (%), Hypertension (%), and Diabetes (%).
https://www.datainsightsmarket.com/privacy-policy
The Diabetes Care Devices Market in Latin America was valued at USD XX Million in 2023 and is projected to reach USD XXX Million by 2032, with an expected CAGR of 5.64% during the forecast period. Recent developments include:
- January 2023: LifeScan announced that the peer-reviewed Journal of Diabetes Science and Technology published "Improved Glycemic Control Using a Bluetooth Connected Blood Glucose Meter and a Mobile Diabetes App: Real-World Evidence From Over 144,000 People With Diabetes". The article details a retrospective analysis of real-world data from over 144,000 people with diabetes, one of the largest combined blood glucose meter and mobile diabetes app datasets ever published.
- October 2022: Becton, Dickinson and Company and Biocorp announced an agreement to use connected technology to track adherence to self-administered drug therapies such as biologics. The aim is to help biopharmaceutical companies improve adherence to, and outcomes of, injectable drugs. The two companies will integrate Biocorp's Injay technology into the BD UltraSafe Plus Passive Needle Guard used with prefillable syringes. Injay is a solution designed to capture and transmit injection events using Near Field Communication technology.

Key drivers for this market are: Increasing Demand for Better Dentistry and Better Aesthetic outcomes, Increase in the Disposable Income. Potential restraints include: High Cost Associated with the Digital Dentistry. Notable trends are: The Continuous Glucose Monitoring segment is expected to witness the highest growth rate over the forecast period.
The dataset is part of a larger dataset maintained by the National Institute of Diabetes and Digestive and Kidney Diseases in the USA. The data were used in a diabetes study of Pima Indian women aged 21 and older living in Phoenix, Arizona, the fifth-largest city in the USA. The target variable is "Outcome": 1 indicates a positive diabetes test result, and 0 indicates a negative result.
Pregnancies: Number of pregnancies
Glucose: Plasma glucose concentration 2 hours into an oral glucose tolerance test
BloodPressure: Diastolic blood pressure (mm Hg)
SkinThickness: Triceps skinfold thickness (mm)
Insulin: 2-hour serum insulin (µU/mL)
DiabetesPedigreeFunction: Diabetes pedigree function
BMI: Body mass index (kg/m²)
Age: Age (years)
Outcome: Whether the individual has the disease (1) or not (0)
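Several of the columns above cannot plausibly be zero in a living person. In widely circulated copies of this dataset, zeros nonetheless appear in some of them and are often treated as missing values. The following is a minimal sketch of that cleaning step, with two assumptions. First, the CSV header uses the column names listed above, with `BloodPressure` written without a space. Second, zeros in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, and `BMI` mean "not recorded". `Pregnancies` is left alone because zero is a valid count. The function name is hypothetical.

```python
from typing import Dict, Optional

# Assumption: in these columns a recorded 0 means "not measured" rather than a true value.
ZERO_MEANS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def clean_record(row: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Convert one CSV row to floats, mapping implausible zeros to None."""
    cleaned: Dict[str, Optional[float]] = {}
    for column, raw in row.items():
        value = float(raw)
        if column in ZERO_MEANS_MISSING and value == 0.0:
            cleaned[column] = None
        else:
            cleaned[column] = value
    return cleaned


if __name__ == "__main__":
    # Illustrative row only; values are made up, not copied from the dataset.
    row = {
        "Pregnancies": "0", "Glucose": "0", "BloodPressure": "72",
        "SkinThickness": "0", "Insulin": "0", "BMI": "30.1",
        "DiabetesPedigreeFunction": "0.35", "Age": "41", "Outcome": "1",
    }
    print(clean_record(row))
```

Mapping zeros to `None` keeps the decision about imputation or row removal separate from parsing. Which strategy works best depends on the downstream model.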
https://creativecommons.org/publicdomain/zero/1.0/
According to the CDC, heart disease is a leading cause of death for people of most races in the U.S. (African Americans, American Indians and Alaska Natives, and whites). About half of all Americans (47%) have at least 1 of 3 major risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetes status, obesity (high BMI), not getting enough physical activity, or drinking too much alcohol. Identifying and preventing the factors that have the greatest impact on heart disease is very important in healthcare. In turn, developments in computing allow the application of machine learning methods to detect "patterns" in the data that can predict a patient's condition.
The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to collect data on the health status of U.S. residents. As described by the CDC: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset includes data from 2023. I noticed that this dataset contains many factors (questions) that directly or indirectly influence heart disease, so I selected the most relevant variables from it. I am sharing two versions of the most recent dataset: one with NaNs and one without.
As described above, the original dataset of nearly 300 variables was reduced to 40 variables. Beyond classical EDA, this dataset can be used to apply a range of machine learning methods, especially classifiers (logistic regression, SVM, random forest, etc.). Treat the variable "HadHeartAttack" as binary ("Yes": the respondent had heart disease; "No": the respondent did not have heart disease). Note, however, that the classes are imbalanced, so fitting a model directly to the raw class distribution is not advisable. Reweighting the classes or undersampling the majority class should yield much better results. Based on the dataset, I built a logistic regression model and embedded it in an application that might inspire you: https://share.streamlit.io/kamilpytlak/heart-condition-checker/main/app.py. Can you identify which variables have a significant effect on the likelihood of heart disease?
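The two remedies mentioned for the imbalanced "HadHeartAttack" target can be sketched in a few lines. This is a minimal illustration, not the author's pipeline, and both function names are hypothetical.
- `balanced_class_weights` gives each class the weight n / (k · n_c), where n is the number of rows, k the number of classes, and n_c the size of class c. This is the formula scikit-learn uses for `class_weight="balanced"`.
- `undersample` randomly drops rows from every larger class until each class matches the smallest one.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def undersample(rows: List[dict], label_key: str = "HadHeartAttack", seed: int = 0) -> List[dict]:
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_class: Dict[str, List[dict]] = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    smallest = min(len(group) for group in by_class.values())
    sampled: List[dict] = []
    for group in by_class.values():
        sampled.extend(rng.sample(group, smallest))
    rng.shuffle(sampled)
    return sampled


if __name__ == "__main__":
    labels = ["No"] * 9 + ["Yes"]  # toy 9:1 imbalance, not real BRFSS counts
    print(balanced_class_weights(labels))  # "Yes" gets weight 5.0, "No" about 0.556
    print(Counter(r["HadHeartAttack"] for r in undersample([{"HadHeartAttack": l} for l in labels])))
```

Reweighting keeps every row but makes minority-class errors cost more. Undersampling discards majority-class rows, which is cheaper to train on but throws information away.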
Check out this notebook in my GitHub repository: https://github.com/kamilpytlak/data-science-projects/blob/main/heart-disease-prediction/2022/notebooks/data_processing.ipynb