In 2023, almost 46 percent of adults in Alabama suffered from hypertension. This statistic depicts the rate of adults suffering from hypertension in the United States in 2023, sorted by state.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
The dataset, "Heart Attack in Youth vs. Adults in America", contains 500,000 synthetic records detailing health, lifestyle, and demographic factors contributing to heart attack risks among youth and adults in the United States. This dataset can help researchers and data enthusiasts analyze patterns, predict risk levels, and understand disparities between age groups and regions in terms of heart health.
Insights that beginner, intermediate, and advanced users can derive:
For Beginners:
Descriptive Statistics:
Calculate average cholesterol levels or blood pressure for youth vs. adults. Determine the distribution of heart attack risk levels across different states or demographics.
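As a minimal sketch of these beginner-level calculations, the snippet below computes mean cholesterol and blood pressure per age group and tallies risk levels per state using only the Python standard library (at 500,000 rows a pandas `groupby` would be the usual choice; this version just stays dependency-free). The column names `age_group`, `cholesterol`, `systolic_bp`, `state`, and `risk_level` are hypothetical placeholders, so check the dataset's actual header before adapting it.

```python
from collections import Counter, defaultdict
from statistics import mean


def group_means(rows, group_key, value_keys):
    """Return {group: {column: mean}} for the given numeric columns."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for key in value_keys:
            buckets[row[group_key]][key].append(float(row[key]))
    return {
        group: {key: mean(values) for key, values in columns.items()}
        for group, columns in buckets.items()
    }


def risk_distribution(rows, group_key, risk_key):
    """Count how often each risk level occurs within each group (e.g. per state)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_key]][row[risk_key]] += 1
    return counts


# Hypothetical example records (not taken from the dataset); values arrive as strings, as from a CSV
records = [
    {"age_group": "Youth", "cholesterol": "180", "systolic_bp": "118", "state": "AL", "risk_level": "Low"},
    {"age_group": "Youth", "cholesterol": "200", "systolic_bp": "122", "state": "TX", "risk_level": "Medium"},
    {"age_group": "Adult", "cholesterol": "230", "systolic_bp": "135", "state": "AL", "risk_level": "High"},
    {"age_group": "Adult", "cholesterol": "250", "systolic_bp": "141", "state": "TX", "risk_level": "High"},
]

print(group_means(records, "age_group", ["cholesterol", "systolic_bp"]))
print(dict(risk_distribution(records, "state", "risk_level")))
```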
Data Visualization:
Visualize the distribution of obesity indices across age groups. Plot the survival rates based on risk levels.
For Intermediate Users:
Exploratory Data Analysis (EDA):
Investigate the correlation between lifestyle factors (e.g., dietary habits, smoking history) and heart attack risk levels. Compare access to healthcare between low-income and high-income groups.
Predictive Modeling:
Build a logistic regression or decision tree model to predict high-risk individuals. Use clustering techniques to group individuals based on heart attack risks.
For Advanced Users:
Deep Analysis and Insights:
Perform a time series analysis on hospital visits and prior heart attacks. Use advanced ML algorithms (e.g., Gradient Boosting, Neural Networks) for risk prediction and survival rate forecasting.
Feature Engineering:
Create new features, such as BMI categories or healthcare accessibility indices. Analyze the interaction effects between physical activity, obesity index, and smoking history.
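One common way to build BMI categories is with the WHO adult cut-offs (underweight below 18.5, normal 18.5 to under 25, overweight 25 to under 30, obese 30 and above); the dataset itself may use a different binning, so treat these thresholds as an assumption. The `interaction` helper is a hypothetical illustration of a product term, the simplest way to let a model see a combined effect such as physical activity times obesity index.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to the WHO adult categories (assumed binning)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overweight"
    return "Obese"


def interaction(a, b):
    """Product term for modelling the interaction between two numeric features."""
    return a * b


value = bmi(95.0, 1.75)
print(round(value, 1), bmi_category(value))
```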
Explainable AI:
Use SHAP (SHapley Additive exPlanations) to understand model predictions. Identify biases in predictions related to ethnicity or access to healthcare.
This data represents the age-adjusted prevalence of high total cholesterol, hypertension, and obesity among US adults aged 20 and over, from 1999–2000 to 2017–2018.
Notes: All estimates are age adjusted by the direct method to the U.S. Census 2000 population using age groups 20–39, 40–59, and 60 and over.
Definitions:
Hypertension: Systolic blood pressure greater than or equal to 130 mmHg or diastolic blood pressure greater than or equal to 80 mmHg, or currently taking medication to lower high blood pressure.
High total cholesterol: Serum total cholesterol greater than or equal to 240 mg/dL.
Obesity: Body mass index (BMI, weight in kilograms divided by height in meters squared) greater than or equal to 30.
Data Source and Methods: Data from the National Health and Nutrition Examination Surveys (NHANES) for the years 1999–2000, 2001–2002, 2003–2004, 2005–2006, 2007–2008, 2009–2010, 2011–2012, 2013–2014, 2015–2016, and 2017–2018 were used for these analyses. NHANES is a cross-sectional survey designed to monitor the health and nutritional status of the civilian noninstitutionalized U.S. population. The survey consists of interviews conducted in participants’ homes and standardized physical examinations, including a blood draw, conducted in mobile examination centers.
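The definitions and the direct age-adjustment method above can be sketched as follows. Direct adjustment weights each age-specific rate by that age group's share of a standard population and sums the results. The rates and standard-population counts in the example are illustrative only, not the official U.S. Census 2000 figures for these age groups, which should be substituted in real use.

```python
def has_hypertension(systolic, diastolic, on_bp_medication):
    """Hypertension as defined above: SBP >= 130 mmHg, DBP >= 80 mmHg, or taking BP medication."""
    return systolic >= 130 or diastolic >= 80 or on_bp_medication


def has_high_cholesterol(total_chol_mg_dl):
    """High total cholesterol: serum total cholesterol >= 240 mg/dL."""
    return total_chol_mg_dl >= 240


def is_obese(bmi_value):
    """Obesity: BMI >= 30."""
    return bmi_value >= 30


def age_adjusted_rate(group_rates, standard_population):
    """Direct age adjustment: weight each age-specific rate by the standard population's share."""
    if len(group_rates) != len(standard_population):
        raise ValueError("need one standard-population count per age group")
    total = sum(standard_population)
    return sum(rate * pop / total for rate, pop in zip(group_rates, standard_population))


# Age groups 20-39, 40-59, 60+. ILLUSTRATIVE rates and standard counts only;
# replace std_pop with the official Census 2000 counts for these age groups.
rates = [0.10, 0.30, 0.60]
std_pop = [40, 35, 25]
print(age_adjusted_rate(rates, std_pop))
```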
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
According to the CDC, heart disease is a leading cause of death for people of most races in the U.S. (African Americans, American Indians and Alaska Natives, and whites). About half of all Americans (47%) have at least 1 of 3 major risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetes status, obesity (high BMI), not getting enough physical activity, and drinking too much alcohol. Identifying and preventing the factors that have the greatest impact on heart disease is very important in healthcare. Meanwhile, advances in computing make it possible to apply machine learning methods that detect "patterns" in the data and predict a patient's condition.
The dataset originally comes from the CDC and is part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to collect data on the health status of U.S. residents. As described by the CDC: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset includes data from 2023. In this dataset, I noticed many factors (questions) that directly or indirectly influence heart disease, so I decided to select the most relevant variables from it. I am also sharing two versions of the most recent dataset: one with NaNs and one without.
As described above, the original dataset of nearly 300 variables was reduced to 40 variables. In addition to classical EDA, this dataset can be used to apply a number of machine learning methods, especially classifier models (logistic regression, SVM, random forest, etc.). You should treat the variable "HadHeartAttack" as binary ("Yes" - respondent had heart disease; "No" - respondent did not have heart disease). Note, however, that the classes are imbalanced, so fitting a model without accounting for this is not advisable; reweighting the classes or undersampling should yield much better results. Based on the dataset, I built a logistic regression model and embedded it in an application that might inspire you: https://share.streamlit.io/kamilpytlak/heart-condition-checker/main/app.py. Can you identify which variables have a significant effect on the likelihood of heart disease?
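Both remedies for class imbalance can be sketched in a few lines of standard-library Python. The weighting formula n_samples / (n_classes * class_count) is the heuristic scikit-learn applies for `class_weight="balanced"`; the undersampler randomly keeps as many rows per class as the rarest class has. The 90/10 label split below is a hypothetical example, not the dataset's actual ratio.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rarer classes count more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(group) for group in by_class.values())
    sampled = [(row, label) for label, group in by_class.items() for row in rng.sample(group, target)]
    rng.shuffle(sampled)
    return sampled


labels = ["No"] * 90 + ["Yes"] * 10   # hypothetical imbalanced target
print(balanced_class_weights(labels))
balanced = random_undersample(list(range(100)), labels)
print(Counter(label for _, label in balanced))
```

Weighting keeps every row but changes the loss; undersampling discards majority rows, which is cheaper to train on but throws information away.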
Check out this notebook in my GitHub repository: https://github.com/kamilpytlak/data-science-projects/blob/main/heart-disease-prediction/2022/notebooks/data_processing.ipynb
This web map is part of the Centers for Disease Control and Prevention (CDC) PLACES project. It provides model-based estimates of the prevalence of taking high blood pressure medication among adults aged 18 years and older who have high blood pressure, at the county, place, census tract, and ZCTA levels in the United States. PLACES is an expansion of the original 500 Cities Project and a collaboration between the CDC, the Robert Wood Johnson Foundation, and the CDC Foundation. Data sources used to generate these estimates include the Behavioral Risk Factor Surveillance System (BRFSS), Census 2020 population counts or Census annual county-level population estimates, and American Community Survey (ACS) estimates. For detailed methodology, see www.cdc.gov/places. For questions or feedback, send an email to places@cdc.gov. The measure name used for taking high blood pressure medication is BPMED.
Abstract
Background: Hypertension is a serious and persistent public health problem and is one of the main causes of cardiovascular diseases and general mortality.
Objectives: This study aimed to verify the prevalence and factors associated with systemic arterial hypertension in workers from the state of Rio Grande do Sul, Brazil.
Methods: This is a cross-sectional study using the secondary data from 20,792 industry workers from 18 to 59 years of age. The presence of arterial hypertension was determined from systolic blood pressure ≥ 140mmHg and/or diastolic blood pressure ≥ 90mmHg or taking antihypertensive medication. Factors investigated included demographic, socioeconomic, behavioral, nutritional status, and family history characteristics. Poisson regression was used in multivariate analysis, adopting a significance level of p<0.05. All analyses were stratified by sex.
Results: The sample included 12,349 men and 8,443 women with a mean age of 32.8 years (Standard Deviation = 9.8). The prevalence of arterial hypertension was 10.3% (95% CI: 9.8-10.7), which was significantly higher in men than in women (10.9% vs 9.4%; p = 0.001). Arterial hypertension was associated with increased age, a low level of education, living with a partner, being overweight or obese, and having at least one relative with a history of hypertension for both sexes. Women with better socioeconomic conditions presented a lower prevalence of hypertension.
Conclusions: The main factors associated with hypertension included sociodemographic, nutritional, and family history characteristics. In addition, socioeconomic conditions showed an association with the occurrence of hypertension, especially among women.
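The reported figures can be sanity-checked with a little arithmetic. A simple normal-approximation (Wald) interval for 10.3% of 20,792 workers gives roughly 9.9% to 10.7%, close to the published 9.8-10.7; the abstract does not say which interval method was used, so a small difference at the lower bound is expected. The second calculation checks that the sex-specific prevalences, weighted by the number of men and women, reproduce the overall 10.3%.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


low, high = wald_ci(0.103, 20792)
print(f"{low:.1%} to {high:.1%}")

# Overall prevalence as a sample-size-weighted mix of the sex-specific figures
men, women = 12349, 8443
overall = (0.109 * men + 0.094 * women) / (men + women)
print(f"{overall:.1%}")
```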
U.S. Government Works (https://www.usa.gov/government-works/)
This is the 500 Cities data for a different year (the 2018 release) than the releases already available on Kaggle. It was exported and uploaded without modification. The original source states: "This is the complete dataset for the 500 Cities project 2018 release. This dataset includes 2016, 2015 model-based small area estimates for 27 measures of chronic disease related to unhealthy behaviors (5), health outcomes (13), and use of preventive services (9). Data were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. The project was funded by the Robert Wood Johnson Foundation (RWJF) in conjunction with the CDC Foundation. It represents a first-of-its kind effort to release information on a large scale for cities and for small areas within those cities. It includes estimates for the 500 largest US cities and approximately 28,000 census tracts within these cities. These estimates can be used to identify emerging health problems and to inform development and implementation of effective, targeted public health prevention activities. Because the small area model cannot detect effects due to local interventions, users are cautioned against using these estimates for program or policy evaluations. Data sources used to generate these measures include Behavioral Risk Factor Surveillance System (BRFSS) data (2016, 2015), Census Bureau 2010 census population data, and American Community Survey (ACS) 2012-2016, 2011-2015 estimates. Because some questions are only asked every other year in the BRFSS, there are 4 measures (high blood pressure, taking high blood pressure medication, high cholesterol, cholesterol screening) from the 2015 BRFSS that are the same in the 2018 release as the previous 2017 release. More information about the methodology can be found at www.cdc.gov/500cities."
The original can be found at: https://chronicdata.cdc.gov/500-Cities-Places/500-Cities-Local-Data-for-Better-Health-2018-relea/rja3-32tc
The 500 Cities project ran from 2016 to 2019. In December of 2020, this was expanded into and replaced by the PLACES project.
This dataset contains data for the US as a whole, for 500 cities within it (the 497 largest US cities, plus three cities that were the largest in their state, included to ensure that all states were represented), and for the census tracts within those cities. The combined population of these cities represents about 33.4% of the US population.
Measures include:
Please help this dataset reveal more by investigating anything that captures your attention. For ideas, consider:
* Does the state or region play a role in any of the measures?
* Can you build a model to predict any of the measures?
* Combining this data with the other years posted on Kaggle to determine how the measures have changed over ...
In 2008, a group of uninsured low-income adults in Oregon was selected by lottery to be given the chance to apply for Medicaid. This lottery provides an opportunity to gauge the effects of expanding access to public health insurance on the health care use, financial strain, and health of low-income adults using a randomized controlled design. The Oregon Health Insurance Experiment follows and compares those selected in the lottery (treatment group) with those not selected (control group). The data collected and provided here include data from in-person interviews, three mail surveys, emergency department records, and administrative records on Medicaid enrollment, the initial lottery sign-up list, welfare benefits, and mortality.
This data collection has seven data files:
Dataset 1 contains administrative data on the lottery from the state of Oregon. These data include demographic characteristics that were recorded when individuals signed up for the lottery, the date of the lottery draw, and information on who was selected in the lottery, who applied for the lotteried Medicaid plan if selected, and whose application for the lotteried plan was approved. Also included are Oregon mortality data for 2008 and 2009.
Dataset 2 contains information from the state of Oregon on the individuals' participation in Medicaid, the Supplemental Nutrition Assistance Program (SNAP), and Temporary Assistance for Needy Families (TANF).
Datasets 3-5 contain the data from the initial, six-month, and 12-month mail surveys, respectively. Topics covered by the surveys include demographic characteristics; health insurance, access to health care, and health care utilization; health care needs, experiences, and costs; overall health status and changes in health; and depression and medical conditions and use of medications to treat them.
Dataset 6 contains an analysis subset of the variables from the in-person interviews.
Topics covered by the survey questionnaire include overall health, health insurance coverage, health care access, health care utilization, conditions and treatments, health behaviors, medical and dental costs, and demographic characteristics. The interviewers also obtained blood pressure and anthropometric measurements and collected dried blood spots to measure levels of cholesterol, glycated hemoglobin, and C-reactive protein.
Dataset 7 contains an analysis subset of the variables the study obtained for all emergency department (ED) visits to twelve hospitals in the Portland area during 2007-2009. These variables capture total hospital costs, ED costs, and the number of ED visits categorized by time of the visit (daytime weekday, or nighttime and weekends), necessity of the visit (emergent, ED care needed, non-preventable; emergent, ED care needed, preventable; emergent, primary care treatable), ambulatory care sensitive status, whether or not the patient was hospitalized, and the reason for the visit (e.g., injury, abdominal pain, chest pain, headache, and mental disorders).
The collection also includes a ZIP archive (Dataset 8) with Stata programs that replicate analyses reported in three articles by the principal investigators and others:
Finkelstein, Amy et al. "The Oregon Health Insurance Experiment: Evidence from the First Year". The Quarterly Journal of Economics. August 2012. Vol 127(3).
Baicker, Katherine et al. "The Oregon Experiment - Effects of Medicaid on Clinical Outcomes". New England Journal of Medicine. 2 May 2013. Vol 368(18).
Taubman, Sarah et al. "Medicaid Increases Emergency Department Use: Evidence from Oregon's Health Insurance Experiment". Science. 2 Jan 2014.
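Because selection was random, a natural first comparison is the difference in mean outcomes between those selected and those not selected (an intention-to-treat contrast). The sketch below is a hedged illustration of that idea only: it is not the estimation strategy of the cited articles (the Stata programs in Dataset 8 replicate those), it ignores covariate adjustment and any clustering, and the binary outcome values are made up.

```python
import math
from statistics import mean, stdev


def intention_to_treat(treated, control):
    """Difference in mean outcomes (selected minus not selected) with a simple unpooled standard error."""
    diff = mean(treated) - mean(control)
    se = math.sqrt(stdev(treated) ** 2 / len(treated) + stdev(control) ** 2 / len(control))
    return diff, se


# Hypothetical binary outcome (e.g. 1 = has a usual place of care); NOT study data
selected = [1, 1, 0, 1, 1, 0, 1, 1]
not_selected = [1, 0, 0, 1, 0, 0, 1, 0]
diff, se = intention_to_treat(selected, not_selected)
print(round(diff, 3), round(se, 3))
```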
This is historical data. The update frequency has been set to "Static Data", and the dataset is kept here for its historical value. Updated on 8/14/2024.
Adults who are not overweight or obese: This indicator shows the percentage of adults who are not overweight or obese. In Maryland in 2015, of adults considered obese, 52% had high blood pressure, 44% had high cholesterol, and 21% had diabetes. A healthy weight can aid in the control of these conditions if they develop.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half of the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular disease can aid decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research aims to pinpoint the most relevant risk factors for heart disease and to predict the overall risk using logistic regression, a decision tree classifier, a random forest classifier, and various boosting techniques. The dataset is publicly available on the Kaggle website and comes from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset provides the patients’ information. It includes over 4,000 records and 15 attributes.
Variables: Each attribute is a potential risk factor. The attributes include demographic, behavioural, and medical risk factors.
Demographic:
sex: male or female (Nominal)
age: age of the patient (Continuous; although the recorded ages have been truncated to whole numbers, the concept of age is continuous)
Behavioural:
currentSmoker: whether or not the patient is a current smoker (Nominal)
cigsPerDay: the number of cigarettes that the person smoked on average in one day (can be considered continuous, as one can smoke any number of cigarettes, even half a cigarette)
Medical (history):
BPMeds: whether or not the patient was on blood pressure medication (Nominal)
prevalentStroke: whether or not the patient had previously had a stroke (Nominal)
prevalentHyp: whether or not the patient was hypertensive (Nominal)
diabetes: whether or not the patient had diabetes (Nominal)
Medical (current):
totChol: total cholesterol level (Continuous)
sysBP: systolic blood pressure (Continuous)
diaBP: diastolic blood pressure (Continuous)
BMI: Body Mass Index (Continuous)
heartRate: heart rate (Continuous; in medical research, variables such as heart rate, though in fact discrete, are considered continuous because of the large number of possible values)
glucose: glucose level (Continuous)
Predict variable (desired target):
10-year risk of coronary heart disease (CHD) (binary: “1” means “Yes”, “0” means “No”)
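The description does not give the training setup, so the following is only a from-scratch sketch of the first technique it names: logistic regression fitted by batch gradient descent on the mean log-loss, in pure Python. The two features stand in for already-standardized `age` and `sysBP`, and the six rows are toy values, NOT Framingham records. In practice a library implementation (for example scikit-learn's `LogisticRegression`) would be used, with the imbalance in the CHD target handled explicitly.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (10-year CHD = 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, already-standardized values for (age, sysBP); NOT Framingham data
X = [[-1.5, -1.0], [-1.0, -0.5], [-0.5, -1.2], [0.5, 0.8], [1.0, 1.2], [1.5, 0.6]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, predict_proba(w, b, [1.2, 1.0]))
```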
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
A comprehensive dataset characterizing healthy research volunteers in terms of clinical assessments, mood-related psychometrics, cognitive function neuropsychological tests, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).
In addition, blood samples are currently banked for future genetic analysis. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format. In addition, task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unprecedented in its depth of characterization of a healthy population and will allow a wide array of investigations into normal cognition and mood regulation.
This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.
This release includes data collected between 2020-06-03 (cut-off date for v1.0.0) and 2024-04-01. Notable changes in this release:
The visit and age_at_visit columns were added to the phenotype files to distinguish between visits and the intervals between them.
See the CHANGES file for the complete version-wise changelog.
To be eligible for the study, participants need to be medically healthy adults over 18 years of age with the ability to read, speak, and understand English. All participants provided electronic informed consent for online pre-screening, and written informed consent for all other procedures. Participants with a history of mental illness or suicidal or self-injury thoughts or behavior are excluded. Additional exclusion criteria include current illicit drug use, abnormal medical exam, and less than an 8th grade education or IQ below 70. Current NIMH employees and first-degree relatives of NIMH employees are prohibited from participating. Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.
All potential volunteers visit the study website, check a box indicating consent, and fill out preliminary screening questionnaires. The questionnaires include basic demographics, the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0), the DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure, the DSM-5 Level 2 Cross-Cutting Symptom Measure - Substance Use, the Alcohol Use Disorders Identification Test (AUDIT), the Edinburgh Handedness Inventory, and a brief clinical history checklist. The WHODAS 2.0 is a 15-item questionnaire that assesses overall general health and disability, with 14 items distributed over 6 domains: cognition, mobility, self-care, “getting along”, life activities, and participation. The DSM-5 Level 1 cross-cutting measure uses 23 items to assess symptoms across diagnoses, although an item regarding self-injurious behavior was removed from the online self-report version. The DSM-5 Level 2 cross-cutting measure is adapted from the NIDA ASSIST measure and contains 15 items to assess use of both illicit drugs and prescription drugs without a doctor’s prescription. The AUDIT is a 10-item screening assessment used to detect harmful levels of alcohol consumption, and the Edinburgh Handedness Inventory is a systematic assessment of handedness. These online results do not contain any personally identifiable information (PII). At the conclusion of the questionnaires, participants are prompted to send an email to the study team. The study team reviews these results and determines whether the participant is appropriate for an in-person interview.
Participants who meet all inclusion criteria are scheduled for an in-person screening visit to determine if there are any further exclusions to participation. At this visit, participants receive a History and Physical exam, the Structured Clinical Interview for DSM-5 Disorders (SCID-5), the Beck Depression Inventory-II (BDI-II), the Beck Anxiety Inventory (BAI), and the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The purpose of these cognitive and psychometric tests is twofold. First, these measures are designed to provide a sensitive test of psychopathology. Second, they provide a comprehensive picture of cognitive functioning, including mood regulation. The SCID-5 is a structured interview, administered by a clinician, that establishes the absence of any DSM-5 axis I disorder. The KBIT-2 is a brief (20-minute) assessment of intellectual functioning administered by a trained examiner. It has three subtests: verbal knowledge, riddles, and matrices.
Biological and physiological measures are acquired, including blood pressure, pulse, weight, height, and BMI. Blood and urine samples are taken, and a complete blood count, acute care panel, hepatic panel, thyroid stimulating hormone, viral markers (HCV, HBV, HIV), C-reactive protein, creatine kinase, urine drug screen, and urine pregnancy tests are performed. In addition, three tubes of blood are collected and banked for future analysis, including genetic testing.
Participants were given the option to enroll in magnetic resonance imaging (MRI) and magnetoencephalography (MEG) studies.
On the same visit as the MRI scan, participants are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks assess attention and executive functioning (Flanker Inhibitory Control and Attention Task), executive functioning (Dimensional Change Card Sort Task), episodic memory (Picture Sequence Memory Task), and working memory (List Sorting Working Memory Task). The MRI protocol used was initially based on the ADNI-3 basic protocol, but was later modified to include portions of the ABCD protocol in the following manner:
The optional MEG studies were added to the protocol approximately one year after the study was initiated, so there are relatively fewer MEG recordings than MRI scans in the dataset. MEG studies are performed on a 275-channel CTF MEG system. The position of the head was localized at the beginning and end of the recording using three fiducial coils. These coils were placed 1.5 cm above the nasion, and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For some participants, photographs were taken of the three coils and used to mark the points on the T1-weighted structural MRI scan for co-registration. For the remaining participants, a BrainSight neuro-navigation unit was used to co-register the MRI, anatomical fiducials, and localizer coils directly prior to MEG data acquisition.
NOTE: In release 2.0 of the dataset, two measures, the Brief Trauma Questionnaire (BTQ) and the Big Five personality survey, were added to the online screening questionnaires. Also, for the in-person screening visit, the Beck Anxiety Inventory (BAI) and Beck Depression Inventory-II (BDI-II) were replaced with the Generalized Anxiety Disorder-7 (GAD7) and Patient Health Questionnaire-9 (PHQ9) surveys, respectively. The Perceived Health rating survey was discontinued.
| Survey or Test | BIDS TSV Name |
|---|---|
| Alcohol Use Disorders Identification Test (AUDIT) | audit.tsv |
| Brief Trauma Questionnaire (BTQ) | btq.tsv |
| Big-Five Personality | big_five_personality.tsv |
| Demographics | demographics.tsv |
| Drug Use Questionnaire |
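The survey files listed in the table are BIDS tabular files, which are tab-separated with a header row and use `n/a` for missing values. The sketch below parses such a file with the standard `csv` module and groups rows by `participant_id`, so that each participant's visits (see the `visit` column added in this release) stay together. The sample rows and the `audit_total` column are hypothetical, and the usual on-disk location (for example `phenotype/audit.tsv`) is an assumption based on BIDS conventions.

```python
import csv
from collections import defaultdict


def parse_bids_tsv(lines):
    """Parse BIDS-style tabular data: tab-separated, header row first, 'n/a' marks missing values."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        {key: (None if value == "n/a" else value) for key, value in row.items()}
        for row in reader
    ]


def read_bids_tsv(path):
    """Read one of the TSV files from disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_bids_tsv(handle)


def rows_by_participant(rows):
    """Group rows by the BIDS participant_id column so each participant's visits stay together."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["participant_id"]].append(row)
    return dict(grouped)


# Hypothetical content; check real column names (beyond participant_id) in the files themselves.
sample = [
    "participant_id\tvisit\taudit_total\n",
    "sub-01\t1\t3\n",
    "sub-01\t2\tn/a\n",
    "sub-02\t1\t5\n",
]
print(rows_by_participant(parse_bids_tsv(sample)))
```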
The 2013 NDHS is part of the worldwide Demographic and Health Surveys (DHS) programme funded by the United States Agency for International Development (USAID). DHS surveys are designed to collect data on fertility, family planning, and maternal and child health; assist countries in monitoring changes in population, health, and nutrition; and provide an international database that can be used by researchers investigating topics related to population, health, and nutrition.
The overall objective of the survey is to provide demographic, socioeconomic, and health data necessary for policymaking, planning, monitoring, and evaluation of national health and population programmes. In addition, the survey measured the prevalence of anaemia, HIV, high blood glucose, and high blood pressure among adult women and men; assessed the prevalence of anaemia among children age 6-59 months; and collected anthropometric measurements to assess the nutritional status of women, men, and children.
A long-term objective of the survey is to strengthen the technical capacity of local organizations to plan, conduct, process, and analyse data from complex national population and health surveys. At the global level, the 2013 NDHS data are comparable with those from a number of DHS surveys conducted in other developing countries. The 2013 NDHS adds to the vast and growing international database on demographic and health-related variables.
National coverage
Sample survey data [ssd]
Sample Design
The primary focus of the 2013 NDHS was to provide estimates of key population and health indicators, including fertility and mortality rates, for the country as a whole and for urban and rural areas. In addition, the sample was designed to provide estimates of most key variables for the 13 administrative regions.
Each of the administrative regions is subdivided into a number of constituencies (with an overall total of 107 constituencies). Each constituency is further subdivided into lower level administrative units. An enumeration area (EA) is the smallest identifiable entity without administrative specification, numbered sequentially within each constituency. Each EA is classified as urban or rural. The sampling frame used for the 2013 NDHS was the preliminary frame of the 2011 Namibia Population and Housing Census (NSA, 2013a). The sampling frame was a complete list of all EAs covering the whole country. Each EA is a geographical area covering an adequate number of households to serve as a counting unit for the population census. In rural areas, an EA is a natural village, part of a large village, or a group of small villages; in urban areas, an EA is usually a city block. The 2011 population census also produced a digitised map for each of the EAs that served as the means of identifying these areas.
The sample for the 2013 NDHS was a stratified sample selected in two stages. In the first stage, 554 EAs (269 in urban areas and 285 in rural areas) were selected with a stratified probability proportional to size selection from the sampling frame. The size of an EA is defined according to the number of households residing in the EA, as recorded in the 2011 Population and Housing Census. Stratification was achieved by separating every region into urban and rural areas. Therefore, the 13 regions were stratified into 26 sampling strata (13 rural strata and 13 urban strata). Samples were selected independently in every stratum, with a predetermined number of EAs selected. A complete household listing and mapping operation was carried out in all selected clusters. In the second stage, a fixed number of 20 households was selected in every urban and rural cluster according to equal probability systematic sampling.
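The two-stage selection described above can be sketched as follows, as an illustration of the general technique rather than the NDHS's actual selection programs. The first function draws clusters by systematic probability-proportional-to-size (PPS) sampling from a list of size measures (here, hypothetical household counts standing in for the 2011 census counts within one stratum); the second draws a fixed number of households from a cluster's listing by equal-probability systematic sampling.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(sizes, n, rng):
    """Stage 1: choose n unit indices with probability proportional to size, systematically.

    A unit larger than the sampling interval can be hit more than once;
    real designs usually treat such units as certainty selections.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval
    # Unit i covers the cumulative range [cumulative[i-1], cumulative[i])
    return [bisect_right(cumulative, start + k * interval) for k in range(n)]


def systematic_households(n_listed, take, rng):
    """Stage 2: equal-probability systematic sample of `take` households from a listing."""
    interval = n_listed / take
    start = rng.random() * interval
    return [int(start + k * interval) for k in range(take)]


rng = random.Random(2013)
ea_households = [120, 85, 240, 60, 150, 95, 180, 70]   # hypothetical EA sizes in one stratum
print(pps_systematic(ea_households, 3, rng))
print(systematic_households(130, 20, rng))
```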
Due to the non-proportional allocation of the sample to the different regions and the possible differences in response rates, sampling weights are required for any analysis using the 2013 NDHS data to ensure the representativeness of the survey results at the national as well as the regional level. Since the 2013 NDHS sample was a two-stage stratified cluster sample, sampling probabilities were calculated separately for each sampling stage and for each cluster.
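For a two-stage design like this one, a household's design weight is commonly taken as the inverse of the product of its stage-wise selection probabilities. The sketch below assumes that form, with a_h EAs selected in stratum h, M_hi households in EA i according to the census frame, M_h the stratum total, b_hi = 20 households selected, and L_hi households found in the listing operation. The published NDHS weights may include further adjustments (for example, for non-response), so consult Appendix A for the survey's actual procedure; the numbers in the example are hypothetical.

```python
def design_weight(n_eas_in_stratum, ea_size, stratum_size, households_taken, households_listed):
    """Inverse of the two-stage selection probability for a household in one cluster.

    Stage 1 (PPS):        P1 = a_h * M_hi / M_h   (assumed <= 1)
    Stage 2 (systematic): P2 = b_hi / L_hi
    """
    p1 = n_eas_in_stratum * ea_size / stratum_size
    p2 = households_taken / households_listed
    return 1.0 / (p1 * p2)


# Hypothetical cluster: 2 EAs drawn from a stratum of 1,000 census households,
# this EA had 100 census households, 110 were listed, and 20 were selected.
print(design_weight(2, 100, 1000, 20, 110))
```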
See Appendix A in the final report for details
Face-to-face [f2f]
Three questionnaires were administered in the 2013 NDHS: the Household Questionnaire, the Woman’s Questionnaire, and the Man’s Questionnaire. These questionnaires were adapted from the standard DHS6 core questionnaires at a series of meetings with various stakeholders from government ministries and agencies, nongovernmental organisations, and international donors, to reflect the population and health issues relevant to Namibia. The final draft of each questionnaire was discussed at a questionnaire design workshop organised by the MoHSS from September 25-28, 2012, in Windhoek. The questionnaires were then translated from English into the six main local languages (Afrikaans, Rukwangali, Oshiwambo, Damara/Nama, Otjiherero, and Silozi) and back-translated into English. The questionnaires were finalised after the pretest, which took place from February 11-25, 2013.
The Household Questionnaire was used to list all usual household members as well as visitors in the selected households. Basic information was collected on the characteristics of each person listed, including age, sex, education, and relationship to the head of the household. For children under age 18, parents’ survival status was determined. In addition, the Household Questionnaire included questions on knowledge of malaria and use of mosquito nets by household members, along with questions regarding health expenditures. The Household Questionnaire was used to identify women and men who were eligible for the individual interview and the interview on domestic violence. The questionnaire also collected information on characteristics of the household’s dwelling unit, such as source of water, type of toilet facilities, materials used for the floor of the house, and ownership of various durable goods. The results of tests assessing iodine levels were recorded as well.
In half of the survey households (the same households selected for the male survey), the Household Questionnaire was also used to record information on anthropometry and biomarker data collected from eligible respondents, as follows:
• All eligible women and men age 15-64 were measured, weighed, and tested for anaemia and HIV.
• All eligible women and men age 35-64 had their blood pressure and blood glucose measured.
• All children age 0 to 59 months were measured and weighed.
• All children age 6 to 59 months were tested for anaemia.
The Woman’s Questionnaire was also used to collect information from women age 50-64 living in half of the selected survey households on background characteristics, marriage and sexual activity, women’s work and husbands’ background characteristics, awareness and behaviour regarding AIDS and other STIs, and other health issues.
The Man’s Questionnaire was administered to all men age 15-64 living in half of the selected survey households. The Man’s Questionnaire collected much of the same information as the Woman’s Questionnaire but was shorter because it did not contain a detailed reproductive history or questions on maternal and child health or nutrition.
CSPro—a Windows-based integrated census and survey processing system that combines and replaces the ISSA and IMPS packages—was used for entry, editing, and tabulation of the NDHS data. Prior to data entry, a practical training session was provided by ICF International to all data entry staff. A total of 28 data processing personnel, including 17 data entry operators, one questionnaire administrator, two office editors, three secondary editors, two network technicians, two data processing supervisors, and one coordinator, were recruited and trained on administration of questionnaires and coding, data entry and verification, correction of questionnaires and provision of feedback, and secondary editing. NDHS data processing was formally launched during the week of June 22, 2013, at the National Statistics Agency Data Processing Centre in Windhoek. The data entry and editing phase of the survey was completed in January 2014.
A total of 11,004 households were selected for the sample, of which 10,165 were found to be occupied during data collection. Of the occupied households, 9,849 were successfully interviewed, yielding a household response rate of 97 percent.
In these households, 9,940 women age 15-49 were identified as eligible for the individual interview. Interviews were completed with 9,176 women, yielding a response rate of 92 percent. In addition, in half of these households, 842 women age 50-64 were successfully interviewed; in this group of women, the response rate was 91 percent.
Of the 5,271 eligible men identified in the selected subsample of households, 4,481 (85 percent) were successfully interviewed.
Response rates were higher in rural than in urban areas, with the rural-urban difference more marked among men than among women.
The estimates from a sample survey are affected by two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview
The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart diseases. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research intends to pinpoint the most relevant risk factors of heart disease and to predict the overall risk using logistic regression.
Data Preparation
The task is to predict whether a patient has a 10-year risk of coronary heart disease (CHD). Participants are also asked to create data visualizations to gain actionable insights about the topic.
The dataset is publicly available on the Kaggle website, and it is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset provides the patients’ information. It includes over 4,000 records and 15 attributes.
Variables
Each attribute is a potential risk factor. There are demographic, behavioral, and medical risk factors.
Demographic:
• Sex: male or female ("M" or "F")
• Age: age of the patient (continuous; although the recorded ages have been truncated to whole numbers, the concept of age is continuous)
Behavioral:
• is_smoking: whether or not the patient is a current smoker ("YES" or "NO")
• Cigs Per Day: the number of cigarettes that the person smoked on average in one day (can be considered continuous, as one can have any number of cigarettes, even half a cigarette)
Medical (history):
• BP Meds: whether or not the patient was on blood pressure medication (nominal)
• Prevalent Stroke: whether or not the patient had previously had a stroke (nominal)
• Prevalent Hyp: whether or not the patient was hypertensive (nominal)
• Diabetes: whether or not the patient had diabetes (nominal)
Medical (current):
• Tot Chol: total cholesterol level (continuous)
• Sys BP: systolic blood pressure (continuous)
• Dia BP: diastolic blood pressure (continuous)
• BMI: Body Mass Index (continuous)
• Heart Rate: heart rate (continuous; in medical research, variables such as heart rate, though in fact discrete, are considered continuous because of the large number of possible values)
• Glucose: glucose level (continuous)
Predict variable (desired target):
• 10-year risk of coronary heart disease (CHD) (binary: “1” means “Yes”, “0” means “No”)
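In this variant of the data, Sex and is_smoking are stored as text ("M"/"F", "YES"/"NO"), while logistic regression needs numeric inputs. The sketch below maps them to 0/1 in plain Python. The column keys `sex` and `is_smoking` are hypothetical, modelled on the variable list above; check them against the actual CSV header before use.

```python
# HYPOTHETICAL column names based on the variable list above; verify them
# against the real CSV header before use.
TEXT_CODES = {
    "sex": {"M": 1, "F": 0},            # 1 = male, i.e. a sex_male dummy
    "is_smoking": {"YES": 1, "NO": 0},  # 1 = current smoker
}


def encode_record(record):
    """Return a copy of `record` with text-coded binary fields mapped to 0/1.

    Missing values (None) are left as None so they can be dropped or imputed
    separately. An unexpected code (e.g. "U") raises KeyError on purpose.
    """
    encoded = dict(record)
    for column, mapping in TEXT_CODES.items():
        value = encoded.get(column)
        if value is not None:
            encoded[column] = mapping[str(value).strip().upper()]
    return encoded


print(encode_record({"sex": "F", "is_smoking": "YES", "age": 52}))
```

Failing loudly on unknown codes is a deliberate choice: silently mapping them to 0 would bias the smoker and sex dummies.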
SUMMARY
This table contains data about women ages 15 to 50, pregnant people, infants, children, and youths up to age 24. It contains information about a wide range of health topics, including medical conditions, nutrition, dehydration, oral health, mental health, safety, access to health care, and basic needs, like housing. It also includes local, county-level prevalence rates, time trends, and health disparities for national public health priorities, including preterm birth, infant death, childhood obesity, adolescent depression and substance use, and high blood pressure, diabetes, and kidney disease in young adults. The population data are from the 2023-2024 San Francisco Maternal, Child, and Adolescent Health needs assessment and are published on the Open Data Portal to share with community partners, plan services, and promote health. For more information, see the Maternal, Child, and Adolescent Health Homepage and the Maternal, Child, and Adolescent Health Reports.
HOW THE DATASET IS CREATED
The Maternal, Child, and Adolescent Health (MCAH) Needs Assessment for San Francisco included a review of a wide range of citywide population data covering a ten-year span, from 2014 to 2023. Data from over 83,000 birth records, 59,000 death records, 261,000 emergency room visits, 66,000 hospital admissions, and 90,000 newborn screening discharges were gathered, along with citywide data from child welfare records, health screenings in childcare and schools, DMV records of first-time drivers, school surveys, and a state-run mailed survey of recent births (California Department of Public Health MIHA survey). The datasets provided information about approximately 700 health conditions. Each health condition was described in terms of the number of people affected (cases) and the rate affected, stratified by age, sex, race-ethnicity, insurance status, zip code, and time period.
Rates were calculated by dividing the number of people or events by the population group estimate (e.g., total births or census estimates), then multiplying by 100 or 1,000 depending on the measure. Each rate was presented with its 95% confidence interval to help users compare any two rates, either between groups or over time. Two rates differ “significantly” if their 95% confidence intervals do not overlap. The present dataset summarizes the group-level results for any age-, sex-, race-, insurance-, zip code-, and/or period-specific group that included at least 20 people or cases. Causes of death, health conditions that affected over 1,000 people in the time frame, problems that got worse over time, and health disparities by insurance, race-ethnicity, and/or zip code were flagged for the MCAH Needs Assessment.
UPDATE PROCESS
The dataset will be updated manually twice a year, each December and June.
HOW TO USE THIS DATASET
Population data from the MCAH needs assessment are shared in several formats, including aggregated datasets on DataSF.gov, downloadable PDF summary reports by age group, interactive online visualizations, data tables, trend graphs, and maps. Information about each variable is available in a linked data dictionary. The definition of each numerator and denominator depends on data source, life stage, and time. Health conditions may not be directly comparable across life stages if the numerator definition includes age- or pregnancy-specific diagnosis codes (e.g., diabetes hospitalization). For small groups or rare conditions, consider combining time periods and/or groups. Data are suppressed if fewer than 20 cases occurred in the group and period. Group-specific rates are available if the matched group-specific census estimates (denominators) are available. Census estim
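The rate, confidence-interval, overlap, and suppression rules described above can be sketched in a few lines of plain Python. The description does not say which confidence-interval method was used, so this sketch assumes a normal approximation to a Poisson count; the group counts are hypothetical.

```python
import math

MIN_CASES = 20  # groups with fewer cases are suppressed, per the description


def rate_with_ci(cases, population, per=1000, z=1.96):
    """Return (rate, lower, upper) per `per` people, or None if suppressed.

    ASSUMPTION: the 95% CI uses a normal approximation to a Poisson count;
    the dataset description does not state its CI method.
    """
    if cases < MIN_CASES:
        return None
    rate = cases / population * per
    se = math.sqrt(cases) / population * per  # SE of a Poisson-based rate
    return rate, max(0.0, rate - z * se), rate + z * se


def differ_significantly(ci_a, ci_b):
    """True if two (rate, lower, upper) tuples have non-overlapping CIs."""
    _, lo_a, hi_a = ci_a
    _, lo_b, hi_b = ci_b
    return hi_a < lo_b or hi_b < lo_a


# Hypothetical counts for two groups of 10,000 births each
group_a = rate_with_ci(50, 10_000)
group_b = rate_with_ci(120, 10_000)
print(group_a, group_b, differ_significantly(group_a, group_b))
print(rate_with_ci(12, 10_000))  # fewer than 20 cases -> suppressed (None)
```

Note that the non-overlap rule is conservative: two intervals can overlap slightly even when a direct test of the difference would be significant.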
LOGISTIC REGRESSION - HEART DISEASE PREDICTION
Introduction
The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart diseases. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research intends to pinpoint the most relevant risk factors of heart disease and to predict the overall risk using logistic regression.
Data Preparation
Source
The dataset is publicly available on the Kaggle website, and it is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset provides the patients’ information. It includes over 4,000 records and 15 attributes.
Variables
Each attribute is a potential risk factor. There are demographic, behavioral, and medical risk factors.
Demographic:
• Sex: male or female (nominal)
• Age: age of the patient (continuous; although the recorded ages have been truncated to whole numbers, the concept of age is continuous)
Behavioral:
• Current Smoker: whether or not the patient is a current smoker (nominal)
• Cigs Per Day: the number of cigarettes that the person smoked on average in one day (can be considered continuous, as one can have any number of cigarettes, even half a cigarette)
Medical (history):
• BP Meds: whether or not the patient was on blood pressure medication (nominal)
• Prevalent Stroke: whether or not the patient had previously had a stroke (nominal)
• Prevalent Hyp: whether or not the patient was hypertensive (nominal)
• Diabetes: whether or not the patient had diabetes (nominal)
Medical (current):
• Tot Chol: total cholesterol level (continuous)
• Sys BP: systolic blood pressure (continuous)
• Dia BP: diastolic blood pressure (continuous)
• BMI: Body Mass Index (continuous)
• Heart Rate: heart rate (continuous; in medical research, variables such as heart rate, though in fact discrete, are considered continuous because of the large number of possible values)
• Glucose: glucose level (continuous)
Predict variable (desired target):
• 10-year risk of coronary heart disease (CHD) (binary: “1” means “Yes”, “0” means “No”)
Logistic Regression
Logistic regression is a type of regression analysis in statistics used to predict the outcome of a categorical dependent variable from a set of predictor (independent) variables. In binary logistic regression, as used here, the dependent variable is binary. Logistic regression is mainly used for prediction and for calculating the probability of success. The results above show that some of the attributes have P-values higher than the preferred alpha (5%) and thus show no statistically significant relationship with the probability of heart disease.
The backward elimination approach is used here: the attribute with the highest P-value is removed, the regression is run again, and this repeats one attribute at a time until all remaining attributes have P-values less than 0.05.
Feature Selection: Backward elimination (P-value approach)
Logistic regression equation
$$P = \frac{e^{\beta_0 + \beta_1 X_1}}{1 + e^{\beta_0 + \beta_1 X_1}}$$
When all features are plugged in:
$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{Sex\_male} + \beta_2 \cdot \text{age} + \beta_3 \cdot \text{cigsPerDay} + \beta_4 \cdot \text{totChol} + \beta_5 \cdot \text{sysBP} + \beta_6 \cdot \text{glucose}$$
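The backward elimination loop described above can be sketched as follows. The listing does not say which library the notebook used, so this sketch fits the logistic regression by hand with Newton-Raphson in NumPy and uses two-sided Wald p-values (in practice a library such as statsmodels would usually be used). The data are synthetic, not the Framingham records: the outcome depends on `age` only, and `noise` is unrelated to it.

```python
import math

import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson.

    X must already include an intercept column. Returns the coefficients and
    their standard errors (square roots of the diagonal of the inverse
    Fisher information at the final estimate).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))  # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def wald_p_values(beta, se):
    """Two-sided Wald test: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)])


def backward_elimination(X, y, names, alpha=0.05):
    """Repeatedly drop the predictor with the highest p-value and refit.

    Column 0 of X is the intercept; it is always kept and never tested.
    Returns {predictor name: p-value} for the predictors that survive.
    """
    keep = list(range(1, X.shape[1]))
    pvals = []
    while keep:
        beta, se = fit_logit(X[:, [0] + keep], y)
        pvals = wald_p_values(beta, se)[1:]  # ignore the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break  # every remaining p-value is below alpha
        keep.pop(worst)
    return {names[i]: float(p) for i, p in zip(keep, pvals)}


# Synthetic illustration only (NOT the Framingham data)
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(50.0, 8.0, n)
noise = rng.normal(0.0, 1.0, n)
true_p = 1.0 / (1.0 + np.exp(-(-6.0 + 0.1 * age)))
y = (rng.random(n) < true_p).astype(float)
X = np.column_stack([np.ones(n), age, noise])

selected = backward_elimination(X, y, ["const", "age", "noise"])
print(selected)
```

Removing one attribute per pass matters: p-values change after each refit, so dropping every non-significant attribute at once can discard a predictor that would have become significant.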
Interpreting the results: Odds Ratio, Confidence Intervals and P-values
• This fitted model shows that, holding all other features constant, the odds of being diagnosed with heart disease for males (sex_male = 1) over that of females (sex_male = 0) is exp(0.5815) ≈ 1.7887. In terms of percent change, the odds for males are about 78.9% higher than the odds for females.
• The coefficient for age says that, holding all others constant, we will see a 7% increase in the odds of being diagnosed with CHD for a one-year increase in age, since exp(0.0655) ≈ 1.0677.
• Similarly, with every extra cigarette one smokes, there is a 2% increase in the odds of CHD.
• For total cholesterol level and glucose level, there is no significant change.
• There is a 1.7% increase in the odds for every unit (mm Hg) increase in systolic blood pressure.
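The odds-ratio arithmetic in the interpretation above can be reproduced from the two coefficients it quotes (0.5815 for sex_male, 0.0655 for age); the other coefficients are not given in the text, so they are left out. This is a minimal plain-Python sketch.

```python
import math

# Log-odds coefficients quoted in the interpretation above
COEFS = {"sex_male": 0.5815, "age": 0.0655}


def odds_ratio(coef, units=1.0):
    """Multiplicative change in the odds for a `units`-unit increase."""
    return math.exp(coef * units)


def percent_change(coef, units=1.0):
    """Percent change in the odds: (odds ratio - 1) * 100."""
    return (odds_ratio(coef, units) - 1.0) * 100.0


for name, coef in COEFS.items():
    print(f"{name}: OR={odds_ratio(coef):.4f}, change={percent_change(coef):.1f}%")

# Odds ratios compound multiplicatively: a 10-year age difference
print(f"age +10 years: OR={odds_ratio(COEFS['age'], 10):.3f}")
```

Because the effect is multiplicative on the odds, a 10-year age difference gives exp(10 × 0.0655), roughly a 93% increase, not 10 × 7%.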
Model Evaluation - Statistics
From the above statistics it is clear that the model is much more specific than sensitive: the negative values are predicted more accurately than the positives.
Predicted probabilities of 0 (No Coronary Heart Disease) and 1 (Coronary Heart Disease: Yes) for the test data, with a default classification threshold of 0.5.
Lower the threshold
Since the model is predicting heart disease, too many type II errors are not advisable. A False Negative (ignoring the probability of disease when there actu...
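The effect of lowering the classification threshold can be sketched by recomputing sensitivity and specificity at two thresholds. The predicted probabilities and labels below are made up for illustration, and the sketch assumes a case is predicted positive when its probability is at or above the threshold.

```python
def confusion_counts(probs, labels, threshold):
    """Count (TP, FP, TN, FN), predicting 1 when probability >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(probs, labels, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical predicted probabilities and true labels (1 = CHD)
probs = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.15, 0.30, 0.40, 0.70]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
for t in (0.5, 0.25):
    sens, spec = sensitivity_specificity(probs, labels, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up numbers, lowering the threshold from 0.5 to 0.25 raises sensitivity from 0.25 to 0.75 while specificity falls from about 0.83 to 0.50. This is the trade of fewer false negatives (type II errors) for more false positives.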
Background
Cardiovascular disease (CVD) is the leading cause of mortality in India. Yet evidence on the CVD risk of India’s population is limited. To inform health system planning and effective targeting of interventions, this study aimed to determine how CVD risk, and the factors that determine risk, varies among states in India, by rural–urban location, and by individual-level sociodemographic characteristics.
Methods and findings
We used 2 large household surveys carried out between 2012 and 2014, which included a sample of 797,540 adults aged 30 to 74 years across India. The main outcome variable was the predicted 10-year risk of a CVD event as calculated with the Framingham risk score. The Harvard–NHANES, Globorisk, and WHO–ISH scores were used in secondary analyses. CVD risk and the prevalence of CVD risk factors were examined by state, rural–urban residence, age, sex, household wealth, and education. Mean CVD risk varied from 13.2% (95% CI: 12.7%–13.6%) in Jharkhand to 19.5% (95% CI: 19.1%–19.9%) in Kerala. CVD risk tended to be highest in North, Northeast, and South India. District-level wealth quintile (based on median household wealth in a district) and urbanization were both positively associated with CVD risk. Similarly, household wealth quintile and living in an urban area were positively associated with CVD risk among both sexes, but the associations were stronger among women than men. Smoking was more prevalent in poorer household wealth quintiles and in rural areas, whereas body mass index, high blood glucose, and systolic blood pressure were positively associated with household wealth and urban location. Men had a substantially higher (age-standardized) smoking prevalence (26.2% [95% CI: 25.7%–26.7%] versus 1.8% [95% CI: 1.7%–1.9%]) and mean systolic blood pressure (126.9 mm Hg [95% CI: 126.7–127.1] versus 124.3 mm Hg [95% CI: 124.1–124.5]) than women.
Important limitations of this analysis are the high proportion of missing values (27.1%) in the main outcome variable, assessment of diabetes through a 1-time capillary blood glucose measurement, and the inability to exclude participants with a current or previous CVD event.
Conclusions
This study identified substantial variation in CVD risk among states and sociodemographic groups in India, findings that can facilitate effective targeting of CVD programs to those most at risk and most in need. While the CVD risk scores used have not been validated in South Asian populations, the patterns of variation in CVD risk among the Indian population were similar across all 4 risk scoring systems.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
The linkage between psoriasis and hypertension has been established through observational studies. Despite this, a comprehensive assessment of the combined effects of psoriasis and hypertension on all-cause mortality is lacking. The principal aim of the present study is to elucidate the synergistic impact of psoriasis and hypertension on mortality within a representative cohort of adults residing in the United States.
Methods
The analysis was conducted on data from the National Health and Nutrition Examination Survey for two periods: 2003–2006 and 2009–2014. Psoriasis status was determined from self-reported questionnaire data, whereas hypertension was defined by systolic blood pressure ≥ 140 mmHg, diastolic blood pressure ≥ 90 mmHg, a self-reported physician diagnosis, or the use of antihypertensive medication. The association between psoriasis and hypertension was assessed with multivariable logistic regression analyses. Participants’ vital status was followed up until December 31, 2019. A four-level variable combining psoriasis and hypertension status was created, and survival probability was evaluated with Kaplan-Meier curves alongside Cox regression analysis. Hazard ratios (HRs) and their 95% confidence intervals (CIs) were computed to examine the association between psoriasis/hypertension and all-cause mortality.
Results
In total, this study included 19,799 participants, of whom 554 had psoriasis and 7,692 had hypertension. The logistic regression analyses indicated a higher risk of hypertension among individuals with psoriasis than among those without psoriasis. Over a median follow-up of 105 months, 1,845 participants died from any cause.
Compared with individuals with neither hypertension nor psoriasis, those with psoriasis alone had an all-cause mortality HR of 0.73 (95% CI: 0.35–1.53), those with hypertension alone had an HR of 1.78 (95% CI: 1.55–2.04), and those with both psoriasis and hypertension had an HR of 2.33 (95% CI: 1.60–3.40). In an analysis stratified by the presence or absence of psoriasis, hypertension was associated with an elevated risk of all-cause mortality in individuals without psoriasis (HR 1.77, 95% CI: 1.54–2.04). Notably, this association was stronger among individuals with psoriasis (HR 3.23, 95% CI: 1.47–7.13).
Conclusions
Our investigation demonstrated a noteworthy positive association between psoriasis, hypertension, and all-cause mortality. These findings indicate that individuals who have both psoriasis and hypertension face an increased likelihood of mortality.
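The survival probabilities in this study were estimated with Kaplan-Meier curves. The sketch below is a plain-Python product-limit estimator, not the authors' code, applied to a tiny hypothetical follow-up table. It assumes the usual convention that a participant censored at time t is still counted as at risk at t.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    events[i] is 1 for an observed death and 0 for a censored observation.
    Subjects censored at time t are treated as still at risk at t.
    Returns a list of (time, survival probability) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaves the risk set after their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up in months (1 = died, 0 = censored)
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In a study like this one, a curve would be computed separately for each level of the four-level psoriasis/hypertension variable; the hazard ratios themselves come from the Cox model, which this sketch does not cover.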
Social vulnerability is defined as the disproportionate susceptibility of some social groups to the impacts of hazards, including death, injury, loss, or disruption of livelihood. In this dataset from Climate Ready Boston, groups identified as being more vulnerable are older adults, children, people of color, people with limited English proficiency, people with low or no incomes, people with disabilities, and people with medical illnesses.
Source: The analysis and definitions used in Climate Ready Boston (2016) are based on "A framework to understand the relationship between social factors that reduce resilience in cities: Application to the City of Boston," published in 2015 in the International Journal of Disaster Risk Reduction by Atyia Martin, Northeastern University.
Population Definitions:
Older Adults: Older adults (those over age 65) have physical vulnerabilities in a climate event; they suffer from higher rates of medical illness than the rest of the population and can have some functional limitations in an evacuation scenario, as well as when preparing for and recovering from a disaster. Furthermore, older adults are physically more vulnerable to the impacts of extreme heat. Beyond the physical risk, older adults are more likely to be socially isolated. Without an appropriate support network, an initially small risk could be exacerbated if an older adult is not able to get help.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for population over 65 years of age.
Attribute label: OlderAdult
Children: Families with children require additional resources in a climate event. When school is cancelled, parents need alternative childcare options, which can mean missing work.
Children are especially vulnerable to extreme heat and stress following a natural disaster.
Data source: 2010 American Community Survey 5-year Estimates (ACS) data by census tract for population under 5 years of age.
Attribute label: TotChild
People of Color: People of color make up a majority (53 percent) of Boston’s population. People of color are more likely to fall into multiple vulnerable groups as well. People of color statistically have lower levels of income and higher levels of poverty than the population at large. People of color, many of whom also have limited English proficiency, may not have ready access in their primary language to information about the dangers of extreme heat or about cooling center resources. This risk from extreme heat can be compounded by the fact that people of color often live in more densely populated urban areas that are at higher risk for heat exposure due to the urban heat island effect.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract: Black, Native American, Asian, Island, Other, Multi, Non-white Hispanics.
Attribute label: POC2
Limited English Proficiency: Without adequate English skills, residents can miss crucial information on how to prepare for hazards. Cultural practices for information sharing, for example, may focus on word-of-mouth communication. In a flood event, residents can also face challenges communicating with emergency response personnel. If residents are more socially isolated, they may be less likely to hear about upcoming events.
Finally, immigrants, especially ones who are undocumented, may be reluctant to use government services out of fear of deportation or general distrust of the government or emergency personnel.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract, defined as speaks English only or speaks English “very well”.
Attribute label: LEP
Low to No Income: A lack of financial resources impacts a household’s ability to prepare for a disaster event and to support friends and neighborhoods. For example, residents without televisions, computers, or data-driven mobile phones may face challenges getting news about hazards or recovery resources. Renters may have trouble finding and paying deposits for replacement housing if their residence is impacted by flooding. Homeowners may be less able to afford insurance that will cover flood damage. Having low or no income can create difficulty evacuating in a disaster event because of a higher reliance on public transportation. If unable to evacuate, residents may be more at risk without supplies to stay in their homes for an extended period of time. Low- and no-income residents can also be more vulnerable to hot weather if running air conditioning or fans puts utility costs out of reach.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for low-to-no income populations. The data represent a calculated field that combines people with incomes below 100% of the poverty level and those at 100–149% of the poverty level.
Attribute label: Low_to_No
People with Disabilities: People with disabilities are among the most vulnerable in an emergency; they sustain disproportionate rates of illness, injury, and death in disaster events.[46] People with disabilities can find it difficult to adequately prepare for a disaster event, including moving to a safer place. They are more likely to be left behind or abandoned during evacuations.
Rescue and relief resources, like emergency transportation or shelters, may not be universally accessible. Research has revealed a historic pattern of discrimination against people with disabilities in times of resource scarcity, like after a major storm and flood.
Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for total civilian non-institutionalized population, including: hearing difficulty, vision difficulty, cognitive difficulty, ambulatory difficulty, self-care difficulty, and independent living difficulty.
Attribute label: TotDis
Medical Illness: Symptoms of existing medical illnesses are often exacerbated by hot temperatures. For example, heat can trigger asthma attacks or increase already high blood pressure due to the stress that high temperatures put on the body. Climate events can interrupt access to normal sources of healthcare and even life-sustaining medication. Special planning is required for people experiencing medical illness. For example, people dependent on dialysis will have different evacuation and care needs than other Boston residents in a climate event.
Data source: Medical illness is a proxy measure based on EASI data accessed through Simply Map. Health data at the local level in Massachusetts are not available at a finer level than zip codes. EASI modeled the health statistics for the U.S. population based upon age, sex, and race probabilities using U.S. Census Bureau data. The probabilities are modeled against the census and current-year and five-year forecasts. Medical illness is the sum of asthma in children, asthma in adults, heart disease, emphysema, bronchitis, cancer, diabetes, kidney disease, and liver disease. A limitation is that these numbers may be over-counted because people may have more than one medical illness. Therefore, the analysis may show more people with medical illness within census tracts than are actually present.
Overall, the analysis was based on the relationship between social factors.
Attribute label: MedIllnes
Other attribute definitions:
GEOID10: Geographic identifier: State Code (25), County Code (025), 2010 Census Tract
AREA_SQFT: Tract area (in square feet)
AREA_ACRES: Tract area (in acres)
POP100_RE: Tract population count
HU100_RE: Tract housing unit count
Name: Boston Neighborhood
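GEOID10 concatenates the state code, county code, and 2010 census tract code into one 11-digit string (2 + 3 + 6 digits). The sketch below splits it back into its parts so tracts can be joined to other ACS tables. The identifier in the example is illustrative only.

```python
def parse_geoid10(geoid):
    """Split an 11-digit 2010 census-tract GEOID into its FIPS components.

    Layout: 2-digit state + 3-digit county + 6-digit tract.
    """
    geoid = str(geoid).strip()
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return {"state": geoid[:2], "county": geoid[2:5], "tract": geoid[5:]}


# Illustrative identifier with state 25 and county 025
print(parse_geoid10("25025010100"))
```

Keep GEOID10 as a string when loading the file. Storing it as an integer drops leading zeros for states with one-digit codes, although Massachusetts (25) is not affected.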
Abbreviations: BP = blood pressure; HDL = high-density lipoprotein; HOMA %B = Homeostasis Model Assessment steady-state beta cell function; HOMA %S = Homeostasis Model Assessment insulin sensitivity; OGTT = oral glucose tolerance test; NIM = not included in model; CVD = cardiovascular disease.
*P<0.05; **P<0.01; ***P<0.001. P values are two-sided.
Models were adjusted for socio-demographic, medical history and smoking, alcohol and dietary behaviour. Please see Table S1 for full list of covariates included in the model for each cardio-metabolic biomarker.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains extensive health information for 2,149 patients, each uniquely identified with IDs ranging from 4751 to 6900. The dataset includes demographic details, lifestyle factors, medical history, clinical measurements, cognitive and functional assessments, symptoms, and a diagnosis of Alzheimer's Disease. The data is ideal for researchers and data scientists looking to explore factors associated with Alzheimer's, develop predictive models, and conduct statistical analyses.
This dataset offers extensive insights into the factors associated with Alzheimer's Disease, including demographic, lifestyle, medical, cognitive, and functional variables. It is ideal for developing predictive models, conducting statistical analyses, and exploring the complex interplay of factors contributing to Alzheimer's Disease.
If you use this dataset in your work, please cite it as follows:
@misc{rabie_el_kharoua_2024,
title={Alzheimer's Disease Dataset},
url={https://www.kaggle.com/dsv/8668279},
DOI={10.34740/KAGGLE/DSV/8668279},
publisher={Kaggle...
In 2023, almost 46 percent of adults in Alabama suffered from hypertension. This statistic depicts the rate of adults suffering from hypertension in the United States in 2023, sorted by state.