31 datasets found
  1. Adults with hypertension in the U.S. by state 2023

    • statista.com
    Updated Sep 15, 2024
    Cite
    Statista (2024). Adults with hypertension in the U.S. by state 2023 [Dataset]. https://www.statista.com/statistics/505995/adults-with-hypertension-in-the-us-by-states/
    Dataset updated: Sep 15, 2024
    Dataset authored and provided by: Statista (http://statista.com/)
    Time period covered: 2023
    Area covered: United States
    Description

    In 2023, almost 46 percent of adults in Alabama had hypertension. This statistic shows the share of adults with hypertension in each U.S. state in 2023.

  2. Heart Attack in Youth Vs Adult in America(State)

    • kaggle.com
    zip
    Updated Jan 5, 2025
    Cite
    Ankush Panday (2025). Heart Attack in Youth Vs Adult in America(State) [Dataset]. https://www.kaggle.com/datasets/ankushpanday1/heart-attack-in-youth-vs-adult-in-americastate
    Available download formats: zip (100,884,848 bytes)
    Dataset updated: Jan 5, 2025
    Authors: Ankush Panday
    License: MIT License (https://opensource.org/licenses/MIT); license information was derived automatically
    Area covered: United States
    Description

    The dataset, "Heart Attack in Youth vs. Adults in America", contains 500,000 synthetic records detailing health, lifestyle, and demographic factors contributing to heart attack risks among youth and adults in the United States. This dataset can help researchers and data enthusiasts analyze patterns, predict risk levels, and understand disparities between age groups and regions in terms of heart health.

    Insights that beginner, intermediate, and advanced users can derive:

    For Beginners:

    Descriptive Statistics:

    Calculate average cholesterol levels or blood pressure for youth vs. adults. Determine the distribution of heart attack risk levels across different states or demographics.
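    As a minimal sketch of these beginner-level descriptive statistics, the Python below computes a per-group mean and a category distribution from plain dictionaries. The column names (age_group, cholesterol, risk) and the four records are hypothetical placeholders, not the dataset's actual schema or values; check the downloaded CSV for the real column names.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key}, skipping rows where the value is missing."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(value_key)
        if value is not None:
            groups[row[group_key]].append(value)
    return {group: mean(values) for group, values in groups.items()}


def distribution(rows, key):
    """Return {category: share of rows} for a categorical column."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical records; column names are illustrative, not the dataset's schema.
rows = [
    {"age_group": "Youth", "cholesterol": 180, "risk": "Low"},
    {"age_group": "Youth", "cholesterol": 200, "risk": "Low"},
    {"age_group": "Adult", "cholesterol": 230, "risk": "High"},
    {"age_group": "Adult", "cholesterol": 250, "risk": "Low"},
]
print(mean_by_group(rows, "age_group", "cholesterol"))  # {'Youth': 190, 'Adult': 240}
print(distribution(rows, "risk"))  # {'Low': 0.75, 'High': 0.25}
```

    The same two helpers work for any numeric column (for example blood pressure) and any categorical grouping (for example state).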

    Data Visualization:

    Visualize the distribution of obesity indices across age groups. Plot the survival rates based on risk levels.

    For Intermediate Users:

    Exploratory Data Analysis (EDA):

    Investigate the correlation between lifestyle factors (e.g., dietary habits, smoking history) and heart attack risk levels. Compare access to healthcare between low-income and high-income groups.

    Predictive Modeling:

    Build a logistic regression or decision tree model to predict high-risk individuals. Use clustering techniques to group individuals based on heart attack risks.

    For Advanced Users:

    Deep Analysis and Insights:

    Perform a time series analysis on hospital visits and prior heart attacks. Use advanced ML algorithms (e.g., Gradient Boosting, Neural Networks) for risk prediction and survival rate forecasting.

    Feature Engineering:

    Create new features, such as BMI categories or healthcare accessibility indices. Analyze the interaction effects between physical activity, obesity index, and smoking history.
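    One way to derive the BMI categories mentioned above is sketched below. It uses the BMI formula (weight in kilograms divided by height in meters squared) and the standard adult cut-offs of 18.5, 25, and 30; whether this dataset stores raw height and weight, or only a precomputed obesity index, is an assumption to check against its columns.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the standard adult categories (cut-offs 18.5, 25, 30)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Healthy weight"
    if value < 30:
        return "Overweight"
    return "Obesity"


print(round(bmi(70, 1.75), 1))          # 22.9
print(bmi_category(bmi(70, 1.75)))      # Healthy weight
print(bmi_category(bmi(95, 1.75)))      # Obesity
```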

    Explainable AI:

    Use SHAP (SHapley Additive exPlanations) to understand model predictions. Identify biases in predictions related to ethnicity or access to healthcare.

  3. Prevalence of Selected Measures Among Adults Aged 20 and Over: United...

    • catalog.data.gov
    • data.virginia.gov
    • +3 more
    Updated Apr 23, 2025
    Cite
    Centers for Disease Control and Prevention (2025). Prevalence of Selected Measures Among Adults Aged 20 and Over: United States, 1999-2000 through 2017-2018 [Dataset]. https://catalog.data.gov/dataset/prevalence-of-selected-measures-among-adults-aged-20-and-over-united-states-1999-2000-2017-42e36
    Dataset updated: Apr 23, 2025
    Dataset provided by: Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Area covered: United States
    Description

    These data represent the age-adjusted prevalence of high total cholesterol, hypertension, and obesity among U.S. adults aged 20 and over from 1999–2000 through 2017–2018.

    Notes: All estimates are age-adjusted by the direct method to the U.S. Census 2000 population using the age groups 20–39, 40–59, and 60 and over.

    Definitions

    • Hypertension: systolic blood pressure greater than or equal to 130 mmHg, or diastolic blood pressure greater than or equal to 80 mmHg, or currently taking medication to lower high blood pressure.
    • High total cholesterol: serum total cholesterol greater than or equal to 240 mg/dL.
    • Obesity: body mass index (BMI, weight in kilograms divided by height in meters squared) greater than or equal to 30.

    Data Source and Methods

    Data from the National Health and Nutrition Examination Surveys (NHANES) for the years 1999–2000, 2001–2002, 2003–2004, 2005–2006, 2007–2008, 2009–2010, 2011–2012, 2013–2014, 2015–2016, and 2017–2018 were used for these analyses. NHANES is a cross-sectional survey designed to monitor the health and nutritional status of the civilian noninstitutionalized U.S. population. The survey consists of interviews conducted in participants’ homes and standardized physical examinations, including a blood draw, conducted in mobile examination centers.
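    Direct age adjustment weights each age-specific prevalence by that age group's share of a standard population and sums the results. The sketch below assumes the three age groups named in the notes; the standard-population weights and the age-specific rates are illustrative placeholders, not the actual U.S. Census 2000 proportions or NHANES estimates, which should be taken from the official standard-population tables.

```python
def age_adjusted_rate(rates, standard_population):
    """Direct method: sum over age groups of rate_i * (std_pop_i / total std pop)."""
    total = sum(standard_population.values())
    return sum(rates[group] * pop / total for group, pop in standard_population.items())


# Placeholder weights for 20-39, 40-59, and 60+; NOT the real Census 2000 shares.
standard = {"20-39": 42, "40-59": 36, "60+": 22}
# Hypothetical age-specific prevalence (fractions of each age group).
rates = {"20-39": 0.10, "40-59": 0.30, "60+": 0.60}

print(round(age_adjusted_rate(rates, standard), 3))  # 0.282
```

    Because the same standard weights are applied to every survey cycle, differences between cycles are not driven by changes in the age structure of the sampled population.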

  4. Indicators of Heart Disease (2022 UPDATE)

    • kaggle.com
    zip
    Updated Oct 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kamil Pytlak (2023). Indicators of Heart Disease (2022 UPDATE) [Dataset]. https://www.kaggle.com/datasets/kamilpytlak/personal-key-indicators-of-heart-disease/discussion
    Available download formats: zip (22,474,335 bytes)
    Dataset updated: Oct 12, 2023
    Authors: Kamil Pytlak
    License: CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Key Indicators of Heart Disease

    2022 annual CDC survey data of 400k+ adults related to their health status

    What subject does the dataset cover?

    According to the CDC, heart disease is a leading cause of death for people of most races in the U.S. (African Americans, American Indians and Alaska Natives, and whites). About half of all Americans (47%) have at least 1 of 3 major risk factors for heart disease: high blood pressure, high cholesterol, and smoking. Other key indicators include diabetes status, obesity (high BMI), not getting enough physical activity, or drinking too much alcohol. Identifying and preventing the factors that have the greatest impact on heart disease is very important in healthcare. In turn, developments in computing allow the application of machine learning methods to detect "patterns" in the data that can predict a patient's condition.

    Where did the data set come from and what treatments has it undergone?

    The dataset originally comes from the CDC and is a major part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to collect data on the health status of U.S. residents. As described by the CDC: "Established in 1984 with 15 states, BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world." The most recent dataset includes data from 2023. In this dataset, I noticed many factors (questions) that directly or indirectly influence heart disease, so I decided to select the most relevant variables from it. I also decided to share two versions of the most recent dataset: one with NaNs and one without.

    What can you do with this data set?

    As described above, the original dataset of nearly 300 variables was reduced to 40 variables. In addition to classical EDA, this dataset can be used to apply a number of machine learning methods, especially classifier models (logistic regression, SVM, random forest, etc.). You should treat the variable "HadHeartAttack" as binary ("Yes": the respondent had heart disease; "No": the respondent did not have heart disease). Note, however, that the classes are unbalanced, so fitting a model without accounting for the imbalance is not advisable; adjusting the class weights or undersampling should yield much better results. Based on the dataset, I built a logistic regression model and embedded it in an application that might inspire you: https://share.streamlit.io/kamilpytlak/heart-condition-checker/main/app.py. Can you identify which variables have a significant effect on the likelihood of heart disease?
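    The two remedies for class imbalance can be sketched in plain Python: "balanced" class weights, computed as n_samples / (n_classes * count of the class), which is the formula scikit-learn uses for class_weight="balanced", and random undersampling of the majority class down to the size of the smallest class. The toy rows are hypothetical; only the HadHeartAttack variable name comes from the description above.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * n_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def undersample(rows, label_key, seed=0):
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sample = []
    for group in by_class.values():
        sample.extend(rng.sample(group, n_min))
    rng.shuffle(sample)
    return sample


# Hypothetical 9:1 imbalance between "No" and "Yes".
labels = ["No"] * 9 + ["Yes"]
print(balanced_class_weights(labels))  # "Yes" gets weight 5.0, "No" about 0.56

rows = [{"HadHeartAttack": label} for label in labels]
print(len(undersample(rows, "HadHeartAttack")))  # 2: one row per class
```

    Class weights keep every record but scale each one's contribution to the loss, whereas undersampling discards majority-class records; with 400k+ rows either approach is usually practical.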

    What steps did you use to convert the dataset?

    Check out this notebook in my GitHub repository: https://github.com/kamilpytlak/data-science-projects/blob/main/heart-disease-prediction/2022/notebooks/data_processing.ipynb

  5. PLACES: Taking high blood pressure medication

    • hub.arcgis.com
    Updated Oct 22, 2020
    Cite
    Centers for Disease Control and Prevention (2020). PLACES: Taking high blood pressure medication [Dataset]. https://hub.arcgis.com/datasets/cdcarcgis::places-taking-high-blood-pressure-medication
    Dataset updated: Oct 22, 2020
    Dataset authored and provided by: Centers for Disease Control and Prevention
    Description

    This web map is part of the Centers for Disease Control and Prevention (CDC) PLACES project. It provides model-based estimates of the prevalence of taking high blood pressure medication among adults aged 18 years and older who have high blood pressure, at the county, place, census tract, and ZCTA levels in the United States. PLACES is an expansion of the original 500 Cities Project and a collaboration between the CDC, the Robert Wood Johnson Foundation, and the CDC Foundation. Data sources used to generate these estimates include the Behavioral Risk Factor Surveillance System (BRFSS), Census 2020 population counts or Census annual county-level population estimates, and American Community Survey (ACS) estimates. For detailed methodology, see www.cdc.gov/places. For questions or feedback, send an email to places@cdc.gov. The measure name used for taking high blood pressure medication is BPMED.

  6. Data from: Factors Associated with the Occurrence of Arterial Hypertension...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated May 30, 2022
    Cite
    Garcez, Anderson; Xavier, Paula Brustolin; Cibeira, Gabriela Herrmann; Olinto, Maria Teresa Anselmo; Germano, Antonino (2022). Factors Associated with the Occurrence of Arterial Hypertension in Industry Workers of State of Rio Grande do Sul, Brazil [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000407074
    Dataset updated: May 30, 2022
    Authors: Garcez, Anderson; Xavier, Paula Brustolin; Cibeira, Gabriela Herrmann; Olinto, Maria Teresa Anselmo; Germano, Antonino
    Area covered: State of Rio Grande do Sul, Brazil
    Description

    Abstract

    Background: Hypertension is a serious and persistent public health problem and is one of the main causes of cardiovascular disease and overall mortality.

    Objectives: This study aimed to determine the prevalence of systemic arterial hypertension, and the factors associated with it, in industry workers from the state of Rio Grande do Sul, Brazil.

    Methods: This is a cross-sectional study using secondary data from 20,792 industry workers aged 18 to 59 years. Arterial hypertension was defined as systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg, or taking antihypertensive medication. Factors investigated included demographic, socioeconomic, behavioral, nutritional status, and family history characteristics. Poisson regression was used in the multivariate analysis, with a significance level of p < 0.05. All analyses were stratified by sex.

    Results: The sample included 12,349 men and 8,443 women with a mean age of 32.8 years (standard deviation = 9.8). The prevalence of arterial hypertension was 10.3% (95% CI: 9.8-10.7), significantly higher in men than in women (10.9% vs. 9.4%; p = 0.001). In both sexes, arterial hypertension was associated with increasing age, a low level of education, living with a partner, being overweight or obese, and having at least one relative with a history of hypertension. Women with better socioeconomic conditions had a lower prevalence of hypertension.

    Conclusions: The main factors associated with hypertension included sociodemographic, nutritional, and family history characteristics. In addition, socioeconomic conditions were associated with the occurrence of hypertension, especially among women.
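    The hypertension definitions in this listing differ only in their cut-offs: the NHANES estimates earlier in the list use 130/80 mmHg, while this study uses 140/90 mmHg, and both also count people taking antihypertensive medication. A minimal sketch with the cut-offs as parameters; the function names and the four sample readings are hypothetical.

```python
def has_hypertension(systolic, diastolic, on_medication, sys_cutoff=140, dia_cutoff=90):
    """Hypertensive if either reading meets its cut-off or the person takes
    antihypertensive medication (defaults follow the 140/90 mmHg study definition)."""
    return systolic >= sys_cutoff or diastolic >= dia_cutoff or on_medication


def prevalence(flags):
    """Share of people flagged as hypertensive."""
    flags = list(flags)
    return sum(flags) / len(flags)


# Hypothetical readings: (systolic mmHg, diastolic mmHg, on medication)
readings = [(135, 85, False), (145, 88, False), (118, 76, True), (120, 78, False)]

print(prevalence(has_hypertension(s, d, m) for s, d, m in readings))          # 0.5 (140/90)
print(prevalence(has_hypertension(s, d, m, 130, 80) for s, d, m in readings))  # 0.75 (130/80)
```

    The same readings give a higher prevalence under the lower cut-offs, which is one reason prevalence figures from studies using different definitions are not directly comparable.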

  7. 500 Cities: Local Data for Better Health, 2018

    • kaggle.com
    zip
    Updated Nov 14, 2021
    Cite
    Jennifer Santiago (2021). 500 Cities: Local Data for Better Health, 2018 [Dataset]. https://www.kaggle.com/datasets/jennifersantiago/500-cities-local-data-for-better-health-2018
    Available download formats: zip (45,682,447 bytes)
    Dataset updated: Nov 14, 2021
    Authors: Jennifer Santiago
    License: U.S. Government Works (https://www.usa.gov/government-works/)

    Description

    This is the 500 Cities data from a different year (the 2018 release) than the releases already on Kaggle. It was exported and uploaded without modification. The original source states: "This is the complete dataset for the 500 Cities project 2018 release. This dataset includes 2016, 2015 model-based small area estimates for 27 measures of chronic disease related to unhealthy behaviors (5), health outcomes (13), and use of preventive services (9). Data were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. The project was funded by the Robert Wood Johnson Foundation (RWJF) in conjunction with the CDC Foundation. It represents a first-of-its kind effort to release information on a large scale for cities and for small areas within those cities. It includes estimates for the 500 largest US cities and approximately 28,000 census tracts within these cities. These estimates can be used to identify emerging health problems and to inform development and implementation of effective, targeted public health prevention activities. Because the small area model cannot detect effects due to local interventions, users are cautioned against using these estimates for program or policy evaluations. Data sources used to generate these measures include Behavioral Risk Factor Surveillance System (BRFSS) data (2016, 2015), Census Bureau 2010 census population data, and American Community Survey (ACS) 2012-2016, 2011-2015 estimates. Because some questions are only asked every other year in the BRFSS, there are 4 measures (high blood pressure, taking high blood pressure medication, high cholesterol, cholesterol screening) from the 2015 BRFSS that are the same in the 2018 release as the previous 2017 release. More information about the methodology can be found at www.cdc.gov/500cities."

    The original can be found at: https://chronicdata.cdc.gov/500-Cities-Places/500-Cities-Local-Data-for-Better-Health-2018-relea/rja3-32tc

    The 500 Cities project ran from 2016 to 2019. In December of 2020, this was expanded into and replaced by the PLACES project.

    Content

    This dataset contains data for the US as a whole, for 500 cities within it (the 497 largest U.S. cities, plus a few cities that were the largest in their state, included to ensure that every state is represented), and for the census tracts within those cities. The combined population of these cities represents about 33.4% of the U.S. population.

    Measures include:

    • "Current lack of health insurance among adults aged 18–64 Years"
    • "Visits to doctor for routine checkup within the past Year among adults aged >=18 Years"
    • "Fecal occult blood test, sigmoidoscopy, or colonoscopy among adults aged 50–75 Years"
    • "Mammography use among women aged 50–74 Years"
    • "Visits to dentist or dental clinic among adults aged >=18 Years"
    • "Cholesterol screening among adults aged >=18 Years"
    • "Older adult men aged >=65 Years who are up to date on a core set of clinical preventive services: Flu shot past Year, PPV shot ever, Colorectal cancer screening"
    • "Older adult women aged >=65 Years who are up to date on a core set of clinical preventive services: Flu shot past Year, PPV shot ever, Colorectal cancer screening"
    • "Papanicolaou smear use among adult women aged 21–65 Years"
    • "No leisure-time physical activity among adults aged >=18 Years"
    • "Sleeping less than 7 hours among adults aged >=18 Years"
    • "Binge drinking among adults aged >=18 Years"
    • "Current smoking among adults aged >=18 Years"
    • "Physical health not good for >=14 days among adults aged >=18 Years"
    • "Mental health not good for >=14 days among adults aged >=18 Years"
    • "High blood pressure among adults aged >=18 Years"
    • "Arthritis among adults aged >=18 Years"
    • "Stroke among adults aged >=18 Years"
    • "Obesity among adults aged >=18 Years"
    • "All teeth lost among adults aged >=65 Years"
    • "Diagnosed diabetes among adults aged >=18 Years"
    • "Cancer (excluding skin cancer) among adults aged >=18 Years"
    • "Chronic obstructive pulmonary disease among adults aged >=18 Years"
    • "Coronary heart disease among adults aged >=18 Years"
    • "Current asthma among adults aged >=18 Years"
    • "Chronic kidney disease among adults aged >=18 Years"
    • "Taking medicine for high blood pressure control among adults aged >=18 Years with high blood pressure"
    • "High cholesterol among adults aged >=18 Years who have been screened in the past 5 Years"

    Inspiration

    Please help this dataset reveal more by investigating anything that captures your attention. For ideas, consider:

    • Does the state or region play a role in any of the measures?
    • Can you build a model to predict any of the measures?
    • Combining this data with the other years posted on Kaggle to determine how the measures have changed over ...

  8. Data from: Oregon Health Insurance Experiment, 2007-2010

    • search.datacite.org
    • icpsr.umich.edu
    Updated 2013
    Cite
    Amy Finkelstein; Katherine Baicker (2013). Oregon Health Insurance Experiment, 2007-2010 [Dataset]. http://doi.org/10.3886/icpsr34314
    Dataset updated: 2013
    Dataset provided by: DataCite (https://www.datacite.org/); Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors: Amy Finkelstein; Katherine Baicker
    Dataset funded by:
    • United States Department of Health and Human Services. Office of the Assistant Secretary for Planning and Evaluation
    • United States Department of Health and Human Services. Centers for Medicare and Medicaid Services
    • United States Department of Health and Human Services. National Institutes of Health. National Institute on Aging
    • California HealthCare Foundation
    • John D. and Catherine T. MacArthur Foundation
    • Robert Wood Johnson Foundation
    • United States Social Security Administration
    • Smith Richardson Foundation
    • Alfred P. Sloan Foundation
    Description

    In 2008, a group of uninsured low-income adults in Oregon was selected by lottery to be given the chance to apply for Medicaid. This lottery provides an opportunity to gauge the effects of expanding access to public health insurance on the health care use, financial strain, and health of low-income adults using a randomized controlled design. The Oregon Health Insurance Experiment follows and compares those selected in the lottery (treatment group) with those not selected (control group). The data collected and provided here include data from in-person interviews, three mail surveys, emergency department records, and administrative records on Medicaid enrollment, the initial lottery sign-up list, welfare benefits, and mortality.

    This data collection has seven data files:

    • Dataset 1 contains administrative data on the lottery from the state of Oregon. These data include demographic characteristics that were recorded when individuals signed up for the lottery, the date of the lottery draw, and information on who was selected in the lottery, who applied for the lotteried Medicaid plan if selected, and whose application for the lotteried plan was approved. Also included are Oregon mortality data for 2008 and 2009.
    • Dataset 2 contains information from the state of Oregon on the individuals' participation in Medicaid, the Supplemental Nutrition Assistance Program (SNAP), and Temporary Assistance for Needy Families (TANF).
    • Datasets 3-5 contain the data from the initial, six-month, and 12-month mail surveys, respectively. Topics covered by the surveys include demographic characteristics; health insurance, access to health care, and health care utilization; health care needs, experiences, and costs; overall health status and changes in health; and depression, other medical conditions, and use of medications to treat them.
    • Dataset 6 contains an analysis subset of the variables from the in-person interviews. Topics covered by the survey questionnaire include overall health, health insurance coverage, health care access, health care utilization, conditions and treatments, health behaviors, medical and dental costs, and demographic characteristics. The interviewers also obtained blood pressure and anthropometric measurements and collected dried blood spots to measure levels of cholesterol, glycated hemoglobin, and C-reactive protein.
    • Dataset 7 contains an analysis subset of the variables the study obtained for all emergency department (ED) visits to twelve hospitals in the Portland area during 2007-2009. These variables capture total hospital costs, ED costs, and the number of ED visits categorized by time of the visit (daytime weekday, or nighttime and weekends), necessity of the visit (emergent, ED care needed, non-preventable; emergent, ED care needed, preventable; emergent, primary care treatable), ambulatory care sensitive status, whether or not the patient was hospitalized, and the reason for the visit (e.g., injury, abdominal pain, chest pain, headache, and mental disorders).

    The collection also includes a ZIP archive (Dataset 8) with Stata programs that replicate analyses reported in three articles by the principal investigators and others:

    • Finkelstein, Amy et al. "The Oregon Health Insurance Experiment: Evidence from the First Year". The Quarterly Journal of Economics. August 2012. Vol 127(3).
    • Baicker, Katherine et al. "The Oregon Experiment - Effects of Medicaid on Clinical Outcomes". New England Journal of Medicine. 2 May 2013. Vol 368(18).
    • Taubman, Sarah et al. "Medicaid Increases Emergency Department Use: Evidence from Oregon's Health Insurance Experiment". Science. 2 Jan 2014.

  9. SHIP Adults who are not overweight or obese 2011-2021

    • catalog.data.gov
    • opendata.maryland.gov
    • +1 more
    Updated Aug 16, 2024
    Cite
    opendata.maryland.gov (2024). SHIP Adults who are not overweight or obese 2011-2021 [Dataset]. https://catalog.data.gov/dataset/ship-adults-who-are-not-overweight-or-obese-2011-2017
    Dataset updated: Aug 16, 2024
    Dataset provided by: opendata.maryland.gov
    Description

    This is historical data. The update frequency has been set to "Static Data", and the dataset is kept for its historical value. Updated on 8/14/2024.

    Adults who are not overweight or obese: this indicator shows the percentage of adults who are not overweight or obese. In Maryland in 2015, of adults considered obese, 52% had high blood pressure, 44% had high cholesterol, and 21% had diabetes. Maintaining a healthy weight can help control these conditions if they develop.

  10. Coronary Heart Disease Prediction in Ten Years

    • kaggle.com
    zip
    Updated Dec 10, 2023
    Cite
    Palak Doshi (2023). Coronary Heart Disease Prediction in Ten Years [Dataset]. https://www.kaggle.com/datasets/palakdoshijain/coronary-heart-disease-prediction-in-ten-years/code
    Available download formats: zip (59,801 bytes)
    Dataset updated: Dec 10, 2023
    Authors: Palak Doshi
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half the deaths in the United States and other developed countries are due to cardiovascular diseases. An early prognosis of cardiovascular disease can aid decisions on lifestyle changes in high-risk patients and, in turn, reduce complications. This research intends to pinpoint the most relevant risk factors for heart disease and to predict the overall risk using logistic regression, a decision tree classifier, a random forest classifier, and various boosting techniques. The dataset is publicly available on the Kaggle website and comes from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset provides the patients' information and includes over 4,000 records and 15 attributes.

    Variables: Each attribute is a potential risk factor. They include demographic, behavioural, and medical risk factors.

    Demographic:

    sex: male or female (Nominal)

    age: age of the patient (Continuous; although the recorded ages have been truncated to whole numbers, the concept of age is continuous)

    Behavioural:

    currentSmoker: whether or not the patient is a current smoker (Nominal)

    cigsPerDay: the number of cigarettes that the person smoked on average in one day (can be considered continuous, as one can have any number of cigarettes, even half a cigarette)

    Medical (history):

    BPMeds: whether or not the patient was on blood pressure medication (Nominal)

    prevalentStroke: whether or not the patient had previously had a stroke (Nominal)

    prevalentHyp: whether or not the patient was hypertensive (Nominal)

    diabetes: whether or not the patient had diabetes (Nominal)

    Medical (current):

    totChol: total cholesterol level (Continuous)

    sysBP: systolic blood pressure (Continuous)

    diaBP: diastolic blood pressure (Continuous)

    BMI: Body Mass Index (Continuous)

    heartRate: heart rate (Continuous; in medical research, variables such as heart rate, though in fact discrete, are considered continuous because of the large number of possible values)

    glucose: glucose level (Continuous)

    Predict variable (desired target):

    10-year risk of coronary heart disease (CHD) (binary: "1" means "Yes", "0" means "No")
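    The classification goal can be sketched as a logistic regression fit by batch gradient descent, written in plain Python so the mechanics are visible. It uses only two of the listed predictors (age and sysBP) and ten invented rows; it illustrates the technique and is not the study's model. On the real data, a library implementation with all predictors and a held-out test set would be the usual choice.

```python
import math
from statistics import mean, pstdev


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # p - y is the gradient of the log-loss with respect to the linear score.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical (age, sysBP) rows and 10-year CHD labels, NOT taken from the dataset.
raw = [(40, 115), (45, 120), (50, 125), (55, 140), (60, 150),
       (65, 160), (70, 170), (42, 130), (58, 118), (68, 145)]
y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 1]

# Standardize each feature so one learning rate suits both columns.
stats = [(mean(col), pstdev(col)) for col in zip(*raw)]


def scale(row):
    return [(v - m) / s for v, (m, s) in zip(row, stats)]


X = [scale(row) for row in raw]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, scale((70, 170))))  # expected above 0.5 (older, higher BP)
print(predict_proba(w, b, scale((40, 115))))  # expected below 0.5 (younger, lower BP)
```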

  11. The NIMH Healthy Research Volunteer Dataset

    • openneuro.org
    Updated Dec 20, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Allison C. Nugent; Adam G Thomas; Margaret Mahoney; Alison Gibbons; Jarrod Smith; Antoinette Charles; Jacob S Shaw; Jeffrey D Stout; Anna M Namyst; Arshitha Basavaraj; Eric Earl; Dustin Moraczewski; Emily Guinee; Michael Liu; Travis Riddle; Joseph Snow; Shruti Japee; Morgan Andrews; Adriana Pavletic; Stephen Sinclair; Vinai Roopchansingh; Peter A Bandettini; Joyce Chung (2024). The NIMH Healthy Research Volunteer Dataset [Dataset]. http://doi.org/10.18112/openneuro.ds004215.v2.0.1
    Dataset updated: Dec 20, 2024
    Dataset provided by: OpenNeuro (https://openneuro.org/)
    Authors: Allison C. Nugent; Adam G Thomas; Margaret Mahoney; Alison Gibbons; Jarrod Smith; Antoinette Charles; Jacob S Shaw; Jeffrey D Stout; Anna M Namyst; Arshitha Basavaraj; Eric Earl; Dustin Moraczewski; Emily Guinee; Michael Liu; Travis Riddle; Joseph Snow; Shruti Japee; Morgan Andrews; Adriana Pavletic; Stephen Sinclair; Vinai Roopchansingh; Peter A Bandettini; Joyce Chung
    License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/); license information was derived automatically

    Description

    The National Institute of Mental Health (NIMH) Research Volunteer (RV) Data Set

    A comprehensive dataset characterizing healthy research volunteers in terms of clinical assessments, mood-related psychometrics, cognitive function neuropsychological tests, structural and functional magnetic resonance imaging (MRI), along with diffusion tensor imaging (DTI), and a comprehensive magnetoencephalography battery (MEG).

    In addition, blood samples are currently banked for future genetic analysis. All data collected in this protocol are broadly shared in the OpenNeuro repository, in the Brain Imaging Data Structure (BIDS) format. In addition, task paradigms and basic pre-processing scripts are shared on GitHub. This dataset is unprecedented in its depth of characterization of a healthy population and will allow a wide array of investigations into normal cognition and mood regulation.

    This dataset is licensed under the Creative Commons Zero (CC0) v1.0 License.

    Release Notes

    Release v2.0.0

    This release includes data collected between 2020-06-03 (cut-off date for v1.0.0) and 2024-04-01. Notable changes in this release:

    1. 769 new participants have been added, along with re-evaluation data for 15 participants. The total number of unique participants is now 1,859.
    2. Added visit and age_at_visit columns to the phenotype files to distinguish between visits and the intervals between them.
    3. Follow-up online survey data included.
    4. Replaced Beck Anxiety Inventory (BAI) and Beck Depression Inventory-II (BDI-II) with General Anxiety Disorder-7 (GAD7) and Patient Health Questionnaire 9 (PHQ9) surveys, respectively.
    5. Discontinued the Perceived Health rating survey.
    6. Added Brief Trauma Questionnaire (BTQ) and Big Five personality survey to online screening questionnaires.
    7. MRI:
      • Replaced ADNI-3 resting state sequence with a multi-echo sequence with higher spatial resolution.
      • Replaced field map scans with a shorter reversed-blipped EPI scan.
    8. MEG:
      • Some participants have 6-minute empty room data instead of the shorter duration empty room acquisition.

    See the CHANGES file for complete version-wise changelog.

    Participant Eligibility

    To be eligible for the study, participants must be medically healthy adults over 18 years of age who can read, speak, and understand English. All participants provided electronic informed consent for online pre-screening, and written informed consent for all other procedures. Participants with a history of mental illness or of suicidal or self-injurious thoughts or behavior are excluded. Additional exclusion criteria include current illicit drug use, an abnormal medical exam, and less than an 8th-grade education or an IQ below 70. Current NIMH employees and first-degree relatives of NIMH employees are prohibited from participating. Study participants are recruited through direct mailings, bulletin boards and listservs, outreach exhibits, print advertisements, and electronic media.

    Clinical Measures

    All potential volunteers visit the study website, check a box indicating consent, and fill out preliminary screening questionnaires. The questionnaires include basic demographics, the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0), the DSM-5 Self-Rated Level 1 Cross-Cutting Symptom Measure, the DSM-5 Level 2 Cross-Cutting Symptom Measure - Substance Use, the Alcohol Use Disorders Identification Test (AUDIT), the Edinburgh Handedness Inventory, and a brief clinical history checklist. The WHODAS 2.0 is a 15-item questionnaire that assesses overall general health and disability, with 14 items distributed over 6 domains: cognition, mobility, self-care, “getting along”, life activities, and participation. The DSM-5 Level 1 cross-cutting measure uses 23 items to assess symptoms across diagnoses, although an item regarding self-injurious behavior was removed from the online self-report version. The DSM-5 Level 2 cross-cutting measure is adapted from the NIDA ASSIST measure and contains 15 items assessing the use of both illicit drugs and prescription drugs without a doctor’s prescription. The AUDIT is a 10-item screening assessment used to detect harmful levels of alcohol consumption, and the Edinburgh Handedness Inventory is a systematic assessment of handedness. These online results do not contain any personally identifiable information (PII). At the conclusion of the questionnaires, participants are prompted to email the study team. The study team then reviews the questionnaire results and determines whether the participant is appropriate for an in-person interview.
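    The AUDIT's 10 items can be totaled as sketched below. This follows the widely used WHO scoring convention (items 1-8 scored 0-4, items 9-10 scored 0, 2, or 4, for a total of 0-40, with 8 or more commonly used as the cutoff for hazardous or harmful drinking). The release description does not say how, or whether, audit.tsv stores a precomputed total, so the input format and function names here are hypothetical.

```python
# Allowed scores per item under the standard WHO AUDIT scoring (assumed here).
ITEM_SCORES = [{0, 1, 2, 3, 4}] * 8 + [{0, 2, 4}] * 2


def audit_total(responses):
    """Sum 10 item scores after checking each lies in its allowed set."""
    if len(responses) != 10:
        raise ValueError("AUDIT has exactly 10 items")
    for number, (score, allowed) in enumerate(zip(responses, ITEM_SCORES), start=1):
        if score not in allowed:
            raise ValueError(f"item {number}: score {score} not in {sorted(allowed)}")
    return sum(responses)


def audit_positive(total, cutoff=8):
    """Flag a screen as positive at or above a cutoff (8 is a common default)."""
    return total >= cutoff


print(audit_total([2, 2, 2, 1, 0, 0, 0, 0, 0, 2]))
```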

    Participants who meet all inclusion criteria are scheduled for an in-person screening visit to determine whether there are any further exclusions to participation. At this visit, participants receive a History and Physical exam, the Structured Clinical Interview for DSM-5 Disorders (SCID-5), the Beck Depression Inventory-II (BDI-II), the Beck Anxiety Inventory (BAI), and the Kaufman Brief Intelligence Test, Second Edition (KBIT-2). The purpose of these cognitive and psychometric tests is twofold. First, these measures are designed to provide a sensitive test of psychopathology. Second, they provide a comprehensive picture of cognitive functioning, including mood regulation. The SCID-5 is a structured interview, administered by a clinician, that establishes the absence of any DSM-5 axis I disorder. The KBIT-2 is a brief (20-minute) assessment of intellectual functioning administered by a trained examiner. It has three subtests: verbal knowledge, riddles, and matrices.

    Biological and physiological measures

    Biological and physiological measures are acquired, including blood pressure, pulse, weight, height, and BMI. Blood and urine samples are taken, and a complete blood count, acute care panel, hepatic panel, thyroid-stimulating hormone, viral markers (HCV, HBV, HIV), C-reactive protein, creatine kinase, urine drug screen, and urine pregnancy tests are performed. Three additional tubes of blood are also collected and banked for future analysis, including genetic testing.
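    BMI is derived from the measured weight and height. A minimal sketch, assuming metric inputs (kilograms and meters); the imperial variant uses the standard conversion factor of 703 for pounds and inches. The function names are illustrative only; the release does not say which units were recorded.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, height_in):
    """Same index from pounds and inches, using the conventional 703 factor."""
    if height_in <= 0:
        raise ValueError("height must be positive")
    return 703.0 * weight_lb / height_in ** 2


print(round(bmi(70, 1.75), 1))
```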

    Imaging Studies

    Participants were given the option to enroll in magnetic resonance imaging (MRI) and magnetoencephalography (MEG) studies.

    MRI

    On the same visit as the MRI scan, participants are administered a subset of tasks from the NIH Toolbox Cognition Battery. The four tasks assess attention and executive functioning (Flanker Inhibitory Control and Attention Task), executive functioning (Dimensional Change Card Sort Task), episodic memory (Picture Sequence Memory Task), and working memory (List Sorting Working Memory Task). The MRI protocol was initially based on the ADNI-3 basic protocol but was later modified to include portions of the ABCD protocol as follows:

    1. The T1 scan from ADNI3 was replaced by the T1 scan from the ABCD protocol.
    2. The Axial T2 2D FLAIR acquisition from ADNI2 was added, and fat saturation turned on.
    3. Fat saturation was turned on for the pCASL acquisition.
    4. The high-resolution in-plane hippocampal 2D T2 scan was removed, and replaced with the whole brain 3D T2 scan from the ABCD protocol (which is resolution and bandwidth matched to the T1 scan).
    5. The slice-select gradient reversal method was turned on for DTI acquisition, and reconstruction interpolation turned off.
    6. Scans for distortion correction were added (reversed-blip scans for DTI and resting state scans).
    7. The 3D FLAIR sequence was made optional, and replaced by one where the prescription and other acquisition parameters provide resolution and geometric correspondence between the T1 and T2 scans.

    MEG

    The optional MEG studies were added to the protocol approximately one year after the study was initiated, so there are fewer MEG recordings than MRI scans in the dataset. MEG studies are performed on a 275-channel CTF MEG system. The position of the head was localized at the beginning and end of each recording using three fiducial coils. These coils were placed 1.5 cm above the nasion and at each ear, 1.5 cm from the tragus on a line between the tragus and the outer canthus of the eye. For some participants, photographs were taken of the three coils and used to mark the corresponding points on the T1-weighted structural MRI scan for co-registration. For the remaining participants, a BrainSight neuro-navigation unit was used to co-register the MRI, anatomical fiducials, and localizer coils directly prior to MEG data acquisition.
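    Because the head is localized at both the start and the end of each recording, head movement can be summarized as the distance each coil moved between the two localizations. A minimal sketch, assuming coil positions are available as (x, y, z) tuples in one head coordinate frame, in centimeters; the coil names are hypothetical, and reading positions from the CTF files is not shown.

```python
import math


def coil_displacements(start, end):
    """Euclidean distance each coil moved between two head localizations."""
    if start.keys() != end.keys():
        raise ValueError("both localizations must list the same coils")
    return {name: math.dist(start[name], end[name]) for name in start}


def max_displacement(start, end):
    """Largest single-coil movement, a simple summary of head motion."""
    return max(coil_displacements(start, end).values())


# Hypothetical coil positions (cm) at the beginning and end of a recording.
start = {"nasion": (0.0, 10.0, 0.0), "left_ear": (-7.0, 0.0, 0.0), "right_ear": (7.0, 0.0, 0.0)}
end = {"nasion": (0.0, 10.3, 0.4), "left_ear": (-7.0, 0.0, 0.0), "right_ear": (7.0, 0.0, 0.0)}
print(coil_displacements(start, end))
```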

    Specific Survey and Test Data within Data Set

    NOTE: In release 2.0 of the dataset, two measures, the Brief Trauma Questionnaire (BTQ) and the Big Five personality survey, were added to the online screening questionnaires. Also, for the in-person screening visit, the Beck Anxiety Inventory (BAI) and Beck Depression Inventory-II (BDI-II) were replaced with the Generalized Anxiety Disorder-7 (GAD7) and Patient Health Questionnaire 9 (PHQ9) surveys, respectively. The Perceived Health rating survey was discontinued.

    1. Preliminary Online Screening Questionnaires

    | Survey or Test | BIDS TSV Name |
    |---|---|
    | Alcohol Use Disorders Identification Test (AUDIT) | audit.tsv |
    | Brief Trauma Questionnaire (BTQ) | btq.tsv |
    | Big-Five Personality | big_five_personality.tsv |
    | Demographics | demographics.tsv |
    | Drug Use Questionnaire | |
  12. Demographic and Health Survey 2013 - Namibia

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jun 5, 2017
    Ministry of Health and Social Services (MoHSS) (2017). Demographic and Health Survey 2013 - Namibia [Dataset]. https://microdata.worldbank.org/index.php/catalog/2210
    Dataset updated
    Jun 5, 2017
    Dataset provided by
    Ministry of Health and Social Services (http://www.mhss.gov.na/)
    Authors
    Ministry of Health and Social Services (MoHSS)
    Time period covered
    2013
    Area covered
    Namibia
    Description

    Abstract

    The 2013 NDHS is part of the worldwide Demographic and Health Surveys (DHS) programme funded by the United States Agency for International Development (USAID). DHS surveys are designed to collect data on fertility, family planning, and maternal and child health; assist countries in monitoring changes in population, health, and nutrition; and provide an international database that can be used by researchers investigating topics related to population, health, and nutrition.

    The overall objective of the survey is to provide demographic, socioeconomic, and health data necessary for policymaking, planning, monitoring, and evaluation of national health and population programmes. In addition, the survey measured the prevalence of anaemia, HIV, high blood glucose, and high blood pressure among adult women and men; assessed the prevalence of anaemia among children age 6-59 months; and collected anthropometric measurements to assess the nutritional status of women, men, and children.

    A long-term objective of the survey is to strengthen the technical capacity of local organizations to plan, conduct, process, and analyse data from complex national population and health surveys. At the global level, the 2013 NDHS data are comparable with those from a number of DHS surveys conducted in other developing countries. The 2013 NDHS adds to the vast and growing international database on demographic and health-related variables.

    Geographic coverage

    National coverage

    Analysis unit

    • Households
    • Children aged 0-5
    • Women aged 15 to 49
    • Men aged 15 to 64

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sample design: The primary focus of the 2013 NDHS was to provide estimates of key population and health indicators, including fertility and mortality rates, for the country as a whole and for urban and rural areas. In addition, the sample was designed to provide estimates of most key variables for the 13 administrative regions.

    Each of the administrative regions is subdivided into a number of constituencies (with an overall total of 107 constituencies). Each constituency is further subdivided into lower level administrative units. An enumeration area (EA) is the smallest identifiable entity without administrative specification, numbered sequentially within each constituency. Each EA is classified as urban or rural. The sampling frame used for the 2013 NDHS was the preliminary frame of the 2011 Namibia Population and Housing Census (NSA, 2013a). The sampling frame was a complete list of all EAs covering the whole country. Each EA is a geographical area covering an adequate number of households to serve as a counting unit for the population census. In rural areas, an EA is a natural village, part of a large village, or a group of small villages; in urban areas, an EA is usually a city block. The 2011 population census also produced a digitised map for each of the EAs that served as the means of identifying these areas.

    The sample for the 2013 NDHS was a stratified sample selected in two stages. In the first stage, 554 EAs (269 in urban areas and 285 in rural areas) were selected with stratified probability proportional to size selection from the sampling frame. The size of an EA is defined as the number of households residing in the EA, as recorded in the 2011 Population and Housing Census. Stratification was achieved by separating every region into urban and rural areas, so the 13 regions were stratified into 26 sampling strata (13 rural strata and 13 urban strata). Samples were selected independently in every stratum, with a predetermined number of EAs selected. A complete household listing and mapping operation was carried out in all selected clusters. In the second stage, a fixed number of 20 households was selected in every urban and rural cluster by equal probability systematic sampling.
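    The second-stage selection can be sketched as equal probability systematic sampling from one cluster's household listing: compute a sampling interval, draw a random start within the first interval, and take every interval-th household. This is a minimal illustration assuming the listing is an ordered Python list; the function name is hypothetical, and DHS field procedures are not reproduced here.

```python
import random


def systematic_sample(listing, k=20, rng=None):
    """Select k units with equal probability by systematic sampling."""
    rng = rng or random.Random()
    n = len(listing)
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= number of listed households")
    interval = n / k                  # sampling interval, may be fractional
    start = rng.random() * interval   # random start in [0, interval)
    # min() guards against floating-point rounding pushing the last index to n
    return [listing[min(int(start + i * interval), n - 1)] for i in range(k)]


households = list(range(160))         # hypothetical listing of 160 households
chosen = systematic_sample(households, 20, random.Random(1))
print(chosen)
```

    Each household has the same selection probability, k/n, which is the second-stage factor in the sampling weights.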

    Due to the non-proportional allocation of the sample to the different regions and the possible differences in response rates, sampling weights are required for any analysis using the 2013 NDHS data to ensure the representativeness of the survey results at the national as well as the regional level. Since the 2013 NDHS sample was a two-stage stratified cluster sample, sampling probabilities were calculated separately for each sampling stage and for each cluster.
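    A common way to express the design weight for a two-stage sample like this is the inverse of the product of the two stage-wise selection probabilities: P1 = a_h * M_hi / M_h for the probability proportional to size selection of EA i in stratum h (a_h EAs selected, M_hi households in the EA, M_h households in the stratum), and P2 = b_hi / L_hi for the systematic selection of b_hi households out of L_hi listed. The sketch below assumes that form (Appendix A of the final report is authoritative) and omits the non-response adjustment and normalization that survey weights usually include; the parameter names and numbers are hypothetical.

```python
def design_weight(a_h, m_hi, m_h, b_hi, l_hi):
    """Inverse of the two-stage selection probability for a household."""
    p1 = a_h * m_hi / m_h     # stage 1: PPS selection of the EA (assumed < 1 here)
    p2 = b_hi / l_hi          # stage 2: systematic selection of households
    return 1.0 / (p1 * p2)


def weighted_proportion(values, weights):
    """Weighted share of respondents with a 0/1 characteristic."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical EA: 20 EAs chosen from a stratum of 30,000 households,
# this EA had 150 census households and 160 at listing; 20 were selected.
w = design_weight(20, 150, 30000, 20, 160)
print(w)
```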

    See Appendix A in the final report for details

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Three questionnaires were administered in the 2013 NDHS: the Household Questionnaire, the Woman’s Questionnaire, and the Man’s Questionnaire. At a series of meetings with various stakeholders from government ministries and agencies, nongovernmental organisations, and international donors, these questionnaires were adapted from the standard DHS6 core questionnaires to reflect the population and health issues relevant to Namibia. The final draft of each questionnaire was discussed at a questionnaire design workshop organised by the MoHSS from September 25-28, 2012, in Windhoek. The questionnaires were then translated from English into the six main local languages (Afrikaans, Rukwangali, Oshiwambo, Damara/Nama, Otjiherero, and Silozi) and back-translated into English. The questionnaires were finalised after the pretest, which took place from February 11-25, 2013.

    The Household Questionnaire was used to list all usual household members as well as visitors in the selected households. Basic information was collected on the characteristics of each person listed, including age, sex, education, and relationship to the head of the household. For children under age 18, parents’ survival status was determined. In addition, the Household Questionnaire included questions on knowledge of malaria and use of mosquito nets by household members, along with questions regarding health expenditures. The Household Questionnaire was used to identify women and men who were eligible for the individual interview and the interview on domestic violence. The questionnaire also collected information on characteristics of the household’s dwelling unit, such as source of water, type of toilet facilities, materials used for the floor of the house, and ownership of various durable goods. The results of tests assessing iodine levels were recorded as well.

    In half of the survey households (the same households selected for the male survey), the Household Questionnaire was also used to record information on anthropometry and biomarker data collected from eligible respondents, as follows:
    • All eligible women and men age 15-64 were measured, weighed, and tested for anaemia and HIV.
    • All eligible women and men age 35-64 had their blood pressure and blood glucose measured.
    • All children age 0 to 59 months were measured and weighed.
    • All children age 6 to 59 months were tested for anaemia.

    The Woman’s Questionnaire was also used to collect information from women age 50-64 living in half of the selected survey households on background characteristics, marriage and sexual activity, women’s work and husbands’ background characteristics, awareness and behaviour regarding AIDS and other STIs, and other health issues.

    The Man’s Questionnaire was administered to all men age 15-64 living in half of the selected survey households. The Man’s Questionnaire collected much of the same information as the Woman’s Questionnaire but was shorter because it did not contain a detailed reproductive history or questions on maternal and child health or nutrition.

    Cleaning operations

    CSPro—a Windows-based integrated census and survey processing system that combines and replaces the ISSA and IMPS packages—was used for entry, editing, and tabulation of the NDHS data. Prior to data entry, a practical training session was provided by ICF International to all data entry staff. A total of 28 data processing personnel, including 17 data entry operators, one questionnaire administrator, two office editors, three secondary editors, two network technicians, two data processing supervisors, and one coordinator, were recruited and trained on administration of questionnaires and coding, data entry and verification, correction of questionnaires and provision of feedback, and secondary editing. NDHS data processing was formally launched during the week of June 22, 2013, at the National Statistics Agency Data Processing Centre in Windhoek. The data entry and editing phase of the survey was completed in January 2014.

    Response rate

    A total of 11,004 households were selected for the sample, of which 10,165 were found to be occupied during data collection. Of the occupied households, 9,849 were successfully interviewed, yielding a household response rate of 97 percent.

    In these households, 9,940 women age 15-49 were identified as eligible for the individual interview. Interviews were completed with 9,176 women, yielding a response rate of 92 percent. In addition, in half of these households, 842 women age 50-64 were successfully interviewed; in this group of women, the response rate was 91 percent.

    Of the 5,271 eligible men identified in the selected subsample of households, 4,481 (85 percent) were successfully interviewed.

    Response rates were higher in rural than in urban areas, with the rural-urban difference more marked among men than among women.
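    The reported response rates follow directly from the counts above; note that the household rate uses occupied households, not all selected households, as the denominator. A small check in Python (the helper name is illustrative):

```python
def response_rate(completed, eligible):
    """Percentage of eligible units that completed the interview."""
    return 100.0 * completed / eligible


counts = {
    "households": (9849, 10165),   # interviewed / occupied
    "women 15-49": (9176, 9940),
    "men 15-64": (4481, 5271),
}
for group, (completed, eligible) in counts.items():
    print(f"{group}: {response_rate(completed, eligible):.1f}%")
```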

    Sampling error estimates

    The estimates from a sample survey are affected by two types of errors: nonsampling errors and sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview

  13. Cardiovascular Study Dataset

    • kaggle.com
    zip
    Updated Sep 22, 2020
    Christofel Ganteng (2020). Cardiovascular Study Dataset [Dataset]. https://www.kaggle.com/christofel04/cardiovascular-study-dataset-predict-heart-disea
    Available download formats: zip (76391 bytes)
    Dataset updated
    Sep 22, 2020
    Authors
    Christofel Ganteng
    Description

    This dataset was prepared for the HME Workshop on Oct 3, 2020.

    Introduction

    The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half of the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research intends to pinpoint the most relevant risk factors of heart disease and to predict the overall risk using logistic regression.

    Task

    The task is to predict whether a patient has a 10-year risk of coronary heart disease (CHD). Participants are also asked to create data visualizations to gain actionable insight into the topic.

    Source

    The dataset is publicly available on the Kaggle website, and it is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset provides the patients’ information. It includes over 4,000 records and 15 attributes.

    Variables

    Each attribute is a potential risk factor. There are demographic, behavioral, and medical risk factors.

    Data Description

    Demographic:
    • Sex: male or female ("M" or "F")
    • Age: age of the patient (Continuous - although the recorded ages have been truncated to whole numbers, the concept of age is continuous)
    Behavioral:
    • is_smoking: whether or not the patient is a current smoker ("YES" or "NO")
    • Cigs Per Day: the number of cigarettes that the person smoked on average in one day (can be considered continuous, as one can have any number of cigarettes, even half a cigarette)
    Medical (history):
    • BP Meds: whether or not the patient was on blood pressure medication (Nominal)
    • Prevalent Stroke: whether or not the patient had previously had a stroke (Nominal)
    • Prevalent Hyp: whether or not the patient was hypertensive (Nominal)
    • Diabetes: whether or not the patient had diabetes (Nominal)
    Medical (current):
    • Tot Chol: total cholesterol level (Continuous)
    • Sys BP: systolic blood pressure (Continuous)
    • Dia BP: diastolic blood pressure (Continuous)
    • BMI: Body Mass Index (Continuous)
    • Heart Rate: heart rate (Continuous - in medical research, variables such as heart rate, though in fact discrete, are considered continuous because of the large number of possible values)
    • Glucose: glucose level (Continuous)
    Predict variable (desired target):
    • 10-year risk of coronary heart disease CHD (binary: “1” means “Yes”, “0” means “No”)
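    Before fitting a model, the "M"/"F" and "YES"/"NO" fields need numeric codes. A minimal sketch of that step for one CSV row read as a dict of strings; the column names (sex, is_smoking, cigsPerDay, and so on) are assumptions based on the variables listed above, not confirmed from the file, and missing values ("" or "NA") become NaN.

```python
import math

SEX_CODES = {"M": 1, "F": 0}          # encoded as sex_male
SMOKING_CODES = {"YES": 1, "NO": 0}
# Assumed column names for the numeric variables described above.
NUMERIC_COLUMNS = ["age", "cigsPerDay", "BPMeds", "prevalentStroke", "prevalentHyp",
                   "diabetes", "totChol", "sysBP", "diaBP", "BMI", "heartRate", "glucose"]


def encode_row(row):
    """Convert one raw record (dict of strings) into numeric features."""
    features = {
        "sex_male": SEX_CODES[row["sex"].strip().upper()],
        "is_smoking": SMOKING_CODES[row["is_smoking"].strip().upper()],
    }
    for col in NUMERIC_COLUMNS:
        value = row.get(col, "").strip()
        features[col] = float(value) if value not in ("", "NA") else math.nan
    return features


example = {"sex": "M", "is_smoking": "NO", "age": "64", "cigsPerDay": "0", "sysBP": "148",
           "glucose": "NA"}
print(encode_row(example))
```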

  14. Maternal, Child, and Adolescent Health Needs Assessment, 2023-2024

    • catalog.data.gov
    • data.sfgov.org
    Updated Aug 11, 2025
    data.sfgov.org (2025). Maternal, Child, and Adolescent Health Needs Assessment, 2023-2024 [Dataset]. https://catalog.data.gov/dataset/maternal-child-and-adolescent-health-needs-assessment-2023-2024
    Dataset updated
    Aug 11, 2025
    Dataset provided by
    data.sfgov.org
    Description

    SUMMARY

    This table contains data about women ages 15 to 50, pregnant people, infants, children, and youth up to age 24. It covers a wide range of health topics, including medical conditions, nutrition, dehydration, oral health, mental health, safety, access to health care, and basic needs such as housing. It includes local, county-level prevalence rates, time trends, and health disparities for national public health priorities, including preterm birth, infant death, childhood obesity, adolescent depression and substance use, and high blood pressure, diabetes, and kidney disease in young adults. The population data are from the 2023-2024 San Francisco Maternal, Child, and Adolescent Health needs assessment and are published on the Open Data Portal to share with community partners, plan services, and promote health. For more information, see the Maternal, Child, and Adolescent Health Homepage and the Maternal, Child, and Adolescent Health Reports.

    HOW THE DATASET IS CREATED

    The Maternal, Child, and Adolescent Health (MCAH) Needs Assessment for San Francisco included a review of a wide range of citywide population data covering a ten-year span, from 2014 to 2023. Data from over 83,000 birth records, 59,000 death records, 261,000 emergency room visits, 66,000 hospital admissions, and 90,000 newborn screening discharges were gathered, along with citywide data from child welfare records, health screenings in childcare and schools, DMV records of first-time drivers, school surveys, and a state-run mailed survey of recent births (California Department of Public Health MIHA survey). The datasets provided information about approximately 700 health conditions. Each health condition was described in terms of the number of people affected (cases) and the rate affected, stratified by age, sex, race-ethnicity, insurance status, zip code, and time period. Rates were calculated by dividing the number of people or events by the population group estimate (e.g., total births or census estimates), then multiplying by 100 or 1,000 depending on the measure. Each rate is presented with its 95% confidence interval so that users can compare any two rates, either between groups or over time. Two rates differ “significantly” if their 95% confidence intervals do not overlap. The present dataset summarizes the group-level results for any age-, sex-, race-, insurance-, zip code-, and/or period-specific group that included at least 20 people or cases. Causes of death, health conditions that affected over 1,000 people in the time frame, problems that got worse over time, and health disparities by insurance, race-ethnicity, and/or zip code were flagged for the MCAH Needs Assessment.

    UPDATE PROCESS

    The dataset will be updated manually twice a year, each December and June.

    HOW TO USE THIS DATASET

    Population data from the MCAH needs assessment are shared in several formats, including aggregated datasets on DataSF.gov, downloadable PDF summary reports by age group, interactive online visualizations, data tables, trend graphs, and maps. Information about each variable is available in a linked data dictionary. The definition of each numerator and denominator depends on data source, life stage, and time. Health conditions may not be directly comparable across life stages if the numerator definition includes age- or pregnancy-specific diagnosis codes (e.g., diabetes hospitalization). For small groups or rare conditions, consider combining time periods and/or groups. Data are suppressed if fewer than 20 cases occurred in the group and period. Group-specific rates are available if the matched group-specific census estimates (denominators) were available. Census estim
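    The rate, confidence-interval, and suppression rules described for this dataset can be sketched as follows. The source does not say which interval method was used, so this assumes a simple normal (Wald) approximation to the binomial; the function names and the example counts are hypothetical. Non-overlap of 95% intervals is a conservative test: non-overlapping intervals imply a significant difference, but overlapping ones do not prove the absence of one.

```python
import math
from dataclasses import dataclass

MIN_COUNT = 20  # groups with fewer cases are suppressed, per the description


@dataclass
class Rate:
    value: float
    lower: float
    upper: float


def rate_with_ci(cases, population, per=1000, z=1.96):
    """Rate per `per` people with an approximate 95% CI, or None if suppressed."""
    if cases < MIN_COUNT:
        return None
    p = cases / population
    se = math.sqrt(p * (1 - p) / population)   # Wald standard error (assumed method)
    lower = max(0.0, p - z * se)
    return Rate(p * per, lower * per, (p + z * se) * per)


def differ_significantly(a, b):
    """The document's rule: two rates differ if their 95% CIs do not overlap."""
    return a.upper < b.lower or b.upper < a.lower


group_a = rate_with_ci(100, 1000)   # hypothetical counts
group_b = rate_with_ci(200, 1000)
print(group_a, group_b, differ_significantly(group_a, group_b))
```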

  15. Logistic regression To predict heart disease

    • kaggle.com
    zip
    Updated Jun 7, 2019
    Dileep (2019). Logistic regression To predict heart disease [Dataset]. https://www.kaggle.com/dileep070/heart-disease-prediction-using-logistic-regression
    Available download formats: zip (59801 bytes)
    Dataset updated
    Jun 7, 2019
    Authors
    Dileep
    Description

    LOGISTIC REGRESSION - HEART DISEASE PREDICTION

    Introduction

    The World Health Organization has estimated that 12 million deaths occur worldwide every year due to heart disease. Half of the deaths in the United States and other developed countries are due to cardiovascular diseases. Early prognosis of cardiovascular diseases can aid decisions on lifestyle changes in high-risk patients and in turn reduce complications. This research intends to pinpoint the most relevant risk factors of heart disease and to predict the overall risk using logistic regression.

    Source

    The dataset is publicly available on the Kaggle website, and it is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset provides the patients’ information. It includes over 4,000 records and 15 attributes.

    Variables

    Each attribute is a potential risk factor. There are demographic, behavioral, and medical risk factors.

    Demographic:
    • Sex: male or female (Nominal)
    • Age: age of the patient (Continuous - although the recorded ages have been truncated to whole numbers, the concept of age is continuous)
    Behavioral:
    • Current Smoker: whether or not the patient is a current smoker (Nominal)
    • Cigs Per Day: the number of cigarettes that the person smoked on average in one day (can be considered continuous, as one can have any number of cigarettes, even half a cigarette)
    Medical (history):
    • BP Meds: whether or not the patient was on blood pressure medication (Nominal)
    • Prevalent Stroke: whether or not the patient had previously had a stroke (Nominal)
    • Prevalent Hyp: whether or not the patient was hypertensive (Nominal)
    • Diabetes: whether or not the patient had diabetes (Nominal)
    Medical (current):
    • Tot Chol: total cholesterol level (Continuous)
    • Sys BP: systolic blood pressure (Continuous)
    • Dia BP: diastolic blood pressure (Continuous)
    • BMI: Body Mass Index (Continuous)
    • Heart Rate: heart rate (Continuous - in medical research, variables such as heart rate, though in fact discrete, are considered continuous because of the large number of possible values)
    • Glucose: glucose level (Continuous)
    Predict variable (desired target):
    • 10-year risk of coronary heart disease CHD (binary: “1” means “Yes”, “0” means “No”)

    Logistic Regression

    Logistic regression is a type of regression analysis used in statistics to predict the outcome of a categorical dependent variable from a set of predictor (independent) variables. In binary logistic regression, as used here, the dependent variable is binary. Logistic regression is mainly used for prediction and for calculating the probability of success. The initial model results show that some attributes have P-values higher than the preferred alpha (5%), indicating that they do not have a statistically significant relationship with the probability of heart disease.
    Feature Selection: Backward elimination (P-value approach)

    The backward elimination approach is used here: the attribute with the highest P-value is removed one at a time, and the regression is rerun repeatedly until all remaining attributes have P-values less than 0.05.

    Logistic regression equation:

    $$P = \frac{e^{\beta_0 + \beta_1 X_1}}{1 + e^{\beta_0 + \beta_1 X_1}}$$

    When all the selected features are plugged in:

    $$\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1\,\text{Sex\_male} + \beta_2\,\text{age} + \beta_3\,\text{cigsPerDay} + \beta_4\,\text{totChol} + \beta_5\,\text{sysBP} + \beta_6\,\text{glucose}$$
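    The backward elimination loop described above can be sketched without a statistics package: fit the model by Newton-Raphson, compute two-sided Wald p-values from the coefficient standard errors, drop the predictor with the largest p-value, and refit until every remaining p-value is below 0.05. This is a NumPy-only illustration, not the original author's code; the synthetic data and function names are hypothetical.

```python
import math
import numpy as np


def fit_logit(design, y, iterations=25):
    """Newton-Raphson fit of a logistic regression; `design` includes an intercept column."""
    beta = np.zeros(design.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        information = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(information, design.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    information = design.T @ (design * (p * (1.0 - p))[:, None])
    std_err = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, std_err


def wald_p_values(beta, std_err):
    """Two-sided p-values: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))."""
    return np.array([math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, std_err)])


def backward_elimination(X, y, names, alpha=0.05):
    """Drop the highest-p predictor until all remaining p-values are below alpha."""
    kept = list(range(X.shape[1]))
    while kept:
        design = np.column_stack([np.ones(len(y)), X[:, kept]])
        beta, std_err = fit_logit(design, y)
        p_values = wald_p_values(beta, std_err)[1:]   # ignore the intercept
        worst = int(np.argmax(p_values))
        if p_values[worst] < alpha:
            return {names[c]: float(p) for c, p in zip(kept, p_values)}
        del kept[worst]
    return {}


rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))                      # hypothetical predictors
true_logit = -1.0 + 1.5 * X[:, 0] + 0.8 * X[:, 1]   # third column is pure noise
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
result = backward_elimination(X, y, ["x1", "x2", "noise"])
print(result)
```

    A statistics library would normally be used for the fit itself; the loop around it is the part that implements the P-value approach.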

    Interpreting the results: Odds Ratio, Confidence Intervals and P-values
    • This fitted model shows that, holding all other features constant, the odds of being diagnosed with heart disease for males (sex_male = 1) over that of females (sex_male = 0) is exp(0.5815) = 1.788687. In terms of percent change, the odds for males are 78.8% higher than the odds for females.
    • The coefficient for age says that, holding all other features constant, there is a 7% increase in the odds of being diagnosed with CHD for a one-year increase in age, since exp(0.0655) = 1.067644.
    • Similarly, with every extra cigarette smoked per day, there is a 2% increase in the odds of CHD.
    • For total cholesterol level and glucose level, there is no significant change in the odds.
    • There is a 1.7% increase in the odds for every unit increase in systolic blood pressure.
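    The odds ratios above come from exponentiating the fitted coefficients, and a 95% confidence interval for an odds ratio comes from exponentiating the coefficient's interval. A small sketch; the coefficients are the ones quoted in the text, while the standard error in the usage line is hypothetical since the text does not report it. Computing exp(0.5815) directly gives about 1.7887, so small differences in the last digits presumably reflect the coefficients being rounded to four decimals.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient):
    return (math.exp(coefficient) - 1.0) * 100.0


def odds_ratio_ci(coefficient, std_err, z=1.96):
    """Approximate 95% CI for the odds ratio: exp(beta -/+ z * SE)."""
    return math.exp(coefficient - z * std_err), math.exp(coefficient + z * std_err)


for name, beta in [("sex_male", 0.5815), ("age", 0.0655)]:
    print(f"{name}: OR = {odds_ratio(beta):.4f}, change = {percent_change_in_odds(beta):+.1f}%")
print(odds_ratio_ci(0.0655, 0.008))   # hypothetical standard error
```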

    Model Evaluation - Statistics

    From the above statistics, it is clear that the model is more specific than sensitive: negative values are predicted more accurately than positive ones. Predicted probabilities of 0 (No Coronary Heart Disease) and 1 (Coronary Heart Disease: Yes) are computed for the test data with a default classification threshold of 0.5.

    Lower the threshold

    Since the model is predicting heart disease, too many type II errors are not advisable. A False Negative (ignoring the probability of disease when there actu...
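    The trade-off behind lowering the threshold can be made concrete by computing sensitivity and specificity at two thresholds. A minimal sketch on a handful of hypothetical predicted probabilities and labels (not results from this model): lowering the threshold converts false negatives into true positives at the possible cost of more false positives.

```python
def sensitivity_specificity(probabilities, labels, threshold=0.5):
    """Classify p >= threshold as positive and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for p, label in zip(probabilities, labels):
        predicted_positive = p >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.6]   # hypothetical predicted probabilities
truth = [0, 1, 1, 1, 0, 0]                # hypothetical true labels
for t in (0.5, 0.3):
    print(t, sensitivity_specificity(probs, truth, t))
```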

  16. Data from: Geographic and sociodemographic variation of cardiovascular...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 19, 2018
    Theilmann, Michaela; Geldsetzer, Pascal; Awasthi, Ashish; Gaziano, Thomas A.; Atun, Rifat; Bärnighausen, Till; Jaacks, Lindsay M.; Danaei, Goodarz; Manne-Goehler, Jennifer; Vollmer, Sebastian; Davies, Justine I. (2018). Geographic and sociodemographic variation of cardiovascular disease risk in India: A cross-sectional study of 797,540 adults [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000716873
    Dataset updated
    Jun 19, 2018
    Authors
    Theilmann, Michaela; Geldsetzer, Pascal; Awasthi, Ashish; Gaziano, Thomas A.; Atun, Rifat; Bärnighausen, Till; Jaacks, Lindsay M.; Danaei, Goodarz; Manne-Goehler, Jennifer; Vollmer, Sebastian; Davies, Justine I.
    Description

    Background: Cardiovascular disease (CVD) is the leading cause of mortality in India. Yet, evidence on the CVD risk of India’s population is limited. To inform health system planning and effective targeting of interventions, this study aimed to determine how CVD risk (and the factors that determine risk) varies among states in India, by rural–urban location, and by individual-level sociodemographic characteristics.

    Methods and findings: We used 2 large household surveys carried out between 2012 and 2014, which included a sample of 797,540 adults aged 30 to 74 years across India. The main outcome variable was the predicted 10-year risk of a CVD event as calculated with the Framingham risk score. The Harvard–NHANES, Globorisk, and WHO–ISH scores were used in secondary analyses. CVD risk and the prevalence of CVD risk factors were examined by state, rural–urban residence, age, sex, household wealth, and education. Mean CVD risk varied from 13.2% (95% CI: 12.7%–13.6%) in Jharkhand to 19.5% (95% CI: 19.1%–19.9%) in Kerala. CVD risk tended to be highest in North, Northeast, and South India. District-level wealth quintile (based on median household wealth in a district) and urbanization were both positively associated with CVD risk. Similarly, household wealth quintile and living in an urban area were positively associated with CVD risk among both sexes, but the associations were stronger among women than men. Smoking was more prevalent in poorer household wealth quintiles and in rural areas, whereas body mass index, high blood glucose, and systolic blood pressure were positively associated with household wealth and urban location. Men had a substantially higher (age-standardized) smoking prevalence (26.2% [95% CI: 25.7%–26.7%] versus 1.8% [95% CI: 1.7%–1.9%]) and mean systolic blood pressure (126.9 mm Hg [95% CI: 126.7–127.1] versus 124.3 mm Hg [95% CI: 124.1–124.5]) than women. Important limitations of this analysis are the high proportion of missing values (27.1%) in the main outcome variable, assessment of diabetes through a 1-time capillary blood glucose measurement, and the inability to exclude participants with a current or previous CVD event.

    Conclusions: This study identified substantial variation in CVD risk among states and sociodemographic groups in India; these findings can facilitate effective targeting of CVD programs to those most at risk and most in need. Although the CVD risk scores used have not been validated in South Asian populations, the patterns of variation in CVD risk among the Indian population were similar across all 4 risk scoring systems.
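    The abstract reports age-standardized prevalences but does not state which standard population or age bands were used. As a rough illustration only, direct age standardization takes a weighted average of age-specific rates, with weights taken from a chosen standard population. The function name, age bands, rates, and weights below are hypothetical and are not taken from the study:

```python
def age_standardized_rate(stratum_rates, standard_weights):
    """Direct age standardization: weighted mean of age-specific rates.

    stratum_rates: dict mapping age band -> observed prevalence in the sample.
    standard_weights: dict mapping the same age bands -> size (or share) of
        that band in the chosen standard population.
    """
    if set(stratum_rates) != set(standard_weights):
        raise ValueError("age bands must match between rates and weights")
    total_weight = sum(standard_weights.values())
    weighted_sum = sum(stratum_rates[band] * standard_weights[band] for band in stratum_rates)
    return weighted_sum / total_weight


# Hypothetical illustration only: these numbers are NOT from the study.
rates = {"30-44": 0.20, "45-59": 0.30, "60-74": 0.25}
weights = {"30-44": 50, "45-59": 30, "60-74": 20}
print(round(age_standardized_rate(rates, weights), 4))  # prints 0.24
```

    Standardizing both sexes to the same age structure is what makes a comparison such as men versus women meaningful when their age distributions differ.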

  17. S1 Data -

    • figshare.com
    • plos.figshare.com
    xlsx
    Updated Jul 5, 2024
    Honglei Zhao; Ji Wu; Qianqian Wu (2024). S1 Data - [Dataset]. http://doi.org/10.1371/journal.pone.0306048.s002
    Available download formats: xlsx
    Dataset updated
    Jul 5, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Honglei Zhao; Ji Wu; Qianqian Wu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: The link between psoriasis and hypertension has been established through observational studies. Despite this, a comprehensive assessment of the combined effects of psoriasis and hypertension on all-cause mortality is lacking. The principal aim of the present study is to elucidate the synergistic impact of psoriasis and hypertension on mortality in a representative cohort of adults residing in the United States.

    Methods: The analysis used data from the National Health and Nutrition Examination Survey (NHANES) for two periods: 2003–2006 and 2009–2014. Psoriasis status was determined from self-reported questionnaire data, whereas hypertension was defined by systolic blood pressure ≥ 140 mmHg, diastolic blood pressure ≥ 90 mmHg, a self-reported physician diagnosis, or the use of antihypertensive medication. The association between psoriasis and hypertension was assessed with multivariable logistic regression analyses. Participants’ vital status was followed through December 31, 2019. A four-level variable combining psoriasis and hypertension status was constructed, and survival probability was evaluated with Kaplan-Meier curves and Cox regression analysis. Hazard ratios (HRs) and their 95% confidence intervals (CIs) were computed to quantify the association between psoriasis/hypertension and all-cause mortality.

    Results: In total, this study included 19,799 participants, of whom 554 had psoriasis and 7,692 had hypertension. The logistic regression analyses indicated a higher risk of hypertension among individuals with psoriasis than among those without psoriasis. During a median follow-up of 105 months, 1,845 participants died from any cause. Compared with individuals with neither hypertension nor psoriasis, those with psoriasis alone had an all-cause mortality HR of 0.73 (95% CI: 0.35–1.53), those with hypertension alone had an HR of 1.78 (95% CI: 1.55–2.04), and those with both psoriasis and hypertension had an HR of 2.33 (95% CI: 1.60–3.40). In an analysis stratified by the presence or absence of psoriasis, hypertension was associated with an elevated risk of all-cause mortality in individuals without psoriasis (HR 1.77, 95% CI: 1.54–2.04). Notably, this association was stronger among individuals with psoriasis (HR 3.23, 95% CI: 1.47–7.13).

    Conclusions: Our results demonstrated a noteworthy positive association between psoriasis, hypertension, and all-cause mortality. These findings indicate that individuals who have both psoriasis and hypertension face an increased likelihood of mortality.
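    The exposure definitions in the Methods reduce to simple rules: hypertension if any one criterion is met, and a four-level group from the two binary statuses. A minimal sketch, assuming hypothetical field names (the actual NHANES variable codes and how repeated blood pressure readings were averaged are not given here) and assuming that a missing blood pressure reading simply does not satisfy that criterion:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical field names; NHANES releases use their own variable codes.
    systolic_mmhg: Optional[float]
    diastolic_mmhg: Optional[float]
    told_hypertension_by_doctor: bool
    takes_antihypertensive: bool
    self_reported_psoriasis: bool


def has_hypertension(p: Participant) -> bool:
    """Any one of the four criteria from the abstract is sufficient."""
    high_sbp = p.systolic_mmhg is not None and p.systolic_mmhg >= 140
    high_dbp = p.diastolic_mmhg is not None and p.diastolic_mmhg >= 90
    return high_sbp or high_dbp or p.told_hypertension_by_doctor or p.takes_antihypertensive


def four_level_group(p: Participant) -> str:
    """Combine psoriasis and hypertension status into one categorical exposure."""
    pso = p.self_reported_psoriasis
    htn = has_hypertension(p)
    if not pso and not htn:
        return "neither"  # reference group for the hazard ratios
    if pso and not htn:
        return "psoriasis only"
    if htn and not pso:
        return "hypertension only"
    return "both"


# Hypothetical participant: DBP of 92 mmHg meets the hypertension criterion.
p = Participant(132, 92, False, False, True)
print(four_level_group(p))  # prints: both
```

    The resulting category would then enter a Cox model as a categorical covariate with "neither" as the reference level, matching how the HRs above are reported.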

  18. Climate Ready Boston Social Vulnerability

    • bostonopendata-boston.opendata.arcgis.com
    • cloudcity.ogopendata.com
    • +3 more
    Updated Sep 21, 2017
    BostonMaps (2017). Climate Ready Boston Social Vulnerability [Dataset]. https://bostonopendata-boston.opendata.arcgis.com/datasets/boston::climate-ready-boston-social-vulnerability
    Dataset updated
    Sep 21, 2017
    Dataset authored and provided by
    BostonMaps
    Description

    Social vulnerability is defined as the disproportionate susceptibility of some social groups to the impacts of hazards, including death, injury, loss, or disruption of livelihood. In this dataset from Climate Ready Boston, groups identified as being more vulnerable are older adults, children, people of color, people with limited English proficiency, people with low or no incomes, people with disabilities, and people with medical illnesses.

    Source: The analysis and definitions used in Climate Ready Boston (2016) are based on "A framework to understand the relationship between social factors that reduce resilience in cities: Application to the City of Boston," published in 2015 in the International Journal of Disaster Risk Reduction by Atyia Martin, Northeastern University.

    Population Definitions:

    Older Adults: Older adults (those over age 65) have physical vulnerabilities in a climate event; they suffer from higher rates of medical illness than the rest of the population and can have some functional limitations in an evacuation scenario, as well as when preparing for and recovering from a disaster. Furthermore, older adults are physically more vulnerable to the impacts of extreme heat. Beyond the physical risk, older adults are more likely to be socially isolated. Without an appropriate support network, an initially small risk could be exacerbated if an older adult is not able to get help.
    Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for population over 65 years of age.
    Attribute label: OlderAdult

    Children: Families with children require additional resources in a climate event. When school is cancelled, parents need alternative childcare options, which can mean missing work. Children are especially vulnerable to extreme heat and stress following a natural disaster.
    Data source: 2010 American Community Survey 5-year Estimates (ACS) data by census tract for population under 5 years of age.
    Attribute label: TotChild

    People of Color: People of color make up a majority (53 percent) of Boston’s population. People of color are also more likely to fall into multiple vulnerable groups. People of color statistically have lower levels of income and higher levels of poverty than the population at large. People of color, many of whom also have limited English proficiency, may not have ready access in their primary language to information about the dangers of extreme heat or about cooling center resources. This heat risk can be compounded by the fact that people of color often live in more densely populated urban areas that are at higher risk for heat exposure due to the urban heat island effect.
    Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract: Black, Native American, Asian, Island, Other, Multi, Non-white Hispanics.
    Attribute label: POC2

    Limited English Proficiency: Without adequate English skills, residents can miss crucial information on how to prepare for hazards. Cultural practices for information sharing, for example, may focus on word-of-mouth communication. In a flood event, residents can also face challenges communicating with emergency response personnel. If residents are more socially isolated, they may be less likely to hear about upcoming events. Finally, immigrants, especially ones who are undocumented, may be reluctant to use government services out of fear of deportation or general distrust of the government or emergency personnel.
    Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract, defined as speaks English only or speaks English “very well”.
    Attribute label: LEP

    Low to No Income: A lack of financial resources impacts a household’s ability to prepare for a disaster event and to support friends and neighborhoods. For example, residents without televisions, computers, or data-driven mobile phones may face challenges getting news about hazards or recovery resources. Renters may have trouble finding and paying deposits for replacement housing if their residence is impacted by flooding. Homeowners may be less able to afford insurance that will cover flood damage. Having low or no income can create difficulty evacuating in a disaster event because of a higher reliance on public transportation. If unable to evacuate, residents may be more at risk without supplies to stay in their homes for an extended period of time. Low- and no-income residents can also be more vulnerable to hot weather if running air conditioning or fans puts utility costs out of reach.
    Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for low-to-no income populations. The data represents a calculated field that combines people below 100% of the poverty level and those at 100–149% of the poverty level.
    Attribute label: Low_to_No

    People with Disabilities: People with disabilities are among the most vulnerable in an emergency; they sustain disproportionate rates of illness, injury, and death in disaster events.[46] People with disabilities can find it difficult to adequately prepare for a disaster event, including moving to a safer place. They are more likely to be left behind or abandoned during evacuations. Rescue and relief resources, such as emergency transportation or shelters, may not be universally accessible. Research has revealed a historical pattern of discrimination against people with disabilities in times of resource scarcity, such as after a major storm and flood.
    Data source: 2008-2012 American Community Survey 5-year Estimates (ACS) data by census tract for total civilian non-institutionalized population, including: hearing difficulty, vision difficulty, cognitive difficulty, ambulatory difficulty, self-care difficulty, and independent living difficulty.
    Attribute label: TotDis

    Medical Illness: Symptoms of existing medical illnesses are often exacerbated by hot temperatures. For example, heat can trigger asthma attacks or increase already high blood pressure due to the stress that high temperatures put on the body. Climate events can interrupt access to normal sources of healthcare and even life-sustaining medication. Special planning is required for people experiencing medical illness. For example, people dependent on dialysis will have different evacuation and care needs than other Boston residents in a climate event.
    Data source: Medical illness is a proxy measure based on EASI data accessed through Simply Map. Health data at the local level in Massachusetts are not available beyond zip codes. EASI modeled the health statistics for the U.S. population based on age, sex, and race probabilities using U.S. Census Bureau data. The probabilities are modeled against the census and current-year and five-year forecasts. Medical illness is the sum of asthma in children, asthma in adults, heart disease, emphysema, bronchitis, cancer, diabetes, kidney disease, and liver disease. A limitation is that these numbers may be over-counted because some people have more than one medical illness; the analysis may therefore show more people with medical illness within census tracts than are actually present. Overall, the analysis was based on the relationship between social factors.
    Attribute label: MedIllnes

    Other attribute definitions:
    • GEOID10: Geographic identifier: State Code (25), County Code (025), 2010 Census Tract
    • AREA_SQFT: Tract area (in square feet)
    • AREA_ACRES: Tract area (in acres)
    • POP100_RE: Tract population count
    • HU100_RE: Tract housing unit count
    • Name: Boston Neighborhood
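    The GEOID10 layout and the two summed fields described above can be expressed in a few lines. A minimal sketch, assuming GEOID10 follows the standard 11-digit 2010 census tract layout (2-digit state, 3-digit county, 6-digit tract) and that the per-tract input counts for Low_to_No and MedIllnes are already available; the function names and the example GEOID are hypothetical:

```python
def split_geoid10(geoid10) -> dict:
    """Split an 11-digit 2010 tract GEOID into state, county, and tract codes."""
    geoid10 = str(geoid10)  # tolerate GEOIDs stored as integers (state 25 has no leading zero)
    if len(geoid10) != 11 or not geoid10.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid10!r}")
    return {
        "state": geoid10[:2],    # 25 = Massachusetts
        "county": geoid10[2:5],  # 025 = Suffolk County
        "tract": geoid10[5:],    # 6-digit tract code
    }


def low_to_no_income(below_100_pct: int, from_100_to_149_pct: int) -> int:
    """Low_to_No: people below 100% of the poverty level plus those at 100-149%."""
    return below_100_pct + from_100_to_149_pct


def medical_illness_proxy(condition_counts: dict) -> int:
    """MedIllnes-style total: a plain sum over the modeled condition counts.

    A person with several conditions is counted once per condition, which is
    the over-counting limitation noted in the description.
    """
    return sum(condition_counts.values())


parts = split_geoid10("25025010405")  # hypothetical tract GEOID
print(parts)  # {'state': '25', 'county': '025', 'tract': '010405'}
```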

  19. The Relative Strength of Association Between Sedentary Time, Total Physical...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    • +1 more
    Updated Jan 15, 2014
    Olds, Tim; Maher, Carol; Katzmarzyk, Peter T.; Mire, Emily (2014). The Relative Strength of Association Between Sedentary Time, Total Physical Activity and Cardio-Metabolic Biomarkers in Adults in the 2003/04 and 2005/06 U.S. National Health and Nutrition Examination Survey. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001211951
    Dataset updated
    Jan 15, 2014
    Authors
    Olds, Tim; Maher, Carol; Katzmarzyk, Peter T.; Mire, Emily
    Description

    Abbreviations: BP = blood pressure; HDL = high-density lipoprotein; HOMA %B = Homeostasis Model Assessment steady-state beta cell function; HOMA %S = Homeostasis Model Assessment insulin sensitivity; OGTT = oral glucose tolerance test; NIM = not included in model; CVD = cardiovascular disease. *P<0.05; **P<0.01; ***P<0.001. P values are two-sided. Models were adjusted for socio-demographic characteristics, medical history, and smoking, alcohol, and dietary behaviour. Please see Table S1 for the full list of covariates included in the model for each cardio-metabolic biomarker.
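    The table footnote marks significance with one to three asterisks at fixed thresholds. A small sketch of that mapping, assuming the thresholds are strict inequalities as written (P<0.05 and so on); the function name is hypothetical:

```python
def significance_stars(p: float) -> str:
    """Map a two-sided P value to the footnote markers used in the table."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the 0.05 level


print([significance_stars(p) for p in (0.2, 0.03, 0.004, 0.0005)])  # prints ['', '*', '**', '***']
```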

  20. 🧠 Alzheimer's Disease Dataset 🧠

    • kaggle.com
    zip
    Updated Jun 11, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rabie El Kharoua (2024). 🧠 Alzheimer's Disease Dataset 🧠 [Dataset]. https://www.kaggle.com/datasets/rabieelkharoua/alzheimers-disease-dataset
    Available download formats: zip (274,395 bytes)
    Dataset updated
    Jun 11, 2024
    Authors
    Rabie El Kharoua
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains extensive health information for 2,149 patients, each uniquely identified with IDs ranging from 4751 to 6900. The dataset includes demographic details, lifestyle factors, medical history, clinical measurements, cognitive and functional assessments, symptoms, and a diagnosis of Alzheimer's Disease. The data is ideal for researchers and data scientists looking to explore factors associated with Alzheimer's, develop predictive models, and conduct statistical analyses.

    Table of Contents

    1. Patient Information
      • Patient ID
      • Demographic Details
      • Lifestyle Factors
    2. Medical History
    3. Clinical Measurements
    4. Cognitive and Functional Assessments
    5. Symptoms
    6. Diagnosis Information
    7. Confidential Information

    Patient Information

    Patient ID

    • PatientID: A unique identifier assigned to each patient (4751 to 6900).

    Demographic Details

    • Age: The age of the patients ranges from 60 to 90 years.
    • Gender: Gender of the patients, where 0 represents Male and 1 represents Female.
    • Ethnicity: The ethnicity of the patients, coded as follows:
      • 0: Caucasian
      • 1: African American
      • 2: Asian
      • 3: Other
    • EducationLevel: The education level of the patients, coded as follows:
      • 0: None
      • 1: High School
      • 2: Bachelor's
      • 3: Higher
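    The coded demographic columns above can be turned back into readable labels with lookup tables. A minimal sketch, assuming each row is a dict such as one produced by `csv.DictReader` (values may arrive as strings); the function name and the sample row are hypothetical:

```python
# Code tables copied from the data dictionary above.
GENDER = {0: "Male", 1: "Female"}
ETHNICITY = {0: "Caucasian", 1: "African American", 2: "Asian", 3: "Other"}
EDUCATION = {0: "None", 1: "High School", 2: "Bachelor's", 3: "Higher"}


def decode_demographics(row: dict) -> dict:
    """Return a copy of `row` with the coded demographic columns replaced by labels."""
    decoded = dict(row)
    # int() accepts both integers and numeric strings read from a CSV file.
    decoded["Gender"] = GENDER[int(row["Gender"])]
    decoded["Ethnicity"] = ETHNICITY[int(row["Ethnicity"])]
    decoded["EducationLevel"] = EDUCATION[int(row["EducationLevel"])]
    return decoded


# Hypothetical row for illustration.
row = {"PatientID": "4751", "Age": "73", "Gender": "0", "Ethnicity": "0", "EducationLevel": "2"}
print(decode_demographics(row)["EducationLevel"])  # prints: Bachelor's
```

    An unexpected code raises `KeyError`, which surfaces data errors early instead of silently producing a wrong label.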

    Lifestyle Factors

    • BMI: Body Mass Index of the patients, ranging from 15 to 40.
    • Smoking: Smoking status, where 0 indicates No and 1 indicates Yes.
    • AlcoholConsumption: Weekly alcohol consumption in units, ranging from 0 to 20.
    • PhysicalActivity: Weekly physical activity in hours, ranging from 0 to 10.
    • DietQuality: Diet quality score, ranging from 0 to 10.
    • SleepQuality: Sleep quality score, ranging from 4 to 10.

    Medical History

    • FamilyHistoryAlzheimers: Family history of Alzheimer's Disease, where 0 indicates No and 1 indicates Yes.
    • CardiovascularDisease: Presence of cardiovascular disease, where 0 indicates No and 1 indicates Yes.
    • Diabetes: Presence of diabetes, where 0 indicates No and 1 indicates Yes.
    • Depression: Presence of depression, where 0 indicates No and 1 indicates Yes.
    • HeadInjury: History of head injury, where 0 indicates No and 1 indicates Yes.
    • Hypertension: Presence of hypertension, where 0 indicates No and 1 indicates Yes.

    Clinical Measurements

    • SystolicBP: Systolic blood pressure, ranging from 90 to 180 mmHg.
    • DiastolicBP: Diastolic blood pressure, ranging from 60 to 120 mmHg.
    • CholesterolTotal: Total cholesterol levels, ranging from 150 to 300 mg/dL.
    • CholesterolLDL: Low-density lipoprotein cholesterol levels, ranging from 50 to 200 mg/dL.
    • CholesterolHDL: High-density lipoprotein cholesterol levels, ranging from 20 to 100 mg/dL.
    • CholesterolTriglycerides: Triglyceride levels, ranging from 50 to 400 mg/dL.
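    The ranges listed under Lifestyle Factors and Clinical Measurements make a convenient sanity check when loading the file. A minimal sketch, assuming the documented ranges are inclusive at both ends and that rows are dicts of numeric values or numeric strings; columns missing from a row are skipped, and the function name is hypothetical:

```python
# Documented ranges from the data dictionary (assumed inclusive).
DOCUMENTED_RANGES = {
    "BMI": (15, 40),
    "AlcoholConsumption": (0, 20),
    "PhysicalActivity": (0, 10),
    "DietQuality": (0, 10),
    "SleepQuality": (4, 10),
    "SystolicBP": (90, 180),
    "DiastolicBP": (60, 120),
    "CholesterolTotal": (150, 300),
    "CholesterolLDL": (50, 200),
    "CholesterolHDL": (20, 100),
    "CholesterolTriglycerides": (50, 400),
}


def out_of_range(row: dict) -> list:
    """Return (column, value) pairs that fall outside the documented ranges."""
    problems = []
    for column, (low, high) in DOCUMENTED_RANGES.items():
        if column not in row:
            continue  # column absent from this row; nothing to check
        value = float(row[column])
        if not low <= value <= high:
            problems.append((column, value))
    return problems


print(out_of_range({"BMI": "27.1", "SystolicBP": "200"}))  # prints [('SystolicBP', 200.0)]
```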

    Cognitive and Functional Assessments

    • MMSE: Mini-Mental State Examination score, ranging from 0 to 30. Lower scores indicate cognitive impairment.
    • FunctionalAssessment: Functional assessment score, ranging from 0 to 10. Lower scores indicate greater impairment.
    • MemoryComplaints: Presence of memory complaints, where 0 indicates No and 1 indicates Yes.
    • BehavioralProblems: Presence of behavioral problems, where 0 indicates No and 1 indicates Yes.
    • ADL: Activities of Daily Living score, ranging from 0 to 10. Lower scores indicate greater impairment.

    Symptoms

    • Confusion: Presence of confusion, where 0 indicates No and 1 indicates Yes.
    • Disorientation: Presence of disorientation, where 0 indicates No and 1 indicates Yes.
    • PersonalityChanges: Presence of personality changes, where 0 indicates No and 1 indicates Yes.
    • DifficultyCompletingTasks: Presence of difficulty completing tasks, where 0 indicates No and 1 indicates Yes.
    • Forgetfulness: Presence of forgetfulness, where 0 indicates No and 1 indicates Yes.

    Diagnosis Information

    • Diagnosis: Diagnosis status for Alzheimer's Disease, where 0 indicates No and 1 indicates Yes.

    Confidential Information

    • DoctorInCharge: This column contains confidential information about the doctor in charge, with "XXXConfid" as the value for all patients.
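    Because PatientID is only an identifier and DoctorInCharge holds the same placeholder for every patient, both are typically dropped before modeling, with Diagnosis used as the target. A minimal sketch using the standard `csv` module, assuming every remaining column is numeric as the data dictionary describes; the function name and the in-memory sample are hypothetical:

```python
import csv
import io

# An identifier and a constant placeholder column carry no predictive signal.
EXCLUDED = {"PatientID", "DoctorInCharge"}
TARGET = "Diagnosis"


def features_and_labels(rows):
    """Split csv.DictReader-style rows into numeric feature dicts and 0/1 labels."""
    features, labels = [], []
    for row in rows:
        labels.append(int(row[TARGET]))
        # Assumes all other columns are numeric, per the data dictionary.
        features.append({k: float(v) for k, v in row.items() if k not in EXCLUDED and k != TARGET})
    return features, labels


# Hypothetical two-column excerpt; the real file has many more columns.
sample = io.StringIO(
    "PatientID,Age,MMSE,Diagnosis,DoctorInCharge\n"
    "4751,73,21.5,0,XXXConfid\n"
)
X, y = features_and_labels(csv.DictReader(sample))
print(X, y)  # prints [{'Age': 73.0, 'MMSE': 21.5}] [0]
```

    In practice the same function can be fed `csv.DictReader(open(path, newline=""))` for the downloaded CSV.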

    Conclusion

    This dataset offers extensive insights into the factors associated with Alzheimer's Disease, including demographic, lifestyle, medical, cognitive, and functional variables. It is ideal for developing predictive models, conducting statistical analyses, and exploring the complex interplay of factors contributing to Alzheimer's Disease.

    Citation

    If you use this dataset in your work, please cite it as follows:

    @misc{rabie_el_kharoua_2024,
      title={Alzheimer's Disease Dataset},
      url={https://www.kaggle.com/dsv/8668279},
      DOI={10.34740/KAGGLE/DSV/8668279},
      publisher={Kaggle...
    