100+ datasets found
  1. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data that have considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, these data must be processed before new insights can be derived through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:

    • demographics (281 variables),
    • dietary consumption (324 variables),
    • physiological functions (1,040 variables),
    • occupation (61 variables),
    • questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood),
    • medications (29 variables),
    • mortality information linked from the National Death Index (15 variables),
    • survey weights (857 variables),
    • environmental exposure biomarker measurements (598 variables), and
    • chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module: one uncleaned version and one cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.

    • "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
    • "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
    • "dictionary_drug_codes.csv" contains the dictionary for descriptors on the drug codes.
    • "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file.

    • "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
    • "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.

    • "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
    • "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
    • "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
    • "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
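To make the distinction between statistics computed "with and without accounting for the NHANES sampling design" concrete, here is a minimal Python sketch (not the authors' R starter code) that contrasts a plain mean with a weight-adjusted mean. The function names and the three-participant data are invented for illustration. The sketch covers only the point estimate; a full design-based analysis would also need the strata and sampling-unit variables to estimate variances.

```python
from typing import Sequence


def unweighted_mean(values: Sequence[float]) -> float:
    """Plain arithmetic mean that ignores the sampling design."""
    return sum(values) / len(values)


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean: each value counts in proportion to its survey weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Made-up illustration data: three participants and their survey weights.
measurements = [10.0, 20.0, 30.0]
survey_weights = [1.0, 1.0, 2.0]

print(unweighted_mean(measurements))                # 20.0
print(weighted_mean(measurements, survey_weights))  # 22.5
```

The third participant stands in for twice as many people as the others, so the weighted estimate shifts toward that participant's value.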

  2. National Health and Nutrition Examination Survey (NHANES)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). National Health and Nutrition Examination Survey (NHANES) [Dataset]. http://doi.org/10.7910/DVN/IMWQPJ
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the national health and nutrition examination survey (nhanes) with r

    nhanes is this fascinating survey where doctors and dentists accompany survey interviewers in a little mobile medical center that drives around the country. while the survey folks are interviewing people, the medical professionals administer laboratory tests and conduct a real doctor's examination. the blood work and medical exam allow researchers like you and me to answer tough questions like, "how many people have diabetes but don't know they have diabetes?" conducting the lab tests and the physical isn't cheap, so a new nhanes data set becomes available once every two years and only includes about twelve thousand respondents. since the number of respondents is so small, analysts often pool multiple years of data together. the replication scripts below give a few different examples of how multiple years of data can be pooled with r.

    the survey gets conducted by the centers for disease control and prevention (cdc), and generalizes to the united states non-institutional, non-active duty military population. most of the data tables produced by the cdc include only a small number of variables, so importation with the foreign package's read.xport function is pretty straightforward. but that makes merging the appropriate data sets trickier, since it might not be clear what to pull for which variables. for every analysis, start with the table with 'demo' in the name -- this file includes basic demographics, weighting, and complex sample survey design variables. since it's quick to download the files directly from the cdc's ftp site, there's no massive ftp download automation script.

    this new github repository contains five scripts:

    2009-2010 interview only - download and analyze.R
    • download, import, save the demographics and health insurance files onto your local computer
    • load both files, limit them to the variables needed for the analysis, merge them together
    • perform a few example variable recodes
    • create the complex sample survey object, using the interview weights
    • run a series of pretty generic analyses on the health insurance questions

    2009-2010 interview plus laboratory - download and analyze.R
    • download, import, save the demographics and cholesterol files onto your local computer
    • load both files, limit them to the variables needed for the analysis, merge them together
    • perform a few example variable recodes
    • create the complex sample survey object, using the mobile examination component (mec) weights
    • perform a direct-method age-adjustment and match figure 1 of this cdc cholesterol brief

    replicate 2005-2008 pooled cdc oral examination figure.R
    • download, import, save, pool, recode, create a survey object, run some basic analyses
    • replicate figure 3 from this cdc oral health databrief - the whole barplot

    replicate cdc publications.R
    • download, import, save, pool, merge, and recode the demographics file plus cholesterol laboratory, blood pressure questionnaire, and blood pressure laboratory files
    • match the cdc's example sas and sudaan syntax file's output for descriptive means
    • match the cdc's example sas and sudaan syntax file's output for descriptive proportions
    • match the cdc's example sas and sudaan syntax file's output for descriptive percentiles

    replicate human exposure to chemicals report.R (user-contributed)
    • download, import, save, pool, merge, and recode the demographics file plus urinary bisphenol a (bpa) laboratory files
    • log-transform some of the columns to calculate the geometric means and quantiles
    • match the 2007-2008 statistics shown on pdf page 21 of the cdc's fourth edition of the report

    for more detail about the national health and nutrition examination survey (nhanes), visit:
    • the cdc's nhanes homepage
    • the national cancer institute's page of nhanes web tutorials

    notes:

    nhanes includes interview-only weights and interview + mobile examination component (mec) weights. if you only use questions from the basic interview in your analysis, use the interview-only weights (the sample size is a bit larger). i haven't really figured out a use for the interview-only weights -- nhanes draws most of its power from the combination of the interview and the mobile examination component variables. if you're only using variables from the interview, see if you can use a data set with a larger sample size like the current population survey (cps), national health interview survey (nhis), or medical expenditure panel survey (meps) instead.

    confidential to sas, spss, stata, sudaan users: why are you still riding around on a donkey after we've invented the internal combustion engine? time to transition to r. :D
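The description mentions pooling several two-year cycles but does not say how the weights are handled. The scripts it lists are written in R; the following is only a language-neutral sketch in Python, not a port of them. It assumes the commonly documented convention of dividing each two-year weight by the number of cycles pooled, which is not stated above and should be checked against the official analytic guidelines for the cycles in question. `WTMEC2YR` is used as the weight column name, and `pool_cycles`, `pooled_weight`, and all values are made up.

```python
from typing import Dict, List

Record = Dict[str, float]


def pool_cycles(cycles: List[List[Record]], weight_key: str = "WTMEC2YR") -> List[Record]:
    """Stack per-cycle records and add a rescaled weight for the pooled sample."""
    n_cycles = len(cycles)
    if n_cycles == 0:
        raise ValueError("at least one cycle is required")
    pooled = []
    for cycle in cycles:
        for record in cycle:
            row = dict(record)  # copy so the caller's data is left untouched
            # Assumed convention: two-year weight divided by number of cycles.
            row["pooled_weight"] = row[weight_key] / n_cycles
            pooled.append(row)
    return pooled


# Made-up records standing in for two survey cycles.
cycle_a = [{"SEQN": 1, "WTMEC2YR": 1000.0}, {"SEQN": 2, "WTMEC2YR": 3000.0}]
cycle_b = [{"SEQN": 3, "WTMEC2YR": 2000.0}]

pooled = pool_cycles([cycle_a, cycle_b])
print([row["pooled_weight"] for row in pooled])  # [500.0, 1500.0, 1000.0]
```

Rescaling keeps the pooled weights summing to roughly one population rather than one population per cycle.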

  3. National Health and Nutrition Examination Survey (NHANES), 2001-2002

    • icpsr.umich.edu
    • datamed.org
    ascii, delimited, sas +2
    Updated Feb 22, 2012
    + more versions
    Cite
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics (2012). National Health and Nutrition Examination Survey (NHANES), 2001-2002 [Dataset]. http://doi.org/10.3886/ICPSR25502.v5
    Available download formats: ascii, stata, spss, delimited, sas
    Dataset updated
    Feb 22, 2012
    Dataset provided by
    Inter-university Consortium for Political and Social Research https://www.icpsr.umich.edu/web/pages/
    Authors
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/25502/terms

    Time period covered
    2001 - 2002
    Area covered
    United States
    Description

    The National Health and Nutrition Examination Surveys (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES combines personal interviews and physical examinations, which focus on different population groups or health topics. These surveys have been conducted by the National Center for Health Statistics (NCHS) on a periodic basis from 1971 to 1994. In 1999 the NHANES became a continuous program with a changing focus on a variety of health and nutrition measurements which were designed to meet current and emerging concerns. The surveys examine a nationally representative sample of approximately 5,000 persons each year. These persons are located in counties across the United States, 15 of which are visited each year. The 2001-2002 NHANES contains data for 11,039 individuals (and MEC examined sample size of 10,477) of all ages. Many questions that were asked in NHANES II, 1976-1980, Hispanic HANES 1982-1984, and NHANES III, 1988-1994, were combined with new questions in the NHANES 2001-2002. As in past health examination surveys, data were collected on the prevalence of chronic conditions in the population. Estimates for previously undiagnosed conditions, as well as those known to and reported by survey respondents, are produced through the survey. Risk factors, those aspects of a person's lifestyle, constitution, heredity, or environment that may increase the chances of developing a certain disease or condition, were examined. Data on smoking, alcohol consumption, sexual practices, drug use, physical fitness and activity, weight, and dietary intake were collected. Information on certain aspects of reproductive health, such as use of oral contraceptives and breastfeeding practices, were also collected. 
The diseases, medical conditions, and health indicators that were studied include: anemia, cardiovascular disease, diabetes and lower extremity disease, environmental exposures, equilibrium, hearing loss, infectious diseases and immunization, kidney disease, mental health and cognitive functioning, nutrition, obesity, oral health, osteoporosis, physical fitness and physical functioning, reproductive history and sexual behavior, respiratory disease (asthma, chronic bronchitis, emphysema), sexually transmitted diseases, skin diseases, and vision. The sample for the survey was selected to represent the United States population of all ages. Special emphasis in the 2001-2002 NHANES was on adolescent health and the health of older Americans. To produce reliable statistics for these groups, adolescents aged 15-19 years and persons aged 60 years and older were over-sampled for the survey. African Americans and Mexican Americans were also over-sampled to enable accurate estimates for these groups. Several important areas in adolescent health, including nutrition and fitness and other aspects of growth and development, were addressed. Since the United States has experienced dramatic growth in the number of older people during the twentieth century, the aging population has major implications for health care needs, public policy, and research priorities. NCHS is working with public health agencies to increase the knowledge of the health status of older Americans. NHANES has a primary role in this endeavor. In the examination, all participants visit the physician who takes their pulse or blood pressure. Dietary interviews and body measurements are included for everyone. All but the very young have a blood sample taken and see the dentist. Depending upon the age of the participant, the rest of the examination includes tests and procedures to assess the various aspects of health listed above. Usually, the older the individual, the more extensive the examination. 
Some persons who are unable to come to the examination center may be given a less extensive examination in their homes. Demographic data file variables are grouped into three broad categories: (1) Status Variables: provide core information on the survey participant. Examples of the core variables include interview status, examination status, and sequence number. (Sequence number is a unique ID assigned to each sample person and is required to match the information on this demographic file to the rest of the NHANES 2001-2002 data). (2) Recoded Demographic Variables: these variables include age (

  4. NHANES 1988-2018

    • figshare.com
    application/gzip
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v3
    Available download formats: application/gzip
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential for understanding how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as the prevalence of disease. However, these data first need to be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, as well as R markdown files with example code for calculating summary statistics and running regression models to help accelerate high-throughput analysis of the exposome and secular trends in cancer mortality.

    csv Data Record: The curated NHANES datasets and the data dictionaries include 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The eleventh file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th csv file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary for descriptors on the drug codes ("dictionary_drug_codes.csv"). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets ("nhanes_inconsistencies_documentation.xlsx").

    R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. We provide an .RData file that contains all the aforementioned datasets as R data objects ("w - nhanes_1988_2018.RData"). Also in this .RData file, we make available all R scripts on customized functions that were written to curate the data. We also provide an .R file that shows how we used the customized functions (i.e., our pipeline) to curate the data ("m - nhanes_1988_2018.R").

  5. National Health & Nutrition Exam Survey 2017-2018

    • kaggle.com
    Updated Jan 12, 2024
    Cite
    Riley Zurrin (2024). National Health & Nutrition Exam Survey 2017-2018 [Dataset]. https://www.kaggle.com/datasets/rileyzurrin/national-health-and-nutrition-exam-survey-2017-2018
    Dataset updated
    Jan 12, 2024
    Dataset provided by
    Kaggle
    Authors
    Riley Zurrin
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    As of January 2024, this is the most recent NHANES dataset whose data collection was not affected by COVID-19.

    Context

    The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and has the responsibility for producing vital and health statistics for the Nation.

    The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.

    The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

    To date, thousands of research findings have been published using the NHANES data.

    Content

    The 2017-2018 NHANES datasets include the following components:

    1. Demographics dataset:

    • A complete variable dictionary can be found here

    2. Examinations dataset, which contains factors like:

    • Blood pressure

    • Body measures

    • Muscle strength - grip test

    • Oral health - dentition

    • Taste & smell

    • A complete variable dictionary can be found here

    3. Dietary data - total nutrient intake:

    • A complete variable dictionary can be found here

    4. Laboratory dataset, which includes factors like:

    • Albumin & Creatinine - Urine

    • Apolipoprotein B

    • Blood Lead, Cadmium, Total Mercury, Selenium, and Manganese

    • Blood mercury: inorganic, ethyl and methyl

    • Cholesterol - HDL

    • Cholesterol - LDL & Triglycerides

    • Cholesterol - Total

    • Complete Blood Count with 5-part Differential - Whole Blood

    • Copper, Selenium & Zinc - Serum

    • Fasting Questionnaire

    • Fluoride - Plasma

    • Fluoride - Water

    • Glycohemoglobin

    • Hepatitis A

    • Hepatitis B Surface Antibody

    • Hepatitis B: core antibody, surface antigen, and Hepatitis D antibody

    • Hepatitis C RNA (HCV-RNA) and Hepatitis C Genotype

    • Hepatitis E: IgG & IgM Antibodies

    • Herpes Simplex Virus Type-1 & Type-2

    • HIV Antibody Test

    • Human Papillomavirus (HPV) - Oral Rinse

    • Human Papillomavirus (HPV) DNA - Vaginal Swab: Roche Cobas & Roche Linear Array

    • Human Papillomavirus (HPV) DNA Results from Penile Swab Samples: Roche Linear Array

    • Insulin

    • Iodine - Urine

    • Perchlorate, Nitrate & Thiocyanate - Urine

    • Perfluoroalkyl and Polyfluoroalkyl Substances (formerly Polyfluoroalkyl Chemicals - PFC)

    • Personal Care and Consumer Product Chemicals and Metabolites

    • Phthalates and Plasticizers Metabolites - Urine

    • Plasma Fasting Glucose

    • Polycyclic Aromatic Hydrocarbons (PAH) - Urine

    • Standard Biochemistry Profile

    • Tissue Transglutaminase Assay (IgA-TTG) & IgA Endomyseal Antibody Assay (IgA EMA)

    • Trichomonas - Urine

    • Two-hour Oral Glucose Tolerance Test

    • Urinary Chlamydia

    • Urinary Mercury

    • Urinary Speciated Arsenics

    • Urinary Total Arsenic

    • Urine Flow Rate

    • Urine Metals

    • Urine Pregnancy Test

    • Vitamin B12

    • A complete variable dictionary can be found here

    5. Questionnaire dataset, which includes items like:

    • Acculturation

    • Alcohol Use

    • Blood Pressure & Cholesterol

    • Cardiovascular Health

    • Consumer Behavior

    • Current Health Status

    • Dermatology

    • Diabetes

    • Diet Behavior & Nutrition

    • Disability

    • Drug Use

    • Early Childhood

    • Food Security

    • Health Insurance

    • Hepatitis

    • Hospital Utilization & Access to Care

    • Housing Characteristics

    • Immunization

    • Income

    • Medical Conditions

    • Mental Health - Depression Screener

    • Occupation

    • Oral Health

    • Osteoporosis

    • Pesticide Use

    • Physical Activity

    • Physical Functioning

    • Preventive Aspirin Us...

  6. National Health and Nutrition Examination Survey (NHANES), Demographic and...

    • thearda.com
    Updated Nov 15, 2014
    + more versions
    Cite
    The Association of Religion Data Archives (2014). National Health and Nutrition Examination Survey (NHANES), Demographic and Questionnaire Data, 2003-2004 [Dataset]. http://doi.org/10.17605/OSF.IO/JGD5C
    Dataset updated
    Nov 15, 2014
    Dataset provided by
    Association of Religion Data Archives
    Dataset funded by
    National Center for Health Statistics (NCHS)
    Description

    The National Health and Nutrition Examination Surveys (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES combines personal interviews and physical examinations, which focus on different population groups or health topics. These surveys have been conducted by the National Center for Health Statistics (NCHS) on a periodic basis from 1971 to 1994. In 1999, the NHANES became a continuous program with a changing focus on a variety of health and nutrition measurements which were designed to meet current and emerging concerns. The sample for the survey is selected to represent the U.S. population of all ages. Many of the NHANES 2007-2008 questions also were asked in NHANES II 1976-1980, Hispanic HANES 1982-1984, NHANES III 1988-1994, and NHANES 1999-2006. New questions were added to the survey based on recommendations from survey collaborators, NCHS staff, and other interagency work groups. Estimates for previously undiagnosed conditions, as well as those known to and reported by survey respondents, are produced through the survey.

    In the 2003-2004 wave, the NHANES includes over 100 datasets. Most have been combined into three datasets for convenience. Each starts with the Demographic dataset and includes datasets of a specific type.

    1. National Health and Nutrition Examination Survey (NHANES), Demographic & Examination Data, 2003-2004 (The base of the Demographic dataset + all data from medical examinations).

    2. National Health and Nutrition Examination Survey (NHANES), Demographic & Laboratory Data, 2003-2004 (The base of the Demographic dataset + all data from medical laboratories).

    3. National Health and Nutrition Examination Survey (NHANES), Demographic & Questionnaire Data, 2003-2004 (The base of the Demographic dataset + all data from questionnaires)

    Variable SEQN is included for merging files within the waves. All data files should be sorted by SEQN.
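As a rough illustration of merging on SEQN, the sketch below joins a demographics table to one other table and returns the result sorted by SEQN. It uses only the Python standard library. The function name and the `age` and `bmi` columns are hypothetical placeholders rather than actual NHANES variable names, and keeping every demographics row (a left join) is an assumption that follows from the note that each combined dataset starts with the Demographic dataset.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def merge_on_seqn(demographics: Iterable[Row], other: Iterable[Row]) -> List[Row]:
    """Left-join `other` onto `demographics` by SEQN and sort by SEQN."""
    lookup = {row["SEQN"]: row for row in other}
    merged = []
    for demo_row in demographics:
        combined = dict(demo_row)  # copy so the input rows are not modified
        combined.update(lookup.get(demo_row["SEQN"], {}))
        merged.append(combined)
    return sorted(merged, key=lambda row: row["SEQN"])


# Made-up rows; "age" and "bmi" are placeholder column names.
demo = [{"SEQN": 3, "age": 40}, {"SEQN": 1, "age": 25}]
exam = [{"SEQN": 1, "bmi": 22.5}]

for row in merge_on_seqn(demo, exam):
    print(row)
```

Participants with no matching row in the second table are kept with only their demographic fields, so the merged file still has one row per sample person.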

    Additional details of the design and content of each survey are available at the NHANES website (https://www.cdc.gov/nchs/nhanes/index.html).

  7. National Health and Nutrition Examination Survey III, 1988-1994: Series II,...

    • icpsr.umich.edu
    ascii, sas
    Updated Jan 18, 2006
    + more versions
    Cite
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics (2006). National Health and Nutrition Examination Survey III, 1988-1994: Series II, No. 3A [Dataset]. http://doi.org/10.3886/ICPSR04010.v1
    Available download formats: ascii, sas
    Dataset updated
    Jan 18, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research https://www.icpsr.umich.edu/web/pages/
    Authors
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/4010/terms

    Time period covered
    1988 - 1994
    Area covered
    United States
    Description

    The third National Health and Nutrition Examination Survey (NHANES III, ICPSR 2231), conducted in 1988-1994, was designed to obtain nationally representative information on the health and nutritional status of the population of the United States through interviews and direct physical examinations. This release, Series II, No. 3A, contains data obtained from a second exam of selected survey participants who had had a primary exam. This release does not replace any previous NHANES III data releases. The second exam sample consists of seven separate data files. The Combination Foods file contains information on food weight, nutrient data, and descriptions about combination foods. The Total Nutrient Intake file records respondent intake of foods and beverages in a 24-hour time period. The Examination file consists of a comprehensive physical/dental examination. The Individual Foods file lists the food records and component food records for single and multi-component combination foods. The Laboratory file contains data collected through whole blood, serum, plasma, and urine specimens collected from respondents. The Second Laboratory file contains blood and urine assessments by specimen type and age group. The Variable Ingredient file reports data pertaining to the variable ingredients for many recipe foods in the Individual Foods file.

  8. Data Sheet 2_Low-carbohydrate diet score and chronic obstructive pulmonary...

    • frontiersin.figshare.com
    docx
    Updated Dec 18, 2024
    + more versions
    Cite
    Xin Zhang; Jipeng Mo; Kaiyu Yang; Tiewu Tan; Hui Qin; Cuiping Zhao (2024). Data Sheet 2_Low-carbohydrate diet score and chronic obstructive pulmonary disease: a machine learning analysis of NHANES data.docx [Dataset]. http://doi.org/10.3389/fnut.2024.1519782.s002
    Available download formats: docx
    Dataset updated
    Dec 18, 2024
    Dataset provided by
    Frontiers
    Authors
    Xin Zhang; Jipeng Mo; Kaiyu Yang; Tiewu Tan; Hui Qin; Cuiping Zhao
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Recent research has identified the Low-Carbohydrate Diet (LCD) score as a novel biomarker, with studies showing that LCDs can reduce carbon dioxide retention, potentially improving lung function. While the link between the LCD score and chronic obstructive pulmonary disease (COPD) has been explored, its relevance in the US population remains uncertain. This study aims to explore the association between the LCD score and the likelihood of COPD prevalence in this population.

    Methods: Data from 16,030 participants in the National Health and Nutrition Examination Survey (NHANES) collected between 2007 and 2023 were analyzed to examine the relationship between LCD score and COPD. Propensity score matching (PSM) was employed to reduce baseline bias. Weighted multivariable logistic regression models were applied, and restricted cubic spline (RCS) regression was used to explore possible nonlinear relationships. Subgroup analyses were performed to evaluate the robustness of the results. Additionally, we employed eight machine learning methods (Boost Tree, Decision Tree, Logistic Regression, MLP, Naive Bayes, KNN, Random Forest, and SVM RBF) to build predictive models and evaluate their performance. Based on the best-performing model, we further examined variable importance and model accuracy.

    Results: After adjustment for covariates, the LCD score was strongly associated with the odds of COPD prevalence. Compared with the lowest quartile, the adjusted odds ratios (ORs) for the higher quartiles were 0.77 (95% CI: 0.63, 0.95), 0.74 (95% CI: 0.59, 0.93), and 0.61 (95% CI: 0.48, 0.78). RCS analysis demonstrated a linear inverse relationship between the LCD score and the odds of COPD prevalence. Furthermore, the random forest model exhibited robust predictive efficacy, with an area under the curve (AUC) of 71.6%.

    Conclusion: Our study of American adults indicates that adherence to the LCD may be linked to lower odds of COPD prevalence. These findings underscore the important role of the LCD score as a tool for enhancing COPD prevention efforts within the general population. Nonetheless, additional prospective cohort studies are required to assess and validate these results.
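
The odds ratios and confidence intervals quoted in the abstract above are the kind of output a logistic regression produces once a coefficient is exponentiated. The following is a minimal sketch of that conversion, assuming a Wald-type interval; the function names are illustrative, the abstract does not say how the authors computed their intervals, and the standard error is back-derived here from the last of the three reported estimates purely to show the round trip.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Turn a log-odds coefficient and its standard error into
    (odds ratio, lower bound, upper bound) using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Recover the log-scale standard error implied by a reported interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Round trip on one reported estimate: OR 0.61 (95% CI: 0.48, 0.78).
beta = math.log(0.61)
se = se_from_ci(0.48, 0.78)  # assumption: the interval is symmetric on the log scale
or_, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR={or_:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

An odds ratio below 1 with an interval that excludes 1, as in all three reported estimates, is what supports the inverse association described in the results.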

  9. Creation of variables analogous to those in the American Diabetes...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated May 5, 2021
    Cite
    Enticott, Joanne; Teede, Helena; De Silva, Kushan; Demmer, Ryan T.; Jönsson, Daniel; Lim, Siew; Mousa, Aya; Forbes, Andrew (2021). Creation of variables analogous to those in the American Diabetes Association (ADA) diabetes risk test using National Health and Nutrition Examination Survey (NHANES) data. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000918210
    Dataset updated
    May 5, 2021
    Authors
    Enticott, Joanne; Teede, Helena; De Silva, Kushan; Demmer, Ryan T.; Jönsson, Daniel; Lim, Siew; Mousa, Aya; Forbes, Andrew
    Description

    Creation of variables analogous to those in the American Diabetes Association (ADA) diabetes risk test using National Health and Nutrition Examination Survey (NHANES) data.

  10. Adjusted means (95% CL) for continuous hematologic variables across...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Erik A. Willis; Joseph J. Shearer; Charles E. Matthews; Jonathan N. Hofmann (2023). Adjusted means (95% CL) for continuous hematologic variables across quartiles of MVPA in U.S. adults ≥ 20 years (NHANES 2003–2006). [Dataset]. http://doi.org/10.1371/journal.pone.0204277.t004
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Erik A. Willis; Joseph J. Shearer; Charles E. Matthews; Jonathan N. Hofmann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adjusted means (95% CL) for continuous hematologic variables across quartiles of MVPA in U.S. adults ≥ 20 years (NHANES 2003–2006).

  11. Nutritional data

    • redivis.com
    Updated Jun 4, 2025
    Cite
    Redivis Demo Organization (2025). Nutritional data [Dataset]. https://redivis.com/datasets/zvnm-5f4wzzfjn
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Redivis Demo Organization
    Description

    The table Nutritional data is part of the dataset NHANES II, available at https://redivis.com/datasets/zvnm-5f4wzzfjn. It contains 10351 rows across 58 variables.

  12. Characteristics of participants (weighted mean ± standard error or weighted...

    • plos.figshare.com
    xls
    Updated Jun 6, 2023
    Cite
    Monica K. Silver; Marie S. O'Neill; MaryFran R. Sowers; Sung Kyun Park (2023). Characteristics of participants (weighted mean ± standard error or weighted percentage) by NHANES cycle, National Health and Nutrition Examination Survey, 2003-2008. [Dataset]. http://doi.org/10.1371/journal.pone.0026868.t001
    Available download formats: xls
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Monica K. Silver; Marie S. O'Neill; MaryFran R. Sowers; Sung Kyun Park
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    * p-value based on the Rao-Scott log-likelihood ratio test for continuous variables and the Rao-Scott Chi-square test for categorical variables.
    † Geometric mean (95% confidence interval) is presented.
    ‡ Defined as HbA1c ≥6.5% or use of diabetes medication.

  13. Public-Use Linked Mortality Files

    • catalog.data.gov
    • data.virginia.gov
    Updated Apr 23, 2025
    Cite
    Centers for Disease Control and Prevention (2025). Public-Use Linked Mortality Files [Dataset]. https://catalog.data.gov/dataset/public-use-linked-mortality-files
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    NCHS has linked data from various surveys with death certificate records from the National Death Index (NDI). Linkage of the NCHS survey participant data with the NDI mortality data provides the opportunity to conduct a vast array of outcome studies designed to investigate the association of a wide variety of health factors with mortality. The Linked Mortality Files (LMF) have been updated with mortality follow-up data through December 31, 2019. Public-use Linked Mortality Files (LMF) are available for 1986-2018 NHIS, 1999-2018 NHANES, and NHANES III. The files include a limited set of mortality variables for adult participants only. The public-use versions of the NCHS Linked Mortality Files were subjected to data perturbation techniques to reduce the risk of participant re-identification. For select records, synthetic data were substituted for follow-up time or underlying cause of death. Information regarding vital status was not perturbed.

  14. Weighted estimates, 95% confidence intervals, and p-values for imputed...

    • plos.figshare.com
    xls
    Updated May 30, 2023
    Cite
    Elizabeth M. Miller (2023). Weighted estimates, 95% confidence intervals, and p-values for imputed survey regression models of hemoglobin, ferritin, % transferrin saturation, and transferrin receptor as dependent variables, reproductive variables as independent variables, and ethnicity, survey release year, and other covariates as control variables. [Dataset]. http://doi.org/10.1371/journal.pone.0112216.t004
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Elizabeth M. Miller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    a The n for the hemoglobin, ferritin, and % transferrin saturation model is 6603 (all years).
    b The n for the transferrin receptor model is 3295 (survey years 2003–2006 only).
    c Mean R2 of 50 imputations.
    d Reference category is white ethnicity.
    e Reference category is survey release year 1999–2000 (hemoglobin, ferritin, and % transferrin saturation models).
    f Reference category is survey release year 2003–2004 (transferrin receptor model only).

    Weighted estimates, 95% confidence intervals, and p-values for imputed survey regression models of hemoglobin, ferritin, % transferrin saturation, and transferrin receptor as dependent variables, reproductive variables as independent variables, and ethnicity, survey release year, and other covariates as control variables.

  15. Adjusted means (95% CL) for continuous hematologic variables across...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Erik A. Willis; Joseph J. Shearer; Charles E. Matthews; Jonathan N. Hofmann (2023). Adjusted means (95% CL) for continuous hematologic variables across quartiles of total sedentary time in U.S. adults ≥ 20 years (NHANES 2003–2006). [Dataset]. http://doi.org/10.1371/journal.pone.0204277.t005
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Erik A. Willis; Joseph J. Shearer; Charles E. Matthews; Jonathan N. Hofmann
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Adjusted means (95% CL) for continuous hematologic variables across quartiles of total sedentary time in U.S. adults ≥ 20 years (NHANES 2003–2006).

  16. Weighted estimates and p-values for complete case survey regression models...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Elizabeth M. Miller (2023). Weighted estimates and p-values for complete case survey regression models of hemoglobin, ferritin, % transferrin saturation, and transferrin receptor as dependent variables, reproductive variables as independent variables, and ethnicity, survey release year, and other covariates as control variables. [Dataset]. http://doi.org/10.1371/journal.pone.0112216.t003
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Elizabeth M. Miller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    a Transferrin receptor model is based on survey years 2003–2006 only.
    b Reference category is white ethnicity.
    c Reference category is survey release year 1999–2000 (hemoglobin, ferritin, and % transferrin saturation models).
    d Reference category is survey release year 2003–2004 (transferrin receptor model only).

    Weighted estimates and p-values for complete case survey regression models of hemoglobin, ferritin, % transferrin saturation, and transferrin receptor as dependent variables, reproductive variables as independent variables, and ethnicity, survey release year, and other covariates as control variables.

  17. US prevalence of a detectable serum autoantibody, NHANES 1988–1994.

    • figshare.com
    xls
    Updated Jun 7, 2023
    Cite
    Charles F. Dillon; Michael H. Weisman; Frederick W. Miller (2023). US prevalence of a detectable serum autoantibody, NHANES 1988–1994. [Dataset]. http://doi.org/10.1371/journal.pone.0226516.t005
    Available download formats: xls
    Dataset updated
    Jun 7, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Charles F. Dillon; Michael H. Weisman; Frederick W. Miller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    US prevalence of a detectable serum autoantibody, NHANES 1988–1994.

  18. NHANES dataset characteristics and comparison between phenotypes.

    • plos.figshare.com
    xls
    Updated Jun 28, 2024
    Cite
    Hyeong Jun Ahn; Kyle Ishikawa; Min-Hee Kim (2024). NHANES dataset characteristics and comparison between phenotypes. [Dataset]. http://doi.org/10.1371/journal.pone.0304785.t003
    Available download formats: xls
    Dataset updated
    Jun 28, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Hyeong Jun Ahn; Kyle Ishikawa; Min-Hee Kim
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    NHANES dataset characteristics and comparison between phenotypes.

  19. US autoantibody prevalence by sex, NHANES 1960–2014.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Charles F. Dillon; Michael H. Weisman; Frederick W. Miller (2023). US autoantibody prevalence by sex, NHANES 1960–2014. [Dataset]. http://doi.org/10.1371/journal.pone.0226516.t004
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Charles F. Dillon; Michael H. Weisman; Frederick W. Miller
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    US autoantibody prevalence by sex, NHANES 1960–2014.

  20. National Health and Nutrition Examination Survey I: Epidemiologic Follow-Up...

    • icpsr.umich.edu
    • datamed.org
    ascii
    Updated Jan 18, 2006
    Cite
    United States Department of Health and Human Services. National Center for Health Statistics (2006). National Health and Nutrition Examination Survey I: Epidemiologic Follow-Up Study, 1982-1984 [Dataset]. http://doi.org/10.3886/ICPSR08900.v2
    Available download formats: ascii
    Dataset updated
    Jan 18, 2006
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    United States Department of Health and Human Services. National Center for Health Statistics
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/8900/terms

    Time period covered
    1982 - 1984
    Area covered
    United States
    Description

    The National Health and Nutrition Examination Survey I Epidemiologic Followup Study (NHEFS) originated as a joint project between the National Center for Health Statistics (NCHS) and the National Institute on Aging (NIA). The design of NHEFS, which contains follow-up data on the NHANES I cohort, consisted of five steps. The first step focused on tracing and locating all subjects in the cohort or their proxies and determining their vital status. The second step involved the obtaining of death certificates for subjects who were deceased. Interviews with the participants or their proxies constituted the third phase of the follow-up. The fourth phase of the follow-up included measurements of pulse, blood pressure, and weight for interviewed respondents, and the fifth step was the acquisition of relevant hospital and nursing home records, including pathology reports and electrocardiograms.

    The respondent interview was designed to gather information on selected aspects of the subject's health history since the time of the NHANES I exam. This information included a history of the occurrence or recurrence of selected medical conditions, an assessment of behavioral, social, nutritional, and medical risk factors believed to be associated with these conditions, and an assessment of various aspects of functional status. Whenever possible, the questionnaire was designed to retain item comparability between NHANES I and NHEFS in order to measure change over time. However, questionnaire items were modified, added, or deleted when necessary to take advantage of recent improvements in questionnaire methodology.

    The Vital and Tracing Status file is a master file containing tracing, vital status, and demographic data for all NHEFS respondents. In addition, it provides users with information on the availability of different survey components for each respondent. For example, variables have been created to indicate whether a death certificate was received for a deceased subject, hospital records were received, or a follow-up interview was completed. The Health Care Facility Record file offers data on respondents who had reported an overnight stay in a health care facility after 1970. Information on the name and address of the facility, the date of the stay, and the reason for the stay was recorded. The Mortality Data file contains death certificate information for 1,935 NHEFS decedents. The death certificate information is for deaths occurring from 1971 to 1983.

Cite
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9

Cleaned NHANES 1988-2018

3 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: txt
Dataset updated
Feb 18, 2025
Dataset provided by
figshare
Authors
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The National Health and Nutrition Examination Survey (NHANES) provides data that have considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:

• demographics (281 variables)
• dietary consumption (324 variables)
• physiological functions (1,040 variables)
• occupation (61 variables)
• questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood)
• medications (29 variables)
• mortality information linked from the National Death Index (15 variables)
• survey weights (857 variables)
• environmental exposure biomarker measurements (598 variables)
• chemical comments indicating which measurements are below or above the lower limit of detection (505 variables)

csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module, one uncleaned and one cleaned. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.

• "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
• "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
• "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes.
• "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded, as a .zip file that includes an .RData file and an .R file.

• "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
• "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

Example starter codes: The starter code to help users conduct exposome analysis consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order.

• "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
• "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
• "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
• "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
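
The two ideas the starter tutorials open with, joining modules on a participant identifier and applying survey weights to a summary statistic, can be sketched in plain Python. This is an illustration only, not the authors' R pipeline: the column names `SEQN`, `age`, and `wt` are assumptions about how a curated file might be keyed, and the rows are toy values invented for the example, not NHANES data.

```python
def merge_on_id(left, right, key="SEQN"):
    """Inner-join two lists of row dicts on a shared participant ID.
    The key name "SEQN" is an assumption; check the data dictionary."""
    right_by_id = {row[key]: row for row in right}
    return [
        {**row, **right_by_id[row[key]]}
        for row in left
        if row[key] in right_by_id
    ]


def weighted_mean(rows, value, weight):
    """Weighted mean of `value`, skipping rows missing either field."""
    pairs = [
        (r[value], r[weight])
        for r in rows
        if r.get(value) is not None and r.get(weight) is not None
    ]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Toy rows, invented for illustration only.
demographics = [
    {"SEQN": 1, "age": 30},
    {"SEQN": 2, "age": 50},
    {"SEQN": 3, "age": 70},  # no matching weight row, dropped by the join
]
weights = [{"SEQN": 1, "wt": 1.0}, {"SEQN": 2, "wt": 3.0}]

merged = merge_on_id(demographics, weights)
print(len(merged), weighted_mean(merged, "age", "wt"))
```

A weighted mean like this gives only the point estimate. Standard errors that respect the NHANES sampling design also need the strata and primary sampling unit variables, which is what a dedicated survey package handles.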
