Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, because NHANES data contain multiple inconsistencies, they must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).
csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module: an uncleaned version and a cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.
"dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
"dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
"dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes.
"nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.
R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded, as a .zip file that includes an .RData file and an .R file.
"w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
"m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.
Example starter codes: The starter code to help users conduct exposome analysis consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order.
"example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
"example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
"example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
"example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
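To make the difference between summary statistics computed with and without the survey weights concrete, here is a minimal sketch in Python (the tutorials themselves are R Markdown files, and this is not their code). The measurements and weights are invented for illustration, and the function names are hypothetical. It shows only the point estimate: standard errors that respect the NHANES design also need the strata and sampling-unit information, which this sketch does not handle.

```python
# Hypothetical records: (measurement, survey weight).
# The values are invented; real variable names should be looked up in dictionary_nhanes.csv.
records = [(1.2, 1500.0), (2.0, 3000.0), (0.8, 500.0)]


def unweighted_mean(rows):
    """Plain average that treats every participant equally."""
    return sum(value for value, _ in rows) / len(rows)


def weighted_mean(rows):
    """Average in which each participant counts in proportion to the survey weight."""
    total_weight = sum(weight for _, weight in rows)
    return sum(value * weight for value, weight in rows) / total_weight


plain = unweighted_mean(records)
weighted = weighted_mean(records)
print(f"unweighted mean: {plain:.3f}")
print(f"weighted mean:   {weighted:.3f}")
```

The two estimates differ whenever participants with large weights have systematically different values from those with small weights, which is why the tutorials present both variants side by side.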
DNA samples were collected in the Third National Health and Nutrition Examination Survey (NHANES III; 1988-1994) and in subsequent NHANES cycles (1999-2002, 2007-2008, 2009-2010, and 2011-2012). The program is a nationally representative collection of stored DNA samples and genetic data and will serve to add to the extensive amount of health, nutritional, and environmental information collected from NHANES. Resulting genetic variants are deposited into the NHANES Genetic Data Repository. These datasets are categorized as restricted data since they contain identifiable information.
For more information on the NHANES Genetic Data please visit: NHANES DNA Specimens and Genetic Data Program at: https://www.cdc.gov/nchs/nhanes/biospecimens/dnaspecimens.htm. For more information on NHANES, visit the NHANES - National Health and Nutrition Examination Survey Homepage at: https://www.cdc.gov/nchs/nhanes/index.htm.
https://www.icpsr.umich.edu/web/ICPSR/studies/25503/terms
The National Health and Nutrition Examination Surveys (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES combines personal interviews and physical examinations, which focus on different population groups or health topics. These surveys have been conducted by the National Center for Health Statistics (NCHS) on a periodic basis from 1971 to 1994. In 1999 the NHANES became a continuous program with a changing focus on a variety of health and nutrition measurements which were designed to meet current and emerging concerns. The surveys examine a nationally representative sample of approximately 5,000 persons each year. These persons are located in counties across the United States, 15 of which are visited each year.
For NHANES 2003-2004, 12,761 persons were selected for the sample; 10,122 of those were interviewed (79.3 percent) and 9,643 (75.6 percent) were examined in the mobile examination centers (MEC). Many of the NHANES 2003-2004 questions were also asked in NHANES II 1976-1980, Hispanic HANES 1982-1984, NHANES III 1988-1994, and NHANES 1999-2002. New questions were added to the survey based on recommendations from survey collaborators, NCHS staff, and other interagency work groups.
As in past health examination surveys, data were collected on the prevalence of chronic conditions in the population. Estimates for previously undiagnosed conditions, as well as those known to and reported by survey respondents, are produced through the survey. Risk factors, those aspects of a person's lifestyle, constitution, heredity, or environment that may increase the chances of developing a certain disease or condition, were examined. Data on smoking, alcohol consumption, sexual practices, drug use, physical fitness and activity, weight, and dietary intake were collected. Information on certain aspects of reproductive health, such as use of oral contraceptives and breastfeeding practices, was also collected. The diseases, medical conditions, and health indicators that were studied include: anemia, cardiovascular disease, diabetes and lower extremity disease, environmental exposures, equilibrium, hearing loss, infectious diseases and immunization, kidney disease, mental health and cognitive functioning, nutrition, obesity, oral health, osteoporosis, physical fitness and physical functioning, reproductive history and sexual behavior, respiratory disease (asthma, chronic bronchitis, emphysema), sexually transmitted diseases, skin diseases, and vision.
The sample for the survey was selected to represent the United States population of all ages. Special emphasis in the 2003-2004 NHANES was on adolescent health and the health of older Americans. To produce reliable statistics for these groups, adolescents aged 15-19 years and persons aged 60 years and older were over-sampled for the survey. African Americans and Mexican Americans were also over-sampled to enable accurate estimates for these groups. Several important areas in adolescent health, including nutrition and fitness and other aspects of growth and development, were addressed. Since the United States has experienced dramatic growth in the number of older people during the twentieth century, the aging population has major implications for health care needs, public policy, and research priorities. NCHS is working with public health agencies to increase the knowledge of the health status of older Americans. NHANES has a primary role in this endeavor.
In the examination, all participants visit the physician who takes their pulse or blood pressure. Dietary interviews and body measurements are included for everyone. All but the very young have a blood sample taken and see the dentist. Depending upon the age of the participant, the rest of the examination includes tests and procedures to assess the various aspects of health listed above. Usually, the older the individual, the more extensive the examination. Some persons who are unable or unwilling to come to the examination center may be given a less extensive examination in their homes.
Demographic data file variables are grouped into three broad categories: (1) Status Variables: provide core information on the survey participant. Examples of the core variables include interview status, examination status, and sequence nu
The National Health and Nutrition Examination Survey (NHANES) is designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews with standardized physical examinations and laboratory tests.
NHANES was conducted on a periodic basis from 1971 to 1994, including NHANES I (1971-1975), NHANES II (1976-1980), NHANES III (1988-1994), and a Hispanic Health and Nutrition Examination Survey (HHANES, 1982-1984). In 1999, NHANES became continuous and has been collecting data annually ever since.
All of the NHANES programs utilized a stratified, multistage probability cluster design to provide a nationally representative sample of the U.S. civilian, noninstitutionalized population. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component conducted in a mobile examination center consists of medical, dental, and physiological measurements, as well as the collection of biospecimens, such as blood and urine for laboratory testing.
This set of restricted data contains indirect identifying and/or sensitive information collected in NHANES prior to 1999. Please refer to the links below for additional data available from NHANES:
The National Health and Nutrition Examination Surveys (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The NHANES combines personal interviews and physical examinations, which focus on different population groups or health topics. These surveys have been conducted by the National Center for Health Statistics (NCHS) on a periodic basis from 1971 to 1994. In 1999, the NHANES became a continuous program with a changing focus on a variety of health and nutrition measurements which were designed to meet current and emerging concerns. The sample for the survey is selected to represent the U.S. population of all ages. Many of the NHANES 2007-2008 questions also were asked in NHANES II 1976-1980, Hispanic HANES 1982-1984, NHANES III 1988-1994, and NHANES 1999-2006. New questions were added to the survey based on recommendations from survey collaborators, NCHS staff, and other interagency work groups. Estimates for previously undiagnosed conditions, as well as those known to and reported by survey respondents, are produced through the survey.
In the 2003-2004 wave, the NHANES includes more than 100 datasets. Most have been combined into three datasets for convenience. Each starts with the Demographic dataset and includes datasets of a specific type.
1. National Health and Nutrition Examination Survey (NHANES), Demographic & Examination Data, 2003-2004 (The base of the Demographic dataset + all data from medical examinations).
2. National Health and Nutrition Examination Survey (NHANES), Demographic & Laboratory Data, 2003-2004 (The base of the Demographic dataset + all data from medical laboratories).
3. National Health and Nutrition Examination Survey (NHANES), Demographic & Questionnaire Data, 2003-2004 (The base of the Demographic dataset + all data from questionnaires).
Variable SEQN is included for merging files within the waves. All data files should be sorted by SEQN.
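The SEQN-based merge described above can be sketched as follows, using only the Python standard library. This is an illustrative sketch, not code distributed with the data: the SEQN values, the AGE and BMI column names, and the helper function names are all placeholders invented for the example. It sorts each file by SEQN and then attaches the examination columns to the Demographic base, keeping every demographic record.

```python
import csv
import io

# Invented miniature files; AGE and BMI are placeholder column names, not documented variables.
DEMOGRAPHIC = """SEQN,AGE
21007,14
21005,19
21006,16
"""

EXAMINATION = """SEQN,BMI
21006,22.3
21005,24.9
"""


def load_sorted(text):
    """Read CSV text and return its rows sorted numerically by SEQN."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return sorted(rows, key=lambda row: int(row["SEQN"]))


def left_merge_on_seqn(base_rows, other_rows):
    """Attach columns from other_rows to base_rows, matching on SEQN.

    Every base row is kept; a participant missing from other_rows gets empty strings.
    """
    other_by_seqn = {row["SEQN"]: row for row in other_rows}
    other_columns = [c for c in other_rows[0] if c != "SEQN"] if other_rows else []
    merged = []
    for row in base_rows:
        match = other_by_seqn.get(row["SEQN"], {})
        combined = dict(row)
        for column in other_columns:
            combined[column] = match.get(column, "")
        merged.append(combined)
    return merged


demographic = load_sorted(DEMOGRAPHIC)
examination = load_sorted(EXAMINATION)
merged = left_merge_on_seqn(demographic, examination)
for row in merged:
    print(row)
```

Keeping the Demographic file as the base mirrors how the three combined datasets are described: participants who were interviewed but have no record in a component file remain in the result with empty fields.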
Additional details of the design and content of each survey are available at the NHANES website.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description: The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, because NHANES data contain multiple inconsistencies, they must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data… See the full description on the dataset page: https://huggingface.co/datasets/nguyenvy/cleaned_nhanes_1988_2018.
https://www.icpsr.umich.edu/web/ICPSR/studies/4010/terms
The third National Health and Nutrition Examination Survey (NHANES III, ICPSR 2231), conducted in 1988-1994, was designed to obtain nationally representative information on the health and nutritional status of the population of the United States through interviews and direct physical examinations. This release, Series II, No. 3A, contains data obtained from a second exam of selected survey participants who had had a primary exam. This release does not replace any previous NHANES III data releases. The second exam sample consists of seven separate data files. The Combination Foods file contains information on food weight, nutrient data, and descriptions about combination foods. The Total Nutrient Intake file records respondent intake of foods and beverages in a 24-hour time period. The Examination file consists of a comprehensive physical/dental examination. The Individual Foods file lists the food records and component food records for single and multi-component combination foods. The Laboratory file contains data collected through whole blood, serum, plasma, and urine specimens collected from respondents. The Second Laboratory file contains blood and urine assessments by specimen type and age group. The Variable Ingredient file reports data pertaining to the variable ingredients for many recipe foods in the Individual Foods file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Measurements of discrimination by sex and race/ethnicity, NHANES III linked mortality file 1988–2006.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
US prevalence of a detectable serum autoantibody, NHANES 1988–1994.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Characteristics of adults aged 40–79 years with no prior atherosclerotic cardiovascular disease, NHANES III linked mortality file 1988–2006.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Importance: Religiosity has been associated with positive health outcomes. Hypothesized pathways for this association include religious practices, such as church attendance, that result in reduced stress.
Objective: The objective of this study was to examine the relationship between religiosity (church attendance), allostatic load (AL) (a physiologic measure of stress) and all-cause mortality in middle-aged adults.
Design, setting and participants: Data for this study are from NHANES III (1988–1994). The analytic sample (n = 5449) was restricted to adult participants who were between 40 and 65 years of age at the time of interview, had values for at least 9 out of 10 clinical/biologic markers used to derive AL, and had complete information on church attendance.
Main outcomes and measures: The primary outcomes were AL and mortality. AL was derived from values for metabolic, cardiovascular, and nutritional/inflammatory clinical/biologic markers. Mortality was derived from a probabilistic algorithm matching the NHANES III Linked Mortality File to the National Death Index through December 31, 2006, providing up to 18 years follow-up. The primary predictor variable was baseline report of church attendance over the past 12 months. Cox proportional hazard logistic regression models contained key covariates including socioeconomic status, self-rated health, co-morbid medical conditions, social support, healthy eating, physical activity, and alcohol intake.
Results: Churchgoers (at least once a year) comprised 64.0% of the study cohort (n = 3782). Non-churchgoers had significantly higher overall mean AL scores and higher prevalence of high-risk values for 3 of the 10 markers of AL than did churchgoers. In bivariate analyses non-churchgoers, compared to churchgoers, had higher odds of an AL score of 2–3 (OR 1.24; 95% CI 1.01, 1.50) or ≥4 (OR 1.38; 95% CI 1.11, 1.71) compared to an AL score of 0–1. More frequent churchgoers (more than once a week) had a 55% reduction of all-cause mortality risk compared with non-churchgoers (HR 0.45, CI 0.24–0.85) in the fully adjusted model that included AL.
Conclusions and relevance: We found a significant association between church attendance and mortality among middle-aged adults after full adjustments. AL, a measure of stress, only partially explained differences in mortality between church and non-church attendees. These findings suggest a potential independent effect of church attendance on mortality.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Differences in CMV IgM Seroprevalence among Women Aged 12–49 Years by Selected Demographic Factors, NHANES III, 1988–1994.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: Hearing loss can cause speech and language delays, communication barriers, and learning problems. Such factors are associated with reduced academic achievement, social isolation, decreased quality of life, and poorer health outcomes. We use a national cohort of children to examine how subclinical hearing loss is associated with academic/educational performance. The goal of this study is to determine if different levels of subclinical hearing loss (pure tone average ≤ 25 dB HL) are associated with educational testing outcomes in children.
Design: Analysis of children 6–16 years old who participated in the National Health and Nutrition Examination Survey (NHANES-III, 1988–1994) was performed. Air-conduction thresholds were measured at 0.5, 1, 2, 4, 6, and 8 kHz. A four-frequency pure-tone average (PTA) was calculated from 0.5, 1, 2, and 4 kHz. Hearing thresholds were divided into categories (≤ 0, 1–10, and 11–25 dB) for analysis. The outcomes of interest were the Wide Range Achievement Test (WRAT-R) and Wechsler Intelligence Scale for Children (WISC-R). Analysis was conducted using ANOVA and logistic regression.
Results: We analyzed 3,965 participants. In univariable analysis, the average scores in scaled math, reading, digit span (short-term memory), and block design (visual-motor skills) were significantly lower with worsening hearing categories (p < 0.01). In multivariable regression PTAs of 1–10 dB HL (OR 1.72, 95% CI 1.29–2.29, p < 0.01) and 11–25 dB HL (OR: 2.99, 95% CI 1.3–6.65, p < 0.01), compared to PTA of ≤ 0 dB HL, were associated with poor reading test performance (
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Baseline characteristics according to presence of NAFLD (NHANES 1988–1994, n = 5,404).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abbreviations: Neg, Negative; Pos, Positive.
a Prevalence percentages are weighted to be representative of the US population. Row percentages add to 100 except for rounding error.
b Numbers are unweighted numbers of participants.
c P based on Pearson's χ² test.
d Defined as total family income divided by poverty threshold, as determined by the US Census Bureau for the year of the interview.
Weighted Prevalence of Helicobacter pylori Across Demographic Characteristics in US 21 to 59 Year-Olds, NHANES III (1988–1994).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Sarcopenia is prevalent in metabolic dysfunction-associated fatty liver disease (MAFLD), and the primary treatment for both diseases is lifestyle modification. We studied how dietary components and physical activity affect individuals with sarcopenia and MAFLD.
Materials and methods: We conducted a study utilizing National Health and Nutrition Examination Survey (NHANES) III (1988–1994) data with the Linked Mortality file (through 2019). The diagnosis of fatty liver disease (FLD) was based on ultrasound images revealing moderate and severe steatosis. Sarcopenia was assessed using bioelectrical measures. Dietary intake and physical activity levels were evaluated using self-report data.
Results: Among 12,259 participants, 2,473 presented with MAFLD, of whom 290 had sarcopenia. Higher levels of physical activity (odds ratio [OR] = 0.51 [0.36–0.95]) and calorie intake (OR = 0.58 [0.41–0.83]) reduced the likelihood of sarcopenia in MAFLD patients. During a median follow-up period of 15.3 years, 1,164 MAFLD patients and 181 MAFLD patients with sarcopenia died. Increased activity levels improved the prognosis of patients with sarcopenia (Insufficiently active, HR = 0.75 [0.58–0.97]; Active, HR = 0.64 [0.48–0.86]), which was particularly pronounced in older patients.
Conclusion: In the general population, hyperglycemia was highly related to MAFLD prognosis. Physical inactivity and a protein-restricted diet corresponded to sarcopenia, with physical inactivity being connected to poor outcomes. Adding protein supplements would be beneficial for older people with sarcopenia who are unable to exercise due to frailty, while the survival benefits were negligible.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The association between metabolic syndrome severity and odds of Metabolic Dysfunction-Associated Steatotic Liver Disease Occurrence in United States adults, the National Health and Nutrition Examination Survey (NHANES III) 1988–1994 (n = 10,605).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The distribution of blood heavy metals in the 1999–2020 NHANES data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abbreviations: CI, Confidence intervals; Neg/Neg, H. pylori negative and CagA negative; Pos/Neg, H. pylori positive and CagA negative; Pos/Pos, H. pylori positive and CagA positive. Note: Covariates that did not have significant interactions with H. pylori and CagA were included in all models but not shown. N = 1,755. * P
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample characteristics by race/ethnicity and Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) Status, United States adults, The National Health and Nutrition Examination Survey (NHANES III) 1988–1994 (n = 10,605).