24 datasets found
  1. NHANES 1988-2018

    • figshare.com
    application/gzip
    Updated Feb 18, 2025
    Cite
    Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v2
    Available download formats: application/gzip
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential for understanding how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as the prevalence of disease. However, the data must first be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort, but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, and R Markdown files with example code for calculating summary statistics and running regression models, to help accelerate high-throughput analysis of the exposome and secular trends in cancer mortality.

    csv Data Record: The curated NHANES datasets and the data dictionaries include 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The 11th file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes ("dictionary_drug_codes.csv"). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets ("nhanes_inconsistencies_documentation.xlsx").

    R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. The .RData file contains all the aforementioned datasets as R data objects ("w - nhanes_1988_2018.RData"), along with all the R scripts for the customized functions that were written to curate the data. The .R file shows how we used the customized functions (i.e., our pipeline) to curate the data ("m - nhanes_1988_2018.R").

  2. Data from: USDA National Nutrient Database for Standard Reference Dataset...

    • agdatacommons.nal.usda.gov
    • gimi9.com
    • +3 more
    zip
    Updated Feb 9, 2024
    + more versions
    Cite
    Jaspreet K.C. Ahuja; David B. Haytowitz; Janet M. Roseland; Shirley Wasswa-Kintu; Bethany Showell; Melissa Nickle; Meena Somanchi; Mona Khan; Jacob Exler; Juhi R. Williams; Quynh Anh Nguyen; Pamela R. Pehrsson (2024). USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR) [Dataset]. http://doi.org/10.15482/USDA.ADC/1409053
    Available download formats: zip
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Authors
    Jaspreet K.C. Ahuja; David B. Haytowitz; Janet M. Roseland; Shirley Wasswa-Kintu; Bethany Showell; Melissa Nickle; Meena Somanchi; Mona Khan; Jacob Exler; Juhi R. Williams; Quynh Anh Nguyen; Pamela R. Pehrsson
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    The dataset, Survey-SR, provides the nutrient data for assessing dietary intakes from the national survey What We Eat In America, National Health and Nutrition Examination Survey (WWEIA, NHANES). Historically, USDA databases have been used for national nutrition monitoring (1). Currently, the Food and Nutrient Database for Dietary Studies (FNDDS) (2) is used by the Food Surveys Research Group, ARS, to process dietary intake data from WWEIA, NHANES. Nutrient values for FNDDS are based on Survey-SR. Survey-SR was referred to as the "Primary Data Set" in older publications. Early versions of the dataset were composed mainly of commodity-type items such as wheat flour, sugar, and milk. However, with increased consumption of commercial processed and restaurant foods and changes in how national nutrition monitoring data are used (1), many commercial processed and restaurant items have been added to Survey-SR.

    The current version, Survey-SR 2013-2014, is mainly based on the USDA National Nutrient Database for Standard Reference (SR) 28 (2) and contains sixty-six nutrients each for 3,404 foods. These nutrient data will be used for assessing intake data from WWEIA, NHANES 2013-2014. Nutrient profiles were added for 265 new foods and updated for about 500 foods from the version used for the previous survey (WWEIA, NHANES 2011-12). New foods added include mainly commercially processed foods such as several gluten-free products, milk substitutes, sauces and condiments such as sriracha, pesto and wasabi, Greek yogurt, breakfast cereals, low-sodium meat products, whole grain pastas and baked products, and several beverages including bottled tea and coffee, coconut water, malt beverages, hard cider, fruit-flavored drinks, fortified fruit juices and fruit and/or vegetable smoothies. Several school lunch pizzas and chicken products, fast-food sandwiches, and new beef cuts were also added, as they are now reported more frequently by survey respondents. Nutrient profiles were updated for several commonly consumed foods such as cheddar, mozzarella and American cheese, ground beef, butter, and catsup. The changes in nutrient values may be due to reformulations in products, changes in the market shares of brands, or more accurate data. Examples of more accurate data include analytical data, market share data, and data from a nationally representative sample.

    Resources in this dataset:

    Resource Title: USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES 2013-14 (Survey SR 2013-14). File Name: SurveySR_2013_14 (1).zip. Resource Description: Access database downloaded on November 16, 2017. US Department of Agriculture, Agricultural Research Service, Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR), October 2015.

    Resource Title: Data Dictionary. File Name: SurveySR_DD.pdf

  3. Table_2_Factors affecting HPV infection in U.S. and Beijing females: A...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Jun 6, 2023
    + more versions
    Cite
    Huixia Yang; Yujin Xie; Rui Guan; Yanlan Zhao; Weihua Lv; Ying Liu; Feng Zhu; Huijuan Liu; Xinxiang Guo; Zhen Tang; Haijing Li; Yu Zhong; Bin Zhang; Hong Yu (2023). Table_2_Factors affecting HPV infection in U.S. and Beijing females: A modeling study.XLSX [Dataset]. http://doi.org/10.3389/fpubh.2022.1052210.s002
    Available download formats: xlsx
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Huixia Yang; Yujin Xie; Rui Guan; Yanlan Zhao; Weihua Lv; Ying Liu; Feng Zhu; Huijuan Liu; Xinxiang Guo; Zhen Tang; Haijing Li; Yu Zhong; Bin Zhang; Hong Yu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Human papillomavirus (HPV) infection is an important carcinogenic infection highly prevalent among many populations. However, independent influencing factors and predictive models for HPV infection in both U.S. and Beijing females are rarely confirmed. In this study, our first objective was to explore the overlapping HPV infection-related factors in U.S. and Beijing females. Secondly, we aimed to develop an R package for identifying the top-performing prediction models and to build predictive models for HPV infection using this R package.

    Methods: This cross-sectional study used data from the 2009–2016 NHANES (a national population-based study) and the 2019 data on Beijing female union workers from various industries. Prevalence, potential influencing factors, and predictive models for HPV infection in both cohorts were explored.

    Results: There were 2,259 (NHANES cohort, age: 20–59 years) and 1,593 (Beijing female cohort, age: 20–70 years) participants included in the analyses. The HPV infection rates of the U.S. NHANES and Beijing female cohorts were 45.73% and 8.22%, respectively. The number of male sex partners, marital status, and history of HPV infection were the predominant factors that influenced HPV infection in both the NHANES and Beijing female cohorts. However, condom use was not an independent influencing factor for HPV infection in either cohort. The R package Modelbest was established. The nomogram developed with the Modelbest package performed better than the nomogram that included only the significant factors from multivariate regression analysis.

    Conclusion: Collectively, despite the widespread availability of HPV vaccines, HPV infection is still prevalent. Compared with condom promotion, avoidance of multiple sexual partners seems to be more effective for preventing HPV infection. Nomograms developed with Modelbest can provide improved personalized risk assessment for HPV infection. Our R package Modelbest has the potential to be a powerful tool for future predictive model studies.

  4. Table_1_Exploring the relationship between total serum calcium and melanoma...

    • frontiersin.figshare.com
    • figshare.com
    xlsx
    Updated Dec 23, 2024
    Cite
    Qiaochu Zhou; Wei Wang; Jinhui Wang; Changchang Li; Jianle Ji (2024). Table_1_Exploring the relationship between total serum calcium and melanoma development: a cross-sectional study.xlsx [Dataset]. http://doi.org/10.3389/fnut.2024.1461818.s001
    Available download formats: xlsx
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    Frontiers
    Authors
    Qiaochu Zhou; Wei Wang; Jinhui Wang; Changchang Li; Jianle Ji
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Melanoma is the fourth leading cause of cancer-related death worldwide. The continuous exploration and reporting of risk factors of melanoma is important for standardizing and reducing the incidence of the disease. Calcium signaling is a promising therapeutic target for melanoma; however, the relationship between total serum calcium levels and melanoma development remains unclear.

    Methods: In this study, we included patients with melanoma from the National Health and Nutrition Examination Survey (NHANES) database from 2003 to 2006 and from 2009 to 2016. The baseline clinical characteristics of the participants were analyzed using the chi-square and rank-sum tests. Subsequently, a fitted model was constructed to evaluate the relationship between total serum calcium levels and melanoma development. The performance of total serum calcium levels and covariates in predicting the risk of melanoma was assessed based on ROC curves. Finally, LASSO regression analysis was performed using the "glmnet" R package to identify clinical characteristics associated with melanoma.

    Results: A total of 13,432 participants were included in this study. Age, race, household poverty-to-income ratio, response of the skin to sunlight after a certain period of non-exposure, wearing long-sleeved shirts, frequency of sunscreen use, and arthritis were significantly correlated with the development of melanoma. The p-values of total serum calcium levels in three fitted models were

  5. The association between environmental quality and diabetes in the U.S.

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). The association between environmental quality and diabetes in the U.S. [Dataset]. https://catalog.data.gov/dataset/the-association-between-environmental-quality-and-diabetes-in-the-u-s
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Area covered
    United States
    Description

    Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016).

    Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as the proportion of adults (age 20+ years) who had a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome.

    The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

    This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

    This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
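
    The three outcome definitions in this description (diagnosed, undiagnosed, and total diabetes) can be illustrated with a minimal Python sketch. Only the thresholds (FPG ≥126 mg/dL, HbA1C ≥6.5%) come from the description; the function names and the record layout are hypothetical, and the sketch computes crude proportions only. It does not reproduce the age standardization, the BRFSS imputation, or the small area models used by IHME.

```python
from typing import Dict, Iterable, Optional, Tuple

# Thresholds quoted in the dataset description.
FPG_THRESHOLD_MG_DL = 126.0
HBA1C_THRESHOLD_PCT = 6.5

# Hypothetical record layout: (reported_diagnosis, fpg_mg_dl, hba1c_pct).
Record = Tuple[bool, Optional[float], Optional[float]]


def has_high_glycemia(fpg_mg_dl: Optional[float], hba1c_pct: Optional[float]) -> bool:
    """True if either available measurement meets its threshold."""
    high_fpg = fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL
    high_a1c = hba1c_pct is not None and hba1c_pct >= HBA1C_THRESHOLD_PCT
    return high_fpg or high_a1c


def classify_diabetes(reported_diagnosis: bool,
                      fpg_mg_dl: Optional[float],
                      hba1c_pct: Optional[float]) -> str:
    """Return 'diagnosed', 'undiagnosed', or 'none' for one adult."""
    if reported_diagnosis:
        return "diagnosed"
    if has_high_glycemia(fpg_mg_dl, hba1c_pct):
        return "undiagnosed"
    return "none"


def crude_prevalence(records: Iterable[Record]) -> Dict[str, float]:
    """Crude (unweighted, not age-standardized) proportions."""
    labels = [classify_diabetes(*record) for record in records]
    if not labels:
        raise ValueError("at least one record is required")
    diagnosed = labels.count("diagnosed") / len(labels)
    undiagnosed = labels.count("undiagnosed") / len(labels)
    # Total diabetes = previous diagnosis and/or high FPG/HbA1C.
    return {"diagnosed": diagnosed,
            "undiagnosed": undiagnosed,
            "total": diagnosed + undiagnosed}
```

    Because a reported diagnosis takes precedence, the diagnosed and undiagnosed groups are mutually exclusive, so total prevalence is simply their sum.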

  6. Performance of classification models (testing data).

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 31, 2023
    + more versions
    Cite
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2023). Performance of classification models (testing data). [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t006
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of classification models (testing data).

  7. Estimated US population prevalence and number needed to treat to avoid...

    • figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Joseph A. Johnston; David R. Nelson; Limin Zhang; Sarah E. Curtis; James R. Voelker; John R. Wetterau (2023). Estimated US population prevalence and number needed to treat to avoid all-cause mortality in select NHANES patient subgroups. [Dataset]. http://doi.org/10.1371/journal.pone.0218435.t003
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joseph A. Johnston; David R. Nelson; Limin Zhang; Sarah E. Curtis; James R. Voelker; John R. Wetterau
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Estimated US population prevalence and number needed to treat to avoid all-cause mortality in select NHANES patient subgroups.

  8. Table_1_Subclinical hearing loss and educational performance in children: a...

    • frontiersin.figshare.com
    bin
    Updated Aug 3, 2023
    + more versions
    Cite
    Rahul K. Sharma; Alexander Chern; Justin S. Golub; Anil K. Lalwani (2023). Table_1_Subclinical hearing loss and educational performance in children: a national study.docx [Dataset]. http://doi.org/10.3389/fauot.2023.1214188.s001
    Available download formats: bin
    Dataset updated
    Aug 3, 2023
    Dataset provided by
    Frontiers
    Authors
    Rahul K. Sharma; Alexander Chern; Justin S. Golub; Anil K. Lalwani
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: Hearing loss can cause speech and language delays, communication barriers, and learning problems. Such factors are associated with reduced academic achievement, social isolation, decreased quality of life, and poorer health outcomes. We use a national cohort of children to examine how subclinical hearing loss is associated with academic/educational performance. The goal of this study is to determine if different levels of subclinical hearing loss (pure tone average ≤ 25 dB HL) are associated with educational testing outcomes in children.

    Design: Analysis of children 6–16 years old who participated in the National Health and Nutrition Examination Survey (NHANES-III, 1988–1994) was performed. Air-conduction thresholds were measured at 0.5, 1, 2, 4, 6, and 8 kHz. A four-frequency pure-tone average (PTA) was calculated from 0.5, 1, 2, and 4 kHz. Hearing thresholds were divided into categories (≤ 0, 1–10, and 11–25 dB) for analysis. The outcomes of interest were the Wide Range Achievement Test (WRAT-R) and Wechsler Intelligence Scale for Children (WISC-R). Analysis was conducted using ANOVA and logistic regression.

    Results: We analyzed 3,965 participants. In univariable analysis, the average scores in scaled math, reading, digit span (short-term memory), and block design (visual-motor skills) were significantly lower with worsening hearing categories (p < 0.01). In multivariable regression, PTAs of 1–10 dB HL (OR 1.72, 95% CI 1.29–2.29, p < 0.01) and 11–25 dB HL (OR 2.99, 95% CI 1.3–6.65, p < 0.01), compared to PTA of ≤0 dB HL, were associated with poor reading test performance (
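
    The four-frequency pure-tone average and the hearing categories named in the Design section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the dictionary layout are hypothetical, and because the description lists the categories as ≤ 0, 1–10, and 11–25 dB without saying how non-integer averages are binned, the sketch assumes the intervals (0, 10] and (10, 25].

```python
from statistics import mean
from typing import Mapping, Optional

# Frequencies (kHz) used for the four-frequency PTA, per the description.
PTA_FREQUENCIES_KHZ = (0.5, 1.0, 2.0, 4.0)


def four_frequency_pta(thresholds_db: Mapping[float, float]) -> float:
    """Average the air-conduction thresholds (dB HL) at 0.5, 1, 2, and 4 kHz.

    `thresholds_db` maps frequency in kHz to threshold in dB HL; the 6 and
    8 kHz measurements may be present but are ignored.
    """
    return mean(thresholds_db[freq] for freq in PTA_FREQUENCIES_KHZ)


def hearing_category(pta_db: float) -> Optional[str]:
    """Assign a PTA to one of the three subclinical categories.

    Returns None above 25 dB HL, which is outside the subclinical range.
    Binning of fractional values is an assumption (see lead-in).
    """
    if pta_db <= 0:
        return "<=0"
    if pta_db <= 10:
        return "1-10"
    if pta_db <= 25:
        return "11-25"
    return None
```

    For example, thresholds of 5, 10, 15, and 10 dB HL at the four frequencies average to 10 dB HL, which falls in the 1–10 dB category.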

  9. Infection status by nicotine exposure.

    • plos.figshare.com
    • figshare.com
    xls
    Updated May 30, 2023
    Cite
    Erin L. Tompkins; Thomas A. Beltran; Elizabeth J. Gelner; Aaron R. Farmer (2023). Infection status by nicotine exposure. [Dataset]. http://doi.org/10.1371/journal.pone.0234704.t002
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Erin L. Tompkins; Thomas A. Beltran; Elizabeth J. Gelner; Aaron R. Farmer
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Infection status by nicotine exposure.

  10. Characteristics of NHANES 1999–2002 participants by quartiles (Q) of red...

    • figshare.com
    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Jing Hu; WenYen Juan; Nadine R. Sahyoun (2023). Characteristics of NHANES 1999–2002 participants by quartiles (Q) of red blood cell (RBC) folate and dietary folate equivalents (DFE)1. [Dataset]. http://doi.org/10.1371/journal.pone.0148697.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Jing Hu; WenYen Juan; Nadine R. Sahyoun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Characteristics of NHANES 1999–2002 participants by quartiles (Q) of red blood cell (RBC) folate and dietary folate equivalents (DFE)1.

  11. Performance of 5-fold cross-validation of classification models -training...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    + more versions
    Cite
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2023). Performance of 5-fold cross-validation of classification models -training data (mean ± SD). [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t005
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of 5-fold cross-validation of classification models -training data (mean ± SD).

  12. Key parameters of various machine learning models used for regression task.

    • plos.figshare.com
    xls
    Updated May 20, 2020
    Cite
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2020). Key parameters of various machine learning models used for regression task. [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t001
    Available download formats: xls
    Dataset updated
    May 20, 2020
    Dataset provided by
    PLOS ONE
    Authors
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Key parameters of various machine learning models used for regression task.

  13. Performance of 5-fold cross-validation of predictive models (training data)....

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2023). Performance of 5-fold cross-validation of predictive models (training data). [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t004
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of 5-fold cross-validation of predictive models (training data).

  14. Threshold analysis between DII and EM.

    • plos.figshare.com
    xls
    Updated Jun 21, 2023
    Cite
    Pan-Wei Hu; Bi-Rong Yang; Xiao-Le Zhang; Xiao-Tong Yan; Juan-Juan Ma; Cong Qi; Guo-Jing Jiang (2023). Threshold analysis between DII and EM. [Dataset]. http://doi.org/10.1371/journal.pone.0283216.t004
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Pan-Wei Hu; Bi-Rong Yang; Xiao-Le Zhang; Xiao-Tong Yan; Juan-Juan Ma; Cong Qi; Guo-Jing Jiang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Endometriosis is a common chronic inflammatory and estrogen-dependent disease that mostly affects people of childbearing age. The dietary inflammatory index (DII) is a novel instrument for assessing the overall inflammatory potential of diet. However, no studies have shown the relationship between DII and endometriosis to date. This study aimed to elucidate the relationship between DII and endometriosis. Data were acquired from the National Health and Nutrition Examination Survey (NHANES) 2001–2006. DII was calculated using an inbuilt function in the R package. Relevant patient information was obtained through a questionnaire containing their gynecological history. Based on an endometriosis questionnaire survey, participants who answered yes were considered cases (with endometriosis), and participants who answered no were considered controls (without endometriosis). Multivariate weighted logistic regression was applied to examine the correlation between DII and endometriosis. Subgroup analysis and smoothing curve fitting between DII and endometriosis were conducted in a further investigation. Compared to the control group, patients tended to have a higher DII (P = 0.014). Adjusted multivariate regression models showed that DII was positively correlated with the incidence of endometriosis (P < 0.05). Analysis of subgroups revealed no significant heterogeneity. In middle-aged and older women (age ≥ 35 years), the smoothing curve fitting analysis demonstrated a non-linear relationship between DII and the prevalence of endometriosis. Therefore, using DII as an indicator of dietary-related inflammation may help to provide new insight into the role of diet in the prevention and management of endometriosis.

  15. Hazard ratios (HR) of overall cancer and 95% confidence intervals (95% CI)...

    • plos.figshare.com
    xls
    Updated Feb 22, 2016
    Cite
    Jing Hu; WenYen Juan; Nadine R. Sahyoun (2016). Hazard ratios (HR) of overall cancer and 95% confidence intervals (95% CI) by quartiles (Q) of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1,2, NHANES 1999–2002. [Dataset]. http://doi.org/10.1371/journal.pone.0148697.t002
    Available download formats: xls
    Dataset updated
    Feb 22, 2016
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jing Hu; WenYen Juan; Nadine R. Sahyoun
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hazard ratios (HR) of overall cancer and 95% confidence intervals (95% CI) by quartiles (Q) of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1,2, NHANES 1999–2002.

  16. Association between gamma gap and all-cause mortality with gamma gap...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Stephen P. Juraschek; Alison R. Moliterno; William Checkley; Edgar R. Miller III (2023). Association between gamma gap and all-cause mortality with gamma gap dichotomized at different cutpoints (Hazard Ratios, 95% CI). [Dataset]. http://doi.org/10.1371/journal.pone.0143494.t002
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Stephen P. Juraschek; Alison R. Moliterno; William Checkley; Edgar R. Miller III
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: Bold represents P < 0.05.
    Model 1: adjusted for age, sex, race/ethnicity.
    Model 2: adjusted for model 1 + estimated glomerular filtration rate, albuminuria, hypertension, smoking status, body mass index, total cholesterol, HDL-cholesterol, self-reported cancer, aspartate aminotransferase, alanine aminotransferase, total bilirubin, alkaline phosphatase, hepatitis B virus core IgG status, hepatitis C virus IgG status, C-reactive protein, white blood cell count, and serum albumin.
    *Between percentiles; 0.5 was used to indicate that this was between percentiles.
    Association between gamma gap and all-cause mortality with gamma gap dichotomized at different cutpoints (Hazard Ratios, 95% CI).
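
To illustrate what dichotomizing the gamma gap at different cutpoints means, here is a minimal Python sketch. It assumes the usual definition of the gamma gap as total serum protein minus albumin (both in g/dL). The sample measurements and the cutpoint values are hypothetical and are not taken from the table.

```python
def gamma_gap(total_protein_g_dl, albumin_g_dl):
    """Gamma gap, assumed here to be total serum protein minus albumin (g/dL)."""
    return total_protein_g_dl - albumin_g_dl


def dichotomize(values, cutpoint):
    """Return 1 for values at or above the cutpoint, else 0."""
    return [1 if v >= cutpoint else 0 for v in values]


# Hypothetical (total protein, albumin) pairs in g/dL, for illustration only.
samples = [(7.0, 4.5), (7.4, 4.2), (8.1, 4.0), (6.8, 4.4)]
gaps = [gamma_gap(tp, alb) for tp, alb in samples]

# Hypothetical cutpoints; the table's actual cutpoints are not listed here.
for cutpoint in (3.0, 3.5, 4.0):
    flags = dichotomize(gaps, cutpoint)
    print(f"cutpoint {cutpoint:.1f} g/dL: {sum(flags)} of {len(flags)} at or above")
```

Each resulting 0/1 indicator could then serve as the exposure variable in a survival model, one model per cutpoint.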

  17. Nonparametric correlation coefficients between log transformed HOMA-IR and...

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima (2023). Nonparametric correlation coefficients between log transformed HOMA-IR and HOMA-β. [Dataset]. http://doi.org/10.1371/journal.pone.0216900.t002
    Available download formats: xls
    Dataset updated: May 31, 2023
    Dataset provided by: PLOS ONE
    Authors: Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Nonparametric correlation coefficients between log transformed HOMA-IR and HOMA-β.

  18. Descriptive statistics of the study cohort.

    • plos.figshare.com
    xls
    Updated Jun 7, 2023
    Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima (2023). Descriptive statistics of the study cohort. [Dataset]. http://doi.org/10.1371/journal.pone.0216900.t001
    Available download formats: xls
    Dataset updated: Jun 7, 2023
    Dataset provided by: PLOS (http://plos.org/)
    Authors: Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Descriptive statistics of the study cohort.

  19. R code for data analysis.

    • plos.figshare.com
    txt
    Updated Feb 19, 2025
    Mi Zhou; Zikun Zhao; Xiaoran Li; Yanxuan Jin; Xinlei Hong; Haoning He; Mengchu Zhao; Xiaomei Song (2025). R code for data analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0311103.s001
    Available download formats: txt
    Dataset updated: Feb 19, 2025
    Dataset provided by: PLOS ONE
    Authors: Mi Zhou; Zikun Zhao; Xiaoran Li; Yanxuan Jin; Xinlei Hong; Haoning He; Mengchu Zhao; Xiaomei Song
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Physical activity (PA) is important for students in secondary school; however, trends in PA among secondary school students have shown a significant decline. There is a need to understand the PA of middle school students.
    Objective: The first objective is to identify the PA levels and screen time of students in middle school. The second objective is to examine the PA levels and screen time among students of different genders.
    Methods: Participants from four consecutive two-year cycles of the National Health and Nutrition Examination Survey (NHANES; 2011–2012, 2013–2014, 2015–2016, and 2017–2018) were included in this study. A Spearman correlation model was used to identify the correlation between participants’ demographics, PA, and screen time data. A negative binomial regression model was used to describe students’ PA and screen time (dependent variables) in different grades (independent variables). Gender and age were taken as control variables.
    Results: After data preprocessing, 2516 participants were included in this study. A significant correlation was found between grade and PA, but not between grade and screen time. Negative binomial regression shows that students have the lowest PA in their transition year, grade 6, and that their screen time decreased as grade increased. Significant differences were found across genders. Future efforts should focus on developing school transition support programs designed to improve PA.
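
The Spearman correlation mentioned in the methods can be sketched in a few lines of standard-library Python: rank both variables (averaging the ranks of ties), then take the Pearson correlation of the ranks. This is only an illustration of the technique, not the authors' R code. The function names and the grade and PA values below are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical school grades and daily PA minutes, for illustration only.
grades = [6, 6, 7, 7, 8, 8]
pa_minutes = [30, 35, 45, 40, 60, 55]
print(f"Spearman rho between grade and PA: {spearman(grades, pa_minutes):.3f}")
```

Because the coefficient depends only on ranks, it is unchanged by any monotone transformation of either variable, which is one reason it suits skewed activity data.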

  20. Hazard ratios of overall cancer incidence and 95% confidence intervals (95%...

    • figshare.com
    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Jing Hu; WenYen Juan; Nadine R. Sahyoun (2023). Hazard ratios of overall cancer incidence and 95% confidence intervals (95% CI) by continuous levels of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1, NHANES 1999–2002. [Dataset]. http://doi.org/10.1371/journal.pone.0148697.t003
    Available download formats: xls
    Dataset updated: Jun 1, 2023
    Dataset provided by: PLOS ONE
    Authors: Jing Hu; WenYen Juan; Nadine R. Sahyoun
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hazard ratios of overall cancer incidence and 95% confidence intervals (95% CI) by continuous levels of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1, NHANES 1999–2002.

Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v2

NHANES 1988-2018

69 scholarly articles cite this dataset
Available download formats: application/gzip
Dataset updated: Feb 18, 2025
Dataset provided by: figshare
Authors: Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential to help us understand how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as prevalence of disease. However, these data need to first be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data. We also provide R markdown files with example code for calculating summary statistics and running regression models to help accelerate high-throughput analysis of the exposome and secular trends on cancer mortality.

csv Data Record: The curated NHANES datasets and the data dictionaries include 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv formatted files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The eleventh file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th csv file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes (“dictionary_drug_codes.csv”). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets (“nhanes_inconsistencies_documentation.xlsx”).

R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file which includes an .RData file and an .R file. We provide an .RData file that contains all the aforementioned datasets as R data objects (“w - nhanes_1988_2018.RData”). Also in this .RData file, we make available all R scripts for the customized functions that were written to curate the data. We also provide an .R file that shows how we used the customized functions (i.e. our pipeline) to curate the data (“m - nhanes_1988_2018.R”).
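
Because the curated data are split into one .csv file per module, an analysis typically starts by joining modules on a participant identifier. The following Python sketch shows one way to do that with the standard library. It rests on assumptions: the key column name `SEQN` (the respondent sequence number used in NHANES) may differ in the curated files, and the other column names and all values are hypothetical stand-ins, so check "dictionary_nhanes.csv" for the actual variable names.

```python
import csv
import io

# Tiny in-memory stand-ins for two module files.
# Column names and values are hypothetical, for illustration only.
DEMOGRAPHICS_CSV = """SEQN,age,sex
1,34,F
2,51,M
3,45,F
"""
MORTALITY_CSV = """SEQN,deceased
1,0
3,1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left_rows, right_rows, key):
    """Attach right-hand columns to each left row that has a matching key."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        merged_row = dict(row)
        merged_row.update(right_by_key.get(row[key], {}))
        joined.append(merged_row)
    return joined


merged = left_join(read_rows(DEMOGRAPHICS_CSV), read_rows(MORTALITY_CSV), key="SEQN")
for row in merged:
    print(row)
```

With the downloaded files, `io.StringIO(text)` would be replaced by `open(path, newline="")`. A left join keeps every participant from the first module even when the second module has no record for them.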
