24 datasets found

NHANES 1988-2018
figshare.com
application/gzip
Updated Feb 18, 2025
Cite
Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v2
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.21743372.v2
Dataset updated
Feb 18, 2025
Dataset provided by
figshare
Authors
Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential for understanding how the environment and behaviors affect human health, and they are already used to answer public health questions such as the prevalence of disease. However, the data must be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting these inconsistencies takes systematic cross-examination and considerable effort, but it is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). We therefore developed a set of curated and unified datasets, with accompanying code, by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, and R Markdown files with example code for calculating summary statistics and running regression models, to help accelerate high-throughput analysis of the exposome and of secular trends in cancer mortality.

CSV Data Record: The curated NHANES datasets and the data dictionaries comprise 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The 11th file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes ("dictionary_drug_codes.csv"). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets ("nhanes_inconsistencies_documentation.xlsx").

R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file that includes an .RData file and an .R file. The .RData file contains all the aforementioned datasets as R data objects ("w - nhanes_1988_2018.RData"). The same .RData file also makes available all R scripts for the customized functions that were written to curate the data. The .R file shows how we used the customized functions (i.e., our pipeline) to curate the data ("m - nhanes_1988_2018.R").
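Because the curated data are split into one table per module, an analysis typically starts by joining modules on a shared participant identifier. The following is a minimal sketch using only the Python standard library and toy in-memory tables. The key column name `SEQN`, the other column names, and the sample values are assumptions made for illustration; the actual names should be looked up in "dictionary_nhanes.csv".

```python
import csv
import io

# Toy stand-ins for two module tables. The column names and values are
# hypothetical; the real files may use different names.
DEMOGRAPHICS_CSV = """SEQN,age,sex
1,34,F
2,61,M
3,47,F
"""

CHEMICALS_CSV = """SEQN,lead_ug_dl
1,1.2
3,0.8
"""


def read_module(text, key="SEQN"):
    """Index the rows of one module table by the participant identifier."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def left_join(base, other, key="SEQN"):
    """Attach the columns of `other` to every participant in `base`.

    Participants missing from `other` get None for its columns, so no
    participant is silently dropped.
    """
    other_cols = sorted({col for row in other.values() for col in row} - {key})
    merged = []
    for participant_id, row in base.items():
        extra = other.get(participant_id, {})
        merged.append({**row, **{col: extra.get(col) for col in other_cols}})
    return merged


demographics = read_module(DEMOGRAPHICS_CSV)
chemicals = read_module(CHEMICALS_CSV)
merged = left_join(demographics, chemicals)

for row in merged:
    print(row)
```

A left join is used here so that participants without a chemical measurement remain in the result with an explicit missing value, which matters when survey weights are applied later.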
Data from: USDA National Nutrient Database for Standard Reference Dataset...
agdatacommons.nal.usda.gov
gimi9.com
+3more
zip
Updated Feb 9, 2024
Cite
Jaspreet K.C. Ahuja; David B. Haytowitz; Janet M. Roseland; Shirley Wasswa-Kintu; Bethany Showell; Melissa Nickle; Meena Somanchi; Mona Khan; Jacob Exler; Juhi R. Williams; Quynh Anh Nguyen; Pamela R. Pehrsson (2024). USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR) [Dataset]. http://doi.org/10.15482/USDA.ADC/1409053
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.15482/USDA.ADC/1409053
Dataset updated
Feb 9, 2024
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Authors
Jaspreet K.C. Ahuja; David B. Haytowitz; Janet M. Roseland; Shirley Wasswa-Kintu; Bethany Showell; Melissa Nickle; Meena Somanchi; Mona Khan; Jacob Exler; Juhi R. Williams; Quynh Anh Nguyen; Pamela R. Pehrsson
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
The dataset, Survey-SR, provides the nutrient data for assessing dietary intakes from the national survey What We Eat In America, National Health and Nutrition Examination Survey (WWEIA, NHANES). Historically, USDA databases have been used for national nutrition monitoring (1). Currently, the Food and Nutrient Database for Dietary Studies (FNDDS) (2) is used by the Food Surveys Research Group, ARS, to process dietary intake data from WWEIA, NHANES. Nutrient values for FNDDS are based on Survey-SR. Survey-SR was referred to as the "Primary Data Set" in older publications. Early versions of the dataset were composed mainly of commodity-type items such as wheat flour, sugar, and milk. However, with increased consumption of commercial processed and restaurant foods and changes in how national nutrition monitoring data are used (1), many commercial processed and restaurant items have been added to Survey-SR.

The current version, Survey-SR 2013-2014, is mainly based on the USDA National Nutrient Database for Standard Reference (SR) 28 (2) and contains sixty-six nutrients each for 3,404 foods. These nutrient data will be used for assessing intake data from WWEIA, NHANES 2013-2014. Nutrient profiles were added for 265 new foods and updated for about 500 foods from the version used for the previous survey (WWEIA, NHANES 2011-12). New foods added include mainly commercially processed foods such as several gluten-free products, milk substitutes, sauces and condiments such as sriracha, pesto and wasabi, Greek yogurt, breakfast cereals, low-sodium meat products, whole grain pastas and baked products, and several beverages including bottled tea and coffee, coconut water, malt beverages, hard cider, fruit-flavored drinks, fortified fruit juices and fruit and/or vegetable smoothies. Several school lunch pizzas and chicken products, fast-food sandwiches, and new beef cuts were also added, as they are now reported more frequently by survey respondents. Nutrient profiles were updated for several commonly consumed foods such as cheddar, mozzarella and American cheese, ground beef, butter, and catsup. The changes in nutrient values may be due to reformulations in products, changes in the market shares of brands, or more accurate data. Examples of more accurate data include analytical data, market share data, and data from a nationally representative sample.

Resources in this dataset:

Resource Title: USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES 2013-14 (Survey SR 2013-14).
File Name: SurveySR_2013_14 (1).zip
Resource Description: Access database downloaded on November 16, 2017. US Department of Agriculture, Agricultural Research Service, Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference Dataset for What We Eat In America, NHANES (Survey-SR), October 2015.

Resource Title: Data Dictionary.
File Name: SurveySR_DD.pdf
Table_2_Factors affecting HPV infection in U.S. and Beijing females: A...
frontiersin.figshare.com
figshare.com
xlsx
Updated Jun 6, 2023
Cite
Huixia Yang; Yujin Xie; Rui Guan; Yanlan Zhao; Weihua Lv; Ying Liu; Feng Zhu; Huijuan Liu; Xinxiang Guo; Zhen Tang; Haijing Li; Yu Zhong; Bin Zhang; Hong Yu (2023). Table_2_Factors affecting HPV infection in U.S. and Beijing females: A modeling study.XLSX [Dataset]. http://doi.org/10.3389/fpubh.2022.1052210.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fpubh.2022.1052210.s002
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Huixia Yang; Yujin Xie; Rui Guan; Yanlan Zhao; Weihua Lv; Ying Liu; Feng Zhu; Huijuan Liu; Xinxiang Guo; Zhen Tang; Haijing Li; Yu Zhong; Bin Zhang; Hong Yu
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Human papillomavirus (HPV) infection is an important carcinogenic infection highly prevalent among many populations. However, independent influencing factors and predictive models for HPV infection in both U.S. and Beijing females have rarely been confirmed. In this study, our first objective was to explore the overlapping HPV infection-related factors in U.S. and Beijing females. Our second objective was to develop an R package for identifying the top-performing prediction models and to build predictive models for HPV infection using this package.

Methods: This cross-sectional study used data from the 2009–2016 NHANES (a national population-based study) and the 2019 data on Beijing female union workers from various industries. Prevalence, potential influencing factors, and predictive models for HPV infection in both cohorts were explored.

Results: There were 2,259 (NHANES cohort, age: 20–59 years) and 1,593 (Beijing female cohort, age: 20–70 years) participants included in the analyses. The HPV infection rates in the U.S. NHANES and Beijing female cohorts were 45.73% and 8.22%, respectively. The number of male sex partners, marital status, and history of HPV infection were the predominant factors that influenced HPV infection in both the NHANES and Beijing female cohorts. However, condom use was not an independent influencing factor for HPV infection in either cohort. The R package Modelbest was established. The nomogram developed with the Modelbest package performed better than the nomogram that included only the significant factors from the multivariate regression analysis.

Conclusion: Collectively, despite the widespread availability of HPV vaccines, HPV infection is still prevalent. Compared with condom promotion, avoidance of multiple sexual partners seems to be more effective for preventing HPV infection. Nomograms developed with Modelbest can provide improved personalized risk assessment for HPV infection. Our R package Modelbest has the potential to be a powerful tool for future predictive model studies.
Table_1_Exploring the relationship between total serum calcium and melanoma...
frontiersin.figshare.com
figshare.com
xlsx
Updated Dec 23, 2024
Cite
Qiaochu Zhou; Wei Wang; Jinhui Wang; Changchang Li; Jianle Ji (2024). Table_1_Exploring the relationship between total serum calcium and melanoma development: a cross-sectional study.xlsx [Dataset]. http://doi.org/10.3389/fnut.2024.1461818.s001
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fnut.2024.1461818.s001
Dataset updated
Dec 23, 2024
Dataset provided by
Frontiers
Authors
Qiaochu Zhou; Wei Wang; Jinhui Wang; Changchang Li; Jianle Ji
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Melanoma is the fourth leading cause of cancer-related death worldwide. The continuous exploration and reporting of risk factors of melanoma is important for standardizing and reducing the incidence of the disease. Calcium signaling is a promising therapeutic target for melanoma; however, the relationship between total serum calcium levels and melanoma development remains unclear.

Methods: In this study, we included patients with melanoma from the National Health and Nutrition Examination Survey (NHANES) database from 2003 to 2006 and from 2009 to 2016. The baseline clinical characteristics of the participants were analyzed using the chi-square and rank-sum tests. Subsequently, a fitted model was constructed to evaluate the relationship between total serum calcium levels and melanoma development. The performance of total serum calcium levels and covariates in predicting the risk of melanoma was assessed based on ROC curves. Finally, LASSO regression analysis was performed using the “glmnet” R package to identify clinical characteristics associated with melanoma.

Results: A total of 13,432 participants were included in this study. Age, race, household poverty-to-income ratio, response of the skin to sunlight after a certain period of non-exposure, wearing long-sleeved shirts, frequency of sunscreen use, and arthritis were significantly correlated with the development of melanoma. The p-values of total serum calcium levels in three fitted models were
The association between environmental quality and diabetes in the U.S.
catalog.data.gov
s.cnmilf.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). The association between environmental quality and diabetes in the U.S. [Dataset]. https://catalog.data.gov/dataset/the-association-between-environmental-quality-and-diabetes-in-the-u-s
Explore at:
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency http://www.epa.gov/
Area covered
United States
Description
Population-based county-level estimates for diagnosed (DDP), undiagnosed (UDP), and total diabetes prevalence (TDP) were acquired from the Institute for Health Metrics and Evaluation (IHME) for the years 2004-2012 (Evaluation 2017). Prevalence estimates were calculated using a two-stage approach. The first stage used National Health and Nutrition Examination Survey (NHANES) data to predict high fasting plasma glucose (FPG) levels (≥126 mg/dL) and/or hemoglobin A1C (HbA1C) levels (≥6.5% [48 mmol/mol]) based on self-reported demographic and behavioral characteristics (Dwyer-Lindgren, Mackenbach et al. 2016). This model was then applied to Behavioral Risk Factor Surveillance System (BRFSS) data to impute high FPG and/or A1C status for each BRFSS respondent (Dwyer-Lindgren, Mackenbach et al. 2016). The second stage used the imputed BRFSS data to fit a series of small area models, which were used to predict the county-level prevalence of each of the diabetes-related outcomes (Dwyer-Lindgren, Mackenbach et al. 2016). Diagnosed diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis, represented as an age-standardized prevalence percentage. Undiagnosed diabetes was defined as the proportion of adults (age 20+ years) who have a high FPG or HbA1C but did not report a previous diagnosis of diabetes. Total diabetes was defined as the proportion of adults (age 20+ years) who reported a previous diabetes diagnosis and/or had a high FPG/HbA1C. The age-standardized diabetes prevalence (%) was used as the outcome.

The EQI was constructed for 2000-2005 for all US counties and is composed of five domains (air, water, built, land, and sociodemographic), each composed of variables to represent the environmental quality of that domain. Domain-specific EQIs were developed using principal components analysis (PCA) to reduce these variables within each domain, while the overall EQI was constructed from a second PCA of these individual domains (L. C. Messer et al., 2014). To account for differences in environment across rural and urban counties, the overall and domain-specific EQIs were stratified by rural urban continuum codes (RUCCs) (U.S. Department of Agriculture, 2015).

This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Human health data are not available publicly. EQI data are available at: https://edg.epa.gov/data/Public/ORD/NHEERL/EQI. Format: Data are stored as csv files.

This dataset is associated with the following publication: Jagai, J., A. Krajewski, S. Shaikh, D. Lobdell, and R. Sargis. Association between environmental quality and diabetes in the U.S.A. Journal of Diabetes Investigation. John Wiley & Sons, Inc., Hoboken, NJ, USA, 11(2): 315-324, (2020).
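The three outcome definitions in this description (diagnosed, undiagnosed, and total diabetes) can be illustrated with a short Python sketch. It applies the stated FPG and HbA1C cut-offs to individual records and reports crude proportions. The function names, record layout, and sample values are hypothetical, and the sketch deliberately omits the age standardization, imputation, and small area modeling that the description says were used for the actual estimates.

```python
FPG_THRESHOLD_MG_DL = 126.0  # high fasting plasma glucose cut-off from the description
HBA1C_THRESHOLD_PCT = 6.5    # high hemoglobin A1C cut-off from the description


def diabetes_status(reported_diagnosis, fpg_mg_dl=None, hba1c_pct=None):
    """Classify one adult as 'diagnosed', 'undiagnosed', or 'none'."""
    high_fpg = fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL
    high_a1c = hba1c_pct is not None and hba1c_pct >= HBA1C_THRESHOLD_PCT
    if reported_diagnosis:
        return "diagnosed"
    if high_fpg or high_a1c:
        return "undiagnosed"
    return "none"


def crude_prevalence(records):
    """Return unweighted proportions; total = diagnosed + undiagnosed."""
    if not records:
        raise ValueError("at least one record is required")
    counts = {"diagnosed": 0, "undiagnosed": 0, "none": 0}
    for record in records:
        counts[diabetes_status(**record)] += 1
    n = len(records)
    diagnosed = counts["diagnosed"] / n
    undiagnosed = counts["undiagnosed"] / n
    return {
        "diagnosed": diagnosed,
        "undiagnosed": undiagnosed,
        "total": diagnosed + undiagnosed,
    }


# Hypothetical adults (age 20+); values are made up for illustration.
sample = [
    {"reported_diagnosis": True, "fpg_mg_dl": 140.0, "hba1c_pct": 7.1},
    {"reported_diagnosis": False, "fpg_mg_dl": 130.0, "hba1c_pct": 6.0},
    {"reported_diagnosis": False, "fpg_mg_dl": 100.0, "hba1c_pct": 6.5},
    {"reported_diagnosis": False, "fpg_mg_dl": 95.0, "hba1c_pct": 5.4},
]

result = crude_prevalence(sample)
print(result)
```

Checking the self-reported diagnosis first means a diagnosed person with well-controlled glucose still counts as diagnosed, which matches the definitions given above.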
Performance of classification models (testing data).
figshare.com
plos.figshare.com
xls
Updated May 31, 2023
Cite
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2023). Performance of classification models (testing data). [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t006
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0233336.t006
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of classification models (testing data).
Estimated US population prevalence and number needed to treat to avoid...
figshare.com
xls
Updated Jun 4, 2023
Cite
Joseph A. Johnston; David R. Nelson; Limin Zhang; Sarah E. Curtis; James R. Voelker; John R. Wetterau (2023). Estimated US population prevalence and number needed to treat to avoid all-cause mortality in select NHANES patient subgroups. [Dataset]. http://doi.org/10.1371/journal.pone.0218435.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0218435.t003
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Joseph A. Johnston; David R. Nelson; Limin Zhang; Sarah E. Curtis; James R. Voelker; John R. Wetterau
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Estimated US population prevalence and number needed to treat to avoid all-cause mortality in select NHANES patient subgroups.
Table_1_Subclinical hearing loss and educational performance in children: a...
frontiersin.figshare.com
bin
Updated Aug 3, 2023
Cite
Rahul K. Sharma; Alexander Chern; Justin S. Golub; Anil K. Lalwani (2023). Table_1_Subclinical hearing loss and educational performance in children: a national study.docx [Dataset]. http://doi.org/10.3389/fauot.2023.1214188.s001
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.3389/fauot.2023.1214188.s001
Dataset updated
Aug 3, 2023
Dataset provided by
Frontiers
Authors
Rahul K. Sharma; Alexander Chern; Justin S. Golub; Anil K. Lalwani
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Objective: Hearing loss can cause speech and language delays, communication barriers, and learning problems. Such factors are associated with reduced academic achievement, social isolation, decreased quality of life, and poorer health outcomes. We use a national cohort of children to examine how subclinical hearing loss is associated with academic/educational performance. The goal of this study is to determine if different levels of subclinical hearing loss (pure tone average ≤ 25 dB HL) are associated with educational testing outcomes in children.

Design: Analysis of children 6–16 years old who participated in the National Health and Nutrition Examination Survey (NHANES-III, 1988–1994) was performed. Air-conduction thresholds were measured at 0.5, 1, 2, 4, 6, and 8 kHz. A four-frequency pure-tone average (PTA) was calculated from 0.5, 1, 2, and 4 kHz. Hearing thresholds were divided into categories (≤ 0, 1–10, and 11–25 dB) for analysis. The outcomes of interest were the Wide Range Achievement Test (WRAT-R) and Wechsler Intelligence Scale for Children (WISC-R). Analysis was conducted using ANOVA and logistic regression.

Results: We analyzed 3,965 participants. In univariable analysis, the average scores in scaled math, reading, digit span (short-term memory), and block design (visual-motor skills) were significantly lower with worsening hearing categories (p < 0.01). In multivariable regression PTAs of 1–10 dB HL (OR 1.72, 95% CI 1.29–2.29, p < 0.01) and 11–25 dB HL (OR: 2.99, 95% CI 1.3–6.65, p < 0.01), compared to PTA of ≤0 dB HL, were associated with poor reading test performance (
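The four-frequency pure-tone average and the hearing categories described in the Design section can be sketched in Python as follows. The function names and sample thresholds are hypothetical. The abstract does not say how a non-integer average between two categories (for example 0.5 dB) was assigned, so the sketch assumes each category extends up to and including its upper bound. It also takes a single set of thresholds, since the abstract does not state how the two ears were combined.

```python
# Frequencies (kHz) that enter the four-frequency PTA, as stated in the abstract.
PTA_FREQUENCIES_KHZ = (0.5, 1, 2, 4)


def pure_tone_average(thresholds_db):
    """Average the air-conduction thresholds (dB HL) at 0.5, 1, 2, and 4 kHz.

    `thresholds_db` maps frequency in kHz to threshold in dB HL; thresholds
    at other frequencies (e.g. 6 and 8 kHz) are ignored.
    """
    values = [thresholds_db[freq] for freq in PTA_FREQUENCIES_KHZ]
    return sum(values) / len(values)


def hearing_category(pta_db):
    """Assign a PTA to one of the analysis categories.

    Assumption: boundaries are treated as continuous, so values in (0, 10]
    fall in the 1-10 dB category and values in (10, 25] in the 11-25 dB one.
    """
    if pta_db <= 0:
        return "<=0 dB"
    if pta_db <= 10:
        return "1-10 dB"
    if pta_db <= 25:
        return "11-25 dB"
    return ">25 dB (outside the subclinical range)"


# Hypothetical thresholds for one child, in dB HL.
example_thresholds = {0.5: 5, 1: 10, 2: 10, 4: 15, 6: 20, 8: 25}
example_pta = pure_tone_average(example_thresholds)
example_category = hearing_category(example_pta)
print(example_pta, example_category)
```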
Infection status by nicotine exposure.
plos.figshare.com
figshare.com
xls
Updated May 30, 2023
Cite
Erin L. Tompkins; Thomas A. Beltran; Elizabeth J. Gelner; Aaron R. Farmer (2023). Infection status by nicotine exposure. [Dataset]. http://doi.org/10.1371/journal.pone.0234704.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0234704.t002
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Erin L. Tompkins; Thomas A. Beltran; Elizabeth J. Gelner; Aaron R. Farmer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Infection status by nicotine exposure.
Characteristics of NHANES 1999–2002 participants by quartiles (Q) of red...
figshare.com
plos.figshare.com
xls
Updated May 31, 2023
Cite
Jing Hu; WenYen Juan; Nadine R. Sahyoun (2023). Characteristics of NHANES 1999–2002 participants by quartiles (Q) of red blood cell (RBC) folate and dietary folate equivalents (DFE)1. [Dataset]. http://doi.org/10.1371/journal.pone.0148697.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0148697.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Jing Hu; WenYen Juan; Nadine R. Sahyoun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Characteristics of NHANES 1999–2002 participants by quartiles (Q) of red blood cell (RBC) folate and dietary folate equivalents (DFE)1.
Performance of 5-fold cross-validation of classification models -training...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2023). Performance of 5-fold cross-validation of classification models -training data (mean ± SD). [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t005
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0233336.t005
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of 5-fold cross-validation of classification models -training data (mean ± SD).
Key parameters of various machine learning models used for regression task.
plos.figshare.com
xls
Updated May 20, 2020
Cite
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2020). Key parameters of various machine learning models used for regression task. [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0233336.t001
Dataset updated
May 20, 2020
Dataset provided by
PLOS ONE
Authors
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Key parameters of various machine learning models used for regression task.
Performance of 5-fold cross-validation of predictive models (training data)....
figshare.com
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima (2023). Performance of 5-fold cross-validation of predictive models (training data). [Dataset]. http://doi.org/10.1371/journal.pone.0233336.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0233336.t004
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Prasanna Santhanam; Tanmay Nath; Faiz Khan Mohammad; Rexford S. Ahima
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Performance of 5-fold cross-validation of predictive models (training data).
Threshold analysis between DII and EM.
plos.figshare.com
xls
Updated Jun 21, 2023
Cite
Pan-Wei Hu; Bi-Rong Yang; Xiao-Le Zhang; Xiao-Tong Yan; Juan-Juan Ma; Cong Qi; Guo-Jing Jiang (2023). Threshold analysis between DII and EM. [Dataset]. http://doi.org/10.1371/journal.pone.0283216.t004
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0283216.t004
Dataset updated
Jun 21, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Pan-Wei Hu; Bi-Rong Yang; Xiao-Le Zhang; Xiao-Tong Yan; Juan-Juan Ma; Cong Qi; Guo-Jing Jiang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Endometriosis is a common chronic inflammatory and estrogen-dependent disease that mostly affects people of childbearing age. The dietary inflammatory index (DII) is a novel instrument for assessing the overall inflammatory potential of diet. However, no studies have shown the relationship between DII and endometriosis to date. This study aimed to elucidate the relationship between DII and endometriosis. Data were acquired from the National Health and Nutrition Examination Survey (NHANES) 2001–2006. DII was calculated using an inbuilt function in the R package. Relevant patient information was obtained through a questionnaire covering gynecological history. Based on an endometriosis questionnaire item, participants who answered yes were considered cases (with endometriosis), and those who answered no were considered controls (without endometriosis). Multivariate weighted logistic regression was applied to examine the correlation between DII and endometriosis. Subgroup analysis and smoothing curve fitting between DII and endometriosis were conducted as a further investigation. Compared to the control group, patients with endometriosis tended to have a higher DII (P = 0.014). Adjusted multivariate regression models showed that DII was positively correlated with the incidence of endometriosis (P < 0.05). Analysis of subgroups revealed no significant heterogeneity. In middle-aged and older women (age ≥ 35 years), the smoothing curve fitting results demonstrated a non-linear relationship between DII and the prevalence of endometriosis. Therefore, using DII as an indicator of dietary-related inflammation may help to provide new insight into the role of diet in the prevention and management of endometriosis.
Hazard ratios (HR) of overall cancer and 95% confidence intervals (95% CI)...
plos.figshare.com
xls
Updated Feb 22, 2016
Cite
Jing Hu; WenYen Juan; Nadine R. Sahyoun (2016). Hazard ratios (HR) of overall cancer and 95% confidence intervals (95% CI) by quartiles (Q) of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1,2, NHANES 1999–2002. [Dataset]. http://doi.org/10.1371/journal.pone.0148697.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0148697.t002
Dataset updated
Feb 22, 2016
Dataset provided by
PLOS http://plos.org/
Authors
Jing Hu; WenYen Juan; Nadine R. Sahyoun
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Hazard ratios (HR) of overall cancer and 95% confidence intervals (95% CI) by quartiles (Q) of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1,2, NHANES 1999–2002.
Association between gamma gap and all-cause mortality with gamma gap...
plos.figshare.com
xls
Updated Jun 1, 2023
Cite
Stephen P. Juraschek; Alison R. Moliterno; William Checkley; Edgar R. Miller III (2023). Association between gamma gap and all-cause mortality with gamma gap dichotomized at different cutpoints (Hazard Ratios, 95% CI). [Dataset]. http://doi.org/10.1371/journal.pone.0143494.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0143494.t002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS http://plos.org/
Authors
Stephen P. Juraschek; Alison R. Moliterno; William Checkley; Edgar R. Miller III
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note: Bold represents P < 0.05.
Model 1: adjusted for age, sex, race/ethnicity.
Model 2: adjusted for model 1 + estimated glomerular filtration rate, albuminuria, hypertension, smoking status, body mass index, total cholesterol, HDL-cholesterol, self-reported cancer, aspartate aminotransferase, alanine aminotransferase, total bilirubin, alkaline phosphatase, hepatitis B virus core IgG status, hepatitis C virus IgG status, C-reactive protein, white blood cell count, and serum albumin.
*Between percentiles; 0.5 was used to indicate that this was between percentiles.
Association between gamma gap and all-cause mortality with gamma gap dichotomized at different cutpoints (Hazard Ratios, 95% CI).
Nonparametric correlation coefficients between log transformed HOMA-IR and...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima (2023). Nonparametric correlation coefficients between log transformed HOMA-IR and HOMA-β. [Dataset]. http://doi.org/10.1371/journal.pone.0216900.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0216900.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Nonparametric correlation coefficients between log transformed HOMA-IR and HOMA-β.
Descriptive statistics of the study cohort.
plos.figshare.com
xls
Updated Jun 7, 2023
Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima (2023). Descriptive statistics of the study cohort. [Dataset]. http://doi.org/10.1371/journal.pone.0216900.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0216900.t001
Dataset updated
Jun 7, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Prasanna Santhanam; Steven P. Rowe; Jenny Pena Dias; Rexford S. Ahima
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Descriptive statistics of the study cohort.
R code for data analysis.
plos.figshare.com
txt
Updated Feb 19, 2025
Mi Zhou; Zikun Zhao; Xiaoran Li; Yanxuan Jin; Xinlei Hong; Haoning He; Mengchu Zhao; Xiaomei Song (2025). R code for data analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0311103.s001
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0311103.s001
Dataset updated
Feb 19, 2025
Dataset provided by
PLOS ONE
Authors
Mi Zhou; Zikun Zhao; Xiaoran Li; Yanxuan Jin; Xinlei Hong; Haoning He; Mengchu Zhao; Xiaomei Song
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Physical activity (PA) is important for students in secondary school; however, trends in PA among secondary school students have shown a significant decline. There is a need to understand the PA of middle school students.
Objective: The first objective is to identify the PA levels and screen time of students in middle school. The second objective is to examine the PA levels and screen time among students of different genders.
Methods: Participants from four consecutive two-year cycles of the National Health and Nutrition Examination Survey (NHANES; 2011–2012, 2013–2014, 2015–2016, and 2017–2018) were included in this study. A Spearman correlation model was used to identify the correlation between participants’ demographics, PA, and screen time data. A negative binomial regression model was used to describe students’ PA and screen time (dependent variables) in different grades (independent variables). Gender and age were taken as control variables.
Results: After data preprocessing, 2516 participants were included in this study. A significant correlation was found between grade and PA, instead of screen time. Negative binomial regression shows that students have the lowest PA in their transition year, grade 6, and that their screen time decreased as grade increased. Significant differences can be found across gender. Future efforts should focus on developing school transition support programs designed to improve PA.
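As a rough illustration of the Spearman step named in the Methods, the following Python sketch computes a rank correlation with average ranks for ties. It is not the authors' R code: the function names and the small `grade` and `pa_minutes` lists are made up for illustration and are not NHANES values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (NOT from NHANES): school grade and daily PA minutes.
grade = [6, 6, 7, 7, 8, 8, 9, 9]
pa_minutes = [30, 35, 50, 45, 60, 55, 70, 65]
rho = spearman_rho(grade, pa_minutes)
print(f"Spearman rho = {rho:.3f}")
```

Ranking before correlating makes the measure depend only on the ordering of the values, which suits ordinal variables such as grade.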
Hazard ratios of overall cancer incidence and 95% confidence intervals (95%...
figshare.com
plos.figshare.com
xls
Updated Jun 1, 2023
Jing Hu; WenYen Juan; Nadine R. Sahyoun (2023). Hazard ratios of overall cancer incidence and 95% confidence intervals (95% CI) by continuous levels of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1, NHANES 1999–2002. [Dataset]. http://doi.org/10.1371/journal.pone.0148697.t003
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0148697.t003
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Jing Hu; WenYen Juan; Nadine R. Sahyoun
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Hazard ratios of overall cancer incidence and 95% confidence intervals (95% CI) by continuous levels of red blood cell (RBC) folate, serum folate, and dietary folate equivalents (DFE) 1, NHANES 1999–2002.

Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v2

NHANES 1988-2018

Explore at:

69 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: application/gzip

Unique identifier

https://doi.org/10.6084/m9.figshare.21743372.v2

Dataset updated

Feb 18, 2025

Dataset provided by

figshare

Authors

Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet

License

Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The National Health and Nutrition Examination Survey (NHANES) provides data on the health and environmental exposure of the non-institutionalized US population. Such data have considerable potential to help understand how the environment and behaviors impact human health. These data are also currently leveraged to answer public health questions such as the prevalence of disease. However, these data first need to be processed before new insights can be derived through large-scale analyses. NHANES data are stored across hundreds of files with multiple inconsistencies. Correcting such inconsistencies takes systematic cross-examination and considerable effort but is required for accurately and reproducibly characterizing the associations between the exposome and diseases (e.g., cancer mortality outcomes). Thus, we developed a set of curated and unified datasets and accompanying code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data, and R markdown files with example code for calculating summary statistics and running regression models to help accelerate high-throughput analysis of the exposome and secular trends on cancer mortality.

csv Data Record: The curated NHANES datasets and the data dictionaries include 13 .csv files and 1 Excel file. The curated NHANES datasets comprise 10 .csv files, one for each module, labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. The eleventh file is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 4,740 variables in NHANES ("dictionary_nhanes.csv"). The 12th .csv file contains the harmonized categories for the categorical variables ("dictionary_harmonized_categories.csv"). The 13th file contains the dictionary of descriptors for the drug codes (“dictionary_drug_codes.csv”). The 14th file is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES datasets (“nhanes_inconsistencies_documentation.xlsx”).

R Data Record: For researchers who want to conduct their analysis in the R programming language, the curated NHANES datasets and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file. We provide an .RData file that contains all the aforementioned datasets as R data objects (“w - nhanes_1988_2018.RData”). This .RData file also makes available all the R scripts for the customized functions that were written to curate the data. We also provide an .R file that shows how we used the customized functions (i.e., our pipeline) to curate the data (“m - nhanes_1988_2018.R”).
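As a minimal sketch of how the data dictionary might be used to browse variables by module, the Python snippet below groups variable names by their module. The column headers (`variable_name`, `description`, `module`, `units`) and the three sample rows are assumptions for illustration only; the real header names in "dictionary_nhanes.csv" may differ and should be checked in the file itself.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for "dictionary_nhanes.csv". The column names and the
# rows are illustrative placeholders, not actual dictionary contents.
SAMPLE_DICTIONARY = """variable_name,description,module,units
EXAMPLE_AGE,Age of participant,demographics,years
EXAMPLE_LEAD,Blood lead concentration,chemicals,ug/dL
EXAMPLE_LEAD_COMMENT,Detection-limit comment for blood lead,comments,
"""


def variables_by_module(handle):
    """Group variable names by module from an open CSV file-like object."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["module"]].append(row["variable_name"])
    return dict(grouped)


modules = variables_by_module(io.StringIO(SAMPLE_DICTIONARY))
for module, names in modules.items():
    print(module, names)
```

With the real file, the `io.StringIO(...)` argument would be replaced by `open("dictionary_nhanes.csv", newline="")`, and the column keys adjusted to match the actual header row.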
