A simple and robust non-linear method is presented for normalization using array signal distribution analysis and cubic splines. Both the regression and spline-based methods described performed better than existing linear methods when assessed on the variability of replicate arrays.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The considerable volume of data generated by sensors in the field presents systematic errors; thus, it is extremely important to exclude these errors to ensure mapping quality. The objective of this research was to develop and test a methodology to identify and exclude outliers in high-density spatial data sets and to determine whether the developed filter process could help decrease the nugget effect and improve the spatial variability characterization of high sampling data. We created a filter composed of a global, anisotropic, and an anisotropic local analysis of data, which considered the respective neighborhood values. For that purpose, we used the median as the main statistical parameter to classify a given spatial point within the data set, taking into account its neighbors within a radius. The filter was tested using raw data sets of corn yield, soil electrical conductivity (ECa), and the sensor vegetation index (SVI) in sugarcane. The results showed an improvement in accuracy of spatial variability within the data sets. The methodology reduced RMSE by 85%, 97%, and 79% in corn yield, soil ECa, and SVI, respectively, compared to interpolation errors of raw data sets. The filter excluded the local outliers, which considerably reduced the nugget effects, reducing estimation error of the interpolated data. The methodology proposed in this work performed better in removing outlier data than two other methodologies from the literature.
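The local, median-based step described in this abstract can be illustrated with a minimal sketch. It is not the authors' filter: the global and anisotropic stages are omitted, and the function name `local_median_filter`, the `radius` units, and the relative-deviation rule `max_rel_dev` are all assumptions made for illustration.

```python
from math import hypot
from statistics import median

def local_median_filter(points, radius, max_rel_dev):
    """Keep points whose value is close to the median of their neighbors.

    points: list of (x, y, value) tuples.
    radius: neighborhood radius, in the same units as x and y (assumed).
    max_rel_dev: allowed relative deviation from the local median (assumed rule).
    """
    kept = []
    for i, (x, y, v) in enumerate(points):
        # Values of all other points that lie within the search radius.
        neighbors = [
            nv for j, (nx, ny, nv) in enumerate(points)
            if j != i and hypot(nx - x, ny - y) <= radius
        ]
        if not neighbors:
            kept.append((x, y, v))  # nothing to compare against, so keep it
            continue
        m = median(neighbors)
        if abs(v - m) <= max_rel_dev * abs(m):
            kept.append((x, y, v))
    return kept

# Hypothetical yield readings on a small grid; one value is far from its neighbors.
readings = [
    (0, 0, 10.0), (1, 0, 10.5), (2, 0, 10.1),
    (0, 1, 9.8), (1, 1, 30.0), (2, 1, 10.2),
]
filtered = local_median_filter(readings, radius=1.5, max_rel_dev=0.25)
```

Because the comparison uses the median rather than the mean, a single extreme neighbor has limited influence on the decision for the points around it.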
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed data set analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, data sets, or databases. The average TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.
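The idea of target-decoy competition, and of reducing decoy-induced variability by using several decoy sets, can be sketched as follows. This is a simplified illustration only: it takes a naive mean of per-decoy-set estimates, which is not the published aTDC protocol and not the Crux implementation. The function names, the "higher score is better" convention, and the tie-breaking rule are assumptions.

```python
def tdc_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold via target-decoy competition.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    match scores for spectrum i (higher is better; assumed convention).
    """
    target_wins = 0
    decoy_wins = 0
    for t, d in zip(target_scores, decoy_scores):
        target_won = t >= d  # ties go to the target (assumption)
        best = t if target_won else d
        if best >= threshold:
            if target_won:
                target_wins += 1
            else:
                decoy_wins += 1
    if target_wins == 0:
        return 0.0  # no discoveries at this threshold
    return min(1.0, decoy_wins / target_wins)

def mean_fdr_over_decoy_sets(target_scores, decoy_score_sets, threshold):
    """Naive average of TDC estimates over several independent decoy sets."""
    estimates = [tdc_fdr(target_scores, d, threshold) for d in decoy_score_sets]
    return sum(estimates) / len(estimates)

# Hypothetical scores for four spectra and two shuffled decoy databases.
targets = [10, 8, 6, 4]
decoys_a = [3, 9, 2, 1]
decoys_b = [1, 2, 3, 2]
single = tdc_fdr(targets, decoys_a, threshold=5)
averaged = mean_fdr_over_decoy_sets(targets, [decoys_a, decoys_b], threshold=5)
```

In this toy input the two decoy sets give different single-run estimates for the same targets, which is the variability the abstract describes.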
This dataset provides salinity measurements collected from water bodies along 17 east-west transects along the lower Kashunuk River, Yukon-Kuskokwim Delta National Wildlife Refuge, 25 June - 30 July 1993.
The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently, datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking, the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets project is also supported by the Standard Reference Data Program.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lower township by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Lower township across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of female population, with 51.13% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower township Population by Gender.
This dataset tracks the updates made on the dataset "A new non-linear normalization method for reducing variability in DNA microarray experiments" as a repository for previous versions of the data and metadata.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this article, we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit, we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a one-step boosted forest. We show with simulated and real data that the one-step boosted forest has a reduced bias compared to the original random forest. The article also provides a variance estimate of the one-step boosted forest by an extension of the infinitesimal Jackknife estimator. Using this variance estimate, we can construct prediction intervals for the boosted forest and we show that they have good coverage probabilities. Combining the bias reduction and the variance estimate, we show that the one-step boosted forest has a significant reduction in predictive mean squared error and thus an improvement in predictive performance. When applied on datasets from the UCI database, one-step boosted forest performs better than random forest and gradient boosting machine algorithms. Theoretically, we can also extend such a boosting process to more than one step and the same principles outlined in this article can be used to find variance estimates for such predictors. Such boosting will reduce bias even further but it risks over-fitting and also increases the computational burden. Supplementary materials for this article are available online.
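The two-stage construction described above (fit a model, fit a second model to its residuals, and sum the two) can be sketched in a few lines. To stay self-contained, this sketch swaps the random forest for a one-split regression stump on 1-D inputs; the stump, the function names `fit_stump` and `one_step_boost`, and the toy data are assumptions for illustration, and the variance estimate discussed in the article is not covered.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs (stand-in base learner)."""
    best = None
    # Candidate splits leave at least one point on each side.
    for split in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < split]
        right = [y for x, y in zip(xs, ys) if x >= split]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda x: lm if x < split else rm

def one_step_boost(fit, xs, ys):
    """Return (base, boosted): a base fit and base plus a fit to its residuals."""
    base = fit(xs, ys)
    residuals = [y - base(x) for x, y in zip(xs, ys)]
    correction = fit(xs, residuals)
    return base, (lambda x: base(x) + correction(x))

def sse(model, xs, ys):
    """Sum of squared errors of a model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Toy data: a linear trend that a single stump cannot capture well.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
base, boosted = one_step_boost(fit_stump, xs, ys)
```

Any regressor exposing the same `fit(xs, ys) -> predictor` shape could be passed in place of `fit_stump`, including a wrapper around a random forest implementation.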
Coding Explanations: Explanations of why the selection gradients were coded as either constructed, non-constructed, or mixed.
Temporal Analysis: Selection gradients used in the temporal analysis from Clark et al. 2019 in The American Naturalist: Niche construction affects the variability and strength of selection.
Spatial Analysis: Selection gradients used in the spatial analysis from Clark et al. 2019 in The American Naturalist: Niche construction affects the variability and strength of selection.
Combined Analysis: Selection gradients used in the combined analysis from Clark et al. 2019 in The American Naturalist: Niche construction affects the variability and strength of selection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
Meta-analysis is a powerful means for leveraging the hundreds of experiments being run worldwide into more statistically powerful analyses. This is also true for the analysis of omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated using the Illumina 450k are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, therefore only pre-processed values (obtained after computational manipulation) are available. Pre-processing is possibly different among studies and may then affect meta-analysis by introducing non-biological sources of variability.
Material and methods
To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine among the most common pipelines found in literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing (9 × 9), we first evaluated the between-sample variability among control subjects and, then, we identified genomic positions that are differentially methylated between cases and controls (differential analysis).
Results and conclusion
The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and the effects of pre-processing are strongly dataset-dependent. By contrast, application of our renormalization algorithm regRCPqn: (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.
The goal of this research is to reduce the lead content in sweetpotato through development of varieties that have concentrations below the action levels for lead in processed food intended for babies and young children in guidance for industry set forth by the Food and Drug Administration (FDA-2022-D-0278) in January 2025. This dataset provides the concentration of lead (ppb) accumulation in sweetpotato root tissue (flesh) for 10 genotypes that were grown in sand and treated with a nutrient solution containing 10 ppm of lead. For determination of lead concentration in the flesh, the skins were removed prior to elemental analyses with a single quadrupole inductively coupled plasma mass spectrometer (Agilent 7900 ICP-MS). All experiments were arranged in a randomized complete block design (RCBD) with three replications. This research demonstrates that genotype-specific variability of lead accumulation exists in U.S. sweetpotato germplasm and can be used for development of new varieties that have low levels of lead to ensure a safe source of food for human consumption.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The use of multiple imputation (MI) is becoming increasingly popular for addressing missing data. Although some conventional MI approaches have been well studied and have shown empirical validity, they have limitations when processing large datasets with complex data structures. Their imputation performances usually rely on the proper specification of imputation models, and this requires expert knowledge of the inherent relations among variables. Moreover, these standard approaches tend to be computationally inefficient for medium and large datasets. In this article, we propose a scalable MI framework mixgb, which is based on XGBoost, subsampling, and predictive mean matching. Our approach leverages the power of XGBoost, a fast implementation of gradient boosted trees, to automatically capture interactions and nonlinear relations while achieving high computational efficiency. In addition, we incorporate subsampling and predictive mean matching to reduce bias and to better account for appropriate imputation variability. The proposed framework is implemented in an R package mixgb. Supplementary materials for this article are available online.
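Predictive mean matching, one of the components named above, can be sketched generically. This is not the mixgb implementation: the function name `pmm_impute`, the donor count `k`, and the use of absolute distance between predictions are assumptions, and the predictions are taken as given rather than produced by XGBoost.

```python
import random

def pmm_impute(pred_observed, y_observed, pred_missing, k=5, rng=None):
    """Impute by predictive mean matching.

    pred_observed: model predictions for cases where y is observed.
    y_observed: the observed y values for those same cases.
    pred_missing: model predictions for cases where y is missing.
    k: number of closest donors to sample from (assumed default).
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    imputed = []
    for p in pred_missing:
        # Rank observed cases by how close their prediction is to p.
        donors = sorted(
            range(len(y_observed)),
            key=lambda i: abs(pred_observed[i] - p),
        )[:k]
        # Borrow the observed value of one randomly chosen donor.
        imputed.append(y_observed[rng.choice(donors)])
    return imputed

# Hypothetical predictions and observed values.
pred_obs = [1.0, 2.0, 3.0]
y_obs = [1.1, 1.9, 3.2]
nearest = pmm_impute(pred_obs, y_obs, pred_missing=[2.1, 0.5], k=1)
sampled = pmm_impute(pred_obs, y_obs, pred_missing=[2.1, 0.5], k=2)
```

Because imputed values are always drawn from observed values, the imputations stay within the range of the data, and sampling among several donors carries some of the variability that a single predicted mean would hide.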
All data files are in Excel format. Files with CSU in their names contain qPCR results from different mesocosms for the vitellogenin gene and 18S, a housekeeping gene. Data files labelled ORD are qPCR data generated by NERL Cincinnati. Those labelled R5 are qPCR data generated by EPA's Region 5 lab, and those labelled RMI_Mass are qPCR data generated by the University of Massachusetts Amherst. This dataset is associated with the following publication: Jastrow, A., D. Gordon, K. Auger, E. Punska, K. Arcaro, K. Keteles, D. Winkleman, D. Lattier, A. Biales, and J. Lazorchak. Tools to minimize interlaboratory variability in vitellogenin gene expression monitoring programs. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(11): 3102-3107, (2017).
Three wells in New Hampshire were sampled bimonthly over three years to evaluate the temporal variability of arsenic concentrations and groundwater age. All samples had measurable concentrations of arsenic throughout the entire sampling period, and concentrations in individual wells varied, on average, by more than 7 µg/L. High arsenic concentrations (>10 µg/L) were measured in wells KFW-87 and SGW-93, consistent with the high pH and low dissolved oxygen typically found in bedrock wells. Lower arsenic concentrations (<10 µg/L) at well SGW-65 were consistent with lower pH typical of the glacial aquifer. The well producing the oldest water, public bedrock well SGW-93, was not the well with the highest arsenic concentrations; however, the groundwater age generally increased at this well over time with arsenic concentrations. Arsenic concentrations at the private bedrock well, KFW-87, which had the highest concentrations among the three wells, covaried with groundwater depth (rho=-0.53, p=0.029), suggesting flushing during recharge events. Arsenic concentrations in the public supply wells, SGW-93 and SGW-65, correlated significantly with one another by sample date (rho=0.77, p<0.001). Similarly, the old fraction of water in the public glacial well and the young fraction of water in the public bedrock well correlated significantly by sampling date, suggesting that some of the water captured by the glacial aquifer well may originate in the bedrock aquifer, as no other drivers of arsenic variability were observed in the glacial well. A direct relation between groundwater age and arsenic could not be determined with the available data and age model results for this time period at any of the wells; however, pumping rate and depth to water appeared to be indicative of arsenic concentration changes over time.
This data release documents four Microsoft Excel tables that contain data for understanding arsenic variability related to three water-supply wells in southeast New Hampshire. Table_1_GF_AgeInterpretations.xlsx contains dissolved gas modeling results, environmental tracer concentrations (tritium, tritiogenic helium-3, sulfur hexafluoride, carbon-14, and chlorofluorocarbons (CFCs)), and results for the mean age of groundwater by calibration of lumped parameter models to tracer concentrations (Jurgens and others, 2012). Dissolved gas modeling and environmental tracer results were averaged when multiple dissolved gas models and tracer concentrations were computed in tables 2 and 3. In cases where age was modeled with a binary lumped parameter model (BMM), the mean age was computed from the mean age and fraction of the two components in the mixture. Please see the processing steps below and the main manuscript for additional details on the results presented in this table. Table_2_GF_DissolvedGasModeling.xlsx contains detailed information on the calibration of dissolved gas models to dissolved gas concentrations (neon, argon, krypton, xenon, nitrogen, oxygen, carbon dioxide, methane, hydrogen, and nitrous oxide). Calibration was done using methods described by Aeschbach-Hertig and others (1999) with modifications to include nitrogen gas (Weiss 1970). In most cases, a single set of noble gas data (neon, argon, krypton, and xenon) were used to determine recharge conditions (recharge temperature, excess air or entrapped air, and fractionation). In cases where noble gas data were not available, multiple analyses of nitrogen and argon (collected sequentially on the same sample date) were used to determine recharge conditions. Table_3_GF_ComputedTracerConcentrations.xlsx contains detailed information on calculations of environmental tracer data. 
Dissolved gas models were paired with sulfur hexafluoride and helium isotopes (3He/4He) and helium to determine concentrations of tritiogenic helium-3 (from decay of tritium; Solomon and Cook, 2000). Multiple tracer concentrations were computed when sites had multiple dissolved gas model results and analyses for sulfur hexafluoride or helium isotopes. Table_4_GF_ConcentrationsAndValues.xlsx contains values of selected physicochemical parameters collected during well purging and selected chemical concentrations from filtered samples collected on various dates at each well. The table also contains physical characteristics of each well (depth to water and pumping rate) that were calculated from continuous data. Depth to water was calculated as the minimum monthly value at KFW-87, and pumping rate was calculated as the arithmetic mean between each sampling date at SGW-65 and SGW-93.
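The description of Table 1 notes that, for a binary lumped parameter model (BMM), the mean age was computed from the mean age and fraction of the two components. A minimal sketch of that arithmetic, assuming a simple fraction-weighted average (the data release does not spell out the formula, and the function and parameter names here are hypothetical):

```python
def binary_mixture_mean_age(fraction_first, age_first, age_second):
    """Fraction-weighted mean age of a two-component mixture (assumed formula).

    fraction_first: fraction of the first component, between 0 and 1.
    age_first, age_second: mean ages of the two components, in years.
    """
    if not 0.0 <= fraction_first <= 1.0:
        raise ValueError("fraction_first must be between 0 and 1")
    return fraction_first * age_first + (1.0 - fraction_first) * age_second

# Hypothetical mixture: 25% young water (10 years) and 75% old water (50 years).
mixture_age = binary_mixture_mean_age(0.25, 10.0, 50.0)
```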
This dataset contains deidentified subject level data from the study titled: Responses to Exposure to Low Levels of Concentrated Ambient Particles in Healthy Young Adults (RECAP). Subject, exposure, and health endpoint data are included in the dataset. Health endpoint data includes inflammatory, heart rate variability and cardiac repolarization, lung function, blood chemistry, and lipids measures. This dataset is associated with the following publication: Wyatt, L., R. Devlin, A. Rappold, and M. Case. Low levels of fine particulate matter increase vascular damage and reduce pulmonary function in young healthy adults. Particle and Fibre Toxicology. BioMed Central Ltd, London, UK, 17(1): 58, (2020).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lower Burrell by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Lower Burrell across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of male population, with 50.3% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Burrell Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lower Salem by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Lower Salem across both sexes and to determine which sex constitutes the majority.
Key observations
There is a majority of female population, with 64.75% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Salem Population by Gender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lower Allen township by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Lower Allen township across both sexes and to determine which sex constitutes the majority.
Key observations
There is a majority of male population, with 59.12% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Allen township Population by Gender.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lower Yoder township by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Lower Yoder township across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of male population, with 50.39% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lower Yoder township Population by Gender.