This dataset was created by sciencestoked
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by JORGE GARCIA-INIGUEZ
Released under MIT
Biostatistics Using R: A Laboratory Manual was created with the goals of bringing biological content to lab sessions through authentic research data and of introducing the R programming language. Chapter 2 introduces sampling, accuracy, and precision.
License: GPL-v2. The R script presents an advanced sampling approach for monitoring biodiversity on agricultural land by combining multiple objectives and integrating environmental and geographic space. The example demonstrates the first-stage selection of squares (km²) in the ALL-EMA sampling design using modern sampling techniques such as unequal probability sampling with fixed sample size, balanced sampling, stratified balancing and geographic spreading. Sampling is done with unequal probabilities and weights defined by power allocation to give equal weight to extrapolations to the total agricultural area of Switzerland and two stratifications of predefined interest (regions and agricultural production zones). Calibration is used to limit the distribution of the sampling weights. The sample sizes are almost fixed within the strata and evenly distributed across the years of a temporal rotation plan, which is favourable for the organisation of the field survey. Sampling also ensures an optimal (annual) distribution across geographic space, including altitude. Despite the complexity of the sampling, estimation based on probability theory is straightforward. Ecker, K.T., Meier, E.S. & Tillé, Y. 2023. Integrating spatial and ecological information into comprehensive biodiversity monitoring on agricultural land. Environmental Monitoring and Assessment 195.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary statistics of population and samples taken at different sampling schemes for n = 4, r = 1.
SSP (simulation-based sampling protocol) is an R package that uses simulations of ecological data and dissimilarity-based multivariate standard error (MultSE) as an estimator of precision to evaluate the adequacy of different sampling efforts for studies that will test hypotheses using permutational multivariate analysis of variance. The procedure consists of simulating several extensive data matrices that mimic some of the relevant ecological features of the community of interest using a pilot data set. For each simulated data set, several sampling efforts are repeatedly executed and MultSE is calculated. The mean value and the 0.025 and 0.975 quantiles of MultSE for each sampling effort across all simulated data sets are then estimated and standardized relative to the lowest sampling effort. The optimal sampling effort is identified as that in which the increase in sampling effort does not improve the highest MultSE beyond a threshold value (e.g. 2.5 %). The performance of SSP was validated using real dat...
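The resampling idea described above can be sketched in a few lines of Python. This is only an illustration, not the SSP package's implementation: it assumes Euclidean distance (ecological studies usually use other dissimilarities), assumes MultSE is computed as sqrt(V / n) with V the pseudo-variance obtained from pairwise squared distances, and uses the hypothetical helper names `multse` and `mean_multse_by_effort`.

```python
import math
import random
from itertools import combinations


def multse(samples):
    """Multivariate standard error from pairwise Euclidean distances.

    Assumed definition: SS = sum of squared pairwise distances / n,
    V = SS / (n - 1), MultSE = sqrt(V / n).
    """
    n = len(samples)
    ss = sum(math.dist(a, b) ** 2 for a, b in combinations(samples, 2)) / n
    v = ss / (n - 1)
    return math.sqrt(v / n)


def mean_multse_by_effort(data, efforts, repeats=200, seed=1):
    """Average MultSE over repeated random subsamples of each size."""
    rng = random.Random(seed)
    result = {}
    for n in efforts:
        values = [multse(rng.sample(data, n)) for _ in range(repeats)]
        result[n] = sum(values) / repeats
    return result


# Toy "simulated community": 200 sampling units, 2 variables (hypothetical data).
_rng = random.Random(42)
toy_data = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(200)]
effort_curve = mean_multse_by_effort(toy_data, efforts=[5, 10, 25, 50])
```

With Euclidean distance and a single variable, this quantity reduces to the ordinary standard error of the mean, which is a convenient sanity check for the sketch.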
Lead concentrations in drinking water samples collected under various sampling protocols in homes with lead service lines and in homes without lead service lines in two US cities. This dataset is associated with the following publication: Lytle, D., M. Urbanic, A. Paul, R. Achtemeier, A. Lewis, S. Hammaker, A. Estep, M. Nadagouda, R. James, and S. Triantafyllidou. Alternative approaches to lead sampling in drinking water: A comparative study of homes with and without lead service lines in two cities. WATER RESEARCH. Elsevier Science Ltd, New York, NY, USA, 994: 180063, (2025).
Generalized distance sampling (GDS) models are the distance sampling equivalent of temporary emigration N-mixture models. In addition to density and the perceptibility component of detection, both contain an additional parameter for availability for detection which becomes estimable when data from repeated 'visits' are available. GDS models thus account for open populations. This makes them more robust, since natural populations are hardly ever perfectly closed, arguably even over the course of a single breeding season. However, the performance of these models has not been tested thoroughly, and prior (unpublished) analyses suggested that biased estimates, especially for density (high) and availability (low), may typically occur under certain conditions. We conducted three simulation studies and found that bias arises in low-information scenarios, particularly with low sample sizes and low parameter values. Our simulations enable us to determine "estimation frontiers", which separate sa...
Title of Dataset: Performance of generalized distance sampling models with temporary emigration: a simulation study
The study was not based on real data. All data used in the study were generated using simulation code.
The dataset contains four R files with simulation codes:
First, run Code_1. The other codes are independent, but the first simul...
This data release presents calculated accumulated wastewater (ACCWW, as a percent of total streamflow) values for 43 National Hydrologic Dataset Version 2.1 (NHDPlus V2.1) stream segments coinciding with long-term smallmouth bass sampling locations (Table 1) in the Shenandoah River Watershed (encompassing parts of Virginia and West Virginia, USA). Values are calculated for quarter-year (Quarter 1 [Q1], January-March; Quarter 2 [Q2], April-June; Quarter 3 [Q3], July-September; Quarter 4 [Q4], October-December) time scales (Table 2) and annual time scales (Table 3) for years 2000 to 2018. Estimates at a stream segment represent the combined total upstream wastewater discharges as well as direct discharges into the stream segment. Any users of these data should review the entire metadata record and the associated manuscript (see Larger Work Citation). See 'Distribution Liability' statements for more information.
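As a rough illustration of how such an accumulated percentage could be computed, the sketch below sums direct discharges over all upstream segments of a small, hypothetical stream network and divides by streamflow. It assumes a simple tree-shaped network, no loss of wastewater along the way, and discharges and streamflow in the same units; the function name and the toy values are invented for the example and are not taken from the data release.

```python
def accumulated_wastewater_percent(downstream, direct_discharge, streamflow):
    """Return {segment: ACCWW in percent of streamflow}.

    downstream: {segment: next segment downstream, or None at the outlet}
    direct_discharge: {segment: wastewater discharged directly into it}
    streamflow: {segment: total streamflow, same units as the discharges}
    """
    # Invert the network so each segment knows its immediate upstream neighbours.
    upstream = {seg: [] for seg in downstream}
    for seg, nxt in downstream.items():
        if nxt is not None:
            upstream[nxt].append(seg)

    totals = {}

    def total(seg):
        # Direct discharge plus everything accumulated from upstream segments.
        if seg not in totals:
            totals[seg] = direct_discharge.get(seg, 0.0) + sum(
                total(u) for u in upstream[seg]
            )
        return totals[seg]

    return {seg: 100.0 * total(seg) / streamflow[seg] for seg in downstream}


# Hypothetical three-segment network: "a" and "b" both flow into "c".
toy_network = {"a": "c", "b": "c", "c": None}
toy_discharge = {"a": 1.0, "b": 0.0, "c": 2.0}
toy_flow = {"a": 10.0, "b": 5.0, "c": 20.0}
accww = accumulated_wastewater_percent(toy_network, toy_discharge, toy_flow)
```

Running the same function once per quarter or per year, with the matching discharge and streamflow values, would give tables at the two time scales the description mentions.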
Emory University (analyzed the urine samples for pyrethroid metabolites). This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Contact Researcher. Format: Pyrethroid metabolite concentration data for 50 adults over six weeks. This dataset is associated with the following publication: Morgan, M., J. Sobus, D.B. Barr, C. Croghan, F. Chen, R. Walker, L. Alston, E. Andersen, and M. Clifton. Temporal variability of pyrethroid metabolite levels in bedtime, morning, and 24-hr urine samples for 50 adults in North Carolina. ENVIRONMENT INTERNATIONAL. Elsevier Science Ltd, New York, NY, USA, 144: 81-91, (2015).
*n is the number of days in which samples were collected at each site on the same day.
The U.S. Geological Survey, in cooperation with the University of South Florida and Eckerd College, completed a bathymetric, side-scan sonar, high-resolution seismic-reflection, and surface sediment sampling survey of the inner shelf environment along the western Florida coast. The survey area extends 15 km from Sarasota Point to Buttonwood Harbor. This study is part of a larger program initiated by the U.S. Geological Survey to map the geologic framework and monitor the modern processes that affect the western Florida coastal zone. This portion of the project included reconnaissance high-resolution seismic and side-scan sonar surveys of the entire study area, detailed mapping to identify patterns of hard grounds and sediment cover, and coring of sediments to document the historical development of the inner shelf and coastal system.
Inferences of population structure, and more precisely the identification of genetically homogeneous groups of individuals, are essential to the fields of ecology, evolutionary biology, and conservation biology. Such population structure inferences are routinely investigated via the program STRUCTURE, which implements a Bayesian algorithm to identify groups of individuals at Hardy-Weinberg and linkage equilibrium. While the method performs relatively well under various population models with even sampling between subpopulations, its robustness to uneven sample size between subpopulations and/or hierarchical levels of population structure has not yet been tested, even though both are commonly encountered in empirical datasets. In this study, I used simulated and empirical microsatellite datasets to investigate the impact of uneven sample size between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that u...
Fish caught on NOAA R/V Townsend Cromwell cruises from 1982 to 1998 and NOAA R/V Oscar E Sette cruises in 2007 and 2009 were measured and/or weighed, and their sex was determined. Specimen samples were also preserved from selected fishes.
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
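The two-stage design described above can be sketched as follows. This is not the R script distributed with the dataset: the function names, the largest-remainder rounding, the simple random selection of enumeration areas within each stratum, and the layout of the sampling frame are all assumptions made for illustration.

```python
import random


def allocate_eas(stratum_sizes, n_households=8000, per_ea=25):
    """Split the enumeration areas (EAs) across strata proportionally to size.

    Rounding uses the largest-remainder rule (an assumption) so that the
    allocations add up exactly to n_households // per_ea.
    """
    total_eas = n_households // per_ea
    total_size = sum(stratum_sizes.values())
    quotas = {s: total_eas * size / total_size for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}
    leftover = total_eas - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def draw_sample(frame, alloc, per_ea=25, seed=0):
    """Two-stage draw from frame = {stratum: {ea_id: [household_ids]}}.

    Stage 1: simple random sample of EAs in each stratum (assumed).
    Stage 2: simple random sample of per_ea households in each selected EA.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, n_eas in alloc.items():
        for ea in rng.sample(sorted(frame[stratum]), n_eas):
            sample.extend(rng.sample(frame[stratum][ea], per_ea))
    return sample
```

With the defaults, 8,000 households at 25 per enumeration area correspond to 320 enumeration areas to be shared among the strata defined by geo_1 and urban/rural.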
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks against which synthetic observations were assessed and rejected or replaced when needed). Some post-processing was also applied to the data to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Default sim_abundance function call, with descriptions, default values and associated parameter symbols of key arguments.
Sampling localities, equilibrium simulations, and simulations with varying r.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R code for simulating sampling strategies. Description: R code that creates an exemplary data set and simulates the sampling strategies. (R, 26 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises analysis results (major ions, N2, and Ar concentrations) from groundwater samples taken in the Lower Rhine Embayment (Germany) to assess the impact of sampling methods on N2, Ar, and excess-N2 concentrations. The data is used in the manuscript "Comparing Groundwater Sampling Devices for Denitrification Assessment using the N2/Ar Method" by Felix Fahrenbach and Thomas R. Rüde, which is currently undergoing review by Groundwater.
The libraries tidyverse (Wickham et al. 2019), psych (Revelle 2014), car (Fox and Weisberg 2019), rstatix (Kassambara 2023), and PMCMRplus (Pohlert 2024) need to be installed to run the R scripts. Running the Python scripts requires the following packages: numpy (Harris et al. 2020), pandas (McKinney 2010), scipy (Virtanen et al. 2020), statsmodels (Seabold and Perktold 2010), and matplotlib (Hunter 2007).
References
Fox, J., and S. Weisberg. 2019. An R Companion to Applied Regression. 3rd ed. Thousand Oaks CA: Sage, https://www.john-fox.ca/Companion/.
Harris, C. R., K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, et al. 2020. Array programming with NumPy. Nature 585, no. 7825: 357–62, https://doi.org/10.1038/s41586-020-2649-2.
Hunter, J. D. 2007. Matplotlib: A 2D graphics environment. Computing in Science & Engineering 9, no. 3: 90–95, https://doi.org/10.1109/MCSE.2007.55.
Kassambara, A. 2023. rstatix: Pipe-Friendly Framework for Basic Statistical Tests, https://rpkgs.datanovia.com/rstatix/.
McKinney, W. 2010. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, edited by S. van der Walt and J. Millman, 56–61, https://doi.org/10.25080/Majora-92bf1922-00a.
Pohlert, T. 2024. PMCMRplus: Calculate Pairwise Multiple Comparisons of Mean Rank Sums Extended, https://CRAN.R-project.org/package=PMCMRplus.
Revelle, W. 2014. psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University, https://CRAN.R-project.org/package=psych.
Seabold, S., and J. Perktold. 2010. statsmodels: Econometric and statistical modeling with python. In Proceedings of the 9th Python in Science Conference.
Virtanen, P., R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, et al. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in python. Nature Methods 17: 261–72, https://doi.org/10.1038/s41592-019-0686-2.
Wickham, H., M. Averick, J. Bryan, W. Chang, L. McGowan, R. François, G. Grolemund, et al. 2019. Welcome to the Tidyverse. Journal of Open Source Software 4, no. 43: 1686, https://doi.org/10.21105/joss.01686.
The science party maintained a sampling event log, recording all instrument deployments and significant events during the 2010 RAPID_I cruise aboard the R/V WEATHERBIRD II (WB1105). Refer to comments column for additional information.
This dataset was created by sciencestoked