88 datasets found
  1. Data from: Separating Measurement Error and Signal in Environmental Data:...

    • acs.figshare.com
    xlsx
    Updated Oct 5, 2023
    Marschall Furman; Kent W. Thomas; Barbara Jane George (2023). Separating Measurement Error and Signal in Environmental Data: Use of Replicates to Address Uncertainty [Dataset]. http://doi.org/10.1021/acs.est.3c02231.s003
    Available download formats
    xlsx
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    ACS Publications
    Authors
    Marschall Furman; Kent W. Thomas; Barbara Jane George
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Measurement uncertainty has long been a concern in characterizing and interpreting environmental and toxicological measurements. We compared statistical analysis approaches for data with replicates: a Naïve approach that omits replicates, a Hybrid approach that inappropriately treats replicates as independent samples, and a Measurement Error Model (MEM) approach, a random effects analysis of variance (ANOVA) model that appropriately incorporates replicates. A simulation study assessed the effects of sample size, level of replication, signal variance, and measurement error on estimates from the three approaches. MEM results were superior overall: confidence intervals for the observed mean were narrower on average than those from the Naïve approach, giving improved characterization. The MEM approach also has the unique advantage of estimating signal variance and measurement error variance separately, directly addressing measurement uncertainty. These MEM estimates were approximately unbiased on average with more replication and larger sample sizes. Case studies illustrated the analysis of normally distributed arsenic and log-normally distributed chromium concentrations in tap water, and the calculation of MEM confidence intervals for the true, latent signal mean and latent signal geometric mean (i.e., with measurement error removed). MEM estimates are also valuable for study planning; we used simulation to compare various sample sizes and levels of replication.
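    The separation of signal and measurement error can be illustrated with the classical method-of-moments (ANOVA) estimators for a balanced one-way random-effects model, y_ij = μ + s_i + e_ij, where s_i is the latent signal of sample i and e_ij the measurement error of replicate j. This is a minimal sketch assuming an equal number of replicates per sample; the function name `mem_variance_components` is hypothetical and this is not the authors' code. For contrast it also returns a "Hybrid" standard error that treats all n·r replicates as independent.

```python
import numpy as np

def mem_variance_components(y):
    """Method-of-moments estimates for a balanced one-way random-effects ANOVA.

    y[i, j] is replicate j of sample i (n samples x r replicates).
    """
    y = np.asarray(y, dtype=float)
    n, r = y.shape
    if n < 2 or r < 2:
        raise ValueError("need at least 2 samples with at least 2 replicates each")
    sample_means = y.mean(axis=1)
    grand_mean = sample_means.mean()
    # Between-sample mean square: estimates r * signal variance + error variance
    msb = r * np.sum((sample_means - grand_mean) ** 2) / (n - 1)
    # Within-sample mean square: scatter among replicates = measurement error
    msw = np.sum((y - sample_means[:, None]) ** 2) / (n * (r - 1))
    return {
        "mean": grand_mean,
        "signal_var": max((msb - msw) / r, 0.0),  # truncated at zero
        "error_var": msw,
        "se_mean": np.sqrt(msb / (n * r)),  # SE of the mean under the MEM
        # "Hybrid" SE: wrongly treats all n*r replicates as independent
        "se_hybrid": y.std(ddof=1) / np.sqrt(n * r),
    }

# Simulated example: signal SD 2, measurement-error SD 1, 3 replicates per sample
rng = np.random.default_rng(0)
signal = rng.normal(10.0, 2.0, size=(200, 1))
data = signal + rng.normal(0.0, 1.0, size=(200, 3))
print(mem_variance_components(data))
```

    With 200 samples the signal and error variance estimates should land near 4 and 1. The Hybrid standard error is typically smaller than the MEM one because replicates of the same sample share a signal and are therefore positively correlated.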

  2. DataSheet_1_Large Sample Size Fallacy in Trials About Antipsychotics for...

    • figshare.com
    docx
    Updated May 30, 2023
    Tessa A. Hulshof; Sytse U. Zuidema; Sarah I. M. Janus; Hendrika J. Luijendijk (2023). DataSheet_1_Large Sample Size Fallacy in Trials About Antipsychotics for Neuropsychiatric Symptoms in Dementia.docx [Dataset]. http://doi.org/10.3389/fphar.2019.01701.s001
    Available download formats
    docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Tessa A. Hulshof; Sytse U. Zuidema; Sarah I. M. Janus; Hendrika J. Luijendijk
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Atypical antipsychotics for neuropsychiatric symptoms in dementia have been tested in much larger trials than the older conventional drugs. The advantage of larger sample sizes is that false-negative findings become less likely and effect estimates become more precise. However, as sample sizes increase, trials also become more expensive and time-consuming, while exposing more patients to drugs with unknown safety profiles. Moreover, a large sample size might yield a statistically significant effect that is not clinically relevant.

    Objective: To assess (1) the variation in sample size and sample size calculations of antipsychotic trials in dementia, (2) the size of reported treatment effects and their statistical significance, and (3) general study characteristics that might be related to sample size.

    Study Design and Setting: We performed a meta-epidemiological study of randomized trials that tested antipsychotics for neuropsychiatric symptoms in dementia. The trials compared conventional or atypical antipsychotics with placebo or another antipsychotic. Two reviewers independently extracted sample size, sample size calculations, reported treatment effects with p-values, and general study characteristics (drug type, trial duration, type of funding). We calculated reference sample sizes of 83 and 433 per study group for the placebo-controlled and head-to-head trials, respectively.

    Results: We identified 33 placebo-controlled trials and 18 head-to-head trials. Only 14 (42%) and 2 (11%), respectively, reported a sample size calculation. The average sample size per arm was 34 (range 6–179) in placebo-controlled trials testing conventional drugs, 107 (8–237) in such trials testing atypical drugs, and 104 (95–115) in such trials testing both drug types; it was 31 (10–88) in head-to-head trials. Thirteen of the 18 trials with sample sizes larger than required (72%) reported a statistically significant treatment effect, of which two (15%) were clinically relevant. None of the head-to-head trials reported a statistically significant treatment effect, even though some suggested non-inferiority. In placebo-controlled trials of atypical drugs, longer trial duration (>6 weeks) and commercial funding were associated with larger sample sizes.

    Conclusion: Sample size calculations were poorly reported in antipsychotic trials for dementia. Placebo-controlled trials of atypical antipsychotics showed the large sample size fallacy, while head-to-head trials were massively underpowered.
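    The large sample size fallacy follows from how the test statistic scales with n: for a fixed standardized difference d, the z statistic grows like √n, so even a small effect eventually reaches p < 0.05 regardless of its clinical relevance. The sketch below assumes a two-sample z-test with known unit variance and a hypothetical d = 0.2; it only illustrates the arithmetic and does not reanalyze the reviewed trials. The function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(d, n_per_arm):
    """Two-sided p-value of a z-test comparing two arms of n_per_arm patients
    when the observed standardized mean difference is d (unit variance assumed)."""
    z = d * math.sqrt(n_per_arm / 2.0)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical small effect, d = 0.2, at per-arm sizes quoted in the abstract
for n in (34, 107, 433):
    print(f"n per arm = {n:4d}: p = {two_sided_p(0.2, n):.4f}")
```

    The same d = 0.2 is far from significant at 34 patients per arm but clearly significant at 433, although its clinical relevance is unchanged.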

  3. Household Poverty Survey 1998 - Gambia, The

    • dev.ihsn.org
    • datacatalog.ihsn.org
    • +1 more
    Updated Apr 25, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gambia Bureau of Statistics (GBOS) (2019). Household Poverty Survey 1998 - Gambia, The [Dataset]. https://dev.ihsn.org/nada/catalog/71981
    Dataset updated
    Apr 25, 2019
    Dataset authored and provided by
    Gambia Bureau of Statistics (GBOS)
    Time period covered
    1998
    Area covered
    The Gambia
    Description

    Abstract

    Rather than studying the entire population, the 1998 Household Poverty Study opted for a sample survey. The advantages of sampling over complete enumeration are well documented and are not discussed here. Nevertheless, it is worth noting that this choice allowed a wide range of issues to be studied: the survey collected information on education, health, employment and earnings, anthropometry and demography, among other topics.

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    SURVEY DESIGN

    The sample size of any study depends to a large extent on three key factors:
    • the degree of accuracy required;
    • the extent of variation in the population with regard to key characteristics of the study;
    • the population size.
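    These three factors can be made concrete with the textbook (Cochran) formula for estimating a proportion, n0 = z²·p(1 − p)/e², followed by a finite population correction. This is a generic sketch: the survey documentation does not state which formula, if any, was used to arrive at 2,000 households, and the optional design-effect parameter `deff` is an assumption added for cluster designs. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p, margin, population, confidence=0.95, deff=1.0):
    """Households needed to estimate a proportion p within +/- margin.

    Accuracy -> margin and confidence; variation -> p * (1 - p);
    population size -> finite population correction.
    deff is an assumed design effect for cluster sampling (1.0 = simple random sampling).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = deff * z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)

print(sample_size_for_proportion(0.5, 0.05, 10_000_000))  # large population
print(sample_size_for_proportion(0.5, 0.05, 1_000))       # small population
```

    A ±5-point margin at 95% confidence with maximal variation (p = 0.5) needs about 385 households in a large population and fewer in a small one; a design effect above 1 inflates the requirement for cluster samples.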

    The sample size also needs to be large enough to allow meaningful analysis, bearing in mind the objective of the study, which was mainly to provide a wide range of indicators to serve as benchmark information against which poverty can be monitored over time and space.

    Against this background, the sample size for the 1998 Household Poverty Study was set at 2,000 households. This was deemed sufficient because it would provide enough cases for subgroup analysis while remaining feasible given the resource constraints in terms of financing, personnel and time.

    SAMPLE SELECTION

    To obtain a sample representative of the country while avoiding interviews spread across sparsely populated rural areas, a cluster sampling procedure was adopted, using the existing geographical clusters, the Enumeration Areas (EAs). By design, EAs are mapped to contain about 500 persons, but in practice they range from 300 to 1,000 persons. The EA demarcation covers the whole country and conforms to the administrative boundaries.

    Another consideration in the sampling process was the number of households to be selected per EA, since this affects both costs and sampling error. According to Scott (cited in CSD, 1994), when EAs are selected with Probability Proportional to Size (PPS) at the first stage, taking a constant number of households per enumeration area has no effect on the sampling error.

    Unlike rural areas, where the rich and poor normally live in the same area, urban areas are more residentially homogeneous: rich people tend to live in certain areas while the poor cluster together in others. Given these considerations, a multistage sampling procedure using the PPS technique was adopted, with 18 households randomly selected in each rural EA against 9 in each urban EA.

    In summary, a multistage sample with probability proportional to size (PPS) was taken. Enumeration areas were stratified into 15 groups based on division and density within divisions. A subset of these EAs (as defined in the 1993 Population Census) was selected with PPS, and 18 households per rural EA (9 per urban EA) were then selected by simple random sampling.
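    A two-stage design of this kind can be sketched as systematic PPS selection of EAs followed by simple random sampling of a fixed number of households per EA. It also illustrates the point attributed to Scott: if each EA's measure of size equals its household count M_i, every household's inclusion probability (m·M_i/M)·(take/M_i) reduces to the constant m·take/M, so the sample is self-weighting within a stratum. The EA sizes, the function names and the choice of systematic PPS are assumptions for illustration; in practice the measure of size comes from the census frame, so weights are only approximately constant.

```python
import random

def systematic_pps(sizes, m, rng):
    """Systematic PPS: pick m unit indices with probability proportional to size."""
    total = sum(sizes)
    step = total / m
    start = rng.uniform(0, step)
    points = [start + k * step for k in range(m)]
    chosen, cum, k = [], 0.0, 0
    for i, size in enumerate(sizes):
        cum += size
        while k < m and points[k] < cum:
            chosen.append(i)  # a unit larger than `step` can be hit twice
            k += 1
    return chosen

def two_stage_sample(ea_households, m, take, seed=0):
    """PPS selection of m EAs, then simple random sampling of `take` households
    in each. Returns (ea, household, design weight) tuples."""
    rng = random.Random(seed)
    total = sum(ea_households)
    sample = []
    for ea in systematic_pps(ea_households, m, rng):
        n_hh = ea_households[ea]
        p_ea = m * n_hh / total  # first-stage inclusion probability
        p_hh = take / n_hh       # second-stage inclusion probability
        for hh in rng.sample(range(n_hh), take):
            sample.append((ea, hh, 1.0 / (p_ea * p_hh)))
    return sample

# Hypothetical rural stratum: household counts per EA, constant take of 18
rural_eas = [80, 120, 100, 150, 90, 110, 60, 140]
sample = two_stage_sample(rural_eas, m=3, take=18)
print(len(sample), {round(w, 3) for _, _, w in sample})
```

    Every sampled household receives the same weight, 850 / (3 × 18); an urban stratum with a take of 9 would have its own constant weight.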

    Note: Detailed sampling information is given in the survey final report included in this documentation.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The survey was administered using a structured questionnaire with two parts. Part one collected information on demographics, health, education and crop production, among other topics. Part two collected information mostly on household expenditure and anthropometric measures.

  4. Data from: Long-term resource variation and group size: A large-sample field...

    • healthdata.gov
    • catalog.data.gov
    application/rdf+xml +5
    Updated Jul 14, 2025
    (2025). Long-term resource variation and group size: A large-sample field test of the Resource Dispersion Hypothesis [Dataset]. https://healthdata.gov/d/64it-jkd5
    Available download formats
    csv, xml, tsv, json, application/rss+xml, application/rdf+xml
    Dataset updated
    Jul 14, 2025
    Description

    Background
    The Resource Dispersion Hypothesis (RDH) proposes a mechanism for the passive formation of social groups where resources are dispersed, even in the absence of any benefits of group living per se. Despite supportive modelling, it lacks empirical testing. The RDH predicts that, rather than Territory Size (TS) increasing monotonically with Group Size (GS) to account for increasing metabolic needs, TS is constrained by the dispersion of resource patches, whereas GS is independently limited by their richness. We conducted multiple-year tests of these predictions using data from the long-term study of badgers Meles meles in Wytham Woods, England. That study has long failed to identify direct benefits from group living, and consequently alternative explanations for the badgers' large group sizes have been sought.

    Results
    TS was not consistently related to resource dispersion, nor was GS consistently related to resource richness. Results differed according to data groupings and to whether territories were mapped using minimum convex polygons or traditional methods. Habitats differed significantly in resource availability, but there was also evidence that food resources may be spatially aggregated within habitat types as well as between them.

    Conclusions
    This is, we believe, the largest test of the RDH to date, and it builds on the long-term project that initiated part of the thinking behind the hypothesis. Support for the predictions was mixed and depended on the year and on the method used to map territory borders. We suggest that within-habitat patchiness, as well as model assumptions, should be investigated further to improve future tests of the RDH.
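    A minimum convex polygon (MCP) territory is the convex hull of the locations assigned to a group, and its area is one way to measure TS. The sketch below assumes planar coordinates in metres and a 100% MCP with no outlier trimming; the input locations and function names are hypothetical, and the "traditional methods" of territory mapping mentioned above are not reproduced.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def mcp_area(locations):
    """Area of the 100% minimum convex polygon around a group's locations."""
    hull = convex_hull(locations)
    return polygon_area(hull) if len(hull) >= 3 else 0.0

# Hypothetical locations (metres) attributed to one social group
fixes = [(0, 0), (400, 0), (400, 300), (0, 300), (200, 150), (100, 250)]
print(mcp_area(fixes) / 10_000, "ha")
```

    Interior points do not change the MCP, so these six hypothetical locations give a 12 ha territory; this insensitivity to how locations are spread inside the hull is one reason MCP-based and other mapping methods can disagree.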
    
  5. Data from: Effects of sample size and full sibs on genetic diversity...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Apr 21, 2017
    Gregorio Sánchez-Montes; Arturo Hugo Ariño; José Luis Vizmanos; Jinliang Wang; Íñigo Martínez-Solano; Gregorio Montes (2017). Effects of sample size and full sibs on genetic diversity characterization: a case study of three syntopic Iberian pond-breeding amphibians [Dataset]. http://doi.org/10.5061/dryad.f65s7
    Available download formats
    zip
    Dataset updated
    Apr 21, 2017
    Authors
    Gregorio Sánchez-Montes; Arturo Hugo Ariño; José Luis Vizmanos; Jinliang Wang; Íñigo Martínez-Solano; Gregorio Montes
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Spain, Sierra de Guadarrama
    Description

    Accurate characterization of genetic diversity is essential for understanding population demography, predicting future trends and implementing efficient conservation policies. For that purpose, molecular markers are routinely developed for nonmodel species, but key questions regarding sampling design, such as the calculation of minimum sample sizes or the effect of relatives in the sample, are often neglected. We used accumulation curves and sibship analyses to explore how these two factors affect marker performance in the characterization of genetic diversity. We illustrate this approach with the analysis of an empirical dataset including newly optimized microsatellite sets for three Iberian amphibian species: Hyla molleri, Bufo calamita, and Pelophylax perezi. We studied 17–21 populations per species (total n = 547, 652, and 516 individuals, respectively), including a reference locality in which the effect of sample size was explored using larger samples (77–96 individuals). As expected, FIS and tests for Hardy–Weinberg equilibrium and linkage disequilibrium were affected by the presence of full sibs, and most initially inferred disequilibria were no longer statistically significant when full siblings were removed from the sample. We estimated that, to obtain reliable estimates, the minimum sample size (potentially including full sibs) was close to 20 for expected heterozygosity and between 50 and 80 for allelic richness. Our pilot study based on a reference population provided a rigorous assessment of marker properties and of the effects of sample size and the presence of full sibs in the sample. These examples illustrate the advantages of this approach for producing robust and reliable results for downstream analyses.
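    Two of the quantities discussed above can be sketched for a single microsatellite locus: Nei's unbiased expected heterozygosity and an allele accumulation curve obtained by repeatedly subsampling individuals without replacement. This sketch assumes genotypes stored as allele-pair tuples, omits the sibship analysis, and uses simple random subsampling, which may differ from the authors' exact accumulation procedure. The allele sizes and function names are hypothetical.

```python
import random
from collections import Counter

def expected_heterozygosity(genotypes):
    """Nei's unbiased gene diversity for one locus.
    genotypes: list of (allele, allele) tuples, one per diploid individual."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)  # number of gene copies, 2N
    freqs = [c / n for c in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def allele_accumulation(genotypes, sizes, reps=200, seed=1):
    """Mean number of distinct alleles seen in random subsamples of k individuals."""
    rng = random.Random(seed)
    curve = {}
    for k in sizes:
        seen = 0
        for _ in range(reps):
            subsample = rng.sample(genotypes, k)
            seen += len({a for g in subsample for a in g})
        curve[k] = seen / reps
    return curve

# Hypothetical locus: 60 individuals from a population with uneven allele frequencies
rng = random.Random(42)
pool = [150] * 40 + [154] * 25 + [158] * 20 + [162] * 10 + [166] * 4 + [170] * 1
genos = [(rng.choice(pool), rng.choice(pool)) for _ in range(60)]
print(round(expected_heterozygosity(genos), 3))
print(allele_accumulation(genos, [5, 10, 20, 40, 60]))
```

    Rare alleles keep appearing as k grows, so allelic richness needs larger samples before the curve levels off, whereas heterozygosity, dominated by common alleles, stabilizes sooner; this matches the pattern of minimum sample sizes reported above.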

  6. Data from: Functional Additive Mixed Models

    • tandf.figshare.com
    txt
    Updated Jun 2, 2023
    Fabian Scheipl; Ana-Maria Staicu; Sonja Greven (2023). Functional Additive Mixed Models [Dataset]. http://doi.org/10.6084/m9.figshare.987098.v2
    Available download formats
    txt
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Fabian Scheipl; Ana-Maria Staicu; Sonja Greven
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for spatial, temporal, or longitudinal functional data, for example. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors, which may be observed with additional error, and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework are based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well, and also scales to larger datasets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.

  7. Roads Rehabilitation 2012 - Senegal

    • datacatalog.ihsn.org
    • anads.ansd.sn
    • +2 more
    Updated Mar 29, 2019
    Impaq International (2019). Roads Rehabilitation 2012 - Senegal [Dataset]. https://datacatalog.ihsn.org/catalog/study/SEN_2012_MCC-RR_v01_M
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Impaq International
    Time period covered
    2012
    Area covered
    Senegal
    Description

    Abstract

    The road sector plays a critical role because 99% of goods produced in Senegal are transported by road. Because the selected segments of RN2 and RN6 have a commercially and politically central geographical location, and because of their poor initial state, their rehabilitation should have a detectable positive effect on local populations (columns 3, 4 and 6). Implementers rehabilitate the road segments of RN2 and RN6 under the supervision of MCC and MCA (Activities). The output of these activities is the rehabilitated roads: 120 km of RN2 and 256 km of RN6 road segments (Outputs). Note that unexpected delays in implementation have occurred because of external factors such as extreme weather and civil unrest. These contingencies can prevent the timely rehabilitation of the roads. Other factors may also affect the project; for example, cost overruns could reduce the length of road that is ultimately rehabilitated (outputs), thus affecting fewer beneficiaries than planned (outcomes).

    Some outcomes may be realized immediately upon completion of the project, while others may take longer to materialize. Once the road rehabilitation implementation is complete, it is expected that the time and cost required to travel to a certain destination via the rehabilitated roads will be reduced. Also, the targeted road segments will be improved in quality and are thus likely to be used more frequently. These outcomes are expected to be realized shortly after the completion of the roads (short-term outcomes).
    The completion of the road rehabilitation is also expected to unlock economic and social opportunities for households and individuals using the road (medium/long-term outcomes). For example, the project may improve access to markets to buy and sell products. It may also be easier and cheaper to find inputs needed for production activities for both formal enterprises and household informal economic activities. Due to the reduced time and cost of travel on the rehabilitated roads, households may enjoy easier access to basic facilities such as schools and health centers. Furthermore, there may be more employment opportunities due to increased demand in markets accessible via the improved roads. Lastly, the value of land and assets along the rehabilitated roads is expected to rise as demand for road use rises.

    Research Question 1: Did the RRP reduce the travel time and costs to households/enterprises located near the rehabilitated roads?

    Research Question 2: Did the RRP lead to increased work opportunities for employment and income among beneficiary households?

    Research Question 3: Did the RRP lead to increased access to health and education services?

    Research Question 4: Did the project affect business opportunities and enterprise revenues?

    Research Question 5: What is the ex-post Economic Rate of Return (ERR) of the RRP?

    Research Question 6: How are the benefits of the projects distributed among subgroups of the population such as gender, age and income?

    Research Question 7: How do the long-term impacts of the road projects per dollar invested compare to other typical infrastructure investments?

    Geographic coverage

    Regional Coverage

    Analysis unit

    Households and enterprises

    Universe

    Inhabitants and enterprises near the roads.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    We use statistical power analysis to calculate the minimum sample size required to detect an effect of a given size. Identifying an appropriate sample size for our impact evaluation depends on various factors and assumptions, including a desired effect size, target power and significance level. For the desired effect size, we used information on the magnitude of benefits from the Beneficiary Analysis provided by MCC. The power of a statistical test is the probability of detecting a true effect when it truly exists. The significance level is the probability of falsely detecting an effect when it does not exist. We calculate the minimum sample sizes required to detect an effect of a given size for each of the combinations of the most commonly used power and significance levels. Using the present value of benefit stream as a share of annual income of about 10% and the per capita GNI of USD 820 in the ERR spreadsheet from MCC, we estimate that approximately of benefits are expected to be generated from the RRP per household for the first 5 years.

    Using the least restrictive criteria for the power and test size (80% power and a 5% significance level), it was determined that we need at least 1,227 households in each of the treatment and comparison groups. Thus, the minimum total household sample size is 4,908 (=1,227*4). As mentioned, this is the minimum required sample size for the least restrictive assumption for power and test size. For more robust results, we would need larger sample sizes. However, given the trade-off between the statistical rigor and the budgetary constraints faced by MCA-S, we have selected the smallest sample size consistent with a rigorous impact evaluation.
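    The power calculation described above can be sketched with the standard normal-approximation formula for comparing two group means, n per group = 2(z_{1−α/2} + z_{1−β})²/d², where d is the standardized effect size. This simplified sketch assumes individual (not clustered) sampling, equal group sizes and known variance; it does not reproduce the evaluation's MCC-based benefit inputs, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means with
    standardized effect size d (equal groups, known variance, no clustering)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

def minimum_detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with n units per group."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * math.sqrt(2 / n)

# Commonly used combinations of power and significance level, hypothetical d = 0.2
for power in (0.80, 0.90):
    for alpha in (0.05, 0.01):
        print(f"power={power:.2f} alpha={alpha:.2f} n/group={n_per_group(0.2, alpha, power)}")
```

    Under these simplified assumptions, 1,227 households per group would detect a standardized difference of about 0.11 at 80% power and a 5% significance level; any clustering of households would add a design effect and raise the required sample.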

    Regarding the sample size for the enterprise survey, after discussion with MCC and MCA-S we decided not to use power analysis, owing to a lack of information about the number of enterprises along the treatment and comparison roads. Instead, we relied on the input of MCA-S staff familiar with business conditions for enterprises along RN2 and RN6, and proposed a survey sample of approximately 600 enterprises.

    Research instrument

    Baseline data were collected through in-person interviews with households and enterprises located in the treatment and comparison areas. The baseline survey collected data on background characteristics and key outcomes of interest (income, use of the roads and various economic activities) for both households and enterprises. The household questionnaire was structured in several sections covering the following information:
    • Demographic characteristics of household members
    • Employment and revenues of household members
    • Household food and non-food consumption (whether a household has consumed certain types of food and the frequency of purchase)
    • Salary and non-agricultural income of household members
    • Household assets (e.g., type of home, access to electricity, etc.)
    • Household members' use of the road, frequency of use, and time and distance traveled to various destinations such as the local market, communal market, school, health facilities and workplace
    • Agricultural/livestock production and commercialization: amount of production realized and sold, by crop

    A separate questionnaire was developed to gather information on enterprises. This data collection effort is essential to gain a full picture of the impact of the RRP. The survey collected detailed information on the type of enterprise activities, the quantity of goods produced and sold, the costs related to the commercialization of goods and the purchase of raw materials, the size of the enterprises in terms of employees and capital equipment, revenues, and use of the road, in the same areas in which the heads of households were interviewed. The enterprise questionnaire was structured in several sections covering the following information:
    • Information on the entrepreneur
    • Characteristics of the enterprise: e.g., primary activity, workers employed, mobile equipment and machinery (tractors, etc.)
    • Production and commercialization: e.g., the amount of sales from products and services, destination of products and services, use of the road to deliver the products/services, distance traveled on the road
    • Difficulties encountered in the entrepreneurial activity, including difficulties obtaining credit, recruiting personnel, and accessing the road

  8. Data from: Estimating density from presence/absence data in clustered...

    • zenodo.org
    • datadryad.org
    csv
    Updated Jun 2, 2022
    Magnus Ekström; Saskia Sandring; Anton Grafström; Per-Anders Esseen; Bengt Gunnar Jonsson; Göran Ståhl (2022). Data from: Estimating density from presence/absence data in clustered populations [Dataset]. http://doi.org/10.5061/dryad.zgmsbcc6t
    Available download formats
    csv
    Dataset updated
    Jun 2, 2022
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Magnus Ekström; Saskia Sandring; Anton Grafström; Per-Anders Esseen; Bengt Gunnar Jonsson; Göran Ståhl
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    1. Inventories of plant populations are fundamental in ecological research and monitoring, but such surveys are often prone to field assessment errors. Presence/absence (P/A) sampling may have advantages over plant cover assessments for reducing such errors. However, the link between P/A data and plant density depends on model assumptions about plant spatial distributions. Previous studies have shown, for example, how plant density can be estimated under Poisson model assumptions on the plant locations. In this study, new methods are developed and evaluated for linking P/A data with plant density, assuming that plants occur in clustered spatial patterns.
    2. New theory was derived for estimating plant density under Neyman-Scott type cluster models such as the Matérn and Thomas cluster processes. Suggested estimators, corresponding confidence intervals, and a proposed goodness-of-fit test were evaluated in a Monte-Carlo simulation study assuming a Matérn cluster process. Further, the estimators were applied to plant data from environmental monitoring in Sweden to demonstrate their empirical application.
    3. The simulation study showed that our methods work well for large enough sample sizes. The judgment of what is "large enough" is often difficult, but simulations indicate that a sample size is large enough when the sampling distributions of the parameter estimators are symmetric or mildly skewed. Bootstrap may be used to check whether this is true. The empirical results suggest that the derived methodology may be useful for estimating the density of plants such as Leucanthemum vulgare and Scorzonera humilis.
    4. By developing estimators of plant density from P/A data under realistic model assumptions about plants' spatial distributions, P/A sampling will become a more useful tool for inventories of plant populations. Our new theory is an important step in this direction.
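    The Poisson-based estimator that point 1 describes as prior work is simple to write down: if plants form a homogeneous Poisson process with density λ, a plot of area a is occupied with probability 1 − exp(−λa), so λ̂ = −ln(1 − p̂)/a. The sketch below codes that estimator, simulates a clustered (Matérn) population to show why it tends to underestimate density when plants are aggregated, and applies the bootstrap skewness check mentioned in point 3. It is not the authors' Neyman-Scott estimator; all parameter values and function names are hypothetical.

```python
import numpy as np

def poisson_density_from_pa(presence, plot_area):
    """Density from presence/absence under a homogeneous Poisson model:
    P(occupied) = 1 - exp(-lambda * a), so lambda = -log(1 - p) / a."""
    p_hat = np.mean(presence)
    if p_hat >= 1.0:
        raise ValueError("every plot occupied: estimate is unbounded")
    return -np.log1p(-p_hat) / plot_area

def matern_cluster(parent_rate, mean_offspring, radius, rng):
    """Matérn cluster process on the unit torus (wrapping avoids edge effects)."""
    parents = rng.uniform(0.0, 1.0, size=(rng.poisson(parent_rate), 2))
    clusters = []
    for centre in parents:
        k = rng.poisson(mean_offspring)
        r = radius * np.sqrt(rng.uniform(0.0, 1.0, k))  # uniform within a disc
        theta = rng.uniform(0.0, 2.0 * np.pi, k)
        offspring = centre + np.column_stack((r * np.cos(theta), r * np.sin(theta)))
        clusters.append(offspring % 1.0)
    return np.vstack(clusters) if clusters else np.empty((0, 2))

def presence_in_plots(points, n_plots, side, rng):
    """Record presence/absence in randomly placed square plots."""
    corners = rng.uniform(0.0, 1.0 - side, size=(n_plots, 2))
    presence = np.empty(n_plots, dtype=bool)
    for i, (x, y) in enumerate(corners):
        inside = ((points[:, 0] >= x) & (points[:, 0] < x + side)
                  & (points[:, 1] >= y) & (points[:, 1] < y + side))
        presence[i] = inside.any()
    return presence

rng = np.random.default_rng(42)
plants = matern_cluster(parent_rate=50, mean_offspring=20, radius=0.03, rng=rng)
side = 0.02
pa = presence_in_plots(plants, n_plots=500, side=side, rng=rng)
estimate = poisson_density_from_pa(pa, side ** 2)
# The study area is the unit square, so the true density is the plant count
print(f"true density {len(plants)}, Poisson-based estimate {estimate:.0f}")

# Bootstrap the estimator and check the skewness of its sampling distribution
boot = np.array([poisson_density_from_pa(rng.choice(pa, pa.size), side ** 2)
                 for _ in range(1000)])
skew = np.mean((boot - boot.mean()) ** 3) / boot.std() ** 3
print(f"bootstrap skewness {skew:.2f}")
```

    Clustered plants leave more plots empty than a Poisson pattern of the same density, so the Poisson-based estimate falls below the true density here; this is the gap that estimators built on cluster-process models aim to close.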

  9. Data from: WiBB: An integrated method for quantifying the relative...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jun 5, 2022
    Qin Li; Xiaojun Kou (2022). WiBB: An integrated method for quantifying the relative importance of predictive variables [Dataset]. http://doi.org/10.5061/dryad.xsj3tx9g1
    Available download formats
    zip
    Dataset updated
    Jun 5, 2022
    Dataset provided by
    Zenodo, http://zenodo.org/
    Authors
    Qin Li; Xiaojun Kou
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset contains simulated datasets, empirical data, and R scripts described in the paper: "Li, Q. and Kou, X. (2021) WiBB: An integrated method for quantifying the relative importance of predictive variables. Ecography (DOI: 10.1111/ecog.05651)".

    A fundamental goal of scientific research is to identify the underlying variables that govern crucial processes of a system. Here we propose a new index, WiBB, which integrates the merits of several existing methods: a model-weighting method from information theory (Wi), a standardized regression coefficient method measured by β* (B), and a bootstrap resampling technique (B). We applied WiBB to simulated datasets with known correlation structures, for both linear models (LM) and generalized linear models (GLM), to evaluate its performance. We also applied two other methods, the relative sum of weights (SWi) and the standardized beta (β*), to compare their performance with that of WiBB in ranking predictor importance under various scenarios. We also applied it to an empirical dataset for the plant genus Mimulus to select bioclimatic predictors of species' presence across the landscape. Results on the simulated datasets showed that WiBB outperformed the β* and SWi methods in scenarios with small and large sample sizes, respectively, and that the bootstrap resampling technique significantly improved discriminant ability. When WiBB was tested on the empirical dataset with GLM, it sensibly identified four important predictors with high credibility out of six candidates in modeling the geographical distributions of 71 Mimulus species. This integrated index has great advantages in evaluating predictor importance and hence reducing the dimensionality of data without losing interpretive power. Because the new metric is simpler to calculate than more sophisticated statistical procedures, it is a handy addition to the statistical toolbox.
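    WiBB builds on three ingredients that can each be sketched with ordinary least squares: Akaike weights summed over all subset models containing a predictor (SWi), standardized regression coefficients (β*), and bootstrap resampling. The sketch below computes bootstrap means of SWi and |β*| only; how WiBB merges them into a single index is defined in Li and Kou (2021) and is deliberately not reproduced. It assumes a linear model with few enough predictors for all-subsets fitting, and the data and function names are hypothetical.

```python
import itertools
import numpy as np

def ols_aic(X, y, cols):
    """AIC of an OLS fit of y on an intercept plus the columns in `cols`."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

def sum_of_akaike_weights(X, y):
    """SWi: sum of Akaike weights of all subset models containing each predictor."""
    p = X.shape[1]
    subsets = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
    aic = np.array([ols_aic(X, y, s) for s in subsets])
    w = np.exp(-(aic - aic.min()) / 2.0)
    w /= w.sum()
    sw = np.zeros(p)
    for weight, s in zip(w, subsets):
        for j in s:
            sw[j] += weight
    return sw

def standardized_betas(X, y):
    """beta*: OLS coefficients after z-scoring the predictors and the response."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

def bootstrap_importance(X, y, n_boot=200, seed=0):
    """Bootstrap means of SWi and |beta*| for each predictor."""
    rng = np.random.default_rng(seed)
    n = len(y)
    sw, bs = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        sw.append(sum_of_akaike_weights(X[idx], y[idx]))
        bs.append(np.abs(standardized_betas(X[idx], y[idx])))
    return np.mean(sw, axis=0), np.mean(bs, axis=0)

# Hypothetical data: only the first of three predictors matters
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + rng.normal(size=100)
sw_mean, beta_mean = bootstrap_importance(X, y, n_boot=50)
print("SWi:", sw_mean.round(3), " |beta*|:", beta_mean.round(3))
```

    Both metrics single out the first predictor here; they diverge in harder cases (correlated predictors, small or large samples), which is where the abstract reports that combining them with the bootstrap helps.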

  10. Living Conditions Monitoring Survey VI 2010 - Zambia

    • dev.ihsn.org
    • catalog.ihsn.org
    • +1 more
    Updated Apr 25, 2019
    Central Statistical Office (2019). Living Conditions Monitoring Survey VI 2010 - Zambia [Dataset]. https://dev.ihsn.org/nada/catalog/study/ZMB_2010_LCMS-VI_v01_M
    Dataset updated
    Apr 25, 2019
    Dataset authored and provided by
    Central Statistical Office
    Time period covered
    2010
    Area covered
    Zambia
    Description

    Abstract

    The main objective of the 2006 and 2010 LCMS surveys was to provide the basis for comparison of poverty estimates derived from cross-sectional survey data between 2006 and 2010.

    In addition, the survey provides a basis on which to:
    - Monitor the impact of government policies on the well-being of the Zambian population.
    - Monitor the level of poverty and its distribution in Zambia.
    - Provide various users with a set of reliable indicators against which to monitor
    - Identify vulnerable groups in society and enhance targeting in policy implementation.

    Geographic coverage

    In the LCMS 2010, all 1,000 sampled SEAs were enumerated, representing 100 percent coverage at the national level.

    Analysis unit

    • Households
    • Individuals

    Universe

    The survey covered all de jure household members (usual residents) of the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sample stratification and allocation: The sampling frame used for the LCMS VI was developed from the 2000 Census of Population and Housing. The country is administratively demarcated into 9 provinces, which are further divided into 72 districts. The districts are further subdivided into 150 constituencies, which are in turn divided into wards. For the purposes of conducting CSO surveys, wards are further divided into Census Supervisory Areas (CSAs), which are further subdivided into Standard Enumeration Areas (SEAs). For this survey, SEAs constituted the Primary Sampling Units (PSUs).

    To obtain reasonable estimates at district level while taking into account variation in the sizes of the districts, the survey adopted the square root sample allocation method (Leslie Kish, 1987). This approach offers a compromise between equal and proportional allocation, i.e., small strata (districts) are allocated larger samples than they would receive under proportional allocation. However, the sample size for the smallest districts is still fairly small, so it is important to examine the confidence intervals for district-level estimates to determine whether the level of precision is adequate. Within each district, sample points were allocated to rural and urban strata in proportion to their sizes.

    Although this method was used, it was observed from the LCMS 2006 that the coefficient of variation (CV) of the poverty estimates was highest in predominantly urban districts and lowest in rural districts. This means that the sample size in some urban districts may have been inadequate to measure poverty with a good level of precision: given the higher variability in urban districts, a larger sample size would be required. Some districts also had very low CV estimates, indicating a higher level of precision for their poverty estimates.

    To improve the precision of the poverty estimates for urban districts, the initial distribution of the sample was adjusted. It was necessary to increase the number of PSUs for some districts without increasing the budget and without significantly compromising the precision of the poverty estimates for rural areas. Rural districts with the lowest CVs in the 2006 LCMS results had their sample sizes reduced, and the released PSUs were redistributed to districts with the highest CVs. The sample distributions for the LCMS 2006 and LCMS 2010 were initially the same but changed after the latter was adjusted. Table 2.1 in the Survey Report shows the allocation of PSUs in the survey.
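    The square root allocation rule mentioned above can be sketched as follows, assuming the common form in which stratum h receives n_h = n · √N_h / Σ_k √N_k sample units, versus n · N_h / Σ_k N_k under proportional allocation. The district labels, sizes and total of 100 PSUs are hypothetical; real allocations would be rounded to whole PSUs and could then be adjusted, as described for the CV-based reallocation.

```python
import math

def proportional_allocation(sizes, n):
    """Share of n proportional to each stratum's size."""
    total = sum(sizes.values())
    return {h: n * s / total for h, s in sizes.items()}

def sqrt_allocation(sizes, n):
    """Share of n proportional to the square root of each stratum's size."""
    roots = {h: math.sqrt(s) for h, s in sizes.items()}
    total = sum(roots.values())
    return {h: n * r / total for h, r in roots.items()}

# Hypothetical district sizes (households) and a total of 100 PSUs
sizes = {"large": 90_000, "medium": 9_000, "small": 1_000}
for name, alloc in [("proportional", proportional_allocation(sizes, 100)),
                    ("square root", sqrt_allocation(sizes, 100))]:
    print(name, {h: round(v, 1) for h, v in alloc.items()})
```

    Under these assumed sizes the smallest district receives about 7 PSUs instead of 1, which is the compromise between equal and proportional allocation that the text describes.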

    Sample selection: The LCMS VI employed a two-stage stratified cluster sample design. In the first stage, 1,000 SEAs were selected with Probability Proportional to Estimated Size (PPES) within the respective strata; the size measure was taken from the frame developed from the 2000 Census of Population and Housing. In the second stage, households were systematically selected from an enumeration area listing. The survey was designed to provide reliable estimates at the district, provincial, rural/urban and national levels. However, the reliability of some indicators may be limited for the smaller districts, given their limited sample size; this will be determined by tabulating sampling errors and confidence intervals.
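    Selection with probability proportional to (estimated) size is often implemented systematically: cumulate the size measures, take a sampling interval I = (total size) / n, and select the units hit by a random start and its successive multiples of I. The sketch below assumes that systematic variant and uses hypothetical SEA sizes; the documentation does not say which PPES implementation was used.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(sizes, n, rng=random):
    """Systematic PPS selection of n units from a list of size measures.

    Returns unit indices. A unit whose size exceeds the sampling interval
    can be hit more than once; such units are often handled separately
    as certainty selections.
    """
    cum = list(accumulate(sizes))           # running totals of the size measure
    interval = cum[-1] / n                   # sampling interval
    start = rng.uniform(0, interval)         # random start within the first interval
    picks = []
    for k in range(n):
        point = start + k * interval
        # point lies in unit i when cum[i-1] <= point < cum[i]
        i = bisect_right(cum, point)
        picks.append(min(i, len(cum) - 1))   # guard against float round-off at the top end
    return picks

# Hypothetical SEA household counts from a census frame
seas = [120, 80, 300, 50, 450]
print(pps_systematic(seas, 2, random.Random(1)))
```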

    Selection of households: All households in the selected SEAs were listed before the sample of households to be interviewed was drawn. In rural SEAs, households were stratified and listed according to their agricultural activity status, creating four explicit strata at the second sampling stage in each rural SEA: the Small Scale Stratum (SSS), the Medium Scale Stratum (MSS), the Large Scale Stratum (LSS) and the Non-agricultural Stratum (NAS). For the LCMS VI, seven, five and three households were selected from the SSS, MSS and NAS, respectively, and large scale households were selected on a 100 percent basis. Urban SEAs were explicitly stratified into low cost, medium cost and high cost areas according to the CSO and local authority classification of residential areas. From each rural and urban SEA, 15 and 25 households were selected, respectively. However, in some cases the number of rural households selected exceeded the prescribed sample size of 15, depending on the availability of large scale farming households.

    Before households were selected from the various strata, fully responding households were assigned sampling serial numbers. Households were then selected by the circular systematic sampling method, which assumes that households are arranged in a circle (G. Kalton, 1983) and that the following relationship applies:

    N = nk, so that k = N/n

    where N is the total number of households assigned sampling serial numbers in a stratum, n is the total desired sample size to be drawn from that stratum in an SEA, and k is the sampling interval in the SEA.
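    A minimal sketch of circular systematic sampling, assuming households carry serial numbers 1..N: pick a random start anywhere on the circle and step by k, wrapping past N back to 1. When N = nk exactly, as the text assumes, k = N/n; for other values of N this sketch uses the integer part of N/n (an assumption), which always yields n distinct households.

```python
import random

def circular_systematic_sample(N, n, rng=random):
    """Return n distinct serial numbers from 1..N by circular systematic sampling."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                    # sampling interval; equals N/n exactly when N = nk
    start = rng.randint(1, N)     # random start anywhere on the circle
    # Step k places at a time, wrapping from N back to 1
    return [(start - 1 + j * k) % N + 1 for j in range(n)]

# e.g. 15 households from a hypothetical stratum listing of 75
print(circular_systematic_sample(75, 15, random.Random(2)))
```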

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Three types of questionnaires were used in the survey:
    1. The Listing Booklet, used to list all households residing in the selected Standard Enumeration Areas (SEAs).
    2. The Main questionnaire, used to collect detailed information on all household members in the selected households.
    3. The Prices questionnaire, used to collect unit prices of various commodities. This information is vital for harmonising regional differences in prices.

    Cleaning operations

    The Living Conditions Monitoring Survey data were entered using CSPro version 4.0 software. Unlike the LCMS 2006 application, which used single entry, the LCMS 2010 application used a double-entry system. The 2010 data entry was done by two teams, one in the provinces and another at CSO headquarters. The two sets of data were then compared and matched by a team of matchers, and errors identified by the matchers were corrected to complete data entry. The major advantage of double entry (verification) is that errors introduced by data entry operators are greatly reduced. The data were then exported to SAS, SPSS and Stata formats for data cleaning, tabulation and analysis.
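    The matching step of double entry can be illustrated with a small sketch that compares two independent keyings of the same records field by field and lists every disagreement for a matcher to resolve. The record layout and field names are hypothetical, and this is not CSPro's built-in verification feature.

```python
def compare_entries(first, second):
    """Compare two independent keyings of the same questionnaires.

    first, second: dicts mapping record id -> dict of field -> value.
    Returns (record_id, field, value_1, value_2) tuples for every mismatch,
    including fields or records present in only one keying (shown as None).
    """
    mismatches = []
    for rid in sorted(set(first) | set(second)):
        a, b = first.get(rid, {}), second.get(rid, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((rid, field, a.get(field), b.get(field)))
    return mismatches

provinces = {"HH001": {"size": 5, "head_age": 42}, "HH002": {"size": 3, "head_age": 61}}
headquarters = {"HH001": {"size": 5, "head_age": 24}, "HH002": {"size": 3, "head_age": 61}}
print(compare_entries(provinces, headquarters))
# [('HH001', 'head_age', 42, 24)]
```

    A matcher would then check the paper questionnaire for HH001 to decide which keying is correct.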

    Response rate

    The household response rate was calculated as the number of originally selected households with completed interviews divided by the total number of households selected. The household response rate was generally very high, with a national average of 98 percent of the originally selected households for both survey periods.

  11. Large Sample AFM Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 15, 2025
    Data Insights Market (2025). Large Sample AFM Report [Dataset]. https://www.datainsightsmarket.com/reports/large-sample-afm-652562
    Available download formats: doc, pdf, ppt
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Large Sample Atomic Force Microscopy (AFM) market is experiencing robust growth, driven by increasing demand across diverse applications and by technological advancements. The market, estimated at $500 million in 2025, is projected to expand significantly over the forecast period (2025-2033), with a Compound Annual Growth Rate (CAGR) of 8%. This growth is primarily fueled by the rising adoption of AFM for surface detection and characterization in materials science, semiconductor manufacturing, and nanotechnology research. The development of fully automated systems, offering enhanced throughput and ease of use, is a key trend contributing to market expansion. Furthermore, advancements in force spectroscopy techniques, which enable detailed investigation of material properties at the nanoscale, are creating new opportunities for market penetration. The ability to analyze larger samples is a significant advantage of Large Sample AFM, improving efficiency and reducing experimental limitations compared with traditional AFM systems. While factors such as high initial investment costs might restrain market growth in some segments, these are being offset by the long-term cost savings and research advancements offered by the technology. The market segmentation reveals a strong preference for fully automated systems due to their improved efficiency and user-friendliness. Among applications, surface detection remains the dominant segment, reflecting the widespread use of AFM in quality control and materials characterization. However, the force spectroscopy and electrical property detection segments are expected to see substantial growth, driven by rising research activity in advanced materials and electronics. Geographically, North America and Europe currently hold significant market shares, owing to strong research infrastructure and industry presence.
However, the Asia-Pacific region is poised for rapid growth, propelled by increasing investment in nanotechnology research and development, particularly in countries like China and South Korea. Key players in the Large Sample AFM market, such as Hitachi, Bruker, and Park Systems, are investing heavily in research and development to maintain their competitive edge and meet the rising demands of the market.
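    The quoted figures imply an end-of-period value that the report does not state. As a worked check, compounding $500 million at 8% per year for the eight years from 2025 to 2033 gives V_2033 = 500 × 1.08^8, roughly $925 million. The sketch below reproduces that arithmetic and the inverse CAGR formula, assuming growth compounds annually at a constant rate.

```python
def project(value, rate, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return value * (1 + rate) ** years

def implied_cagr(start, end, years):
    """Constant annual growth rate that takes start to end over `years`."""
    return (end / start) ** (1 / years) - 1

# Figures quoted above: $500 million in 2025, 8% CAGR, projected to 2033
implied_2033 = project(500, 0.08, 2033 - 2025)
print(f"Implied 2033 market size: ${implied_2033:.0f} million")
```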

  12. A confidence interval analysis of sampling effort, sequencing depth, and...

    • plos.figshare.com
    • zenodo.org
    pdf
    Updated Jun 6, 2023
    Ryoko Oono (2023). A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing [Dataset]. http://doi.org/10.1371/journal.pone.0189796
    Available download formats: pdf
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ryoko Oono
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh those of greater sequencing depth per sample for accurate ecological inference. However, excluding or not sequencing rare taxa may lead to misleading answers to the questions 'how and why are communities different?' Using foliar fungal endophytes as case studies, this study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data across a range of sampling efforts, sequencing depths, and taxonomic resolutions, to understand how technical and analytical practices may affect our interpretations. Increasing sample size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depth on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities, and greater depth did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depth is important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that may otherwise be ignored when communities are only compared for statistically significant differences.
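    Attaching confidence intervals to a community comparison can be sketched with a percentile bootstrap. The sketch assumes Bray-Curtis dissimilarity and resamples whole samples (not reads) within each group; the abundance vectors are hypothetical, and the study's own resampling design may differ.

```python
import random
import statistics

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0

def mean_between(group_a, group_b):
    """Mean dissimilarity over all cross-group sample pairs."""
    return statistics.fmean(bray_curtis(u, v) for u in group_a for v in group_b)

def bootstrap_ci(group_a, group_b, reps=1000, alpha=0.05, rng=random):
    """Percentile bootstrap CI, resampling samples with replacement in each group."""
    stats = sorted(
        mean_between(rng.choices(group_a, k=len(group_a)),
                     rng.choices(group_b, k=len(group_b)))
        for _ in range(reps)
    )
    lo = stats[int(alpha / 2 * reps)]
    hi = stats[min(reps - 1, int((1 - alpha / 2) * reps) - 1)]
    return lo, hi

# Hypothetical OTU counts: three samples per site, four taxa
site_a = [[10, 5, 0, 1], [8, 7, 1, 0], [12, 3, 0, 2]]
site_b = [[2, 9, 6, 0], [1, 11, 4, 1], [3, 8, 7, 0]]
print(mean_between(site_a, site_b), bootstrap_ci(site_a, site_b, rng=random.Random(42)))
```

    Repeating this with more samples per group, or with rarefied counts, is one way to see how sampling effort and sequencing depth each narrow (or fail to narrow) the interval.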

  13. 2018 American Community Survey: B19123 | FAMILY SIZE BY CASH PUBLIC...

    • data.census.gov
    ACS, 2018 American Community Survey: B19123 | FAMILY SIZE BY CASH PUBLIC ASSISTANCE INCOME OR HOUSEHOLDS RECEIVING FOOD STAMPS/SNAP BENEFITS IN THE PAST 12 MONTHS (ACS 1-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT1Y2018.B19123?q=Income%20and%20Earnings&t=SNAP/Food%20Stamps&g=310XX00US36740&y=2018
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2018
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2018 American Community Survey 1-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    While the 2018 American Community Survey (ACS) data generally reflect the July 2015 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
    • An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    • A "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution, or the margin of error associated with a median was larger than the median itself.
    • A "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    • A "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    • An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    • An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    • An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    • An "(X)" means that the estimate is not applicable or not available.
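    Because the published margins of error are at the 90 percent level, a standard error can be recovered as SE = MOE / 1.645 and rescaled to another confidence level. The sketch below assumes that conversion, which uses the factor the Census Bureau applies to ACS 90 percent MOEs; the estimate and MOE are hypothetical, and entries marked "**" or "***" have no usable MOE to convert.

```python
Z90 = 1.645  # factor behind the published 90 percent margins of error
Z95 = 1.960

def standard_error(moe90):
    """Standard error implied by a published 90 percent MOE."""
    return moe90 / Z90

def confidence_bounds(estimate, moe90, z=Z95):
    """Lower and upper bounds at the confidence level implied by z."""
    half_width = z * standard_error(moe90)
    return estimate - half_width, estimate + half_width

# Hypothetical table cell: 1,200 families with a 90 percent MOE of +/-329
bounds = confidence_bounds(1200, 329)
print(tuple(round(b) for b in bounds))  # (808, 1592)
```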

  14. Literature search strategy.

    • plos.figshare.com
    xls
    Updated May 15, 2024
    Nucki Nursjamsi Hidajat; R. M. Satrio Nugroho Magetsari; Gregorius Thomas Prasetiyo; Danendra Rakha Putra Respati; Kevin Christian Tjandra (2024). Literature search strategy. [Dataset]. http://doi.org/10.1371/journal.pone.0296149.t001
    Available download formats: xls
    Dataset updated
    May 15, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Nucki Nursjamsi Hidajat; R. M. Satrio Nugroho Magetsari; Gregorius Thomas Prasetiyo; Danendra Rakha Putra Respati; Kevin Christian Tjandra
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Whether to bury or expose the Kirschner wire (K-wire) in the management of fractures remains controversial, with published studies reporting inconsistent results. The main concern with exposed K-wire is a heightened risk of infection, as the wire is in direct contact with the external environment and air. This study aims to summarize the specific outcomes of buried versus exposed K-wire for the management of hand and forearm fractures.

    Methods: We conducted literature searches on the Europe PMC, Medline, Scopus, and Cochrane Library databases using specific keywords. This investigation focuses on individuals of any age diagnosed with hand or forearm fractures who underwent surgery involving K-wire fixation. It compares buried and exposed K-wire fixation, with pin infection as the primary outcome and early pin removal, days to pin removal, and surgical duration as secondary outcomes. The study includes observational studies (cohort/case-control) and randomized clinical trials (RCTs). Results for continuous variables were pooled as the standardized mean difference (SMD), while dichotomous variables were pooled as odds ratios (OR), with 95% confidence intervals, using random-effect models. The quality of included studies was assessed with the Cochrane Collaboration's Risk of Bias tool, version 2 (RoB v2).

    Results: A total of 11 studies were included. Our pooled analysis revealed that buried K-wire was associated with a lower risk of pin site infection [RR 0.49 (95% CI 0.36–0.67), p < 0.00001, I2 = 0%] and a 33.85-day longer duration until pin removal [MD 33.85 days (95% CI 18.68–49.02), p < 0.0001, I2 = 99%] compared with exposed K-wire. However, the duration of surgery was significantly longer, by 6.98 minutes, with buried K-wire [MD 6.98 minutes (95% CI 2.19–11.76), p = 0.004, I2 = 42%], with no significant difference in the early pin removal rate [RR 0.73 (95% CI 0.36–1.45), p = 0.37, I2 = 0%]. Further regression analysis revealed that sample size, age, sex, and duration of follow-up did not affect these relationships.

    Conclusion: Buried K-wire may offer benefits in reducing the infection rate, with a longer duration until pin removal. However, further RCTs with larger sample sizes are still needed to confirm the results of our study.
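    The random-effects pooling described in the Methods can be sketched with the DerSimonian-Laird estimator on the log scale, which is one common choice; the abstract does not name the estimator used. Each study's standard error is backed out of its 95% CI as (ln U - ln L) / (2 × 1.96). The study values below are hypothetical, not data from the review.

```python
import math

def pooled_random_effects(ratios, z=1.96):
    """DerSimonian-Laird random-effects pooling of ratio measures (e.g. RR).

    ratios: list of (estimate, lower95, upper95) per study.
    Returns (pooled ratio, lower, upper, I^2 in percent).
    """
    y = [math.log(est) for est, _, _ in ratios]
    # Variance of each log ratio, recovered from the width of its 95% CI
    v = [((math.log(hi) - math.log(lo)) / (2 * z)) ** 2 for _, lo, hi in ratios]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0         # between-study variance
    w_star = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se), i2

# Hypothetical per-study risk ratios with 95% CIs (not the review's data)
studies = [(0.45, 0.25, 0.81), (0.55, 0.30, 1.01), (0.50, 0.28, 0.89)]
rr, lo, hi, i2 = pooled_random_effects(studies)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f}), I2 = {i2:.0f}%")
```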

  15. Good Growth Plan 2014-2019 - France

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jan 27, 2023
    Syngenta (2023). Good Growth Plan 2014-2019 - France [Dataset]. https://microdata.worldbank.org/index.php/catalog/5625
    Dataset updated
    Jan 27, 2023
    Dataset authored and provided by
    Syngenta
    Time period covered
    2014 - 2019
    Area covered
    France
    Description

    Abstract

    Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data have been collected from more than 4,000 farms and cover more than 20 different crops in 46 countries. The data (except the USA data and the barley data for the UK, Germany, Poland, Czech Republic, France and Spain) were collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. The data can be used as benchmarks for crop yield and input efficiency.

    Geographic coverage

    National coverage

    Analysis unit

    Agricultural holdings

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A. Sample design: Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta, and the benchmark farms were randomly selected by Kynetec within the same cluster.

    B. Sample size: Sample sizes for each cluster are determined with the aim of measuring statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions about the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. Growers are also grouped in clusters as a means of keeping variances under control and of distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20; the optimal number of reference farms is 10 of 20 (balanced sample).
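    The inverse relationship between expected increase and required sample size follows from the usual normal-approximation formula for comparing two means, n = 2 (z_{1-α/2} + z_{1-β})² σ² / Δ² per group. This is an assumed illustration: Kynetec's actual calculation is not documented here, and repeated measurements on the same farms would call for a paired design with its own formula. The usage lines apply the stated minimum of 20 interviews per cluster.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size to detect a mean change `delta`.

    sigma: assumed standard deviation of the farm metric within a cluster.
    delta: smallest increase worth detecting (same units as sigma).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical yield SD of 1.0 t/ha; halving the target increase quadruples n
for delta in (1.0, 0.5):
    n = sample_size_per_group(sigma=1.0, delta=delta)
    print(delta, n, max(20, n))   # apply the 20-interview minimum per cluster
```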

    C. Selection procedure: The respondents were picked randomly using a "quota based random sampling" procedure. Growers were first randomly selected and then checked for compliance with the quotas for crop, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to conduct a maximum of 5 interviews in one village.
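    A minimal sketch of quota-based random selection with the five-interviews-per-village cap: candidates are approached in random order and accepted only while their quota cell and their village still have room. The candidate fields, quota cells and quota sizes are hypothetical.

```python
import random
from collections import Counter

def quota_random_sample(candidates, quotas, key, max_per_village=5, rng=random):
    """Randomly screen candidates, accepting those that fit open quotas.

    candidates: list of dicts with at least a "village" field.
    quotas: dict mapping a quota cell (as returned by key) to the number needed.
    key: function giving a candidate's quota cell, e.g. (crop, region, size class).
    """
    order = candidates[:]
    rng.shuffle(order)                       # random order of approach
    filled, per_village, chosen = Counter(), Counter(), []
    for c in order:
        cell = key(c)
        if filled[cell] >= quotas.get(cell, 0):
            continue                         # quota cell already full, or not wanted
        if per_village[c["village"]] >= max_per_village:
            continue                         # cap interviews at one sampling point
        chosen.append(c)
        filled[cell] += 1
        per_village[c["village"]] += 1
    return chosen

# Hypothetical screening list: 30 growers across three villages
cands = [{"id": i, "crop": "corn" if i % 2 else "sunflower", "village": f"v{i % 3}"}
         for i in range(30)]
sel = quota_random_sample(cands, {"corn": 6, "sunflower": 4},
                          key=lambda c: c["crop"], rng=random.Random(0))
print(sorted(c["id"] for c in sel))
```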

    Benchmark farms (BF) screened in France were selected based on the following criteria:

    (a) Grain corn (or silage corn if the weather doesn't allow for grain corn) growers in Allier, Calvados, Côte-d'Or, Côtes-d'Armor, Eure-et-Loir, Finistère, Isère, Meurthe-et-Moselle, Pas-de-Calais and Haut-Rhin
    - Grain or silage corn growers (grain is the objective, but it could turn into silage corn if weather conditions don't allow for grain corn production)
    - Growers with a relatively good technology level / professionalism
    - Hybrid corn

    (b) Sunflower growers in Charente, Charente-Maritime, Cher, Haute-Garonne, Gers and Indre
    - Sunflower growers
    - Growers with a relatively good technology level / professionalism
    - Growers who rotate their crops (implying that data are measured on similar but different plots every year)
    - An experienced sunflower grower (a willing participant with know-how and a history of growing the crop): the farmer must be experienced, i.e., have good general farming know-how, and be used to managing sunflower
    - Hybrid sunflower

    Departments: Poitou-Charentes: 16 and 17; Sud Ouest: 31 and 32; Centre: 18 or 36

    (c) Grape growers in Champagne
    - Wine grapes (for wine processing, so NOT table grapes)
    - In Champagne
    - Farmer is also a processor or belongs to a processor with a strict contract (guideline on quality)
    - Background: don't go to big châteaux (they are already high level and probably don't want to share information because of competitive advantage)

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Data collection tool for 2019 covered the following information:

    (A) PRE-HARVEST INFORMATION

    PART I: Screening
    PART II: Contact Information
    PART III: Farm Characteristics
      a. Biodiversity conservation
      b. Soil conservation
      c. Soil erosion
      d. Description of growing area
      e. Training on crop cultivation and safety measures
    PART IV: Farming Practices - Before Harvest
      a. Planting and fruit development - Field crops
      b. Planting and fruit development - Tree crops
      c. Planting and fruit development - Sugarcane
      d. Planting and fruit development - Cauliflower
      e. Seed treatment

    (B) HARVEST INFORMATION

    PART V: Farming Practices - After Harvest
      a. Fertilizer usage
      b. Crop protection products
      c. Harvest timing & quality per crop - Field crops
      d. Harvest timing & quality per crop - Tree crops
      e. Harvest timing & quality per crop - Sugarcane
      f. Harvest timing & quality per crop - Banana
      g. After harvest
    PART VI: Other inputs - After Harvest
      a. Input costs
      b. Abiotic stress
      c. Irrigation

    See all questionnaires in external materials tab

    Cleaning operations

    Data processing:

    Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data are entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data are verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data are submitted to the global Kynetec headquarters for processing. In the case of missing values or inconsistencies at this stage, the local Kynetec office is re-contacted to clarify and resolve issues.

    Quality assurance: Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high quality data.

    • Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.

    • Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria, and updated where needed.

    • Briefing of interviewers: Each year, local interviewers familiar with the local context of farming are thoroughly briefed so that they fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.

    • Cross-validation of the answers:
      o Kynetec captures all growers' responses through a digital data-entry tool. Various logical and consistency checks are automated in this tool (e.g. total crop size in hectares cannot be larger than farm size).
      o Kynetec cross-validates the answers of the growers in three different ways:
        1. Within the grower (check whether growers respond consistently during the interview)
        2. Across years (check whether growers respond consistently throughout the years)
        3. Within cluster (compare a grower's responses with those of others in the group)
      o All the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data are updated after verification, and all updates are tracked.

    • Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.

    • Sensitivity analysis: sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.

    • Users interested in the administrative level 1 variable in the location dataset are advised to use it with care and to cross-check it against the postal code variable.
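    Two of the automated checks described above, the within-interview logic rule (crop area cannot exceed farm size) and an across-years consistency check, can be sketched as simple rule functions. The field names and the 50 percent change threshold are hypothetical, not Kynetec's actual rules.

```python
def check_grower(record, previous=None, tolerance=0.5):
    """Return a list of consistency flags for one grower's record.

    record / previous: dicts with hypothetical keys "farm_ha" and "crop_ha"
    (hectares) for the current and previous year.
    """
    flags = []
    # Within-interview logic check: crop area cannot exceed farm size
    if record["crop_ha"] > record["farm_ha"]:
        flags.append("crop area exceeds farm size")
    # Across-years check: flag large relative jumps in reported farm size
    if previous is not None and previous["farm_ha"] > 0:
        change = abs(record["farm_ha"] - previous["farm_ha"]) / previous["farm_ha"]
        if change > tolerance:
            flags.append(f"farm size changed by {change:.0%} since last year")
    return flags

print(check_grower({"farm_ha": 100, "crop_ha": 40}, {"farm_ha": 50, "crop_ha": 30}))
# ['farm size changed by 100% since last year']
```

    Each flag would then trigger the follow-up contact with the grower described above.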

    Data appraisal

    As a result of the above-mentioned checks, irregularities in fertilizer usage data were discovered and had to be corrected:

    For the 2014 data collection wave, respondents were asked to give a total estimate of the fertilizer NPK rates applied in their fields. From 2015 onwards, the questionnaire was redesigned to be more precise and to obtain data by individual fertilizer product. The new method of measuring fertilizer inputs leads to more accurate results but also makes year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.

  16. 2023 American Community Survey: B19123 | Family Size by Cash Public...

    • data.census.gov
    ACS, 2023 American Community Survey: B19123 | Family Size by Cash Public Assistance Income or Households Receiving Food Stamps/SNAP Benefits in the Past 12 Months (ACS 1-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT1Y2023.B19123
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Authors
    ACS
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2023
    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

    Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2023 American Community Survey 1-Year Estimates.

    ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

    Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
      -         The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
      N         The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
      (X)       The estimate or margin of error is not applicable or not available.
      median-   The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
      median+   The median falls in the highest interval of an open-ended distribution (for example "250,000+").
      **        The margin of error could not be computed because there were an insufficient number of sample observations.
      ***       The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
      *****     A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
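    The margin-of-error conventions described above can be turned into standard errors, rescaled bounds, and a simple significance test between two estimates. This is a minimal sketch, assuming the 1.645 standard-normal multiplier that corresponds to a 90 percent confidence level and treating the two estimates in a comparison as independent (any covariance is ignored). The function names are hypothetical, and nonsampling error, which the tables do not represent, is not accounted for.

```python
import math

# A 90 percent MOE corresponds to a standard-normal critical value of 1.645.
Z90 = 1.645
Z95 = 1.96


def standard_error(moe90: float) -> float:
    """Convert a published 90 percent margin of error into a standard error."""
    return moe90 / Z90


def confidence_bounds(estimate: float, moe90: float) -> tuple[float, float]:
    """Lower and upper 90 percent confidence bounds: estimate -/+ MOE."""
    return estimate - moe90, estimate + moe90


def moe_at_95(moe90: float) -> float:
    """Rescale a 90 percent MOE to the 95 percent confidence level."""
    return standard_error(moe90) * Z95


def significantly_different(est1: float, moe1: float,
                            est2: float, moe2: float,
                            z_crit: float = Z90) -> bool:
    """Approximate z-test for a difference between two estimates.

    Assumes the estimates are independent, so the standard error of the
    difference is sqrt(SE1**2 + SE2**2).
    """
    se_diff = math.sqrt(standard_error(moe1) ** 2 + standard_error(moe2) ** 2)
    if se_diff == 0:
        # Both MOEs are zero (e.g. controlled estimates marked *****):
        # with no sampling error, any nonzero difference counts as real.
        return est1 != est2
    z = (est1 - est2) / se_diff
    return abs(z) > z_crit


if __name__ == "__main__":
    print(confidence_bounds(52000.0, 1645.0))
    print(significantly_different(52000.0, 1645.0, 49000.0, 1645.0))
```

    With the hypothetical values shown, an MOE of 1,645 implies a standard error of 1,000, and the two estimates 3,000 apart give |z| of about 2.12, which exceeds 1.645.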

  17.

    Annual Survey of Hours and Earnings, 1997-2023: Secure Access - Dataset -...

    • b2find.eudat.eu
    Updated May 4, 2023
    (2023). Annual Survey of Hours and Earnings, 1997-2023: Secure Access - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/7ec8ca52-b51b-54ec-88b9-04bfac54f26b
    Dataset updated
    May 4, 2023
    Description

    Abstract copyright UK Data Service and data collection copyright owner. The Annual Survey of Hours and Earnings (ASHE) is one of the largest surveys of the earnings of individuals in the UK. Data on the wages, paid hours of work, and pension arrangements of nearly one per cent of the working population are collected. Other variables relating to age, occupation and industrial classification are also available. The ASHE sample is drawn from National Insurance records for working individuals, and the survey forms are sent to their respective employers to complete. While limited in terms of personal characteristics compared with surveys such as the Labour Force Survey, the ASHE is useful not only because of its larger sample size, but also because its responses on wages and hours are considered more accurate, since they are provided by employers rather than by employees themselves. A further advantage of the ASHE is that data for the same individuals are collected year after year. It is therefore possible to construct a panel dataset of responses for each individual running back as far as 1997, and to track how occupations, earnings and working hours change for individuals over time. Furthermore, using the unique business identifiers, it is possible to combine ASHE data with data from other business surveys, such as the Annual Business Survey (UK Data Archive SN 7451). The ASHE replaced the New Earnings Survey (NES, SN 6704) in 2004. The NES was developed in the 1970s in response to the policy needs of the time, and it changed very little in its thirty-year history. ASHE datasets for the years 1997-2003 were derived by applying ASHE methodologies to NES data.
    The ASHE improves on the NES in the following ways:
      - the NES questionnaire allowed too much variation in employer responses, leading to wide variations in the data;
      - weightings have been introduced to take account of the population size (significant biases were a known problem in NES data);
      - the significant numbers of employees who change jobs between the sample selection and survey reference dates are retained in the ASHE sample, whereas these were dropped from the NES.

    Linking to other business studies
    These data contain Inter-Departmental Business Register (IDBR) reference numbers. These are anonymous but unique reference numbers assigned to business organisations. Their inclusion allows researchers to combine different business survey sources together. Researchers may consider applying for other business data to assist their research.

    Observations from Northern Ireland
    The ASHE data held by the UK Data Archive include very few observations from Northern Ireland. Users requiring access to Northern Ireland data are advised to contact the Northern Ireland Statistics and Research Agency, who administer this aspect of the survey.

    Local unit reference variable, luref
    The local unit reference variable 'luref' is generated to indicate multiple occurrences of the same local unit for disclosure checking purposes. It is inconsistent across years and is not an IDBR reference number. It should not be used to link ASHE with other business datasets.

    For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.

    Latest Edition Information
    For the twenty-fifth edition (April 2024), the data file 'ashegb_2022r_2023p_soc20_ restricted' has been updated, along with the accompanying data dictionary. An error was identified with the previous edition data file.
    The work postcode was not included for around 1,000 of the 148,000 records (across the board) in the 2022 sample. This would have a minimal impact on high-level analysis, but would affect analysis at detailed geographic levels. The 2022 published tables were not affected.

    Main Topics: The ASHE contains a small number of variables for each individual, relating to wages, hours of work, pension arrangements, and occupation and industrial classifications. There are also variables for age, gender and full/part-time status. Because the data are collected by the employer, there are also variables relating to the organisation employing the individual. These include employment size and legal status (e.g. public company). Various geography variables are included in the data files.

  18.

    Data from: IZA Evaluation Dataset Survey

    • ed.iza.org
    • dataverse.iza.org
    docx, zip
    Updated Oct 20, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Patrick Arni; Marco Caliendo; Steffen Künn; Klaus F. Zimmermann (2023). IZA Evaluation Dataset Survey [Dataset]. http://doi.org/10.15185/izadp.7971.1
    Available download formats: docx (44055), zip (16669702)
    Dataset updated
    Oct 20, 2023
    Dataset provided by
    Research Data Center of IZA (IDSC)
    Authors
    Patrick Arni; Marco Caliendo; Steffen Künn; Klaus F. Zimmermann
    License

    https://www.iza.org/wc/dataverse/IIL-1.0.pdf

    Time period covered
    2007 - 2011
    Area covered
    Germany, Federal States
    Description

    The IZA Evaluation Dataset Survey (IZA ED) was developed to obtain reliable longitudinal estimates of the impact of Active Labor Market Policies (ALMP). It is also suitable for studying the processes of job search and labor market reintegration. The data allow dynamics to be analyzed with respect to a rich set of individual and labor market characteristics. They cover the initial period of unemployment as well as long-term outcomes, for a total period of up to 3 years after unemployment entry. A longitudinal questionnaire records monthly labor market activities and their duration in detail over this period. These activities include, for example, employment, unemployment, ALMP and other training. Available information covers employment status, occupation, sector, and related earnings, hours, unemployment benefits or other transfer payments. A cross-sectional questionnaire contains all basic information, including the process of entering unemployment and demographics. The section on entry into unemployment describes job search behavior in detail, such as search intensity, search channels and the role of the Employment Agency. Moreover, reservation wages and individual expectations about leaving unemployment or participating in ALMP programs are recorded. The available demographic information covers employment status, occupation and sector, as well as citizenship and ethnic background, educational levels, number and age of children, household structure and income, family background, health status, and workplace and place-of-residence regions. The survey also provides detailed information about treatment by the unemployment insurance authorities, imposed labor market policies, benefit receipt and sanctions. In addition, the survey focuses on individual characteristics and behavior.
These individual covariates include social networks, ethnic and migration background, relations and identity, personality traits, cognitive and non-cognitive skills, life and job satisfaction, risky behavior, attitudes and preferences. The main advantages of the IZA ED are the large sample size of unemployed individuals, the accuracy of employment histories, the innovative and rich set of individual covariates, and the fact that the survey measures important characteristics shortly after entry into unemployment.

  19.

    Household Income and Expenditure Survey 2022 - Tuvalu

    • microdata.pacificdata.org
    Updated May 15, 2025
    Central Statistics Division (2025). Household Income and Expenditure Survey 2022 - Tuvalu [Dataset]. https://microdata.pacificdata.org/index.php/catalog/880
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Central Statistics Division
    Time period covered
    2022 - 2023
    Area covered
    Tuvalu
    Description

    Abstract

    The main purpose of the Household Income and Expenditure Survey (HIES) was to produce high-quality, representative national household data on income and expenditure in order to update the Consumer Price Index (CPI), improve National Accounts statistics and measure poverty within the country. These statistics are required for evidence-based policy-making to reduce poverty within the country and to monitor progress against the national strategic plan in place.

    Geographic coverage

    Urban (Funafuti) and rural areas (outer islands).

    Analysis unit

    Household and Individual.

    Universe

    Private households.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling design of the Tuvalu 2022 HIES consists of the random selection of an appropriate number of households within each stratum (urban and rural), so that HIES results can be disaggregated at the stratum level as well as at the national level. The urban stratum of Tuvalu is the island of Funafuti (as a whole), and the rest of the country (all outer islands) makes up the rural stratum. The statistical unit used for this sampling analysis is the household. The sampling procedure is based on the following steps:
      - assess the accuracy of the previous 2015 HIES in terms of per capita total expenditure (the variable of interest), and check whether the sample size at that time was appropriate and correctly distributed between the two strata;
      - update this assessment using the most recent population count to obtain the new sample size and distribution;
      - randomly select households using this most recent population count.
    The sampling frame (most recent household listing and population count) used for the update and the selection is the 2021 Tuvalu Household Listing conducted by the Central Statistics Division of Tuvalu. At the national level, the 2015 Tuvalu HIES reported good accuracy for per capita total expenditure (less than 5%), but the results disaggregated by stratum showed lower quality in urban Tuvalu. The 2021 household listing provides the most recent distribution of households across all the islands of Tuvalu. The update step consists of re-assessing the accuracy of the 2015 HIES using this recent household count and obtaining the appropriate RSE by changing the sample size. Because of budget constraints, the total sample size could not be increased, so the only parameter that could be modified was the distribution of the sample across the strata.

    Stratum    Sample size   Households in 2021 listing   2015 per capita mean total expenditure (AUD)   Relative Standard Error (RSE)
    Urban      350           1,010                        3,190                                          5.1%
    Rural      310             835                        2,780                                          4.1%
    National   660           1,845                        3,000                                          3.3%
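    The relative standard error links a standard error to its mean (RSE = SE / mean), and the design step of obtaining "the appropriate RSE by changing the sample size" can be illustrated with the textbook formula for simple random sampling without replacement. This is a minimal sketch under stated assumptions: each listed RSE is treated as relative to the 2015 mean in the same row, the unit-level coefficient of variation `cv` is a hypothetical input, and clustering or design effects are ignored. The record does not say which formula the Central Statistics Division used, and the function names are hypothetical.

```python
import math


def implied_standard_error(mean: float, rse: float) -> float:
    """Invert RSE = SE / mean to recover the standard error."""
    return rse * mean


def srs_rse(cv: float, n: int, N: int) -> float:
    """RSE of a sample mean under simple random sampling without replacement.

    cv: assumed unit-level coefficient of variation of the variable of interest.
    n, N: sample size and number of households in the stratum frame.
    The factor (1 - n / N) is the finite population correction.
    """
    return cv * math.sqrt((1 - n / N) / n)


def required_sample_size(cv: float, target_rse: float, N: int) -> int:
    """Smallest n whose SRS RSE does not exceed target_rse.

    Solves target_rse**2 = cv**2 * (1/n - 1/N) for n and rounds up.
    """
    n = cv ** 2 / (target_rse ** 2 + cv ** 2 / N)
    return math.ceil(n)


if __name__ == "__main__":
    # Implied standard errors (AUD) from the stratum means and RSEs quoted above.
    for stratum, mean, rse in [("Urban", 3190, 0.051),
                               ("Rural", 2780, 0.041),
                               ("National", 3000, 0.033)]:
        print(f"{stratum}: SE ~ {implied_standard_error(mean, rse):.1f} AUD")
```

    For example, the urban figures imply a standard error of about 162.7 AUD. Because the total sample was fixed at 660, `srs_rse` could be evaluated for alternative urban/rural splits of that total, given assumed values of `cv` for each stratum.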

    The new sample design shifts the distribution of the sample toward urban Funafuti, mainly because of:
      - the low quality of the 2015 HIES results for the urban stratum, and
      - an increase of more than 15% in the number of households in urban Tuvalu between 2015 and 2020.

    The households are selected by simple random sampling within each stratum:
      - the 350 households in Funafuti are selected with the same probability of selection across all villages of the island;
      - the 310 households in rural Tuvalu are allocated proportionally to the size of each rural island. This proportional allocation of the sample across the rural islands gives the best accuracy at the stratum level.
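    Under simple random selection within a stratum, each sampled household has selection probability n_h / N_h, so its base (design) weight is N_h / n_h. This is a minimal sketch using the stratum sample sizes and 2021 listing counts from the sample design; it assumes that proportional allocation across the rural islands gives every rural household approximately the same selection probability (island samples are rounded to whole households), and it does not model the non-response or replacement adjustments described in the record. The dictionary layout and function names are hypothetical.

```python
# Stratum sizes from the sample design:
# N = households in the 2021 listing, n = households selected.
STRATA = {
    "Urban": {"N": 1010, "n": 350},
    "Rural": {"N": 835, "n": 310},
}


def base_weights(strata: dict[str, dict[str, int]]) -> dict[str, float]:
    """Base (design) weight N_h / n_h for each stratum h.

    Under simple random sampling within a stratum, every household has
    selection probability n_h / N_h, so each sampled household stands
    for N_h / n_h households in the frame.
    """
    return {h: s["N"] / s["n"] for h, s in strata.items()}


def estimated_households(strata: dict[str, dict[str, int]]) -> float:
    """Weighted count of sampled households; reproduces the frame total."""
    weights = base_weights(strata)
    return sum(weights[h] * s["n"] for h, s in strata.items())


if __name__ == "__main__":
    for stratum, w in base_weights(STRATA).items():
        print(f"{stratum}: weight {w:.3f}")
    print(f"Estimated households: {estimated_households(STRATA):.0f}")
```

    Running it prints weights of 2.886 (urban) and 2.694 (rural), and the weighted count reproduces the 1,845 households in the listing. The slightly higher urban weight reflects the fact that 350/1,010 is a smaller sampling fraction than 310/835.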

    Distribution of sample across strata:
      Urban:
        Funafuti      350
      Rural:
        Nanumea        42
        Nanumaga       37
        Niutao         46
        Nui            39
        Vaitupu        75
        Nukufetau      45
        Nukulaelae     23
        Niulakita       4

    Non-response is a problem in surveys, and it is crucial that the field teams interview the selected households (the location on the map and the name of the household head are used to help identify the selected households). During the first visit, interviewers must do their best to convince the household head to participate in the survey (and obtain his or her approval to proceed with the interview). In the field, the first visit may result in:
      I. a refusal: the household head shows no interest in the survey and is reluctant to participate; or
      II. an empty house: household members are away at the time of the visit.

    (I) Refusal: if the interviewer cannot convince the household head to participate, the interviewer must liaise with the survey management, and the supervisor will help in the discussion to convince the household head to respond. In this case, it is important to mention that all responses are kept confidential and to stress the importance of the survey for the benefit of the population of Tuvalu.
    (II) Empty house: the interviewer must investigate (checking with neighbours) whether the house is still inhabited by the family:
      - if it is not, the dwelling is vacant and the replacement procedure must be activated;
      - if the dwelling is still occupied, the interviewer must come back later the same day, or the next day at a different time.

    The replacement procedure is activated only in extreme cases of persistent refusal or an empty house (household members away during the whole collection period). It replaces the selected household with the closest available neighbour.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The 2022 Tuvalu Household Income and Expenditure Survey (HIES) questionnaire was developed in English and follows the Pacific Standard HIES questionnaire structure. It is administered by CAPI using Survey Solutions, and the diary is no longer part of the form: all transactions (food, non-food, home production and gifts) are collected through separate recall sections during the same visit. The traditional 14-day diary is no longer recommended in the region. This way of implementing the HIES offers several valuable advantages, such as cost savings, better data quality, and less time needed for data processing and reporting. The 2022 Tuvalu HIES was integrated directly into the census through a Long Form Census (LFC). The LFC was an experiment led by the World Bank and the Pacific Community to combine a census and a HIES collection. All households were enumerated as usual during the 2022 Census, and the households selected to participate in the HIES were then asked the HIES questions.

    Below is a list of all modules in this questionnaire:
      - Household ID
      - Demographic characteristics
      - Education
      - Health
      - Functional difficulties
      - Communication
      - Alcohol
      - Other individual expenses
      - Labour force
      - Fisheries
      - Handicraft and home-processed food
      - Dwelling characteristics
      - Assets
      - Home maintenance
      - Vehicles
      - International trips
      - Domestic trips
      - Household services
      - Financial support
      - Other household expenditure
      - Ceremonies
      - Remittances
      - Food insecurity
      - Financial inclusion
      - Livestock & aquaculture
      - Agriculture parcel
      - Agriculture vegetables
      - Agriculture rootcrops
      - Agriculture fruits

    The survey questionnaire can be found in this documentation.

    Cleaning operations

    The data were edited, cleaned and imputed using Stata.

    Response rate

    A total of 662 households were in the original sample selection. Of these, 592 were contacted and 528 accepted the interview. The number of valid households was 464, or 70% of the households selected before replacement. After replacement, a further 54 households were considered valid, bringing the final completion rate to 78% (73% in urban and 85% in rural areas).
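    The reported completion rates can be checked with simple arithmetic. This is a minimal sketch, assuming that the completion rates use the 662 originally selected households as their denominator (the record does not name the denominators, but this choice reproduces the reported 70% and 78%). The contact and acceptance rates are derived here for illustration only and are not reported in the record; the urban and rural rates (73% and 85%) cannot be reproduced because stratum-level counts are not given. The variable names are hypothetical.

```python
# Counts reported for the Tuvalu 2022 HIES response rate.
SELECTED = 662                  # households in the original selection
CONTACTED = 592                 # households contacted
ACCEPTED = 528                  # households that accepted the interview
VALID_BEFORE_REPLACEMENT = 464  # valid interviews from the original selection
VALID_REPLACEMENTS = 54         # additional valid interviews after replacement


def pct(numerator: int, denominator: int) -> float:
    """Express numerator as a percentage of denominator."""
    return 100 * numerator / denominator


# Assumption: completion rates are relative to the originally selected
# households; acceptance is taken relative to contacted households.
rates = {
    "contact": pct(CONTACTED, SELECTED),
    "acceptance": pct(ACCEPTED, CONTACTED),
    "valid_before_replacement": pct(VALID_BEFORE_REPLACEMENT, SELECTED),
    "final_completion": pct(VALID_BEFORE_REPLACEMENT + VALID_REPLACEMENTS, SELECTED),
}

if __name__ == "__main__":
    for name, value in rates.items():
        print(f"{name}: {value:.1f}%")
```

    Running it prints 70.1% for valid households before replacement and 78.2% for the final completion rate, which round to the reported 70% and 78%.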

  20.

    Data from: Irreproducible text-book 'knowledge': the effects of color bands...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Feb 16, 2018
    Daiping Wang; Wolfgang Forstmeier; Malika Ihle; Mehdi Khadraoui; Sofia Jerónimo; Katrin Martin; Bart Kempenaers (2018). Irreproducible text-book 'knowledge': the effects of color bands on zebra finch fitness [Dataset]. http://doi.org/10.5061/dryad.cc145b6
    Available download formats: zip
    Dataset updated
    Feb 16, 2018
    Dataset provided by
    University of Florida
    Max Planck Institute for Ornithology
    Authors
    Daiping Wang; Wolfgang Forstmeier; Malika Ihle; Mehdi Khadraoui; Sofia Jerónimo; Katrin Martin; Bart Kempenaers
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Many fields of science currently experience a heated debate about the extent of publication bias against null-findings. Here, we show a case where putatively well-established text-book knowledge cannot be confirmed. Across four decades, zebra finch (Taeniopygia guttata) studies have reported effects of bands of certain colors on male or female attractiveness and further on behavior, physiology, life-history and fitness. Only 8 out of 39 publications presented exclusively null-findings. Here, we analyze the results of eight experiments in which we quantified the fitness of 730 color-banded individuals from four captive populations (two domesticated and two recently wild-derived). This sample size exceeds the combined sample size of all 23 publications that clearly support the “color-band effect” hypothesis. In our populations, band color explained no variance in fitness and there were no context- or population-specific band color effects. Analysis of unpublished data from three other laboratories strengthens our null finding. Finally, a meta-analysis of previously published results is indicative of selective reporting and suggests that the effect size approaches zero when sample size is large. We argue that our field would benefit from more effective means to counter confirmation bias and publication bias.

