100+ datasets found

Supplementary material from "Visual comparison of two data sets: Do people...
figshare.com
xlsx
Updated Mar 14, 2017
Cite
Robin Kramer; Caitlin Telfer; Alice Towler (2017). Supplementary material from "Visual comparison of two data sets: Do people use the means and the variability?" [Dataset]. http://doi.org/10.6084/m9.figshare.4751095.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.4751095.v1
Dataset updated
Mar 14, 2017
Dataset provided by
Figshare (http://figshare.com/)
Authors
Robin Kramer; Caitlin Telfer; Alice Towler
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In our everyday lives, we are required to make decisions based upon our statistical intuitions. Often, these involve the comparison of two groups, such as luxury versus family cars and their suitability. Research has shown that the mean difference affects judgements where two sets of data are compared, but the variability of the data has only a minor influence, if any at all. However, prior research has tended to present raw data as simple lists of values. Here, we investigated whether displaying data visually, in the form of parallel dot plots, would lead viewers to incorporate variability information. In Experiment 1, we asked a large sample of people to compare two fictional groups (children who drank ‘Brain Juice’ versus water) in a one-shot design, where only a single comparison was made. Our results confirmed that only the mean difference between the groups predicted subsequent judgements of how much they differed, in line with previous work using lists of numbers. In Experiment 2, we asked each participant to make multiple comparisons, with both the mean difference and the pooled standard deviation varying across data sets they were shown. Here, we found that both sources of information were correctly incorporated when making responses. Taken together, we suggest that increasing the salience of variability information, through manipulating this factor across items seen, encourages viewers to consider this in their judgements. Such findings may have useful applications for best practices when teaching difficult concepts like sampling variation.
A 2D Design Space defined with non-linear equations using different sampling...
entrepot.recherche.data.gouv.fr
tsv, txt, zip
Updated Apr 22, 2025
Cite
Manuel LOPEZ CABRERA; Wahb ZOUHRI; Sandra ZIMMER-CHEVRET; Jean-Yves DANTAN (2025). A 2D Design Space defined with non-linear equations using different sampling methods with different number of data points [Dataset]. http://doi.org/10.57745/MYIYZU
Available download formats: zip (449491), zip (44110), tsv (397), txt (2945), zip (4223266)
Unique identifier
https://doi.org/10.57745/MYIYZU
Dataset updated
Apr 22, 2025
Dataset provided by
Recherche Data Gouv
Authors
Manuel LOPEZ CABRERA; Wahb ZOUHRI; Sandra ZIMMER-CHEVRET; Jean-Yves DANTAN
License
https://spdx.org/licenses/etalab-2.0.html
Dataset funded by
Agence nationale de la recherche
Description
A 2D design space with two parameters was created using different sampling methods: grid, Latin hypercube sampling (LHS), random, and antithetic versions of the last two. The numbers of sample points used to cover the study space are 100, 225, 625, 1225, and 2500. The lower bound for both parameters is 0.2 and the upper bound is 1. The design space is based on a geometry characterised by non-linear equations and non-convexity. The synthetic tabular datasets contain two parameters and pose a binary classification problem, where points are “Good” (denoted “1”) if they lie in the interior of the design space and “Bad” (denoted “0”) if they do not. The datasets were used to extract two extra datasets to train, evaluate, and compare classification models coupled with active learning strategies. The two extra datasets, extracted from the datasets containing the parameter values and the associated target, are: (i) the indexes of the initial labelled samples and (ii) the indexes of the initial training samples.
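The sampling schemes named in this description can be sketched in a few lines. This is a minimal illustration, not the authors' generator: the function names, the seed, and the reading of "antithetic" as mirroring each point about the centre of the range are all assumptions. Only the bounds (0.2 and 1) and the sample size of 100 come from the description.

```python
import random

LOW, HIGH = 0.2, 1.0  # parameter bounds stated in the description


def scale(u):
    """Map a value from the unit interval onto [LOW, HIGH]."""
    return LOW + u * (HIGH - LOW)


def grid(n_per_axis):
    """Regular grid with n_per_axis points along each of the two axes."""
    steps = [scale(i / (n_per_axis - 1)) for i in range(n_per_axis)]
    return [(x, y) for x in steps for y in steps]


def latin_hypercube(n, rng):
    """Latin hypercube sample: each axis is cut into n strata, one point per stratum."""
    columns = []
    for _ in range(2):
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)  # decouple the two axes
        columns.append(strata)
    return [(scale(a), scale(b)) for a, b in zip(*columns)]


def antithetic(points):
    """Assumed reading of 'antithetic': reflect each point about the range centre."""
    return [(LOW + HIGH - x, LOW + HIGH - y) for x, y in points]


rng = random.Random(0)  # hypothetical seed, for repeatability only
grid_points = grid(10)  # 10 x 10 = 100 points
lhs_points = latin_hypercube(100, rng)
mirrored = antithetic(lhs_points)
```

The listed sample sizes (100, 225, 625, 1225, 2500) are all perfect squares, so each can be laid out as a square grid, for example `grid(15)` for 225 points.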
Estimating the Size of Populations through a Household Survey 2011 - Rwanda
datacatalog.ihsn.org
microdata.worldbank.org
Updated Oct 10, 2017
Cite
Rwanda Biomedical Center/ Institute of HIV/AIDS, Disease Prevention and Control Department (RBC/IHDPC) (2017). Estimating the Size of Populations through a Household Survey 2011 - Rwanda [Dataset]. https://datacatalog.ihsn.org/catalog/7192
Dataset updated
Oct 10, 2017
Dataset authored and provided by
Rwanda Biomedical Center/ Institute of HIV/AIDS, Disease Prevention and Control Department (RBC/IHDPC)
Time period covered
2011
Area covered
Rwanda
Description
Abstract

The Estimating the Size of Populations through a Household Survey (ESPHS) sought to assess the feasibility of the network scale-up and proxy respondent methods for estimating the sizes of key populations at higher risk of HIV infection and to compare the results to other estimates of the population sizes. The study was undertaken based on the assumption that if these methods proved to be feasible with a reasonable amount of data collection for making adjustments, countries would be able to add this module to their standard household survey to produce size estimates for their key populations at higher risk of HIV infection. This would facilitate better programmatic responses for prevention and caring for people living with HIV and would improve the understanding of how HIV is being transmitted in the country.

The specific objectives of the ESPHS were: 1. To assess the feasibility of the network scale-up method for estimating the sizes of key populations at higher risk of HIV infection in a Sub-Saharan African context; 2. To assess the feasibility of the proxy respondent method for estimating the sizes of key populations at higher risk of HIV infection in a Sub-Saharan African context; 3. To estimate the population size of MSM, FSW, IDU, and clients of sex workers in Rwanda at a national level; 4. To compare the estimates of the sizes of key populations at higher risk for HIV produced by the network scale-up and proxy respondent methods with estimates produced using other methods; and 5. To collect data to be used in scientific publications comparing the use of the network scale-up method in different national and cultural environments.

Geographic coverage

National

Analysis unit

Household

Individual

Sampling procedure

The Estimating the Size of Populations through a Household Survey (ESPHS) used a two-stage sample design, implemented in a representative sample of 2,125 households selected nationwide in which all women and men age 15 years and above were eligible for an individual interview. The sampling frame used was the preparatory frame for the Rwanda Population and Housing Census (RPHC), which was conducted in 2012; it was provided by the National Institute of Statistics of Rwanda (NISR).

The sampling frame was a complete list of natural villages covering the whole country (14,837 villages). Two strata were defined: the city of Kigali and the rest of the country. One hundred and thirty Primary Sampling Units (PSU) were selected from the sampling frame (35 in Kigali and 95 in the other stratum). To reduce clustering effect, only 20 households were selected per cluster in Kigali and 15 in the other clusters. As a result, 33 percent of the households in the sample were located in Kigali.
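
The allocation described above can be checked with a short calculation. The sketch below uses only the figures given in the text (35 and 95 PSUs, 20 and 15 households per cluster); the variable names are illustrative.

```python
# Primary sampling units and households per cluster, as stated in the text.
psus = {"Kigali": 35, "Rest of country": 95}
households_per_psu = {"Kigali": 20, "Rest of country": 15}

# Households selected in each stratum and overall.
households = {s: psus[s] * households_per_psu[s] for s in psus}
total_households = sum(households.values())
kigali_share = households["Kigali"] / total_households

print(households)                # {'Kigali': 700, 'Rest of country': 1425}
print(total_households)          # 2125
print(f"{kigali_share:.1%}")     # 32.9%
```

The total of 2,125 matches the sample size reported for the survey, and the Kigali share rounds to the stated 33 percent.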

The list of households in each cluster was updated upon arrival of the survey team in the cluster. Once the listing had been updated, a number was assigned to each existing household in the cluster. The supervisor then identified the households to be interviewed in the survey by using a table in which the households were randomly pre-selected. This table also provided the list of households pre-selected for each of the two different definitions of what it means "to know" someone.

For further details on sample design and implementation, see Appendix A of the final report.

Mode of data collection

Face-to-face [f2f]

Research instrument

The Estimating the Size of Populations through a Household Survey (ESPHS) used two types of questionnaires: a household questionnaire and an individual questionnaire. The same individual questionnaire was used to interview both women and men. In addition, two versions of the individual questionnaire were developed, using two different definitions of what it means “to know” someone. Each version of the individual questionnaire was used in half of the selected households.

Cleaning operations

The processing of the ESPHS data began shortly after the fieldwork commenced. Completed questionnaires were returned periodically from the field to the SPH office in Kigali, where they were entered and checked for consistency by data processing personnel who were specially trained for this task. Data were entered using CSPro, a programme specially developed for use in DHS surveys. All data were entered twice (100 percent verification). The concurrent processing of the data was a distinct advantage for data quality, because the School of Public Health had the opportunity to advise field teams of problems detected during data entry. The data entry and editing phase of the survey was completed in late August 2011.

Response rate

A total of 2,125 households were selected in the sample, of which 2,120 were actually occupied at the time of the interview. The number of occupied households successfully interviewed was 2,102, yielding a household response rate of 99 percent.

From the households interviewed, 2,629 women were found to be eligible and 2,567 were interviewed, giving a response rate of 98 percent. Interviews with men covered 2,102 of the eligible 2,149 men, yielding a response rate of 98 percent. The response rates do not significantly vary by type of questionnaire or residence.
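
The response rates quoted above follow directly from the reported counts. A minimal sketch, with illustrative names and the counts taken from the text:

```python
# (interviewed, eligible) counts as reported in the text.
counts = {
    "households": (2102, 2120),
    "women": (2567, 2629),
    "men": (2102, 2149),
}

# Response rate = interviewed / eligible, expressed as a whole percentage.
response_rates = {
    group: round(100 * interviewed / eligible)
    for group, (interviewed, eligible) in counts.items()
}

print(response_rates)  # {'households': 99, 'women': 98, 'men': 98}
```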

Sampling error estimates

The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made to minimize this type of error during the implementation of the Rwanda ESPHS 2011, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the ESPHS 2011 is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.

A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
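
As a rough illustration of the plus-or-minus two standard errors rule, the sketch below computes an interval for a proportion. It assumes simple random sampling, which the ESPHS did not use, so it ignores the design effect of the stratified multi-stage sample. The proportion and sample size are hypothetical values, not survey results.

```python
import math


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling (an assumption)."""
    return math.sqrt(p * (1 - p) / n)


def two_se_interval(estimate, se):
    """Interval of plus or minus two standard errors around an estimate."""
    return estimate - 2 * se, estimate + 2 * se


p_hat, n = 0.30, 1000  # hypothetical proportion and sample size
se = srs_standard_error(p_hat, n)
lower, upper = two_se_interval(p_hat, se)
print(f"SE = {se:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

For a clustered design the true standard error is generally larger than this simple-random-sampling value, which is why the report relies on Taylor linearization instead.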

If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the ESPHS 2011 sample is the result of a multi-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the ESPHS 2011 is a SAS program. This program uses the Taylor linearization method for variance estimation for survey estimates that are means or proportions.

A more detailed description of the estimates of sampling errors is presented in Appendix B of the survey report.
Data from: Comparison of seven DNA metabarcoding sampling methods to assess...
search.dataone.org
data.niaid.nih.gov
+1 more
Updated Feb 26, 2025
Cite
Neil Paprocki; Shannon Blair; Courtney Conway; Jennifer Adams; Stacey Nerkowski; Jeff Kidd; Lisette Waits (2025). Comparison of seven DNA metabarcoding sampling methods to assess diet in a large avian predator [Dataset]. http://doi.org/10.5061/dryad.rv15dv4hh
Unique identifier
https://doi.org/10.5061/dryad.rv15dv4hh
Dataset updated
Feb 26, 2025
Dataset provided by
Dryad Digital Repository
Authors
Neil Paprocki; Shannon Blair; Courtney Conway; Jennifer Adams; Stacey Nerkowski; Jeff Kidd; Lisette Waits
Description
DNA metabarcoding is a rapidly advancing tool for diet assessment in wildlife ecology. Studies have used a variety of field collection methods to evaluate diet; however, there is a pressing need to understand the differences among sampling methods and the downstream inferential consequences they may have on our ability to document diet accurately and efficiently. We evaluated seven DNA metabarcoding sampling methods to assess the diet of a large avian predator: Buteo lagopus (rough-legged hawk). We collected beak swabs, talon swabs, cheek (buccal) swabs, cloacal swabs, and cloacal loops from captured birds, and collected fecal samples from both captured and uncaptured birds. We described and compared variation in prey recovery within and among the seven sampling methods and identified appropriate analytical methods to compare diet among individuals sampled via different methods. Beak and talon swabs produced the highest prey detection rates, yielded the greatest prey richness per sample,...
We collected diet samples from Buteo lagopus throughout their North American nonbreeding range in the conterminous United States and Canada during the nonbreeding season (November – March) from 2020 – 2023. We sampled diet from captured birds via: 1) talon swabs, 2) beak swabs, 3) cheek (i.e., buccal cavity) swabs, 4) cloacal swabs, 5) cloacal fecal loops, and 6) fecal samples. We also collected 7) fecal samples from uncaptured free-ranging birds. In total, we collected 592 samples from 189 individuals including 113 captured and 76 uncaptured Buteo lagopus. All samples were extracted in a laboratory dedicated to low quality and quantity DNA samples. No forms of high-quality DNA were handled or stored in this laboratory. We collected DNA onto three “substrates” as described above: nylon bristle swabs (beak and talon samples), foam swabs (cheek and cloacal samples) and homogenized feces (fecal samples, fecal loops).
Each substrate had its own DNA extraction protocol, but the protoco...
# Comparison of seven DNA metabarcoding sampling methods to assess diet in a large avian predator

https://doi.org/10.5061/dryad.rv15dv4hh

Description of the data and file structure

Diet metabarcoding samples were collected using bristle and foam swabs from Rough-Legged Hawk beaks, talons, cheeks, cloacas and feces. DNA from swabs was extracted in a low-input (clean) DNA laboratory in Moscow, ID. Vertebrates were targeted using the V5 region of the 12S rRNA gene using primers from Riaz et al. (2011). Libraries were prepped using a two-step protocol with the locus PCR in the first round and an i5/i7 dual-index PCR in the second round. Samples were sequenced in duplicate with rolling positive replicates between libraries (i.e., one sample from a previous library included in duplicate in a subsequent library to assess inter-library concordance), a duplicate off-target positive control, and a duplicate no-template control in each library. Librarie...
2010-2014 ACS Children by Parental Labor Force Participation Variables -...
hub.arcgis.com
mapdirect-fdep.opendata.arcgis.com
Updated Nov 18, 2020
Cite
Esri (2020). 2010-2014 ACS Children by Parental Labor Force Participation Variables - Boundaries [Dataset]. https://hub.arcgis.com/maps/feea68d7d0c4457aa7adfa10c180802a
Dataset updated
Nov 18, 2020
Dataset authored and provided by
Esri (http://esri.com/)
Description
This layer contains 2010-2014 American Community Survey (ACS) 5-year data, and contains estimates and margins of error. The layer shows children by age group by parents' labor force participation. This is shown by tract, county, and state boundaries. There are also additional calculated attributes related to this topic, which can be mapped or used within analysis.
This layer is symbolized to show the percent of children with no available (residential) parent in the labor force. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right.
Vintage: 2010-2014
ACS Table(s): B23008
Data downloaded from: Census Bureau's API for American Community Survey
Date of API call: November 11, 2020
National Figures: data.census.gov
The United States Census Bureau's American Community Survey (ACS): About the Survey, Geography & ACS, Technical Documentation, News & Updates
This ready-to-use layer can be used within ArcGIS Pro, ArcGIS Online, its configurable apps, dashboards, Story Maps, custom apps, and mobile apps. Data can also be exported for offline workflows. For more information about ACS layers, visit the FAQ. Please cite the Census and ACS when using this data.
Data Note from the Census:
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.
Data Processing Notes:
This layer has associated layers containing the most recent ACS data available by the U.S. Census Bureau. Click here to learn more about ACS data releases and click here for the associated boundaries layer. The reason this data is 5+ years different from the most recent vintage is due to the overlapping of survey years. It is recommended by the U.S. Census Bureau to compare non-overlapping datasets.
Boundaries come from the US Census TIGER geodatabases. Boundary vintage (2014) appropriately matches the data vintage as specified by the Census. These are Census boundaries with water and/or coastlines clipped for cartographic purposes. For census tracts, the water cutouts are derived from a subset of the 2010 AWATER (Area Water) boundaries offered by TIGER. For state and county boundaries, the water and coastlines are derived from the coastlines of the 500k TIGER Cartographic Boundary Shapefiles. The original AWATER and ALAND fields are still available as attributes within the data table (units are square meters).
The States layer contains 52 records: all US states, Washington D.C., and Puerto Rico.
Census tracts with no population that occur in areas of water, such as oceans, are removed from this data service (Census Tracts beginning with 99).
Percentages and derived counts, and associated margins of error, are calculated values (that can be identified by the "_calc_" stub in the field name), and abide by the specifications defined by the American Community Survey.
Field alias names were created based on the Table Shells file available from the American Community Survey Summary File Documentation page.
Negative values (e.g., -4444...) have been set to null, with the exception of -5555... which has been set to zero. These negative values exist in the raw API data to indicate the following situations:
The margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
Either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
The median falls in the lowest interval of an open-ended distribution, or in the upper interval of an open-ended distribution. A statistical test is not appropriate.
The estimate is controlled. A statistical test for sampling variability is not appropriate.
The data for this geographic area cannot be displayed because the number of sample cases is too small.
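The Census data note in this description reports margins of error at the 90 percent level. A minimal sketch of how such a margin can be turned into confidence bounds and a standard error is shown here. It uses the usual normal critical values (1.645 for 90 percent, 1.96 for 95 percent); the estimate and margin are hypothetical numbers, not values from this layer, and the function names are illustrative.

```python
Z_90 = 1.645  # normal critical value for a 90 percent interval
Z_95 = 1.96   # normal critical value for a 95 percent interval


def confidence_bounds(estimate, moe):
    """Lower and upper bounds: estimate minus and plus the margin of error."""
    return estimate - moe, estimate + moe


def standard_error_from_moe90(moe90):
    """Recover the standard error from a 90 percent margin of error."""
    return moe90 / Z_90


estimate, moe90 = 1200, 150  # hypothetical tract-level values
lower, upper = confidence_bounds(estimate, moe90)
se = standard_error_from_moe90(moe90)
moe95 = Z_95 * se  # the same uncertainty expressed at the 95 percent level

print(lower, upper)  # 1050 1350
print(round(se, 1), round(moe95, 1))
```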
Data_Sheet_1_The Power of Microbiome Studies: Some Considerations on Which...
frontiersin.figshare.com
docx
Updated Jun 13, 2023
Cite
Jannigje Gerdien Kers; Edoardo Saccenti (2023). Data_Sheet_1_The Power of Microbiome Studies: Some Considerations on Which Alpha and Beta Metrics to Use and How to Report Results.docx [Dataset]. http://doi.org/10.3389/fmicb.2021.796025.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fmicb.2021.796025.s001
Dataset updated
Jun 13, 2023
Dataset provided by
Frontiers
Authors
Jannigje Gerdien Kers; Edoardo Saccenti
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background: Since sequencing techniques have become less expensive, larger sample sizes are applicable for microbiota studies. The aim of this study is to show how, and to what extent, different diversity metrics and different compositions of the microbiota influence the needed sample size to observe dissimilar groups. Empirical 16S rRNA amplicon sequence data obtained from animal experiments, observational human data, and simulated data were used to perform retrospective power calculations. A wide variation of alpha diversity and beta diversity metrics were used to compare the different microbiota datasets and the effect on the sample size.
Results: Our data showed that beta diversity metrics are the most sensitive to observe differences as compared with alpha diversity metrics. The structure of the data influenced which alpha metrics are the most sensitive. Regarding beta diversity, the Bray–Curtis metric is in general the most sensitive to observe differences between groups, resulting in lower sample size and potential publication bias.
Conclusion: We recommend performing power calculations and using multiple diversity metrics as an outcome measure. To improve microbiota studies, awareness needs to be raised on the sensitivity and bias for microbiota research outcomes created by the used metrics rather than biological differences. We have seen that different alpha and beta diversity metrics lead to different study power: because of this, one could be naturally tempted to try all possible metrics until one or more are found that give a statistically significant test result, i.e., p-value < α. This way of proceeding is one of the many forms of the so-called p-value hacking. To this end, in our opinion, the only way to protect ourselves from (the temptation of) p-hacking would be to publish a statistical plan before experiments are initiated, describing the outcomes of interest and the corresponding statistical analyses to be performed.
Case for omitting tied observations in the two-sample t-test and the...
plos.figshare.com
tiff
Updated Jun 4, 2023
Cite
Monnie McGee (2023). Case for omitting tied observations in the two-sample t-test and the Wilcoxon-Mann-Whitney Test [Dataset]. http://doi.org/10.1371/journal.pone.0200837
Available download formats: tiff
Unique identifier
https://doi.org/10.1371/journal.pone.0200837
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS ONE
Authors
Monnie McGee
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
When the distributional assumptions for a t-test are not met, the default position of many analysts is to resort to a rank-based test, such as the Wilcoxon-Mann-Whitney Test, to compare the difference in means between two samples. The Wilcoxon-Mann-Whitney Test presents no danger of tied observations when the observations in the data are continuous. However, in practice, observations are discretized due to various logical reasons, or the data are ordinal in nature. When ranks are tied, most textbooks recommend using mid-ranks to replace the tied ranks, a practice that affects the distribution of the Wilcoxon-Mann-Whitney Test under the null hypothesis. Other methods for breaking ties have also been proposed. In this study, we examine four tie-breaking methods (average-scores, mid-ranks, jittering, and omission) for their effects on Type I and Type II error of the Wilcoxon-Mann-Whitney Test and the two-sample t-test for various combinations of sample sizes, underlying population distributions, and percentages of tied observations. We use the results to determine the maximum percentage of ties for which the power and size are seriously affected, and for which method of tie-breaking results in the best Type I and Type II error properties. Not surprisingly, the underlying population distribution of the data has less of an effect on the Wilcoxon-Mann-Whitney Test than on the t-test. Surprisingly, we find that the jittering and omission methods tend to hold Type I error at the nominal level, even for small sample sizes, with no substantial sacrifice in terms of Type II error.
Furthermore, the t-test and the Wilcoxon-Mann-Whitney Test are equally affected by ties in terms of Type I and Type II error; therefore, we recommend omitting tied observations when they occur for both the two-sample t-test and the Wilcoxon-Mann-Whitney Test, due to the bias in Type I error that is created when tied observations are left in the data, in the case of the t-test, or adjusted using mid-ranks or average-scores, in the case of the Wilcoxon-Mann-Whitney Test.
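To make the mid-rank idea concrete, here is a minimal sketch using the standard textbook definition, in which tied values share the average of the rank positions they occupy. It may differ in detail from the procedure used in the study; the function names and the sample values are illustrative only.

```python
def midranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a, computed from the mid-ranks of the pooled data."""
    pooled_ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(pooled_ranks[: len(sample_a)])
    n_a = len(sample_a)
    return rank_sum_a - n_a * (n_a + 1) / 2


ranks = midranks([3, 1, 2, 2])                # [4.0, 1.0, 2.5, 2.5]
u_a = mann_whitney_u([1, 2, 2], [2, 3, 4])    # 1.0
```

In the second call, the three pooled 2s occupy ranks 2 to 4 and each receives rank 3, so the rank sum for the first sample is 7 and U is 1.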
Data from: A comparison of density estimation methods for monitoring marked...
data.niaid.nih.gov
datadryad.org
zip
Updated Oct 11, 2022
Cite
Joshua Twining; Ben Augustine; David Tosh; Denise O'Meara; Claire McFarlane; Marina Reyne; Sarah Helyar; Ian Montgomery (2022). A comparison of density estimation methods for monitoring marked and unmarked animal populations [Dataset]. http://doi.org/10.5061/dryad.xwdbrv1g2
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.xwdbrv1g2
Dataset updated
Oct 11, 2022
Dataset provided by
National Museums Northern Ireland
Queen's University Belfast
Waterford Institute of Technology
Cornell University
Authors
Joshua Twining; Ben Augustine; David Tosh; Denise O'Meara; Claire McFarlane; Marina Reyne; Sarah Helyar; Ian Montgomery
License
https://spdx.org/licenses/CC0-1.0.html
Description
These data were generated to compare different methods of estimating population density from marked and unmarked animal populations. We compare conventional live trapping with two more modern, non-invasive field methods of population estimation: genetic fingerprinting from hair-tube sampling and camera trapping for the European pine marten (Martes martes). We used arrays of camera traps, live traps, and hair tubes to collect the relevant data in the Ring of Gullion in Northern Ireland. We apply marked spatial capture-recapture models to the genetic and live trapping data where individuals were identifiable, and unmarked spatial capture-recapture (uSCR), distance sampling (CT-DS), and random encounter models (REM) to the camera trap data where individual ID was not possible. All five approaches produced plausible and relatively consistent point estimates (0.41 – 0.99 animals per km²), despite differences in precision, cost, and effort being apparent. In addition to the data, we provide novel code for running unmarked spatial capture-recapture (uSCR) and random encounter models (REM) to the camera trap data where individual ID was not possible.
Methods
All fieldwork was carried out in the Ring of Gullion, Northern Ireland, UK.
Cameras
Thirty Bushnell HD Trophy Cam 8MP camera traps (model number: 119577) with 8GB SD cards were deployed during June and July 2019. At the end of the survey period, camera traps were checked and for each detection (the first image in a trigger sequence of an individual pine marten) distance to animal (m) and angle of detection (°) were measured in situ.
Noninvasive genetic sampling
Twenty hair tubes, based on those developed by Mullins et al. (2010), were deployed across the study site between June and July 2019. Hair tubes were checked weekly and sticky patches and bait were replaced on each visit. Hair samples were frozen at -20°C prior to DNA extraction. Microsatellite analysis to identify individual pine marten was carried out using up to 11 microsatellite markers. Each sample was analysed in duplicate and only samples giving identical results in the replicates were scored.
Live traps
Twelve Tomahawk 205 live cage traps were deployed along two perpendicular transects spaced approximately 400 m apart. Trapping was conducted from August - October 2019 with daily trap checks. Trapped animals were anaesthetised with an intramuscular injection of ketamine (25 mg per kg) and midazolam (0.2 mg per kg) and scanned for a microchip.
Statistical analyses
Spatially explicit capture-recapture (SECR) models were used to estimate density for both live trapping and gNIS (Efford & Boulanger, 2019). Occasion lengths for live trapping were one day, whilst for gNIS were one week. For live trapping, we specified a single-use detector type, whilst for gNIS we specified a proximity-based detector type. Density was calculated from camera traps using REM (Rowcliffe et al. 2008), CT-DS (Howe et al. 2017) and uSCR (Chandler & Royle, 2013).
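For orientation, the random encounter model mentioned in this description is commonly written as D = (y/t) · π / (v · r · (2 + θ)), where y/t is the camera trap rate, v is the animal's daily travel distance, and r and θ are the radius and angle of the camera detection zone. The sketch below implements that commonly cited form. It is not the code shipped with the dataset, and every input value is hypothetical.

```python
import math


def rem_density(detections, trap_days, speed_km_per_day, radius_km, angle_rad):
    """Random encounter model density (animals per km^2), commonly cited form."""
    trap_rate = detections / trap_days  # detections per camera-day (y / t)
    return trap_rate * math.pi / (speed_km_per_day * radius_km * (2 + angle_rad))


# Hypothetical inputs, chosen only to show the units involved.
density = rem_density(
    detections=60,          # independent detections across all cameras
    trap_days=900,          # total camera-days of effort
    speed_km_per_day=5.0,   # average distance travelled per day
    radius_km=0.006,        # detection radius (6 m)
    angle_rad=0.70,         # detection angle in radians
)
print(f"{density:.2f} animals per km^2")
```

Density scales linearly with the trap rate, so errors in counting independent detections carry straight through to the estimate.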
Data from: Floristic composition, structure and diversity of riparian...
data.niaid.nih.gov
search.dataone.org
+1 more
zip
Updated Dec 20, 2022
Cite
Tolulope Borisade; Anthony Odiwe (2022). Floristic composition, structure and diversity of riparian forests in southwestern Nigeria: Conservation is inevitable [Dataset]. http://doi.org/10.5061/dryad.zgmsbccg3
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.zgmsbccg3
Dataset updated
Dec 20, 2022
Dataset provided by
Obafemi Awolowo University
Bamidele Olumilua University of Education, Science and Technology
Authors
Tolulope Borisade; Anthony Odiwe
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Nigeria
Description
Nigerian riparian forest ecosystems have declined in extent and distribution, and this has been attributed mainly to land use change. This study was intended to provide an understanding of the links between plant diversity, composition, structure, and the disturbances, both anthropogenic and natural, that drive vegetation dynamics. Nine study sites were used. Within each site, five (5) plots (0.25 ha in size) were marked out and placed systematically at an interval of 10 m along the transect. A complete enumeration of plant species was carried out and plants were identified to species level. Diversity indices and structural parameters were determined and anthropogenic activities were ranked. A total of 233 plant species belonging to 80 families were identified, of which Euphorbiaceae and Apocynaceae were the dominant families. The density and basal area ranged from 2,200-6,000 ha⁻¹ and 2.59-17.58 m² ha⁻¹ respectively across the study sites. Pterocarpus santalinoides, Alchornea cordifolia, Chassalia kolly, Tetracera spp, Fimbristylis, Bambusa vulgaris and Cyrtosperma senegalense were the dominant species. The Shannon diversity index ranged from 1.38 to 3.49, Simpson from 0.66 to 0.97, and Evenness from 0.43 to 0.84. Fisher alpha (10.03-30.21) and Whittaker beta diversity (0.36-0.89) values were highest in Ipetumodu (site VIII) and lowest in Ilesha (site II). Seventy-three percent (73%) of the species in this study had a low importance value index (IVI). The dominance of some lianas and herbaceous species in the riparian forest sites indicated disturbance, stages of ecological succession, and regeneration of the vegetation. Conservation is inevitable for maintaining and protecting the species diversity, ecosystem roles, and services of these forests in Nigeria.

Methods

Description of the study area: The study was carried out in Osun State, located in Southwestern Nigeria. The state lies at latitude 7° 30′ N and longitude 4° 30′ E. 
Ilesha West, Atakumosa West, Ife North, Ife North East, Ife South, and Ayedade were the Local Government Areas in which the sites are situated within Osun State, Southwestern Nigeria. Nine riparian forest sites (Ifetedo, Ilesha, Osu, Famia, Ibodi, Tonkere, Gbongan, Edunabon, and Ipetumodu) were selected from these Local Government Areas and designated as sites I-IX respectively. These areas represent riparian zones with the flooding patterns of Southwestern Nigeria and have high plant species diversity, despite varying ongoing anthropogenic activities (Figure 1, Table 1). The detailed climate, soil, and vegetation of the study area in Southwestern Nigeria have been described (Borisade 2020, Borisade and Odiwe 2021).

Data Collection

Vegetation Sampling and Data Collection for Plant Species

Woody species assessment: In each community, forest stands within the selected sites were sampled on a transect 150 m long, extended perpendicular to and across the stream or river. Nine sites were used for this study. Within each site, five plots (0.25 ha in size) were marked out and each plot was placed systematically at an interval of 10 m along the transect, giving a total of 45 plots across the study sites. Each plot was demarcated by narrow-cut thin lines to mark its boundary. Stem diameters were measured using a diameter tape at breast height (1.3 m). For individuals with buttresses or other stem irregularities at breast height, the diameter was measured above the buttresses, following usual forestry procedures. For shrubs and lianas, stems were counted, the diameters of three “average” stems from each species were measured, and the composite stem diameter (dbh) was then calculated to enable the computation of basal area in the manner used for the trees (Burton et al. 2005). For density calculations, each individual was counted as one, independent of its number of branches. 
Herbaceous species assessment: The species composition, density, and cover of herbaceous species, mainly forbs and other understory forms such as grasses, sedges, and ferns, were assessed as follows. In each established sampling plot (0.25 ha), five 1 m x 1 m quadrats were randomly laid. In each quadrat, all rooted plant species were identified and counted. Furthermore, five 10 m line transects were randomly laid using a measuring tape. A single cover pin was dropped perpendicularly at every meter point along the transect. Any plant part or base touched was noted to estimate the aerial and basal cover of the species in the plot. The percentage cover was calculated as the number of ‘hits’ per species divided by the total number of pins dropped, multiplied by 100. Plants were identified to species level and unknown species were cross-checked using the IFE herbarium, Flora of West Tropical Africa by Hutchinson and Dalziel (1954), Handbook of West African Weeds (Akobundu and Agyakwa 1998), Trees of Nigeria (Keay 1989), Plant lists and PROTA. Voucher specimens of the different species were collected, dried, and deposited in the IFE herbarium, Department of Botany, Obafemi Awolowo University, Ile-Ife, Nigeria.

Disturbance ranking: Indications of anthropogenic activities in the riparian forest were recorded within the study sites, and disturbance scores were obtained for each site using a modified version of the method described by Mani and Parthasarathy (2006). Disturbance scores were given to each site by qualitatively evaluating various disturbances (tree logging, mining, grazing, flooding occurrence, agriculture, and settlement), each ranked as low (1), occasional (2), or frequent (3). The scores within each site were summed; sites with high totals were ranked as high disturbance sites and those with low totals as low disturbance sites. 
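The point-intercept cover estimate and the disturbance score described above reduce to simple arithmetic. The following is a minimal sketch only: the species names, the pin count of 50 (five transects with ten pins each) and the site ratings are hypothetical values chosen for illustration, not data from this study.

```python
from collections import Counter

# Rank values follow the text: low (1), occasional (2), frequent (3).
DISTURBANCE_LEVELS = {"low": 1, "occasional": 2, "frequent": 3}


def percent_cover(hits, pins_dropped):
    """Percentage cover per species: hits / total pins dropped * 100."""
    counts = Counter(hits)
    return {species: 100.0 * n / pins_dropped for species, n in counts.items()}


def disturbance_score(ratings):
    """Sum the rank values of all disturbance types rated for one site."""
    return sum(DISTURBANCE_LEVELS[level] for level in ratings.values())


# Hypothetical pin records: one entry per pin that touched a plant.
hits = ["Species A"] * 12 + ["Species B"] * 5
cover = percent_cover(hits, pins_dropped=50)

# Hypothetical ratings for one site, using the six disturbance types in the text.
site_ratings = {
    "tree logging": "frequent",
    "mining": "low",
    "grazing": "occasional",
    "flooding occurrence": "occasional",
    "agriculture": "frequent",
    "settlement": "low",
}
score = disturbance_score(site_ratings)

print(cover)
print(score)
```

Sites would then be compared on their summed scores; where the cut-off between "high" and "low" disturbance falls is not stated in the description, so none is coded here.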
Data and Statistical Analysis: The data from the complete enumeration of woody species were used to determine stem densities per hectare for different life forms (trees, shrubs, and climbers), calculated as the number of individuals divided by the sampled area. The data were also used to establish floristic composition in terms of species, genera, and families. Quantitative measures such as density, frequency, and abundance of tree, shrub, climber, and forb species were determined as described by Curtis and McIntosh (1950). The basal area for woody species was also determined.

Density: The density of each species was calculated as the number of individuals in a unit area: D_i = n_i / A, where D_i is the density of species i, n_i is the total number of individuals counted for species i, and A is the total area sampled.

Frequency: The frequency of each species was calculated as f_i = j_i / k, where f_i is the frequency of species i, j_i is the number of samples in which species i occurred, and k is the total number of samples taken.

Basal area: Basal area, which indicates the importance (dominance) of tree and shrub cover (Carratti et al. 2004), was determined as: Basal area (m² ha⁻¹) = C² / (4π), where C is the girth (stem circumference) at breast height and π ≈ 22/7 ≈ 3.14. The basal area of each species was determined by adding the basal areas of the individuals of that species, the plot basal area was calculated by adding the basal areas of all species in each plot, and the site basal area was calculated as the mean woody species basal area of all the sample plots.

Importance Value Index: This index was used to determine the overall importance of each species in the community structure. 
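The three structural measures can be written out directly. This is a minimal sketch rather than the authors' code: it takes frequency to be the share of sample plots in which a species occurs (the usual reading of this measure, assumed here), treats C as stem circumference in metres, and adds a hypothetical helper for scaling summed stem areas to a per-hectare value. All input numbers are illustrative.

```python
import math


def density(n_individuals, area_sampled_ha):
    """D_i = n_i / A: individuals of a species per hectare sampled."""
    return n_individuals / area_sampled_ha


def frequency(plots_with_species, total_plots):
    """f_i = j_i / k: share of sample plots in which the species occurs."""
    return plots_with_species / total_plots


def stem_basal_area(girth_m):
    """Cross-sectional area (m^2) of one stem from its girth: C^2 / (4 * pi)."""
    return girth_m ** 2 / (4 * math.pi)


def basal_area_per_ha(girths_m, area_sampled_ha):
    """Hypothetical helper: summed stem areas scaled to m^2 per hectare."""
    return sum(stem_basal_area(g) for g in girths_m) / area_sampled_ha


# Illustrative values for one species in one 0.25 ha plot.
d = density(n_individuals=55, area_sampled_ha=0.25)
f = frequency(plots_with_species=3, total_plots=5)
ba = basal_area_per_ha([0.6, 0.9, 1.2], area_sampled_ha=0.25)

print(d, f, ba)
```

A quick sanity check on the basal area formula: a stem with girth π metres has a diameter of 1 m, so its cross-section should be π/4 m², which is what `stem_basal_area(math.pi)` returns.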
Relative Dominance = (Basal area per family / Total basal area) × 100
Relative Frequency = (Frequency of each species / Sum of frequency values of all species) × 100
Relative Density = (Density of each species / Density of all species) × 100

The three relative values were added together to obtain the Importance Value (IV) for each species.

Diversity Indices: The total number of species recorded in the sampling plots (species richness), a rarefaction measure, the Shannon-Wiener diversity index (H’), dominance, beta diversity, and species evenness were employed to quantify and characterise species diversity and species-abundance distributions of the plant communities. Alpha species diversity was calculated for each site as Shannon-Wiener diversity, as described by Magurran (2004). Dominance was evaluated using Simpson’s index of dominance (Simpson 1949). Beta diversity was determined using Whittaker’s measure (Whittaker 1972). Rarefaction diversity (E(Sn)) was computed from the floristic data in order to compare species numbers from samples of different sizes among the community types (Hsieh and Li 1998). Species evenness was calculated using the Pielou index (J) (Pielou 1977). The confidence interval for structural parameters such as density and basal area was set at 95%. The disturbance scores, species density, and diversity indices (Shannon-Wiener diversity, species evenness, and dominance) were subjected to Correspondence Analysis (CA), and the overall influence of these parameters on the riparian vegetation sites was examined with Detrended Correspondence Analysis (DCA) using PAleontological STatistics (PAST) version 3.17 software (Hammer et al. 2001).
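The relative values and the main diversity indices can be illustrated in a few lines. This sketch rests on stated assumptions: relative dominance is computed per species here, Simpson's index is taken in its 1 − D form (consistent with reported values between 0 and 1, but not confirmed by the description), and the counts are invented. It is not the PAST implementation used in the study.

```python
import math


def relative_values(values):
    """Express each species' value as a percentage of the total."""
    total = sum(values.values())
    return {species: 100.0 * v / total for species, v in values.items()}


def importance_values(density, frequency, basal_area):
    """IV = relative density + relative frequency + relative dominance."""
    rd = relative_values(density)
    rf = relative_values(frequency)
    rdo = relative_values(basal_area)
    return {sp: rd[sp] + rf[sp] + rdo[sp] for sp in density}


def shannon(counts):
    """Shannon-Wiener H' = -sum(p_i * ln p_i) over species present."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson_diversity(counts):
    """Simpson index in its 1 - D form (assumed), with D = sum(p_i^2)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def pielou_evenness(counts):
    """Pielou J = H' / ln(S), where S is the number of species present."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s) if s > 1 else 0.0


# Invented example: three species in one site.
iv = importance_values(
    density={"A": 120.0, "B": 60.0, "C": 20.0},
    frequency={"A": 0.8, "B": 0.6, "C": 0.2},
    basal_area={"A": 4.0, "B": 5.0, "C": 1.0},
)
even_counts = [10, 10, 10, 10]
print(iv)
print(shannon(even_counts), simpson_diversity(even_counts), pielou_evenness(even_counts))
```

Because each of the three relative measures sums to 100 across species, the importance values for a site always total 300, which is a useful check on any tabulated IVI.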
Data from: Acids in Coffee - A Review of Sensory Measurements and...
data.niaid.nih.gov
search.dataone.org
+1more
zip
Updated Jul 14, 2021
Cite
Sara Yeager (2021). Acids in Coffee - A Review of Sensory Measurements and Meta-Analysis of Chemical Composition [Dataset]. http://doi.org/10.25338/B8C91C
Explore at:
zip (available download formats)
Unique identifier
https://doi.org/10.25338/B8C91C
Dataset updated
Jul 14, 2021
Dataset provided by
University of California, Davis
Authors
Sara Yeager
License
https://spdx.org/licenses/CC0-1.0.html
Description
This dataset contains information from the meta-analysis presented in "Acids in Coffee: A Review of Sensory Measurements and Meta-Analysis of Chemical Composition." Acid concentrations were extracted from a total of 121 publications for at least one of 26 different organic acids (OAs) or 23 different chlorogenic acids (CGAs), yielding 7,509 distinct data points. Concentrations were collected for Coffea arabica, Coffea canephora (robusta), and other types of coffee, for both green coffee and different roast levels.

Methods

To obtain a more complete picture of the acid composition of coffee, we conducted an extensive review and meta-analysis of the scientific literature. Web of Science, Google Scholar, and the University of California Library catalog were searched between April and December 2020 for any publications that included data about the amounts of acid in coffee samples. This search focused explicitly on measurements of the concentration of individual CGAs and OAs in coffee, not the overall amount of acid in coffee (usually expressed as total titratable acidity). Access was limited to online versions of publications due to COVID-19 restrictions during the time of the database search. Articles not available directly online were obtained through Interlibrary Loan requests. Articles published in languages other than English were read with the help of a translating website. Abstracts and full texts were examined for specific data about the absolute amounts of any chlorogenic or organic acids. Articles that only examined the presence, relative amounts, or formation pathways of CGAs or OAs were excluded. Papers that reported CGAs or OAs in units of mg/L without including the original mass of coffee used were excluded because amounts in mg/L cannot be directly compared with amounts in mg/kg (mass on a wet basis versus mass on a dry basis).

If the publication did contain specific amounts of CGAs or OAs that satisfied the preceding conditions, then all roast levels, extraction types, and coffee species were included, except for decaffeinated and instant coffee. The additional processing on decaffeinated and instant coffee complicates comparison with other coffees. If the publication listed data for store-bought samples, those were included as well. In some cases, roast level and coffee species were not specified, and these data points were categorized as “unspecified”. For the purposes of this review, Coffea arabica will be referred to as “arabica” coffee and Coffea canephora cv. robusta will be referred to as “robusta” coffee.
A tremendous complicating factor is the roast level, which strongly affects acid concentrations but is very challenging to quantify precisely; subjective roast descriptions like “dark roast” have no universally accepted definition. For the purpose of the meta-analysis, we therefore performed a semi-qualitative classification of the reported roast levels into three categories – light, medium, or dark – using the following methodology.

The roast level for specific data in publications was determined in one of four ways: (1) as the publication’s self-described roast level; (2) from the publication’s reported amount of water lost during roasting (11-13% = light, 14-16% = medium, 17-20% = dark) or organic roast loss percentage (ORL%) (2-4% = light, 4.1-5.5% = medium, 5.6-7% = dark) (Perrone et al. 2008; Weers et al. 1995); (3) from the publication’s reported L*a*b* color values of the roasted beans, where L* of 30, 25, and 20 correspond to light, medium, and dark, respectively (Chindapan, Soydok, and Devahastin 2019); or (4) as “unspecified” if the publication did not mention any of the above. If the publication provided finer demarcations of roast level (e.g., a “light roast” and a “very light” roast), then we grouped their samples as appropriate into just our three broad categories. Lastly, samples that were labelled simply as “roasted” without any indication of the degree of roast kept the label “roasted” and were included when comparing roasted coffee as a whole (Correia, Leitao and Clifford 1995; Agnoletti et al. 2019). We emphasize that because roast level is very qualitative and methods of measuring roast level vary greatly, the roast level labels used in this paper are approximate, based on the information available in the cited publications.
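The classification rules above can be expressed as a small lookup. This is an illustrative sketch only: the function and parameter names are hypothetical, values falling in the gaps between the published bands (for example 13.5% water loss) are returned as “unspecified” because the description does not say how they were handled, and the L* color rule is omitted because only three anchor values are given.

```python
# Bands copied from the description: (lower bound, upper bound, label).
WATER_LOSS_BANDS = [(11.0, 13.0, "light"), (14.0, 16.0, "medium"), (17.0, 20.0, "dark")]
ORL_BANDS = [(2.0, 4.0, "light"), (4.1, 5.5, "medium"), (5.6, 7.0, "dark")]


def _band(value, bands):
    """Return the label of the band containing value, else 'unspecified'."""
    for low, high, label in bands:
        if low <= value <= high:
            return label
    return "unspecified"  # assumption: gaps between bands are left unclassified


def classify_roast(self_described=None, water_loss_pct=None, orl_pct=None):
    """Apply the rules in order: self-description, water loss, then ORL%."""
    if self_described:
        return self_described
    if water_loss_pct is not None:
        return _band(water_loss_pct, WATER_LOSS_BANDS)
    if orl_pct is not None:
        return _band(orl_pct, ORL_BANDS)
    return "unspecified"


print(classify_roast(water_loss_pct=12.0))
print(classify_roast(orl_pct=5.0))
print(classify_roast())
```

The order in which the rules are tried is itself an assumption; the description lists the four routes but does not state a precedence among them.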

Similarly, extraction of the acids for analysis varied widely among the different publications. If a chemical solvent such as methanol was used, the extraction type was labelled as “solvent”; soaking the coffee grounds in hot water was labelled as “immersion”; extraction types such as “French press” or “espresso” were explicitly mentioned in their respective publications and the labels were kept for data collection.

Lastly, all measurements were converted to mg/kg to simplify comparison. Accordingly, the units reported in the publications often had to be converted, e.g., data reported in units of g/kg were multiplied by 1000 to match units of mg/kg. In cases where publications reported concentration in terms of mmol/kg, the molecular weight of the specific acid was used to convert to mg/kg. Finally, in articles that presented the data in units of mg/L and included the original brew recipe (grams of coffee and liters of water), the data were converted to units of mg/kg using the brew recipe, assuming full extraction from the dry coffee grounds. Data for 23 different CGAs and 26 different OAs were collected and analyzed. While thirty-eight OAs have been quantified in coffee (Maier 1999), many are present in trace amounts and not commonly reported. Those reported in fewer than 2 publications and with amounts less than 0.01/kg were not included, accounting for the difference in total OAs analyzed in this review.
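The three conversions described above can be sketched as follows. The function names are hypothetical, the mg/L conversion assumes that the brewed volume equals the volume of water in the recipe, and the example figures (including the use of citric acid, molar mass 192.12 g/mol) are illustrative rather than taken from the dataset.

```python
def g_per_kg_to_mg_per_kg(value_g_per_kg):
    """g/kg -> mg/kg: multiply by 1000."""
    return value_g_per_kg * 1000.0


def mmol_per_kg_to_mg_per_kg(value_mmol_per_kg, molar_mass_g_per_mol):
    """mmol/kg -> mg/kg: mmol * (g/mol) gives mg."""
    return value_mmol_per_kg * molar_mass_g_per_mol


def mg_per_l_to_mg_per_kg(conc_mg_per_l, water_l, coffee_g):
    """mg/L in the brew -> mg per kg of dry coffee, via the brew recipe.

    Assumes the brewed volume equals the water volume in the recipe and
    that everything measured came from the dry grounds.
    """
    total_mg = conc_mg_per_l * water_l
    coffee_kg = coffee_g / 1000.0
    return total_mg / coffee_kg


# Illustrative values only.
print(g_per_kg_to_mg_per_kg(2.5))
print(mmol_per_kg_to_mg_per_kg(10.0, 192.12))  # citric acid
print(mg_per_l_to_mg_per_kg(300.0, water_l=1.0, coffee_g=60.0))
```

The last conversion shows why papers without a brew recipe had to be excluded: without the mass of coffee, a value in mg/L cannot be placed on a per-kilogram basis.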

Among the chlorogenic acids, the most widely reported are total CQA, 5-CQA, 4-CQA, 3-CQA, total diCQA, and total FQA. Some publications report only the total concentration of one class (e.g., “Total diCQA”) instead of quantifying each isomer, so three categories, “Total CQA”, “Total FQA”, and “Total di-CQA”, were created to allow comparison across publications (Anthony, Clifford, and Noirot 1993). Each of these categories is the sum of the isomers in that class; for example, “Total CQA” is the sum of 5-CQA, 4-CQA, and 3-CQA. Twenty-seven unique CGAs have been identified in coffee (Clifford et al. 2003; Clifford 2006). Some species were reported too rarely (in fewer than 2 publications) and were excluded from data collection.
Living Standards Measurement Survey 2003 (General Population, Wave 2 Panel)...
datacatalog.ihsn.org
catalog.ihsn.org
+1more
Updated Jun 8, 2025
+ more versions
Cite
Strategic Marketing & Media Research Institute Group (SMMRI) (2025). Living Standards Measurement Survey 2003 (General Population, Wave 2 Panel) and Roma Settlement Survey 2003 - Serbia and Montenegro [Dataset]. https://datacatalog.ihsn.org/catalog/5178
Explore at:
Dataset updated
Jun 8, 2025
Dataset provided by
Ministry of Social Affairs
Strategic Marketing & Media Research Institute Group (SMMRI)
Time period covered
2003
Area covered
Serbia and Montenegro
Description
Abstract

The study included four separate surveys:

The LSMS survey of general population of Serbia in 2002

The survey of Family Income Support (MOP in Serbian) recipients in 2002

These two datasets are published together, separately from the 2003 datasets.

The LSMS survey of general population of Serbia in 2003 (panel survey)

The survey of Roma from Roma settlements in 2003

These two datasets are published together.

Objectives

The LSMS is a multi-topic study of household living standards, based on international experience in designing and conducting this type of research. The basic survey was carried out in 2002 on a representative sample of households in Serbia (without Kosovo and Metohija). Its goal was to establish a poverty profile from comprehensive data on household welfare and to identify vulnerable groups. It also aimed to assess the targeting of safety net programs by collecting detailed information from individuals on participation in specific government social programs. This study was used as the basic document in developing the Poverty Reduction Strategy (PRS) in Serbia, which was adopted by the Government of the Republic of Serbia in October 2003.

The survey was repeated in 2003 on a panel sample (the households which participated in 2002 survey were re-interviewed).

Analysis of the take-up and profile of the population in 2003 was the first step towards formulating the monitoring system of the Poverty Reduction Strategy (PRS). The survey was conducted in accordance with the same methodological principles used in the 2002 survey, with the only necessary changes being to the content of certain modules and a reduction in sample size. The aim of the repeated survey was to obtain panel data to enable monitoring of the change in living standards over a period of one year, thus indicating whether poverty in Serbia had decreased or increased in the course of 2003. [Note: Panel data are the data obtained from the sample of households which participated in both surveys. These data made it possible to track the living standards of the same persons over a period of one year.]

Along with these two comprehensive surveys, conducted on nationally and regionally representative samples intended to give a picture of the general population, there were also two surveys with particular emphasis on vulnerable groups. In 2002, this was the survey of the living standards of Family Income Support recipients, which aimed to validate this state-supported social welfare program. In 2003, the survey of Roma from Roma settlements was conducted. All experience to date indicated that this was one of the most vulnerable groups on the territory of Serbia and Montenegro, yet little research on poverty among the Roma population had been carried out. The aim of the survey was therefore to compare the poverty of this group with that of the general population and to establish which categories of the Roma population were at the greatest risk of poverty in 2003. However, it is necessary to stress that the LSMS of the Roma population covered the potentially most imperilled Roma, while Roma integrated in the main population were not included in this study.

Geographic coverage

The surveys were conducted on the whole territory of Serbia (without Kosovo and Metohija).

Kind of data

Sample survey data [ssd]

Sampling procedure

The sample frame for both surveys of the general population (LSMS) in 2002 and 2003 consisted of all permanent residents of Serbia, without the population of Kosovo and Metohija, according to the definition of permanently resident population contained in the UN Recommendations for Population Censuses, which were applied in the 2002 Census of Population in the Republic of Serbia. Therefore, permanent residents were all persons living in the territory of Serbia for longer than one year, with the exception of diplomatic and consular staff.

The sample frame for the survey of Family Income Support recipients included all current recipients of this program on the territory of Serbia based on the official list of recipients given by Ministry of Social affairs.

Defining the Roma population from Roma settlements faced obstacles, since precise data on the total number of Roma in Serbia are not available. According to the last population Census, from 2002, there were 108,000 Roma citizens, but the Census data are thought to significantly underestimate the total Roma population. However, since no more precise data were available, this number was taken as the basis for estimating the Roma population from Roma settlements. According to the 2002 Census, settlements in which at least 7% of the total population declared themselves as belonging to Roma nationality were selected. A total of 83%, or 90,000, self-declared Roma lived in the settlements defined in this way, and this number was taken as the sample frame for Roma from Roma settlements.

Planned sample: In 2002 the planned size of the sample of the general population was 6,500 households. The sample was both nationally and regionally representative (representative on each individual stratum). In 2003 the planned panel sample size was 3,000 households. In order to preserve the representative quality of the sample, we kept every other census block unit of the large sample realized in 2002. This way we kept the identical allocation by strata. In each selected census block unit, the same households were interviewed as in the basic survey in 2002. The planned sample of Family Income Support recipients in 2002 and of Roma from Roma settlements in 2003 was 500 households for each group.

Sample type: In both national surveys the implemented sample was a two-stage stratified sample. Units of the first stage were enumeration districts, and units of the second stage were the households. In the basic 2002 survey, enumeration districts were selected with probability proportional to number of households, so that the enumeration districts with bigger number of households have a higher probability of selection. In the repeated survey in 2003, first-stage units (census block units) were selected from the basic sample obtained in 2002 by including only even numbered census block units. In practice this meant that every second census block unit from the previous survey was included in the sample. In each selected enumeration district the same households interviewed in the previous round were included and interviewed. On finishing the survey in 2003 the cases were merged both on the level of households and members.
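The two first-stage selection rules described above can be sketched in code. This is a hedged illustration: the description states that districts were selected with probability proportional to the number of households but does not name the algorithm, so systematic PPS sampling is an assumption here, and the district labels, household counts and block numbers are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    sizes maps unit name -> number of households. A unit larger than the
    sampling interval could be selected more than once.
    """
    total = sum(sizes.values())
    step = total / n
    start = rng.random() * step  # random start in [0, step)
    targets = [start + i * step for i in range(n)]

    selected, cumulative, t = [], 0.0, 0
    for name, size in sizes.items():
        cumulative += size
        # Take every target that falls inside this unit's cumulative range.
        while t < n and targets[t] < cumulative:
            selected.append(name)
            t += 1
    return selected


def keep_even_numbered(block_ids):
    """2003 panel rule: retain only even-numbered census block units."""
    return [b for b in block_ids if b % 2 == 0]


# Hypothetical enumeration districts and household counts.
households = {"ED-1": 120, "ED-2": 300, "ED-3": 80, "ED-4": 200, "ED-5": 300}
first_stage_2002 = pps_systematic(households, n=2, rng=random.Random(42))
panel_2003 = keep_even_numbered([1, 2, 3, 4, 5, 6])

print(first_stage_2002)
print(panel_2003)
```

Keeping every second block halves the first-stage sample while leaving the allocation across strata roughly unchanged, which matches the stated aim of preserving representativeness in the smaller 2003 panel.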

Stratification: Municipalities are stratified into the following six territorial strata: Vojvodina, Belgrade, Western Serbia, Central Serbia (Šumadija and Pomoravlje), Eastern Serbia and South-east Serbia. Primary units of selection are further stratified into enumeration districts which belong to urban type of settlements and enumeration districts which belong to rural type of settlement.

The sample of Family Income Support recipients consisted of cases chosen randomly from the official list of recipients provided by the Ministry of Social Affairs. The sample of Roma from Roma settlements was, as in the national survey, a two-stage stratified sample, but the first-stage units were settlements where the Roma population made up more than 7% of residents, and the second-stage units were Roma households. Settlements were stratified into three territorial strata: Vojvodina, Beograd and Central Serbia.

Mode of data collection

Face-to-face [f2f]

Research instrument

In all surveys the same questionnaire with minimal changes was used. It included different modules, topically separate areas which had an aim of perceiving the living standard of households from different angles. Topic areas were the following:
1. Roster with demography.
2. Housing conditions and durables module with information on the age of durables owned by a household with a special block focused on collecting information on energy billing, payments, and usage.
3. Diary of food expenditures (weekly), including home production, gifts and transfers in kind.
4. Questionnaire of main expenditure-based recall periods sufficient to enable construction of annual consumption at the household level, including home production, gifts and transfers in kind.
5. Agricultural production for all households which cultivate 10+ acres of land or who breed cattle.
6. Participation and social transfers module with detailed breakdown by programs.
7. Labour Market module in line with a simplified version of the Labour Force Survey (LFS), with special additional questions to capture various informal sector activities, and providing information on earnings.
8. Health with a focus on utilization of services and expenditures (including informal payments).
9. Education module, which incorporated pre-school, compulsory primary education, secondary education and university education.
10. Special income block, focusing on sources of income not covered in other parts (with a focus on remittances).

Response rate

During field work, interviewers kept a precise diary of interviews, recording both successful and unsuccessful visits. Particular attention was paid to reasons why some households were not interviewed. Separate marks were given for households which were not interviewed due to refusal and for cases when a given household could not be found on the territory of the chosen census block.

In 2002 a total of 7,491 households were contacted. Of this number a total of 6,386 households in 621 census rounds were interviewed. Interviewers did not manage to collect the data for 1,106 or 14.8% of selected households. Out of this number 634 households
Surveying Japanese-Brazilian Households: Comparison of Census-Based,...
microdata.worldbank.org
catalog.ihsn.org
Updated Jan 9, 2020
Cite
Johan Mistiaen (2020). Surveying Japanese-Brazilian Households: Comparison of Census-Based, Snowball and Intercept Point Surveys 2006 - Brazil [Dataset]. https://microdata.worldbank.org/index.php/catalog/2231
Explore at:
Dataset updated
Jan 9, 2020
Dataset provided by
Johan Mistiaen
David McKenzie
Time period covered
2006 - 2007
Area covered
Brazil
Description
Abstract

This study is an experiment designed to compare the performance of three methodologies for sampling households with migrants:

a stratified sample using the census to sample census tracts randomly, in which each household is then listed and screened to determine whether or not it has a migrant, with the full length questionnaire then being applied in a second phase only to the households of interest;

a snowball survey in which households are asked to provide referrals to other households with migrant members;

an intercept point survey (or time-and-space sampling survey), in which individuals are sampled during set time periods at a prespecified set of locations where households in the target group are likely to congregate.

Researchers from the World Bank applied these methods in the context of a survey of Brazilians of Japanese descent (Nikkei), requested by the World Bank. There are approximately 1.2-1.9 million Nikkei among Brazil’s 170 million population.

The survey was designed to provide detail on the characteristics of households with and without migrants, to estimate the proportion of households receiving remittances and with migrants in Japan, and to examine the consequences of migration and remittances on the sending households.

The same questionnaire was used for the stratified random sample and snowball surveys, and a shorter version of the questionnaire was used for the intercept surveys. Researchers can directly compare answers to the same questions across survey methodologies and determine the extent to which the intercept and snowball surveys can give similar results to the more expensive census-based survey, and test for the presence of biases.

Geographic coverage

Sao Paulo and Parana states

Analysis unit

Japanese-Brazilian (Nikkei) households and individuals

The 2000 Brazilian Census was used to classify households as Nikkei or non-Nikkei. The Brazilian Census does not ask ethnicity but instead asks questions on race, country of birth and whether an individual has lived elsewhere in the last 10 years. On the basis of these questions, a household is classified as (potentially) Nikkei if it has any of the following: 1) a member born in Japan; 2) a member who is of yellow race and who has lived in Japan in the last 10 years; 3) a member who is of yellow race, who was not born in a country other than Japan (predominantly Korea, Taiwan or China) and who did not live in a foreign country other than Japan in the last 10 years.
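The three-part screening rule above can be written as a predicate over household members. This sketch is one reading of the rule, not the study's code: the field names are hypothetical, the race value mirrors the census category quoted in the description, and condition 3 is interpreted as "not born in a foreign country other than Japan", meaning born in Brazil or Japan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    race: str                                   # census race category
    birth_country: str
    foreign_residence_10y: Optional[str] = None  # foreign country lived in, if any


def member_flags_nikkei(m: Member) -> bool:
    """Apply the three census-based conditions to one household member."""
    # 1) Born in Japan.
    if m.birth_country == "Japan":
        return True
    # Conditions 2 and 3 both require the census race category "yellow".
    if m.race != "yellow":
        return False
    # 2) Lived in Japan in the last 10 years.
    if m.foreign_residence_10y == "Japan":
        return True
    # 3) Not born in, and no recent residence in, a foreign country other than Japan.
    born_elsewhere = m.birth_country not in ("Brazil", "Japan")
    lived_elsewhere = m.foreign_residence_10y not in (None, "Japan")
    return not born_elsewhere and not lived_elsewhere


def household_is_potentially_nikkei(members: List[Member]) -> bool:
    """A household qualifies if any member meets any of the conditions."""
    return any(member_flags_nikkei(m) for m in members)


household = [Member("white", "Brazil"), Member("yellow", "Brazil")]
print(household_is_potentially_nikkei(household))
```

Since the classification is only "potential", a rule like this serves to stratify census tracts; the door-to-door listing stage is what actually confirms whether a household has a Nikkei member.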

Kind of data

Sample survey data [ssd]

Sampling procedure

1) Stratified random sample survey

Two states with the largest Nikkei population - Sao Paulo and Parana - were chosen for the study.

The sampling process consisted of three stages. First, a stratified random sample of 75 census tracts was selected based on 2000 Brazilian census. Second, interviewers carried out a door-to-door listing within each census tract to determine which households had a Nikkei member. Third, the survey questionnaire was then administered to households that were identified as Nikkei. A door-to-door listing exercise of the 75 census tracts was then carried out between October 13th, 2006, and October 29th, 2006. The fieldwork began on November 19, 2006, and all dwellings were visited at least once by December 22, 2006. The second wave of surveying took place from January 18th, 2007, to February 2nd, 2007, which was intended to increase the number of households responding.

2) Intercept survey

The intercept survey was designed to carry out interviews at a range of locations that were frequented by the Nikkei population. It was originally designed to be done in Sao Paulo city only, but a second intercept point survey was later carried out in Curitiba, Parana. The Sao Paulo intercept survey took place between December 9th, 2006, and December 20th, 2006, whereas the Curitiba intercept survey took place between March 3rd and March 12th, 2007.

Consultations with Nikkei community organizations, local researchers and officers of the bank Sudameris, which provides remittance services to this community, were used to select a broad range of locations. Interviewers were assigned to visit each location during prespecified blocks of time. Two fieldworkers were assigned to each location. One fieldworker carried out the interviews, while the other carried out a count of the number of people with Nikkei appearance who appeared to be 18 years old or older who passed by each location. For the fixed places, this count was made throughout the prespecified time block. For example, between 2.30 p.m. and 3.30 p.m. at the sports club, the interviewer counted 57 adult Nikkeis. Refusal rates were carefully recorded, along with the sex and approximate age of the person refusing.

In all, 516 intercept interviews were collected.

3) Snowball sampling survey

The questionnaire that was used was the same as used for the stratified random sample. The plan was to begin with a seed list of 75 households, and to aim to reach a total sample of 300 households through referrals from the initial seed households. Each household surveyed was asked to supply the names of three contacts: (a) a Nikkei household with a member currently in Japan; (b) a Nikkei household with a member who has returned from Japan; (c) a Nikkei household without members in Japan and where individuals had not returned from Japan.

The snowball survey took place from December 5th to 20th, 2006. The second phase of the snowballing survey ran from January 22nd, 2007, to March 23rd, 2007. More associations were contacted to provide additional seed names (69 more names were obtained) and, as with the stratified sample, an adaptation of the intercept survey was used when individuals refused to answer the longer questionnaire. A decision was made to continue the snowball process until a target sample size of 100 had been achieved.

The final sample consists of 60 households who came as seed households from Japanese associations, and 40 households who were chain referrals. The longest chain achieved was three links.

Mode of data collection

Face-to-face [f2f]

Research instrument

1) Stratified sampling and snowball survey questionnaire

This questionnaire has 36 pages with over 1,000 variables, taking over an hour to complete.

If subjects refused to answer the questionnaire, interviewers would leave a much shorter version for the household to complete on its own, and collect it later. This shorter questionnaire was the same as the one used in the intercept point survey, taking seven minutes on average. The intention with the shorter survey was to provide some data on households that would not answer the full survey because of time constraints, or because respondents were reluctant to have an interviewer in their house.

2) Intercept questionnaire

The questionnaire is four pages in length, consisting of 62 questions and taking a mean time of seven minutes to answer. Respondents had to be 18 years old or older to be interviewed.

Response rate

1) Stratified random sampling

403 out of the 710 Nikkei households were surveyed, an interview rate of 57%. The refusal rate was 25%, whereas the remaining households were either absent on three attempts or were not surveyed because building managers refused permission to enter the apartment buildings. Refusal rates were higher in Sao Paulo than in Parana, reflecting greater concerns about crime and a busier urban environment.

2) Intercept interviews

516 intercept interviews were collected, along with 325 refusals. The average refusal rate is 39%, with location-specific refusal rates ranging from only 3% at the food festival to almost 66% at one of the two grocery stores.
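The headline rates in this section can be reproduced from the reported counts. The following is a minimal Python sketch; the helper name `rate` is illustrative, and it assumes the intercept refusal rate is computed as refusals divided by interviews plus refusals, which matches the reported 39%.

```python
def rate(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total

# Stratified sample: 403 of 710 listed Nikkei households were interviewed.
interview_rate = rate(403, 710)

# Intercept survey: 325 refusals against 516 completed interviews.
# Assumption: the denominator is everyone approached (interviews + refusals).
intercept_refusal_rate = rate(325, 516 + 325)

print(f"interview rate: {interview_rate:.1f}%")
print(f"intercept refusal rate: {intercept_refusal_rate:.1f}%")
```

Rounded to whole percentages, these come out at the 57% and 39% quoted in the text.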
DataSheet2_Cell Type Diversity Statistic: An Entropy-Based Metric to Compare...
frontiersin.figshare.com
xlsx
Updated Jun 6, 2023
Cite
Tanya T Karagiannis; Stefano Monti; Paola Sebastiani (2023). DataSheet2_Cell Type Diversity Statistic: An Entropy-Based Metric to Compare Overall Cell Type Composition Across Samples.XLSX [Dataset]. http://doi.org/10.3389/fgene.2022.855076.s002
Available download formats: xlsx
Unique identifier
https://doi.org/10.3389/fgene.2022.855076.s002
Dataset updated
Jun 6, 2023
Dataset provided by
Frontiers
Authors
Tanya T Karagiannis; Stefano Monti; Paola Sebastiani
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Changes of cell type composition across samples can carry biological significance and provide insight into disease and other conditions. Single cell transcriptomics has made it possible to study cell type composition at a fine resolution. Most single cell studies investigate compositional changes between samples for each cell type independently, not accounting for the fixed number of cells per sample in sequencing data. Here, we provide a metric of the distribution of cell type proportions in a sample that can be used to compare the overall distribution of cell types across multiple samples and biological conditions. This is the first method to measure overall cell type composition at the single cell level. We use the method to assess compositional changes in peripheral blood mononuclear cells (PBMCs) related to aging and extreme old age using multiple single cell datasets from individuals of four age groups across the human lifespan.
Analysis of proportional data in reproductive and developmental toxicity...
catalog.data.gov
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Analysis of proportional data in reproductive and developmental toxicity studies: comparison of logit transformation, arcsine square root transformation, and nonparametric analysis [Dataset]. https://catalog.data.gov/dataset/analysis-of-proportional-data-in-reproductive-and-developmental-toxicity-studies-compariso
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
We conducted power calculations to compare different approaches (nonparametric, arcsine square root-transformed, logit-transformed, untransformed) for analyzing litter-based proportional data. A reproductive toxicity study with a control and one treated group provided data for two endpoints: prenatal loss, and fertility by in utero insemination (IUI). Type I error and power were estimated by 10,000 simulations based on two-sample one-tailed t-tests with varying numbers of litters per group. To further compare the different approaches, we conducted additional analyses with the mean proportions shifted toward zero to produce illustrative scenarios. Analyses based on logit-transformed proportions had greater power than those based on untransformed or arcsine square root-transformed proportions, or nonparametric procedures.
Living Standards Survey 1995 -1997 - China
microdata.fao.org
Updated Nov 8, 2022
Cite
The World Bank (2022). Living Standards Survey 1995 -1997 - China [Dataset]. https://microdata.fao.org/index.php/catalog/1533
Dataset updated
Nov 8, 2022
Dataset provided by
World Bank (http://worldbank.org/)
Research Centre for Rural Economy
Time period covered
1995 - 1997
Area covered
China
Description
Abstract

The China Living Standards Survey (LSS) consists of one household survey and one community (village) survey, conducted in Hebei and Liaoning Provinces (northern and northeast China) in July 1995 and July 1997 respectively. Five villages from each of the three sample counties in each province were selected (six were selected in Liaoyang County of Liaoning Province because of an administrative area change). About 880 farm households were selected from a total of thirty-one sample villages for the household survey. The same thirty-one villages formed the sample for the community survey. This document provides information on the content of the different questionnaires, the survey design and implementation, data processing activities, and the different available data sets.

Geographic coverage

Regional

Analysis unit

Households

Kind of data

Sample survey data [ssd]

Sampling procedure

The China LSS sample is not a rigorous random sample drawn from a well-defined population. Instead it is only a rough approximation of the rural population in Hebei and Liaoning provinces in North-eastern China. The reason for this is that part of the motivation for the survey was to compare the current conditions with conditions that existed in Hebei and Liaoning in the 1930's. Because of this, three counties in Hebei and three counties in Liaoning were selected as "primary sampling units" because data had been collected from those six counties by the Japanese occupation government in the 1930's. Within each of these six counties (xian) five villages (cun) were selected, for an overall total of 30 villages (in fact, an administrative change in one village led to 31 villages being selected). In each county a "main village" was selected that was in fact a village that had been surveyed in the 1930s. Because of the interest in these villages 50 households were selected from each of these six villages (one for each of the six counties). In addition, four other villages were selected in each county. These other villages were not drawn randomly but were selected so as to "represent" variation within the county. Within each of these villages 20 households were selected for interviews. Thus, the intended sample size was 780 households, 130 from each county. Unlike county and village selection, the selection of households within each village was done according to standard sample selection procedures. In each village, a list of all households in the village was obtained from village leaders. An "interval" was calculated as the number of the households in the village divided by the number of households desired for the sample (50 for main villages and 20 for other villages). For the list of households, a random number was drawn between 1 and the interval number. This was used as a starting point. 
The interval was then added to this number to get a second number, then the interval was added to this second number to get a third number, and so on. The set of numbers produced were the numbers used to select the households, in terms of their order on the list. In fact, the number of households in the sample is 785, as opposed to 780. Most of this difference is due to a village in which 24 households were interviewed, as opposed to the goal of 20 households.
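The household selection rule described above is a form of systematic sampling. The following is a minimal Python sketch of that rule, not the survey team's actual program: the function name is illustrative, and rounding the interval down to a whole number is an assumption, since the documentation does not say how fractional intervals were handled.

```python
import random

def systematic_sample(households, n_wanted, rng=None):
    """Pick n_wanted entries from an ordered household list at a fixed interval."""
    rng = rng or random.Random()
    if n_wanted <= 0 or n_wanted > len(households):
        raise ValueError("n_wanted must be between 1 and the list length")
    # Interval = households in the village / households desired
    # (assumption: rounded down to an integer).
    interval = len(households) // n_wanted
    # Random starting point between 1 and the interval (1-based list position).
    start = rng.randint(1, interval)
    # Keep adding the interval to get the next position on the list.
    positions = [start + k * interval for k in range(n_wanted)]
    return [households[p - 1] for p in positions]

# Example: a village list of 100 households and a target of 20 interviews.
village_list = list(range(1, 101))
chosen = systematic_sample(village_list, 20, random.Random(0))
print(chosen)
```

With 100 households and a target of 20, the interval is 5, so the sample takes every fifth household on the list starting from a random position between 1 and 5.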

Mode of data collection

Face-to-face [f2f]

Cleaning operations

(a) DATA ENTRY

All responses obtained from the household interviews were recorded in the household questionnaires. These were then entered into the computer, in the field, using data entry programs written in BASIC. The data produced by the data entry program were in the form of household files, i.e. one data file for all of the data in one household/community questionnaire. Thus, for the household there were about 880 data files. These data files were processed at the University of Toronto and the World Bank to produce datasets in statistical software formats, each of which contained information for all households for a subset of variables. The subset of variables chosen corresponded to data entry screens, so these files are hereafter referred to as "screen files". For the household survey component 66 data files were created. Members of the survey team checked and corrected data by checking the questionnaires for original recorded information. We would like to emphasize that correction here refers to checking questionnaires, in case of errors in skip patterns, incorrect values, or outlying values, and changing values if and only if data in the computer were different from those in the questionnaires. The personnel in charge of data preparation were given specific instructions not to change data even if values in the questionnaires were clearly incorrect. We have no reason to believe that these instructions were not followed, and every reason to believe that the data resulting from these checks and corrections are accurate and of the highest quality possible.

(b) DATA EDITING

The screen files were then brought to World Bank headquarters in Washington, D.C. and uploaded to a mainframe computer, where they were converted to "standard" LSMS formats by merging datasets to produce separate datasets for each section with variable names corresponding to the questionnaires. In some cases, this has meant a single dataset for a section, while in others it has meant retaining "screen" datasets with just the variable names changed.

Linking Parts of the Household Survey

Each household has a unique identification number which is contained in the variable HID. Values for this variable range from 10101 to 60520. The first digit is the code for the six counties in which data were collected, and the second and third digits are for the villages within each county. Finally, the last two digits of HID contain the household number within the village. Data for households from different parts of the survey can be merged by using the HID variable, which appears in each dataset of the household survey. To link information for an individual, use should be made of both the household identification number, HID, and the person identification number, PID. A child in the household can be linked to the parents, if the parents are household members, through the parents' id codes in Section 01B. For parents who are not in the household, information is collected on the parent's schooling, main occupation and whether he/she is currently alive. Household members can be linked with their non-resident children through the parents' id codes in Section 01C.

Linking the Household to the Community Data

The community data have a somewhat different set of identifying variables than the household data. Each community dataset has four identifying variables: province (code 7 for Hebei and code 8 for Liaoning); county (six two digit codes, of which the first digit represents province and the second digit represents the three counties in each province); township (3 digit code, first digit is county, second digit is county and third digit is township); and village (4 digit code, first digit is county, second digit is county, third digit is township, and fourth digit is village).

Constructed Data Set

Researchers at the World Bank and the University of Toronto have created a data set with information on annual household expenditures, region codes, etc. This constructed data set is made available for general use with the understanding that the description below is the only documentation that will be provided. Any manipulation of the data requires assumptions to be made and, as much as possible, those assumptions are explained below. Except where noted, the data set has been created using only the original (raw) data sets. A researcher could construct a somewhat different data set by incorporating different assumptions.

Aggregate Expenditure, TOTEXP

The dataset TOTEXP contains variables for total household annual expenditures (for the year 1994) and variables for the different components of total household expenditures: food expenditures, non-food expenditures, use value of consumer durables, etc. These, along with the algorithm used to calculate household expenditures, are detailed in Appendix D. The dataset also contains the variable HID, which can be used to match this dataset to the household level data set. Note that all of the expenditure variables are totals for the household. That is, they are not in per capita terms. Researchers will have to divide these variables by household size to get per capita numbers. The household size variable is included in the data set.
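The HID layout and the per capita adjustment described above can be sketched in Python as follows. This is an illustration under stated assumptions, not code shipped with the survey: the function names are invented here, and the example expenditure and household size values are made up rather than taken from TOTEXP.

```python
from typing import Tuple

def split_hid(hid: int) -> Tuple[int, int, int]:
    """Split a five-digit HID into (county, village, household)."""
    if not 10101 <= hid <= 60520:
        raise ValueError("HID outside the documented range 10101-60520")
    county, rest = divmod(hid, 10000)      # first digit: county code
    village, household = divmod(rest, 100)  # digits 2-3: village, 4-5: household
    return county, village, household

def per_capita(total_expenditure: float, household_size: int) -> float:
    """Convert a household total into a per capita figure."""
    if household_size <= 0:
        raise ValueError("household size must be positive")
    return total_expenditure / household_size

print(split_hid(10101))     # (1, 1, 1)
print(split_hid(60520))     # (6, 5, 20)
print(per_capita(4800.0, 4))  # 1200.0 (hypothetical values)
```

Because every household dataset carries HID, the same key can be used to join TOTEXP to any other household-level file before dividing by household size.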
UC_vs_US Statistic Analysis.xlsx
figshare.com
xlsx
Updated Jul 9, 2020
Cite
F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.23644/uu.12631628.v1
Dataset updated
Jul 9, 2020
Dataset provided by
Utrecht University
Authors
F. (Fabiano) Dalpiaz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:

A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the notation used: User Stories or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
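The grade banding in column F can be written out as a small Python function. This is a minimal sketch; the function name is illustrative, and returning `None` for an empty grade cell is an assumption, since the description only says that empty cells mean the subject did not take the first exam.

```python
from typing import Optional

def grade_category(points: Optional[float]) -> Optional[str]:
    """Map an exam grade (0-100) to the L/M/H bands used in column F."""
    if points is None:
        # Assumption: no grade recorded, so no category is assigned.
        return None
    if points >= 80:
        return "H"  # 80 and above
    if points >= 65:
        return "M"  # 65 included up to 80 excluded
    return "L"      # everything below 65

for value in (92, 80, 79.5, 65, 64, None):
    print(value, grade_category(value))
```

The boundary cases are the ones worth checking: 80 falls in H, 65 falls in M, and anything just below 65 falls in L.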

Tagging scheme:

Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

All the calculations and information provided in the following sheets originate from that raw data.

Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

Sheet 3 (Size-Ratio):

The size ratio is calculated as the number of classes within the student model divided by the number of classes within the expert model. We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes. However, we also provide the size ratio for the number of relationships between student and expert model.

Sheet 4 (Overall):

Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
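The two ratios defined for Sheet 4 can be expressed directly in Python. This is a minimal sketch of the formulas as worded in the description; the function names and the example counts are illustrative, and raising an error on a zero denominator is an assumption, since the spreadsheet's handling of that case is not described.

```python
def correctness(al: int, wr: int, so: int, om: int) -> float:
    """Correctness = AL / (AL + OM + SO + WR)."""
    denominator = al + om + so + wr
    if denominator == 0:
        raise ValueError("no tagged concepts to compute correctness from")
    return al / denominator

def completeness(al: int, wr: int, om: int) -> float:
    """Completeness = (AL + WR) / (AL + WR + OM)."""
    denominator = al + wr + om
    if denominator == 0:
        raise ValueError("no expert-model classes to compute completeness from")
    return (al + wr) / denominator

# Hypothetical student: 6 aligned, 2 wrongly represented,
# 1 system-oriented, 3 omitted.
print(correctness(al=6, wr=2, so=1, om=3))   # 0.5
print(round(completeness(al=6, wr=2, om=3), 3))
```

Note that Missing (MI) classes enter neither ratio, and that system-oriented classes lower correctness but do not affect completeness.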

For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

Sheet 5 (By-Notation):

Model correctness and model completeness are compared by notation - UC, US.

Sheet 6 (By-Case):

Model correctness and model completeness are compared by case - SIM, HOS, IFA.

Sheet 7 (By-Process):

Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

Sheet 8 (By-Grade):

Model correctness and model completeness are compared by the exam grades, converted to categorical values High, Medium, and Low.
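The effect sizes reported at the bottom of Sheets 4 to 8 were obtained with an online tool, so the exact computation is not shown in the dataset. As a rough guide, the following Python sketch uses the commonly cited form of Hedges' g (Cohen's d with a pooled standard deviation, multiplied by the small-sample correction 1 - 3/(4(n1 + n2) - 9)). The function name is illustrative, and the tool may use a slightly different variant.

```python
from math import sqrt
from statistics import mean, variance

def hedges_g(group_a, group_b):
    """Hedges' g for two independent groups (common approximation)."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance from the two sample variances.
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size undefined")
    d = (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
    # Small-sample bias correction.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction

# Hypothetical completeness scores for two notations.
print(hedges_g([0.70, 0.80, 0.75, 0.85], [0.60, 0.65, 0.70, 0.75]))
```

The correction factor shrinks the estimate slightly for small groups and approaches 1 as the group sizes grow.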
Rainbow trout eDNA and mesocosm water volume comparison data, Creston Fish...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Rainbow trout eDNA and mesocosm water volume comparison data, Creston Fish Hatchery MT, 2016 [Dataset]. https://catalog.data.gov/dataset/rainbow-trout-edna-and-mesocosm-water-volume-comparison-data-creston-fish-hatchery-mt-2016
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Description
We used mesocosm experiments to compare coarse filter-large water volume samples (hereafter large volume filter samples) vs. fine filter-small water volume samples (hereafter small volume filter samples) for detection and quantification of rainbow trout (Oncorhynchus mykiss). We report the quantity of rainbow trout DNA detected using large and small water volume sampling approaches under control (no fish) and treatment (1 fish) conditions.
Method Comparison Manuscript
catalog.data.gov
gimi9.com
Updated Nov 12, 2020
Cite
U.S. EPA Office of Research and Development (ORD) (2020). Method Comparison Manuscript [Dataset]. https://catalog.data.gov/dataset/method-comparison-manuscript
Dataset updated
Nov 12, 2020
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Coliphage are alternative fecal indicators that may be suitable surrogates for viral pathogens, but a majority of standard detection methods utilize insufficient sample volumes (1-100 mL) for routine detection in environmental waters. Here we compare three somatic and F+ coliphage enumeration methods based on a paired measurement from 1L samples collected from the Great Lakes region (n=74). Methods include: 1) a dead-end hollow fiber ultrafilter combined with single agar layer plaque assay (D-HFUF-SAL); 2) a modified SAL (M-SAL); and 3) a direct membrane filtration (DMF) technique. Overall, D-HFUF-SAL outperformed all other methods as it yielded the lowest frequency of non-detects [(ND); 10.8%] and the highest average coliphage concentrations (2.51 ± 1.02 log10 plaque forming unit/liter (PFU/L) and 0.79 ± 0.71 log10 PFU/L for somatic and F+, respectively). M-SAL yielded 29.7% ND and average concentrations of 2.26 ± 1.15 log10 PFU/L (somatic) and 0.59 ± 0.82 log10 PFU/L (F+). DMF performed worse compared to D-HFUF-SAL and M-SAL methods (ND of 65.6%; average somatic coliphage concentration 1.52 ± 1.32 log10 PFU/L, with no F+ detected), indicating this procedure is unsuitable for 1L surface water sample volumes. This study represents an important step toward the use of a coliphage method for recreational water quality criteria purposes. This dataset is associated with the following publication: McMinn, B., E. Rhodes, E. Huff, P. Wanjugi, M. Ware, S. Nappier, M. Cyterski, O. Shanks, K. Oshima, and A. Korajkic. Comparison of somatic and F+ coliphage enumeration methods with large volume surface water samples. JOURNAL OF VIROLOGICAL METHODS. Elsevier Science Ltd, New York, NY, USA, 261: 63-66, (2018).
DIA-PASEF acquisition of mixed-species samples for ProteoBench
ebi.ac.uk
Cite
Karima Chaoui, DIA-PASEF acquisition of mixed-species samples for ProteoBench [Dataset]. https://www.ebi.ac.uk/pride/archive/projects/PXD062685
Authors
Karima Chaoui
Variables measured
Proteomics
Description
Analysis of samples with peptides from different species in known amounts to compare performances of data analysis software tools within the ProteoBench platform.
Comparison of soil sampling and analytical methods for asbestos at the Sumas...
plos.figshare.com
xlsx
Updated Jun 6, 2023
Cite
Julie Wroble; Timothy Frederick; Alicia Frame; Daniel Vallero (2023). Comparison of soil sampling and analytical methods for asbestos at the Sumas Mountain Asbestos Site—Working towards a toolbox for better assessment [Dataset]. http://doi.org/10.1371/journal.pone.0180210
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0180210
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS ONE
Authors
Julie Wroble; Timothy Frederick; Alicia Frame; Daniel Vallero
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Established soil sampling methods for asbestos are inadequate to support risk assessment and risk-based decision making at Superfund sites due to difficulties in detecting asbestos at low concentrations and difficulty in extrapolating soil concentrations to air concentrations. Environmental Protection Agency (EPA)’s Office of Land and Emergency Management (OLEM) currently recommends the rigorous process of Activity Based Sampling (ABS) to characterize site exposures. The purpose of this study was to compare three soil analytical methods and two soil sampling methods to determine whether one method, or combination of methods, would yield more reliable soil asbestos data than other methods. Samples were collected using both traditional discrete (“grab”) samples and incremental sampling methodology (ISM). Analyses were conducted using polarized light microscopy (PLM), transmission electron microscopy (TEM) methods or a combination of these two methods. Data show that the fluidized bed asbestos segregator (FBAS) followed by TEM analysis could detect asbestos at locations that were not detected using other analytical methods; however, this method exhibited high relative standard deviations, indicating the results may be more variable than other soil asbestos methods. The comparison of samples collected using ISM versus discrete techniques for asbestos resulted in no clear conclusions regarding preferred sampling method. However, analytical results for metals clearly showed that measured concentrations in ISM samples were less variable than discrete samples.
