Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numerous studies demonstrating that statistical errors are common in basic science publications have led to calls to improve statistical training for basic scientists. In this article, we sought to evaluate statistical requirements for PhD training and to identify opportunities for improving biostatistics education in the basic sciences. We provide recommendations for improving statistics training for basic biomedical scientists, including: 1. Encouraging departments to require statistics training, 2. Tailoring coursework to the students’ fields of research, and 3. Developing tools and strategies to promote education and dissemination of statistical knowledge. We also provide a list of statistical considerations that should be addressed in statistics education for basic scientists.
"Robust standard errors" are used in a vast array of scholarship to correct standard errors for model misspecification. However, when misspecification is bad enough to make classical and robust standard errors diverge, assuming that it is nevertheless not so bad as to bias everything else requires considerable optimism. And even if the optimism is warranted, settling for a misspecified model, with or without robust standard errors, will still bias estimators of all but a few quantities of interest. Even though this message is well known to methodologists, it has failed to reach most applied researchers. The resulting cavernous gap between theory and practice suggests that considerable gains in applied statistics may be possible. We seek to help applied researchers realize these gains via an alternative perspective that offers a productive way to use robust standard errors; a new general and easier-to-use "generalized information matrix test" statistic; and practical illustrations via simulations and real examples from published research. Instead of jettisoning this extremely popular tool, as some suggest, we show how robust and classical standard error differences can provide effective clues about model misspecification, likely biases, and a guide to more reliable inferences. See also: Unifying Statistical Analysis
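The classical/robust divergence described above can be computed directly. The following is a minimal numpy sketch (not the authors' generalized information matrix test; the data-generating process and the HC0 sandwich estimator are illustrative assumptions), contrasting the two standard errors under heteroskedasticity:

```python
import numpy as np

# Illustrative only: simulate a regression whose error variance grows with x,
# then compare classical and HC0 "robust" (sandwich) standard errors.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(1.0, 3.0, n)
y = 1.0 + 2.0 * x + (x ** 2) * rng.standard_normal(n)  # heteroskedastic errors

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical: sigma^2 (X'X)^{-1}, valid only under homoskedasticity.
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# Robust (HC0): (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}.
meat = (X * (resid ** 2)[:, None]).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# Relative gap between the two slope standard errors.
gap = abs(se_robust[1] - se_classical[1]) / se_classical[1]
print(se_classical[1], se_robust[1], gap)
```

In the spirit of the article, a large gap like this one is best read as a diagnostic clue that the model is misspecified, not as a reason to simply report the robust numbers.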
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS
The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2005 (LFS). The survey rounds covered a total sample of about 30,252 households, and the number of completed questionnaires was 26,595, which amounts to a sample of around 92,384 individuals aged 15 years and over.
The importance of this survey lies in its focus on key labour force indicators: the main characteristics of the employed, unemployed, underemployed and persons outside the labour force; the labour force by level of education; and the distribution of the employed population by occupation, economic activity, place of work, employment status, hours and days worked, and average daily wage in NIS for employees.
The survey's main objectives are:
- To estimate the labor force and its proportion of the population.
- To estimate the number of employed individuals.
- To analyze the labour force by gender, employment status, educational level, occupation and economic activity.
- To provide information about the main changes in the labour market structure and its socio-economic characteristics.
- To estimate the number of unemployed individuals and analyze their general characteristics.
- To estimate working hours and wages for employed individuals, in addition to analyzing other characteristics.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts were exerted to acquire, clean, harmonize, preserve and disseminate the microdata of existing labor force surveys in several Arab countries.
The survey covers a representative sample at the region level (West Bank, Gaza Strip), by locality type (urban, rural, camp) and by governorate.
1- Household/family. 2- Individual/person.
The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.
Sample survey data [ssd]
The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.
All Palestinians aged 10 years or older living in the Palestinian Territory, excluding those living in institutions such as prisons or shelters.
The sampling frame consisted of a master sample of Enumeration Areas (EAs) selected from the population, housing and establishment census 1997. The master sample consists of area units of relatively equal size (number of households); these units were used as Primary Sampling Units (PSUs).
The sample is a two-stage stratified cluster random sample.
Stratification: Four levels of stratification were made:
The sample in the first round consisted of 7,563 households, which amounts to around 22,759 persons aged 15 years and over. In the second round the sample consisted of 7,563 households (around 23,104 persons aged 15 years and over); in the third round, 7,563 households (around 23,123 persons aged 15 years and over); and in the fourth round, 7,563 households (around 23,398 persons aged 15 years and over).
The sample size allowed for non-response and related losses. In addition, the average number of households selected in each cell was 16.
Each round of the Labor Force Survey covers all 481 master sample areas. Basically, the areas remain fixed over time, but households in 50% of the EAs are replaced each round. The same household remains in the sample for 2 consecutive rounds, rests for the next two rounds, and is represented again in the sample for a final two consecutive rounds before being dropped from the sample. A 50% overlap is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes). In earlier applications of the LFS (rounds 1 to 11), the rotation pattern was different, requiring a household to remain in the sample for six consecutive rounds before being dropped. The objective of that pattern was to increase the overlap between consecutive rounds. The new rotation pattern was introduced to reduce the burden on households resulting from visiting the same household six consecutive times.
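The 2-in / 2-out / 2-in rotation pattern and its 50% round-to-round overlap can be sketched as a small schedule (a simplified model assuming four equal-size cohorts per round in steady state):

```python
# A household entering in round r is interviewed in rounds r and r+1,
# rests in r+2 and r+3, and returns for r+4 and r+5 before dropping out.
def rounds_present(start_round):
    return {start_round, start_round + 1, start_round + 4, start_round + 5}

def sample_cohorts(t):
    # In steady state, round t contains the cohorts that started in
    # rounds t, t-1, t-4 and t-5 (four equal-size cohorts).
    return {c for c in range(t - 5, t + 1) if t in rounds_present(c)}

t = 100
overlap = sample_cohorts(t) & sample_cohorts(t + 1)
overlap_share = len(overlap) / len(sample_cohorts(t))
print(sample_cohorts(t), overlap_share)  # two of four cohorts are shared: 0.5
```

The same computation with `sample_cohorts(t + 4)` shows the 50% overlap between the same rounds of consecutive years (with four rounds per year).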
Face-to-face [f2f]
One of the main survey tools is the questionnaire. The survey questionnaire was designed according to the International Labour Organization (ILO) recommendations and includes four main parts:
Identification Data: The main objective of this part is to record the necessary information to identify the household, such as cluster code, sector, type of locality, cell, housing number and cell code.
Quality Control: This part involves groups of controlling standards to monitor the field and office operations and to keep in order the sequence of questionnaire stages (data collection, field and office coding, data entry, editing after entry, and data storage).
Household Roster: This part covers demographic characteristics of the household, such as number of persons in the household, date of birth, sex, educational level, etc.
Employment Part: This part involves the major research indicators; a questionnaire is completed for every household member aged 15 years and over to explore their labour force status and their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.
Data editing took place at a number of stages throughout processing, including:
1. Office editing and coding
2. Editing during data entry
3. Structure and completeness checking
4. Structural checking of SPSS data files
The overall response rate for the survey was 93.2%.
More information on the distribution of response rates by different survey rounds is available in Page 12 of the data user guide provided among the disseminated survey materials under a file named "Palestine 2005- Data User Guide (English).pdf".
Since the data reported here are based on a sample survey and not on a complete enumeration, they are subject to sampling errors as well as non-sampling errors. Sampling errors are random outcomes of the sample design and are therefore, in principle, measurable by the statistical concept of standard error. A description of the estimated standard errors and the effects of the sample design on sampling errors is provided in the annual report provided among the disseminated survey materials under a file named "Palestine 2005- LFS Annual Report (Arabic).pdf".
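The standard-error concept referenced here can be made concrete. A minimal sketch for an estimated proportion, where the design effect value is an illustrative assumption (the survey's actual design effects are in the cited annual report):

```python
import math

def rate_standard_error(p, n_persons, deff=1.0):
    """Standard error of an estimated proportion p from n_persons respondents.

    deff (design effect) inflates the simple-random-sampling variance to
    account for clustering; the value used below is illustrative only.
    """
    return math.sqrt(p * (1.0 - p) / n_persons) * math.sqrt(deff)

# E.g. a labour-market rate of 25% estimated from ~23,000 persons,
# assuming a design effect of 1.5 for a two-stage cluster design:
se = rate_standard_error(0.25, 23000, deff=1.5)
ci = (0.25 - 1.96 * se, 0.25 + 1.96 * se)  # approximate 95% interval
print(round(se, 4), ci)
```

The clustering inherent in a two-stage design typically makes deff exceed 1, so the cluster-adjusted standard error is wider than the simple-random-sampling one.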
Non-sampling errors can occur at the various stages of survey implementation whether in data collection or in data processing. They are generally difficult to be evaluated statistically. They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent and interviewer-related). The use of effective training and supervision and the careful design of questions have direct bearing on limiting the magnitude of non-sampling errors, and hence enhancing the quality of the resulting data. The following are possible sources of non-sampling errors:
• Errors due to non-response because households were away from home or refused to participate. The overall non-response rate amounted to almost 12.1%, which is relatively low; much higher rates are common in an international perspective. The refusal rate was only 0.8%. It is difficult, however, to assess the amount of bias resulting from non-response.
https://dataverse.nl/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34894/U9L9NV
Social science commonly studies relationships among variables by employing survey questions. Answers to these questions will contain some degree of measurement error, distorting the relationships of interest. Such distortions can be removed by standard statistical methods, when these are provided knowledge of a question’s measurement error variance. However, acquiring this information routinely necessitates additional experimentation, which is infeasible in practice. We use three decades’ worth of survey experiments combined with machine learning methods to show that survey measurement error variance can be predicted from the way a question was asked. By predicting experimentally obtained estimates of survey measurement error variance from question characteristics, we enable researchers to obtain estimates of the extent of measurement error in a survey question without requiring additional data collection. Our results suggest only some commonly accepted best practices in survey design have a noticeable impact on study quality, and that predicting measurement error variance is a useful approach to removing this impact in future social surveys.
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS
The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2009 (LFS). The survey rounds covered a total sample of about 30,625 households, and the number of completed questionnaires was 26,581, which amounts to a sample of around 116,202 individuals aged 10 years and over, including 94,304 individuals in the working-age population 15 years and above.
The importance of this survey lies in its focus on key labour force indicators: the main characteristics of the employed, unemployed, underemployed and persons outside the labour force; the labour force by level of education; and the distribution of the employed population by occupation, economic activity, place of work, employment status, hours and days worked, and average daily wage in NIS for employees.
The survey's main objectives are:
- To estimate the labor force and its proportion of the population.
- To estimate the number of employed individuals.
- To analyze the labour force by gender, employment status, educational level, occupation and economic activity.
- To provide information about the main changes in the labour market structure and its socio-economic characteristics.
- To estimate the number of unemployed individuals and analyze their general characteristics.
- To estimate working hours and wages for employed individuals, in addition to analyzing other characteristics.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts were exerted to acquire, clean, harmonize, preserve and disseminate the microdata of existing labor force surveys in several Arab countries.
The survey covers a representative sample at the region level (West Bank, Gaza Strip), by locality type (urban, rural, camp) and by governorate.
1- Household/family. 2- Individual/person.
The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.
Sample survey data [ssd]
The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.
---> Target Population: All Palestinians aged 10 years and over living in the Palestinian Territory, excluding persons living in institutions such as prisons or shelters.
---> Sampling Frame: The sampling frame consisted of a master sample of enumeration areas (EAs) selected from the population, housing and establishment census 2007. The master sample consists of area units of relatively equal size (number of households); these units were used as primary sampling units (PSUs).
---> Sample Design: The sample is a two-stage stratified cluster random sample.
---> Stratification: Four levels of stratification were made:
1. Stratification by governorate.
2. Stratification by type of locality: (a) urban, (b) rural, (c) refugee camps.
3. Stratification by classifying localities, excluding governorate centers, into three strata based on households' ownership of durable goods within these localities.
4. Stratification by size of locality (number of households).
---> Sample Size: The sample size was about 7,627 households in the 52nd round, 7,627 in the 53rd round, 7,677 in the 54th round and 7,694 in the 55th round, for a total of about 30,625 households and about 26,590 completed questionnaires. This number is considered appropriate to provide estimates of the main labour force characteristics for the Palestinian Territory. The sample in the 1st quarter of 2009 consisted of 7,627 households, amounting to 29,559 persons aged 10 years and over (including 23,901 aged 15 years and over). In the 2nd quarter the sample consisted of 7,627 households (27,135 persons aged 10 years and over, including 22,124 aged 15 years and over); in the 3rd quarter, 7,677 households (29,455 persons aged 10 years and over, including 23,907 aged 15 years and over); and in the 4th quarter, 7,694 households (30,053 persons aged 10 years and over, including 24,371 aged 15 years and over).
---> Sample Rotation: Each round of the Labor Force Survey covers all 481 master sample areas. Basically, the areas remain fixed over time, but households in 50% of the EAs are replaced each round. The same household remains in the sample for 2 consecutive rounds, rests for the next two rounds, and is represented again in the sample for a final two consecutive rounds before being dropped from the sample. A 50% overlap is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes). In earlier applications of the LFS (rounds 1 to 11), the rotation pattern was different, requiring a household to remain in the sample for six consecutive rounds before being dropped. The objective of that pattern was to increase the overlap between consecutive rounds. The new rotation pattern was introduced to reduce the burden on households resulting from visiting the same household six consecutive times.
Face-to-face [f2f]
The survey questionnaire was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:
---> 1. Identification Data: The main objective of this part is to record the necessary information to identify the household, such as cluster code, sector, type of locality, cell, housing number and cell code.
---> 2. Quality Control: This part involves groups of controlling standards to monitor the field and office operations and to keep in order the sequence of questionnaire stages (data collection, field and office coding, data entry, editing after entry, and data storage).
---> 3. Household Roster: This part covers demographic characteristics of the household, such as number of persons in the household, date of birth, sex, educational level, etc.
---> 4. Employment Part: This part involves the major research indicators; a questionnaire is completed for every household member aged 15 years and over to explore their labour force status and their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.
---> Raw Data: Data editing took place at a number of stages throughout processing, including:
1. Office editing and coding
2. Editing during data entry
3. Structure and completeness checking
4. Structural checking of SPSS data files
---> Harmonized Data:
- The SPSS package is used to clean and harmonize the datasets.
- The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency.
- All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process is then conducted on the data.
- Harmonized data are saved at the household as well as the individual level, in SPSS, and then converted to Stata for dissemination.
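The recode/rename step of such a harmonization program can be illustrated in miniature. The ERF's actual programs are written in SPSS/Stata; the variable names, codes and labels below are hypothetical, chosen only to show the pattern:

```python
# Hypothetical recode table: country-specific employment-status codes
# mapped to harmonized labels (all names and codes here are invented).
RAW_TO_HARMONIZED_LFS_STATUS = {
    1: "employed",
    2: "underemployed",
    3: "unemployed",
    4: "outside_labour_force",
}

def harmonize_record(raw):
    out = dict(raw)
    # Rename the country-specific variable and recode its values;
    # unrecognized codes fall through to an explicit "missing" label.
    out["lfs_status"] = RAW_TO_HARMONIZED_LFS_STATUS.get(
        out.pop("empstat_raw"), "missing")
    return out

rec = harmonize_record({"hhid": 7, "empstat_raw": 3})
print(rec)  # {'hhid': 7, 'lfs_status': 'unemployed'}
```

The post-harmonization cleaning step would then validate that every harmonized value falls in the agreed label set before dissemination.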
Errors due to non-response because households were away from home or refused to participate. The overall non-response rate amounted to almost 13.2%, which is relatively low; much higher rates are common in an international perspective. The refusal rate was only 1.2%, and the over-coverage rate was only 5.4%. It is difficult, however, to assess the amount of bias resulting from non-response. The PCBS has not yet undertaken any non-response study. Such a study may indicate that non-response is more frequent in some population groups than in others. This is rather normal, and such information is necessary to compensate for bias resulting from non-response errors.
---> Statistical Errors: Since the data reported here are based on a sample survey and not on a complete enumeration, they are subject to sampling errors as well as non-sampling errors. Sampling errors are random outcomes of the sample design and are therefore, in principle, measurable by the statistical concept of standard error. A description of the estimated standard errors and the effects of the sample design on sampling errors is provided in the previous chapter. Because the survey is based on a sample, its data are affected by statistical errors, so certain differences from the real values that would be obtained through a census are to be expected. The variance of the most important indicators was calculated, and dissemination levels were set accordingly.
The objective of the survey was to obtain data on:
- Number of enterprises and persons engaged in the economic survey series by activity.
- Value of output, intermediate consumption and stocks.
- Value added components.
- Payments and transfers.
- Assets and capital formation.
- Contribution of the surveyed activities to the GDP and other national accounts variables.
West Bank and Gaza Strip
The enterprise constitutes the primary sampling unit (PSU).
Industrial Enterprises (private sector)
Sample survey data [ssd]
The sample of the Industrial Survey is a single-stage stratified systematic random sample in which the enterprise constitutes the primary sampling unit (PSU). Three levels of strata were used to arrive at an efficient representative sample (i.e. economic activity, size of employment and geographical levels). The sample size for the West Bank and Gaza Strip amounted to (2,935) enterprises out of the (15,311) enterprises that comprise the survey frame of 2009.
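The stratified systematic design described above can be sketched in a few lines: allocate the 2,935 sampled enterprises proportionally across strata, then select systematically within each stratum. The stratum names and sizes below are invented for illustration (only their total, 15,311, matches the survey frame):

```python
# Illustrative sketch: proportional allocation followed by systematic
# selection within each stratum. Stratum names/sizes are hypothetical;
# the real strata cross economic activity, employment size and geography.
def proportional_allocation(stratum_sizes, n_total):
    N = sum(stratum_sizes.values())
    # Note: independent rounding may not sum exactly to n_total in general.
    return {s: round(n_total * size / N) for s, size in stratum_sizes.items()}

def systematic_sample(frame_ids, n, start=0):
    k = len(frame_ids) / n  # sampling interval (frame size / sample size)
    return [frame_ids[int(start + i * k)] for i in range(n)]

strata = {"manufacturing": 9000, "mining": 311, "utilities": 6000}
alloc = proportional_allocation(strata, 2935)
picks = systematic_sample(list(range(9000)), alloc["manufacturing"])
print(alloc, len(picks))
```

In practice the random start would be drawn uniformly from the interval, and each unit's design weight is the inverse of its stratum sampling fraction.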
Face-to-face [f2f]
The questionnaire used for this survey has much in common with other questionnaires in the economic survey series. The design of the questionnaire takes into account major economic variables pertaining to the examined phenomenon and meets the needs of the Palestinian National Accounts. Two forms of the questionnaire are used: a shorter version for enterprises belonging to the household sector and to branches, and a detailed form for the other sectors.
To ensure the quality and consistency of the data, a set of measures was taken to strengthen data accuracy:
- Preparing the data entry program before data collection to check its readiness for data entry.
- Applying a set of validation rules in the program to check data consistency.
- Checking the efficiency of the program through a pre-test, entering a few questionnaires that included incorrect information to verify that the program captured them.
- Selecting and training skilled data keyers for the main data entry.
- Receiving data files weekly or biweekly by project management to check accuracy and consistency; correction notes are provided to data entry management.
Original sample: 2,935 enterprises. Non-response cases: 558. Over-coverage cases: 100. Net sample: 2,935 - 100 = 2,835. Response rate: 80.3%. Non-response rate: 19.7%.
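The response-rate arithmetic above works out as follows (over-coverage cases are removed from the denominator as ineligible):

```python
# Response-rate computation for the reported figures.
original_sample = 2935
non_response = 558
over_coverage = 100

net_sample = original_sample - over_coverage   # eligible units: 2,835
completed = net_sample - non_response          # completed questionnaires
response_rate = completed / net_sample

print(completed, round(response_rate * 100, 1))  # 2277 80.3
```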
Statistical Errors: The findings of the survey are affected by statistical errors because the survey was conducted on a sample of the target population, which increases the chance of variances from the actual values we would expect to obtain had the survey been conducted by comprehensive enumeration. The variance of the key items in the survey was computed, and dissemination was carried out at the level of the West Bank and Gaza Strip for reasons related to the sample design and the computation of the variance of the different indicators.
Non-Statistical Errors: These types of errors could appear at one or all of the survey stages, including data collection and data entry. Response errors are related to respondents, fieldworkers, and data entry personnel. To avoid mistakes and reduce their impact, a series of actions was taken to enhance the accuracy of the data during field data collection and data processing.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the R code for the data generation and analysis for the paper:
Zhou, Z., Li, D., Huh, D., Xie, M., & Mun, E. Y. (2023). A Simulation Study of the Performance of Statistical Models for Count Outcomes with Excessive Zeros. arXiv preprint arXiv:2301.12674.
Abstract
Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data. Methods: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of candidate statistical models and approaches across data conditions that varied in sample size (N = 100 to 500), zero rate (0.2 to 0.8), and intervention effect size conditions. Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in more false positive results than expected. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, the negative binomial model, and the ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory. When an effect existed on only one of the zero (vs. non-zero) part and the count part, the ZIP model had the highest statistical power.
Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. This MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
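The zero-inflated data-generating process at the heart of these simulations can be sketched briefly. The paper's simulation code is in R (in this repository); the following is an illustrative numpy version with invented parameter values, showing why a plain Poisson model underestimates the zero rate of such outcomes:

```python
import numpy as np

# Draw from a zero-inflated Poisson: with probability pi the outcome is a
# structural zero, otherwise it is Poisson(lam). Parameters are illustrative.
rng = np.random.default_rng(42)
n, pi, lam = 100_000, 0.5, 2.0

structural_zero = rng.random(n) < pi
counts = np.where(structural_zero, 0, rng.poisson(lam, n))

mean = counts.mean()              # ZIP mean = (1 - pi) * lam = 1.0 here
zero_rate = (counts == 0).mean()  # ~ pi + (1 - pi) * exp(-lam) ≈ 0.568
poisson_zero_rate = np.exp(-mean) # zero rate a Poisson with this mean implies

print(round(mean, 2), round(zero_rate, 3), round(poisson_zero_rate, 3))
```

The gap between the observed and Poisson-implied zero rates is the "excess zeros" that ZIP-family models (including MZIP) are built to accommodate.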
The objective of the survey was to obtain data on: 1. Number of enterprises and persons engaged in construction contractors survey by activity. 2. Value of output, intermediate consumption and stocks. 3. Value added components. 4. Payments and transfers. 5. Assets and capital formation. 6. Contribution of the surveyed activities to the GDP and other national accounts variables.
West Bank and Gaza Strip
Enterprise constitutes the primary sampling unit (PSU)
All enterprises in construction contractors survey (comprehensive).
Sample survey data [ssd]
Comprehensive Survey
The number of enterprises in the Construction Contractors Survey for the base year 2009 amounted to (495) enterprises, which form the whole frame, distributed across all West Bank and Gaza Strip governorates. The Establishments Census 2007 was used to determine the frame of contractor enterprises.
Sample design: all enterprises engaged in the relevant economic activities were covered without sampling techniques (a comprehensive count of all construction activities was adopted).
Face-to-face [f2f]
The questionnaire used for this survey has much in common with other questionnaires in the economic survey series. The design of the questionnaire takes into account major economic variables pertaining to the examined phenomenon and meets the needs of the Palestinian National Accounts. Two forms of the questionnaire are used: a shorter version for enterprises belonging to the household sector and to branches, and a detailed form for the other sectors.
To ensure the quality and consistency of the data, a set of measures was taken to strengthen data accuracy:
- Preparing the data entry program before data collection to check its readiness for data entry.
- Applying a set of validation rules in the program to check data consistency.
- Checking the efficiency of the program through a pre-test, entering a few questionnaires that included incorrect information to verify that the program captured them.
- Selecting and training skilled data keyers for the main data entry.
- Receiving data files weekly or biweekly by project management to check accuracy and consistency; correction notes are provided to data entry management.
Response rate: 84.5%
Non-Statistical Errors: These types of errors could appear at one or all of the survey stages, including data collection and data entry.
Response errors: these are related to respondents, fieldworkers, and data entry personnel. To avoid mistakes and reduce their impact, a series of actions was taken to enhance the accuracy of the data during field data collection and data processing.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sharing and presence of decision errors per paper.
https://spdx.org/licenses/CC0-1.0.html
Geometric morphometric analyses are frequently employed to quantify biological shape and shape variation. Despite the popularity of this technique, quantification of measurement error in geometric morphometric datasets and its impact on statistical results is seldom assessed in the literature. Here, we evaluate error on 2D landmark coordinate configurations of the lower first molar of five North American Microtus (vole) species. We acquired data from the same specimens several times to quantify error from four data acquisition sources: specimen presentation, imaging device, interobserver variation, and intraobserver variation. We then evaluated the impact of those errors on linear discriminant analysis-based classifications of the five species using recent specimens of known species affinity and fossil specimens of unknown species affinity. Results indicate that data acquisition error can be substantial, sometimes explaining >30% of the total variation among datasets. Comparisons of datasets digitized by different individuals exhibit the greatest discrepancies in landmark precision and comparison of datasets photographed from different presentation angles yield the greatest discrepancies in species classification results. All error sources impact statistical classification to some extent. For example, no two landmark dataset replicates exhibit the same predicted group memberships of recent or fossil specimens. Our findings emphasize the need to mitigate error as much as possible during geometric morphometric data collection. Though the impact of measurement error on statistical fidelity is likely analysis-specific, we recommend that all geometric morphometric studies standardize specimen imaging equipment, specimen presentations (if analyses are 2D), and landmark digitizers to reduce error and subsequent analytical misinterpretations.
Methods The following methodological description is adapted from the "Study design", "Data preparation", and "Quantifying measurement error impacts on classification statistics" methods subsections of the associated manuscript (Fox et al. in press):
We replicated 2D digital specimen images (n=247) and m1 landmark configurations (n=21 landmarks) of McGuire (2011) to quantify measurement error from four data acquisition sources and its impact on Microtus species classification. All photographed specimens are from the University of California Museum of Vertebrate Zoology (MVZ); see Appendix I of McGuire (2011) for a list of the recent Microtus specimens included. We were unable to acquire four of the 251 original specimens from McGuire (2011) (MVZ: 68521, 83519, 96735, 99283), so the final number of individuals analyzed per species is as follows: M. californicus (n=49), M. longicaudus (n=49), M. montanus (n=48), M. oregoni (n=50), M. townsendii (n=51). Each phase of landmark data acquisition (i.e., specimen presentation, specimen imaging, and inter/intraobserver digitization) was repeated to quantify error from those sources. Our study design for quantifying error from each source was as follows:
Imaging device - We assembled two datasets using specimen images obtained from two different cameras to evaluate inter-instrument variation (hereafter “imaging device” or simply “device” variation). The first image set included the original Microtus dentary images photographed with a Nikon D70s (hereafter Nikon) from McGuire (2011). The second image set included the same specimens photographed with a Dino-Lite Edge AM4815ZTL Digital Microscope (hereafter Dino-Lite). Efforts were made to replicate the original Nikon specimen orientations, especially projected angles of occlusal tooth surfaces and specimen distances from the camera lens, to minimize presentation error during this iteration. However, presentation error is necessarily a residual component of imaging device error in 2D systems no matter what measures are taken to control it.
Specimen presentation - After an initial Dino-Lite photograph was taken, each Microtus specimen was tilted haphazardly along its anteroposterior and/or labiolingual axis and re-photographed with all landmark loci still visible. This was done to simulate specimen orientation changes that may occur when comparing dissimilar specimens such as in situ teeth and isolated teeth. That scenario is not uncommon when comparing fossil specimens to recent specimens since complete preservation of fossilized craniodental remains is rare. When loose m1s were available from recent Microtus specimens, those teeth were photographed in isolation rather than in situ during this iteration. We note, however, that intentionally tilting specimens potentially exacerbates presentation error relative to the amount of error typically introduced when specimen orientations are standardized. The intent of this modification is to quantify potential presentation error rather than expected error since presentation error will vary by study (Fruciano, 2016).
Inter/intra observer error - To quantify observer variation, the original Nikon Microtus m1 images and Dino-Lite resampled images were digitized by two observers using the 21-landmark protocol of Wallace (2006) and McGuire (2011). Those observers allowed us to evaluate methodological experience since one observer, hereafter referred to as the experienced observer (EO), had previous experience conducting 2D landmark analyses at the time this study was initiated while the other observer, hereafter referred to as the new observer (NO), did not. Each image set was then digitized a second time by the EO and NO with at least one week between iterations to evaluate intraobserver variation on landmark placement.
Nine unique landmark datasets were assembled in total to evaluate measurement error from the four focal data acquisition sources. First, Nikon and Dino-Lite image sets were assembled to quantify imaging device variation. Those image sets were digitized twice by each observer to evaluate inter- and intraobserver error (two image sets and two digitizing iterations per observer = eight datasets). A “tilted” Dino-Lite image set was then assembled and digitized by the EO to quantify data variation due to changes in specimen presentation, resulting in a total of nine datasets. All image sets were assembled and digitized using TpsUtil 32 (Rohlf, 2018a) and TpsDig 2.32 (Rohlf, 2018b) software, respectively. Each landmark dataset was superimposed via Generalized Procrustes Analysis (GPA) to standardize effects of rotation, orientation, and scale among specimens using the gpagen function in the R package “geomorph” (version 3.1.3, Adams et al., 2019). During GPA, all specimens are translated to the origin, scaled to unit-centroid size, and optimally rotated via a generalized least-squares algorithm to align them along a common coordinate system (Rohlf and Slice, 1990).
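The translate/scale/rotate steps of the superimposition can be illustrated with a minimal sketch. The study used geomorph's gpagen in R; the Python function below is only an illustrative single-pair alignment in 2D (one configuration fitted to one reference), not a full generalized Procrustes analysis over all specimens.

```python
import math

def align_to_reference(shape, ref):
    """Minimal 2D Procrustes-style alignment of one landmark configuration
    to a reference: translate to the origin, scale to unit centroid size,
    then rotate to minimize summed squared landmark distances.
    `shape` and `ref` are lists of (x, y) landmark tuples.
    (Illustrative sketch only; the study used geomorph's gpagen.)"""
    def center_and_scale(pts):
        n = len(pts)
        cx = sum(x for x, _ in pts) / n
        cy = sum(y for _, y in pts) / n
        centered = [(x - cx, y - cy) for x, y in pts]
        # centroid size = root summed squared distances from the centroid
        size = math.sqrt(sum(x * x + y * y for x, y in centered))
        return [(x / size, y / size) for x, y in centered]

    s, r = center_and_scale(shape), center_and_scale(ref)
    # Closed-form least-squares rotation angle in 2D
    num = sum(xs * yr - ys * xr for (xs, ys), (xr, yr) in zip(s, r))
    den = sum(xs * xr + ys * yr for (xs, ys), (xr, yr) in zip(s, r))
    theta = math.atan2(num, den)
    c, sn = math.cos(theta), math.sin(theta)
    return [(c * x - sn * y, sn * x + c * y) for x, y in s]
```

Because translation, scaling, and rotation are each removed, two configurations that differ only by those transformations align exactly; residual differences after alignment are the shape variation (and measurement error) the study analyzes.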
To determine how source-specific measurement error impacts Microtus species classification, we ran linear discriminant analyses on each of the nine GPA-transformed landmark datasets using the lda function in the R package “MASS” (version 7.3, Venables and Ripley, 2002). Forty-two x, y coordinates from the 21 digitized landmarks were used as predictor variables to classify each specimen into a predicted species group. We used leave-one-out cross-validation to determine the percentage of specimens correctly classified within their respective species groups since it reduces standard LDA-group overfitting (Kovarovic et al., 2011). Prior probabilities of group membership were assigned using the default lda argument based on the proportion of group samples which, in this case, are nearly equal due to similar sample sizes among species. Linear discriminant analysis predicted group membership (PGM) error percentages were calculated for each landmark dataset by dividing the number of misclassified individuals across all five species by the total number of individuals (n=247) and multiplying by 100.
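The leave-one-out PGM error calculation can be sketched as follows. The study's classifier was MASS's lda in R; the sketch below substitutes a simple nearest-centroid classifier as a stand-in so the cross-validation and error-percentage logic are self-contained, and all data are hypothetical.

```python
def loo_pgm_error(samples, labels):
    """Leave-one-out predicted group membership (PGM) error, in percent.
    Each held-out specimen is classified by nearest group centroid (a
    simplified stand-in for the LDA classifier used in the study); the
    error is misclassified / total * 100."""
    def centroid(rows):
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    misclassified = 0
    for i, (x, true_label) in enumerate(zip(samples, labels)):
        # Rebuild group centroids without specimen i (leave-one-out)
        groups = {}
        for j, (xj, lj) in enumerate(zip(samples, labels)):
            if j != i:
                groups.setdefault(lj, []).append(xj)
        dists = {lab: sum((a - b) ** 2 for a, b in zip(x, centroid(rows)))
                 for lab, rows in groups.items()}
        if min(dists, key=dists.get) != true_label:
            misclassified += 1
    return 100.0 * misclassified / len(samples)
```

Holding each specimen out before refitting is what prevents the classifier from being rewarded for memorizing the specimen it is asked to classify.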
Next, a set of 31 fossil Microtus m1 images of unknown species identity was digitized by the EO, using the same 21-landmark protocol, and appended to each dataset of recent Microtus specimens to evaluate error impacts on the PGM of unknown specimens. Fossil specimens included mostly isolated m1s and were photographed with the same Dino-Lite camera as recent Microtus specimens. Each of the nine recent Microtus landmark datasets served as a unique discriminant function training set to classify the unknown fossils into one or more of the five extant species groups. All fossil specimens are from Project 23, Deposit 1, at Rancho La Brea in Los Angeles, CA and are late Pleistocene in age (~46,000 to ~31,000 radiocarbon years before present (Fox et al., 2019; Fuller et al., 2020)). Due to their geographic and temporal location, it is unlikely that the fossils belong to a species of Microtus other than the five included in our LDA training sets. Linear discriminant analyses were run on landmark coordinate variables of each dataset with fossils entered as unknowns.
References:
Adams, D.C., Collyer, M.L., Kaliontzopoulou, A., 2019. Geomorph: Software for geometric morphometric analyses. R package version 3.1.0.
Fox, N.S., Takeuchi, G.T., Farrell, A.B., Blois, J.L., 2019. A protocol for differentiating late Quaternary leporids in southern California with remarks on Project 23 lagomorphs at Rancho La Brea, Los Angeles, California, USA. PaleoBios 36, 1–20.
Fox, N.S., Veneracion, J.J., Blois, J.L. (in press). Are geometric morphometric analyses replicable? Evaluating landmark measurement error and its impact on extant and fossil Microtus classification. Ecology and Evolution.
Fruciano, C., 2016. Measurement error in geometric morphometrics. Development Genes and Evolution 226, 139–158. https://doi.org/10.1007/s00427-016-0537-4
Fuller, B.T., Southon, J.R., Fahrni, S.M., Farrell, A.B., Takeuchi, G.T., Nehlich, O., Guiry, E.J., Richards, M.P., Lindsey,
The Livestock Survey 2013 aims to provide data on the structure of the livestock sector as the basis for formulating future policies and plans for development. It will also update existing data on agricultural holdings from the Agricultural Census of 2010 and build a database that will facilitate the collection of agricultural data in the future via administrative records.
Palestine
Agricultural holding
All animal and mixed holdings in Palestine during 2013.
Sample survey data [ssd]
Sampling Frame: The animal and mixed agricultural holdings frame was created from the agricultural census data of 2010 and extracted based on the following criteria: any number of cattle or camels; at least five sheep or goats; at least 50 poultry birds (layers and broilers), 50 rabbits, or other poultry such as turkeys, ducks, common quail, or a mixture of them; or at least three beehives controlled by the holder.
A master sample of 7,297 holdings from the animal and mixed holdings frame was updated prior to sample selection.
Sample Size: The estimated sample size is 5,000 holdings.
Sample Design
The sample is a one-stage stratified systematic random sample.
Sample Strata: The animal and mixed holdings are stratified at three levels: 1. Governorates. 2. Main agricultural activity, identified by the largest holding size in the category; these activities are raising cattle, raising sheep and goats, raising camels, poultry farming, beekeeping, and mixed animals. 3. Holding size, classified into five categories.
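The design described above (a one-stage stratified systematic random sample) can be sketched in a few lines. This is an illustrative Python sketch with hypothetical stratum names and allocations; the survey's actual selection software is not described.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Systematic random sampling from an ordered frame: a fixed
    interval k = N / n with a single random start in [0, k)."""
    k = len(frame) / n           # sampling interval
    start = rng.uniform(0, k)    # random start determines the whole sample
    return [frame[int(start + i * k)] for i in range(n)]

def stratified_systematic_sample(strata, allocations, rng=random):
    """One-stage stratified systematic sample: an independent systematic
    draw inside each stratum (e.g. governorate x activity x size
    category, as in the survey design). `strata` maps stratum name to
    its ordered list of holdings; `allocations` gives each stratum's
    sample size."""
    return {name: systematic_sample(units, allocations[name], rng)
            for name, units in strata.items()}
```

Sorting each stratum's frame before selection (e.g. geographically) gives systematic sampling an implicit extra layer of spread that simple random sampling within strata would not guarantee.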
Face-to-face [f2f]
The questionnaire for the Livestock Survey 2013 was designed based on the recommendations of the Food and Agriculture Organization of the United Nations (FAO) and the questionnaire used for the Agricultural Census of 2010. The special situation of Palestine was taken into account, in addition to the specific requirements of the technical phase of field work and of data processing and analysis. The questionnaire consisted of the following main items: Identification data: Indicators about the holder, the holding and the respondent.
Data on holder: Included indicators on the sex, age, educational attainment, number in household, legal status of holder, and other indicators.
Holding data: Included indicators on the type of holding, tenure, main purpose of production, and other indicators.
Livestock data: Included indicators on the type, number, strain, age, sex, system of raising, main purpose of raising, number acquired or disposed of, quantity and value of production, number slaughtered in the holding, value of slaughtered animals, and other indicators.
Poultry data: Included indicators on the type, area of worked barns, average cycles per year, system of raising, quantity and value of production, and other indicators.
Domestic poultry & equines data: Included indicators on type and number.
Beehive data: Included indicators such as the type, number, strain, and quantity and value of production.
Agricultural practices data: Included indicators on agricultural practices for livestock, poultry and bees.
Agricultural labor force data: Included indicators on the agricultural labor force in a holding such as the number, employment status, sex, age, average daily working hours, number of work days in an agricultural year and average daily wage.
Agricultural machinery and equipment: Included indicators on the number and source of machinery. Agricultural buildings data: Included indicators on the type and area of building.
Animal intermediate consumption: Included indicators on the type, quantity and value of animal intermediate consumption.
2.5.1 Preparation of Data Entry Program: The data entry program was prepared using Oracle software and data entry screens were designed. Rules of data entry were established to guarantee successful entry of questionnaires, and queries were used to check data after each entry. These queries examined variables on the questionnaire.
2.5.2 Data Entry: Having designed the data entry program and tested it to verify readiness, and after training staff on the data entry program, data entry began on 4 November 2013 and finished on 8 January 2014, with 15 staff engaged in the data entry process.
2.5.3 Editing of Entered Data: Special rules were formulated for editing the stored data to guarantee reliability and ensure accurate and clean data.
2.5.4 Results Extraction and Data Tabulation: An SPSS program was used for extracting the results, and empty tables were prepared in advance to facilitate the tabulation process. The report tables were formulated based on international recommendations, while taking the Palestinian situation into consideration in the data tabulation of the survey.
The response rate was 94.3%.
Includes multiple aspects of data quality, from the initial planning of the survey through final publication, as well as how to understand and use the data. There are seven dimensions of statistical quality: relevance, accuracy, timeliness, accessibility, comparability, coherence, and completeness.
2.6.1 Data Accuracy
Includes checking the accuracy of the data in multiple aspects, primarily statistical errors due to the use of a sample, as well as non-statistical errors due to staff and survey tools, in addition to response rates in the survey and their most important effects on estimates. This section includes the following:
Statistical Errors: Survey data may be affected by sampling errors resulting from the use of a sample instead of a census. Variance estimation was carried out for the main estimates, and the results were acceptable within the publishing domains, as shown in the tables of variance estimation.
Non-sampling Errors: Non-statistical errors are probable at all stages of the project, during data collection and processing. These are referred to as non-response errors, interviewing errors, and data entry errors. To avoid and reduce the impact of these errors, efforts were made through intensive training on how to conduct interviews and on practices to follow and avoid during the interview, in addition to practical and theoretical exercises. A re-interview survey was conducted on 5% of the main sample, and the re-interview data showed a high level of consistency with the main indicators.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset originates from a research project aimed at exploring the dual-edged effects of algorithmic norm pressure on gig workers' service performance. The dataset consists of two parts: scenario experiment data and field survey data.
1. Data Generation Process and Processing Methods
Scenario Experiment: In Study 1, we designed scenario materials representing high and low algorithmic norm pressure conditions. Participants' stress responses, job crafting behaviors, and service performance were collected through an online experimental platform. The dataset includes participants' scenario response scores along with other variable ratings.
Field Survey: In Study 2, we conducted a field survey to collect self-reported and peer-reported data from gig workers on algorithmic transparency, online community support, job crafting behavior, and service performance.
2. Dataset Overview
Time and Geographic Scope: Data collection was conducted between October 2023 and January 2024, covering major cities such as Beijing, Shanghai, and Guangzhou.
Number of Records: Scenario experiment data: over 300 participants' experimental records. Field survey data: over 300 valid survey responses.
3. Data Structure
Scenario Experiment Data: Includes variables such as participant ID, scenario condition, stress rating, job crafting behavior rating, and other variable scores.
Survey Data: Includes variables such as participant ID, demographic information, algorithmic transparency rating, online community support rating, job crafting behavior rating, and service performance rating.
Measurement Units: All ratings are based on a 7-point Likert scale, and the data are expressed as dimensionless scores.
4. Data Missingness and Error Control
Missing Data: Some participants in the field survey did not complete all questions; these cases were removed from the dataset.
Error Range: Since the data collection involves subjective ratings, participant biases may exist. Data cleaning and statistical analysis were conducted to minimize errors.
5. Data File Description
Scenario Experiment Data File: Stored in .xlsx format, including participant ID, scenario condition, stress rating, job crafting behavior rating, and other key variables. Additionally, two scenario videos are included.
Survey Data File: Stored in .xlsx format, containing demographic information and variable scores.
The dataset can be accessed and processed using common text editors or statistical software, such as Excel, SPSS, and Mplus.
The Tanzania Demographic and Health Survey (TDHS) is part of the worldwide Demographic and Health Surveys (DHS) programme, which is designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 1999 TRCHS was to collect data at the national level (with breakdowns by urban-rural and Mainland-Zanzibar residence wherever warranted) on fertility levels and preferences, family planning use, maternal and child health, breastfeeding practices, nutritional status of young children, childhood mortality levels, knowledge and behaviour regarding HIV/AIDS, and the availability of specific health services within the community. Related objectives were to produce these results in a timely manner and to ensure that the data were disseminated to a wide audience of potential users in governmental and nongovernmental organisations within and outside Tanzania. The ultimate intent is to use the information to evaluate current programmes and to design new strategies for improving health and family planning services for the people of Tanzania.
National. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately.
Sample survey data
The TRCHS used a three-stage sample design. Overall, 176 census enumeration areas were selected (146 on the Mainland and 30 in Zanzibar) with probability proportional to size on an approximately self-weighting basis on the Mainland, but with oversampling of urban areas and Zanzibar. To reduce costs and maximise the ability to identify trends over time, these enumeration areas were selected from the 357 sample points that were used in the 1996 TDHS, which in turn were selected from the 1988 census frame of enumeration in a two-stage process (first wards/branches and then enumeration areas within wards/branches). Before the data collection, fieldwork teams visited the selected enumeration areas to list all the households. From these lists, households were selected to be interviewed. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately. The health facilities component of the TRCHS involved visiting hospitals, health centres, and pharmacies located in areas around the households interviewed. In this way, the data from the two components can be linked and a richer dataset produced.
See detailed sample implementation in the APPENDIX A of the final report.
Face-to-face
The household survey component of the TRCHS involved three questionnaires: 1) a Household Questionnaire, 2) a Women’s Questionnaire for all individual women age 15-49 in the selected households, and 3) a Men’s Questionnaire for all men age 15-59.
The health facilities survey involved six questionnaires: 1) a Community Questionnaire administered to men and women in each selected enumeration area; 2) a Facility Questionnaire; 3) a Facility Inventory; 4) a Service Provider Questionnaire; 5) a Pharmacy Inventory Questionnaire; and 6) a questionnaire for the District Medical Officers.
All these instruments were based on model questionnaires developed for the MEASURE programme, as well as on the questionnaires used in the 1991-92 TDHS, the 1994 TKAP, and the 1996 TDHS. These model questionnaires were adapted for use in Tanzania during meetings with representatives from the Ministry of Health, the University of Dar es Salaam, the Tanzania Food and Nutrition Centre, USAID/Tanzania, UNICEF/Tanzania, UNFPA/Tanzania, and other potential data users. The questionnaires and manual were developed in English and then translated into and printed in Kiswahili.
The Household Questionnaire was used to list all the usual members and visitors in the selected households. Some basic information was collected on the characteristics of each person listed, including his/her age, sex, education, and relationship to the head of the household. The main purpose of the Household Questionnaire was to identify women and men who were eligible for individual interview and children under five who were to be weighed and measured. Information was also collected about the dwelling itself, such as the source of water, type of toilet facilities, materials used to construct the house, ownership of various consumer goods, and use of iodised salt. Finally, the Household Questionnaire was used to collect some rudimentary information about the extent of child labour.
The Women’s Questionnaire was used to collect information from women age 15-49. These women were asked questions on the following topics:
· Background characteristics (age, education, religion, type of employment)
· Birth history
· Knowledge and use of family planning methods
· Antenatal, delivery, and postnatal care
· Breastfeeding and weaning practices
· Vaccinations, birth registration, and health of children under age five
· Marriage and recent sexual activity
· Fertility preferences
· Knowledge and behaviour concerning HIV/AIDS
The Men’s Questionnaire covered most of these same issues, except that it omitted the sections on the detailed reproductive history, maternal health, and child health. The final versions of the English questionnaires are provided in Appendix E.
Before the questionnaires could be finalised, a pretest was done in July 1999 in Kibaha District to assess the viability of the questions, the flow and logical sequence of the skip pattern, and the field organisation. Modifications to the questionnaires, including wording and translations, were made based on lessons drawn from the exercise.
In all, 3,826 households were selected for the sample, of which 3,677 were occupied. Of the households found, 3,615 were interviewed, representing a response rate of 98 percent. The shortfall is primarily due to dwellings that were vacant or in which the inhabitants were not at home despite several callbacks.
In the interviewed households, a total of 4,118 eligible women (i.e., women age 15-49) were identified for the individual interview, and 4,029 women were actually interviewed, yielding a response rate of 98 percent. A total of 3,792 eligible men (i.e., men age 15-59) were identified for the individual interview, of whom 3,542 were interviewed, representing a response rate of 93 percent. The principal reason for nonresponse among both eligible men and women was the failure to find them at home despite repeated visits to the household. The lower response rate among men than women was due to the more frequent and longer absences of men.
The response rates are lower in urban areas due to longer absence of respondents from their homes. One-member households are more common in urban areas and are more difficult to interview because they keep their houses locked most of the time. In urban settings, neighbours often do not know the whereabouts of such people.
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors and (2) sampling errors. Non-sampling errors are the result of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the TRCHS to minimise this type of error, non-sampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the TRCHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
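The "plus or minus two standard errors" rule above is easy to make concrete. The sketch below is an illustration with hypothetical figures, not a calculation from the TRCHS data (whose design-based standard errors require the complex-survey methods described next).

```python
import math

def proportion_ci(successes, n, z=2.0):
    """Standard error of a sample proportion and the approximate 95%
    confidence interval described in the text: estimate +/- 2 SE.
    Assumes simple random sampling; SE = sqrt(p * (1 - p) / n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)            # standard error of p
    return p, se, (p - z * se, p + z * se)     # point estimate, SE, CI
```

For example, 500 "yes" answers out of 1,000 respondents give p = 0.50, SE of about 0.016, and a 95% interval of roughly 0.47 to 0.53 under simple random sampling.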
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the TRCHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the TRCHS is the ISSA Sampling Error Module (SAMPERR). This module uses the Taylor linearisation method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
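The replication idea behind the jackknife can be illustrated with a delete-one sketch. Note this is a simplified illustration only: DHS software drops whole sample clusters rather than single observations, and the estimators involved (fertility and mortality rates) are far more complex than the toy estimator used here.

```python
import math

def jackknife_se(values, estimator):
    """Delete-one jackknife standard error of an arbitrary estimator.
    Recompute the estimate with each observation left out in turn; the
    spread of those replicate estimates measures sampling variability."""
    n = len(values)
    replicates = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    # Jackknife variance: (n - 1) / n * sum of squared replicate deviations
    var = (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
    return math.sqrt(var)
```

For the sample mean, the jackknife reproduces the textbook standard error s / sqrt(n) exactly; its value is that the same recipe also works for statistics with no simple closed-form variance.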
Note: See detailed sampling error calculation in the APPENDIX B
The 2005 Republic of Palau Census of Population and Housing will be used to give a snapshot of Republic of Palau's population and housing at the mid-point of the decade. This Census is also important because it measures the population at the beginning of the implementation of the Compact of Free Association. The information collected in the census is needed to plan for the needs of the population. The government uses the census figures to allocate funds for public services in a wide variety of areas, such as education, housing, and job training. The figures also are used by private businesses, academic institutions, local organizations, and the public in general to understand who we are and what our situation is, in order to prepare better for our future needs.
The fundamental purpose of a census is to provide information on the size, distribution and characteristics of a country's population. The census data are used for policymaking, planning and administration, as well as in management and evaluation of programmes in education, labour force, family planning, housing, health, transportation and rural development. A basic administrative use is in the demarcation of constituencies and allocation of representation to governing bodies. The census is also an invaluable resource for research, providing data for scientific analysis of the composition and distribution of the population and for statistical models to forecast its future growth. The census provides business and industry with the basic data they need to appraise the demand for housing, schools, furnishings, food, clothing, recreational facilities, medical supplies and other goods and services.
A hierarchical geographic presentation shows the geographic entities in a superior/subordinate structure in census products. This structure is derived from the legal, administrative, or areal relationships of the entities. The hierarchical structure is depicted in report tables by means of indentation. The following structure is used for the 2005 Census of the Republic of Palau:
Republic of Palau
  State
    Hamlet/Village
      Enumeration District
        Block
Individuals, Families, Households, General Population
The Census covered all the households and respective residents in the entire country.
Census/enumeration data [cen]
Not applicable to a full enumeration census.
Face-to-face [f2f]
The 2005 Palau Census of Population and Housing comprises three parts: 1. Housing - one form for each household; 2. Population - one form for each member of the household; 3. People who have left home - one form for each household.
Full-scale processing and editing activities comprised eight separate sessions, conducted either with U.S. Census Bureau experts present or separately with their remote guidance, to finalize all datasets for the publishing stage.
Processing operation was handled with care to produce a set of data that describes the population as clearly and accurately as possible. To meet this objective, questionnaires were reviewed and edited during field data collection operations by crew leaders for consistency, completeness, and acceptability. Questionnaires were also reviewed by census clerks in the census office for omissions, certain inconsistencies, and population coverage. For example, write-in entries such as "Don't know" or "NA" were considered unacceptable in certain quantities and/or in conjunction with other data omissions.
As a result of this review operation, a telephone or personal visit follow-up was made to obtain missing information. Potential coverage errors were included in the follow-up, as well as questionnaires with omissions or inconsistencies beyond the completeness and quality tolerances specified in the review procedures.
Subsequent to field operations, remaining incomplete or inconsistent information on the questionnaires was assigned using imputation procedures during the final automated edit of the collected data. Allocations, or computer assignments of acceptable data in place of unacceptable entries or blanks, were needed most often when an entry for a given item was lacking or when the information reported for a person or housing unit on that item was inconsistent with other information for that same person or housing unit. As in previous censuses, the general procedure for changing unacceptable entries was to assign an entry for a person or housing unit that was consistent with entries for persons or housing units with similar characteristics. The assignment of acceptable data in place of blanks or unacceptable entries enhanced the usefulness of the data.
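The allocation procedure described above (assigning an entry consistent with similar persons or housing units) is the idea behind hot-deck imputation. The sketch below is a hypothetical sequential hot-deck illustration, not the Census Bureau's actual edit specification; field names are invented.

```python
def hot_deck_impute(records, key_fields, target):
    """Sequential hot-deck allocation sketch: when `target` is missing
    (None), assign the most recently seen acceptable value from a record
    sharing the same characteristics on `key_fields`. Records with no
    matching donor yet seen are left unchanged."""
    donors = {}   # characteristics profile -> last acceptable value seen
    imputed = []
    for rec in records:
        rec = dict(rec)  # avoid mutating the caller's records
        profile = tuple(rec[f] for f in key_fields)
        if rec[target] is None:
            if profile in donors:
                rec[target] = donors[profile]  # allocate from similar record
        else:
            donors[profile] = rec[target]      # record becomes a donor
        imputed.append(rec)
    return imputed
```

Because donors are keyed on shared characteristics, an allocated value is always one actually reported by a comparable person or housing unit, which is what keeps the imputed data internally consistent.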
Another way to make corrections during the computer editing process is substitution. Substitution is the assignment of a full set of characteristics for a person or housing unit. Because of the detailed field operations, substitution was not needed for the 2005 Census.
Sampling Error is not applicable to full enumeration censuses.
In any large-scale statistical operation, such as the 2005 Census of the Republic of Palau, human- and machine-related errors were anticipated. These errors are commonly referred to as nonsampling errors. Such errors include not enumerating every household or every person in the population, not obtaining all required information from the respondents, obtaining incorrect or inconsistent information, and recording information incorrectly. In addition, errors can occur during the field review of the enumerators' work, during clerical handling of the census questionnaires, or during the electronic processing of the questionnaires.
To reduce various types of nonsampling errors, a number of techniques were implemented during the planning, data collection, and data processing activities. Quality assurance methods were used throughout the data collection and processing phases of the census to improve the quality of the data.
The 2013 NDHS is designed to provide information on fertility, family planning, and health in the country for use by the government in monitoring the progress of its programs on population, family planning and health.
In particular, the 2013 NDHS has the following specific objectives:
• Collect data which will allow the estimation of demographic rates, particularly fertility rates and under-five mortality rates, by urban-rural residence and region.
• Analyze the direct and indirect factors which determine the level and patterns of fertility.
• Measure the level of contraceptive knowledge and practice by method, urban-rural residence, and region.
• Collect data on health, immunizations, prenatal and postnatal check-ups, assistance at delivery, breastfeeding, and prevalence and treatment of diarrhea, fever, and acute respiratory infections among children below five years old.
• Collect data on environmental health, utilization of health facilities, health care financing, prevalence of common non-communicable and infectious diseases, and membership in the National Health Insurance Program (PhilHealth).
• Collect data on awareness of cancer, heart disease, diabetes, dengue fever, and tuberculosis.
• Determine women's knowledge about AIDS, the extent of misconceptions about HIV transmission, and access to HIV testing.
• Determine the extent of violence against women.
National coverage
Sample survey data [ssd]
The sample selection methodology for the 2013 NDHS is based on a stratified two-stage sample design, using the 2010 Census of Population and Housing (CPH) as a frame. The first stage involved a systematic selection of 800 sample enumeration areas (EAs) distributed by stratum (region, urban/rural). In the second stage, 20 sample housing units were selected from each sample EA, using systematic random sampling.
All households in the sampled housing units were interviewed. An EA is defined as an area with discernible boundaries consisting of contiguous households. The sample was designed to provide data representative of the country and its 17 administrative regions.
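The second-stage selection of 20 housing units per EA by systematic random sampling can be sketched as follows; this is a generic illustration of the technique, not the PSA's actual selection program, and the listing names are hypothetical:

```python
import random

def systematic_sample(units, n, seed=None):
    """Select n units by systematic random sampling: a random start,
    then every k-th unit along the list, where k = N / n."""
    rng = random.Random(seed)
    N = len(units)
    k = N / n                     # sampling interval (may be fractional)
    start = rng.random() * k      # random start in [0, k)
    picks = [int(start + i * k) for i in range(n)]
    return [units[p] for p in picks]

# e.g. choose 20 of the 150 listed housing units in one EA
ea_listing = [f"HU-{i:03d}" for i in range(150)]
sample = systematic_sample(ea_listing, 20, seed=1)
```

Because successive picks are at least one full interval apart, the 20 selected units are spread evenly across the EA listing.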
Further details on the sample design and implementation are given in Appendix A of the final report.
Face-to-face [f2f]
The 2013 NDHS used three questionnaires: the Household Questionnaire, the Individual Woman’s Questionnaire, and the Women’s Safety Module. These questionnaires were developed from comments and suggestions solicited during consultative meetings and separate meetings with various agencies/organizations, namely: PSA-NSO, POPCOM, DOH, FNRI, ICF International, NEDA, PCW, PhilHealth, PIDS, PLCPD, UNFPA, USAID, UPPI, UPSE, and WHO. The three questionnaires were translated from English into six major languages: Tagalog, Cebuano, Ilocano, Bicol, Hiligaynon, and Waray.
The main purpose of the Household Questionnaire was to identify female members of the sample household who were eligible for interview with the Individual Woman’s Questionnaire and the Women’s Safety Module.
The Individual Woman’s Questionnaire was used to collect information from all women aged 15-49 years.
The Women’s Safety Module was used to collect information on domestic violence in the country, its prevalence, severity and frequency from only one selected respondent from among all the eligible women who were identified from the Household Questionnaire.
All completed questionnaires and the control forms were returned to the PSA-NSO central office in Manila for data processing, which consisted of manual editing, data entry and verification, and editing of computer-identified errors. An ad-hoc group of thirteen regular employees from the DSSD, the Information Resources Department (IRD), and the Information Technology Operations Division (ITOD) of the NSO was created to work full time and oversee the data processing operation at the NDHS Data Processing Center in the NSO-CVEA Building in Quezon City, Philippines. This group was responsible for the different aspects of NDHS data processing. Nineteen data encoders were hired to process the data; they underwent training on September 12-13, 2013.
Data entry started on September 16, 2013. The computer package program called Census and Survey Processing System (CSPro) was used for data entry, editing, and verification. Mr. Alexander Izmukhambetov, a data processing specialist from ICF International, spent two weeks at NSO in September 2013 to finalize the data entry program. Data processing was completed on December 6, 2013.
For the 2013 NDHS sample, 16,732 households were selected, of which 14,893 were occupied. Of these households, 14,804 were successfully interviewed, yielding a household response rate of 99.4 percent. The household response rates in urban and rural areas are almost identical.
Among the households interviewed, 16,437 women were identified as eligible respondents, and the interviews were completed for 16,155 women, yielding a response rate of 98.3 percent. On the other hand, for the women’s safety module, from a total of 11,373 eligible women, 10,963 were interviewed with privacy, translating to a 96.4 percent response rate. At the individual level, urban and rural response rates showed no difference. The principal reason for non-response among women was the failure to find individuals at home, despite interviewers’ repeated visits to the household.
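The response rates quoted above follow the usual definition of completed interviews over eligible units, and the reported figures can be reproduced from the counts in the text:

```python
def response_rate(completed, eligible):
    """Response rate as a percentage, rounded to one decimal place."""
    return round(100 * completed / eligible, 1)

# Figures reported for the 2013 NDHS:
households = response_rate(14_804, 14_893)   # 99.4
women      = response_rate(16_155, 16_437)   # 98.3
safety     = response_rate(10_963, 11_373)   # 96.4
```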
The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2013 National Demographic and Health Survey (NDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2013 NDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling error is a measure of the variability between the results of all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey data.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
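For a proportion estimated from a simple random sample, the standard error and the plus-or-minus two standard errors interval described above can be computed as follows. This is a simplified sketch using the simple-random-sample formula; the complex survey design discussed next requires the more elaborate methods that follow:

```python
import math

def proportion_ci(p, n, z=2.0):
    """Standard error of a sample proportion and the +/- z*SE
    confidence interval (simple-random-sample formula; a complex
    design's design effect would widen the interval)."""
    se = math.sqrt(p * (1 - p) / n)
    return se, (p - z * se, p + z * se)

# e.g. an estimated proportion of 0.40 from n = 1,000 respondents
se, (low, high) = proportion_ci(0.40, 1_000)
# se is about 0.0155, so the interval is roughly (0.369, 0.431)
```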
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2013 NDHS sample is the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the 2013 NDHS is a SAS program. This program used the Taylor linearization method for variance estimation for survey estimates that are means or proportions. The Jackknife repeated replications method is used for variance estimation of more complex statistics such as fertility and mortality rates.
The Taylor linearization method treats any percentage or average as a ratio estimate, r = y/x, where y represents the total sample value for variable y, and x represents the total number of weighted cases in the group or subgroup under consideration.
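Applying the Taylor linearization to the ratio r = y/x yields the standard stratified-cluster variance formula based on cluster-level residuals z_i = y_i - r*x_i. The following is an illustrative sketch of that formula, not the actual SAS program used for the NDHS, and the stratum names and cluster totals are made up:

```python
def ratio_variance(strata):
    """Taylor-linearized variance of a ratio r = y/x for a stratified
    cluster sample. `strata` maps stratum -> list of (y_i, x_i)
    cluster totals."""
    y = sum(yi for clusters in strata.values() for yi, _ in clusters)
    x = sum(xi for clusters in strata.values() for _, xi in clusters)
    r = y / x
    var = 0.0
    for clusters in strata.values():
        m = len(clusters)
        if m < 2:
            continue  # a single-cluster stratum contributes no variance
        z = [yi - r * xi for yi, xi in clusters]
        zbar = sum(z) / m
        var += m / (m - 1) * sum((zi - zbar) ** 2 for zi in z)
    return r, var / x**2

r, v = ratio_variance({
    "urban": [(12, 30), (15, 32), (11, 28)],
    "rural": [(20, 40), (18, 41)],
})
se = v ** 0.5  # standard error of the ratio estimate
```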
Further details on sampling errors calculation are given in Appendix B of the final report.
Data quality tables were produced to review the quality of the data:
- Household age distribution
- Age distribution of eligible and interviewed women
- Completeness of reporting
- Births by calendar years
- Reporting of age at death in days
- Reporting of age at death in months
Note: The tables are presented in APPENDIX C of the final report.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The aim of this work was to assess knowledge of students' errors in statistical inference among a group of Spanish prospective high and secondary school teachers. This knowledge is part of the cognitive facet of teachers' didactic-mathematical knowledge. A sample of seventy prospective teachers was asked to describe their students' most likely errors in carrying out a statistical test and a confidence interval, after having themselves solved a problem of each type. The responses are classified according to the different steps in the procedures (selecting a procedure, problem setting, conceptual, procedural, and interpretative errors). The categories in each of these stages are determined by considering the errors described in research on understanding statistical inference. The results suggest a medium knowledge of the most frequent errors among prospective teachers; however, there is a general lack of precision and little awareness of errors linked to the significance level and p-value.
The Tanzania Demographic and Health Survey (TDHS) is part of the worldwide Demographic and Health Surveys (DHS) programme, which is designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 1999 TRCHS was to collect data at the national level (with breakdowns by urban-rural and Mainland-Zanzibar residence wherever warranted) on fertility levels and preferences, family planning use, maternal and child health, breastfeeding practices, nutritional status of young children, childhood mortality levels, knowledge and behaviour regarding HIV/AIDS, and the availability of specific health services within the community.1 Related objectives were to produce these results in a timely manner and to ensure that the data were disseminated to a wide audience of potential users in governmental and nongovernmental organisations within and outside Tanzania. The ultimate intent is to use the information to evaluate current programmes and to design new strategies for improving health and family planning services for the people of Tanzania.
National. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately.
Households, individuals
Men and women 15-49, children under 5
Sample survey data
The TRCHS used a three-stage sample design. Overall, 176 census enumeration areas were selected (146 on the Mainland and 30 in Zanzibar) with probability proportional to size on an approximately self-weighting basis on the Mainland, but with oversampling of urban areas and Zanzibar. To reduce costs and maximise the ability to identify trends over time, these enumeration areas were selected from the 357 sample points that were used in the 1996 TDHS, which in turn were selected from the 1988 census frame of enumeration areas in a two-stage process (first wards/branches and then enumeration areas within wards/branches). Before the data collection, fieldwork teams visited the selected enumeration areas to list all the households. From these lists, households were selected to be interviewed. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately. The health facilities component of the TRCHS involved visiting hospitals, health centres, and pharmacies located in areas around the households interviewed. In this way, the data from the two components can be linked and a richer dataset produced.
See detailed sample implementation in the APPENDIX A of the final report.
Face-to-face
The household survey component of the TRCHS involved three questionnaires: 1) a Household Questionnaire, 2) a Women’s Questionnaire for all individual women age 15-49 in the selected households, and 3) a Men’s Questionnaire for all men age 15-59.
The health facilities survey involved six questionnaires: 1) a Community Questionnaire administered to men and women in each selected enumeration area; 2) a Facility Questionnaire; 3) a Facility Inventory; 4) a Service Provider Questionnaire; 5) a Pharmacy Inventory Questionnaire; and 6) a questionnaire for the District Medical Officers.
All these instruments were based on model questionnaires developed for the MEASURE programme, as well as on the questionnaires used in the 1991-92 TDHS, the 1994 TKAP, and the 1996 TDHS. These model questionnaires were adapted for use in Tanzania during meetings with representatives from the Ministry of Health, the University of Dar es Salaam, the Tanzania Food and Nutrition Centre, USAID/Tanzania, UNICEF/Tanzania, UNFPA/Tanzania, and other potential data users. The questionnaires and manual were developed in English and then translated into and printed in Kiswahili.
The Household Questionnaire was used to list all the usual members and visitors in the selected households. Some basic information was collected on the characteristics of each person listed, including his/her age, sex, education, and relationship to the head of the household. The main purpose of the Household Questionnaire was to identify women and men who were eligible for individual interview and children under five who were to be weighed and measured. Information was also collected about the dwelling itself, such as the source of water, type of toilet facilities, materials used to construct the house, ownership of various consumer goods, and use of iodised salt. Finally, the Household Questionnaire was used to collect some rudimentary information about the extent of child labour.
The Women’s Questionnaire was used to collect information from women age 15-49. These women were asked questions on the following topics:
· Background characteristics (age, education, religion, type of employment)
· Birth history
· Knowledge and use of family planning methods
· Antenatal, delivery, and postnatal care
· Breastfeeding and weaning practices
· Vaccinations, birth registration, and health of children under age five
· Marriage and recent sexual activity
· Fertility preferences
· Knowledge and behaviour concerning HIV/AIDS
The Men’s Questionnaire covered most of these same issues, except that it omitted the sections on the detailed reproductive history, maternal health, and child health. The final versions of the English questionnaires are provided in Appendix E.
Before the questionnaires could be finalised, a pretest was done in July 1999 in Kibaha District to assess the viability of the questions, the flow and logical sequence of the skip pattern, and the field organisation. Modifications to the questionnaires, including wording and translations, were made based on lessons drawn from the exercise.
In all, 3,826 households were selected for the sample, out of which 3,677 were occupied. Of the households found, 3,615 were interviewed, representing a response rate of 98 percent. The shortfall is primarily due to dwellings that were vacant or in which the inhabitants were not at home despite several callbacks.
In the interviewed households, a total of 4,118 eligible women (i.e., women age 15-49) were identified for the individual interview, and 4,029 women were actually interviewed, yielding a response rate of 98 percent. A total of 3,792 eligible men (i.e., men age 15-59) were identified for the individual interview, of whom 3,542 were interviewed, representing a response rate of 93 percent. The principal reason for nonresponse among both eligible men and women was the failure to find them at home despite repeated visits to the household. The lower response rate among men than women was due to the more frequent and longer absences of men.
The response rates are lower in urban areas due to longer absence of respondents from their homes. One-member households are more common in urban areas and are more difficult to interview because they keep their houses locked most of the time. In urban settings, neighbours often do not know the whereabouts of such people.
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the TRCHS to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the TRCHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the TRCHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the TRCHS is the ISSA Sampling Error Module (SAMPERR). This module used the Taylor linearisation method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
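The delete-one-cluster jackknife used for rates can be sketched generically. This illustrates the jackknife repeated replication idea, not the ISSA SAMPERR implementation, and the cluster counts below are invented:

```python
def jackknife_se(cluster_data, estimator):
    """Delete-one-cluster jackknife standard error. `estimator` maps a
    list of clusters to a statistic (e.g. a rate)."""
    n = len(cluster_data)
    full = estimator(cluster_data)
    # Recompute the statistic with each cluster dropped in turn.
    reps = [estimator(cluster_data[:i] + cluster_data[i + 1:])
            for i in range(n)]
    # Pseudo-values: theta_i = n*full - (n-1)*rep_i
    pseudo = [n * full - (n - 1) * rep for rep in reps]
    mean = sum(pseudo) / n
    var = sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1))
    return var ** 0.5

# e.g. a crude rate = total events / total exposure over clusters
clusters = [(3, 120), (5, 150), (2, 90), (4, 130)]
rate = lambda cs: sum(e for e, _ in cs) / sum(x for _, x in cs)
se = jackknife_se(clusters, rate)
```

The jackknife avoids writing down a linearization for complicated statistics: any estimator that can be recomputed on the reduced samples gets a variance estimate the same way.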
Note: See detailed sampling error
The oncology clinical trial market is expected to grow by USD 4.22 billion from 2020 to 2025, and the market’s growth momentum will accelerate at a CAGR of 6.70%.
This oncology clinical trial market research report provides valuable insights on the post COVID-19 impact on the market, which will help companies evaluate their business approaches. Furthermore, this report extensively covers oncology clinical trial market segmentation by design (interventional, observational, and expanded access) and geography (North America, Europe, Asia, and ROW). The oncology clinical trial market report also offers information on several market vendors, including F. Hoffmann-La Roche Ltd., Icon Plc, IQVIA Holdings Inc., Medpace Holdings Inc., Merck and Co. Inc., Novartis AG, Novotech (Australia) Pty Ltd., Parexel International Corp., Pivotal S.L.U, and Syneos Health Inc. among others.
What will the Oncology Clinical Trial Market Size be During the Forecast Period?
Download the Free Report Sample to Unlock the Oncology Clinical Trial Market Size for the Forecast Period and Other Important Statistics
Oncology Clinical Trial Market: Key Drivers and Challenges
The increasing number of cancer cases across the globe is notably driving the oncology clinical trial market growth, although factors such as inefficient clinical trial design for oncology may impede market growth. Our research analysts have studied the historical data and deduced the key market drivers and the COVID-19 pandemic impact on the oncology clinical trial industry. The holistic analysis of the drivers will help in deducing end goals and refining marketing strategies to gain a competitive edge.
Key Oncology Clinical Trial Market Driver
The number of cancer cases is rising on a global level due to increased pollution and frequent changes in lifestyle. Exposure to carcinogens has increased as the global air quality index has degraded. Pharmaceutical companies and various government organizations are developing new and improved treatments for multiple types of cancers and scheduling oncology clinical trials to get the treatments approved by relevant agencies, which, in turn, will drive the demand for oncology clinical trials during the forecast period.
Key Oncology Clinical Trial Market Challenge
Clinical trials are subject to errors, and oncology clinical trials need to be thorough. The data collection method may be inefficient, and researchers may not get full disclosure from patients participating in trials. This can lead to misinformation and erroneous conclusions, which hamper the oncology study for which the clinical trial was conducted. Other factors, such as inappropriate selection of clinical trial candidates and errors in clinical trial design, may damage the study. Hence, the global market for oncology clinical trials can be restricted by inefficient clinical trial designs.
This oncology clinical trial market analysis report also provides detailed information on other upcoming trends and challenges that will have a far-reaching effect on the market growth. The actionable insights on the trends and challenges will help companies evaluate and develop growth strategies for 2021-2025.
Who are the Major Oncology Clinical Trial Market Vendors?
The report analyzes the market’s competitive landscape and offers information on several market vendors, including:
F. Hoffmann-La Roche Ltd.
Icon Plc
IQVIA Holdings Inc.
Medpace Holdings Inc.
Merck and Co. Inc.
Novartis AG
Novotech (Australia) Pty Ltd.
Parexel International Corp.
Pivotal S.L.U
Syneos Health Inc.
This statistical study of the oncology clinical trial market encompasses successful business strategies deployed by the key vendors. The oncology clinical trial market is fragmented, and vendors are deploying growth strategies such as focusing on product innovation and spending on research and development activities to compete in the market.
To make the most of the opportunities and recover from post COVID-19 impact, market vendors should focus more on the growth prospects in the fast-growing segments, while maintaining their positions in the slow-growing segments.
The oncology clinical trial market forecast report offers in-depth insights into key vendor profiles. The profiles include information on the production, sustainability, and prospects of the leading companies.
Which are the Key Regions for Oncology Clinical Trial Market?
For more insights on the market share of various regions Request for a FREE sample now!
40% of the market’s growth will originate from North America during the forecast period. The US and Canada are the key markets for oncology clinical trials in North America. However, the market growth rate in this region will be slower than the growth of the market in Asia and Europe.
This market research report entails detailed information on the competitive i
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS
The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2006 (LFS). The survey rounds covered a total sample of about 30,380 households; the number of completed questionnaires was 26,605, which amounts to a sample of around 92,211 individuals aged 15 years and over.
The importance of this survey lies in its focus on key labour force indicators: the main characteristics of the employed, unemployed, underemployed, and persons outside the labour force; the labour force by level of education; and the distribution of the employed population by occupation, economic activity, place of work, employment status, hours and days worked, and average daily wage in NIS for employees.
The survey's main objectives are:
- To estimate the labor force and its percentage of the population.
- To estimate the number of employed individuals.
- To analyze the labour force by gender, employment status, educational level, occupation, and economic activity.
- To provide information about the main changes in the labour market structure and its socio-economic characteristics.
- To estimate the number of unemployed individuals and analyze their general characteristics.
- To estimate working hours and wages for employed individuals, and to analyze other characteristics.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts were exerted to acquire, clean, harmonize, preserve, and disseminate microdata of existing labor force surveys in several Arab countries.
The survey covers a representative sample at the level of region (West Bank, Gaza Strip), locality type (urban, rural, camp), and governorate.
1- Household/family. 2- Individual/person.
The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.
Sample survey data [ssd]
The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.
All Palestinians aged 10 years or older living in the Palestinian Territory, excluding those living in institutions such as prisons or shelters.
The sampling frame consisted of a master sample of enumeration areas (EAs) selected from the 1997 Population, Housing and Establishment Census. The master sample consists of area units of relatively equal size (number of households); these units were used as primary sampling units (PSUs).
The sample is a two-stage stratified cluster random sample.
Stratification: Four levels of stratification were made:
The sample size in the first round consisted of 7,627 households, which amounts to a sample of around 23,334 persons aged 15 years and over. In the second round the sample consisted of 7,627 households (around 23,004 persons aged 15 years and over), in the third round 7,563 households (around 22,729 persons), and in the fourth round 7,563 households (around 23,144 persons). The sample size allowed for non-response and related losses. In addition, an average of 16 households was selected in each cell.
Each round of the Labor Force Survey covers all 481 master sample areas. The areas remain fixed over time, but the households in 50% of the EAs are replaced each round. The same household remains in the sample for two consecutive rounds, rests for the next two rounds, and returns to the sample for a final two consecutive rounds before being dropped. A 50% overlap is thus achieved between consecutive rounds and between consecutive years, making the sample efficient for monitoring purposes. In earlier applications of the LFS (rounds 1 to 11), a different rotation pattern was used, requiring a household to remain in the sample for six consecutive rounds before being dropped. The objective of that pattern was to increase the overlap between consecutive rounds. The new rotation pattern was introduced to reduce the burden on households resulting from visiting the same household six consecutive times.
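The 2-2-2 rotation pattern described above determines, for each household, the rounds in which it is interviewed. A small sketch (the round numbering is illustrative):

```python
def rounds_in_sample(entry_round):
    """Rounds in which a household is interviewed under the 2-2-2
    rotation: two consecutive rounds in sample, two rounds resting,
    then two final rounds before being dropped."""
    r = entry_round
    return [r, r + 1, r + 4, r + 5]

rounds_in_sample(1)  # [1, 2, 5, 6]
# Households entering in round r are interviewed again in round r+1,
# which produces the 50% overlap between consecutive rounds.
```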
Face-to-face [f2f]
One of the main survey tools is the questionnaire, the survey questionnaire was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:
1. Identification Data:
The main objective of this part is to record the information necessary to identify the household, such as cluster code, sector, type of locality, cell, housing number, and cell code.
3. Household Roster: This part covers demographic characteristics of the household, such as the number of persons in the household, date of birth, sex, educational level, etc.
Data editing took place at a number of stages throughout processing, including:
1. Office editing and coding
2. Editing during data entry
3. Structure and completeness checking
4. Structural checking of the SPSS data files
The overall response rate for the survey was 87.5 percent.
More information on the distribution of response rates by different survey rounds is available in Page 12 of the data user guide provided among the disseminated survey materials under a file named "Palestine 2006- Data User Guide (English).pdf".
Since the data reported here are based on a sample survey and not on a complete enumeration, they are subject to sampling errors as well as non-sampling errors. Sampling errors are random outcomes of the sample design, and are, therefore, in principle measurable by the statistical concept of standard error. A description of the estimated standard errors and the effects of the sample design on sampling errors are provided in the annual report provided among the disseminated survey materials under a file named "Palestine 2006- LFS Annual Report (Arabic-English).pdf".
Non-sampling errors can occur at the various stages of survey implementation whether in data collection or in data processing. They are generally difficult to be evaluated statistically. They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent and interviewer-related). The use of effective training and supervision and the careful design of questions have direct bearing on limiting the magnitude of non-sampling errors, and hence enhancing the quality of the resulting data. The following are possible sources of non-sampling errors:
• Errors due to non-response because households were away from home or refused to participate. The overall non-response rate amounted to almost 12.4 percent, which is relatively low; much higher rates are common in an international perspective. The refusal rate was only 0.9 percent.
The primary objective of the 2017 Indonesia Demographic and Health Survey (IDHS) is to provide up-to-date estimates of basic demographic and health indicators. The IDHS provides a comprehensive overview of population and maternal and child health issues in Indonesia. More specifically, the IDHS was designed to:
- provide data on fertility, family planning, maternal and child health, and awareness of HIV/AIDS and sexually transmitted infections (STIs) to help program managers, policy makers, and researchers to evaluate and improve existing programs;
- measure trends in fertility and contraceptive prevalence rates, and analyze factors that affect such changes, such as residence, education, breastfeeding practices, and knowledge, use, and availability of contraceptive methods;
- evaluate the achievement of goals previously set by national health programs, with special focus on maternal and child health;
- assess married men’s knowledge of utilization of health services for their family’s health and participation in the health care of their families;
- participate in creating an international database to allow cross-country comparisons in the areas of fertility, family planning, and health.
National coverage
The survey covered all de jure household members (usual residents), all women age 15-49 years resident in the household, and all men age 15-54 years resident in the household.
Sample survey data [ssd]
The 2017 IDHS sample covered 1,970 census blocks in urban and rural areas and was expected to obtain responses from 49,250 households. The sampled households were expected to identify about 59,100 women age 15-49 and 24,625 never-married men age 15-24 eligible for individual interview. Eight households were selected in each selected census block to yield 14,193 married men age 15-54 to be interviewed with the Married Man's Questionnaire. The sample frame of the 2017 IDHS is the Master Sample of Census Blocks from the 2010 Population Census. The frame for the household sample selection is the updated list of ordinary households in the selected census blocks. This list does not include institutional households, such as orphanages, police/military barracks, and prisons, or special households (boarding houses with a minimum of 10 people).
The sampling design of the 2017 IDHS used two-stage stratified sampling. Stage 1: Census blocks were selected with systematic sampling proportional to size, where size is the number of households listed in the 2010 Population Census. Implicit stratification was achieved by stratifying the census blocks by urban and rural areas and ordering them by wealth index category.
Stage 2: In each selected census block, 25 ordinary households were selected with systematic sampling from the updated household listing. Eight households were selected systematically to obtain a sample of married men.
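The stage-1 selection described above can be sketched in code. The following is a minimal, hypothetical illustration of systematic sampling proportional to size; the block sizes, seed, and function name are invented for illustration and are not part of the survey's actual selection software.

```python
import random

def systematic_pps(sizes, n_select, seed=None):
    """Select n_select units by systematic sampling proportional to size.

    sizes: measures of size (e.g., household counts per census block),
    assumed already sorted by stratum and wealth-index category so that
    the single systematic pass yields implicit stratification.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    interval = total / n_select           # sampling interval
    start = rng.uniform(0, interval)      # random start in [0, interval)
    points = [start + k * interval for k in range(n_select)]
    # Unit i covers the cumulative-size range [cum, cum + sizes[i]).
    selected, cum, idx = [], 0.0, 0
    for i, s in enumerate(sizes):
        upper = cum + s
        while idx < n_select and cum <= points[idx] < upper:
            selected.append(i)            # larger blocks are hit more often
            idx += 1
        cum = upper
    return selected

# Hypothetical household counts for 10 census blocks:
blocks = [120, 80, 200, 150, 90, 60, 300, 110, 70, 130]
print(systematic_pps(blocks, 3, seed=1))
```

A block whose size exceeds the sampling interval can be selected more than once, which is the expected behavior of systematic PPS selection.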
For further details on sample design, see Appendix B of the final report.
Face-to-face [f2f]
The 2017 IDHS used four questionnaires: the Household Questionnaire, Woman’s Questionnaire, Married Man’s Questionnaire, and Never Married Man’s Questionnaire. Because of the change in survey coverage from ever-married women age 15-49 in the 2007 IDHS to all women age 15-49, the Woman’s Questionnaire had questions added for never married women age 15-24. These questions were part of the 2007 Indonesia Young Adult Reproductive Survey Questionnaire. The Household Questionnaire and the Woman’s Questionnaire are largely based on standard DHS phase 7 questionnaires (2015 version). The model questionnaires were adapted for use in Indonesia. Not all questions in the DHS model were included in the IDHS. Response categories were modified to reflect the local situation.
All completed questionnaires, along with the control forms, were returned to the BPS central office in Jakarta for data processing. The questionnaires were logged and edited, and all open-ended questions were coded. Responses were entered into the computer twice, and the two sets of entries were compared to detect and correct keying errors. Data processing activities were carried out by a team of 34 editors, 112 data entry operators, 33 compare officers, 19 secondary data editors, and 2 data entry supervisors. The Census and Survey Processing System (CSPro), a software package designed specifically to process DHS-type survey data, was used to process the 2017 IDHS.
Of the 49,261 eligible households, 48,216 households were found by the interviewer teams. Among these households, 47,963 households were successfully interviewed, a response rate of almost 100%.
In the interviewed households, 50,730 women were identified as eligible for individual interview and, from these, completed interviews were conducted with 49,627 women, yielding a response rate of 98%. From the selected household sample of married men, 10,440 married men were identified as eligible for interview, of which 10,009 were successfully interviewed, yielding a response rate of 96%. The lower response rate for men was due to the more frequent and longer absence of men from the household. In general, response rates in rural areas were higher than those in urban areas.
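The response rates quoted above follow directly from the reported counts of completed and eligible cases; a small sketch (the function name is illustrative):

```python
def response_rate(completed, eligible):
    """Response rate as a percentage, rounded to one decimal place."""
    return round(100 * completed / eligible, 1)

# Counts reported for the 2017 IDHS:
print(response_rate(47963, 48216))   # households: 99.5
print(response_rate(49627, 50730))   # women: 97.8
print(response_rate(10009, 10440))   # married men: 95.9
```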
The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors result from mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2017 Indonesia Demographic and Health Survey (2017 IDHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2017 IDHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling error is a measure of the variability among all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
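The relationships described above (standard error as the square root of the variance, and the two-standard-error confidence interval) can be written as a short sketch; the function names and the example estimate are illustrative, and the factor 2 approximates the 1.96 normal quantile as in the text.

```python
import math

def standard_error(variance):
    """Standard error is the square root of the estimator's variance."""
    return math.sqrt(variance)

def ci95(estimate, se):
    """Approximate 95% confidence interval: estimate +/- 2 standard errors."""
    return (estimate - 2 * se, estimate + 2 * se)

# Hypothetical example: an estimated proportion of 0.62 with variance 0.0001.
se = standard_error(0.0001)
low, high = ci95(0.62, se)
print(low, high)
```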
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2017 IDHS sample is the result of a multi-stage stratified design, so more complex formulas were required. Sampling errors for the 2017 IDHS were calculated with a Stata program, which uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions, and the jackknife repeated replication method for more complex statistics such as fertility and mortality rates.
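As a rough illustration of the delete-one-cluster jackknife idea mentioned above (this is one common form of the jackknife variance estimator, not the exact replication formulas of Appendix C; the data and function names are hypothetical):

```python
def jackknife_variance(clusters, statistic):
    """Delete-one-cluster jackknife variance for an arbitrary statistic.

    clusters: per-cluster data (e.g., per census block).
    statistic: function mapping a list of clusters to a point estimate.
    """
    n = len(clusters)
    full = statistic(clusters)
    # Recompute the statistic n times, each time dropping one cluster.
    replicates = [statistic(clusters[:i] + clusters[i + 1:]) for i in range(n)]
    # (n-1)/n times the sum of squared deviations from the full estimate.
    return (n - 1) / n * sum((r - full) ** 2 for r in replicates)

# Hypothetical per-cluster (numerator, denominator) totals,
# e.g., births and woman-years for a fertility rate:
data = [(12, 400), (9, 350), (15, 500), (7, 300)]
rate = lambda cl: sum(x for x, _ in cl) / sum(y for _, y in cl)
variance = jackknife_variance(data, rate)
print(rate(data), variance ** 0.5)  # point estimate and its standard error
```

The jackknife is useful here precisely because ratio statistics such as rates have no simple closed-form variance under a clustered design.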
A more detailed description of the estimation of sampling errors is presented in Appendix C of the survey final report.
Data Quality Tables:
- Household age distribution
- Age distribution of eligible and interviewed women
- Age distribution of eligible and interviewed men
- Completeness of reporting
- Births by calendar year
- Reporting of age at death in days
- Reporting of age at death in months
See details of the data quality tables in Appendix D of the survey final report.