A random sample of households was invited to participate in this survey. In the dataset, you will find the respondent-level data in each row with the questions in each column. The numbers represent a scale option from the survey, such as 1=Excellent, 2=Good, 3=Fair, 4=Poor. The question stem, response option, and scale information for each field can be found in the "variable labels" and "value labels" sheets. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables, and use the sum of the "wt" field rather than a count of responses.
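As an alternative to a spreadsheet pivot table, the same weighted tabulation can be sketched in pandas. This is a minimal illustration, not the method used for the report: the question column "q1" and all values are made up, and only the "wt" field name and the 1 to 4 scale come from the description above.

```python
import pandas as pd

# Hypothetical responses: "q1" is an illustrative question column;
# "wt" is the weight field named in the dataset description.
df = pd.DataFrame({
    "q1": [1, 2, 2, 3, 1, 4],
    "wt": [0.8, 1.2, 1.0, 0.5, 1.5, 1.0],
})

labels = {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"}

# Sum the weights per response option instead of counting rows.
weighted = df.groupby("q1")["wt"].sum()
weighted_pct = (weighted / weighted.sum() * 100).rename(index=labels)

# Unweighted shares, shown only to illustrate how the results differ.
unweighted_pct = (
    df["q1"].value_counts(normalize=True).sort_index() * 100
).rename(index=labels)

print(weighted_pct.round(1))
print(unweighted_pct.round(1))
```

In this toy data the weighted share of "Excellent" is about 38.3 percent, while the unweighted share is about 33.3 percent, which is the kind of gap the note warns about.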
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
N.B. This is not real data. Only here for an example for project templates.
Project Title: Add title here
Project Team: Add contact information for research project team members
Summary: Provide a descriptive summary of the nature of your research project and its aims/focal research questions.
Relevant publications/outputs: When available, add links to the related publications/outputs from this data.
Data availability statement: If your data is not linked on figshare directly, provide links to where it is being hosted here (i.e., Open Science Framework, Github, etc.). If your data is not going to be made publicly available, please provide details here as to the conditions under which interested individuals could gain access to the data and how to go about doing so.
Data collection details: 1. When was your data collected? 2. How were your participants sampled/recruited?
Sample information: How many and who are your participants? Demographic summaries are helpful additions to this section.
Research Project Materials: What materials are necessary to fully reproduce the contents of your dataset? Include a list of all relevant materials (e.g., surveys, interview questions) with a brief description of what is included in each file that should be uploaded alongside your datasets.
List of relevant datafile(s): If your project produces data that cannot be contained in a single file, list the names of each of the files here with a brief description of what parts of your research project each file is related to.
Data codebook: What is in each column of your dataset? Provide variable names as they are encoded in your data files, verbatim question associated with each response, response options, details of any post-collection coding that has been done on the raw-response (and whether that's encoded in a separate column).
Examples available at: https://www.thearda.com/data-archive?fid=PEWMU17 https://www.thearda.com/data-archive?fid=RELLAND14
The documented dataset covers Enterprise Survey (ES) panel data collected in Malawi in 2009 and 2014, as part of Africa Enterprise Surveys roll-out, an initiative of the World Bank.
New Enterprise Surveys target a sample consisting of longitudinal (panel) observations and new cross-sectional data. Panel firms are prioritized in the sample selection, comprising up to 50% of the sample in the current wave. For all panel firms, regardless of the sample, current eligibility or operating status is determined and included in panel datasets.
Malawi ES 2014 was conducted between April 2014 and February 2015; Malawi ES 2009 was carried out from May to July 2009. The objective of the Enterprise Survey is to obtain feedback from enterprises on the state of the private sector as well as to help in building a panel of enterprise data that will make it possible to track changes in the business environment over time, thus allowing, for example, impact assessments of reforms. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
Stratified random sampling was used to select the surveyed businesses. The data was collected using face-to-face interviews.
Data from 673 establishments was analyzed: 436 businesses were from the 2014 ES only, 63 were from the 2009 ES only, and 174 firms were from both the 2009 and 2014 panels.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs and labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90 percent of the questions objectively measure characteristics of a country’s business environment. The remaining questions assess the survey respondents’ opinions on what are the obstacles to firm growth and performance.
National
The primary sampling unit of the study is an establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural private economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors. Companies with 100% government ownership are not eligible to participate in the Enterprise Surveys.
Sample survey data [ssd]
For the Malawi ES, multiple sample frames were used: a sample frame was built using data compiled from local and municipal business registries. Due to the fact that the previous round of surveys utilized different stratification criteria in the 2009 survey sample, the presence of panel firms was limited to a maximum of 50% of the achieved interviews in each stratum. That sample is referred to as the panel.
Face-to-face [f2f]
The following survey instruments were used for Malawi ES 2009 and 2014: - Manufacturing Module Questionnaire - Services Module Questionnaire
The survey is fielded via manufacturing or services questionnaires in order not to ask questions that are irrelevant to specific types of firms, e.g. a question that relates to production and nonproduction workers should not be asked of a retail firm. In addition to questions that are asked across countries, all surveys are customized and contain country-specific questions. An example of customization would be including tourism-related questions that are asked in certain countries when tourism is an existing or potential sector of economic growth. There is a skip pattern in the Service Module Questionnaire for questions that apply only to retail firms.
Data entry and quality controls are implemented by the contractor and data is delivered to the World Bank in batches (typically 10%, 50% and 100%). These data deliveries are checked for logical consistency, out of range values, skip patterns, and duplicate entries. Problems are flagged by the World Bank and corrected by the implementing contractor through data checks, callbacks, and revisiting establishments.
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether whereas the latter refers to the refusals to answer some specific questions. Enterprise Surveys suffer from both problems and different strategies were used to address these issues.
Item non-response was addressed by two strategies:
(a) For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect "Refusal to respond" (-8) as a different option from "Don't know" (-9).
(b) Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary.
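When analyzing the data, the -8 and -9 codes need to be kept out of numeric summaries while preserving the distinction between them. The following is a minimal sketch of one way to do that; the column name "bribe_pct" and the values are hypothetical, and only the two codes come from the text above.

```python
import numpy as np
import pandas as pd

REFUSED, DONT_KNOW = -8, -9  # codes given in the survey documentation

# Hypothetical column name and values, for illustration only.
df = pd.DataFrame({"bribe_pct": [2.0, -8, 0.0, -9, 5.0]})

# Record why a value is missing before blanking the codes.
df["bribe_pct_status"] = np.select(
    [df["bribe_pct"] == REFUSED, df["bribe_pct"] == DONT_KNOW],
    ["refused", "dont_know"],
    default="answered",
)

# Replace both codes with NaN so they do not distort averages.
df["bribe_pct_clean"] = df["bribe_pct"].replace(
    {REFUSED: np.nan, DONT_KNOW: np.nan}
)

mean_answered = df["bribe_pct_clean"].mean()  # NaN values are skipped
print(df)
print(mean_answered)
```

Keeping a separate status column makes it possible to report refusal rates on sensitive questions separately from "Don't know" answers.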
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals.
The National Sample Survey of Registered Nurses (NSSRN) Download makes data from the survey readily available to users in a one-stop download. The Survey has been conducted approximately every four years since 1977. For each survey year, HRSA has prepared two Public Use File databases in flat ASCII file format without delimiters. The 2008 data are also offered in SAS and SPSS formats. Information likely to point to an individual in a sparsely-populated county has been withheld. General Public Use Files are State-based and provide information on nurses without identifying the County and Metropolitan Area in which they live or work. County Public Use Files provide most, but not all, of the same information on the nurse from the General Public Use File, and also identify the County and Metropolitan Areas in which the nurses live or work. NSSRN data are to be used for research purposes only and may not be used in any manner to identify individual respondents.
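Flat ASCII files without delimiters are read by column position. The sketch below shows the general approach with pandas; the three-field layout, field names, and sample records are entirely hypothetical, and the real positions would have to come from the record layout documentation that accompanies each Public Use File.

```python
import io
import pandas as pd

# Hypothetical three-field layout (start, end) with end exclusive.
# Real positions must be taken from the file's record layout.
colspecs = [(0, 2), (2, 6), (6, 8)]
names = ["state", "grad_year", "age"]

# Two made-up records standing in for a fixed-width data file.
sample = io.StringIO("CA198745\nNY199238\n")

df = pd.read_fwf(sample, colspecs=colspecs, names=names, header=None)
print(df)
```

With a real file, the `io.StringIO` object would be replaced by the file path.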
Economists are shifting attention and resources from work on survey data to work on “big data.” This analysis is an empirical exploration of the trade-offs this transition requires. Parallel models are estimated using the Federal Reserve Bank of New York Consumer Credit Panel/Equifax and the Survey of Consumer Finances. After adjustments to account for different variable definitions and sampled populations, it is possible to arrive at similar models of total household debt. However, the estimates are sensitive to the adjustments. Little similarity is observed in parallel models of nonmortgage debt. While surveys intentionally collect theoretically related variables, it may be necessary to merge external data into commercial big data. In this example, some education and income measures are successfully integrated with the big data, but other external aggregates fail to adequately substitute for survey responses. Big data offers sample sizes, frequencies, and details that surveys cannot match. However, this example illustrates why caution is appropriate when attempting to substitute big data for a carefully executed survey.
The City of Bloomington contracted with National Research Center, Inc. to conduct the 2019 Bloomington Community Survey. This was the second time a scientific citywide survey had been completed covering resident opinions on service delivery satisfaction by the City of Bloomington and quality of life issues. The first was in 2017. The survey captured the responses of 610 households from a representative sample of 3,000 residents of Bloomington who were randomly selected to complete the survey. VERY IMPORTANT NOTE: The scientific survey data were weighted, meaning that the demographic profile of respondents was compared to the demographic profile of adults in Bloomington from US Census data. Statistical adjustments were made to bring the respondent profile into balance with the population profile. This means that some records were given more "weight" and some records were given less weight. The weights that were applied are found in the field "wt". If you do not apply these weights, you will not obtain the same results as can be found in the report delivered to the City of Bloomington. The easiest way to replicate these results is likely to create pivot tables, and use the sum of the "wt" field rather than a count of responses.
The Tanzania Demographic and Health Survey (TDHS) is part of the worldwide Demographic and Health Surveys (DHS) programme, which is designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 1999 TRCHS was to collect data at the national level (with breakdowns by urban-rural and Mainland-Zanzibar residence wherever warranted) on fertility levels and preferences, family planning use, maternal and child health, breastfeeding practices, nutritional status of young children, childhood mortality levels, knowledge and behaviour regarding HIV/AIDS, and the availability of specific health services within the community. Related objectives were to produce these results in a timely manner and to ensure that the data were disseminated to a wide audience of potential users in governmental and nongovernmental organisations within and outside Tanzania. The ultimate intent is to use the information to evaluate current programmes and to design new strategies for improving health and family planning services for the people of Tanzania.
National. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately.
Sample survey data
The TRCHS used a three-stage sample design. Overall, 176 census enumeration areas were selected (146 on the Mainland and 30 in Zanzibar) with probability proportional to size on an approximately self-weighting basis on the Mainland, but with oversampling of urban areas and Zanzibar. To reduce costs and maximise the ability to identify trends over time, these enumeration areas were selected from the 357 sample points that were used in the 1996 TDHS, which in turn were selected from the 1988 census frame of enumeration in a two-stage process (first wards/branches and then enumeration areas within wards/branches). Before the data collection, fieldwork teams visited the selected enumeration areas to list all the households. From these lists, households were selected to be interviewed. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately. The health facilities component of the TRCHS involved visiting hospitals, health centres, and pharmacies located in areas around the households interviewed. In this way, the data from the two components can be linked and a richer dataset produced.
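Selection with probability proportional to size can be implemented in several ways, and the description above does not say which procedure was used. The sketch below shows systematic PPS selection, one common approach, under that assumption; the enumeration area sizes are invented for illustration.

```python
import random

def pps_systematic(sizes, n, seed=0):
    """Select n units with probability proportional to size.

    Walks a cumulative size scale and picks the unit containing each
    of n equally spaced points after a random start. A unit larger
    than the sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    rng = random.Random(seed)
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + i * interval for i in range(n)]

    selected, cumulative, idx = [], 0.0, 0
    for unit, size in enumerate(sizes):
        cumulative += size
        while idx < n and points[idx] < cumulative:
            selected.append(unit)
            idx += 1
    return selected

# Hypothetical household counts for six enumeration areas.
ea_sizes = [120, 80, 300, 50, 250, 200]
chosen = pps_systematic(ea_sizes, n=3)
print(chosen)
```

Larger areas cover more of the cumulative scale, so they are more likely to contain one of the selection points.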
See detailed sample implementation in the APPENDIX A of the final report.
Face-to-face
The household survey component of the TRCHS involved three questionnaires: 1) a Household Questionnaire, 2) a Women’s Questionnaire for all individual women age 15-49 in the selected households, and 3) a Men’s Questionnaire for all men age 15-59.
The health facilities survey involved six questionnaires: 1) a Community Questionnaire administered to men and women in each selected enumeration area; 2) a Facility Questionnaire; 3) a Facility Inventory; 4) a Service Provider Questionnaire; 5) a Pharmacy Inventory Questionnaire; and 6) a questionnaire for the District Medical Officers.
All these instruments were based on model questionnaires developed for the MEASURE programme, as well as on the questionnaires used in the 1991-92 TDHS, the 1994 TKAP, and the 1996 TDHS. These model questionnaires were adapted for use in Tanzania during meetings with representatives from the Ministry of Health, the University of Dar es Salaam, the Tanzania Food and Nutrition Centre, USAID/Tanzania, UNICEF/Tanzania, UNFPA/Tanzania, and other potential data users. The questionnaires and manual were developed in English and then translated into and printed in Kiswahili.
The Household Questionnaire was used to list all the usual members and visitors in the selected households. Some basic information was collected on the characteristics of each person listed, including his/her age, sex, education, and relationship to the head of the household. The main purpose of the Household Questionnaire was to identify women and men who were eligible for individual interview and children under five who were to be weighed and measured. Information was also collected about the dwelling itself, such as the source of water, type of toilet facilities, materials used to construct the house, ownership of various consumer goods, and use of iodised salt. Finally, the Household Questionnaire was used to collect some rudimentary information about the extent of child labour.
The Women’s Questionnaire was used to collect information from women age 15-49. These women were asked questions on the following topics:
· Background characteristics (age, education, religion, type of employment)
· Birth history
· Knowledge and use of family planning methods
· Antenatal, delivery, and postnatal care
· Breastfeeding and weaning practices
· Vaccinations, birth registration, and health of children under age five
· Marriage and recent sexual activity
· Fertility preferences
· Knowledge and behaviour concerning HIV/AIDS.
The Men’s Questionnaire covered most of these same issues, except that it omitted the sections on the detailed reproductive history, maternal health, and child health. The final versions of the English questionnaires are provided in Appendix E.
Before the questionnaires could be finalised, a pretest was done in July 1999 in Kibaha District to assess the viability of the questions, the flow and logical sequence of the skip pattern, and the field organisation. Modifications to the questionnaires, including wording and translations, were made based on lessons drawn from the exercise.
In all, 3,826 households were selected for the sample, out of which 3,677 were occupied. Of the households found, 3,615 were interviewed, representing a response rate of 98 percent. The shortfall is primarily due to dwellings that were vacant or in which the inhabitants were not at home despite several callbacks.
In the interviewed households, a total of 4,118 eligible women (i.e., women age 15-49) were identified for the individual interview, and 4,029 women were actually interviewed, yielding a response rate of 98 percent. A total of 3,792 eligible men (i.e., men age 15-59), were identified for the individual interview, of whom 3,542 were interviewed, representing a response rate of 93 percent. The principal reason for nonresponse among both eligible men and women was the failure to find them at home despite repeated visits to the household. The lower response rate among men than women was due to the more frequent and longer absences of men.
The response rates are lower in urban areas due to longer absence of respondents from their homes. One-member households are more common in urban areas and are more difficult to interview because they keep their houses locked most of the time. In urban settings, neighbours often do not know the whereabouts of such people.
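The household, women's, and men's response rates quoted above can be recomputed directly from the reported counts. The short sketch below does this; the function name is illustrative.

```python
def response_rate(interviewed: int, eligible: int) -> float:
    """Response rate as a percentage of eligible units."""
    return 100 * interviewed / eligible

# Counts as reported in the text.
households = response_rate(3615, 3677)  # interviewed / occupied households
women = response_rate(4029, 4118)       # interviewed / eligible women
men = response_rate(3542, 3792)         # interviewed / eligible men

for label, rate in [("Households", households), ("Women", women), ("Men", men)]:
    print(f"{label}: {rate:.1f}%")
```

To one decimal place these come to 98.3, 97.8, and 93.4 percent, which round to the 98, 98, and 93 percent given in the text.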
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the TRCHS to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the TRCHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
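The "plus or minus two standard errors" rule described above can be written as a small helper. This is a sketch only: the estimate and standard error passed in are hypothetical numbers, not values from the survey report.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Approximate 95 percent interval: estimate +/- 2 standard errors."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin

# Hypothetical proportion of 0.25 with a standard error of 0.012.
low, high = confidence_interval(0.25, 0.012)
print(f"{low:.3f} to {high:.3f}")
```

For these illustrative inputs the interval runs from 0.226 to 0.274.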
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the TRCHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the TRCHS is the ISSA Sampling Error Module (SAMPERR). This module used the Taylor linearisation method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
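To give a sense of how jackknife repeated replication works, the sketch below computes a delete-one-cluster jackknife standard error for a simple ratio. It is an illustration under simplifying assumptions (no stratification, unweighted cluster totals, invented data) and is not the code used by the SAMPERR module.

```python
def jackknife_se(cluster_numer, cluster_denom):
    """Delete-one-cluster jackknife for the ratio r = sum(y) / sum(x).

    Each replicate drops one cluster, recomputes the ratio, and forms
    a pseudo-value; the spread of the pseudo-values gives the variance.
    """
    k = len(cluster_numer)
    total_y, total_x = sum(cluster_numer), sum(cluster_denom)
    r = total_y / total_x

    pseudo_values = []
    for y, x in zip(cluster_numer, cluster_denom):
        r_without = (total_y - y) / (total_x - x)
        pseudo_values.append(k * r - (k - 1) * r_without)

    variance = sum((p - r) ** 2 for p in pseudo_values) / (k * (k - 1))
    return r, variance ** 0.5

# Hypothetical cluster totals: events (numerator), respondents (denominator).
events = [12, 9, 15, 7, 11]
respondents = [40, 35, 42, 30, 38]
ratio, se = jackknife_se(events, respondents)
print(ratio, se)
```

If every cluster had the same ratio, each replicate would reproduce the full-sample estimate and the standard error would be zero; variation between clusters is what drives the estimate of sampling error.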
Note: See detailed sampling error calculation in the APPENDIX B
The Associated Press is sharing data from the COVID Impact Survey, which provides statistics about physical health, mental health, economic security and social dynamics related to the coronavirus pandemic in the United States.
Conducted by NORC at the University of Chicago for the Data Foundation, the probability-based survey provides estimates for the United States as a whole, as well as in 10 states (California, Colorado, Florida, Louisiana, Minnesota, Missouri, Montana, New York, Oregon and Texas) and eight metropolitan areas (Atlanta, Baltimore, Birmingham, Chicago, Cleveland, Columbus, Phoenix and Pittsburgh).
The survey is designed to allow for an ongoing gauge of public perception, health and economic status to see what is shifting during the pandemic. When multiple sets of data are available, it will allow for the tracking of how issues ranging from COVID-19 symptoms to economic status change over time.
The survey is focused on three core areas of research:
Instead, use our queries linked below or statistical software such as R or SPSS to weight the data.
If you'd like to create a table to see how people nationally or in your state or city feel about a topic in the survey, use the survey questionnaire and codebook to match a question (the variable label) to a variable name. For instance, "How often have you felt lonely in the past 7 days?" is variable "soc5c".
Nationally: Go to this query and enter soc5c as the variable. Hit the blue Run Query button in the upper right hand corner.
Local or State: To find figures for that response in a specific state, go to this query and type in a state name and soc5c as the variable, and then hit the blue Run Query button in the upper right hand corner.
The resulting sentence you could write out of these queries is: "People in some states are less likely to report loneliness than others. For example, 66% of Louisianans report feeling lonely on none of the last seven days, compared with 52% of Californians. Nationally, 60% of people said they hadn't felt lonely."
The margin of error for the national and regional surveys is found in the attached methods statement. You will need the margin of error to determine if the comparisons are statistically significant. If the difference is:
The survey data will be provided under embargo in both comma-delimited and statistical formats.
Each set of survey data will be numbered and have the date the embargo lifts in front of it in the format of: 01_April_30_covid_impact_survey. The survey has been organized by the Data Foundation, a non-profit non-partisan think tank, and is sponsored by the Federal Reserve Bank of Minneapolis and the Packard Foundation. It is conducted by NORC at the University of Chicago, a non-partisan research organization. (NORC is not an abbreviation; it is part of the organization's formal name.)
Data for the national estimates are collected using the AmeriSpeak Panel, NORC’s probability-based panel designed to be representative of the U.S. household population. Interviews are conducted with adults age 18 and over representing the 50 states and the District of Columbia. Panel members are randomly drawn from AmeriSpeak with a target of achieving 2,000 interviews in each survey. Invited panel members may complete the survey online or by telephone with an NORC telephone interviewer.
Once all the study data have been made final, an iterative raking process is used to adjust for any survey nonresponse as well as any noncoverage or under and oversampling resulting from the study specific sample design. Raking variables include age, gender, census division, race/ethnicity, education, and county groupings based on county level counts of the number of COVID-19 deaths. Demographic weighting variables were obtained from the 2020 Current Population Survey. The count of COVID-19 deaths by county was obtained from USA Facts. The weighted data reflect the U.S. population of adults age 18 and over.
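Iterative raking adjusts weights one variable at a time until the weighted totals match the population targets on every raking variable. The sketch below illustrates the idea on two variables only; the respondent data and target totals are invented, the function name is hypothetical, and this is not NORC's production procedure.

```python
import pandas as pd

def rake(df, targets, weight_col="weight", max_iter=50, tol=1e-8):
    """Iterative proportional fitting of weights to marginal targets.

    targets maps each raking variable to {category: target total}.
    """
    w = df[weight_col].astype(float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for var, target in targets.items():
            current = w.groupby(df[var]).sum()      # weighted totals now
            factors = pd.Series(target) / current   # adjustment per category
            step = df[var].map(factors)
            max_shift = max(max_shift, (step - 1).abs().max())
            w = w * step
        if max_shift < tol:  # all margins already match
            break
    return w

# Hypothetical respondents and population targets (weights sum to 5).
sample = pd.DataFrame({
    "age": ["18-44", "18-44", "45+", "45+", "45+"],
    "gender": ["F", "M", "F", "M", "F"],
    "weight": [1.0] * 5,
})
targets = {
    "age": {"18-44": 2.5, "45+": 2.5},
    "gender": {"F": 2.5, "M": 2.5},
}
sample["raked"] = rake(sample, targets)
print(sample)
```

After convergence, the weighted totals by age group and by gender both match their targets, even though no target was specified for the age-by-gender cells.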
Data for the regional estimates are collected using a multi-mode address-based (ABS) approach that allows residents of each area to complete the interview via web or with an NORC telephone interviewer. All sampled households are mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Interviews are conducted with adults age 18 and over with a target of achieving 400 interviews in each region in each survey. Additional details on the survey methodology and the survey questionnaire are attached below or can be found at https://www.covid-impact.org.
Results should be credited to the COVID Impact Survey, conducted by NORC at the University of Chicago for the Data Foundation.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Different countries have different health outcomes that are in part due to the way respective health systems perform. Regardless of the type of health system, individuals will have health and non-health expectations in terms of how the institution responds to their needs. In many countries, however, health systems do not perform effectively and this is in part due to lack of information on health system performance, and on the different service providers.
The aim of the WHO World Health Survey is to provide empirical data to the national health information systems so that there is a better monitoring of health of the people, responsiveness of health systems and measurement of health-related parameters.
The overall aim of the survey is to examine the way populations report their health, understand how people value health states, measure the performance of health systems in relation to responsiveness and gather information on modes and extents of payment for health encounters through a nationally representative population based community survey. In addition, it addresses various areas such as health care expenditures, adult mortality, birth history, various risk factors, assessment of main chronic health conditions and the coverage of health interventions, in specific additional modules.
The objectives of the survey programme are to:
1. develop a means of providing valid, reliable and comparable information, at low cost, to supplement the information provided by routine health information systems.
2. build the evidence base necessary for policy-makers to monitor if health systems are achieving the desired goals, and to assess if additional investment in health is achieving the desired outcomes.
3. provide policy-makers with the evidence they need to adjust their policies, strategies and programmes as necessary.
The survey sampling frame must cover 100% of the country's eligible population, meaning that the entire national territory must be included. This does not mean that every province or territory need be represented in the survey sample but, rather, that all must have a chance (known probability) of being included in the survey sample.
There may be exceptional circumstances that preclude 100% national coverage. Certain areas in certain countries may be impossible to include due to reasons such as accessibility or conflict. All such exceptions must be discussed with WHO sampling experts. If any region must be excluded, it must constitute a coherent area, such as a particular province or region. For example, if ¾ of region D in country X is not accessible due to war, the entire region D will be excluded from analysis.
Households and individuals
The WHS will include all male and female adults (18 years of age and older) who are not out of the country during the survey period. It should be noted that this includes the population who may be institutionalized for health reasons at the time of the survey: all persons who would have fit the definition of household member at the time of their institutionalisation are included in the eligible population.
If the randomly selected individual is institutionalized short-term (e.g. a 3-day stay at a hospital) the interviewer must return to the household when the individual will have come back to interview him/her. If the randomly selected individual is institutionalized long term (e.g. has been in a nursing home the last 8 years), the interviewer must travel to that institution to interview him/her.
The target population includes any adult, male or female age 18 or over living in private households. Populations in group quarters, on military reservations, or in other non-household living arrangements will not be eligible for the study. People who are in an institution due to a health condition (such as a hospital, hospice, nursing home, home for the aged, etc.) at the time of the visit to the household are interviewed either in the institution or upon their return to their household if this is within a period of two weeks from the first visit to the household.
Sample survey data [ssd]
SAMPLING GUIDELINES FOR WHS
Surveys in the WHS program must employ a probability sampling design. This means that every single individual in the sampling frame has a known and non-zero chance of being selected into the survey sample. While a Single Stage Random Sample is ideal if feasible, it is recognized that most sites will carry out Multi-stage Cluster Sampling.
The WHS sampling frame should cover 100% of the eligible population in the surveyed country. This means that every eligible person in the country has a chance of being included in the survey sample. It also means that particular ethnic groups or geographical areas may not be excluded from the sampling frame.
The sample size of the WHS in each country is 5000 persons (exceptions considered on a by-country basis). An adequate number of persons must be drawn from the sampling frame to account for an estimated amount of non-response (refusal to participate, empty houses etc.). The highest estimate of potential non-response and empty households should be used to ensure that the desired sample size is reached at the end of the survey period. This is very important because if, at the end of data collection, the required sample size of 5000 has not been reached additional persons must be selected randomly into the survey sample from the sampling frame. This is both costly and technically complicated (if this situation is to occur, consult WHO sampling experts for assistance), and best avoided by proper planning before data collection begins.
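The planning arithmetic described above can be sketched as follows. This is a minimal illustration and not part of the WHS guidelines: the function name `required_draw` and the 25% combined rate of non-response and empty households are assumptions chosen for the example.

```python
import math

def required_draw(target_completed: int, nonresponse_rate: float) -> int:
    """Number of persons to draw from the frame so that the expected
    number of completed interviews reaches the target."""
    if not 0 <= nonresponse_rate < 1:
        raise ValueError("nonresponse_rate must be in [0, 1)")
    return math.ceil(target_completed / (1 - nonresponse_rate))

# Hypothetical worst-case estimate: 25% of selections yield no interview.
draw = required_draw(5000, 0.25)
print(draw)
```

Using the highest plausible non-response estimate makes the draw larger than strictly necessary on average, which is the cheaper error compared with a supplementary random selection after fieldwork.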
All steps of sampling, including justification for stratification, cluster sizes, probabilities of selection, weights at each stage of selection, and the computer program used for randomization, must be communicated to WHO.
STRATIFICATION
Stratification is the process by which the population is divided into subgroups. Sampling will then be conducted separately in each subgroup. Strata or subgroups are chosen because evidence is available that they are related to the outcome (e.g. health, responsiveness, mortality, coverage etc.). The strata chosen will vary by country and reflect local conditions. Some examples of factors that can be stratified on are geography (e.g. North, Central, South), level of urbanization (e.g. urban, rural), socio-economic zones, provinces (especially if health administration is primarily under the jurisdiction of provincial authorities), or presence of health facility in area. Strata to be used must be identified by each country and the reasons for selection explicitly justified.
Stratification is strongly recommended at the first stage of sampling. Once the strata have been chosen and justified, all stages of selection will be conducted separately in each stratum. We recommend stratifying on 3-5 factors. It is optimum to have half as many strata (note the difference between stratifying variables, which may be such variables as gender, socio-economic status, province/region etc. and strata, which are the combination of variable categories, for example Male, High socio-economic status, Xingtao Province would be a stratum).
Strata should be as homogenous as possible within and as heterogeneous as possible between. This means that strata should be formulated in such a way that individuals belonging to a stratum should be as similar to each other with respect to key variables as possible and as different as possible from individuals belonging to a different stratum. This maximises the efficiency of stratification in reducing sampling variance.
MULTI-STAGE CLUSTER SELECTION
A cluster is a naturally occurring unit or grouping within the population (e.g. enumeration areas, cities, universities, provinces, hospitals etc.); it is a unit for which the administrative level has clear, non-overlapping boundaries. Cluster sampling is useful because it avoids having to compile exhaustive lists of every single person in the population. Clusters should be as heterogeneous as possible within and as homogenous as possible between (note that this is the opposite of the criterion for strata). Clusters should be as small as possible (i.e. large administrative units such as Provinces or States are not good clusters) but not so small as to be homogenous.
In cluster sampling, a number of clusters are randomly selected from a list of clusters. Then, either all members of the chosen cluster or a random selection from among them are included in the sample. Multistage sampling is an extension of cluster sampling in which a hierarchy of clusters is chosen, going from larger to smaller.
In order to carry out multi-stage sampling, one needs to know only the population sizes of the sampling units. For the smallest sampling unit above the elementary unit however, a complete list of all elementary units (households) is needed; in order to be able to randomly select among all households in the TSU, a list of all those households is required. This information may be available from the most recent population census. If the last census was >3 years ago or the information furnished by it was of poor quality or unreliable, the survey staff will have the task of enumerating all households in the smallest randomly selected sampling unit. It is very important to budget for this step if it is necessary and ensure that all households are properly enumerated in order that a representative sample is obtained.
It is always best to have as many clusters in the PSU as possible. The reason for this is that the fewer the number of respondents in each PSU, the lower will be the clustering effect which
The harmonized data set on health, created and published by the ERF, is a subset of Iraq Household Socio Economic Survey (IHSES) 2012. It was derived from the household, individual and health modules, collected in the context of the above-mentioned survey. The sample was then used to create a harmonized health survey, comparable with the Iraq Household Socio Economic Survey (IHSES) 2007 micro data set.
----> Overview of the Iraq Household Socio Economic Survey (IHSES) 2012:
Iraq is considered a leader in household expenditure and income surveys: the first was conducted in 1946, followed by surveys in 1954 and 1961. After the establishment of the Central Statistical Organization (CSO), household expenditure and income surveys were carried out every 3-5 years (1971/1972, 1976, 1979, 1984/1985, 1988, 1993, 2002/2007). As part of the cooperation between the CSO and the World Bank (WB), the CSO and the Kurdistan Region Statistics Office (KRSO) launched fieldwork on IHSES on 1/1/2012. The survey was carried out over a full year, covering all governorates including those in the Kurdistan Region.
The survey has six main objectives. These objectives are:
The raw survey data provided by the Statistical Office were then harmonized by the Economic Research Forum, to create a comparable version with the 2006/2007 Household Socio Economic Survey in Iraq. Harmonization at this stage only included unifying variables' names, labels and some definitions. See: Iraq 2007 & 2012- Variables Mapping & Availability Matrix.pdf provided in the external resources for further information on the mapping of the original variables on the harmonized ones, in addition to more indications on the variables' availability in both survey years and relevant comments.
National coverage: Covering a sample of urban, rural and metropolitan areas in all the governorates including those in Kurdistan Region.
1- Household/family. 2- Individual/person.
The survey was carried out over a full year covering all governorates including those in Kurdistan Region.
Sample survey data [ssd]
----> Design:
The sample size was 25,488 households for the whole of Iraq: 216 households in each of the 118 districts, drawn as 2,832 clusters of 9 households each, distributed across districts and governorates in both rural and urban areas.
----> Sample frame:
The listing and numbering results of the 2009-2010 Population and Housing Survey were adopted in all the governorates, including the Kurdistan Region, as the frame for selecting households. The sample was selected in two stages. Stage 1: Primary sampling units (blocks) within each stratum (district), for urban and rural areas, were systematically selected with probability proportional to size, to reach 2,832 units (clusters). Stage 2: 9 households were selected from each primary sampling unit to create a cluster. The total sample across all survey clusters was therefore 25,488 households distributed over the governorates, with 216 households in each district.
----> Sampling Stages:
In each district, the sample was selected in two stages. Stage 1: Based on the 2010 listing and numbering frame, 24 sample points were selected within each stratum through systematic sampling with probability proportional to size, in addition to the implicit breakdown by urban and rural areas and the geographic breakdown (sub-district, quarter, street, county, village and block). Stage 2: Using households as secondary sampling units, 9 households were selected from each sample point using systematic equal probability sampling. Sampling frames for each stage can be developed based on the 2010 building listing and numbering without updating household lists. In some small districts, the random selection of primary sampling units may yield fewer than 24 distinct units, so a sampling unit may be selected more than once; where necessary, two or more clusters may be drawn from the same enumeration unit.
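The two selection stages described above can be sketched in code. This is only an illustration under stated assumptions, not the procedure actually run by the CSO: the function names `pps_systematic` and `systematic_equal`, the seed, and the block sizes are all hypothetical. The sketch also shows why a large unit in a small district can be selected more than once.

```python
import random

def pps_systematic(sizes, n, rng):
    """Systematic PPS selection: return n indices into `sizes`.

    A unit whose size exceeds the sampling interval can be hit by
    more than one selection point, so indices may repeat.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    selected, cumulative, idx = [], 0.0, 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative size range contains `point`.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected

def systematic_equal(units, n, rng):
    """Systematic equal-probability selection of n units (n <= len(units))."""
    interval = len(units) / n
    start = rng.random() * interval
    last = len(units) - 1
    return [units[min(int(start + k * interval), last)] for k in range(n)]

rng = random.Random(2012)                    # fixed seed, arbitrary choice
block_sizes = [rng.randint(40, 400) for _ in range(150)]  # made-up household counts
blocks = pps_systematic(block_sizes, 24, rng)             # stage 1: 24 sample points
first_block_households = list(range(block_sizes[blocks[0]]))
households = systematic_equal(first_block_households, 9, rng)  # stage 2: 9 households

# Totals implied by the design described in the text.
assert 24 * 9 == 216 and 118 * 24 == 2832 and 2832 * 9 == 25488
```

With a hypothetical small district such as `pps_systematic([500, 10, 10], 3, rng)`, the first block is larger than the sampling interval and is returned at least twice, which corresponds to drawing two or more clusters from the same enumeration unit.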
Face-to-face [f2f]
----> Preparation:
The 2012 questionnaire was based on the questionnaire of the 2006 survey, to which many revisions were made. Two rounds of pre-testing were carried out. Revisions were made based on feedback from the fieldwork team, World Bank consultants and others, and further revisions were made before the final version was implemented in a pilot survey in September 2011. After the pilot survey, additional revisions were made based on the challenges and feedback that emerged during its implementation, producing the final version used in the actual survey.
----> Questionnaire Parts:
The questionnaire consists of four parts, each with several sections:
Part 1: Socio-Economic Data:
- Section 1: Household Roster
- Section 2: Emigration
- Section 3: Food Rations
- Section 4: Housing
- Section 5: Education
- Section 6: Health
- Section 7: Physical measurements
- Section 8: Job seeking and previous job
Part 2: Monthly, Quarterly and Annual Expenditures:
- Section 9: Expenditures on Non-Food Commodities and Services (past 30 days)
- Section 10: Expenditures on Non-Food Commodities and Services (past 90 days)
- Section 11: Expenditures on Non-Food Commodities and Services (past 12 months)
- Section 12: Expenditures on Non-food Frequent Food Stuff and Commodities (7 days)
- Section 12, Table 1: Meals Had Within the Residential Unit
- Section 12, Table 2: Number of Persons Participate in the Meals within Household Expenditure Other Than its Members
Part 3: Income and Other Data:
- Section 13: Job
- Section 14: Paid jobs
- Section 15: Agriculture, forestry and fishing
- Section 16: Household non-agricultural projects
- Section 17: Income from ownership and transfers
- Section 18: Durable goods
- Section 19: Loans, advances and subsidies
- Section 20: Shocks and strategy of dealing in the households
- Section 21: Time use
- Section 22: Justice
- Section 23: Satisfaction in life
- Section 24: Food consumption during past 7 days
Part 4: Diary of Daily Expenditures: The diary of expenditure is an essential component of this survey. It is left with the household to record all daily purchases, such as expenditures on food and frequent non-food items (gasoline, newspapers, etc.), over 7 days. Two pages were allocated for recording the expenditures of each day, so the roster consists of 14 pages.
----> Raw Data:
Data Editing and Processing: To ensure accuracy and consistency, the data were edited at the following stages:
1. Interviewer: Checks all answers on the household questionnaire, confirming that they are clear and correct.
2. Local Supervisor: Checks that questions have been correctly completed.
3. Statistical analysis: After exporting data files from Excel to SPSS, the Statistical Analysis Unit uses program commands to identify irregular or non-logical values, in addition to auditing some variables.
4. World Bank consultants in coordination with the CSO data management team: The World Bank technical consultants use additional programs in SPSS and STAT to examine and correct remaining inconsistencies within the data files. The software detects errors by analyzing questionnaire items according to the expected parameter for each variable.
----> Harmonized Data:
The Iraq Household Socio Economic Survey (IHSES) reached a total of 25,488 households. The number of households that refused to respond was 305, and the response rate was 98.6%. The highest interview rates were in Ninevah and Muthanna (100%), while the lowest rate was in Sulaimaniya (92%).
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN
The Department of Statistics (DOS) carried out four rounds of the 2016 Employment and Unemployment Survey (EUS). The survey rounds covered a sample of about forty-nine thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design.
It is worth mentioning that the DOS employed new technology in data collection and data processing. Data were collected using an electronic questionnaire on a handheld device (PDA) instead of a hard copy.
The main objectives of the survey are:
- To identify the demographic, social and economic characteristics of the population and manpower.
- To identify the occupational structure and economic activity of the employed persons, as well as their employment status.
- To identify the reasons behind the desire of the employed persons to search for a new or additional job.
- To measure the economic activity participation rates (the number of economically active population divided by the population of 15+ years old).
- To identify the different characteristics of the unemployed persons.
- To measure unemployment rates (the number of unemployed persons divided by the number of economically active population of 15+ years old) according to the various characteristics of the unemployed, and the changes that might take place in this regard.
- To identify the most important ways and means used by the unemployed persons to get a job, in addition to measuring durations of unemployment for such persons.
- To identify the changes over time that might take place regarding the above-mentioned variables.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum, in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
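The two rates defined in the objectives above can be written out as a short sketch. The function names and the counts used below are hypothetical and serve only to illustrate the definitions; they are not survey results, and the sketch ignores sampling weights.

```python
def participation_rate(economically_active: float, population_15_plus: float) -> float:
    """Economically active population divided by the population aged 15+."""
    return economically_active / population_15_plus

def unemployment_rate(unemployed: float, economically_active: float) -> float:
    """Unemployed persons divided by the economically active population aged 15+."""
    return unemployed / economically_active

# Hypothetical counts for illustration only.
population_15_plus = 1000
economically_active = 400
unemployed = 60

print(f"participation rate: {participation_rate(economically_active, population_15_plus):.1%}")
print(f"unemployment rate:  {unemployment_rate(unemployed, economically_active):.1%}")
```

Note that the two rates use different denominators: the unemployment rate is computed over the economically active population only, not over the whole population aged 15 and over.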
Covering a sample representative on the national level (Kingdom), governorates, and the three Regions (Central, North and South).
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
Computer Assisted Personal Interview [capi]
----> Raw Data
A tabulation results plan has been set based on the previous Employment and Unemployment Surveys while the required programs were prepared and tested. When all prior data processing steps were completed, the actual survey results were tabulated using an ORACLE package. The tabulations were then thoroughly checked for consistency of data. The final report was then prepared, containing detailed tabulations as well as the methodology of the survey.
----> Harmonized Data
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
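The first-stage allocation described above can be sketched as follows. The R script distributed with the dataset is the authoritative implementation; this Python sketch is only an illustration. The function name `allocate_eas`, the stratum labels, the stratum sizes, and the largest-remainder rounding rule are all assumptions.

```python
def allocate_eas(stratum_sizes, total_households=8000, per_ea=25):
    """Allocate enumeration areas (EAs) to strata proportionally to size.

    Fractional quotas are rounded with the largest-remainder method
    (an assumption) so the allocations sum to the required number of EAs.
    """
    n_eas = total_households // per_ea                  # 8000 / 25 = 320 EAs
    total = sum(stratum_sizes.values())
    quotas = {s: n_eas * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in quotas.items()}      # integer part of each quota
    leftover = n_eas - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:                   # hand out the remaining EAs
        alloc[s] += 1
    return alloc

# Hypothetical strata: (geo_1 province, urban/rural) with made-up household counts.
strata = {
    ("province_1", "urban"): 510_000,
    ("province_1", "rural"): 290_000,
    ("province_2", "urban"): 130_000,
    ("province_2", "rural"): 70_000,
}
allocation = allocate_eas(strata)
print(allocation, sum(allocation.values()) * 25)
```

Because every selected enumeration area contributes exactly 25 households, fixing the number of areas at 320 fixes the sample at 8,000 households regardless of how the areas are split across strata.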
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to result in the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
For the Belarus Household Sample Survey 2015, researchers collect information on demographic characteristics of household members, housing conditions, personal subsidiary plots, property, household expenditure and income.
The information obtained from the survey is used to analyze the influence of new social processes on living standards and to develop policies aimed at social protection of various population groups. Data is also used to compile household accounts in the system of National Accounts, to calculate consumer price index for goods and services, and to estimate the poverty level in the country.
National coverage
A household is understood as the unit of the survey. For survey purposes the category "households" includes: - families consisting of a husband and a wife with or without children or single parent families; - relatives living together and having a common budget (brother and sister, grandmother and grandson and etc.); - persons living together and having a common budget but who are not relatives, for example two friends; - persons living alone; - families consisting of two and more married couples with or without children.
The survey covers all household members excluding persons fully supported by the state, for example people staying in homes for the elderly and the disabled, children in public care institutions, prisoners, etc. The survey also excludes foreigners living and working in Belarus on contract basis and families of military men living in military residential settlements or other restricted areas.
Sample survey data [ssd]
In accordance with international standards, the survey's data collection and processing system was changed, as was the sample design method. Instead of the branch principle employed earlier, a probability (random) sampling method was introduced, in which sampling units are selected with probability proportional to population size. This method ensures sample representativeness at national and regional levels, independence of sample results, and avoidance of intentional error. Household participation in the survey is voluntary. Sampled households are surveyed for a year and are then subject to replacement (rotation).
To conduct the sample survey, households' residential addresses are sampled. All households living in the Republic of Belarus (according to the results of the latest population census of the Republic of Belarus) form the general population (universe) for sampling, with the exception of collective households and students residing in dormitories. The sample, which makes up 0.2% of the general population, is compiled annually by Belstat.
Face-to-face [f2f]
The main components of the survey are:
1) The main interview which is intended to establish the first contact with the household, to make the list of all household members, to collect the basic information about the household in general and its individual members and to fix the date for subsequent interviews. Before an interview a household receives the initial letter signed by the Minister of Statistics and Analysis stating the date and time of the interviewer's visit. The main interview is conducted in December of the previous year;
2) Four quarterly interviews, conducted in April, July and October of the current year and January of the next year. Each quarterly interview covers the three previous months and summarizes the information about incomes and major expenditures of the household. At the beginning of every quarter a household is given a diary for recording expenditures during the quarter. The diary is used during quarterly interviews;
3) Four two-week diaries which are handed to a household every quarter. The diary is intended for daily recording of expenditures on foodstuffs and non-food products within 14 days as well as for recording of the consumed foodstuffs which were produced at the individual subsidiary land plot or received as a present.
The following coding systems were developed and introduced for Belarus household sample survey: 1) coding of households covered by the survey; 2) coding of household expenditures; 3) coding of additional incomes and employment of household members by sectors and types of activity.
This study is an experiment designed to compare the performance of three methodologies for sampling households with migrants: 1) a stratified random sample survey; 2) an intercept survey; 3) a snowball sampling survey.
Researchers from the World Bank applied these methods in the context of a survey of Brazilians of Japanese descent (Nikkei), requested by the World Bank. There are approximately 1.2-1.9 million Nikkei among Brazil’s 170 million population.
The survey was designed to provide detail on the characteristics of households with and without migrants, to estimate the proportion of households receiving remittances and with migrants in Japan, and to examine the consequences of migration and remittances on the sending households.
The same questionnaire was used for the stratified random sample and snowball surveys, and a shorter version of the questionnaire was used for the intercept surveys. Researchers can directly compare answers to the same questions across survey methodologies and determine the extent to which the intercept and snowball surveys can give similar results to the more expensive census-based survey, and test for the presence of biases.
Sao Paulo and Parana states
Japanese-Brazilian (Nikkei) households and individuals
The 2000 Brazilian Census was used to classify households as Nikkei or non-Nikkei. The Brazilian Census does not ask ethnicity but instead asks questions on race, country of birth and whether an individual has lived elsewhere in the last 10 years. On the basis of these questions, a household is classified as (potentially) Nikkei if it has any of the following: 1) a member born in Japan; 2) a member who is of yellow race and who has lived in Japan in the last 10 years; 3) a member who is of yellow race, who was not born in a country other than Japan (predominantly Korea, Taiwan or China) and who did not live in a foreign country other than Japan in the last 10 years.
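The classification rule above can be expressed as a small predicate. This is a sketch under stated assumptions, not the researchers' code: the `Member` fields and function names are hypothetical, and criterion 3 is read here as "born in Brazil or Japan, and no residence in a foreign country other than Japan in the last 10 years".

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Member:
    birth_country: str
    race: str
    # Foreign country lived in during the last 10 years, or None (hypothetical field).
    lived_abroad_last_10y: Optional[str] = None

def member_is_potentially_nikkei(m: Member) -> bool:
    if m.birth_country == "Japan":                      # criterion 1
        return True
    if m.race != "yellow":                              # criteria 2 and 3 require this
        return False
    if m.lived_abroad_last_10y == "Japan":              # criterion 2
        return True
    # Criterion 3 (interpretation stated in the lead-in).
    born_ok = m.birth_country in ("Brazil", "Japan")
    lived_ok = m.lived_abroad_last_10y in (None, "Japan")
    return born_ok and lived_ok

def household_is_potentially_nikkei(members: Iterable[Member]) -> bool:
    """A household qualifies if any one member meets any criterion."""
    return any(member_is_potentially_nikkei(m) for m in members)

household = [Member("Brazil", "white"), Member("Brazil", "yellow")]
print(household_is_potentially_nikkei(household))
```

Because the census has no ethnicity question, the rule yields only a potential classification; the door-to-door listing described in the sampling procedure is what confirms Nikkei status.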
Sample survey data [ssd]
1) Stratified random sample survey
Two states with the largest Nikkei population - Sao Paulo and Parana - were chosen for the study.
The sampling process consisted of three stages. First, a stratified random sample of 75 census tracts was selected based on 2000 Brazilian census. Second, interviewers carried out a door-to-door listing within each census tract to determine which households had a Nikkei member. Third, the survey questionnaire was then administered to households that were identified as Nikkei. A door-to-door listing exercise of the 75 census tracts was then carried out between October 13th, 2006, and October 29th, 2006. The fieldwork began on November 19, 2006, and all dwellings were visited at least once by December 22, 2006. The second wave of surveying took place from January 18th, 2007, to February 2nd, 2007, which was intended to increase the number of households responding.
2) Intercept survey
The intercept survey was designed to carry out interviews at a range of locations that were frequented by the Nikkei population. It was originally designed to be done in Sao Paulo city only, but a second intercept point survey was later carried out in Curitiba, Parana. The Sao Paulo intercept survey took place between December 9th, 2006, and December 20th, 2006, whereas the Curitiba intercept survey took place between March 3rd and March 12th, 2007.
Consultations with Nikkei community organizations, local researchers and officers of the bank Sudameris, which provides remittance services to this community, were used to select a broad range of locations. Interviewers were assigned to visit each location during prespecified blocks of time. Two fieldworkers were assigned to each location. One fieldworker carried out the interviews, while the other carried out a count of the number of people with Nikkei appearance who appeared to be 18 years old or older who passed by each location. For the fixed places, this count was made throughout the prespecified time block. For example, between 2.30 p.m. and 3.30 p.m. at the sports club, the interviewer counted 57 adult Nikkeis. Refusal rates were carefully recorded, along with the sex and approximate age of the person refusing.
In all, 516 intercept interviews were collected.
3) Snowball sampling survey
The questionnaire that was used was the same as used for the stratified random sample. The plan was to begin with a seed list of 75 households, and to aim to reach a total sample of 300 households through referrals from the initial seed households. Each household surveyed was asked to supply the names of three contacts: (a) a Nikkei household with a member currently in Japan; (b) a Nikkei household with a member who has returned from Japan; (c) a Nikkei household without members in Japan and where individuals had not returned from Japan.
The snowball survey took place from December 5th to 20th, 2006. The second phase of the snowballing survey ran from January 22nd, 2007, to March 23rd, 2007. More associations were contacted to provide additional seed names (69 more names were obtained) and, as with the stratified sample, an adaptation of the intercept survey was used when individuals refused to answer the longer questionnaire. A decision was made to continue the snowball process until a target sample size of 100 had been achieved.
The final sample consists of 60 households who came as seed households from Japanese associations, and 40 households who were chain referrals. The longest chain achieved was three links.
Face-to-face [f2f]
1) Stratified sampling and snowball survey questionnaire
This questionnaire has 36 pages with over 1,000 variables, taking over an hour to complete.
If subjects refused to answer the questionnaire, interviewers would leave a much shorter version of the questionnaire to be completed by the household by themselves, and later picked up. This shorter questionnaire was the same as used in the intercept point survey, taking seven minutes on average. The intention with the shorter survey was to provide some data on households that would not answer the full survey because of time constraints, or because respondents were reluctant to have an interviewer in their house.
2) Intercept questionnaire
The questionnaire is four pages in length, consisting of 62 questions and taking a mean time of seven minutes to answer. Respondents had to be 18 years old or older to be interviewed.
1) Stratified random sampling: 403 out of the 710 Nikkei households were surveyed, an interview rate of 57%. The refusal rate was 25%, whereas the remaining households were either absent on three attempts or were not surveyed because building managers refused permission to enter the apartment buildings. Refusal rates were higher in Sao Paulo than in Parana, reflecting greater concerns about crime and a busier urban environment.
2) Intercept interviews: 516 intercept interviews were collected, along with 325 refusals. The average refusal rate is 39%, with location-specific refusal rates ranging from only 3% at the food festival to almost 66% at one of the two grocery stores.
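The reported rates can be reproduced from the counts given above. The sketch assumes that the intercept refusal rate is computed as refusals divided by completed interviews plus refusals; the text does not state the denominator, but this choice reproduces the reported 39%. The variable and function names are illustrative.

```python
def share(part: int, whole: int) -> float:
    """Return part / whole as a proportion."""
    return part / whole

# Stratified random sample: counts taken from the text.
nikkei_households = 710
interviewed = 403
interview_rate = share(interviewed, nikkei_households)

# Intercept survey: counts taken from the text.
intercept_completed = 516
intercept_refused = 325
intercept_refusal_rate = share(intercept_refused,
                               intercept_completed + intercept_refused)

print(f"interview rate: {interview_rate:.0%}")            # 57%
print(f"intercept refusal rate: {intercept_refusal_rate:.0%}")  # 39%
```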
The documentation covers Enterprise Survey panel datasets that were collected in Uruguay in 2006, 2010 and 2017. The Enterprise Survey is a firm-level survey of a representative sample of an economy's private sector. The surveys cover a broad range of business environment topics including access to finance, corruption, infrastructure, crime, competition, and performance measures. The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector.
As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort in collecting enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project in collecting both objective data based on firms' experiences and enterprises' perception of the environment in which they operate.
National coverage
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
The whole population, or the universe, covered in the Enterprise Surveys is the non-agricultural economy. It comprises: all manufacturing sectors according to the ISIC Revision 3.1 group classification (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this population definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities sectors.
Sample survey data [ssd]
The samples for 2006, 2010 and 2017 Uruguay Enterprise Surveys were selected using stratified random sampling, following the methodology explained in the Sampling Note.
Three levels of stratification were used in the Uruguay ES: industry, establishment size, and region.
In 2006 ES, industry stratification was designed in the following way: In small economies the population was stratified into 3 manufacturing industries, one services industry - retail-, and one residual sector as defined in the sampling manual. Each industry had a target of 120 interviews.
In 2010 ES, industry stratification was designed in the way that follows: the universe was stratified into 3 manufacturing industries, 1 service industry -retail -, and 1 residual sector as defined in the sampling manual. All sectors had a target of 120 interviews. Regional stratification was defined in two regions (city and the surrounding business area): Montevideo and Canelones.
In 2017 ES, industry stratification was designed as follows: the universe was stratified into Manufacturing industries (ISIC Rev. 3.1 codes 15-37), Retail industries (ISIC code 52) and Other Services (ISIC codes 45, 50, 51, 55, 60-64, and 72). For the Uruguay ES, size stratification was defined as follows: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees). Regional stratification was done across two regions: Montevideo and Canelones.
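The size strata quoted for the Uruguay ES can be expressed as a small lookup. The following is a minimal sketch, not the survey team's code: the function name `size_stratum` is hypothetical, and returning `None` for establishments with fewer than 5 employees is an assumption based only on the lower bound of the "small" stratum.

```python
from typing import Optional


def size_stratum(employees: int) -> Optional[str]:
    """Map an employee count to the size stratum described in the text.

    Assumption: establishments with fewer than 5 employees fall outside
    the size strata, so None is returned for them.
    """
    if employees < 5:
        return None
    if employees <= 19:
        return "small"   # 5 to 19 employees
    if employees <= 99:
        return "medium"  # 20 to 99 employees
    return "large"       # 100 or more employees


for count in (3, 5, 19, 20, 99, 100):
    print(count, size_stratum(count))
```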
Face-to-face [f2f]
Two questionnaires, Manufacturing and Services, were used to collect the survey data.
The questionnaires share common questions (the core module) and contain additional manufacturing- and services-specific questions, respectively. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail specific questions) and the residual eligible services have been covered using the Services questionnaire (includes the core module).
Panel data possess several advantages over conventional cross-sectional and time-series data, including their power to isolate the effects of specific actions, treatments, and general policies often at the core of large-scale econometric development studies. While the concept of panel data alone provides the capacity for modeling the complexities of human behavior, the notion of universal panel data – in which time- and situation-driven variances leading to variations in tools, and thus results, are mitigated – can further enhance exploitation of the richness of panel information.
This Basic Information Document (BID) provides a brief overview of the Tanzania National Panel Survey (NPS), but focuses primarily on the theoretical development and application of panel data, as well as key elements of the universal panel survey instrument and datasets generated by the four rounds of the NPS. As this Basic Information Document (BID) for the UPD does not describe in detail the background, development, or use of the NPS itself, the round-specific NPS BIDs should supplement the information provided here.
The NPS Uniform Panel Dataset (UPD) consists of both survey instruments and datasets, meticulously aligned and engineered with the aim of facilitating the use of and improving access to the wealth of panel data offered by the NPS. The NPS-UPD provides a consistent and straightforward means of conducting not only user-driven analyses using convenient, standardized tools, but also for monitoring MKUKUTA, FYDP II, and other national level development indicators reported by the NPS.
The design of the NPS-UPD combines the four completed rounds of the NPS – NPS 2008/09 (R1), NPS 2010/11 (R2), NPS 2012/13 (R3), and NPS 2014/15 (R4) – into pooled, module-specific survey instruments and datasets. The panel survey instruments offer the ease of comparability over time, with modifications and variances easily identifiable as well as those aspects of the questionnaire which have remained identical and offer consistent information. By providing all module-specific data over time within compact, pooled datasets, panel datasets eliminate the need for user-generated merges between rounds and present data in a clear, logical format, increasing both the usability and comprehension of complex data.
Designed for analysis of key indicators at four primary domains of inference, namely: Dar es Salaam, other urban, rural, Zanzibar.
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Sample survey data [ssd]
While the same sample of respondents was maintained over the first three rounds of the NPS, longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time, i.e. attrition. Although the NPS maintains a highly successful recapture rate (roughly 96% retention at the household level), which limits the growth of this selection bias, the longitudinal cohorts were refreshed for the NPS 2014/15 to ensure proper representativeness of estimates while retaining a primary sample large enough to preserve cohesion within panel analysis. A newly completed Population and Housing Census (PHC) in 2012, providing updated population figures along with changes in administrative boundaries, offered the opportunity to realign the NPS sample and reduce the cumulative bias potentially introduced through attrition.
To maintain the panel concept of the NPS, the sample design for NPS 2014/2015 consisted of a combination of the original NPS sample and a new NPS sample. A nationally representative sub-sample was selected to continue as part of the “Extended Panel” while an entirely new sample, “Refresh Panel”, was selected to represent national and sub-national domains. Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.
Face-to-face [f2f]
The format of the NPS-UPD survey instrument is similar to previously disseminated NPS survey instruments. Each module has a questionnaire and clearly identifies if the module collects information at the individual or household level. Within each module-specific questionnaire of the NPS-UPD survey instrument, there are five distinct sections, arranged vertically: (1) the UPD - “U” on the survey instrument, (2) R4, (3) R3, (4) R2, and (5) R1 – the latter 4 sections presenting each questionnaire in its original form at time of its respective dissemination.
The uppermost section of each module’s questionnaire (“U”) represents the model universal panel questionnaire, with questions generated from the comprehensive listing of questions across all four rounds of the NPS and codes generated from the comprehensive collection of codes. The following sections are arranged vertically by round, considering R4 as most recent. While not all rounds will have data reported for each question in the UPD and not each question will have reports for each of the UPD codes listed, the NPS-UPD survey instrument represents the visual, all-inclusive set of information collected by the NPS over time.
The four round-specific sections (R4, R3, R2, R1) are aligned with their UPD-equivalent question, visually presenting their contribution to compatibility with the UPD. Each round-specific section includes the original round-specific variable names, response codes and skip patterns (corresponding to their respective round-specific NPS data sets, and despite their variance from other rounds or from the comprehensive UPD code listing).
The American Community Survey (ACS) Public Use Microdata Sample (PUMS) contains a sample of responses to the ACS. The ACS PUMS dataset includes variables for nearly every question on the survey, as well as many new variables that were derived after the fact from multiple survey responses (such as poverty status). Each record in the file represents a single person, or, in the household-level dataset, a single housing unit. In the person-level file, individuals are organized into households, making possible the study of people within the contexts of their families and other household members. Individuals living in Group Quarters, such as nursing facilities or college facilities, are also included on the person file. ACS PUMS data are available at the nation, state, and Public Use Microdata Area (PUMA) levels. PUMAs are special non-overlapping areas that partition each state into contiguous geographic units containing roughly 100,000 people each. ACS PUMS files for an individual year, such as 2019, contain data on approximately one percent of the United States population.
The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.
The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.
The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.
1- Household/families. 2- Individuals.
The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.
Sample survey data [ssd]
The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.
The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. A person (18 years and more) was selected from each household in the second stage.
The population was divided by: 1- Governorate 2- Type of Locality (urban, rural, refugee camps)
The calculated sample size is 3,781 households.
The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.
Detailed information/formulas on the sampling design are available in the user manual.
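The second-stage selection described above (a systematic random sample of households within each selected enumeration area) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure from the user manual: the function name `systematic_sample`, the household list, and the seed are hypothetical, and the example uses the quoted averages of about 120 households per enumeration area and a sample take of about 12.

```python
import random


def systematic_sample(units, n, rng):
    """Select n units from an ordered list with a random start and a fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the number of units")
    interval = len(units) / n           # sampling interval (may be fractional)
    start = rng.random() * interval     # random start within the first interval
    # Clamp the index defensively against floating-point rounding at the top end.
    return [units[min(int(start + i * interval), len(units) - 1)] for i in range(n)]


# Hypothetical enumeration area with 120 listed households, sample take of 12.
households = list(range(1, 121))
selected = systematic_sample(households, 12, random.Random(2004))
print(selected)
```

Because the interval is at least 1 whenever n does not exceed the number of listed households, each selected position falls in a different slot of the list, so no household is drawn twice.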
Face-to-face [f2f]
The PECS questionnaire consists of two main sections:
First section: Certain articles / provisions of the form are filled in at the beginning of the month, and the remainder are filled in at the end of the month. The questionnaire includes the following provisions:
Cover sheet: It contains detailed particulars of the family, date of visit, particulars of the field/office work team, and number/sex of the family members.
Statement of the family members: Contains social, economic and demographic particulars of the selected family.
Statement of the long-lasting commodities and income generation activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural lands).
Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.
Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.
Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. In total, the groups contain 667 commodity and service items. Groups 1-21 include food, drink, and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except for food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.
Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire, and included various electronic checks: logical check, range checks, consistency checks and cross-validation. Complete manual inspection was made of results after data entry was performed, and questionnaires containing field-related errors were sent back to the field for corrections.
The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. There were 3,098 households that completed the interview, of which 2,060 were in the West Bank and 1,038 households were in the Gaza Strip. The response rate was 82% in the Palestinian Territory.
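The figures above can be cross-checked with simple arithmetic. The sketch below assumes the response rate is the plain ratio of completed interviews to sampled households; the published rate may have been derived with adjustments that are not described here.

```python
# Figures quoted in the text.
sampled_households = 3781
completed_west_bank = 2060
completed_gaza_strip = 1038

completed = completed_west_bank + completed_gaza_strip
rate = completed / sampled_households  # assumption: unadjusted ratio

print(f"completed interviews: {completed}")
print(f"response rate: {rate:.1%}")
```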
The calculations of standard errors for the main survey estimations enable the user to identify the accuracy of estimations and the survey reliability. Total errors of the survey can be divided into two kinds: statistical errors, and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as the failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, bad statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and conducting all various related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate numerically such errors due to absence of technical computation methods based on theoretical principles to tackle them. On the other hand, statistical errors can be measured. Frequently they are measured by the standard error, which is the positive square root of the variance. The variance of this survey has been computed by using the “programming package” CENVAR.
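The relationship between variance and standard error mentioned above can be illustrated with a deliberately simplified case. The sketch assumes a simple random sample and hypothetical values; it ignores the stratified cluster design of this survey, which is what a package such as CENVAR is meant to account for. The function name `standard_error_of_mean` is hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error of a sample mean under simple random sampling.

    The estimated variance of the mean is s^2 / n, and the standard error
    is its positive square root.
    """
    n = len(values)
    variance_of_mean = statistics.variance(values) / n
    return math.sqrt(variance_of_mean)


# Hypothetical monthly expenditure values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error_of_mean(sample))
```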
As part of PCBS's efforts to provide official Palestinian statistics on the different aspects of life in Palestinian society, and because of the widespread use of computers, the Internet, and mobile phones among the Palestinian people and the important role they may play in spreading knowledge and culture and in shaping public opinion, PCBS conducted the Household Survey on Information and Communications Technology, 2014.
The main objective of this survey is to provide statistical data on Information and Communication Technology in Palestine, in addition to providing data on the following:
· Prevalence of computers and access to the Internet.
· The penetration and purpose of technology use.
Palestine (West Bank and Gaza Strip), type of locality (Urban, Rural, Refugee Camps) and governorate
Household. Person 10 years and over.
All Palestinian households and individuals whose usual place of residence is in Palestine, with a focus on persons aged 10 years and over in 2014.
Sample survey data [ssd]
Sampling Frame The sampling frame consists of a list of enumeration areas adopted in the Population, Housing and Establishments Census of 2007. Each enumeration area has an average size of about 124 households. These were used in the first phase as Primary Sampling Units in the process of selecting the survey sample.
Sample Size The total sample size of the survey was 7,268 households, of which 6,000 responded.
Sample Design The sample is a stratified clustered systematic random sample. The design comprised three phases:
Phase I: Random sample of 240 enumeration areas. Phase II: Selection of 25 households from each enumeration area selected in phase one using systematic random selection. Phase III: Selection of an individual (10 years or more) in the field from the selected households; KISH TABLES were used to ensure indiscriminate selection.
Sample Strata Distribution of the sample was stratified by: 1- Governorate (16 governorates, J1). 2- Type of locality (urban, rural and camps).
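Phase III above selects one eligible person per household. The survey used Kish tables for this step; the sketch below does not reproduce a Kish table and instead substitutes a plain uniform random draw among eligible members, which serves the same goal of giving each eligible person an equal chance. The function name `select_respondent` and the household ages are hypothetical.

```python
import random


def select_respondent(member_ages, rng, min_age=10):
    """Return the index of one randomly chosen eligible household member.

    Stand-in for Kish-table selection: a uniform draw among members aged
    min_age or older. Returns None when nobody in the household is eligible.
    """
    eligible = [i for i, age in enumerate(member_ages) if age >= min_age]
    if not eligible:
        return None
    return rng.choice(eligible)


# Hypothetical household roster: ages of four members.
ages = [45, 38, 12, 7]
chosen = select_respondent(ages, random.Random(2014))
print(chosen, ages[chosen])
```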
Face-to-face [f2f]
The survey questionnaire consists of identification data, quality controls and three main sections: Section I: Data on household members that include identification fields, the characteristics of household members (demographic and social) such as the relationship of individuals to the head of household, sex, date of birth and age.
Section II: Household data include information regarding computer processing, access to the Internet, and possession of various media and computer equipment. This section includes information on topics related to the use of computer and Internet, as well as supervision by households of their children (5-17 years old) while using the computer and Internet, and protective measures taken by the household in the home.
Section III: Data on persons (aged 10 years and over) about computer use, access to the Internet and possession of a mobile phone.
Preparation of Data Entry Program: This stage included preparation of the data entry programs using an ACCESS package and defining data entry control rules to avoid errors, plus validation inquiries to examine the data after it had been captured electronically.
Data Entry: The data entry process started on 8 May 2014 and ended on 23 June 2014. The data entry took place at the main PCBS office and in field offices using 28 data clerks.
Editing and Cleaning procedures: Several measures were taken to avoid non-sampling errors. These included editing of questionnaires before data entry to check field errors, using a data entry application that does not allow mistakes during the process of data entry, and then examining the data by using frequency and cross tables. This ensured that data were error free; cleaning and inspection of the anomalous values were conducted to ensure harmony between the different questions on the questionnaire.
Response rate: 79%
The concept of data quality has many aspects, ranging from the initial planning of the survey to the dissemination of the results and how well users understand and use the data. There are three components to the quality of statistics: accuracy, comparability, and quality control procedures.
Checks on data accuracy cover many aspects of the survey and include statistical errors due to the use of a sample, non-statistical errors resulting from field workers or survey tools, and response rates and their effect on estimations. This section includes:
Statistical Errors Data of this survey may be affected by statistical errors due to the use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators.
Variance calculations revealed that there is no problem in disseminating results nationally or regionally (the West Bank, Gaza Strip), but some indicators show high variance by governorate, as noted in the tables of the main report.
Non-Statistical Errors Non-statistical errors are possible at all stages of the project, during data collection or processing. These are referred to as non-response errors, response errors, interviewing errors and data entry errors. To avoid errors and reduce their effects, strenuous efforts were made to train the field workers intensively. They were trained on how to carry out the interview, what to discuss and what to avoid, and practical and theoretical training took place during the training course. Training manuals were provided for each section of the questionnaire, along with practical exercises in class and instructions on how to approach respondents to reduce refused cases. Data entry staff were trained on the data entry program, which was tested before starting the data entry process.
The sources of non-statistical errors can be summarized as: 1. Some of the households were not at home and could not be interviewed, and some households refused to be interviewed. 2. In isolated cases, errors occurred because of the way interviewers asked the questions or because respondents misunderstood some of the questions.
The STEP (Skills Toward Employment and Productivity) Measurement program is the first ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.
The uniquely-designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adults Skills instruments. Modules also gather information about family, health and language.
The survey covered the following regions: Western, Central, Greater Accra, Volta, Eastern, Ashanti, Brong Ahafo, Northern, Upper East and Upper West.
- Areas are classified as urban based on each country's official definition.
The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey and the individual respondent is randomly selected among all household members aged 15 to 64 (inclusive). The random selection process was designed by the STEP team and compliance with the procedure is carefully monitored during fieldwork.
The target population for the Ghana STEP survey comprises all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban areas of the country at the time of data collection. This includes all residents except foreign diplomats and non-nationals working for international organizations. Exclusions: Military barracks were excluded from the Ghana target population.
Sample survey data [ssd]
The Ghana sample design is a four-stage sample design. There was no explicit stratification but the sample was implicitly stratified by Region. [Note: Implicit stratification was achieved by sorting the PSUs (i.e., EACode) by RegnCode and selecting a systematic sample of PSUs.]
First Stage Sample The primary sample unit (PSU) was a Census Enumeration Area (EA). Each PSU was uniquely defined by the sample frame variables Regncode, and EAcode. The sample frame was sorted by RegnCode to implicitly stratify the sample frame PSUs by region. The sampling objective was to select 250 PSUs, comprised of 200 Initial PSUs and 50 Reserve PSUs. Although 250 PSUs were selected, only 201 PSUs were activated. The PSUs were selected using a systematic probability proportional to size (PPS) sampling method, where the measure of size was the population size (i.e., EAPopn) in a PSU.
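The first-stage selection described above (systematic PPS after sorting the frame by region) can be sketched as below. This is an illustrative sketch, not the survey firm's program: the function name `systematic_pps`, the toy frame, and the seed are hypothetical, while the field names RegnCode, EACode, and EAPopn are taken from the text.

```python
import random
from itertools import accumulate


def systematic_pps(sizes, n, rng):
    """Return the indices of n units chosen by systematic PPS sampling.

    A random start is drawn within the first sampling interval, and every
    subsequent selection point lies one interval further along the
    cumulative measure of size.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval
    selected, unit = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative size range contains the point.
        while unit < len(sizes) - 1 and cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
    return selected


# Toy frame (hypothetical values); sorting by region gives implicit stratification.
frame = [
    {"RegnCode": 2, "EACode": "B1", "EAPopn": 900},
    {"RegnCode": 1, "EACode": "A1", "EAPopn": 400},
    {"RegnCode": 3, "EACode": "C1", "EAPopn": 700},
    {"RegnCode": 1, "EACode": "A2", "EAPopn": 600},
    {"RegnCode": 2, "EACode": "B2", "EAPopn": 500},
    {"RegnCode": 3, "EACode": "C2", "EAPopn": 800},
]
frame.sort(key=lambda psu: psu["RegnCode"])
picked = systematic_pps([psu["EAPopn"] for psu in frame], 3, random.Random(42))
print([frame[i]["EACode"] for i in picked])
```

In this sketch a PSU whose measure of size exceeds the sampling interval can be hit more than once; real designs usually handle such units separately, which is not covered here.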
Second Stage Sample The second stage sample unit is a PSU partition. It was considered necessary to partition 'large' PSUs into smaller areas to facilitate the listing process. After the partitioning of the PSUs, the survey firm randomly selected one partition. The selected partition was fully listed for subsequent enumeration in accordance with the field procedures.
Third Stage Sample The third stage sample unit (SSU) is a household. The sampling objective was to obtain interviews at 15 households within each selected PSU. The households were selected in each PSU using a systematic random method.
Fourth Stage Sample The fourth stage sample unit was an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.
Sample Size The Ghana firm's sampling objective was to obtain interviews from 3000 individuals in the urban areas of the country. In order to provide sufficient sample to allow for a worst case scenario of a 50% response rate the number of sampled cases was doubled in each selected PSU. Although 50 extra PSUs were selected for use in case it was impossible to conduct any interviews in one or more initially selected PSUs only one reserve PSU was activated. Therefore, the Ghana firm conducted the STEP data collection in a total of 201 PSUs.
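The sample-size reasoning above reduces to a few lines of arithmetic. The sketch below uses only figures quoted in the text (3000 target interviews, 200 initial PSUs, a worst-case 50% response rate); the variable names are hypothetical.

```python
TARGET_INTERVIEWS = 3000      # interviews sought in urban areas
INITIAL_PSUS = 200            # initially selected PSUs
WORST_CASE_RESPONSE = 0.5     # planning assumption quoted in the text

# Interviews needed per PSU to reach the target.
interviews_per_psu = TARGET_INTERVIEWS // INITIAL_PSUS

# Dividing by the assumed response rate doubles the cases sampled in each PSU.
sampled_per_psu = round(interviews_per_psu / WORST_CASE_RESPONSE)
total_sampled = sampled_per_psu * INITIAL_PSUS

print(interviews_per_psu, sampled_per_psu, total_sampled)
```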
Sampling methodologies are described for each country in two documents: (i) The National Survey Design Planning Report (NSDPR) (ii) The weighting documentation
Face-to-face [f2f]
The STEP survey instruments include: (i) a Background Questionnaire developed by the WB STEP team (ii) a Reading Literacy Assessment developed by Educational Testing Services (ETS).
All countries adapted and translated both instruments following the STEP Technical Standards: 2 independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator. The WB STEP team and ETS collaborated closely with the survey firms during the process and reviewed the adaptation and translation (using a back translation). In the case of Ghana, no translation was necessary, but the adaptation process ensured that the English used in the Background Questionnaire and Reading Literacy Assessment closely reflected local use.
STEP Data Management Process
1. Raw data is sent by the survey firm.
2. The WB STEP team runs data checks on the Background Questionnaire data. ETS runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
5. Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
6. ETS scales the Reading Literacy Assessment data.
7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.
Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document provided as an external resource. The template do-file used by the STEP team to check the raw background questionnaire data is provided as an external resource.
An overall response rate of 83.2% was achieved in the Ghana STEP Survey. Table 20 of the weighting documentation provides the detailed percentage distribution by final status code.
Weighting documentation was prepared for each participating country and provides some information on sampling errors. The weighting documentation is provided as an external resource.