The Tanzania Demographic and Health Survey (TDHS) is part of the worldwide Demographic and Health Surveys (DHS) programme, which is designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 1999 TRCHS was to collect data at the national level (with breakdowns by urban-rural and Mainland-Zanzibar residence wherever warranted) on fertility levels and preferences, family planning use, maternal and child health, breastfeeding practices, nutritional status of young children, childhood mortality levels, knowledge and behaviour regarding HIV/AIDS, and the availability of specific health services within the community. Related objectives were to produce these results in a timely manner and to ensure that the data were disseminated to a wide audience of potential users in governmental and nongovernmental organisations within and outside Tanzania. The ultimate intent is to use the information to evaluate current programmes and to design new strategies for improving health and family planning services for the people of Tanzania.
National. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately.
Sample survey data
The TRCHS used a three-stage sample design. Overall, 176 census enumeration areas were selected (146 on the Mainland and 30 in Zanzibar) with probability proportional to size on an approximately self-weighting basis on the Mainland, but with oversampling of urban areas and Zanzibar. To reduce costs and maximise the ability to identify trends over time, these enumeration areas were selected from the 357 sample points that were used in the 1996 TDHS, which in turn were selected from the 1988 census frame of enumeration in a two-stage process (first wards/branches and then enumeration areas within wards/branches). Before the data collection, fieldwork teams visited the selected enumeration areas to list all the households. From these lists, households were selected to be interviewed. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately. The health facilities component of the TRCHS involved visiting hospitals, health centres, and pharmacies located in areas around the households interviewed. In this way, the data from the two components can be linked and a richer dataset produced.
See the detailed sample implementation in Appendix A of the final report.
Face-to-face
The household survey component of the TRCHS involved three questionnaires: 1) a Household Questionnaire, 2) a Women’s Questionnaire for all individual women age 15-49 in the selected households, and 3) a Men’s Questionnaire for all men age 15-59.
The health facilities survey involved six questionnaires: 1) a Community Questionnaire administered to men and women in each selected enumeration area; 2) a Facility Questionnaire; 3) a Facility Inventory; 4) a Service Provider Questionnaire; 5) a Pharmacy Inventory Questionnaire; and 6) a questionnaire for the District Medical Officers.
All these instruments were based on model questionnaires developed for the MEASURE programme, as well as on the questionnaires used in the 1991-92 TDHS, the 1994 TKAP, and the 1996 TDHS. These model questionnaires were adapted for use in Tanzania during meetings with representatives from the Ministry of Health, the University of Dar es Salaam, the Tanzania Food and Nutrition Centre, USAID/Tanzania, UNICEF/Tanzania, UNFPA/Tanzania, and other potential data users. The questionnaires and manual were developed in English and then translated into and printed in Kiswahili.
The Household Questionnaire was used to list all the usual members and visitors in the selected households. Some basic information was collected on the characteristics of each person listed, including his/her age, sex, education, and relationship to the head of the household. The main purpose of the Household Questionnaire was to identify women and men who were eligible for individual interview and children under five who were to be weighed and measured. Information was also collected about the dwelling itself, such as the source of water, type of toilet facilities, materials used to construct the house, ownership of various consumer goods, and use of iodised salt. Finally, the Household Questionnaire was used to collect some rudimentary information about the extent of child labour.
The Women’s Questionnaire was used to collect information from women age 15-49. These women were asked questions on the following topics:
· Background characteristics (age, education, religion, type of employment)
· Birth history
· Knowledge and use of family planning methods
· Antenatal, delivery, and postnatal care
· Breastfeeding and weaning practices
· Vaccinations, birth registration, and health of children under age five
· Marriage and recent sexual activity
· Fertility preferences
· Knowledge and behaviour concerning HIV/AIDS.
The Men’s Questionnaire covered most of these same issues, except that it omitted the sections on the detailed reproductive history, maternal health, and child health. The final versions of the English questionnaires are provided in Appendix E.
Before the questionnaires could be finalised, a pretest was done in July 1999 in Kibaha District to assess the viability of the questions, the flow and logical sequence of the skip pattern, and the field organisation. Modifications to the questionnaires, including wording and translations, were made based on lessons drawn from the exercise.
In all, 3,826 households were selected for the sample, out of which 3,677 were occupied. Of the households found, 3,615 were interviewed, representing a response rate of 98 percent. The shortfall is primarily due to dwellings that were vacant or in which the inhabitants were not at home despite several callbacks.
In the interviewed households, a total of 4,118 eligible women (i.e., women age 15-49) were identified for the individual interview, and 4,029 women were actually interviewed, yielding a response rate of 98 percent. A total of 3,792 eligible men (i.e., men age 15-59) were identified for the individual interview, of whom 3,542 were interviewed, representing a response rate of 93 percent. The principal reason for nonresponse among both eligible men and women was the failure to find them at home despite repeated visits to the household. The lower response rate among men than women was due to the more frequent and longer absences of men.
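The response rates quoted above can be reproduced from the counts given in the text. The following is a minimal sketch; the function name `response_rate` is a hypothetical helper, not part of any survey software.

```python
def response_rate(interviewed: int, eligible: int) -> float:
    """Return the response rate as a percentage of eligible units."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return 100.0 * interviewed / eligible


# Counts quoted in the text above.
households = response_rate(3615, 3677)  # interviewed / occupied households
women = response_rate(4029, 4118)       # interviewed / eligible women
men = response_rate(3542, 3792)         # interviewed / eligible men

for label, rate in [("households", households), ("women", women), ("men", men)]:
    print(f"{label}: {rate:.1f}%")
```

Rounded to whole percentages, these values agree with the 98, 98, and 93 percent reported.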
The response rates are lower in urban areas due to longer absence of respondents from their homes. One-member households are more common in urban areas and are more difficult to interview because they keep their houses locked most of the time. In urban settings, neighbours often do not know the whereabouts of such people.
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the TRCHS to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the TRCHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
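As an illustration of the plus-or-minus two standard errors rule described above, here is a minimal sketch. The estimate and standard error used are hypothetical values, not figures from the TRCHS report, and `confidence_interval` is an assumed helper name.

```python
def confidence_interval(estimate: float, standard_error: float, z: float = 2.0):
    """Return (lower, upper) as estimate -/+ z * standard_error."""
    if standard_error < 0:
        raise ValueError("standard_error must be non-negative")
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical example: a proportion of 0.250 with a standard error of 0.012.
low, high = confidence_interval(0.250, 0.012)
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```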
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the TRCHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the TRCHS is the ISSA Sampling Error Module (SAMPERR). This module uses the Taylor linearisation method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
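To make the jackknife idea concrete, the sketch below applies a delete-one-cluster jackknife to a simple ratio estimate. It is an illustration under stated assumptions, not the SAMPERR implementation: it ignores stratification, uses a plain ratio rather than a fertility or mortality rate, and the cluster totals are hypothetical. The function name `jackknife_ratio_se` is likewise an assumption.

```python
import math


def jackknife_ratio_se(clusters):
    """Delete-one-cluster jackknife standard error of r = sum(y) / sum(x).

    clusters: list of (y_total, x_total) pairs, one pair per cluster.
    """
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    total_y = sum(y for y, _ in clusters)
    total_x = sum(x for _, x in clusters)
    r = total_y / total_x
    squared_dev = 0.0
    for y, x in clusters:
        r_without = (total_y - y) / (total_x - x)  # estimate with this cluster removed
        pseudo = k * r - (k - 1) * r_without       # pseudo-value for this cluster
        squared_dev += (pseudo - r) ** 2
    return r, math.sqrt(squared_dev / (k * (k - 1)))


# Hypothetical cluster totals (events, exposure), for illustration only.
clusters = [(12, 40), (9, 35), (15, 42), (7, 30)]
ratio, se = jackknife_ratio_se(clusters)
print(f"ratio = {ratio:.4f}, jackknife SE = {se:.4f}")
```

Each replicate drops one cluster and recomputes the statistic, so the same loop would work for any statistic that can be recomputed from the remaining clusters.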
Note: See the detailed sampling error calculation in Appendix B.
The main objective of the HEIS survey is to obtain detailed data on household expenditure and income, linked to various demographic and socio-economic variables, to enable computation of poverty indices and determine the characteristics of the poor and prepare poverty maps. Therefore, to achieve these goals, the sample had to be representative on the sub-district level. The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality.
Data collected through the survey helped in achieving the following objectives:
1. Provide data weights that reflect the relative importance of consumer expenditure items used in the preparation of the consumer price index
2. Study the consumer expenditure pattern prevailing in the society and the impact of demographic and socio-economic variables on those patterns
3. Calculate the average annual income of the household and the individual, and assess the relationship between income and different economic and social factors, such as profession and educational level of the head of the household and other indicators
4. Study the distribution of individuals and households by income and expenditure categories and analyze the factors associated with it
5. Provide the necessary data for the national accounts related to overall consumption and income of the household sector
6. Provide the necessary income data to serve in calculating poverty indices and identifying the characteristics of the poor as well as drawing poverty maps
7. Provide the data necessary for the formulation, follow-up and evaluation of economic and social development programs, including those aimed at eradicating poverty
National
Sample survey data [ssd]
The Household Expenditure and Income Survey sample for 2010 was designed to serve the basic objectives of the survey by providing a relatively large sample in each sub-district to enable drawing a poverty map of Jordan. The General Census of Population and Housing in 2004 provided a detailed framework for housing and households for different administrative levels in the country. Jordan is administratively divided into 12 governorates. Each governorate is composed of a number of districts, and each district (Liwa) includes one or more sub-districts (Qada). In each sub-district, there are a number of communities (cities and villages). Each community was divided into a number of blocks, each containing between 60 and 100 houses. Nomads and persons living in collective dwellings such as hotels, hospitals and prisons were excluded from the survey framework.
A two-stage stratified cluster sampling technique was used. In the first stage, a sample of clusters was selected with probability proportional to size, with the number of households in each cluster serving as the cluster's weight. At the second stage, a sample of 8 households was selected from each cluster, in addition to another 4 households selected as a backup for the basic sample, using a systematic sampling technique. Those 4 households were sampled to be used during the first visit to the block in case the visit to the originally selected household was not possible for any reason. For the purposes of this survey, each sub-district was considered a separate stratum to ensure the possibility of producing results on the sub-district level. In this respect, the survey framework adopted the division of sample strata provided by the General Census of Population and Housing. To estimate the sample size, the coefficient of variation and the design effect of the expenditure variable provided in the Household Expenditure and Income Survey for the year 2008 were calculated for each sub-district. These results were used to estimate the sample size on the sub-district level so that the coefficient of variation for the expenditure variable in each sub-district is less than 10%, subject to a minimum of 6 clusters per sub-district. This is to ensure adequate representation of clusters in different administrative areas to enable drawing an indicative poverty map.
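The two selection stages described above can be sketched as follows. This is an illustrative sketch only, not the Statistical Office's procedure: the block sizes are hypothetical, first-stage selection is implemented as systematic PPS (one common way to sample proportional to size), and the way the 12 drawn households are split into 8 main and 4 backup households is an assumption, since the text does not specify it.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n cluster indices by systematic PPS; very large clusters may repeat."""
    step = sum(sizes) / n
    start = rng.random() * step  # random start in [0, step)
    chosen, cumulative, idx = [], 0, 0
    for i in range(n):
        target = start + i * step
        # Advance to the cluster whose cumulative size range contains target.
        while cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        chosen.append(idx)
    return chosen


def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval from a random start."""
    interval = len(units) / n
    start = rng.random() * interval
    return [units[int(start + i * interval)] for i in range(n)]


rng = random.Random(2010)                                # fixed seed for repeatability
block_sizes = [rng.randint(60, 100) for _ in range(40)]  # hypothetical blocks in one stratum
clusters = pps_systematic(block_sizes, 6, rng)           # 6 clusters, the stated minimum

households = list(range(block_sizes[clusters[0]]))       # household listing, first cluster
drawn = systematic_sample(households, 12, rng)           # 8 main + 4 backup households
backup = set(rng.sample(drawn, 4))                       # assumption: random split
main = [h for h in drawn if h not in backup]
print(clusters, main, sorted(backup))
```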
It should be noted that, in addition to the standard non-response rate assumed, higher rates were expected in areas of major cities where poor households are concentrated. These were therefore taken into consideration during the sampling design phase, and a higher number of households was selected from those areas to ensure good coverage of all areas where poverty is widespread.
Face-to-face [f2f]
Raw Data:
- Organizing forms/questionnaires: A compatible archive system was used to classify the forms according to different rounds throughout the year. A registry was prepared to indicate different stages of the process of data checking, coding and entry until forms were back in the archive system.
- Data office checking: This phase was carried out concurrently with the data collection phase in the field, where questionnaires completed in the field were immediately sent to the data office checking phase.
- Data coding: A team was trained to work on the data coding phase, which in this survey is limited to education specialization, profession and economic activity. In this respect, international classifications were used, while for the rest of the questions, coding was predefined during the design phase.
- Data entry/validation: A team consisting of system analysts, programmers and data entry personnel worked on the data at this stage. System analysts and programmers started by identifying the survey framework and questionnaire fields to help build computerized data entry forms. A set of validation rules was added to the entry form to ensure accuracy of data entered. A team was then trained to complete the data entry process. Forms prepared for data entry were provided by the archive department to ensure forms were correctly extracted and put back in the archive system. A data validation process was run on the data to ensure the data entered was free of errors.
- Results tabulation and dissemination: After the completion of all data processing operations, ORACLE was used to tabulate the final survey results. Those results were further checked using similar outputs from SPSS to ensure that the tabulations produced were correct. A check was also run on each table to guarantee consistency of the figures presented, together with required editing of table titles and report formatting.
Harmonized Data:
- The Statistical Package for Social Science (SPSS) was used to clean and harmonize the datasets.
- The harmonization process started with cleaning all raw data files received from the Statistical Office.
- Cleaned data files were then merged to produce one data file on the individual level containing all variables subject to harmonization.
- A country-specific program was generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
- A post-harmonization cleaning process was run on the data.
- Harmonized data was saved on the household as well as the individual level, in SPSS and converted to STATA format.
Research ICT Africa (RIA) is a non-profit, public interest, research entity which undertakes research on how information and communication technologies are being accessed and used in African countries. The aim is to measure the impact on lifestyles and livelihoods of people and households and to understand how informal businesses can prosper through the use of ICTs. This research can facilitate informed policy-making for improved access, use and application of ICT for social development and economic growth. RIA collects both supply-side and demand-side data. On the demand-side, nationally representative surveys are conducted on ICT use and demand in African countries. This survey dataset consists of data collected by household and business surveys in thirteen African countries in 2011-2012.
The surveys had national coverage. Survey countries included Botswana, Cameroon, Ethiopia, Ghana, Kenya, Mozambique, Namibia, Nigeria, Rwanda, South Africa, Tanzania, Uganda, and Tunisia.
Households and individuals
The data is nationally representative on a household and individual level for individuals 16 years of age or older.
Sample survey data [ssd]
The random sampling was performed in four steps for households and businesses, and five steps for individuals.
• Step 1: The national census sample frames were split into urban and rural enumeration areas (EAs).
• Step 2: EAs were sampled for each stratum using probability proportional to size (PPS).
• Step 3: For each EA two listings were compiled, one for households and one for businesses. The listings serve as the sample frame for the simple random selections.
• Step 4: 24 households and 10 businesses were sampled using simple random sampling for each selected EA.
• Step 5: From all household members 15 years or older, or visitors staying the night at the house, one was randomly selected based on simple random sampling.
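Steps 4 and 5 above can be sketched for a single enumeration area as follows. The listing data, field names (`id`, `members`, `age`), and the function `sample_ea` are all hypothetical and chosen for illustration; overnight visitors are assumed to be included in each household's member list.

```python
import random


def sample_ea(households, businesses, rng, n_hh=24, n_biz=10, min_age=15):
    """Simple random samples within one EA, then one respondent per household."""
    hh_sample = rng.sample(households, min(n_hh, len(households)))
    biz_sample = rng.sample(businesses, min(n_biz, len(businesses)))
    respondents = {}
    for hh in hh_sample:
        eligible = [p for p in hh["members"] if p["age"] >= min_age]
        if eligible:  # skip households with nobody eligible
            respondents[hh["id"]] = rng.choice(eligible)
    return hh_sample, biz_sample, respondents


rng = random.Random(2011)  # fixed seed for repeatability
# Hypothetical listings for a single EA: 120 households and 35 businesses.
households = [
    {
        "id": h,
        "members": [
            {"person": p, "age": rng.randint(0, 80)}
            for p in range(rng.randint(1, 7))
        ],
    }
    for h in range(120)
]
businesses = [f"business-{b}" for b in range(35)]

hh_sample, biz_sample, respondents = sample_ea(households, businesses, rng)
print(len(hh_sample), len(biz_sample), len(respondents))
```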
Face-to-face [f2f]
This collection is a nationally representative--although clustered--1 in 1000 preliminary subsample of the United States population in 1880. The subsample is based on every tenth microfilm reel of enumeration forms (there are a total of 1,454 reels) and, within each reel, on the census page itself. In terms of the Public Use Sample as a whole, a sample density of 1 person per 100 was chosen so that a single sample point was randomly generated for every two census pages. Sample points were chosen for inclusion in the collection only if the individual selected was the first person listed in the dwelling. Under this procedure each dwelling, family, and individual in the population had a 1 in 100 probability of inclusion in the Public Use Sample.
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR09474.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
In Austria a population census takes place every 10 years; this census contains a program of important statistical data on population and employment. These data roughly correspond to the information in the Mikrozensus standard survey but are more detailed (for instance, with questions on the connection between place of residence and workplace, and questions on education, religious denomination, etc.). The population census and the Mikrozensus are closely linked, as the name already implies: Mikrozensus means a small-scale population census. What the population census reports only every 10 years, the Mikrozensus reports through ongoing sampling. These ongoing samples are also collected in the years of the population census. The Mikrozensus, however, is far more detailed than the survey program of the population census, because the Mikrozensus special surveys make it possible to ask questions that are far beyond the scope of the population census. This complementary function of Mikrozensus and population census becomes especially obvious in the June survey: certain questions that could not be posed in the population census due to its limited program were answered in the Mikrozensus via sampling. The topics were: questions on the social stratification of the population; questions on fertility and birth order; and questions on the silent human resources.
The study included four separate surveys:
1. The LSMS survey of the general population of Serbia in 2002
2. The survey of Family Income Support (MOP in Serbian) recipients in 2002
These two datasets are published together, separately from the 2003 datasets.
3. The LSMS survey of the general population of Serbia in 2003 (panel survey)
4. The survey of Roma from Roma settlements in 2003
These two datasets are published together.
Objectives
The LSMS is a multi-topic study of household living standards, based on international experience in designing and conducting this type of research. The basic survey was carried out in 2002 on a representative sample of households in Serbia (without Kosovo and Metohija). Its goal was to establish a poverty profile from comprehensive data on the welfare of households and to identify vulnerable groups. A further aim was to assess the targeting of safety net programs by collecting detailed information from individuals on participation in specific government social programs. This study was used as the basic document in developing the Poverty Reduction Strategy (PRS) in Serbia, which was adopted by the Government of the Republic of Serbia in October 2003.
The survey was repeated in 2003 on a panel sample (the households which participated in 2002 survey were re-interviewed).
Analysis of the take-up and profile of the population in 2003 was the first step towards formulating the system of monitoring in the Poverty Reduction Strategy (PRS). The survey was conducted in accordance with the same methodological principles used in the 2002 survey, with necessary changes referring only to the content of certain modules and the reduction in sample size. The aim of the repeated survey was to obtain panel data to enable monitoring of the change in the living standard within a period of one year, thus indicating whether there had been a decrease or increase in poverty in Serbia in the course of 2003. [Note: Panel data are the data obtained from the sample of households which participated in both surveys. These data made it possible to track the living standard of the same persons over a period of one year.]
Along with these two comprehensive surveys, conducted on nationally and regionally representative samples intended to give a picture of the general population, there were also two surveys with particular emphasis on vulnerable groups. In 2002, this was the survey of the living standard of Family Income Support recipients, which aimed to validate this state-supported social welfare program. In 2003 the survey of Roma from Roma settlements was conducted. All available evidence indicated that this was one of the most vulnerable groups in the territory of Serbia and Montenegro, but no extensive research on poverty among the Roma population had been carried out. The aim of the survey was therefore to compare the poverty of this group with that of the general population and to establish which categories of the Roma population were at the greatest risk of poverty in 2003. However, it is necessary to stress that the LSMS of the Roma population covered the potentially most at-risk Roma, while the Roma integrated in the main population were not included in this study.
The surveys were conducted on the whole territory of Serbia (without Kosovo and Metohija).
Sample survey data [ssd]
The sample frame for both surveys of the general population (LSMS) in 2002 and 2003 consisted of all permanent residents of Serbia, without the population of Kosovo and Metohija, according to the definition of permanently resident population contained in the UN Recommendations for Population Censuses, which were applied in the 2002 Census of Population in the Republic of Serbia. Therefore, permanent residents were all persons living in the territory of Serbia longer than one year, with the exception of diplomatic and consular staff.
The sample frame for the survey of Family Income Support recipients included all current recipients of this program on the territory of Serbia, based on the official list of recipients provided by the Ministry of Social Affairs.
Defining the Roma population from Roma settlements was difficult, since precise data on the total number of Roma in Serbia are not available. According to the last population Census from 2002 there were 108,000 Roma citizens, but the Census data are thought to significantly underestimate the total number of the Roma population. However, since no other more precise data were available, this number was taken as the basis for the estimate of the Roma population from Roma settlements. Settlements in which, according to the 2002 Census, at least 7% of the total population declared themselves as belonging to Roma nationality were selected. A total of 83% or 90,000 self-declared Roma lived in the settlements defined in this way, and this number was taken as the sample frame for Roma from Roma settlements.
Planned sample: In 2002 the planned size of the sample of general population included 6,500 households. The sample was both nationally and regionally representative (representative on each individual stratum). In 2003 the planned panel sample size was 3,000 households. In order to preserve the representative quality of the sample, we kept every other census block unit of the large sample realized in 2002. This way we kept the identical allocation by strata. In each selected census block unit, the same households were interviewed as in the basic survey in 2002. The planned sample of Family Income Support recipients in 2002 and Roma from Roma settlements in 2003 was 500 households for each group.
Sample type: In both national surveys the implemented sample was a two-stage stratified sample. Units of the first stage were enumeration districts, and units of the second stage were households. In the basic 2002 survey, enumeration districts were selected with probability proportional to the number of households, so that enumeration districts with a larger number of households had a higher probability of selection. In the repeated survey in 2003, first-stage units (census block units) were selected from the basic sample obtained in 2002 by including only even-numbered census block units. In practice this meant that every second census block unit from the previous survey was included in the sample. In each selected enumeration district the same households interviewed in the previous round were included and interviewed. When the 2003 survey was completed, the cases were merged at both the household and the member level.
Stratification: Municipalities are stratified into the following six territorial strata: Vojvodina, Belgrade, Western Serbia, Central Serbia (Šumadija and Pomoravlje), Eastern Serbia and South-east Serbia. Primary units of selection are further stratified into enumeration districts which belong to urban type of settlements and enumeration districts which belong to rural type of settlement.
The sample of Family Income Support recipients consisted of cases chosen randomly from the official list of recipients provided by the Ministry of Social Affairs. The sample of Roma from Roma settlements was, as in the national survey, a two-stage stratified sample, but the units in the first stage were settlements where the Roma population made up more than 7% of the population, and the units of the second stage were Roma households. Settlements are stratified into three territorial strata: Vojvodina, Belgrade and Central Serbia.
Face-to-face [f2f]
In all surveys the same questionnaire with minimal changes was used. It included different modules, topically separate areas which had an aim of perceiving the living standard of households from different angles. Topic areas were the following:
1. Roster with demography.
2. Housing conditions and durables module with information on the age of durables owned by a household with a special block focused on collecting information on energy billing, payments, and usage.
3. Diary of food expenditures (weekly), including home production, gifts and transfers in kind.
4. Questionnaire of main expenditure-based recall periods sufficient to enable construction of annual consumption at the household level, including home production, gifts and transfers in kind.
5. Agricultural production for all households which cultivate 10+ acres of land or who breed cattle.
6. Participation and social transfers module with detailed breakdown by programs.
7. Labour Market module in line with a simplified version of the Labour Force Survey (LFS), with special additional questions to capture various informal sector activities, and providing information on earnings.
8. Health with a focus on utilization of services and expenditures (including informal payments).
9. Education module, which incorporated pre-school, compulsory primary education, secondary education and university education.
10. Special income block, focusing on sources of income not covered in other parts (with a focus on remittances).
During field work, interviewers kept a precise diary of interviews, recording both successful and unsuccessful visits. Particular attention was paid to reasons why some households were not interviewed. Separate marks were given for households which were not interviewed due to refusal and for cases when a given household could not be found on the territory of the chosen census block.
In 2002 a total of 7,491 households were contacted. Of this number a total of 6,386 households in 621 census rounds were interviewed. Interviewers did not manage to collect the data for 1,106 or 14.8% of selected households. Out of this number 634 households
This intermediate-level data set was extracted from the Census Bureau database. It contains 48842 instances with a mix of continuous and discrete attributes (train=32561, test=16281).
The data set has 15 attributes, which include age, sex, education level and other relevant details of a person. The data set will help to improve your skills in Exploratory Data Analysis, Data Wrangling, Data Visualization and Classification Models.
Feel free to explore the data set with multiple supervised and unsupervised learning techniques. The following description gives more details on this data set:
age: the age of an individual.workclass: The type of work or employment of an individual. It can have the following categories:
Final Weight: The weights on the CPS files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:
1. A single cell estimate of the population 16+ for each state.
2. Controls for Hispanic Origin by age and sex.
3. Controls by Race, age and sex.
We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used.
People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.
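The "raking" step described above is commonly implemented as iterative proportional fitting. The following is a minimal sketch of that idea, not the Census Bureau's weighting program: it assumes a hypothetical two-way table of weighted counts and only two hypothetical sets of control totals (the CPS description mentions three), and it repeats the adjustment six times as in the description. All numbers and the function name `rake` are illustrative.

```python
def rake(table, row_controls, col_controls, passes=6):
    """Iterative proportional fitting on a 2-D table of weighted counts.

    Each pass rescales the rows to match row_controls, then rescales the
    columns to match col_controls.
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(passes):
        # Scale every row so that its total equals the row control.
        for i, target in enumerate(row_controls):
            total = sum(cells[i])
            cells[i] = [value * target / total for value in cells[i]]
        # Scale every column so that its total equals the column control.
        for j, target in enumerate(col_controls):
            total = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / total
    return cells


# Hypothetical weighted sample counts (rows: two age groups, columns: two sexes).
sample = [[40.0, 60.0], [30.0, 70.0]]
row_controls = [120.0, 80.0]   # hypothetical population totals by age group
col_controls = [90.0, 110.0]   # hypothetical population totals by sex

raked = rake(sample, row_controls, col_controls)
```

After the last pass the column totals match their controls exactly, and the row totals match theirs to within a very small tolerance, which is the sense in which the procedure "comes back" to all the controls.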
education: The highest level of education completed.
education-num: The number of years of education completed.
marital-status: The marital status.
occupation: Type of work performed by an individual.
relationship: The relationship status.
race: The race of an individual.
sex: The gender of an individual.
capital-gain: The amount of capital gain (financial profit).
capital-loss: The amount of capital loss an individual has incurred.
hours-per-week: The number of hours worked per week.
native-country: The country of origin or the native country.
income: The income level of an individual, which serves as the target variable. It indicates whether the income is greater than $50,000 or less than or equal to $50,000, denoted as (>50K, <=50K).
Round 1 of the Afrobarometer survey was conducted from July 1999 through June 2001 in 12 African countries, to solicit public opinion on democracy, governance, markets, and national identity. The full 12 country dataset released was pieced together out of different projects: Round 1 of the Afrobarometer survey, the old Southern African Democracy Barometer, and similar surveys done in West and East Africa.
The 7 country dataset is a subset of the Round 1 survey dataset, and consists of a combined dataset for the 7 Southern African countries surveyed with other African countries in Round 1, 1999-2000 (Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe). It is a useful dataset because, in contrast to the full 12 country Round 1 dataset, all countries in this dataset were surveyed with the identical questionnaire.
Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia, Zimbabwe
Basic units of analysis that the study investigates include: individuals and groups
Sample survey data [ssd]
A new sample has to be drawn for each round of Afrobarometer surveys. Whereas the standard sample size for Round 3 surveys will be 1200 cases, a larger sample size will be required in societies that are extremely heterogeneous (such as South Africa and Nigeria), where the sample size will be increased to 2400. Other adaptations may be necessary within some countries to account for the varying quality of the census data or the availability of census maps.
The sample is designed as a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of selection for interview. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible. A randomly selected sample of 1200 cases allows inferences to national adult populations with a margin of sampling error of no more than plus or minus 2.5 percent with a confidence level of 95 percent. If the sample size is increased to 2400, the confidence interval shrinks to plus or minus 2 percent.
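The relationship between sample size and sampling error can be sketched with the textbook formula for a proportion under simple random sampling. This is only an approximation under stated assumptions: it uses the worst case p = 0.5 and a 95 percent confidence level (z = 1.96), and it ignores the design effect of clustering and stratification. The function name `margin_of_error` is illustrative.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the confidence interval for a proportion under
    simple random sampling: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (1200, 2400):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

Under these assumptions the formula gives roughly 2.8 percentage points for 1200 cases and 2.0 for 2400, so the figure quoted above for 1200 cases presumably rests on somewhat different assumptions.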
Sample Universe
The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.
What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.
Sample Design
The sample design is a clustered, stratified, multi-stage, area probability sample.
To repeat the main sampling principle, the objective of the design is to give every sample element (i.e. adult citizen) an equal and known chance of being chosen for inclusion in the sample. We strive to reach this objective by (a) strictly applying random selection methods at every stage of sampling and by (b) applying sampling with probability proportionate to population size wherever possible.
In a series of stages, geographically defined sampling units of decreasing size are selected. To ensure that the sample is representative, the probability of selection at various stages is adjusted as follows:
The sample is stratified by key social characteristics in the population such as sub-national area (e.g. region/province) and residential locality (urban or rural). The area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. And the urban/rural stratification is a means to make sure that these localities are represented in their correct proportions. Wherever possible, and always in the first stage of sampling, random sampling is conducted with probability proportionate to population size (PPPS). The purpose is to guarantee that larger (i.e., more populated) geographical units have a proportionally greater probability of being chosen into the sample. The sampling design has four stages:
A first-stage to stratify and randomly select primary sampling units;
A second-stage to randomly select sampling start-points;
A third stage to randomly choose households;
A final-stage involving the random selection of individual respondents
We shall deal with each of these stages in turn.
STAGE ONE: Selection of Primary Sampling Units (PSUs)
The primary sampling units (PSUs) are the smallest, well-defined geographic units for which reliable population data are available. In most countries, these will be Census Enumeration Areas (or EAs). Most national census data and maps are broken down to the EA level. In the text that follows we will use the acronyms PSU and EA interchangeably because, when census data are employed, they refer to the same unit.
We strongly recommend that NIs use official national census data as the sampling frame for Afrobarometer surveys. Where recent or reliable census data are not available, NIs are asked to inform the relevant Core Partner before they substitute any other demographic data. Where the census is out of date, NIs should consult a demographer to obtain the best possible estimates of population growth rates. These should be applied to the outdated census data in order to make projections of population figures for the year of the survey. It is important to bear in mind that population growth rates vary by area (region) and (especially) between rural and urban localities. Therefore, any projected census data should include adjustments to take such variations into account.
Indeed, we urge NIs to establish collegial working relationships with professionals in the national census bureau, not only to obtain the most recent census data, projections, and maps, but to gain access to sampling expertise. NIs may even commission a census statistician to draw the sample to Afrobarometer specifications, provided that provision for this service has been made in the survey budget.
Regardless of who draws the sample, the NIs should thoroughly acquaint themselves with the strengths and weaknesses of the available census data and the availability and quality of EA maps. The country and methodology reports should cite the exact census data used, its known shortcomings, if any, and any projections made from the data. At minimum, the NI must know the size of the population and the urban/rural population divide in each region in order to specify how to distribute population and PSUs in the first stage of sampling. National investigators should obtain this written data before they attempt to stratify the sample.
Once this data is obtained, the sample population (either 1200 or 2400) should be stratified, first by area (region/province) and then by residential locality (urban or rural). In each case, the proportion of the sample in each locality in each region should be the same as its proportion in the national population as indicated by the updated census figures.
Having stratified the sample, it is then possible to determine how many PSUs should be selected for the country as a whole, for each region, and for each urban or rural locality.
The total number of PSUs to be selected for the whole country is determined by calculating the maximum degree of clustering of interviews one can accept in any PSU. Because PSUs (which are usually geographically small EAs) tend to be socially homogeneous, we do not want to select too many people in any one place. Thus, the Afrobarometer has established a standard of no more than 8 interviews per PSU. For a sample size of 1200, the sample must therefore contain 150 PSUs/EAs (1200 divided by 8). For a sample size of 2400, there must be 300 PSUs/EAs.
These PSUs should then be allocated proportionally to the urban and rural localities within each regional stratum of the sample. Let's take a couple of examples from a country with a sample size of 1200. If the urban locality of Region X in this country constitutes 10 percent of the current national population, then the sample for this stratum should be 15 PSUs (calculated as 10 percent of 150 PSUs). If the rural population of Region Y constitutes 4 percent of the current national population, then the sample for this stratum should be 6 PSUs.
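The arithmetic in the two examples above can be reproduced with a short sketch. The function names and the stratum labels are illustrative; the population shares are the ones quoted in the text. Note that with real population shares, simple rounding may not add up exactly to the total number of PSUs and would need a manual adjustment.

```python
def psus_needed(sample_size, interviews_per_psu=8):
    """Number of PSUs implied by the cap on interviews per PSU."""
    return sample_size // interviews_per_psu


def allocate_psus(total_psus, population_share_percent):
    """Allocate PSUs to strata in proportion to their share (in percent)
    of the national population, rounded to the nearest whole PSU."""
    return {
        stratum: round(total_psus * percent / 100)
        for stratum, percent in population_share_percent.items()
    }


total = psus_needed(1200)  # 150 PSUs for a sample of 1200
allocation = allocate_psus(total, {"Region X urban": 10, "Region Y rural": 4})
print(total, allocation)
```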
The next step is to select particular PSUs/EAs using random methods. Using the above example of the rural localities in Region Y, let us say that you need to pick 6 sample EAs out of a census list that contains a total of 240 rural EAs in Region Y. But which 6? If the EAs created by the national census bureau are of equal or roughly equal population size, then selection is relatively straightforward. Just number all EAs consecutively, then make six selections using a table of random numbers. This procedure, known as simple random sampling (SRS), will
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Socio-demographic and health indicators of India and study districts.
The 2007/08 Agriculture Sample Census Survey (ASCS) was designed to meet the data needs of a wide range of users down to district level, including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmer organisations, etc. To meet user demand, the dataset has both a larger sample and a more detailed scope and coverage. The census was carried out in order to:
· Identify structural changes, if any, in the size of farm household holdings, crop and livestock production, farm input and implement use. It also seeks to determine if there are any improvements in rural infrastructure and in the level of agriculture household living conditions;
· Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stakeholders;
· Obtain benchmark data that will be used to address specific issues such as food security, rural poverty, gender, agro-processing, marketing and service delivery.
National Coverage
Households
Small scale and Large Scale Farmers within the community.
Sample survey data [ssd]
The Mainland sample consisted of 3,192 villages. The total Mainland sample was 47,880 agricultural households, while in Zanzibar a total of 317 Enumeration Areas (EAs) were selected and 4,755 agriculture households were covered.
The villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household-based surveys in the country. The NMS was developed from the 2002 Population and Housing Census.
In the first stage, villages/Enumeration Areas (EAs) were selected with a probability proportional to the number of villages/EAs in each district. In the second stage, 15 households were selected from a list of agricultural households in each village/EA using systematic random sampling.
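The second-stage selection can be illustrated with a minimal sketch of systematic random sampling. It is not the procedure actually used by the census team, whose details are not given here: the household list is hypothetical, and the function name `systematic_sample` and the fractional-interval variant are assumptions made for illustration.

```python
import random


def systematic_sample(units, n, rng=random):
    """Select n units from an ordered list using a random start and a
    fixed (possibly fractional) sampling interval."""
    if n > len(units):
        raise ValueError("cannot select more units than the list contains")
    interval = len(units) / n
    start = rng.random() * interval  # random start in [0, interval)
    last = len(units) - 1
    # The min() guards against a floating-point overshoot on the last pick.
    return [units[min(int(start + k * interval), last)] for k in range(n)]


# Hypothetical listing of 100 agricultural households in one village/EA.
household_list = [f"household_{i:03d}" for i in range(1, 101)]
selected = systematic_sample(household_list, 15, rng=random.Random(2008))
```

Because the interval here is larger than one, the 15 selected households are distinct and spread evenly across the listing order.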
Face-to-face paper [f2f]
Data editing took place at a number of stages. The following procedures were carried out:
- Manual cleaning exercise was done prior to scanning. Questionnaires found dirty or damaged and generally unsuitable for scanning were put aside for manual data entry.
- CSPro was used for data entry of all Large Scale Farms and Community based questionnaires.
- Scanning and ICR data capture technology for the smallholder questionnaire was also done.
- There was also an interactive validation during the ICR extraction process.
- The use of a batch validation program developed in CSPro. This was used in order to identify inconsistencies within a questionnaire.
- Statistical Package for Social Sciences (SPSS) was used to produce the census tabulations.
- Microsoft Excel was used to organize the tables, charts and compute additional indicators.
- Arc GIS (Geographical Information System) was used in producing the maps.
- Microsoft Word was used in compiling and writing up the reports.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Key Table Information
Table Title: Manufacturing: E-Commerce Statistics for the U.S.: 2022
Table ID: ECNECOMM2022.EC2231ECOMM
Survey/Program: Economic Census
Year: 2022
Dataset: ECN Core Statistics Manufacturing: E-Commerce Statistics for the U.S.: 2022
Release Date: 2025-01-23
Release Schedule: The Economic Census occurs every five years, in years ending in 2 and 7. The data in this file come from the 2022 Economic Census data files released on a flow basis starting in January 2024 with First Look Statistics. Preliminary U.S. totals released in January 2024 are superseded with final data shown in the releases of later economic census statistics through March 2026. For more information about economic census planned data product releases, see 2022 Economic Census Release Schedule.
Dataset Universe: The dataset universe consists of all establishments that are in operation for at least some part of 2022, are located in one of the 50 U.S. states, associated offshore areas, or the District of Columbia, have paid employees, and are classified in one of nineteen in-scope sectors defined by the 2022 North American Industry Classification System (NAICS).
Methodology
Data Items and Other Identifying Records:
Sales, value of shipments, or revenue ($1,000)
E-Shipments value ($1,000)
E-Shipments as percent of total sales, value of shipments, or revenue (%)
Range indicating imputed percentage of total sales, value of shipments, or revenue
Definitions can be found by clicking on the column header in the table or by accessing the Economic Census Glossary.
Unit(s) of Observation: The reporting units for the economic census are employer establishments. An establishment is generally a single physical location where business is conducted or where services or industrial operations are performed. A company or firm is comprised of one or more in-scope establishments that operate under the ownership or control of a single organization. For some industries, the reporting units are instead groups of all establishments in the same industry belonging to the same firm.
Geography Coverage: The data are shown for the U.S. level only. For information about economic census geographies, including changes for 2022, see Geographies.
Industry Coverage: The data are shown at the 2- through 3-digit 2022 NAICS code levels for the U.S. For information about NAICS, see Economic Census Code Lists.
Sampling: The 2022 Economic Census sample includes all active operating establishments of multi-establishment firms and approximately 1.7 million single-establishment firms, stratified by industry and state. Establishments selected to the sample receive a questionnaire. For all data on this table, establishments not selected into the sample are represented with administrative data. For more information about the sample design, see 2022 Economic Census Methodology.
Confidentiality: The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data (Project No. 7504609, Disclosure Review Board (DRB) approval number: CBDRB-FY23-099). To protect confidentiality, the U.S. Census Bureau suppresses cell values to minimize the risk of identifying a particular business’ data or identity. To comply with disclosure avoidance guidelines, data rows with fewer than three contributing firms or three contributing establishments are not presented. Additionally, establishment counts are suppressed when other select statistics in the same row are suppressed. More information on disclosure avoidance is available in the 2022 Economic Census Methodology.
Technical Documentation/Methodology: For detailed information about the methods used to collect data and produce statistics, survey questionnaires, Primary Business Activity/NAICS codes, NAPCS codes, and more, see Economic Census Technical Documentation.
Weights: No weighting applied as establishments not sampled are represented with administrative data.
Table Information
FTP Download: https://www2.census.gov/programs-surveys/economic-census/data/2022/sector31/
API Information: Economic census data are housed in the Census Bureau Application Programming Interface (API).
Symbols:
D - Withheld to avoid disclosing data for individual companies; data are included in higher level totals
N - Not available or not comparable
S - Estimate does not meet publication standards because of high sampling variability, poor response quality, or other concerns about the estimate quality. Unpublished estimates derived from this table by subtraction are subject to these same limitations and should not be attributed to the U.S. Census Bureau. For a description of publication standards and the total quantity response rate, see link to program methodology page.
X - Not applicable
A - Relative standard error of 100% or more
r - Revised
s - Relative standard error exceeds 40%
For a complete list of symbols, see Economic Census Data Dictionary.
Data-Specific Notes: Data users who create their own es...
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de442616
Abstract (en): The Public Use Microdata Samples (PUMS) contain person- and household-level information from the "long-form" questionnaires distributed to a sample of the population enumerated in the 1980 Census. This data collection, containing 5-percent data, identifies every state, county groups, and most individual counties with 100,000 or more inhabitants (350 in all). In many cases, individual cities or groups of places with 100,000 or more inhabitants are also identified. Household-level variables include housing tenure, year structure was built, number and types of rooms in dwelling, plumbing facilities, heating equipment, taxes and mortgage costs, number of children, and household and family income. The person record contains demographic items such as sex, age, marital status, race, Spanish origin, income, occupation, transportation to work, and education. All persons and housing units in the United States and Puerto Rico. For this data collection, the full 1980 Census sample that received the "long-form" questionnaire (19.4 percent of all households) was sampled again through a stratified systematic selection procedure with probability proportional to a measure of size. This 5-percent sample, i.e., 5 households for every 100 households in the nation, includes over one-fourth of the households that received the long-form questionnaire. 
2006-01-12 All files were removed from dataset 81 and flagged as study-level files, so that they will accompany all downloads.
2006-01-12 All files were removed from dataset 80 and flagged as study-level files, so that they will accompany all downloads.
1997-08-25 Part 72, Puerto Rico data, has been added to the collection, as well as supplemental documentation for Puerto Rico in the form of a separate PDF file.
The household and person records in each hierarchical data file have logical record lengths of 193 characters, but the number of records varies with each file. The record layout for Part 72, Puerto Rico, is different from the state datasets. Refer to the supplemental documentation for this part. The codebook is available in hardcopy form only, while the Puerto Rico supplemental documentation is provided as a Portable Document Format (PDF) file.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The analysis of census data aggregated by administrative units introduces a statistical bias known as the modifiable areal unit problem (MAUP). Previous research has mostly assessed the effect of MAUP on upscaling models. The present study helps clarify the effects of MAUP on downscaling methodologies, highlighting how a priori choices of scales and shapes could influence the results. We aggregated fine-resolution chicken and duck census data in Thailand, using three administrative census levels in regular and irregular shapes. We then disaggregated the data within the Gridded Livestock of the World analytical framework, sampling predictors in two different ways. A sensitivity analysis on Pearson’s r correlation statistics and RMSE was carried out to understand how the size and shape of the response variables affect the goodness-of-fit and downscaling performance. We showed that scale, rather than shape and sampling method, affected downscaling precision, suggesting that training the model using the finest administrative level available is preferable. Moreover, datasets showing spatial clustering rather than a homogeneous distribution seemed less affected by MAUP, yielding higher Pearson’s r values and lower RMSE compared to a more spatially homogeneous dataset. Implementing aggregation sensitivity analysis in spatial studies could help to interpret complex results and disseminate robust products.
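For readers unfamiliar with the two goodness-of-fit statistics named in the abstract, the following sketch shows their standard definitions. It is not the study's code; the function names and the toy observed and predicted values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical observed census counts and downscaled predictions per unit.
observed = [120.0, 80.0, 40.0, 10.0]
predicted = [110.0, 90.0, 35.0, 15.0]
print(pearson_r(observed, predicted), rmse(observed, predicted))
```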
https://creativecommons.org/publicdomain/zero/1.0/
In the context of COVID-19, information on similar infections such as influenza can be very valuable to a data scientist. New York is one of the cities most affected by the COVID-19 pandemic, and knowledge of the distribution of previous infections could be relevant for predicting future spread or developing efficient sampling methods.
The dataset contains weekly information on infections (positive tests) in New York counties during the period Oct 2009-Mar 2019. The months studied are Jan, Feb, Mar, Apr, May, Oct, Nov, Dec. Other variables are included by county, such as the number of hospital beds, unemployment rate, population, average income, median age, total expenditure per year on hospital interventions... (see variable description). All information is based on relevant sources. The dataset is a combination of the different datasets listed below:
1. Weekly infections by county: https://data.world/healthdatany/jr8b-6gh6/workspace/file?filename=influenza-laboratory-confirmed-cases-by-county-beginning-2009-10-season-1.csv
2. Area of counties: https://www.health.ny.gov/statistics/vital_statistics/2006/table02.htm
3. Population size: https://catalog.data.gov/dataset/annual-population-estimates-for-new-york-state-and-counties-beginning-1970
4. Number of adult care facility beds: https://health.data.ny.gov/Health/Adult-Care-Facility-Map/6wkx-ptu4
5. Age related data: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=CF
6. Income data: https://en.wikipedia.org/wiki/List_of_New_York_locations_by_per_capita_income
7. Labour data: https://labor.ny.gov/stats/lslaus.shtm
8. Information about hospital beds and services: https://health.data.ny.gov/Health/Health-Facility-Certification-Information/2g9y-7kqm
9. Health expenditure by illness: https://health.data.ny.gov/Health/Hospital-Inpatient-Cost-Transparency-Beginning-200/7dtz-qxmr
Testing has proven to be one of the most relevant tools for fighting the spread of a virus. Statistics provides efficient tools for estimating the total number of infections; in particular, sampling methods may significantly reduce the costs of testing. This dataset is intended to be used as a tool to understand the distribution of positive tests in the state of New York in order to design sampling methods that could significantly reduce the estimation error.
The dataset is a relational dataset of 8,000 households, representing a sample of the population of an imaginary middle-income country. The dataset contains two data files: one with variables at the household level, the other one with variables at the individual level. It includes variables that are typically collected in population censuses (demography, education, occupation, dwelling characteristics, fertility, mortality, and migration) and in household surveys (household expenditure, anthropometric data for children, assets ownership). The data only includes ordinary households (no community households). The dataset was created using REaLTabFormer, a model that leverages deep learning methods. The dataset was created for the purpose of training and simulation and is not intended to be representative of any specific country.
The full-population dataset (with about 10 million individuals) is also distributed as open data.
The dataset is a synthetic dataset for an imaginary country. It was created to represent the population of this country by province (equivalent to admin1) and by urban/rural areas of residence.
Household, Individual
The dataset is a fully-synthetic dataset representative of the resident population of ordinary households for an imaginary middle-income country.
ssd
The sample size was set to 8,000 households. The fixed number of households to be selected from each enumeration area was set to 25. In a first stage, the number of enumeration areas to be selected in each stratum was calculated, proportional to the size of each stratum (stratification by geo_1 and urban/rural). Then 25 households were randomly selected within each enumeration area. The R script used to draw the sample is provided as an external resource.
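The first-stage calculation described above can be sketched as follows. This is not the R script distributed with the dataset: the stratum labels and sizes are hypothetical, and the largest-remainder rounding used to make the allocation add up is an assumption made for illustration.

```python
def allocate_eas(stratum_sizes, total_households=8000, households_per_ea=25):
    """Allocate enumeration areas to strata in proportion to stratum size,
    using largest-remainder rounding so the counts sum to the total."""
    n_eas = total_households // households_per_ea  # 8000 / 25 = 320
    total_size = sum(stratum_sizes.values())
    quotas = {s: n_eas * size / total_size for s, size in stratum_sizes.items()}
    allocation = {s: int(q) for s, q in quotas.items()}  # round down first
    leftover = n_eas - sum(allocation.values())
    # Give the remaining EAs to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:leftover]:
        allocation[stratum] += 1
    return allocation


# Hypothetical stratum sizes (geo_1 province by urban/rural), in households.
strata = {
    "province_1_urban": 410_000,
    "province_1_rural": 655_000,
    "province_2_urban": 120_000,
    "province_2_rural": 980_000,
}
ea_allocation = allocate_eas(strata)
```

Multiplying each stratum's allocation by 25 gives its number of sampled households, and the allocations sum to 320 enumeration areas.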
other
The dataset is a synthetic dataset. Although the variables it contains are variables typically collected from sample surveys or population censuses, no questionnaire is available for this dataset. A "fake" questionnaire was however created for the sample dataset extracted from this dataset, to be used as training material.
The synthetic data generation process included a set of "validators" (consistency checks, based on which synthetic observations were assessed and rejected/replaced when needed). Also, some post-processing was applied to the data to produce the distributed data files.
This is a synthetic dataset; the "response rate" is 100%.
Access to up-to-date socio-economic data is a widespread challenge in Solomon Islands and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 as a way to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.
For Solomon Islands, after five rounds of data collection from 2020-2020, a monthly HFPS data collection commenced in April 2023 and continued for 18 months (ending September 2024) on topics including employment, income, food security, health, food prices, assets and well-being. Fieldwork took place in two non-consecutive weeks of each month. Data for April 2023-December 2023 were a repeated cross section, while January 2024 established the first month of a panel, which was continued to September 2024. Each month has approximately 550 households in the sample and is representative of urban and rural areas, but is not representative at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Solomon Islands. There is one data file for household-level data with a unique household ID, and a separate file for individual-level data within each household, which can be matched to the household file using the household ID. The individual file also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.
Urban and rural areas of Solomon Islands.
Household, individual.
Sample survey data [ssd]
The initial sample was drawn through Random Digit Dialing (RDD) with geographic stratification. As an objective of the survey was to measure changes in household economic wellbeing over time, the HFPS sought to contact a consistent number of households across each province month to month. This was initially a repeated cross section from April 2023-Dec 2023. The initial sample was drawn from information provided by a major phone service provider in Solomon Islands, covering all the provinces in the country. It had a probability-based weighted design, with a proportionate stratification to achieve geographical representation. The geographical distribution compared to the 2019 Census is listed below for the first month of the HFPS monthly survey:
Choiseul: Census: 4.3%, HFPS: 5.2%
Western: Census: 14.4%, HFPS: 13.7%
Isabel: Census: 4.8%, HFPS: 4.7%
Central: Census: 3.6%, HFPS: 5.2%
Ren Bell: Census: 0.6%, HFPS: 1.4%
Guadalcanal: Census: 19.8%, HFPS: 21.1%
Malaita: Census: 23.1%, HFPS: 18.7%
Makira: Census: 5.6%, HFPS: 5.6%
Temotu: Census: 3.0%, HFPS: 3%
Honiara: Census: 20.7%, HFPS: 21.3%
Source: Census of Population and Housing 2019
Note: The values in the HFPS column represent the proportion of survey participants residing in each province, based on the raw HFPS data from April.
In April 2023, the geographic distribution of World Bank HFPS participants was generally similar to that of the census data at the province level, though within provinces, areas with less mobile phone connectivity are likely to be underrepresented. One indication of this is that urban areas constituted 38.2 percent of the survey sample, which is a slight overrepresentation, compared to 32.5 percent in the Census 2019.
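The comparison between census and survey shares can be made explicit by computing the percentage-point gap for each province. The following is a minimal sketch using only the figures quoted in this documentation; the function and variable names are illustrative and are not part of the released data.

```python
# Census 2019 share and raw HFPS share (April 2023), in percent,
# copied from the province list in this documentation.
shares = {
    "Choiseul": (4.3, 5.2),
    "Western": (14.4, 13.7),
    "Isabel": (4.8, 4.7),
    "Central": (3.6, 5.2),
    "Ren Bell": (0.6, 1.4),
    "Guadalcanal": (19.8, 21.1),
    "Malaita": (23.1, 18.7),
    "Makira": (5.6, 5.6),
    "Temotu": (3.0, 3.0),
    "Honiara": (20.7, 21.3),
}


def share_gaps(table):
    """Return HFPS share minus census share, in percentage points."""
    return {name: round(hfps - census, 1) for name, (census, hfps) in table.items()}


gaps = share_gaps(shares)
most_under = min(gaps, key=gaps.get)  # province most under-represented in the HFPS
most_over = max(gaps, key=gaps.get)   # province most over-represented in the HFPS
urban_gap = round(38.2 - 32.5, 1)     # urban share: HFPS sample vs Census 2019

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:12s} {gap:+.1f}")
print("urban gap:", urban_gap)
```

Positive values mean the province is over-represented in the raw HFPS sample relative to the census; negative values mean it is under-represented.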
A monthly panel was established in January 2024 and is ongoing as of March 2025. In each subsequent month after January 2024, the survey firm would first attempt to contact all households from the previous month and then attempt to contact households from earlier months that had dropped out. After previous numbers were exhausted, RDD with geographic stratification was used for replacement households. Across all months of the survey, a total of 9,926 interviews were completed.
Computer Assisted Telephone Interview [cati]
The questionnaire, which can be found in the External Resources of this documentation, is available in English, with Solomons Pijin translation. There were few changes to the questionnaire across the survey months, but some sections were only introduced in 2024, namely energy access questions and questions to inform the baseline data of the Solomon Islands Government Integrated Economic Development and Climate Resilience (IEDCR) project.
The raw data were cleaned by the World Bank team using Stata. This included formatting and correcting errors identified through the survey’s monitoring and quality control process. The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 9,926 in the household dataset and 62,054 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household dataset contains information about household demographics, education, food security, food prices, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (id_member) can be found in the individual dataset.
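Linking the two datasets could look like the sketch below. Only hhid and id_member are variable names taken from this documentation; the month, urban and employed columns and the toy rows are assumptions made for illustration. Because households can be interviewed in several months, the sketch joins on hhid together with a hypothetical month variable; the released files should be checked for the actual name of the survey-round variable.

```python
from collections import defaultdict

# Toy rows standing in for the two released files (values are made up).
households = [
    {"hhid": "H001", "month": "2024-01", "urban": 1},
    {"hhid": "H001", "month": "2024-02", "urban": 1},
    {"hhid": "H002", "month": "2024-01", "urban": 0},
]
individuals = [
    {"hhid": "H001", "id_member": 1, "month": "2024-01", "employed": 1},
    {"hhid": "H001", "id_member": 2, "month": "2024-01", "employed": 0},
    {"hhid": "H001", "id_member": 1, "month": "2024-02", "employed": 1},
    {"hhid": "H002", "id_member": 1, "month": "2024-01", "employed": 1},
]


def merge_on_household(people, hh_rows, keys=("hhid", "month")):
    """Attach household-level fields to each individual row (many-to-one join)."""
    lookup = {tuple(row[k] for k in keys): row for row in hh_rows}
    merged = []
    for person in people:
        household = lookup.get(tuple(person[k] for k in keys))
        if household is None:
            continue  # individual rows without a household match are skipped here
        merged.append({**household, **person})
    return merged


def months_observed(people):
    """Group survey months by (hhid, id_member) to follow individuals over time."""
    seen = defaultdict(list)
    for person in people:
        seen[(person["hhid"], person["id_member"])].append(person["month"])
    return dict(seen)


merged = merge_on_household(individuals, households)
panel = months_observed(individuals)
```

An individual is identified by the pair (hhid, id_member), since id_member is only unique within a household.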
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Agriculture census (Availability score over 20 years) and country Gambia, The. Indicator Definition: Agriculture censuses collect information on agricultural activities, such as size of holding, land tenure, land use, employment and production, and provide basic structural data and sampling frames for agricultural surveys. Censuses of agriculture normally involve collecting key structural data by complete enumeration of all agricultural holdings, in combination with more detailed structural data using sampling methods. It is recommended that agricultural censuses be conducted at least every 10 years.
Persons. Persons not organized into households; age grouped into categories; virtual census
UNITS IDENTIFIED: - Dwellings: no - Vacant Units: No - Households: no - Individuals: yes - Group quarters: no
UNIT DESCRIPTIONS: - Dwellings: no - Households: Individuals living in the same dwelling and sharing at least one meal. - Group quarters: Group of persons who share a common roof and food because of work, health, religion, etc.
The entire population of the country: 15,985,538 persons. Microdata are available for 1.19% of the population, but exclude the institutional population.
MICRODATA SOURCE: Central Bureau of Statistics (Statistics Netherlands)
SAMPLE SIZE (person records): 189,725.
SAMPLE DESIGN: 1% sample of the total population, consisting of records of persons prevailing in most sources
Face-to-face [f2f]
Dependent on source: register or survey
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time series data for the statistic Agriculture census (Availability score over 20 years) and country Paraguay. Indicator Definition: Agriculture censuses collect information on agricultural activities, such as size of holding, land tenure, land use, employment and production, and provide basic structural data and sampling frames for agricultural surveys. Censuses of agriculture normally involve collecting key structural data by complete enumeration of all agricultural holdings, in combination with more detailed structural data using sampling methods. It is recommended that agricultural censuses be conducted at least every 10 years.
The long-term presence of refugees in Chad and the reduction in funding to provide assistance in recent years have led the humanitarian community to reconsider the approach to assistance of these populations. WFP and UNHCR, the Government's main partners in providing assistance to refugees, had conducted a "socio-economic categorization" in 2014 and 2015 in some refugee camps, and an update was decided for 2017. This update was designed to go beyond a simple categorization and focuses on identifying profiles of refugee households that can be empowered in the short to medium term and the factors that can foster this empowerment. The assessment covers 87,724 refugee households in Chad and was carried out during June-August 2017.
Areas hosting refugees in Chad. This includes 19 refugee camps and 9 villages.
Household and individual
All refugee households residing in Chad.
UNHCR PPG: 1TCDA, 1TCDB, 1TCDD
Sample survey data [ssd]
The survey's objective was to deliver representative data of all refugees living in Chad. The total population of refugees at the time of the survey was estimated at slightly below 90,000 households. These refugees were located in 19 refugee camps and 9 villages.
The survey applied a full-coverage (census) approach, i.e. no sample selection was made. The registration database served as the list frame. The total number of completed interviews was 87,724 households.
While the original data collection took a full-coverage approach, the public-release version of the dataset contains a systematically drawn sub-sample of this original data for reasons of statistical disclosure control. The total sample size in the dataset presented for public release is 8,772 households.
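The documentation does not state how the systematic sub-sample was drawn. As an illustration only, the sketch below assumes a fixed selection interval of 10 with a random start, chosen because 8,772 is roughly one tenth of 87,724; the interval, the seed, and the function name are all assumptions.

```python
import random


def systematic_sample(n_units, interval, rng):
    """Pick every `interval`-th unit index after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return list(range(start, n_units, interval))


rng = random.Random(2017)  # arbitrary seed so the sketch is reproducible
selected = systematic_sample(87_724, 10, rng)
print(len(selected))
```

With 87,724 units and an interval of 10, the sub-sample contains 8,772 or 8,773 units depending on the random start.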
None.
Computer Assisted Personal Interview [capi]
All questionnaires are provided in the section "External Resources".
The dataset presented here has undergone light checking, cleaning and restructuring (data may still contain errors) as well as anonymization (includes removal of direct identifiers and sensitive variables, and grouping values of select variables). Moreover, it constitutes a sub-sample of the data originally collected.
Information unavailable.
The Tanzania Demographic and Health Survey (TDHS) is part of the worldwide Demographic and Health Surveys (DHS) programme, which is designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 1999 TRCHS was to collect data at the national level (with breakdowns by urban-rural and Mainland-Zanzibar residence wherever warranted) on fertility levels and preferences, family planning use, maternal and child health, breastfeeding practices, nutritional status of young children, childhood mortality levels, knowledge and behaviour regarding HIV/AIDS, and the availability of specific health services within the community. Related objectives were to produce these results in a timely manner and to ensure that the data were disseminated to a wide audience of potential users in governmental and nongovernmental organisations within and outside Tanzania. The ultimate intent is to use the information to evaluate current programmes and to design new strategies for improving health and family planning services for the people of Tanzania.
National. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately.
Sample survey data
The TRCHS used a three-stage sample design. Overall, 176 census enumeration areas were selected (146 on the Mainland and 30 in Zanzibar) with probability proportional to size on an approximately self-weighting basis on the Mainland, but with oversampling of urban areas and Zanzibar. To reduce costs and maximise the ability to identify trends over time, these enumeration areas were selected from the 357 sample points that were used in the 1996 TDHS, which in turn were selected from the 1988 census frame of enumeration in a two-stage process (first wards/branches and then enumeration areas within wards/branches). Before the data collection, fieldwork teams visited the selected enumeration areas to list all the households. From these lists, households were selected to be interviewed. The sample was designed to provide estimates for the whole country, for urban and rural areas separately, and for Zanzibar and, in some cases, Unguja and Pemba separately. The health facilities component of the TRCHS involved visiting hospitals, health centres, and pharmacies located in areas around the households interviewed. In this way, the data from the two components can be linked and a richer dataset produced.
See detailed sample implementation in the APPENDIX A of the final report.
Face-to-face
The household survey component of the TRCHS involved three questionnaires: 1) a Household Questionnaire, 2) a Women’s Questionnaire for all individual women age 15-49 in the selected households, and 3) a Men’s Questionnaire for all men age 15-59.
The health facilities survey involved six questionnaires: 1) a Community Questionnaire administered to men and women in each selected enumeration area; 2) a Facility Questionnaire; 3) a Facility Inventory; 4) a Service Provider Questionnaire; 5) a Pharmacy Inventory Questionnaire; and 6) a questionnaire for the District Medical Officers.
All these instruments were based on model questionnaires developed for the MEASURE programme, as well as on the questionnaires used in the 1991-92 TDHS, the 1994 TKAP, and the 1996 TDHS. These model questionnaires were adapted for use in Tanzania during meetings with representatives from the Ministry of Health, the University of Dar es Salaam, the Tanzania Food and Nutrition Centre, USAID/Tanzania, UNICEF/Tanzania, UNFPA/Tanzania, and other potential data users. The questionnaires and manual were developed in English and then translated into and printed in Kiswahili.
The Household Questionnaire was used to list all the usual members and visitors in the selected households. Some basic information was collected on the characteristics of each person listed, including his/her age, sex, education, and relationship to the head of the household. The main purpose of the Household Questionnaire was to identify women and men who were eligible for individual interview and children under five who were to be weighed and measured. Information was also collected about the dwelling itself, such as the source of water, type of toilet facilities, materials used to construct the house, ownership of various consumer goods, and use of iodised salt. Finally, the Household Questionnaire was used to collect some rudimentary information about the extent of child labour.
The Women’s Questionnaire was used to collect information from women age 15-49. These women were asked questions on the following topics: · Background characteristics (age, education, religion, type of employment) · Birth history · Knowledge and use of family planning methods · Antenatal, delivery, and postnatal care · Breastfeeding and weaning practices · Vaccinations, birth registration, and health of children under age five · Marriage and recent sexual activity · Fertility preferences · Knowledge and behaviour concerning HIV/AIDS.
The Men’s Questionnaire covered most of these same issues, except that it omitted the sections on the detailed reproductive history, maternal health, and child health. The final versions of the English questionnaires are provided in Appendix E.
Before the questionnaires could be finalised, a pretest was done in July 1999 in Kibaha District to assess the viability of the questions, the flow and logical sequence of the skip pattern, and the field organisation. Modifications to the questionnaires, including wording and translations, were made based on lessons drawn from the exercise.
In all, 3,826 households were selected for the sample, of which 3,677 were occupied. Of the households found, 3,615 were interviewed, representing a response rate of 98 percent. The shortfall is primarily due to dwellings that were vacant or in which the inhabitants were not at home despite several callbacks.
In the interviewed households, a total of 4,118 eligible women (i.e., women age 15-49) were identified for the individual interview, and 4,029 women were actually interviewed, yielding a response rate of 98 percent. A total of 3,792 eligible men (i.e., men age 15-59) were identified for the individual interview, of whom 3,542 were interviewed, representing a response rate of 93 percent. The principal reason for nonresponse among both eligible men and women was the failure to find them at home despite repeated visits to the household. The lower response rate among men than women was due to the more frequent and longer absences of men.
The response rates are lower in urban areas due to longer absence of respondents from their homes. One-member households are more common in urban areas and are more difficult to interview because they keep their houses locked most of the time. In urban settings, neighbours often do not know the whereabouts of such people.
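The response rates quoted above follow directly from the counts given. A minimal sketch of the arithmetic, using only figures from this section (the function name is illustrative):

```python
def response_rate(interviewed, eligible):
    """Percentage of eligible (or occupied) units that were interviewed."""
    return 100.0 * interviewed / eligible


rates = {
    # Households: the denominator is occupied households (3,677),
    # not all households selected (3,826).
    "households": response_rate(3615, 3677),
    "women": response_rate(4029, 4118),
    "men": response_rate(3542, 3792),
}

for group, rate in rates.items():
    print(f"{group}: {rate:.1f}%")
```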
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the TRCHS to minimise this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the TRCHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
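The plus-or-minus-two-standard-errors rule can be sketched as follows. The estimate and standard error in the example are hypothetical values chosen for illustration, not results from the TRCHS.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Approximate 95 percent interval: estimate +/- multiplier * standard error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical example: a proportion of 0.25 with a standard error of 0.01.
lower, upper = confidence_interval(0.25, 0.01)
print(f"{lower:.2f} to {upper:.2f}")
```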
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the TRCHS sample is the result of a two-stage stratified design, and, consequently, it was necessary to use more complex formulae. The computer software used to calculate sampling errors for the TRCHS is the ISSA Sampling Error Module (SAMPERR). This module used the Taylor linearisation method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
Note: See detailed sampling error calculation in the APPENDIX B
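The two variance estimation methods named above can be illustrated for a ratio estimate r = y/x built from cluster totals. The sketch below uses the textbook forms of Taylor linearisation and delete-one-cluster jackknife; it is not the SAMPERR implementation, and the toy cluster totals, function names, and the default sampling fraction f = 0 are assumptions.

```python
import math


def ratio(clusters):
    """Ratio estimate r = sum(y) / sum(x) over (y, x) cluster totals."""
    y_total = sum(y for y, _ in clusters)
    x_total = sum(x for _, x in clusters)
    return y_total / x_total


def taylor_se(strata, f=0.0):
    """Taylor linearisation standard error of r for a stratified cluster sample.

    `strata` is a list of strata, each a list of (y, x) cluster totals;
    every stratum needs at least two clusters. `f` is the sampling fraction.
    """
    all_clusters = [cluster for stratum in strata for cluster in stratum]
    r = ratio(all_clusters)
    x_total = sum(x for _, x in all_clusters)
    total = 0.0
    for stratum in strata:
        m = len(stratum)
        z = [y - r * x for y, x in stratum]  # linearised cluster values
        z_sum = sum(z)
        total += m / (m - 1) * (sum(v * v for v in z) - z_sum ** 2 / m)
    return math.sqrt((1.0 - f) * total / x_total ** 2)


def jackknife_se(clusters):
    """Delete-one-cluster jackknife standard error of r."""
    k = len(clusters)
    r = ratio(clusters)
    pseudo_values = []
    for i in range(k):
        r_without_i = ratio(clusters[:i] + clusters[i + 1:])
        pseudo_values.append(k * r - (k - 1) * r_without_i)
    variance = sum((p - r) ** 2 for p in pseudo_values) / (k * (k - 1))
    return math.sqrt(variance)


# Toy data: one stratum with two clusters, given as (y, x) totals.
toy_clusters = [(1, 2), (3, 2)]
se_taylor = taylor_se([toy_clusters])
se_jackknife = jackknife_se(toy_clusters)
print(se_taylor, se_jackknife)
```

On this toy input the two methods give the same standard error; on real survey data they generally differ slightly, and the jackknife is typically preferred for statistics that are not simple means, proportions, or ratios.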