This study is an experiment designed to compare the performance of three methodologies for sampling households with migrants: a census-based stratified random sample, an intercept survey, and a snowball survey.
Researchers from the World Bank applied these methods in the context of a survey of Brazilians of Japanese descent (Nikkei), requested by the World Bank. There are approximately 1.2-1.9 million Nikkei among Brazil’s 170 million population.
The survey was designed to provide detail on the characteristics of households with and without migrants, to estimate the proportion of households receiving remittances and with migrants in Japan, and to examine the consequences of migration and remittances on the sending households.
The same questionnaire was used for the stratified random sample and snowball surveys, and a shorter version of the questionnaire was used for the intercept surveys. Researchers can directly compare answers to the same questions across survey methodologies and determine the extent to which the intercept and snowball surveys can give similar results to the more expensive census-based survey, and test for the presence of biases.
Sao Paulo and Parana states
Japanese-Brazilian (Nikkei) households and individuals
The 2000 Brazilian Census was used to classify households as Nikkei or non-Nikkei. The Brazilian Census does not ask ethnicity but instead asks questions on race, country of birth and whether an individual has lived elsewhere in the last 10 years. On the basis of these questions, a household is classified as (potentially) Nikkei if it has any of the following: 1) a member born in Japan; 2) a member who is of yellow race and who has lived in Japan in the last 10 years; 3) a member who is of yellow race, who was not born in a country other than Japan (predominantly Korea, Taiwan or China) and who did not live in a foreign country other than Japan in the last 10 years.
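The three classification criteria can be sketched as a small function. This is an illustrative reading of the rule, not the study's actual code: the `Member` fields, the single `lived_abroad_in` field, and the interpretation of criterion 3 as "born in Brazil or Japan" are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical field names; the census variable names are not given in the text.
    birth_country: str                      # e.g. "Brazil", "Japan", "Korea"
    race: str                               # census race category, e.g. "yellow"
    lived_abroad_in: Optional[str] = None   # foreign country lived in during the last 10 years


def member_flags_nikkei(m: Member) -> bool:
    """Return True if this member satisfies any of the three criteria."""
    # Criterion 1: born in Japan.
    if m.birth_country == "Japan":
        return True
    # Criteria 2 and 3 both require the "yellow" race category.
    if m.race != "yellow":
        return False
    # Criterion 2: lived in Japan in the last 10 years.
    if m.lived_abroad_in == "Japan":
        return True
    # Criterion 3 (assumed reading): not born in a foreign country other than
    # Japan, and no residence in a foreign country other than Japan.
    born_elsewhere_abroad = m.birth_country not in ("Brazil", "Japan")
    lived_elsewhere_abroad = m.lived_abroad_in is not None
    return not born_elsewhere_abroad and not lived_elsewhere_abroad


def household_is_nikkei(members: List[Member]) -> bool:
    """A household is (potentially) Nikkei if any member is flagged."""
    return any(member_flags_nikkei(m) for m in members)
```

For example, under these assumptions a household with one member born in Brazil and recorded as yellow race is flagged, while a household whose only yellow-race member was born in China is not.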
Sample survey data [ssd]
1) Stratified random sample survey
Two states with the largest Nikkei population - Sao Paulo and Parana - were chosen for the study.
The sampling process consisted of three stages. First, a stratified random sample of 75 census tracts was selected based on 2000 Brazilian census. Second, interviewers carried out a door-to-door listing within each census tract to determine which households had a Nikkei member. Third, the survey questionnaire was then administered to households that were identified as Nikkei. A door-to-door listing exercise of the 75 census tracts was then carried out between October 13th, 2006, and October 29th, 2006. The fieldwork began on November 19, 2006, and all dwellings were visited at least once by December 22, 2006. The second wave of surveying took place from January 18th, 2007, to February 2nd, 2007, which was intended to increase the number of households responding.
2) Intercept survey
The intercept survey was designed to carry out interviews at a range of locations frequented by the Nikkei population. It was originally designed to be done in Sao Paulo city only, but a second intercept point survey was later carried out in Curitiba, Parana. The Sao Paulo intercept survey took place between December 9th, 2006, and December 20th, 2006, whereas the Curitiba intercept survey took place between March 3rd and March 12th, 2007.
Consultations with Nikkei community organizations, local researchers and officers of the bank Sudameris, which provides remittance services to this community, were used to select a broad range of locations. Interviewers were assigned to visit each location during prespecified blocks of time. Two fieldworkers were assigned to each location. One fieldworker carried out the interviews, while the other carried out a count of the number of people with Nikkei appearance who appeared to be 18 years old or older who passed by each location. For the fixed places, this count was made throughout the prespecified time block. For example, between 2.30 p.m. and 3.30 p.m. at the sports club, the interviewer counted 57 adult Nikkeis. Refusal rates were carefully recorded, along with the sex and approximate age of the person refusing.
In all, 516 intercept interviews were collected.
3) Snowball sampling survey
The questionnaire that was used was the same as used for the stratified random sample. The plan was to begin with a seed list of 75 households, and to aim to reach a total sample of 300 households through referrals from the initial seed households. Each household surveyed was asked to supply the names of three contacts: (a) a Nikkei household with a member currently in Japan; (b) a Nikkei household with a member who has returned from Japan; (c) a Nikkei household without members in Japan and where individuals had not returned from Japan.
The snowball survey took place from December 5th to 20th, 2006. The second phase of the snowballing survey ran from January 22nd, 2007, to March 23rd, 2007. More associations were contacted to provide additional seed names (69 more names were obtained) and, as with the stratified sample, an adaptation of the intercept survey was used when individuals refused to answer the longer questionnaire. A decision was made to continue the snowball process until a target sample size of 100 had been achieved.
The final sample consists of 60 households who came as seed households from Japanese associations, and 40 households who were chain referrals. The longest chain achieved was three links.
Face-to-face [f2f]
1) Stratified sampling and snowball survey questionnaire
This questionnaire has 36 pages with over 1,000 variables, taking over an hour to complete.
If subjects refused to answer the questionnaire, interviewers would leave a much shorter version for the household to complete on its own and pick it up later. This shorter questionnaire was the same as the one used in the intercept point survey, taking seven minutes on average. The intention with the shorter survey was to provide some data on households that would not answer the full survey because of time constraints, or because respondents were reluctant to have an interviewer in their house.
2) Intercept questionnaire
The questionnaire is four pages in length, consisting of 62 questions and taking a mean time of seven minutes to answer. Respondents had to be 18 years old or older to be interviewed.
1) Stratified random sampling: 403 of the 710 Nikkei households were surveyed, an interview rate of 57%. The refusal rate was 25%, whereas the remaining households were either absent on three attempts or were not surveyed because building managers refused permission to enter the apartment buildings. Refusal rates were higher in Sao Paulo than in Parana, reflecting greater concerns about crime and a busier urban environment.
2) Intercept interviews: 516 intercept interviews were collected, along with 325 refusals. The average refusal rate was 39%, with location-specific refusal rates ranging from only 3% at the food festival to almost 66% at one of the two grocery stores.
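The reported rates can be checked with a short calculation. This is a sketch only: it assumes the intercept refusal rate is refusals divided by all approaches (interviews plus refusals), which is consistent with the 39% figure but is not stated explicitly.

```python
def rate(part: float, whole: float) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Stratified random sample: households surveyed out of Nikkei households listed.
interview_rate = rate(403, 710)

# Intercept survey: refusals out of all people approached (assumed denominator).
intercept_refusal_rate = rate(325, 516 + 325)

print(f"Stratified interview rate: {interview_rate:.1f}%")
print(f"Intercept refusal rate:    {intercept_refusal_rate:.1f}%")
```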
https://www.icpsr.umich.edu/web/ICPSR/studies/8236/terms
The 1940 Census Public Use Microdata Sample Project was assembled through a collaborative effort between the United States Bureau of the Census and the Center for Demography and Ecology at the University of Wisconsin. The collection contains a stratified 1-percent sample of households, with separate records for each household, for each "sample line" respondent, and for each person in the household. These records were encoded from microfilm copies of original handwritten enumeration schedules from the 1940 Census of Population. Geographic identification of the location of the sampled households includes Census regions and divisions, states (except Alaska and Hawaii), standard metropolitan areas (SMAs), and state economic areas (SEAs). Accompanying the data collection is a codebook that includes an abstract, descriptions of sample design, processing procedures and file structure, a data dictionary (record layout), category code lists, and a glossary. Also included is a procedural history of the 1940 Census. Each of the 20 subsamples contains three record types: household, sample line, and person. Household variables describe the location and condition of the household. The sample line records contain variables describing demographic characteristics such as nativity, marital status, number of children, veteran status, wage deductions for Social Security, and occupation. Person records also contain variables describing demographic characteristics including nativity, marital status, family membership, education, employment status, income, and occupation.
The programme for the World Census of Agriculture 2000 is the eighth in the series for promoting a global approach to agricultural census taking. The first and second programmes were sponsored by the International Institute of Agriculture (IIA) in 1930 and 1940. Subsequent ones up to 1990 were promoted by the Food and Agriculture Organization of the United Nations (FAO). FAO recommends that each country conduct at least one agricultural census in each census programme decade; its programme for the World Census of Agriculture 2000, for instance, corresponds to agricultural censuses to be undertaken during the decade 1996 to 2005. Many countries do not have sufficient resources for conducting an agricultural census. Since 1960 it has therefore been accepted practice to conduct the agricultural census on a sample basis in countries lacking the resources required for a complete enumeration.
In Nigeria's case, a combination of complete enumeration and sample enumeration is adopted, whereby the rural (peasant) holdings are covered on a sample basis while the modern holdings are covered by complete enumeration. The project name, “National Agricultural Sample Census”, derives from this practice. Through the National Agricultural Sample Census (NASC), Nigeria participated in the 1970s, 1980s and 1990s programmes of the World Census of Agriculture. Nigeria failed to conduct the Agricultural Census in 2003/2004 because of lack of funding. Since 1996 the regular annual NBS agriculture surveys have been irregular, and many years of backlogged data sets are still unprocessed. The baseline agricultural data is yet to be updated, while the regular annual surveys have suffered setbacks. There is an urgent need for the governments (Federal, State, LGA), sector agencies, FAO and other international organizations to come together to undertake the agricultural census exercise, which is long overdue. The 2006/2008 National Agricultural Sample Census Survey is now on course, with the pilot exercise carried out in the third quarter of 2007.
The National Agricultural Sample Census (NASC) 2006/08 is imperative to strengthening the weak agricultural data in Nigeria. The project is phased into three sub-projects for ease of implementation: the Pilot Survey, Modern Agricultural Holding and the Main Census. It commenced in the third quarter of 2006 and was to end in the first quarter of 2008. The pilot survey was implemented collaboratively by the National Bureau of Statistics.
The main objective of the pilot survey was to test the adequacy of the survey instruments, equipment and administration of questionnaires, data processing arrangements and report writing. The pilot survey conducted in July 2007 covered the two NBS survey systems: the National Integrated Survey of Households (NISH) and the National Integrated Survey of Establishment (NISE). The survey instruments were designed to be applied using the two survey systems, while the use of the Global Positioning System (GPS) was introduced as an additional new tool for implementing the project.
The stakeholders' workshop held at Kaduna on 21st-23rd May 2007 was one of the initial benchmarks for the take-off of the pilot survey. The pilot survey implementation started with the first level of training (training of trainers) at the NBS headquarters from 13th to 15th June 2007. The second level of training, for all levels of field personnel, was implemented at the headquarters of the twelve (12) states concerned from 2nd to 6th July 2007. The field work of the pilot survey commenced on 9th July and ended on 13th July 2007. IMPS and SPSS were the statistical packages used to develop the data entry programme.
State
Households of fish farmers
The survey covered all de jure household members (usual residents) who were engaged in fish production.
Census/enumeration data [cen]
The survey was carried out in 12 states across the 6 geo-political zones, with 2 states covered in each zone. Two local government areas were studied in each selected state. Two rural enumeration areas were covered in each local government area, and 3 fish farming housing units were systematically selected and canvassed in each enumeration area.
There were deviations from the original sample design.
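The planned sample size implied by this design can be worked out by multiplying the stages. This is a sketch of the arithmetic only; because the text notes deviations from the original design, the achieved counts may differ.

```python
# Planned design as described: 6 zones x 2 states x 2 LGAs x 2 rural EAs x 3 housing units.
ZONES = 6
STATES_PER_ZONE = 2
LGAS_PER_STATE = 2
EAS_PER_LGA = 2
HOUSING_UNITS_PER_EA = 3

planned_states = ZONES * STATES_PER_ZONE
planned_lgas = planned_states * LGAS_PER_STATE
planned_eas = planned_lgas * EAS_PER_LGA
planned_housing_units = planned_eas * HOUSING_UNITS_PER_EA

print(planned_states, planned_lgas, planned_eas, planned_housing_units)
```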
Face-to-face [f2f]
The NASC fishery questionnaire was divided into the following sections:
- Holding identification: This is to identify the holder through HU serial number, HH serial number, and demographic characteristics.
- Type of fishing sites used by holder.
- Sources and quantities of fishing inputs.
- Quantity of aquatic production by type.
- Quantity sold and value of sale of aquatic products.
- Funds committed to fishing by source and others.
The data processing and analysis plan involved five main stages: training of data processing staff; manual editing and coding; development of the data entry programme; data entry and editing; and tabulation. Census and Survey Processing System (CSPro) software was used for data entry, Statistical Package for Social Sciences (SPSS) and CSPro for editing, and a combination of SPSS, Statistical Analysis Software (SAS) and Excel for table generation. The subject-matter specialists and computer personnel from the NBS and CBN implemented the data processing work. Tabulation plans were also developed by these officers for their areas and the topics covered in the three-survey system used for the exercise. Data editing was done in two phases. The first was manual editing before data entry, in which editors at the various zones manually edited the questionnaires to ensure the information on them was consistent. The second was computer editing, that is, cleaning of the data already entered. The completed questionnaires were collated and edited manually: (a) office editing and coding were done by the editor using visual control of the questionnaire before data entry; (b) CSPro was used to design the data entry template, provided as an external resource; (c) ten operators, two supervisors and two programmers were used; (d) ten machines were used for data entry; (e) after data entry, the data entry supervisor ran frequencies on each section to check that all the questionnaires had been entered.
Both Enumeration Area (EA) and Fish holders' level Response Rate was 100 per cent.
No computation of sampling error
The Quality Control measures were carried out during the survey, essentially to ensure quality of data
This collection contains individual-level and 1-percent national sample data from the 1960 Census of Population and Housing conducted by the Census Bureau. It consists of a representative sample of the records from the 1960 sample questionnaires. The data are stored in 30 separate files, containing in total over two million records, organized by state. Some files contain the sampled records of several states while other files contain all or part of the sample for a single state. There are two types of records stored in the data files: one for households and one for persons. Each household record is followed by a variable number of person records, one for each of the household members. Data items in this collection include the individual responses to the basic social, demographic, and economic questions asked of the population in the 1960 Census of Population and Housing. Data are provided on household characteristics and features such as the number of persons in household, number of rooms and bedrooms, and the availability of hot and cold piped water, flush toilet, bathtub or shower, sewage disposal, and plumbing facilities. Additional information is provided on tenure, gross rent, year the housing structure was built, and value and location of the structure, as well as the presence of air conditioners, radio, telephone, and television in the house, and ownership of an automobile. Other demographic variables provide information on age, sex, marital status, race, place of birth, nationality, education, occupation, employment status, income, and veteran status. The data files were obtained by ICPSR from the Center for Social Analysis, Columbia University. (Source: downloaded from ICPSR 7/13/10)
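The hierarchical layout described above (each household record followed by its person records) can be read with a simple grouping loop. The sketch below is illustrative only: it assumes a hypothetical record-type marker in the first character (`H` for household, `P` for person), which is not taken from the codebook; the real record layout should be read from the ICPSR documentation.

```python
from typing import Iterable, List, Tuple


def group_households(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Group person records under the household record that precedes them.

    Assumes (hypothetically) that column 1 holds 'H' or 'P'.
    """
    households: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line[0] == "H":
            # Start a new household with an empty list of persons.
            households.append((line, []))
        elif line[0] == "P":
            if not households:
                raise ValueError("person record found before any household record")
            households[-1][1].append(line)
        else:
            raise ValueError(f"unknown record type: {line[0]!r}")
    return households


# Tiny made-up example: two households with two and one persons.
sample_lines = ["H001", "P001A", "P001B", "H002", "P002A"]
grouped = group_households(sample_lines)
```

Attaching the household fields to each person record afterwards yields a rectangular, person-level file.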
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR07756.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
The 2000 Republic of Palau Census of Population and Housing was the second census collected and processed entirely by the republic itself. This monograph provides analyses of data from the most recent census of Palau for decision makers in the United States and Palau to understand current socioeconomic conditions. The 2005 Census of Population and Housing collected a wide range of information on the characteristics of the population, including demographics, educational attainment, employment status, fertility, housing characteristics and many others.
National
The 1990, 1995 and 2000 censuses were all modified de jure censuses, counting people and recording selected characteristics of each individual according to his or her usual place of residence as of census day. Data were collected for each enumeration district - the households and population in each enumerator assignment - and these enumeration districts were then collected into hamlets in Koror, and the 16 States of Palau.
Census/enumeration data [cen]
No sampling - whole universe covered
Face-to-face [f2f]
The 2000 census of Palau employed a modified list-enumerate procedure, also known as door-to-door enumeration. Beginning in mid-April 2000, enumerators visited each housing unit and conducted personal interviews, recording the information collected on the single questionnaire that contained all census questions. Follow-up enumerators visited all addresses for which questionnaires were missing to obtain the information required for the census.
The completed questionnaires were checked for completeness and consistency of responses, and then brought to OPS for processing. After checking in the questionnaires, OPS staff coded write-in responses (e.g., ethnicity or race, relationship, language). Then data entry clerks keyed all the questionnaire responses. The OPS brought the keyed data to the U.S. Census Bureau headquarters near Washington, DC, where OPS and Bureau staff edited the data using the Consistency and Correction (CONCOR) software package prior to generating tabulations using the Census Tabulation System (CENTS) package. Both packages were developed at the Census Bureau's International Programs Center (IPC) as part of the Integrated Microcomputer Processing System (IMPS).
The goal of census data processing is to produce a set of data that describes the population as clearly and accurately as possible. To meet this objective, crew leaders reviewed and edited questionnaires during field data collection to ensure consistency, completeness, and acceptability. Census clerks also reviewed questionnaires for omissions, certain inconsistencies, and population coverage. Census personnel conducted a telephone or personal visit follow-up to obtain missing information. The follow-ups considered potential coverage errors as well as questionnaires with omissions or inconsistencies beyond the completeness and quality tolerances specified in the review procedures.
Following field operations, census staff assigned remaining incomplete information and corrected inconsistent information on the questionnaires using imputation procedures during the final automated edit of the data. The use of allocations, or computer assignments of acceptable data, occurred most often when an entry for a given item was lacking or when the information reported for a person or housing unit on an item was inconsistent with other information for that same person or housing unit. In all of Palau’s censuses, the general procedure for changing unacceptable entries was to assign an entry for a person or housing unit that was consistent with entries for persons or housing units with similar characteristics. The assignment of acceptable data in place of blanks or unacceptable entries enhanced the usefulness of the data.
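One common way to assign an entry "consistent with entries for persons or housing units with similar characteristics" is a sequential hot deck, sketched below. The text does not say which allocation procedure was used, so this is an illustration of the general idea under assumed field names, not a description of the census edit.

```python
from typing import Any, Dict, Iterable, List, Tuple


def hot_deck_impute(
    records: Iterable[Dict[str, Any]],
    item: str,
    match_keys: Tuple[str, ...],
) -> List[Dict[str, Any]]:
    """Fill missing `item` values from the most recent donor in the same cell.

    A cell is defined by the values of `match_keys` (the "similar
    characteristics"). Records are copied, not modified in place.
    """
    last_donor: Dict[tuple, Any] = {}
    result: List[Dict[str, Any]] = []
    for original in records:
        rec = dict(original)
        cell = tuple(rec[k] for k in match_keys)
        if rec.get(item) is None:
            if cell in last_donor:
                rec[item] = last_donor[cell]
                rec[item + "_allocated"] = True  # flag the computer assignment
        else:
            last_donor[cell] = rec[item]  # this record becomes the donor
        result.append(rec)
    return result


# Made-up example records with hypothetical field names.
people = [
    {"sex": "F", "age_group": "30-39", "marital_status": "married"},
    {"sex": "F", "age_group": "30-39", "marital_status": None},
    {"sex": "M", "age_group": "20-29", "marital_status": None},
]
imputed = hot_deck_impute(people, "marital_status", ("sex", "age_group"))
```

Here the second record receives the value of the first, its donor in the same cell, while the third stays blank because no donor with matching characteristics has been seen yet.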
Human and machine-related errors occur in any large-scale statistical operation. Researchers generally refer to these problems as non-sampling errors. These errors include the failure to enumerate every household or every person in a population, failure to obtain all required information from residents, collection of incorrect or inconsistent information, and incorrect recording of information. In addition, errors can occur during the field review of the enumerators' work, during clerical handling of the census questionnaires, or during the electronic processing of the questionnaires. To reduce various types of non-sampling errors, Census office personnel used several techniques during planning, data collection, and data processing activities. Quality assurance methods were used throughout the data collection and processing phases of the census to improve the quality of the data.
Census staff implemented several coverage improvement programs during the development of census enumeration and processing strategies to minimize under-coverage of the population and housing units. A quality assurance program improved coverage in each census. Telephone and personal visit follow-ups also helped improve coverage. Computer and clerical edits emphasized improving the quality and consistency of the data. Local officials participated in post-census local reviews. Census enumerators conducted additional re-canvassing where appropriate.
This collection is a nationally representative--although clustered--1 in 1000 preliminary subsample of the United States population in 1880. The subsample is based on every tenth microfilm reel of enumeration forms (there are a total of 1,454 reels) and, within each reel, on the census page itself. In terms of the Public Use Sample as a whole, a sample density of 1 person per 100 was chosen so that a single sample point was randomly generated for every two census pages. Sample points were chosen for inclusion in the collection only if the individual selected was the first person listed in the dwelling. Under this procedure each dwelling, family, and individual in the population had a 1 in 100 probability of inclusion in the Public Use Sample.
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR at https://doi.org/10.3886/ICPSR09474.v1. We highly recommend using the ICPSR version as they may make this dataset available in multiple data formats in the future.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Table showing all variables, classifications and codes included within the Census 2021 microdata samples. This covers the secure, safeguarded and public samples.
The 2007/08 Agricultural Sample Census was designed to meet the data needs of a wide range of users down to district level, including policy makers at local, regional and national levels, rural development agencies, funding institutions, researchers, NGOs, farmers' organizations, and others. To meet user demand, the dataset has both a larger sample and a more detailed scope and coverage.
The census was carried out in order to:
-Provide benchmark data on productivity, production and agricultural practices in relation to policies and interventions promoted by the Ministry of Agriculture and Food Security and other stakeholders; and
Tanzania Mainland and Zanzibar
Community, Household, Individual
Small scale farmers, Large Scale Farmers, Community
Sample survey data [ssd]
The Mainland sample consisted of 3,192 villages. The total Mainland sample was 47,880 agricultural households while in Zanzibar, a total of 317 EAs were selected and 4,755 agricultural households were covered.
The villages were drawn from the National Master Sample (NMS) developed by the National Bureau of Statistics (NBS) to serve as a national framework for the conduct of household based surveys in the country. The National Master Sample was developed from the previous 2002 Population and Housing Census.
The numbers of villages/Enumeration Areas (EAs) were selected for the first stage with a probability proportional to the number of villages/EAs in each district. In the second stage, 15 households were selected from a list of agricultural households in each village/EA using systematic random sampling.
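The second-stage selection and the reported household totals can be illustrated as follows. This is a minimal sketch of systematic random sampling with a fractional interval and a random start; the function name and the made-up listing are assumptions, and the census's exact selection routine is not described in the text.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(frame: Sequence[T], n: int, rng: Optional[random.Random] = None) -> List[T]:
    """Select n units from an ordered list using a random start and fixed interval."""
    rng = rng or random.Random()
    size = len(frame)
    if n <= 0 or n > size:
        raise ValueError("n must be between 1 and the number of listed households")
    step = size / n                 # sampling interval (may be fractional)
    start = rng.random() * step     # random start within the first interval
    # Guard the index against floating-point edge cases at the end of the list.
    return [frame[min(int(start + i * step), size - 1)] for i in range(n)]


# Made-up listing of 120 agricultural households in one village/EA.
listing = list(range(120))
selected = systematic_sample(listing, 15, random.Random(0))

# Totals implied by 15 households per village/EA.
mainland_households = 3192 * 15
zanzibar_households = 317 * 15
```

With 15 households per village/EA, 3,192 Mainland villages give 47,880 households and 317 Zanzibar EAs give 4,755, matching the totals reported above.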
Face-to-face [f2f]
The census used three different questionnaires:
- Small scale farm questionnaire
- Community level questionnaire
- Large scale farm questionnaire
The small scale farm questionnaire was the main census instrument and it included questions related to crop and livestock production and practices; population demographics; access to services, community resources and infrastructure; issues on poverty and gender. The main topics covered were:
The community level questionnaire was designed to collect village level data such as access and use of common resources, community tree plantation and seasonal farm gate prices.
The Large Scale Farm questionnaire was administered to large farms either privately or corporately managed.
Data editing took place at a number of stages throughout the processing, including:
- Manual cleaning exercise prior to scanning. (Questionnaires found dirty or damaged and generally unsuitable for scanning were put aside for manual data entry.)
- CSPro was used for data entry of all Large Scale Farm and Community based questionnaires.
- Scanning and ICR data capture technology were used for the smallholder questionnaire.
- There was an interactive validation during the ICR extraction process.
- A batch validation program developed in CSPro was used to identify inconsistencies within a questionnaire.
- Statistical Package for Social Sciences (SPSS) was used to produce the Census tabulations.
- Microsoft Excel was used to organize the tables and charts and to compute additional indicators.
- ArcGIS (Geographical Information System) was used in producing the maps.
- Microsoft Word was used in compiling and writing up the reports.
analyze the current population survey (cps) annual social and economic supplement (asec) with r the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
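the scripts above do the replicate-weight work in r. purely to illustrate the arithmetic behind a replicate-weighted standard error, here is a minimal python sketch. it assumes the 4/R multiplier (with R = 160 replicates) that the census bureau's replicate weights usage instructions describe for the cps-asec; check that document before relying on it. the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of values using one set of person weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def replicate_se(full_estimate: float, replicate_estimates: Sequence[float], factor: float = 4.0) -> float:
    """Standard error from replicate estimates.

    variance = (factor / R) * sum((theta_r - theta_0)^2), where theta_0 is the
    full-sample estimate and theta_r the estimate under replicate weight r.
    The factor of 4 is an assumption taken from census documentation.
    """
    r = len(replicate_estimates)
    if r == 0:
        raise ValueError("need at least one replicate estimate")
    variance = (factor / r) * sum((t - full_estimate) ** 2 for t in replicate_estimates)
    return math.sqrt(variance)
```

in practice, `weighted_mean` would be computed once with the main person weight and once with each replicate weight, and the resulting estimates passed to `replicate_se`.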
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the american community survey (acs) with r and monetdb experimental. think of the american community survey (acs) as the united states' census for off-years - the ones that don't end in zero. every year, one percent of all americans respond, making it the largest complex sample administered by the u.s. government (the decennial census has a much broader reach, but since it attempts to contact 100% of the population, it's not a survey). the acs asks how people live and although the questionnaire only includes about three hundred questions on demography, income, insurance, it's often accurate at sub-state geographies and - depending how many years pooled - down to small counties. households are the sampling unit, and once a household gets selected for inclusion, all of its residents respond to the survey. this allows household-level data (like home ownership) to be collected more efficiently and lets researchers examine family structure. the census bureau runs and finances this behemoth, of course. the downloadable american community survey ships as two distinct household-level and person-level comma-separated value (.csv) files. merging the two just rectangulates the data, since each person in the person-file has exactly one matching record in the household-file. for analyses of small, smaller, and microscopic geographic areas, choose one-, three-, or five-year pooled files. use as few pooled years as you can, unless you like sentences that start with, "over the period of 2006 - 2010, the average american ... [insert yer findings here]." rather than processing the acs public use microdata sample line-by-line, the r language brazenly reads everything into memory by default. to prevent overloading your computer, dr. thomas lumley wrote the sqlsurvey package principally to deal with this ram-gobbling monster.
if you're already familiar with syntax used for the survey package, be patient and read the sqlsurvey examples carefully when something doesn't behave as you expect it to - some sqlsurvey commands require a different structure (i.e. svyby gets called through svymean) and others might not exist anytime soon (like svyolr). gimme some good news: sqlsurvey uses ultra-fast monetdb (click here for speed tests), so follow the monetdb installation instructions before running this acs code. monetdb imports, writes, recodes data slowly, but reads it hyper-fast. a magnificent trade-off: data exploration typically requires you to think, send an analysis command, think some more, send another query, repeat. importation scripts (especially the ones i've already written for you) can be left running overnight sans hand-holding. the acs weights generalize to the whole united states population including individuals living in group quarters, but non-residential respondents get an abridged questionnaire, so most (not all) analysts exclude records with a relp variable of 16 or 17 right off the bat. this new github repository contains four scripts:
2005-2011 - download all microdata.R
- create the batch (.bat) file needed to initiate the monet database in the future
- download, unzip, and import each file for every year and size specified by the user
- create and save household- and merged/person-level replicate weight complex sample designs
- create a well-documented block of code to re-initiate the monet db server in the future
fair warning: this full script takes a loooong time. run it friday afternoon, commune with nature for the weekend, and if you've got a fast processor and speedy internet connection, monday morning it should be ready for action. otherwise, either download only the years and sizes you need or - if you gotta have 'em all - run it, minimize it, and then don't disturb it for a week.
2011 single-year - analysis examples.R
- run the well-documented block of code to re-initiate the monetdb server
- load the r data file (.rda) containing the replicate weight designs for the single-year 2011 file
- perform the standard repertoire of analysis examples, only this time using sqlsurvey functions
2011 single-year - variable recode example.R
- run the well-documented block of code to re-initiate the monetdb server
- copy the single-year 2011 table to maintain the pristine original
- add a new age category variable by hand
- add a new age category variable systematically
- re-create then save the sqlsurvey replicate weight complex sample design on this new table
- close everything, then load everything back up in a fresh instance of r
- replicate a few of the census statistics. no muss, no fuss
replicate census estimates - 2011.R
- run the well-documented block of code to re-initiate the monetdb server
- load the r data file (.rda) containing the replicate weight designs for the single-year 2011 file
- match every nationwide statistic on the census bureau's estimates page, using sqlsurvey functions
click here to view these four scripts
for more detail about the american community survey (acs), visit: the us census...
The State Legislative District Summary File (Sample) (SLDSAMPLE) contains the sample data, which is the information compiled from the questions asked of a sample of all people and housing units. Population items include basic population totals; urban and rural; households and families; marital status; grandparents as caregivers; language and ability to speak English; ancestry; place of birth, citizenship status, and year of entry; migration; place of work; journey to work (commuting); school enrollment and educational attainment; veteran status; disability; employment status; industry, occupation, and class of worker; income; and poverty status. Housing items include basic housing totals; urban and rural; number of rooms; number of bedrooms; year moved into unit; household size and occupants per room; units in structure; year structure built; heating fuel; telephone service; plumbing and kitchen facilities; vehicles available; value of home; monthly rent; and shelter costs. The file contains subject content identical to that shown in Summary File 3 (SF 3).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The 2020 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2, ..., PPMF50) are a set of microdata files intended for use in estimating the magnitude of error(s) introduced by the 2020 Census Disclosure Avoidance System (DAS) into the 2020 Census Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File, and the Demographic Profile.
The PPMF0 was the source of the publicly released, official 2020 Census data products referenced above, and was created by executing the 2020 DAS TopDown Algorithm (TDA) using the confidential 2020 Census Edited File (CEF) as the initial input; the official location for the PPMF0 is on the United States Census Bureau FTP server, but we also include a copy of it here for convenience. The replicates were then created by executing the 2020 DAS TDA repeatedly with the PPMF0 as its initial input.
Inspired by analogy to the use of bootstrap methods in non-private contexts, U.S. Census Bureau (USCB) researchers explored whether simple calculations based on comparing each PPMFi to the PPMF0 could be used to reliably estimate the scale of errors introduced by the 2020 DAS, and generally found this approach worked well.
The PPMF0 and PPMFi files contained here are provided so that external researchers can estimate properties of DAS-introduced error without privileged access to internal USCB-curated data sets; further information on the estimation methodology can be found in Ashmead et al. 2024.
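As a rough illustration of the comparison idea described above, the sketch below treats the seed file's count for one tabulation cell as the reference and summarizes how far each replicate's count deviates from it. This is a deliberately simplified assumption, not the estimator documented in Ashmead et al. 2024; the counts are made up and the function name `replicate_rmse` is hypothetical.

```python
import math


def replicate_rmse(seed_count, replicate_counts):
    """Root mean squared difference between replicate counts and the seed count."""
    if not replicate_counts:
        raise ValueError("need at least one replicate")
    squared = [(c - seed_count) ** 2 for c in replicate_counts]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts for a single cell in PPMF0 and in five PPMFi replicates.
seed = 120
replicates = [118, 123, 120, 125, 116]
rmse = replicate_rmse(seed, replicates)
```

In practice the same summary would be repeated over many cells and geographies, using all fifty replicates rather than five.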
The 2020 DHC AMC seed PPMF0 and PPMF replicates have been cleared for public dissemination by the USCB Disclosure Review Board (CBDRB-FY22-DSEP-004). The PPMF0 and PPMF replicates contain all Person and Units attributes necessary to produce the 2020 Census Redistricting Data Summary File (P.L. 94-171), the Demographic and Housing Characteristics File, and the Demographic Profile for both the United States and Puerto Rico, and include geographic detail down to the Census Block level. They do not include attributes specific to either the Detailed DHC-A or Detailed DHC-B products; in particular, data on Major Race (e.g., White Alone) is included, but data on Detailed Race (e.g., Cambodian) is not included in the PPMF0 and replicates.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The main population base for published statistical tables from the 2011 Census in Northern Ireland is the usual resident population base as at Census day, 27 March 2011. By way of background, for 2011 Census purposes a usual resident of the United Kingdom (UK) is anyone who, on Census day, was in the UK and had stayed or intended to stay in the UK for a period of 12 months or more, or had a permanent UK address and was outside the UK and had intended to be outside the UK for less than 12 months.
Against this background, the 2011 Census Microdata Sample of Anonymised Records (SARs) Teaching File comprises a sample of 19,862 records (approximately 1 per cent) relating to people who were usually resident in Northern Ireland at the time of the 2011 Census. For each individual, information is available for seventeen separate characteristics (for example, sex, age, marital status) to varying degrees of detail. Both the size of the sample and the content of the records in the file have been harmonised, wherever possible, with the equivalent SARs teaching file that the Office for National Statistics simultaneously released for England and Wales.
The primary purpose of the teaching file, which comprises unit-record level data as opposed to statistical aggregates, is as an educational tool aimed at:
The 2009 Population and Housing Census was implemented according to Prime Ministerial Decision No. 94/2008/QD-TTg dated 10 July, 2008. This was the fourth population census and the third housing census implemented in Vietnam since the nation was reunified in 1975. The Census aimed to collect basic data on the population and housing for the entire territory of the Socialist Republic of Vietnam, to provide data for research and analysis of population and housing developments nationally and for each locality. It responded to information needs for assessing implementation of socio-economic development plans covering the period 2001 to 2010, for developing the socio-economic development plans for 2011 to 2020 and for monitoring performance on Millennium Development Goals of the United Nations to which the Vietnamese Government is committed.
National
Households, Individuals, Dwelling
The 2009 Population and Housing Census enumerated all Vietnamese regularly residing in the territory of the Socialist Republic of Vietnam at the reference point of 0:00 on 01 April, 2009; Vietnamese citizens given permission by the authorities to travel overseas and still within the authorized period; deaths (members of the household) that occurred between the first day of the Lunar Year of the Rat (07 February, 2008) to 31 March, 2009; and residential housing of the population.
Population and housing censuses were implemented simultaneously, taking the household as the survey unit. The household could include one individual who eats and resides alone or a group of individuals who eat and reside together. For households with two or more persons, the members may or may not share a common budget, and may or may not be related by blood, marriage or adoption, or by a combination of these. The household head was the main respondent. For information of which the head of household was unaware, the enumerator was required to directly interview the survey subject. For information on labour and employment, the enumerator was required to directly interview all respondents aged 15 and older; for questions on births, the enumerator was required to directly interview women of childbearing age (from 15 to 49 years of age) to determine the responses. For information on housing, the enumerator was required to directly survey the household head and/or combine this with direct observation to determine the information to record in the forms.
Census/enumeration data [cen]
Sample size
In the 2009 Population and Housing Census, besides a full enumeration, some indicators were collected in a sample survey. The census sample survey was designed to: (1) expand survey contents; (2) improve survey quality, especially for sensitive and complicated questions; and (3) save on survey costs. To improve the efficiency and reliability of the census sample data, the sample size was 15% of the total population of the country. The sample of the census is a single-stage cluster sample design with stratification and systematic sample selection. Sample selection is implemented in two steps: Step 1, select the strata to determine the sample size for each district. Step 2, independently and systematically select from the sample frame of enumeration areas in each district to determine the specific enumeration areas in the sample.
The sample size of the two census sample surveys in 1989 and 1999 was 5% and 3% respectively, only representative at the provincial level; sample survey indicators covered fertility history of women aged 15-49 years and deaths in the household in the previous 12 months. In the 2009 Census, besides the above two indicators, many other indicators were also included in the census sample survey. The census sample survey provides data representative at the district level. When determining sample size and allocation, the frequency of events was taken into account for various indicators including birth and deaths in the 12 months prior to the survey, and the number of people unemployed in urban areas, etc.; efforts were also made to ensure the ability to compare results between districts within the same province/municipality and between provinces/ municipalities.
Stratification and sample allocation across strata
To ensure representativeness of the sample for each district throughout the country and because the population size is not uniform across districts or provinces, the Central Steering Committee decided to allocate the sample directly to 682 out of 684 districts (excluding 2 island districts) throughout the country in 2 steps:
Step 1: Determine the sampling rate f(r) for 3 regions:
- Region 1: 132 urban districts;
- Region 2: 294 delta and coastal rural districts;
- Region 3: 256 mountainous and island districts.
Step 2: Allocate the sample across districts in each region based on the sampling rates for each region as determined in Step 1, using the inverse sampling allocation method. By applying this allocation method, the number of sampling units in each small district is increased enough to ensure representativeness. The formula used to calculate the sample rate for each district in each region is provided on page 22 of the Census Report (Part 1), provided as an external resource.
Sampling unit and method
The sampling unit is the enumeration area that was ascertained in the step to delimit enumeration areas. The sampling frame is the list of all enumeration areas that was made following the order of the list of administrative units at the commune level within each district. In this way, the whole country has 682 sample frames (682 strata).
The provincial steering committee was responsible for selecting sample enumeration areas using systematic random sampling as follows:
Step 1: Take the total of all enumeration areas in the district and divide it by the number of enumeration areas needed in the sample to determine the skip (k), which is calculated with precision up to 1 decimal point.
Step 2: Select the first enumeration area (b, with b = k), corresponding to the first enumeration area to be selected. Each successive enumeration area to be selected corresponds to the order number bi = b + i × k, where i = 1, 2, 3, …, stopping when the number of enumeration areas needed has been selected.
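The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Census procedure itself: the start b is taken as a random value no larger than the skip k (the conventional choice; the text above states "b = k"), fractional order numbers are rounded up to whole positions, and the function name `systematic_sample` is hypothetical.

```python
import math
import random


def systematic_sample(n_total, n_needed, start=None, rng=random):
    """Return 1-based order numbers of the enumeration areas selected."""
    if not 0 < n_needed <= n_total:
        raise ValueError("n_needed must be between 1 and n_total")
    k = round(n_total / n_needed, 1)  # skip, kept to one decimal place
    # Assumption: random start in (0, k]; pass start explicitly to fix it.
    b = start if start is not None else k * (1 - rng.random())
    if not 0 < b <= k:
        raise ValueError("start must satisfy 0 < start <= k")
    selected = []
    for i in range(n_needed):
        position = math.ceil(b + i * k)  # assumption: round fractions up
        if position > n_total:
            break  # rounding k upward can run past the end of the frame
        selected.append(position)
    return selected


# Hypothetical district: 100 enumeration areas, 10 needed, so k = 10.0.
areas = systematic_sample(100, 10, start=4.0)
```

Because k is at least 1, rounding up never selects the same enumeration area twice.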
Face-to-face [f2f]
The questionnaires and survey materials were designed and tested three times before final approval.
The 2009 Population and Housing Census applied Intelligent Character Recognition technology/scanning technology for direct data entry from census forms to the computer to replace the traditional keyboard data entry that is commonly used in Vietnam at present. This is an advanced technology, and the first time it had been applied in a statistical survey in Vietnam. Preparatory work had to be done carefully and meticulously. Through organization of many workshops and 7 pilot applications with technical and financial assistance from the UNFPA, the new technology was mastered, and the Census Steering Committee Standing Committee approved use of this technology to process the entire results of the 2009 Population and Housing Census. The Government decided to allocate funds through the project on Modernization of the General Statistics Office using World Bank Loan funds to procure the scanning system equipment, software and technical assistance. The successful use of this technology will create a precedent for continued use of scanning technology in other statistical surveys.
After checking and coding at the Provincial/municipal steering committee office, (both the complete census and the census sample survey), forms were checked and accepted then transferred for processing to one of three Statistical Computing Centres in Hanoi, Ho Chi Minh City and Da Nang. Data processing was implemented in only a few locations, following standard procedures and a fixed timeline. The steering committee at each level and processing centres fully implemented their assigned responsibilities, especially the checking, transmitting and maintenance of survey forms in good condition. The Central Steering Committee collaborated with the Statistical Computer Centres to set up a plan for processing and compiling results, setting up tabulation plans, interpreting and synthesizing output tables, and developing options for extrapolating from sample to population estimates.
The General Statistics Office completed the work of developing software applications and training using ReadSoft software (the one used in pilot testing), organized training on network management and training on systems and programs for logic checks and data editing, developed a data processing protocol, integrated these systems and completed data flow management programs. The General Statistics Office collaborated with the contractor, FPT, to develop software applications, train staff, test the system and complete the programs using the new TIS and E-form software.
Compilation of results was implemented in 2 stages. In stage 1 data were compiled from the Census Sample Survey by the end of October, 2009, and in stage 2, data were compiled from the completed census forms, with work finalized in May 2010.
Estimates from the Census sample survey were affected by two types of error: (1) non-sampling error, and (2) sampling error. Non-sampling error is the result of errors in implementation of data collection and processing such as visiting the
The 1961 Census Microdata Individual File for Great Britain: 5% Sample dataset was created from existing digital records from the 1961 Census under a project known as Enhancing and Enriching Historic Census Microdata Samples (EEHCM), which was funded by the Economic and Social Research Council with input from the Office for National Statistics and National Records of Scotland. The project ran from 2012-2014 and was led from the UK Data Archive, University of Essex, in collaboration with the Cathie Marsh Institute for Social Research (CMIST) at the University of Manchester and the Census Offices. In addition to the 1961 data, the team worked on files from the 1971 Census and 1981 Census.
The original 1961 records preceded current data archival standards and were created before microdata sets for secondary use were anticipated. A process of data recovery and quality checking was necessary to maximise their utility for current researchers, though some imperfections remain (see the User Guide for details). Three other 1961 Census datasets have been created:
Topics covered in the 2021 UK Census included:
The 2021 Census: Safeguarded Individual Microdata Sample at Grouped Local Authority Level dataset consists of a random sample of 5% of person records from the 2021 Census. It includes records for 3,021,611 persons. These data cover England and Wales only. The lowest level of geography is grouped local authority. This means groups of local authorities or single local authorities where the population reaches at least 120,000 persons. The dataset contains 87 variables and a low level of detail.
Census Microdata
Microdata are small samples of individual records from a single census from which identifying information has been removed. They contain a range of individual and household characteristics and can be used to carry out analysis not possible from standard census outputs, such as:
The microdata samples are designed to protect the confidentiality of individuals and households. This is done by applying access controls and removing information that might directly identify a person, such as names, addresses and date of birth. Record swapping is applied to the census data used to create the microdata samples. This is a statistical disclosure control (SDC) method, which makes very small changes to the data to prevent the identification of individuals. The microdata samples use further SDC methods, such as collapsing variables and restricting detail. The samples also include records that have been edited to prevent inconsistent data and contain imputed persons, households, and data values. To protect confidentiality, imputation flags are not included in any 2021 Census microdata sample.
IPUMS-International is an effort to inventory, preserve, harmonize, and disseminate census microdata from around the world. The project has collected the world's largest archive of publicly available census samples. The data are coded and documented consistently across countries and over time to facilitate comparative research. IPUMS-International makes these data available to qualified researchers free of charge through a web dissemination system.
The IPUMS project is a collaboration of the Minnesota Population Center, National Statistical Offices, and international data archives. Major funding is provided by the U.S. National Science Foundation and the Demographic and Behavioral Sciences Branch of the National Institute of Child Health and Human Development. Additional support is provided by the University of Minnesota Office of the Vice President for Research, the Minnesota Population Center, and Sun Microsystems.
National coverage
Dwelling
UNITS IDENTIFIED: - Dwellings: No - Households: Yes - Individuals: Yes - Group quarters: Yes
UNIT DESCRIPTIONS: - Group quarters: A collective household is a group of persons that does not live in an ordinary household, but lives in a collective establishment, sharing meal times.
Residents of France, of any nationality. Does not include French citizens living in other countries, foreign tourists, or people passing through.
Census/enumeration data [cen]
SAMPLE DESIGN: Systematic manual sorting into lots with different sample units according to target population. Lots divide the population into different samples (1/4 and 3/4). 1/20 sample is selected from 1/4 sample.
SAMPLE UNIT: Private dwellings and individuals for group quarters and compte a part
SAMPLE FRACTION: 5%
SAMPLE UNIVERSE: The microdata sample includes mainland France.
SAMPLE SIZE (person records): 2,631,713
Face-to-face [f2f]
Separate forms for buildings, group quarters (collective households), group quarters (compte a part), private households, and boats. Four forms for individuals (living in group quarters and private dwellings; two different forms for people compte a part; living in boats).
The UK censuses took place on 27 March 2011. They were run by the Northern Ireland Statistics & Research Agency (NISRA), National Records of Scotland (NRS), and the Office for National Statistics (ONS) for both England and Wales. The UK comprises the countries of England, Wales, Scotland and Northern Ireland.
Statistics from the UK censuses help paint a picture of the nation and how we live. They provide a detailed snapshot of the population and its characteristics and underpin funding allocation to provide public services. This is the home for all UK census data.
The 2011 Census Microdata Individual Safeguarded Sample (Local Authority): England and Wales data collection forms part of the statistical outputs from the 2011 UK Census. A safeguarded microdata sample of individuals has been identified as a key Census user requirement, and was highlighted as part of a report specifying microdata products from the 2011 Census written by an expert user, Dr. Jo Wathan from the University of Manchester.
The Jordan Population and Family Health Survey (JPFHS) is part of the worldwide Demographic and Health Surveys Program, which is designed to collect data on fertility, family planning, and maternal and child health.
The primary objective of the 2012 Jordan Population and Family Health Survey (JPFHS) is to provide reliable estimates of demographic parameters, such as fertility, mortality, family planning, and fertility preferences, as well as maternal and child health and nutrition, that can be used by program managers and policymakers to evaluate and improve existing programs. The JPFHS data will be useful to researchers and scholars interested in analyzing demographic trends in Jordan, as well as those conducting comparative, regional, or cross-national studies.
National coverage
Sample survey data [ssd]
Sample Design
The 2012 JPFHS sample was designed to produce reliable estimates of major survey variables for the country as a whole, urban and rural areas, each of the 12 governorates, and for the two special domains: the Badia areas and people living in refugee camps. To facilitate comparisons with previous surveys, the sample was also designed to produce estimates for the three regions (North, Central, and South). The grouping of the governorates into regions is as follows: the North consists of Irbid, Jarash, Ajloun, and Mafraq governorates; the Central region consists of Amman, Madaba, Balqa, and Zarqa governorates; and the South region consists of Karak, Tafiela, Ma'an, and Aqaba governorates.
The 2012 JPFHS sample was selected from the 2004 Jordan Population and Housing Census sampling frame. The frame excludes the population living in remote areas (most of whom are nomads), as well as those living in collective housing units such as hotels, hospitals, work camps, prisons, and the like. For the 2004 census, the country was subdivided into convenient area units called census blocks. For the purposes of the household surveys, the census blocks were regrouped to form a general statistical unit of moderate size (30 households or more), called a "cluster", which is widely used in surveys as a primary sampling unit (PSU).
Stratification was achieved by first separating each governorate into urban and rural areas and then, within each urban and rural area, by Badia areas, refugee camps, and other. A two-stage sampling procedure was employed. In the first stage, 806 clusters were selected with probability proportional to the cluster size, that is, the number of residential households counted in the 2004 census. A household listing operation was then carried out in all of the selected clusters, and the resulting lists of households served as the sampling frame for the selection of households in the second stage. In the second stage of selection, a fixed number of 20 households was selected in each cluster with an equal probability systematic selection. A subsample of two-thirds of the selected households was identified for anthropometry measurements.
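The two-stage procedure described above can be sketched in Python. This is a minimal illustration, assuming systematic selection along the cumulated cluster sizes in the first stage and a fractional-interval systematic draw in the second; the cluster sizes, listing length, and function names are hypothetical, and stratification is left out for brevity.

```python
import random


def pps_systematic(sizes, n_clusters, rng):
    """First stage: pick cluster indices with probability proportional to size."""
    interval = sum(sizes) / n_clusters
    start = rng.random() * interval  # random start in [0, interval)
    chosen, cumulative, idx = [], 0, 0
    for i in range(n_clusters):
        target = start + i * interval
        # Advance to the cluster whose cumulated size range contains the target.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        chosen.append(idx)  # a cluster larger than the interval can repeat
    return chosen


def select_households(n_listed, take, rng):
    """Second stage: equal-probability systematic draw of 0-based list positions."""
    if n_listed <= take:
        return list(range(n_listed))  # take every listed household
    interval = n_listed / take
    start = rng.random() * interval
    return [min(int(start + i * interval), n_listed - 1) for i in range(take)]


rng = random.Random(0)
sizes = [rng.randint(30, 200) for _ in range(50)]  # hypothetical household counts
clusters = pps_systematic(sizes, 5, rng)
households = select_households(45, 20, rng)  # hypothetical listing of 45 households

# Consistency check on the design: 806 clusters of 20 households each.
total_selected = 806 * 20
```

The product 806 × 20 = 16,120 matches the number of households reported as selected for the survey.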
Refer to Appendix A in the final report (Jordan Population and Family Health Survey 2012) for details of sampling weights calculation.
Face-to-face [f2f]
The 2012 JPFHS used two questionnaires, namely the Household Questionnaire and the Woman’s Questionnaire (see Appendix D). The Household Questionnaire was used to list all usual members of the sampled households, and visitors who slept in the household the night before the interview, and to obtain information on each household member’s age, sex, educational attainment, relationship to the head of the household, and marital status. In addition, questions were included on the socioeconomic characteristics of the household, such as source of water, sanitation facilities, and the availability of durable goods. Moreover, the questionnaire included questions about child discipline. The Household Questionnaire was also used to identify women who were eligible for the individual interview (ever-married women age 15-49 years). In addition, all women age 15-49 and children under age 5 living in the subsample of households were eligible for height and weight measurement and anemia testing.
The Woman’s Questionnaire was administered to ever-married women age 15-49 and collected information on the following topics:
• Respondent’s background characteristics
• Birth history
• Knowledge, attitudes, and practice of family planning and exposure to family planning messages
• Maternal health (antenatal, delivery, and postnatal care)
• Immunization and health of children under age 5
• Breastfeeding and infant feeding practices
• Marriage and husband’s background characteristics
• Fertility preferences
• Respondent’s employment
• Knowledge of AIDS and sexually transmitted infections (STIs)
• Other health issues specific to women
• Early childhood development
• Domestic violence
In addition, information on births, pregnancies, and contraceptive use and discontinuation during the five years prior to the survey was collected using a monthly calendar.
The Household and Woman’s Questionnaires were based on the model questionnaires developed by the MEASURE DHS program. Additions and modifications to the model questionnaires were made in order to provide detailed information specific to Jordan. The questionnaires were then translated into Arabic.
Anthropometric data were collected during the 2012 JPFHS in a subsample of two-thirds of the selected households in each cluster. All women age 15-49 and children age 0-4 in these households were measured for height using Shorr height boards and for weight using electronic Seca scales. In addition, a drop of capillary blood was taken from these women and children in the field to measure their hemoglobin level using the HemoCue system. Hemoglobin testing was used to estimate the prevalence of anemia.
Fieldwork and data processing activities overlapped. Data processing began two weeks after the start of the fieldwork. After field editing of questionnaires for completeness and consistency, the questionnaires for each cluster were packaged together and sent to the central office in Amman, where they were registered and stored. Special teams were formed to carry out office editing and coding of the open-ended questions.
Data entry and verification started after two weeks of office data processing. The process of data entry, including 100 percent reentry, editing, and cleaning, was done by using PCs and the CSPro (Census and Survey Processing) computer package, developed specially for such surveys. The CSPro program allows data to be edited while being entered. Data processing operations were completed by early January 2013. A data processing specialist from ICF International made a trip to Jordan in February 2013 to follow up on data editing and cleaning and to work on the tabulation of results for the survey preliminary report, which was published in March 2013. The tabulations for this report were completed in April 2013.
In all, 16,120 households were selected for the survey and, of these, 15,722 were found to be occupied households. Of these households, 15,190 (97 percent) were successfully interviewed.
In the households interviewed, 11,673 ever-married women age 15-49 were identified and interviews were completed with 11,352 women, or 97 percent of all eligible women.
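The response rates quoted above follow directly from the reported counts, as this short Python check shows. The variable names are illustrative only.

```python
# Counts reported for the 2012 JPFHS.
households_selected = 16120
households_occupied = 15722
households_interviewed = 15190
women_eligible = 11673
women_interviewed = 11352

# Household response rate: interviewed households over occupied households.
household_rate = households_interviewed / households_occupied

# Women's response rate: completed interviews over eligible women identified.
women_rate = women_interviewed / women_eligible

household_pct = round(100 * household_rate)
women_pct = round(100 * women_rate)
```

Both rates round to 97 percent, in line with the figures given in the text.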
The estimates from a sample survey are affected by two types of errors: (1) nonsampling errors and (2) sampling errors. Nonsampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2012 Jordan Population and Family Health Survey (JPFHS) to minimize this type of error, nonsampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2012 JPFHS is only one of many samples that could have been selected from the same population, using the same design and identical size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling error is a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
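A small Python sketch of the plus-or-minus two standard errors rule described above. The estimate and standard error used here are hypothetical, not taken from the survey, and the function name is illustrative.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) as estimate -/+ multiplier * standard error."""
    if standard_error < 0:
        raise ValueError("standard error cannot be negative")
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Hypothetical figures: a proportion of 0.35 with a standard error of 0.01.
lower, upper = confidence_interval(0.35, 0.01)
```

With these inputs the interval runs from roughly 0.33 to 0.37. The standard error itself must come from a formula that reflects the stratified multistage design.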
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2012 JPFHS sample is the result of a multistage stratified design, and, consequently, it was necessary to use more complex formulae. The computer
Censuses are principal means of collecting basic population and housing statistics required for social and economic development, policy interventions, their implementation and evaluation. The Post-Apartheid South African government has conducted three Censuses, in 1996, 2001 and 2011.
The South African Census 2011 has national coverage.
Households and individuals
The South African Census 2011 covered every person present in South Africa on Census Night, 9/10 October 2011, including all de jure household members and residents of institutions.
Census/enumeration data
The sampling frame for the PES was the complete list of Census 2011 EAs, amounting to 103 576 EAs. The primary sampling units (PSUs) were the Census EAs. The principle for selecting the PES sample is that sampled EAs should have well-defined boundaries that correspond with those of the Census EAs, to allow for item-by-item comparison between the Census and PES records. The stratification and sampling process allows for estimates at national, provincial, urban (geography type = urban) and non-urban (geography type = farm and traditional) levels, but estimates are only reliable at national and provincial levels. A sample of 600 EAs was selected and allocated to the provinces based on expected standard errors, which were in turn based on those obtained in PES 2001. Populations in institutions (other than Workers' Hostels), floating and homeless individuals were excluded from the PES sample.
The data files in the dataset include Household, Person, and Mortality files. The 10% sample for the Mortality data file was sampled separately and is not the same as the 10% sample for Household file and Person file.
Face-to-face
Three sets of questionnaires were developed for Census 2011:
1. Questionnaire A - the household questionnaire, administered to the population in a household set-up, including those households that were found within an institution, such as staff residences
2. Questionnaire B - administered to the population in transit (departing) and those on holiday on reference night (9/10 October 2011). The homeless were also enumerated using this set of questions
3. Questionnaire C - the institutions questionnaire, administered to the population in collective living quarters (people who spent census night 9/10 October 2011 at the institution)
A Post-Enumeration Survey was carried out after the census, which used a PES questionnaire.
Comparison of Census 2011 with previous Censuses requires alignment of the data to 2011 municipal boundaries. Questions on disability asked in former censuses were replaced in Census 2011 with general health and functioning questions. Because of misreporting on general health and functioning for children younger than five years, data for this variable are only profiled for persons five years and older.
The dataset does not have a code list for the “geotype” variable which has 3 values (1,2,3).
This study is an experiment designed to compare the performance of three methodologies for sampling households with migrants: a stratified random sample survey, an intercept survey, and a snowball sampling survey.
Researchers from the World Bank applied these methods in the context of a survey of Brazilians of Japanese descent (Nikkei), requested by the World Bank. There are approximately 1.2-1.9 million Nikkei among Brazil’s 170 million population.
The survey was designed to provide detail on the characteristics of households with and without migrants, to estimate the proportion of households receiving remittances and with migrants in Japan, and to examine the consequences of migration and remittances on the sending households.
The same questionnaire was used for the stratified random sample and snowball surveys, and a shorter version of the questionnaire was used for the intercept surveys. Researchers can directly compare answers to the same questions across survey methodologies and determine the extent to which the intercept and snowball surveys can give similar results to the more expensive census-based survey, and test for the presence of biases.
Sao Paulo and Parana states
Japanese-Brazilian (Nikkei) households and individuals
The 2000 Brazilian Census was used to classify households as Nikkei or non-Nikkei. The Brazilian Census does not ask ethnicity but instead asks questions on race, country of birth and whether an individual has lived elsewhere in the last 10 years. On the basis of these questions, a household is classified as (potentially) Nikkei if it has any of the following: 1) a member born in Japan; 2) a member who is of yellow race and who has lived in Japan in the last 10 years; 3) a member who is of yellow race, who was not born in a country other than Japan (predominantly Korea, Taiwan or China) and who did not live in a foreign country other than Japan in the last 10 years.
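The three classification criteria above can be written as a small rule. This is a minimal sketch and not the survey team's actual procedure: the `Member` record, its field names, and the string codes are hypothetical stand-ins for the census variables. Criterion 3 is read here as "yellow race, born in Brazil or Japan, and no residence in a foreign country other than Japan in the last 10 years", which is an interpretation of the wording above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Member:
    race: str                        # census race category, e.g. "yellow"
    birth_country: str               # e.g. "Brazil", "Japan", "Korea"
    lived_abroad_in: Optional[str]   # foreign country lived in during the last 10 years, or None


def member_is_potentially_nikkei(m: Member) -> bool:
    # Criterion 1: born in Japan.
    if m.birth_country == "Japan":
        return True
    # Criteria 2 and 3 both require the "yellow" race category.
    if m.race != "yellow":
        return False
    # Criterion 2: lived in Japan in the last 10 years.
    if m.lived_abroad_in == "Japan":
        return True
    # Criterion 3 (assumed reading): not born in, and has not lived in,
    # a foreign country other than Japan.
    born_elsewhere = m.birth_country != "Brazil"
    lived_elsewhere = m.lived_abroad_in is not None
    return not born_elsewhere and not lived_elsewhere


def household_is_potentially_nikkei(members: Iterable[Member]) -> bool:
    """A household qualifies if any member meets one of the criteria."""
    return any(member_is_potentially_nikkei(m) for m in members)


example = [Member("white", "Brazil", None), Member("yellow", "Brazil", None)]
print(household_is_potentially_nikkei(example))
```

Because the census does not ask about ethnicity directly, a rule like this only flags households as potentially Nikkei, which is why the door-to-door listing stage is still needed to confirm them.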
Sample survey data [ssd]
1) Stratified random sample survey
Two states with the largest Nikkei population - Sao Paulo and Parana - were chosen for the study.
The sampling process consisted of three stages. First, a stratified random sample of 75 census tracts was selected based on the 2000 Brazilian Census. Second, interviewers carried out a door-to-door listing within each census tract to determine which households had a Nikkei member. Third, the survey questionnaire was administered to households that were identified as Nikkei. The door-to-door listing of the 75 census tracts was carried out between October 13th, 2006, and October 29th, 2006. The fieldwork began on November 19th, 2006, and all dwellings were visited at least once by December 22nd, 2006. A second wave of surveying, intended to increase the number of households responding, took place from January 18th, 2007, to February 2nd, 2007.
2) Intercept survey
The intercept survey was designed to carry out interviews at a range of locations frequented by the Nikkei population. It was originally designed to be done in Sao Paulo city only, but a second intercept survey was later carried out in Curitiba, Parana. The Sao Paulo intercept survey took place between December 9th, 2006, and December 20th, 2006, whereas the Curitiba intercept survey took place between March 3rd and March 12th, 2007.
Consultations with Nikkei community organizations, local researchers and officers of the bank Sudameris, which provides remittance services to this community, were used to select a broad range of locations. Interviewers were assigned to visit each location during prespecified blocks of time. Two fieldworkers were assigned to each location. One fieldworker carried out the interviews, while the other counted the people of Nikkei appearance who appeared to be 18 years old or older and who passed by the location. For the fixed places, this count was made throughout the prespecified time block. For example, between 2.30 p.m. and 3.30 p.m. at the sports club, the fieldworker counted 57 adult Nikkeis. Refusals were carefully recorded, along with the sex and approximate age of the person refusing.
In all, 516 intercept interviews were collected.
3) Snowball sampling survey
The questionnaire that was used was the same as used for the stratified random sample. The plan was to begin with a seed list of 75 households, and to aim to reach a total sample of 300 households through referrals from the initial seed households. Each household surveyed was asked to supply the names of three contacts: (a) a Nikkei household with a member currently in Japan; (b) a Nikkei household with a member who has returned from Japan; (c) a Nikkei household without members in Japan and where individuals had not returned from Japan.
The first phase of the snowball survey took place from December 5th to December 20th, 2006. The second phase ran from January 22nd, 2007, to March 23rd, 2007. More associations were contacted to provide additional seed names (69 more names were obtained) and, as with the stratified sample, an adaptation of the intercept survey was used when individuals refused to answer the longer questionnaire. A decision was made to continue the snowball process until a target sample size of 100 had been achieved.
The final sample consists of 60 households who came as seed households from Japanese associations, and 40 households who were chain referrals. The longest chain achieved was three links.
Face-to-face [f2f]
1) Stratified sampling and snowball survey questionnaire
This questionnaire has 36 pages with over 1,000 variables, taking over an hour to complete.
If subjects refused to answer the questionnaire, interviewers would leave a much shorter version of the questionnaire for the household to complete on its own, and would pick it up later. This shorter questionnaire was the same as the one used in the intercept point survey, taking seven minutes on average. The intention with the shorter survey was to provide some data on households that would not answer the full survey because of time constraints, or because respondents were reluctant to have an interviewer in their house.
2) Intercept questionnaire
The questionnaire is four pages in length, consisting of 62 questions and taking a mean time of seven minutes to answer. Respondents had to be 18 years old or older to be interviewed.
1) Stratified random sampling
403 out of the 710 Nikkei households were surveyed, an interview rate of 57%. The refusal rate was 25%, whereas the remaining households were either absent on three attempts or were not surveyed because building managers refused permission to enter the apartment buildings. Refusal rates were higher in Sao Paulo than in Parana, reflecting greater concerns about crime and a busier urban environment.
2) Intercept interviews
516 intercept interviews were collected, along with 325 refusals. The average refusal rate is 39%, with location-specific refusal rates ranging from only 3% at the food festival to almost 66% at one of the two grocery stores.
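The interview and refusal rates above follow from the reported counts. In this minimal sketch the helper name `percentage` is hypothetical, and the intercept refusal rate is assumed to be refusals divided by all approaches (completed interviews plus refusals), which is the reading consistent with the 39% figure.

```python
def percentage(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Stratified random sample: households surveyed out of Nikkei households listed.
interview_rate = percentage(403, 710)

# Intercept survey: refusals out of all people approached (assumed denominator).
interviews, refusals = 516, 325
intercept_refusal_rate = percentage(refusals, interviews + refusals)

print(f"Stratified interview rate: {interview_rate:.1f}%")
print(f"Intercept refusal rate:    {intercept_refusal_rate:.1f}%")
```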