The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2023/24 GHS-Panel is the fifth round of the survey, with prior rounds conducted in 2010/11, 2012/13, 2015/16 and 2018/19. The GHS-Panel households were visited twice: during the post-planting period (July - September 2023) and during the post-harvest period (January - March 2024).
National
• Households • Individuals • Agricultural plots • Communities
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
The original GHS‑Panel sample was fully integrated with the 2010 GHS sample. The GHS sample consisted of 60 Primary Sampling Units (PSUs) or Enumeration Areas (EAs), chosen from each of the 36 states and the Federal Capital Territory (FCT), 37 state-level units in all. This resulted in a total of 2,220 EAs nationally. Each EA contributed 10 households to the GHS sample, resulting in a sample size of 22,200 households. Out of these 22,200 households, 5,000 households from 500 EAs were selected for the panel component, and 4,916 households completed their interviews in the first wave.
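The sample sizes quoted above follow from simple multiplication. A minimal Python sketch of that arithmetic, using only figures stated in the text (the variable names are illustrative and not part of the survey documentation):

```python
# Figures taken from the sampling description; names are illustrative only.
EAS_PER_STATE_UNIT = 60   # EAs (PSUs) chosen per state-level unit
STATE_UNITS = 37          # 36 states plus the Federal Capital Territory
HOUSEHOLDS_PER_EA = 10    # households contributed by each EA

ghs_eas = EAS_PER_STATE_UNIT * STATE_UNITS      # total EAs in the GHS sample
ghs_households = ghs_eas * HOUSEHOLDS_PER_EA    # total GHS households

PANEL_EAS = 500
panel_households = PANEL_EAS * HOUSEHOLDS_PER_EA

COMPLETED_WAVE1 = 4916
completion_rate = COMPLETED_WAVE1 / panel_households

print(ghs_eas, ghs_households, panel_households, round(completion_rate, 4))
```

The last figure is the share of selected panel households that completed a first-wave interview.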
After nearly a decade of visiting the same households, a partial refresh of the GHS‑Panel sample was implemented in Wave 4 and maintained for Wave 5. The refresh was conducted to maintain the integrity and representativeness of the sample. The refresh EAs were selected from the same sampling frame as the original GHS‑Panel sample in 2010. A listing of households was conducted in the 360 EAs, and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households.
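The household selection step described above (a listing in each EA followed by a random draw of 10 households) can be sketched as follows. This is an illustration only: the listing, the household IDs, and the rule for EAs with 10 or fewer listed households are assumptions, not details taken from the survey documentation.

```python
import random

def select_households(listing, n=10, seed=None):
    """Draw n households without replacement from one EA's listing.

    Assumption: if the listing has n or fewer households, all are taken.
    """
    rng = random.Random(seed)
    if len(listing) <= n:
        return list(listing)
    return rng.sample(listing, n)

# Hypothetical listing for one EA: 85 household IDs.
ea_listing = [f"HH{i:03d}" for i in range(1, 86)]
selected = select_households(ea_listing, n=10, seed=1)
print(selected)
```

Repeating the draw in each of 360 EAs gives the total of approximately 3,600 refresh households.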
In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS‑Panel households from 2010 was selected to be included in the new sample. This “long panel” sample of 1,590 households was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across Nigeria’s six geopolitical zones.
The combined sample of refresh and long panel EAs in Wave 5 that were eligible for inclusion consisted of 518 EAs based on the EAs selected in Wave 4. The combined sample generally maintains both the national and zonal representativeness of the original GHS‑Panel sample.
Although 518 EAs were identified for the post-planting visit, conflict events prevented interviewers from visiting eight EAs in the North West zone of the country. The EAs were located in the states of Zamfara, Katsina, Kebbi and Sokoto. Therefore, the final number of EAs visited both post-planting and post-harvest comprised 157 long panel EAs and 354 refresh EAs. The combined sample is also roughly equally distributed across the six geopolitical zones.
Computer Assisted Personal Interview [capi]
The GHS-Panel Wave 5 consisted of three questionnaires for each of the two visits. The Household Questionnaire was administered to all households in the sample. The Agriculture Questionnaire was administered to all households engaged in agricultural activities such as crop farming, livestock rearing, and other agricultural and related activities. The Community Questionnaire was administered to the community to collect information on the socio-economic indicators of the enumeration areas where the sample households reside.
GHS-Panel Household Questionnaire: The Household Questionnaire provided information on demographics; education; health; labour; childcare; early child development; food and non-food expenditure; household nonfarm enterprises; food security and shocks; safety nets; housing conditions; assets; information and communication technology; economic shocks; and other sources of household income. Household location was geo-referenced in order to be able to later link the GHS-Panel data to other available geographic data sets (forthcoming).
GHS-Panel Agriculture Questionnaire: The Agriculture Questionnaire solicited information on land ownership and use; farm labour; inputs use; GPS land area measurement and coordinates of household plots; agricultural capital; irrigation; crop harvest and utilization; animal holdings and costs; household fishing activities; and digital farming information. Some information is collected at the crop level to allow for detailed analysis for individual crops.
GHS-Panel Community Questionnaire: The Community Questionnaire solicited information on access to infrastructure and transportation; community organizations; resource management; changes in the community; key events; community needs, actions, and achievements; social norms; and local retail price information.
The Household Questionnaire was slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.
The Agriculture Questionnaire collected different information during each visit, but for the same plots and crops.
The Community Questionnaire collected prices during both visits, and different community level information during the two visits.
CAPI: The Wave 5 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community questionnaires) were implemented in both the post-planting and post-harvest visits of Wave 5 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Living Standards Measurement Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given a tablet which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.
DATA COMMUNICATION SYSTEM: The data communication system used in Wave 5 was highly automated. Each field team was given a mobile modem which allowed for internet connectivity and daily synchronization of their tablets. This ensured that the head office in Abuja had access to the data in real time. Once an interview was completed and uploaded to the server, the data was first reviewed by the Data Editors. The data was also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated each time the Stata do-file was run on the raw dataset. The information contained in the Excel error files was then communicated back to the respective field interviewers for their action. This monitoring activity was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.
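The daily error-report loop described above can be sketched in Python. The survey itself used a Stata do-file and Excel error files; the code below is a stand-in that writes CSV text, and the variable names and the two checks are hypothetical.

```python
import csv
import io

# Hypothetical records; variable names are invented for illustration.
records = [
    {"hhid": "10001", "interviewer": "A1", "hh_size": 5, "members_listed": 5},
    {"hhid": "10002", "interviewer": "A1", "hh_size": 4, "members_listed": 6},
    {"hhid": "10003", "interviewer": "B2", "hh_size": 0, "members_listed": 0},
]

def check_record(rec):
    """Return a list of error messages for one household record."""
    errors = []
    if rec["hh_size"] < 1:
        errors.append("household size must be at least 1")
    if rec["hh_size"] != rec["members_listed"]:
        errors.append("household size does not match roster count")
    return errors

def build_error_report(recs):
    """Collect one row per error, tagged with the interviewer for follow-up."""
    rows = []
    for rec in recs:
        for msg in check_record(rec):
            rows.append({"interviewer": rec["interviewer"],
                         "hhid": rec["hhid"],
                         "error": msg})
    return rows

report = build_error_report(records)

# Write the report as CSV text (a stand-in for the Excel error file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["interviewer", "hhid", "error"])
writer.writeheader()
writer.writerows(report)
print(buffer.getvalue())
```

Keeping the interviewer code on every row makes it straightforward to route each error back to the team that collected the record.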
DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection and designed to highlight many of the errors that occurred during the fieldwork.
The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once an interview is completed and uploaded to the server, the Data Editors review the completed interview for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer’s tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample, and where there were differences, these were properly assessed and documented. The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning was then approved by the Data Editor. After the Data Editor’s approval of the interview on the Survey Solutions server, Headquarters also reviews it and, depending on the outcome, can either approve or reject it.
The third stage of cleaning involved a comprehensive review of the final raw data following the first and second stage cleaning. Every variable was examined individually for (1) consistency with other sections and variables, (2) out of range responses, and (3) outliers. However, special care was taken to avoid making strong assumptions when resolving potential errors. Some minor errors remain in the data where the diagnosis and/or solution were unclear to the data cleaning team.
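Two of the checks named above, out-of-range responses and outliers, can be sketched as follows. The permitted age range, the sample values, and the use of Tukey's 1.5 × IQR rule are assumptions made for illustration; the documentation does not state which outlier rule the cleaning team applied.

```python
import statistics

def out_of_range(values, low, high):
    """Return the values that fall outside the closed interval [low, high]."""
    return [v for v in values if v < low or v > high]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical values for two variables.
ages = [34, 7, 61, 130, -1, 45]
plot_areas_ha = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 25.0]

bad_ages = out_of_range(ages, 0, 120)        # assumed valid range: 0 to 120
suspect_areas = iqr_outliers(plot_areas_ha)  # flagged for review, not deleted

print(bad_ages, suspect_areas)
```

Flagged values are candidates for review rather than automatic correction, in line with the caution about strong assumptions noted above.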
Response
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the area resource file (arf) with r
the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.
this new github repository contains two scripts:
2011-2012 arf - download.R
• download the zipped area resource file directly onto your local computer
• load the entire table into a temporary sql database
• save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta).
2011-2012 arf - analysis examples.R
• limit the arf to the variables necessary for your analysis
• sum up a few county-level statistics
• merge the arf onto other data sets, using both fips and ssa county codes
• create a sweet county-level map
click here to view these two scripts
for more detail about the area resource file (arf), visit:
• the arf home page
• the hrsa data warehouse
notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data. confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
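the merge on fips county codes mentioned above can be sketched in a few lines. this is a python illustration, not the repository's r code, and the rows and the `example_stat` column are made up. the one firm fact it relies on is that a county fips code is five characters: a two-digit state code followed by a three-digit county code, both zero-padded.

```python
def fips5(state_code, county_code):
    """Build a 5-character county FIPS key: 2-digit state + 3-digit county."""
    return f"{int(state_code):02d}{int(county_code):03d}"

# Hypothetical county-level rows standing in for a trimmed-down arf extract.
arf_rows = [
    {"fips": "01001", "county_name": "Autauga", "example_stat": 10},
    {"fips": "06037", "county_name": "Los Angeles", "example_stat": 20},
]

# Hypothetical survey records that store state and county codes as integers.
survey_rows = [
    {"person_id": 1, "state": 1, "county": 1},
    {"person_id": 2, "state": 6, "county": 37},
    {"person_id": 3, "state": 56, "county": 45},
]

arf_by_fips = {row["fips"]: row for row in arf_rows}

merged = []
for rec in survey_rows:
    key = fips5(rec["state"], rec["county"])
    county = arf_by_fips.get(key)  # None when the county is absent (left join)
    merged.append({**rec,
                   "fips": key,
                   "example_stat": county["example_stat"] if county else None})

for row in merged:
    print(row)
```

zero-padding matters because codes stored as integers lose their leading zeros, which silently breaks the match for states with codes below 10.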
Introduction. This document provides an overview of an archive composed of four sections.
[1] An introduction (this document) which describes the scope of the project
[2] Yearly folders, from 2002 until 2010, of the coarse Microsoft Access datasets + the surveys used to collect information for each year. The word coarse does not mean the information in the Microsoft Access dataset was not corrected for mistakes; it was, but some mistakes and inconsistencies remain, such as with data on age or education. Furthermore, the coarse dataset provides disaggregated information for selected topics, which appear only as summary statistics in the clean dataset. For example, in the coarse dataset one can find the different illnesses afflicting a person during the past 14 days whereas in the clean dataset only the total number of illnesses appears.
[3] A letter from the Gran Consejo Tsimane’ authorizing the public use of de-identified data collected in our studies among Tsimane’.
[4] A Microsoft Excel document with the unique identification number for each person in the panel study.
Background. During 2002-2010, a team of international researchers, surveyors, and translators gathered longitudinal (panel) data on the demography, economy, social relations, health, nutritional status, local ecological knowledge, and emotions of about 1400 native Amazonians known as Tsimane’ who lived in thirteen villages near and far from towns in the department of Beni in the Bolivian Amazon. A report titled “Too little, too late” summarizes selected findings from the study and is available to the public at the electronic library of Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926194001921
A copy of the clean, merged, and appended Stata (V17) dataset is available to the public at the following two web addresses:
[a] Brandeis University:
https://scholarworks.brandeis.edu/permalink/01BRAND_INST/1bo2f6t/alma9923926193901921
[b] Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan (only available to users affiliated with institutions belonging to ICPSR)
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/37671/utilization
Chapter 4 of the report “Too little, too late” mentioned above describes the motivation and history of the study, the difference between the coarse and clean datasets, and topics which can be examined only with coarse data.
Aims. The aims of this archive are to:
· Make available in Microsoft Access the coarse de-identified dataset [1] for each of the seven yearly surveys (2004-2010) and [2] one Access dataset based on quarterly surveys done during 2002 and 2003. Together, these two datasets form one longitudinal dataset of individuals, households, and villages.
· Provide guidance on how to link files within and across years, and
· Make available a Microsoft Excel file with a unique identification number to link individuals across years
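Linking individuals across years with a unique identification number, as the aims above describe, can be sketched as follows. The column name `person_id`, the IDs, and the values are hypothetical; the actual identifier is the one supplied in the Microsoft Excel file.

```python
from collections import defaultdict

# Hypothetical yearly extracts: each row carries the person's unique ID.
yearly_rows = {
    2004: [{"person_id": "P001", "age": 30}, {"person_id": "P002", "age": 12}],
    2005: [{"person_id": "P001", "age": 31}],
    2006: [{"person_id": "P001", "age": 32}, {"person_id": "P002", "age": 14}],
}

def build_panel(rows_by_year):
    """Group yearly rows by person ID, giving {person_id: {year: row}}."""
    panel = defaultdict(dict)
    for year, rows in rows_by_year.items():
        for row in rows:
            panel[row["person_id"]][year] = row
    return dict(panel)

panel = build_panel(yearly_rows)

# Years in which each person was observed; gaps show up as missing years.
years_seen = {pid: sorted(obs) for pid, obs in panel.items()}
print(years_seen)
```

A structure keyed by person and then by year makes gaps in observation (such as P002 missing in 2005) easy to spot before any longitudinal analysis.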
The datasets in the archive.
· Eight Microsoft Access datasets with data on a wide range of variables. Except for the Access file for 2002-2003, the information in each Access file refers to one year. Within any Access dataset, users will find two types of files:
o Thematic files. The name of a thematic file contains the prefix tbl (e.g., 29_tbl_Demography or tbl_29_Demography). The file name (sometimes in Spanish, sometimes in English) indicates the content of the file. For example, in the Access dataset for one year, the micro file tbl_30_Ventas has all the information on sales for that year. Within each micro file, columns contain information on a variable and the name of the column indicates the content of the variable. For instance, the column heading item in the Sales file would indicate the type of good sold. The exac…
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Background: In 1986, the Congress enacted Public Laws 99-500 and 99-591, requiring a biennial report on the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). In response to these requirements, FNS developed a prototype system that allowed for the routine acquisition of information on WIC participants from WIC State Agencies. Since 1992, State Agencies have provided electronic copies of these data to FNS on a biennial basis.
FNS and the National WIC Association (formerly National Association of WIC Directors) agreed on a set of data elements for the transfer of information. In addition, FNS established a minimum standard dataset for reporting participation data. For each biennial reporting cycle, each State Agency is required to submit a participant-level dataset containing standardized information on persons enrolled at local agencies for the reference month of April. The 2020 Participant and Program Characteristics (PC2020) is the 17th to be completed using the prototype PC reporting system. In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).
Processing methods and equipment used: Specifications on formats (“Guidance for States Providing Participant Data”) were provided to all State agencies in January 2020. This guide specified 20 minimum dataset (MDS) elements and 11 supplemental dataset (SDS) elements to be reported on each WIC participant. Each State Agency was required to submit all 20 MDS items and any SDS items collected by the State agency.
Study date(s) and duration: The information for each participant was from the participants’ most current WIC certification as of April 2020.
Study spatial scale (size of replicates and spatial scale of study area): In April 2020, there were 89 State agencies: the 50 States, American Samoa, the District of Columbia, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and 33 Indian Tribal Organizations (ITOs).
Level of true replication: Unknown
Sampling precision (within-replicate sampling or pseudoreplication):
State Agency Data Submissions. PC2020 is a participant dataset consisting of 7,036,867 active records. The records, submitted to USDA by the State Agencies, comprise a census of all WIC enrollees, so there is no sampling involved in the collection of this data.
PII Analytic Datasets. State agency files were combined to create a national census participant file of approximately 7 million records. The census dataset contains potentially personally identifiable information (PII) and is therefore not made available to the public.
National Sample Dataset. The public use SAS analytic dataset made available to the public has been constructed from a nationally representative sample drawn from the census of WIC participants, selected by participant category. The national sample consists of 1 percent of the total number of participants, or 70,368 records. The distribution by category is 5,469 pregnant women, 6,131 breastfeeding women, 4,373 postpartum women, 16,817 infants, and 37,578 children.
Level of subsampling (number and repeat or within-replicate sampling): The proportionate (or self-weighting) sample was drawn by WIC participant category: pregnant women, breastfeeding women, postpartum women, infants, and children. In this type of sample design, each WIC participant has the same probability of selection across all strata. Sampling weights are not needed when the data are analyzed. In a proportionate stratified sample, the largest stratum accounts for the highest percentage of the analytic sample.
Study design (before–after, control–impacts, time series, before–after-control–impacts): None – Non-experimental
Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains all MDS and SDS information submitted by the State agency on the sampled WIC participant. In addition, the file contains constructed variables used for analytic purposes. To protect individual privacy, the public use file does not include State agency, local agency, or case identification numbers.
Description of any gaps in the data or other limiting factors: All State agencies provided data on a census of their WIC participants.
Resources in this dataset:
Resource Title: WIC PC 2020 National Sample File Public Use Codebook; File Name: PC2020 National Sample File Public Use Codebook.docx; Resource Description: WIC PC 2020 National Sample File Public Use Codebook
Resource Title: WIC PC 2020 Public Use CSV Data; File Name: wicpc2020_public_use.csv; Resource Description: WIC PC 2020 Public Use CSV Data
Resource Title: WIC PC 2020 Data Set SAS, R, SPSS, Stata; File Name: PC2020 Ag Data Commons.zip; Resource Description: WIC PC 2020 Data Set SAS, R, SPSS, Stata. One dataset in multiple formats
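The proportionate (self-weighting) design described above can be illustrated with a short Python sketch. The published category counts are taken from the text; the toy population sizes and the function name are invented for illustration, and the code is not the procedure FNS actually used to draw the sample.

```python
import random

# Published sample counts by participant category, as stated in the text.
published = {"pregnant": 5469, "breastfeeding": 6131, "postpartum": 4373,
             "infants": 16817, "children": 37578}
published_total = sum(published.values())

def proportionate_sample(population, stratum_of, fraction, seed=0):
    """Sample the same fraction from every stratum (self-weighting design)."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name in sorted(strata):
        units = strata[name]
        n = round(len(units) * fraction)  # same sampling fraction everywhere
        sample.extend(rng.sample(units, n))
    return sample

# Toy population; these stratum sizes are made up, not WIC figures.
toy_sizes = {"pregnant": 800, "breastfeeding": 900, "postpartum": 600,
             "infants": 2400, "children": 5300}
population = [(cat, i) for cat, size in toy_sizes.items() for i in range(size)]

sample = proportionate_sample(population, lambda unit: unit[0], fraction=0.01)
counts = {cat: sum(1 for unit in sample if unit[0] == cat) for cat in toy_sizes}

print(published_total, counts)
```

Because every stratum is sampled at the same rate, each stratum's share of the sample matches its share of the population, which is why unweighted estimates remain valid.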
The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas that were originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations targeting the 2,508 households that were interviewed in 2016. The panel sample expanded each wave through the tracking of split-off individuals and the new households that they formed. Available as part of this project is the IHPS 2019 data, the IHPS 2016 data as well as the rereleased IHPS 2010 & 2013 data including only the subsample of 102 EAs with updated panel weights. Additionally, the IHPS 2016 was the first survey that received complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which has been established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices. 
The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data, and is, at start, a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship – following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.
National coverage
The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas, as well as individuals who moved away from the 2013 dwellings between 2013 and 2016, as long as they (i) were neither servants nor guests at the time of the IHPS 2013, (ii) were projected to be at least 12 years of age, and (iii) were known to be residing in mainland Malawi, excluding those on Likoma Island and in institutions, including prisons, police compounds, and army barracks.
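The tracking rule above combines several conditions that must all hold. A minimal Python sketch of the rule as a single predicate (the class and field names are hypothetical and do not correspond to variables in the released data):

```python
from dataclasses import dataclass

@dataclass
class Person:
    # All field names are illustrative, not variables from the IHPS files.
    was_servant_or_guest_2013: bool
    projected_age: int
    in_mainland_malawi: bool
    on_likoma_island: bool
    in_institution: bool  # e.g. prison, police compound, army barracks

def eligible_for_tracking(p: Person) -> bool:
    """Return True only if every tracking condition in the text is met."""
    return (not p.was_servant_or_guest_2013
            and p.projected_age >= 12
            and p.in_mainland_malawi
            and not p.on_likoma_island
            and not p.in_institution)

mover = Person(False, 15, True, False, False)
too_young = Person(False, 11, True, False, False)
in_barracks = Person(False, 30, True, False, True)

print(eligible_for_tracking(mover),
      eligible_for_tracking(too_young),
      eligible_for_tracking(in_barracks))
```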
Sample survey data [ssd]
A sub-sample of IHS3 2010 sample enumeration areas (EAs) (i.e. 204 EAs out of 768 EAs) was selected prior to the start of the IHS3 field work with the intention to (i) track and resurvey these households in 2013 in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013) and (ii) visit a total of 3,246 households in these EAs twice to reduce recall associated with different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional, urban/rural levels and for each of the following 6 strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place during the period of April-October 2013, with residual tracking operations in November-December 2013.
Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.
Computer Assisted Personal Interview [capi]
Data Entry Platform: To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out IHPS 2019, 1 laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8–inch GPS-enabled Lenovo tablet computer that the NSO provided. The use of Survey Solutions allowed for the real-time availability of data, as completed interviews were approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from that tablet complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables - extensive measures of distance (e.g. distance to the nearest market), climatology, soil and terrain, and other environmental factors - in the analysis.
Data Management: The IHPS 2019 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHPS 2019 interviews were mainly collected in “sample” mode (assignments generated from headquarters) and a few in “census” mode (new interviews created by interviewers from a template) for the NSO to have more control over the sample. This hybrid approach was necessary to aid the tracking operations whereby an enumerator could quickly create a tracking assignment considering that they were mostly working in areas with poor network connection and hence could not quickly receive tracking cases from Headquarters.
The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013 and IHPS 2016. Prior programming of the data entry application allowed for a wide variety of range and consistency checks to be conducted and reported and potential issues investigated and corrected before closing the assigned enumeration area. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHPS 2019 server. Because the data was available in real time, it was monitored closely throughout the entire data collection period, and upon receipt of the data at headquarters, data was exported to Stata for other consistency checks, data cleaning, and analysis.
Data Cleaning: The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage of data cleaning was conducted in the field by the field teams utilizing error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For questions that flagged an error, the enumerators were expected to record a comment within the questionnaire to explain to their supervisor the reason for the error and to confirm that they had double checked the response with the respondent. The supervisors were expected to sync the enumerator tablets as frequently as possible to avoid having many questionnaires on the tablet, and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets, so they would review prior to syncing but still record the notes in the supervisor account and reject questionnaires accordingly. The second stage of data cleaning was also done in the field, and this resulted from the additional error reports generated in Stata, which were in turn sent to the field teams via email or DropBox. The field supervisors collected reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected errors. Due to the quick turn-around in error reporting, it was possible to conduct call-backs while the team was still operating in the EA when required. Corrections to the data were entered in the rejected questionnaires and sent back to headquarters.
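The first-stage rule described above (a flagged response may stand only if the enumerator records an explanatory comment) can be sketched in Python. This is an illustration of the logic only, not Survey Solutions syntax; the range rule, the status labels, and the sample answers are hypothetical.

```python
def classify_response(value, low, high, comment=""):
    """Classify one answer against a soft range rule.

    Returns "ok" if the value is in range, "explained" if it is out of
    range but the enumerator left a comment, and "needs_comment" otherwise.
    """
    if low <= value <= high:
        return "ok"
    if comment.strip():
        return "explained"
    return "needs_comment"

# Hypothetical rule: reported household size between 1 and 30.
answers = [
    {"interview": "A", "value": 6, "comment": ""},
    {"interview": "B", "value": 42, "comment": "Confirmed with respondent."},
    {"interview": "C", "value": 0, "comment": ""},
]

statuses = {a["interview"]: classify_response(a["value"], 1, 30, a["comment"])
            for a in answers}

# Interviews a supervisor would send back for follow-up under this rule.
to_reject = [k for k, status in statuses.items() if status == "needs_comment"]
print(statuses, to_reject)
```

Treating the rule as soft rather than blocking lets genuinely unusual answers through while still leaving a record for the supervisor to review.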
The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, error messages were displayed immediately, as enumerators asked the questions and recorded information, whenever the recorded information did not match previously defined rules for that variable; for example, if the education level for a 12-year-old respondent was given as postgraduate. The second stage occurred during the review of the questionnaire by the Field Supervisor. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction. The enumerator can write a comment to explain why the data appears to be incorrect; for example, if the previously mentioned 12-year-old was, in fact, a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters where the NSO staff would again review the data for errors and verify the comments from the
The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2018/19 GHS-Panel is the fourth round of the survey, with prior rounds conducted in 2010/11, 2012/13, and 2015/16. GHS-Panel households were visited twice: first after the planting season (post-planting) between July and September 2018 and second after the harvest season (post-harvest) between January and February 2019.
National. The survey covered all 36 states and the Federal Capital Territory (FCT).
Households, Individuals, Agricultural plots, Communities
Sample survey data [ssd]
The original GHS-Panel sample of 5,000 households across 500 enumeration areas (EAs) was designed to be representative at the national level as well as at the zonal level. The complete sampling information for the GHS-Panel is described in the Basic Information Document for GHS-Panel 2010/2011. However, after nearly a decade of visiting the same households, a partial refresh of the GHS-Panel sample was implemented in Wave 4. For the partial refresh of the sample, a new set of 360 EAs was randomly selected, consisting of 60 EAs per zone. The refresh EAs were selected from the same sampling frame as the original GHS-Panel sample in 2010 (the "master frame").
A listing of all households was conducted in the 360 EAs and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households. In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS-Panel households from 2010 was selected to be included in the new sample. This "long panel" sample was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across the 6 geopolitical Zones. The systematic selection ensured that the distribution of EAs across the 6 Zones (and urban and rural areas within) is proportional to the original GHS-Panel sample.
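One common way to carry out a systematic selection like the one described above is to sort the frame by stratum and then take units at a fixed interval from a random start. The sketch below assumes that approach; the documentation does not spell out the exact procedure, and the toy frame of 500 EAs with equal zone sizes is invented for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units from an ordered frame: random start, fixed interval."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    interval = len(frame) / n            # fractional interval, >= 1
    start = rng.random() * interval      # random start in [0, interval)
    last = len(frame) - 1
    return [frame[min(int(start + i * interval), last)] for i in range(n)]

# Toy frame: 500 EAs ordered by zone (zone sizes here are made up).
zones = ["NC", "NE", "NW", "SE", "SS", "SW"]
frame = [(zone, k) for zone in zones for k in range(84)][:500]

long_panel = systematic_sample(frame, n=159, seed=7)
share = {zone: sum(1 for ea in long_panel if ea[0] == zone) for zone in zones}
print(len(long_panel), share)
```

Because the frame is sorted by zone before selection, each zone's share of selected EAs stays close to its share of the frame, which is the proportionality property the text describes.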
Interviewers attempted to interview all households that originally resided in the 159 EAs and were successfully interviewed in the previous visit in 2016. This includes households that had moved away from their original location in 2010. In all, interviewers attempted to interview 1,507 households from the original panel sample. The combined sample of refresh and long panel EAs consisted of 519 EAs. The total number of households that were successfully interviewed in both visits was 4,976.
While the combined sample generally maintains both national and Zonal representativeness of the original GHS-Panel sample, the security situation in the North East of Nigeria prevented full coverage of the Zone. Due to security concerns, rural areas of Borno state were fully excluded from the refresh sample and some inaccessible urban areas were also excluded. Security concerns also prevented interviewers from visiting some communities in other parts of the country where conflict events were occurring. Refresh EAs that could not be accessed were replaced with another randomly selected EA in the Zone so as not to compromise the sample size. As a result, the combined sample is representative of areas of Nigeria that were accessible during 2018/19. The sample will not reflect conditions in areas that were undergoing conflict during that period. This compromise was necessary to ensure the safety of interviewers.
Computer Assisted Personal Interview [capi]
CAPI: For the first time in the GHS-Panel, the Wave 4 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community) were implemented in both the post-planting and post-harvest visits of Wave 4 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Survey Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given a tablet, which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.
DATA COMMUNICATION SYSTEM: The data communication system used in Wave 4 was highly automated. Each field team was given a mobile modem to allow for internet connectivity and daily synchronization of their tablets. This ensured that the head office in Abuja had access to the data in real time. Once an interview was completed and uploaded to the server, the data were first reviewed by the Data Editors.
The data were also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated each time the Stata do-file was run on the raw dataset. Information contained in the Excel error files was communicated back to the respective field interviewers for action. This was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.
DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection; these checks were designed to highlight many of the errors that occurred during the fieldwork.
The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once an interview is completed and uploaded to the server, the Data Editors review the completed interview for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer's tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample, and where there were differences, these were properly assessed and documented.
The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning was then approved by the Data Editor. After the Data Editor's approval of the interview on Survey Solutions server, the Headquarters also reviews and depending on the outcome, can either reject or approve. The third stage of cleaning involved a comprehensive review of the final raw data following the first and second stage cleaning. Every variable was examined individually for (1) consistency with other sections and variables, (2) out of range responses, and (3) outliers. However, special care was taken to avoid making strong assumptions when resolving potential errors. Some minor errors remain in the data where the diagnosis and/or solution were unclear to the data cleaning team.
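The out-of-range and outlier checks in the third stage can be sketched in Python as follows. This is an illustrative example only, not the cleaning code used for the survey: the function names, the valid range, the example values, and the choice of a median-absolute-deviation rule with a 3.5 cutoff are all assumptions. In keeping with the caution described above, the functions flag observations for review and do not change any values.

```python
from statistics import median

def flag_out_of_range(values, low, high):
    """Return positions of non-missing values outside the closed range [low, high]."""
    return [i for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]

def flag_outliers(values, threshold=3.5):
    """Return positions whose modified z-score (based on the median absolute
    deviation) exceeds the threshold. Missing values (None) are skipped."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median([abs(v - med) for v in observed])
    if mad == 0:
        return []  # no spread to measure against; nothing is flagged
    return [i for i, v in enumerate(values)
            if v is not None and abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical example data (not taken from the survey).
ages = [34, 29, 41, None, 130, 37, 25]
plot_areas_ha = [0.5, 0.7, 0.6, 0.4, 0.55, 25.0]

print(flag_out_of_range(ages, 0, 120))   # position of the age of 130
print(flag_outliers(plot_areas_ha))      # position of the 25.0 ha plot
```

Flagged positions would then be reviewed case by case, which matches the stated preference for leaving a value unchanged when the diagnosis or solution is unclear.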
The Integrated Household Survey is one of the primary instruments implemented by the Government of Malawi through the National Statistical Office (NSO) roughly every 3-5 years to monitor and evaluate the changing conditions of Malawian households. The IHS data have, among other insights, provided benchmark poverty and vulnerability indicators to foster evidence-based policy formulation and monitor the progress of meeting the Millennium Development Goals (MDGs), the goals listed as part of the Malawi Growth and Development Strategy (MGDS) and now the Sustainable Development Goals (SDGs).
National coverage
Members of the following households are not eligible for inclusion in the survey: • All people who live outside the selected EAs, whether in urban or rural areas. • All residents of dwellings other than private dwellings, such as prisons, hospitals and army barracks. • Members of the Malawian armed forces who reside within a military base. (If such individuals reside in private dwellings off the base, however, they should be included among the households eligible for random selection for the survey.) • Non-Malawian diplomats, diplomatic staff, and members of their households. (However, note that non-Malawian residents who are not diplomats or diplomatic staff and are resident in private dwellings are eligible for inclusion in the survey. The survey is not restricted to Malawian citizens alone.) • Non-Malawian tourists and others on vacation in Malawi.
Sample survey data [ssd]
The IHS5 sampling frame is based on the listing information and cartography from the 2018 Malawi Population and Housing Census (PHC). It includes the three major regions of Malawi, namely North, Center and South, and is stratified into rural and urban strata. The urban strata include the four major urban areas: Lilongwe City, Blantyre City, Mzuzu City, and the Municipality of Zomba. All other areas are considered rural, and each of the 27 districts was considered a separate sub-stratum of the main rural stratum. The sampling frame further excludes the population living in institutions, such as hospitals, prisons and military barracks. Hence, the IHS5 strata are composed of 32 districts in Malawi.
A stratified two-stage sample design was used for the IHS5.
Note: Detailed sample design information is presented in the "Fifth Integrated Household Survey 2019-2020, Basic Information Document".
Computer Assisted Personal Interview [capi]
HOUSEHOLD QUESTIONNAIRE The Household Questionnaire is a multi-topic survey instrument and is near-identical in content and organization to the IHS3 and IHS4 questionnaires. It encompasses economic activities, demographics, welfare and other sectoral information of households. It covers a wide range of topics dealing with the dynamics of poverty (consumption, cash and non-cash income, savings, assets, food security, health and education, vulnerability and social protection). Although the IHS5 household questionnaire covers a wide variety of topics in detail, it intentionally excludes in-depth information on topics covered in other surveys that are part of the NSO’s statistical plan (such as maternal and child health issues covered at length in the Malawi Demographic and Health Survey).
AGRICULTURE QUESTIONNAIRE All IHS5 households that were identified as being involved in agricultural or livestock activities were administered the agriculture questionnaire, which is primarily modelled after the IHS3 counterpart. The modules expand on the agricultural content of the IHS4, IHS3, IHS2, AISS, and other regional agricultural surveys, while remaining consistent with the NACAL topical coverage and methodology. The agriculture questionnaire was developed with input from the aforementioned stakeholders who provided input on the household questionnaire, as well as from outside researchers involved in research and policy discussions pertaining to Malawian agriculture. The agriculture questionnaire allows, among other things, for extensive agricultural productivity analysis through the diligent estimation of land areas, both owned and cultivated, labor and non-labor input use and expenditures, and production figures for main crops and livestock. Although one of the major foci of the agriculture data collection effort was to produce smallholder production estimates for major crops, it is also possible to disaggregate the data by gender and main geographical regions. The IHS5 cross-sectional households supply information on the last completed rainy season (2017/2018 or 2018/2019) and the last completed dry season (2018 or 2019) depending on the timing of their interview.
FISHERIES QUESTIONNAIRE The design of the IHS5 fishery questionnaire is identical to the questionnaire designed for IHS3. The IHS3 fisheries questionnaire was informed by the design and piloting of a fishery questionnaire by the World Fish Center (WFC), which was supported by the LSMS-ISA project for the purpose of assembling a fishery questionnaire that could be integrated into multi-topic household-surveys. The WFC piloted the draft instrument in November 2009 in the Lower Shire region, and the NSO team considered the revised draft in designing the IHS5 fishery questionnaire.
COMMUNITY QUESTIONNAIRE The content of the IHS5 Community Questionnaire follows the content of the IHS3 & IHS4 Community Questionnaires. A “community” is defined as the village or urban location surrounding the enumeration area selected for inclusion in the sample and which most residents recognize as being their community. The IHS5 community questionnaire was administered to each community associated with the cross-sectional EAs interviewed. As in the IHS3 and IHS4 approach, it was administered to a group of several knowledgeable residents such as the village headman, the headmaster of the local school, the agricultural field assistant, religious leaders, local merchants, health workers and long-term knowledgeable residents. The instrument gathers information on a range of community characteristics, including religious and ethnic background, physical infrastructure, access to public services, economic activities, communal resource management, organization and governance, investment projects, and local retail price information for essential goods and services.
MARKET QUESTIONNAIRE The Market Survey consisted of one questionnaire, which is composed of four modules. Module A: Market Identification, Module B: Seasonal Main Crops, Module C: Permanent Crops, and Module D: Food Consumption.
DATA ENTRY PLATFORM To ensure data quality and timely availability of data, the IHS5 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out IHS5, 1 laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer. The use of Survey Solutions allowed for the real-time availability of data, as completed interviews were approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire, the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. In Survey Solutions, Headquarters can then see the location of the dwellings plotted on a map of Malawi to better enable supervision from afar, checking both the number of interviews performed and the fact that the sample households lie within EA boundaries. Geo-referenced household locations from the tablets complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables (extensive measures of distance, such as distance to the nearest market, climatology, soil and terrain, and other environmental factors) in the analysis.
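The check that sample households lie within EA boundaries can be illustrated with a small point-in-polygon test. This is a hedged sketch, not how Survey Solutions or the NSO performs the check: the function name, the rectangular boundary and the coordinates are invented for illustration, and the test treats longitude and latitude as planar coordinates, which is only a reasonable approximation over the small extent of a single EA.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at the point crosses; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Hypothetical EA boundary as (longitude, latitude) vertices.
ea_boundary = [(33.70, -13.99), (33.80, -13.99), (33.80, -13.90), (33.70, -13.90)]

# Hypothetical dwelling coordinates recorded on the tablets.
dwellings = {"HH-A": (33.75, -13.95), "HH-B": (33.85, -13.95)}

outside_ea = [hh for hh, (lon, lat) in dwellings.items()
              if not point_in_polygon(lon, lat, ea_boundary)]
print(outside_ea)  # households whose coordinates fall outside the boundary
```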
The range and consistency checks built into the application were informed by the LSMS-ISA experience in previous IHS waves. Prior programming of the data entry application allowed for a wide variety of range and consistency checks to be conducted and reported, and for potential issues to be investigated and corrected before closing the assigned enumeration area. Headquarters (NSO management) assigned work to supervisors based on their regions of coverage. Supervisors then made assignments to the enumerators linked to their Supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHS5 server. Because the data were available in real time, they were monitored closely throughout the entire data collection period. Upon receipt at headquarters, the data were exported to Stata for other consistency checks, data cleaning, and analysis.
DATA MANAGEMENT The IHS5 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHS5 interviews were collected in “sample” mode (assignments generated from headquarters) as opposed to “census” mode (new interviews created by interviewers from a template) so that the NSO would have more control over the sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Although Jordan’s economy is dominated by micro and small enterprises (MSEs), relatively little is known about them. To overcome this informational gap, USAID LENS conducted a survey of MSEs in 2014, 2015, and 2018 to better understand Jordanian enterprises and to assess the major barriers and opportunities for growth. The study covers general demographics, workforce trends, firm performance, access to finance, processes and networks, and the impact of the Syrian refugee crisis.
The survey consists of a range of questions in a two-stage design with stratification. The data provide representative information for all MSEs operating in the governorates of Amman (excluding the Greater Amman Municipality), Zarqa, Irbid, Karak, Tafilah, and Aqaba (excluding the ASEZA free zone). Although the study is not intended to be national in scope, the target population of the six areas collectively captures 60% of the kingdom’s population.
As a complex survey, the research design was undertaken using probability sampling in two phases. In the first phase, 977 geographic clusters were randomly selected from districts in each area. From these clusters, 97,347 households were contacted through door-to-door interviews, of which 10,197 reported owning a business. A sub-sample of 6,385 MSEs was then drawn, stratified by sector and governorate. Of these MSEs, 4,721 were successfully surveyed in wave I (2014/15) and 1,699 in wave II (2018). The results can reliably be generalized to all MSEs within this geographic boundary. For more details, see the weighting report (2018).
The public release of the MSE Survey (2018) includes the majority of the data collected in the core 2018 sample. However, to protect the privacy of respondents, a number of sensitive and identifying variables have been omitted or generalized. To request access to a full scientific release without these limitations, please contact: Bryanna Millis, Senior Technical Advisor, FHI 360, bmillis@fhi360.org
The cleaned and harmonized version of the survey data produced and published by the Economic Research Forum represents 100% of the original survey data collected by the Central Agency for Public Mobilization and Statistics (CAPMAS)
In any society, the human element is the basis of the work force that carries out all service and production activities. It is therefore essential to produce labor force statistics and studies on the growth of manpower and on the distribution of the labor force by its different types and characteristics.
In this context, the Central Agency for Public Mobilization and Statistics conducts "Quarterly Labor Force Survey" which includes data on the size of manpower and labor force (employed and unemployed) and their geographical distribution by their characteristics.
By the end of each year, CAPMAS issues the annual aggregated labor force bulletin publication that includes the results of the quarterly survey rounds that represent the manpower and labor force characteristics during the year.
----> Historical Review of the Labor Force Survey:
1- The first Labor Force Survey was undertaken in 1957. The first round was conducted in November of that year, and the survey has continued to be conducted in successive rounds (quarterly, bi-annually, or annually) until now.
2- Starting with the October 2006 round, the fieldwork of the labor force survey was developed to focus on the following two points: a. The importance of using the panel sample that is part of the survey sample to monitor the dynamic changes of the labor market. b. Improving the questionnaire to include more questions that help to better define the relationship of each household member to the labor force (employed, unemployed, out of labor force, etc.), and re-ordering some of the existing questions in a more logical way.
3- Starting with the January 2008 round, the methodology was developed to collect a more representative sample during the survey year. This is done by distributing the sample of each governorate into five groups; the questionnaires are collected from each of them separately every 15 days for 3 months (in the middle and at the end of the month).
----> The survey aims at covering the following topics:
1- Measuring the size of the Egyptian labor force among civilians (for all governorates of the republic) by their different characteristics.
2- Measuring the employment rate at the national level and in different geographical areas.
3- Measuring the distribution of employed people by the following characteristics: gender, age, educational status, occupation, economic activity, and sector.
4- Measuring the unemployment rate in different geographic areas.
5- Measuring the distribution of unemployed people by the following characteristics: gender, age, educational status, unemployment type "ever employed/never employed", occupation, economic activity, and sector for people who have ever worked.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a sample of urban and rural areas in all the governorates.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
Sample Design and Selection
The sample of the LFS 2006 survey is a simple systematic random sample.
Sample Size
The sample size varied by quarter (Q1 = 19,429; Q2 = 19,419; Q3 = 19,119; Q4 = 18,835 households), for a total of 76,802 households annually. These households are distributed at the governorate level (urban/rural).
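As a quick consistency check, the quarterly figures quoted above do sum to the stated annual total. The snippet below is only an arithmetic sketch in Python; the variable names are illustrative.

```python
# Quarterly household sample sizes as quoted in the text.
quarterly_households = {"Q1": 19429, "Q2": 19419, "Q3": 19119, "Q4": 18835}

annual_total = sum(quarterly_households.values())
print(annual_total)  # 76802, matching the annual figure stated above
```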
A more detailed description of the different sampling stages and allocation of sample across governorates is provided in the Methodology document available among external resources in Arabic.
Face-to-face [f2f]
The questionnaire design follows the latest International Labor Organization (ILO) concepts and definitions of labor force, employment, and unemployment.
The questionnaire comprises 3 tables in addition to the identification and geographic data of household on the cover page.
----> Table 1- Demographic and employment characteristics and basic data for all household individuals
Including: gender, age, educational status, marital status, residence mobility and current work status
----> Table 2- Employment characteristics table
This table is filled in by individuals who were employed at the time of the survey or were engaged in work during the reference week, and provides information on:
- Relationship to employer: employer, self-employed, waged worker, and unpaid family worker
- Economic activity
- Sector
- Occupation
- Effective working hours
- Work place
- Average monthly wage
----> Table 3- Unemployment characteristics table
This table is filled in by all unemployed individuals who satisfied the unemployment criteria, and provides information on:
- Type of unemployment (unemployed, unemployed ever worked)
- Economic activity and occupation in the last job held before becoming unemployed
- Last unemployment duration in months
- Main reason for unemployment
----> Raw Data
Office editing is one of the main stages of the survey. It started once the questionnaires were received from the field and was carried out by the selected work groups. It includes:
a- Editing of coverage and completeness
b- Editing of consistency
----> Harmonized Data
The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2023/24 GHS-Panel is the fifth round of the survey with prior rounds conducted in 2010/11, 2012/13, 2015/16 and 2018/19. The GHS-Panel households were visited twice: during post-planting period (July - September 2023) and during post-harvest period (January - March 2024).
National
• Households • Individuals • Agricultural plots • Communities
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
The original GHS‑Panel sample was fully integrated with the 2010 GHS sample. The GHS sample consisted of 60 Primary Sampling Units (PSUs) or Enumeration Areas (EAs), chosen from each of the 37 states in Nigeria. This resulted in a total of 2,220 EAs nationally. Each EA contributed 10 households to the GHS sample, resulting in a sample size of 22,200 households. Out of these 22,200 households, 5,000 households from 500 EAs were selected for the panel component, and 4,916 households completed their interviews in the first wave.
After nearly a decade of visiting the same households, a partial refresh of the GHS‑Panel sample was implemented in Wave 4 and maintained for Wave 5. The refresh was conducted to maintain the integrity and representativeness of the sample. The refresh EAs were selected from the same sampling frame as the original GHS‑Panel sample in 2010. A listing of households was conducted in the 360 EAs, and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households.
In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS‑Panel households from 2010 was selected to be included in the new sample. This “long panel” sample of 1,590 households was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across Nigeria’s six geopolitical zones.
The combined sample of refresh and long panel EAs in Wave 5 that were eligible for inclusion consisted of 518 EAs based on the EAs selected in Wave 4. The combined sample generally maintains both the national and zonal representativeness of the original GHS‑Panel sample.
Although 518 EAs were identified for the post-planting visit, conflict events prevented interviewers from visiting eight EAs in the North West zone of the country. The EAs were located in the states of Zamfara, Katsina, Kebbi and Sokoto. Therefore, the final number of EAs visited both post-planting and post-harvest comprised 157 long panel EAs and 354 refresh EAs. The combined sample is also roughly equally distributed across the six geopolitical zones.
Computer Assisted Personal Interview [capi]
The GHS-Panel Wave 5 consisted of three questionnaires for each of the two visits. The Household Questionnaire was administered to all households in the sample. The Agriculture Questionnaire was administered to all households engaged in agricultural activities such as crop farming, livestock rearing, and other agricultural and related activities. The Community Questionnaire was administered to the community to collect information on the socio-economic indicators of the enumeration areas where the sample households reside.
GHS-Panel Household Questionnaire: The Household Questionnaire provided information on demographics; education; health; labour; childcare; early child development; food and non-food expenditure; household nonfarm enterprises; food security and shocks; safety nets; housing conditions; assets; information and communication technology; economic shocks; and other sources of household income. Household location was geo-referenced in order to be able to later link the GHS-Panel data to other available geographic data sets (forthcoming).
GHS-Panel Agriculture Questionnaire: The Agriculture Questionnaire solicited information on land ownership and use; farm labour; inputs use; GPS land area measurement and coordinates of household plots; agricultural capital; irrigation; crop harvest and utilization; animal holdings and costs; household fishing activities; and digital farming information. Some information is collected at the crop level to allow for detailed analysis for individual crops.
GHS-Panel Community Questionnaire: The Community Questionnaire solicited information on access to infrastructure and transportation; community organizations; resource management; changes in the community; key events; community needs, actions, and achievements; social norms; and local retail price information.
The Household Questionnaire was slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.
The Agriculture Questionnaire collected different information during each visit, but for the same plots and crops.
The Community Questionnaire collected prices during both visits, and different community level information during the two visits.
CAPI: The Wave 5 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community questionnaires) were implemented in both the post-planting and post-harvest visits of Wave 5 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Living Standards Measurement Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given a tablet, which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.
DATA COMMUNICATION SYSTEM: The data communication system used in Wave 5 was highly automated. Each field team was given a mobile modem, which allowed for internet connectivity and daily synchronization of their tablets. This ensured that the head office in Abuja had access to the data in real time. Once an interview was completed and uploaded to the server, the data were first reviewed by the Data Editors. The data were also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated each time the Stata do-file was run on the raw dataset. Information contained in the Excel error files was then communicated back to the respective field interviewers for their action. This monitoring activity was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.
DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection and designed to highlight many of the errors that occurred during the fieldwork.
The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once an interview is completed and uploaded to the server, the Data Editors review the completed interview for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer’s tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample, and where there were differences, these were properly assessed and documented. The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning were then approved by the Data Editor. After the Data Editor’s approval of the interview on the Survey Solutions server, Headquarters also reviews the interview and, depending on the outcome, can either reject or approve it.
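The plot-matching check mentioned above amounts to comparing identifier keys between sections. The Python sketch below illustrates the idea under stated assumptions: the function name, the (household ID, plot ID) key structure and the example records are hypothetical and are not taken from the GHS-Panel data files or from the Stata do-files used by the survey team.

```python
def unmatched_plots(roster_keys, section_keys):
    """Compare (household_id, plot_id) keys from the plot roster with those
    found in another agriculture section and report keys present on one
    side only."""
    roster = set(roster_keys)
    section = set(section_keys)
    return {
        "missing_from_section": sorted(roster - section),
        "not_in_roster": sorted(section - roster),
    }

# Hypothetical keys for illustration.
plot_roster = [("HH001", 1), ("HH001", 2), ("HH002", 1)]
harvest_section = [("HH001", 1), ("HH002", 1), ("HH002", 3)]

report = unmatched_plots(plot_roster, harvest_section)
print(report["missing_from_section"])  # plots listed in the roster with no harvest record
print(report["not_in_roster"])         # harvest records pointing to an unlisted plot
```

A key missing from a section is not necessarily an error (for example, a plot may simply not have been harvested), so a report like this would feed the error lists sent to field teams instead of triggering automatic corrections.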
The third stage of cleaning involved a comprehensive review of the final raw data following the first and second stage cleaning. Every variable was examined individually for (1) consistency with other sections and variables, (2) out of range responses, and (3) outliers. However, special care was taken to avoid making strong assumptions when resolving potential errors. Some minor errors remain in the data where the diagnosis and/or solution were unclear to the data cleaning team.
Response