The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas that were originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations targeting the 2,508 households that were interviewed in 2016. The panel sample expanded each wave through the tracking of split-off individuals and the new households that they formed. Available as part of this project is the IHPS 2019 data, the IHPS 2016 data as well as the rereleased IHPS 2010 & 2013 data including only the subsample of 102 EAs with updated panel weights. Additionally, the IHPS 2016 was the first survey that received complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which has been established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices. 
The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data. At its start, it is a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship, following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.
National coverage
The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas, as well as individuals that moved away from the 2013 dwellings between 2013 and 2016, as long as they (i) were neither servants nor guests at the time of the IHPS 2013, (ii) were projected to be at least 12 years of age, and (iii) were known to be residing in mainland Malawi, excluding those in Likoma Island and in institutions, including prisons, police compounds, and army barracks.
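The tracking rule for individuals who moved away can be read as a simple predicate. The following Python sketch is illustrative only and is not the NSO's actual procedure; the field names and the location codes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mover:
    was_servant_or_guest_2013: bool  # status at the time of the IHPS 2013
    projected_age: int               # projected age at the follow-up round
    location: str                    # hypothetical codes: "mainland", "likoma", "abroad"
    in_institution: bool             # prison, police compound, army barracks, etc.


def eligible_for_tracking(m: Mover) -> bool:
    """Apply the three stated conditions plus the institutional exclusion."""
    return (
        not m.was_servant_or_guest_2013
        and m.projected_age >= 12
        and m.location == "mainland"
        and not m.in_institution
    )
```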
Sample survey data [ssd]
A sub-sample of IHS3 2010 sample enumeration areas (EAs) (i.e. 204 EAs out of 768 EAs) was selected prior to the start of the IHS3 field work with the intention to (i) track and resurvey these households in 2013 in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013) and (ii) visit a total of 3,246 households in these EAs twice to reduce recall associated with different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional, and urban/rural levels and for each of the following 6 strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place during the period of April-October 2013, with residual tracking operations in November-December 2013.
Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.
Computer Assisted Personal Interview [capi]
Data Entry Platform: To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out IHPS 2019, 1 laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer that the NSO provided. The use of Survey Solutions allowed for the real-time availability of data, as completed interviews were approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire, the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from the tablets complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables in the analysis: extensive measures of distance (e.g. distance to the nearest market), climatology, soil and terrain, and other environmental factors.
Data Management: The IHPS 2019 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHPS 2019 interviews were mainly collected in “sample” mode (assignments generated from headquarters), which gave the NSO more control over the sample, and a few were collected in “census” mode (new interviews created by interviewers from a template). This hybrid approach was necessary to aid the tracking operations: enumerators were mostly working in areas with poor network connection and could not quickly receive tracking cases from Headquarters, so they needed to be able to create a tracking assignment themselves.
The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013 and IHPS 2016. Prior programming of the data entry application allowed for a wide variety of range and consistency checks to be conducted and reported, and for potential issues to be investigated and corrected before closing the assigned enumeration area. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHPS 2019 server. Because the data was available in real time, it was monitored closely throughout the entire data collection period. Upon receipt at headquarters, the data was exported to Stata for further consistency checks, data cleaning, and analysis.
Data Cleaning: The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage of data cleaning was conducted in the field by the field teams, utilizing error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For questions that flagged an error, the enumerators were expected to record a comment within the questionnaire to explain to their supervisor the reason for the error and to confirm that they had double-checked the response with the respondent. The supervisors were expected to sync the enumerator tablets as frequently as possible to avoid having many questionnaires on the tablet and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets before syncing, but they still recorded their notes in the supervisor account and rejected questionnaires accordingly. The second stage of data cleaning was also done in the field and resulted from the additional error reports generated in Stata, which were in turn sent to the field teams via email or DropBox. The field supervisors collected reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected errors. Due to the quick turn-around in error reporting, it was possible to conduct call-backs while the team was still operating in the EA when required. Corrections to the data were entered in the rejected questionnaires and sent back to headquarters.
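A range and consistency check of the kind built into the application can be sketched in a few lines of Python. This is not the Survey Solutions validation syntax; the field names, the age range, and the age threshold for postgraduate education are assumptions chosen for illustration (for instance, flagging a postgraduate education level recorded for a 12-year-old).

```python
def check_record(record):
    """Return a list of error messages for one individual record."""
    errors = []
    age = record.get("age")
    # Range check: age must be present and within a plausible interval (assumed 0-120).
    if age is None or not (0 <= age <= 120):
        errors.append("age out of range")
    # Consistency check: education level must be plausible given age (assumed cut-off of 20).
    elif record.get("education") == "postgraduate" and age < 20:
        errors.append("education level implausible for age")
    return errors
```

An empty list means the record passed; a non-empty list corresponds to the flagged questions on which the enumerator would be expected to comment.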
The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, as enumerators asked the questions and recorded information, error messages were provided immediately when the information recorded did not match previously defined rules for that variable, for example, if the education level for a 12-year-old respondent was given as postgraduate. The second stage occurred during the review of the questionnaire by the Field Supervisor. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction. The enumerator can write a comment to explain why the data appear to be incorrect, for example, if the previously mentioned 12-year-old was, in fact, a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters where the NSO staff would again review the data for errors and verify the comments from the
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
At Wave 10 of the Innovation Panel (IP) (SN 6849), conducted in 2017, respondents were asked for permission to link their Twitter data to their survey responses. This study consists of the Twitter data collected from consenting respondents and corresponding data retrieved through the Twitter Application Programming Interface (API) for them covering the period between June 2007 and February 2023. Data in this dataset can be linked to data on the same individuals from previous and future waves of the main annual Innovation Panel interviews (SN 6849) using the personal identifier pidp. For full details of this study, please refer to the User Guide.
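Linking on the personal identifier can be sketched as an inner join on pidp. The example below is a minimal Python illustration using plain dictionaries; apart from pidp, the variable names are hypothetical, and the User Guide remains the authority on the actual file structure.

```python
def link_by_pidp(survey_rows, twitter_rows):
    """Inner-join two lists of dicts on the personal identifier pidp."""
    survey_by_id = {row["pidp"]: row for row in survey_rows}
    linked = []
    for tw in twitter_rows:
        person = survey_by_id.get(tw["pidp"])
        if person is not None:  # keep only respondents present in both files
            linked.append({**person, **tw})
    return linked


# Hypothetical example rows: "age" and "n_tweets" are invented variable names.
survey = [{"pidp": 101, "age": 34}, {"pidp": 102, "age": 58}]
twitter = [{"pidp": 102, "n_tweets": 12}, {"pidp": 999, "n_tweets": 3}]
linked = link_by_pidp(survey, twitter)
```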
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.
Suitable data analysis software
The depositor provides these data in Stata format. Users are strongly advised to analyse them in Stata, as transfer to other formats may result in unforeseen issues.
The NPS 2019/20 with sex-disaggregated data (NPS-SDD 2019/20) is an off-shoot survey undertaken by following the entire NPS 2014/15 “Extended Panel” sample. The NPS-SDD 2019/20 is the first Extended Panel with sex-disaggregated data survey, collecting information on a wide range of topics including agricultural production, non-farm income generating activities, individual rights to plots, consumption expenditures, and a wealth of other socioeconomic characteristics.
Designed for analysis of key indicators at the national level.
Households; Individuals;
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Sample survey data [ssd]
The sample design for the NPS-SDD 2019/20 targeted the sub-sample of households from the initial NPS cohort originating in 2008/09 and subsequently surveyed in all four consecutive rounds, considered the “Extended Panel”. This consisted of 989 households from the NPS 2014/15 sample to be tracked and interviewed in the NPS-SDD 2019/20.
It is worth mentioning that the sample design included complete households that could not be interviewed in NPS 2014/15, excluding those households that had refused to be interviewed in NPS 2014/15. This constituted an additional 8 households. Individuals meeting the eligibility requirement that were interviewed as part of the NPS 2012/13, but were not located and interviewed during the NPS 2014/15, were also included in this round if located. Additionally, individuals from NPS 2014/15 who moved into another This constituted an additional 158 individuals assigned to their last known associated household.
The eligibility requirement for inclusion in the NPS is defined as any household member aged 15 years and above, excluding live-in servants. Households with at least one eligible member were completely interviewed, including any non-eligible members present in the household. Any household or eligible members that had either moved or split away from a primary household were tracked and interviewed in their new location.
Additionally, the final sample for NPS-SDD 2019/20 included any resulting split-off households identified during data collection (i.e. a previous NPS member who had moved or started another household). Ultimately, the final sample size for NPS-SDD 2019/20 was 5,587 individuals in 1,184 households.
Computer Assisted Personal Interview [capi]
The NPS-SDD 2019/20 consists of four survey instruments: a Household Questionnaire, Agriculture Questionnaire, Livestock Questionnaire, and a Community Questionnaire.
The Household Questionnaire is composed of thematic sections. This questionnaire allows for the construction of a full consumption-based welfare measure, permitting distributional and incidence analysis. Data within the household instrument are structured around a household panel survey and add a further living standards measure in the form of sex-disaggregated data. This additional level of information adds value to the analysis of intra-household dynamics and reveals a more refined picture of welfare in Tanzania. To protect the confidentiality of respondents, sensitive information has been masked in or removed from the public household data files.
The NPS Extended Panel also includes a robust instrument on household agriculture activities. It offers an essential data source for understanding the dynamic role of agriculture in household welfare. Agriculture information is collected at both the plot and crop level on inputs, production and sales, consistent with key phases in the agricultural value chain. The NPS Extended Panel likewise recognizes the importance of livestock activities to many households. As with the integrated instrument on agriculture, the NPS contains a robust instrument to capture details on these activities. The Livestock Questionnaire is administered to all households participating in these activities and asks about the inputs, outputs, labour, and sales related to them. Table 3 provides a more comprehensive list of the sections found within the Livestock Questionnaire.
The Community Questionnaire collects information on physical and economic infrastructure and events in surveyed communities. Responses to the community questionnaire are provided through a group discussion among key informants within the community.
Each of the NPS questionnaires was developed in collaboration with line ministries and donor partners, including the Technical Committee, over a period of several months. The NBS solicited feedback from various stakeholders with regard to survey content and design, paying due consideration to comparability with previous panel rounds.
Additional data cleaning was conducted as the final stage of the data processing. Further adjustment of the data post-entry was conducted under the principle of absolute certainty where adjustments must be evidence-based and correction values true beyond a reasonable doubt. As such, the resulting final data files may still contain some inconsistencies and outliers. Handling of these values is thus left entirely to the data user. Throughout the data processing system, versions of the data are archived at all key steps and all checking and cleaning syntax documented and archived.
As with most panel surveys a certain portion of panel respondents are not able to be re-interviewed over time. This attrition of panel respondents can lead to attrition bias where respondents drop out of the survey non-randomly and where the attrition is correlated with variables of interest. The Tanzania NPS has fortunately maintained low attrition over the rounds, thus minimizing the potential for attrition bias within the datasets.
By the end of data collection, 974 of the 989 households had been located and 908 households were successfully re-interviewed for a total household attrition rate of 9.2 percent. At the individual level, 2,621 of the 3,188 eligible household members (over the age of 15 years and not a household servant) were successfully re-interviewed during the NPS-SDD 2019/20, equating to an individual attrition rate of roughly 17.7 percent between the NPS 2014/15 and the NPS-SDD 2019/20 (for extended panel households).
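As a worked check of the arithmetic, attrition can be computed as the share of targeted units that were not re-interviewed. The Python sketch below uses only the counts quoted above and assumes this simple definition. It gives roughly 8.2 percent for households and 17.8 percent for individuals; the published figures (9.2 and roughly 17.7 percent) differ slightly and may rest on denominators or rounding not detailed in this text.

```python
def attrition_rate(reinterviewed, targeted):
    """Percentage of targeted units that were not re-interviewed."""
    return 100.0 * (targeted - reinterviewed) / targeted


# Counts taken from the text; the formula itself is an assumption.
household_attrition = attrition_rate(908, 989)
individual_attrition = attrition_rate(2621, 3188)
```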
The sample of households selected in the NPS-SDD 2019/2020 is only one of many samples that could have been selected from the same population. Each alternative sample would yield results slightly different from those of the selected sample. Sampling errors are a measure of the variability between all possible samples, and although the degree of variability cannot be directly observed, it can be estimated from the survey results and statistically evaluated. A sampling error can be measured in terms of the standard error for a particular statistic. The Stata command estat effects was used to calculate sampling errors for the NPS-SDD 2019/2020. In addition to the standard error, Stata computed the design effect (DEFF) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFF value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates that the sampling error increases because of the use of a more complex and less statistically efficient (but perhaps more logistically efficient) design. Stata also computed the relative error and confidence limits for the estimates. Sampling errors for the NPS-SDD 2019/2020 are calculated for selected variables considered to be of primary interest at the household and individual levels. For each variable of interest, the value of the statistic (R), its standard error (SE), the number of cases, the design effect (DEFF), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE) are provided in Tables 1-10 in the BID. The DEFF is considered undefined when the standard error in a simple random sample is zero (when the estimate is close to 0 or 1).
The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2018/19 round is the fourth round of the survey, with prior rounds conducted in 2010/11, 2012/13, and 2015/16. GHS-Panel households were visited twice: first after the planting season (post-planting) between July and September 2018 and second after the harvest season (post-harvest) between January and February 2019.
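The quantities reported for each variable can be derived from three inputs: the estimate R, its standard error under the actual design, and the standard error a simple random sample would have given. The Python sketch below is illustrative and does not reproduce Stata's estat effects output; the input values are invented. It returns both the ratio of standard errors, as defined in the text, and its square, the ratio of variances, because in common usage (including Stata's documentation) the variance ratio is labelled DEFF and the standard-error ratio DEFT.

```python
def sampling_error_summary(estimate, se_design, se_srs):
    """Relative SE, R +/- 2SE limits, and design-effect ratios for one statistic."""
    rel_se = se_design / estimate if estimate != 0 else None
    # Undefined when the simple-random-sample standard error is zero.
    se_ratio = se_design / se_srs if se_srs > 0 else None
    return {
        "relative_se": rel_se,
        "ci_lower": estimate - 2 * se_design,
        "ci_upper": estimate + 2 * se_design,
        "se_ratio": se_ratio,
        "variance_ratio": None if se_ratio is None else se_ratio ** 2,
    }


# Invented inputs: a proportion of 0.40 with SE 0.03 (design) and 0.02 (SRS).
summary = sampling_error_summary(0.40, 0.03, 0.02)
```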
National
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
The original GHS-Panel sample consisted of 5,000 households across 500 enumeration areas (EAs) and was designed to be representative at the national level as well as at the zonal level. The complete sampling information for the GHS-Panel is described in the Basic Information Document for GHS-Panel 2010/2011. However, after nearly a decade of visiting the same households, a partial refresh of the GHS-Panel sample was implemented in Wave 4.
For the partial refresh of the sample, a new set of 360 EAs was randomly selected, consisting of 60 EAs per zone. The refresh EAs were selected from the same sampling frame as the original GHS-Panel sample in 2010 (the “master frame”). A listing of all households was conducted in the 360 EAs and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households.
In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS-Panel households from 2010 was selected to be included in the new sample. This “long panel” sample was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across the 6 geopolitical Zones. The systematic selection ensured that the distribution of EAs across the 6 Zones (and urban and rural areas within) is proportional to the original GHS-Panel sample. Interviewers attempted to interview all households that originally resided in the 159 EAs and were successfully interviewed in the previous visit in 2016. This includes households that had moved away from their original location in 2010. In all, interviewers attempted to interview 1,507 households from the original panel sample.
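The second-stage selection (10 households drawn at random from the listing of each EA) can be sketched as follows. This Python example is an illustration under stated assumptions, not the procedure actually used: the data layout (a dictionary from EA identifier to listed household identifiers) and the seeded random generator are hypothetical.

```python
import random


def select_households(listing, per_ea=10, seed=0):
    """Draw a simple random sample of households within each EA.

    listing maps an EA identifier to the list of household identifiers
    recorded during the listing exercise.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = {}
    for ea, households in listing.items():
        k = min(per_ea, len(households))  # small EAs contribute all their households
        sample[ea] = rng.sample(households, k)
    return sample
```

With 360 EAs and at least 10 listed households in each, the draw yields 360 x 10 = 3,600 households, matching the refresh sample size given above.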
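One common way to obtain a subsample whose distribution across strata is proportional to the original is systematic selection from a frame sorted by stratum. The Python sketch below illustrates that general technique with a fixed interval and a random start; it is an assumption about how such a selection could be done, not a description of the procedure actually used for the long panel.

```python
import random


def systematic_sample(frame, n, seed=0):
    """Select n items from an ordered frame using a fixed interval and a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    rng = random.Random(seed)
    interval = len(frame) / n           # fractional sampling interval
    start = rng.random() * interval     # random start within the first interval
    last = len(frame) - 1
    # Guard the index against floating-point rounding at the upper end.
    return [frame[min(int(start + i * interval), last)] for i in range(n)]
```

If the frame lists the 500 original EAs sorted by Zone (and by urban or rural area within Zone), each Zone's share of the 159 selected EAs differs from its proportional share by less than one EA.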
The combined sample of refresh and long panel EAs consisted of 519 EAs. The total number of households that were successfully interviewed in both visits was 4,976.
While the combined sample generally maintains both national and Zonal representativeness of the original GHS-Panel sample, the security situation in the North East of Nigeria prevented full coverage of the Zone. Due to security concerns, rural areas of Borno state were fully excluded from the refresh sample and some inaccessible urban areas were also excluded. Security concerns also prevented interviewers from visiting some communities in other parts of the country where conflict events were occurring. Refresh EAs that could not be accessed were replaced with another randomly selected EA in the Zone so as not to compromise the sample size. As a result, the combined sample is representative of areas of Nigeria that were accessible during 2018/19. The sample will not reflect conditions in areas that were undergoing conflict during that period. This compromise was necessary to ensure the safety of interviewers.
Computer Assisted Personal Interview [capi]
The GHS-Panel Wave 4 consists of three questionnaires for each of the two visits. The Household Questionnaire was administered to all households in the sample. The Agriculture Questionnaire was administered to all households engaged in agricultural activities such as crop farming, livestock rearing and other agricultural and related activities. The Community Questionnaire was administered to the community to collect information on the socio-economic indicators of the enumeration areas where the sample households reside.
GHS-Panel Household Questionnaire: The Household Questionnaire provides information on demographics; education; health (including anthropometric measurement for children); labor; food and non-food expenditure; household nonfarm income-generating activities; food security and shocks; safety nets; housing conditions; assets; information and communication technology; and other sources of household income. Household location is geo-referenced in order to be able to later link the GHS-Panel data to other available geographic data sets.
GHS-Panel Agriculture Questionnaire: The Agriculture Questionnaire solicits information on land ownership and use; farm labor; inputs use; GPS land area measurement and coordinates of household plots; agricultural capital; irrigation; crop harvest and utilization; animal holdings and costs; and household fishing activities. Some information is collected at the crop level to allow for detailed analysis for individual crops.
GHS-Panel Community Questionnaire: The Community Questionnaire solicits information on access to infrastructure; community organizations; resource management; changes in the community; key events; community needs, actions and achievements; and local retail price information.
The Household Questionnaire is slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.
The Agriculture Questionnaire collects different information during each visit, but for the same plots and crops.
CAPI: For the first time in the GHS-Panel, the Wave 4 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture and community) were implemented in both the post-planting and post-harvest visits of Wave 4 using the CAPI software Survey Solutions. The Survey Solutions software was developed and is maintained by the Survey Unit within the Development Economics Data Group (DECDG) at the World Bank. Enumerators were given tablets, which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.
DATA COMMUNICATION SYSTEM: The data communication system used in Wave 4 was highly automated. Each field team was given a mobile modem to allow for internet connectivity and daily synchronization of their tablets. This ensured that the head office in Abuja had access to the data in real time. Once an interview was completed and uploaded to the server, the data were first reviewed by the Data Editors. The data were also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated each time the Stata do-file was run on the raw dataset. Information contained in the Excel error files was communicated back to the respective field interviewers for action. This was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.
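The daily loop of running checks on downloaded data and returning an error file to interviewers can be sketched as follows. The survey used a Stata do-file and Excel error files; this Python version writes CSV instead and is only an illustration. The column names and the two checks are hypothetical.

```python
import csv
import io
from collections import defaultdict


def build_error_report(interviews, checks):
    """Apply each named check to each interview; group failures by interviewer."""
    report = defaultdict(list)
    for row in interviews:
        for name, rule in checks.items():
            if not rule(row):  # a rule returns True when the row passes
                report[row["interviewer"]].append((row["hhid"], name))
    return dict(report)


def report_to_csv(report):
    """Render the grouped errors as CSV text, one failed check per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["interviewer", "hhid", "failed_check"])
    for interviewer in sorted(report):
        for hhid, name in report[interviewer]:
            writer.writerow([interviewer, hhid, name])
    return buf.getvalue()


# Hypothetical checks: household size must be positive; plot counts must agree.
checks = {
    "hhsize_positive": lambda r: r["hhsize"] > 0,
    "plots_match": lambda r: r["plots_listed"] == r["plots_measured"],
}
```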
DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection and designed to highlight many of the errors that occurred during the fieldwork.
The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once an interview was completed and uploaded to the server, the Data Editors reviewed the completed interview for inconsistencies and extreme values. Depending on the outcome, they could either approve or reject the case. If rejected, the case went back to the respective interviewer’s tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample, and where there were differences, these were properly assessed and documented. The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning were then approved by the Data Editor. After the Data Editor’s approval of the interview on the Survey Solutions server, Headquarters also reviewed the interview and, depending on the outcome, could either reject or approve it.
The third stage of cleaning involved a comprehensive review of the final raw data following
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Calendar Year Dataset, 2022: Special Licence Access, is designed for analysts to conduct cross-sectional analysis for the 2022 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.
The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can, however, only involve variables that are collected in every wave (excluding rotating content, which is only collected in some of the waves). Due to overlapping fieldwork, the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the document 9334_main_survey_calendar_year_user_guide_2022.
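The idea of pooling interviews from several overlapping waves into one calendar-year file can be sketched as follows. The released Calendar Year files are already constructed and come with appropriate analysis weights, which this Python illustration does not handle; the wave labels and variable names other than pidp are hypothetical.

```python
def calendar_year_file(wave_files, year):
    """Pool rows from several waves, keeping interviews conducted in the given year."""
    pooled = []
    for wave, rows in wave_files.items():
        for row in rows:
            if row["interview_year"] == year:
                pooled.append({**row, "wave": wave})  # record the source wave
    return pooled


# Hypothetical wave files with an invented "interview_year" variable.
waves = {
    "w12": [{"pidp": 1, "interview_year": 2021}, {"pidp": 2, "interview_year": 2022}],
    "w13": [{"pidp": 3, "interview_year": 2022}],
    "w14": [{"pidp": 4, "interview_year": 2022}, {"pidp": 5, "interview_year": 2023}],
}
cy2022 = calendar_year_file(waves, 2022)
```

Restricting the result to rows from a single wave would discard part of the year's interviews, which is why such a subset is not representative.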
These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: End User Licence version or Special Licence version.
Understanding Society: the UK Household Longitudinal Study started in 2009 with a general population sample (GPS) of around 26,000 households of UK residents living in private households and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample, who were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16 and over are eligible for adult interviews, household members aged 10-15 are eligible for youth interviews, and some information is collected about children aged 0-9 from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded. The surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022 a GPS boost sample (GPS2) of around 5,700 households was added. To know more about the sample design, following rules, interview modes, incentives, consent, and questionnaire content, please see the study overview and user guide.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.
End User Licence and Special Licence versions:
There are two versions of the Calendar Year 2022 data. One is available under the standard End User Licence (EUL) agreement (SN 9333), and the other is a Special Licence (SL) version (SN 9334). The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables, and various income variables that have not been top-coded (see 9334_eul_vs_sl_variable_differences for more details). Users are advised to obtain the standard EUL version of the data first, to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (EUL) and 6931 (SL).
Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2022 dataset, subject to SL access conditions. See the User Guide for further details.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Calendar Year Dataset, 2022, is designed for analysts to conduct cross-sectional analysis for the 2022 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.
The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can, however, only involve variables that are collected in every wave (excluding rotating content, which is only collected in some of the waves). Due to overlapping fieldwork, the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the document 9333_main_survey_calendar_year_user_guide_2022.
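The point about pooling all waves can be illustrated with a small sketch. It computes a weighted estimate over every record collected in the calendar year, regardless of wave. The field names (`wave`, `employed`, `weight`) and the toy values are hypothetical and do not correspond to actual variables or weights in the dataset; the user guide remains the authority on which analysis weight to use.

```python
def weighted_mean(records, value_key, weight_key):
    """Weighted mean over all records, whichever wave they came from."""
    total_weight = sum(r[weight_key] for r in records)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(r[value_key] * r[weight_key] for r in records)
    return weighted_sum / total_weight


# Toy records from three overlapping waves in one calendar year.
# All field names and numbers here are made up for illustration.
records = [
    {"wave": 12, "employed": 1, "weight": 1.0},
    {"wave": 13, "employed": 0, "weight": 2.0},
    {"wave": 13, "employed": 1, "weight": 1.0},
    {"wave": 14, "employed": 1, "weight": 4.0},
]

# Use every record for the year; do not filter on 'wave' first.
pooled = weighted_mean(records, "employed", "weight")
```

Filtering `records` to a single wave before calling `weighted_mean` would reproduce exactly the restriction the text warns against.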
These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: End User Licence version or Special Licence version.
Understanding Society, the UK Household Longitudinal Study, started in 2009 with a general population sample (GPS) of around 26,000 households of UK residents living in private households and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample, who were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16 or over are eligible for adult interviews, household members aged 10-15 are eligible for youth interviews, and some information is collected about children aged 0-9 from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded. The surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022, a GPS boost sample (GPS2) of around 5,700 households was added. For more on the sample design, following rules, interview modes, incentives, consent and questionnaire content, please see the study overview and user guide.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department for Work and Pensions, the Department for Education, the Department for Transport, the Department for Culture, Media and Sport, the Department for Communities and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department for Environment, Food and Rural Affairs, and the Food Standards Agency.
End User Licence and Special Licence versions:
There are two versions of the Calendar Year 2022 data. One is available under the standard End User Licence (EUL) agreement (SN 9333), and the other is a Special Licence (SL) version (SN 9334). The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables, and various income variables that have not been top-coded (see document 9333_eul_vs_sl_variable_differences for more details). Users are advised to obtain the standard EUL version of the data first, to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (EUL) and 6931 (SL).
Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2022 dataset, subject to SL access conditions. See the User Guide for further details.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Longitudinal Teaching Dataset, Waves 1-9, 2009-2018 is a teaching resource using data from Understanding Society, the UK Household Longitudinal Study, which interviews individuals in the sampled households every year. There are two target audiences: 1) lecturers who would like to use the data file for teaching longitudinal methods, and 2) data users who are new to longitudinal data and want to understand it better by working through the supplied analysis guidance, which uses the data file.
The statistical software used to construct the dataset is Stata and the analysis guidance provided is accompanied by Stata syntax only. The datafile is also available to download in SPSS and tab-delimited text formats. The User Guide includes guidance on how to convert the datafile in Stata format to R.
A second teaching resource using the Understanding Society survey is also available; see SN 8465, Understanding Society: Ethnicity and Health Teaching Dataset.
For information on the main Understanding Society study, see SN 6614, Understanding Society and Harmonised BHPS.
The Ethiopia Socioeconomic Survey (ESS) is a collaborative project between the Central Statistics Agency (CSA) of Ethiopia and the World Bank Living Standards Measurement Study-Integrated Surveys on Agriculture (LSMS-ISA) team. The objective of the LSMS-ISA is to collect multi-topic, household-level panel data with a special focus on improving agriculture statistics and generating a clearer understanding of the link between agriculture and other sectors of the economy. The project also aims to build capacity, share knowledge across countries, and improve survey methodologies and technology.
ESS is a long-term project to collect panel data. The project responds to the data needs of the country, given the dependence of a high percentage of households on agricultural activities. The ESS collects information on household agricultural activities along with other information on the households, such as human capital, other economic activities, and access to services and resources. The ability to follow the same households over time makes the ESS a new and powerful tool for studying and understanding the role of agriculture in household welfare over time, as it allows analyses of how households add to their human and physical capital, how education affects earnings, and the role of government policies and programs on poverty, inter alia. The ESS is the first panel survey to be carried out by the CSA that links a multi-topic household questionnaire with detailed data on agriculture.
National Coverage.
Households
ESS uses a nationally representative sample of over 5,000 households living in rural and urban areas. The urban areas include both small and large towns.
Sample survey data [ssd]
The sample is a two-stage probability sample. The first stage of sampling entailed selecting primary sampling units, or CSA enumeration areas (EAs). A total of 433 EAs were selected based on probability proportional to size of the total EAs in each region. For the rural sample, 290 EAs were selected from the AgSS EAs. A total of 43 and 100 EAs were selected for small town and urban areas, respectively. In order to ensure sufficient sample size in the most populous regions (Amhara, Oromiya, SNNP, and Tigray) and Addis Ababa, quotas were set for the number of EAs in each region. The sample is not representative for each of the small regions including Afar, Benshangul Gumuz, Dire Dawa, Gambella, Harari, and Somalie regions. However, estimates can be produced for a combination of all smaller regions as one “other region” category. A more detailed description of the sample design is provided in Section 3 of the Basic Information Document provided under the Related Materials tab.
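The first-stage selection described above relies on probability proportional to size (PPS). A minimal sketch of one common variant, systematic PPS, follows. The function name, the toy EA sizes and the seeded random generator are illustrative assumptions; this is not a reconstruction of the CSA's actual selection procedure, which is documented in the Basic Information Document.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length
    sum(sizes), then take n equally spaced points from a random start.
    A unit larger than the sampling interval can be hit more than once.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not sizes or any(s <= 0 for s in sizes):
        raise ValueError("sizes must be non-empty and positive")
    total = sum(sizes)
    step = total / n
    start = rng.random() * step          # random start in [0, step)
    points = [start + k * step for k in range(n)]

    selected = []
    cumulative = 0
    next_point = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Every point that falls inside this unit's segment selects it.
        while next_point < n and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    return selected


# Hypothetical household counts for four enumeration areas.
ea_sizes = [10, 50, 5, 35]
selected = pps_systematic(ea_sizes, 2, random.Random(0))
```

Larger EAs occupy a longer stretch of the line and are therefore more likely to contain one of the equally spaced selection points.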
Mixed data collection mode
The interviews were carried out using pen-and-paper (PAPI) as well as computer-assisted personal interviewing (CAPI) methods. A concurrent data entry arrangement was implemented for PAPI. In this arrangement, the enumerators did not wait until all the interviews were completed. Rather, once the enumerators completed approximately 3-4 questionnaires, supervisors collected these interviews from enumerators and brought them to the branch offices for data entry. This process took place as enumerators continued administering interviews with other households. The questionnaires were then keyed at the branch offices as soon as they were completed, using the CSPro data entry application software. The data from the completed questionnaires were then checked for any interview or data entry errors using a Stata program. Data entry errors were flagged for the data entry clerks, and the interview errors were sent back to the field for correction and feedback to the ongoing interviews. Several rounds of this process were undertaken until the final data files were produced. Additional cleaning was carried out, as needed, by checking the hard copies. In ESS3, CAPI (with a Survey Solutions platform) was used to collect the community data in large town areas.
During wave 3, 1255 households were re-interviewed yielding a response rate of 85 percent. Attrition in urban areas is 15% due to consent refusal and inability to trace the whereabouts of sample households.
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society Wave 2 Nurse Health Assessment, conducted in 2010-2012, was completed with 15,646 adult participants from the General Population component living in England, Scotland or Wales who completed a full Wave 2 interview. In addition, blood samples were obtained from 9,920 individuals. The Wave 3 Nurse Health Assessment, conducted in 2011-2012, was completed with the BHPS sample component. Assessments were conducted with 5,053 individuals and blood samples were obtained from 3,366 individuals. The Nurse Health Assessment, which included physical measures, such as height, weight, lung function, blood pressure and grip strength, as well as a range of blood samples, followed the main wave interview by approximately five months. As well as a range of blood analytes, two proteomic panels have been produced and a number of epigenetic ageing variables have been derived. The physical measures, biomarkers and questionnaire data from the Nurse Health Assessment interview are available from the UK Data Service. Genetics and epigenetic information is also available, with and without survey data; see the Understanding Society website for more information - https://www.understandingsociety.ac.uk/topic-page/biomarkers-genetics-and-epigenetics.
For information on the main Understanding Society study, see SN 6614, Understanding Society and Harmonised BHPS.
The Special Licence version of the Understanding Society: Nurse Health Assessment study is held under SN 7587. It contains variables covering prescription medication codes and associated usage questions, together with polygenic score variables, derived from analysis of the genetics data, that are not included in the standard End User Licence version (SN 7251). Users are advised to check that study first to see if the data are suitable for their needs before making an application for the Special Licence version. See documentation for further details.
Latest edition information
For the 6th edition (June 2025) five new biomarker variables have been deposited. There is also a new ageing (clock) variable and others have been renamed. Various other changes have also been made across multiple data files. The User Guides have also been updated. For full details please refer to the document '7251_revisions_june_2025.pdf' and to the User Guides.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain over 2,047 variables.
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex, and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
The Harmonized Histories is an international comparative dataset, created through harmonising data from existing surveys into one common format. The aim of Harmonized Histories is to facilitate cross-national research on topics related to transition to adulthood, family formation, and childbearing. The dataset focuses on fertility and partnership histories but also captures information on socio-economic status, place of residence and information on the childhood family. You can find more information about Harmonized Histories and access to the datasets from other countries via the Generations & Gender Programme (GGP) website.
Two datasets are provided. The first includes all people aged 16 or over who participated in the full interview of Wave 1 of the Understanding Society project, with the data as collected at Wave 1. The second dataset follows the people in the first dataset prospectively. Thus, it includes all the retrospective information from the first dataset, updated when things changed, for instance when the partners got married or had children. For more information please refer to the User Guide.
Harmonized Histories uses Understanding Society for data on the UK. As Harmonized Histories is a cross-national project, please note that the variable naming conventions and terminology used in this dataset are different to the standard Understanding Society naming and terms.
Further information may also be found on the Understanding Society mainstage webpage, and links to publications based on the study can be found on the Understanding Society Latest Research webpage (https://www.understandingsociety.ac.uk/research/publications).
Understanding Society acknowledges Professor Brienna Perelli-Harris, Dr Niels Blom and Karolin Kubisch for making this dataset available to Understanding Society.
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata, although SPSS and tab-delimited text versions are also available if needed. Users should note that transfer to other software formats may result in unforeseen issues.
Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex, and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.
This release combines fourteen waves of Understanding Society data with harmonised data from all eighteen waves of the BHPS. As multi-topic studies, the purpose of Understanding Society and BHPS is to understand short- and long-term effects of social and economic change in the UK at the household and individual levels. The study has a strong emphasis on domains of family and social ties, employment, education, financial resources, and health. Understanding Society is an annual survey of each adult member of a nationally representative sample. The same individuals are re-interviewed in each wave approximately 12 months apart. When individuals move they are followed within the UK, and anyone joining their households is also interviewed as long as they are living with them. The study has five sample components: the general population sample; a boost sample of ethnic minority group members; an immigrant and ethnic minority boost sample (from wave 6); participants from the BHPS; and a second general population boost sample added at this wave. In addition, there is the Understanding Society Innovation Panel, which is a separate standalone survey (see SN 6849). The fieldwork period for each wave is 24 months. Data collection uses computer assisted personal interviewing (CAPI) and web interviews (from wave 7), and includes a telephone mop-up. From March 2020 (the end of wave 10 and the 2nd year of wave 11), due to the coronavirus pandemic, face-to-face interviews were suspended and the survey was conducted by web and telephone only, but otherwise continued as before. Face-to-face interviewing was resumed from April 2022. One person completes the household questionnaire. Each person aged 16 or over is invited to complete the individual adult interview and self-completion questionnaire. Parents are asked questions about their children under 10 years old. Youths aged 10 to 15 are asked to respond to a self-completion questionnaire.
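The age-based routing of household members to instruments described above can be summarised in a few lines. This is only a sketch of the rule as stated in the text; the function name and the returned labels are illustrative and are not study variables.

```python
def instrument_for_age(age):
    """Return which instrument covers a household member of a given age."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age >= 16:
        # Adults complete the individual interview and self-completion.
        return "adult interview"
    if age >= 10:
        # Youths aged 10 to 15 answer a self-completion questionnaire.
        return "youth self-completion"
    # Children under 10 are covered by questions put to their parents.
    return "reported by parent"


labels = [instrument_for_age(a) for a in (4, 10, 15, 16, 70)]
```

The household questionnaire itself is separate from this routing, since it is completed once per household by a single member.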
For the general and BHPS samples, biomarker, genetic and epigenetic data are also available. The biomarker data, and summary genetics and epigenetic scores, are available via UKDS (see SN 7251); detailed genetics and epigenetics data are available by application (see below). In 2020-21 an additional frequent web survey was separately issued to sample members to capture data on the rapid changes in people’s lives due to the COVID-19 pandemic (see SN 8644). Participants are asked for consent to link their data to wide-ranging administrative data sets (see below).
Further information may be found on the Understanding Society Main stage webpage and links to publications based on the study can be found on the Understanding Society Latest Research webpage.
Co-funders
In addition to the Economic and Social Research Council, co-funders for the study included the Department for Work and Pensions, the Department for Education, the Department for Transport, the Department for Culture, Media and Sport, the Department for Communities and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department for Environment, Food and Rural Affairs, and the Food Standards Agency.
End User Licence, Special Licence and Secure Access versions:
There are three versions of the main Understanding Society data with different access conditions. One is available under the standard End User Licence (EUL) agreement (SN 6614), one is a Special Licence (SL) version (this study) and the third is a Secure Access version (SN 6676). The SL version contains month as well as year of birth variables, more detailed country and occupation coding for a number of variables, various income variables that have not been top-coded, and other potentially sensitive variables (see 6931_eul_vs_sl_variable_differences document available with the SL version for full details of the differences). The Secure Access version, in addition to containing all the variables in the SL version, also contains day of birth as well as Grid Reference geographical variables. Users are advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL and Secure Access versions of the data have more restrictive access conditions and prospective users of those versions should visit the catalogue entries for SN 6931 and SN 6676 respectively for further information.
Low- and Medium-level geographical identifiers are also available subject to SL access conditions; see SNs 6666, 6668-6675, 7453-4, 7629-30, 7245, 7248-9 and 9169-9170. Schools data are available subject to SL access conditions in SN 7182. Higher Education establishments for Wave 5 are available subject to SL access conditions in SN 8578. Interviewer Characteristics data, also subject to SL access conditions, are available in SN 8579. In addition, a fine detail geographic dataset (SN 6676) is available under more restrictive Secure Access conditions; it contains National Grid postcode grid references (at 1m resolution) for the unit postcode of each household surveyed, derived from ONS Postcode Directories (ONSPD). For details on how to make an application for the Secure Access dataset, please see the SN 6676 catalogue record.
How to access genetic and/or bio-medical sample data from Understanding Society:
Information on how to access genetics and epigenetics data directly from the study team is available on the Understanding Society Accessing data webpage.
Linked administrative data
Linked Understanding Society / administrative data are available on a number of different platforms. See the Understanding Society Data linkage webpage for details of those currently available and how they can be accessed.
Latest edition information
For the 18th edition (November 2024) Wave 14 data has been added. Other minor changes and corrections have also been made to Waves 1-13. Please refer to the revisions document for full details.
m_hhresp and n_hhresp files updated, December 2024
In the previous release (18th edition, November 2024), there was an issue with household income estimates in m_hhresp and n_hhresp where a household resides in a new local authority (approx. 300 households in wave 14). The issue has been corrected and imputation models re-estimated and imputed values updated for the full sample. Imputed values will therefore change compared to the versions in the original release. The variables affected are w_ficountax_dv, w_fihhmnnet3_dv, n_fihhmnnet4_dv and n_ctband_dv.
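The file and variable names above carry a one-letter wave prefix, with `n_` corresponding to wave 14. A minimal sketch of decoding that prefix follows. It assumes that prefixes run alphabetically from `a_` for wave 1 and that `w_` is a generic "any wave" placeholder in documentation; both assumptions should be checked against the User Guide. The function name is hypothetical, and two-letter prefixes used for harmonised BHPS waves are deliberately not handled.

```python
import string


def wave_from_name(name, placeholder="w"):
    """Decode the wave number from a prefixed file or variable name.

    Returns None for the generic placeholder prefix (assumed to be 'w').
    Raises ValueError for names without a single-letter prefix.
    """
    prefix, separator, _rest = name.partition("_")
    if not separator or len(prefix) != 1 or prefix not in string.ascii_lowercase:
        raise ValueError(f"no single-letter wave prefix in {name!r}")
    if prefix == placeholder:
        return None
    # 'a' -> 1, 'b' -> 2, ... (assumed alphabetical ordering of waves)
    return string.ascii_lowercase.index(prefix) + 1


waves = {name: wave_from_name(name) for name in ("m_hhresp", "n_hhresp", "w_ficountax_dv")}
```

Under these assumptions, `m_hhresp` and `n_hhresp` decode to waves 13 and 14, while the `w_`-prefixed names are treated as wave-generic.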
Suitable data analysis software
These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain over 2,047 variables.
A theoretical mechanism was analyzed from the micro perspective of the enterprise to explore how information accessibility moderates the effect of accounting manipulation on the sustainable development of digital enterprises. Using data from 1200 listed digital enterprises in China and the DEA-Malmquist index method, the efficiency value of digital enterprises in 2007–2021 was estimated to represent the index of sustainable development of digital enterprises. Accounting manipulation was detected using the panel PSM-DID method based on the Administrative Measures for the Recognition of High-tech Enterprises policy. The information accessibility value was estimated based on the MDA method. Empirical studies were conducted using text analysis, the panel PSM-DID method, and the double moderating effect model. The results showed that: (1) Accounting manipulation had a negative impact on the sustainable development of "true" digital enterprises and the "fake" digital enterprises; (2) In...
Descriptive statistics for variables and data are shown in Table 1. The samples were selected from the enterprise list of the DE sector of the Shenzhen and Shanghai Stock Exchanges, excluding those listed or delisted in or after 2007, ST enterprises, and ST* enterprises. The data on 1200 digital enterprises between 2007 and 2021 were from the Guotai’an database (www.gtarsc.com/) and the annual reports of listed companies in the Shenzhen and Shanghai Stock Exchanges.
Variable: Symbol
The main dependent variable
* Total factor productivity: TFP
* Technical efficiency: TE
* Scale efficiency: SE
* Technological progress: TECH
Efficiency measurement index system
* Input, labor input: lnl
* Input, capital input: lnk
* Output, profit: lny
* Output, intangible assets: lni
The main independent variables
* Company research and development exercise, vertical virtual variable: PsdHiT
* Whether companies are identified as high H...
Evaluation was done by using Stata 15.0.
=====================
Provenance for this README
--------------------------
* File name: README.txt
* Authors: Shujuan Wu
* Other contributors: Minmin Li, Jianhua Xiao, Jianhua Tang
* Date created: 2023-07-03
* Date modified: 2024-02-02
Dataset Version and Release History
-----------------------------------
* Current Version:
  * Number: 12.0.0
  * Date: 2024-02-02
  * Persistent identifier: DOI: 10.5061/dryad.jh9w0vtg8
  * Summary of changes: n/a
* Embargo Provenance: n/a
  * Scope of embargo: n/a
  * Embargo period: n/a
Dataset Attribution and Usage
-----------------------------
* Dataset Title: Data for: Information accessibility, accounting manipulation, and sustainable development of digital enterprises: Based on double moderating effect model and panel PSM-DID method
* Persistent Identifier: DOI: 10.5061/dryad.jh9w0vtg8
* Dataset Contributors:
  * Creators: Shujuan Wu, Minmin Li, Jianhua Xiao, Jianhua Tang
* Date of Issue: 2023-03-31
* Publisher: ...
The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas that were originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations targeting the 2,508 households that were interviewed in 2016. The panel sample expanded each wave through the tracking of split-off individuals and the new households that they formed. Available as part of this project is the IHPS 2019 data, the IHPS 2016 data as well as the rereleased IHPS 2010 & 2013 data including only the subsample of 102 EAs with updated panel weights. Additionally, the IHPS 2016 was the first survey that received complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which has been established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices. 
The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data, and is, at start, a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship – following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.
National coverage
The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas, as well as individuals who moved away from the 2013 dwellings between 2013 and 2016, as long as they were neither servants nor guests at the time of the IHPS 2013, were projected to be at least 12 years of age, and were known to be residing in mainland Malawi, excluding those in Likoma Island and in institutions (including prisons, police compounds, and army barracks).
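The tracking criteria for individuals who moved away can be read as a single yes/no rule. The sketch below encodes that rule as stated in the text; the class, field and function names are hypothetical and do not correspond to variables in the released data.

```python
from dataclasses import dataclass


@dataclass
class Mover:
    """A person who left their IHPS 2013 dwelling (illustrative fields only)."""
    servant_or_guest_in_2013: bool
    projected_age: int
    in_mainland_malawi: bool   # False for Likoma Island or abroad
    in_institution: bool       # prison, police compound, army barracks, ...


def eligible_for_tracking(person):
    """Apply the tracking criteria listed in the coverage description."""
    return (
        not person.servant_or_guest_in_2013
        and person.projected_age >= 12
        and person.in_mainland_malawi
        and not person.in_institution
    )


example = Mover(
    servant_or_guest_in_2013=False,
    projected_age=14,
    in_mainland_malawi=True,
    in_institution=False,
)
should_track = eligible_for_tracking(example)
```

Failing any one of the four conditions is enough to exclude a mover from tracking.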
Sample survey data [ssd]
A sub-sample of IHS3 2010 sample enumeration areas (EAs) (i.e. 204 EAs out of 768 EAs) was selected prior to the start of the IHS3 field work with the intention to (i) track and resurvey these households in 2013 in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013) and (ii) visit a total of 3,246 households in these EAs twice to reduce recall associated with different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional, urban/rural levels and for each of the following 6 strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place during the period of April-October 2013, with residual tracking operations in November-December 2013.
Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.
Computer Assisted Personal Interview [capi]
Data Entry Platform
To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out the IHPS 2019, one laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer that the NSO provided. The use of Survey Solutions allowed for the real-time availability of data, as completed interviews were approved by the supervisor and synced to the headquarters server as frequently as possible. While administering the first module of the questionnaire, the enumerators also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from the tablets complemented the GPS measurements taken with the Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables in the analysis: extensive measures of distance (e.g. distance to the nearest market), climatology, soil and terrain, and other environmental factors.
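As an illustration of how a distance variable such as distance to the nearest market could be derived from geo-referenced household locations, the sketch below computes great-circle (haversine) distance. This is an assumption made for illustration only: the survey documentation does not state which distance measure was used, and the function names and any coordinates passed in are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_km(household, markets):
    """Smallest haversine distance from a (lat, lon) household to any (lat, lon) market."""
    return min(
        haversine_km(household[0], household[1], lat, lon)
        for lat, lon in markets
    )
```

A straight-line measure like this ignores roads and terrain, so it would understate travel distance in most settings.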
Data Management
The IHPS 2019 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHPS 2019 interviews were mainly collected in “sample” mode (assignments generated from headquarters), which gave the NSO more control over the sample, and a few were collected in “census” mode (new interviews created by interviewers from a template). This hybrid approach was necessary to aid the tracking operations: enumerators were mostly working in areas with poor network connection and could not quickly receive tracking cases from headquarters, so census mode let them create a tracking assignment quickly themselves.
The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013 and IHPS 2016. Prior programming of the data entry application allowed for a wide variety of range and consistency checks to be conducted and reported, and for potential issues to be investigated and corrected before closing the assigned enumeration area. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHPS 2019 server. Because the data were available in real time, they were monitored closely throughout the entire data collection period. Upon receipt at headquarters, the data were exported to Stata for further consistency checks, data cleaning, and analysis.
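To make the distinction concrete, a range check tests one answer against allowed bounds, while a consistency check compares two or more answers against each other. The sketch below is a hypothetical illustration in Python, not the actual Survey Solutions validation logic; the field names, the age bounds, and the minimum plausible age for postgraduate education are all assumptions.

```python
# Assumed bounds, chosen only for illustration.
MIN_AGE, MAX_AGE = 0, 120
MIN_POSTGRADUATE_AGE = 20


def check_record(record):
    """Return a list of error messages for one hypothetical respondent record."""
    errors = []
    age = record.get("age")
    education = record.get("education")

    # Range check: a single value must fall inside allowed bounds.
    if age is None or not (MIN_AGE <= age <= MAX_AGE):
        errors.append("age is missing or out of range")
    # Consistency check: two answers must be plausible together.
    elif education == "postgraduate" and age < MIN_POSTGRADUATE_AGE:
        errors.append("education level is implausible for reported age")

    return errors
```

A record that triggers a message is not necessarily wrong; it is flagged so that it can be investigated and either corrected or explained.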
Data Cleaning
The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage of data cleaning was conducted in the field by the field teams, utilizing error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For questions that flagged an error, the enumerators were expected to record a comment within the questionnaire to explain to their supervisor the reason for the error and to confirm that they had double-checked the response with the respondent. The supervisors were expected to sync the enumerator tablets as frequently as possible to avoid having many questionnaires on the tablet, and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets before syncing, but they would still record their notes in the supervisor account and reject questionnaires accordingly. The second stage of data cleaning was also done in the field, and resulted from the additional error reports generated in Stata, which were in turn sent to the field teams via email or Dropbox. The field supervisors collected the reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected errors. Due to the quick turnaround in error reporting, it was possible to conduct call-backs, when required, while the team was still operating in the EA. Corrections to the data were entered in the rejected questionnaires and sent back to headquarters.
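The second-stage error reports were generated in Stata; the exact report layout is not described here. Purely as an illustration of the idea of splitting flagged errors into one report per field supervisor, the following Python sketch groups hypothetical error flags by supervisor and orders them by enumeration area and interview. All field names (`supervisor`, `ea_id`, `interview_id`, `message`) are assumptions.

```python
from collections import defaultdict


def build_error_reports(flags):
    """Group hypothetical error flags into one sorted list per supervisor."""
    grouped = defaultdict(list)
    for flag in flags:
        grouped[flag["supervisor"]].append(flag)
    # Sort each report by EA, then interview, so a team can work through
    # one enumeration area at a time while still operating there.
    return {
        supervisor: sorted(items, key=lambda f: (f["ea_id"], f["interview_id"]))
        for supervisor, items in grouped.items()
    }
```

Each supervisor would then receive only the flags relevant to their own assignments.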
The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, as enumerators asked the questions and recorded information, error messages were provided immediately when the information recorded did not match previously defined rules for that variable. For example, an error would be flagged if the education level of a 12-year-old respondent was given as postgraduate. The second stage occurred during the review of the questionnaire by the Field Supervisor. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction. The enumerator can write a comment to explain why the data appear to be incorrect. This would apply, for example, if the previously mentioned 12-year-old was, in fact, a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters where the NSO staff would again review the data for errors and verify the comments from the