35 datasets found
  1. m

    Example Stata syntax and data construction for negative binomial time series...

    • data.mendeley.com
    Updated Nov 2, 2022
    Cite
    Sarah Price (2022). Example Stata syntax and data construction for negative binomial time series regression [Dataset]. http://doi.org/10.17632/3mj526hgzx.2
    Dataset updated
    Nov 2, 2022
    Authors
    Sarah Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).

    The variables contained therein are defined as follows:

    case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).

    patid: a unique patient identifier.

    time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.

    ncons: number of consultations per month.

    period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.

    burden: binary variable denoting membership of one of two multimorbidity burden groups.

    We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).

    Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
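The variable definitions above describe a long-format panel with one row per patient per month. The following Python sketch shows that layout only; it is not a translation of dummy_dataset_create.do. The function name, the random placeholder values and the number of patients are assumptions for illustration, and the period0 to period9 variables are left out because their construction is defined in the do-file rather than in this description.

```python
import random


def build_dummy_panel(n_patients=4, n_periods=10, seed=0):
    """Return a long-format panel: one row per patient per time period."""
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_patients + 1):
        case = rng.randint(0, 1)    # 1 = case, 0 = control (arbitrary draw)
        burden = rng.randint(0, 1)  # one of two multimorbidity burden groups
        for time_period in range(n_periods):  # 0 .. n_periods - 1
            rows.append({
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": time_period,
                # Placeholder monthly consultation count; values are arbitrary.
                "ncons": rng.randint(0, 5),
            })
    return rows


panel = build_dummy_panel()
print(len(panel))  # 4 patients x 10 periods = 40 rows
```

Patient-level variables (case, burden) repeat on every row for that patient, while time_period and ncons vary within a patient, which is the shape a panel count regression expects.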

  2. d

    Panel on Household Finances 3rd Wave - Dataset - B2FIND

    • demo-b2find.dkrz.de
    Updated Sep 20, 2025
    Cite
    (2025). Panel on Household Finances 3rd Wave - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/eee2cda8-d9bf-5d3b-9d71-21b5a92b18f2
    Dataset updated
    Sep 20, 2025
    Description

    The PHF scientific use file (SUF) Wave 3 Version 3.0 data set is the second updated version of the wave 3 PHF data set and consists of the following five Stata files: PHF_h_wave3_v3_0.dta, PHF_p_wave3_v3_0.dta, PHF_m_wave3_v3_0.dta, PHF_d_wave3_v3_0.dta and PHF_w_wave3_v3_0.dta. The major changes in PHF SUF Wave 3 Version 3.0 compared to the previous version, PHF SUF Wave 3 Version 2.0, are as follows: one variable was edited and corrected; the data set contains additional variables; some variables were removed. For more details, see the PHF User Guide on the website of the Deutsche Bundesbank.

    Face-to-face interview: CAPI/CAMI; CATI option for personal interviews.

    All private households located in Germany except institutional households (in old-age homes, prisons, barracks etc.).

    Stratified random sample based on population registers; oversampling of wealthy households.

  3. w

    General Household Survey, Panel 2023-2024 - Nigeria

    • microdata.worldbank.org
    • microdata.nigerianstat.gov.ng
    • +2more
    Updated Nov 21, 2024
    Cite
    National Bureau of Statistics (NBS) (2024). General Household Survey, Panel 2023-2024 - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/6410
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    National Bureau of Statistics (NBS)
    Time period covered
    2023 - 2024
    Area covered
    Nigeria
    Description

    Abstract

    The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2023/24 GHS-Panel is the fifth round of the survey with prior rounds conducted in 2010/11, 2012/13, 2015/16 and 2018/19. The GHS-Panel households were visited twice: during post-planting period (July - September 2023) and during post-harvest period (January - March 2024).

    Geographic coverage

    National

    Analysis unit

    • Households • Individuals • Agricultural plots • Communities

    Universe

    The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The original GHS‑Panel sample was fully integrated with the 2010 GHS sample. The GHS sample consisted of 60 Primary Sampling Units (PSUs) or Enumeration Areas (EAs), chosen from each of the 37 states in Nigeria. This resulted in a total of 2,220 EAs nationally. Each EA contributed 10 households to the GHS sample, resulting in a sample size of 22,200 households. Out of these 22,200 households, 5,000 households from 500 EAs were selected for the panel component, and 4,916 households completed their interviews in the first wave.

    After nearly a decade of visiting the same households, a partial refresh of the GHS‑Panel sample was implemented in Wave 4 and maintained for Wave 5. The refresh was conducted to maintain the integrity and representativeness of the sample. The refresh EAs were selected from the same sampling frame as the original GHS‑Panel sample in 2010. A listing of households was conducted in the 360 EAs, and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households.

    In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS‑Panel households from 2010 was selected to be included in the new sample. This “long panel” sample of 1,590 households was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across Nigeria’s six geopolitical zones.

    The combined sample of refresh and long panel EAs in Wave 5 that were eligible for inclusion consisted of 518 EAs based on the EAs selected in Wave 4. The combined sample generally maintains both the national and zonal representativeness of the original GHS‑Panel sample.
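The sample sizes quoted in the sampling procedure follow from simple multiplication. The short Python check below recomputes them from the figures stated in the text; the variable names are illustrative only.

```python
# Original 2010 GHS sample, using the figures stated in the text.
eas_per_state = 60
states = 37
households_per_ea = 10

total_eas = eas_per_state * states                # 60 x 37
total_households = total_eas * households_per_ea  # 10 households per EA

# Refresh sample: 10 households in each of 360 EAs.
refresh_households = 360 * households_per_ea

# Households per EA implied by the panel figures quoted in the text.
panel_hh_per_ea = 5000 // 500        # original panel component
long_panel_hh_per_ea = 1590 // 159   # long panel sample

print(total_eas, total_households, refresh_households)
print(panel_hh_per_ea, long_panel_hh_per_ea)
```

Both the original panel component and the long panel work out to 10 households per EA, matching the per-EA allocation used for the GHS and refresh samples.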

    Sampling deviation

    Although 518 EAs were identified for the post-planting visit, conflict events prevented interviewers from visiting eight EAs in the North West zone of the country. The EAs were located in the states of Zamfara, Katsina, Kebbi and Sokoto. Therefore, the final number of EAs visited both post-planting and post-harvest comprised 157 long panel EAs and 354 refresh EAs. The combined sample is also roughly equally distributed across the six geopolitical zones.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The GHS-Panel Wave 5 consisted of three questionnaires for each of the two visits. The Household Questionnaire was administered to all households in the sample. The Agriculture Questionnaire was administered to all households engaged in agricultural activities such as crop farming, livestock rearing, and other agricultural and related activities. The Community Questionnaire was administered to the community to collect information on the socio-economic indicators of the enumeration areas where the sample households reside.

    GHS-Panel Household Questionnaire: The Household Questionnaire provided information on demographics; education; health; labour; childcare; early child development; food and non-food expenditure; household nonfarm enterprises; food security and shocks; safety nets; housing conditions; assets; information and communication technology; economic shocks; and other sources of household income. Household location was geo-referenced in order to be able to later link the GHS-Panel data to other available geographic data sets (forthcoming).

    GHS-Panel Agriculture Questionnaire: The Agriculture Questionnaire solicited information on land ownership and use; farm labour; inputs use; GPS land area measurement and coordinates of household plots; agricultural capital; irrigation; crop harvest and utilization; animal holdings and costs; household fishing activities; and digital farming information. Some information is collected at the crop level to allow for detailed analysis for individual crops.

    GHS-Panel Community Questionnaire: The Community Questionnaire solicited information on access to infrastructure and transportation; community organizations; resource management; changes in the community; key events; community needs, actions, and achievements; social norms; and local retail price information.

    The Household Questionnaire was slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.

    The Agriculture Questionnaire collected different information during each visit, but for the same plots and crops.

    The Community Questionnaire collected prices during both visits, and different community level information during the two visits.

    Cleaning operations

    CAPI: The Wave 5 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community questionnaires) were implemented in both the post-planting and post-harvest visits of Wave 5 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Living Standards Measurement Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given a tablet which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.

    DATA COMMUNICATION SYSTEM: The data communication system used in Wave 5 was highly automated. Each field team was given a mobile modem which allowed for internet connectivity and daily synchronization of their tablets. This ensured that head office in Abuja had access to the data in real time. Once the interview was completed and uploaded to the server, the data was first reviewed by the Data Editors. The data was also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated after the Stata do-file had been run on the raw dataset. Information contained in the Excel error files was then communicated back to the respective field interviewers for their action. This monitoring activity was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.

    DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection and designed to highlight many of the errors that occurred during the fieldwork.

    The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once the interview is completed and uploaded to the server, the Data Editors review completed interviews for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer’s tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample and, where there were differences, these were properly assessed and documented. The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning was then approved by the Data Editor. After the Data Editor’s approval of the interview on the Survey Solutions server, Headquarters also reviews it and, depending on the outcome, can either reject or approve it.

    The third stage of cleaning involved a comprehensive review of the final raw data following the first and second stage cleaning. Every variable was examined individually for (1) consistency with other sections and variables, (2) out of range responses, and (3) outliers. However, special care was taken to avoid making strong assumptions when resolving potential errors. Some minor errors remain in the data where the diagnosis and/or solution were unclear to the data cleaning team.
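The three kinds of check named in the third stage (consistency across variables, out-of-range responses and outliers) can be sketched in Python as follows. This is an illustration only, not the survey team's Stata code: the variable names (hhid, hh_size, n_children, head_age), the age range and the outlier threshold are all hypothetical.

```python
from statistics import median


def check_records(records, age_range=(0, 120), outlier_factor=10):
    """Return a list of (hhid, variable, message) flags; data is not modified."""
    flags = []
    typical_size = median(r["hh_size"] for r in records)
    for r in records:
        # (1) Consistency with other variables.
        if r["n_children"] > r["hh_size"]:
            flags.append((r["hhid"], "n_children", "exceeds household size"))
        # (2) Out-of-range responses.
        if not age_range[0] <= r["head_age"] <= age_range[1]:
            flags.append((r["hhid"], "head_age", "out of range"))
        # (3) Outliers relative to the typical value.
        if r["hh_size"] > outlier_factor * typical_size:
            flags.append((r["hhid"], "hh_size", "possible outlier"))
    return flags


records = [
    {"hhid": 1, "hh_size": 5, "n_children": 2, "head_age": 40},
    {"hhid": 2, "hh_size": 4, "n_children": 6, "head_age": 35},
    {"hhid": 3, "hh_size": 6, "n_children": 1, "head_age": 150},
    {"hhid": 4, "hh_size": 80, "n_children": 3, "head_age": 50},
]
for flag in check_records(records):
    print(flag)
```

The function only reports flags and leaves the values untouched, in keeping with the stated aim of avoiding strong assumptions when resolving potential errors.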

    Response

  4. 2

    Understanding Society, Waves 1-, 2008- : Safeguarded/Special Licence

    • datacatalogue.ukdataservice.ac.uk
    Updated Jul 22, 2022
    Cite
    University of Essex, Institute for Social and Economic Research (2022). Understanding Society, Waves 1-, 2008- : Safeguarded/Special Licence [Dataset]. http://doi.org/10.5255/UKDA-SN-8987-1
    Dataset updated
    Jul 22, 2022
    Dataset provided by
    UK Data Service: https://ukdataservice.ac.uk/
    Authors
    University of Essex, Institute for Social and Economic Research
    Time period covered
    Jan 1, 2020 - Dec 31, 2020
    Area covered
    United Kingdom
    Description

    Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex, and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on, and incorporates, the British Household Panel Survey (BHPS), which began in 1991.

    The Understanding Society: Calendar Year Dataset, 2020, is designed to enable cross-sectional analysis of individuals and households relating specifically to their annual interviews conducted in the year 2020, and, therefore, combines data collected in three waves (Waves 10, 11 and 12). It has been produced from the same data collected in the main Understanding Society study and released in the longitudinal datasets SN 6614 (End User Licence) and SN 6931 (Special Licence). Such cross-sectional analysis can, however, only involve variables that are collected in every wave in order to have data for the full sample panel. The 2020 dataset is the first of a series of planned Calendar Year Datasets to facilitate cross-sectional analysis of specific years. Full details of the Calendar Year Dataset sample structure (including why some individual interviews from 2021 are included), data structure and additional supporting information can be found in the document '8987_calendar_year_dataset_2020_user_guide'.

    As a multi-topic study, Understanding Society aims to understand the short- and long-term effects of social and economic change in the UK at the household and individual levels. The study has a strong emphasis on domains of family and social ties, employment, education, financial resources, and health. Understanding Society is an annual survey of each adult member of a nationally representative sample. The same individuals are re-interviewed in each wave approximately 12 months apart. When individuals move they are followed within the UK, and anyone joining their households is also interviewed as long as they are living with them. The fieldwork period for a single wave is 24 months. Data collection uses computer-assisted personal interviewing (CAPI) and web interviews (from wave 7), and includes a telephone mop-up. From March 2020 (the end of wave 10 and 2nd year of wave 11), due to the coronavirus pandemic, face-to-face interviews were suspended and the survey has been conducted by web and telephone only, but otherwise has continued as before. One person completes the household questionnaire. Each person aged 16 or older participates in the individual adult interview and self-completed questionnaire. Youths aged 10 to 15 are asked to respond to a paper self-completion questionnaire. In 2020 an additional frequent web survey was separately issued to sample members to capture data on the rapid changes in people’s lives due to the COVID-19 pandemic (see SN 8644). The COVID-19 Survey data are not included in this dataset.

    Further information may be found on the Understanding Society main stage webpage (https://www.understandingsociety.ac.uk/documentation/mainstage), and links to publications based on the study can be found on the Understanding Society Latest Research webpage.

    Co-funders
    In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.

    End User Licence and Special Licence versions:
    There are two versions of the Calendar Year 2020 data. One is available under the standard End User Licence (EUL) agreement, and the other is a Special Licence (SL) version. The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables and various income variables have not been top-coded (see xxxx_eul_vs_sl_variable_differences for more details). Users are advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (EUL) and 6931 (SL).

    Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2020 dataset, subject to SL access conditions. See the User Guide for further details.

    Suitable data analysis software
    These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,900 variables.

  5. f

    Replication package in Stata format, include: 1.main_data.dta: Province...

    • figshare.com
    zip
    Updated Oct 29, 2025
    Cite
    Xuan Lin; Yuantao Jiang (2025). Replication package in Stata format, include: 1.main_data.dta: Province panel dataset (2012–2022) containing core variables, mechanism variables, control variables. [Dataset]. http://doi.org/10.1371/journal.pone.0335065.s002
    Dataset updated
    Oct 29, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Xuan Lin; Yuantao Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    2. incomeCMDS.dta: Micro-level dataset from the China Migrants Dynamic Survey (CMDS). 3. dofile.do: Stata 17.0 script for data loading, variable construction, and regression analysis. (ZIP)

  6. i

    Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102...

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    • +1more
    Updated Jul 19, 2023
    Cite
    National Statistical Office (NSO) (2023). Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs) - Malawi [Dataset]. http://catalog.ihsn.org/catalog/8702
    Dataset updated
    Jul 19, 2023
    Dataset authored and provided by
    National Statistical Office (NSO)
    Time period covered
    2010 - 2019
    Area covered
    Malawi
    Description

    Abstract

    The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas that were originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations targeting the 2,508 households that were interviewed in 2016. The panel sample expanded each wave through the tracking of split-off individuals and the new households that they formed. Available as part of this project is the IHPS 2019 data, the IHPS 2016 data as well as the rereleased IHPS 2010 & 2013 data including only the subsample of 102 EAs with updated panel weights. Additionally, the IHPS 2016 was the first survey that received complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which has been established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices. 
    The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data, and is, at the outset, a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship, following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.

    Geographic coverage

    National coverage

    Analysis unit

    • Households
    • Individuals
    • Children under 5 years
    • Consumption expenditure commodities/items
    • Communities
    • Agricultural household/ Holder/ Crop

    Universe

    The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas as well as individuals that moved away from the 2013 dwellings between 2013 and 2016 as long as they were neither servants nor guests at the time of the IHPS 2013; were projected to be at least 12 years of age and were known to be residing in mainland Malawi but excluding those in Likoma Island and in institutions, including prisons, police compounds, and army barracks.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A sub-sample of IHS3 2010 sample enumeration areas (EAs) (i.e. 204 EAs out of 768 EAs) was selected prior to the start of the IHS3 field work with the intention to (i) track and resurvey these households in 2013 in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013) and (ii) visit a total of 3,246 households in these EAs twice to reduce recall associated with different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional, urban/rural levels and for each of the following 6 strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place during the period of April-October 2013, with residual tracking operations in November-December 2013.

    Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Cleaning operations

    Data Entry Platform: To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out IHPS 2019, 1 laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8-inch GPS-enabled Lenovo tablet computer that the NSO provided. The use of Survey Solutions allowed for the real-time availability of data, as completed interviews were approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from the tablets complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices, and these were linked with publicly available geospatial databases to enable the inclusion of a number of geospatial variables (extensive measures of distance, i.e. distance to the nearest market, climatology, soil and terrain, and other environmental factors) in the analysis.

    Data Management: The IHPS 2019 Survey Solutions CAPI-based data entry application was designed to streamline the data collection process from the field. IHPS 2019 interviews were mainly collected in “sample” mode (assignments generated from headquarters) and a few in “census” mode (new interviews created by interviewers from a template) for the NSO to have more control over the sample. This hybrid approach was necessary to aid the tracking operations whereby an enumerator could quickly create a tracking assignment, considering that they were mostly working in areas with poor network connection and hence could not quickly receive tracking cases from Headquarters.

    The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013 and IHPS 2016. Prior programming of the data entry application allowed for a wide variety of range and consistency checks to be conducted and reported, and potential issues investigated and corrected, before closing the assigned enumeration area. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor account. The work assignments and syncing of completed interviews took place through a Wi-Fi connection to the IHPS 2019 server. Because the data was available in real time, it was monitored closely throughout the entire data collection period, and upon receipt of the data at headquarters, data was exported to Stata for other consistency checks, data cleaning, and analysis.

    Data Cleaning: The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage of data cleaning was conducted in the field by the field teams utilizing error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For questions that flagged an error, the enumerators were expected to record a comment within the questionnaire to explain to their supervisor the reason for the error and to confirm that they had double-checked the response with the respondent. The supervisors were expected to sync the enumerator tablets as frequently as possible to avoid having many questionnaires on the tablet, and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets, so they would review prior to syncing but still record the notes in the supervisor account and reject questionnaires accordingly. The second stage of data cleaning was also done in the field, and this resulted from the additional error reports generated in Stata, which were in turn sent to the field teams via email or DropBox. The field supervisors collected reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected errors. Due to the quick turn-around in error reporting, it was possible to conduct call-backs while the team was still operating in the EA when required. Corrections to the data were entered in the rejected questionnaires and sent back to headquarters.
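The first-stage workflow described above, in which a flagged response stays in the questionnaire but must be accompanied by an enumerator comment, can be sketched in Python as follows. This is not Survey Solutions code or its API. The rule set, the field names and the decision to reject any flagged answer that lacks a comment are assumptions made for illustration.

```python
def review_interview(answers, comments, rules):
    """Return (decision, unexplained) for one questionnaire.

    A question is 'unexplained' when its rule fails and the enumerator
    left no comment. Assumed policy: reject if any such question exists.
    """
    unexplained = []
    for question, is_valid in rules.items():
        flagged = not is_valid(answers)
        has_comment = bool(comments.get(question, "").strip())
        if flagged and not has_comment:
            unexplained.append(question)
    decision = "reject" if unexplained else "approve"
    return decision, unexplained


# Hypothetical validation rules keyed by question name.
rules = {
    "age": lambda a: 0 <= a["age"] <= 110,
    "years_schooling": lambda a: a["years_schooling"] <= a["age"],
}

answers = {"age": 12, "years_schooling": 20}
print(review_interview(answers, {}, rules))
print(review_interview(
    answers, {"years_schooling": "Confirmed with respondent"}, rules))
```

Keeping the comment alongside the flagged value lets the supervisor distinguish unusual but verified answers from likely entry mistakes.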

    The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, as enumerators asked the questions and recorded information, error messages were provided immediately when the information recorded did not match previously defined rules for that variable, for example if the education level for a 12-year-old respondent was given as postgraduate. The second stage occurred during the review of the questionnaire by the Field Supervisor. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction. The enumerator can write a comment to explain why the data appears to be incorrect, for example if the previously mentioned 12-year-old was, in fact, a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters where the NSO staff would again review the data for errors and verify the comments from the

  7. d

    Replication Data for: Reexamining the Effect of Mass Shootings on Public...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Barney, David; Schaffner, Brian (2023). Replication Data for: Reexamining the Effect of Mass Shootings on Public Support for Gun Control [Dataset]. http://doi.org/10.7910/DVN/YJQIXP
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Barney, David; Schaffner, Brian
    Description

    This repository contains all replication materials for "Reexamining the Effect of Mass Shootings on Public Support for Gun Control" by David J. Barney and Brian F. Schaffner. The data included are as follows: 3 CCES panel datasets, supplementary data for merging, and 2 fully-merged CCES panel datasets. In addition, we include two .do files of Stata code, one of which prepares the data for analysis, and one of which replicates the analyses presented in our paper.

  8. H

    Replication data for Decompressing to prevent unrest

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Aug 29, 2025
    Cite
    David Altman (2025). Replication data for Decompressing to prevent unrest [Dataset]. http://doi.org/10.7910/DVN/MDCNVG
    Explore at:
    Dataset updated
    Aug 29, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    David Altman
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Replication Package – Decompressing to Prevent Unrest
    David Altman, Pontificia Universidad Católica de Chile

    1. Description
    This dataset accompanies the article: Altman, David. “Decompressing to prevent unrest: political participation through citizen-initiated mechanisms of direct democracy” (2025), Social Movement Studies. It contains the data and code necessary to replicate all statistical analyses and tables presented in the article.

    2. Coverage
    Time frame: 1970–2019
    Countries: 116 democracies worldwide (electoral and liberal, according to V-Dem v14)
    Unit of analysis: Country-year

    3. Data Sources
    V-Dem v14 (Coppedge et al., 2024): direct democracy indices (CIC-DPVI, TOC-DPVI), civil society participation index.
    NAVCO 1.3 (Chenoweth & Shay, 2020): violent and nonviolent resistance campaigns (dependent variable).
    World Bank, World Development Indicators: GDP per capita (constant 2015 US$), inflation.
    Author’s coding: harmonization and cleaning of datasets, construction of dependent variable (excluding self-determination/secession cases).

    4. Variables
    accepted: dichotomous dependent variable (1 if violent or nonviolent regime-change/“other” campaign occurred in a given year; 0 otherwise).
    CIC_DPVI: citizen-initiated component of V-Dem’s Direct Popular Vote Index.
    TOC_DPVI: top-down component of direct democracy (plebiscites, obligatory referenda).
    pc_GDP: GDP per capita (constant 2015 US$).
    Inflation: annual inflation (%).
    v2x_cspart: Civil Society Participation Index (V-Dem).
    country, year: identifiers.

    5. Files Included
    data.dta / data.csv – panel dataset used in the article.
    master.do – Stata do-file to reproduce all analyses.
    tables.do – generates Tables 1–2.
    figures.do – generates Figure 1 (coefficient plot).
    ReadMe.txt – this document.

    6. Instructions
    Open master.do in Stata (v17 or later). Set working directory to the folder containing the replication package. Run the file. This will:
    -Load data.dta
    -Estimate the models (fixed-effects and random-effects logit with lagged IVs)
    -Produce Tables 1–2 in /results/
    -Produce Figure 1 in /figures/

    7. Citation
    If you use this dataset, please cite: Altman, David (2025). Replication data for: Decompressing to Prevent Unrest: Political Participation through Citizen-Initiated Mechanisms of Direct Democracy. Harvard Dataverse. DOI: [to be added]

  9. D

    Replication Data for: A High Court Plays the Accordion: Validating Ex Ante...

    • dataverse.no
    • dataverse.azure.uit.no
    • +1more
    tsv, txt
    Updated Sep 28, 2023
    Cite
    Henrik L. Bentsen; Gunnar Grendstad; William R. Shaffer; Eric N. Waltenburg (2023). Replication Data for: A High Court Plays the Accordion: Validating Ex Ante Case Complexity on Oral Arguments [Dataset]. http://doi.org/10.18710/DWIX6Y
    Explore at:
    Available download formats: txt(6671), tsv(235966), txt(213402)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Henrik L. Bentsen; Gunnar Grendstad; William R. Shaffer; Eric N. Waltenburg
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The data set (saved in Stata *.dta and .txt) contains all observations (Norwegian supreme court cases 2008-2018 decided in five-justice panels) and variables (independent variables measuring complexity of cases and the dependent variable measuring time in hours scheduled for oral arguments) relevant for a complete replication of the study. ABSTRACT OF STUDY: While high courts with fixed time for oral arguments deprive researchers of the opportunity to extract temporal variance, courts that apply the “accordion model” institutional design and adjust the time for oral arguments according to the perceived complexity of a case are a boon for research that seeks to validate case complexity well ahead of the courts’ opinion writing. We analyse an original data set of all 1,402 merits decisions of the Norwegian Supreme Court from 2008 to 2018 where the justices set time for oral arguments to accommodate the anticipated difficulty of the case. Our validation model empirically tests whether and how attributes of a case associated with ex ante complexity are linked with time allocated for oral arguments. Cases that deal with international law and civil law, have several legal players, or are cross-appeals from lower courts are indicative of greater case complexity. We argue that these results speak powerfully to the use of case attributes and/or the time reserved for oral arguments as ex ante measures of case complexity. To enhance the external validity of our findings, future studies should examine whether these results are confirmed in high courts with similar institutional design for oral arguments. Subsequent analyses should also test the degree to which complex cases and/or time for oral arguments have predictive validity on more divergent opinions among the justices and on the time courts and justices need to render a final opinion.

  10. High Frequency Phone Survey, Continuous Data Collection 2023 - Vanuatu

    • microdata.pacificdata.org
    Updated Mar 23, 2025
    + more versions
    Cite
    Shohei Nakamura (2025). High Frequency Phone Survey, Continuous Data Collection 2023 - Vanuatu [Dataset]. https://microdata.pacificdata.org/index.php/catalog/878
    Explore at:
    Dataset updated
    Mar 23, 2025
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Shohei Nakamura
    William Seitz
    Time period covered
    2024 - 2025
    Area covered
    Vanuatu
    Description

    Abstract

    Access to up-to-date socio-economic data is a widespread challenge in Vanuatu and other Pacific Island Countries. To increase data availability and promote evidence-based policymaking, the Pacific Observatory provides innovative solutions and data sources to complement existing survey data and analysis. One of these data sources is a series of High Frequency Phone Surveys (HFPS), which began in 2020 to monitor the socio-economic impacts of the COVID-19 Pandemic, and since 2023 has grown into a series of continuous surveys for socio-economic monitoring. See https://www.worldbank.org/en/country/pacificislands/brief/the-pacific-observatory for further details.

    For Vanuatu, data for December 2023 – January 2025 were collected, with approximately 1,000 households in the sample each month. The sample is representative of urban and rural areas but not at the province level. This dataset contains combined monthly survey data for all months of the continuous HFPS in Vanuatu. There is one data file for household-level data, with a unique household ID, and a separate file for individual-level data within each household. The individual file can be matched to the household file using the household ID. It also has a unique individual ID within the household, which can be used to track individuals over time within households where the data are panel data.

    Geographic coverage

    National, urban and rural. Six provinces were covered by this survey: Sanma, Shefa, Torba, Penama, Malampa and Tafea.

    Analysis unit

    Household and individuals.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Vanuatu High Frequency Phone Survey (HFPS) sample is drawn from the list of customer phone numbers (MSIDNS) provided by Digicel Vanuatu, one of the country’s two main mobile providers. Digicel’s customer base spans all regions of Vanuatu. For the initial data collection, Digicel filtered their MSIDNS database to ensure a representative distribution across regions. Recognizing the challenge of reaching low-income respondents, Digicel also included low-income areas and customers with a low-income profile (defined by monthly spending between 50 and 150 VT), as well as those with only incoming calls or using the IOU service without repayment. These filtered lists were then randomized, and enumerators began calling the numbers.

    This approach was used to complete the first round of 1,000 interviews. The respondents from this first round formed a panel to be surveyed monthly. Each month, phone numbers from the panel are contacted until all have been interviewed, at which point new phone numbers (fresh MSIDNS from Digicel’s database) are used to replace those that have been exhausted. These new respondents are then added to the panel for future surveys.

    Mode of data collection

    Computer Assisted Telephone Interview [cati]

    Research instrument

    The questionnaire was developed in both English and Bislama. Sections of the Questionnaire:

    -Interview Information
    -Household Roster (separate modules for new households and returning households)
    -Labor (separate modules for new households and returning households)
    -Food Security
    -Household Income
    -Agriculture
    -Social Protection
    -Access to Services
    -Assets
    -Perceptions
    -Follow-up

    Cleaning operations

    At the end of data collection, the raw dataset was cleaned by the survey firm and the World Bank team. Data cleaning mainly included formatting, relabeling, and excluding survey monitoring variables (e.g., interview start and end times). Data was edited using the software STATA.

    The data are presented in two datasets: a household dataset and an individual dataset. The total number of observations is 13,779 in the household dataset and 77,501 in the individual dataset. The individual dataset contains information on individual demographics and labor market outcomes of all household members aged 15 and above, and the household data set contains information about household demographics, education, food security, household income, agriculture activities, social protection, access to services, and durable asset ownership. The household identifier (hhid) is available in both the household dataset and the individual dataset. The individual identifier (hhid_mem) can be found in the individual dataset.
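
    As a minimal sketch of how the two files relate, the snippet below attaches household fields to individual rows through hhid. Only the identifier names hhid and hhid_mem come from the description; the rows and the other field names (urban, age) are hypothetical. It also assumes a single survey month, so that each hhid appears once among the household rows. With the combined monthly files, a month or round variable (not named here) would presumably need to be part of the key as well.

```python
# Hypothetical rows; only hhid and hhid_mem are names taken from the description.
households = [
    {"hhid": 101, "urban": 1},
    {"hhid": 102, "urban": 0},
]
individuals = [
    {"hhid": 101, "hhid_mem": 1, "age": 34},
    {"hhid": 101, "hhid_mem": 2, "age": 17},
    {"hhid": 102, "hhid_mem": 1, "age": 52},
]


def attach_household(individuals, households):
    """Many-to-one match: each individual row gains its household's fields."""
    by_hhid = {h["hhid"]: h for h in households}
    merged = []
    for person in individuals:
        household = by_hhid.get(person["hhid"])
        if household is None:
            continue  # individual without a matching household row is skipped
        merged.append({**household, **person})
    return merged


merged = attach_household(individuals, households)
```

    The pair (hhid, hhid_mem) identifies a person, which is what allows individuals to be followed across rounds.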

    Response rate

    In November 2024, a total of 7,874 calls were made. Of these, 2,251 calls were successfully connected, and 1,000 respondents completed the survey. By February 2024, the sample was fully comprised of returning respondents, with a re-contact rate of 99.9 percent.

  11. f

    Data from: Inconsistent Retirement Timing

    • figshare.com
    zip
    Updated Dec 14, 2021
    + more versions
    Cite
    Philipp Schreiber; Christoph Merkle; Martin Weber (2021). Inconsistent Retirement Timing [Dataset]. http://doi.org/10.6084/m9.figshare.17197928.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 14, 2021
    Dataset provided by
    figshare
    Authors
    Philipp Schreiber; Christoph Merkle; Martin Weber
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Abstract

    We study the effect of inconsistent time preferences on actual and planned retirement timing decisions in two independent datasets. Theory predicts that hyperbolic time preferences can lead to dynamically inconsistent retirement timing. In an online experiment with more than 2,000 participants, we find that time-inconsistent participants retire on average 1.75 years earlier than time-consistent participants do. The planned retirement age of non-retired participants decreases with age. This negative age effect is about twice as strong among time-inconsistent participants. The temptation of early retirement seems to rise in the final years of approaching retirement. Consequently, time-inconsistent participants have a higher probability of regretting their retirement decision. We find similar results for a representative household survey (German SAVE panel). Using smoking behavior and overdraft usage as time preference proxies, we confirm that time-inconsistent participants retire earlier and that non-retirees reduce their planned retirement age within the panel.

    Methods

    We conduct an online experiment in cooperation with a large and well-circulated German newspaper, the Frankfurter Allgemeine Zeitung (FAZ). Participants are recruited via a link on the newspaper's website and two announcements in the print edition. In total, 3,077 participants complete the experiment, which takes them on average 11 minutes. Participants answer questions about retirement planning, time preferences, risk preferences, financial literacy, and demographics. The initial sample for this study consists of 256 retired participants and 2,173 non-retired participants.

    Usage Notes

    Our dataset: the Stata do-file is attached. Additional datasets: a German household panel is also used in this paper. These data cannot be uploaded by us but are available via the Max Planck Institute (https://www.mpisoc.mpg.de/en/social-policy-mea/research/save-2001-2013/). We upload the do-files used in the analysis and the results in Excel format (xlsx).

  12. m

    Panel dataset on Brazilian fuel demand

    • data.mendeley.com
    Updated Oct 7, 2024
    Cite
    Sergio Prolo (2024). Panel dataset on Brazilian fuel demand [Dataset]. http://doi.org/10.17632/hzpwbp7j22.1
    Explore at:
    Dataset updated
    Oct 7, 2024
    Authors
    Sergio Prolo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Brazil
    Description

    Summary: Fuel demand is shown to be influenced by fuel prices, people's income and motorization rates. We explore the effects of electric vehicles' motorization rates on gasoline demand using this panel dataset.

    Files: dataset.csv - Panel dimensions are the Brazilian state ( i ) and year ( t ). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are in natural logs, since we use them to calculate demand elasticities in a regression model.

    adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
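
    The construction described for adjacency.csv can be sketched as follows. The three-state matrix and the values are hypothetical, and the spatial_lag helper shows only the common weighted-neighbour-average form; how the study actually interacts the matrix with EV motorization rates is specified in regression.do, not here.

```python
def row_normalize(adjacency):
    """Divide each row of a binary adjacency matrix by its row sum."""
    normalized = []
    for row in adjacency:
        total = sum(row)
        # A state with no neighbours keeps an all-zero row.
        normalized.append([v / total for v in row] if total else [0.0] * len(row))
    return normalized


def spatial_lag(weights, values):
    """Weighted average of neighbours' values for each state."""
    return [sum(w * x for w, x in zip(row, values)) for row in weights]


# Hypothetical example: state 0 borders states 1 and 2; states 1 and 2 border only state 0.
binary = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
weights = row_normalize(binary)

# Hypothetical (logged) EV motorization rates for the three states.
ev_rates = [1.0, 2.0, 4.0]
neighbour_average = spatial_lag(weights, ev_rates)
```

    After normalization each row with at least one neighbour sums to one, so the lagged value is the mean of the adjacent states' values.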

    regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.

    dataset_predictions.xlsx - Based on the estimations from Stata, we use this excel file to make average predictions by year and by state. Also, by including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but can be used to further understand how the forecasting scenarios are set up.

    Sources: Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp) State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB) State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)

  13. Variable unit root test.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Sep 11, 2023
    + more versions
    Cite
    Jiajun He; Zirui Huang; Xin Fan; Hui Zhang; Rong Zhou; Mingwei Song (2023). Variable unit root test. [Dataset]. http://doi.org/10.1371/journal.pone.0290607.t004
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 11, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jiajun He; Zirui Huang; Xin Fan; Hui Zhang; Rong Zhou; Mingwei Song
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    In this paper, we take the Yangtze River Economic Belt as the study area and analyze three types of environmental regulation tools, namely, command-and-control (CAC), market-incentivized (MI) and public-type (PT). We apply the threshold effect to test the impact of each of these tools on regional economic growth and analyze the relationships between the tools and environmental regulation. The entropy method is used to calculate the comprehensive environmental pollution index of each province and city in the Yangtze River Economic Belt. Using Stata 14.0 measurement software and based on provincial data with respect to the Yangtze River Economic Belt from 2014 to 2021, a panel threshold model is used to test the impact of the three types of environmental regulation tools on regional economic growth and analyze the relationship between environmental regulation and regional economic growth. It is found that the relationship between environmental regulation and economic growth is non-linear. There is no significant relationship between CAC environmental regulation and regional economic growth; there is a single threshold effect between market-incentive environmental regulation and public participation environmental regulation on the economic growth of the Yangtze River economic belt.

  14. 2

    UKHLS

    • datacatalogue.ukdataservice.ac.uk
    Updated Oct 21, 2025
    + more versions
    Cite
    University of Essex, Institute for Social and Economic Research (2025). UKHLS [Dataset]. http://doi.org/10.5255/UKDA-SN-9471-1
    Explore at:
    Dataset updated
    Oct 21, 2025
    Dataset provided by
    UK Data Service (https://ukdataservice.ac.uk/)
    Authors
    University of Essex, Institute for Social and Economic Research
    Area covered
    United Kingdom
    Description

    Understanding Society (UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on and incorporates the British Household Panel Survey (BHPS), which began in 1991.

    The Understanding Society: Calendar Year Dataset, 2023, is designed for analysts to conduct cross-sectional analysis for the 2023 calendar year. The Calendar Year datasets combine data collected in a specific year from across multiple waves and these are released as separate calendar year studies, with appropriate analysis weights, starting with the 2020 Calendar Year dataset. Each subsequent year, an additional yearly study is released.

    The Calendar Year data is designed to enable timely cross-sectional analysis of individuals and households in a calendar year. Such analysis can however, only involve variables that are collected in every wave (excluding rotating content which is only collected in some of the waves). Due to overlapping fieldwork the data files combine data collected in the three waves that make up a calendar year. Analysis cannot be restricted to data collected in one wave during a calendar year, as this subset will not be representative of the population. Further details and guidance on this study can be found in the xxxx_main_survey_calendar_year_user_guide_2023.

    These calendar year datasets should be used for cross-sectional analysis only. For those interested in longitudinal analyses using Understanding Society please access the main survey datasets: Safeguarded (End User Licence) version or Safeguarded/Special Licence version.

    Understanding Society: the UK Household Longitudinal Study started in 2009 with a general population sample (GPS) of around 26,000 private households of UK residents and an ethnic minority boost sample (EMBS) of 4,000 households. All members of these responding households and their descendants became part of the core sample, who were eligible to be interviewed every year. Anyone who joined these households after this initial wave was also interviewed, as long as they lived with these core sample members, to provide the household context. At each annual interview, some basic demographic information is collected about every household member, information about the household is collected from one household member, all household members aged 16 and over are eligible for adult interviews, household members aged 10-15 are eligible for youth interviews, and some information is collected about 0-9 year olds from their parents or guardians. From 1991 until 2008/9 a similar survey, the British Household Panel Survey (BHPS), was fielded. The surviving members of this survey sample were incorporated into Understanding Society in 2010. In 2015, an immigrant and ethnic minority boost sample (IEMBS) of around 2,500 households was added. In 2022 a GPS boost sample (GPS2) of around 5,700 households was added. To learn more about the sample design, following rules, interview modes, incentives, consent, and questionnaire content, please see the study overview and user guide.

    Co-funders

    In addition to the Economic and Social Research Council, co-funders for the study included the Department of Work and Pensions, the Department for Education, the Department for Transport, the Department of Culture, Media and Sport, the Department for Community and Local Government, the Department of Health, the Scottish Government, the Welsh Assembly Government, the Northern Ireland Executive, the Department of Environment and Rural Affairs, and the Food Standards Agency.

    End User Licence and Special Licence versions:

    There are two versions of the Calendar Year 2023 data. One is available under the standard End User Licence (EUL) agreement, and the other is a Special Licence (SL) version. The SL version contains month and year of birth variables instead of just age, more detailed country and occupation coding for a number of variables and various income variables have not been top-coded (see document '9471_eul_vs_sl_variable_differences' for more details). Users are advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements. The SL data have more restrictive access conditions; prospective users of the SL version will need to complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables in order to get permission to use that version. The main longitudinal versions of the Understanding Society study may be found under SNs 6614 (Safeguarded (EUL)) and 6931 (Safeguarded/SL).

    Low- and Medium-level geographical identifiers produced for the mainstage longitudinal dataset can be used with this Calendar Year 2023 dataset, subject to SL access conditions. See the User Guide for further details.

    Suitable data analysis software

    These data are provided by the depositor in Stata format. Users are strongly advised to analyse them in Stata. Transfer to other formats may result in unforeseen issues. Stata SE or MP software is needed to analyse the larger files, which contain about 1,800 variables.

  15. r

    Sample selection in linear panel data models with heterogeneous coefficients...

    • resodate.org
    Updated Oct 6, 2025
    Cite
    Alyssa Carlson; Riju Joshi (2025). Sample selection in linear panel data models with heterogeneous coefficients (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9zYW1wbGUtc2VsZWN0aW9uLWluLWxpbmVhci1wYW5lbC1kYXRhLW1vZGVscy13aXRoLWhldGVyb2dlbmVvdXMtY29lZmZpY2llbnRz
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    Journal of Applied Econometrics
    ZBW
    ZBW Journal Data Archive
    Authors
    Alyssa Carlson; Riju Joshi
    Description

    This archive contains the replication files for "Sample selection in linear panel data models with heterogeneous coefficients" by Alyssa Carlson and Riju Joshi, in Journal of Applied Econometrics (2023). All codes and data are provided. There are two folders corresponding to the simulation study and the empirical application of the paper. To replicate, navigate to the respective folder and follow the README instructions. All codes and datasets are in Stata.

  16. H

    Data for independent undergraduate research: cleaned data for "COVID-19 and...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 6, 2025
    Cite
    Haoling Wang (2025). Data for independent undergraduate research: cleaned data for "COVID-19 and the Uneven Impact on Pharmaceutical Innovation: Evidence from China and the EU" [Dataset]. http://doi.org/10.7910/DVN/QAD1L3
    Explore at:
    Dataset updated
    Jul 6, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Haoling Wang
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    China
    Description

    The sample is drawn from the Industrial R&D Investment Scoreboard published by the IRI/JRC, which tracks the world’s top 1000 firms by R&D spending. Chinese and EU-based pharmaceutical companies form a significant portion of the dataset, making it well-suited for a difference-in-differences (DID) design to compare post-COVID changes in R&D investment across regions. The dataset was restructured into a panel of firm-year observations from 2015 to 2024, covering key variables such as R&D input, capital expenditure, profit, and employment. After excluding entries with missing values in core variables, standard data-cleaning procedures were implemented in Stata. The final analytical sample includes 217 firms (114 Chinese and 103 EU companies), observed in an unbalanced panel. Company identity is tracked via the company variable. Key financial indicators such as R&D input, profits, and employees vary across both time and geography, justifying a panel-data approach.
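
    For readers unfamiliar with the DID idea mentioned above, here is a minimal two-group, two-period sketch using differences of means. All values are hypothetical and not taken from the dataset. Treating Chinese firms as the comparison of interest and 2020 as the cutoff year are assumptions made only for illustration; the study's own specification may differ.

```python
from statistics import mean

# Hypothetical firm-year rows: (region, year, R&D input). Not real data.
rows = [
    ("China", 2019, 10.0), ("China", 2021, 16.0),
    ("China", 2019, 12.0), ("China", 2021, 18.0),
    ("EU", 2019, 20.0), ("EU", 2021, 22.0),
    ("EU", 2019, 24.0), ("EU", 2021, 26.0),
]


def did(rows, treated_region, cutoff_year):
    """(treated post - treated pre) - (control post - control pre), using group means."""
    def group_mean(is_treated, is_post):
        return mean(
            value
            for region, year, value in rows
            if (region == treated_region) == is_treated
            and (year >= cutoff_year) == is_post
        )

    treated_change = group_mean(True, True) - group_mean(True, False)
    control_change = group_mean(False, True) - group_mean(False, False)
    return treated_change - control_change


estimate = did(rows, treated_region="China", cutoff_year=2020)
```

    With real firm-year data the same contrast is usually estimated by regression with firm and year effects, which also handles the unbalanced panel.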

  17. f

    General Household Survey, Panel 2018-2019 - Nigeria

    • microdata.fao.org
    Updated Nov 8, 2022
    + more versions
    Cite
    National Bureau of Statistics (2022). General Household Survey, Panel 2018-2019 - Nigeria [Dataset]. https://microdata.fao.org/index.php/catalog/1374
    Explore at:
    Dataset updated
    Nov 8, 2022
    Dataset authored and provided by
    National Bureau of Statistics
    Time period covered
    2018 - 2019
    Area covered
    Nigeria
    Description

    Abstract

    The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2018/19 survey is the fourth round, with prior rounds conducted in 2010/11, 2012/13, and 2015/16. GHS-Panel households were visited twice: first after the planting season (post-planting) between July and September 2018 and second after the harvest season (post-harvest) between January and February 2019.

    Geographic coverage

    National, the survey covered all the 36 states and Federal Capital Territory (FCT).

    Analysis unit

    Households, Individuals, Agricultural plots, Communities

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The original GHS-Panel sample of 5,000 households across 500 enumeration areas (EAs) was designed to be representative at the national level as well as at the zonal level. The complete sampling information for the GHS-Panel is described in the Basic Information Document for GHS-Panel 2010/2011. However, after nearly a decade of visiting the same households, a partial refresh of the GHS-Panel sample was implemented in Wave 4. For the partial refresh of the sample, a new set of 360 EAs was randomly selected, consisting of 60 EAs per zone. The refresh EAs were selected from the same sampling frame as the original GHS-Panel sample in 2010 (the "master frame").

    A listing of all households was conducted in the 360 EAs and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households. In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS-Panel households from 2010 was selected to be included in the new sample. This "long panel" sample was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across the 6 geopolitical Zones. The systematic selection ensured that the distribution of EAs across the 6 Zones (and urban and rural areas within) is proportional to the original GHS-Panel sample.

    Interviewers attempted to interview all households that originally resided in the 159 EAs and were successfully interviewed in the previous visit in 2016. This includes households that had moved away from their original location in 2010. In all, interviewers attempted to interview 1,507 households from the original panel sample. The combined sample of refresh and long panel EAs consisted of 519 EAs. The total number of households that were successfully interviewed in both visits was 4,976.
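    The sample sizes quoted above can be cross-checked with simple arithmetic. The Python sketch below uses only figures stated in the sampling description; the variable names are illustrative, and the final ratio is a rough derived figure, not an official response rate.

```python
# Figures taken from the sampling description above.
refresh_eas = 360                 # refresh EAs in Wave 4
zones = 6                         # geopolitical zones
households_per_refresh_ea = 10    # households drawn per refresh EA
long_panel_eas = 159              # EAs retained from the original panel
long_panel_households = 1507      # original-panel households interviewers attempted
interviewed_both_visits = 4976    # households interviewed in both visits

eas_per_zone = refresh_eas // zones
refresh_households = refresh_eas * households_per_refresh_ea
combined_eas = refresh_eas + long_panel_eas

# Derived here, not stated in the source: refresh sample plus attempted long-panel households.
attempted_households = refresh_households + long_panel_households
share_interviewed = interviewed_both_visits / attempted_households

print(f"EAs per zone: {eas_per_zone}")
print(f"Refresh households: {refresh_households}")
print(f"Combined EAs: {combined_eas}")
print(f"Share interviewed in both visits (rough): {share_interviewed:.3f}")
```

    The first three values reproduce the 60 EAs per zone, roughly 3,600 refresh households, and 519 combined EAs given in the text.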

    Sampling deviation

    While the combined sample generally maintains both national and Zonal representativeness of the original GHS-Panel sample, the security situation in the North East of Nigeria prevented full coverage of the Zone. Due to security concerns, rural areas of Borno state were fully excluded from the refresh sample and some inaccessible urban areas were also excluded. Security concerns also prevented interviewers from visiting some communities in other parts of the country where conflict events were occurring. Refresh EAs that could not be accessed were replaced with another randomly selected EA in the Zone so as not to compromise the sample size. As a result, the combined sample is representative of areas of Nigeria that were accessible during 2018/19. The sample will not reflect conditions in areas that were undergoing conflict during that period. This compromise was necessary to ensure the safety of interviewers.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Cleaning operations

    CAPI: For the first time in GHS-Panel, the Wave 4 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community) were implemented in both the post-planting and post-harvest visits of Wave 4 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Survey Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given a tablet, which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.

    DATA COMMUNICATION SYSTEM: The data communication system used in Wave 4 was highly automated. Each field team was given a mobile modem to allow for internet connectivity and daily synchronization of their tablets. This ensured that the head office in Abuja had access to the data in real time. Once an interview was completed and uploaded to the server, the data was first reviewed by the Data Editors. The data was also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated each time the Stata do-file was run on the raw dataset. Information contained in the Excel error files was communicated back to the respective field interviewers for action. This was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.

    DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection, designed to highlight many of the errors that occurred during the fieldwork. The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once an interview is completed and uploaded to the server, the Data Editors review the completed interview for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer's tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample, and where there were differences, these were properly assessed and documented.

    The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning was then approved by the Data Editor. After the Data Editor's approval of the interview on the Survey Solutions server, Headquarters also reviews the interview and, depending on the outcome, can either reject or approve it. The third stage of cleaning involved a comprehensive review of the final raw data following the first and second stage cleaning. Every variable was examined individually for (1) consistency with other sections and variables, (2) out-of-range responses, and (3) outliers. However, special care was taken to avoid making strong assumptions when resolving potential errors. Some minor errors remain in the data where the diagnosis and/or solution were unclear to the data cleaning team.
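    The out-of-range and outlier checks described for the third cleaning stage can be illustrated with a minimal Python sketch. It is not the survey team's actual procedure (which used Stata do-files not reproduced here): the `flag_values` helper, the `household_size` values, the valid range of 1 to 50, and the interquartile-range rule are all assumptions chosen for illustration.

```python
from statistics import quantiles

def flag_values(values, lower, upper, k=1.5):
    """Return (out_of_range, outliers) as lists of positions in `values`.

    out_of_range: values outside the allowed [lower, upper] interval.
    outliers: in-range values beyond k * IQR from the quartiles
              (an assumed rule; the source does not name a method).
    """
    out_of_range = [i for i, v in enumerate(values) if not (lower <= v <= upper)]
    in_range = [v for v in values if lower <= v <= upper]
    if len(in_range) < 2:
        return out_of_range, []  # quartiles need at least two data points
    q1, _, q3 = quantiles(in_range, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [
        i for i, v in enumerate(values)
        if lower <= v <= upper and not (low_fence <= v <= high_fence)
    ]
    return out_of_range, outliers

# Hypothetical responses for a household-size question.
household_size = [4, 5, 3, 6, 5, 4, 7, 5, 38, -1, 4, 6]
out_of_range, outliers = flag_values(household_size, lower=1, upper=50)
print("Out of range at positions:", out_of_range)
print("Outliers at positions:", outliers)
```

    Flagged positions are only candidates for review; consistent with the caution noted above, a sketch like this reports them rather than changing any value.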

  18. f

    Datasets 'Bt cotton, social networks and risk in rural India' & 'Education...

    • sussex.figshare.com
    zip
    Updated Dec 12, 2023
    Cite
    Annemie Maertens (2023). Datasets 'Bt cotton, social networks and risk in rural India' & 'Education and marriage in rural India' [Dataset]. http://doi.org/10.25377/sussex.22960772.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 12, 2023
    Dataset provided by
    University of Sussex
    Authors
    Annemie Maertens
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    India
    Description

    Datasets collected by A. Maertens. Zip files containing pdf / doc / dta files.

    Stata (https://www.stata.com/) is required to view the .dta files - please refer to the read me.pdf before using the data collected.

    Introduction

    These data were collected in the framework of Dr. Annemie Maertens’ PhD dissertation during the period August 2007 – July 2009. The dissertation was undertaken from Cornell University, but executed in India in collaboration with the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT). The project was sponsored through an NSF Doctoral Dissertation Improvement Grant (Grant No. 0649330).

    The main goal of this project was to collect and analyze household survey data in the Indian states of Andhra Pradesh and Maharashtra in order to gain a better understanding of the role of social networks and identity in economic decision-making. The first panel of this research studied the role of social learning and social pressures in Bacillus thuringiensis (Bt) cotton adoption using data from three villages (Aurepalle, Kanzara and Kinkhed).

    The data collection consisted of five phases: (1) qualitative round (to determine the topic of the two panels); (2) trial round (to field test the questionnaires); (3) training round (to train the enumerators); (4) quantitative collection round (to collect the household-level and village-level data); (4’) data entry of (4); (5) data validation round (to collect additional data to correct the missing variables and inconsistencies uncovered in (4’)).

    Published papers resulting from these data

    Maertens, A., AV Chari and D.R. Just. 2014. ‘Why Farmers Sometimes Love Risks: Evidence from India.’ Economic Development and Cultural Change, 62(2): 239-274 DOI: 10.1086/674028

    Maertens, A., AV Chari and D.R. Just. 2014. ‘Why Farmers Sometimes Love Risks: Evidence from India.’ Economic Development and Cultural Change, 62(2): 239-274. DOI: 10.2139/ssrn.2024678

    Maertens, A. and C.B. Barrett. 2013. ‘Empirical Methods for Identifying Social Network Effects on Technology Adoption.’ American Journal of Agricultural Economics Papers and Proceedings, 95(2):353-359. DOI:10.1093/ajae/aaw098

  19. d

    Panel on Household Finances Interim Survey 2020 - Dataset - B2FIND

    • demo-b2find.dkrz.de
    Updated Sep 20, 2025
    Cite
    (2025). Panel on Household Finances Interim Survey 2020 - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/b123b872-c5f7-50a5-ba28-8b7b4b5aac62
    Explore at:
    Dataset updated
    Sep 20, 2025
    Description

    The PHF scientific use file (SUF) Interim Survey 2020 Version 1.0 data set is the first version of the PHF Interim Survey 2020 data set and consists of the Stata file PHF_Interim_2020_v1_0.dta. The PHF interim survey in 2020 mainly served to bridge the period between the last survey on household finance in Germany in 2017 and the PHF 4th wave which was postponed to 2021 as a result of the pandemic. It contains specific questions on the impact of the coronavirus pandemic on households and their saving behavior. The asset structure was also collected, but not in the detail of the main surveys. The different survey mode of the interim survey (PAPI/CAWI) and the associated differences in the design of the questions on wealth further restrict the comparability of the interim survey with the main waves. In particular, it is not possible to compare absolute assets consistently over time. However, the survey results provide an insight into the impact of the pandemic on households’ finances and wealth distribution. A total of 4550 households participated in the interim survey, with the majority already participating in previous surveys.

    Self-administered questionnaire: Paper
    Paper-assisted personal interviews (PAPI)
    Computer-assisted web Interviews (CAWI)

    All private households located in Germany except institutional households (in old-age homes, prisons, barracks etc.)

    Refresher Sample: Stratified random sample based on population registers; oversampling of wealthy households and regions in eastern Germany

  20. m

    Data from: Changes in adult well-being and economic inequalities: An...

    • data.mendeley.com
    Updated Jan 5, 2024
    Cite
    Ricardo Godoy (2024). Changes in adult well-being and economic inequalities: An exploratory observational longitudinal study (2002-2010) of micro-level trends among Tsimane’, a small-scale rural society of Indigenous People in the Bolivian Amazon [Dataset]. http://doi.org/10.17632/4vjvbvk94f.1
    Explore at:
    Dataset updated
    Jan 5, 2024
    Authors
    Ricardo Godoy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains the two Stata V17 files on which the article in World Development (2014) with the same title as the dataset is based. It also includes the Stata code (do-files) and a one-page Word document on how to run the datasets.

Cite
Sarah Price (2022). Example Stata syntax and data construction for negative binomial time series regression [Dataset]. http://doi.org/10.17632/3mj526hgzx.2

Example Stata syntax and data construction for negative binomial time series regression

Explore at:
Dataset updated
Nov 2, 2022
Authors
Sarah Price
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).

The variables contained therein are defined as follows:

case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).

patid: a unique patient identifier.

time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.

ncons: number of consultations per month.

period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.

burden: binary variable denoting membership of one of two multimorbidity burden groups.

We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).

Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
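
The long-format layout described above (one row per patient per time period) can be sketched in Python as follows. This is an illustration of the data structure only, not a port of dummy_dataset_create.do: the `make_panel` function, the way `case` and `burden` are assigned, and the random `ncons` counts are placeholders assumed for the example. The period0 to period9 variables are omitted because their coding is not described here.

```python
import random

def make_panel(n_patients=4, n_periods=10, seed=0):
    """Build a toy long-format panel: one dict per patient per time period."""
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_patients + 1):
        case = patid % 2            # placeholder: alternate cases and controls
        burden = rng.randint(0, 1)  # placeholder: random burden group
        for time_period in range(n_periods):
            rows.append({
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": time_period,  # 0 = earliest month, 9 = month of diagnosis
                "ncons": rng.randint(0, 5),  # placeholder monthly consultation count
            })
    return rows

panel = make_panel()
for row in panel[:3]:
    print(row)
print(len(panel), "rows in total")
```

Each patient contributes ten rows here, matching the 10-month demonstration window; the 24-month analysis in the paper would correspond to `n_periods=24`.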
