78 datasets found
  1. Data from: Inference in High-Dimensional Panel Models With an Application to...

    • tandf.figshare.com
    text/x-tex
    Updated May 30, 2023
    Cite
    Alexandre Belloni; Victor Chernozhukov; Christian Hansen; Damian Kozbur (2023). Inference in High-Dimensional Panel Models With an Application to Gun Control [Dataset]. http://doi.org/10.6084/m9.figshare.1604934.v2
    Available download formats: text/x-tex
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Alexandre Belloni; Victor Chernozhukov; Christian Hansen; Damian Kozbur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We consider estimation and inference in panel data models with additive unobserved individual specific heterogeneity in a high-dimensional setting. The setting allows the number of time-varying regressors to be larger than the sample size. To make informative estimation and inference feasible, we require that the overall contribution of the time-varying variables after eliminating the individual specific heterogeneity can be captured by a relatively small number of the available variables whose identities are unknown. This restriction allows the problem of estimation to proceed as a variable selection problem. Importantly, we treat the individual specific heterogeneity as fixed effects which allows this heterogeneity to be related to the observed time-varying variables in an unspecified way and allows that this heterogeneity may differ for all individuals. Within this framework, we provide procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variable models with fixed effects and many instruments. We present simulation results in support of the theoretical developments and illustrate the use of the methods in an application aimed at estimating the effect of gun prevalence on crime rates.
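    The step of "eliminating the individual specific heterogeneity" can be illustrated with the textbook within (demeaning) transformation. The sketch below is only that step on a made-up toy panel; it is not the authors' procedure and omits the variable-selection stage entirely. The function names and the toy values are assumptions introduced for illustration.

```python
from collections import defaultdict

def within_transform(ids, values):
    """Subtract each unit's own mean from its observations (removes additive fixed effects)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, v in zip(ids, values):
        sums[i] += v
        counts[i] += 1
    return [v - sums[i] / counts[i] for i, v in zip(ids, values)]

def ols_slope(x, y):
    """Slope of a no-intercept regression of y on x (both already demeaned)."""
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / sxx

# Toy panel (illustrative values only): y_it = alpha_i + 2 * x_it, no noise
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
alpha = {1: 10.0, 2: -5.0}  # unit-specific fixed effects
y = [alpha[i] + 2.0 * xi for i, xi in zip(ids, x)]

x_dm = within_transform(ids, x)
y_dm = within_transform(ids, y)
slope = ols_slope(x_dm, y_dm)  # the fixed effects drop out after demeaning
```

    Because the fixed effects are constant within each unit, demeaning removes them regardless of how they relate to the regressors, which is why the slope can be recovered here without estimating alpha.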

  2. ESTIMATION OF CENSORED PANEL-DATA MODELS WITH SLOPE HETEROGENEITY...

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Jason Abrevaya (2025). ESTIMATION OF CENSORED PANEL-DATA MODELS WITH SLOPE HETEROGENEITY (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9lc3RpbWF0aW9uLW9mLWNlbnNvcmVkLXBhbmVsZGF0YS1tb2RlbHMtd2l0aC1zbG9wZS1oZXRlcm9nZW5laXR5
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    ZBW
    ZBW Journal Data Archive
    Journal of Applied Econometrics
    Authors
    Jason Abrevaya
    Description

    This paper considers estimation of censored panel-data models with individual-specific slope heterogeneity. The slope heterogeneity may be random (random slopes model) or related to covariates (correlated random slopes model). Maximum likelihood and censored least-absolute deviations estimators are proposed for both models. The estimators are simple to implement and, in the case of maximum likelihood, lead to straightforward estimation of partial effects. The rescaled bootstrap suggested by Andrews (Econometrica 2000; 68: 399-405) is used to deal with the possibility of variance parameters being equal to zero. The methodology is applied to an empirical study of Dutch household portfolio choice, where the outcome variable (portfolio share in safe assets) has corner solutions at zero and one. As predicted by economic theory, there is strong evidence of correlated random slopes for the age profiles, indicating a heterogeneous age profile of portfolio adjustment that varies significantly with other household characteristics.

  3. Data from: Data-driven model selection within the matrix completion method...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Mar 6, 2024
    Cite
    Heiniger, Sandro (2024). Data-driven model selection within the matrix completion method for causal panel data models [Dataset]. http://doi.org/10.7910/DVN/JGGBQG
    Dataset updated
    Mar 6, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Heiniger, Sandro
    Description

    Replication data for application. Visit https://dataone.org/datasets/sha256%3Ad1b60121aa674a5618dfe7e00ccaaae8beb063be28c982d294277dafeb21e5a6 for complete metadata about this dataset.

  4. Data from: Panel sample selection model with interactive effects

    • tandf.figshare.com
    bin
    Updated Jul 2, 2025
    Cite
    Jing Kong; Cheng Hsiao (2025). Panel sample selection model with interactive effects [Dataset]. http://doi.org/10.6084/m9.figshare.29456457.v1
    Available download formats: bin
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Jing Kong; Cheng Hsiao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We consider a two-step estimation procedure to estimate the panel sample selection models with interactive effects. In the first step, we follow the Robinson (1988) procedure to remove the sample selection factors. In the second step, we control the interactive effects. When the cross-section dimension N is large, we propose to use the Pesaran (2006) common correlated effects approach, and when the time series dimension T is large and N is finite we propose to follow the Hsiao, Shi, and Zhou (2022) transformed estimation procedure to eliminate the interactive effects. We show that the resulting estimators are consistent and asymptotically normally distributed. A limited Monte Carlo study is conducted, showing our methods appear to work well in a finite sample. An empirical illustration on female wage rate determination shows that an extra year of work experience could raise the expected log wage rate by 0.1507 under our maintained hypothesis, while neglecting sample selection or interactive effects could lead to seriously biased estimates.

  5. Data from: Identification of Latent Subgroups for Time-varying Panel Data...

    • tandf.figshare.com
    txt
    Updated Nov 5, 2025
    Cite
    Ye He; Qing Luo; Liu Liu; Shengzhi Mao; Ling Zhou (2025). Identification of Latent Subgroups for Time-varying Panel Data Models [Dataset]. http://doi.org/10.6084/m9.figshare.30546617.v1
    Available download formats: txt
    Dataset updated
    Nov 5, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Ye He; Qing Luo; Liu Liu; Shengzhi Mao; Ling Zhou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper introduces a time-varying panel data model that incorporates latent group structures, designed to tackle both individual heterogeneity and smooth structural changes over time. We develop an innovative centre-augmented K-power means (KPM) methodology that promotes convergence of subjects toward their respective cluster centers, enabling the identification of latent group structures without requiring prior knowledge of group composition. This approach delivers both superior precision and computational efficiency. We provide rigorous theoretical foundations, demonstrating estimation consistency, accurate subgroup identification, and consistent selection of the number of groups. The efficacy of the proposed KPM method in accurately identifying the latent group structures in panel data is demonstrated through comprehensive numerical analysis, including simulation studies and two real-world applications.

  6. Replication Data for: Democratization and Gini index: Panel data analysis...

    • search.datacite.org
    • dataverse.harvard.edu
    Updated 2019
    Cite
    LEIZHEN ZANG; Xiong Feng (2019). Replication Data for: Democratization and Gini index: Panel data analysis based on random forest method [Dataset]. http://doi.org/10.7910/dvn/w2cxvu
    Dataset updated
    2019
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    LEIZHEN ZANG; Xiong Feng
    Description

    The mechanism for the association between democratic development and the wealth gap has always been a focus of political and economic research, yet with no consistent conclusion. The reasons often are 1) the difficulty of generalizing results obtained from a single country’s time series or from multinational cross-section data, and 2) deviations in research results caused by missing values or variable selection in panel data analysis. Two factors contribute to the latter. One is that missing values in variables impair the accuracy of estimation; the other is that subjective discretion must be exercised to select suitable proxies among many candidates, which is likely to cause variable selection bias. To address these problems, this study pioneers the use, on this topic, of a machine learning method, the random forest model, to interpolate missing values efficiently, and it analyzes cross-country data from 151 countries covering the period 1993–2017. Since this paper measures the importance of different variables to the dependent variable, more appropriate and important variables could be selected to construct a complete regression model. Results from different models come to a consensus that the promotion of democracy can significantly narrow the gap between the rich and the poor, with a marginally decreasing effect with respect to wealth. In addition, the study finds that this mechanism exists only in non-colonial nations or presidential states. Finally, this paper discusses the potential theoretical and policy implications of the results.

  7. Specification tests for the panel model selection.

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Ruwan Jayathilaka; Chanuka Jayawardhana; Nilupul Embogama; Shalini Jayasooriya; Navodika Karunarathna; Thisara Gamage; Nethmali Kuruppu (2023). Specification tests for the panel model selection. [Dataset]. http://doi.org/10.1371/journal.pone.0264474.t003
    Available download formats: xls
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ruwan Jayathilaka; Chanuka Jayawardhana; Nilupul Embogama; Shalini Jayasooriya; Navodika Karunarathna; Thisara Gamage; Nethmali Kuruppu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Specification tests for the panel model selection.

  8. The use of model samples in the process of selection of sensory panel to...

    • uek.rodbuk.pl
    ods, txt
    Updated Feb 27, 2024
    Cite
    Paweł Turek (2024). The use of model samples in the process of selection of sensory panel to assess cosmetic products. Data for stage 1 and 2 analysis [Dataset]. http://doi.org/10.58116/UEK/V8FZEV
    Available download formats: ods (11447), txt (1593), ods (6699)
    Dataset updated
    Feb 27, 2024
    Dataset provided by
    Krakow University of Economics
    Authors
    Paweł Turek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The ranking results of all assessors participating in the experiment. The data present the results of sensory tests checking the sensitivity of the sense of touch. The data are used to compare the assessors' efficiency in sorting samples by touch. Set 1: sandpaper. Set 2: thickness of a sheet of paper. Set 3: PVC cones with plasticizer additive. Set 4: model emulsions.

  9. Sample description.

    • figshare.com
    xls
    Updated Feb 14, 2024
    Cite
    Yanfeng Zhang; Keren Chen; Chengjie Zou (2024). Sample description. [Dataset]. http://doi.org/10.1371/journal.pone.0296121.t001
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Yanfeng Zhang; Keren Chen; Chengjie Zou
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In recent years, the world has been facing severe challenges from climate change and environmental issues, with carbon dioxide emissions being considered one of the main driving factors. Many studies have proven that activities in various industries and fields have a significant impact on carbon dioxide emissions. However, few studies have explored the impact of gender on carbon dioxide emissions. This study aims to explore the potential impact of gender diversity on carbon dioxide emissions in the boards of directors of developed and emerging market enterprises. In addition, we also analyzed how board cultural diversity affects carbon dioxide emissions. We searched two European indices provided by Morgan Stanley Capital International (MSCI) from the Bloomberg database and conducted empirical analysis. We selected the MSCI index and MSCI emerging market index from 2010 to 2019 as samples and thoroughly cleaned up the data by removing any observations containing missing information on any variables. Statistical methods such as t-test, ordinary least squares, panel data analysis, regression analysis, and robustness testing were used for statistical analysis. At the same time, differential testing was conducted on sensitive and non-sensitive sectors, and the average representation of female boards in sensitive industries was low. The research results show that the proportion of female members on a company’s board of directors is negatively correlated with carbon dioxide emissions. This discovery is consistent with the legitimacy theory advocating for gender equality and environmental sustainability, emphasizing the importance of gender diversity in reducing greenhouse gas emissions. However, agency theory suggests that diversity may lead to internal conflicts within a company, leading to agency costs and information asymmetry. 
The research results show a negative correlation between board cultural diversity and carbon dioxide emissions, indicating the potential challenge of board cultural diversity. This study provides important insights for decision-makers and managers, not only inspiring corporate social responsibility and environmental policy formulation, but also of great significance for academic research in the field of climate change. Our research findings help deepen our understanding of the factors that affect carbon dioxide emissions in different sectors and countries, while also expanding the research field between gender diversity, cultural diversity, and environmental sustainability. Although this study still needs to be further expanded and deepened, it provides useful insights into the relationship between board gender and cultural diversity and carbon dioxide emissions.

  10. National Panel Survey 2008-2015, Uniform Panel Dataset - Tanzania

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    Updated Mar 17, 2021
    Cite
    National Bureau of Statistics (2021). National Panel Survey 2008-2015, Uniform Panel Dataset - Tanzania [Dataset]. https://microdata.worldbank.org/index.php/catalog/3814
    Dataset updated
    Mar 17, 2021
    Dataset authored and provided by
    National Bureau of Statistics
    Time period covered
    2008 - 2015
    Area covered
    Tanzania
    Description

    Abstract

    Panel data possess several advantages over conventional cross-sectional and time-series data, including their power to isolate the effects of specific actions, treatments, and general policies often at the core of large-scale econometric development studies. While the concept of panel data alone provides the capacity for modeling the complexities of human behavior, the notion of universal panel data – in which time- and situation-driven variances leading to variations in tools, and thus results, are mitigated – can further enhance exploitation of the richness of panel information.

    This Basic Information Document (BID) provides a brief overview of the Tanzania National Panel Survey (NPS), but focuses primarily on the theoretical development and application of panel data, as well as key elements of the universal panel survey instrument and datasets generated by the four rounds of the NPS. As this Basic Information Document (BID) for the UPD does not describe in detail the background, development, or use of the NPS itself, the round-specific NPS BIDs should supplement the information provided here.

    The NPS Uniform Panel Dataset (UPD) consists of both survey instruments and datasets, meticulously aligned and engineered with the aim of facilitating the use of and improving access to the wealth of panel data offered by the NPS. The NPS-UPD provides a consistent and straightforward means of conducting not only user-driven analyses using convenient, standardized tools, but also for monitoring MKUKUTA, FYDP II, and other national level development indicators reported by the NPS.

    The design of the NPS-UPD combines the four completed rounds of the NPS – NPS 2008/09 (R1), NPS 2010/11 (R2), NPS 2012/13 (R3), and NPS 2014/15 (R4) – into pooled, module-specific survey instruments and datasets. The panel survey instruments offer the ease of comparability over time, with modifications and variances easily identifiable as well as those aspects of the questionnaire which have remained identical and offer consistent information. By providing all module-specific data over time within compact, pooled datasets, panel datasets eliminate the need for user-generated merges between rounds and present data in a clear, logical format, increasing both the usability and comprehension of complex data.

    Geographic coverage

    Designed for analysis of key indicators at four primary domains of inference, namely: Dar es Salaam, other urban, rural, Zanzibar.

    Analysis unit

    • Households
    • Individuals

    Universe

    The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    While the same sample of respondents was maintained over the first three rounds of the NPS, longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time; i.e. attrition. Although the NPS maintains a highly successful recapture rate (roughly 96% retention at the household level), minimizing the escalation of this selection bias, a refresh of longitudinal cohorts was done for the NPS 2014/15 to ensure proper representativeness of estimates while maintaining a sufficient primary sample to maintain cohesion within panel analysis. A newly completed Population and Housing Census (PHC) in 2012, providing updated population figures along with changes in administrative boundaries, emboldened the opportunity to realign the NPS sample and abate collective bias potentially introduced through attrition.

    To maintain the panel concept of the NPS, the sample design for NPS 2014/2015 consisted of a combination of the original NPS sample and a new NPS sample. A nationally representative sub-sample was selected to continue as part of the “Extended Panel” while an entirely new sample, “Refresh Panel”, was selected to represent national and sub-national domains. Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The format of the NPS-UPD survey instrument is similar to previously disseminated NPS survey instruments. Each module has a questionnaire and clearly identifies if the module collects information at the individual or household level. Within each module-specific questionnaire of the NPS-UPD survey instrument, there are five distinct sections, arranged vertically: (1) the UPD - “U” on the survey instrument, (2) R4, (3) R3, (4) R2, and (5) R1 – the latter four sections presenting each questionnaire in its original form at the time of its respective dissemination.

    The uppermost section of each module’s questionnaire (“U”) represents the model universal panel questionnaire, with questions generated from the comprehensive listing of questions across all four rounds of the NPS and codes generated from the comprehensive collection of codes. The following sections are arranged vertically by round, considering R4 as most recent. While not all rounds will have data reported for each question in the UPD and not each question will have reports for each of the UPD codes listed, the NPS-UPD survey instrument represents the visual, all-inclusive set of information collected by the NPS over time.

    The four round-specific sections (R4, R3, R2, R1) are aligned with their UPD-equivalent question, visually presenting their contribution to compatibility with the UPD. Each round-specific section includes the original round-specific variable names, response codes and skip patterns (corresponding to their respective round-specific NPS data sets, and despite their variance from other rounds or from the comprehensive UPD code listing).

  11. Dataset for meta-analysis "The motherhood penalty's size and factors"

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Sep 16, 2024
    Cite
    Irina Kalabikhina; Polina Kuznetsova; Sofiia Zhuravleva (2024). Dataset for meta-analysis "The motherhood penalty's size and factors" [Dataset]. http://doi.org/10.5281/zenodo.13710305
    Available download formats: bin
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Irina Kalabikhina; Polina Kuznetsova; Sofiia Zhuravleva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1968 - 2017
    Description

    PLEASE, CITE AS Kalabikhina IE, Kuznetsova PO, Zhuravleva SA (2024) Size and factors of the motherhood penalty in the labour market: A meta-analysis. Population and Economics 8(2): 178-205. https://doi.org/10.3897/popecon.8.e121438

    Explanatory note 1: List of papers used in the meta-analysis - see the file "Meta_regression_analysis_papers".

    The data is presented in WORD format.

    Explanatory note 2: Set of data used in the meta-analysis - see the file "Meta_regression_analysis_table".

    The data is presented in EXCEL format.

    Description of table headers:

    estimate_number - Number of the estimate

    paper_number - Number of the paper

    paper_name - Paper (year and first author)

    paper_excluded - Paper was excluded from the final sample

    survey - Data source

    table_in_paper - Number of the table with the regression results in the paper

    coeff - Regression coefficient for parenthood variable (estimate)

    se - SE of the estimate

    t - t-value of the estimate

    ols - Estimate is obtained using the OLS method

    fixed_effects - Estimate is obtained using the fixed effects method

    panel - Model considers panel data (for several years)

    quintile - Estimate is obtained using the quantile regression method

    other - Estimate is obtained using other methods

    selection_into_motherhood - Estimate is obtained allowing for selection into motherhood

    hackman - Estimate is obtained allowing for selection into employment (Heckman procedure)

    annual_earnings - Annual earnings are considered in the model

    monthly_wage - Monthly wage is considered in the model

    daily_wage - Daily wage is considered in the model

    hourly_wage - Hourly wage is considered in the model

    min_age_kid - Child's age (minimum)

    max_age_kid - Child's age (maximum)

    motherhood - Model uses a dummy variable of the presence of children

    num_kids - Model uses a variable of the number of children

    kid1 - Model uses a variable of the presence of one child

    kid2p - Model uses a variable of the presence of two or more children

    kid2 - Model uses a variable of the presence of two children

    kid3p - Model uses a variable of the presence of three or more children

    kid3 - Model uses a variable of the presence of three children

    kid4p - Model uses a variable of the presence of four or more children

    race/nationality - Model includes a race/ethnicity variable

    age - Model includes the age variable

    marstat - Model includes the marital status variable

    oth_char_hh - Model includes any other variables of other household characteristics

    settl_type - Model includes a variable of the type of settlement (urban, rural)

    region - Model includes a variable of the region of the country

    education - Model includes information on the level of education

    experience - Model includes a variable of work experience

    pot_experience - Model includes a variable of potential work experience, to be calculated from the data on age and number of years of education

    tenure - Model includes a variable of the duration of employment at the current job

    interruptions - Model includes a variable of employment interruptions (related to motherhood)

    occupation - Model includes an occupation variable

    industry - Model includes a variable of the industry of employment

    union - Model includes a variable of trade union membership

    friendly_conditions - Model includes a variable of the favourable working conditions for mothers (flexible schedule, possibility to work from home, etc.).

    hours - Model includes a variable of the number of hours worked

    sector - Model includes a variable of the type of employer ownership (public or private)

    informal - Model includes a variable of informal employment

    size_ent - Model includes a variable of the employer size

    min_age_woman - Woman's age (minimum)

    max_age_woman - Woman's age (maximum)

    mean_age_woman - Woman's age (mean)

    restricted - Sample is limited

    private - Model considers only private sector employees

    state - Model considers only public sector employees

    full_time - Model considers only full-time workers

    part_time - Model considers only part-time workers

    better_educated - Model considers only women with a high level of education

    lower_educated - Model considers only women with a low level of education

    married - Model includes only married women

    single - Model includes only single women

    natives - Model includes only native women (born in the country)

    immigrants - Model includes only immigrant women (born abroad)

    race - Model includes only women of a particular race

    min_year - Time period (minimum year)

    max_year - Time period (maximum year)

    journal - Type of publication

    usa - Sample includes women from the USA

    western_europe - Sample includes women from Western Europe (Belgium, France, Germany, Luxembourg, the Netherlands, Switzerland)

    north_europe - Sample includes women from Northern Europe (Denmark, Finland, Norway, Sweden)

    south_europe - Sample includes women from Southern Europe (Greece, Italy, Portugal, Spain)

    east_centre_europe - Sample includes women from Central or Eastern Europe (Czechia, Hungary, Poland, Russia, Serbia, Ukraine)

    china - Sample includes women from China

    Russia - Sample includes women from Russia

    others - Sample includes women from other countries

    country - Country name
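    As a rough illustration of how the headers above might be used, the sketch below recomputes a t-value from `coeff` and `se` and filters estimates by method flag. It assumes that `t` is the plain ratio `coeff / se` and that the method flags are coded 0/1; neither is confirmed by the description. The two rows are made-up values, not records from the dataset.

```python
# Hypothetical rows using a few of the documented column names.
# The numbers are invented for illustration and are NOT taken from the dataset.
rows = [
    {"estimate_number": 1, "paper_number": 1, "coeff": -0.05, "se": 0.02,
     "ols": 0, "fixed_effects": 1, "panel": 1},
    {"estimate_number": 2, "paper_number": 1, "coeff": -0.12, "se": 0.04,
     "ols": 1, "fixed_effects": 0, "panel": 0},
]

def t_value(row):
    """t-value under the assumption t = coeff / se."""
    return row["coeff"] / row["se"]

def select_by_flag(data, flag):
    """Keep the estimates whose method flag equals 1 (assumed 0/1 coding)."""
    return [r for r in data if r.get(flag) == 1]

t_values = [t_value(r) for r in rows]
fe_estimates = select_by_flag(rows, "fixed_effects")
```

    Comparing the recomputed ratio with the reported `t` column is one way to spot transcription errors before running a meta-regression.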

  12. A generalized focused information criterion for GMM (replication data)

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Minsu Chang (2025). A generalized focused information criterion for GMM (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9hLWdlbmVyYWxpemVkLWZvY3VzZWQtaW5mb3JtYXRpb24tY3JpdGVyaW9uLWZvci1nbW0=
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    ZBW
    ZBW Journal Data Archive
    Journal of Applied Econometrics
    Authors
    Minsu Chang
    Description

    This paper proposes a criterion for simultaneous generalized method of moments model and moment selection: the generalized focused information criterion (GFIC). Rather than attempting to identify the true specification, the GFIC chooses from a set of potentially misspecified moment conditions and parameter restrictions to minimize the mean squared error (MSE) of a user-specified target parameter. The intent of the GFIC is to formalize a situation common in applied practice. An applied researcher begins with a set of fairly weak baseline assumptions, assumed to be correct, and must decide whether to impose any of a number of stronger, more controversial suspect assumptions that yield parameter restrictions, additional moment conditions, or both. Provided that the baseline assumptions identify the model, we show how to construct an asymptotically unbiased estimator of the asymptotic MSE to select over these suspect assumptions: the GFIC. We go on to provide results for postselection inference and model averaging that can be applied both to the GFIC and various alternative selection criteria. To illustrate how our criterion can be used in practice, we specialize the GFIC to the problem of selecting over exogeneity assumptions and lag lengths in a dynamic panel model, and show that it performs well in simulations. We conclude by applying the GFIC to a dynamic panel data model for the price elasticity of cigarette demand.

  13. Data from: Change point detection in SCAD-penalized dynamic panel models

    • tandf.figshare.com
    pdf
    Updated Jun 17, 2025
    Cite
    Chao Gu; Suthakaran Ratnasingam (2025). Change point detection in SCAD-penalized dynamic panel models [Dataset]. http://doi.org/10.6084/m9.figshare.29345152.v1
    Available download formats
    pdf
    Dataset updated
    Jun 17, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Chao Gu; Suthakaran Ratnasingam
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose a cumulative sum (CUSUM)-based testing procedure to sequentially monitor structural changes in smoothly clipped absolute deviation (SCAD)-penalized dynamic panel models. Initially, this approach uses historical panel data to simultaneously perform variable selection and estimation with the SCAD penalty function. Tests based on CUSUM statistics are conducted to identify any change points in subsequent monitoring data. The consistency of the method and the oracle property of the resulting regularized estimators are examined. The asymptotic properties of the test statistics are established under both the null and alternative hypotheses. Simulations are conducted to demonstrate the performance of the proposed method. Finally, a real data application is provided to illustrate the detection procedure.

  14. Spatial econometric model selection.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Jun 5, 2024
    Cite
    Duan, Wei; Xu, Dongmei; Zhou, Yutao; Deng, Zhao (2024). Spatial econometric model selection. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001371609
    Dataset updated
    Jun 5, 2024
    Authors
    Duan, Wei; Xu, Dongmei; Zhou, Yutao; Deng, Zhao
    Description

    Cities are commonly recognized as the immediate hinterland of ports and play a crucial role in fostering the sustainable development of ports. Therefore, it is imperative to investigate the influence of cities on ports. By employing panel data from 2001 to 2021 for both ports and cities in the Bohai Rim region, this study examines the spatial spillover effect of urban economy on port efficiency using the spatial error model (SEM). The findings show that urban economies have a significant spatial spillover effect on port efficiency, but this effect diminishes across different spatial matrices. In particular, the geographical matrix demonstrates a stronger spatial spillover effect of the urban economy on port efficiency. These research findings help to establish a collaborative mechanism for port-city development and provide useful insights for government management decision-making.

  15. Data from: Dynamic Two Stage Modeling for Category-Level and Brand-Level...

    • tandf.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Kei Miyazaki; Takahiro Hoshino; Ulf Böckenholt (2023). Dynamic Two Stage Modeling for Category-Level and Brand-Level Purchases Using Potential Outcome Approach With Bayes Inference [Dataset]. http://doi.org/10.6084/m9.figshare.11387673.v3
    Available download formats
    zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Kei Miyazaki; Takahiro Hoshino; Ulf Böckenholt
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We propose an econometric two-stage model for category-level purchase and brand-level purchase that allows for simultaneous brand purchases in the analysis of scanner panel data. The proposed model formulation is consistent with the traditional theory of consumer behavior. We conduct Bayesian estimation with the Markov chain Monte Carlo algorithm for our proposed model. The simulation studies show that previously proposed related models can cause severe bias in predicting future brand choices, while the proposed method can effectively predict them. Additionally in a marketing application, the proposed method can examine brand switching behaviors that existing methods cannot. Moreover, we show that the prediction accuracy of the proposed method is higher than that of existing methods.

  16. Replication data for: Machine Learning Methods for Demand Estimation

    • openicpsr.org
    Updated May 1, 2015
    Cite
    Patrick Bajari; Denis Nekipelov; Stephen P. Ryan; Miaoyu Yang (2015). Replication data for: Machine Learning Methods for Demand Estimation [Dataset]. http://doi.org/10.3886/E113366V1
    Dataset updated
    May 1, 2015
    Dataset provided by
    American Economic Association
    Authors
    Patrick Bajari; Denis Nekipelov; Stephen P. Ryan; Miaoyu Yang
    Description

    We survey and apply several techniques from the statistical and computer science literature to the problem of demand estimation. To improve out-of-sample prediction accuracy, we propose a method of combining the underlying models via linear regression. Our method is robust to a large number of regressors; scales easily to very large data sets; combines model selection and estimation; and can flexibly approximate arbitrary non-linear functions. We illustrate our method using a standard scanner panel data set and find that our estimates are considerably more accurate in out-of-sample predictions of demand than some commonly used alternatives.

  17. General Household Survey, Panel 2023-2024 - Nigeria

    • microdata.worldbank.org
    • microdata.nigerianstat.gov.ng
    Updated Nov 21, 2024
    Cite
    National Bureau of Statistics (NBS) (2024). General Household Survey, Panel 2023-2024 - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/6410
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    National Bureau of Statistics (NBS)
    Time period covered
    2023 - 2024
    Area covered
    Nigeria
    Description

    Abstract

    The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2023/24 GHS-Panel is the fifth round of the survey, with prior rounds conducted in 2010/11, 2012/13, 2015/16 and 2018/19. The GHS-Panel households were visited twice: during the post-planting period (July - September 2023) and during the post-harvest period (January - March 2024).

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals
    • Agricultural plots
    • Communities

    Universe

    The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The original GHS‑Panel sample was fully integrated with the 2010 GHS sample. The GHS sample consisted of 60 Primary Sampling Units (PSUs) or Enumeration Areas (EAs), chosen from each of the 37 states in Nigeria. This resulted in a total of 2,220 EAs nationally. Each EA contributed 10 households to the GHS sample, resulting in a sample size of 22,200 households. Out of these 22,200 households, 5,000 households from 500 EAs were selected for the panel component, and 4,916 households completed their interviews in the first wave.

    After nearly a decade of visiting the same households, a partial refresh of the GHS‑Panel sample was implemented in Wave 4 and maintained for Wave 5. The refresh was conducted to maintain the integrity and representativeness of the sample. The refresh EAs were selected from the same sampling frame as the original GHS‑Panel sample in 2010. A listing of households was conducted in the 360 EAs, and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households.

    In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS‑Panel households from 2010 was selected to be included in the new sample. This “long panel” sample of 1,590 households was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across Nigeria’s six geopolitical zones.

    The combined sample of refresh and long panel EAs in Wave 5 that were eligible for inclusion consisted of 518 EAs based on the EAs selected in Wave 4. The combined sample generally maintains both the national and zonal representativeness of the original GHS‑Panel sample.
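
    The sample-size arithmetic in the sampling procedure above can be checked with a short script. This is only a sketch: the numbers are the ones quoted in the text, while the variable names are illustrative and the 10-households-per-EA rate for the long panel is inferred from the original design (5,000 households in 500 EAs), not stated explicitly for that subsample.

```python
# Figures quoted in the sampling procedure; variable names are illustrative.
PSUS_PER_STATE = 60        # PSUs/EAs chosen per state
STATES = 37                # number of states as quoted in the text
HOUSEHOLDS_PER_EA = 10     # households contributed by each EA

# Original 2010 GHS sample
total_eas = PSUS_PER_STATE * STATES               # 2,220 EAs nationally
ghs_households = total_eas * HOUSEHOLDS_PER_EA    # 22,200 households

# Partial refresh sample
REFRESH_EAS = 360
refresh_households = REFRESH_EAS * HOUSEHOLDS_PER_EA  # about 3,600 households

# Long panel: assumes the original panel rate of 5,000 households / 500 EAs
PANEL_HOUSEHOLDS = 5000
PANEL_EAS = 500
households_per_panel_ea = PANEL_HOUSEHOLDS // PANEL_EAS  # 10 per EA
LONG_PANEL_EAS = 159
long_panel_households = LONG_PANEL_EAS * households_per_panel_ea  # 1,590

print(total_eas, ghs_households, refresh_households, long_panel_households)
```

    Under these assumptions the script reproduces the 2,220 EAs, 22,200 households, 3,600 refresh households and 1,590 long panel households given in the description.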

    Sampling deviation

    Although 518 EAs were identified for the post-planting visit, conflict events prevented interviewers from visiting eight EAs in the North West zone of the country. The EAs were located in the states of Zamfara, Katsina, Kebbi and Sokoto. Therefore, the final number of EAs visited both post-planting and post-harvest comprised 157 long panel EAs and 354 refresh EAs. The combined sample is also roughly equally distributed across the six geopolitical zones.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The GHS-Panel Wave 5 consisted of three questionnaires for each of the two visits. The Household Questionnaire was administered to all households in the sample. The Agriculture Questionnaire was administered to all households engaged in agricultural activities such as crop farming, livestock rearing, and other agricultural and related activities. The Community Questionnaire was administered to the community to collect information on the socio-economic indicators of the enumeration areas where the sample households reside.

    GHS-Panel Household Questionnaire: The Household Questionnaire provided information on demographics; education; health; labour; childcare; early child development; food and non-food expenditure; household nonfarm enterprises; food security and shocks; safety nets; housing conditions; assets; information and communication technology; economic shocks; and other sources of household income. Household location was geo-referenced in order to be able to later link the GHS-Panel data to other available geographic data sets (forthcoming).

    GHS-Panel Agriculture Questionnaire: The Agriculture Questionnaire solicited information on land ownership and use; farm labour; inputs use; GPS land area measurement and coordinates of household plots; agricultural capital; irrigation; crop harvest and utilization; animal holdings and costs; household fishing activities; and digital farming information. Some information is collected at the crop level to allow for detailed analysis for individual crops.

    GHS-Panel Community Questionnaire: The Community Questionnaire solicited information on access to infrastructure and transportation; community organizations; resource management; changes in the community; key events; community needs, actions, and achievements; social norms; and local retail price information.

    The Household Questionnaire was slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.

    The Agriculture Questionnaire collected different information during each visit, but for the same plots and crops.

    The Community Questionnaire collected prices during both visits, and different community level information during the two visits.

    Cleaning operations

    CAPI: The Wave 5 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community questionnaires) were implemented in both the post-planting and post-harvest visits of Wave 5 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Living Standards Measurement Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given a tablet which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.

    DATA COMMUNICATION SYSTEM: The data communication system used in Wave 5 was highly automated. Each field team was given a mobile modem which allowed for internet connectivity and daily synchronization of their tablets. This ensured that the head office in Abuja had access to the data in real time. Once the interview was completed and uploaded to the server, the data was first reviewed by the Data Editors. The data was also downloaded from the server, and a Stata do-file was run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file was generated following the running of the Stata do-file on the raw dataset. Information contained in the Excel error files was then communicated back to the respective field interviewers for their action. This monitoring activity was done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.

    DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection and designed to highlight many of the errors that occurred during the fieldwork.

    The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once the interview is completed and uploaded to the server, the Data Editors review the completed interview for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer’s tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample and, where there were differences, these were properly assessed and documented. The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning was then approved by the Data Editor. After the Data Editor’s approval of the interview on the Survey Solutions server, Headquarters also reviews it and, depending on the outcome, can either reject or approve.

    The third stage of cleaning involved a comprehensive review of the final raw data following the first and second stage cleaning. Every variable was examined individually for (1) consistency with other sections and variables, (2) out of range responses, and (3) outliers. However, special care was taken to avoid making strong assumptions when resolving potential errors. Some minor errors remain in the data where the diagnosis and/or solution were unclear to the data cleaning team.

    Response

  18. Incentive effects in the demand for health care: a bivariate panel count...

    • resodate.org
    Updated Oct 2, 2025
    Cite
    Regina T. Riphahn (2025). Incentive effects in the demand for health care: a bivariate panel count data estimation (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9pbmNlbnRpdmUtZWZmZWN0cy1pbi10aGUtZGVtYW5kLWZvci1oZWFsdGgtY2FyZS1hLWJpdmFyaWF0ZS1wYW5lbC1jb3VudC1kYXRhLWVzdGltYXRpb24=
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    ZBW
    ZBW Journal Data Archive
    Journal of Applied Econometrics
    Authors
    Regina T. Riphahn
    Description

    This paper contributes in three dimensions to the literature on health care demand. First, it features the first application of a bivariate random effects estimator in a count data setting, to permit the efficient estimation of this type of model with panel data. Second, it provides an innovative test of adverse selection and confirms that high-risk individuals are more likely to acquire supplemental add-on insurance. Third, the estimations yield that in accordance with the theory of moral hazard, we observe a much lower frequency of doctor visits among the self-employed, and among mothers of small children.

  19. Data from: Quantile Co-Movement in Financial Markets: A Panel Quantile Model...

    • tandf.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Tomohiro Ando; Jushan Bai (2023). Quantile Co-Movement in Financial Markets: A Panel Quantile Model With Unobserved Heterogeneity [Dataset]. http://doi.org/10.6084/m9.figshare.7461701.v3
    Available download formats
    zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Tomohiro Ando; Jushan Bai
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This article introduces a new procedure for analyzing the quantile co-movement of a large number of financial time series based on a large-scale panel data model with factor structures. The proposed method attempts to capture the unobservable heterogeneity of each of the financial time series based on sensitivity to explanatory variables and to the unobservable factor structure. In our model, the dimension of the common factor structure varies across quantiles, and the explanatory variables are allowed to depend on the factor structure. The proposed method allows for both cross-sectional and serial dependence, and heteroscedasticity, which are common in financial markets. We propose new estimation procedures for both frequentist and Bayesian frameworks. Consistency and asymptotic normality of the proposed estimator are established. We also propose a new model selection criterion for determining the number of common factors together with theoretical support. We apply the method to analyze the returns for over 6000 international stocks from over 60 countries during the subprime crisis, European sovereign debt crisis, and subsequent period. The empirical analysis indicates that the common factor structure varies across quantiles. We find that the common factors for the quantiles and the common factors for the mean are different. Supplementary materials for this article are available online.

  20. General Household Survey, Panel 2018-2019, Wave 4 - Nigeria

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 5, 2021
    Cite
    National Bureau of Statistics (NBS) (2021). General Household Survey, Panel 2018-2019, Wave 4 - Nigeria [Dataset]. https://microdata.worldbank.org/index.php/catalog/3557
    Dataset updated
    Oct 5, 2021
    Dataset authored and provided by
    National Bureau of Statistics (NBS)
    Time period covered
    2018 - 2019
    Area covered
    Nigeria
    Description

    Abstract

    The General Household Survey-Panel (GHS-Panel) is implemented in collaboration with the World Bank Living Standards Measurement Study (LSMS) team as part of the Integrated Surveys on Agriculture (ISA) program. The objectives of the GHS-Panel include the development of an innovative model for collecting agricultural data, interinstitutional collaboration, and comprehensive analysis of welfare indicators and socio-economic characteristics. The GHS-Panel is a nationally representative survey of approximately 5,000 households, which are also representative of the six geopolitical zones. The 2018/19 GHS-Panel is the fourth round of the survey, with prior rounds conducted in 2010/11, 2012/13, and 2015/16. GHS-Panel households were visited twice: first after the planting season (post-planting) between July and September 2018 and second after the harvest season (post-harvest) between January and February 2019.

    Geographic coverage

    National

    Analysis unit

    • Households
    • Individuals
    • Agricultural plots
    • Communities

    Universe

    The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The original GHS-Panel sample consisted of 5,000 households across 500 enumeration areas (EAs) and was designed to be representative at the national level as well as at the zonal level. The complete sampling information for the GHS-Panel is described in the Basic Information Document for GHS-Panel 2010/2011. However, after nearly a decade of visiting the same households, a partial refresh of the GHS-Panel sample was implemented in Wave 4.

    For the partial refresh of the sample, a new set of 360 EAs was randomly selected, consisting of 60 EAs per zone. The refresh EAs were selected from the same sampling frame as the original GHS-Panel sample in 2010 (the “master frame”). A listing of all households was conducted in the 360 EAs and 10 households were randomly selected in each EA, resulting in a total refresh sample of approximately 3,600 households.

    In addition to these 3,600 refresh households, a subsample of the original 5,000 GHS-Panel households from 2010 was selected to be included in the new sample. This “long panel” sample was designed to be nationally representative to enable continued longitudinal analysis for the sample going back to 2010. The long panel sample consisted of 159 EAs systematically selected across the 6 geopolitical Zones. The systematic selection ensured that the distribution of EAs across the 6 Zones (and urban and rural areas within) is proportional to the original GHS-Panel sample. Interviewers attempted to interview all households that originally resided in the 159 EAs and were successfully interviewed in the previous visit in 2016. This includes households that had moved away from their original location in 2010. In all, interviewers attempted to interview 1,507 households from the original panel sample.

    The combined sample of refresh and long panel EAs consisted of 519 EAs. The total number of households that were successfully interviewed in both visits was 4,976.

    Sampling deviation

    While the combined sample generally maintains both national and Zonal representativeness of the original GHS-Panel sample, the security situation in the North East of Nigeria prevented full coverage of the Zone. Due to security concerns, rural areas of Borno state were fully excluded from the refresh sample and some inaccessible urban areas were also excluded. Security concerns also prevented interviewers from visiting some communities in other parts of the country where conflict events were occurring. Refresh EAs that could not be accessed were replaced with another randomly selected EA in the Zone so as not to compromise the sample size. As a result, the combined sample is representative of areas of Nigeria that were accessible during 2018/19. The sample will not reflect conditions in areas that were undergoing conflict during that period. This compromise was necessary to ensure the safety of interviewers.

    Mode of data collection

    Computer Assisted Personal Interview [capi]

    Research instrument

    The GHS-Panel Wave 4 consists of three questionnaires for each of the two visits. The Household Questionnaire was administered to all households in the sample. The Agriculture Questionnaire was administered to all households engaged in agricultural activities such as crop farming, livestock rearing and other agricultural and related activities. The Community Questionnaire was administered to the community to collect information on the socio-economic indicators of the enumeration areas where the sample households reside.

    GHS-Panel Household Questionnaire: The Household Questionnaire provides information on demographics; education; health (including anthropometric measurement for children); labor; food and non-food expenditure; household nonfarm income-generating activities; food security and shocks; safety nets; housing conditions; assets; information and communication technology; and other sources of household income. Household location is geo-referenced in order to be able to later link the GHS-Panel data to other available geographic data sets.

    GHS-Panel Agriculture Questionnaire: The Agriculture Questionnaire solicits information on land ownership and use; farm labor; inputs use; GPS land area measurement and coordinates of household plots; agricultural capital; irrigation; crop harvest and utilization; animal holdings and costs; and household fishing activities. Some information is collected at the crop level to allow for detailed analysis for individual crops.

    GHS-Panel Community Questionnaire: The Community Questionnaire solicits information on access to infrastructure; community organizations; resource management; changes in the community; key events; community needs, actions and achievements; and local retail price information.

    The Household Questionnaire is slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.

    The Agriculture Questionnaire collects different information during each visit, but for the same plots and crops.

    Cleaning operations

    CAPI: For the first time in the GHS-Panel, the Wave 4 exercise was conducted using Computer Assisted Personal Interview (CAPI) techniques. All the questionnaires (household, agriculture, and community questionnaires) were implemented in both the post-planting and post-harvest visits of Wave 4 using the CAPI software, Survey Solutions. The Survey Solutions software was developed and maintained by the Survey Unit within the Development Economics Data Group (DECDG) at the World Bank. Each enumerator was given tablets which they used to conduct the interviews. Overall, implementation of the survey using Survey Solutions CAPI was highly successful, as it allowed for timely availability of the data from completed interviews.

    DATA COMMUNICATION SYSTEM: The data communication system used in Wave 4 was highly automated. Each field team was given a mobile modem to allow for internet connectivity and daily synchronization of their tablet. This ensured that the head office in Abuja had access to the data in real time. Once the interview is completed and uploaded to the server, the data is first reviewed by the Data Editors. The data is also downloaded from the server, and a Stata do-file is run on the downloaded data to check for additional errors that were not captured by the Survey Solutions application. An Excel error file is generated following the running of the Stata do-file on the raw dataset. Information contained in the Excel error files is communicated back to the respective field interviewers for action. This is done on a daily basis throughout the duration of the survey, in both the post-planting and post-harvest visits.

    DATA CLEANING: The data cleaning process was done in three main stages. The first stage was to ensure proper quality control during the fieldwork. This was achieved in part by incorporating validation and consistency checks into the Survey Solutions application used for the data collection and designed to highlight many of the errors that occurred during the fieldwork.

    The second stage of cleaning involved the use of Data Editors and Data Assistants (Headquarters in Survey Solutions). As indicated above, once the interview is completed and uploaded to the server, the Data Editors review the completed interview for inconsistencies and extreme values. Depending on the outcome, they can either approve or reject the case. If rejected, the case goes back to the respective interviewer’s tablet upon synchronization. Special care was taken to see that the households included in the data matched the selected sample and, where there were differences, these were properly assessed and documented. The agriculture data were also checked to ensure that the plots identified in the main sections merged with the plot information identified in the other sections. Additional errors observed were compiled into error reports that were regularly sent to the teams. These errors were then corrected based on re-visits to the household on the instruction of the supervisor. The data that had gone through this first stage of cleaning was then approved by the Data Editor. After the Data Editor’s approval of the interview on the Survey Solutions server, Headquarters also reviews it and, depending on the outcome, can either reject or approve.

    The third stage of cleaning involved a comprehensive review of the final raw data following
