License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
While fixed effects (FE) models are often employed to address potential omitted variables, we argue that their real utility lies in isolating a particular dimension of variance in panel data for analysis. In addition, we show through a novel mathematical decomposition and simulation that only one-way FE models cleanly capture either the over-time or the cross-sectional dimension of panel data, while the two-way FE model unhelpfully combines within-unit and cross-sectional variation in a way that produces uninterpretable answers. In fact, as we show in this paper, if we begin with the interpretation that many researchers wrongly assign to the two-way FE model (that it yields a single estimate of the effect of X on Y while accounting for unit-level heterogeneity and time shocks), the two-way FE specification is statistically unidentified, a fact that statistical software packages such as R and Stata obscure through internal matrix processing.
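The two one-way "within" transformations can be illustrated on a toy panel: demeaning by unit keeps only over-time variation, while demeaning by period keeps only cross-sectional variation. This is a minimal pure-Python sketch on a hypothetical balanced panel, not the decomposition developed in the paper.

```python
from collections import defaultdict


def demean(rows, group_key, var):
    """Return var minus its mean within each group defined by group_key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[group_key]] += r[var]
        counts[r[group_key]] += 1
    return [r[var] - sums[r[group_key]] / counts[r[group_key]] for r in rows]


def slope(x, y):
    """OLS slope without intercept; appropriate once both series are demeaned."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical balanced panel: units A and B observed in periods 1-3.
panel = [
    {"unit": u, "t": t, "x": x, "y": y}
    for u, t, x, y in [
        ("A", 1, 1.0, 12.0), ("A", 2, 2.0, 14.0), ("A", 3, 3.0, 16.0),
        ("B", 1, 4.0, 8.0), ("B", 2, 5.0, 10.0), ("B", 3, 6.0, 12.0),
    ]
]

# Unit FE: demeaning within unit keeps only over-time variation.
within_slope = slope(demean(panel, "unit", "x"), demean(panel, "unit", "y"))
# Time FE: demeaning within period keeps only cross-sectional variation.
cross_slope = slope(demean(panel, "t", "x"), demean(panel, "t", "y"))

print(f"unit FE (over-time) slope:       {within_slope:.3f}")  # 2.000
print(f"time FE (cross-sectional) slope: {cross_slope:.3f}")   # -1.333
```

In this constructed example the over-time and cross-sectional relationships have opposite signs, so each one-way estimate answers a different question about the same data.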
Panel data possess several advantages over conventional cross-sectional and time-series data, including their power to isolate the effects of specific actions, treatments, and general policies often at the core of large-scale econometric development studies. While the concept of panel data alone provides the capacity for modelling the complexities of human behaviour, the notion of universal panel data (in which time- and situation-driven variances leading to variations in tools, and thus results, are mitigated) can further enhance exploitation of the richness of panel information. The NPS Universal Panel Questionnaire (UPQ) consists of both survey instruments and datasets, meticulously aligned and engineered with the aim of facilitating the use of and improving access to the wealth of panel data offered by the NPS. The NPS-UPQ provides a consistent and straightforward means not only of conducting user-driven analyses using convenient, standardized tools, but also of monitoring MKUKUTA, FYDP II, and other national-level development indicators reported by the NPS.
The design of the NPS-UPQ combines the four completed rounds of the NPS, namely NPS 2008/09 (R1), NPS 2010/11 (R2), NPS 2012/13 (R3), and NPS 2014/15 (R4), into pooled, module-specific survey instruments and datasets. The panel survey instruments make comparison over time easy: modifications and variances are readily identifiable, as are the aspects of the questionnaire that have remained identical and therefore offer consistent information. By providing all module-specific data over time within compact, pooled datasets, the panel datasets eliminate the need for users to merge rounds themselves and present the data in a clear, logical format, increasing both the usability and the comprehensibility of complex data.
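A pooled, module-specific dataset can be pictured as round-specific records stacked into one long table tagged by round, so that a household's history becomes a filter rather than a merge. This is a minimal sketch; the variable names `hhid`, `round` and `hh_size` and all values are hypothetical, not actual NPS-UPQ variables.

```python
# Hypothetical round-specific extracts of one module, keyed by household ID.
rounds = {
    "R1": [{"hhid": "0001", "hh_size": 5}, {"hhid": "0002", "hh_size": 3}],
    "R2": [{"hhid": "0001", "hh_size": 6}, {"hhid": "0002", "hh_size": 3}],
    "R3": [{"hhid": "0001", "hh_size": 6}],
}


def pool(rounds):
    """Stack round-specific records into one long dataset tagged by round."""
    pooled = []
    for rnd, records in rounds.items():
        for rec in records:
            pooled.append({"round": rnd, **rec})
    return pooled


pooled = pool(rounds)

# A household's history is a filter on the pooled data, not a merge.
history = [(r["round"], r["hh_size"]) for r in pooled if r["hhid"] == "0001"]
print(history)  # [('R1', 5), ('R2', 6), ('R3', 6)]
```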
Regional coverage
Households
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Sample survey data [ssd]
SAMPLING PROCEDURE
While the same sample of respondents was maintained over the first three rounds of the NPS, longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time, i.e. attrition. Although the NPS maintains a high recapture rate (roughly 96% retention at the household level), which limits the growth of this selection bias, the longitudinal cohorts were refreshed for the NPS 2014/15 to keep estimates representative while retaining a sufficiently large share of the original sample to preserve continuity for panel analysis. The new Population and Housing Census (PHC) of 2012, which provided updated population figures along with changes in administrative boundaries, offered an opportunity to realign the NPS sample and reduce the cumulative bias potentially introduced through attrition.
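A household-level retention rate such as the roughly 96% quoted above can be computed from the sets of household IDs interviewed in two rounds. This sketch assumes retention means "baseline households re-interviewed at follow-up divided by baseline households"; the NPS's exact definition (for example, how split-off households are counted) is not stated here, and the IDs are hypothetical.

```python
def retention_rate(baseline_ids, followup_ids):
    """Share of baseline households that were re-interviewed at follow-up."""
    baseline = set(baseline_ids)
    if not baseline:
        raise ValueError("baseline sample is empty")
    return len(baseline & set(followup_ids)) / len(baseline)


# Hypothetical IDs: 25 baseline households, one not found at follow-up,
# plus one household that appears only at follow-up (not counted).
r1 = [f"hh{i:02d}" for i in range(25)]
r2 = [f"hh{i:02d}" for i in range(1, 25)] + ["hh25"]
print(f"{retention_rate(r1, r2):.0%}")  # 96%
```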
To maintain the panel concept of the NPS, the sample design for NPS 2014/2015 consisted of a combination of the original NPS sample and a new NPS sample. A nationally representative sub-sample was selected to continue as part of the “Extended Panel” while an entirely new sample, “Refresh Panel”, was selected to represent national and sub-national domains. Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.
Face-to-face [f2f]
The format of the NPS-UPQ survey instrument is similar to that of previously disseminated NPS survey instruments. Each module has a questionnaire and clearly identifies whether the module collects information at the individual or household level. Within each module-specific questionnaire of the NPS-UPQ survey instrument, there are five distinct sections, arranged vertically: (1) the UPQ (“U” on the survey instrument), (2) R4, (3) R3, (4) R2, and (5) R1, the latter four sections presenting each round's questionnaire in its original form at the time of its dissemination.
The uppermost section of each module’s questionnaire (“U”) represents the model universal panel questionnaire, with questions drawn from the comprehensive listing of questions across all four rounds of the NPS and response codes drawn from the comprehensive collection of codes. The following sections are arranged vertically by round, starting with R4, the most recent. While not every round will have data reported for each question in the UPQ, and not every question will have responses for each of the UPQ codes listed, the NPS-UPQ survey instrument is the visual, all-inclusive record of the information collected by the NPS over time.
The four round-specific sections (R4, R3, R2, R1) are aligned with their UPQ-equivalent question, visually presenting their contribution to compatibility with the UPQ. Each round-specific section includes the original round-specific variable names, response codes and skip patterns (corresponding to their respective round-specific NPS data sets, and despite their variance from other rounds or from the comprehensive UPQ code listing)4.
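The alignment of round-specific questions with their UPQ equivalents amounts to a crosswalk: each (round, original variable) pair maps to a UPQ variable, together with a recode of that round's response codes onto the UPQ code list. This is a minimal sketch; every variable name and code below is hypothetical and not taken from the NPS.

```python
# Hypothetical crosswalk: (round, original variable) -> (UPQ variable, code recode).
CROSSWALK = {
    ("R1", "hh_b05"): ("upq_marital", {1: 1, 2: 2, 3: 3}),
    ("R4", "hh_b19"): ("upq_marital", {1: 1, 2: 2, 5: 3}),
}


def harmonize(rnd, record):
    """Rename round-specific variables and recode their values onto the UPQ."""
    out = {"round": rnd}
    for var, value in record.items():
        mapping = CROSSWALK.get((rnd, var))
        if mapping is None:
            continue  # no UPQ equivalent defined for this variable
        upq_var, codes = mapping
        out[upq_var] = codes.get(value)  # None if the code has no UPQ equivalent
    return out


print(harmonize("R1", {"hh_b05": 2}))  # {'round': 'R1', 'upq_marital': 2}
print(harmonize("R4", {"hh_b19": 5}))  # {'round': 'R4', 'upq_marital': 3}
```

Keeping the original variable names as crosswalk keys means the round-specific datasets stay untouched, which mirrors how the instrument preserves each round's original names, codes and skip patterns.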
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).
The variables contained therein are defined as follows:
case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.
We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using maximum likelihood estimation (1_menbregpaper.do and 2_menbregpaper_bs.do).
Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
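The long layout described above (one row per patient per time_period, with case, patid, ncons and burden) can be sketched in Python together with a descriptive mean consultation count per month by case status. This is not the authors' Stata code: the counts are hypothetical binomial draws, period0 to period9 are omitted because their exact construction is not described here, and the models themselves (the file names suggest Stata's `menbreg`) are not reproduced.

```python
import random
from collections import defaultdict

N_PERIODS = 10  # time_period runs 0..9; 9 is the month of diagnosis


def make_dummy_panel(n_cases, n_controls, seed=1):
    """Long format: one row per patient (patid) per time_period.

    All values are hypothetical; counts are binomial draws chosen only so
    that cases consult more often as diagnosis approaches.
    """
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_cases + n_controls + 1):
        case = 1 if patid <= n_cases else 0
        burden = rng.randint(0, 1)  # membership of one of two burden groups
        for t in range(N_PERIODS):
            p = 0.1 + 0.05 * burden + (0.06 * t if case else 0.0)
            ncons = sum(rng.random() < p for _ in range(5))
            rows.append({"patid": patid, "case": case, "time_period": t,
                         "ncons": ncons, "burden": burden})
    return rows


def mean_consultations(rows):
    """Mean consultations per patient-month by (case, time_period)."""
    totals, counts = defaultdict(int), defaultdict(int)
    for r in rows:
        key = (r["case"], r["time_period"])
        totals[key] += r["ncons"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


rows = make_dummy_panel(n_cases=50, n_controls=50)
for (case, t), rate in mean_consultations(rows).items():
    print(f"case={case} time_period={t} mean ncons={rate:.2f}")
```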
In this paper we study neural networks and their approximating power in panel data models. We provide asymptotic guarantees on deep feed-forward neural network estimation of the conditional mean, building on the work of Farrell et al. (2021), and explore latent patterns in the cross-section. We use the proposed estimators to forecast the progression of new COVID-19 cases across the G7 countries during the pandemic. We find significant forecasting gains over both linear panel and nonlinear time-series models. Containment or lockdown policies, as instigated at the national level by governments, are found to have out-of-sample predictive power for new COVID-19 cases. We illustrate how the use of partial derivatives can help open the “black box” of neural networks and facilitate semi-structural analysis: school and workplace closures are found to have been effective policies at restricting the progression of the pandemic across the G7 countries. But our methods illustrate significant heterogeneity and time variation in the effectiveness of specific containment policies.
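The partial-derivative idea can be illustrated with a central finite difference of a network's output with respect to one input, evaluated at different points. This sketch uses a tiny one-hidden-layer network with made-up weights and hypothetical inputs; the paper's networks are estimated, deeper, and may compute derivatives differently.

```python
import math

# Hypothetical fixed weights for a one-hidden-layer network with inputs
# (lagged new cases, school closure, workplace closure).
W = [[0.8, -0.5, -0.3], [0.2, -0.1, -0.6]]  # hidden x input
B = [0.1, -0.2]                              # hidden biases
V = [1.2, 0.7]                               # output weights


def tiny_net(x):
    """Forward pass: tanh hidden layer, linear output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W, B)]
    return sum(v * h for v, h in zip(V, hidden))


def partial(f, x, i, h=1e-5):
    """Central finite-difference estimate of df/dx_i at the point x."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2.0 * h)


# The derivative with respect to the school-closure input (index 1)
# depends on where it is evaluated, i.e. it can vary across units and time.
for point in ([0.5, 0.0, 0.0], [2.0, 1.0, 1.0]):
    print(point, round(partial(tiny_net, point, 1), 4))
```

Because the network is nonlinear, the derivative changes with the point of evaluation, which is one way heterogeneity and time variation in a policy's estimated effect can show up.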
Panel data possess several advantages over conventional cross-sectional and time-series data, including their power to isolate the effects of specific actions, treatments, and general policies often at the core of large-scale econometric development studies. While the concept of panel data alone provides the capacity for modeling the complexities of human behavior, the notion of universal panel data – in which time- and situation-driven variances leading to variations in tools, and thus results, are mitigated – can further enhance exploitation of the richness of panel information.
The Basic Information Document (BID) provides a brief overview of the Nigerian General Household Survey (GHS) but focuses primarily on the theoretical development and application of panel data, as well as key elements of the universal panel survey instrument and datasets generated by the four rounds of the GHS. As the BID does not describe in detail the background, development, or use of the GHS itself, the wave-specific GHS BIDs should supplement the information provided here.
The Nigeria Universal Panel Data (NUPD) consists of both survey instruments and datasets from the two survey visits of the GHS - Post-Planting (PP) and Post-Harvest (PH) - meticulously aligned and engineered with the aim of facilitating the use of and improving access to the wealth of panel data offered by the GHS. The NUPD provides a consistent and straightforward means of conducting user-driven analyses using convenient, standardized tools.
The design of the NUPD combines the four completed waves of the GHS Household Post-Planting and Post-Harvest Surveys – Wave 1 (2010/11), Wave 2 (2012/13), Wave 3 (2015/16), and Wave 4 (2018/19) – into pooled, module-specific survey instruments and datasets. The panel survey instruments make comparison over time easy: modifications and variances are readily identifiable, as are the aspects of the questionnaire that have remained identical and therefore offer consistent information. By providing all module-specific data over time within compact, pooled datasets, the panel datasets eliminate the need for users to merge rounds themselves and present the data in a clear, logical format, increasing both the usability and the comprehensibility of complex data.
National
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
Please see the GHS BIDs for each round for detailed descriptions of the sample design used in each round and the respective implementation efforts, as this is a compilation of datasets from all previous waves.
Face-to-face [f2f]
The larger GHS-Panel project consists of three questionnaires (Household Questionnaire, Agriculture Questionnaire, Community Questionnaire) for each of the two visits (Post-Planting and Post-Harvest). The GHS-NUPD consists only of the Household Questionnaire.
GHS-Panel Household Questionnaire: The Household Questionnaire provides information on demographics; education; health (including anthropometric measurement for children); labor; food and non-food expenditure; household nonfarm income-generating activities; food security and shocks; safety nets; housing conditions; assets; information and communication technology; and other sources of household income.
The Household Questionnaire is slightly different for the two visits. Some information was collected only in the post-planting visit, some only in the post-harvest visit, and some in both visits.
Please see the GHS BIDs for each round for detailed descriptions of data editing and additional data processing efforts, as this is a compilation of datasets from all previous waves.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains panel data for a sample of 15 countries (Australia, Austria, Canada, China, Denmark, France, Germany, Israel, Italy, Japan, Republic of Korea, Spain, Sweden, Switzerland and United States) over the period 2006-2015. The series used are available for a small number of developed countries and for a relatively short time period. Solar PV module prices, imports of solar PV panels and the public budget for R&D in PV are expressed in real terms, obtained by dividing the nominal series by the United States GDP deflator. The series are obtained from five main sources. The import value of solar PV panels is taken from the Commodity Trade Statistics database (COMTRADE); PV panels (cells and modules) fall under category HS 854140, "Photosensitive Semiconductor Devices, Photovoltaic Cells and Light-Emitting Diodes". Solar PV module prices, cumulative installed PV capacity and the public budget for R&D in PV are constructed from the PVPS report Trends in Photovoltaic Applications of the International Energy Agency (IEA). Population density, the political stability index, renewable energy consumption and per capita carbon dioxide emissions are all obtained from the World Bank (WB). Real GDP per capita is taken from the Federal Reserve Bank of St. Louis (FRED). Technological development in PV and crude oil import prices are drawn from the Organisation for Economic Co-operation and Development (OECD) database. Since crude oil import prices are not available for China and Israel, we use the West Texas Intermediate spot crude oil price as a proxy. The dummy for the presence of a feed-in tariff is constructed from the OECD database.
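Deflating a nominal series divides each year's value by that year's price index, rescaled so the base year equals 100. This is a minimal sketch; the base year, deflator values and budget figures below are hypothetical, since the dataset's base year is not stated here.

```python
def to_real(nominal, deflator, base=100.0):
    """Express a nominal value in real terms (prices of the deflator's base year)."""
    return nominal * base / deflator


# Hypothetical nominal series and GDP deflator (index, base year = 100).
nominal_rd_budget = {2006: 50.0, 2010: 60.0, 2015: 66.0}
gdp_deflator = {2006: 94.0, 2010: 100.0, 2015: 110.0}

real_rd_budget = {year: to_real(value, gdp_deflator[year])
                  for year, value in nominal_rd_budget.items()}
print(real_rd_budget)
```

Here the 2010 and 2015 real values coincide, meaning the hypothetical nominal increase only kept pace with prices.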
License: https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains timestamp, voltage, current, and power data for a small 3.3 V solar panel, collected at approximately 1 Hz. The solar panel is mounted in a fixed, fully vertical position facing east in a window on the third floor of a building in Maine, USA. No maximum power point tracking controls the panel, and load matching was not considered in the setup.
The data were collected from 2024-11-01 to 2024-12-01.
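Instantaneous power is voltage times current, and energy over an interval is the time integral of power. This sketch applies the trapezoidal rule using the actual time step, which tolerates the slightly irregular spacing of ~1 Hz sampling. It assumes timestamps have already been converted to seconds; the readings are hypothetical and the dataset's column names are not assumed.

```python
def power_w(volts, amps):
    """Instantaneous power in watts."""
    return volts * amps


def energy_joules(samples):
    """Trapezoidal integral of power over time.

    samples: list of (t_seconds, volts, amps), sorted by time. Using the actual
    time step tolerates irregular (~1 Hz) sampling.
    """
    energy = 0.0
    for (t0, v0, i0), (t1, v1, i1) in zip(samples, samples[1:]):
        energy += 0.5 * (power_w(v0, i0) + power_w(v1, i1)) * (t1 - t0)
    return energy


samples = [(0.0, 3.0, 0.10), (1.0, 3.0, 0.10), (2.0, 3.2, 0.125)]  # hypothetical
e = energy_joules(samples)
print(f"{e:.3f} J = {e / 3600 * 1000:.4f} mWh")
```

The dataset already records power, so the recorded column could be integrated directly; recomputing it from voltage and current is mainly a consistency check.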
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sunspots - Monthly Activity since 1749
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, we consider a partially linear panel data model with nonstationarity and certain cross-sectional dependence. Accounting for the explosive feature of the nonstationary time series, we particularly employ Hermite orthogonal functions in this study. Under a general spatial error dependence structure, we then establish some consistent closed-form estimates for both the unknown parameters and the unknown functions for the cases where N and T go jointly to infinity. Rates of convergence and asymptotic normalities are established for the proposed estimators. Both the finite sample performance and the empirical applications show that the proposed estimation methods work well.
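Hermite functions form an orthonormal basis of square-integrable functions on the whole real line, which is one reason they suit regressors that are not confined to a compact interval. This sketch evaluates the orthonormal Hermite functions with their standard three-term recurrence and checks orthonormality numerically. It is not the paper's estimator, and the truncation order and integration grid are arbitrary choices.

```python
import math


def hermite_functions(n_max, x):
    """Orthonormal Hermite functions h_0..h_{n_max} at x (three-term recurrence)."""
    h = [math.pi ** -0.25 * math.exp(-x * x / 2.0)]
    if n_max >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for n in range(1, n_max):
        h.append(math.sqrt(2.0 / (n + 1)) * x * h[n] - math.sqrt(n / (n + 1)) * h[n - 1])
    return h


def gram(n_max, lo=-12.0, hi=12.0, steps=4800):
    """Numerical inner products <h_i, h_j> on [lo, hi]; should be close to identity."""
    dx = (hi - lo) / steps
    G = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for k in range(steps + 1):
        h = hermite_functions(n_max, lo + k * dx)
        for i in range(n_max + 1):
            for j in range(n_max + 1):
                G[i][j] += h[i] * h[j] * dx
    return G


for row in gram(3):
    print([round(v, 6) for v in row])
```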
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Panel data, also known as longitudinal data, consist of a collection of time series. Each time series, which could itself be multivariate, comprises a sequence of measurements taken on a distinct unit. Mechanistic modeling involves writing down scientifically motivated equations describing the collection of dynamic systems giving rise to the observations on each unit. A defining characteristic of panel systems is that the dynamic interaction between units should be negligible. Panel models therefore consist of a collection of independent stochastic processes, generally linked through shared parameters while also having unit-specific parameters. To give the scientist flexibility in model specification, we are motivated to develop a framework for inference on panel data permitting the consideration of arbitrary nonlinear, partially observed panel models. We build on iterated filtering techniques that provide likelihood-based inference on nonlinear partially observed Markov process models for time series data. Our methodology depends on the latent Markov process only through simulation; this plug-and-play property ensures applicability to a large class of models. We demonstrate our methodology on a toy example and two epidemiological case studies. We address inferential and computational issues arising due to the combination of model complexity and dataset size. Supplementary materials for this article are available online.
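The panel structure described above (independent units linked by shared parameters, with the latent process accessed only through simulation) can be sketched with a basic bootstrap particle filter whose unit log-likelihoods are summed. This is not iterated filtering, which additionally perturbs and iterates over parameters to maximize the likelihood; the AR(1)-plus-noise model, parameter values and data are hypothetical.

```python
import math
import random


def simulate_step(x, phi, sigma, rng):
    """Process simulator: the filter only ever *simulates* the latent state."""
    return phi * x + sigma * rng.gauss(0.0, 1.0)


def log_meas_density(y, x, tau):
    """Log density of y ~ Normal(x, tau^2)."""
    return -0.5 * ((y - x) / tau) ** 2 - math.log(tau * math.sqrt(2.0 * math.pi))


def unit_loglik(ys, params, n_particles, rng):
    """Bootstrap particle filter estimate of one unit's log-likelihood."""
    phi, sigma, tau = params
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # X_0 ~ N(0, 1)
    ll = 0.0
    for y in ys:
        particles = [simulate_step(x, phi, sigma, rng) for x in particles]
        logw = [log_meas_density(y, x, tau) for x in particles]
        m = max(logw)  # log-sum-exp shift avoids underflow
        w = [math.exp(lw - m) for lw in logw]
        ll += m + math.log(sum(w) / n_particles)
        particles = rng.choices(particles, weights=w, k=n_particles)
    return ll


def panel_loglik(panel, params, n_particles=500, seed=0):
    """Units are independent given shared params, so log-likelihoods add."""
    rng = random.Random(seed)
    return sum(unit_loglik(ys, params, n_particles, rng) for ys in panel.values())


# Hypothetical panel of three short series.
panel = {
    "unit1": [0.2, 0.5, 0.1, -0.3, 0.0],
    "unit2": [1.1, 0.9, 1.4, 0.8, 0.6],
    "unit3": [-0.4, -0.8, -0.2, 0.1, 0.3],
}
for params in [(0.8, 0.5, 0.3), (0.0, 0.5, 0.3)]:
    print(params, round(panel_loglik(panel, params), 2))
```

Unit-specific parameters could be handled by passing a different parameter tuple per unit while keeping the shared ones common.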
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Whether democratic and nondemocratic regimes perform differently in social provision policy is an important issue for social scientists and policy makers. Since political regimes rarely change, researchers evaluating how political regimes affect social policy are concerned with their long-term or dynamic effects on the outcome. However, estimating the dynamic effects of rarely changing variables in time-series cross-sectional (TSCS) data with conventional estimators may be problematic when unit effects are included in the model specification. This article proposes a model that accounts for, and estimates, the correlation between the unit effects and the explanatory variables. Applying the proposed model to 18 Latin American countries, this article finds evidence that democracy has a positive effect on social spending in both the short and the long term.
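The distinction between short-run and long-run effects can be made concrete with a generic lagged-dependent-variable specification, y_t = rho * y_{t-1} + beta * x_t + e_t, where a permanent one-unit change in x has impact effect beta and long-run effect beta / (1 - rho). This specification is an assumption for illustration only; it is not the model the article proposes, and the estimates below are hypothetical.

```python
def cumulative_effect(beta, rho, horizon):
    """Cumulative effect on y after `horizon` periods of a permanent one-unit
    increase in x, in y_t = rho * y_{t-1} + beta * x_t + e_t."""
    return sum(beta * rho ** k for k in range(horizon + 1))


def long_run_effect(beta, rho):
    """Limit of the cumulative effect: beta / (1 - rho), valid for |rho| < 1."""
    if abs(rho) >= 1:
        raise ValueError("long-run effect is undefined unless |rho| < 1")
    return beta / (1.0 - rho)


beta, rho = 0.5, 0.8  # hypothetical estimates
print(f"short run (impact): {cumulative_effect(beta, rho, 0):.3f}")  # 0.500
print(f"after 10 periods:   {cumulative_effect(beta, rho, 10):.3f}")
print(f"long run:           {long_run_effect(beta, rho):.3f}")        # 2.500
```

With persistent dynamics (rho close to 1), the long-run effect can be several times the impact effect, which is why the long-run quantity matters for slowly changing variables such as regime type.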
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article analyzes spurious regression in panel data when the time series are cross-section dependent. The set-up allows for (possibly unknown) multiple structural breaks that can affect both the deterministic and the common factor components. We show that the long-run average parameter can be estimated consistently once cross-section dependence is controlled for using cross-section averages, in the spirit of Pesaran’s common correlated effects approach. This result is used to design individual and panel cointegration test statistics that accommodate structural breaks, which can induce parameter instabilities in the deterministic component, the cointegration vector and the common factor loadings.
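In Pesaran's common correlated effects (CCE) approach, unobserved common factors are proxied by cross-section averages of the dependent and explanatory variables, which are added as regressors in each unit's regression. This sketch builds those augmented regressors for a hypothetical balanced panel; it omits the unit-by-unit OLS fit and the article's structural-break handling, which is where its contribution lies.

```python
def cross_section_averages(panel):
    """Per-period cross-section averages (ybar_t, xbar_t) of a balanced panel.

    panel: dict mapping unit -> list of (y, x) pairs for periods t = 0..T-1.
    """
    series = list(panel.values())
    T, N = len(series[0]), len(series)
    if any(len(s) != T for s in series):
        raise ValueError("this sketch assumes a balanced panel")
    return [(sum(s[t][0] for s in series) / N, sum(s[t][1] for s in series) / N)
            for t in range(T)]


def cce_rows(panel):
    """Pair each observation's y with an augmented regressor vector
    (1, x_it, ybar_t, xbar_t), ready for a unit-by-unit OLS fit."""
    avgs = cross_section_averages(panel)
    return {unit: [(y, (1.0, x, ybar, xbar)) for (y, x), (ybar, xbar) in zip(s, avgs)]
            for unit, s in panel.items()}


panel = {"a": [(1.0, 2.0), (3.0, 4.0)], "b": [(3.0, 0.0), (5.0, 2.0)]}  # hypothetical
print(cross_section_averages(panel))  # [(2.0, 1.0), (4.0, 3.0)]
print(cce_rows(panel)["a"])           # [(1.0, (1.0, 2.0, 2.0, 1.0)), (3.0, (1.0, 4.0, 4.0, 3.0))]
```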
The mechanism linking democratic development and the wealth gap has long been a focus of political and economic research, yet no consistent conclusion has emerged. The reasons are often 1) the difficulty of generalizing results obtained from single-country time-series studies or multinational cross-section analyses, and 2) deviations in research results caused by missing values or variable selection in panel data analysis. Two factors contribute to the latter: the accuracy of estimation is compromised by missing values in the variables, and the subjective discretion needed to select suitable proxies among many candidates is likely to cause variable selection bias. To address these problems, this study is the first on this topic to use a machine learning method, the random forest model, to impute missing values efficiently, and it analyzes cross-country data from 151 countries covering the period 1993–2017. Because the paper measures the importance of different variables to the dependent variable, more appropriate and important variables can be selected to construct a complete regression model. Results from different models agree that the promotion of democracy can significantly narrow the gap between rich and poor, with a marginally decreasing effect with respect to wealth. In addition, the study finds that this mechanism exists only in non-colonial nations or presidential states. Finally, the paper discusses the potential theoretical and policy implications of the results.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset included with this article contains three files describing and defining the sample and variables used to assess VAT impact. Excel file 1 contains all raw and filtered data for the variables in the panel data sample. Excel file 2 presents time-series and cross-sectional data for nonfinancial firms listed on the Saudi market for the second and third quarters of 2019 and the third and fourth quarters of 2020. Excel file 3 presents the raw data for the variables used to measure the profitability of the companies in the panel data sample.
License: https://www.verifiedmarketresearch.com/privacy-policy/
The Econometric Analysis Software Market was valued at USD 1.5 Billion in 2024 and is projected to reach USD 3.08 Billion by 2032, growing at a CAGR of 9.4% during the forecast period, i.e., 2026–2032. Organizations globally are generating massive volumes of structured and unstructured data, creating unprecedented demand for sophisticated econometric analysis tools.
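The compound annual growth rate (CAGR) is (end / start)^(1 / periods) - 1. Assuming annual compounding from the 2024 valuation to 2032 (eight periods), the quoted figures are consistent with 9.4%; the same endpoints over six periods would imply roughly 12.7%.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over `periods` compounding years."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


print(f"{cagr(1.5, 3.08, 8):.1%}")  # 9.4%  (2024 to 2032)
print(f"{cagr(1.5, 3.08, 6):.1%}")  # 12.7% (six periods)
```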
License: https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.1/customlicense?persistentId=doi:10.7910/DVN/FEW2JP
Replication material for Metzger and Jones' "Getting Time Right" (forthcoming, Political Analysis). See "readme.html" in the /code folder for further documentation. The CO capsule does not rerun the main simulations, but it does provide the raw results from those simulations. Abstract: Logit and probit (L/P) models are a mainstay of binary time-series cross-sectional (BTSCS) analyses. Researchers include cubic splines or time polynomials to acknowledge the temporal element inherent in these data. However, L/P models cannot easily accommodate three other aspects of the data’s temporality: whether covariate effects are conditional on time, whether the process of interest is causally complex, and whether our functional form assumption regarding time’s effect is correct. Failing to account for any of these issues amounts to misspecification bias, threatening the validity of our inferences. We argue scholars should consider using Cox duration models when analyzing BTSCS data, as they create fewer opportunities for such misspecification bias, while also being able to assess the same hypotheses as L/P. We use Monte Carlo simulations to bring new evidence to light showing that Cox models perform just as well as, and sometimes better than, logit models in a basic BTSCS setting, and perform considerably better in more complex BTSCS situations. In addition, we highlight a new interpretation technique for Cox models (transition probabilities) to make Cox model results more readily interpretable. We use an application from interstate conflict to demonstrate our points.
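The two data layouts contrasted in the abstract can be derived from the same yearly binary series: a years-since-last-event counter with cubic polynomial terms for a logit/probit, and duration spells (with a right-censored final spell) for a Cox model. This is a sketch on hypothetical data, not the authors' replication code; the counter convention is one common choice, and model fitting is omitted.

```python
def years_since_event(events):
    """Counter of years since the unit's last event (or since its first year).

    One common convention (assumed here): the counter resets to 0 in the
    year after an event.
    """
    counter, out = 0, []
    for event in events:
        out.append(counter)
        counter = 0 if event else counter + 1
    return out


def cubic_time_terms(counter):
    """t, t^2, t^3 regressors for a logit/probit with a cubic time polynomial."""
    return [(t, t ** 2, t ** 3) for t in counter]


def to_spells(events):
    """Recast the yearly series as (duration, ended_in_event) spells for a Cox model."""
    spells, length = [], 0
    for event in events:
        length += 1
        if event:
            spells.append((length, True))
            length = 0
    if length:
        spells.append((length, False))  # right-censored final spell
    return spells


events = [0, 0, 1, 0, 1, 0, 0]  # hypothetical dyad-years, 1 = conflict onset
print(years_since_event(events))  # [0, 1, 2, 0, 1, 0, 1]
print(to_spells(events))          # [(3, True), (2, True), (2, False)]
```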
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code to reproduce the results in both the manuscript and the supplementary text. Files are combined in a plain text tar format. A README file describes the contents of each file.
License: https://www.icpsr.umich.edu/web/ICPSR/studies/37072/terms
The Monitoring the Future (MTF) project is a long-term epidemiologic and etiologic study of substance use among youth and adults in the United States. It is conducted at the University of Michigan's Institute for Social Research and funded by a series of investigator-initiated research grants from the National Institute on Drug Abuse. MTF has two components: MTF Main and MTF Panel. From its inception in 1975, the cross-sectional MTF Main study has collected data annually from nationally representative samples of 12,000-19,000 high school seniors in 12th grade located in approximately 135 schools nationwide. Beginning in 1991, similar annual cross-sectional surveys of nationally representative samples of 8th and 10th graders have been conducted. In all, approximately 45,000 students annually respond to about 100 drug use and demographic questions, as well as to about 200 additional questions divided among multiple survey forms on other topics such as attitudes toward government, social institutions, race relations, changing gender roles, educational aspirations, occupational aims, and marital plans. The longitudinal MTF Panel study conducts follow-up surveys with representative subsamples of respondents from each 12th grade cohort participating in MTF Main. From each cohort, a sample of about 2,450 students is selected for longitudinal follow-up, with an oversampling of students who reported prior drug use during their 12th grade survey. Longitudinal follow-up currently spans modal ages 19-30 and 35-60. For surveys at modal ages 19-30, the sample is randomly split into two halves (approx. 1,225 each) to be followed every other year. One half-sample begins its first follow-up the year after high school (at modal age 19), and the other half-sample begins its first follow-up in the second year after high school (at modal age 20).
Thus, six young adult follow-up (FU) surveys occur between modal ages 19-30, at modal ages 19/20 (FU1), 21/22 (FU2), 23/24 (FU3), 25/26 (FU4), 27/28 (FU5), and 29/30 (FU6). After age 30, respondents are surveyed every five years: 35, 40, 45, 50, 55, and 60 (these are referred to as FZ surveys). The FZ surveys cover many of the same topics as the 12th grade and FU surveys and include additional questions on life events and health. MTF Panel surveys for the young adults (ages 19-30) were conducted using mailed paper surveys from 1977-2017. In 2018 and 2019, a random half of all those aged 19-30 received a mailed paper survey, while the other half were surveyed using a new procedure that encouraged participation using web surveys (web-push). The FZ surveys (ages 35-60) were conducted using mailed paper surveys through the 2019 data collection. More information about the MTF project can be accessed through the Monitoring the Future website. Annual reports are published by the research team, describing the data collection and trends over time.
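The follow-up schedule above can be generated directly: each half-sample is surveyed every two years from its starting modal age, and FZ surveys follow every five years from 35 to 60. In this sketch the half-sample labels "A" and "B" are hypothetical, and calendar years assume modal age 18 in the 12th-grade survey year, as implied by a first follow-up at modal age 19 in the year after high school.

```python
def young_adult_ages(half):
    """Modal ages at FU1..FU6 for half-sample 'A' (starts at 19) or 'B' (starts at 20)."""
    first = {"A": 19, "B": 20}[half]
    return {f"FU{k}": first + 2 * (k - 1) for k in range(1, 7)}


FZ_AGES = list(range(35, 61, 5))  # FZ surveys every five years: 35..60


def survey_year(senior_year, modal_age):
    """Calendar year of a follow-up, taking modal age 18 in the 12th-grade survey year."""
    return senior_year + (modal_age - 18)


print(young_adult_ages("A"))  # {'FU1': 19, 'FU2': 21, 'FU3': 23, 'FU4': 25, 'FU5': 27, 'FU6': 29}
print(young_adult_ages("B"))  # {'FU1': 20, 'FU2': 22, 'FU3': 24, 'FU4': 26, 'FU5': 28, 'FU6': 30}
print(FZ_AGES)                # [35, 40, 45, 50, 55, 60]
print(survey_year(2000, young_adult_ages("A")["FU1"]))  # 2001
```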
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set presents the results of the analysis for the research paper entitled "The Pattern of International Trade between Bangladesh and USA: Heckscher-Ohlin and Rybczynski Analysis".
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data file is prepared to show the step-by-step data preparation for the paper entitled "A New Ensemble Learning Strategy for Panel Time-Series Forecasting with Applications to Tracking Respiratory Disease Excess Mortality during the COVID-19 pandemic" by Afshin Ashofteh, Jorge M. Bravo, and Mercedes Ayuso. The outputs are available in folders. For detailed information, please refer to the paper.