Our Time-Series Cross-Sectional data combine various online databases. Country names were first identified and matched using the R package “countrycode” (Arel-Bundock, Enevoldsen, & Yetman, 2018) before all datasets were merged. Where country names could not be identified automatically, we recoded them by hand to be consistent across datasets. We then transformed “wide” data into “long” data and merged them using R’s tidyverse framework (Wickham, 2014). Our analysis begins in 1949 because one of the key time-variant level-1 variables, pathogen prevalence, is only available from 1949 onward. See our Supplemental Material for all data, Stata syntax, R Markdown files for visualization, supplemental analyses, and detailed results (available at https://osf.io/drt8j/).
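A minimal sketch of the matching and reshaping steps described above. File names, column names, and the hand-recoding example are hypothetical placeholders, not the actual sources:

```r
library(countrycode)
library(tidyverse)

# Hypothetical wide file: one row per country, one column per year
raw <- read_csv("pathogen_prevalence_wide.csv")

long <- raw %>%
  # standardize country identifiers before merging
  mutate(iso3c = countrycode(country_name,
                             origin = "country.name",
                             destination = "iso3c")) %>%
  # hand-recode names the package cannot identify (illustrative example)
  mutate(iso3c = if_else(country_name == "Kosovo", "XKX", iso3c)) %>%
  # reshape "wide" year columns into "long" country-year rows
  pivot_longer(cols = matches("^\\d{4}$"),
               names_to = "year",
               values_to = "pathogen_prevalence") %>%
  mutate(year = as.integer(year)) %>%
  filter(year >= 1949)

# merge with other country-year databases on the harmonized keys
merged <- left_join(long, read_csv("other_database_long.csv"),
                    by = c("iso3c", "year"))
```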
Researchers working with time-series cross-sectional (TSCS) data regularly face the change-point problem, which requires them to discern between significant parametric shifts that can be deemed structural changes and minor parametric shifts that must be considered noise. In this paper, we develop a general Bayesian method for change-point detection in high-dimensional data and present its application in the context of the fixed-effect model. Our proposed method, the hidden Markov Bayesian bridge model (HMBB), jointly estimates high-dimensional regime-specific parameters and hidden regime transitions in a unified way. We apply our method to Alvarez, Garrett, and Lange’s (1991) study of the relationship between government partisanship and economic growth and to Allee and Scalera’s (2012) study of membership effects in international organizations. In both applications, the proposed method successfully identifies substantively meaningful temporal heterogeneity in the parameters of regression models.
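As a rough illustration of the temporal heterogeneity the abstract refers to (this is not the HMBB estimator itself, just a simulated example), the slope of a panel regression can switch regime at a break point; pooling over the full period masks the shift that regime-specific estimation recovers:

```r
set.seed(1)
n_units <- 20; n_years <- 40; break_year <- 21

dat <- expand.grid(unit = 1:n_units, year = 1:n_years)
dat$x <- rnorm(nrow(dat))
dat$beta <- ifelse(dat$year < break_year, 0.5, -0.5)   # regime-specific slope
dat$y <- dat$beta * dat$x + rnorm(nrow(dat))

coef(lm(y ~ x, data = dat))["x"]                               # pooled slope: near zero
coef(lm(y ~ x, data = subset(dat, year <  break_year)))["x"]   # regime 1: about  0.5
coef(lm(y ~ x, data = subset(dat, year >= break_year)))["x"]   # regime 2: about -0.5
```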
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Multifactor error structures utilize factor analysis to deal with complex cross-sectional dependence in Time-Series Cross-Sectional data caused by cross-level interactions. The multifactor error structure specification is a generalization of the fixed-effects model. This paper extends the existing multifactor error models from panel econometrics to multilevel modeling, from linear setups to generalized linear models with the probit and logistic links, and from assuming serial independence to modeling the error dynamics with an autoregressive process. I develop Markov chain Monte Carlo algorithms mixed with a rejection sampling scheme to estimate the multilevel multifactor error structure model with a p-th order autoregressive process in linear, probit, and logistic specifications. I conduct several Monte Carlo studies to compare the performance of alternative specifications and approaches with varying degrees of data complication and different sample sizes. The Monte Carlo studies provide guidance on when and how to apply the proposed model. An empirical application to sovereign default demonstrates how the proposed approach can accommodate a complex pattern of cross-sectional dependence and helps answer research questions related to units' sensitivity or vulnerability to systemic shocks.
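A sketch of the data-generating process behind a multifactor error structure with serially correlated errors, using illustrative dimensions and parameter values (this is the DGP only, not the paper's MCMC estimation algorithm):

```r
set.seed(1)
n_units <- 30; n_periods <- 40; n_factors <- 2

lambda <- matrix(rnorm(n_units * n_factors), n_units, n_factors)      # unit-specific loadings
f      <- matrix(rnorm(n_periods * n_factors), n_periods, n_factors)  # common time-varying factors
x      <- matrix(rnorm(n_units * n_periods), n_units, n_periods)

# AR(1) disturbances, one series per unit
e <- t(replicate(n_units, as.numeric(arima.sim(list(ar = 0.5), n = n_periods))))

# outcome: true beta = 1 plus the factor error structure lambda_i' f_t
y <- 1.0 * x + lambda %*% t(f) + e
```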
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
When estimating hedonic models of housing prices, the use of time-series cross-section repeat sales data can provide improvements in estimator efficiency and correct for unobserved characteristics. However, in cases where serial correlation is present, the irregular timing of sales should also be considered. In this paper we develop a model that uses information on the timing of sales to account for their sporadic occurrence. The model presumes that the serial correlation process can be decomposed into a time-independent (event-wise) component and a time-dependent (time-wise) component. Empirical tests cannot reject the presence of sporadic correlation patterns, while simulations show that the failure to account for sporadic correlation leads to significant losses in efficiency, and that the losses from ignoring sporadic correlation when it exists are larger than the losses when sporadic correlation is falsely assumed.
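A rough numerical illustration of the decomposition, under assumed parameter values (not the paper's estimates): the correlation carried from one sale of a property to the next combines an event-wise component that is independent of elapsed time and a time-wise component that decays with the gap between sales.

```r
rho_event <- 0.4        # event-wise carry-over per sale (illustrative value)
rho_time  <- 0.9        # time-wise persistence per year (illustrative value)
gap_years <- c(1, 3, 10)

# implied correlation between the errors of consecutive sales at different gaps
rho_event * rho_time^gap_years
```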
Companion files for: Fortin-Rittberger, Jessica. 2014. “Time-Series Cross-Section.” In Henning Best and Christof Wolf (eds.), The SAGE Handbook of Regression Analysis and Causal Inference. Sage Publishers. DOI: http://dx.doi.org/10.4135/9781446288146.n17. Includes the data file (Norris, P. (2009). Democracy time-series data release 3.0. http://www.pippanorris.com/) and a Stata do-file.
Custom license: https://dataverse.harvard.edu/api/datasets/:persistentId/versions/5.1/customlicense?persistentId=doi:10.7910/DVN/GGUR0P
Applications of modern methods for analyzing data with missing values, based primarily on multiple imputation, have in the last half-decade become common in American politics and political behavior. Scholars in these fields have thus increasingly avoided the biases and inefficiencies caused by ad hoc methods like listwise deletion and best-guess imputation. However, researchers in much of comparative politics and international relations, and others with similar data, have been unable to do the same because the best available imputation methods work poorly with the time-series cross-section data structures common in these fields. We attempt to rectify this situation. First, we build a multiple imputation model that allows smooth time trends, shifts across cross-sectional units, and correlations over time and space, resulting in far more accurate imputations. Second, we build nonignorable missingness models by enabling analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult-to-interpret model parameters. Third, since these tasks could not be accomplished within existing imputation algorithms, in that they cannot handle as many variables as needed even in the simpler cross-sectional data for which they were designed, we also develop a new algorithm that substantially expands the range of computationally feasible data types and sizes for which multiple imputation can be used. These developments also made it possible to implement the methods introduced here in freely available open source software, Amelia II: A Program for Missing Data, that is considerably more reliable than existing strategies. See also: Missing Data
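A minimal sketch of time-series cross-section imputation with the Amelia II package, using its bundled freetrade example data; the argument choices follow the package documentation rather than the paper's applications:

```r
library(Amelia)
data(freetrade)   # country-year panel shipped with the package

a.out <- amelia(freetrade,
                m        = 5,          # number of imputed datasets
                ts       = "year",     # time index
                cs       = "country",  # cross-section index
                polytime = 2)          # smooth quadratic time trends

# Observation-level priors on individual missing cells (to encode area-expert
# knowledge) can be supplied through the `priors` argument; see ?amelia.
summary(a.out)
```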
License: MIT (https://opensource.org/licenses/MIT)
R scripts containing statistical analyses of streamflow and sediment data, including flow duration curves, double-mass analysis, nonlinear regression for suspended-sediment rating curves, and stationarity tests, along with several plots.
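As one example of the kind of analysis such scripts perform, a suspended-sediment rating curve of the common power-law form Qs = a·Q^b can be fitted with nls(); the file name, column names, and starting values below are placeholders:

```r
# hypothetical daily data with discharge Q (m3/s) and sediment load Qs (t/day)
dat <- read.csv("gauge_station_daily.csv")

fit <- nls(Qs ~ a * Q^b, data = dat, start = list(a = 0.1, b = 1.5))
summary(fit)

plot(dat$Q, dat$Qs, log = "xy",
     xlab = "Discharge (m3/s)", ylab = "Suspended sediment (t/day)")
ord <- order(dat$Q)
lines(dat$Q[ord], predict(fit)[ord])   # fitted rating curve
```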
Across the social sciences scholars regularly pool effects over substantial periods of time, a practice that produces faulty inferences if the underlying data generating process is dynamic. To help researchers better perform principled analyses of time-varying processes, we develop a two-stage procedure based upon techniques for permutation testing and statistical process monitoring. Given time series cross-sectional data, we break the role of time through permutation inference and produce a null distribution that reflects a time-invariant data generating process. The null distribution then serves as a stable reference point, enabling the detection of effect changepoints. In Monte Carlo simulations our randomization technique outperforms alternatives for changepoint analysis. A particular benefit of our method is that, by establishing the bounds for time-invariant effects before interacting with actual estimates, it is able to differentiate stochastic fluctuations from genuine changes. We demonstrate the method's utility by applying it to a popular study on the relationship between alliances and the initiation of militarized interstate disputes. The example illustrates how the technique can help researchers make inferences about where changes occur in dynamic relationships and ask important questions about such changes.
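A simplified sketch of the permutation logic (not the authors' full two-stage procedure): shuffling the time index within units mimics a time-invariant process, and the spread of rolling coefficients across permuted datasets serves as the reference band against which the observed rolling estimates are compared. The simulated data below are placeholders.

```r
set.seed(1)
dat <- expand.grid(unit = 1:20, year = 1:40)
dat$x <- rnorm(nrow(dat))
dat$y <- 0.5 * dat$x + rnorm(nrow(dat))     # stand-in for real TSCS data

rolling_beta <- function(d, width = 10) {
  starts <- seq(min(d$year), max(d$year) - width + 1)
  sapply(starts, function(s) {
    coef(lm(y ~ x, data = subset(d, year >= s & year < s + width)))["x"]
  })
}

obs <- rolling_beta(dat)                    # observed rolling coefficients

perm <- replicate(200, {
  d <- dat
  d$year <- ave(d$year, d$unit, FUN = sample)   # break the role of time within units
  rolling_beta(d)
})

band <- apply(perm, 1, quantile, probs = c(0.025, 0.975))
# observed coefficients outside `band` suggest a genuine change rather than noise
```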
This collection contains an array of economic time series data pertaining to the United States, the United Kingdom, Germany, and France, primarily between the 1920s and the 1960s, and including some time series from the 18th and 19th centuries. These data were collected by the National Bureau of Economic Research (NBER), and they constitute a research resource of importance to economists as well as to political scientists, sociologists, and historians. Under a grant from the National Science Foundation, ICPSR and the National Bureau of Economic Research converted this collection (which existed heretofore only on handwritten sheets stored in New York) into fully accessible, readily usable, and completely documented machine-readable form. The NBER collection -- containing an estimated 1.6 million entries -- is divided into 16 major categories: (1) construction, (2) prices, (3) security markets, (4) foreign trade, (5) income and employment, (6) financial status of business, (7) volume of transactions, (8) government finance, (9) distribution of commodities, (10) savings and investments, (11) transportation and public utilities, (12) stocks of commodities, (13) interest rates, (14) indices of leading, coincident, and lagging indicators, (15) money and banking, and (16) production of commodities. Data from all categories are available in Parts 1-22. The economic variables are usually observations on the entire nation or large subsets of the nation. Frequently, however, and especially in the United States, separate regional and metropolitan data are included in other variables. This makes cross-sectional analysis possible in many cases. The time span of variables in these files may be as short as one year or as long as 160 years. Most data pertain to the first half of the 20th century. Many series, however, extend into the 19th century, and a few reach into the 18th. The oldest series, covering brick production in England and Wales, begins in 1785, and the most recent United States data extend to 1968. The unit of analysis is an interval of time -- a year, a quarter, or a month. The bulk of observations are monthly, and most series of monthly data contain annual values or totals. (Source: downloaded from ICPSR 7/13/10)
Please Note: This dataset is part of the historical CISER Data Archive Collection and is also available at ICPSR -- https://doi.org/10.3886/ICPSR07644.v2. We highly recommend using the ICPSR version as they made this dataset available in multiple data formats.
This replication file contains data and source code to replicate the results in "Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models" by Yiqing Xu
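A minimal usage sketch with the gsynth package's bundled simulated data; the call mirrors the package vignette and is not the replication code itself:

```r
library(gsynth)
data(gsynth)   # loads the simulated panel `simdata`

out <- gsynth(Y ~ D + X1 + X2, data = simdata,
              index = c("id", "time"),   # unit and time identifiers
              force = "two-way",         # unit and time fixed effects
              CV = TRUE, r = c(0, 5),    # cross-validate the number of factors
              se = TRUE, inference = "parametric", nboots = 200)

plot(out)   # treated average outcome and its estimated counterfactual
```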
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Two pairs of datasets and R scripts used in my publication of the same title:
sf.xlsx: the time-series dataset
San Francisco.R: R code used to analyze sf.xlsx
27zipcodes.xls: the panel dataset
27zipcodes.R: R code used to analyze 27zipcodes.xls
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This article deals with a variety of dynamic issues in the analysis of time-series cross-section (TSCS) data. Although the issues raised are general, we focus on applications to comparative political economy, which frequently uses TSCS data. We begin with a discussion of specification and lay out the theoretical differences implied by the various types of dynamic models that can be estimated. It is shown that there is nothing pernicious in using a lagged dependent variable and that all dynamic models either implicitly or explicitly have such a variable; the differences between the models relate to assumptions about the speeds of adjustment of measured and unmeasured variables. When adjustment is quick, it is hard to differentiate between the various models; with slower speeds of adjustment, the various models make sufficiently different predictions that they can be tested against each other. As the speed of adjustment gets slower and slower, specification (and estimation) gets more and more tricky. We then turn to a discussion of estimation. It is noted that models with both a lagged dependent variable and serially correlated errors can easily be estimated; it is only ordinary least squares that is inconsistent in this situation. There is a brief discussion of lagged dependent variables combined with fixed effects and issues related to non-stationarity. We then show how our favored method of modeling dynamics combines nicely with methods for dealing with other TSCS issues, such as parameter heterogeneity and spatial dependence. We conclude with two examples.
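A small sketch of the lagged-dependent-variable specification discussed above, fitted on simulated placeholder data; under this specification the long-run effect of x is the short-run coefficient divided by one minus the coefficient on the lag:

```r
library(dplyr)
set.seed(1)

# simulated stand-in for TSCS data: y adjusts slowly (AR coefficient 0.6) to x
tscs <- expand.grid(country = 1:15, year = 1:30) %>% arrange(country, year)
tscs$x <- rnorm(nrow(tscs))
tscs <- tscs %>%
  group_by(country) %>%
  mutate(y = as.numeric(stats::filter(0.3 * x + rnorm(n()), 0.6,
                                      method = "recursive"))) %>%
  mutate(y_lag = lag(y)) %>%   # within-country lag of the dependent variable
  ungroup()

static_model  <- lm(y ~ x, data = tscs)
dynamic_model <- lm(y ~ y_lag + x, data = tscs)

b <- coef(dynamic_model)
b["x"] / (1 - b["y_lag"])   # implied long-run effect of x
```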
The QoG Institute is an independent research institute within the Department of Political Science at the University of Gothenburg. In total, 30 researchers conduct and promote research on the causes, consequences, and nature of Good Governance and the Quality of Government, that is, trustworthy, reliable, impartial, uncorrupted, and competent government institutions.
The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained. A second objective is to study the effects of Quality of Government on a number of policy areas, such as health, the environment, social policy, and poverty.
The dataset was created as part of a research project titled “Quality of Government and the Conditions for Sustainable Social Policy”. The aim of the dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and Quality of Government (QoG).
The data come in three versions: one cross-sectional dataset and two cross-sectional time-series datasets for a selection of countries. The two combined datasets are called “long” (covering 1946-2009) and “wide” (covering 1970-2005).
The data contain six types of variables, each provided under its own heading in the codebook: Social policy variables, Tax system variables, Social conditions, Public opinion data, Political indicators, and Quality of Government variables.
The QoG Social Policy Dataset can be downloaded from the Data Archive of the QoG Institute at http://qog.pol.gu.se/data/datadownloads/data-archive. Its variables are now included in the QoG Standard dataset.
Purpose:
The primary aim of QoG is to conduct and promote research on corruption. One aim of the QoG Institute is to make publicly available cross-national comparative data on QoG and its correlates. The aim of the QoG Social Policy Dataset is to promote cross-national comparative research on social policy output and its correlates, with a special focus on the connection between social policy and Quality of Government (QoG).
The dataset combines cross-sectional and time-series data for a selection of 40 countries. It is specifically tailored for the analysis of public opinion data over time, uses the country (rather than the country-year) as its unit of observation, and includes one variable for every fifth year from 1970 to 2005 (that is, one per module of each public opinion data source).
Samanni, Marcus, Jan Teorell, Staffan Kumlin, Stefan Dahlberg, Bo Rothstein, Sören Holmberg & Richard Svensson. 2012. The QoG Social Policy Dataset, version 4Apr12. University of Gothenburg: The Quality of Government Institute. http://www.qog.pol.gu.se
The dataset includes aggregate immigration opinions in 13 West European countries (Austria, Belgium, Denmark, France, Germany, Great Britain, Ireland, Italy, the Netherlands, Norway, Portugal, Sweden and Switzerland). The estimations are the result of a dyadic ratios algorithm.
This paper introduces a simple framework of counterfactual estimation for causal inference with time-series cross-sectional data, in which we estimate the average treatment effect on the treated by directly imputing counterfactual outcomes for treated observations. We discuss several novel estimators under this framework, including the fixed effects counterfactual estimator, the interactive fixed effects counterfactual estimator, and the matrix completion estimator. They provide more reliable causal estimates than conventional two-way fixed effects models when treatment effects are heterogeneous or unobserved time-varying confounders exist. Moreover, we propose a new dynamic treatment effects plot, along with several diagnostic tests, to help researchers gauge the validity of the identifying assumptions. We illustrate these methods with two political economy examples and develop an open-source package, fect, in both R and Stata to facilitate implementation.
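A minimal usage sketch, assuming the interface documented for the fect R package; the bundled dataset and argument names below are taken from the package documentation and may differ across versions (see ?fect):

```r
library(fect)
data(fect)   # assumed to load the package's simulated example panel `simdata`

out <- fect(Y ~ D + X1 + X2, data = simdata,
            index  = c("id", "time"),
            method = "ife",            # interactive fixed effects counterfactual estimator
            CV = TRUE, r = c(0, 5),    # cross-validate the factor number
            se = TRUE, nboots = 200)

plot(out, type = "gap")   # dynamic treatment effects plot with pre-treatment diagnostics
```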
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
While fixed effects (FE) models are often employed to address potential omitted variables, we argue that these models’ real utility is in isolating a particular dimension of variance from panel data for analysis. In addition, we show through novel mathematical decomposition and simulation that only one-way FE models cleanly capture either the over-time or cross-sectional dimensions in panel data, while the two-way FE model unhelpfully combines within-unit and cross-sectional variation in a way that produces uninterpretable answers. In fact, as we show in this paper, if we begin with the interpretation that many researchers wrongly assign to the two-way FE model—that it represents a single estimate of the effect of X on Y while accounting for unit-level heterogeneity and time shocks—the two-way FE specification is statistically unidentified, a fact that statistical software packages like R and Stata obscure through internal matrix processing.
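The variance-isolation point can be seen in a short simulation: a one-way unit fixed-effects estimate is identical to OLS on within-unit demeaned data (over-time variation only), while swapping unit for year dummies isolates the cross-sectional dimension instead. The data below are simulated placeholders.

```r
set.seed(1)
d <- expand.grid(unit = factor(1:25), year = factor(1:20))
d$x <- rnorm(nrow(d)) + as.numeric(d$unit) / 10 + as.numeric(d$year) / 10
d$y <- 0.5 * d$x + rnorm(nrow(d))

# one-way unit FE vs. OLS on within-unit demeaned data: identical slopes
coef(lm(y ~ x + unit, data = d))["x"]
d$y_w <- d$y - ave(d$y, d$unit)
d$x_w <- d$x - ave(d$x, d$unit)
coef(lm(y_w ~ x_w, data = d))["x_w"]

# replacing `unit` with `year` isolates the cross-sectional dimension instead
coef(lm(y ~ x + year, data = d))["x"]
```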
The "https://electionstudies.org/data-center/2020-time-series-study/" Target="_blank">American National Election Studies (ANES) 2020 Time Series Study is a continuation of the series of election studies conducted since 1948 to support analysis of public opinion and voting behavior in U.S. presidential elections. This year's study features re-interviews with "https://electionstudies.org/data-center/2016-time-series-study/" Target="_blank">2016 ANES respondents, a freshly drawn cross-sectional sample, and post-election surveys with respondents from the "https://gss.norc.org/" Target="_blank">General Social Survey (GSS). All respondents were assigned to interview by one of three mode groups - by web, video or telephone. The study has a total of 8,280 pre-election interviews and 7,449 post-election re-interviews.
New content for the 2020 pre-election survey includes variables on sexual harassment and misconduct, health insurance, identity politics, immigration, media trust and misinformation, institutional legitimacy, campaigns, party images, trade tariffs and tax policy.
New content for the 2020 post-election survey includes voting experiences, attitudes toward public health officials and organizations, anti-elitism, faith in experts/science, climate change, gun control, opioids, rural-urban identity, international trade, sexual harassment and #MeToo, transgender military service, perception of foreign countries, group empathy, social media usage, misinformation and personal experiences.
(American National Election Studies. 2021. ANES 2020 Time Series Study Full Release [dataset and documentation]. July 19, 2021 version. https://electionstudies.org/)
The data and programs replicate tables and figures from "Income convergence among U.S. states: cross-sectional and time series evidence", by Heckelman. Please see the ReadMe file for additional details.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Regression results for baseline model and alternative specifications.
Replication material for Metzger and Jones' "Getting Time Right" (forthcoming, Political Analysis). See "readme.html" in the /code folder for further documentation. The CO capsule does not rerun the main simulations, but does provide the raw results from those simulations. Abstract: Logit and probit (L/P) models are a mainstay of binary time-series cross-sectional (BTSCS) analyses. Researchers include cubic splines or time polynomials to acknowledge the temporal element inherent in these data. However, L/P models cannot easily accommodate three other aspects of the data's temporality: whether covariate effects are conditional on time, whether the process of interest is causally complex, and whether our functional form assumption regarding time's effect is correct. Failing to account for any of these issues amounts to misspecification bias, threatening our inferences' validity. We argue scholars should consider using Cox duration models when analyzing BTSCS data, as they create fewer opportunities for such misspecification bias, while also being able to assess the same hypotheses as L/P models. We use Monte Carlo simulations to show that Cox models perform just as well as, and sometimes better than, logit models in a basic BTSCS setting, and perform considerably better in more complex BTSCS situations. In addition, we highlight a new interpretation technique for Cox models, transition probabilities, to make Cox model results more readily interpretable. We use an application from interstate conflict to demonstrate our points.
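A sketch of the recasting involved, on simulated placeholder data: each unit-year becomes a (start, stop] interval measured from the last event, so the same binary outcome modeled with logit plus a time polynomial can also be fitted with coxph(). Variable names are hypothetical, not those of the replication files.

```r
library(survival)
library(dplyr)
set.seed(1)

# simulated stand-in for dyad-year conflict data
btscs <- expand.grid(dyad = 1:50, year = 1:40)
btscs$ally   <- rbinom(nrow(btscs), 1, 0.3)
btscs$contig <- rbinom(nrow(btscs), 1, 0.2)
btscs$event  <- rbinom(nrow(btscs), 1, plogis(-3 + 0.8 * btscs$ally))

btscs <- btscs %>%
  group_by(dyad) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(spell = cumsum(lag(event, default = 0))) %>%  # new spell begins after each event
  group_by(dyad, spell) %>%
  mutate(stop = row_number(), start = stop - 1) %>%    # years at risk within the spell
  ungroup()

cox_fit   <- coxph(Surv(start, stop, event) ~ ally + contig, data = btscs)
logit_fit <- glm(event ~ ally + contig + poly(stop, 3),
                 family = binomial, data = btscs)      # the logit-with-time-polynomial analogue
```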