Dylan Brewer and Alyssa Carlson
Accepted at Journal of Applied Econometrics, 2023
This replication package contains files required to reproduce results, tables, and figures using Matlab and Stata. We divide the project into instructions to replicate the simulation, the results from Huang et al. (2006), and the application.
For reproducing the simulation results
Files with short descriptions:
SSML_simfunc: function that produces individual simulation runs
SSML_simulation: script that loops over SSML_simfunc for different DGPs and multiple simulation runs
SSML_figures: script that generates all figures for the paper
SSML_compilefunc: function that compiles the results from SSML_simulation for the SSML_figures script

1. Download SSML_simfunc, SSML_simulation, SSML_figures, and SSML_compilefunc to the same folder. This location will be referred to as the FILEPATH.
2. Set the FILEPATH location inside SSML_simulation and SSML_figures.
3. Run SSML_simulation to produce simulation data and results.
4. Run SSML_figures to produce figures.

For reproducing the Huang et al. (2006) replication results
Files in *\HuangetalReplication with short descriptions:
SSML_huangrep: script that replicates the results from Huang et al. (2006)

1. Go to https://archive.ics.uci.edu/dataset/14/breast+cancer and save the file as "breast-cancer-wisconsin.data".
2. Download SSML_huangrep and the breast cancer data to the same folder. This location will be referred to as the FILEPATH.
3. Set the FILEPATH location inside SSML_huangrep.
4. Run SSML_huangrep to produce results and figures.

For reproducing the application section results
Files in *\Application with short descriptions:
G0_main_202308.do: Stata wrapper code that will run all application replication files
G1_cqclean_202308.do: Cleans election outcomes data
G2_cqopen_202308.do: Cleans open elections data
G3_demographics_cainc30_202308.do: Cleans demographics data
G4_fips_202308.do: Cleans FIPS code data
G5_klarnerclean_202308.do: Cleans Klarner gubernatorial data
G6_merge_202308.do: Merges cleaned datasets together
G7_summary_202308.do: Generates summary statistics tables and figures
G8_firststage_202308.do: Runs L1-penalized probit for the first stage
G9_prediction_202308.m: Trains learners and makes predictions
G10_figures_202308.m: Generates figures of prediction patterns
G11_final_202308.do: Generates final figures and tables of results
r1_lasso_alwayskeepCF_202308.do: Examines the effect of requiring that the control function is not dropped from LASSO
latexTable.m: Code by Eli Duenisch to write LaTeX tables from Matlab (https://www.mathworks.com/matlabcentral/fileexchange/44274-latextable)

Data folders:
\CAINC30: County-level income and demographics data from the BEA
\CPI: CPI data from the BLS
\KlarnerGovernors: Carl Klarner's Governors Dataset, available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/20408
\CQ_county: County-level election outcomes available from http://library.cqpress.com/elections/login.php?requested=%2Felections%2Fdownload-data.php
\CQ_open: Open elections available from http://library.cqpress.com/elections/advsearch/elections-with-open-seats-results.php?open_year1=1968&open_year2=2019&open_office=4

The CQ data cannot be transferred as part of the data use agreement with CQ Press, so those files are not included. There is no batch download--downloads for each year must be done by hand. For each year, download as many state outcomes as possible and name the files YYYYa.csv, YYYYb.csv, etc. (example: 1970a.csv, 1970b.csv, 1970c.csv, 1970d.csv). See line 18 of G1_cqclean_202308.do for file structure information.

1. Change the path in G0_main_202308.do on line 18 to the application folder.
2. Set matlabpath in G0_main_202308.do on line 18 to the appropriate location.
3. Adjust paths in G9_prediction_202308.m and G10_figures_202308.m as necessary.
4. Run G0_main_202308.do in Stata to run all programs. Output is saved to *\Application\Output.

Contact Dylan Brewer (brewer@gatech.edu) or Alyssa Carlson (carlsonah@missouri.edu) for help with replication.
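As a hedged illustration (not the package's actual code), the line-18 path edits amount to something like the following Stata lines, where both paths are placeholders for your machine; matlabpath is named in the package's instructions, while the filepath macro name is assumed here by analogy with the simulation FILEPATH:

* illustrative only -- replace both placeholder paths with your own locations
global filepath "C:/replication/Application"            // the application folder
global matlabpath "C:/Program Files/MATLAB/R2023a/bin"  // location of the Matlab executable
cd "$filepath"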
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
analyze the area resource file (arf) with r

the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file. this new github repository contains two scripts:

2011-2012 arf - download.R
- download the zipped area resource file directly onto your local computer
- load the entire table into a temporary sql database
- save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)

2011-2012 arf - analysis examples.R
- limit the arf to the variables necessary for your analysis
- sum up a few county-level statistics
- merge the arf onto other data sets, using both fips and ssa county codes
- create a sweet county-level map

click here to view these two scripts. for more detail about the area resource file (arf), visit the arf home page or the hrsa data warehouse.

notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data. confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
We imported the Excel sheet "FGD patients' characteristics" into Stata to conduct a simple descriptive analysis. The saved dataset and its do-file have been shared with editors and reviewers for their reference. (ZIP)
http://rdm.uva.nl/en/support/confidential-data.html
Datasets related to the evaluation and report of the Urban Vitality (UV) Open science support desk. Data were exported from Qualtrics and saved as STATA (.dta) files and analyzed using STATA version 13.1. This item contains:
1. Qualtrics exports: two tab-separated value (.tsv) files
2. STATA: two STATA data (.dta) files
3. STATA: three STATA log (.txt) files
The STATA analysis files are deposited in UvA/HvA figshare separately and are publicly available. More information is available in the report.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft form.
Instructions:
Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names, etc.
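For example, a block near the top of a do-file may need edits like these before it runs locally (both the directory and the file name below are placeholders, not files from this package):

* placeholders -- point these at your local copies before running a block
cd "~/projects/community-management"
use "data/user_activity.dta", clear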
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This replication package contains the raw data and code to replicate the findings reported in the paper. The data are licensed under a Creative Commons Attribution 4.0 International Public License. The code is licensed under a Modified BSD License. See LICENSE.txt for details.
Software requirements
All analyses were done in Stata version 16.
Instructions
Datasets
Descriptions of scripts
1_1_clean_wave1.do
This script processes the raw data from wave 1, the survey experiment
1_2_clean_wave2.do
This script processes the raw data from wave 2, the follow-up survey
1_3_merge_generate.do
This script creates the datasets used in the main analysis and for robustness checks by merging the cleaned data from waves 1 and 2, tests the exclusion criteria, and creates additional variables
02_analysis.do
This script estimates regression models in Stata and creates figures and tables, saving them to results/figures and results/tables
03_robustness_checks_no_exclusion.do
This script runs the main analysis using the dataset without applying the exclusion criteria. Results are saved in results/tables
04_figure2_germany_map.do
This script creates Figure 2 in the main manuscript using publicly available data on vaccination numbers in Germany.
05_figureS1_dogmatism_scale.do
This script creates Figure S1 using data from a pretest to adjust the dogmatism scale.
06_AppendixS7.do
This script creates the figures and tables provided in Appendix S7 on the representativeness of our sample compared to the German average, using publicly available data about the age distribution in Germany.
07_AppendixS10.do
This script creates the figures and tables provided in Appendix S10 on the external validity of vaccination rates in our sample using publicly available data on vaccination numbers in Germany.
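Assuming the do-files and raw data sit in a single folder, the whole pipeline can be driven in the order implied by the numbering above (the cd path is a placeholder):

cd "path/to/replication"   // placeholder
do 1_1_clean_wave1.do
do 1_2_clean_wave2.do
do 1_3_merge_generate.do
do 02_analysis.do
do 03_robustness_checks_no_exclusion.do
do 04_figure2_germany_map.do
do 05_figureS1_dogmatism_scale.do
do 06_AppendixS7.do
do 07_AppendixS10.do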
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pay the bills and therefore call the shots.

if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.

the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

this new github repository contains three scripts:

1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts. for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions

notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
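for the curious, the correction those scripts implement is rubin's multiple-imputation combining rule (the standard textbook formulas, nothing scf-specific): with m = 5 implicates, a point estimate q_i and its sampling variance u_i from each implicate i pool as

\bar{q} = \frac{1}{m}\sum_{i=1}^{m} q_i, \qquad T = \bar{u} + \Big(1 + \frac{1}{m}\Big)\, b, \qquad \bar{u} = \frac{1}{m}\sum_{i=1}^{m} u_i, \qquad b = \frac{1}{m-1}\sum_{i=1}^{m} (q_i - \bar{q})^2

the between-implicate variance b is exactly the term a naive single-file analysis drops - which is why its confidence intervals come out too tight.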
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This archive includes materials needed to replicate the analysis reported in Doherty, Touchton, and Lyons. 202X. "Partisanship and Support for Devolving Concrete Policy Decisions to the States." Political Behavior.

replication_data.dta: Stata-formatted dataset with all variables used in the analysis.
replication.do: do-file that executes all analysis reported in the article and outputs tables and figures to a subfolder named "tables".

Users should save these two files to a folder, create a subfolder titled "tables", and change the path on the first line of the do-file to refer to the main folder.
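In Stata terms, the setup is simply the following (the path is a placeholder for wherever you saved the two files):

cd "C:/devolution_replication"   // placeholder path to the main folder
mkdir tables                     // subfolder where tables and figures are written
* edit the first line of replication.do to point at the main folder, then:
do replication.do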
This archive contains materials to replicate the analysis reported in: Doherty, David, Peter J. Schraeder, and Kirstie L. Dobbs. "Do Democratic Revolutions 'Activate' Participants?: The Case of Tunisia."

The root directory includes five Stata do-files (run on Stata 14.1). The file replication.do calls the other four do-files, each of which conducts analysis specific to a particular dataset. In the case of arab_barometer_w2.do and afrobarometer.do, the files simply do the necessary recoding and output summary statistics. The remaining two do-files prepare the two datasets used to conduct the statistical analysis reported in the paper; they also complete recoding to ensure that variables from these two datasets are coded similarly. The file replication.do then stacks the recoded data to complete the core analysis reported.

The directory includes four folders:
1) prepped_data: this folder is where the two recoded datasets that are stacked for the core analysis are deposited. It is empty in this archive.
2) private_data: empty folder referred to in commented-out code. The only file originally included in this folder was the full dataset from the original survey used in the analysis. The commented-out code (top of "orig_survey.do") stripped out variables not used in the analysis and saved the resulting dataset in the raw_data folder.
3) raw_data: contains all datasets used in the analysis. The tunisia_2012_survey.dta file is from our original survey. The remaining files were downloaded from the Arab Barometer and AfroBarometer websites.
4) tables: empty folder where tables and figures are saved.

To run the analysis, users should simply set the directory at the top of the replication.do file.
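As a hedged sketch of the stacking step described above (the file names are assumed for illustration; the archive's own do-files create the recoded files in prepped_data):

* illustrative only -- the actual file names are generated by the package's do-files
use "prepped_data/tunisia_recoded.dta", clear
append using "prepped_data/barometer_recoded.dta"
* the core analysis then runs on this stacked dataset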
License: MIT License, https://opensource.org/licenses/MIT
Abstract: We study the effect of inconsistent time preferences on actual and planned retirement timing decisions in two independent datasets. Theory predicts that hyperbolic time preferences can lead to dynamically inconsistent retirement timing. In an online experiment with more than 2,000 participants, we find that time-inconsistent participants retire on average 1.75 years earlier than time-consistent participants do. The planned retirement age of non-retired participants decreases with age. This negative age effect is about twice as strong among time-inconsistent participants. The temptation of early retirement seems to rise in the final years of approaching retirement. Consequently, time-inconsistent participants have a higher probability of regretting their retirement decision. We find similar results for a representative household survey (German SAVE panel). Using smoking behavior and overdraft usage as time preference proxies, we confirm that time-inconsistent participants retire earlier and that non-retirees reduce their planned retirement age within the panel.

Methods: We conduct an online experiment in cooperation with a large and well-circulated German newspaper, the Frankfurter Allgemeine Zeitung (FAZ). Participants are recruited via a link on the newspaper's website and two announcements in the print edition. In total, 3,077 participants complete the experiment, which takes them on average 11 minutes. Participants answer questions about retirement planning, time preferences, risk preferences, financial literacy, and demographics. The initial sample for this study consists of 256 retired participants and 2,173 non-retired participants.

Usage Notes: Our dataset and its Stata do-file are attached. In addition, a German household panel (SAVE) is used in this paper. The data cannot be uploaded by us but are available via the Max Planck Institute (https://www.mpisoc.mpg.de/en/social-policy-mea/research/save-2001-2013/). We upload the do-files used in the analysis and the results in Excel format (.xlsx).
The Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset was prepared as a resource for those interested in learning introductory small area estimation techniques. It was first presented as part of a workshop entitled 'Introducing small area estimation techniques and applying them to the Health Survey for England using Stata'. The data are accompanied by a guide that includes a practical case study enabling users to derive estimates of disability for districts in the absence of survey estimates. This is achieved using various models that combine information from ESDS government surveys with other aggregate data that are reliably available for sub-national areas. Analysis is undertaken using Stata statistical software; all relevant syntax is provided in the accompanying '.do' files.
The data files included in this teaching resource contain HSE variables and data from the Census and Mid-year population estimates and projections that were developed originally by the National Statistical agencies, as follows:
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Version 3 release notes:
- Adds data for 2016.
- Orders rows by year (descending) and ORI.

Version 2 release notes:
- Fixes bug where Philadelphia Police Department had an incorrect FIPS county code.

The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. The data sets here combine all data from the years 1992-2015 into a single file. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9-character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).

All the data was downloaded from NACJD as ASCII+SPSS setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so the data can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, and reordered columns. I also added state, county, and place FIPS codes from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data.

The zip file contains the data in the following formats and a codebook:
- .csv (Microsoft Excel)
- .dta (Stata)
- .sav (SPSS)
- .rda (R)

If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This folder contains the raw data and the Stata code to produce the dataset used in the paper "How Do Electoral Incentives Affect Legislator Behavior?". The code produces all tables and figures in the paper and in the appendix. To replicate the findings, download the replication folder with all materials and set the working directory to this folder. Run the file replicate_how_do_electoral_incentives_affect_legislator_behavior.do in Stata. This will produce the main dataset from the raw input data, produce all the tables and figures, and save them in the folder tables_figures. The individual results can also be replicated using the dataset termlimited.dta in the data_output folder and the relevant do-files. The do-file electoral_incentives.do shows which do-file is needed to replicate a particular table or figure in the paper or appendix.
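Concretely, the full replication reduces to two Stata commands (the directory path is a placeholder):

cd "path/to/replication_folder"   // placeholder
do replicate_how_do_electoral_incentives_affect_legislator_behavior.do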
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
The article uses a dataset which cannot be deposited online but is freely available to registered users. The data of the British Household Panel Study can be requested via https://discover.ukdataservice.ac.uk/catalogue/?sn=5151. Here we provide a Stata do-file that will create the working file, recode the original data, and run some robustness tests. The data were prepared in Stata and then saved as SPSS (.sav) files using Stattrans. This was necessary because the main cross-lagged latent class models of the paper were estimated using LatentGOLD, which only reads .sav files. We also provide the syntax files that were used for estimating these models.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
The data set (saved in Stata .dta and .txt formats) contains all observations (Norwegian Supreme Court cases 2008-2018 decided in five-justice panels) and variables (independent variables measuring the complexity of cases and the dependent variable measuring time in hours scheduled for oral arguments) relevant for a complete replication of the study.

ABSTRACT OF STUDY: While high courts with fixed time for oral arguments deprive researchers of the opportunity to extract temporal variance, courts that apply the “accordion model” institutional design and adjust the time for oral arguments according to the perceived complexity of a case are a boon for research that seeks to validate case complexity well ahead of the courts’ opinion writing. We analyse an original data set of all 1,402 merits decisions of the Norwegian Supreme Court from 2008 to 2018, where the justices set time for oral arguments to accommodate the anticipated difficulty of the case. Our validation model empirically tests whether and how attributes of a case associated with ex ante complexity are linked with time allocated for oral arguments. Cases that deal with international law and civil law, have several legal players, or are cross-appeals from lower courts are indicative of greater case complexity. We argue that these results speak powerfully to the use of case attributes and/or the time reserved for oral arguments as ex ante measures of case complexity. To enhance the external validity of our findings, future studies should examine whether these results are confirmed in high courts with similar institutional designs for oral arguments. Subsequent analyses should also test the degree to which complex cases and/or time for oral arguments have predictive validity on more divergent opinions among the justices and on the time courts and justices need to render a final opinion.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Replication files for the paper "Expropriation of the Church's wealth and political violence in 19th century Colombia." It includes a complete dataset (in folder data) and a Stata do-file to replicate the tables and figures from the paper. The other folders are now empty, but the program is written to save figures and tables in them.
Replication package for: Bentzen, J.S., Knudsen, A.S.B., Sperling, L.L., & Norenzayan, A. (2025), "Religion exhibits the greatest cultural diversity across 117 countries", Nature Communications.

-----------------------------------------------------------
FILES
-----------------------------------------------------------
1_prepare_data.do – Prepares variables from the integrated EVS–WVS dataset.
2_Fig_*.do – Scripts for generating all figures (main text and SI).
cntr_id.dta – Crosswalk file mapping country identifiers.
readme.txt – This file.

-----------------------------------------------------------
INSTRUCTIONS
-----------------------------------------------------------
1. Download the integrated European Values Study (EVS) and World Values Survey (WVS) dataset (1981–2022). Detailed instructions are available here: https://europeanvaluesstudy.eu/methodology-data-documentation/integrated-values-surveys/data-and-documentation/ Access requires free registration.
2. Save the integrated file as: Integrated_values_surveys_1981-2022.dta
3. Open Stata (version 17 or higher) and set your working directory at the top of each script (see line 5 in 1_prepare_data.do).
4. Run the scripts in order:
   - 1_prepare_data.do
   - All 2_Fig_*.do scripts (each produces one or more figures for the main text and Supplementary Information).
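A sketch of steps 3-4 in Stata (the working-directory path is a placeholder, and the loop over the figure scripts is illustrative rather than taken from the package):

cd "path/to/replication"           // placeholder working directory
do 1_prepare_data.do
* run every figure script in the folder
local figscripts : dir . files "2_Fig_*.do"
foreach f of local figscripts {
    do "`f'"
}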
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
This is an extraction of services trade statistics from the WTO, used for modes-of-supply analysis. The data are saved in Stata format and in Excel. The data are extracted from the WTO database available at https://www.wto.org/english/news_e/news19_e/serv_31jul19_e.htm
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The data we use in this paper were gathered in the 6th round of the Multiple Indicator Cluster Surveys (MICS6), which can be downloaded from https://mics.unicef.org/surveys. The MICS6 surveys are conducted by UNICEF (United Nations International Children's Emergency Fund). We merged the original data from 11 countries and saved the resulting data in Stata format. In addition, the do-file for the analysis is also published here.
The dataset is saved in .dta (Stata 13 or later) format. It contains the pooled data from the baseline survey (conducted from July 1 to September 30, 2014) and the follow-up survey (conducted from October 1 to December 31, 2015).