23 datasets found

Addressing sample selection bias for machine learning methods (replication...
resodate.org
Updated Oct 2, 2025
Dylan Brewer; Alyssa Carlson (2025). Addressing sample selection bias for machine learning methods (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9hZGRyZXNzaW5nLXNhbXBsZS1zZWxlY3Rpb24tYmlhcy1mb3ItbWFjaGluZS1sZWFybmluZy1tZXRob2RzLXJlcGxpY2F0aW9uLWRhdGE=
Dataset updated
Oct 2, 2025
Dataset provided by
Journal of Applied Econometrics
ZBW
ZBW Journal Data Archive
Authors
Dylan Brewer; Alyssa Carlson
Description
Addressing sample selection bias for machine learning methods (replication data)

Dylan Brewer and Alyssa Carlson

Accepted at Journal of Applied Econometrics, 2023

Overview

This replication package contains files required to reproduce results, tables, and figures using Matlab and Stata. We divide the project into instructions to replicate the simulation, the result from Huang et al (2006), and the application.

Simulation

For reproducing the simulation results

Included files in *\Simulation with short descriptions:

SSML_simfunc: function that produces individual simulation runs

SSML_simulation: script that loops over SSML_simfunc for different DGPs and multiple simulation runs

SSML_figures: script that generates all figures for the paper

SSML_compilefunc: function that compiles the results from SSML_simulation for the SSML_figures script

Steps for replicating simulation:

Save SSML_simfunc, SSML_simulation, SSML_figures, SSML_compilefunc to the same folder. This location will be referred to as the FILEPATH.

Create OUTPUT folder inside the FILEPATH location.

Change the FILEPATH location inside SSML_simulation and SSML_figures.

Run SSML_simulation to produce simulation data and results.

Run SSML_figures to produce figures.

Huang et al replication

For reproducing the Huang et al. (2006) replication results.

Included files in *\HuangetalReplication with short descriptions:

SSML_huangrep: script that replicates the results from Huang et al. (2006)

Obtaining the dataset:

Go to https://archive.ics.uci.edu/dataset/14/breast+cancer and save file as "breast-cancer-wisconsin.data"

Steps for replicating results:

Save SSML_huangrep and the breast cancer data to the same folder. This location will be referred to as the FILEPATH.

Change the FILEPATH location inside SSML_huangrep.

Run SSML_huangrep to produce results and figures.

Application

For reproducing the application section results.

Included program files in *\Application with short descriptions:

G0_main_202308.do: Stata wrapper code that will run all application replication files

G1_cqclean_202308.do: Cleans election outcomes data

G2_cqopen_202308.do: Cleans open elections data

G3_demographics_cainc30_202308.do: Cleans demographics data

G4_fips_202308.do: Cleans FIPS code data

G5_klarnerclean_202308.do: Cleans Klarner gubernatorial data

G6_merge_202308.do: Merges cleaned datasets together

G7_summary_202308.do: Generates summary statistics tables and figures

G8_firststage_202308.do: Runs L1 penalized probit for the first stage

G9_prediction_202308.m: Trains learners and makes predictions

G10_figures_202308.m: Generates figures of prediction patterns

G11_final_202308.do: Generates final figures and tables of results

r1_lasso_alwayskeepCF_202308.do: Examines the effect of requiring that the control function not be dropped from LASSO

latexTable.m: Code by Eli Duenisch to write LaTeX tables from Matlab (https://www.mathworks.com/matlabcentral/fileexchange/44274-latextable)

Included non-confidential data in subdirectory `*\Application\Data`:

\CAINC30: County level income and demographics data from the BEA

\CPI: CPI data from the BLS

\KlarnerGovernors: Carl Klarner's Governors Dataset available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/20408

Confidential data suppressed in subdirectory `*\Application\CD`:

These data cannot be transferred as part of the data use agreement with the CQ Press. Thus, the files are not included.

\CQ_county: County level election outcomes available from http://library.cqpress.com/elections/login.php?requested=%2Felections%2Fdownload-data.php

\CQ_open: Open elections available from http://library.cqpress.com/elections/advsearch/elections-with-open-seats-results.php?open_year1=1968&open_year2=2019&open_office=4

There is no batch download--downloads for each year must be done by hand. For each year, download as many state outcomes as possible and name the files YYYYa.csv, YYYYb.csv, etc. (Example: 1970a.csv, 1970b.csv, 1970c.csv, 1970d.csv). See line 18 of G1_cqclean_202308.do for file structure information.
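
The naming convention above can be checked before running the cleaning code. The following is a minimal sketch in Python, not part of the replication package: the function names are hypothetical, and the pattern assumes exactly a four-digit year, one lowercase letter, and a .csv extension, as in the examples given.

```python
import re
from collections import defaultdict

# Assumed pattern: four-digit year, one lowercase letter, ".csv" (e.g. 1970a.csv).
CQ_NAME = re.compile(r"^(\d{4})([a-z])\.csv$")


def group_by_year(filenames):
    """Return ({year: [letters]}, rejected_names) for a list of file names."""
    by_year = defaultdict(list)
    rejected = []
    for name in filenames:
        match = CQ_NAME.match(name)
        if match:
            by_year[int(match.group(1))].append(match.group(2))
        else:
            rejected.append(name)
    grouped = {year: sorted(letters) for year, letters in by_year.items()}
    return grouped, rejected


def missing_letters(letters):
    """Letters skipped if the files for a year should run a, b, c, ... without gaps."""
    expected = [chr(ord("a") + i) for i in range(len(letters))]
    return [letter for letter in expected if letter not in letters]


# Hypothetical file names, for illustration only.
names = ["1970a.csv", "1970b.csv", "1970d.csv", "1972a.csv", "1972.csv"]
grouped, rejected = group_by_year(names)
print(grouped, rejected, missing_letters(grouped[1970]))
```

Whether gaps in the letter sequence matter depends on how G1_cqclean_202308.do reads the files, which is not shown here.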

Steps for replicating application:

Download confidential data from the CQ Press.

Change the working directory in G0_main_202308.do on line 18 to the application folder.

Change local matlabpath in G0_main_202308.do on line 18 to the appropriate location.

Set directory and file path in G9_prediction_202308.m and G10_figures_202308.m as necessary.

Run G0_main_202308.do in Stata to run all programs.

All output (figures and tables) will be saved to subdirectory *\Application\Output.

Contact

Contact Dylan Brewer (brewer@gatech.edu) or Alyssa Carlson (carlsonah@missouri.edu) for help with replication.
Area Resource File (ARF)
dataverse.harvard.edu
Updated May 30, 2013
Anthony Damico (2013). Area Resource File (ARF) [Dataset]. http://doi.org/10.7910/DVN/8NMSFV
Unique identifier
https://doi.org/10.7910/DVN/8NMSFV
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
analyze the area resource file (arf) with r

the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health services and resources administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

this new github repository contains two scripts:

2011-2012 arf - download.R
download the zipped area resource file directly onto your local computer
load the entire table into a temporary sql database
save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta).

2011-2012 arf - analysis examples.R
limit the arf to the variables necessary for your analysis
sum up a few county-level statistics
merge the arf onto other data sets, using both fips and ssa county codes
create a sweet county-level map

click here to view these two scripts

for more detail about the area resource file (arf), visit:
the arf home page
the hrsa data warehouse

notes:
the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data.

confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
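
The merge on fips county codes mentioned above can be sketched as follows. This is an illustration in Python rather than the R used by the scripts; the function names, field names, and sample rows are hypothetical. It relies only on the standard five-digit county FIPS layout (two-digit state code followed by three-digit county code).

```python
def county_fips(state_code, county_code):
    """Build a five-character county FIPS string, zero-padded (state 2 + county 3)."""
    return f"{int(state_code):02d}{int(county_code):03d}"


def merge_on_fips(left_rows, right_rows, key="fips"):
    """Inner-join two lists of dicts on a shared FIPS key."""
    lookup = {row[key]: row for row in right_rows}
    return [
        {**row, **lookup[row[key]]}
        for row in left_rows
        if row[key] in lookup
    ]


# Made-up rows: one county-level table and one table from another source.
arf_rows = [{"fips": county_fips(1, 1), "hospitals": 2}]
other_rows = [{"fips": "01001", "respondents": 40}, {"fips": "06037", "respondents": 900}]
print(merge_on_fips(other_rows, arf_rows))
```

Storing the code as a zero-padded string rather than an integer avoids losing the leading zero for states with codes below 10.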
FGDs patients’ characteristics Stata format dataset and its do file.
datasetcatalog.nlm.nih.gov
Updated Apr 7, 2023
Ottaru, Theresia A.; Kivuyo, Sokoine L.; Wood, Christine V.; Shayo, Elizabeth H.; Mbugi, Erasto V.; Hirschhorn, Lisa R.; Karoli, Peter M.; Kaaya, Sylvia F.; Shayo, Grace A.; Mgina, Eric J.; Hawkins, Claudia A.; Mfinanga, Sayoki G. (2023). FGDs patients’ characteristics Stata format dataset and its do file. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001058989
Dataset updated
Apr 7, 2023
Authors
Ottaru, Theresia A.; Kivuyo, Sokoine L.; Wood, Christine V.; Shayo, Elizabeth H.; Mbugi, Erasto V.; Hirschhorn, Lisa R.; Karoli, Peter M.; Kaaya, Sylvia F.; Shayo, Grace A.; Mgina, Eric J.; Hawkins, Claudia A.; Mfinanga, Sayoki G.
Description
We imported the Excel sheet FGD patients’ characteristics into Stata to conduct a simple descriptive analysis. A saved dataset and its do file have therefore been shared with editors and reviewers for their reference. (ZIP)
Data.Evaluation Report 9-months pilot Open Science Support Desk
figshare.com
uvaauas.figshare.com
Updated Jan 22, 2021
G. ter Riet; N.R. van Ulzen; F.A. van Nes (2021). Data.Evaluation Report 9-months pilot Open Science Support Desk [Dataset]. http://doi.org/10.21943/auas.13614689.v1
Unique identifier
https://doi.org/10.21943/auas.13614689.v1
Dataset updated
Jan 22, 2021
Dataset provided by
University of Amsterdam / Amsterdam University of Applied Sciences
Authors
G. ter Riet; N.R. van Ulzen; F.A. van Nes
License
http://rdm.uva.nl/en/support/confidential-data.html
Description
Datasets related to the evaluation and report of the Urban Vitality (UV) Open science support desk. Data were exported from Qualtrics and saved as STATA (.dta) files and analyzed using STATA version 13.1. This item contains:
1. Qualtrics-exports: two tab-separated value (.tsv) files
2. STATA: two STATA data (.dta) files
3. STATA: three STATA log (.txt) files
The STATA analysis files are deposited in UvA/HvA figshare separately and are publicly available. More information is available in the report.
Effects of community management on user activity in online communities
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 24, 2025
Alberto Cottica (2025). Effects of community management on user activity in online communities [Dataset]. http://doi.org/10.5281/zenodo.1320261
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1320261
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alberto Cottica
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.

Instructions:

Unzip the files.

Start with JSON files obtained from calling platform APIs: each dataset consists of one file for posts, one for comments, one for users. In the paper we use two datasets, one referring to Edgeryders, the other to Matera 2019.

Run them through edgesense (https://github.com/edgeryders/edgesense). Edgesense allows setting the length of the observation period. We set it to 1 week and 1 day for Edgeryders data, and to 1 day for Matera 2019 data. Edgesense stores its results in a JSON file called network.min.json, which we then rename to keep track of the data source and observation length.

Launch Jupyter Notebook and run the notebook provided to convert the network.min.json files into CSV flat files, one for each network file.

Launch Stata and open each flat CSV file with it, then save it in Stata format.

Use the provided Stata .do scripts to replicate results.

Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
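
The JSON-to-CSV conversion step can be pictured with a short Python sketch. It is not the notebook shipped with the archive, and the layout of network.min.json is not described here: the function name, the "nodes" key, and the sample record fields below are all assumptions for illustration.

```python
import csv
import io
import json


def json_records_to_csv(json_text, key, out):
    """Write the list of dicts stored under `key` as CSV rows to file-like `out`."""
    records = json.loads(json_text)[key]
    # Collect the union of field names in first-seen order so ragged records fit.
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)
    writer = csv.DictWriter(out, fieldnames=fields, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return fields


# Hypothetical input: a top-level "nodes" list with uneven fields.
sample = '{"nodes": [{"id": 1, "name": "a"}, {"id": 2, "posts": 3}]}'
buffer = io.StringIO()
json_records_to_csv(sample, "nodes", buffer)
print(buffer.getvalue())
```

A flat file with one header row and empty cells for missing values is straightforward to open in Stata, which is what the next step requires.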
Repeated information of benefits reduce COVID-19 vaccination hesitancy:...
zenodo.org
data-staging.niaid.nih.gov
zip
Updated Jun 17, 2022
Max Burger; Matthias Mayer; Ivo Steimanis (2022). Repeated information of benefits reduce COVID-19 vaccination hesitancy: Experimental evidence from Germany [Dataset]. http://doi.org/10.5281/zenodo.6242620
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6242620
Dataset updated
Jun 17, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Max Burger; Matthias Mayer; Ivo Steimanis
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Germany
Description
This replication package contains the raw data and code to replicate the findings reported in the paper. The data are licensed under a Creative Commons Attribution 4.0 International Public License. The code is licensed under a Modified BSD License. See LICENSE.txt for details.

Software requirements

All analyses were done in Stata version 16.

Add-on packages are included in scripts/libraries/stata and do not need to be installed by user. The names, installation sources, and installation dates of these packages are available in scripts/libraries/stata/stata.trk.

Instructions

Save the folder ‘replication_PLOS’ to your local drive.

Open the master script ‘run.do’ and change the global pointing to the working directory (line 20) to the location where you saved the folder on your local drive.

Run the master script ‘run.do’ to replicate the analysis and generate all tables and figures reported in the paper and supplementary online materials.

Datasets

Wave 1 – Survey experiment: ‘wave1_survey_experiment_raw.dta’

Wave 2 – Follow-up Survey: ‘wave2_follow_up_raw.dta’

Map: shape-files ‘plz2stellig.shp’ ‘OSM_PLZ.shp’, area codes ‘Postleitzahlengebiete-_OSM.csv’_, (all links to the sources can be found in the script ‘04_figure2_germany_map.do’)

Pretest: ‘pre-test_corona_raw.dta’

For Appendix S7: ‘alter_geschlecht_zensus_det.xlsx’, ‘vaccination_landkreis_raw.dta’, ‘census2020_age_gender.csv’ (all links to the sources can be found in the script ‘06_AppendixS7.do’)

For Appendix S10: ‘vaccination_landkreis_raw.dta’ (all links to the sources can be found in the script ‘07_AppendixS10.do’)

Descriptions of scripts

1_1_clean_wave1.do
This script processes the raw data from wave 1, the survey experiment
1_2_clean_wave2.do
This script processes the raw data from wave 2, the follow-up survey
1_3_merge_generate.do
This script creates the datasets used in the main analysis and for robustness checks by merging the cleaned data from wave 1 and 2, tests the exclusion criteria and creates additional variables
02_analysis.do
This script estimates regression models in Stata, creates figures and tables, saving them to results/figures and results/tables
03_robustness_checks_no_exclusion.do
This script runs the main analysis using the dataset without applying the exclusion criteria. Results are saved in results/tables
04_figure2_germany_map.do
This script creates Figure 2 in the main manuscript using publicly available data on vaccination numbers in Germany.
05_figureS1_dogmatism_scale.do
This script creates Figure S1 using data from a pretest to adjust the dogmatism scale.
06_AppendixS7.do
This script creates the figures and tables provided in Appendix S7 on the representativity of our sample compared to the German average using publicly available data about the age distribution in Germany.
07_AppendixS10.do
This script creates the figures and tables provided in Appendix S10 on the external validity of vaccination rates in our sample using publicly available data on vaccination numbers in Germany.
Survey of Consumer Finances (SCF)
dataverse.harvard.edu
Updated May 30, 2013
Anthony Damico (2013). Survey of Consumer Finances (SCF) [Dataset]. http://doi.org/10.7910/DVN/FRMKMF
Unique identifier
https://doi.org/10.7910/DVN/FRMKMF
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pay the bills and therefore call the shots.

if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.

the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.

this new github repository contains three scripts:

1989-2010 download all microdata.R
initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
demonstrate how to access the properties of a multiply-imputed survey design object
cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
reproduce each and every statistic provided by the friendly folks at the federal reserve
create a multiply-imputed, replicate-weighted survey design object
re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts

for more detail about the survey of consumer finances (scf), visit:
the federal reserve board of governors' survey of consumer finances homepage
the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
the survey of consumer finances wikipedia entry
the official frequently asked questions

notes:
nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
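
to see why the five implicates cannot simply be pooled, here is a minimal python sketch of the standard multiple-imputation combining rules (rubin's rules). it is not taken from the r scripts described above, the function name and the numbers are hypothetical, and it ignores the scf-specific variance adjustments those scripts apply.

```python
from statistics import mean, variance


def combine_rubin(estimates, within_variances):
    """Combine per-implicate estimates and their sampling variances.

    Returns (pooled_estimate, total_variance), where
    total_variance = mean within-variance + (1 + 1/m) * between-implicate variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(within_variances)
    between = variance(estimates)  # sample variance across implicates (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up numbers: one estimate and one sampling variance per implicate.
estimates = [10, 12, 11, 13, 14]
within_variances = [4, 4, 4, 4, 4]
print(combine_rubin(estimates, within_variances))
```

the between-implicate term is what widens the confidence interval: analyzing a single implicate, or stacking all five as if they were independent records, leaves it out.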
Replication Data for: Partisanship and Support for Devolving Concrete Policy...
dataverse.harvard.edu
Updated Oct 1, 2024
David Doherty (2024). Replication Data for: Partisanship and Support for Devolving Concrete Policy Decisions to the States [Dataset]. http://doi.org/10.7910/DVN/AE8KCI
Unique identifier
https://doi.org/10.7910/DVN/AE8KCI
Dataset updated
Oct 1, 2024
Dataset provided by
Harvard Dataverse
Authors
David Doherty
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
United States
Description
This archive includes materials needed to replicate analysis reported in Doherty, Touchton, Lyons. 202X. "Partisanship and Support for Devolving Concrete Policy Decisions to the States." Political Behavior.

replication_data.dta: Stata formatted dataset with all variables used in the analysis.

replication.do: DO file that executes all analysis reported in the article and outputs tables and figures to a subfolder named "tables".

Users should save these two files to a folder, create a subfolder titled "tables", and change the path on the first line of the DO file to refer to the main folder.
Doherty_Schraeder_Dobbs_Replication.zip – Research Data for "Do Democratic...
datasetcatalog.nlm.nih.gov
figshare.com
Updated Feb 25, 2019
Dobbs, Kirstie; Doherty, David; Schraeder, Peter (2019). Doherty_Schraeder_Dobbs_Replication.zip – Research Data for "Do Democratic Revolutions 'Activate' Participants?: The Case of Tunisia" [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000161519
Dataset updated
Feb 25, 2019
Authors
Dobbs, Kirstie; Doherty, David; Schraeder, Peter
Area covered
Tunisia
Description
This archive contains materials to replicate the analysis reported in: Doherty, David, Peter J. Schraeder, and Kirstie L. Dobbs. "Do Democratic Revolutions 'Activate' Participants?: The Case of Tunisia"

The root directory includes five Stata DO files (run on Stata 14.1). The file replication.do calls the other four DO files. These other four DO files conduct analysis specific to a particular dataset. In the case of arab_barometer_w2.do and afrobarometer.do the files simply do necessary recoding and output summary statistics. The remaining two DO files use two datasets used to conduct the statistical analysis reported in the paper. They also complete recoding to ensure that variables from these two datasets are coded similarly. The file replication.do then stacks the recoded data to complete the core analysis reported.

The directory includes four folders:
1) prepped_data: this folder is where the two recoded datasets that are stacked for the core analysis are deposited. It is empty in this archive.
2) private_data: Empty folder referred to in commented out code. The only file originally included in this folder was the full dataset from the original survey used in the analysis. The commented out code (top of "orig_survey.do") stripped out variables not used in the analysis and saved the resulting dataset in the raw_data folder.
3) raw_data: Contains all datasets used in the analysis. The tunisia_2012_survey.dta file is from our original survey. The remaining files were downloaded from the Arab Barometer and AfroBarometer websites.
4) tables: Empty folder where tables and figures are saved.

To run the analysis, users should simply set the directory at the top of the replication.do file.
Data from: Inconsistent Retirement Timing
figshare.com
zip
Updated Dec 14, 2021
Philipp Schreiber; Christoph Merkle; Martin Weber (2021). Inconsistent Retirement Timing [Dataset]. http://doi.org/10.6084/m9.figshare.17197928.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.17197928.v1
Dataset updated
Dec 14, 2021
Dataset provided by
figshare
Authors
Philipp Schreiber; Christoph Merkle; Martin Weber
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Abstract

We study the effect of inconsistent time preferences on actual and planned retirement timing decisions in two independent datasets. Theory predicts that hyperbolic time preferences can lead to dynamically inconsistent retirement timing. In an online experiment with more than 2,000 participants, we find that time-inconsistent participants retire on average 1.75 years earlier than time-consistent participants do. The planned retirement age of non-retired participants decreases with age. This negative age effect is about twice as strong among time-inconsistent participants. The temptation of early retirement seems to rise in the final years of approaching retirement. Consequently, time-inconsistent participants have a higher probability of regretting their retirement decision. We find similar results for a representative household survey (German SAVE panel). Using smoking behavior and overdraft usage as time preference proxies, we confirm that time-inconsistent participants retire earlier and that non-retirees reduce their planned retirement age within the panel.

Methods

We conduct an online experiment in cooperation with a large and well-circulated German newspaper, the Frankfurter Allgemeine Zeitung (FAZ). Participants are recruited via a link on the newspaper's website and two announcements in the print edition. In total, 3,077 participants complete the experiment, which takes them on average 11 minutes. Participants answer questions about retirement planning, time preferences, risk preferences, financial literacy, and demographics. The initial sample for this study consists of 256 retired participants and 2,173 non-retired participants.

Usage Notes

Our dataset: STATA Do File is attached.

Additional Datasets: In addition, a German Household Panel is used in this paper. The data cannot be uploaded by us but is available via the Max Planck Institute (https://www.mpisoc.mpg.de/en/social-policy-mea/research/save-2001-2013/).

We upload the Do-Files used in the analysis and the results in an excel format (xlsx).
Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset...
datacatalogue.ukdataservice.ac.uk
Updated Jul 29, 2011
University of Manchester, Cathie Marsh Centre for Census and Survey Research, ESDS Government (2011). Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset [Dataset]. http://doi.org/10.5255/UKDA-SN-6792-1
Unique identifier
https://doi.org/10.5255/UKDA-SN-6792-1
Dataset updated
Jul 29, 2011
Dataset provided by
UK Data Service (https://ukdataservice.ac.uk/)
Authors
University of Manchester, Cathie Marsh Centre for Census and Survey Research, ESDS Government
Area covered
England
Description
The Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset was prepared as a resource for those interested in learning introductory small area estimation techniques. It was first presented as part of a workshop entitled 'Introducing small area estimation techniques and applying them to the Health Survey for England using Stata'. The data are accompanied by a guide that includes a practical case study enabling users to derive estimates of disability for districts in the absence of survey estimates. This is achieved using various models that combine information from ESDS government surveys with other aggregate data that are reliably available for sub-national areas. Analysis is undertaken using Stata statistical software; all relevant syntax is provided in the accompanying '.do' files.

The data files included in this teaching resource contain HSE variables and data from the Census and Mid-year population estimates and projections that were developed originally by the National Statistical agencies, as follows:
The main data file, 'hse_data.dta', is a reduced version of the HSE for 2000 and 2001. In order to combine data from two years of the HSE in a consistent way some changes have been made to the weights in each year. Additionally, some recoding of the limiting long term illness (LLTI), disability and the age variable has also been undertaken.
File 'practical_1_task_5_data.dta' contains population counts and model mobility disability rates (estimated during practical 1) distinguishing single year of age and sex for the six case study districts.
File 'practical_2_data.dta' contains the aggregate data required for Practical 2, including age- and sex-specific rates of LLTI (Census) for six UK case study districts, age- and sex-specific rates of mobility disability for England (HSE), and population counts for the six districts.
File 'pop_data_practical_3.dta' contains population counts for the six districts (by age, sex and LLTI status) required for Practical 3.
The original HSEs for 2000 and 2001 are held at the UK Data Archive under SNs 4628 and 4912 respectively. Full details of the recoding of HSE variables and how the aggregate data was produced can be found in the data documentation.

This unrestricted access data collection is freely available to download under an Open Government Licence from the UK Data Service. Note that the files should be unzipped/saved to the C: drive of the computer to be used; all syntax assumes files are saved at this location.
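
Combining national age- and sex-specific rates with district population counts, as the Practical 2 file description suggests, can be illustrated with a simple synthetic estimate. The sketch below is in Python rather than Stata, it is a generic textbook-style calculation and not necessarily the model used in the accompanying guide, and the function name, cell labels, rates, and counts are all invented for illustration.

```python
def synthetic_rate(national_rates, district_counts):
    """Expected district rate: national cell rates weighted by district population.

    Both arguments are dicts keyed by the same (sex, age_group) cells.
    """
    total_population = sum(district_counts.values())
    expected_cases = sum(
        national_rates[cell] * count for cell, count in district_counts.items()
    )
    return expected_cases / total_population


# Made-up national rates and district population counts.
rates = {("m", "16-64"): 0.05, ("f", "16-64"): 0.06, ("m", "65+"): 0.30, ("f", "65+"): 0.35}
counts = {("m", "16-64"): 1000, ("f", "16-64"): 1000, ("m", "65+"): 200, ("f", "65+"): 300}
print(round(synthetic_rate(rates, counts), 4))
```

An estimate of this kind reflects only the district's age and sex structure, which is why additional area-level information such as Census LLTI rates is brought in.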
Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2016
openicpsr.org
datasearch.gesis.org
Updated May 18, 2018
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1992-2016 [Dataset]. http://doi.org/10.3886/E103500V3
Unique identifier
https://doi.org/10.3886/E103500V3
Dataset updated
May 18, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1992 - 2015
Area covered
United States
Description
Version 3 release notes: Adds data for 2016. Order rows by year (descending) and ORI.
Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. The data sets here combine all data from the years 1992-2015 into a single file. Please note that the files are quite large and may take some time to open. Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.). All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, and reordered columns.
I also added state, county, and place FIPS codes from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data. The zip file contains a codebook and the data in the following formats:
.csv - Microsoft Excel
.dta - Stata
.sav - SPSS
.rda - R
If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
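As a rough illustration of the derived columns described above, the following Python sketch builds a combined identifier and the month, weekday, and month-day values from an incident date. The underscore separator, the output formats, and the function names are assumptions made for this example; the dataset's actual cleaning code is the R code linked above.

```python
from datetime import date

# English names are spelled out to avoid locale-dependent formatting.
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def make_unique_id(year: int, ori9: str, incident_number: str) -> str:
    """Combine year, ORI9, and incident number into one key.

    The "_" separator is an assumption; values are lower-cased because
    the description says all character values were made lower case.
    """
    return f"{year}_{ori9.lower()}_{incident_number.lower()}"


def date_parts(incident_date: date) -> dict:
    """Derive month, weekday, and month-day values from an incident date."""
    return {
        "month": MONTHS[incident_date.month - 1],
        "weekday": WEEKDAYS[incident_date.weekday()],  # Monday is 0
        "month_day": f"{incident_date.month:02d}-{incident_date.day:02d}",
    }
```

For example, `date_parts(date(2016, 7, 4))` yields the month `july`, the weekday `monday`, and the month-day `07-04`.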
Replication Data for: How Do Electoral Incentives Affect Legislator...
dataverse.harvard.edu
search.dataone.org
Updated Sep 28, 2021
Cite
Alexander Fouirnaies; Andrew B. Hall (2021). Replication Data for: How Do Electoral Incentives Affect Legislator Behavior? Evidence from U.S. State Legislatures [Dataset]. http://doi.org/10.7910/DVN/LHTRWM
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/LHTRWM
Dataset updated
Sep 28, 2021
Dataset provided by
Harvard Dataverse
Authors
Alexander Fouirnaies; Andrew B. Hall
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
United States
Description
This folder contains the raw data and the Stata code to produce the dataset used in the paper "How Do Electoral Incentives Affect Legislator Behavior?". The code produces all tables and figures in the paper and in the appendix. To replicate the findings, download the replication folder with all materials and set the working directory to this folder. Run the file replicate_how_do_electoral_incentives_affect_legislator_behavior.do in Stata. This will produce the main dataset from the raw input data, produce all the tables and figures, and save them in the folder tables_figures. The individual results can also be replicated using the dataset termlimited.dta in the data_output folder and the relevant do files. The do file electoral_incentives.do shows which do file is needed to replicate a particular table or figure in the paper or appendix.
Replication Data for "Core Political Values and the Long-Term Shaping of...
dataverse.harvard.edu
search.dataone.org
Updated Aug 22, 2018
Cite
Geoffrey Evans; Anja Neundorf (2018). Replication Data for "Core Political Values and the Long-Term Shaping of Partisanship" [Dataset]. http://doi.org/10.7910/DVN/VJTN9Z
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/VJTN9Z
Dataset updated
Aug 22, 2018
Dataset provided by
Harvard Dataverse
Authors
Geoffrey Evans; Anja Neundorf
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The article uses a dataset that cannot be deposited online but is freely available to registered users. The data of the British Household Panel Study can be requested via https://discover.ukdataservice.ac.uk/catalogue/?sn=5151. Here we provide a Stata do-file that will create the working file, recode the original data, and run some robustness tests. The data was prepared in Stata and then saved as SPSS .sav files using Stattrans. This was necessary because the main cross-lagged latent class models of the paper were estimated using LatentGOLD, which only reads .sav files. Here we also provide the syntax files that were used for estimating these models.
Replication Data for: A High Court Plays the Accordion: Validating Ex Ante...
dataverse.azure.uit.no
dataverse.no
tsv, txt
Updated Sep 28, 2023
Cite
Henrik L. Bentsen; Gunnar Grendstad; William R. Shaffer; Eric N. Waltenburg (2023). Replication Data for: A High Court Plays the Accordion: Validating Ex Ante Case Complexity on Oral Arguments [Dataset]. http://doi.org/10.18710/DWIX6Y
Explore at:
Available download formats: tsv (235966), txt (213402), txt (6671)
Unique identifier
https://doi.org/10.18710/DWIX6Y
Dataset updated
Sep 28, 2023
Dataset provided by
DataverseNO
Authors
Henrik L. Bentsen; Gunnar Grendstad; William R. Shaffer; Eric N. Waltenburg
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The data set (saved in Stata *.dta and .txt) contains all observations (Norwegian supreme court cases 2008-2018 decided in five-justice panels) and variables (independent variables measuring complexity of cases and the dependent variable measuring time in hours scheduled for oral arguments) relevant for a complete replication of the study. ABSTRACT OF STUDY: While high courts with fixed time for oral arguments deprive researchers of the opportunity to extract temporal variance, courts that apply the “accordion model” institutional design and adjust the time for oral arguments according to the perceived complexity of a case are a boon for research that seeks to validate case complexity well ahead of the courts’ opinion writing. We analyse an original data set of all 1,402 merits decisions of the Norwegian Supreme Court from 2008 to 2018 where the justices set time for oral arguments to accommodate the anticipated difficulty of the case. Our validation model empirically tests whether and how attributes of a case associated with ex ante complexity are linked with time allocated for oral arguments. Cases that deal with international law and civil law, have several legal players, or are cross-appeals from lower courts are indicative of greater case complexity. We argue that these results speak powerfully to the use of case attributes and/or the time reserved for oral arguments as ex ante measures of case complexity. To enhance the external validity of our findings, future studies should examine whether these results are confirmed in high courts with similar institutional design for oral arguments. Subsequent analyses should also test the degree to which complex cases and/or time for oral arguments have predictive validity on more divergent opinions among the justices and on the time courts and justices need to render a final opinion.
Expropriation of the Church's wealth and political conflict in 19th century...
openicpsr.org
stata
Updated Dec 17, 2018
Cite
Mateo Uribe-Castro (2018). Expropriation of the Church's wealth and political conflict in 19th century Colombia [Dataset]. http://doi.org/10.3886/E107803V2
Explore at:
Available download formats: stata
Unique identifier
https://doi.org/10.3886/E107803V2
Dataset updated
Dec 17, 2018
Dataset provided by
University of Maryland, College Park
Authors
Mateo Uribe-Castro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Colombia
Description
Replication files for the paper "Expropriation of the Church's wealth and political violence in 19th century Colombia." It includes a complete dataset (in the folder data) and a Stata do-file to replicate the tables and figures from the paper. The other folders are empty, but the program is written to save figures and tables in them.
Replication package for "Religion exhibits the greatest cultural diversity...
search.dataone.org
dataverse.harvard.edu
Updated Oct 28, 2025
Cite
Knudsen, Anne Sofie Beck; Bentzen, Jeanet Sinding; Norenzayan, Ara; Lindbjerg Sperling, Lena (2025). Replication package for "Religion exhibits the greatest cultural diversity across 117 countries" [Dataset]. http://doi.org/10.7910/DVN/OQONVO
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/OQONVO
Dataset updated
Oct 28, 2025
Dataset provided by
Harvard Dataverse
Authors
Knudsen, Anne Sofie Beck; Bentzen, Jeanet Sinding; Norenzayan, Ara; Lindbjerg Sperling, Lena
Description
Replication package for: Bentzen, J.S., Knudsen, A.S.B., Sperling, L.L., & Norenzayan, A. (2025), "Religion exhibits the greatest cultural diversity across 117 countries", Nature Communications.

FILES

1_prepare_data.do: Prepares variables from the integrated EVS–WVS dataset.
2_Fig_*.do: Scripts for generating all figures (main text and SI).
cntr_id.dta: Crosswalk file mapping country identifiers.
readme.txt: This file.

INSTRUCTIONS

1. Download the integrated European Values Study (EVS) and World Values Survey (WVS) dataset (1981–2022). Detailed instructions are available here: https://europeanvaluesstudy.eu/methodology-data-documentation/integrated-values-surveys/data-and-documentation/ Access requires free registration.
2. Save the integrated file as: Integrated_values_surveys_1981-2022.dta
3. Open Stata (version 17 or higher) and set your working directory at the top of each script (see line 5 in 1_prepare_data.do).
4. Run the scripts in order:
- 1_prepare_data.do
- All 2_Fig_*.do scripts (each produces one or more figures for the main text and Supplementary Information).
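The run order in the instructions above (the preparation script first, then every figure script) can be derived from the numeric prefixes in the file names. The Python sketch below is illustrative only: the function name is hypothetical, and the concrete figure-script names used in the example are made-up expansions of the `2_Fig_*.do` pattern, since the package listing does not enumerate them.

```python
import re

# Matches names such as "1_prepare_data.do" and captures the leading number.
DO_SCRIPT = re.compile(r"^(\d+)_.*\.do$")


def run_order(filenames):
    """Return the .do scripts sorted by numeric prefix, then by name.

    Files that are not numbered .do scripts (data files, the readme)
    are ignored.
    """
    scripts = []
    for name in filenames:
        match = DO_SCRIPT.match(name)
        if match:
            scripts.append((int(match.group(1)), name))
    return [name for _, name in sorted(scripts)]


# Hypothetical folder listing; the two figure-script names are invented.
listing = ["2_Fig_2.do", "readme.txt", "1_prepare_data.do",
           "cntr_id.dta", "2_Fig_1.do"]
ordered = run_order(listing)
```

Sorting on the integer prefix rather than the raw string keeps a script numbered 10 after one numbered 2, should the package ever grow that far.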
service trade data by mode of supply and service type.dta
figshare.com
bin
Updated Jul 22, 2022
Cite
riina kerner (2022). service trade data by mode of supply and service type.dta [Dataset]. http://doi.org/10.6084/m9.figshare.20337501.v4
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.20337501.v4
Dataset updated
Jul 22, 2022
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
riina kerner
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This is an extraction of services trade statistics from the WTO, used for modes-of-supply analysis. The data are saved in Stata and Excel formats. The data were extracted from the WTO database available at https://www.wto.org/english/news_e/news19_e/serv_31jul19_e.htm
Merged data set
figshare.com
txt
Updated Jan 21, 2025
Cite
Huafeng Zhang (2025). Merged data set [Dataset]. http://doi.org/10.6084/m9.figshare.28246769.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.28246769.v1
Dataset updated
Jan 21, 2025
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Huafeng Zhang
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The data we use in this paper were gathered in the 6th round of the Multiple Indicator Cluster Surveys (MICS6), which can be downloaded from https://mics.unicef.org/surveys. The MICS6 surveys are conducted by UNICEF (United Nations International Children's Emergency Fund). We merged the original data from 11 countries and saved the resulting user data in Stata format. The do-file for the analysis is also published here.
Data from: Ghana EMBRACE Implementation Research
datasetcatalog.nlm.nih.gov
figshare.com
Updated May 18, 2021
Cite
Williams, John; Asante, Kwaku Poku; Owusu-Agyei, Seth; Yeji, Francis; Okawa, Sumiyo; Addei, Sheila; Kikuchi, Kimiyo; Shibanuma, Akira; Ansah, Evelyn Korkor; Asare, Gloria Quansah; Gyapong, Margaret; Hodgson, Abraham; Tawiah, Charlotte; Oduro, Abraham; Nanishi, Keiko; Jimba, Masamine; Yasuoka, Junko (2021). Ghana EMBRACE Implementation Research [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000931489
Explore at:
Dataset updated
May 18, 2021
Authors
Williams, John; Asante, Kwaku Poku; Owusu-Agyei, Seth; Yeji, Francis; Okawa, Sumiyo; Addei, Sheila; Kikuchi, Kimiyo; Shibanuma, Akira; Ansah, Evelyn Korkor; Asare, Gloria Quansah; Gyapong, Margaret; Hodgson, Abraham; Tawiah, Charlotte; Oduro, Abraham; Nanishi, Keiko; Jimba, Masamine; Yasuoka, Junko
Area covered
Ghana
Description
The database is saved in .dta format (Stata 13 or later). The dataset contains the pooled data from the baseline survey (conducted from July 1 to September 30, 2014) and the follow-up survey (conducted from October 1 to December 31, 2015).

Cite

Dylan Brewer; Alyssa Carlson (2025). Addressing sample selection bias for machine learning methods (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9hZGRyZXNzaW5nLXNhbXBsZS1zZWxlY3Rpb24tYmlhcy1mb3ItbWFjaGluZS1sZWFybmluZy1tZXRob2RzLXJlcGxpY2F0aW9uLWRhdGE=

Addressing sample selection bias for machine learning methods (replication data)

Explore at:

Dataset updated

Oct 2, 2025

Dataset provided by

Journal of Applied Econometrics
ZBW
ZBW Journal Data Archive

Authors

Dylan Brewer; Alyssa Carlson

Description

Addressing sample selection bias for machine learning methods (replication data)

Dylan Brewer and Alyssa Carlson

Accepted at Journal of Applied Econometrics, 2023

Overview

This replication package contains files required to reproduce results, tables, and figures using Matlab and Stata. We divide the project into instructions to replicate the simulation, the result from Huang et al (2006), and the application.

Simulation

For reproducing the simulation results.

Included files in `*\Simulation` with short descriptions:

SSML_simfunc: function that produces individual simulation runs
SSML_simulation: script that loops over SSML_simfunc for different DGPs and multiple simulation runs
SSML_figures: script that generates all figures for the paper
SSML_compilefunc: function that compiles the results from SSML_simulation for the SSML_figures script

Steps for replicating simulation:

Save SSML_simfunc, SSML_simulation, SSML_figures, SSML_compilefunc to the same folder. This location will be referred to as the FILEPATH.
Create OUTPUT folder inside the FILEPATH location.
Change the FILEPATH location inside SSML_simulation and SSML_figures.
Run SSML_simulation to produce simulation data and results.
Run SSML_figures to produce figures.
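Before running the Matlab scripts, the folder setup described in the steps above can be checked with a short script. The following Python sketch is only an illustration (the package itself uses Matlab): it confirms that the four SSML files sit in the chosen FILEPATH folder and creates the OUTPUT folder once they do. The function name is hypothetical, and files are matched by name without extension because the listing above does not state the extensions.

```python
from pathlib import Path

# File names taken from the list above; extensions are not given in the
# listing, so they are matched by stem only.
REQUIRED = ("SSML_simfunc", "SSML_simulation", "SSML_figures", "SSML_compilefunc")


def prepare_simulation_folder(filepath):
    """Check the FILEPATH folder and create OUTPUT inside it.

    Returns the list of required files that are missing. The OUTPUT
    folder is created only when nothing is missing.
    """
    root = Path(filepath)
    present = {p.stem for p in root.iterdir() if p.is_file()}
    missing = [name for name in REQUIRED if name not in present]
    if not missing:
        (root / "OUTPUT").mkdir(exist_ok=True)
    return missing
```

An empty return value means the folder is ready; the FILEPATH setting inside SSML_simulation and SSML_figures still has to be edited by hand.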

Huang et al replication

For reproducing the Huang et al. (2006) replication results.

Included files in `*\HuangetalReplication` with short descriptions:

SSML_huangrep: script that replicates the results from Huang et al. (2006)

Obtaining the dataset:

Go to https://archive.ics.uci.edu/dataset/14/breast+cancer and save file as "breast-cancer-wisconsin.data"

Steps for replicating results:

Save SSML_huangrep and the breast cancer data to the same folder. This location will be referred to as the FILEPATH.
Change the FILEPATH location inside SSML_huangrep.
Run SSML_huangrep to produce results and figures.

Application

For reproducing the application section results.

Included program files in `*\Application` with short descriptions:

G0_main_202308.do: Stata wrapper code that will run all application replication files
G1_cqclean_202308.do: Cleans election outcomes data
G2_cqopen_202308.do: Cleans open elections data
G3_demographics_cainc30_202308.do: Cleans demographics data
G4_fips_202308.do: Cleans FIPS code data
G5_klarnerclean_202308.do: Cleans Klarner gubernatorial data
G6_merge_202308.do: Merges cleaned datasets together
G7_summary_202308.do: Generates summary statistics tables and figures
G8_firststage_202308.do: Runs L1 penalized probit for the first stage
G9_prediction_202308.m: Trains learners and makes predictions
G10_figures_202308.m: Generates figures of prediction patterns
G11_final_202308.do: Generates final figures and tables of results
r1_lasso_alwayskeepCF_202308.do: Examines the effect of requiring that the control function not be dropped from the LASSO
latexTable.m: Code by Eli Duenisch to write LaTeX tables from Matlab (https://www.mathworks.com/matlabcentral/fileexchange/44274-latextable)

Included non-confidential data in subdirectory `*\Application\Data`:

\CAINC30: County level income and demographics data from the BEA
\CPI: CPI data from the BLS
\KlarnerGovernors: Carl Klarner's Governors Dataset available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/20408

Confidential data suppressed in subdirectory `*\Application\CD`:

These data cannot be transferred under the data use agreement with CQ Press, so the files are not included.

\CQ_county: County level election outcomes available from http://library.cqpress.com/elections/login.php?requested=%2Felections%2Fdownload-data.php
\CQ_open: Open elections available from http://library.cqpress.com/elections/advsearch/elections-with-open-seats-results.php?open_year1=1968&open_year2=2019&open_office=4

There is no batch download; downloads for each year must be done by hand. For each year, download as many state outcomes as possible and name the files YYYYa.csv, YYYYb.csv, etc. (Example: 1970a.csv, 1970b.csv, 1970c.csv, 1970d.csv). See line 18 of G1_cqclean_202308.do for file structure information.
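The YYYYa.csv naming scheme can be generated and checked programmatically, which helps when many files are downloaded by hand. This Python sketch is illustrative only; the function names are hypothetical, and the cap of 26 files per year simply follows from using one lowercase letter as the suffix.

```python
import re
from string import ascii_lowercase

# Four-digit year, one lowercase letter, then ".csv" (e.g. "1970a.csv").
CQ_NAME = re.compile(r"^(\d{4})([a-z])\.csv$")


def cq_filenames(year, n_files):
    """Return the expected names for n_files downloads in a given year."""
    if not 1 <= n_files <= len(ascii_lowercase):
        raise ValueError("n_files must be between 1 and 26")
    return [f"{year}{letter}.csv" for letter in ascii_lowercase[:n_files]]


def parse_cq_filename(name):
    """Return (year, letter) for a well-formed name, otherwise None."""
    match = CQ_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Running `parse_cq_filename` over a download folder flags any file whose name would not fit the scheme before the Stata cleaning script is run.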

Steps for replicating application:

Download confidential data from the CQ Press.
Change the working directory in G0_main_202308.do on line 18 to the application folder.
Change local matlabpath in G0_main_202308.do on line 18 to the appropriate location.
Set directory and file path in G9_prediction_202308.m and G10_figures_202308.m as necessary.
Run G0_main_202308.do in Stata to run all programs.
All output (figures and tables) will be saved to subdirectory `*\Application\Output`.

Contact

Contact Dylan Brewer (brewer@gatech.edu) or Alyssa Carlson (carlsonah@missouri.edu) for help with replication.
