analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then uses RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011.
when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research. confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
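curious what that import-then-analyze workflow looks like in practice? here's a minimal sketch, assuming the person-level records have already been pulled into their own fixed-width file - the file names, weight variable names, and scaling constants below are placeholders, so lean on the actual scripts (and the census documentation) for the real thing.

library(SAScii)     # parse.SAScii() figures out column positions from nber's sas importation script
library(DBI)
library(RSQLite)    # stash the big fixed-width file in a sqlite database
library(survey)     # replicate-weighted survey designs

# read the layout from the nber sas script (beginline should point at the person-record input block)
layout <- parse.SAScii("cpsmar2012.sas", beginline = 1)
widths <- layout$width                                   # negative widths mark gap columns to skip
cols <- layout$varname[!is.na(layout$varname)]

# read the person-level fixed-width file and push it into a sqlite database
asec <- read.fwf("asec2012_person.dat", widths = widths, col.names = cols)
db <- dbConnect(SQLite(), "asec.db")
dbWriteTable(db, "asec12", asec)

# build the replicate-weighted design (cps-asec ships 160 replicate weights)
asec.design <-
  svrepdesign(
    weights = ~marsupwt,                                 # march supplement person weight (placeholder name)
    repweights = "pwwgt[1-9]",                           # replicate weight columns, matched by pattern
    type = "Fay", rho = 0.5,                             # illustrative scaling - check the census documentation
    data = dbReadTable(db, "asec12")
  )

# one weighted estimate with a proper standard error
svymean(~ptotval, asec.design, na.rm = TRUE)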
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This exercise dataset was created for researchers interested in learning how to use the models described in the "Handbook on Impact Evaluation: Quantitative Methods and Practices" by S. Khandker, G. Koolwal and H. Samad, World Bank, October 2009 (permanent URL http://go.worldbank.org/FE8098BI60). Public programs are designed to reach certain goals and beneficiaries. Methods to understand whether such programs actually work, as well as the level and nature of impacts on intended beneficiaries, are the main themes of this book. Has the Grameen Bank, for example, succeeded in lowering consumption poverty among the rural poor in Bangladesh? Can conditional cash transfer programs in Mexico and Latin America improve health and schooling outcomes for poor women and children? Does a new road actually raise welfare in a remote area in Tanzania, or is it a "highway to nowhere"? This handbook reviews quantitative methods and models of impact evaluation. It begins by reviewing the basic issues pertaining to an evaluation of an intervention to reach certain targets and goals. It then focuses on the experimental design of an impact evaluation, highlighting its strengths and shortcomings, followed by discussions of various non-experimental methods. The authors also cover methods to shed light on the nature and mechanisms by which different participants are benefiting from the program. The handbook provides Stata exercises in the context of evaluating major microcredit programs in Bangladesh, such as the Grameen Bank. This dataset provides both the related Stata data files and the Stata programs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The code in this replication package assembles the data needed and replicates the analysis of the paper "Labor Market Inequality and the Changing Life Cycle Profile of Male and Female Wages," by Richard Blundell, Hugo Lopez, and James P. Ziliak. The first file is a Stata file, 0_InstallPackages.do, which installs a number of plug-in ADO files needed for successful execution (only run once). The second is also a Stata file, 1_FullDataPrep.do, which calls a number of Stata DO files to compile all the necessary data files and prepares the data for the analysis. The resulting Stata dataset, RunningData_withtaxsim.dta, is found in the replication folder /ProcessedData/, and because Matlab relies on csv files, the resulting Matlab input files are at /ProcessedData/MatlabDataInputFiles/. The replicator should expect the code to run for about 3 hours. Then, the Matlab file a2_QuantileEstimation.m should be executed. The parameter estimates reported in the figures and tables come from a Windows desktop version that takes about 5 hours for each of four subsamples for each model specification. Due to the computational complexity of the bootstrap quantile with selection estimator, we made use of a computing cluster with a SLURM job scheduler. There are 8 Matlab bootstrap programs: four to produce standard errors in Tables 1-4 of the manuscript and four to produce standard errors for Supplemental Appendix Tables D1-D4. These bootstrap computations were submitted in parallel (i.e., all at once as separate programs) and each took on average 6 days when running on 4 cores. Then the Stata file 3_Figures&Tables.do should be run to produce all 9 figures (30 in the appendix) and 4 tables (7 in the appendix).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the survey of consumer finances (scf) with r

the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit-level (peu) - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pays the bills and therefore calls the shots.

if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.

the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.
this new github repository contains three scripts:

1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading

2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely

replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences

click here to view these three scripts

for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions

notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there's also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
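wondering what a multiply-imputed, replicate-weighted design actually looks like? here's a minimal sketch, assuming the five implicate data frames (imp1 through imp5) and the replicate weight table (rw) have already been loaded from the .rda files - the variable names and scaling constants are illustrative, so double-check them against the codebook before believing any output.

library(survey)     # replicate-weighted survey designs
library(mitools)    # imputationList() and MIcombine() to pool across the five implicates

# assumes imp1..imp5 (one record per peu each) and rw (replicate weights) already exist in memory
scf.design <-
  svrepdesign(
    weights = ~wgt,                                      # main peu-level weight (placeholder name)
    repweights = rw[, -1],                               # drop the id column, keep the replicate weight columns
    data = imputationList(list(imp1, imp2, imp3, imp4, imp5)),
    scale = 1,
    rscales = rep(1 / 998, 999),                         # illustrative scaling - confirm against the codebook
    type = "other",
    combined.weights = TRUE
  )

# run the statistic on each implicate, then pool the five answers with rubin's rules
MIcombine(with(scf.design, svymean(~networth)))

# regressions pool the same way
MIcombine(with(scf.design, svyglm(networth ~ income)))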
Database of the nation's substance abuse and mental health research data providing public use data files, file documentation, and access to restricted-use data files to support a better understanding of this critical area of public health. The goal is to increase the use of the data to most accurately understand and assess substance abuse and mental health problems and the impact of related treatment systems. The data include the U.S. general and special populations, annual series, and designs that produce nationally representative estimates. Some of the data acquired and archived have never before been publicly distributed. Each collection includes survey instruments (when provided), a bibliography of related literature, and related Web site links. All data may be downloaded free of charge in SPSS, SAS, Stata, and ASCII formats, and most studies are available for use with the online data analysis system. This system allows users to conduct analyses ranging from cross-tabulation to regression without downloading data or relying on other software. Another feature, Quick Tables, provides the ability to select variables from drop-down menus to produce cross-tabulations and graphs that may be customized and cut and pasted into documents. Documentation files, such as codebooks and questionnaires, can be downloaded and viewed online.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
September 1, 2016. REPLICATION FILES FOR «THE IMPACT OF STATE TELEVISION ON VOTER TURNOUT», TO BE PUBLISHED BY THE BRITISH JOURNAL OF POLITICAL SCIENCE. The replication files consist of two datasets and corresponding Stata do-files. Please note the following:
1. The data used in the current microanalysis are based on the National Election Surveys of 1965, 1969, and 1973. The Institute of Social Research (ISF) was responsible for the original studies, and the data were made available by the NSD (Norwegian Center for Research Data). Neither ISF nor NSD is responsible for the analyses/interpretations of the data presented here.
2. Some of the data used in the municipality-level analyses are taken from NSD’s local government database (“Kommunedatabasen”). The NSD is not responsible for the analysis presented here or the interpretation offered in the BJPS paper.
3. Note that the municipality identification has been anonymized to avoid identification of individual respondents.
4. Most of the analyses generate Word files that are produced by the outreg2 facility in Stata. These tables can be compared with those presented in the paper. The graphs are directly comparable to those in the paper. In a few cases, the results are only generated in the Stata output window.
The paper employs two sets of data: I. Municipality-level data in Stata format (AggregateReplicationTVData.dta), with a corresponding dataset containing map coordinates (muncoord.dta). The Stata code is in a do-file (ReplicationOfAggregateAnalysis.do). II. The survey data are in a Stata file (ReplicationofIndividualLevelPanel.dta) with a corresponding do-file (ReplicationOfIndividualLevelAnalysis 25.08.2016.do). Please remember to change the file reference (i.e., the use statement) to execute the do-files.
The PHF scientific use file Wave 1 Version 3.0 data set is the second updated version of the wave 1 PHF data set and consists of the following five Stata files: PHF_h_wave1_v3_0.dta, PHF_p_wave1_v3_0.dta, PHF_m_wave1_v3_0.dta, PHF_d_wave1_v3_0.dta and PHF_w_wave1_v3_0.dta. The major changes in SUF Wave 1 Version 3.0 compared to SUF Wave 1 Version 2.0 are as follows: editing and correction of some values. For more details, see the PHF User Guide on the website of the Deutsche Bundesbank. Face-to-face interview: CAPI/CAMI. All private households located in Germany except institutional households (in old-age homes, prisons, barracks, etc.). Stratified random sample based on population registers; oversampling of wealthy households.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the area resource file (arf) with r

the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

this new github repository contains two scripts:

2011-2012 arf - download.R
- download the zipped area resource file directly onto your local computer
- load the entire table into a temporary sql database
- save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)

2011-2012 arf - analysis examples.R
- limit the arf to the variables necessary for your analysis
- sum up a few county-level statistics
- merge the arf onto other data sets, using both fips and ssa county codes
- create a sweet county-level map

click here to view these two scripts

for more detail about the area resource file (arf), visit:
- the arf home page
- the hrsa data warehouse

notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data. confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
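once the download script has produced an .rda file, merging the arf onto some other county-level table takes only a few lines. here's a tiny sketch - the file name, the object name (arf), the f-coded field names, and my_county_data are all stand-ins for whatever you're actually working with.

library(foreign)                                         # write.dta() makes a stata-readable copy

load("arf2011.rda")                                      # hypothetical .rda from the download script, containing a data frame named arf

# keep the county fips code plus a couple of variables of interest
# (the f-codes below are placeholders - look the real ones up in the arf documentation)
arf.small <- arf[, c("f00002", "f11984", "f08921")]
names(arf.small) <- c("fips", "population", "physicians")

# merge onto any other county-level data set by fips code
merged <- merge(my_county_data, arf.small, by = "fips", all.x = TRUE)

# save copies for other software
write.csv(merged, "county_merged.csv", row.names = FALSE)
write.dta(merged, "county_merged.dta")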
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Canada Trademarks Dataset
18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303
Dataset Selection and Arrangement (c) 2021 Jeremy Sheff
Python and Stata Scripts (c) 2021 Jeremy Sheff
Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.
This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.
Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.
Terms of Use:
As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.
The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:
The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.
Details of Repository Contents:
This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:
If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata's labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.
The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.
This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
It is understood that ensuring equation balance is a necessary condition for a valid model of time series data. Yet, the definition of balance provided so far has been incomplete and there has not been a consistent understanding of exactly why balance is important or how it can be applied. The discussion to date has focused on the estimates produced by the GECM. In this paper, we go beyond the GECM and beyond model estimates. We treat equation balance as a theoretical matter, not merely an empirical one, and describe how to use the concept of balance to test theoretical propositions before longitudinal data have been gathered. We explain how equation balance can be used to check if your theoretical or empirical model is either wrong or incomplete in a way that will prevent a meaningful interpretation of the model. We also raise the issue of “I(0) balance” and its importance. The replication dataset includes the Stata .do file and .dta file to replicate the analysis in section 4.1 of the Supplementary Information.
The BOP-HH Scientific Use File 202401 Version 01 data set continues the BOPSOCE Scientific Use File Version 1.0. It consists of the Stata files bophh_suf_202401_v02_wave01.dta to bophh_suf_202401_v02_wave48.dta. For more details, see the BOP-HH documentation on the website of the Deutsche Bundesbank. Self-administered questionnaire: web-based internet survey. Individuals in Germany aged 16 or older.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository includes data from Chavez's 2020 study "How does a crisis impact contemporary policy tools? The case of the COVID-19 pandemic, Twitter, and U.S. federal executive departments." The data include a Stata dataset and a Stata do-file, as well as an R script file for use in R. Data were collected from the 15 official U.S. federal executive department Twitter accounts on December 1, 2019 and June 15, 2020.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
NaiveBayes_R.xlsx: This Excel file shows how the probabilities of observed features are calculated given recidivism (P(x_ij|R)) in the training data. Each cell is embedded with an Excel function to render the appropriate figures.
- P(Xi|R): This tab contains probabilities of feature attributes among recidivated offenders.
- NIJ_Recoded: This tab contains re-coded NIJ recidivism challenge data following our coding schema described in Table 1.
- Recidivated_Train: This tab contains re-coded features of recidivated offenders.
- Tabs from [Gender] through [Condition_Other]: Each tab contains probabilities of feature attributes given recidivism. We use these conditional probabilities to replace the raw values of each feature in the P(Xi|R) tab.

NaiveBayes_NR.xlsx: This Excel file shows how the probabilities of observed features are calculated given non-recidivism (P(x_ij|N)) in the training data. Each cell is embedded with an Excel function to render the appropriate figures.
- P(Xi|N): This tab contains probabilities of feature attributes among non-recidivated offenders.
- NIJ_Recoded: This tab contains re-coded NIJ recidivism challenge data following our coding schema described in Table 1.
- NonRecidivated_Train: This tab contains re-coded features of non-recidivated offenders.
- Tabs from [Gender] through [Condition_Other]: Each tab contains probabilities of feature attributes given non-recidivism. We use these conditional probabilities to replace the raw values of each feature in the P(Xi|N) tab.

Training_LnTransformed.xlsx: Figures in each cell are log-transformed ratios of the probabilities in NaiveBayes_R.xlsx (P(Xi|R)) to the probabilities in NaiveBayes_NR.xlsx (P(Xi|N)).

TestData.xlsx: This Excel file includes the following tabs based on the test data: P(Xi|R), P(Xi|N), NIJ_Recoded, and Test_LnTransformed (log-transformed P(Xi|R)/P(Xi|N)).

Training_LnTransformed.dta: Training_LnTransformed.xlsx converted to a Stata data set, using the Stat/Transfer 13 software package to transfer the file format.

StataLog.smcl: This file includes the results of the logistic regression analysis. Both the estimated intercept and the coefficient estimates in this Stata log correspond to the raw weights and standardized weights in Figure 1.

Brier Score_Re-Check.xlsx: This Excel file recalculates the Brier scores of the Relaxed Naïve Bayes Classifier in Table 3, showing evidence that the results displayed in Table 3 are correct.

Full list:
- NaiveBayes_R.xlsx
- NaiveBayes_NR.xlsx
- Training_LnTransformed.xlsx
- TestData.xlsx
- Training_LnTransformed.dta
- StataLog.smcl
- Brier Score_Re-Check.xlsx
- Data for Weka (Training Set): Bayes_2022_NoID
- Data for Weka (Test Set): BayesTest_2022_NoID
- Weka output for machine learning models (Conventional naïve Bayes, AdaBoost, Multilayer Perceptron, Logistic Regression, and Random Forest)
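To make the spreadsheet logic concrete, the short R sketch below reproduces the same computation on an invented toy example: estimate P(x_ij|R) and P(x_ij|N) for one categorical feature in the training data, then replace each raw value with the log-transformed ratio ln(P(x_ij|R)/P(x_ij|N)), as in Training_LnTransformed.xlsx. The data frame and column names are hypothetical and are not taken from the NIJ files.

# invented toy training set: 'recidivated' is the outcome, 'gender' is one categorical feature
train <- data.frame(
  recidivated = c(1, 1, 0, 0, 1, 0, 0, 1),
  gender      = c("M", "M", "F", "M", "F", "F", "M", "M")
)

# conditional probabilities of each attribute, given recidivism and given non-recidivism
p_given_R <- prop.table(table(train$gender[train$recidivated == 1]))
p_given_N <- prop.table(table(train$gender[train$recidivated == 0]))

# log-transformed ratio ln(P(x|R) / P(x|N)), one value per attribute of the feature
ln_ratio <- log(p_given_R / p_given_N)

# replace each raw value of the feature with its log-ratio, as done tab by tab in the workbooks
train$gender_ln <- as.numeric(ln_ratio[train$gender])
train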
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication dataset for part 1 of the paper (Tables 1 to 3), on the impact of mask wearing on the number of infected cases and fatality rates: 'maskpanel2.dta' file. Replication dataset for part 2 (Tables 4, 5 and following), on the drivers of mask wearing: 'drivers.dta' file. To open and use with Stata software (Stata 13 was used). A do file (code file) is associated with both .dta files to replicate the results.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data depository contains all experimental materials, data, and code for Spamann, Lawyers' Role-Induced Bias ... All experimental materials (i.e., exercise and survey instrument) are in the pdf file Spamann_experimentalmaterials_all.pdf. The dataset Newman.dta (Stata 14.2) contains the data collected. The Stata do-file Spamann_role_bias_code.do generates the three figures and other statistical information reported in the version of the paper originally posted to SSRN in May 2019. Spamann_role_bias_code_revised.do generates the four figures and other statistical information reported in the revision submitted to JLS in March 2020 and ultimately accepted by the journal. Both do-files use Newman.dta. Newman.dta is the result of merging 6 csv files generated by Qualtrics in each of the six semesters from students' survey responses. These 6 csv files, and the do-file rawdata_merge_clean.do used to merge them, are also included.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset Description:
Purpose: Enables replication of the empirical findings presented in the manuscript "EFFECTS OF TRADE RESISTANCES ON THE CAPITAL GOODS SECTOR: EVIDENCE FROM A STRUCTURAL GRAVITY MODEL FOR BRAZIL (2008–2016)".
Nature and Scope: Quantitative panel dataset covering 2008-2016. Includes 144 countries with bilateral (exporter-importer) observations focused on the capital goods sector. Contains variables related to trade flows, trade policy, gravity determinants, macroeconomic indicators, and estimated model parameters.
Content: The Stata dataset (20251019_submitted manuscript_data.dta) includes: bilateral capital goods import flows (imp); applied tariffs (tau, t_imp_Bra, t_exp_Bra); gravity variables (ln_DIST, contig, comlang, colony, rta); country-level macro data (ll, lk, lrgdpna); and estimated Multilateral Resistance terms (OMR/IMR).
Origin: Data compiled from public sources: UN Comtrade, WITS, WTO (TRAINS, IDB, CTS), CEPII, PWT 9.1, and Mario Larch's RTA Database. OMR/IMR terms were generated via the estimation procedure detailed in the accompanying paper.

Code Description:
Purpose: This Stata do-file (20251019_submitted manuscript_dofile.do) contains the complete code necessary to replicate the empirical results presented in the manuscript "EFFECTS OF TRADE RESISTANCES ON THE CAPITAL GOODS SECTOR: EVIDENCE FROM A STRUCTURAL GRAVITY MODEL FOR BRAZIL (2008–2016)".
Nature and Scope: The file is a script written in Stata command language. Its scope covers the entire empirical workflow.
Content: The do-file executes the following main procedures:
- Loads the accompanying dataset (20251019_submitted manuscript_data.dta).
- Defines global macros and sets up the estimation environment.
- Runs the first-stage Poisson pseudo-maximum likelihood (PPML) gravity model estimations with high-dimensional fixed effects (exporter-product-year, importer-product-year, bilateral pairs) using the ppmlhdfe command.
- Recovers the estimated fixed effects and constructs the Multilateral Resistance terms (OMR and IMR) based on the gravity model results.
- Merges the MR terms back into the main dataset.
- Runs the second-stage OLS regressions for the production function using the recovered OMR term.
- Runs the second-stage OLS regressions for the capital accumulation function using the recovered IMR term.
- Includes commands for diagnostic tests (e.g., RESET, MaMu variance tests, if applicable within the code).
Dependencies: Requires Stata statistical software (version specified in the do-file or compatible) and likely requires user-written packages such as ppmlhdfe. Requires the accompanying dataset (20251019_submitted manuscript_data.dta) to be in the Stata working directory.
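For readers working outside Stata, the short R sketch below illustrates the same first-stage idea: a PPML gravity estimation with high-dimensional fixed effects, here via the fixest package's fepois() rather than ppmlhdfe. This is not the authors' code: the covariate names (imp, ln_DIST, contig, comlang, colony, rta) follow the dataset description above, while the exporter/importer/year identifier names and the exact fixed-effects structure are assumptions made for illustration.

library(fixest)   # fepois() estimates poisson models with high-dimensional fixed effects
library(haven)    # read_dta() reads the stata dataset

gravity <- read_dta("20251019_submitted manuscript_data.dta")

# first-stage ppml gravity model; exporter^year and importer^year are assumed identifier names
ppml_fit <- fepois(
  imp ~ ln_DIST + contig + comlang + colony + rta |
    exporter^year + importer^year,
  data = gravity
)
summary(ppml_fit)

# the estimated exporter-year and importer-year fixed effects are the raw material from which
# outward and inward multilateral resistance terms (OMR/IMR) are constructed in the paper
fe <- fixef(ppml_fit)
str(fe)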
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.
Instructions:
Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
This archive contains materials to replicate the analysis reported in: Doherty, David, Peter J. Schraeder, and Kirstie L. Dobbs. "Do Democratic Revolutions 'Activate' Participants?: The Case of Tunisia." The root directory includes five Stata DO files (run on Stata 14.1). The file replication.do calls the other four DO files. These other four DO files conduct analysis specific to a particular dataset. In the case of arab_barometer_w2.do and afrobarometer.do, the files simply do necessary recoding and output summary statistics. The remaining two DO files use the two datasets used to conduct the statistical analysis reported in the paper. They also complete recoding to ensure that variables from these two datasets are coded similarly. The file replication.do then stacks the recoded data to complete the core analysis reported. The directory includes four folders:
1) prepped_data: this folder is where the two recoded datasets that are stacked for the core analysis are deposited. It is empty in this archive.
2) private_data: Empty folder referred to in commented-out code. The only file originally included in this folder was the full dataset from the original survey used in the analysis. The commented-out code (top of "orig_survey.do") stripped out variables not used in the analysis and saved the resulting dataset in the raw_data folder.
3) raw_data: Contains all datasets used in the analysis. The tunisia_2012_survey.dta file is from our original survey. The remaining files were downloaded from the Arab Barometer and AfroBarometer websites.
4) tables: Empty folder where tables and figures are saved.
To run the analysis, users should simply set the directory at the top of the replication.do file.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication Package – Decompressing to Prevent Unrest
David Altman, Pontificia Universidad Católica de Chile

1. Description
This dataset accompanies the article: Altman, David. “Decompressing to prevent unrest: political participation through citizen-initiated mechanisms of direct democracy” (2025), Social Movement Studies. It contains the data and code necessary to replicate all statistical analyses and tables presented in the article.

2. Coverage
Time frame: 1970–2019. Countries: 116 democracies worldwide (electoral and liberal, according to V-Dem v14). Unit of analysis: country-year.

3. Data Sources
- V-Dem v14 (Coppedge et al., 2024): direct democracy indices (CIC-DPVI, TOC-DPVI), civil society participation index.
- NAVCO 1.3 (Chenoweth & Shay, 2020): violent and nonviolent resistance campaigns (dependent variable).
- World Bank, World Development Indicators: GDP per capita (constant 2015 US$), inflation.
- Author’s coding: harmonization and cleaning of datasets, construction of the dependent variable (excluding self-determination/secession cases).

4. Variables
- accepted: dichotomous dependent variable (1 if a violent or nonviolent regime-change/“other” campaign occurred in a given year; 0 otherwise).
- CIC_DPVI: citizen-initiated component of V-Dem’s Direct Popular Vote Index.
- TOC_DPVI: top-down component of direct democracy (plebiscites, obligatory referenda).
- pc_GDP: GDP per capita (constant 2015 US$).
- Inflation: annual inflation (%).
- v2x_cspart: Civil Society Participation Index (V-Dem).
- country, year: identifiers.

5. Files Included
- data.dta / data.csv – panel dataset used in the article.
- master.do – Stata do-file to reproduce all analyses.
- tables.do – generates Tables 1–2.
- figures.do – generates Figure 1 (coefficient plot).
- ReadMe.txt – this document.

6. Instructions
Open master.do in Stata (v17 or later). Set the working directory to the folder containing the replication package. Run the file. This will: load data.dta; estimate the models (fixed-effects and random-effects logit with lagged IVs); produce Tables 1–2 in /results/; and produce Figure 1 in /figures/.

7. Citation
If you use this dataset, please cite: Altman, David (2025). Replication data for: Decompressing to Prevent Unrest: Political Participation through Citizen-Initiated Mechanisms of Direct Democracy. Harvard Dataverse. DOI: [to be added]
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Code and data to reproduce all results and graphs reported in Tannenbaum et al. (2022). This folder contains data files (.dta files) and a Stata do-file (code.do) that stitches together the different data files and executes all analyses and produces all figures reported in the paper. The do-file uses a number of user-written packages, which are listed below. Most of these can be installed using the ssc install command in Stata. Also, users will need to change the current directory path (at the start of the do-file) before executing the code.

List of user-written packages (descriptions):
- revrs (reverse-codes variable)
- ereplace (extends the egen command to permit replacing)
- grstyle (changes the settings for the overall look of graphs)
- spmap (used for graphing spatial data)
- qqvalue (used for obtaining Benjamini-Hochberg corrected p-values)
- parmby (creates a dataset by calling an estimation command for each by-group)
- domin (used to perform dominance analyses)
- coefplot (used for creating coefficient plots)
- grc1leg (combine graphs with a single common legend)
- xframeappend (append data frames to the end of the current data frame)