This dataset was created by iFinance Tutor
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=hdl:1902.29/11638
This is a 3-part short course (held over three afternoons). Part 1 offers an introduction to Stata for Windows. Part 2 teaches entering data in Stata and working with Stata do files, and shows how to append, sort, and merge data sets in Stata. Part 3 teaches how to perform basic statistical procedures and how to draw subsamples from large datasets.
analyze the current population survey (cps) annual social and economic supplement (asec) with r
the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
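The "rectangular file" idea above can be sketched in a few lines. This is only an illustration in plain Python, not the nber or R importation code; the field names (`hh_id`, `fam_id`, `person_id`) and all values are hypothetical stand-ins for the real ASEC identifiers.

```python
# Hypothetical toy records; the real ASEC file has many more fields
# and different identifier names.
households = [{"hh_id": 1, "state": "NC"}, {"hh_id": 2, "state": "TX"}]
families = [
    {"hh_id": 1, "fam_id": 1, "fam_income": 52000},
    {"hh_id": 2, "fam_id": 1, "fam_income": 31000},
    {"hh_id": 2, "fam_id": 2, "fam_income": 18000},
]
persons = [
    {"hh_id": 1, "fam_id": 1, "person_id": 1, "age": 44},
    {"hh_id": 1, "fam_id": 1, "person_id": 2, "age": 41},
    {"hh_id": 2, "fam_id": 1, "person_id": 1, "age": 29},
    {"hh_id": 2, "fam_id": 2, "person_id": 2, "age": 67},
]


def rectangularize(households, families, persons):
    """Return one row per person with household and family fields attached."""
    hh_by_id = {h["hh_id"]: h for h in households}
    fam_by_id = {(f["hh_id"], f["fam_id"]): f for f in families}
    rows = []
    for p in persons:
        row = {}
        row.update(hh_by_id[p["hh_id"]])                  # household-level fields
        row.update(fam_by_id[(p["hh_id"], p["fam_id"])])  # family-level fields
        row.update(p)                                     # person-level fields
        rows.append(row)
    return rows


rectangular = rectangularize(households, families, persons)
```

Each output row is a person; household and family values simply repeat for everyone who shares them, which is why the result is only person-level records.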
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:
interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
This code merges multiple years of the Crime Survey for England and Wales (CSEW) and/or the British Crime Survey (BCS). Its purpose is to help researchers quickly and easily combine multiple survey sweeps of the CSEW and BCS. By combining multiple survey sweeps, researchers can look at, for instance, trends in violence. Furthermore, such a combined file makes it possible to look at specific offences, population groups, or consequences that do not occur frequently enough in a single year. This is a Stata do file, so access to Stata is required, as is access to all the BCS and CSEW files that you want to merge. In specifying the code, you can decide which files to merge: which years of the Crime Surveys, whether to include the bolt-on datasets that provide uncapped codes, the adolescent and young adult panels, and/or the ‘non-white’ panel. This code does not harmonize variables that differ between years. All original data resources are available via Related Resources.
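Appending sweeps without harmonizing them can be illustrated with a minimal sketch. It is written in plain Python rather than Stata, it is not the do file described above, and the variable names (`case_id`, `victim`, `online_fraud`) are hypothetical rather than actual BCS/CSEW variables.

```python
def append_sweeps(sweeps):
    """Stack rows from several sweeps, tagging each row with its sweep label.

    `sweeps` maps a label (for example a survey year) to a list of row dicts.
    Variables absent from a sweep are filled with None. Nothing is renamed or
    recoded, so harmonizing variables remains a separate manual step.
    """
    # Collect the union of all variable names, in first-seen order.
    columns = []
    for rows in sweeps.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    combined = []
    for label, rows in sweeps.items():
        for row in rows:
            out = {"sweep": label}
            for name in columns:
                out[name] = row.get(name)  # None when the sweep lacks it
            combined.append(out)
    return combined


# Hypothetical sweeps: the later one carries an extra variable.
sweeps = {
    "2001": [{"case_id": 1, "victim": 0}],
    "2011": [{"case_id": 1, "victim": 1, "online_fraud": 0}],
}
combined = append_sweeps(sweeps)
```

Keeping the `sweep` tag on every row matters because case identifiers are typically only unique within a sweep, and it makes the missing-by-design variables easy to spot afterwards.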
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The sample SAS and Stata code provided here is intended for use with certain datasets in the National Neighborhood Data Archive (NaNDA). NaNDA (https://www.openicpsr.org/openicpsr/nanda) contains some datasets that measure neighborhood context at the ZIP Code Tabulation Area (ZCTA) level. They are intended for use with survey or other individual-level data containing ZIP codes. Because ZIP codes do not exactly match ZIP code tabulation areas, a crosswalk is required to use ZIP-code-level geocoded datasets with ZCTA-level datasets from NaNDA. A ZIP-code-to-ZCTA crosswalk was previously available on the UDS Mapper website, which is no longer active. An archived copy of the ZIP-code-to-ZCTA crosswalk file has been included here. Sample SAS and Stata code are provided for merging the UDS mapper crosswalk with NaNDA datasets.
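The two-step merge described above (ZIP code to ZCTA, then ZCTA to neighborhood measures) might look roughly like the following. This is a plain Python sketch, not the SAS or Stata sample code that ships with the dataset; the column names (`zip`, `zcta`, `pop_density`) and all codes and values are made up for illustration.

```python
def attach_zcta_context(survey_rows, zip_to_zcta, zcta_context):
    """Attach ZCTA-level measures to individual records via a ZIP crosswalk.

    Returns (merged, unmatched) so that records whose ZIP code is missing
    from the crosswalk, or whose ZCTA has no context data, stay visible.
    """
    merged, unmatched = [], []
    for row in survey_rows:
        zcta = zip_to_zcta.get(row["zip"])
        context = zcta_context.get(zcta)
        if zcta is None or context is None:
            unmatched.append(row)
            continue
        merged.append({**row, "zcta": zcta, **context})
    return merged, unmatched


# Made-up codes: two ZIP codes that fall in the same ZCTA, one unknown ZIP.
zip_to_zcta = {"00011": "00010", "00012": "00010"}
zcta_context = {"00010": {"pop_density": 1200.0}}
survey_rows = [
    {"resp_id": 1, "zip": "00011"},
    {"resp_id": 2, "zip": "00012"},
    {"resp_id": 3, "zip": "00099"},
]
merged, unmatched = attach_zcta_context(survey_rows, zip_to_zcta, zcta_context)
```

ZIP codes are kept as strings so that leading zeros survive, and unmatched records are returned rather than silently dropped so the match rate can be checked.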
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This replication packet contains all the data and Stata do-files to reproduce all tables and figures in "Heterogeneity and Heteroskedasticity in Endogenous Switching Models: Estimating the Effects of Physician Advice on Calorie Consumption" by Riju Joshi and Jeffrey M. Wooldridge.
Included folders and short descriptions:
[I] Simulation.zip folder includes
Simulations.do This is a Stata do-file that replicates the Monte Carlo simulations.
README_simulations.txt
[II] Data.zip folder includes
rawdata_2007_2016.dta. This is the raw NHANES dataset. This dataset has been compiled using the Stata do-file "compiling.do" and merged using the Stata do-file "merging.do". Both Stata do-files are in the Application.zip folder.
data_2007_2016.dta
[III] Application.zip folder includes
compiling.do This is a Stata do-file that compiles NHANES datasets directly from the website. We compile data on several characteristics for each year.
merging.do This is a Stata do-file that merges all the raw NHANES datasets collected using compiling.do. We merge them for each year and then we append the yearly files. The final raw dataset is named rawdata_2007_2016.dta
prepping.do This is a Stata do-file that prepares the rawdata_2007_2016.dta dataset for analysis. The cleaned dataset is named as data_2007_2016.dta
analysis.do This is a Stata do-file that conducts the analysis.
README_application.txt This file contains instructions on how to replicate the application.
Any questions or concerns about replication can be sent to Riju Joshi (riju@pdx.edu)
https://spdx.org/licenses/CC0-1.0.html
What is the relationship between environment and democracy? The framework of cultural evolution suggests that societal development is an adaptation to ecological threats. Pertinent theories assume that democracy emerges as societies adapt to ecological factors such as higher economic wealth, lower pathogen threats, less demanding climates, and fewer natural disasters. However, previous research confused within-country processes with between-country processes and erroneously interpreted between-country findings as if they generalize to within-country mechanisms. In this article, we analyze a time-series cross-sectional dataset to study the dynamic relationship between environment and democracy (1949-2016), accounting for previous misconceptions in levels of analysis. By separating within-country processes from between-country processes, we find that the relationship between environment and democracy not only differs by countries but also depends on the level of analysis. Economic wealth predicts increasing levels of democracy in between-country comparisons, but within-country comparisons show that democracy declines as countries become wealthier over time. This relationship is only prevalent among historically wealthy countries but not among historically poor countries, whose wealth also increased over time. By contrast, pathogen prevalence predicts lower levels of democracy in both between-country and within-country comparisons. Our longitudinal analyses identifying temporal precedence reveal that not only reductions in pathogen prevalence drive future democracy, but also democracy reduces future pathogen prevalence and increases future wealth. These nuanced results contrast with previous analyses using narrow, cross-sectional data. As a whole, our findings illuminate the dynamic process by which environment and democracy shape each other.
Methods: Our Time-Series Cross-Sectional data combine various online databases. Country names were first identified and matched using R-package “countrycode” (Arel-Bundock, Enevoldsen, & Yetman, 2018) before all datasets were merged. Occasionally, we modified unidentified country names to be consistent across datasets. We then transformed “wide” data into “long” data and merged them using R’s Tidyverse framework (Wickham, 2014). Our analysis begins with the year 1949, which was occasioned by the fact that one of the key time-variant level-1 variables, pathogen prevalence, was only available from 1949 on. See our Supplemental Material for all data, Stata syntax, R-markdown for visualization, supplemental analyses and detailed results (available at https://osf.io/drt8j/).
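The wide-to-long reshaping and country-year merge described in the Methods can be sketched as follows. The authors did this in R with the Tidyverse; the version below is a plain Python illustration under assumed inputs, and the function names, column names, and numbers are hypothetical.

```python
def wide_to_long(wide_rows, id_col, value_name):
    """Turn one row per country (one column per year) into country-year rows."""
    long_rows = []
    for row in wide_rows:
        for col, value in row.items():
            if col == id_col:
                continue
            long_rows.append(
                {id_col: row[id_col], "year": int(col), value_name: value}
            )
    return long_rows


def merge_long(left, right, keys=("country", "year")):
    """Inner-join two long tables on the given key columns."""
    index = {tuple(r[k] for k in keys): r for r in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged.append({**row, **match})
    return merged


# Hypothetical wide tables: one row per country, one column per year.
gdp_wide = [{"country": "A", "1949": 1.0, "1950": 1.1}]
democracy_wide = [{"country": "A", "1949": 0.3, "1950": 0.4}]

panel = merge_long(
    wide_to_long(gdp_wide, "country", "gdp"),
    wide_to_long(democracy_wide, "country", "democracy"),
)
```

Reshaping before merging means every source shares the same (country, year) key, which is why country names need to be made consistent across datasets first.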
This is a STATA file of merged and cleaned data from the SHED survey years 2017 through 2021.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data we use in this paper were gathered in the 6th round of Multiple Indicator Cluster Surveys (MICS6), which can be downloaded from https://mics.unicef.org/surveys. The MICS6 surveys are conducted by UNICEF (United Nations International Children's Emergency Fund). We merged the original data from 11 countries and saved the resulting user data in Stata format. In addition, the do-file for the analysis is also published here.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This tool--a simple csv or Stata file for merging--gives you a fast way to assign Census county FIPS codes to variously presented county names. This is useful for dealing with county names collected from official sources, such as election returns, which inconsistently present county names and often have misspellings. It will likely take less than ten minutes the first time, and about one minute thereafter--assuming all versions of your county names are in this file. There are about 3,142 counties in the U.S., and there are 77,613 different permutations of county names in this file (ave=25 per county, max=382). Counties with more likely permutations have more versions. Misspellings were added as I came across them over time. I DON'T expect people to cite the use of this tool. DO feel free to suggest the addition of other county name permutations.
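The lookup this tool enables can be sketched as a merge on state plus county-name variant. The snippet below is plain Python under assumptions: the actual file layout is not described here, so the (state, name) key, the `normalize` helper, the state "ZZ", and the code "99001" are all hypothetical.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def assign_fips(records, variants):
    """Attach a FIPS code to each record by looking up its county-name variant.

    `variants` maps (state, county name as written) to a FIPS code.
    Returns (matched, unmatched) so new spellings can be added to the list.
    """
    lookup = {
        (state, normalize(name)): fips for (state, name), fips in variants.items()
    }
    matched, unmatched = [], []
    for rec in records:
        fips = lookup.get((rec["state"], normalize(rec["county"])))
        if fips is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "fips": fips})
    return matched, unmatched


# Hypothetical variants for one made-up county.
variants = {
    ("ZZ", "St. Example County"): "99001",
    ("ZZ", "Saint Example"): "99001",
    ("ZZ", "St Example"): "99001",
}
records = [
    {"state": "ZZ", "county": "Saint  Example", "votes": 120},
    {"state": "ZZ", "county": "Exmaple", "votes": 80},
]
matched, unmatched = assign_fips(records, variants)
```

Whatever ends up in `unmatched` is exactly the set of spellings that would need to be added to the permutation file before the next run.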
Dataset, Stata codes, and other materials used to create figures and tables reported in the paper are included here. For more detailed descriptions of the files posted, please read "00 Description.pdf" first.
This data depository contains all experimental materials, data, and code for Spamann, Lawyers' Role-Induced Bias ... All experimental materials (i.e., exercise and survey instrument) are in the pdf file Spamann_experimentalmaterials_all.pdf. The dataset Newman.dta (Stata 14.2) contains the data collected. The Stata do-file Spamann_role_bias_code.do generates the three figures and other reported statistical information reported in the version of the paper originally posted to SSRN in May 2019. Spamann_role_bias_code_revised.do generates the four figures and other reported statistical information reported in the revision submitted to JLS in March 2020 and ultimately accepted by the journal. Both do-files use Newman.dta. Newman.dta is the result of merging 6 csv files generated by Qualtrics in each of the six semesters from students' survey responses. These 6 csv files, and the do-file rawdata_merge_clean.do to merge them, are also included.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The paper uses data from the BIBB/BAuA Employment Survey of the Working Population on Qualification and Working Conditions in Germany 2018, doi: 10.7803/501.18.1.1.10. The Survey was conducted by the Federal Institute for Vocational Education and Training (BIBB), and the Federal Institute for Occupational Safety and Health (BAuA). For further details, see https://www.bibb.de/de/65740.php and the BIBB-FDZ Data and methodological Report at https://www.bibb.de/veroeffentlichungen/de/publication/show/16563.
Data access was provided via a Scientific-Use-File (called ZA7574_v1-0-0.dta) of the Research Data Centre at the Federal Institute for Vocational Education and Training (BIBB-FDZ). The data are confidential, but not exclusive. To apply for data access, please follow the instructions at https://www.bibb.de/de/120401.php.
To replicate the results reported in the paper, access to this data set must be obtained from the data provider.
The Stata do-file “ik_replication.do” (also available as txt-file “ik_replication.txt”) replicates all results presented in the paper. It first uses the BIBB-BAuA source file “ZA7574_v1-0-0.dta” (see above) to generate and label all relevant variables, specifies the sample, and finally generates a working data set. In a second step, this working data set is used to generate the results. The analysis also makes use of several auxiliary data sets, which can be merged with the working data. These auxiliary data sets have been obtained and constructed from alternative data sources (which we make available as part of the replication package).
A. Google mobility report https://www.gstatic.com/covid19/mobility/2020-03-29_DE_Mobility_Report_en.pdf Google prepared this report to provide information on the responses to social distancing guidance related to COVID-19. We use information for the first weeks of the shutdown on mobility trend changes for places of work on March 29, 2020, relative to a baseline value. The respective numbers are already included in the do-file to replicate the results in the paper and the pdf-file is part of the replication folder (see Source Files/2020-03-29_DE_Mobility_Report_en.pdf).
B. Unemployment across occupations – Data files ba_jul.dta / ba_jul.txt We use information from the report “Arbeitsmarkt nach Berufen” from July 2020 provided by the Federal Employment Agency (BA) to obtain yearly changes in unemployment for occupations at the three-digit level according to the occupation classification KldB 2010. The original file is part of the replication folder (see Source Files/berufe-heft-kldb2010-d-0-202007-xlsx). We use information from sheet 1.1 for the number of unemployed persons in July 2020 and 2019 and the respective difference. This information is merged to the working data using the data file “ba_jul.dta” (or “ba_jul.txt”). It contains the following variables: - kldb2010_3d: 3-digit KldB 2010 occupation code (also available in working data) - jul_2020: number of unemployed persons in July 2020 - jul_2019: number of unemployed persons in July 2019 - delta_abs_jul: difference between 2020 and 2019
C. Fadinger and Schymik (2020) – Data files wfh_sch.dta / wfh_sch.txt To generate Figure A.9 in the Appendix, we rely on estimates provided in recent work by Fadinger and Schymik (2020), who use an alternative measure for the WFH potential at the NUTS2 level. This information is merged to the working data using the data file “wfh_sch.dta” (or “wfh_sch.txt”). It contains the following variables: - GEO: Name of NUTS2 region - shr_homewk_pssb: Estimates on WFH share from Fadinger and Schymik (2020) - region: NUTS2 number (also available in working data)
D. Spatial Autocorrelation – Data files geo_data.dta / geo_data.txt To check for spatial autocorrelation across the 38 NUTS2 regions in Germany, we compute Moran’s I statistic which requires information on the longitude and latitude of NUTS2 regions. This information can be merged to the working data using the data file “geo_data.dta” (or “geo_data.txt”). It contains the following variables: - nuts_id: NUTS2 code - region: NUTS2 number (also available in working data) - longitude: Longitude position - latitude: Latitude position
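For readers unfamiliar with Moran's I, a minimal sketch of the statistic follows. It is plain Python, not the replication do-file, and the weighting scheme is an assumption: the description above only says that longitude and latitude are needed, so the inverse-distance helper (using straight-line distance on raw coordinates) is one possible choice among several.

```python
import math


def inverse_distance_weights(coords):
    """Build a spatial weight matrix w_ij = 1 / distance(i, j), zero diagonal.

    Assumption: straight-line distance on the raw coordinates; all points
    must be distinct, otherwise the distance is zero and division fails.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = coords[i][0] - coords[j][0]
                dy = coords[i][1] - coords[j][1]
                weights[i][j] = 1.0 / math.hypot(dx, dy)
    return weights


def morans_i(values, weights):
    """Moran's I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,

    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Toy check: four regions in a row, each neighboring only the adjacent ones.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_i = morans_i([1, 2, 3, 4], chain_weights)

# Weights derived from two hypothetical coordinate pairs.
toy_weights = inverse_distance_weights([(0.0, 0.0), (3.0, 4.0)])
```

Values well above zero indicate that neighboring regions tend to have similar values; the result depends on the chosen weights, so the weighting scheme should be reported alongside the statistic.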
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This contains three datasets that need to be merged, and the Stata do file to code the variables and replicate the results in the main document and the SI document.
This file describes the replication material for: Trajectories of mental health problems in childhood and adult voting behaviour: Evidence from the 1970s British Cohort Study. Authors: Lisa-Christine Girard & Martin Okolikj. Accepted in Political Behavior. This dataverse holds the following 4 replication files: 1. data_cleaning_traj.R - This file is designed to load, merge and clean the datasets for the estimation of trajectories along with the rescaling of the age 10 Rutter scale. This file was prepared using R-4.1.1 version. 2. traj_estimation.do - With the dataset merged from data_cleaning_traj.R, we run this file in STATA to create and estimate trajectories, to be included in the full dataset. This file was prepared using STATA 17.0 version. 3. data_cleaning.R - This is the file designed to load, merge and clean all datasets in one for preparation of the main analysis following the trajectory estimation. This file was prepared using R-4.1.1 version. 4. POBE Analysis.do - The analysis file is designed to generate the results from the tables in the published paper along with all supplementary materials. This file was prepared using STATA 17.0 version. The data can be accessed at the following address. It requires user registration under special licence conditions: http://discover.ukdataservice.ac.uk/series/?sn=200001. If you have any questions or spot any errors please contact g.lisachristine@gmail.com or martin.okolic@gmail.com.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This repository contains all replication materials for "Reexamining the Effect of Mass Shootings on Public Support for Gun Control" by David J. Barney and Brian F. Schaffner. The data included are as follows: 3 CCES panel datasets, supplementary data for merging, 2 fully-merged CCES panel datasets. In addition, we include two .do files of Stata code, one of which prepares the data for analysis, and one of which replicates the analyses presented in our paper.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Over the course of 2015, about 800,000 refugees arrived in Germany, a number which equals around one percent of the total population. This migration process was labelled the refugee crisis and was accompanied by a contested debate. On the one hand, there was a widespread willingness to voluntarily help arriving refugees; on the other hand, the number of xenophobic attacks against refugees drastically increased. Our paper focuses on a specific form of xenophobic violence with a strong symbolic meaning: We analyze how arson attacks against collective accommodation facilities spread. Using a comprehensive web chronicle, we collected temporal and spatial data about arson attacks perpetrated on accommodations or facilities for refugees in Germany between 2015 and 2017. We counted 251 attacks, assigned each incident location to its county, merged county characteristics such as population size, proportion of foreigners, and right-wing party support, and, going beyond previous research, added geographically coded media data from two digital archives. Besides newspaper contents of a popular nation-wide tabloid, we use a database that covers local fake news on refugees. Based on these data, we constructed a balanced panel data set with the counties as geographical units and periods of 14 days as the time dimension. Results indicate that social contagion drives the diffusion process of arson attacks. Spatial proximity of previous attacks increased the propensity of attacks in the neighboring counties. Attacks were more likely to occur in counties with larger populations and fewer foreigners. While local newspaper coverage did not impact the diffusion of xenophobic attacks, fake news was relevant, but only in East Germany. We also considered two particularly salient threatening events that received nation-wide media attention, namely Merkel’s “border opening” on the 5th of September 2015 and the sexual assaults occurring during New Year’s 2015/16 in Cologne. 
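The balanced county-by-period panel mentioned above can be illustrated with a short sketch. It is plain Python and not the authors' code; the function name, the county labels, and the dates are hypothetical, and only the 14-day period length comes from the description.

```python
from datetime import date


def build_balanced_panel(counties, attacks, start, end, period_days=14):
    """Count attacks per county and period, keeping zero-count cells.

    `attacks` is a list of (county, date) pairs. Every county gets one row
    for every period between `start` and `end`, which is what makes the
    panel balanced.
    """
    n_periods = (end - start).days // period_days + 1
    counts = {(c, p): 0 for c in counties for p in range(n_periods)}
    for county, day in attacks:
        if start <= day <= end:
            period = (day - start).days // period_days
            if (county, period) in counts:  # ignore counties not in the list
                counts[(county, period)] += 1
    return [
        {"county": c, "period": p, "attacks": counts[(c, p)]}
        for c in counties
        for p in range(n_periods)
    ]


# Hypothetical incidents in two made-up counties over three 14-day periods.
attacks = [
    ("A", date(2015, 1, 5)),
    ("A", date(2015, 1, 20)),
    ("A", date(2015, 1, 22)),
    ("B", date(2015, 2, 3)),
]
panel = build_balanced_panel(["A", "B"], attacks, date(2015, 1, 1), date(2015, 2, 11))
```

Starting from a full grid of zeros, rather than from the incident list, is what guarantees that county-periods without any attack still appear in the data.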
Both were followed by temporary increases in violence.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is replication code for "Citizenship Question Effects on Household Survey Response". The SAS program creates housing unit-level variables for the housing units in the 2019 Census Test. The Stata program merges these data together with the 2019 Census Test data, creates some additional variables, and produces the analysis in the paper.