100+ datasets found
  1. DHS data extractors for Stata

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Emily Oster (2023). DHS data extractors for Stata [Dataset]. http://doi.org/10.7910/DVN/RRX3QD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Emily Oster
    Description

    This package contains two files designed to help read individual-level DHS data into Stata. The first file addresses the problem that versions of Stata before Version 7/SE read in at most 2047 variables, while most of the individual files have more variables than that. It reads in the .do, .dct, and .dat files and outputs new .do and .dct files containing only the subset of variables specified by the user. The second file deals with earlier DHS surveys for which .do and .dct files do not exist and only .sps and .sas files are provided. It reads in the .sas and .sps files and outputs a .dct and a .do file. If necessary, the first file can then be run again to select a subset of variables.
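As a rough illustration of the subsetting idea described above, the Python sketch below filters the variable entries of a Stata dictionary (.dct) file down to a user-chosen list. It is not the package's own code. The dictionary layout (one `_column(...)` entry per line, with the variable name as the third token), the sample variable names, and the function name `subset_dct` are all assumptions made for the example.

```python
def subset_dct(dct_text, keep):
    """Keep only the dictionary entries whose variable name is in `keep`.

    Assumed entry layout (hypothetical, real .dct files may differ):
        _column(1) str8 caseid %8s "Case identification"
    Lines that are not variable entries (header, closing brace) pass through.
    """
    keep = set(keep)
    out = []
    for line in dct_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].startswith("_column("):
            # tokens: _column(n), storage type, variable name, format, label
            if len(tokens) >= 3 and tokens[2] in keep:
                out.append(line)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical three-variable dictionary used only for demonstration.
example = """infile dictionary using survey.dat {
    _column(1) str8 caseid %8s "Case identification"
    _column(9) int v001 %4f "Cluster number"
    _column(13) int v002 %4f "Household number"
}"""

print(subset_dct(example, ["caseid", "v002"]))
```

Because each entry carries its own column position in this assumed layout, dropping entries does not shift where the remaining variables are read from in the .dat file.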

  2. Data and Code for: An Empirical Evaluation of Chinese College Admissions...

    • openicpsr.org
    stata
    Updated Sep 7, 2020
    + more versions
    Cite
    Yan Chen; Ming Jiang; Onur Kesten (2020). Data and Code for: An Empirical Evaluation of Chinese College Admissions Reforms Through A Natural Experiment [Dataset]. http://doi.org/10.3886/E121101V2
    Explore at:
    Available download formats: stata
    Dataset updated
    Sep 7, 2020
    Dataset provided by
    University of Sydney
    University of Michigan
    Shanghai Jiao Tong University
    Authors
    Yan Chen; Ming Jiang; Onur Kesten
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2008 - 2009
    Area covered
    China
    Description

    This repository contains datasets and analysis code accompanying the paper "An Empirical Evaluation of Chinese College Admissions Reforms Through A Natural Experiment" by Chen, Jiang, and Kesten. The datasets contain the college admission data for a county in China's Sichuan Province for the years 2008 and 2009. These include students' submitted rank-ordered lists of colleges and admission results. All variables are recoded to remove any identifiable information (including college and high school codes). The analysis code can be used to replicate the tables and figures in the paper.

  3. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    - download the fixed-width file containing household, family, and person records
    - import by separating this file into three tables, then merge 'em together at the person-level
    - download the fixed-width file containing the person-level replicate weights
    - merge the rectangular person-level file with the replicate weights, then store it in a sql database
    - create a new variable - one - in the data table

    2012 asec - analysis examples.R
    - connect to the sql database created by the 'download all microdata' program
    - create the complex sample survey object, using the replicate weights
    - perform a boatload of analysis examples

    replicate census estimates - 2011.R
    - connect to the sql database created by the 'download all microdata' program
    - create the complex sample survey object, using the replicate weights
    - match the sas output shown in the png file 2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in the census-provided person replicate weights usage instructions document

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    - the census bureau's current population survey page
    - the bureau of labor statistics' current population survey page
    - the current population survey's wikipedia article

    notes:

    interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

    as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
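The "rectangular file" idea in this description (household- and family-level information attached to each person record) can be sketched in a few lines of Python. This is only an illustration of the merge, not the repository's R scripts. The field names (`hh_id`, `fam_id`, `state`, `fam_income`, and so on) and the sample records are hypothetical. The `one` column and the subtract-a-year rule come from the description itself.

```python
def rectangularize(households, families, persons):
    """Attach household- and family-level fields to each person record."""
    hh_by_id = {h["hh_id"]: h for h in households}
    fam_by_id = {(f["hh_id"], f["fam_id"]): f for f in families}
    rows = []
    for p in persons:
        row = dict(hh_by_id[p["hh_id"]])                  # household fields
        row.update(fam_by_id[(p["hh_id"], p["fam_id"])])  # family fields
        row.update(p)                                     # person fields win on name clashes
        row["one"] = 1                                    # constant column, handy for counting
        rows.append(row)
    return rows


def reference_year(file_year):
    """The file labeled 2012 describes income, work, and insurance during 2011."""
    return file_year - 1


# Hypothetical toy records, one dict per fixed-width line after parsing.
households = [{"hh_id": 1, "state": "MD"}, {"hh_id": 2, "state": "TX"}]
families = [{"hh_id": 1, "fam_id": 1, "fam_income": 52000},
            {"hh_id": 2, "fam_id": 1, "fam_income": 31000}]
persons = [{"hh_id": 1, "fam_id": 1, "person_id": 1, "age": 40},
           {"hh_id": 1, "fam_id": 1, "person_id": 2, "age": 38},
           {"hh_id": 2, "fam_id": 1, "person_id": 1, "age": 67}]

rectangular = rectangularize(households, families, persons)
print(len(rectangular), rectangular[0]["state"], reference_year(2012))
```

The result has exactly one row per person, which is what makes person-level replicate weights straightforward to merge on afterwards.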

  4. Record linkage using Stata

    • linkagelibrary.icpsr.umich.edu
    Updated Jan 3, 2019
    Cite
    Nada Wasi; Aaron Flaaen (2019). Record linkage using Stata [Dataset]. http://doi.org/10.3886/E107948V1
    Explore at:
    Dataset updated
    Jan 3, 2019
    Dataset provided by
    Board of Governors of the Federal Reserve System, Division of Research and Statistics
    University of Michigan/ISR
    Authors
    Nada Wasi; Aaron Flaaen
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This project points to an article in The Stata Journal describing a set of routines to preprocess nominal data (firm names and addresses), perform probabilistic linking of two datasets, and display candidate matches for clerical review. The ado files and supporting pattern files are downloadable within Stata.
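To make the preprocess, link, and review workflow concrete, here is a minimal Python sketch. It is not the authors' Stata routines, and it swaps in a plain string-similarity score from the standard library's `difflib` in place of true probabilistic linking. The suffix list, the 0.8 threshold, and the function names are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form words to strip before comparing firm names.
SUFFIXES = {"inc", "corp", "co", "ltd", "llc", "company",
            "corporation", "incorporated"}


def standardize_name(name):
    """Lowercase, drop punctuation, and remove common legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def candidate_matches(left, right, threshold=0.8):
    """Return (left, right, score) pairs at or above the threshold, best first."""
    pairs = []
    for a in left:
        for b in right:
            score = SequenceMatcher(
                None, standardize_name(a), standardize_name(b)
            ).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])


# Candidate pairs would then be shown to a person for clerical review.
for pair in candidate_matches(["Acme, Inc."], ["ACME Corp", "Zenith Ltd"]):
    print(pair)
```

Comparing every left record with every right record is quadratic, so a real linkage job would normally restrict comparisons to blocks of records that share some key.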

  5. Integrated Postsecondary Education Data System, Complete 1980-2023

    • datalumos.org
    Updated Feb 11, 2025
    Cite
    United States Department of Education. Institute of Education Sciences. National Center for Education Statistics (2025). Integrated Postsecondary Education Data System, Complete 1980-2023 [Dataset]. http://doi.org/10.3886/E218981V2
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset provided by
    United States Department of Education https://ed.gov/
    National Center for Education Statistics https://nces.ed.gov/
    Institute of Education Sciences http://ies.ed.gov/
    Authors
    United States Department of Education. Institute of Education Sciences. National Center for Education Statistics
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1980 - 2023
    Description

    Integrated Postsecondary Education Data System (IPEDS) Complete Data Files from 1980 to 2023. Includes data file, STATA data file, SPSS program, SAS program, STATA program, and dictionary. All years compressed into one .zip file due to storage limitations.

    Updated on 2/14/2025 to add Microsoft Access Database files.

    From IPEDS Complete Data File Help Page (https://nces.ed.gov/Ipeds/help/complete-data-files):

    Choose the file to download by reading the description in the available titles. Then, click on the link in that row corresponding to the column header of the type of file/information desired to download.

    To download and view the survey files in basic CSV format use the main download link in the Data File column.

    For files compatible with the Stata statistical software package, use the alternate download link in the Stata Data File column.

    To download files with the SPSS, SAS, or STATA (.do) file extension for use with statistical software packages, use the download link in the Programs column.

    To download the data Dictionary for the selected file, click on the corresponding link in the far right column of the screen. The data dictionary serves as a reference for using and interpreting the data within a particular survey file. This includes the names, definitions, and formatting conventions for each table, field, and data element within the file, important business rules, and information on any relationships to other IPEDS data.

    For statistical read programs to work properly, both the data file and the corresponding read program file must be downloaded to the same subdirectory on the computer’s hard drive. Download the data file first; then click on the corresponding link in the Programs column to download the desired read program file to the same subdirectory.

    When viewing downloaded survey files, categorical variables are identified using codes instead of labels. Labels for these variables are available in both the data read program files and data dictionary for each file; however, for files that automatically incorporate this information you will need to select the Custom Data Files option.
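Since the downloaded CSV files carry codes rather than labels, a small lookup step is usually needed before the data are readable. The Python sketch below shows one way to apply such a lookup. The column names, code values, and labels are hypothetical placeholders, not entries taken from an IPEDS data dictionary.

```python
import csv
import io


def apply_labels(rows, labels):
    """Replace coded values with labels; unknown codes are left unchanged."""
    labelled = []
    for row in rows:
        new_row = dict(row)
        for var, mapping in labels.items():
            if var in new_row:
                new_row[var] = mapping.get(new_row[var], new_row[var])
        labelled.append(new_row)
    return labelled


# Hypothetical survey file: an institution id plus one coded categorical column.
data = io.StringIO("inst_id,control_code\n100001,1\n100002,2\n100003,9\n")
rows = list(csv.DictReader(data))

# Hypothetical code-to-label mapping, as it might be copied from a dictionary file.
labels = {"control_code": {"1": "Public", "2": "Private"}}

for row in apply_labels(rows, labels):
    print(row)
```

Leaving unmapped codes untouched, rather than blanking them, makes it easier to spot values that the dictionary does not cover.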

  6. Replication data, and data sources

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 2, 2022
    Cite
    Fabio Diaz Pabon (2022). Replication data, and data sources [Dataset]. http://doi.org/10.7910/DVN/ZCSH4I
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 2, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Fabio Diaz Pabon
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These are the different datasets used in the analysis of the relation between protest, protest campaigns, and armed conflict in Colombia and South Africa. Different files are included:
    1. An Excel file containing a description of the different variables and their sources.
    2. A Stata file of the data (appended data) and a Stata file for each hypothesis.
    3. A do-file used for the statistical analysis.

  7. Stata et l'analyse de données en sciences sociales

    • search.dataone.org
    Updated Nov 1, 2025
    Cite
    Ferland, Benjamin (2025). Stata et l'analyse de données en sciences sociales [Dataset]. http://doi.org/10.7910/DVN/7LYPFN
    Explore at:
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Ferland, Benjamin
    Description

    Materials needed to reproduce all of the results presented in the textbook "Stata et l'analyse de données en sciences sociales".

  8. Data from: The impact of state television on voter turnout

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Feb 28, 2017
    Cite
    Rune Jørgen Sørensen (2017). The impact of state television on voter turnout [Dataset]. http://doi.org/10.7910/DVN/QGMHHQ
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 28, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Rune Jørgen Sørensen
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    September 1, 2016

    REPLICATION FILES FOR «THE IMPACT OF STATE TELEVISION ON VOTER TURNOUT», TO BE PUBLISHED BY THE BRITISH JOURNAL OF POLITICAL SCIENCE

    The replication files consist of two datasets and corresponding STATA do-files. Please note the following:

    1. The data used in the current microanalysis are based on the National Election Surveys of 1965, 1969, and 1973. The Institute of Social Research (ISF) was responsible for the original studies, and data was made available by the NSD (Norwegian Center for Research Data). Neither ISF nor NSD is responsible for the analyses/interpretations of the data presented here.
    2. Some of the data used in the municipality-level analyses are taken from NSD’s local government database (“Kommunedatabasen”). The NSD is not responsible for the analysis presented here or the interpretation offered in the BJPS paper.
    3. Note that the municipality identification has been anonymized to avoid identification of individual respondents.
    4. Most of the analyses generate Word files that are produced by the outreg2 facility in STATA. These tables can be compared with those presented in the paper. The graphs are directly comparable to those in the paper. In a few cases, the results are only generated in the STATA output window.

    The paper employs two sets of data:

    I. Municipal-level data entered in STATA format (AggregateReplicationTVData.dta), with a corresponding dataset containing map coordinates (muncoord.dta). The STATA code is in a do-file (ReplicationOfAggregateAnalysis.do).
    II. The survey data is in a STATA file (ReplicationofIndividualLevelPanel.dta) with a corresponding do-file (ReplicationOfIndividualLevelAnalysis 25.08.2016.do).

    Please remember to change the file reference (i.e. use-statement) to execute the do-files.

  9. Climate Change Has Already Made the United States Poorer

    • openicpsr.org
    Updated Nov 14, 2025
    Cite
    Derek Lemoine (2025). Climate Change Has Already Made the United States Poorer [Dataset]. http://doi.org/10.3886/E240311V1
    Explore at:
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    University of Arizona
    Authors
    Derek Lemoine
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    code.zip: Stata code. See the readme in there.
    data.zip: Files called in the Stata code, some of which are also generated by the Stata code.
    outputdir.zip: Complete set of results files from running Stata code.

    Paper abstract: The climate is already changing. The present study shows that these changes have already affected the U.S. economy. It develops a formal framework that accounts for how climate change has affected each county's economy by altering current and past weather, both locally and elsewhere around the country. The results show that climate change is already reducing annual U.S. income by 0.32% [95% confidence interval: -0.17--0.82%] by altering counties' current, local temperatures, with losses concentrated in the Great Plains and Midwest. Accounting for effects on past temperatures and on temperatures in other counties increases income losses to 12% [2.0--22%] and makes them more widely distributed, with suggestive evidence that trade networks propagate effects around the U.S. Central estimates can change with different indices of nonlocal weather or models of cross-county heterogeneity. Calculations like those developed here could be updated annually as a way of measuring and communicating the progress of climate change.

  10. Data from: SPSS, STATA, and SAS: Flavours of Statistical Software

    • search.dataone.org
    Updated Dec 28, 2023
    Cite
    Michelle Edwards (2023). SPSS, STATA, and SAS: Flavours of Statistical Software [Dataset]. http://doi.org/10.5683/SP3/E3CZEC
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Michelle Edwards
    Description

    This workshop takes you on a quick tour of Stata, SPSS, and SAS. It examines a data file using each package. Is one more user friendly than the others? Are there significant differences in the codebooks created? This workshop also looks at creating a frequency and cross-tabulation table in each. Which output screen is easiest to read and interpret? The goal of this workshop is to give you an overview of these products and provide you with the information you need to determine which package fits the requirements of you and your user.

  11. Stata commands and data to replicate Park's "Public Education Funding Cuts...

    • openicpsr.org
    Updated Oct 6, 2025
    Cite
    Jiwon Park (2025). Stata commands and data to replicate Park's "Public Education Funding Cuts and Enrollment Shift to Private Schools: Evidence from the Great Recession" [Dataset]. http://doi.org/10.3886/E238646V1
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    KIF
    Authors
    Jiwon Park
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Stata commands and data to replicate Park's "Public Education Funding Cuts and Enrollment Shift to Private Schools: Evidence from the Great Recession"

    Abstract: This paper examines whether public school funding affects private school enrollment. To identify causal effects, we exploit the fact that states historically more reliant on state appropriations and those without a state income tax experienced larger K-12 funding cuts after the Great Recession. These fiscal characteristics provide plausibly exogenous variation in public school resources. We find that a $1,000 decrease in per-pupil funding increases private school enrollment by 0.48 to 0.57 percentage points. The effect is strongest among middle- and upper-middle-income households, suggesting that budget cuts to public education may exacerbate socioeconomic inequality in educational opportunities.

    Keywords: Private school, K-12 appropriations, Great Recession
    JEL Classification: H75, I21, I22, I28

  12. Data and Code for: Labor Market Inequality And The Changing Life Cycle...

    • openicpsr.org
    delimited, sas
    Updated Apr 9, 2024
    Cite
    Richard Blundell; Hugo Lopez; James Ziliak (2024). Data and Code for: Labor Market Inequality And The Changing Life Cycle Profile Of Male And Female Wages [Dataset]. http://doi.org/10.3886/E200241V1
    Explore at:
    Available download formats: delimited, sas
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    American Economic Association
    Authors
    Richard Blundell; Hugo Lopez; James Ziliak
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1976 - 2018
    Area covered
    United States
    Description

    The code in this replication package assembles the data needed and replicates the analysis of the paper “Labor Market Inequality and the Changing Life Cycle Profile of Male and Female Wages,” by Richard Blundell, Hugo Lopez, and James P. Ziliak. The first file is a Stata file 0_InstallPackages.do which installs a number of plug-in ADO files needed for successful execution (only run once). The second is also a Stata file 1_FullDataPrep.do which calls a number of Stata DO files to compile all the necessary data files and prepares the data for the analysis. The resulting Stata dataset, RunningData_withtaxsim.dta, is found in the replication file /ProcessedData/, and because Matlab relies on csv files, the resulting Matlab input files are at /ProcessedData/MatlabDataInputFiles/. The replicator should expect the code to run for about 3 hours. Then, the Matlab file a2_QuantileEstimation.m should be executed. The parameter estimates reported in the figures and tables come from a Windows desktop version that takes about 5 hours for each of four subsamples for each model specification. Due to the computational complexity of the bootstrap quantile with selection estimator, we made use of a computing cluster with a SLURM job scheduler. There are 8 Matlab bootstrap programs: four to produce standard errors in Tables 1-4 of the manuscript and four to produce standard errors for Supplemental Appendix Tables D1-D4. These bootstrap computations were submitted in parallel (i.e. all at once as separate programs) and each took on average 6 days when running on 4 cores. Then the Stata file 3_Figures&Tables.do should be run to produce all 9 figures (30 in the appendix) and 4 tables (7 in the appendix).

  13. Multilevel modeling of time-series cross-sectional data reveals the dynamic...

    • data.niaid.nih.gov
    • dataone.org
    • +1 more
    zip
    Updated Mar 6, 2020
    Cite
    Kodai Kusano (2020). Multilevel modeling of time-series cross-sectional data reveals the dynamic interaction between ecological threats and democratic development [Dataset]. http://doi.org/10.5061/dryad.547d7wm3x
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    University of Nevada, Reno
    Authors
    Kodai Kusano
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    What is the relationship between environment and democracy? The framework of cultural evolution suggests that societal development is an adaptation to ecological threats. Pertinent theories assume that democracy emerges as societies adapt to ecological factors such as higher economic wealth, lower pathogen threats, less demanding climates, and fewer natural disasters. However, previous research confused within-country processes with between-country processes and erroneously interpreted between-country findings as if they generalize to within-country mechanisms. In this article, we analyze a time-series cross-sectional dataset to study the dynamic relationship between environment and democracy (1949-2016), accounting for previous misconceptions in levels of analysis. By separating within-country processes from between-country processes, we find that the relationship between environment and democracy not only differs by countries but also depends on the level of analysis. Economic wealth predicts increasing levels of democracy in between-country comparisons, but within-country comparisons show that democracy declines as countries become wealthier over time. This relationship is only prevalent among historically wealthy countries but not among historically poor countries, whose wealth also increased over time. By contrast, pathogen prevalence predicts lower levels of democracy in both between-country and within-country comparisons. Our longitudinal analyses identifying temporal precedence reveal that not only reductions in pathogen prevalence drive future democracy, but also democracy reduces future pathogen prevalence and increases future wealth. These nuanced results contrast with previous analyses using narrow, cross-sectional data. As a whole, our findings illuminate the dynamic process by which environment and democracy shape each other.

    Methods

    Our Time-Series Cross-Sectional data combine various online databases. Country names were first identified and matched using R-package “countrycode” (Arel-Bundock, Enevoldsen, & Yetman, 2018) before all datasets were merged. Occasionally, we modified unidentified country names to be consistent across datasets. We then transformed “wide” data into “long” data and merged them using R’s Tidyverse framework (Wickham, 2014). Our analysis begins with the year 1949 because one of the key time-variant level-1 variables, pathogen prevalence, was only available from 1949 on. See our Supplemental Material for all data, Stata syntax, R-markdown for visualization, supplemental analyses and detailed results (available at https://osf.io/drt8j/).
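Separating within-country from between-country variation is often done by splitting each predictor into a country mean and a yearly deviation from that mean. The Python sketch below shows that generic group-mean centering on long-format data. It is an assumption about how such a split can be computed, not the authors' specification (their Stata syntax is in the Supplemental Material), and the variable names and values are invented.

```python
from collections import defaultdict


def within_between(rows, unit="country", var="wealth"):
    """Add <var>_between (unit mean) and <var>_within (deviation from it)."""
    by_unit = defaultdict(list)
    for r in rows:
        by_unit[r[unit]].append(r[var])
    means = {u: sum(v) / len(v) for u, v in by_unit.items()}
    out = []
    for r in rows:
        between = means[r[unit]]
        out.append({**r,
                    var + "_between": between,
                    var + "_within": r[var] - between})
    return out


# Hypothetical long-format panel: one row per country-year.
panel = [
    {"country": "A", "year": 1949, "wealth": 1.0},
    {"country": "A", "year": 1950, "wealth": 2.0},
    {"country": "A", "year": 1951, "wealth": 3.0},
    {"country": "B", "year": 1949, "wealth": 10.0},
    {"country": "B", "year": 1950, "wealth": 20.0},
]

for row in within_between(panel):
    print(row)
```

Entering both components in a multilevel model lets the between-country and within-country coefficients differ, which is the distinction the abstract emphasizes.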

  14. Replication data for "Pandemics Depress the Economy, Public Health...

    • openicpsr.org
    delimited
    Updated Aug 31, 2022
    Cite
    Sergio Correia; Stephan Luck; Emil Verner (2022). Replication data for "Pandemics Depress the Economy, Public Health Interventions Do Not: Evidence from the 1918 Flu" [Dataset]. http://doi.org/10.3886/E179061V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Aug 31, 2022
    Dataset provided by
    Board of Governors of the Federal Reserve System
    Massachusetts Institute of Technology
    Federal Reserve Bank of New York
    Authors
    Sergio Correia; Stephan Luck; Emil Verner
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1917 - 1920
    Area covered
    United States
    Description

    This is the replication package for the paper "Pandemics Depress the Economy, Public Health Interventions Do Not: Evidence from the 1918 Flu". It contains the input data as well as all the required Stata code. Please see the README.PDF file for replication instructions.

  15. Data from: Time Use, College Attainment, and The Working-from-Home...

    • openicpsr.org
    Updated Apr 18, 2024
    Cite
    Benjamin Cowan (2024). Time Use, College Attainment, and The Working-from-Home Revolution [Dataset]. http://doi.org/10.3886/E201001V1
    Explore at:
    Dataset updated
    Apr 18, 2024
    Dataset provided by
    Washington State University
    Authors
    Benjamin Cowan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The file atus._00014.do is the Stata .do file that reads in the American Time Use Survey (ATUS) data pulled from IPUMS (atus_00014.dat). The file public_do_file reads in the Stata ATUS data (atus_school_closure.dta) and adds data on state-level school closures (schoolclosure20-21_st) and occupational telework potential (teleworkable-cps). See the text of the paper for more details on the sources for these datasets. public_do_file then goes on to perform all analyses in the paper.

  16. Galvanising the Open Access Community: A Study on the Impact of Plan S -...

    • zenodo.org
    bin, csv
    Updated Oct 15, 2024
    Cite
    W. Benedikt Schmal (2024). Galvanising the Open Access Community: A Study on the Impact of Plan S - Data and Code [Dataset]. http://doi.org/10.5281/zenodo.12523229
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Scidecode
    Authors
    W. Benedikt Schmal
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the datasets and code underpinning Chapter 3 "Counterfactual Impact Evaluation of Plan S" of the report "Galvanising the Open Access Community: A Study on the Impact of Plan S" commissioned by the cOAlition S to scidecode science consulting.

    Two categories of files are part of this repository:

    1. Datasets

    The 21 CSV source files contain the subsets of publications funded by the funding agencies that are part of this study. These files have been provided by OA.Works, with whom scidecode has collaborated for the data collection process. Data sources and collection and processing workflows applied by OA.Works are described on their website and specifically at https://about.oa.report/docs/data.

    The file "plan_s.dta" is the aggregated data file stored in the format ".dta", which can be accessed with STATA by default or with plenty of programming languages using the respective packages, e.g., R or Python.

    2. Code files

    The associated code files that have been used to process the data files are:

    - data_prep_and_analysis_script.do
    - coef_plots_script.R

    The first file has been used to process the CSV data files above for data preparation and analysis purposes. Here, data aggregation and data preprocessing are executed. Furthermore, all statistical regressions for the counterfactual impact evaluation are listed in this code file. The second code file "coef_plots_script.R" uses the computed results of the counterfactual impact evaluation to create the final graphic plots using the ggplot2 package.

    The first ".do" file has to be run in STATA; the second one (".R") requires the use of an integrated development environment for R.

    Further information is available in the final report and via the following URLs:
    https://www.coalition-s.org/
    https://scidecode.com/
    https://oa.works/
    https://openalex.org/
    https://sites.google.com/view/wbschmal
  17. Data from: A Cluster Randomized Controlled Trial of the Safe Public Spaces...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    National Institute of Justice (2025). A Cluster Randomized Controlled Trial of the Safe Public Spaces in Schools Program, New York City, 2016-2018 [Dataset]. https://catalog.data.gov/dataset/a-cluster-randomized-controlled-trial-of-the-safe-public-spaces-in-schools-program-ne-2016-f67d7
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Area covered
    New York
    Description

    This study tests the efficacy of an intervention, Safe Public Spaces (SPS), focused on improving the safety of public spaces in schools, such as hallways, cafeterias, and stairwells. Twenty-four schools with middle grades in a large urban area were recruited for participation, pair-matched, and then assigned to either treatment or control. The study comprises four components: an implementation evaluation, a cost study, an impact study, and a community crime study.

    Community-crime-study: The community crime study used juvenile arrest data from the New York Police Department (NYPD), available at https://data.cityofnewyork.us/Public-Safety/NYPD-Arrests-Data-Historic-/8h9b-rp9u. The data include all arrests for juvenile crime during the life of the intervention. The 12 matched schools were identified and geo-mapped using Quantum GIS (QGIS) 3.8 software. The 2010 US Census block groups in which the schools reside, along with neighboring block groups, were mapped into micro-areas. This resulted in 12 experimental school blocks and 11 control blocks in which the schools reside (two of the control schools were located in the same census block group). Additionally, neighboring blocks were geo-mapped into 70 experimental and 77 control adjacent block groups (see map). Finally, juvenile arrests were mapped into experimental and control areas. Arrest data were analyzed with the ARIMA time-series method in the Stata 15 statistical software package to compare the change in juvenile arrests in the experimental and control sites.

    Cost-study: For the cost study, information from the implementing organization (Engaging Schools) was combined with data from phone conversations and follow-up communications with staff in school sites to populate a Resource Cost Model. The Resource Cost Model Excel file will be provided for archiving. This file contains details on the staff time and materials allocated to the intervention, as well as the NYC prices in 2018 US dollars associated with each element. Prices were gathered from multiple sources, including actual NYC DOE data on salaries for position types for which these data were available and district salary schedules for the other staff types. Census data were used to calculate benefits.

    Impact-evaluation: The impact evaluation was conducted using data from the Research Alliance for New York City Schools. Among the core functions of the Research Alliance is building and maintaining a unique archive of longitudinal data on NYC schools to support ongoing research. Its agreement with the New York City Department of Education (NYC DOE) outlines the data it receives, the process it uses to obtain them, and the security measures that keep them safe.

    Implementation-study: The implementation study comprises the baseline survey and observation data. Interview transcripts are not archived.
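
The pair-matched assignment described above can be sketched as follows. This is an illustrative sketch only: the school labels, the pairing, and the coin-flip rule within each pair are assumptions made for demonstration, not the study's documented procedure.

```python
import random


def assign_within_pairs(pairs, seed=0):
    """Randomly assign one school per matched pair to treatment, the other to control."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    assignment = {}
    for first, second in pairs:
        treated = rng.choice((first, second))
        control = second if treated == first else first
        assignment[treated] = "treatment"
        assignment[control] = "control"
    return assignment


# Hypothetical labels: 24 schools arranged into 12 matched pairs.
pairs = [(f"school_{2 * i + 1}", f"school_{2 * i + 2}") for i in range(12)]
assignment = assign_within_pairs(pairs)

n_treatment = sum(1 for arm in assignment.values() if arm == "treatment")
n_control = sum(1 for arm in assignment.values() if arm == "control")
print(n_treatment, n_control)  # 12 12
```

Assigning within pairs guarantees equal arm sizes and keeps each treated school next to a comparable control, which is the usual motivation for pair matching.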

  18. B

    Analysis Data for "Identifying and characterizing pesticide use on 9,000...

    • borealisdata.ca
    • search.dataone.org
    Updated Aug 19, 2021
    Ashley Larsen; Sofie McComb; Claire Powers; Sofie McComb (2021). Analysis Data for "Identifying and characterizing pesticide use on 9,000 fields of organic agriculture" [Dataset]. http://doi.org/10.5683/SP2/K2OCWO
    Dataset updated
    Aug 19, 2021
    Dataset provided by
    Borealis
    Authors
    Ashley Larsen; Sofie McComb; Claire Powers; Sofie McComb
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Abstract: We identify the location of ~9,000 organic fields from 2013 to 2019 using field-level crop and pesticide use data, along with state certification data, for Kern County, CA, one of the most valuable crop-producing counties in the US. We parse apart how being organic relative to conventional affects decisions to spray pesticides and, if spraying, how much to spray. We show the expected probability of spraying any pesticides is reduced by about 30 percentage points for organic relative to conventional fields, across different metrics of pesticide use including overall weight applied and coarse ecotoxicity metrics. We report little difference, on average, in pesticide use for organic and conventional fields that do spray, though we observe substantial crop-specific heterogeneity.

    Methods: Please see the description in the manuscript and supplementary information.

    Usage notes: Please see the README. The Stata code file is a supplementary data file associated with the manuscript. As noted in the README, missing values are represented by empty cells, per the syntax for Stata. See the README for an explanation of why different variables have missing data.
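
The missing-value convention noted above (empty cells) can be handled along these lines when the data are read outside Stata. This is a minimal sketch assuming a CSV export with a header row; the column names "field_id" and "kg_applied" are hypothetical, and the real variables are documented in the README.

```python
import csv
import io

# Hypothetical two-column extract; the second record has an empty (missing) cell.
raw = "field_id,kg_applied\n1,2.5\n2,\n3,0\n"


def read_with_missing(text):
    """Parse CSV text, mapping empty cells to None so they are not confused with zero."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({key: (value if value != "" else None) for key, value in record.items()})
    return rows


rows = read_with_missing(raw)
missing = [row["field_id"] for row in rows if row["kg_applied"] is None]
print(missing)  # ['2']
```

Keeping missing cells as `None` rather than coercing them to 0 matters here, because a field that sprayed nothing and a field with no recorded value are different cases.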

  19. H

    Replication Data for: Estimating Incumbency Effects Using Regression...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jan 8, 2019
    BK Song (2019). Replication Data for: Estimating Incumbency Effects Using Regression Discontinuity Design [Dataset]. http://doi.org/10.7910/DVN/JSOWUR
    Dataset updated
    Jan 8, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    BK Song
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This is the replication data for "Estimating Incumbency Effects Using Regression Discontinuity Design." The data are in dta format and Stata ado- and do-files are included. (2018-09-30)

  20. o

    Inconvenient Truths About Logistic Regression

    • openicpsr.org
    Updated Dec 10, 2022
    Michael Howell-Moroney (2022). Inconvenient Truths About Logistic Regression [Dataset]. http://doi.org/10.3886/E183501V1
    Dataset updated
    Dec 10, 2022
    Dataset provided by
    University of Memphis
    Authors
    Michael Howell-Moroney
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data deposit contains .do files and a data file to replicate the analysis.

    Stata .do Files
    - ME Sim.do is a Stata .do file that creates and runs data simulations. (No data file required)
    - Stata Code for CPS data and Analysis.do is a Stata .do file that creates variables for analysis and runs the regressions in the paper.

    Data File
    - CPS 2017 Volunteering and Civic Life Supplement
