Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).
The variables contained therein are defined as follows:
case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.
We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).
Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
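The dataset is created by a Stata do-file; purely as a loose illustration of the panel's structure (toy rates and a plausible construction of the period indicators, not the paper's code), a comparable dummy panel can be sketched in Python:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

rows = []
for patid in range(1, 11):          # 10 dummy patients
    case = 1 if patid <= 5 else 0   # first five are "cases", rest "controls"
    burden = patid % 2              # alternate multimorbidity burden group
    for t in range(10):             # time_period 0 (10 months pre-diagnosis) .. 9 (diagnosis month)
        # assumed pattern: cases consult more often as diagnosis approaches
        rate = 2 + (0.3 * t if case else 0)
        rows.append({"patid": patid, "case": case, "burden": burden,
                     "time_period": t, "ncons": rng.poisson(rate)})

panel = pd.DataFrame(rows)

# period0..period9: one indicator per candidate inflection month; here we
# assume an "at or after month t" coding, which the do-file may define differently
for t in range(10):
    panel[f"period{t}"] = (panel["time_period"] >= t).astype(int)
```

A negative binomial regression of ncons on these indicators (interacted with case status) would then test which aggregation period contains the inflection point.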
analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png - statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
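the repository does the fixed-width import in r via parse.SAScii; as a language-neutral illustration of what "importing a fixed-width file" means (toy column positions, not the nber layout), the same step looks like this in python:

```python
import io
import pandas as pd

# a tiny fake fixed-width extract standing in for cps-asec person records:
# household id in columns 0-4, person number in 5-6, age in 7-9
raw = io.StringIO(
    "00001 1 34\n"
    "00001 2 31\n"
    "00002 1 67\n"
)

# column positions would normally be parsed out of the nber sas script
colspecs = [(0, 5), (5, 7), (7, 10)]
people = pd.read_fwf(raw, colspecs=colspecs, names=["hhid", "pernum", "age"])
```

the real scripts do this at scale and then push the resulting tables into a sql database via RSQLite.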
September 1, 2016. REPLICATION FILES FOR «THE IMPACT OF STATE TELEVISION ON VOTER TURNOUT», TO BE PUBLISHED BY THE BRITISH JOURNAL OF POLITICAL SCIENCE. The replication files consist of two datasets and corresponding Stata do-files. Please note the following: 1. The data used in the current microanalysis are based on the National Election Surveys of 1965, 1969, and 1973. The Institute of Social Research (ISF) was responsible for the original studies, and the data were made available by the NSD (Norwegian Center for Research Data). Neither ISF nor NSD is responsible for the analyses/interpretations of the data presented here. 2. Some of the data used in the municipality-level analyses are taken from NSD's local government database ("Kommunedatabasen"). The NSD is not responsible for the analysis presented here or the interpretation offered in the BJPS paper. 3. The municipality identification has been anonymized to avoid identification of individual respondents. 4. Most of the analyses generate Word files produced by the outreg2 facility in Stata. These tables can be compared with those presented in the paper. The graphs are directly comparable to those in the paper. In a few cases, the results are only shown in the Stata output window. The paper employs two sets of data: I. Municipality-level data in Stata format (AggregateReplicationTVData.dta), with a corresponding dataset of map coordinates (muncoord.dta). The Stata code is in a do-file (ReplicationOfAggregateAnalysis.do). II. Survey data in a Stata file (ReplicationofIndividualLevelPanel.dta), with a corresponding do-file (ReplicationOfIndividualLevelAnalysis 25.08.2016.do). Please remember to change the file reference (i.e. the use statement) to execute the do-files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The code in this replication package assembles the data needed and replicates the analysis of the paper “Labor Market Inequality and the Changing Life Cycle Profile of Male and Female Wages,” by Richard Blundell, Hugo Lopez, and James P. Ziliak. The first file is a Stata file, 0_InstallPackages.do, which installs a number of plug-in ADO files needed for successful execution (only run once). The second is also a Stata file, 1_FullDataPrep.do, which calls a number of Stata do-files to compile all the necessary data files and prepares the data for the analysis. The resulting Stata dataset, RunningData_withtaxsim.dta, is found in the replication file /ProcessedData/, and because Matlab relies on csv files, the resulting Matlab input files are at /ProcessedData/MatlabDataInputFiles/. The replicator should expect the code to run for about 3 hours. Then, the Matlab file a2_QuantileEstimation.m should be executed. The parameter estimates reported in the figures and tables come from a Windows desktop version that takes about 5 hours for each of four subsamples for each model specification. Due to the computational complexity of the bootstrap quantile with selection estimator, we made use of a computing cluster with a SLURM job scheduler. There are 8 Matlab bootstrap programs: four produce standard errors for Tables 1-4 of the manuscript and four produce standard errors for Supplemental Appendix Tables D1-D4. These bootstrap computations were submitted in parallel (i.e. all at once as separate programs) and each took on average 6 days when running on 4 cores. Then the Stata file 3_Figures&Tables.do should be run to produce all 9 figures (30 in the appendix) and 4 tables (7 in the appendix).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Canada Trademarks Dataset
18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303
Dataset Selection and Arrangement (c) 2021 Jeremy Sheff
Python and Stata Scripts (c) 2021 Jeremy Sheff
Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.
This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.
Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.
Terms of Use:
As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.
The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:
The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.
Details of Repository Contents:
This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:
If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.
The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
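The repository's own iterparse scripts implement this XML-to-CSV conversion; the streaming pattern they are named after can be sketched as follows (hypothetical tag names standing in for CIPO's schema, which is defined in the data dictionary included in this repository):

```python
import csv
import io
import xml.etree.ElementTree as ET

# hypothetical tiny XML payload standing in for a CIPO archive file;
# the real element and field names come from CIPO's data dictionary
xml_data = io.BytesIO(b"""
<applications>
  <application><number>100001</number><status>Registered</status></application>
  <application><number>100002</number><status>Pending</status></application>
</applications>
""")

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["number", "status"])

# stream the file element by element so memory use stays flat
# even for multi-gigabyte archives
for event, elem in ET.iterparse(xml_data, events=("end",)):
    if elem.tag == "application":
        writer.writerow([elem.findtext("number"), elem.findtext("status")])
        elem.clear()  # free the parsed subtree before moving on
```

Incremental parsing is what makes a 70GB archive tractable on an ordinary desktop: the document is never held in memory in full.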
With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.
The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.
This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This exercise dataset was created for researchers interested in learning how to use the models described in the "Handbook on Impact Evaluation: Quantitative Methods and Practices" by S. Khandker, G. Koolwal and H. Samad, World Bank, October 2009 (permanent URL http://go.worldbank.org/FE8098BI60). Public programs are designed to reach certain goals and beneficiaries. Methods to understand whether such programs actually work, as well as the level and nature of impacts on intended beneficiaries, are main themes of this book. Has the Grameen Bank, for example, succeeded in lowering consumption poverty among the rural poor in Bangladesh? Can conditional cash transfer programs in Mexico and Latin America improve health and schooling outcomes for poor women and children? Does a new road actually raise welfare in a remote area in Tanzania, or is it a "highway to nowhere?" This handbook reviews quantitative methods and models of impact evaluation. It begins by reviewing the basic issues pertaining to an evaluation of an intervention to reach certain targets and goals. It then focuses on the experimental design of an impact evaluation, highlighting its strengths and shortcomings, followed by discussions on various non-experimental methods. The authors also cover methods to shed light on the nature and mechanisms by which different participants are benefiting from the program. The handbook provides Stata exercises in the context of evaluating major microcredit programs in Bangladesh, such as the Grameen Bank. This dataset provides both the related Stata data files and the Stata programs.
This Stata dataset and accompanying Stata do-file were used to create the tables and supplementary analysis for the article "Overcoming Resource Competition Among Co-Ethnics: Elites, Endorsements, and Multiracial Support for Urban Distributive Policies."
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary: Fuel demand is shown to be influenced by fuel prices, people's income, and motorization rates. We explore the effect of electric vehicle motorization rates on gasoline demand using this panel dataset.
Files: dataset.csv - Panel dimensions are the Brazilian state (i) and year (t). The other columns are: gasoline sales per capita (ln_Sg_pc), prices of gasoline (ln_Pg) and ethanol (ln_Pe) and their lags, motorization rates of combustion vehicles (ln_Mi_c) and electric vehicles (ln_Mi_e), and GDP per capita (ln_gdp_pc). All variables are in natural logs, since coefficients of a log-log regression model can be read directly as demand elasticities.
adjacency.csv - The adjacency matrix used in interaction with electric vehicles' motorization rates to calculate spatial effects. At first, it follows a binary adjacency formula: for each pair of states i and j, the cell (i, j) is 0 if the states are not adjacent and 1 if they are. Then, each row is normalized to have sum equal to one.
regression.do - Series of Stata commands used to estimate the regression models of our study. dataset.csv must be imported to work, see comment section.
dataset_predictions.xlsx - Based on the estimations from Stata, we use this Excel file to make average predictions by year and by state. By including years beyond the last panel sample, we also forecast the model into the future and evaluate the effects of different policies that influence gasoline prices (taxation) and EV motorization rates (electrification). This file is primarily used to create images, but it can also be used to understand how the forecasting scenarios are set up.
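The row-normalization described for adjacency.csv can be sketched as follows (a toy three-state chain, not the actual Brazilian state graph):

```python
import numpy as np

# toy binary adjacency: state 0 borders 1; state 1 borders 0 and 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])

# row-normalize so each row sums to one (row-stochastic spatial weights);
# note a state with no neighbors would need special handling to avoid
# division by zero, which never arises for contiguous Brazilian states
W = A / A.sum(axis=1, keepdims=True)
```

Each entry W[i, j] is then the weight state j's EV motorization rate receives in the spatial interaction term for state i.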
Sources: Fuel prices and sales: ANP (https://www.gov.br/anp/en/access-information/what-is-anp/what-is-anp) State population, GDP and vehicle fleet: IBGE (https://www.ibge.gov.br/en/home-eng.html?lang=en-GB) State EV fleet: Anfavea (https://anfavea.com.br/en/site/anuarios/)
This file contains the Stata code used to produce tables and figures for "Shareholder Monitoring Through Voting: New Evidence from Proxy Contests". A pseudo dataset containing roughly 10% of the original data is also included.
Human Papilloma Virus (HPV) is a preventable cause of cervical cancer, the most common cancer among women in Uganda. The Uganda Ministry of Health has included the HPV vaccine in the free routine immunization schedule since 2015. Five years after this policy, we assessed the uptake of the HPV vaccine and associated socio-demographic factors among young women living in fishing communities in Central Uganda in 2020. We analyzed secondary data from 94 young women aged 9–25 years who were recruited from the two fishing communities (Kasenyi landing site and Koome Island) in a primary implementation study that aimed to promote awareness of maternal and childhood vaccines. We assessed uptake of the HPV vaccine as the proportion of participants who self-reported having ever received at least one dose of the HPV vaccine. We assessed the socio-demographic factors associated with HPV vaccine uptake using a modified Poisson regression model adjusted for clustering by study site in Stata version 17. This was a secondary analysis of data collected from a larger implementation project that aimed at increasing awareness of maternal vaccines in fishing communities of Wakiso and Mukono districts in Uganda. The data were extracted from the main datasets by the data manager; no identifying information was included. Data analysis was done in Stata version 17.0 (StataCorp, Texas, USA).

# Uptake of Human Papilloma Virus Vaccine among young women in fishing communities in Wakiso and Mukono districts, Uganda
This dataset is a Stata file (created in Stata version 17.0). It contains all the variables used in the analysis that gave rise to the findings reported in this manuscript. Variable descriptions and value labels are provided in the file.
The data are in the form of a Stata file with the extension ".dta" and can be opened in Stata software. The value for each variable is already defined with the respective value labels, and the variable descriptions are also included in this dataset.
Some variables in the data, such as actual age, religion, tribe, and number of children, were not included in this dataset to preserve confidentiality. However, data on these variables can be accessed by contacting the corresponding author.
Data can also be accessed by contacting the corresponding author at
The Current Population Survey Civic Engagement and Volunteering (CEV) Supplement is the most robust longitudinal survey about volunteerism and other forms of civic engagement in the United States. Produced by AmeriCorps in partnership with the U.S. Census Bureau, the CEV takes the pulse of our nation’s civic health every two years. The latest data was collected in September 2023. The CEV can generate reliable estimates at the national level, within states and the District of Columbia, and in the largest twelve Metropolitan Statistical Areas to support evidence-based decision making and efforts to understand how people make a difference in communities across the country. Click on "Export" to download and review an excerpt from the 2023 CEV Analytic Codebook that shows the variables available in the analytic CEV datasets produced by AmeriCorps. Click on "Show More" to download and review the following 2023 CEV data and resources provided as attachments: 1) 2023 CEV Dataset Fact Sheet – brief summary of technical aspects of the 2023 CEV dataset. 2) CEV FAQs – answers to frequently asked technical questions about the CEV 3) Constructs and measures in the CEV 4) 2023 CEV Analytic Data and Setup Files – analytic dataset in Stata (.dta), R (.rda), and Excel (.csv) formats, codebook for analytic dataset, and Stata code (.do) to convert raw dataset to analytic formatting produced by AmeriCorps 5) 2023 CEV Technical Documentation – codebook for raw dataset and full supplement documentation produced by U.S. Census Bureau 6) 2023 CEV Raw Data and Read In Files – raw dataset in Stata (.dta) format, Stata code (.do) and dictionary file (.dct) to read ASCII dataset (.dat) into Stata using layout files (.lis)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets and code underpinning Chapter 3 "Counterfactual Impact Evaluation of Plan S" of the report "Galvanising the Open Access Community: A Study on the Impact of Plan S" commissioned by the cOAlition S to scidecode science consulting.
Two categories of files are part of this repository:
1. Datasets
The 21 CSV source files contain the subsets of publications funded by the funding agencies that are part of this study. These files have been provided by OA.Works, with whom scidecode has collaborated for the data collection process. Data sources and collection and processing workflows applied by OA.Works are described on their website and specifically at https://about.oa.report/docs/data.
The file "plan_s.dta" is the aggregated data file, stored in Stata's ".dta" format. It can be opened in Stata directly, or in many programming languages, e.g., R or Python, using the respective packages.
2. Code files
The associated code files that have been used to process the data files are:
- data_prep_and_analysis_script.do
- coef_plots_script.R
The first file has been used to process the CSV data files above for data preparation and analysis purposes. Here, data aggregation and preprocessing are executed. Furthermore, all statistical regressions for the counterfactual impact evaluation are listed in this code file. The second code file, "coef_plots_script.R", uses the computed results of the counterfactual impact evaluation to create the final graphic plots using the ggplot2 package.
The first ".do" file has to be run in STATA, the second one (".R") requires the use of an integrated development environment for R.
Further information is available in the final report and via the following URLs: https://www.coalition-s.org/ https://scidecode.com/ https://oa.works/ https://openalex.org/
https://sites.google.com/view/wbschmal
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This package includes a Stata do-file and the main Stata datasets used to generate tables (Table 1, Table 3, Table 4, Table A1, and Table 7) for the article titled "The Geography of Investor Attention". Due to data restrictions and the utilization of multiple datasets in the paper, we provide a subsample of the main dataset, with pseudo firm identifiers, to aid in understanding both the code's structure and the main dataset employed.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset covers 267 popular initiatives at the local level in Germany for the year 2018, with information collected on the following variables: topic of initiative, State legislation, turnout, validity of the initiative, success of the popular initiative, proportion of yes voters in case of a referendum, size of municipality, profile of initiators (local council or citizens), proportion of no voters, index of mobilization, repartition index (difference between yes and no voters), approval rate, number of inhabitants, correction of initiative, and status of the case (open/closed).
The dataset was created with the help of the existing popular initiatives registered by the University of Wuppertal and the association Mehr Demokratie in Germany. The idea of the dataset is to evaluate in detail which factors the success of popular initiatives depends on in the different German States (Länder). A repartition index (difference between yes and no voters) and a mobilization index (repartition index multiplied by the turnout) were calculated and added to the dataset. All the other variables were also created in order to help explain the result of these initiatives. The final aim is to be able to measure how direct democratic tools influence local politics in Germany. This is why it is important to examine the prevailing factors for the satisfaction of citizens who use these procedures. In this dataset, the outcome of an initiative (failure/success) can be taken as the dependent variable, and all the others can be classified as independent variables.

Direct democracy offers possibilities for citizens to influence political decisions, especially at the local level. In Germany, the local political systems have been affected by the introduction of direct democratic tools such as citizen initiatives and local referenda since Reunification. The State legislations defined new conditions for citizen initiatives and municipal referenda, with a minimum number of valid signatures for initiatives and a minimum approval rate for referenda. In the attached file, you will find the dataset as an Excel file as well as a .dta file that you can open with the software Stata (https://www.stata.com/).
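The two derived indices can be illustrated with toy numbers (not taken from the dataset):

```python
# illustrative computation of the two indices described above,
# assuming vote shares in percent and turnout as a fraction
yes_share = 58.0   # proportion of yes voters
no_share = 42.0    # proportion of no voters
turnout = 0.35     # turnout in the referendum

repartition_index = yes_share - no_share          # difference yes - no
mobilization_index = repartition_index * turnout  # repartition weighted by turnout
```

A lopsided result with low turnout and a narrow result with high turnout can thus yield the same mobilization index, which is why both indices are kept separately in the dataset.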
The Current Population Survey Civic Engagement and Volunteering (CEV) Supplement is the most robust longitudinal survey about volunteerism and other forms of civic engagement in the United States. Produced by AmeriCorps in partnership with the U.S. Census Bureau, the CEV takes the pulse of our nation’s civic health every two years. The data on this page was collected in September 2017. The CEV can generate reliable estimates at the national level, within states and the District of Columbia, and in the largest twelve Metropolitan Statistical Areas to support evidence-based decision making and efforts to understand how people make a difference in communities across the country. This page was updated on January 16, 2025 to ensure consistency across all waves of CEV data. Click on "Export" to download and review an excerpt from the 2017 CEV Analytic Codebook that shows the variables available in the analytic CEV datasets produced by AmeriCorps. Click on "Show More" to download and review the following 2017 CEV data and resources provided as attachments: 1) CEV FAQs – answers to frequently asked technical questions about the CEV 2) Constructs and measures in the CEV 3) 2017 CEV Analytic Data and Setup Files – analytic dataset in Stata (.dta), R (.rdata), SPSS (.sav), and Excel (.csv) formats, codebook for analytic dataset, and Stata code (.do) to convert raw dataset to analytic formatting produced by AmeriCorps. 4) 2017 CEV Technical Documentation – codebook for raw dataset and full supplement documentation produced by U.S. Census Bureau 5) 2017 CEV Raw Data and Read In Files – raw dataset in Stata (.dta) format, Stata code (.do) and dictionary file (.dct) to read ASCII dataset (.dat) into Stata using layout files (.lis)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.
Instructions:
Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme files for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. The qualitative data are not available as part of the data collection at this time. Numerous high-profile events involving student victimization on school buses have raised critical questions regarding the safety of school-based transportation for children, the efforts taken by school districts to protect students on buses, and the most effective transportation-based behavioral management strategies for reducing misconduct. To address these questions, a national web-based survey was administered to public school district-level transportation officials throughout the United States to assess the prevalence of misconduct on buses, identify strategies to address misconduct, and describe effective ways to reduce student misbehavior on buses. Telephone interviews were also conducted with a small group of transportation officials to understand the challenges of transportation-based behavioral management, to determine successful strategies to create safe and positive school bus environments, and to identify data-driven approaches for tracking and assessing disciplinary referrals. The collection includes 10 Stata data files: BVSBS_analysis file.dta (n=2,595; 1,058 variables); Title Crosswalk File.dta (n=2,594; 3 variables); Lessons Learned and Open Dummies.dta (n=1,543; 200 variables); CCD dataset.dta (n=12,494; 89 variables); BVSB_REGION.dta (n=4; 3 variables); BVSB_SCHOOLS.dta (n=3; 3 variables); BVSB_STUDENTS.dta (n=3; 3 variables); BVSB_URBAN.dta (n=8; 3 variables); BVSB_WHITE.dta (n=3; 3 variables); FINALRAKER.dta (n=2,595; 2 variables).
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de466780
Abstract (en): Between 1954 and 1996, more than 200 nuclear power projects were publicly announced in the USA. Barely half of these projects were completed and generated power commercially. Existing research has highlighted a number of potential explanations for the varying siting outcomes of these projects, including contentious political protest, socioeconomic and political conditions within potential host communities, regulatory changes ('ratcheting'), and cost overruns. However, questions remain about which of these factors, if any, had an impact on these outcomes. We created a new data set of 228 host communities where siting was attempted to illuminate the factors that led projects towards either completion or cancellation. We include county-level regulatory, reactor-specific, demographic, and political factors which may correlate with the outcomes of attempts to site nuclear reactors over this time period. The full draft of our forthcoming peer-reviewed article in International Journal of Energy Research can be found at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2423935. We include the Stata dataset, the codebook, and the .do file used to create the statistical analysis for the paper. Funding institution(s): Purdue University (Center for the Environment).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Note: This version supersedes version 1: https://doi.org/10.15482/USDA.ADC/1522654.
In Fall 2019 the USDA Food and Nutrition Service (FNS) conducted the third Farm to School Census. The 2019 Census was sent via email to 18,832 school food authorities (SFAs), including all public, private, and charter SFAs, as well as residential care institutions, participating in the National School Lunch Program. The questionnaire collected data on local food purchasing, edible school gardens, other farm to school activities and policies, and evidence of economic and nutritional impacts of participating in farm to school activities. A total of 12,634 SFAs completed usable responses to the 2019 Census. Version 2 adds the weight variable, "nrweight", which is the non-response weight.
Processing methods and equipment used: The 2019 Census was administered solely via the web. The study team cleaned the raw data to ensure the data were as correct, complete, and consistent as possible. This process involved examining the data for logical errors, contacting SFAs and consulting official records to update some implausible values, and setting the remaining implausible values to missing. The study team linked the 2019 Census data to information from the National Center for Education Statistics (NCES) Common Core of Data (CCD). Records from the CCD were used to construct a measure of urbanicity, which classifies the area in which schools are located.
Study date(s) and duration: Data collection occurred from September 9 to December 31, 2019. Questions asked about activities prior to, during, and after SY 2018-19. The 2019 Census asked SFAs whether they currently participated in, had ever participated in, or planned to participate in any of 30 farm to school activities. An SFA that participated in any of the defined activities in the 2018-19 school year received further questions.
Study spatial scale (size of replicates and spatial scale of study area): Respondents to the survey included SFAs from all 50 states as well as American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and Washington, DC.
Level of true replication: Unknown.
Sampling precision (within-replicate sampling or pseudoreplication): No sampling was involved in the collection of these data.
Level of subsampling (number and repeat or within-replicate sampling): No sampling was involved in the collection of these data.
Study design (before-after, control-impacts, time series, before-after-control-impacts): None; non-experimental.
Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains SFA-level responses to the Census questionnaire for SFAs that responded. This file includes information from only SFAs that clicked "Submit" on the questionnaire. (The dataset used to create the 2019 Farm to School Census Report includes additional SFAs that answered enough questions for their response to be considered usable.) In addition, the file contains constructed variables used for analytic purposes. The file does not include the weights created to produce national estimates for the 2019 Farm to School Census Report. The dataset identifies SFAs, but to protect individual privacy the file does not include any information about the individual who completed the questionnaire.
Description of any gaps in the data or other limiting factors: See the full 2019 Farm to School Census Report [https://www.fns.usda.gov/cfs/farm-school-census-and-comprehensive-review] for a detailed explanation of the study's limitations.
Outcome measurement methods and equipment used: None.
Resources in this dataset:
Resource Title: 2019 Farm to School Codebook with Weights. File Name: Codebook_Update_02SEP21.xlsx. Resource Description: 2019 Farm to School Codebook with Weights.
Resource Title: 2019 Farm to School Data with Weights CSV. File Name: census2019_public_use_with_weight.csv. Resource Description: 2019 Farm to School Data with Weights CSV.
Resource Title: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets. File Name: Farm_to_School_Data_AgDataCommons_SAS_SPSS_R_STATA_with_weight.zip. Resource Description: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets.
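Since the public-use file ships with the "nrweight" non-response weight, a brief sketch of how such a weight is applied may be useful. This is a minimal, illustrative Python example: the `participates` flag, the records, and the helper function are hypothetical stand-ins, not columns or code from the actual dataset; only the weight-variable name "nrweight" comes from the description above.

```python
# Minimal sketch: applying a non-response weight ("nrweight") to produce a
# weighted estimate from SFA-level records. The records and the
# "participates" flag are illustrative, not real Census data.

def weighted_share(records, flag_field, weight_field="nrweight"):
    """Weighted share of records where flag_field == 1."""
    total = sum(r[weight_field] for r in records)
    flagged = sum(r[weight_field] for r in records if r[flag_field] == 1)
    return flagged / total if total else float("nan")

# Hypothetical SFA records with a farm-to-school participation flag.
sfas = [
    {"sfa_id": 1, "participates": 1, "nrweight": 1.2},
    {"sfa_id": 2, "participates": 0, "nrweight": 0.8},
    {"sfa_id": 3, "participates": 1, "nrweight": 1.0},
]

print(round(weighted_share(sfas, "participates"), 3))  # weighted participation share
```

The same weighted-sum pattern underlies survey packages such as Stata's `svy` commands; the point here is only that respondent counts must be weighted by "nrweight" before computing national shares.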
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains two documents:
1. A Stata dataset of newspaper advertisements for cotton seed varieties in the antebellum United States.
2. A set of Stata commands used to generate the tables in the JEH paper "Biological Innovation without Intellectual Property Rights: Cottonseed Markets in the Antebellum American South".
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).
The variables contained therein are defined as follows:
case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.
We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).
Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
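To make the panel structure concrete, here is a small Python sketch that builds a long-format dataset with the variables defined above (case, patid, time_period, ncons, period0-period9, burden). It is an illustrative analogue of dummy_dataset_create.do, not a translation of it: the simulated consultation counts and the coding of the periodk indicators (1 when time_period >= k) are assumptions for demonstration only.

```python
# Illustrative analogue of dummy_dataset_create.do: build a long-format panel
# with one row per patient per month. The coding of period0-period9 as
# "time_period >= k" indicators is an assumption, not taken from the do-file.
import random

random.seed(1)

def make_panel(n_patients=20, n_periods=10):
    rows = []
    for patid in range(1, n_patients + 1):
        case = 1 if patid <= n_patients // 2 else 0   # half cases, half controls
        burden = random.randint(0, 1)                  # multimorbidity burden group
        for t in range(n_periods):                     # t = 0 .. 9; 9 = month of diagnosis
            row = {
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": t,
                # toy consultation count; cases consult more in the final months
                "ncons": random.randint(0, 2) + (1 if case and t >= 6 else 0),
            }
            for k in range(n_periods):                 # candidate inflection-point indicators
                row[f"period{k}"] = 1 if t >= k else 0
            rows.append(row)
    return rows

panel = make_panel()
print(len(panel))  # 20 patients x 10 periods = 200 rows
```

A dataset shaped like this can then be fed to a negative binomial model (Stata's menbreg, or a comparable routine elsewhere), testing each periodk in turn to locate the aggregation period containing the inflection point.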