These are the Stata data sheets imported from Excel. They are used directly for the meta-analysis.
This is Stata code for the analysis of publicly available NHANES data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).
The variables contained therein are defined as follows:
case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.
We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).
Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
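The panel layout described above (one row per patient per time period) can be illustrated with a minimal Python sketch. This is not the authors' dummy_dataset_create.do: the function name, the way case and burden are assigned, and the placeholder consultation counts are all assumptions made for illustration. The inflection point variables period0 to period9 are left out because their construction is not specified here.

```python
import random

def make_dummy_panel(n_patients=4, n_periods=10, seed=0):
    """Build a long-format panel: one row per patient per time period."""
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_patients + 1):
        case = patid % 2            # assumption: alternate controls (0) and cases (1)
        burden = rng.randint(0, 1)  # assumption: burden group is fixed per patient
        for time_period in range(n_periods):
            rows.append({
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": time_period,
                # Placeholder counts only; not drawn from a negative binomial model.
                "ncons": rng.randint(0, 5),
            })
    return rows

panel = make_dummy_panel()
print(len(panel))  # 40 rows: 4 patients x 10 time periods
```

Each patient contributes exactly one row per value of time_period, which is the shape a panel regression on ncons expects.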
General information: The data sets contain information on how often materials of studies available through GESIS: Data Archive for the Social Sciences were downloaded and/or ordered through one of the archive's platforms/services between 2004 and 2017.
Sources and platforms: Study materials are accessible through various GESIS platforms and services: Data Catalogue (DBK), histat, datorium, data service (and others).
Years available:
- Data Catalogue: 2012-2017
- data service: 2006-2017
- datorium: 2014-2017
- histat: 2004-2017
Data sets: Data set ZA6899_Datasets_only_all_sources contains information on how often data files, such as those with a dta (Stata) or sav (SPSS) extension, have been downloaded. Identification of data files is handled semi-automatically (depending on the platform/service). Multiple downloads of one file by the same user (identified through IP address or username for registered users) on the same day are counted as only one download.
Data set ZA6899_Doc_and_Data_all_sources contains information on how often study materials have been downloaded. Multiple downloads of any file of the same study by the same user (identified through IP address or username for registered users) on the same day are counted as only one download.
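The two counting rules above can be sketched in Python as follows. This is only an illustration of the stated de-duplication logic, not the archive's actual procedure: the function name, the event fields, and the sample study number, file names and IP address are hypothetical.

```python
from datetime import date

def count_downloads(events, key="file"):
    """Count downloads, collapsing repeats by the same user on the same day.

    key="file" collapses repeats per file; key="study" collapses
    repeats of any file belonging to the same study.
    """
    seen = set()
    for e in events:
        seen.add((e["user"], e[key], e["day"]))
    return len(seen)

# Hypothetical download log: one user, one study, two files, two days.
events = [
    {"user": "10.0.0.1", "study": "ZA0001", "file": "ZA0001.dta", "day": date(2017, 3, 1)},
    {"user": "10.0.0.1", "study": "ZA0001", "file": "ZA0001.dta", "day": date(2017, 3, 1)},  # same-day repeat
    {"user": "10.0.0.1", "study": "ZA0001", "file": "ZA0001.sav", "day": date(2017, 3, 1)},
    {"user": "10.0.0.1", "study": "ZA0001", "file": "ZA0001.dta", "day": date(2017, 3, 2)},
]

print(count_downloads(events, key="file"))   # 3
print(count_downloads(events, key="study"))  # 2
```

The per-study count is lower because the two different files downloaded on the first day collapse into a single download of that study.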
Both data sets are available in three formats: csv (quoted, semicolon-separated), dta (Stata v13, labeled) and sav (SPSS, labeled). All formats contain identical information.
Variables: Variables/columns in both data sets are identical.
- za_nr: Archive study number
- version: GESIS Archiv Version
- doi: Digital Object Identifier
- StudyNo: Study number of respective study
- Title: English study title
- Title_DE: German study title
- Access: Access category (0, A, B, C, D, E)
- PubYear: Publication year of last version of the study
- inZACAT: Study is currently also available via ZACAT
- inHISTAT: Study is currently also available via HISTAT
- inDownloads: There are currently data files available for download for this study in DBK or datorium
- Total: All downloads combined
- downloads_2004 [up to ...] downloads_2017: downloads/orders from all sources combined in 2004 [up to ...] 2017
- d_2004_dbk [up to ...] d_2017_dbk: downloads from source dbk in 2004 [up to ...] 2017
- d_2004_histat [up to ...] d_2017_histat: downloads from source histat in 2004 [up to ...] 2017
- d_2004_dataservice [up to ...] d_2017_dataservice: downloads/orders from source dataservice in 2004 [up to ...] 2017
More information is available within the codebook.
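As a rough illustration of the csv format described above (quoted, semicolon-separated), the following Python sketch reads such a file with the standard csv module. The column names za_nr, Title and Total come from the variable list; the rows are invented sample values, and it is an assumption that the real files carry a header row in this form.

```python
import csv
import io

# Invented sample in the described layout: quoted fields, semicolon separator.
sample = (
    '"za_nr";"Title";"Total"\n'
    '"0001";"Example study; wave 1";"12"\n'
    '"0002";"Another study";"7"\n'
)

# Quoting keeps a semicolon inside a title from splitting the field.
reader = csv.DictReader(io.StringIO(sample), delimiter=";", quotechar='"')
rows = list(reader)
total = sum(int(r["Total"]) for r in rows)

print(rows[0]["Title"])  # Example study; wave 1
print(total)             # 19
```

For a real file, replacing the io.StringIO object with an opened file handle (opened with newline="") should work the same way.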
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/4.0/customlicense?persistentId=hdl:1902.29/11638
This is a 3-part short course (held over three afternoons). Part 1 offers an introduction to Stata for Windows. Part 2 teaches entering data in Stata and working with Stata do-files, and shows how to append, sort, and merge data sets in Stata. Part 3 teaches how to perform basic statistical procedures and how to draw subsamples from large datasets.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
It is a widely accepted fact that evolving software systems change and grow. However, it is less well-understood how change is distributed over time, specifically in object oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place as well as to inform them of the longer term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, as well as to provide useful information as input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. But in order to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes.
Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and classes that changed recently are likely to undergo modifications in the near future. Though the guidance provided is helpful, developers need more specific guidance in order for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution prone parts of a system as well as support effort estimation activities.
The specific research questions that we address in this chapter are:
(1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project specific, or general?
(2) How is modification frequency distributed for classes that change?
(3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications?
(4) Does structural complexity make a class susceptible to change?
(5) Does popularity make a class more change-prone?
We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions.
The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided as a comma separated values (CSV) file, and the first line of the CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
This package contains two files designed to help read individual-level DHS data into Stata. The first file addresses the problem that versions of Stata before Version 7/SE will read in only up to 2047 variables, and most of the individual files have more variables than that. The file will read in the .do, .dct and .dat files and output new .do and .dct files with only a subset of the variables, as specified by the user. The second file deals with earlier DHS surveys in which .do and .dct files do not exist and only .sps and .sas files are provided. The file will read in the .sas and .sps files and output a .dct and a .do file. If necessary, the first file can then be run again to select a subset of variables.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
file1: Regression models for intentional injury crimes.
file2: Regression models for bribery and corruption.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These data and commands replicate all tables and figures in the paper titled "Chinese Agriculture in the Age of High-speed Rail: Effects on Agricultural Value Added and Food Output", published in Agribusiness, 2023, 39 (2), 387-405. If using the data in this paper, please cite Gao, Y., & Wang, X. (2023). Chinese agriculture in the age of high-speed rail: Effects on agricultural value added and food output. Agribusiness, 39, 387–405. https://doi.org/10.1002/agr.21771
This dataset was created by iFinance Tutor
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These are the data used for the regression model, in Stata format.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Revised Stata do-file and dataset prepared for journal article resubmission.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Dataset and Stata code
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
Background
This dataset contains human evaluations of whether outputs on the TaTA dataset are a) understandable and b) attributable to the source tables. See "TaTA: A Multilingual Table-to-Text Dataset for African Languages" for more details. The dataset can be used to train a learned metric, called StATA, to evaluate model performance on the TaTA dataset. Paper: https://www.arxiv.org/abs/2503.23204 The original can be found here.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Cleaned Dataset for the Project
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This supplementary material includes instructions and Stata do-files that replicate the empirical results in the paper. The data used are from the CPS and need to be downloaded from the IPUMS website.
Dataset and Stata codes for the paper "Collaboration, Alphabetical Order and Gender Discrimination"
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This is the dataset based on the survey responses of the general population and patients in the Netherlands
Restricted Use data from the ILAB Philippines study