Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place, as well as to inform them of the longer-term trend in the distribution profile. This knowledge helps developers record systemic and substantial changes to a release and provides useful input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. To manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and that classes that changed recently are likely to be modified again in the near future. Though this guidance is helpful, developers need more specific guidance for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system, as well as support effort estimation activities. The specific research questions that we address in this chapter are:
(1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project-specific, or general?
(2) How is modification frequency distributed for classes that change?
(3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications?
(4) Does structural complexity make a class susceptible to change?
(5) Does popularity make a class more change-prone?
We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55,000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions. The data are distributed as a .zip file of about 2 MB in total, containing 4 .txt files and 4 .log files. The raw metric data are provided as comma-separated values (CSV), and the first line of each CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
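Research question (1) amounts to a simple proportion for each pair of consecutive releases. The Python sketch below is illustrative only: it assumes, hypothetically, that each release has already been loaded as a mapping from class name to a tuple of metric values, and that a class counts as changed when any of its metric values differs between two consecutive releases. The dataset's actual columns and the chapter's precise definition of change may differ.

```python
from typing import List, Mapping, Sequence, Tuple

# Hypothetical snapshot of one release: class name -> tuple of metric values
Snapshot = Mapping[str, Tuple[int, ...]]


def change_probability(prev: Snapshot, curr: Snapshot) -> float:
    """Share of classes present in both releases whose metric values differ."""
    common = prev.keys() & curr.keys()
    if not common:
        return 0.0
    changed = sum(1 for name in common if prev[name] != curr[name])
    return changed / len(common)


def change_probabilities(releases: Sequence[Snapshot]) -> List[float]:
    """Change probability for each pair of consecutive releases."""
    return [change_probability(a, b) for a, b in zip(releases, releases[1:])]


# Usage with made-up metric tuples (e.g. size, number of dependencies)
v1 = {"A": (10, 2), "B": (5, 1)}
v2 = {"A": (12, 2), "B": (5, 1), "C": (3, 0)}
v3 = {"A": (12, 2), "B": (6, 1), "C": (4, 0)}
print(change_probabilities([v1, v2, v3]))  # → [0.5, 0.6666666666666666]
```

Classes added or removed between two releases are ignored here, because a change probability is only defined for classes present in both.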
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/GV8NBL
The NATCOOP project set out to study how nature shapes the preferences and incentives of economic agents and how this in turn affects common-pool resource management. Imagine a group of fishermen targeting a species that requires a lot of teamwork to harvest. Do these fishers become more social over time compared to fishers who work in a more solitary manner? If so, does this have implications for how the fishery should be managed? To study this, the NATCOOP team travelled to Chile and Tanzania and collected data using surveys and economic experiments. These two very different countries both have large populations of small-scale fishermen and host several distinct types of fisheries. Over the course of five field trips, the project team surveyed more than 2,500 fishermen, with each field trip contributing to the main research question by measuring fishermen’s preferences for cooperation and risk. Additionally, each field trip aimed to answer a smaller research question focused on either risk taking or cooperative behavior in the fisheries. The data from both the surveys and the experiments are now publicly available and can be freely studied by other researchers, resource managers, or interested citizens. Overall, the NATCOOP dataset contains participants’ responses to a wide range of survey questions and their actions during incentivized economic experiments. It is available in both .dta and .csv formats, and its use is recommended with statistical software such as R or Stata. For those unaccustomed to statistical analysis, we have included a video tutorial on how to use the dataset in the open-source program R.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de443631
Abstract (en): This study is part of a time-series collection of national surveys fielded continuously since 1952. The election studies are designed to present data on Americans' social backgrounds, enduring political predispositions, social and political values, perceptions and evaluations of groups and candidates, opinions on questions of public policy, and participation in political life. In addition to core items, new content includes questions on values, political knowledge, and attitudes on racial policy, as well as more general attitudes conceptualized as antecedent to these opinions on racial issues. The Main Data File also contains vote validation data that were expanded to include information from the appropriate election office and were attached to the records of each of the respondents in the post-election survey. The expanded data consist of the respondent's post case ID, vote validation ID, and two variables that clarify the distinction between the office of registration and the office associated with the respondent's sample address. The second data file, the Bias Nonresponse Data File, contains respondent-level field administration variables. Of the 3,833 lines of sample originally issued for the 1990 Study, 2,176 resulted in completed interviews; the remainder were either nonsample or noninterviews for a variety of reasons. For each line of sample, the Bias Nonresponse Data File includes sampling data, result codes, control variables, and interviewer variables. Detailed geocode data are blanked but available under conditions of confidential access (contact the American National Election Studies at the Center for Political Studies, University of Michigan, for further details). This is a specialized file, of particular interest to researchers studying survey nonresponse. Demographic variables include age, party affiliation, marital status, education, employment status, occupation, religious preference, and ethnicity.
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes. Response Rates: The response rate for this study is 67.7 percent. The study was in the field until January 31, although 67 percent of the interviews were taken by November 25, 80 percent by December 7, and 93 percent by December 31. All United States households in the 50 states. National multistage area probability sample.
2015-11-10: The study metadata was updated.
2009-01-09: Part 1, the Main Data File, incorporates errata that were posted separately under the Fourth ICPSR Edition. Part 2, the Bias Nonresponse Data File, has been added to the data collection, along with corresponding SAS, SPSS, and Stata setup files and documentation. The codebook has been updated by adding a technical memorandum on the sampling design of the study that was previously missing from the codebook. The nonresponse file contains respondent-level field administration variables for those interested in survey nonresponse. The collection now includes files in ASCII, SPSS portable, SAS transport (CPORT), and Stata system formats.
2000-02-21: The data for this study are now available in SAS transport and SPSS export formats in addition to the ASCII data file. Variables in the dataset have been renumbered to the following format: 2-digit (or 2-character) year prefix + 4 digits + [optional] 1-character suffix. Dataset ID and version variables have also been added. Additionally, the Voter Validation Office Administration Interview File (Expanded Version) has been merged with the main data file, and the codebook and SPSS setup files have been replaced. SAS setup files have also been added to the collection, and the data collection instrument is now provided as a PDF file. Two files are no longer being released with this collection: the Voter Validation Office Administration Interview File (Unexpanded Version) and the Results of First Contact With Respondent file.
Funding institution(s): National Science Foundation (SOC77-08885 and SES-8341310).
face-to-face interview
There was significantly more content in this post-election survey than ...
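The renumbered variable-name format (year prefix, four digits, optional one-character suffix) can be checked mechanically, for example when mapping names across releases. This Python sketch is an assumption-laden illustration: it assumes names consist only of those three parts, with no additional leading letter, and the example names are hypothetical; if the distributed files add a prefix, the pattern would need adjusting.

```python
import re
from typing import NamedTuple, Optional

# 2-character year prefix + 4 digits + optional 1-character suffix
VAR_NAME = re.compile(r"(?P<year>[0-9A-Za-z]{2})(?P<number>[0-9]{4})(?P<suffix>[0-9A-Za-z])?")


class VarName(NamedTuple):
    year: str
    number: str
    suffix: Optional[str]


def parse_var_name(name: str) -> Optional[VarName]:
    """Split a variable name into its parts, or return None if it does not fit."""
    match = VAR_NAME.fullmatch(name)
    if match is None:
        return None
    return VarName(match["year"], match["number"], match["suffix"])


print(parse_var_name("900123a"))  # → VarName(year='90', number='0123', suffix='a')
```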
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets contain many economic variables related to agriculture, such as crop output value and profit. They can be used to test a range of hypotheses in agricultural economics at both the plot level and the household level.
Users can also reproduce these datasets using the Stata 14 do file ‘VDSA data management for agricultural performance’. This Stata program file uses the Village Dynamics in South Asia (VDSA) raw data files in Excel format. The resulting output is two data files in Stata format, one at the plot level and the other at the household level.
These plot-level and household-level datasets are also included in this repository. The Word file ‘guidelines’ contains instructions for extracting the VDSA raw data from the VDSA knowledge bank and using them as inputs to run the Stata do file ‘VDSA data management for agricultural performance’.
The VDSA raw data files in Excel format needed to run the Stata do file are also available in this repository for users' convenience.
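The step from plot-level to household-level data can be illustrated with a small aggregation. This Python sketch is not a translation of the do file ‘VDSA data management for agricultural performance’: the field names and the definition of profit as output value minus input cost are hypothetical simplifications chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class PlotRecord:
    household_id: str
    plot_id: str
    output_value: float  # value of crop output on the plot
    cost: float          # hypothetical: total input cost on the plot


@dataclass
class HouseholdRecord:
    n_plots: int = 0
    output_value: float = 0.0
    cost: float = 0.0

    @property
    def profit(self) -> float:
        # Simplified profit: output value minus input cost
        return self.output_value - self.cost


def to_household_level(plots: Iterable[PlotRecord]) -> Dict[str, HouseholdRecord]:
    """Sum plot-level values into one record per household."""
    households: Dict[str, HouseholdRecord] = defaultdict(HouseholdRecord)
    for p in plots:
        hh = households[p.household_id]
        hh.n_plots += 1
        hh.output_value += p.output_value
        hh.cost += p.cost
    return dict(households)


plots = [PlotRecord("H1", "P1", 1000.0, 400.0),
         PlotRecord("H1", "P2", 500.0, 300.0),
         PlotRecord("H2", "P1", 800.0, 900.0)]
hh = to_household_level(plots)
print(hh["H1"].profit, hh["H2"].profit)  # → 800.0 -100.0
```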
The raw VDSA data were generated by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) in partnership with Indian Council of Agricultural Research (ICAR) institutes and the International Rice Research Institute (IRRI), and funded by the Bill & Melinda Gates Foundation (BMGF) (Grant ID: 51937). The data were acquired in surveys by resident field investigators. Data collection was mostly through paper-based questionnaires, and Samsung tablets have also been used since 2012. The survey instruments used for the different modules are available at http://vdsa.icrisat.ac.in/vdsa-questionaires.aspx
Study sites were selected using stepwise purposive sampling to cover the agro-ecological diversity of the region. Three districts within each agro-ecological zone were selected based on soil and climate parameters, as well as the share of agricultural land under ICRISAT mandate crops. Similarly, one typical sub-district within each district and two villages within each sub-district were selected. Within each village, ten households were randomly selected from each of four landholding groups.
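The final, random stage of this design is a stratified draw within each village. The Python sketch below reads the design as ten households per landholding group; the group labels and household identifiers are hypothetical, and the seeded generator is an illustrative choice for reproducibility, not a documented feature of the VDSA procedure.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def select_households(listing: Sequence[Tuple[str, str]], per_group: int = 10,
                      seed: int = 0) -> Dict[str, List[str]]:
    """Draw `per_group` households at random within each landholding group.

    `listing` holds (household_id, landholding_group) pairs for one village.
    random.sample raises ValueError if a group has fewer than `per_group` households.
    """
    by_group: Dict[str, List[str]] = defaultdict(list)
    for household_id, group in listing:
        by_group[group].append(household_id)
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return {group: sorted(rng.sample(ids, per_group))
            for group, ids in sorted(by_group.items())}


# Hypothetical village listing: 25 households in each of four groups
listing = [(f"{g}-{i:02d}", g) for g in ("landless", "small", "medium", "large")
           for i in range(25)]
sample = select_households(listing)
print({g: len(ids) for g, ids in sample.items()})
# → {'landless': 10, 'large': 10, 'medium': 10, 'small': 10}
```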
Selected farmers were visited once every three weeks by well-trained resident field investigators, all agriculture graduates, to collect information related to various socioeconomic indicators. Some data modules, such as details on crop cultivation activities including plot-wise inputs and outputs, were collected every three weeks, while others, such as general endowments, were collected once at the beginning of every agricultural year.
The compiled data, source data, data descriptions and data management code are all published in a public repository at http://dataverse.icrisat.org/dataverse/socialscience (https://doi.org/10.21421/D2/HDEUKU).
Some of the benefits of these data are:
Scientists, students and development practitioners can use these data to track changes in the livelihood options of the rural poor, as the data provide a long-term, multi-generational perspective on agricultural, social and economic change in rural livelihoods.
The survey sites provide a socio-economic field laboratory for teaching and training students and researchers.
The dataset can be used for diverse agricultural, development and socio-economic analyses and to better understand the dynamics of Indian agriculture.
The data help to provide feedback for designing policy interventions, setting research priorities and refining technologies.
The data shed light on the pathways through which new technologies, policies and programs affect poverty, village economies and societies.
This collection consists of Stata do files that reformat the economic activity history, partner history and parenthood history variables included in the ELSA life history study (person-level, wide format) into separate long-format files (person-activity-spell, person-partner and person-child format, respectively). The aim of the project from which the code resulted was to compare the employment and parenthood histories from age 16 reported in ELSA with those of two of the UK birth cohorts (the 1958 National Child Development Study and the 1970 British Cohort Study), so some decisions were necessary for the purpose of harmonisation. The accompanying note sets out the decisions taken in the data management process. This ESRC-funded PhD project compared cohorts to investigate whether and how differences between women's and men's labour market and family patterns, and educational differences in these patterns, have changed over time. The Stata do files reorganise some of the data from the English Longitudinal Study of Ageing (ELSA) life history data file (wave_3_life_history_data.dta). For information about the ELSA survey, see the survey documentation (link under Related Resources).
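The core wide-to-long step can be sketched in a few lines of Python, analogous to Stata's `reshape long`. The variable names (`pid`, `act_type1`, `act_start1`, and so on) are hypothetical placeholders rather than the actual ELSA variable names, dropping empty spell slots is a choice made for this example, and the harmonisation decisions described in the accompanying note are not reproduced.

```python
from typing import Any, Dict, Iterable, List, Sequence


def wide_to_long(rows: Iterable[Dict[str, Any]], id_var: str,
                 stubs: Sequence[str], n_spells: int) -> List[Dict[str, Any]]:
    """Turn one row per person into one row per person-spell.

    Wide columns are named stub + spell number (e.g. act_type1, act_start1).
    Spell slots in which every stub is missing (None or absent) are dropped.
    """
    long_rows: List[Dict[str, Any]] = []
    for row in rows:
        for spell in range(1, n_spells + 1):
            values = {stub: row.get(f"{stub}{spell}") for stub in stubs}
            if all(v is None for v in values.values()):
                continue  # nothing reported in this slot
            long_rows.append({id_var: row[id_var], "spell": spell, **values})
    return long_rows


people = [
    {"pid": 1, "act_type1": "employed", "act_start1": 1970,
     "act_type2": "unemployed", "act_start2": 1985, "act_type3": None, "act_start3": None},
    {"pid": 2, "act_type1": "employed", "act_start1": 1968},
]
for r in wide_to_long(people, "pid", ["act_type", "act_start"], n_spells=3):
    print(r)
```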
Background: Adolescent girls in Kenya are disproportionately affected by early and unintended pregnancies, unsafe abortion and HIV infection. The In Their Hands (ITH) programme in Kenya aims to increase adolescents' use of high-quality sexual and reproductive health (SRH) services through targeted interventions. The ITH programme aims to: 1) promote the use of contraception and testing for sexually transmitted infections (STIs), including HIV, or for pregnancy among sexually active adolescent girls; 2) provide information, products and services on the adolescent girl's terms; and 3) promote community support for girls and boys to access SRH services.
Objectives: The objectives of the evaluation are to assess: a) to what extent and how the new Adolescent Reproductive Health (ARH) partnership model and integrated system of delivery are working to meet their intended objectives and the needs of adolescents; b) adolescent user experiences across key quality dimensions and outcomes; c) how the ITH programme has influenced adolescent voice, decision-making autonomy, power dynamics and provider accountability; and d) how community support for adolescent reproductive and sexual health initiatives has changed as a result of this programme.
Methodology: The ITH programme is being implemented in two phases: formative planning and experimentation in the first year, from April 2017 to March 2018, and national roll-out and implementation from April 2018 to March 2020. This second phase is informed by an Annual Programme Review and by thorough benchmarking and assessment, which led to critical changes to performance and capacity so that ITH is fit for scale. ITH is expected to cover approximately 250,000 adolescent girls aged 15-19 in Kenya by April 2020. The programme is implemented by a consortium of Marie Stopes Kenya (MSK), Well Told Story, and Triggerise. ITH's key implementation strategies seek to increase adolescents' motivation to use services; create a user-defined ecosystem and platform that provides girls with a network of accessible, subsidized and discreet SRH services; and launch and sustain a national discourse campaign around adolescent sexuality and rights. The 3-year study will employ a mixed-methods approach with multiple data sources, including secondary data and qualitative and quantitative primary data from various stakeholders, to explore their perceptions of and attitudes towards adolescent SRH services. Quantitative data analysis will be done using Stata to provide descriptive statistics and statistical associations/correlations on key variables. All qualitative data will be analyzed using NVivo software.
Study Duration: 36 months - between 2018 and 2020.
Narok and Homa Bay counties
Households
All adolescent girls aged 15-19 years resident in the household.
The sampling of adolescents for the household survey was based on expected changes in adolescents' intention to use contraception in the future. According to the Kenya Demographic and Health Survey 2014, 23.8% of adolescents and young women reported not intending to use contraception in the future. This was used as the baseline proportion for the intervention, as it aimed to increase demand and reduce the proportion of sexually active adolescents who did not intend to use contraception in the future. Assuming that the project would achieve an impact of at least 2.4 percentage points in the intervention counties (i.e. a relative reduction of 10%), a design effect of 1.5 and a non-response rate of 10%, Cochran's sample size formula for categorical data gave an estimated sample size of 1,885, which was adequate to detect this difference between the baseline and endline time points. Based on data from the 2009 Kenya census, there were approximately 0.46 adolescent girls per household, which meant that the study was to include approximately 4,876 households from the two counties in both the baseline and endline surveys.
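The generic form of Cochran's formula for a proportion, with the design-effect and non-response adjustments mentioned above, can be sketched as follows. The confidence level is not stated in the text, so z = 1.96 (95%) is an assumption, as is applying non-response by dividing by (1 - rate). With these assumptions the sketch does not reproduce the published 1,885, so the study's exact inputs or adjustment steps presumably differ from this generic form.

```python
import math


def cochran_sample_size(p: float, margin: float, z: float = 1.96,
                        design_effect: float = 1.0, nonresponse: float = 0.0) -> int:
    """Cochran's n0 = z^2 p (1 - p) / e^2, inflated for clustering and non-response."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # base sample size for a proportion
    n = n0 * design_effect                   # inflate for the cluster design
    n = n / (1 - nonresponse)                # inflate so enough interviews remain
    return math.ceil(n)


# Baseline proportion 0.238 and a 2.4-point margin, as in the text; z is assumed
print(cochran_sample_size(0.238, 0.024, design_effect=1.5, nonresponse=0.10))
```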
We collected data from a representative sample of adolescent girls living in both urban and rural ITH areas to understand adolescents' access to information, use of SRH services and SRH-related decision-making autonomy before the implementation of the intervention. Based on the number of ITH health facilities in the two study counties, Homa Bay and Narok, we purposively sampled three sub-counties in Homa Bay (West Kasipul, Ndhiwa and Kasipul) and three in Narok (Narok Town, Narok South and Narok East). In each of the ITH intervention counties, some sub-counties had been prioritized for the project, and our data collection focused on these sub-counties selected for intervention. A stratified sampling procedure was used to select wards within the sub-counties and villages within the wards. Households were then selected from each village after all households in the village had been listed. The purposive selection of sub-counties closer to ITH intervention facilities meant that urban and semi-urban areas were oversampled, because health facilities are concentrated in urban areas.
Qualitative Sampling
Focus group discussion (FGD) participants were recruited from the villages where the ITH adolescent household survey was conducted in both counties. A convenience sample of consenting adults living in these villages was invited to participate in the FGDs. A facilitator and a note-taker were trained in how to use the focus group guide, how to facilitate the group to elicit the information sought, and how to take detailed notes. All focus group discussions took place in the local language and were tape-recorded; the consent process included permission to tape-record the session. Participants were identified only by their first names and were asked not to share what was discussed outside of the focus group. Participants were read an informed consent form and asked to give written consent. In-depth interviews were conducted with a purposively selected sample of consenting adolescent girls who had participated in the adolescent survey. We conducted a total of 45 in-depth interviews with adolescent girls (20 in Homa Bay County and 25 in Narok County). In addition, 8 FGDs (4 per county) were conducted with mothers of adolescent girls who were usual residents of the villages identified for the interviews, and another 4 FGDs (2 per county) with community health volunteers (CHVs).
N/A
Face-to-face [f2f] for quantitative data collection; focus group discussions and in-depth interviews for qualitative data collection
The questionnaire covered socio-demographic and household information; SRH knowledge and sources of information; sexual activity and relationships; family planning knowledge, access, choice and use when needed; exposure to family planning messages; voice and decision-making autonomy; and quality of care for those who visited health facilities in the 12 months before the survey. The questionnaire was piloted before data collection, and the questions were reviewed for appropriateness, comprehension and flow. The pilot was conducted among 42 adolescent girls aged 15-19 (two per field interviewer) from a community outside the study counties.
The questionnaire was originally developed in English and later translated into Kiswahili. It was programmed using the ODK-based SurveyCTO platform for data collection and management and was administered through face-to-face interviews.
The survey tools were programmed using the ODK-based SurveyCTO platform for data collection and management. During programming, consistency checks were built into the data capture software to ensure that no missing or implausible information or values were entered into the database by the field interviewers. For example, the application included controls for variable ranges, skip patterns, duplicated individuals, and intra- and inter-module consistency checks. This reduced or eliminated errors usually introduced at the data capture stage. Once programmed, the survey tools were tested by the programming team, who, in conjunction with the project team, conducted further testing on the application's usability, in-built consistency checks (skips, variable ranges, duplicated individuals, etc.), and inter-module consistency checks. Any issues raised were documented and tracked on the Issue Tracker and followed up to full and timely resolution. After internal testing, the tools were made available to the project and field teams to perform user acceptance testing (UAT), verifying and validating that the electronic platform worked exactly as expected in terms of usability, question design, checks, skips, etc.
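The three kinds of checks named above (variable ranges, skip patterns and duplicated individuals) can be illustrated as a post-hoc pass over exported records. This Python sketch is not the SurveyCTO form logic or the project's Stata scripts; the variable names (`respondent_id`, `age`, `ever_had_sex`, `age_first_sex`) and the specific rules are hypothetical examples.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def validate(records: Sequence[Record]) -> List[str]:
    """Return one message per failed check, in the order records are read."""
    issues: List[str] = []
    seen_ids = set()
    for rec in records:
        rid = rec["respondent_id"]
        # Duplicate check: each respondent should appear only once
        if rid in seen_ids:
            issues.append(f"{rid}: duplicate respondent")
        seen_ids.add(rid)
        # Range check: the survey targeted girls aged 15-19
        age = rec.get("age")
        if age is None or not 15 <= age <= 19:
            issues.append(f"{rid}: age {age} outside 15-19")
        # Skip-pattern check: age_first_sex applies only if ever_had_sex == "yes"
        if rec.get("ever_had_sex") != "yes" and rec.get("age_first_sex") is not None:
            issues.append(f"{rid}: age_first_sex answered but should have been skipped")
    return issues


records = [
    {"respondent_id": "R1", "age": 17, "ever_had_sex": "no", "age_first_sex": None},
    {"respondent_id": "R2", "age": 21, "ever_had_sex": "yes", "age_first_sex": 16},
    {"respondent_id": "R3", "age": 16, "ever_had_sex": "no", "age_first_sex": 15},
    {"respondent_id": "R1", "age": 18, "ever_had_sex": "no", "age_first_sex": None},
]
for issue in validate(records):
    print(issue)
```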
Data cleaning was performed to ensure that the data were free of errors and that indicators generated from these data were accurate and consistent. This process began on the first day of data collection, as the first records were uploaded into the database. The data manager used data collected during pilot testing to begin writing scripts in Stata 14 to check the variables in the data in 'real time'. This ensured that inconsistencies that could be addressed by the data collection teams were resolved during the fieldwork activities. The Stata 14 scripts that performed real-time checks and cleaned the data also wrote to a .rtf file that detailed every check performed against each variable, any inconsistencies encountered, and all steps that were taken to address these inconsistencies. The .rtf files also reported when a variable was
What is the Active Prevalence of COVID-19?
By Mu-Jeung Yang, Marinho Bertanha, Nathan Seegert, Maclean Gaulin, Adam Looney, Brian Orleans, Andrew T. Pavia, Kristina Stratford, Matthew Samore, Steven Alder
Code repository to recreate the figures and tables in “What is the Active Prevalence of COVID-19?”, Review of Economics and Statistics, 2023.
Data
• Our primary data on COVID-19 positivity rates and case counts are publicly available from covidtracking.com.
• Population data for Utah are publicly available from the Census Bureau.
• Our testing data used to calibrate our model contain sensitive private information and are thus not available for distribution. However, researchers interested in replicating this part of the analysis can apply by email to mjyang@ou.edu for an anonymized and randomized subsample that replicates our main results. Decisions about data sharing will be made on a case-by-case basis.
Instructions
Code can generally be run in the numerical order given in the filenames. All but one of the files are Stata files, run using Stata 17 (but they should be generally compatible with other versions):
1. 1.0_load_data.do is run by other files, not individually.
2. 1.1_cache-load_lasso_data.do creates the dataset for the lasso regressions, which use interactions. This file makes those interaction variables and names them appropriately to be used in loops and with Stata’s * notation.
3. 2.1_cache_bootstrap_results.do caches the CIs from our SE bootstrap procedure, because it takes a long time to run. Bootstrap results are cached to ./output/bootstrap/.
4. 3.0_table_1.do creates summary statistics and tex variables to be used in the paper.
5. 3.1_table_2.do creates Table 2, which uses bootstrap SEs, so 2.1_cache_bootstrap_results.do should have been run first. It also saves data to a temporary file for use in making the figures below.
6. 3.2_table_3.do makes the state estimates in Table 3.
7. 4.0_figure_1.ipynb uses Python to generate Figure 1.
8. 4.1_figure_2.do makes both panels of Figure 2, using the cached file from 3.1_table_2.do.
9. 5.0_appendix_c_table_1.do makes Table 1 in Appendix C.
10. 5.1_appendix_c_table_2.do makes Table 2 in Appendix C.
11. 6.0_appendix_b_figure_3.do makes Figure 3 in Appendix B.
To run, extract this repo to ~/Desktop/RESTAT_CODE and execute the files in Stata or Python as described above.
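Because the files are meant to run in the numerical order of their filename prefixes, the run order can be derived mechanically. This Python sketch only builds the plan: it assumes every script name starts with a `major.minor_` prefix, and it does not launch Stata or Jupyter, since how those are invoked depends on the local installation. The `"stata"`/`"python"` labels are hypothetical tags for the example.

```python
import re
from typing import List, Tuple

PREFIX = re.compile(r"(\d+)\.(\d+)_")
HELPERS = {"1.0_load_data.do"}  # called by other do files, not run on its own


def run_order(filenames: List[str]) -> List[Tuple[str, str]]:
    """Return (filename, tool) pairs in numerical prefix order, skipping helpers."""
    def key(name: str) -> Tuple[int, int]:
        match = PREFIX.match(name)
        if match is None:
            raise ValueError(f"no numeric prefix in {name!r}")
        return int(match.group(1)), int(match.group(2))

    plan = []
    for name in sorted(filenames, key=key):
        if name in HELPERS:
            continue
        tool = "stata" if name.endswith(".do") else "python"  # .ipynb notebook
        plan.append((name, tool))
    return plan


for name, tool in run_order(["3.1_table_2.do", "1.0_load_data.do",
                             "4.0_figure_1.ipynb", "2.1_cache_bootstrap_results.do"]):
    print(tool, name)
```

Sorting on parsed integers rather than raw strings keeps the order correct even if a prefix such as `10.0_` were ever added.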
The 2016 Integrated Household Panel Survey (IHPS) was launched in April 2016 as part of the Malawi Fourth Integrated Household Survey fieldwork operation. The IHPS 2016 targeted 1,989 households that were interviewed in the IHPS 2013 and that could be traced back to half of the 204 enumeration areas originally sampled as part of the Third Integrated Household Survey (IHS3) 2010/11. The 2019 IHPS was launched in April 2019 as part of the Malawi Fifth Integrated Household Survey fieldwork operations, targeting the 2,508 households that were interviewed in 2016. The panel sample expanded with each wave through the tracking of split-off individuals and the new households that they formed. Available as part of this project are the IHPS 2019 data, the IHPS 2016 data, and the re-released IHPS 2010 and 2013 data, which include only the subsample of 102 EAs with updated panel weights. Additionally, the IHPS 2016 was the first survey to receive complementary financial and technical support from the Living Standards Measurement Study – Plus (LSMS+) initiative, which has been established with grants from the Umbrella Facility for Gender Equality Trust Fund, the World Bank Trust Fund for Statistical Capacity Building, and the International Fund for Agricultural Development, and is implemented by the World Bank Living Standards Measurement Study (LSMS) team, in collaboration with the World Bank Gender Group and partner national statistical offices.
The LSMS+ aims to improve the availability and quality of individual-disaggregated household survey data and was, at its start, a direct response to the World Bank IDA18 commitment to support 6 IDA countries in collecting intra-household, sex-disaggregated household survey data on 1) ownership of and rights to selected physical and financial assets, 2) work and employment, and 3) entrepreneurship, following international best practices in questionnaire design and minimizing the use of proxy respondents while collecting personal information. This dataset is included here.
National coverage
The IHPS 2016 and 2019 attempted to track all IHPS 2013 households stemming from 102 of the original 204 baseline panel enumeration areas, as well as individuals who moved away from the 2013 dwellings between 2013 and 2016, provided that they were neither servants nor guests at the time of the IHPS 2013, were projected to be at least 12 years of age, and were known to be residing in mainland Malawi, excluding those on Likoma Island and in institutions such as prisons, police compounds, and army barracks.
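The tracking rules for individuals who left their 2013 dwellings combine into a single eligibility test. This Python sketch encodes them as a predicate; the field names and category codes (`"member"`, `"servant"`, `"guest"`, `"mainland"`, `"likoma"`) are hypothetical and do not correspond to actual IHPS variables.

```python
from dataclasses import dataclass

EXCLUDED_ROLES = {"servant", "guest"}  # roles at the time of the IHPS 2013


@dataclass(frozen=True)
class TrackingCandidate:
    role_2013: str        # hypothetical coding, e.g. "member", "servant", "guest"
    projected_age: int    # projected age at the time of the follow-up survey
    region: str           # hypothetical coding: "mainland" or "likoma"
    in_institution: bool  # e.g. prison, police compound, army barracks


def eligible_for_tracking(c: TrackingCandidate) -> bool:
    """Apply the tracking rules described for individuals who left 2013 dwellings."""
    return (
        c.role_2013 not in EXCLUDED_ROLES
        and c.projected_age >= 12
        and c.region == "mainland"
        and not c.in_institution
    )


print(eligible_for_tracking(TrackingCandidate("member", 14, "mainland", False)))  # → True
```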
Sample survey data [ssd]
A sub-sample of IHS3 2010 sample enumeration areas (EAs) (i.e. 204 EAs out of 768 EAs) was selected prior to the start of the IHS3 fieldwork, with the intention (i) to track and resurvey these households in 2013, in accordance with the IHS3 fieldwork timeline and as part of the Integrated Household Panel Survey (IHPS 2013), and (ii) to visit a total of 3,246 households in these EAs twice to reduce recall bias in different aspects of agricultural data collection. At baseline, the IHPS sample was selected to be representative at the national, regional and urban/rural levels and for each of the following 6 strata: (i) Northern Region - Rural, (ii) Northern Region - Urban, (iii) Central Region - Rural, (iv) Central Region - Urban, (v) Southern Region - Rural, and (vi) Southern Region - Urban. The IHPS 2013 main fieldwork took place from April to October 2013, with residual tracking operations in November-December 2013.
Given budget and resource constraints, for the IHPS 2016 the number of sample EAs in the panel was reduced to 102 out of the 204 EAs. As a result, the domains of analysis are limited to the national, urban and rural areas. Although the results of the IHPS 2016 cannot be tabulated by region, the stratification of the IHPS by region, urban and rural strata was maintained. The IHPS 2019 tracked all individuals 12 years or older from the 2016 households.
Computer Assisted Personal Interview [capi]
Data Entry Platform To ensure data quality and timely availability of data, the IHPS 2019 was implemented using the World Bank’s Survey Solutions CAPI software. To carry out IHPS 2019, 1 laptop computer and a wireless internet router were assigned to each team supervisor, and each enumerator had an 8–inch GPS-enabled Lenovo tablet computer that the NSO provided. The use of Survey Solutions allowed for the real-time availability of data as the completed data was completed, approved by the Supervisor and synced to the Headquarters server as frequently as possible. While administering the first module of the questionnaire the enumerator(s) also used their tablets to record the GPS coordinates of the dwelling units. Geo-referenced household locations from that tablet complemented the GPS measurements taken by the Garmin eTrex 30 handheld devices and these were linked with publically available geospatial databases to enable the inclusion of a number of geospatial variables - extensive measures of distance (i.e. distance to the nearest market), climatology, soil and terrain, and other environmental factors - in the analysis.
Data Management The IHPS 2019 Survey Solutions CAPI based data entry application was designed to stream-line the data collection process from the field. IHPS 2019 Interviews were mainly collected in “sample” mode (assignments generated from headquarters) and a few in “census” mode (new interviews created by interviewers from a template) for the NSO to have more control over the sample. This hybrid approach was necessary to aid the tracking operations whereby an enumerator could quickly create a tracking assignment considering that they were mostly working in areas with poor network connection and hence could not quickly receive tracking cases from Headquarters.
The range and consistency checks built into the application were informed by the LSMS-ISA experience with the IHS3 2010/11, IHPS 2013 and IHPS 2016. Because these checks were programmed into the data entry application in advance, a wide variety of range and consistency checks could be conducted and reported, and potential issues investigated and corrected, before the assigned enumeration area was closed. Headquarters (the NSO management) assigned work to the supervisors based on their regions of coverage. The supervisors then made assignments to the enumerators linked to their supervisor accounts. Work assignments and completed interviews were synced through a Wi-Fi connection to the IHPS 2019 server. Because the data were available in real time, they were monitored closely throughout the data collection period. On receipt at headquarters, the data were exported to Stata for additional consistency checks, data cleaning, and analysis.
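The following minimal Python sketch shows the kind of range and consistency rule such an application enforces. It uses the documentation's own example of a 12-year-old recorded as having postgraduate education. The education codes, the age bounds, and the names Member and check_member are hypothetical illustrations. They are not the IHPS 2019 questionnaire variables or Survey Solutions validation syntax.

```python
from dataclasses import dataclass

# Hypothetical minimum plausible age for each education level (illustrative only;
# these are not the IHPS 2019 codes or thresholds).
MIN_AGE_FOR_LEVEL = {"none": 0, "primary": 5, "secondary": 12, "tertiary": 17, "postgraduate": 21}


@dataclass
class Member:
    person_id: str
    age: int
    education: str


def check_member(m: Member) -> list[str]:
    """Return error messages for one household member; an empty list means no issues."""
    errors = []
    # Range check: age must fall inside an assumed plausible interval.
    if not 0 <= m.age <= 120:
        errors.append(f"{m.person_id}: age {m.age} out of range 0-120")
    # Range check: education must be one of the known (hypothetical) codes.
    if m.education not in MIN_AGE_FOR_LEVEL:
        errors.append(f"{m.person_id}: unknown education level {m.education!r}")
    # Consistency check: the education level must be plausible for the reported age.
    elif m.age < MIN_AGE_FOR_LEVEL[m.education]:
        errors.append(f"{m.person_id}: education {m.education!r} unlikely at age {m.age}")
    return errors


print(check_member(Member("p1", 12, "postgraduate")))
```

A range check examines one answer in isolation, while a consistency check compares two answers. In this sketch, an unknown education code skips the age comparison because there is no threshold to compare against.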
Data Cleaning
The data cleaning process was done in several stages over the course of fieldwork and through preliminary analysis. The first stage was conducted in the field by the field teams, using error messages generated by the Survey Solutions application when a response did not fit the rules for a particular question. For each question that flagged an error, the enumerator was expected to record a comment within the questionnaire. The comment explained the reason for the error to the supervisor and confirmed that the response had been double-checked with the respondent. The supervisors were expected to sync the enumerators' tablets as frequently as possible, both to avoid accumulating many questionnaires on a tablet and to enable daily checks of questionnaires. Some supervisors preferred to review completed interviews on the tablets before syncing, but they still recorded their notes in the supervisor account and rejected questionnaires accordingly. The second stage of data cleaning was also done in the field. It was based on additional error reports generated in Stata, which were sent to the field teams via email or Dropbox. The field supervisors collected the reports for their assignments and, in coordination with the enumerators, reviewed, investigated, and corrected the errors. Because error reports were turned around quickly, call-backs could be conducted when required while the team was still working in the EA. Corrections to the data were entered in the rejected questionnaires, which were then sent back to headquarters.
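The first-stage review rule can be sketched as a simple approve-or-reject decision. Under this rule, a flagged error may stay in an interview only if the enumerator has explained it. This hypothetical Python illustration assumes that any non-blank comment counts as an explanation. It does not use or reproduce the Survey Solutions API. In practice, supervisors also judged whether each comment was convincing.

```python
from dataclasses import dataclass, field


@dataclass
class FlaggedQuestion:
    question: str
    message: str
    comment: str = ""  # enumerator's explanation; empty if none was recorded


@dataclass
class Questionnaire:
    interview_id: str
    flags: list[FlaggedQuestion] = field(default_factory=list)


def review(q: Questionnaire) -> tuple[str, list[str]]:
    """Approve if every flagged error carries a non-blank comment; otherwise reject.

    Returns the decision and the questions the enumerator must revisit.
    """
    unexplained = [f.question for f in q.flags if not f.comment.strip()]
    if unexplained:
        return "rejected", unexplained
    return "approved", []


ok = Questionnaire("hh1", [FlaggedQuestion("age", "out of range", "confirmed with respondent")])
print(review(ok))
```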
The data cleaning process was done in several stages over the course of the fieldwork and through preliminary analyses. The first stage was during the interview itself. Because CAPI software was used, error messages appeared immediately whenever the information recorded did not match the rules previously defined for that variable. For example, an error would be raised if the education level of a 12-year-old respondent were recorded as postgraduate. The second stage occurred during the review of the questionnaire by the Field Supervisor. The Survey Solutions software allows errors to remain in the data if the enumerator does not make a correction. Instead, the enumerator can write a comment explaining why the data appear to be incorrect, for example that the 12-year-old mentioned above was, in fact, a genius who had completed graduate studies. The next stage occurred when the data were transferred to headquarters, where the NSO staff would again review the data for errors and verify the comments from the
These data include total anchovy and sardine biomass west and southeast of Cape Agulhas (sampled in November), and anchovy and sardine recruitment west of Cape Infanta (sampled in May). Seabird variables included the percentage of the diet composed of anchovy, the percentage of the diet composed of sardine, breeding success, and survival.
Benguela Current African Penguin: These data were collected at two seabird colonies: Dassen Island (-33.4205 lat, 18.0872 lon) and Robben Island (-33.8067 lat, 18.371 lon), South Africa.
Benguela Current Cape Gannet: These data were collected at two seabird colonies: Lamberts Bay (-32.0896 lat, 18.3026 lon) and Malgas Island (-33.0526 lat, 17.9254 lon), South Africa.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. PERCEIVE regional panel datasets: secondary data collected from Eurostat, the EU Commission on Structural Fund expenditures, and quality of government, for NUTS 1, 2 and 3 regions from 1990-2015 (Stata files). See the codebook for more detail about the variables.
2. Flash Eurobarometer survey data on "Awareness of EU Regional Policy" and questionnaires (Stata files)
3. Standard Eurobarometer survey data (annual, 2000-2016) and questionnaires (Stata files)
4. Expenditure data on EU Structural Funds, latest three budget periods (2000-2020) (Excel file)
5. Original PERCEIVE survey data (Stata file), description of survey questions, and descriptive results (Word file)
Stata do-files for the replication of the regression analyses of chapters 2, 3 and 4 of the thesis titled "Learning, Capabilities and Governance in Global Value Chains", by Caio Torres Mazzi
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Clearance Certificate No.: 2018FBREC575
A positivist philosophy underpins the study. A quantitative survey method was used to collect cross-sectional data from 409 purposively selected women-owned SMEs in Kigali. Validity and reliability were tested. The data were analysed using the Statistical Package for the Social Sciences (SPSS) version 26 and Stata 16. Ethical guidelines for researchers were applied during data collection.
The data were gathered from 409 women-owned SMEs in Kigali through personal surveys, using computer-assisted personal interviewing and close-ended questionnaires. The researcher composed the questionnaire and consulted the CPUT statistician, who helped adjust it to meet the validity and reliability criteria required for statistical analysis in SPSS and Stata. He conducted a pilot test with 15 women entrepreneurs in Kigali to ensure that the questionnaire was comprehensive, free from errors and bias, and easy to answer.
The questionnaire contained seven sections:
1) Demographic information was gathered using dichotomous, multiple-choice, fill-in, filter and partially closed questions.
2) Dichotomous, multiple-choice and follow-up questions collected information on the knowledge and skills of women-owned SMEs in Kigali.
3) Business profile data were collected using fill-in, multiple-choice, dichotomous, and filter questions.
4) Multiple-choice questions were used to collect motivation and opinion data from women-owned businesses in Kigali.
5) A Likert scale was used to measure the constraints faced by women-owned SMEs in Kigali.
6) Dichotomous, multiple-choice, fill-in, filter, partially closed, and Likert scale questions were used to identify ICT solutions to the constraints faced by women-owned SMEs in Kigali.
7) Dichotomous, multiple-choice, filter, partially closed, and Likert scale questions were used to collect the views of women-owned businesses in Kigali on government and stakeholder efforts and policies to facilitate and promote the integration of ICT among women's businesses in Kigali.
Secondary data, including organisational reports, government publications, journal articles, and theses, were collected through the literature review.
Data were collected in accordance with the CPUT code of ethical conduct, and the researcher obtained the consent of respondents. The data were analysed using SPSS and Stata and presented in graphs, charts and tables.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This set of replication materials contains 18 files (1 READ ME document, 1 Data Codebook document, 2 Stata database files, 7 Stata program files, and 7 Stata output files, one for each program file). Together they perform the statistical analyses included in both the article manuscript and the Online Appendix documents. Please consult the READ ME document before proceeding. Note that the descriptive statistics appearing in Online Appendix A are produced by the "Manuscript" program code and its corresponding output files. All statistical graphics (Stata *.gph files) were generated from the code contained in the program files listed below.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de449982
Abstract (en): The National Incident-Based Reporting System (NIBRS) is a part of the Uniform Crime Reporting Program (UCR), administered by the Federal Bureau of Investigation (FBI). The extract files version of NIBRS was created to simplify working with NIBRS data. Data management issues with NIBRS are significant, especially when two or more segment levels are being merged. These issues require skills separate from data analysis. NIBRS data as formatted by the FBI are stored in a single file. These data are organized by various segment levels (record types). There are six main segment levels: administrative, offense, property, victim, offender, and arrestee. Each segment level has a different length and layout. There are other segment levels that occur with less frequency than the six main levels. Significant computing resources are necessary to work with the data in its single-file format. In addition, the user must be sophisticated in working with data in complex file types. For these reasons and the desire to facilitate the use of NIBRS data, ICPSR created the extract files. The data are not a representative sample of crime in the United States. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: created variable labels and/or value labels; standardized missing values; performed recodes and/or calculated derived variables; checked for undocumented or out-of-range codes.
Datasets: DS0: Study-Level Files; DS1: Incident-Level File; DS2: Victim-Level File; DS3: Offender-Level File; DS4: Arrestee-Level File.
Law enforcement agencies in the United States participating in the National Incident-Based Reporting System.
Smallest Geographic Unit: city
2018-09-27: This study has been updated to include Stata setup files and Stata system files for all data sets.
Funding institution(s): United States Department of Justice. Office of Justice Programs. Federal Bureau of Investigation; United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics.
self-enumerated questionnaire
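The abstract notes that data management becomes difficult when two or more segment levels are merged. The core operation is a one-to-many merge, in which each incident-level record is attached to every victim record that shares its incident identifier. The Python sketch below assumes the hypothetical field names incident_id, date and victim_seq. These are not the variable names used in the ICPSR extract files, and the records are toy values.

```python
def merge_incident_victims(incidents, victims, key="incident_id"):
    """Attach incident-level fields to each victim record (a one-to-many merge).

    incidents: list of dicts, one per incident (the "one" side).
    victims: list of dicts, possibly several per incident (the "many" side).
    Returns the merged rows and any victim records whose incident was not found.
    """
    by_id = {}
    for inc in incidents:
        # The "one" side must be unique on the key, otherwise the merge is ambiguous.
        if inc[key] in by_id:
            raise ValueError(f"duplicate incident key {inc[key]!r}")
        by_id[inc[key]] = inc
    merged, unmatched = [], []
    for v in victims:
        inc = by_id.get(v[key])
        if inc is None:
            unmatched.append(v)
        else:
            # Victim fields override incident fields if a name clashes.
            merged.append({**inc, **v})
    return merged, unmatched


# Toy records for illustration only.
incidents = [{"incident_id": 1, "date": "2016-01-02"}, {"incident_id": 2, "date": "2016-01-03"}]
victims = [{"incident_id": 1, "victim_seq": 1}, {"incident_id": 1, "victim_seq": 2},
           {"incident_id": 3, "victim_seq": 1}]
merged, unmatched = merge_incident_victims(incidents, victims)
print(len(merged), "merged rows;", len(unmatched), "unmatched victim record(s)")
```

An incident can have several victims and several offenders. Merging two "many" levels (for example, victims and offenders) on the incident key alone therefore yields one row per victim-offender pair, which is one reason such merges need care.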
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Zimbabwe has a high cervical cancer (CC) burden of 19% and a mortality rate of 64%. Zimbabwe uses Visual Inspection with Acetic Acid and Cervicography (VIAC) for CC screening. Between October 2020 and September 2021, Manicaland and Midlands provinces recorded a low VIAC positivity of 3% (target 5–25%) and a treatment coverage of 78% (target 90%).
Objectives: We explored the VIAC positivity rate and the clinical management of clients screening positive in Manicaland and Midlands provinces.
Methods: We conducted a retrospective cross-sectional study using routine VIAC and CC management data for the period October 2020 to September 2021. Two samples were used: 1) a sample drawn from 48,000 women screened with VIAC, to measure the positivity rate, and 2) a sample of 1,763 VIAC-positive women, to assess clinical management. A Kobo-based tool was used to abstract data from facility registers, and data were analyzed using Stata 15.
Results: We analyzed data for 2,454 of the 48,000 women screened through VIAC. About 82% (2,007/2,454) were HIV positive. Median ages were 40 years for HIV-positive women and 38 years for HIV-negative women. Most HIV-positive (64%) and HIV-negative (77%) clients were married. Among women screened for the first time, VIAC positivity was 5.9% for HIV-positive women and 3.4% for HIV-negative women. At repeat visits, it was 3.2% and 5.6%, respectively. Overall, 89.1% (1,571/1,763) of VIAC-positive women received treatment. The most common treatment, received by 41% of those treated, was thermocoagulation. Overall, 43.1% of clients received treatment on the day of VIAC, and 77.4% within 30 days. Six-month post-treatment coverage was 3.8%.
Conclusion: VIAC positivity among HIV-positive women screening for the first time was 5.9%, within the expected range of 5–25%. Treatment coverage was high, and the turnaround time from diagnosis to treatment met national standards. Post-treatment coverage was suboptimal.
We recommend continued implementation of quality improvement initiatives, capacity building of clinicians, and optimization of post-treatment review of clients.
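The headline proportions in the Results are ratios of the counts reported alongside them. The short Python check below reproduces two of them. The helper name pct is illustrative, and the only inputs are the counts stated in the abstract.

```python
def pct(numerator: int, denominator: int, digits: int = 1) -> float:
    """Return numerator/denominator as a percentage rounded to `digits` decimal places."""
    return round(100 * numerator / denominator, digits)


# Counts as reported in the Results.
treatment_coverage = pct(1_571, 1_763)  # VIAC-positive women who received treatment
hiv_positive_share = pct(2_007, 2_454)  # analysed women who were HIV positive

print(treatment_coverage, hiv_positive_share)
```

The HIV-positive share works out to 81.8%, which the abstract rounds to about 82%. The treatment coverage of 89.1% sits just below the 90% target mentioned in the Background.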
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.
Instructions:
Please note: I use both Stata and Jupyter Notebook interactively, running a block of a few lines of code at a time. Expect to have to change directories, file names, etc.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The pooled cross-sectional dataset (1359 firms in total) available for download has some specific features:
To get a feel for the sectoral distribution of the sample, type in Stata: tab service year
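In Stata, tab service year produces a two-way frequency table of the variables service and year. For readers working outside Stata, the sketch below gives a rough Python equivalent that uses only the standard library. The toy records and their coding are invented for illustration and are not drawn from the dataset.

```python
from collections import Counter


def two_way_table(rows, row_var, col_var):
    """Count observations for every (row_var, col_var) combination, like a two-way tabulation."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_levels = sorted({rv for rv, _ in counts})
    col_levels = sorted({cv for _, cv in counts})
    # Counter returns 0 for combinations that never occur, so empty cells show as 0.
    return {rv: {cv: counts[(rv, cv)] for cv in col_levels} for rv in row_levels}


# Toy records invented for illustration; not drawn from the dataset.
firms = [
    {"service": 0, "year": 2002},
    {"service": 1, "year": 2002},
    {"service": 1, "year": 2005},
    {"service": 1, "year": 2005},
]
table = two_way_table(firms, "service", "year")
for service, by_year in table.items():
    print(service, by_year, "total:", sum(by_year.values()))
```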
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DOI of article: 10.1016/j.worlddev.2024.106813
The analysis for this article was performed in STATA/MP 18.0. The replication instructions include all code needed to reproduce the regression results and figures presented in the main text of the article and the appendices.
Data were collected using an online survey of taxonomists, other types of scientists, and users of taxonomic information. The data were cleaned for analysis according to the standards recorded in the survey codebook, which is also available on Dryad and associated with this manuscript. Data cleaning was performed using Stata. Full information about the survey methods is available in the accompanying article and in the survey methods supplemental data, also available on Dryad. This survey was pre-registered with the Open Science Framework with a full description of survey development, implementation, and analysis methods: https://osf.io/tz7ra/?view_only=4b1bc810ef794f7f9bb57240611989af
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth allow developers to identify the specific releases where significant change took place, and they reveal the longer-term trend in the distribution profile. This knowledge helps developers record systemic and substantial changes to a release and provides useful input to a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. Yet to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications. With this knowledge, they can take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects. Different studies suggest that complex and large classes tend to undergo more changes, and that classes that changed recently are likely to be modified again in the near future. Though this guidance is helpful, developers need more specific guidance for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system, as well as support effort estimation activities. The specific research questions that we address in this chapter are: (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project specific, or general?
(2) How is modification frequency distributed for classes that change? (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications? (4) Does structural complexity make a class susceptible to change? (5) Does popularity make a class more change-prone? We make recommendations that can help developers proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55,000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions. The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) are provided as comma-separated values (CSV), and the first line of each CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
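Research question (1) asks how likely a class is to change from one version to the next, and the text stresses that the metric distributions are highly skewed. The minimal Python sketch below illustrates both ideas under stated assumptions. The input is CSV text whose first line is a header, with hypothetical columns changed (0/1) and magnitude. These column names and the toy rows are not taken from the released files, and the authors' own analysis was carried out in Stata.

```python
import csv
import io
from statistics import median


def change_summary(csv_text: str) -> dict:
    """Summarise class-level change records from CSV text whose first line is a header.

    Assumes hypothetical columns: 'changed' (1 if the class changed in the next
    version, else 0) and 'magnitude' (size of that change).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    changed = [r for r in rows if r["changed"] == "1"]
    magnitudes = [float(r["magnitude"]) for r in changed]
    return {
        # Estimated probability that a class changes from one version to the next.
        "p_change": len(changed) / len(rows),
        # Median alongside mean: with skewed magnitudes the median is more representative.
        "median_magnitude": median(magnitudes) if magnitudes else 0.0,
        "mean_magnitude": sum(magnitudes) / len(magnitudes) if magnitudes else 0.0,
    }


# Toy data for illustration only.
sample = """class,version,changed,magnitude
A,1,1,2
B,1,0,0
C,1,1,3
D,1,1,95
"""
print(change_summary(sample))
```

With one large change among several small ones, the mean is pulled far above the typical value while the median stays put. This is why a skew-aware summary is preferable for data of this kind.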