Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Top Schools of The Stata Journal sorted by citations.
The impact of public transit (PT) on income sorting in U.S. cities has long been debated. Theory suggests that richer households may cluster near PT stations to minimize commute time – or avoid them in favor of more convenient automobile commuting. The equilibrium depends on factors such as PT speed relative to cars and the income gap between rich and poor households. Empirical evidence supports both possibilities, but prior multi-city studies suffer from identification flaws. Using data from 21 U.S. light-rail (LR) systems built or expanded since 1991, this study estimates the effect of new LR stations on nearby neighborhood incomes. My event-study design improves upon earlier work by constructing controls that match pre-treatment conditions and trends in treated station areas and by correcting for the bias that staggered treatment timing can introduce to event study estimates. Across the pooled sample, there is little evidence that new LR stations make surrounding neighborhoods poorer...
This README.txt file was generated on 2025-10-04 by Erik Nelson.
GENERAL INFORMATION
The Stata .do files in this repository generate the results that are plotted or presented in table format in the paper "The impact of light-rail stations on income sorting in US urban areas." All .do files load the needed datasets. All datasets are in .xlsx format. Each Excel file contains data for the urban area that is part of the file's name. The data in each Excel file are in panel form. Each observation in a dataset represents a treated or control area i in urban area u in year t. We observe each area i's average nominal per capita and median HH income in year t = 1990, 2000, 2010, 2017, 2019, 2021, and 2022 (thes...
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place, as well as to inform them of the longer-term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, and provides useful input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed.
To manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and that classes that changed recently are likely to undergo modifications in the near future. Though this guidance is helpful, developers need more specific guidance for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system, as well as support effort estimation activities.
The specific research questions that we address in this chapter are:
(1) What is the likelihood that a class will change from a given version to the next?
(a) Does this probability change over time?
(b) Is this likelihood project specific, or general?
(2) How is modification frequency distributed for classes that change?
(3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications?
(4) Does structural complexity make a class susceptible to change?
(5) Does popularity make a class more change-prone?
We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions.
The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided as a comma separated values (CSV) file, and the first line of the CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data from an online experiment designed to test whether economically equivalent penalties—fees (paid before taking) and fines (paid after taking)—influence prosocial behaviour differently. Participants played a modified dictator game in which they could take points from another participant.
The dataset is provided in Excel format (Full-data.xlsx), along with a Stata do-file (submit.do) that reshapes, cleans, and analyses the data.
Platform: oTree
Recruitment: Prolific
Sample size: 201 participants
Design: Each participant played 20 rounds: 10 in the control condition and 10 in one treatment condition (fee or fine). Order of blocks was randomised.
Payment: 200 points = £1. One round was randomly selected for payment.
session – Session number
id – Participant ID
treatment – Assigned treatment (1 = Fee, 2 = Fine)
order – Order of blocks (0 = Control first, 1 = Treatment first)
For each round, participants made decisions in both control (c) and treatment (t) conditions.
c1, t1, c2, t2, … – Tokens available and/or allocated across control and treatment rounds.
takeX – Amount taken from the other participant in case X.
Social norms were elicited after the taking task. Variables include empirical, normative, and responsibility measures at both extensive and intensive margins:
eyX, etX – Empirical expectations (beliefs about what others do)
nyX, ntX – Normative expectations (beliefs about what others think is appropriate)
ryX, rtX – Responsibility measures
casenormX – Case identifier for norm elicitation
From survey responses:
Sex – Gender
Ethnicitysimplified – Simplified ethnicity category
Countryofresidence – Participant’s country of residence
order, session – Experimental setup metadata
(analysis.do)
The .do file performs the following steps:
Data Preparation
Import raw Excel file
Reshape from wide to long format (cases per participant)
Declare panel data (xtset id)
Variable Generation
Rename variables for clarity (e.g., take for amount taken)
Generate treatment dummies (treat)
Construct demographic dummies (gender, race, nationality)
Analysis Preparation
Create extensive and intensive margin variables
Generate expectation and norm measures
Output
Ready-to-analyse panel dataset for regression and statistical analysis
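The wide-to-long reshape listed in the steps above can be illustrated outside Stata. The following is a minimal Python sketch, not the authors' do-file. It assumes one wide row per participant with columns such as take1, take2 (following the takeX naming in the variable list) and produces one row per participant and case. The function name reshape_long and all example values are hypothetical.

```python
import re


def reshape_long(rows, stubs, id_col="id"):
    """Turn wide rows (take1, take2, ...) into one row per (id, case)."""
    # Match column names made of a known stub followed by a case number.
    pattern = re.compile(r"^(%s)(\d+)$" % "|".join(re.escape(s) for s in stubs))
    long_rows = {}
    for row in rows:
        # Columns that are not stub+number are treated as participant-level.
        fixed = {k: v for k, v in row.items() if not pattern.match(k)}
        for name, value in row.items():
            match = pattern.match(name)
            if match is None:
                continue
            stub, case = match.group(1), int(match.group(2))
            key = (row[id_col], case)
            record = long_rows.setdefault(key, {**fixed, "case": case})
            record[stub] = value
    # Sort by participant, then case, like a panel ordered on (id, case).
    return [long_rows[key] for key in sorted(long_rows)]


# Hypothetical values, for illustration only.
wide = [
    {"id": 1, "treatment": 1, "take1": 30, "take2": 0},
    {"id": 2, "treatment": 2, "take1": 10, "take2": 50},
]
long_data = reshape_long(wide, stubs=["take"])
for record in long_data:
    print(record)
```

Passing several stubs (for example "take", "ey", "ny") would place each measure in its own column of the long rows, which is roughly what a multi-stub reshape does.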
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Full antenatal care utilization is a key intervention that creates the opportunity to provide all the necessary health services during pregnancy, with the aim of reducing maternal and newborn morbidity and mortality. However, there is still a gap in the use of this service between rural and urban women. This study therefore aimed to identify the sources of variation in full antenatal care utilization between the rural and urban areas of Ethiopia.
Methods
The study used data from the nationally representative Mini Demographic and Health Survey (DHS) of Ethiopia. The data were collected from March 21, 2019, to June 28, 2019, in all regions of Ethiopia. A two-stage cluster sampling technique was used to select the study participants. This study included a weighted sample of about 3,927 women aged 15 to 49 years. A multivariate decomposition analysis was performed to split the rural-urban disparity in full antenatal care utilization into components attributable to endowments and to coefficients.
Results
The prevalence of full antenatal care utilization was 43.25% (95% CI: 41.7%, 44.8%). Prevalence differed between rural and urban women (rural prevalence was 27.73%, while in urban areas it was 15.52%). These results showed a statistically significant gap in full antenatal care utilization between rural and urban women (-0.21807; 95% CI: -0.27397, -0.16217). The majority of the gap, 76.84%, was explained by the covariate distribution (endowments), and the remaining 23.16% was due to differences in the effects of covariates (coefficients). Differences in educational status, wealth status, religion, region, birth order, and parity between urban and rural women explain most of the disparity in full antenatal care utilization.
Conclusion and recommendations
There is a significant disparity in full antenatal care utilization between rural and urban women in Ethiopia. This variation was largely explained by differences in characteristics (endowments). To decrease this gap, emphasis should be given to resource distribution targeting rural households, improvement of maternal education, and creating a platform to access information about the service and its relevance.
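To make the reported split concrete, the short Python sketch below divides the quoted gap into its two parts using the percentages given in the Results. It is only arithmetic on the published figures, not a re-run of the multivariate decomposition; the function name decompose_gap is hypothetical.

```python
def decompose_gap(gap, endowment_share):
    """Split a total gap into endowment and coefficient parts by share."""
    endowment = gap * endowment_share
    coefficient = gap - endowment
    return endowment, coefficient


# Figures quoted in the abstract above.
gap = -0.21807
endowment, coefficient = decompose_gap(gap, endowment_share=0.7684)
print(f"endowments:   {endowment:.5f}")
print(f"coefficients: {coefficient:.5f}")
```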
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of the full database of financial and operational data of Portuguese firms covering the period 2014-2019. Geographical location data is also shared so that the spatial weights matrix can be constructed. The Stata .do file with the routines explained in the manuscript is shared as well. Any questions or inquiries should be addressed to samuelf@utad.pt.
The files provided within this .zip file are meant to reproduce the tables and figures included in the article "Tabloid Media Campaigns and Public Opinion: Quasi-Experimental Evidence on Euroscepticism in England" by Florian Foos and Daniel Bischof in the APSR.
Notice:
- This is a fully reproducible archive written in Stata's project environment: https://www.statalist.org/forums/forum/general-stata-discussion/general/1302147-how-project-from-ssc-is-different-from-stata-built-in-project
- As the code is written in a project environment, we advise all users to carefully read the README.TXT in order to understand how reproduction in Stata's project environment works.
- The largest part of our analyses is based on yearly attitudinal data from the British Social Attitudes Survey (BSA): https://www.bsa.natcen.ac.uk
The BSA does not allow researchers to upload these data as part of their replication files; we are also not allowed to upload a recoded version of the data file. However, all yearly BSA surveys are available via the UK Data Service. In order to reproduce the results reported in this paper, you will need to a) register with the UK Data Service (https://beta.ukdataservice.ac.uk/myaccount/login) and b) access and download the relevant .dta files and place them into the replication archive (data_original/BSA/*YEAR*).
We experimentally investigate cooperation in 14 centres of a mentoring program where participants have two possible natural identities: individuals raised under legal guardianship, suffering a negative stereotype (G; n=112), and users without such a social stigma (NG; n=82). Participants played a Prisoners' Dilemma game with an anonymous partner from the same centre (centre-ingroup) and from another centre (centre-outgroup). The folder contains the raw data in csv and dta (STATA) format and the script (STATA do file) used to define all the variables used and conduct all the analyses reported in the article. The analyses in the script appear in the exact order of appearance in the text, starting from the Methods section and then the Results section.
# Data from: Conflicting identities and cooperation between groups: Experimental evidence from a mentoring program
Associated article: Antonio M. Espín, María Paz Espinosa, María J. Vázquez-De Francisco, Pablo Brañas-Garza. Proceedings B. DOI: 10.1098/rspb.2025.1363
The folder contains the raw data in csv (Espin_et_al_ProcB_dataset.csv) and STATA (Espin_et_al_ProcB_dataset.dta) format, and the script (STATA do file: Espin_et_al_ProcB_script.do) used to define all the variables used and conduct all the analyses reported in the article. The analyses in the script appear in the exact order of appearance in the associated manuscript text, starting from the Methods section and then the Results section.
Variable description:
cod_center - code of the centre the participant belongs to
id - id code of the participant
id_incenter - id code of the participant within the centre
pop_propGcenter - proportion of G users (raised under legal guardianship) in the centre population based on administr...
We confirm that we received explicit consent from participants to publish the de-identified data in the public domain. All participants are identified by an anonymous numeric code in the data files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Antenatal care (ANC) is the care given to pregnant women by qualified medical experts in order to guarantee optimal health conditions for the mother and the unborn child during pregnancy. Four or fewer ANC visits are strongly linked to maternal and perinatal death. Because of this, the World Health Organization created a new model recommending a minimum of eight antenatal care (ANC8+) contacts. This study focuses on the current antenatal care contact model, which has not previously been addressed. The aim of this study is therefore to investigate time to first antenatal care contact and its predictors among pregnant women at Bishoftu General Hospital, 2023/24.
Methods: An institution-based cross-sectional study was conducted among 347 study participants selected by systematic random sampling. The data were collected using pretested, structured questionnaires. Data were entered into Epi Data version 4.6 and analyzed using STATA 15. Descriptive summary statistics such as median survival time, the Kaplan-Meier survival curve, and the log-rank test were computed. Bivariate and multivariable Weibull regression models were fitted to identify predictors of the time to first antenatal care contact. A hazard ratio with a 95% confidence interval was calculated, and p-values < 0.05 were considered statistically significant.
Ethical approval and informed consent
Ethical clearance was obtained from the institutional Research Ethics Review Board (IRB) of Arsi University (reference number A/CHS/18/2023). In addition, a letter of ethical approval was sent to Bishoftu General Hospital to obtain permission from the hospital's administrators. Informed, voluntary, verbal consent was obtained from the head of the hospital and from the mothers. There were no study participants under the age of 18 years. Before conducting the interviews, information was given to the participants, and they were assured of voluntary participation, confidentiality, and freedom to withdraw from the study at any time. The nature and significance of the study were explained to the participants.
Data collection tool and procedures
To ensure the quality of data at the beginning, a data collection questionnaire was pre-tested on 5% of the calculated sample size at Chelelaka Health Center, and necessary modifications will be made based on gaps identified in the questionnaire. Any error found during the process of checking will be corrected, and modifications will be made to the final version of the data abstraction format. Training will be given to data collectors and supervisors for one day before the actual data collection task on the already existing records, with half a day of theoretical and half a day of practical training. Data quality will be controlled by designing proper data collection materials and through continuous supervision. All completed data collection forms will be examined for completeness and consistency during data management, storage, cleaning, and analysis. The data will be entered and cleaned by the principal investigator before analysis. Midwives who are working in the maternity ward will collect the data. The principal investigator of the study will control the overall activity.
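The Methods above mention the Kaplan-Meier curve and median survival time. As a rough illustration of what those quantities are, here is a minimal Python sketch of the product-limit estimator. It is not the study's STATA code, and the times and event flags are hypothetical. In this setting the "event" would be the first ANC contact, so "survival" is the share of women who have not yet had a first contact; a flag of 0 marks a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    index = 0
    while index < len(data):
        t = data[index][0]
        # All observations sharing this time (events and censored alike).
        tied = [e for tt, e in data if tt == t]
        n_events = sum(1 for e in tied if e)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= len(tied)
        index += len(tied)
    return curve


def median_time(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical data: weeks of gestation at first contact (1) or censoring (0).
weeks = [8, 12, 12, 16, 20, 24]
contact = [1, 1, 0, 1, 1, 0]
curve = kaplan_meier(weeks, contact)
median = median_time(curve)
print(curve)
print("median time:", median)
```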
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE CENTRAL AGENCY FOR PUBLIC MOBILIZATION AND STATISTICS (CAPMAS)
In any society, the human element is the basis of the work force that carries out all service and production activities. It is therefore mandatory to produce labor force statistics and studies related to the growth and distribution of manpower and to the distribution of the labor force by different types and characteristics.
In this context, the Central Agency for Public Mobilization and Statistics conducts the "Quarterly Labor Force Survey", which includes data on the size of manpower and the labor force (employed and unemployed) and their geographical distribution by their characteristics.
By the end of each year, CAPMAS issues the annual aggregated labor force bulletin publication that includes the results of the quarterly survey rounds that represent the manpower and labor force characteristics during the year.
----> Historical Review of the Labor Force Survey:
1- The first Labor Force Survey was undertaken in 1957. The first round was conducted in November of that year, and the survey has continued to be conducted in successive rounds (quarterly, bi-annually, or annually) until now.
2- Starting with the October 2006 round, the fieldwork of the labor force survey was developed to focus on the following two points: a. The importance of using the panel sample that is part of the survey sample to monitor the dynamic changes of the labor market. b. Improving the questionnaire to include more questions that help to better define the relationship to the labor force of each household member (employed, unemployed, out of labor force, etc.), in addition to reordering some of the existing questions in a more logical way.
3- Starting with the January 2008 round, the methodology was developed to collect a more representative sample during the survey year. This is done by distributing the sample of each governorate into five groups; the questionnaires are collected from each of them separately every 15 days for 3 months (in the middle and at the end of the month).
----> The survey aims at covering the following topics:
1- Measuring the size of the Egyptian labor force among civilians (for all governorates of the republic) by their different characteristics.
2- Measuring the employment rate at national level and different geographical areas.
3- Measuring the distribution of employed people by the following characteristics: gender, age, educational status, occupation, economic activity, and sector.
4- Measuring unemployment rate at different geographic areas.
5- Measuring the distribution of unemployed people by the following characteristics: gender, age, educational status, unemployment type "ever employed/never employed", occupation, economic activity, and sector for people who have ever worked.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a sample of urban and rural areas in all the governorates.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
----> Sample Design and Selection
The sample of the LFS 2006 survey is a simple systematic random sample.
----> Sample Size
The sample size varied by quarter (Q1 = 19429, Q2 = 19419, Q3 = 19119 and Q4 = 18835 households), for a total of 76802 households annually. These households are distributed at the governorate level (urban/rural).
A more detailed description of the different sampling stages and allocation of sample across governorates is provided in the Methodology document available among external resources in Arabic.
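As a quick consistency check on the sample sizes quoted above, the Python sketch below sums the four quarterly figures and shows each quarter's share of the annual total. It only uses the numbers stated in the text; the variable names are illustrative.

```python
# Quarterly household counts as stated in the sample size description.
quarterly_households = {"Q1": 19429, "Q2": 19419, "Q3": 19119, "Q4": 18835}

annual_total = sum(quarterly_households.values())
shares = {q: n / annual_total for q, n in quarterly_households.items()}

print("annual total:", annual_total)
for quarter, share in shares.items():
    print(f"{quarter}: {share:.1%} of the annual sample")
```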
Face-to-face [f2f]
The questionnaire design follows the latest International Labor Organization (ILO) concepts and definitions of labor force, employment, and unemployment.
The questionnaire comprises 3 tables in addition to the identification and geographic data of household on the cover page.
----> Table 1- Demographic and employment characteristics and basic data for all household individuals
Including: gender, age, educational status, marital status, residence mobility and current work status
----> Table 2- Employment characteristics table
This table is filled in for individuals who were employed at the time of the survey or who were engaged in work during the reference week, and provides information on:
- Relationship to employer: employer, self-employed, waged worker, and unpaid family worker
- Economic activity
- Sector
- Occupation
- Effective working hours
- Work place
- Average monthly wage
----> Table 3- Unemployment characteristics table
This table is filled in for all unemployed individuals who satisfied the unemployment criteria, and provides information on:
- Type of unemployment (unemployed, unemployed ever worked)
- Economic activity and occupation in the last held job before being unemployed
- Last unemployment duration in months
- Main reason for unemployment
----> Raw Data
Office editing is one of the main stages of the survey. It started once the questionnaires were received from the field and was carried out by the selected work groups. It includes:
a- Editing of coverage and completeness
b- Editing of consistency
----> Harmonized Data
Attribution-NonCommercial 3.0 (CC BY-NC 3.0) https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
The present dataset contains data on cases and controls extracted from the Animal Tumours Registry of the Lazio region (Italy). The Excel file contains three worksheets: the first is the raw dataset, the second is the STATA-coded dataset with the coding legend, and the third is the coded dataset, provided to allow statistical analysis in STATA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Percentage distribution of socio-demographic characteristics among respondents, 2005, 2011 and 2016 EDHS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the database and the Stata commands used to generate the results reported in the paper with the same name (currently under review).
What is the Active Prevalence of COVID-19?
By Mu-Jeung Yang, Marinho Bertanha, Nathan Seegert, Maclean Gaulin, Adam Looney, Brian Orleans, Andrew T. Pavia, Kristina Stratford, Matthew Samore, Steven Alder
Code repository to recreate the figures and tables in “What is the Active Prevalence of COVID-19?”, Review of Economics and Statistics, 2023
Data
• Our primary data on COVID-19 positivity rates and case counts are publicly available from covidtracking.com
• Population data for Utah is publicly available from the Census Bureau.
• Our testing data used to calibrate our model contains sensitive private information, and is thus not available for distribution. However, researchers interested in replicating this part of the analysis can apply with an email to mjyang@ou.edu, for an anonymized and randomized subsample that replicates our main results. Decisions about data sharing will be made on a case-by-case basis.
Instructions
Code can generally be run in the numerical order presented in the filenames. All but one are Stata files, run using Stata 17 (but they should be generally compatible with other versions):
1. 1.0_load_data.do is run by other files, not individually.
2. 1.1_cache-load_lasso_data.do is used to create the dataset for lasso regressions, which use interactions. This file makes those interaction variables, and names them appropriately to be used in loops and with Stata’s * notation.
3. 2.1_cache_bootstrap_results.do caches the CIs from our SE bootstrap procedure, because it takes a long time to run. Caches bootstrap results to ./output/bootstrap/.
4. 3.0_table_1.do creates summary statistics and tex variables to be used in the paper.
5. 3.1_table_2.do creates table 2, which uses bootstrap SEs, so 2.1_cache_bootstrap_results.do should have been run first. Also saves off data to a temporary file for use in making figures below.
6. 3.2_table_3.do makes the state estimates in table 3.
7. 4.0_figure_1.ipynb uses python to generate Figure 1.
8. 4.1_figure_2.do makes both panels of figure 2, using the cached file from 3.1_table_2.do.
9. 5.0_appendix_c_table_1.do makes Table 1 in Appendix C.
10. 5.1_appendix_c_table_2.do makes Table 2 in Appendix C.
11. 6.0_appendix_b_figure_3.do makes figure 3 in Appendix B.
To run, extract this repo to ~/Desktop/RESTAT_CODE and execute the files in Stata or Python as per above.
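The run order described above follows the numeric prefix of each filename. The Python sketch below shows one way to derive that order and to note which tool each file needs. It is an illustration only and does not launch Stata or Jupyter; the helper names order_key and runner are hypothetical, while the filenames are the ones listed in the instructions.

```python
import re

# File names as listed in the instructions above.
FILES = [
    "6.0_appendix_b_figure_3.do",
    "1.0_load_data.do",
    "3.1_table_2.do",
    "1.1_cache-load_lasso_data.do",
    "4.0_figure_1.ipynb",
    "2.1_cache_bootstrap_results.do",
    "3.0_table_1.do",
    "3.2_table_3.do",
    "4.1_figure_2.do",
    "5.0_appendix_c_table_1.do",
    "5.1_appendix_c_table_2.do",
]


def order_key(name):
    """Sort key from the numeric prefix, e.g. '3.1_table_2.do' -> (3, 1)."""
    major, minor = re.match(r"(\d+)\.(\d+)_", name).groups()
    return int(major), int(minor)


def runner(name):
    """Tool implied by the file extension (.ipynb -> Python, .do -> Stata)."""
    return "python" if name.endswith(".ipynb") else "stata"


# 1.0_load_data.do is called by the other files, so it is not run on its own.
run_order = [f for f in sorted(FILES, key=order_key) if f != "1.0_load_data.do"]
for name in run_order:
    print(f"{runner(name):6s} {name}")
```

Sorting on the parsed prefix rather than on the raw string keeps the order correct even if a prefix such as 10.0 were ever added.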
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Stata dta and do file of the paper "Stoop, J., Van Soest, D., and Vyrastekova, J. (2018). Rewards and cooperation in social dilemma games. Journal of Environmental Economics and Management, 88, 300-310". The do file contains the statistical analyses of this paper, presented in the order in which they appear in the paper.
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE PALESTINIAN CENTRAL BUREAU OF STATISTICS
The Palestinian Central Bureau of Statistics (PCBS) carried out four rounds of the Labor Force Survey 2017 (LFS). The survey rounds covered a total sample of about 23,120 households (5,780 households per quarter).
The main objective of collecting data on the labour force and its components, including employment, unemployment and underemployment, is to provide basic information on the size and structure of the Palestinian labour force. Data collected at different points in time provide a basis for monitoring current trends and changes in the labour market and in the employment situation. These data, supported with information on other aspects of the economy, provide a basis for the evaluation and analysis of macro-economic policies.
The raw survey data provided by the Statistical Agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts have been exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a representative sample on the region level (West Bank, Gaza Strip), the locality type (urban, rural, camp) and the governorates.
1- Household/family. 2- Individual/person.
The survey covered all Palestinian households whose usual residence is in the Palestinian Territory.
Sample survey data [ssd]
The methodology was designed according to the context of the survey, international standards, data processing requirements and comparability of outputs with other related surveys.
---> Target Population: It consists of all individuals aged 10 years and above who normally resided with their households in the State of Palestine during 2017.
---> Sampling Frame: The sampling frame consists of the master sample, which was updated in 2011: each enumeration area consists of buildings and housing units with an average of about 124 households. The master sample consists of 596 enumeration areas; we used 494 enumeration areas as a framework for the labor force survey sample in 2017 and these units were used as primary sampling units (PSUs).
---> Sampling Size: The estimated sample size is 5,780 households in each quarter of 2017.
---> Sample Design: The sample is a two-stage stratified cluster sample. First stage: we selected a systematic random sample of 494 enumeration areas for the whole round, excluding enumeration areas with fewer than 40 households. Second stage: we selected a systematic random sample of 16 households from each selected enumeration area with 80 households or more, and of 8 households from each selected enumeration area with fewer than 80 households.
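The two-stage design described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the PCBS procedure: the frame is a made-up set of 60 areas with invented sizes, stratification is ignored, only 10 areas are drawn instead of 494, and the function names are hypothetical. The size thresholds (40 and 80 households) and the take sizes (16 and 8) come from the text.

```python
import random


def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval after a random start."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n
    start = rng.random() * step
    last = len(units) - 1
    return [units[min(int(start + i * step), last)] for i in range(n)]


def households_per_area(size):
    """Take size by area size: excluded below 40, 8 below 80, else 16."""
    if size < 40:
        return 0
    return 16 if size >= 80 else 8


rng = random.Random(2017)

# Hypothetical frame: enumeration area id -> number of households (30..200).
frame = {f"EA{i:03d}": 30 + (i * 37) % 171 for i in range(60)}

# Stage 1: drop small areas, then draw areas systematically.
eligible = [ea for ea, size in frame.items() if size >= 40]
areas = systematic_sample(eligible, 10, rng)

# Stage 2: draw households (numbered 0..size-1) within each selected area.
sample = {
    ea: systematic_sample(list(range(frame[ea])), households_per_area(frame[ea]), rng)
    for ea in areas
}
for ea, households in sample.items():
    print(ea, frame[ea], len(households))
```

A fractional interval is used so that the requested number of units is always returned, even when the frame size is not a multiple of the sample size.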
---> Sample strata: The population was divided by: 1- Governorate (16 governorates) 2- Type of Locality (urban, rural, refugee camps).
---> Sample Rotation: Each round of the Labor Force Survey covers all of the 494 master sample enumeration areas. Basically, the areas remain fixed over time, but households in 50% of the EAs were replaced in each round. The same households remain in the sample for two consecutive rounds, are left out for the next two rounds, and are then selected for the sample for another two consecutive rounds before being dropped from the sample. An overlap of 50% is thus achieved both between consecutive rounds and between consecutive years (making the sample efficient for monitoring purposes).
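The rotation pattern described above (two rounds in, two rounds out, two rounds in) can be checked with a small Python sketch. It assumes, for illustration, that one equally sized group of households enters the sample in every round; that assumption and the function names are not taken from the survey documentation.

```python
def groups_in_sample(round_index):
    """Entry rounds of the household groups interviewed in a given round.

    A group that enters in round e is interviewed in rounds e, e+1, e+4, e+5.
    """
    return {round_index - offset for offset in (0, 1, 4, 5)}


def overlap(round_a, round_b):
    """Share of round_a's groups that are also interviewed in round_b."""
    a, b = groups_in_sample(round_a), groups_in_sample(round_b)
    return len(a & b) / len(a)


print("consecutive rounds:", overlap(10, 11))
print("same quarter, next year:", overlap(10, 14))
print("two rounds apart:", overlap(10, 12))
```

Under this assumption, half of the groups are shared between consecutive rounds and between the same quarter of consecutive years, in line with the 50% overlap stated in the text.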
Face-to-face [f2f]
The survey questionnaire was designed according to the International Labour Organization (ILO) recommendations. The questionnaire includes four main parts:
---> 1. Identification Data: The main objective for this part is to record the necessary information to identify the household, such as, cluster code, sector, type of locality, cell, housing number and the cell code.
---> 2. Quality Control: This part involves groups of controlling standards to monitor the field and office operations and to keep the sequence of questionnaire stages in order (data collection, field and office coding, data entry, editing after entry, and data storage).
---> 3. Household Roster: This part involves demographic characteristics of the household, such as number of persons in the household, date of birth, sex, educational level, etc.
---> 4. Employment Part: This part involves the major research indicators, where one questionnaire was answered by every household member aged 15 years and over, in order to explore their labour force status and identify their major characteristics with respect to employment status, economic activity, occupation, place of work, and other employment indicators.
---> Raw Data PCBS started collecting data since 1st quarter 2017 using the hand held devices in Palestine excluding Jerusalem in side boarders (J1) and Gaza Strip, the program used in HHD called Sql Server and Microsoft. Net which was developed by General Directorate of Information Systems. Using HHD reduced the data processing stages, the fieldworkers collect data and sending data directly to server then the project manager can withdrawal the data at any time he needs. In order to work in parallel with Gaza Strip and Jerusalem in side boarders (J1), an office program was developed using the same techniques by using the same database for the HHD.
---> Harmonized Data - The SPSS package is used to clean and harmonize the datasets. - The harmonization process starts with a cleaning process for all raw data files received from the Statistical Agency. - All cleaned data files are then merged to produce one data file on the individual level containing all variables subject to harmonization. - A country-specific program is generated for each dataset to generate/ compute/ recode/ rename/ format/ label harmonized variables. - A post-harmonization cleaning process is then conducted on the data. - Harmonized data is saved on the household as well as the individual level, in SPSS and then converted to STATA, to be disseminated.
The survey sample consists of about 30,230 households, of which 23,120 households completed the interview: 14,682 households in the West Bank and 8,438 households in the Gaza Strip. Weights were adjusted to account for non-response. The response rate in the West Bank reached 82.4%, while in the Gaza Strip it reached 92.7%.
---> Sampling Errors Data of this survey may be affected by sampling errors due to use of a sample and not a complete enumeration. Therefore, certain differences can be expected in comparison with the real values obtained through censuses. Variances were calculated for the most important indicators: the variance table is attached with the final report. There is no problem in disseminating results at national or governorate level for the West Bank and Gaza Strip.
---> Non-Sampling Errors Non-sampling errors are possible at every stage of the project, during data collection or processing. They include non-response errors, response errors, interviewing errors, and data entry errors. To avoid errors and reduce their effects, great efforts were made to train the fieldworkers intensively. They were trained on how to carry out the interview, what to discuss, and what to avoid; the training course combined practical and theoretical training, and a pilot survey was carried out. Data entry staff were also trained on the data entry program, which was tested before the data entry process started. To keep track of the progress of fieldwork activities and to limit obstacles, there was continuous contact with the fieldwork team through regular field visits and regular meetings during those visits. Problems faced by fieldworkers were discussed to clarify any issues. Non-sampling errors are generally difficult to evaluate statistically.
They cover a wide range of errors, including errors resulting from non-response, sampling frame coverage, coding and classification, data processing, and survey response (both respondent and interviewer-related). Effective training and supervision and the careful design of questions have a direct bearing on limiting the magnitude of non-sampling errors, and hence on enhancing the quality of the resulting data. The survey encountered non-response; the cases "household was not present at home" during the fieldwork visit and "housing unit is vacant" accounted for the largest share of non-response cases. The total non-response rate reached 14.2%, which is very low compared to the household surveys conducted by PCBS. The refusal rate reached 3.0%, which is a very low percentage compared to the

The Health Survey for England, 2000-2001: Small Area Estimation Teaching Dataset was prepared as a resource for those interested in learning introductory small area estimation techniques. It was first presented as part of a workshop entitled 'Introducing small area estimation techniques and applying them to the Health Survey for England using Stata'. The data are accompanied by a guide that includes a practical case study enabling users to derive estimates of disability for districts in the absence of survey estimates. This is achieved using various models that combine information from ESDS government surveys with other aggregate data that are reliably available for sub-national areas. Analysis is undertaken using Stata statistical software; all relevant syntax is provided in the accompanying '.do' files.
The data files included in this teaching resource contain HSE variables and data from the Census and Mid-year population estimates and projections that were developed originally by the National Statistical agencies, as follows:

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The zip files contain several files with wills from Mexico between 1810 and 1910, collected in order to measure Mexican wealth distribution in the country's first century of independence. The main file is wills_clean.xlsx, which contains the full collection of wills; in that file, you will find variables for year, state, wealth (not excluding debts), debts, and net wealth. You can combine this file with the do file cleaningroutine_for_social_tables to produce the detailed social tables. The rest of the files consist of data files with the social tables (for comparison) and xlsx files with the wills from the main file divided by decade, to facilitate calculations using the do file inequality_analysis_ routine_clean.do, from which you will be able to reproduce the rest of the analysis (unbalanced sample and generalized beta, lognormal, etc.). Note: The calculation programs are .do files; thus, they require Stata to be executed. Some of the detailed social tables are dta files, and thus also Stata files. You can open them in R and work with them or convert them to any other data format. The wills come from 5 different Mexican archives: Archivo Histórico de Notarias de la Ciudad de México, Archivo General del Estado de Yucatán, Archivo Municipal de Saltillo, Archivo Histórico de la Ciudad de Morelia, and Testamentos del Colegio de Sonora.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objectives: This study aimed to measure the prevalence and associated factors of Intimate Partner Violence (IPV) among women living with and without HIV in Wolaita Zone, Southern Ethiopia.
Methods: A comparative cross-sectional study design was used to interview 816 women aged 18–49 years (408 HIV positive, 408 HIV negative). Using a multistage sampling technique, participants were recruited from nine health facilities based on probability proportional to the number of clients. After data entry (EpiData version 4.4.2.0), the data were exported to STATA/SE 15 software. Binary and multivariable logistic regression analyses were undertaken, and the odds ratio (OR) and 95% confidence interval (CI) are presented.
Results: The lifetime prevalence of IPV among all women was 59.7% [95% CI: 56.31%–63.05%]. IPV was slightly higher among women living with HIV, 250 (61.3%), than among those who were HIV negative, 238 (58.1%). Lifetime prevalence of emotional violence 413 (50.6%), physical violence 349 (42.8%), sexual violence 219 (26.8%), and controlling behaviours by husbands/partners 489 (59.9%) were reported. Associations were found between IPV and controlling behaviour of husband/partner [AOR = 8.13; 95% CI: 4.93–13.42], income [AOR = 3.97; 95% CI: 1.81–8.72], bride price payment [AOR = 3.46; 95% CI: 1.74–6.87], women’s decision to refuse sex [AOR = 2.99; 95% CI: 1.39–6.41], age group of women [AOR = 2.86; 95% CI: 1.67–4.90], partner’s family choosing wife [AOR = 2.83; 95% CI: 1.70–4.69], alcohol consumption by partner [AOR = 2.36; 95% CI: 1.36–4.10], number of sexual partners [AOR = 2.35; 95% CI: 1.36–4.09], and if partner ever physically fought with another man [AOR = 1.83; 95% CI: 1.05–3.19].
Conclusions: There is a high prevalence of IPV against women both living with and without HIV.
Policy priorities should therefore involve males in programs of gender-based violence prevention in order to change their violent behaviour, and interventions are required to improve the economic status of women. Both sexes should be advised to have a single partner and marriage arrangements should be by mutual consent rather than being made by parents.
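As a rough check on the reported lifetime prevalence, the sketch below computes a normal-approximation (Wald) 95% confidence interval from the rounded prevalence (59.7%) and the sample size (816) given in the abstract. The abstract does not say which interval method the authors used in Stata, so the Wald formula, the function name `wald_ci`, and the use of the rounded proportion are assumptions made for illustration.

```python
import math

def wald_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p with sample size n."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Values taken from the abstract: prevalence 59.7%, 816 women interviewed.
lower, upper = wald_ci(0.597, 816)
print(f"95% CI: {lower:.1%} to {upper:.1%}")
```

Under these assumptions the interval is about 56.3% to 63.1%, close to the reported 56.31%–63.05%; the small difference is consistent with the rounded input.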

Understanding Society (the UK Household Longitudinal Study), which began in 2009, is conducted by the Institute for Social and Economic Research (ISER) at the University of Essex, and the survey research organisations Verian Group (formerly Kantar Public) and NatCen. It builds on, and incorporates, the British Household Panel Survey (BHPS), which began in 1991.
The Understanding Society: Calendar Year Dataset, 2020, is designed to enable cross-sectional analysis of individuals and households relating specifically to their annual interviews conducted in the year 2020, and therefore combines data collected in three waves (Waves 10, 11 and 12). It has been produced from the same data collected in the main Understanding Society study and released in the longitudinal datasets SN 6614 (End User Licence) and SN 6931 (Special Licence). Such cross-sectional analysis can, however, only involve variables that are collected in every wave, in order to have data for the full sample panel. The 2020 dataset is the first of a series of planned Calendar Year Datasets to facilitate cross-sectional analysis of specific years. Full details of the Calendar Year Dataset sample structure (including why some individual interviews from 2021 are included), data structure and additional supporting information can be found in the document '8987_calendar_year_dataset_2020_user_guide'.
As a multi-topic study, Understanding Society aims to understand the short- and long-term effects of social and economic change in the UK at the household and individual levels. The study has a strong emphasis on the domains of family and social ties, employment, education, financial resources, and health. Understanding Society is an annual survey of each adult member of a nationally representative sample. The same individuals are re-interviewed in each wave, approximately 12 months apart. When individuals move, they are followed within the UK, and anyone joining their households is also interviewed for as long as they are living with them. The fieldwork period for a single wave is 24 months. Data collection uses computer-assisted personal interviewing (CAPI) and web interviews (from wave 7), and includes a telephone mop-up. From March 2020 (the end of wave 10 and the 2nd year of wave 11), due to the coronavirus pandemic, face-to-face interviews were suspended and the survey has been conducted by web and telephone only, but has otherwise continued as before. One person completes the household questionnaire. Each person aged 16 or older participates in the individual adult interview and self-completed questionnaire. Youths aged 10 to 15 are asked to respond to a paper self-completion questionnaire. In 2020 an additional frequent web survey was separately issued to sample members to capture data on the rapid changes in people’s lives due to the COVID-19 pandemic (see SN 8644). The COVID-19 Survey data are not included in this dataset.
Further information may be found on the Understanding Society main stage webpage (https://www.understandingsociety.ac.uk/documentation/mainstage) and links to publications based on the study can be found on the Understanding Society Latest Research webpage.
Co-funders

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
List of Top Schools of The Stata Journal sorted by citations.