Here you find Stata code for developing and validating measurement instruments (questionnaires, tests, items, scales) in the social sciences. The analyses carried out with the code are described in appendices A1 to A5 of the ZIS Publication Guide. Each script includes comments that guide users through the code. We provide the data set “example1” to run the code.
We provide:
Code for testing the dimensionality of scales, comprising exploratory factor analysis, principal component analysis, and confirmatory factor analysis (tau-congeneric and tau-equivalent). For a description of the analyses, see appendices A1 and A2 of the ZIS Publication Guide.
Code for estimating reliability, comprising split-half reliability, retest reliability, reliability coefficients for single-factor models (Cronbach’s Alpha, McDonald’s Omega/Raykov’s Rho, AVE [Average Variance Extracted]), and reliability coefficients for bi-factor models (Omega-H, ECV [Explained Common Variance]). For a description of the analyses, see appendix A3 of the ZIS Publication Guide.
Code for measurement invariance testing within SEM. For a description of the analyses, see appendix A5 of the ZIS Publication Guide.
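To make two of the single-factor reliability coefficients listed above concrete, here is a minimal Python sketch (not the ZIS Stata code) of Cronbach’s Alpha and of split-half reliability with the Spearman-Brown correction. It assumes complete data with no missing values, items stored as columns of equal length, and an odd/even item split; other split rules are equally valid. In Stata itself, the built-in `alpha` command reports Cronbach’s Alpha.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).

    `items` is a list of item columns; each column holds one score per respondent,
    with respondents in the same order in every column (no missing values assumed).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(items):
    """Correlate odd- and even-item half scores, then apply Spearman-Brown: 2r / (1 + r)."""
    odd_half = [sum(s) for s in zip(*items[0::2])]
    even_half = [sum(s) for s in zip(*items[1::2])]
    r = pearson_r(odd_half, even_half)
    return 2 * r / (1 + r)
```

For example, two items scored `[1, 2, 3, 4]` and `[2, 1, 4, 3]` give an item variance of 5/3 each and a total-score variance of 16/3, so alpha is 2 × (1 − 10/16) = 0.75.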
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. PERCEIVE regional panel datasets: secondary data collected from Eurostat, the European Commission on Structural Fund expenditures, and quality of government for NUTS 1, 2 and 3 regions from 1990-2015 (Stata files). See the codebook for more detail about the variables.
2. Flash Eurobarometer survey data on "Awareness of EU Regional Policy" and questionnaires (Stata files)
3. Standard Eurobarometer survey data, annual, from 2000-2016, and questionnaires (Stata files)
4. Expenditure data on EU Structural Funds, latest three budget periods (2000-2020) (Excel file)
5. Original PERCEIVE survey data (Stata file) and description of survey questions, descriptive results (Word file)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Data set of inspections and Stata code for replication. To replicate the results from the paper, download both files, edit the paths in the do-file appropriately, and run it. All analyses were run in Stata 15.1.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Description of variables and measurement for the study, Jimma Zone, Southwest Ethiopia, September 2012-December 2013.
The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement. The main objectives of ELSA are to:
Further information may be found on the ELSA project website (https://www.elsa-project.ac.uk/) or the NatCen Social Research ELSA web pages.
Wave 11 data has been deposited - May 2025
For the 45th edition (May 2025) ELSA Wave 11 core and pension grid data and documentation were deposited. Users should note this dataset version does not contain the survey weights. A version with the survey weights along with IFS and financial derived datasets will be deposited in due course. In the meantime, more information about the data collection or the data collected during this wave of ELSA can be found in the Wave 11 Technical Report or the User Guide.
Health conditions research with ELSA - June 2021
The ELSA Data team have found some issues with historical data measuring health conditions. If you are intending to do any analysis looking at the following health conditions, then please read the ELSA User Guide or if you still have questions contact elsadata@natcen.ac.uk for advice on how you should approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; Congestive Heart Failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's Disease; emotional, nervous or psychiatric problems; Alzheimer's Disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).
For information on obtaining data from ELSA that are not held at the UKDS, see the ELSA Genetic data access and Accessing ELSA data webpages.
Wave 10 Health data
Users should note that in Wave 10 the health section of the ELSA questionnaire was revised, and all respondents were asked anew about their health conditions, rather than following the prior approach of asking those who had taken part in past waves to confirm previously recorded conditions. For this reason, unlike in previous waves, the health conditions feed-forward data was not archived for Wave 10.
Harmonized dataset:
Users of the Harmonized dataset who prefer to use the Stata version will need access to Stata MP software, as the version G3 file contains 11,779 variables (the limit for the standard Stata 'Intercooled' version is 2,047).
ELSA COVID-19 study:
A separate ad-hoc study conducted with ELSA respondents, measuring the socio-economic and psychological impact of the lockdown on the population of England aged 50+, is also available under SN 8688, English Longitudinal Study of Ageing COVID-19 Study.
No description or summary was provided for this metadata set.
These are the datasets used in the analysis of the relationship between protest, protest campaigns, and armed conflict in Colombia and South Africa. The following files are included:
1. Excel file containing a description of the variables and their sources
2. Stata file of the data (appended data) and a Stata file for each hypothesis
3. Do-file used for the statistical analysis
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Integrated Postsecondary Education Data System (IPEDS) Complete Data Files from 1980 to 2023. Includes the data file, Stata data file, SPSS program, SAS program, Stata program, and dictionary. All years are compressed into one .zip file due to storage limitations.
From the IPEDS Complete Data File Help Page (https://nces.ed.gov/Ipeds/help/complete-data-files):
Choose the file to download by reading the description in the available titles. Then, click on the link in that row corresponding to the column header of the type of file/information desired to download.
To download and view the survey files in basic CSV format, use the main download link in the Data File column.
For files compatible with the Stata statistical software package, use the alternate download link in the Stata Data File column.
To download files with the SPSS, SAS, or STATA (.do) file extension for use with statistical software packages, use the download link in the Programs column.
To download the data Dictionary for the selected file, click on the corresponding link in the far right column of the screen. The data dictionary serves as a reference for using and interpreting the data within a particular survey file. This includes the names, definitions, and formatting conventions for each table, field, and data element within the file, important business rules, and information on any relationships to other IPEDS data.
For statistical read programs to work properly, both the data file and the corresponding read program file must be downloaded to the same subdirectory on the computer’s hard drive. Download the data file first; then click on the corresponding link in the Programs column to download the desired read program file to the same subdirectory.
When viewing downloaded survey files, categorical variables are identified using codes instead of labels. Labels for these variables are available in both the data read program files and the data dictionary for each file; however, for files that automatically incorporate this information, you will need to select the Custom Data Files option.
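Because the CSV downloads carry codes rather than labels, users who skip the read programs have to attach labels themselves. The Python sketch below shows one way to do that with the standard `csv` module. The column name `CONTROL` and its code-to-label mapping are illustrative assumptions only; the actual codes and labels must be taken from the data dictionary or read program of the specific file.

```python
import csv
import io

# Illustrative (hypothetical) mapping; replace with the codes and labels
# listed in the data dictionary for the file you downloaded.
VALUE_LABELS = {
    "CONTROL": {"1": "Public", "2": "Private not-for-profit", "3": "Private for-profit"},
}


def read_with_labels(text, value_labels):
    """Read CSV text and replace coded values with labels where a label exists."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, mapping in value_labels.items():
            if column in row:
                # Keep the original code when no label is defined for it.
                row[column] = mapping.get(row[column], row[column])
        rows.append(row)
    return rows
```

For a real download, the file contents can be read with `open(path, newline="", encoding="utf-8").read()` (the encoding is an assumption) and passed in as `text`; unknown codes are left untouched so they remain visible for checking.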
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This repository contains the code used in the statistical software Stata to perform the meta-analyses detailed in the research output "Pharmacological targeting of the CCL2/CCR2 axis for atheroprotection: a meta-analysis of preclinical studies". A preprint of the manuscript containing all meta-analysis results, figures, and a detailed description of the study methodology has been deposited in BioRxiv: https://www.biorxiv.org/content/10.1101/2021.04.16.439554v1. The final version of this manuscript will be linked here once it has undergone peer review and publication.
This exercise dataset was created for researchers interested in learning how to use the models described in the "Handbook on Impact Evaluation: Quantitative Methods and Practices" by S. Khandker, G. Koolwal and H. Samad, World Bank, October 2009 (permanent URL http://go.worldbank.org/FE8098BI60).
Public programs are designed to reach certain goals and beneficiaries. Methods to understand whether such programs actually work, as well as the level and nature of impacts on intended beneficiaries, are main themes of this book. Has the Grameen Bank, for example, succeeded in lowering consumption poverty among the rural poor in Bangladesh? Can conditional cash transfer programs in Mexico and Latin America improve health and schooling outcomes for poor women and children? Does a new road actually raise welfare in a remote area in Tanzania, or is it a "highway to nowhere?"
This handbook reviews quantitative methods and models of impact evaluation. It begins by reviewing the basic issues pertaining to an evaluation of an intervention to reach certain targets and goals. It then focuses on the experimental design of an impact evaluation, highlighting its strengths and shortcomings, followed by discussions of various non-experimental methods. The authors also cover methods to shed light on the nature and mechanisms by which different participants benefit from the program.
The handbook provides Stata exercises in the context of evaluating major microcredit programs in Bangladesh, such as the Grameen Bank. This dataset provides both the related Stata data files and the Stata programs.
Sample survey data [ssd]
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
It is understood that ensuring equation balance is a necessary condition for a valid model of time series data. Yet the definition of balance provided so far has been incomplete, and there has been no consistent understanding of exactly why balance is important or how it can be applied. The discussion to date has focused on the estimates produced by the GECM. In this paper, we go beyond the GECM and beyond model estimates. We treat equation balance as a theoretical matter, not merely an empirical one, and describe how to use the concept of balance to test theoretical propositions before longitudinal data have been gathered. We explain how equation balance can be used to check whether your theoretical or empirical model is wrong or incomplete in a way that will prevent a meaningful interpretation of the model. We also raise the issue of “I(0) balance” and its importance. The replication dataset includes the Stata .do file and .dta file to replicate the analysis in section 4.1 of the Supplementary Information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Unadjusted odds ratios (UOR), adjusted odds ratios (AOR), and 95% confidence intervals (CI) for quality antenatal care (ANC) consultation.
https://spdx.org/licenses/CC0-1.0.html
This spreadsheet provides all cleaned and validated data used in the analysis of the GSLWG survey to gather opinions about the governance of taxonomic lists. Data are anonymous. Interpretations of variables are available in a separate codebook file, also available on Dryad and associated with this manuscript. In addition to the raw survey data, the following supplemental data are provided:
1. Coding manual providing definitions for each variable included in the survey dataset, in .csv format
2. The data analysis code in Stata .do format and PDF format
3. The survey instrument in several languages, in PDF format
4. A detailed description of the survey methodology and data analysis approach, in PDF format
5. The full results of the survey in tabular form
6. Additional figures presenting survey results
All data are also available on the website of the Open Science Framework (OSF), along with the survey pre-registration data: https://osf.io/tz7ra/?view_only=4b1bc810ef794f7f9bb57240611989af
Methods
Data were collected using an online survey of taxonomists, other types of scientists, and users of taxonomic information. The data were cleaned for analysis according to the standards recorded in the survey codebook, which is also available on Dryad and associated with this manuscript. Data cleaning was performed using Stata. Full information about the survey methods is available in the accompanying article and in the survey methods supplemental data, also available on Dryad. This survey was pre-registered with the Open Science Framework with a full description of survey development, implementation, and analysis methods: https://osf.io/tz7ra/?view_only=4b1bc810ef794f7f9bb57240611989af
The high-frequency phone survey of refugees monitors the economic and social impact of, and responses to, the COVID-19 pandemic on refugees and nationals by calling a sample of households every four weeks. The main objective is to inform timely and adequate policy and program responses. Since the outbreak of the COVID-19 pandemic in Ethiopia, two rounds of data collection among refugees were completed between September and November 2020. The first round of the joint national and refugee HFPS was implemented between 24 September and 17 October 2020, and the second round between 20 October and 20 November 2020.
Household
Sample survey data [ssd]
The sample was drawn by simple random sampling without replacement. Expecting a high non-response rate based on experience from the HFPS-HH, we drew a stratified sample of 3,300 refugee households for the first round. More details on the sampling methodology are provided in the Survey Methodology Document, available for download as Related Materials.
Computer Assisted Telephone Interview [cati]
The Ethiopia COVID-19 High Frequency Phone Survey of Refugee questionnaire consists of the following sections:
A more detailed description of the questionnaire is provided in Table 1 of the Survey Methodology Document, which is provided as Related Materials. The Round 1 and Round 2 questionnaires are available for download.
DATA CLEANING
At the end of data collection, the raw dataset was cleaned by the research team. This included formatting and correcting results based on monitoring issues, enumerator feedback, and survey changes. The data cleaning steps are detailed below.
Variable naming and labeling:
• Variable names were changed to reflect the lowercase question name in the paper survey copy, plus a word or two related to the question.
• Variables were labeled with longer descriptions of their contents, and the full question text was stored in Notes for each variable.
• “Other, specify” variables were named similarly to their related question, with “_other” appended to the name.
• Value labels were assigned where relevant, with options shown in English for all variables, unless preloaded from the roster in Amharic.
Variable formatting:
• Variables were formatted as their object type (string, integer, decimal, time, date, or datetime).
• Multi-select variables were saved both in space-separated single-variables and as multiple binary variables showing the yes/no value of each possible response.
• Time and date variables were stored as POSIX timestamp values and formatted to show Gregorian dates.
• Location information was left in separate ID and Name variables, following the format of the incoming roster. IDs were formatted to include only the digits for the variable's own level, not the higher-level prefixes (2-3 digits only).
• Only consented surveys were kept in the dataset, and all personal information and internal survey variables were dropped from the clean dataset.
• Roster data are kept separate from the main dataset, in long form, but can be merged on the key variable (key can also be used to merge with the raw data).
• The variables were arranged in the same order as the paper instrument, with observations arranged according to their submission time.
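Two of the formatting rules above can be sketched in Python: expanding a space-separated multi-select answer into one 0/1 indicator per response option, and turning a POSIX timestamp into a Gregorian date. This is a minimal illustration, not the research team's cleaning code. The variable prefix `q5` and the option codes are hypothetical, and the timestamp conversion assumes the stored values are in UTC, which the description does not state.

```python
from datetime import datetime, timezone


def expand_multiselect(answer, options, prefix):
    """Expand a space-separated multi-select answer (e.g. "1 3") into 0/1 indicators.

    `prefix` and `options` are hypothetical; use the question name and its
    response codes from the instrument.
    """
    chosen = set(answer.split())  # an empty answer gives an empty set
    return {f"{prefix}_{opt}": int(str(opt) in chosen) for opt in options}


def posix_to_date(timestamp):
    """Convert a POSIX timestamp (seconds since 1970-01-01 UTC) to a Gregorian date.

    Assumes the stored values are in UTC; adjust if the data use local time.
    """
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
```

For example, `expand_multiselect("1 3", [1, 2, 3], "q5")` returns `{"q5_1": 1, "q5_2": 0, "q5_3": 1}`, and `posix_to_date(1600905600)` returns 2020-09-24, the first day of Round 1 fieldwork.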
Back-check data review: Results of the back-check survey are compared against the originally captured survey results using the bcstats command in Stata. This command produces a variable-by-variable comparison and identifies any discrepancies. Each discrepancy is then examined individually to determine whether it is within reason.
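The comparison logic behind this back-check review can be sketched in Python. This is not `bcstats` and does not reproduce its options or output; it is a minimal illustration that matches back-check records to the original survey records on a household identifier and lists every variable whose values differ, with an optional per-variable tolerance for numeric answers to mimic the "within reason" judgement. The key name `hhid` and the example variables are hypothetical.

```python
def compare_backcheck(original, backcheck, variables, key="hhid", tolerance=None):
    """Return (id, variable, original value, back-check value) for each mismatch.

    original, backcheck: lists of dicts, one per household, sharing the `key` column
        (the name "hhid" is a hypothetical placeholder).
    tolerance: optional dict mapping a numeric variable to the largest absolute
        difference that is still treated as agreement.
    """
    tolerance = tolerance or {}
    by_id = {row[key]: row for row in original}
    discrepancies = []
    for bc in backcheck:
        orig = by_id.get(bc[key])
        if orig is None:
            continue  # back-checked household not found in the survey data
        for var in variables:
            a, b = orig.get(var), bc.get(var)
            if var in tolerance and a is not None and b is not None:
                if abs(a - b) <= tolerance[var]:
                    continue  # numeric difference within the allowed tolerance
            elif a == b:
                continue  # exact agreement
            discrepancies.append((bc[key], var, a, b))
    return discrepancies
```

Each returned tuple is one item to examine individually, as described above; allowing a tolerance only for numeric variables keeps categorical answers such as the respondent's identity under an exact-match rule.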
The following data quality checks were completed:
• Daily SurveyCTO monitoring: This included outlier checks, skipped questions, a review of “Other, specify” and other text responses, and enumerator comments. Enumerator comments were used to suggest new response options or to highlight situations where existing options should be used instead. Monitoring also included a review of variable relationship logic checks and checks of the logic of answers. Finally, outliers in phone variables, such as survey duration or the percentage of time audio was at a conversational level, were monitored. A survey duration of close to 15 minutes and a conversation-level audio percentage of around 40% were considered normal.
• Dashboard review: This included monitoring individual enumerator performance, such as the number of calls logged, duration of calls, percentage of calls responded to, and percentage of non-consents. Non-consent reason rates and attempts per household were monitored as well. Duration analysis using R was used to monitor each module's duration and estimate the time required for subsequent rounds. The dashboard was also used to track overall survey completion and preview the results of key questions.
• Daily Data Team reporting: The Field Supervisors and the Data Manager reported daily feedback on call progress, enumerator feedback on the survey, and any suggestions to improve the instrument, such as adding options to multiple-choice questions or adjusting translations.
• Audio audits: Audio recordings were captured during the consent portion of the interview for all completed interviews, for the enumerators' side of the conversation only. The recordings were reviewed for any surveys flagged by enumerators as having data quality concerns and for an additional random sample of 2% of respondents. Recordings of a range of lengths were selected to observe edge cases. Most consent readings took around one minute, with some longer recordings due to questions about the survey or holding for the respondent. All reviewed audio recordings were completed satisfactorily.
• Back-check survey: Field Supervisors made back-check calls to a random sample of 5% of the households that completed a survey in Round 1. Field Supervisors called these households and administered a short survey, including (i) identifying the same respondent; (ii) determining the respondent's position within the household; (iii) confirming that a member of the data collection team had completed the interview; and (iv) asking a few questions from the original survey.
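Both the 2% audio-audit sample and the 5% back-check sample are random draws from a list of completed households. The Python sketch below shows one reproducible way to draw such a sample: simple random sampling without replacement with a fixed seed, so the same selection can be regenerated later. The seed value and the rounding rule (nearest whole household, at least one) are assumptions for illustration; the description does not say how the team drew or sized its samples.

```python
import random


def draw_sample(ids, fraction, seed):
    """Draw a reproducible simple random sample without replacement.

    The sample size is fraction * N rounded to the nearest integer, with a
    minimum of one household (an illustrative rule, not the survey's).
    """
    if not ids:
        return []
    n = max(1, round(fraction * len(ids)))
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    return sorted(rng.sample(sorted(ids), n))
```

For example, `draw_sample(completed_ids, 0.05, seed=2020)` on 200 completed households selects 10 distinct households, and rerunning it with the same seed returns the same 10. Sorting the input first makes the result independent of the order in which the IDs were exported.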
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de459163
Abstract (en): The Longitudinal Study of Generations (LSOG), initiated in 1971, began as a survey of intergenerational relations among 300 three-generation California families with grandparents (then in their sixties), middle-aged parents (then in their early forties), and grandchildren (then aged 15 to 26). The study broadened in 1991 and now includes a fourth generation, the great-grandchildren of these same families. The LSOG, with a fully elaborated generation-sequential design, allows comparisons of sets of aging parents and children at the same stage of life but during different historical periods. These comparisons make possible the investigation of the effects of social change on inter-generational solidarity or conflict across 35 years and four generations, as well as the effects of social change on the ability of families to buffer stressful life transitions (e.g., aging, divorce and remarriage, higher female labor force participation, changes in work and the economy, and possible weakening of family norms of obligation), and the effects of social change on the transmission of values, resources, and behaviors across generations. The LSOG contains information on family structure, household composition, affectual solidarity and conflict, values, attitudes, behaviors, role importance, marital relationships, health and fitness, mental health and well-being, caregiving, leisure activities, and life events and concerns. Demographic variables include age, sex, income, employment status, marital status, socioeconomic history, education, religion, ethnicity, and military service. 
Presence of Common Scales: Affectual Solidarity Reliability, Consensual Solidarity (Socialization), Associational Solidarity, Functional Solidarity, Intergenerational Social Support, Normative Solidarity, Familism, Structural Solidarity, Intergenerational Feelings of Conflict, Management of Conflict Tactics, Rosenberg Self-Esteem, Depression (CES-D), Locus of Control, Bradburn Affect Balance, Eysenck Extraversion/Neuroticism, Anxiety (Hopkins Symptom Checklist), Activities of Daily Living (IADL/ADL), Religious Ideology, Political Conservatism, Gender Role Ideology, Individualism/Collectivism, Materialism/Humanism, Work Satisfaction, Gilford-Bengtson Marital Satisfaction
Datasets:
DS0: Study-Level Files
DS1: Waves 1-7
DS2: Wave 8
Universe: Multi-generation families in California.
Smallest Geographic Unit: None
Families were drawn randomly from a subscriber list of 840,000 members of a California Health Maintenance Organization in Los Angeles. Families were recruited by enlisting a grandfather over the age of 60 who was part of a three-generation family that was willing to participate.
2019-08-21: The data were updated and resupplied by the data producer; ICPSR has updated the data and documentation to reflect these changes. Additionally, the data producer provided a Stata do-file with syntax to merge the two datasets, which is available for download in the study zip folder. The study title was also updated.
2016-07-06: Merril Silverstein was added to the collection as a P.I.
2015-07-16: Wave 8 was added, including SPSS, SAS, and Stata datasets as well as an ICPSR Variable Description and Frequencies codebook. The codebook for part one was recompiled into a collection-level codebook, including both parts one and two. A user guide for the collection has also been added.
2009-05-12: Setup files have been updated.
Funding institution(s): United States Department of Health and Human Services. National Institutes of Health. National Institute on Aging (2R01AG00799-21A2).
Modes of data collection: computer-assisted self interview (CASI), face-to-face interview, mail questionnaire, self-enumerated questionnaire, telephone interview
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The dataset covers the period from July 01, 2015 to December 02, 2022. It includes daily frequency time series for a set of 27 variables. Description of the variables and sources of data are given in the paper. The command code file includes commands for carrying out the empirical analysis using STATA 17. Some parts of the analysis have been performed using drop-down menus.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This replication packet contains all the data and Stata do-files needed to reproduce all tables and figures in "Heterogeneity and Heteroskedasticity in Endogenous Switching Models: Estimating the Effects of Physician Advice on Calorie Consumption" by Riju Joshi and Jeffrey M. Wooldridge.
Included folders and short descriptions:
[I] Simulation.zip folder includes
Simulations.do This is a Stata do-file that replicates the Monte Carlo simulations.
README_simulations.txt
[II] Data.zip folder includes
rawdata_2007_2016.dta. This is the raw NHANES dataset. This dataset has been compiled using the Stata do-file "compiling.do" and merged using the Stata do-file "merging.do". Both Stata do-files are in the Application.zip folder.
data_2007_2016.dta
[III] Application.zip folder includes
compiling.do This is a Stata do-file that compiles NHANES datasets directly from the website. We compile data on several characteristics for each year.
merging.do This is a Stata do-file that merges all the raw NHANES datasets collected using compiling.do. We merge them for each year and then append the yearly files. The final raw dataset is named rawdata_2007_2016.dta
prepping.do This is a Stata do-file that prepares the rawdata_2007_2016.dta dataset for analysis. The cleaned dataset is named data_2007_2016.dta
analysis.do This is a Stata do-file that conducts the analysis.
README_application.txt This file contains instructions on how to replicate the application.
Any questions and concerns with replication can be sent to Riju Joshi (riju@pdx.edu)
Replication data for Grittersová, J., Indridason, I. H., Gregory, C. C., & Crespo, R. (2016). Austerity and niche parties: The electoral consequences of fiscal reforms. Electoral Studies, 42, 276–289. Data file, Stata .do file and variable description.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Descriptive and inferential statistics are taught to students in many disciplines. More classroom time is often spent on the theory behind different statistical methods that investigate relationships between variables rather than on how to interpret the results obtained to answer the research question that started the process. While statistical software (such as R, Stata, and SPSS) has made it easier to undertake regression with any dataset, the output produced remains challenging to understand and explain to intended audiences. To address this issue, the author created a 90-minute workshop that teaches students how to read tables of descriptive statistics and linear regression results produced by statistical software. The workshop has been taught each semester at the author’s institution since its creation in the Fall 2022 term, attracting a predominantly graduate student audience. Feedback has been positive thus far, with student requests for additional workshops on reading the results of different statistical models, such as logistic and count regression. Through an explanation of the process and the resources used, this presentation will provide a practical overview of how librarians can teach others how to read descriptive statistics and regression results using a research question and their own experiences working with data to guide them. It will include steps to prepare for designing a statistical literacy workshop. The aim of this presentation is to provide ideas that will help librarians move towards teaching a statistical literacy workshop at their own institutions or help them expand their teaching activities in this area.