Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PROCare-2023 Project
All collected data, data analysis code, and raw output are made available. These files can be complemented by those registered prior to data collection (https://zenodo.org/records/8322740).
Content
PROCare-questions.pdf — Copy of the online survey with question codes (names) and values
PROCare – 2023_codes.xlsx — Conversion of survey question names to Stata names
PROCare-dataset.xlsx — Full dataset without metadata. For metadata, see the files PROCare-questions.pdf and PROCare – 2023_codes.xlsx
PROCare-2023.do — Executable Stata do-file for running the full analysis
PROCare-2023.txt — Raw Stata export file with all results
Using the dataset
The full dataset is made available for secondary analysis. The coded data are found in PROCare-dataset.xlsx. Understanding the codes requires the files PROCare-questions.pdf and PROCare – 2023_codes.xlsx.
This is the raw file with all data entries, including partially completed questionnaires and duplicates.
Running the analysis
The full analysis can be run in Stata (version 5.0) by downloading all files and running the PROCare-2023.do file with the raw data in the Source_files folder.
Attribution 4.0 International (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Project summary: The CLIMBS-UP survey examined the experiences of early-career scholars in economics, biology, physics, and psychology. In the paper associated with these data, we examined the differential negative impacts that marginalized early-career scholars experienced due to the COVID-19 pandemic compared to more privileged groups. Participants were doctoral students (n = 2,687), postdoctoral scholars (n = 335), and assistant professors (n = 221) who completed an online survey administered in April and May 2021. (Note: the responses shared in the data file are only from those who completed at least 94% of the survey; an additional 323 respondents did not complete the full survey.) Participants were recruited from four STEM fields (biology, economics, physics, and psychology) at 124 departments in the United States that were randomly selected and stratified by prestige based on the 2011 National Research Council S-rankings. We divided all departments in the four fields into terciles reflecting top-, middle-, and bottom-tier rankings and randomly selected 10 departments per field/tercile. We oversampled Minority Serving Institutions (MSIs) to ensure at least one MSI was represented in each tier.
The Stata data file (COV19outcomes.dta) contains information only for the outcome variables (COVID impacts and job outcomes) for the associated paper (Douglas, Settles, Cech, et al., under review) and does not include any identifiable demographic information other than field and career stage. This project also includes a copy of the questionnaire containing only the survey items used for the associated paper (COV19survey.pdf).
Method: We asked participants to rate the amount of change they experienced in their research progress, workload, concern about career advancement, and support from mentor(s). They were also asked about disruptions to work due to life challenges, including physical health problems, mental health problems, and additional caretaking responsibilities. We compared these impacts across seven socio-demographic statuses (i.e., gender, race, caregiving status, disability status, sexual identity, first-generation undergraduate status, and career stage). As the analyses use multiple demographic characteristics that could be used to identify participants, the data file here is limited to career stage, field, and all reported outcome variables, including COVID-19 impacts, job satisfaction, professional role confidence, turnover intentions, and burnout. Below is a description of each variable in the downloadable Stata data file (COV19outcomes.dta).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
These files contain the materials necessary to replicate the analyses in the published article "The Political Relevance of Irrelevant Events". The files include a dataset, Stata .do file, and a copy of the questionnaire in Word. The .do file is intended to be run on the data file after that file is imported to Stata.
The high-frequency phone survey of refugees monitors the economic and social impacts of, and responses to, the COVID-19 pandemic on refugees and nationals by calling a sample of households every four weeks. The main objective is to inform timely and adequate policy and program responses. Since the outbreak of the COVID-19 pandemic in Ethiopia, two rounds of refugee data collection were completed between September and November 2020. The first round of the joint national and refugee HFPS was implemented between 24 September and 17 October 2020, and the second round between 20 October and 20 November 2020.
Household
Sample survey data [ssd]
The sample was drawn using simple random sampling without replacement. Expecting a high non-response rate based on experience from the HFPS-HH, we drew a stratified sample of 3,300 refugee households for the first round. More details on the sampling methodology are provided in the Survey Methodology Document, available for download as Related Materials.
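The sampling design described above (simple random sampling without replacement, applied within strata) can be sketched in plain Python. This is an illustrative toy, not the survey's actual sampling code: the frame, the `camp` stratum variable, and the equal per-stratum allocation are assumptions; the survey's real allocation is in its Survey Methodology Document.

```python
import random
from collections import defaultdict

def stratified_srs(frame, stratum_key, n_per_stratum, seed=None):
    """Draw a simple random sample without replacement within each stratum.

    `frame` is a list of household records (dicts); `stratum_key` names the
    field defining the strata. Equal allocation per stratum is used here
    purely for illustration.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for record in frame:
        by_stratum[record[stratum_key]].append(record)
    sample = []
    for stratum, records in sorted(by_stratum.items()):
        k = min(n_per_stratum, len(records))
        sample.extend(rng.sample(records, k))  # SRS without replacement
    return sample

# Toy frame: 100 households split across two hypothetical camps (strata)
frame = [{"hh_id": i, "camp": "A" if i < 60 else "B"} for i in range(100)]
picked = stratified_srs(frame, "camp", n_per_stratum=10, seed=1)
```

Sampling without replacement per stratum guarantees no household is selected twice, while stratification controls how the total sample is spread across groups.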
Computer Assisted Telephone Interview [cati]
The Ethiopia COVID-19 High Frequency Phone Survey of Refugees questionnaire consists of the following sections:
A more detailed description of the questionnaire is provided in Table 1 of the Survey Methodology Document provided as Related Materials. The Round 1 and Round 2 questionnaires are available for download.
DATA CLEANING: At the end of data collection, the raw dataset was cleaned by the research team. This included formatting and correcting results based on monitoring issues, enumerator feedback, and survey changes. The data cleaning carried out is detailed below.
Variable naming and labeling:
• Variable names were changed to reflect the lowercase question name in the paper survey copy, plus a word or two related to the question.
• Variables were labeled with longer descriptions of their contents, and the full question text was stored in Notes for each variable.
• “Other, specify” variables were named similarly to their related question, with “_other” appended to the name.
• Value labels were assigned where relevant, with options shown in English for all variables unless preloaded from the roster in Amharic.
Variable formatting:
• Variables were formatted as their object type (string, integer, decimal, time, date, or datetime).
• Multi-select variables were saved both as a single space-separated variable and as multiple binary variables showing the yes/no value of each possible response.
• Time and date variables were stored as POSIX timestamp values and formatted to show Gregorian dates.
• Location information was left in separate ID and Name variables, following the format of the incoming roster. IDs were formatted to include only the variable-level digits, not the higher-level prefixes (2–3 digits only).
• Only consented surveys were kept in the dataset, and all personal information and internal survey variables were dropped from the clean dataset.
• Roster data are separated from the main dataset and kept in long form, but can be merged on the key variable (key can also be used to merge with the raw data).
• The variables were arranged in the same order as the paper instrument, with observations arranged according to their submission time.
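The multi-select convention above (one space-separated variable plus one binary variable per possible response) can be sketched as follows. This is a minimal Python illustration, not the project's cleaning code; the function and variable names (`explode_multiselect`, `opt_*`, `raw`) are hypothetical.

```python
def explode_multiselect(responses, options):
    """Turn space-separated multi-select answers into one 1/0 indicator per
    possible response, kept alongside the original string.

    Mirrors the dataset's stated convention: retain the single
    space-separated variable and add binary yes/no variables.
    """
    rows = []
    for raw in responses:
        chosen = set(raw.split())  # "" splits to [], i.e. nothing selected
        row = {"raw": raw}
        for opt in options:
            row[f"opt_{opt}"] = 1 if opt in chosen else 0
        rows.append(row)
    return rows

# Three respondents: options 1 and 3; option 2 only; nothing selected
rows = explode_multiselect(["1 3", "2", ""], options=["1", "2", "3"])
```

Keeping both representations lets analysts tabulate individual options directly from the indicators while preserving the original combined answer for auditing.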
Back-check data review: Results of the back-check survey are compared against the originally captured survey results using the bcstats command in Stata. This command delivers a comparison of variables and identifies any discrepancies. Any discrepancies identified are then examined individually to determine whether they are within reason.
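The comparison that bcstats performs in Stata can be approximated in plain Python for readers without Stata. This is a simplified analogue for illustration only (bcstats itself reports richer statistics); the function name, the `id` key, and the toy records are assumptions.

```python
def compare_backchecks(original, backcheck, id_var, check_vars):
    """Flag discrepancies between original and back-check responses.

    For each back-check record matched to an original record by `id_var`,
    list every variable in `check_vars` whose values differ.
    """
    orig_by_id = {row[id_var]: row for row in original}
    discrepancies = []
    for row in backcheck:
        base = orig_by_id.get(row[id_var])
        if base is None:
            continue  # back-check id not found in the original data
        for var in check_vars:
            if base.get(var) != row.get(var):
                discrepancies.append((row[id_var], var, base.get(var), row.get(var)))
    return discrepancies

# Toy data: household 2 reported a different size in the back-check call
original = [{"id": 1, "hh_size": 5, "camp": "A"}, {"id": 2, "hh_size": 3, "camp": "B"}]
backcheck = [{"id": 1, "hh_size": 5, "camp": "A"}, {"id": 2, "hh_size": 4, "camp": "B"}]
flags = compare_backchecks(original, backcheck, "id", ["hh_size", "camp"])
# flags → [(2, "hh_size", 3, 4)]
```

Each flagged tuple (id, variable, original value, back-check value) would then be reviewed individually, as the text describes.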
The following data quality checks were completed:
• Daily SurveyCTO monitoring: This included outlier checks, skipped questions, a review of “Other, specify” and other text responses, and enumerator comments. Enumerator comments were used to suggest new response options or to highlight situations where existing options should be used instead. Monitoring also included a review of variable-relationship logic checks and checks of the logic of answers. Finally, outliers in phone variables such as survey duration or the percentage of time audio was at a conversational level were monitored. A survey duration of close to 15 minutes and a conversation-level audio percentage of around 40% were considered normal.
• Dashboard review: This included monitoring individual enumerator performance, such as the number of calls logged, duration of calls, percentage of calls responded to, and percentage of non-consents. Non-consent reason rates and attempts per household were monitored as well. Duration analysis using R was used to monitor each module's duration and estimate the time required for subsequent rounds. The dashboard was also used to track overall survey completion and preview the results of key questions.
• Daily Data Team reporting: The Field Supervisors and the Data Manager reported daily feedback on call progress, enumerator feedback on the survey, and any suggestions to improve the instrument, such as adding options to multiple-choice questions or adjusting translations.
• Audio audits: Audio recordings were captured during the consent portion of the interview for all completed interviews, for the enumerators' side of the conversation only. The recordings were reviewed for any surveys flagged by enumerators as having data quality concerns and for an additional random sample of 2% of respondents. A range of lengths was selected to observe edge cases. Most consent readings took around one minute, with some longer recordings due to questions about the survey or holding for the respondent. All reviewed audio recordings were completed satisfactorily.
• Back-check survey: Field Supervisors made back-check calls to a random sample of 5% of the households that completed a survey in Round 1. Field Supervisors called these households and administered a short survey, including (i) identifying the same respondent; (ii) determining the respondent's position within the household; (iii) confirming that a member of the data collection team had completed the interview; and (iv) a few questions from the original survey.