Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: open-ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
The survey received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final dataset contains 2,492 complete responses, an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple-choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
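A minimal sketch of loading the coded CSV while treating the 999 "Not asked" code as missing rather than as a numeric answer. It assumes the CSV stores the code as the literal string 999 and that no variable uses 999 as a real value; the column names in the example (id, reuses_data, Q5) are hypothetical, so check the codebook for the actual variable names.

```python
import csv
import io

NOT_ASKED = "999"  # missing-data code documented for this dataset


def load_rows(text):
    """Parse coded CSV text, replacing the 'Not asked' code with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (None if v == NOT_ASKED else v) for k, v in row.items()}
            for row in reader]


def count_asked(rows, column):
    """Number of respondents who were actually asked a given question."""
    return sum(1 for r in rows if r[column] is not None)


# Hypothetical excerpt: Q5 belongs to the branch shown only to data reusers.
sample = "id,reuses_data,Q5\n1,1,3\n2,0,999\n3,1,2\n"
rows = load_rows(sample)
print(count_asked(rows, "Q5"))  # 2
```

With a real file, pass the contents of open(path, newline="") to csv.DictReader instead of the in-memory string.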
https://www.icpsr.umich.edu/web/ICPSR/studies/39492/terms
Clinical trials study the effects of medical treatments, like how safe they are and how well they work. But most clinical trials don't get all the data they need from patients. Patients may not answer all questions on a survey, or they may drop out of a study after it has started. The missing data can affect researchers' ability to detect the effects of treatments. To address the problem of missing data, researchers can make different guesses based on why and how data are missing. Then they can look at results for each guess. If results based on different guesses are similar, researchers can have more confidence that the study results are accurate. In this study, the research team created new methods to do these tests and developed software that runs these tests. To access the sensitivity analysis methods and software, please visit the MissingDataMatters website.
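One common way to formalize these "guesses" is a delta-adjustment (tipping-point) analysis: fill in missing outcomes as if they were missing at random, then shift the filled-in values by an offset delta and watch how the estimated treatment effect moves. The sketch below is a deliberately simplified illustration with made-up numbers and single mean imputation; it is not the method or software developed in this study.

```python
from statistics import mean


def treatment_effect(treated, control, delta):
    """Difference in means after filling in missing outcomes (None).

    Missing values in each arm are replaced by that arm's observed mean
    (a missing-at-random style guess). In the treated arm the filled-in
    values are then shifted by `delta`, encoding a more pessimistic
    (delta < 0) or optimistic (delta > 0) assumption about dropouts.
    """
    def impute(values, shift):
        observed = [v for v in values if v is not None]
        fill = mean(observed) + shift
        return [fill if v is None else v for v in values]

    return mean(impute(treated, delta)) - mean(impute(control, 0.0))


# Made-up outcomes; None marks a patient who dropped out.
treated = [5.0, 6.0, None, 7.0, None]
control = [4.0, 5.0, 4.5, None, 5.5]

for delta in (0.0, -1.0, -2.0, -3.0):
    print(delta, round(treatment_effect(treated, control, delta), 2))
```

If the effect keeps the same sign across all deltas considered plausible, the conclusion is more robust to the missing data. A real analysis would use multiple imputation so that the uncertainty of the filled-in values is reflected in the standard errors.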
Replication and simulation reproduction materials for the article "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." Please see the README file for a summary of the contents and the Replication Guide for a more detailed description. Article abstract: Principled methods for analyzing missing values, based chiefly on multiple imputation, have become increasingly popular yet can struggle to handle the kinds of large and complex data that are also becoming common. We propose an accurate, fast, and scalable approach to multiple imputation, which we call MIDAS (Multiple Imputation with Denoising Autoencoders). MIDAS employs a class of unsupervised neural networks known as denoising autoencoders, which are designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data. We repurpose denoising autoencoders for multiple imputation by treating missing values as an additional portion of corrupted data and drawing imputations from a model trained to minimize the reconstruction error on the originally observed portion. Systematic tests on simulated as well as real social science data, together with an applied example involving a large-scale electoral survey, illustrate MIDAS's accuracy and efficiency across a range of settings. We provide open-source software for implementing MIDAS.
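The key training idea in the abstract, that missing values are treated as extra corruption and the loss is computed only on the originally observed entries, can be sketched without any neural network. This is an illustration of that masking logic under simple assumptions (gaps fed to the model as 0.0, a hand-written stand-in for the network's output); it is not the MIDAS software or its API.

```python
import random


def corrupt(row, drop_prob, rng):
    """Randomly blank out observed entries (denoising-style corruption).

    Entries that are already missing (None) stay missing; both kinds of
    gap are fed to the model in the same way, here as 0.0.
    """
    return [0.0 if (v is None or rng.random() < drop_prob) else v for v in row]


def masked_mse(original, reconstructed):
    """Mean squared reconstruction error over originally observed entries only."""
    pairs = [(o, r) for o, r in zip(original, reconstructed) if o is not None]
    return sum((o - r) ** 2 for o, r in pairs) / len(pairs)


rng = random.Random(0)
row = [1.0, None, 3.0, 4.0]
noisy_input = corrupt(row, drop_prob=0.5, rng=rng)

# Stand-in for a trained network's output on noisy_input. The value at
# the missing position (2.0) would serve as an imputation; it does not
# contribute to the loss because that entry was never observed.
reconstruction = [1.5, 2.0, 3.0, 3.0]
print(masked_mse(row, reconstruction))
```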
Global COVID-19 surveys conducted by National Statistical Offices. This dataset has several columns that contain different types of information. Here's a brief explanation of each column:
1. **Country**: This column likely contains the names of the countries for which the survey data is collected. Each row represents data related to a specific country.
2. **Category**: This column might contain information about the type or category of the survey. It could include categories such as healthcare, economic impact, public sentiment, etc. This helps in categorizing the surveys.
3. **Title and Link**: These columns may contain the title or name of the specific survey and a link to the source or webpage where more information about the survey can be found. The link can be useful for referencing the original source of the data.
4. **Description**: This column likely contains a brief description or summary of the survey's objectives, methodology, or key findings. It provides additional context for the survey data.
5. **Source**: This column may contain information about the organization or agency that conducted the survey. It's essential for understanding the authority behind the data.
6. **Date Added**: This column probably contains the date when the survey data was added to the dataset. This helps track the freshness of the data and can be useful for historical analysis.
With this dataset, you can perform various types of analysis, including but not limited to:
Country-based analysis: You can analyze survey data for specific countries to understand the impact of COVID-19 in different regions.
Category-based analysis: You can group surveys by category and analyze trends or patterns related to healthcare, economics, or public sentiment.
Temporal analysis: You can examine how survey data has evolved over time by using the "Date Added" column to track changes and trends.
Source-based analysis: You can assess the reliability and credibility of the data by considering the source of the surveys.
Data visualization: Create visual representations like charts, graphs, and maps to make the data more understandable and informative.
Before conducting any analysis, it's essential to clean and preprocess the data, handle missing values, and ensure data consistency. Additionally, consider the research questions or insights you want to gain from the dataset, which will guide your analysis approach.
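As an example of the category-based and temporal analyses listed above, the sketch below counts surveys per category and per month added. It assumes the file has "Category" and "Date Added" columns with dates in ISO YYYY-MM-DD form; the rows shown are placeholders, not real entries from the dataset.

```python
import csv
import io
from collections import Counter


def summarize(text):
    """Count surveys per category and per month added.

    Rows with a blank category are counted as 'Unknown' rather than
    dropped, so gaps remain visible in the summary.
    """
    by_category = Counter()
    by_month = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        category = row["Category"].strip() or "Unknown"
        by_category[category] += 1
        by_month[row["Date Added"][:7]] += 1  # 'YYYY-MM'
    return by_category, by_month


# Placeholder rows illustrating the assumed layout.
sample = (
    "Country,Category,Title,Source,Date Added\n"
    "Country A,Health,Survey 1,Office A,2020-05-01\n"
    "Country B,Economic impact,Survey 2,Office B,2020-05-20\n"
    "Country A,,Survey 3,Office A,2020-06-03\n"
)
cats, months = summarize(sample)
print(cats.most_common())
print(sorted(months.items()))
```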
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Initial data analysis (IDA) is the part of the data pipeline that takes place between the end of data retrieval and the beginning of data analysis that addresses the research question. Systematic IDA and clear reporting of the IDA findings are important steps towards reproducible research. A general framework of IDA for observational studies includes data cleaning, data screening, and possible updates of pre-planned statistical analyses. Longitudinal studies, where participants are observed repeatedly over time, pose additional challenges, as they have special features that should be taken into account in the IDA steps before addressing the research question. We propose a systematic approach in longitudinal studies to examine data properties prior to conducting planned statistical analyses. In this paper we focus on the data screening element of IDA, assuming that the research aims are accompanied by an analysis plan, meta-data are well documented, and data cleaning has already been performed. IDA data screening comprises five types of explorations, covering the analysis of participation profiles over time, evaluation of missing data, presentation of univariate and multivariate descriptions, and the depiction of longitudinal aspects. Executing the IDA plan will result in an IDA report to inform data analysts about data properties and possible implications for the analysis plan, another element of the IDA framework. Our framework is illustrated focusing on hand grip strength outcome data from a data collection across several waves in a complex survey. We provide reproducible R code on a public repository, presenting a detailed data screening plan for the investigation of the average rate of age-associated decline of grip strength. With our checklist and reproducible R code we provide data analysts a framework to work with longitudinal data in an informed way, enhancing the reproducibility and validity of their work.
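The first exploration, participation profiles over time, amounts to recording which waves each participant took part in and tabulating the resulting patterns. The authors' code is in R; the Python sketch below is an independent illustration with a hypothetical interview log, not their implementation.

```python
from collections import Counter


def participation_profiles(records, waves):
    """Count participation patterns, e.g. '1101' = seen in waves 1, 2 and 4.

    `records` is an iterable of (participant_id, wave) pairs, one per
    interview that took place; `waves` is the ordered list of waves.
    """
    seen = {}
    for pid, wave in records:
        seen.setdefault(pid, set()).add(wave)
    return Counter(
        "".join("1" if w in waves_seen else "0" for w in waves)
        for waves_seen in seen.values()
    )


# Hypothetical interview log for three participants over four waves.
records = [("a", 1), ("a", 2), ("a", 3), ("a", 4),
           ("b", 1), ("b", 2),
           ("c", 1), ("c", 3), ("c", 4)]
print(participation_profiles(records, waves=[1, 2, 3, 4]))
```

Monotone dropout shows up as patterns like '1100', while intermittent participation shows up as patterns like '1011'; the two usually call for different handling in the analysis plan.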
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article "Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022", published on 11 July 2023: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022
https://creativecommons.org/publicdomain/zero/1.0/
In 2013, students of the Statistics class at FSEV UK (https://fses.uniba.sk/en/) were asked to invite their friends to participate in this survey.
The data file (responses.csv) consists of 1010 rows and 150 columns (139 integer and 11 categorical). See the columns.csv file if you want to match the data with the original variable names. The variables can be split into several thematic groups.
Many different techniques can be used to answer many questions; for example:
(in Slovak) Sleziak, P. - Sabo, M.: Gender differences in the prevalence of specific phobias. Forum Statisticum Slovacum. 2014, Vol. 10, No. 6. [Differences (gender + whether people lived in village/town) in the prevalence of phobias.]
Sabo, Miroslav. Multivariate Statistical Methods with Applications. Diss. Slovak University of Technology in Bratislava, 2014. [Clustering of variables (music preferences, movie preferences, phobias) + Clustering of people w.r.t. their interests.]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Percentage (%) and number (n) of missing values in the outcome (maximum grip strength) among participants that were interviewed, by age group and sex using all available data.
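A table like the one described can be produced by grouping interviewed participants by age group and sex and counting missing outcomes. This Python sketch uses hypothetical rows and age bands; it only shows the tabulation logic, not the original analysis code.

```python
from collections import defaultdict


def missing_by_group(rows):
    """Return {(age_group, sex): (n_missing, percent_missing)}.

    Each row is (age_group, sex, grip_strength) for an interviewed
    participant; grip_strength is None when the measurement is missing.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    for age_group, sex, grip in rows:
        key = (age_group, sex)
        totals[key] += 1
        if grip is None:
            missing[key] += 1
    return {k: (missing[k], 100.0 * missing[k] / totals[k]) for k in totals}


# Hypothetical interviewed participants.
rows = [("50-59", "F", 28.0), ("50-59", "F", None),
        ("50-59", "M", 41.5), ("70-79", "M", None)]
for key, (n, pct) in sorted(missing_by_group(rows).items()):
    print(key, n, f"{pct:.1f}%")
```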
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains estimates of the base rates of 550 food safety-relevant food handling practices in European households. The data are representative for the population of private households in the ten European countries in which the SafeConsume Household Survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK).
Sampling design
In each of the ten EU and EEA countries where the survey was conducted (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, UK), the population under study was defined as the private households in the country. Sampling was based on a stratified random design, with the NUTS2 statistical regions of Europe and the education level of the target respondent as stratum variables. The target sample size was 1000 households per country, with selection probability within each country proportional to stratum size.
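Selection "proportional to stratum size" means each stratum receives a share of the 1000 interviews equal to its share of the country's households. The sketch below allocates a sample that way, using largest-remainder rounding so the counts add up exactly to the target. The rounding rule and the stratum labels (region, education level) are illustrative assumptions, not details taken from the survey documentation.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a sample of size n to strata in proportion to their size.

    Uses largest-remainder rounding so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    exact = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    shortfall = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


# Hypothetical household counts per (NUTS2 region, education level) stratum.
strata = {("R1", "low"): 300_000, ("R1", "high"): 200_000,
          ("R2", "low"): 350_000, ("R2", "high"): 150_000}
print(proportional_allocation(strata, 1000))
```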
Fieldwork
The fieldwork was conducted between December 2018 and April 2019 in ten EU and EEA countries (Denmark, France, Germany, Greece, Hungary, Norway, Portugal, Romania, Spain, United Kingdom). The target respondent in each household was the person with main or shared responsibility for food shopping in the household. The fieldwork was sub-contracted to a professional research provider (Dynata, formerly Research Now SSI). Complete responses were obtained from 9,996 households in total.
Weights
In addition to the SafeConsume Household Survey data, population data from Eurostat (2019) were used to calculate weights. The weights used NUTS2 region as the stratification variable: each observation in a stratum was assigned a weight proportional to the number of population households in that stratum represented by one sampled household. The weights were used in the estimation of all base rates included in the data set.
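The weighting rule can be sketched as "population households divided by sampled households" per stratum, followed by a weighted share for a yes/no practice. The region codes and counts below are hypothetical; the actual weights were based on Eurostat (2019) household counts by NUTS2 region.

```python
def stratum_weights(population, sample_counts):
    """Weight = number of population households one sampled household represents."""
    return {s: population[s] / sample_counts[s] for s in sample_counts}


def weighted_rate(responses, weights):
    """Weighted share of households reporting a practice.

    `responses` is a list of (stratum, practice) with practice in {0, 1}.
    """
    num = sum(weights[s] * x for s, x in responses)
    den = sum(weights[s] for s, _ in responses)
    return num / den


# Hypothetical regions: A is larger in the population but equally sampled.
population = {"A": 600_000, "B": 400_000}
sample_counts = {"A": 2, "B": 2}
weights = stratum_weights(population, sample_counts)

responses = [("A", 1), ("A", 0), ("B", 1), ("B", 1)]
print(weighted_rate(responses, weights))  # unweighted share would be 0.75
```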
Transformations
All survey variables were normalised to the [0,1] range before the analysis. Responses to food frequency questions were transformed into the proportion of all meals consumed during a year where the meal contained the respective food item. Responses to questions with 11-point Juster probability scales as the response format were transformed into numerical probabilities. Responses to questions with time (hours, days, weeks) or temperature (°C) as response formats were discretised using supervised binning. The thresholds best separating the bins were chosen on the basis of five-fold cross-validated decision trees. The binned versions of these variables, and all other input variables with multiple categorical response options (either with a check-all-that-apply or forced-choice response format), were transformed into sets of binary features, with a value 1 assigned if the respective response option had been checked, 0 otherwise.
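Three of these transformations are simple enough to sketch: min-max scaling to [0,1], mapping Juster scale points to probabilities, and expanding a categorical answer into binary features. The Juster mapping below uses the conventional anchors of that scale (point 0 = 1 in 100, points 1 to 9 = 1 to 9 in 10, point 10 = 99 in 100); that mapping, like the answer options in the example, is an assumption, since the text does not spell out the exact values used. The supervised binning step is omitted.

```python
def minmax(values):
    """Rescale numeric values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


# Conventional Juster-scale anchors (assumption, see lead-in).
JUSTER = {0: 0.01, **{k: k / 10 for k in range(1, 10)}, 10: 0.99}


def one_hot(selected, options):
    """Binary features: 1 if an option was checked, 0 otherwise."""
    return {opt: int(opt in selected) for opt in options}


print(minmax([2, 4, 6]))
print(JUSTER[3], JUSTER[10])
# Hypothetical check-all-that-apply question about thawing methods.
print(one_hot({"fridge"}, ["fridge", "counter", "microwave"]))
```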
Treatment of missing values
In many cases, a missing value on a feature logically implies that the respective data point should have a value of zero. If, for example, a participant in the SafeConsume Household Survey had indicated that a particular food was not consumed in their household, the participant was not presented with any other questions related to that food, which automatically results in missing values on all features representing the responses to the skipped questions. However, zero consumption would also imply a zero probability that the respective food is consumed undercooked. In such cases, missing values were replaced with a value of 0.
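The rule described here, that a "no" on a gating question turns the skipped follow-ups into zeros while other gaps stay missing, can be written as a small record-level function. The feature names are hypothetical.

```python
def fill_logical_zeros(record, gate, dependents):
    """Replace skipped-question gaps with 0 when the gate answer implies it.

    If the gating feature (e.g. whether a food is consumed) is 0, the
    follow-up questions were never shown, so their features are set to 0
    instead of being left missing. Other missing values are left untouched.
    """
    out = dict(record)
    if out.get(gate) == 0:
        for feature in dependents:
            if out.get(feature) is None:
                out[feature] = 0
    return out


follow_ups = ["chicken_undercooked_prob"]
skipped = {"eats_chicken": 0, "chicken_undercooked_prob": None}
unanswered = {"eats_chicken": 1, "chicken_undercooked_prob": None}
print(fill_logical_zeros(skipped, "eats_chicken", follow_ups))
print(fill_logical_zeros(unanswered, "eats_chicken", follow_ups))
```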
https://creativecommons.org/publicdomain/zero/1.0/
The Behavioral Risk Factor Surveillance System (BRFSS) is the nation's premier system of health-related telephone surveys that collect uniform, state-specific data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services.
The objective of the BRFSS is to gather consistent, state-level data on preventive health practices and risk behaviors associated with chronic diseases, injuries, and preventable infectious diseases among adults (aged 18 and older).
Established in 1984 with 15 states, the BRFSS now collects data in all 50 states, the District of Columbia, and three U.S. territories. The system completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world.
The 2024 BRFSS dataset continues to use the raking weighting methodology (introduced in 2011) and includes both landline and cellphone-only respondents, ensuring more accurate representation of the U.S. adult population.
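Raking (iterative proportional fitting) adjusts weights one margin at a time until the weighted totals match known population totals for every margin. The sketch below shows only the core algorithm with two hypothetical margins and made-up totals; the CDC's production weighting uses its own control variables and additional steps not shown here.

```python
def rake(rows, margins, weight_key="w", iterations=50, tol=1e-9):
    """Adjust weights so weighted totals match each margin's control totals.

    rows: list of dicts holding category variables and a starting weight.
    margins: {variable: {category: population_total}}.
    Cycles through the margins, scaling weights within each category,
    until the largest relative adjustment falls below `tol`.
    """
    for _ in range(iterations):
        max_change = 0.0
        for var, targets in margins.items():
            current = {}
            for r in rows:
                current[r[var]] = current.get(r[var], 0.0) + r[weight_key]
            for r in rows:
                factor = targets[r[var]] / current[r[var]]
                r[weight_key] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return rows


rows = [{"sex": s, "age": a, "w": 1.0}
        for s in ("F", "M") for a in ("young", "old")]
margins = {"sex": {"F": 60, "M": 40}, "age": {"young": 30, "old": 70}}
for r in rake(rows, margins):
    print(r["sex"], r["age"], round(r["w"], 3))
```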
The aggregate dataset combines landline and cell phone data collected in 2024 from 49 states, the District of Columbia, Guam, Puerto Rico, and the U.S. Virgin Islands.
This original dataset contains responses from 457,670 individuals and has 301 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.
⚠️ Note: Tennessee was unable to collect enough responses to meet inclusion requirements for 2024 and is not included in this public dataset.
Certain survey questions and responses have been modified or omitted to comply with federal data policies in effect during the 2024 collection period. As a result, some variables may contain missing values or appear inconsistent due to questions that were removed or restructured.
Data are collected from a random sample of adults (one per household) via telephone interviews.
Factors assessed include:
- Tobacco use
- Health care access and coverage
- Alcohol consumption
- Physical activity and diet
- HIV/AIDS knowledge and prevention
- Chronic health conditions
- Preventive health services and screenings
The annual dataset contains 301 variables, covering both core questions and optional modules. Please refer to the official BRFSS 2024 Codebook for detailed variable definitions and coding.
This dataset contains 3 files:
1. brfss_survey_data_2024.csv # Dataset in .csv format (converted from SAS)
2. codebook_2024.HTML # CDC codebook for variable definitions
3. main_data_brfss_2024.XPT # Main dataset
⚙️ Note: The CSV file was converted from the original SAS format using pandas. Minor conversion artifacts may exist.
A complete description of each column of the CSV file can be found in the codebook.
Data provided by the U.S. Centers for Disease Control and Prevention (CDC).
Original source and additional years of BRFSS data: CDC BRFSS Annual Data
Citation:
Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2024.
License: Public Domain (U.S. Government Work)
If you use this dataset in your analysis or publication, please cite as:
Behavioral Risk Factor Surveillance System (BRFSS) 2024. U.S. Centers for Disease Control and Prevention (CDC). Public Domain.
Prepared for Kaggle public dataset publication. All data are in the public domain as U.S. Government works.
This file contains all of the cases and variables that are in the original 2012 General Social Survey, but is prepared for easier use in the classroom. Changes have been made in two areas. First, to avoid confusion when constructing tables or interpreting basic analysis, all missing data codes have been set to system missing. Second, many of the continuous variables have been categorized into fewer categories, and added as additional variables to the file.
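The two classroom changes can be sketched as a record-level recode: special codes become missing, and a categorized copy of a continuous variable is added alongside the original. The missing codes for age (98, 99) and the age bands are assumptions for illustration; consult the GSS codebook for the actual codes and the categories used in this file.

```python
# Assumed special codes per variable (e.g. "don't know", "no answer").
MISSING_CODES = {"age": {98, 99}}


def to_system_missing(row, missing_codes=MISSING_CODES):
    """Replace a variable's special codes with None (system missing)."""
    return {k: (None if v in missing_codes.get(k, ()) else v)
            for k, v in row.items()}


def age_group(age):
    """Collapse age into a few hypothetical bands; missing stays missing."""
    if age is None:
        return None
    if age < 30:
        return "18-29"
    if age < 50:
        return "30-49"
    if age < 65:
        return "50-64"
    return "65+"


row = to_system_missing({"age": 45, "sex": 1})
row["age_cat"] = age_group(row["age"])  # added as an extra variable
print(row)
```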
The General Social Surveys (GSS) have been conducted by the National Opinion Research Center (NORC) annually since 1972, except for the years 1979, 1981, and 1992 (a supplement was added in 1992), and biennially beginning in 1994. The GSS are designed to be part of a program of social indicator research, replicating questionnaire items and wording in order to facilitate time-trend studies. This data file has all cases and variables asked on the 2012 GSS. There are a total of 4,820 cases in the data set but their initial sampling years vary because the GSS now contains panel cases. Sampling years can be identified with the variable SAMPTYPE.
The 2012 GSS featured special modules on religious scriptures, the environment, dance and theater performances, health care system, government involvement, health concerns, emotional health, financial independence and income inequality.
The GSS has switched from a repeating, cross-section design to a combined repeating cross-section and panel-component design. This file has a rolling panel design, with the 2008 GSS as the base year for the first panel. A sub-sample of 2,000 GSS cases from 2008 was selected for reinterview in 2010 and again in 2012 as part of the GSSs in those years. The 2010 GSS consisted of a new cross-section plus the reinterviews from 2008. The 2012 GSS consists of a new cross-section of 1,974, the first reinterview wave of the 2010 panel cases with 1,551 completed cases, and the second and final reinterview of the 2008 panel with 1,295 completed cases. Altogether, the 2012 GSS had 4,820 cases (1,974 in the new 2012 panel, 1,551 in the 2010 panel, and 1,295 in the 2008 panel).
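The three components add up to the stated total, which is a quick consistency check when tabulating cases by sample year. The counts come from the description above; matching the dictionary keys to actual SAMPTYPE codes is an assumption to verify against the codebook.

```python
# 2012 GSS cases by the panel each case entered in (from the text above).
cases_by_panel = {2012: 1974, 2010: 1551, 2008: 1295}

total = sum(cases_by_panel.values())
shares = {year: round(100 * n / total, 1) for year, n in cases_by_panel.items()}
print(total)  # 4820
print(shares)
```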
To download syntax files for the GSS that reproduce well-known religious group recodes, including RELTRAD, please visit ARDA's Syntax Repository (/research/syntax-repository-list).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the survey of consumer finances (scf) with r
the survey of consumer finances (scf) tracks the wealth of american families. every three years, more than five thousand households answer a battery of questions about income, net worth, credit card debt, pensions, mortgages, even the lease on their cars. plenty of surveys collect annual income, but only the survey of consumer finances captures such detailed asset data. responses are at the primary economic unit (peu) level - the economically dominant, financially interdependent family members within a sampled household. norc at the university of chicago administers the data collection, but the board of governors of the federal reserve pays the bills and therefore calls the shots.
if you were so brazen as to open up the microdata and run a simple weighted median, you'd get the wrong answer. the five to six thousand respondents actually gobble up twenty-five to thirty thousand records in the final public use files. why oh why? well, those tables contain not one, not two, but five records for each peu. wherever missing, these data are multiply-imputed, meaning answers to the same question for the same household might vary across implicates. each analysis must account for all that, lest your confidence intervals be too tight. to calculate the correct statistics, you'll need to break the single file into five, necessarily complicating your life. this can be accomplished with the meanit sas macro buried in the 2004 scf codebook (search for meanit - you'll need the sas iml add-on). or you might blow the dust off this website referred to in the 2010 codebook as the home of an alternative multiple imputation technique, but all i found were broken links. perhaps it's time for plan c, and by c, i mean free. read the imputation section of the latest codebook (search for imputation), then give these scripts a whirl. they've got that new r smell.
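the two steps described above, breaking the file into five implicates and then combining the five per-implicate results, can be sketched in python. the split assumes the record id y1 equals the case id yy1 times ten plus the implicate number, so y1 % 10 recovers the implicate; check the codebook before relying on that. the combination step uses rubin's rules, the general multiple-imputation formula. the official scf variance procedure also uses replicate weights, which this sketch leaves out, and none of this is a port of the r scripts.

```python
from statistics import mean, variance


def split_implicates(records):
    """Group records into implicates 1-5.

    Assumption: record id y1 = case id yy1 * 10 + implicate number,
    so y1 % 10 recovers the implicate.
    """
    implicates = {}
    for rec in records:
        implicates.setdefault(rec["y1"] % 10, []).append(rec)
    return implicates


def rubin_combine(estimates, variances):
    """Combine per-implicate estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)       # combined point estimate
    w = mean(variances)           # average within-imputation variance
    b = variance(estimates)       # between-imputation variance
    t = w + (1 + 1 / m) * b       # total variance
    return q_bar, t


parts = split_implicates([{"y1": 11}, {"y1": 12}, {"y1": 21}])
print({k: len(v) for k, v in parts.items()})
print(rubin_combine([10, 12, 11, 9, 13], [1.0] * 5))
```

ignoring the between-imputation term b is exactly what makes confidence intervals "too tight" when only one implicate is analyzed.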
the lion's share of the respondents in the survey of consumer finances get drawn from a pretty standard sample of american dwellings - no nursing homes, no active-duty military. then there's this secondary sample of richer households to even out the statistical noise at the higher end of the income and assets spectrum. you can read more if you like, but at the end of the day the weights just generalize to civilian, non-institutional american households. one last thing before you start your engine: read everything you always wanted to know about the scf. my favorite part of that title is the word always.
this new github repository contains three scripts:
1989-2010 download all microdata.R
- initiate a function to download and import any survey of consumer finances zipped stata file (.dta)
- loop through each year specified by the user (starting at the 1989 re-vamp) to download the main, extract, and replicate weight files, then import each into r
- break the main file into five implicates (each containing one record per peu) and merge the appropriate extract data onto each implicate
- save the five implicates and replicate weights to an r data file (.rda) for rapid future loading
2010 analysis examples.R
- prepare two survey of consumer finances-flavored multiply-imputed survey analysis functions
- load the r data files (.rda) necessary to create a multiply-imputed, replicate-weighted survey design
- demonstrate how to access the properties of a multiply-imputed survey design object
- cook up some descriptive statistics and export examples, calculated with scf-centric variance quirks
- run a quick t-test and regression, but only because you asked nicely
replicate FRB SAS output.R
- reproduce each and every statistic provided by the friendly folks at the federal reserve
- create a multiply-imputed, replicate-weighted survey design object
- re-reproduce (and yes, i said/meant what i meant/said) each of those statistics, now using the multiply-imputed survey design object to highlight the statistically-theoretically-irrelevant differences
click here to view these three scripts
for more detail about the survey of consumer finances (scf), visit:
- the federal reserve board of governors' survey of consumer finances homepage
- the latest scf chartbook, to browse what's possible. (spoiler alert: everything.)
- the survey of consumer finances wikipedia entry
- the official frequently asked questions
notes: nationally-representative statistics on the financial health, wealth, and assets of american households might not be monopolized by the survey of consumer finances, but there isn't much competition aside from the assets topical module of the survey of income and program participation (sipp). on one hand, the scf interview questions contain more detail than sipp. on the other hand, scf's smaller sample precludes analyses of acute subpopulations. and for any three-handed martians in the audience, there are also a few biases between these two data sources that you ought to consider. the survey methodologists at the federal reserve take their job...
The Jerusalem Household Social Survey 2005 is one of the most important statistical activities conducted by the Palestinian Central Bureau of Statistics (PCBS), and the most detailed and comprehensive one PCBS has conducted in Jerusalem. The main objective of the survey is to provide basic information about:
- demographic and social characteristics of Palestinian society in Jerusalem Governorate, including age-sex structure, illiteracy rate, and enrollment and drop-out rates by background characteristics;
- labor force status, unemployment rate, occupation, economic activity, employment status, place of work, and wage levels;
- housing and housing conditions;
- living levels and the impact of Israeli measures on nutrition behavior during the Al-Aqsa intifada;
- criminal offences, their victims, and the injuries caused.
Social survey data covering Jerusalem Governorate only, by type of locality (urban, rural, refugee camps) and governorate
Households, individuals
The target population was all Palestinian households living in Jerusalem Governorate.
Sample survey data [ssd]
The sample size for Jerusalem was set at 3,300 households: 2,240 in region J1 and 1,060 in region J2. The sampling frame for Jerusalem J2 was built from the General Census of Population, Housing and Establishments carried out by PCBS at the end of 1997; the frame for Jerusalem J1 was built from project data collected exclusively in 2004. The frame is a list of enumeration areas, which serve as the primary sampling units (PSUs) in the first stage of sample selection. The design is a two-stage stratified cluster random sample:
Stage 1: a stratified random sample of 123 enumeration areas was selected, 70 from Jerusalem J1 and 53 from Jerusalem J2.
Stage 2: a systematic random sample of households was drawn from each enumeration area selected in the first stage: 20 households per area in Jerusalem J2 and 32 households per area in Jerusalem J1.
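The second stage can be sketched as systematic sampling (random start, fixed interval) within a listing of an enumeration area's households, and the stated takes reproduce the planned sample size: 70 × 32 = 2,240 in J1 and 53 × 20 = 1,060 in J2. The household listing in the usage line is hypothetical, and the exact PCBS selection procedure may differ in detail.

```python
import random


def systematic_sample(units, k, rng):
    """Systematic random sample of k units (k <= len(units)).

    A random start in [0, interval) is followed by steps of a fixed,
    possibly fractional, interval through the listing.
    """
    interval = len(units) / k
    start = rng.random() * interval
    return [units[int(start + i * interval)] for i in range(k)]


# Second-stage takes per enumeration area, as described above.
design = {"J1": {"areas": 70, "households_per_area": 32},
          "J2": {"areas": 53, "households_per_area": 20}}
planned = {r: d["areas"] * d["households_per_area"] for r, d in design.items()}
print(planned, sum(planned.values()))  # {'J1': 2240, 'J2': 1060} 3300

# Hypothetical listing of 150 households in one J2 enumeration area.
listing = [f"hh{i}" for i in range(150)]
chosen = systematic_sample(listing, 20, random.Random(42))
```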
Face-to-face [f2f]
The survey questionnaire is the main tool for gathering information. It had to meet the technical requirements of the fieldwork phase as well as those of data processing and analysis. The questionnaire was designed after reviewing other countries' experience with social surveys; it covers, as far as possible, the most important social indicators recommended by the United Nations, while taking into account the specific characteristics of Palestinian society.
Data processing comprised a set of activities and operations performed on the questionnaires to prepare them for analysis:
- Pre-entry editing: all questionnaires were checked against the editing instructions to verify the logic and completeness of the field data, and incomplete questionnaires were returned to the field a second time.
- Data entry: data were entered centrally at the headquarters in Al-Bireh, using a data entry program developed in BLAISE in which the questionnaire was programmed. The program displays an exact copy of the questionnaire on screen; performs all possible checks on the data and on their logical sequence within the questionnaire; keeps data entry and fieldwork errors to a minimum; is user-friendly; and can convert the data to formats usable by statistical analysis systems such as SPSS.
During fieldwork, 3,300 households were visited in Jerusalem Governorate, 2,240 in area J1 and 1,060 in area J2. The final interview results were as follows: 2,485 households in Jerusalem Governorate completed the questionnaire (75.3%), 1,773 in J1 (79.2%) and 712 in J2 (67.2%).
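The reported completion rates follow directly from the visited and completed counts; recomputing them is a quick check that the figures are internally consistent.

```python
visited = {"J1": 2240, "J2": 1060}
completed = {"J1": 1773, "J2": 712}


def rate(done, total):
    """Completion rate in percent, rounded to one decimal."""
    return round(100 * done / total, 1)


for area in visited:
    print(area, rate(completed[area], visited[area]))
print("Jerusalem Governorate",
      rate(sum(completed.values()), sum(visited.values())))
```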
Because the data were collected through a sample survey rather than a complete enumeration (census), they are subject to two main types of error: sampling (statistical) errors and non-sampling (non-statistical) errors. Sampling errors arise from the sample design and are therefore easy to measure; the variance and the design effect were calculated.
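The design effect compares the variance of an estimate under the actual clustered design with the variance it would have under simple random sampling of the same number of individuals. The text does not say which estimator PCBS used; the sketch below is a generic textbook version for a mean, assuming equal cluster sizes and ignoring stratification, weights, and the finite population correction.

```python
from statistics import mean, variance


def design_effect(clusters):
    """Estimated design effect for a mean under equal-size cluster sampling.

    Variance of the mean from between-cluster variation, divided by the
    variance under simple random sampling of the same number of people.
    """
    k = len(clusters)
    values = [y for c in clusters for y in c]
    cluster_means = [mean(c) for c in clusters]
    var_cluster = variance(cluster_means) / k
    var_srs = variance(values) / len(values)
    return var_cluster / var_srs


# Clusters whose members answer alike inflate the variance (deff > 1).
homogeneous = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
# Clusters that each mirror the population reduce it (here deff = 0).
mixed = [[1, 0], [0, 1], [1, 0], [0, 1]]
print(design_effect(homogeneous), design_effect(mixed))
```

Household surveys usually fall in between, with deff above 1 because neighbouring households tend to resemble each other, which is why taking 20 or 32 households per enumeration area costs precision relative to an unclustered sample.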
The non-statistical errors are possible to occur in every stage of project implementation, through data collection, inserting, and mistakes can be summarized by the non-response, and response errors (surveyed), and the mistakes of the interview (the researcher) and data-entry errors. To avoid errors and reduce the impact it has made significant efforts through the training of researchers extensive training, and the presence of a group of experts in the concepts and terminology, medical / health, and training on how to conduct interviews, and the things that must be followed during the interview, and the things that should be avoided.
Data-entry staff were trained on the data-entry program, which was tested beforehand to assess how it performed and to reduce problems. There was constant contact between supervisors and field editors through ongoing visits and periodic meetings. In addition, a set of circulars and reminder instructions was issued to the team, and answers to questions and problems raised by fieldworkers during fieldwork were circulated.
For office work, a team was trained to check the questionnaires and detect field errors, which greatly reduces the error rates arising from fieldwork. To reduce errors when entering questionnaires into the computer, the data-entry program was designed to block inconsistent entries during input. It contains many logical conditions: checks on the permitted answers to each question, checks on the relationships between questions, and other logical tests. This process uncovered most of the errors not detected in earlier phases, and all errors that were discovered were corrected.
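The BLAISE entry program itself is not reproduced in the documentation. As a rough illustration of the two kinds of test it describes (checks on the answer to each question and checks on the relationships between questions), here is a minimal Python sketch; the record fields, the 0-110 age range and the under-12 marital-status rule are hypothetical examples, not the survey's actual edit rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PersonRecord:
    # Hypothetical fields for illustration; not the survey's actual items.
    person_id: int
    age: int
    marital_status: str  # e.g. "never_married", "married"
    years_of_schooling: int


def check_record(rec: PersonRecord) -> list[str]:
    """Return the edit-check failures for one record; an empty list means it passes."""
    errors = []
    # Range check: the answer itself must be plausible.
    if not 0 <= rec.age <= 110:
        errors.append(f"person {rec.person_id}: age {rec.age} outside 0-110")
    # Cross-question checks: answers must be consistent with each other.
    if rec.age < 12 and rec.marital_status != "never_married":
        errors.append(f"person {rec.person_id}: marital status inconsistent with age")
    if rec.years_of_schooling > rec.age:
        errors.append(f"person {rec.person_id}: years of schooling exceed age")
    return errors


print(check_record(PersonRecord(1, 34, "married", 12)))  # []
print(len(check_record(PersonRecord(2, 9, "married", 10))))  # 2
```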
Data were evaluated in the following areas: 1. The definition of household members and how they were recorded. 2. Demographic characteristics related to date of birth. 3. Breakdown by occupation and economic activity.
Assessment methods varied by data topic and included the following: 1. The frequency of missing values and of "other" and "Do not know" answers; inconsistencies between sections, or between the date of birth and other sections; and the internal consistency, logical soundness, and completeness of the data. 2. Comparison of the survey data with the results of related surveys conducted by the Palestinian Central Bureau of Statistics.
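The first assessment method can be sketched in Python: one function tabulates missing, "other" and "Do not know" answers for a question, and another tests whether a reported age agrees with the year of birth. The answer codes, the `survey_year` argument and the one-year tolerance are assumptions for illustration, not PCBS specifications.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical answer codes; the survey's own codes are not given in the text.
OTHER, DONT_KNOW = "other", "dont_know"


def answer_quality(answers: list[str | None]) -> dict[str, float]:
    """Percentage of missing (None), 'other' and 'don't know' answers to one question."""
    counts = Counter("missing" if a is None else a for a in answers)
    n = len(answers)
    return {label: round(100 * counts[label] / n, 1) for label in ("missing", OTHER, DONT_KNOW)}


def age_consistent(reported_age: int, birth_year: int, survey_year: int) -> bool:
    """Reported age should be survey_year - birth_year, or one less if the
    birthday had not yet occurred at the time of the interview."""
    expected = survey_year - birth_year
    return expected - 1 <= reported_age <= expected
```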
The sources of non-sampling errors that emerged during the survey can be summarized as follows: Data could not be collected for some questionnaires because no one was at home, the housing unit did not exist or was uninhabited, or the household was unable or unwilling to provide some of the data. Some households did not take the questionnaire seriously, which affected the quality of the data they provided. Errors arose from the way fieldworkers asked the questions. Respondents answered according to their own understanding of a question. The technical team overseeing the project could not visit all field locations regularly to follow the workflow, meet fieldworkers, and direct them, especially in Area J1. Reaching households was difficult because of the construction of the wall, especially in the Ram area and in the Bir Nabala area, where an entire area was replaced because many households were absent owing to the separation wall. Monitoring and regulating fieldworkers' schedules was also difficult because of the prevailing security conditions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Generation Procedures
The data for this study were generated through a survey experiment designed to assess how participants evaluate stock risk based on nominal price changes. Participants were randomly assigned to one of four questionnaires, each presenting a hypothetical scenario in which a stock's price had declined by 25%. The stocks were priced at $2,000, $200, $20, and $2. This setup allowed the researchers to observe how the nominal price influenced participants' perceptions of risk and their likelihood of selling the stock. The survey asked about participants' risk assessments and their decisions about selling shares after being informed of the price decline. Data were collected with online survey tools, with the aim of reaching a diverse participant pool.
Temporal and Geographical Scope
The survey was conducted in September and December 2022. Participants were drawn primarily from various regions of the USA.
Tabular Data Description
The survey results were compiled into a table with one entry per participant; the total number of entries is 632.
Row and column headers: each row represents one participant's responses, identified by a Participant ID, and the columns contain the variables tested.
Variables reported in the data (Variable: Description):
- Respondent ID: Randomly assigned respondent ID number.
- Gender: Male = 0, Female = 1.
- Region: The respondent's location: Middle Atlantic, West North Central, New England, South Atlantic, Mountain, West South Central, Pacific, East North Central, or East South Central.
- Age: Four age groups numbered 1 to 4: (1) 18-29, (2) 30-44, (3) 45-60, (4) 60 and older.
- Stock Price (Q_2, Q_20, Q_200, Q_2000): Participants were randomly assigned a questionnaire about a stock priced at $2, $20, $200, or $2,000. The variable takes four values: (1) $2; (2) $20; (3) $200; (4) $2,000.
- Income: Reported annual income, grouped into income groups 1 to 8.
- Risk appetite (self-assessment of risk-taking behavior): Participants rated how much of a risk taker they consider themselves to be, on a scale from 1 (risk averse) to 10 (risk lover).
- TraderD: Participants were asked whether they trade stocks or bonds and, if so, how often. Responses are coded 1 to 4: (1) trade regularly (daily or weekly[1]); (2) trade a few times a month; (3) rarely (a few times a year at most); (4) have never traded.
- Follow news: Participants were asked whether and how often they follow stock-related news and stock or bond price changes: (1) never follow news related to stocks or stock indices; (2) occasionally, a few times a year; (3) sometimes, a few times a month; (4) often, a few times a week; (5) always, daily.
- Risk: Participants' rating of how risky the stock is, on a scale from 1 (not risky) to 7 (high risk).
- Percent Sold: Participants were asked how many shares they would sell because of the stock's decline. From the answer, I calculated the percentage of their portfolio that they sold.
- Sold: Constructed from the same question as Percent Sold: (1) for those who sold at least 10% of their portfolio and (0) otherwise. A few respondents answered that they would buy more shares; in those cases, I set their response to zero.
Missing Data
A few respondents skipped questions or gave incomplete answers. Where present, missing data were excluded from the analysis.
Description of Each Data File
The primary data file generated from this study is an Excel file containing all participant responses.
Content: columns for participant demographics, the stock price presented, risk assessments, selling decisions, and the number of shares sold.
Format: Excel.
Size: 64 KB.
[1] In some questionnaires I split this answer into separate "daily" and "weekly" options to allow more variability; however, since few respondents answered "daily", I combined the daily and weekly responses when constructing this variable.
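The author's coding script is not part of the dataset. The sketch below restates the recoding rules from the variable descriptions and footnote [1] in Python; the answer labels, the `shares_held` argument, and the convention of recording a "buy more" answer as a negative number are hypothetical.

```python
# Questionnaire (nominal stock price in dollars) -> Stock Price code.
PRICE_CODE = {2: 1, 20: 2, 200: 3, 2000: 4}

# TraderD codes; "daily" and "weekly" share code 1, as footnote [1] describes.
# The answer keys are hypothetical labels, not the questionnaire's wording.
TRADER_CODE = {"daily": 1, "weekly": 1, "few_times_a_month": 2, "rarely": 3, "never": 4}


def percent_sold(shares_sold: float, shares_held: float) -> float:
    """Share of the holding sold, in percent.

    Assumption: an answer to buy more shares is recorded as a negative number
    and, following the text, treated as zero shares sold.
    """
    if shares_sold < 0:
        return 0.0
    return 100.0 * shares_sold / shares_held


def sold(pct: float) -> int:
    """Sold = 1 if at least 10% of the holding was sold, else 0."""
    return 1 if pct >= 10 else 0


print(percent_sold(25, 100), sold(percent_sold(25, 100)))  # 25.0 1
print(percent_sold(-10, 100), sold(percent_sold(-10, 100)))  # 0.0 0
```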
The purpose of the HIES is to obtain information on the income, consumption patterns, incidence of poverty, and saving propensities of different groups of people in the Federated States of Micronesia (FSM). This information will be used to guide policy makers in framing socio-economic development policies and in initiating financial measures to improve the economic conditions of the people. The 2005 FSM HIES asked about the income of all persons aged 15 years and over. Income refers to that received during calendar year 2004 and includes both cash and in-kind income. The survey has five primary objectives, namely to:
1) Rebase the FSM Consumer Price Index (CPI); 2) Provide data on the distribution of income and expenditures throughout the FSM; 3) Provide data for national accounts, particularly regarding income from home production activities and the consumption of goods and services derived from home production activities; 4) Provide nutritional information and food consumption patterns for FSM families; and 5) Provide data for a hardship study.
Entire Country
Four states of the FSM: Yap, Chuuk, Pohnpei, and Kosrae
The survey universe covered all persons living in their place of usual residence at the time of the survey. Income data were collected from persons aged 15 years and over, while expenditure data were collected for all household members at the household level. Persons living in institutions, such as school dormitories, hospital wards, hostels, and prisons, as well as those whose usual residence was elsewhere, were excluded from the survey.
Sample survey data [ssd]
The 2005 FSM Household Income and Expenditure Survey (HIES) used a sampling frame based on updated Enumeration District (ED) information and household listings from the 2000 FSM Census. Based on this sampling frame, the four states of the FSM were classified as the survey domains. Each state was further divided into 3 strata, except Kosrae, which was not divided because it has no outer islands and has relatively good access to goods and services; the entire island was therefore classified as stratum 1. The strata were defined as follows:
1) State center and immediate surrounding areas:
- High 'living standard' and has immediate access to goods and services.
2) Areas surrounding state center (rest of main island):
- Medium 'living standard' and sometimes limited access to goods and services
3) Outer islands:
- Low 'living standard' and rare access to goods and services.
The HIES used a stratified two-stage sample design, with the sample selected independently within each stratum. First, enumeration districts (EDs) were drawn from each stratum using Probability Proportional to Size (PPS) sampling, so the larger an ED, the higher its probability of selection. In total, 69 of the 373 EDs nationwide were selected for the survey. Generally, one enumerator was assigned to each ED. Second, 20 households were systematically selected, with a random start, from an updated household listing for each selected ED, giving a total sample of 1,380 households (69 × 20), or roughly 8.4 percent of all households in the nation. Although this offered a fairly good representation of households nationwide, the final sample was 180 households smaller than the 1,560 households (about 9.5 percent of all households) initially selected for the survey.
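The two-stage selection described above can be sketched in Python. The text says only that EDs were drawn with PPS; systematic PPS over cumulated ED sizes is assumed here as one common way to do that, and the base weight is the usual inverse of the product of the two stage probabilities, not necessarily the weighting the survey applied. Function names and inputs are hypothetical.

```python
from __future__ import annotations

import random


def pps_systematic(sizes: list[int], n: int, rng: random.Random) -> list[int]:
    """Draw n unit indices with probability proportional to size (systematic PPS).

    A unit larger than the sampling interval can be hit more than once.
    """
    interval = sum(sizes) / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cumulative, i = [], 0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # Every selection point below the running total falls in this unit.
        while i < n and points[i] < cumulative:
            selected.append(idx)
            i += 1
    return selected


def systematic_sample(n_listed: int, m: int, rng: random.Random) -> list[int]:
    """Systematically select m of n_listed households (0-based) with a random start."""
    if m > n_listed:
        raise ValueError("cannot select more households than are listed")
    interval = n_listed / m
    start = rng.random() * interval
    return [int(start + k * interval) for k in range(m)]


def base_weight(stratum_size: int, eds_in_stratum: int, ed_size: int,
                listed_households: int, m: int = 20) -> float:
    """Inverse of a household's two-stage selection probability."""
    p_ed = eds_in_stratum * ed_size / stratum_size  # stage 1: PPS
    p_household = m / listed_households              # stage 2: systematic
    return 1.0 / (p_ed * p_household)
```

For example, `systematic_sample(110, 20, random.Random(2005))` returns 20 distinct 0-based positions in a listing of 110 households. When each selected ED's listing matches its frame size, the base weight reduces to the stratum size divided by (EDs selected × 20), so the design is self-weighting within a stratum.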
Detailed information on the changes made to the sample size can be found in the next section under "Deviations from Sample Design."
The original plan to sample 1,560 households, or about 9.5 percent of all households in the nation, was eventually reduced to 1,380 households, or about 8.4 percent of all households. The reduction was due to fuel shortages for transportation and uncertain field trip schedules to some of the selected outer islands. Dropping some of these islands from the sample was not expected to significantly affect the accuracy of the survey results, because weighting was done independently within each stratum, where islands were considered sufficiently homogeneous.
Other [oth]
Questionnaires and forms used for the 2005 FSM HIES consisted of 1) the HIES Questionnaire and 2) the Weekly Diaries. The HIES Questionnaire was provided to enumerators and was to be completed during the first visit to the household. Its main objective was to collect housing information, basic demographic information about household members, and general household expenditures over the previous year. The weekly diaries, on the other hand, were an attempt to record household expenditure on a daily basis over a two-week period. Both the HIES questionnaire and the weekly diary were developed and modeled after similar forms from the 1998 FSM HIES and the 2004 Palau HIES. Dr. Micheal Levin of the US Census Bureau, International Program Center (IPC), Ms. Brihmer Johnson of the FSM Division of Statistics, and Mr. Glenn McKinlay, statistics advisor to the FSM Division of Statistics, provided crucial input to the overall design of these forms. All questionnaires and diaries used during the HIES were printed in English, so it was extremely important that field interviewers understood the instructions and questions they contained. The questionnaire was tested by FSM Division of Statistics staff, who conducted "real" interviews with households in their neighborhoods and also had their own households interviewed by other office staff. Specific sections of the HIES questionnaire and the weekly diaries are outlined below:
I. HIES Questionnaire
1) General Household Characteristics 2) Individual Person Characteristics 3) General Expenditure Listings - 12 Months Recall Period
II. 2 Week Daily Diaries
1) Daily Expenditure Diary - Day1 (Mon) thru Day7 (Sun) 2) Home Produced Items 3) Gifts Given Away 4) Gifts Received 5) Unusual Expenses for Special Events
Data editing of the 2005 FSM HIES data took place at several points during the data processing phase of the project, and afterwards before the final report was compiled. After a two-week office review and call-backs immediately following the enumeration phase, the initial phase of data editing began on July 18, 2005, when the data processing phase of the survey commenced. Training for editing and coding took place the same day, along with the signing of contracts for 10 office clerks recruited to carry out this phase of the survey. Under their contracts, these clerks were also hired to key in the data later. One of their primary responsibilities was to match the geographic IDs of questionnaires with the corresponding diaries and to ensure that entries were consistent and valid. No computer consistency edit checks were run against the data during the keying/verification process, since the programs for these processes were not available at the time. All data quality checks and edits were done at the US Census Bureau. Further edits were applied to the data during data analysis and report writing.
Five types of checks were performed: structural checks, verification checks, consistency checks, macro-editing checks, and a data quality assessment. Edit lists were also produced for the health module and the income and expenditure questionnaire, which had to be checked against the questionnaires. On the edit list, errors were corrected by crossing out incorrect or missing values and entering the correct values in red. Amounts that were also missing on the questionnaire had to be estimated from questionnaires in the same Enumeration District (ED) batch. For the diaries, the batch files were concatenated for each state and exported to tab-delimited files. These files were imported into Excel, and the unit price of each item was calculated from quantities and weights where possible. The records for each item were then filtered and checked for outlying unit prices (both large and small values) and for missing values. Missing amounts were imputed using the average prices of the same item within the same ED.
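The diary checks were done in Excel. The same logic in Python, as a minimal sketch: compute unit prices, flag records whose unit price is far from the item's median, and fill missing amounts from the average unit price of the same item in the same ED. The record layout, the 5× median outlier threshold, and rounding to two decimals are assumptions; the text does not state the threshold used.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean, median


def _unit_price(rec: dict) -> float | None:
    """Amount divided by quantity, or None when either is missing or zero."""
    if rec["amount"] is None or not rec["quantity"]:
        return None
    return rec["amount"] / rec["quantity"]


def flag_price_outliers(records: list[dict], factor: float = 5.0) -> list[dict]:
    """Records whose unit price is more than `factor` times above or below the item's median."""
    by_item = defaultdict(list)
    for rec in records:
        price = _unit_price(rec)
        if price is not None:
            by_item[rec["item"]].append(price)
    flagged = []
    for rec in records:
        price = _unit_price(rec)
        if price is None:
            continue
        med = median(by_item[rec["item"]])
        if price > factor * med or price < med / factor:
            flagged.append(rec)
    return flagged


def impute_missing_amounts(records: list[dict]) -> None:
    """Fill missing amounts in place: quantity x mean unit price of the same item in the same ED."""
    prices = defaultdict(list)
    for rec in records:
        price = _unit_price(rec)
        if price is not None:
            prices[(rec["ed"], rec["item"])].append(price)
    for rec in records:
        key = (rec["ed"], rec["item"])
        if rec["amount"] is None and rec["quantity"] and prices.get(key):
            rec["amount"] = round(rec["quantity"] * mean(prices[key]), 2)
```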
The office operations manual used for editing and coding the questionnaires and diaries is provided under "Technical Documents/Data Processing Documents/Office Editing & Coding."
Original sample size: 1,560 households; original sampling fraction: 9.5%
Final sample size: 1,380 households; final sampling fraction: 8.4%
The response rate for the final sample of 1,380 households is 100 percent. Most households originally selected for the survey responded. Households that had moved to unselected areas or elsewhere, and those that refused to respond, were replaced with nearby households willing to participate in the survey.
No sampling error analysis was carried out for the survey.
The questionnaire design of the 2005 HIES differs from that of the 1998 HIES, which limits comparisons between the two surveys. However, comparisons were made where the data permit.
This dataset originates from a series of experimental studies titled "Tough on People, Tolerant to AI? Differential Effects of Human vs. AI Unfairness on Trust". The project investigates how individuals respond to unfair behavior (distributive, procedural, and interactional unfairness) enacted by artificial intelligence versus human agents, and how such behavior affects cognitive and affective trust.
1 Experiment 1a: The Impact of AI vs. Human Distributive Unfairness on Trust
Overview: This dataset comes from an experimental study examining how individuals' cognitive and affective trust responds when distributive unfairness is enacted by either an artificial intelligence (AI) agent or a human decision-maker. Experiment 1a focuses specifically on the main effect of the "type of decision-maker" on trust.
Data Generation and Processing: The data were collected through Credamo, an online survey platform. Initially, 98 responses were gathered from students at a university in China, and additional student participants were recruited via Credamo to supplement the sample. Attention check items were embedded in the questionnaire, and participants who failed them were excluded automatically in real time. Data collection continued until 202 valid responses were obtained. SPSS software was used for data cleaning and analysis.
Data Structure and Format: The data file is named "Experiment1a.sav" and is in SPSS format. It contains 28 columns and 202 rows, where each row corresponds to one participant. The columns represent the measured variables: grouping and randomization variables, one manipulation check item, four items measuring distributive fairness perception, six items on cognitive trust, five items on affective trust, three items for honesty checks, and four demographic variables (gender, age, education, and grade level).
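Each file pairs the item-level responses with per-participant scale means in its final columns. A minimal Python sketch of how such scores and the attention-check screen could be reproduced from exported rows, assuming each scale score is the unweighted mean of its items and each attention check has one instructed answer; the column names (DF1, CT1, AT1, AC1, ...) are hypothetical, since the actual abbreviations are not listed here.

```python
from statistics import mean

# Hypothetical column names; the files' actual English abbreviations are not listed here.
SCALES = {
    "DF_mean": ["DF1", "DF2", "DF3", "DF4"],      # distributive fairness, 4 items
    "CT_mean": [f"CT{i}" for i in range(1, 7)],  # cognitive trust, 6 items
    "AT_mean": [f"AT{i}" for i in range(1, 6)],  # affective trust, 5 items
}


def passes_attention_checks(row: dict, expected: dict) -> bool:
    """True if every attention-check column holds its instructed answer."""
    return all(row[col] == answer for col, answer in expected.items())


def with_scale_means(row: dict) -> dict:
    """Copy of the row with one unweighted item mean per scale appended."""
    out = dict(row)
    for name, items in SCALES.items():
        out[name] = mean(row[col] for col in items)
    return out
```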
The final three columns contain computed means for distributive fairness, cognitive trust, and affective trust.
Additional Information: No missing data are present. All variable names are labeled in English abbreviations to facilitate further analysis. The dataset can be directly opened in SPSS or exported to other formats.
2 Experiment 1b: The Mediating Role of Perceived Ability and Benevolence (Distributive Unfairness)
Overview: This dataset originates from an experimental study designed to replicate the findings of Experiment 1a and further examine the potential mediating role of perceived ability and perceived benevolence.
Data Generation and Processing: Participants were recruited via the Credamo online platform. Attention check items were embedded in the survey to ensure data quality. Data were collected using a rolling recruitment method, with invalid responses removed in real time. A total of 228 valid responses were obtained.
Data Structure and Format: The dataset is stored in a file named Experiment1b.sav in SPSS format and can be directly opened in SPSS software. It consists of 228 rows and 40 columns. Each row represents one participant's data record, and each column corresponds to a different measured variable. Specifically, the dataset includes: random assignment and grouping variables; one manipulation check item; four items measuring perceived distributive fairness; six items on perceived ability; five items on perceived benevolence; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The last five columns contain the computed mean scores for perceived distributive fairness, ability, benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis.
The file can be analyzed directly in SPSS or exported to other formats as needed.
3 Experiment 2a: Differential Effects of AI vs. Human Procedural Unfairness on Trust
Overview: This dataset originates from an experimental study aimed at examining whether individuals respond differently in terms of cognitive and affective trust when procedural unfairness is enacted by artificial intelligence versus human decision-makers. Experiment 2a focuses on the main effect of the decision agent on trust outcomes.
Data Generation and Processing: Participants were recruited via the Credamo online survey platform from two universities located in different regions of China. A total of 227 responses were collected. After excluding those who failed the attention check items, 204 valid responses were retained for analysis. Data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2a.sav in SPSS format and can be directly opened in SPSS software. It contains 204 rows and 30 columns. Each row represents one participant's response record, while each column corresponds to a specific variable. Variables include: random assignment and grouping; one manipulation check item; seven items measuring perceived procedural fairness; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for procedural fairness, cognitive trust, and affective trust.
Additional Notes: The dataset contains no missing values. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis.
The file can be directly analyzed in SPSS or exported to other formats as needed.
4 Experiment 2b: Mediating Role of Perceived Ability and Benevolence (Procedural Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 2a and to further examine the potential mediating roles of perceived ability and perceived benevolence in shaping trust responses under procedural unfairness.
Data Generation and Processing: Participants were working adults recruited through the Credamo online platform. A rolling data collection strategy was used, where responses failing attention checks were excluded in real time. The final dataset includes 235 valid responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2b.sav, which is in SPSS format and can be directly opened using SPSS software. It contains 235 rows and 43 columns. Each row corresponds to a single participant, and each column represents a specific measured variable. These include: random assignment and group labels; one manipulation check item; seven items measuring procedural fairness; six items for perceived ability; five items for perceived benevolence; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final five columns contain the computed average scores for procedural fairness, perceived ability, perceived benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to support future reuse and secondary analysis. The dataset can be directly analyzed in SPSS and easily converted into other formats if needed.
5 Experiment 3a: Effects of AI vs. Human Interactional Unfairness on Trust
Overview: This dataset comes from an experimental study that investigates how interactional unfairness, when enacted by either artificial intelligence or human decision-makers, influences individuals' cognitive and affective trust. Experiment 3a focuses on the main effect of the "decision-maker type" under interactional unfairness conditions.
Data Generation and Processing: Participants were college students recruited from two universities in different regions of China through the Credamo survey platform. After excluding responses that failed attention checks, a total of 203 valid cases were retained from an initial pool of 223 responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3a.sav, in SPSS format and compatible with SPSS software. It contains 203 rows and 27 columns. Each row represents a single participant, while each column corresponds to a specific measured variable. These include: random assignment and condition labels; one manipulation check item; four items measuring interactional fairness perception; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final three columns contain computed average scores for interactional fairness, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variable names are provided using standardized English abbreviations to facilitate secondary analysis.
The data can be directly analyzed using SPSS and exported to other formats as needed.
6 Experiment 3b: The Mediating Role of Perceived Ability and Benevolence (Interactional Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 3a and further examine the potential mediating roles of perceived ability and perceived benevolence under conditions of interactional unfairness.
Data Generation and Processing: Participants were working adults recruited via the Credamo platform. Attention check questions were embedded in the survey, and responses that failed these checks were excluded in real time. Data collection proceeded in a rolling manner until a total of 227 valid responses were obtained. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3b.sav, in SPSS format and compatible with SPSS software. It includes 227 rows and
SUMMARY
To be viewed in combination with the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
This dataset shows where there was no data* relating to one or more of the following factors:
- Obesity/inactivity-related illnesses (recorded at the GP practice catchment area level*)
- Adult obesity (recorded at the GP practice catchment area level*)
- Inactivity in children (recorded at the district level)
- Excess weight in children (recorded at the Middle Layer Super Output Area level)
* GP catchments are not mutually exclusive: they overlap, with some geographic areas being covered by 30+ practices.
GP data for the financial year 1st April 2018 – 31st March 2019 were used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID-19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7,670) that did not submit data in 2018/19, data from 2019/20 were used instead.
This dataset identifies areas where data from 2019/20 were used, where one or more GPs did not submit data in either year (possibly because some rural areas are not officially covered by any GP practice), or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics lying outside the mean ± 1 standard deviation). Such discrepancies suggest erroneous data in one of those years (it was not feasible for this study to investigate this further), so data for these areas should be interpreted with caution.
Results of the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ analysis in these areas should be interpreted with caution, particularly if the levels of obesity, inactivity and associated illnesses appear to be significantly lower than in their immediate surrounding areas.
Very small areas with ‘missing’ data were deleted where it was judged that the missing data would not have affected the overall analysis (i.e. where GP data were missing for very small countryside areas where no people live).
See also Health and wellbeing statistics (GP-level, England): Missing data and potential outliers data.
DATA SOURCES
This dataset was produced using:
- Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- National Child Measurement Programme: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- Active Lives Survey 2019: Sport and Physical Activity Levels amongst children and young people in school years 1-11 (aged 5-16). © Sport England 2020.
- Active Lives Survey 2019: Sport and Physical Activity Levels amongst adults aged 16+. © Sport England 2020.
- GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data were cleaned by Ribble Rivers Trust before use.
- Administrative boundaries: Boundary-Line™: Contains Ordnance Survey data © Crown copyright and database right 2021. Contains public sector information licensed under the Open Government Licence v3.0.
- MSOA boundaries: © Office for National Statistics licensed under the Open Government Licence v3.0. Contains OS data © Crown copyright and database right 2021.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital; © Sport England 2020; © Office for National Statistics licensed under the Open Government Licence v3.0. Contains Ordnance Survey data © Crown copyright and database right 2021. Contains public sector information licensed under the Open Government Licence v3.0.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
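The year-selection and discrepancy rules in the summary can be sketched in Python. This assumes one statistic per GP, takes the discrepancy as the 2019/20 value minus the 2018/19 value, and uses the sample standard deviation; the summary does not specify these details, and the function names are hypothetical.

```python
from statistics import mean, stdev


def choose_value(gp: str, v1819: dict, v1920: dict):
    """Prefer 2018/19 data; fall back to 2019/20 for GPs that did not submit in 2018/19."""
    if gp in v1819:
        return v1819[gp], "2018/19"
    if gp in v1920:
        return v1920[gp], "2019/20"
    return None, None  # no submission in either year


def flag_discrepancies(v1819: dict, v1920: dict) -> set:
    """GPs whose 2018/19 -> 2019/20 change lies outside mean +/- 1 SD of all changes."""
    common = sorted(v1819.keys() & v1920.keys())
    diffs = {gp: v1920[gp] - v1819[gp] for gp in common}
    values = list(diffs.values())
    mu, sd = mean(values), stdev(values)  # sample SD (assumption)
    return {gp for gp, d in diffs.items() if abs(d - mu) > sd}
```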
This file contains all of the cases and variables that are in the original 2014 General Social Survey, but is prepared for easier use in the classroom. Changes have been made in two areas. First, to avoid confusion when constructing tables or interpreting basic analysis, all missing data codes have been set to system missing. Second, many of the continuous variables have been categorized into fewer categories, and added as additional variables to the file.
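A minimal Python sketch of the two classroom-file changes: recoding missing-data codes to system missing (`None` here) and collapsing a continuous variable into fewer categories. The codes 98 and 99 and the age groups are hypothetical; each GSS variable documents its own missing-value codes, and the file's actual categories are not listed here.

```python
from __future__ import annotations

# Hypothetical codes and cut points for illustration only.
MISSING_CODES = {98, 99}
AGE_GROUPS = [(18, 29, "18-29"), (30, 44, "30-44"), (45, 64, "45-64"), (65, 120, "65+")]


def to_system_missing(value: int | None) -> int | None:
    """Replace a missing-data code with None (the analogue of SPSS system-missing)."""
    return None if value in MISSING_CODES else value


def categorize(value: int | None, groups=AGE_GROUPS) -> str | None:
    """Collapse a continuous value into a labelled category; None stays None."""
    if value is None:
        return None
    for low, high, label in groups:
        if low <= value <= high:
            return label
    return None
```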
The General Social Surveys (GSS) have been conducted by the National Opinion Research Center (NORC) annually since 1972, except for the years 1979, 1981, and 1992 (a supplement was added in 1992), and biennially beginning in 1994. The GSS are designed to be part of a program of social indicator research, replicating questionnaire items and wording in order to facilitate time-trend studies. This data file has all cases and variables asked on the 2014 GSS. There are a total of 3,842 cases in the data set but their initial sampling years vary because the GSS now contains panel cases. Sampling years can be identified with the variable SAMPTYPE.
To download syntax files for the GSS that reproduce well-known religious group recodes, including RELTRAD, please visit ARDA's Syntax Repository (/research/syntax-repository-list).
Millennium Challenge Corporation hired Mathematica Policy Research to conduct an independent evaluation of the BRIGHT II program. The three main research questions of interest are:
• What was the impact of the program on school enrollment, attendance, and retention?
• What was the impact of the program on test scores?
• Are the impacts different for girls than for boys?
Mathematica will compare data collected from the 132 communities served by BRIGHT II (the "treatment group") with that collected from the 161 communities that applied but were not selected for the program (the "comparison group"). Using a statistical technique called regression discontinuity, Mathematica will compare the outcomes of the treatment villages just above the cutoff point to the outcomes of the comparison villages just below the cutoff point. If the intervention had an impact, we will observe a "jump" in outcomes at the point of discontinuity.
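The regression-discontinuity comparison can be illustrated with a sharp RD estimate: fit a separate straight line to the outcomes on each side of the cutoff, within a bandwidth, and take the difference between the two fitted values at the cutoff. The sketch below uses synthetic scores and outcomes with a built-in jump of 0.1; the cutoff, bandwidth and data are hypothetical, and the evaluation's actual specification (bandwidth choice, covariates, weighting) is not described here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Sharp regression discontinuity: local linear fits on each side of the cutoff.

    Scores are centred on the cutoff, so each fitted intercept is the predicted
    outcome at the cutoff; their difference is the estimated jump.
    """
    below = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below


# Synthetic example: eligibility scores 0-99, cutoff 50, treatment adds 0.1.
scores = list(range(100))
outcomes = [0.3 + 0.002 * s + (0.1 if s >= 50 else 0.0) for s in scores]
jump = rd_estimate(scores, outcomes, cutoff=50, bandwidth=20)
print(round(jump, 3))  # 0.1
```

With real data the outcomes are noisy, so the estimate depends on the bandwidth; it is usual to check how sensitive the jump is to that choice.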
Mathematica will perform additional analyses to estimate the overall merit of the BRIGHT investment. By conducting a cost-benefit analysis and a cost-effectiveness analysis and calculating the economic rate of return, Mathematica will be able to answer questions related to the sustainability of the program and to compare the program with interventions and social investments in other sectors. The household survey is designed to capture household-level data rather than community-level data; however, questions have been included to measure head-of-household expectations of educational attainment. These questions ask the head of household what grade level he hopes each child will attain and what grade level he thinks the child will realistically be capable of achieving.
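In the usual cost-benefit sense, the economic rate of return is the discount rate at which the present value of a program's net benefits equals zero. A minimal sketch, assuming the net-benefit stream changes sign once so that bisection finds a unique root; the cash-flow figures are hypothetical and are not BRIGHT II estimates.

```python
def npv(rate, flows):
    """Net present value of yearly net benefits, with flows[0] in year 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))


def economic_rate_of_return(flows, lo=-0.99, hi=10.0, tol=1e-10):
    """Find the discount rate where NPV = 0 by bisection on [lo, hi]."""
    f_lo, f_hi = npv(lo, flows), npv(hi, flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, flows)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2


# Hypothetical stream: cost of 100 up front, net benefits of 30 a year for 5 years.
flows = [-100, 30, 30, 30, 30, 30]
err = economic_rate_of_return(flows)
print(err)
```

A program is usually judged worthwhile when this rate exceeds the discount rate chosen for the analysis.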
132 rural villages throughout the 10 provinces of Burkina Faso in which girls' enrollment rates were lowest
Households
Households, students, and educators in the 287 villages surveyed
Sample survey data [ssd]
The BRIGHT II program was implemented in the same 132 villages that received the BRIGHT I interventions. These 132 villages were originally selected using a scoring process, with eligibility scores based on the villages’ potential to improve girls’ educational outcomes. A total of 293 villages applied to receive a BRIGHT school; the Burkina Faso Ministry of Basic Education (MEBA) selected the 132 villages with scores that were above a certain cutoff point. Whenever possible, the survey will be conducted with the same children in the same households and schools surveyed during the BRIGHT I evaluation. By visiting the same households and schools, the evaluator will be able to better assess the longer-term impacts of the BRIGHT project.
Mathematica has developed two surveys, a household survey and a school survey, to collect relevant data from villages in both the treatment and comparison groups. The household survey was administered to a new cross-section of households compared to the BRIGHT I evaluation. Data will be collected on the attendance and educational attainment of school-age children in the household, attitudes towards girls' education, and parental assessment of the extent to which the complementary interventions influenced school enrollment decisions. It will also assess the performance of all household children on basic tests of French and math. The school survey, to be administered to all local schools in the 293 villages, gathers data on school characteristics, personnel, and physical structure, and collects enrollment and attendance records. Data will be gathered by a local data collection firm selected by MCA-Burkina Faso, with Mathematica providing technical assistance and oversight.
Following data collection, Mathematica will work with BERD to ensure that the data are correctly entered and are complete and clean. This will include a review of all frequencies for out-of-range responses, missing data, or other problems, as well as a comparison between the data and paper copies for a random selection of variables.
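The cleaning steps described here (frequency review, out-of-range and missing-value checks, and spot checks against paper questionnaires) can be sketched as follows. The variable names, valid ranges and records are hypothetical; the real ranges would come from the BRIGHT II questionnaires.

```python
import random
from collections import Counter

# Hypothetical valid ranges; real ones would come from the questionnaires.
VALID_RANGES = {"age": (5, 20), "grade": (1, 6)}


def frequencies(records, var):
    """Frequency table for one variable (None counts as missing)."""
    return Counter(rec.get(var) for rec in records)


def flag_problems(records, valid_ranges=VALID_RANGES):
    """List (record index, variable, problem) for missing or out-of-range values."""
    problems = []
    for i, rec in enumerate(records):
        for var, (low, high) in valid_ranges.items():
            value = rec.get(var)
            if value is None:
                problems.append((i, var, "missing"))
            elif not low <= value <= high:
                problems.append((i, var, "out of range"))
    return problems


def pick_for_paper_check(variables, k, seed=0):
    """Randomly choose k variables to compare against the paper questionnaires."""
    return random.Random(seed).sample(sorted(variables), k)


records = [{"age": 9, "grade": 3}, {"age": 42, "grade": 2}, {"age": 11}]
print(frequencies(records, "grade"))
print(flag_problems(records))  # [(1, 'age', 'out of range'), (2, 'grade', 'missing')]
print(pick_for_paper_check(VALID_RANGES, k=1))
```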
The survey is designed to collect, analyze, and disseminate demographic and health data on the Palestinian population living in the Palestinian Territory, with a focus on demography, fertility, infertility, family planning, unmet needs, and maternal and child health, as well as youth and the elderly. The 2010 survey includes new sections and elements, such as basic health and socio-economic information on different groups within the population: ever-married women under 55 years, children under five years, child labor (ages 5-14), child discipline (ages 2-14), education (ages 5-24), youth aged 15-29 years, and elderly people aged 60 and over.
The data are representative at the region level (West Bank, Gaza Strip), by locality type (urban, rural, camp), and at the governorate level.
Household, individual
The survey covered all Palestinian households usually resident in the Palestinian Territory.
Sample survey data [ssd]
Target Population
The target population of the survey consists of the following groups:
1- All Palestinian households normally residing in the Palestinian Territory.
2- Females aged 15-54 years.
3- Elderly people aged 60 or over.
4- Children aged 0-14 years, divided into the following categories: 0-5 years, 2-14 years, and 5-14 years, with parts of the questionnaire customized for each group.
5- Youth aged 15-29 years, divided into the following categories: 15-24 years and 25-29 years, with parts of the questionnaire customized for youth.
Sampling Frame
We relied on the sampling frames established by PCBS, which consist essentially of a list of enumeration areas. (An enumeration area is a geographical area containing a number of buildings, with about 120 housing units on average.)
The total frame consists of the following two parts:
1- West Bank and Gaza Sampling Frame: containing enumeration areas drawn up in 2007. In the West Bank: each enumeration area consists of a list of households with identification data to ascertain the address of individual households. In Gaza: each enumeration area contains a list of housing units with addresses to ascertain the address of individual households, plus identification data of the housing units.
2- Jerusalem Sampling Frame (inside checkpoints): contains only enumeration areas, delineated geographically, with the total number of households in each area. However, there is no detailed address information within enumeration areas: the size of each enumeration area is known, but individual addresses cannot be identified.
Design Strata
In the survey, two variables were chosen to divide the population into strata, based on the homogeneity of parts of the population. Previous studies have shown that Palestinian households may be divided as follows:
1- Governorates: there are 16 governorates in the Palestinian Territory, 11 in the West Bank and 5 in the Gaza Strip.
2- Locality types: there are three types: urban, rural, and refugee camps.
All the available frames contain the strata variables.
Sample Size
We use the following formula to estimate the sample size:
$$n = \frac{4 \, r \, (1 - r) \, f \, (1.15)}{(0.07\, r)^2 \, p \, n_h}$$
Where:
- n: the sample size required for the main indicator or main estimate
- 4: a factor to achieve a 95 percent level of confidence
- r: the predicted or anticipated prevalence (coverage rate) of the indicator being estimated
- 1.15: the factor needed to raise the sample size by 15 percent to allow for non-response
- f: the design effect
- 0.07r: the margin of error to be tolerated at the 95 percent level of confidence, defined as 7 percent of r (7 percent is the relative sampling error of r)
- p: the proportion of the total population on which the indicator r is based
- nh: the average household size
To estimate the sample size of the survey, we rely on the percentage of children under 5 years who suffer from stunting, which we treat as the main indicator for the survey (r); it equals 10.2% (from MICS3 data, 2006). From the 2007 census data, the percentage of children aged 0-4 years (p) is 14.1%. The resulting sample size is 15,355.
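The sample-size formula can be evaluated directly. In this sketch r = 0.102 and p = 0.141 are the values quoted above; the design effect f and the average household size nh are not stated in this section, so they are left as parameters, and the example sets both to 1 purely to show the scale of the remaining terms.

```python
def sample_size(r, p, f, nh, z_factor=4.0, nonresponse=1.15, rel_margin=0.07):
    """n = z_factor * r * (1 - r) * f * nonresponse / ((rel_margin * r)**2 * p * nh)."""
    numerator = z_factor * r * (1 - r) * f * nonresponse
    denominator = (rel_margin * r) ** 2 * p * nh
    return numerator / denominator


# r and p from the text; f and nh set to 1 only as placeholders.
base = sample_size(r=0.102, p=0.141, f=1.0, nh=1.0)
print(round(base))  # 58616
```

Because n scales as f / nh, the final figure is `base * f / nh` for whatever design effect and household size were actually used.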
Sample Design and Type
After determining the sample size (15,456 households), we selected a probability sample, a multi-stage stratified cluster sample, as follows:
1- First stage: selecting a sample of clusters (enumeration areas) using the PPS (probability proportional to size) without-replacement method, to obtain 644 enumeration areas from the full enumeration-area frame.
2- Second stage: selecting 24 households from each enumeration area selected in the first stage, using systematic sampling. In each household reached, we enumerate all targeted individuals in the following groups: women aged 15-54 years, elderly people aged 60 and over, and children aged 0-5 years.
3- Third stage: selecting one child aged 2-14 years for part of the questionnaire and one young person aged 15-29 years to answer the youth attachment of the questionnaire. The Kish table is used to select one person at random.
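The three stages can be sketched with Python's standard `random` module. Assumptions: systematic PPS is used here as one common without-replacement PPS method (the text does not say which PPS procedure PCBS applied), the enumeration-area sizes are hypothetical, and a uniform random draw stands in for the Kish table. At full scale, 644 enumeration areas × 24 households gives 15,456 households.

```python
import random


def pps_systematic(sizes, n, rng):
    """Systematic PPS: pick n unit indices with probability proportional to size.

    Selection is without replacement as long as no unit is larger than sum(sizes) / n.
    """
    step = sum(sizes) / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    chosen, cumulative, i = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        while i < n and points[i] < cumulative:
            chosen.append(idx)
            i += 1
    return chosen


def systematic_sample(units, m, rng):
    """Pick m units at a fixed interval from a random start."""
    step = len(units) / m
    start = rng.random() * step
    return [units[int(start + k * step)] for k in range(m)]


def pick_one(eligible, rng):
    """Stand-in for the Kish table: choose one eligible person at random."""
    return rng.choice(eligible) if eligible else None


rng = random.Random(2010)
ea_sizes = [120, 80, 150, 100, 130, 90, 110, 140]  # hypothetical housing-unit counts
selected_eas = pps_systematic(ea_sizes, n=3, rng=rng)
households = {ea: systematic_sample(list(range(ea_sizes[ea])), 24, rng) for ea in selected_eas}
youth = pick_one(["person A", "person B", "person C"], rng)
print(selected_eas, [len(h) for h in households.values()], youth)
```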
Face-to-face [f2f]
The design of the survey complied with the standard specifications of health surveys previously implemented by PCBS. In addition, the survey included indicators of MICS4 to meet the needs of all partners.
1. Main questionnaire with the following parts:
· Household questionnaire: Covers demographic and educational characteristics, chronic disease, smoking, discipline of children (2-14 years), child labor (5-14 years), education of children (5-24 years) and housing characteristics.
· Health of women (15-54 years) regardless of marital status, awareness about AIDS, anemia in women (15-49 years).
· Ever-married women (15-54 years): Covers general characteristics of eligible women, reproduction, child mortality, maternal care, reproductive morbidity, family planning, and attitudes towards reproduction.
· Children under age of 5: Covers children's health, vaccination against childhood diseases, early childhood development, chronic disease, and anemia.
2. Attached questionnaires
· Youth questionnaire (15-29 years): Covers general characteristics, awareness and perception of family planning, health status, awareness about sexually transmitted diseases and reproduction.
· Elderly questionnaire (60 years and over): Covers general characteristics, social relations, activities, time-use, health status, and use of mass media.
Data editing took place at a number of stages through the processing including:
The survey sample consists of about 15,355 households, of which 13,629 completed the interview: 8,740 in the West Bank and 4,889 in the Gaza Strip. Weights were adjusted to account for non-response. The response rate reached 90.5% in the West Bank, 94.8% in the Gaza Strip, and 92.0% in the Palestinian Territory as a whole.
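A common way to adjust weights for non-response is to divide each responding household's design weight by the response rate of its adjustment cell. The sketch below assumes the cells are the two regions and uses the response rates quoted above; the design weights and household counts are hypothetical, and the text does not say which adjustment cells PCBS actually used.

```python
# Response rates quoted in the text; using regions as adjustment cells is an assumption.
RESPONSE_RATES = {"West Bank": 0.905, "Gaza Strip": 0.948}


def adjust_for_nonresponse(design_weight, region, response_rates=RESPONSE_RATES):
    """Inflate a responding household's weight by the inverse of its cell's response rate."""
    return design_weight / response_rates[region]


# Hypothetical cell: 200 sampled West Bank households, weight 50 each, 181 respond (90.5%).
adjusted = [adjust_for_nonresponse(50.0, "West Bank") for _ in range(181)]
print(round(sum(adjusted), 6))  # 10000.0, the same total as 200 * 50 before non-response
```

The adjusted weights of the respondents add back up to the weighted total of the full sample in that cell, which is the purpose of the adjustment.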
Detailed information on sampling error is available in the Survey Report.
Different methods were applied to assess the survey data, including:
1. Checking occurrences of missing values and of answers such as "other" and "do not know".
2. Examining inconsistencies between the various sections of the questionnaire, including within-record and cross-record consistency.
3. Comparing the data with the previous surveys (2000 and 2006), which showed logically consistent results.
The results of these assessment procedures show that the data are of high quality and consistency.
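Within-record and cross-record consistency checks of this kind can be expressed as simple rules. The field names and the rules below (an eligible woman's age must fall within 15-54 and she cannot report more living children than births; a mother should be at least 12 years older than her child) are hypothetical examples, not PCBS's actual edit specifications.

```python
def within_record_issues(woman):
    """Rules checked inside a single woman's record."""
    issues = []
    if not 15 <= woman["age"] <= 54:
        issues.append("age outside 15-54")
    if woman["children_alive"] > woman["children_born"]:
        issues.append("more living children than births")
    return issues


def cross_record_issues(women, children, min_gap=12):
    """Rules linking child records to their mothers' records."""
    mother_age = {w["id"]: w["age"] for w in women}
    issues = []
    for child in children:
        age = mother_age.get(child["mother_id"])
        if age is None:
            issues.append((child["id"], "mother not found"))
        elif age - child["age"] < min_gap:
            issues.append((child["id"], "implausible mother-child age gap"))
    return issues


women = [
    {"id": 1, "age": 30, "children_born": 2, "children_alive": 2},
    {"id": 2, "age": 15, "children_born": 1, "children_alive": 1},
    {"id": 3, "age": 58, "children_born": 1, "children_alive": 2},
]
children = [
    {"id": "c1", "mother_id": 1, "age": 3},
    {"id": "c2", "mother_id": 2, "age": 4},
    {"id": "c3", "mother_id": 9, "age": 2},
]
print([within_record_issues(w) for w in women])
# [[], [], ['age outside 15-54', 'more living children than births']]
print(cross_record_issues(women, children))
# [('c2', 'implausible mother-child age gap'), ('c3', 'mother not found')]
```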
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: responses to open-ended questions asked of respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were complete. Incomplete responses were excluded from the dataset. The final dataset contains 2,492 complete responses, representing a completion rate of 68.6% (2,492 of 3,632), and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
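The rates in this paragraph are simple ratios. The sketch below recomputes the completion rate from the counts given: 68.6% corresponds to 2,492 of 3,632, whereas 2,509 of 3,632 is about 69.1%. The number of invitations sent is not stated in this section, so the response-rate helper takes it as a parameter rather than assuming a value.

```python
def percent(numerator, denominator):
    """Express a ratio as a percentage."""
    return 100.0 * numerator / denominator


def response_rates(completes, invited, undeliverable):
    """Uncorrected rate, and rate after removing invalid, bounced and opted-out addresses."""
    uncorrected = percent(completes, invited)
    corrected = percent(completes, invited - undeliverable)
    return uncorrected, corrected


received, completed, final = 3632, 2509, 2492
print(round(percent(completed, received), 1))  # 69.1
print(round(percent(final, received), 1))      # 68.6
```

Given the invitation count, the corrected rate is `response_rates(2492, invited, 5201)[1]`, computed over the invitations minus the 5,201 invalid, bounced and opted-out addresses.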
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
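The recoding step can be sketched as follows: ordinal answers are mapped to integer codes, select-all-that-apply answers become one 0/1 indicator per option, and blank answers are treated as missing. The answer labels, codes and option names are hypothetical; the actual coding scheme is documented in the Codebook.

```python
# Hypothetical ordinal scale; the real labels and codes are in the Codebook.
FREQUENCY_SCALE = {"Never": 1, "Rarely": 2, "Sometimes": 3, "Often": 4, "Always": 5}


def recode_ordinal(answer, scale=FREQUENCY_SCALE):
    """Map an ordinal answer label to its integer code; blanks become None."""
    if answer is None or answer.strip() == "":
        return None
    return scale[answer.strip()]


def recode_multiple_choice(selected, options):
    """One 0/1 indicator per option for a select-all-that-apply question."""
    chosen = set(selected)
    return {option: int(option in chosen) for option in options}


print(recode_ordinal("Often"))  # 4
print(recode_multiple_choice(["Repository"], ["Repository", "Journal", "Colleague"]))
# {'Repository': 1, 'Journal': 0, 'Colleague': 0}
```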
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in coded form as a CSV file. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 = Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
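A minimal sketch for reading the coded CSV with Python's standard `csv` module. It treats the documented 999 (Not asked) code as missing and checks the documented dimensions (94 variables, 2,492 rows). The CSV file name in the comment is hypothetical, and the sketch assumes the file has a single header row of variable names; the example runs on a small in-memory sample instead.

```python
import csv
import io

NOT_ASKED = "999"        # documented missing-data code
EXPECTED_VARIABLES = 94  # documented number of variables
EXPECTED_ROWS = 2492     # documented number of cases


def _clean(value):
    """Treat the 'Not asked' code (and absent fields) as missing."""
    return None if value is None or value.strip() == NOT_ASKED else value


def load_coded_csv(handle):
    """Read the coded CSV, mapping the 999 'Not asked' code to None."""
    reader = csv.DictReader(handle)
    rows = [{name: _clean(value) for name, value in row.items()} for row in reader]
    return reader.fieldnames, rows


def has_expected_shape(fieldnames, rows):
    """True if the file matches the documented variable and case counts."""
    return len(fieldnames) == EXPECTED_VARIABLES and len(rows) == EXPECTED_ROWS


# For the real file (name hypothetical):
# with open("MDCDataCitationReuse2021surveydata.csv", newline="", encoding="utf-8") as f:
#     fields, rows = load_coded_csv(f)
sample = io.StringIO("Q1,Q2\n3,999\n1,2\n")
fields, rows = load_coded_csv(sample)
print(rows)  # [{'Q1': '3', 'Q2': None}, {'Q1': '1', 'Q2': '2'}]
print(has_expected_shape(fields, rows))  # False for this tiny sample
```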