Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: open-ended questions asked of respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
The survey received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses, yielding an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails, and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format as CSV. The codebook is required to interpret the coded values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 = Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
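For reuse outside SPSS, a minimal, hedged sketch of loading the coded CSV and honoring the missing-data code is shown below; the filename and the use of pandas are assumptions, not part of the data package, and the codebook remains the authoritative reference for value labels.

```python
import pandas as pd

# Hypothetical filename for the coded CSV export described above.
# Treat the documented missing-data code (999 = "Not asked") as missing on load.
df = pd.read_csv("MDCDataCitationReuse2021surveydata.csv", na_values=[999])

# Expected shape per the description above: 2,492 cases and 94 variables.
print(df.shape)
```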
This dataset originates from a series of experimental studies titled “Tough on People, Tolerant to AI? Differential Effects of Human vs. AI Unfairness on Trust”. The project investigates how individuals respond to unfair behavior (distributive, procedural, and interactional unfairness) enacted by artificial intelligence versus human agents, and how such behavior affects cognitive and affective trust.

1 Experiment 1a: The Impact of AI vs. Human Distributive Unfairness on Trust
Overview: This dataset comes from an experimental study aimed at examining how individuals respond in terms of cognitive and affective trust when distributive unfairness is enacted by either an artificial intelligence (AI) agent or a human decision-maker. Experiment 1a specifically focuses on the main effect of the “type of decision-maker” on trust.
Data Generation and Processing: The data were collected through Credamo, an online survey platform. Initially, 98 responses were gathered from students at a university in China. Additional student participants were recruited via Credamo to supplement the sample. Attention check items were embedded in the questionnaire, and participants who failed them were automatically excluded in real time. Data collection continued until 202 valid responses were obtained. SPSS software was used for data cleaning and analysis.
Data Structure and Format: The data file is named “Experiment1a.sav” and is in SPSS format. It contains 28 columns and 202 rows, where each row corresponds to one participant. Columns represent measured variables, including: grouping and randomization variables, one manipulation check item, four items measuring distributive fairness perception, six items on cognitive trust, five items on affective trust, three items for honesty checks, and four demographic variables (gender, age, education, and grade level). The final three columns contain computed means for distributive fairness, cognitive trust, and affective trust.
Additional Information: No missing data are present. All variable names are labeled with English abbreviations to facilitate further analysis. The dataset can be opened directly in SPSS or exported to other formats.

2 Experiment 1b: The Mediating Role of Perceived Ability and Benevolence (Distributive Unfairness)
Overview: This dataset originates from an experimental study designed to replicate the findings of Experiment 1a and further examine the potential mediating role of perceived ability and perceived benevolence.
Data Generation and Processing: Participants were recruited via the Credamo online platform. Attention check items were embedded in the survey to ensure data quality. Data were collected using a rolling recruitment method, with invalid responses removed in real time. A total of 228 valid responses were obtained.
Data Structure and Format: The dataset is stored in a file named Experiment1b.sav in SPSS format and can be opened directly in SPSS software. It consists of 228 rows and 40 columns. Each row represents one participant’s data record, and each column corresponds to a different measured variable. Specifically, the dataset includes: random assignment and grouping variables; one manipulation check item; four items measuring perceived distributive fairness; six items on perceived ability; five items on perceived benevolence; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education).
The last five columns contain the computed mean scores for perceived distributive fairness, ability, benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be analyzed directly in SPSS or exported to other formats as needed.

3 Experiment 2a: Differential Effects of AI vs. Human Procedural Unfairness on Trust
Overview: This dataset originates from an experimental study aimed at examining whether individuals respond differently in terms of cognitive and affective trust when procedural unfairness is enacted by artificial intelligence versus human decision-makers. Experiment 2a focuses on the main effect of the decision agent on trust outcomes.
Data Generation and Processing: Participants were recruited via the Credamo online survey platform from two universities located in different regions of China. A total of 227 responses were collected. After excluding those who failed the attention check items, 204 valid responses were retained for analysis. Data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2a.sav in SPSS format and can be opened directly in SPSS software. It contains 204 rows and 30 columns. Each row represents one participant’s response record, while each column corresponds to a specific variable. Variables include: random assignment and grouping; one manipulation check item; seven items measuring perceived procedural fairness; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for procedural fairness, cognitive trust, and affective trust.
Additional Notes: The dataset contains no missing values. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be analyzed directly in SPSS or exported to other formats as needed.

4 Experiment 2b: Mediating Role of Perceived Ability and Benevolence (Procedural Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 2a and to further examine the potential mediating roles of perceived ability and perceived benevolence in shaping trust responses under procedural unfairness.
Data Generation and Processing: Participants were working adults recruited through the Credamo online platform. A rolling data collection strategy was used, in which responses failing attention checks were excluded in real time. The final dataset includes 235 valid responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2b.sav, which is in SPSS format and can be opened directly using SPSS software. It contains 235 rows and 43 columns. Each row corresponds to a single participant, and each column represents a specific measured variable. These include: random assignment and group labels; one manipulation check item; seven items measuring procedural fairness; six items for perceived ability; five items for perceived benevolence; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education).
The final five columns contain the computed average scores for procedural fairness, perceived ability, perceived benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to support future reuse and secondary analysis. The dataset can be analyzed directly in SPSS and easily converted into other formats if needed.

5 Experiment 3a: Effects of AI vs. Human Interactional Unfairness on Trust
Overview: This dataset comes from an experimental study that investigates how interactional unfairness, when enacted by either artificial intelligence or human decision-makers, influences individuals’ cognitive and affective trust. Experiment 3a focuses on the main effect of the “decision-maker type” under interactional unfairness conditions.
Data Generation and Processing: Participants were college students recruited from two universities in different regions of China through the Credamo survey platform. After excluding responses that failed attention checks, a total of 203 valid cases were retained from an initial pool of 223 responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3a.sav, in SPSS format and compatible with SPSS software. It contains 203 rows and 27 columns. Each row represents a single participant, while each column corresponds to a specific measured variable. These include: random assignment and condition labels; one manipulation check item; four items measuring interactional fairness perception; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, education). The final three columns contain computed average scores for interactional fairness, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variable names are provided using standardized English abbreviations to facilitate secondary analysis. The data can be analyzed directly using SPSS and exported to other formats as needed.

6 Experiment 3b: The Mediating Role of Perceived Ability and Benevolence (Interactional Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 3a and further examine the potential mediating roles of perceived ability and perceived benevolence under conditions of interactional unfairness.
Data Generation and Processing: Participants were working adults recruited via the Credamo platform. Attention check questions were embedded in the survey, and responses that failed these checks were excluded in real time. Data collection proceeded in a rolling manner until a total of 227 valid responses were obtained. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in the file named Experiment3b.sav, in SPSS format and compatible with SPSS software. It includes 227 rows and
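For reuse outside SPSS, a hedged sketch of reading one of these .sav files in Python with pyreadstat follows; the local file path and the item-name prefix used to pick out the cognitive-trust columns are assumptions, since the actual English abbreviations are documented only inside the files.

```python
import pyreadstat  # third-party reader for SPSS .sav files

# Hypothetical local path to the Experiment 1a file described above.
df, meta = pyreadstat.read_sav("Experiment1a.sav")

print(df.shape)                # expected: 202 rows x 28 columns
print(meta.column_names[:10])  # inspect the English variable abbreviations

# Recompute a scale mean (e.g., the six cognitive-trust items) to cross-check
# the precomputed mean columns at the end of the file; "CT" is an assumed prefix.
ct_items = [c for c in df.columns if c.startswith("CT")]
if ct_items:
    df["cog_trust_check"] = df[ct_items].mean(axis=1)
```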
https://www.marketreportanalytics.com/privacy-policy
The Structural Equation Modeling (SEM) software market is experiencing robust growth, driven by increasing adoption across diverse sectors such as education, healthcare, and the social sciences. The market's expansion is fueled by the need for sophisticated statistical analysis to understand complex relationships between variables: researchers and analysts increasingly rely on SEM to test theoretical models, assess causal relationships, and gain deeper insights from intricate datasets. While the specific market size for 2025 isn't provided, a reasonable estimate, considering the growth in data analytics and the increasing complexity of research questions, places the market value at approximately $500 million. A compound annual growth rate (CAGR) of 8% seems plausible, reflecting steady but not explosive growth in a niche but essential software market. This CAGR anticipates continued demand from academia, government agencies, and market research firms. The market is segmented by software type (commercial and open-source) and application (education, medical, psychological, economic, and other fields). Commercial software currently dominates the market because of its advanced features and professional support; however, the open-source segment shows strong potential for growth, particularly in academic settings and among researchers with limited budgets.

The competitive landscape is relatively concentrated, with established players such as LISREL, IBM SPSS Amos, and Mplus offering comprehensive solutions. However, the emergence of packages such as semopy (Python) and lavaan (R) demonstrates an ongoing shift toward flexible, programmable SEM software, which may increase market competition and innovation in the years to come. Geographically, North America and Europe currently hold the largest market share, with Asia-Pacific emerging as a key growth region due to increasing research funding and investment in data science capabilities.

The sustained growth of the SEM software market is expected to continue throughout the forecast period (2025-2033), largely driven by the rising adoption of advanced analytical techniques in research and business. Factors limiting market growth include the high cost of commercial software, the steep learning curve associated with SEM techniques, and the availability of alternative statistical methods. However, more user-friendly software interfaces, together with the growing availability of online training and resources, are expected to mitigate these restraints and expand the market's reach to a broader audience. Continued innovation in SEM software, focusing on improved usability and the incorporation of advanced features such as handling of missing data and multilevel modeling, will contribute significantly to the market's future trajectory. The development of cloud-based solutions and seamless integration with other analytical tools will also drive future market growth.
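To make the shift toward programmable SEM concrete, here is a minimal, hedged sketch using the Python package semopy mentioned above; the model syntax, variable names, and input file are illustrative assumptions rather than an analysis tied to any dataset described here.

```python
import pandas as pd
from semopy import Model

# Hypothetical survey data; x1..x3 and y1..y3 are assumed indicator columns.
data = pd.read_csv("survey_items.csv")

# lavaan-style model description, which semopy also accepts.
desc = """
satisfaction =~ x1 + x2 + x3
loyalty =~ y1 + y2 + y3
loyalty ~ satisfaction
"""

model = Model(desc)
model.fit(data)
print(model.inspect())  # parameter estimates with standard errors and p-values
```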
https://qdr.syr.edu/policies/qdr-standard-access-conditions
This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the publisher's website.

Data Generation
The research project engages a story about perceptions of fairness in criminal justice decisions. The specific focus is a debate between ProPublica, a news organization, and Northpointe, the owner of a popular risk tool called COMPAS. ProPublica wrote that COMPAS was racist against blacks, while Northpointe posted online a reply rejecting such a finding. These two documents were the obvious foci of the qualitative analysis because of the further media attention they attracted, the confusion their competing conclusions caused readers, and the power both organizations wield in public circles. There were no barriers to retrieval, as both documents have been publicly available on their corporate websites. This public access was one of the motivations for choosing them, as it meant that they were also easily attainable by the general public, extending the documents’ reach and impact. Additional materials from ProPublica relating to the main debate were also freely downloadable from its website and from a third-party, open source platform. Access to secondary source materials, comprising additional writings from Northpointe representatives that could assist in understanding Northpointe’s main document, was more limited. Because of a claim of trade secrets on its tool and the underlying algorithm, it was more difficult to reach Northpointe’s other reports. Nonetheless, largely because its clients are governmental bodies with transparency and accountability obligations, some Northpointe-associated reports were retrievable from third parties who had obtained them, largely through Freedom of Information Act queries. Together, the primary and (retrievable) secondary sources allowed for a triangulation of themes, arguments, and conclusions. The quantitative component uses a dataset of over 7,000 individuals with information that was collected and compiled by ProPublica and made available to the public on GitHub. Because ProPublica gathered the data directly from criminal justice officials via Freedom of Information Act requests, the dataset is in the public domain, and thus no confidentiality issues are present. The dataset was loaded into SPSS v. 25 for data analysis.

Data Analysis
The qualitative enquiry used critical discourse analysis, which investigates the ways in which parties in their communications attempt to create, legitimate, rationalize, and control mutual understandings of important issues. Each of the two main discourse documents was parsed on its own merit. Yet the project was also intertextual in studying how the discourses correspond with each other and with other relevant writings by the same authors.
Several more specific types of discursive strategies attracted further critical examination:
- Testing claims and rationalizations that appear to serve the speaker's self-interest
- Examining conclusions and determining whether sufficient evidence supported them
- Revealing contradictions and/or inconsistencies within the same text and intertextually
- Assessing the strategies underlying justifications and rationalizations used to promote a party's assertions and arguments
- Noticing strategic deployment of lexical phrasings, syntax, and rhetoric
- Judging sincerity of voice and the objective consideration of alternative perspectives

Of equal importance in a critical discourse analysis is consideration of what is not addressed, that is, uncovering facts and/or topics missing from the communication. For this project, this included parsing issues that were either briefly mentioned and then neglected, asserted with their significance left unstated, or not suggested at all. This task required understanding common practices in the algorithmic data science literature. The paper could have been completed with the critical discourse analysis alone. However, because one of its salient findings was that the discourses overlooked numerous definitions of algorithmic fairness, the call to fill this gap seemed obvious. The availability of the same dataset used by the parties in conflict made this opportunity more appealing: calculating additional algorithmic equity equations would not be troubled by irregularities arising from diverse sample sets. New variables were created as relevant to calculate algorithmic fairness equations. In addition to various SPSS Analyze functions (e.g., regression, crosstabs, means), online statistical calculators were useful for computing z-test comparisons of proportions and t-test comparisons of means.

Logic of Annotation
Annotations were employed to fulfil a variety of functions, including supplementing the main text with context, observations, counter-points, analysis, and source attributions. These fall under a few categories. Space considerations. Critical discourse analysis offers a rich method...
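As a hedged illustration of the kind of fairness arithmetic described above, the sketch below runs a z-test comparison of proportions and a t-test comparison of means in Python; all counts and scores are simulated placeholders, not figures from the ProPublica data.

```python
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.proportion import proportions_ztest

# Placeholder counts (NOT from the COMPAS data): defendants scored high-risk
# out of all defendants, in two groups being compared.
high_risk = np.array([1200, 900])
group_totals = np.array([3600, 3400])

z_stat, p_val = proportions_ztest(count=high_risk, nobs=group_totals)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")

# t-test comparison of mean risk scores between two simulated groups.
rng = np.random.default_rng(0)
scores_a = rng.normal(5.0, 2.5, 500).clip(1, 10)
scores_b = rng.normal(4.5, 2.5, 500).clip(1, 10)
t_stat, p_val = ttest_ind(scores_a, scores_b)
print(f"t = {t_stat:.2f}, p = {p_val:.4f}")
```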
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset was generated from a laboratory experiment based on the dot-matrix integration paradigm, designed to measure death thought accessibility (DTA). The study was conducted under controlled conditions, with participants tested individually in a quiet, dimly lit room. Stimulus presentation and response collection were implemented using PsychoPy (exact version number provided in the supplementary materials), and reaction times were recorded via a standard USB keyboard. Experimental stimuli consisted of five categories of two-character Chinese words rendered in dot-matrix form: death-related words, metaphorical-death words, positive words, neutral words, and meaningless words. Stimuli were centrally displayed on the screen, with presentation durations and inter-stimulus intervals (ISI) precisely controlled at the millisecond level.

Data collection took place in spring 2025, with a total of 39 participants contributing approximately 16,699 valid trials. Each trial-level record includes participant ID, priming condition (0 = neutral priming, 1 = mortality salience priming), word type, inter-stimulus interval (in milliseconds), reaction time (in milliseconds), and recognition accuracy (0 = incorrect, 1 = correct). In the dataset, rows correspond to single trials and columns represent experimental variables. Reaction times were measured in milliseconds and later log-transformed for statistical analyses to reduce skewness. Accuracy was coded as a binary variable indicating correct recognition.

Data preprocessing included the removal of extreme reaction times (less than 150 ms or greater than 3000 ms). Only trials with valid responses were retained for analysis. Missing data were minimal (<1% of all trials), primarily due to occasional non-responses by participants, and are explicitly marked in the dataset. Potential sources of error include natural individual variability in reaction times and minor recording fluctuations from input devices, which are within the millisecond range and do not affect overall patterns.

The data files are stored in Excel format (.xlsx), with each participant’s data saved in a separate file named according to the participant ID. Within each file, the first row contains variable names, and subsequent rows record trial-level observations, allowing for straightforward data access and processing. Excel files are compatible with a wide range of statistical software, including R, Python, SPSS, and Microsoft Excel, and no additional software is required to open them. A supplementary documentation file accompanies the dataset, providing detailed explanations of all variables and data processing steps. A complete codebook of variable definitions is included in the appendix to facilitate data interpretation and ensure reproducibility of the analyses.
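A hedged sketch of the preprocessing described above (pooling the per-participant Excel files, trimming extreme reaction times, and log-transforming) is shown below; the directory layout and column names are assumptions and should be checked against the supplementary codebook.

```python
import glob
import numpy as np
import pandas as pd

# Hypothetical layout: one .xlsx file per participant, named by participant ID.
files = glob.glob("data/*.xlsx")
trials = pd.concat((pd.read_excel(f) for f in files), ignore_index=True)

# Trim extreme reaction times as described (keep 150-3000 ms), then log-transform.
trials = trials[(trials["rt_ms"] >= 150) & (trials["rt_ms"] <= 3000)].copy()
trials["log_rt"] = np.log(trials["rt_ms"])

# Keep only valid (answered) trials; "accuracy" is assumed to be coded 0/1.
trials = trials.dropna(subset=["accuracy"])
print(trials.groupby("word_type")["log_rt"].mean())
```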
http://rdm.uva.nl/en/support/confidential-data.html
This database contains data from 480 respondents, collected to measure their intervention choice in community care with the instrument AICN (Assessment of Intervention Choice in Community Nursing). Data collection took place at the Faculty of Health of the Amsterdam University of Applied Sciences, the Netherlands. The respondents are all baccalaureate nursing students in the fourth year of study, close to graduation. Data were collected at three timepoints: around May 2016 (group 1215), May 2017 (group 1316), and May 2018 (group 1417). The student cohorts 1215 and 1316 form a historical control group; 1417 is the intervention group. The intervention group underwent a new four-year, more ‘community-oriented’ curriculum, with five new curriculum themes related to caregiving in people’s own homes: (1) fostering patient self-management, (2) shared decision-making, (3) collaboration with the patient’s social system, (4) using healthcare technology, and (5) allocation of care.

The aim of this study is to investigate the effect of this redesigned baccalaureate nursing curriculum on students’ intervention choice in community care. The AICN is a measuring instrument containing three vignettes, each describing a caregiving situation in the patient’s home. Each vignette incorporates all five new curriculum themes. For each theme, the corresponding intervention is a realistic option, while more ‘traditional’ intervention choices are also possible. To avoid students responding in the way they think is expected, they are not made aware of the instrument’s underlying purpose (i.e., determining the five themes). After reading each vignette, the respondents briefly formulate the five interventions they consider most suitable for nursing caregiving. The fifteen interventions yield qualitative information. To allow for quantitative data analysis, the AICN includes a codebook describing the criteria used to recode each qualitative intervention description into a quantitative value. As the manuscript describing the AICN and codebook is still under review, a link to the instrument will be added after publication.

Filesets:
1: SPSS file – 3 cohorts AICN without student numbers
2: SPSS syntax file

Variables in SPSS file (used in analysis):
1: Cohort type
2: Curriculum type (old vs. new)
3-20: Dummy variables of demographics
21-35: CSINV refers to case/intervention; CS1INV2 means case 1, intervention 2
36-50: Dummy variables of 21-35, representing the main outcome, old vs. new intervention type
51: Sum of the dummy variables (range 1-15), representing the primary outcome AICN (see the sketch below)
52: Sum of dummies as in 51, but including respondents with missing variables; used in the regression analysis
53-58: Count of the number of chosen interventions per curriculum theme
59-60: Count of missings (old curriculum = 59, new = 60)
61-62: Count of no intervention theme (old curriculum = 61, new = 62)

Contact
Because of the sensitive nature of the data, the fileset is confidential and will be shared only under strict conditions. For more information contact opensciencesupport@hva.nl
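As a purely illustrative sketch of how the primary AICN outcome (the sum of the recoded dummy variables) could be reproduced, the snippet below assumes an exported table with columns named after the CS{case}INV{item} convention described above; the actual SPSS file is confidential and its exact variable names may differ.

```python
import pandas as pd

# Hypothetical export of the recoded data; column names follow the described
# CS{case}INV{item} naming convention and are assumptions.
df = pd.read_csv("aicn_recoded.csv")

# Dummy columns for 3 cases x 5 interventions (1 = 'new curriculum' intervention).
dummy_cols = [f"CS{case}INV{item}_new" for case in range(1, 4) for item in range(1, 6)]

# Primary outcome: sum of the dummies (described range 1-15).
df["AICN_sum"] = df[dummy_cols].sum(axis=1)
print(df["AICN_sum"].describe())
```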
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456864
Abstract (en): The purpose of this data collection is to provide an official public record of the business of the federal courts. The data originate from 94 district and 12 appellate court offices throughout the United States. Information was obtained at two points in the life of a case: filing and termination. The termination data contain information on both filings and terminations, while the pending data contain only filing information. For the appellate and civil data, the unit of analysis is a single case. The unit of analysis for the criminal data is a single defendant. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes. All federal court cases, 1970-2000.
2012-05-22 All parts are being moved to restricted access and will be available only using the restricted access procedures.
2005-04-29 The codebook files in Parts 57, 94, and 95 have undergone minor edits and been incorporated with their respective datasets. The SAS files in Parts 90, 91, 227, and 229-231 have undergone minor edits and been incorporated with their respective datasets. The SPSS files in Parts 92, 93, 226, and 228 have undergone minor edits and been incorporated with their respective datasets. Parts 15-28, 34-56, 61-66, 70-75, 82-89, 96-105, 107, 108, and 115-121 have had identifying information removed from the public use file, and restricted data files that still include that information have been created. These parts have had their SPSS, SAS, and PDF codebook files updated to reflect the change. The data, SPSS, and SAS files for Parts 34-37 have been updated from OSIRIS to LRECL format. The codebook files for Parts 109-113 have been updated. The case counts for Parts 61-66 and 71-75 have been corrected in the study description. The LRECL for Parts 82, 100-102, and 105 has been corrected in the study description.
2003-04-03 A codebook was created for Part 105, Civil Pending, 1997. Parts 232-233, SAS and SPSS setup files for Civil Data, 1996-1997, were removed from the collection since the civil data files for those years have corresponding SAS and SPSS setup files.
2002-04-25 Criminal data files for Parts 109-113 have all been replaced with updated files. The updated files contain Criminal Terminations and Criminal Pending data in one file for the years 1996-2000. Part 114, originally Criminal Pending 2000, has been removed from the study, and the 2000 pending data are now included in Part 113.
2001-08-13 The following data files were revised to include plaintiff and defendant information: Appellate Terminations, 2000 (Part 107), Appellate Pending, 2000 (Part 108), Civil Terminations, 1996-2000 (Parts 103, 104, 115-117), and Civil Pending, 2000 (Part 118). The corresponding SAS and SPSS setup files and PDF codebooks have also been edited.
2001-04-12 Criminal Terminations (Parts 109-113) data for 1996-2000 and Criminal Pending (Part 114) data for 2000 have been added to the data collection, along with corresponding SAS and SPSS setup files and PDF codebooks.
2001-03-26 Appellate Terminations (Part 107) and Appellate Pending (Part 108) data for 2000 have been added to the data collection, along with corresponding SAS and SPSS setup files and PDF codebooks.
1997-07-16 The data for 18 of the Criminal Data files were matched to the wrong part numbers and names, and have now been corrected.
Funding institution(s): United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics.
(1) Several, but not all, of these record counts include a final blank record. Researchers may want to detect this occurrence and eliminate this record before analysis.
(2) In July 1984, a major change in the recording and disposition of an appeal occurred, and several data fields dealing with disposition were restructured or replaced. The new structure more clearly delineates mutually exclusive dispositions. Researchers must exercise care in using these fields for comparisons.
(3) In 1992, the Administrative Office of the United States Courts changed the reporting period for statistical data. Up to 1992, the reporting period...
Background and Objectives: Pharmacogenomics (PGx) leverages genomic information to tailor drug therapies, enhancing precision medicine. Despite global advancements, its implementation in Lebanon, Qatar, and Saudi Arabia faces unique challenges in clinical integration. This study aimed to investigate PGx attitudes, knowledge implementation, associated challenges, forecast future educational needs, and compare findings across the three countries. Methods: This cross-sectional study utilized an anonymous, self-administered online survey distributed to healthcare professionals, academics, and clinicians in Lebanon, Qatar, and Saudi Arabia. The survey comprised 18 questions to assess participants' familiarity with PGx, current implementation practices, perceived obstacles, potential integration strategies, and future educational needs. Results: The survey yielded 337 responses from healthcare professionals across the three countries. Data revealed significant variations in PGx familiarity an...

Ethical statement and informed consent: Ethical approval for this study was obtained from the institutional review boards of the participating universities: Beirut Arab University (2023-H-0153-HS-R-0545), Qatar University (QU-IRB 1995-E/23), and Alfaisal University (IRB-20270). Informed consent was obtained from all participants online, ensuring their confidentiality and the right to withdraw from the study without any consequences. Participants were informed that all collected data would be anonymous and confidential, with only the principal investigator having access to the data. Completing and submitting the survey was considered an agreement to participate. Study design: This study utilized a quantitative cross-sectional research design, involving healthcare professionals (pharmacists, nurses, medical laboratory technologists), university academics, and clinicians from Lebanon, Qatar, and Saudi Arabia. Data was collected through a voluntary, anonymous, private survey to gather PGx per...

Integrating pharmacogenomics in three Middle Eastern countries’ healthcare (Lebanon, Qatar, and Saudi Arabia)
Description of the data set:
- 1 dataset is included: PGx_database. It includes the raw data of our paper.
- In the data set, each row represents one participant.
- All the variables can contain empty cells. When participants did not answer, empty cells were left to show the missing data.
- The number in each cell has a specific value depending on the variable.
Listed variables:
https://spdx.org/licenses/CC0-1.0.html
Background: Available evidence shows that metabolic syndrome in the adult population is persistently elevated due to the nutrition transition, genetic predisposition, individual lifestyle factors, and other environmental risks. However, in developing nations, the burden of metabolic syndrome and the scientific evidence on its pattern and risk exposures have not been adequately investigated. Thus, this study aimed to measure the prevalence of metabolic syndrome and to identify specific risk factors among adults who visited Dessie Comprehensive Specialized Hospital, Ethiopia.

Methods: A hospital-based cross-sectional study was conducted among 419 randomly selected adults attending Dessie Comprehensive Specialized Hospital from January 25 to February 29, 2020. We used the WHO STEP-wise approach for non-communicable disease surveillance to assess participants’ disease condition. Metabolic syndrome was measured using the harmonized criteria recommended by the International Diabetes Federation Task Force in 2009. Data were explored for missing values, outliers, and multicollinearity before presenting the summary statistics and regression results. Multivariable logistic regression was used to identify statistically significant predictors of metabolic syndrome, expressed as odds ratios with 95% uncertainty intervals. All statistical tests were managed using SPSS version 26. A non-linear dose-response analysis was performed to show the relationships between metabolic syndrome and potential risk factors.

Results: The overall prevalence of metabolic syndrome among adults was 35.0% (95% CI: 30.5, 39.8). Women were more affected than men (40.3% vs 29.4%). After adjusting for other variables, being female [OR=1.85; 95% CI (1.01, 3.38)], urban residence [OR=1.94; 95% CI (1.08, 3.24)], increased age [OR=18.23; 95% CI (6.66, 49.84)], shorter sleeping duration [OR=4.62; 95% CI (1.02, 20.98)], sedentary behaviour [OR=4.05; 95% CI (1.80, 9.11)], obesity [OR=3.14; 95% CI (1.20, 8.18)], and alcohol drinking [OR=2.85; 95% CI (1.27, 6.39)] were positively associated with adult metabolic syndrome, whereas having no formal education [OR=0.30; 95% CI (0.12, 0.74)] was negatively associated with metabolic syndrome.

Conclusions: The prevalence of adult metabolic syndrome is found to be high. Metabolic syndrome has linear relationships with BMI, physical activity, sleep duration, and level of education. Demographic and behavioral factors are strongly related to the risk of metabolic syndrome. Since most of the factors are modifiable, there should be urgent large-scale community intervention programs focusing on increased physical activity, healthy sleep, weight management, minimizing behavioral risk factors, and healthier food interventions targeting a lifecycle approach. The existing policy should be evaluated to determine whether due attention has been given to prevention strategies for NCDs.

Methods: The data were collected using an interviewer-administered questionnaire, anthropometric measurements, and biochemical profiles of adults attending Dessie Comprehensive Specialized Hospital. Data were managed using SPSS software to explore missing data and outliers and to run the logistic regression analysis.
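For readers reproducing this kind of multivariable analysis outside SPSS, here is a hedged Python sketch of a logistic regression with odds ratios and 95% confidence intervals; the file name and predictor names are assumptions standing in for the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis file; "mets" is an assumed 0/1 metabolic syndrome flag.
df = pd.read_csv("mets_dessie_2020.csv")

model = smf.logit(
    "mets ~ sex + residence + age_group + sleep_hours + sedentary + bmi_cat + alcohol + education",
    data=df,
).fit()

# Odds ratios with 95% confidence intervals, as reported in the abstract.
or_table = np.exp(pd.concat([model.params, model.conf_int()], axis=1))
or_table.columns = ["OR", "CI 2.5%", "CI 97.5%"]
print(or_table)
```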
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population and sample
To find participants for the survey, this study drew from the 4,761 publicly listed members of the online group Awesome Assistants. The population was therefore all young film/TV professionals, while the sample was the selected members of Awesome Assistants. On its Facebook page, Awesome Assistants allows film/TV professionals to post job openings and work-related questions. The author eliminated respondents younger than 18 or older than 35 and did not include moderators of Awesome Assistants. Members of the group work in the film/TV industry with either an assistant title or performing assistant duties, such as answering phones and running errands. The author randomly selected and contacted 500 individuals from this online group and sent two follow-up messages to improve the response rate.

Instrumentation
To collect data from the sample, the author used the Career Decision Self-Efficacy Scale-Short Form (CDSES-SF).

Data collection process
To distribute the survey to potential participants, the author first sent a letter to the Awesome Assistants moderators to confirm their support. The author then uploaded a message of informed consent and the survey to Qualtrics and sent subjects a link to complete the study. Subjects received messages via Facebook Messenger, LinkedIn, or email, based on the contact information available. Once subjects completed the survey, a debriefing form invited them to enter a raffle for one of four $25 Amazon gift cards. After four weeks, the links expired. The author omitted surveys in which the subject did not answer at least one item from each subscale of the CDSES-SF. If respondents did not answer all items in a subscale, the author took the average of the completed questions. Additionally, the author eliminated subjects who fell outside the target age range, as well as those who did not provide their age or number of contacts in the film/TV industry. As a result, out of the 267 unique responses, the author analyzed 226 subjects.

Statistical Analysis Procedures
As with data collection, the author took care that the statistical analysis process was legitimate and insightful. The author entered the data into IBM Statistical Product and Service Solutions (SPSS) Statistics 27 and determined which data should be coded as missing. Because the assumption of linearity was not met by the data, the author used Spearman’s rho instead of a Pearson product-moment correlation. The author declared results statistically significant if p < .05.
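A hedged sketch of the correlation analysis described above, using Spearman's rho in Python rather than SPSS, is shown below; the export filename and column names are assumptions.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical export of the cleaned Qualtrics responses (n = 226).
df = pd.read_csv("cdses_sf_responses.csv")

# Spearman's rho is used because the linearity assumption was not met.
rho, p = spearmanr(df["cdses_total"], df["industry_contacts"])
print(f"rho = {rho:.2f}, p = {p:.4f}  (significant if p < .05)")
```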
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456771
Abstract (en): This is the seventh in a series of surveys conducted by the Bureau of the Census. It contains information on state and local public residential facilities operated by the juvenile justice system during the fiscal year 1982. Each data record is classified into one of six categories: (1) detention center, (2) shelter, (3) reception or diagnostic center, (4) training school, (5) ranch, forestry camp, or farm, and (6) halfway house or group home. Data include state, county, and city identification, level of government responsible for the facility, type of agency, agency identification, resident population by sex, age range, detention status, and offense, and admissions and departures of population. Also included in the data are average length of stay, staffing expenditures, capacity of the facility, and programs and services available. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: standardized missing values; created an online analysis version with question text; performed recodes and/or calculated derived variables; checked for undocumented or out-of-range codes. Juvenile detention and correctional facilities operated by state or local governments in the United States in 1982 and 1983.
2007-11-28 The data file was updated to include ready-to-go files and the ASCII codebook was converted to PDF format.
2005-11-04 On 2005-03-14 new files were added to one or more datasets. These files included additional setup files as well as one or more of the following: SAS program, SAS transport, SPSS portable, and Stata system files. The metadata record was revised 2005-11-04 to reflect these additions.
1997-02-25 SAS data definition statements are now available for this collection and the SPSS data definition statements were updated.
Funding institution(s): United States Department of Justice. Office of Justice Programs. Office of Juvenile Justice and Delinquency Prevention.
Conducted by the United States Department of Commerce, Bureau of the Census.
Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data has been collected from more than 4,000 farms and covers more than 20 different crops in 46 countries. The data (except USA data and for Barley in UK, Germany, Poland, Czech Republic, France and Spain) was collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. It can be used as benchmarks for crop yield and input efficiency.
National coverage
Agricultural holdings
Sample survey data [ssd]
A. Sample design Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta and the benchmark farms were randomly selected by Kynetec within the same cluster.
B. Sample size Sample sizes for each cluster are determined with the aim to measure statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions regarding the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. In addition, growers are also grouped in clusters as a means of keeping variances under control, as well as distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20. The optimal number of reference farms is 10 of 20 (balanced sample).
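The relationship described above, in which smaller expected increases require larger samples, can be illustrated with a hedged power calculation; the effect sizes, power, and alpha below are illustrative assumptions, not Kynetec's actual parameters.

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative only: a smaller expected effect size (expected increase relative
# to within-cluster variability) requires more farms per group at 80% power.
analysis = TTestIndPower()
for effect_size in (0.8, 0.5, 0.3):
    n = analysis.solve_power(effect_size=effect_size, power=0.8, alpha=0.05)
    print(f"effect size {effect_size}: about {n:.0f} farms per group")
```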
C. Selection procedure The respondents were picked randomly using a “quota-based random sampling” procedure. Growers were first randomly selected and then checked to see whether they complied with the quotas for crops, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to conduct a maximum of 5 interviews in one village.
Benchmark farms (BF) screened in Kenya were selected based on the following criteria:
(a) Smallholder potato growers
Location: Gwakiongo, Ol njororok, Wanjohi, Molo
BACKGROUND: Open field potatoes
RF: Flood or drip irrigation; BF: No irrigation
Ploughing with a tractor or manually (e.g. with a hoe)
Usage of chemical and/or organic fertilizers
Selling the harvest is the main after harvest activity
(b) Smallholder tomato growers
Location: Kitengela
BACKGROUND: Open field tomatoes
Flood or drip irrigation
Ploughing with a tractor or manually (e.g. with a hoe, a slasher)
Usage of chemical and/or organic fertilizers
Selling the harvest is the main after harvest activity
Face-to-face [f2f]
Data collection tool for 2019 covered the following information:
(A) PRE- HARVEST INFORMATION
PART I: Screening
PART II: Contact Information
PART III: Farm Characteristics — a. Biodiversity conservation b. Soil conservation c. Soil erosion d. Description of growing area e. Training on crop cultivation and safety measures
PART IV: Farming Practices - Before Harvest — a. Planting and fruit development - Field crops b. Planting and fruit development - Tree crops c. Planting and fruit development - Sugarcane d. Planting and fruit development - Cauliflower e. Seed treatment
(B) HARVEST INFORMATION
PART V: Farming Practices - After Harvest — a. Fertilizer usage b. Crop protection products c. Harvest timing & quality per crop - Field crops d. Harvest timing & quality per crop - Tree crops e. Harvest timing & quality per crop - Sugarcane f. Harvest timing & quality per crop - Banana g. After harvest
PART VI: Other inputs - After Harvest — a. Input costs b. Abiotic stress c. Irrigation
See all questionnaires in external materials tab
Data processing:
Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data is entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data is verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data is submitted to the global Kynetec headquarters for processing. In the case of missing values or inconsistencies, the local Kynetec office is re-contacted to clarify and resolve issues.
Quality assurance Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high quality data.
• Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.
• Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria, and updated where needed.
• Briefing of interviewers: Each year, local interviewers, who are familiar with the local context of farming, are thoroughly briefed to fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.
• Cross-validation of the answers:
o Kynetec captures all growers' responses through a digital data-entry tool. Various logical and consistency checks are automated in this tool (e.g. the total crop size in hectares cannot be larger than the farm size); a sketch of such a check appears after this list.
o Kynetec cross-validates the answers of the growers in three different ways: 1. within the grower (checking whether growers respond consistently during the interview); 2. across years (checking whether growers respond consistently throughout the years); 3. within the cluster (comparing a grower's responses with those of others in the group).
o All the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data is updated after verification. All updates are tracked.
• Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.
• Sensitivity analysis: sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.
• It is recommended that users interested in using the administrative level 1 variable in the location dataset use this variable with care and crosscheck it with the postal code variable.
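As referenced in the cross-validation bullet above, an automated logical check of that kind might look like the following hedged pandas sketch; the file and column names are assumptions, not Kynetec's actual schema.

```python
import pandas as pd

# Hypothetical farm-level table with areas reported in hectares.
farms = pd.read_csv("farm_survey.csv")

# Logical check quoted above: total crop area cannot exceed the farm size.
inconsistent = farms[farms["total_crop_ha"] > farms["farm_size_ha"]]

# These records would be flagged for re-contacting the grower.
print(f"{len(inconsistent)} records to verify with the grower")
```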
Due to the above-mentioned checks, irregularities in the fertilizer usage data were discovered and had to be corrected:
For the 2014 data collection wave, respondents were asked to give a total estimate of the fertilizer NPK rates applied in the fields. From 2015 onwards, the questionnaire was redesigned to be more precise and to collect data by individual fertilizer product. The new method of measuring fertilizer inputs leads to more accurate results, but it also makes year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.
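A minimal sketch of that re-estimation logic is shown below, assuming a long-format table of NPK usage per farm and year; the specific weights and column names are illustrative assumptions, since the actual weighting scheme is not documented here.

```python
import pandas as pd

# Hypothetical long-format table with columns: farm_id, year, npk_kg_ha.
npk = pd.read_csv("npk_usage.csv")

# Illustrative weights for the later years used to back-cast 2014 usage.
weights = {2015: 0.5, 2016: 0.3, 2017: 0.2}

later = npk[npk["year"].isin(list(weights))].copy()
later["w"] = later["year"].map(weights)
later["wx"] = later["npk_kg_ha"] * later["w"]

sums = later.groupby("farm_id")[["wx", "w"]].sum()
sums["npk_2014_reestimate"] = sums["wx"] / sums["w"]
print(sums["npk_2014_reestimate"].head())
```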
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the first experiment, this dataset includes empathy ratings given by participants in response to pictures of losing and winning athletes. Before analyzing our data, we re-adjusted the scale so that zero would be in the middle. We ran a paired-samples t-test using IBM SPSS Statistics v23 (2015).
For the second experiment, we used a statistical analysis to investigate which face and body features predicted the highest and lowest empathy scores. On a five-point scale (from 1, very sad, to 5, very happy), the participants rated the level of empathy they felt with regard to the images on the screen. Since the dependent variable was ordinal (empathy level 1 to 5), we ran an ordinal cumulative mixed model with fixed effects, by participants and by items (i.e., by pictures). Each predictor had a different number of levels. To standardize the contrasts, we chose a simple coding scheme, designed to compare the mean of the dependent variable for a given level to the overall mean of that level (e.g., for mouth, neutral/relaxed mouth was the reference level; for body, standing was the reference level). Finally, we ran a post-hoc analysis using pairwise ordinal paired tests with the lsmeans package in R. This test calculates the p-values for each independent variable level from the paired cumulative link model.
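For the first experiment, a hedged Python equivalent of the scale re-centring and paired-samples t-test might look like this; the file and column names are assumptions.

```python
import pandas as pd
from scipy.stats import ttest_rel

# Hypothetical wide-format export: one row per participant with mean empathy
# ratings for losing and winning athletes.
ratings = pd.read_csv("experiment1_empathy.csv")

# Re-centre the 1-5 scale so that zero sits in the middle (3 becomes 0).
for col in ("empathy_losing", "empathy_winning"):
    ratings[col] = ratings[col] - 3

t_stat, p_value = ttest_rel(ratings["empathy_losing"], ratings["empathy_winning"])
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```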
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de456669
Abstract (en): The 1979 Juvenile Detention and Correctional Facility Census is the sixth in a series of surveys of state and local public residential facilities in the juvenile justice system. There is one record for each juvenile detention facility that had a population of at least 50 percent juveniles. Each record is classified into one of six categories: detention centers or shelters, reception or diagnostic centers, training schools, ranches, forestry camps and farms, and halfway houses and group homes. Data include state, county, and city identification, level of government responsible for the facility, type of agency, agency identification, resident population by sex, age range, detention status, and offense, admissions and departures of population, average length of stay, staffing and expenditures, age and capacity of the facility, and programs and services available. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: standardized missing values; created an online analysis version with question text; performed recodes and/or calculated derived variables; checked for undocumented or out-of-range codes. Juvenile detention and correctional facilities operated by state or local governments.
2007-12-11 The data file was updated to include ready-to-go files and the ASCII codebook was converted to PDF format.
2005-11-04 On 2005-03-14 new files were added to one or more datasets. These files included additional setup files as well as one or more of the following: SAS program, SAS transport, SPSS portable, and Stata system files. The metadata record was revised 2005-11-04 to reflect these additions.
1997-02-25 SAS data definition statements are now available for this collection and the SPSS data definition statements have been updated.
Conducted by the United States Department of Commerce, Bureau of the Census.
By Inder Sethi [source]
This comprehensive District Information System for Education (DISE) dataset collects district-level educational statistics in India and provides the most up-to-date data on the nation's schools. The project tracks and compiles data on primary and upper primary school students, teachers, institutions, infrastructure and more from all districts in India. It has drastically reduced the time lag between data collection and analysis, from seven to eight years down to only a few months, at both district and state levels. DISE is fully supported by the Ministry of Human Resource Development (MHRD) as well as UNICEF, so precise regional insights are available regarding Indian education standards. With this institutionalized flow of raw data, collected and verified at Block Education Offices before being computerized at the district level and aggregated for state-level analysis, it is easier than ever to understand where educational improvements need to be made. From tracking key performance indicators among students of all ages to measuring access to teacher resources, this DISE dataset serves as an invaluable resource for unlocking potential within the Indian learning system.
Guide: How to Use the Indian District Level School Data 2015-16
Familiarize yourself with the features of this data set. The dataset consists of five columns which provide an overview of district-level educational statistics in India for the year 2015-16. Each row contains individual district-level data with corresponding educational information and statistics, such as total number of schools, number of girls' schools, enrolment and more, for each district in India during that year.
Understand what kind of analysis can be done using this dataset once it is imported into a statistical software program or a spreadsheet program such as Microsoft Excel or Google Sheets. You can use this dataset to analyze many different aspects of education in India at a district level, including the total number of schools, the number and percentage of girls enrolled, teacher qualifications and more, across districts in all states of India for the 2015-16 period covered by this data set.
Pull up a visual representation of your data within a statistical program like SPSS or perhaps one online such as Tableau Public, depending on your preference and needs for analysis purposes - either way it is necessary to have these setup beforehand before attempting to import any given subset into them; click upload file option within them (or any other appropriate action), select all files in your local machine directory where you saved our downloaded csv file “report card” from kaggle above – then just wait until it’s completely uploaded after selecting open/import/apply/etc…and if no errors about encoding appear then begin your desired data mining experience (visualization & analytical techniques).
Once inside your preferred visualization environment, try out different methods for analyzing individual rows which correspond directly onto specific districts located inside this geographic territory that are meant by our target sheet observations mentioned prior – refer back often if lost & take time understanding what any given county contributes when computer processing their respective responses accordingly without overlooking any particular variables taken into account unlike secondary “missing values” under consideration also..
Then define relationships between similar items according figures gathered - notice patterns found among these locations while focusing attention isolation instead – graphic qualities captured midst these demographics we choose visualize key representing intent anyways… therefor aim transform knowledge through effective strategy meant enable more meaningful representation ideas presented starting place develops further details follow courtesy
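As an alternative to the spreadsheet and SPSS/Tableau workflows above, the same exploration can be done in Python with pandas. The sketch below is illustrative only: the file name and column names are assumptions and should be replaced with the headers of the actual "report card" CSV downloaded from Kaggle.

```python
import pandas as pd

# File name and column names are assumptions; check the headers of the
# downloaded CSV before running.
df = pd.read_csv("dise_district_report_card_2015_16.csv")

# Basic structure check: one row per district.
print(df.shape)
print(df.columns.tolist())

# Example aggregation (hypothetical column names): total schools and the
# girls' share of enrolment by state.
summary = (
    df.groupby("state")
      .agg(total_schools=("total_schools", "sum"),
           girls_enrolment=("girls_enrolment", "sum"),
           total_enrolment=("total_enrolment", "sum"))
)
summary["girls_share"] = summary["girls_enrolment"] / summary["total_enrolment"]
print(summary.sort_values("girls_share", ascending=False).head())
```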
Possible research ideas:
- Analyzing literacy rates and measuring the educational advancement of different districts in India.
- Tracking the progress of government programmes such as Sarva Shiksha Abhiyan that focus on improving children's access to education across districts.
- Predicting trends in the quality of school resources, educational infrastructure and student performance to guide district-level decision making for improved education outcomes.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Note: This version supersedes version 1: https://doi.org/10.15482/USDA.ADC/1522654.
In Fall of 2019 the USDA Food and Nutrition Service (FNS) conducted the third Farm to School Census. The 2019 Census was sent via email to 18,832 school food authorities (SFAs), including all public, private, and charter SFAs, as well as residential care institutions, participating in the National School Lunch Program. The questionnaire collected data on local food purchasing, edible school gardens, other farm to school activities and policies, and evidence of economic and nutritional impacts of participating in farm to school activities. A total of 12,634 SFAs completed usable responses to the 2019 Census. Version 2 adds the weight variable, "nrweight", which is the non-response weight.
Processing methods and equipment used: The 2019 Census was administered solely via the web. The study team cleaned the raw data to ensure the data were as correct, complete, and consistent as possible. This process involved examining the data for logical errors, contacting SFAs and consulting official records to update some implausible values, and setting the remaining implausible values to missing. The study team linked the 2019 Census data to information from the National Center for Education Statistics (NCES) Common Core of Data (CCD). Records from the CCD were used to construct a measure of urbanicity, which classifies the area in which schools are located.
Study date(s) and duration: Data collection occurred from September 9 to December 31, 2019. Questions asked about activities prior to, during and after SY 2018-19. The 2019 Census asked SFAs whether they currently participated in, had ever participated in or planned to participate in any of 30 farm to school activities. An SFA that participated in any of the defined activities in the 2018-19 school year received further questions.
Study spatial scale (size of replicates and spatial scale of study area): Respondents to the survey included SFAs from all 50 States as well as American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and Washington, DC.
Level of true replication: Unknown
Sampling precision (within-replicate sampling or pseudoreplication): No sampling was involved in the collection of this data.
Level of subsampling (number and repeat or within-replicate sampling): No sampling was involved in the collection of this data.
Study design (before-after, control-impacts, time series, before-after-control-impacts): None; non-experimental.
Description of any data manipulation, modeling, or statistical analysis undertaken: Each entry in the dataset contains SFA-level responses to the Census questionnaire for SFAs that responded. This file includes information from only SFAs that clicked "Submit" on the questionnaire. (The dataset used to create the 2019 Farm to School Census Report includes additional SFAs that answered enough questions for their response to be considered usable.) In addition, the file contains constructed variables used for analytic purposes. The file does not include weights created to produce national estimates for the 2019 Farm to School Census Report. The dataset identifies SFAs, but to protect individual privacy the file does not include any information for the individual who completed the questionnaire.
Description of any gaps in the data or other limiting factors: See the full 2019 Farm to School Census Report [https://www.fns.usda.gov/cfs/farm-school-census-and-comprehensive-review] for a detailed explanation of the study's limitations.
Outcome measurement methods and equipment used: None
Resources in this dataset:
Resource Title: 2019 Farm to School Codebook with Weights. File Name: Codebook_Update_02SEP21.xlsx
Resource Title: 2019 Farm to School Data with Weights CSV. File Name: census2019_public_use_with_weight.csv
Resource Title: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets. File Name: Farm_to_School_Data_AgDataCommons_SAS_SPSS_R_STATA_with_weight.zip
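Version 2 adds the non-response weight "nrweight", which can be applied when summarizing responses from the public-use file. Below is a minimal pandas sketch; the CSV name is taken from the resource list above, but the 0/1 participation indicator is a hypothetical placeholder, so check Codebook_Update_02SEP21.xlsx for the documented variable names.

```python
import pandas as pd

# File name taken from the resource list above.
df = pd.read_csv("census2019_public_use_with_weight.csv")

# "participates_f2s" is a hypothetical 0/1 indicator of farm to school
# participation; replace it with the variable documented in the codebook.
indicator = "participates_f2s"

# Compare the unweighted share of responding SFAs with the
# non-response-weighted share.
unweighted = df[indicator].mean()
weighted = (df[indicator] * df["nrweight"]).sum() / df["nrweight"].sum()
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```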
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multivariable analysis of the factors associated with neonatal near-miss among neonates admitted to public hospitals of the Dire Dawa Administration, Eastern Ethiopia, 2021.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de441876
Abstract (en): This collection contains an array of economic time series data pertaining to the United States, the United Kingdom, Germany, and France, primarily between the 1920s and the 1960s, and including some time series from the 18th and 19th centuries. These data were collected by the National Bureau of Economic Research (NBER), and they constitute a research resource of importance to economists as well as to political scientists, sociologists, and historians. Under a grant from the National Science Foundation, ICPSR and the National Bureau of Economic Research converted this collection (which existed heretofore only on handwritten sheets stored in New York) into fully accessible, readily usable, and completely documented machine-readable form. The NBER collection -- containing an estimated 1.6 million entries -- is divided into 16 major categories: (1) construction, (2) prices, (3) security markets, (4) foreign trade, (5) income and employment, (6) financial status of business, (7) volume of transactions, (8) government finance, (9) distribution of commodities, (10) savings and investments, (11) transportation and public utilities, (12) stocks of commodities, (13) interest rates, (14) indices of leading, coincident, and lagging indicators, (15) money and banking, and (16) production of commodities. Data from all categories are available in Parts 1-22. The economic variables are usually observations on the entire nation or large subsets of the nation. Frequently, however, and especially in the United States, separate regional and metropolitan data are included in other variables. This makes cross-sectional analysis possible in many cases. The time span of variables in these files may be as short as one year or as long as 160 years. Most data pertain to the first half of the 20th century. Many series, however, extend into the 19th century, and a few reach into the 18th. The oldest series, covering brick production in England and Wales, begins in 1785, and the most recent United States data extend to 1968. The unit of analysis is an interval of time -- a year, a quarter, or a month. The bulk of observations are monthly, and most series of monthly data contain annual values or totals. ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes. Time series of economic statistics pertaining to France, Germany, the United Kingdom, and the United States between 1785 and 1968. 2007-03-26: This study, updated from OSIRIS, now includes SAS, SPSS, and Stata setup files, SAS transport (XPORT) files, SPSS portable files, Stata system files, and an updated codebook. Funding institution(s): National Science Foundation. The data were collected between the 1920s and the 1970s, but the exact start and end dates are unclear from the documentation.
Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data has been collected from more than 4,000 farms and covers more than 20 different crops in 46 countries. The data (except USA data and data for barley in the UK, Germany, Poland, Czech Republic, France and Spain) was collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. It can be used as a benchmark for crop yield and input efficiency.
National coverage
Agricultural holdings
Sample survey data [ssd]
A. Sample design: Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta and the benchmark farms were randomly selected by Kynetec within the same cluster.
B. Sample size: Sample sizes for each cluster are determined with the aim of measuring statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions regarding the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. In addition, growers are also grouped in clusters as a means of keeping variances under control, as well as distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20. The optimal number of reference farms is 10 of 20 (balanced sample).
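The documentation does not give Kynetec's actual sample-size formula, but the stated logic (smaller expected increases require larger samples) matches the standard power calculation for detecting a difference in means. The sketch below is a generic illustration under assumed effect sizes and variability, not the method used for the Good Growth Plan clusters.

```python
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate sample size per group needed to detect a mean increase
    `delta` given standard deviation `sigma` (two-sided z-test)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2

# Hypothetical yield increases (t/ha): the smaller the expected increase,
# the larger the required sample per cluster.
for delta in (0.5, 0.3, 0.1):
    print(delta, round(n_per_group(delta, sigma=0.8)))
```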
C. Selection procedure: The respondents were picked randomly using a “quota based random sampling” procedure. Growers were first randomly selected and then checked for compliance with the quotas for crops, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to do a maximum of 5 interviews in one village.
Benchmark farms (BF) screened in Indonesia were selected based on the following criteria:
(a) Corn growers in East Java
- Location: East Java (Kediri and Probolinggo) and Aceh
- Innovative (early adopter); Progressive (keen to learn about agronomy and pests; willing to try new technology); Loyal (loyal to technology that can help them)
- has technical drainage in place (an irrigation system)
- marketing network for corn: post-harvest access to market (generally they sell 80% of their harvest)
- mid-tier (sub-optimal CP/SE use)
- influenced by fellow farmers and retailers
- may need longer credit
(b) Rice growers in West and East Java
- Location: West Java (Tasikmalaya), East Java (Kediri), Central Java (Blora, Cilacap, Kebumen), South Lampung
- The growers are progressive (keen to learn about agronomy and pests; willing to try new technology)
- Accustomed to using farming equipment and pesticides
- Long experience cultivating rice in the area
- willing to move forward in order to increase productivity (same as progressive)
- has land broad enough for the upcoming project
- have influence in his group (ability to influence others)
- mid-tier (sub-optimal CP/SE use)
- may need longer credit
Face-to-face [f2f]
Data collection tool for 2019 covered the following information:
(A) PRE- HARVEST INFORMATION
PART I: Screening
PART II: Contact Information
PART III: Farm Characteristics
  a. Biodiversity conservation
  b. Soil conservation
  c. Soil erosion
  d. Description of growing area
  e. Training on crop cultivation and safety measures
PART IV: Farming Practices - Before Harvest
  a. Planting and fruit development - Field crops
  b. Planting and fruit development - Tree crops
  c. Planting and fruit development - Sugarcane
  d. Planting and fruit development - Cauliflower
  e. Seed treatment
(B) HARVEST INFORMATION
PART V: Farming Practices - After Harvest
  a. Fertilizer usage
  b. Crop protection products
  c. Harvest timing & quality per crop - Field crops
  d. Harvest timing & quality per crop - Tree crops
  e. Harvest timing & quality per crop - Sugarcane
  f. Harvest timing & quality per crop - Banana
  g. After harvest
PART VI - Other inputs - After Harvest
  a. Input costs
  b. Abiotic stress
  c. Irrigation
See all questionnaires in external materials tab
Data processing:
Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data is entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data is verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data is submitted to the global Kynetec headquarters for processing. If missing values or inconsistencies remain, the local Kynetec office is re-contacted to clarify and resolve issues.
Quality assurance: Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high quality data.
• Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.
• Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria, and updated where needed.
• Briefing of interviewers: Each year, local interviewers - familiar with the local context of farming - are thoroughly briefed so that they fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.
• Cross-validation of the answers (see the sketch after this list):
o Kynetec captures all growers' responses through a digital data-entry tool. Various logical and consistency checks are automated in this tool (e.g. total crop size in hectares cannot be larger than farm size)
o Kynetec cross-validates the answers of the growers in three different ways:
1. Within the grower (check if growers respond consistently during the interview)
2. Across years (check if growers respond consistently throughout the years)
3. Within cluster (compare a grower's responses with those of others in the group)
o All the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data is updated after verification. All updates are tracked.
• Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.
• Sensitivity analysis: sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.
• It is recommended that users interested in using the administrative level 1 variable in the location dataset use this variable with care and crosscheck it with the postal code variable.
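To make the automated logical checks described under “Cross-validation of the answers” concrete, here is a minimal sketch of the kind of rule a digital data-entry tool could enforce. The crop-size-versus-farm-size rule comes from the text above; the column names and values are assumptions.

```python
import pandas as pd

# Hypothetical farm-level records; column names and values are assumptions.
farms = pd.DataFrame({
    "grower_id": [101, 102, 103],
    "farm_size_ha": [10.0, 5.0, 8.0],
    "total_crop_size_ha": [9.5, 6.2, 8.0],
})

# Logical check from the text: total crop size in hectares cannot be
# larger than the farm size.
flagged = farms[farms["total_crop_size_ha"] > farms["farm_size_ha"]]
print(flagged)  # growers flagged here would be re-contacted for verification
```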
Due to the above-mentioned checks, irregularities in the fertilizer usage data were discovered and had to be corrected:
For the 2014 data collection wave, respondents were asked to give a total estimate of the fertilizer NPK rates applied in the fields. From 2015 onwards, the questionnaire was redesigned to be more precise and to obtain data by individual fertilizer product. The new method of measuring fertilizer inputs leads to more accurate results, but also makes year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.
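A minimal sketch of the weighted-average re-estimation described above. The yearly NPK values and the weights are placeholders: the documentation states that a weighted average of later years was used but does not specify the weighting scheme.

```python
# NPK usage (kg/ha) reported under the redesigned questionnaire;
# years and values are placeholders.
npk_by_year = {2015: 180.0, 2016: 172.0, 2017: 165.0}

# Hypothetical weights favouring the years closest to 2014; the actual
# weights used by Kynetec are not documented here.
weights = {2015: 0.5, 2016: 0.3, 2017: 0.2}

npk_2014_estimate = (
    sum(npk_by_year[year] * weights[year] for year in npk_by_year)
    / sum(weights.values())
)
print(round(npk_2014_estimate, 1))  # re-estimated 2014 NPK input
```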
Syngenta is committed to increasing crop productivity and to using limited resources such as land, water and inputs more efficiently. Since 2014, Syngenta has been measuring trends in agricultural input efficiency on a global network of real farms. The Good Growth Plan dataset shows aggregated productivity and resource efficiency indicators by harvest year. The data has been collected from more than 4,000 farms and covers more than 20 different crops in 46 countries. The data (except USA data and data for barley in the UK, Germany, Poland, Czech Republic, France and Spain) was collected, consolidated and reported by Kynetec (previously Market Probe), an independent market research agency. It can be used as a benchmark for crop yield and input efficiency.
National Coverage
Agricultural holdings
Sample survey data [ssd]
A. Sample design: Farms are grouped in clusters, which represent a crop grown in an area with homogeneous agro-ecological conditions and include comparable types of farms. The sample includes reference and benchmark farms. The reference farms were selected by Syngenta and the benchmark farms were randomly selected by Kynetec within the same cluster.
B. Sample size: Sample sizes for each cluster are determined with the aim of measuring statistically significant increases in crop efficiency over time. This is done by Kynetec based on target productivity increases and assumptions regarding the variability of farm metrics in each cluster. The smaller the expected increase, the larger the sample size needed to measure significant differences over time. Variability within clusters is assumed based on public research and expert opinion. In addition, growers are also grouped in clusters as a means of keeping variances under control, as well as distinguishing between growers in terms of crop size, region and technological level. A minimum sample size of 20 interviews per cluster is needed. The minimum number of reference farms is 5 of 20. The optimal number of reference farms is 10 of 20 (balanced sample).
C. Selection procedure: The respondents were picked randomly using a “quota based random sampling” procedure. Growers were first randomly selected and then checked for compliance with the quotas for crops, region, farm size, etc. To avoid clustering a high number of interviews at one sampling point, interviewers were instructed to do a maximum of 5 interviews in one village.
Benchmark farms (BF) screened in Paraguay were selected based on the following criteria:
(a) smallholder soybean growers
Medium to high technology farms
Regions: - Hohenau (Itapúa) - Edelira (Itapúa) - Pirapó (Itapúa) - La Paz (Itapúa) - Naranjal (Alto Paraná) - San Cristóbal (Alto Paraná)
corn and soybean in rotation
corn is grown first, followed by soybean
(b) smallholder maize growers
Medium to high technology farms
Regions: - Hohenau (Itapúa) - Edelira (Itapúa) - Pirapó (Itapúa) - La Paz (Itapúa) - Naranjal (Alto Paraná) - San Cristóbal (Alto Paraná)
corn and soybean in rotation
corn is grown first, followed by soybean
Face-to-face [f2f]
Data collection tool for 2019 covered the following information:
(A) PRE- HARVEST INFORMATION
PART I: Screening
PART II: Contact Information
PART III: Farm Characteristics
  a. Biodiversity conservation
  b. Soil conservation
  c. Soil erosion
  d. Description of growing area
  e. Training on crop cultivation and safety measures
PART IV: Farming Practices - Before Harvest
  a. Planting and fruit development - Field crops
  b. Planting and fruit development - Tree crops
  c. Planting and fruit development - Sugarcane
  d. Planting and fruit development - Cauliflower
  e. Seed treatment
(B) HARVEST INFORMATION
PART V: Farming Practices - After Harvest
  a. Fertilizer usage
  b. Crop protection products
  c. Harvest timing & quality per crop - Field crops
  d. Harvest timing & quality per crop - Tree crops
  e. Harvest timing & quality per crop - Sugarcane
  f. Harvest timing & quality per crop - Banana
  g. After harvest
PART VI - Other inputs - After Harvest
  a. Input costs
  b. Abiotic stress
  c. Irrigation
See all questionnaires in external materials tab
Data processing:
Kynetec uses SPSS (Statistical Package for the Social Sciences) for data entry, cleaning, analysis, and reporting. After collection, the farm data is entered into a local database, reviewed, and quality-checked by the local Kynetec agency. In the case of missing values or inconsistencies, farmers are re-contacted. In some cases, grower data is verified with local experts (e.g. retailers) to ensure data accuracy and validity. After country-level cleaning, the farm-level data is submitted to the global Kynetec headquarters for processing. If missing values or inconsistencies remain, the local Kynetec office is re-contacted to clarify and resolve issues.
B. Quality assurance: Various consistency checks and internal controls are implemented throughout the entire data collection and reporting process in order to ensure unbiased, high quality data.
• Screening: Each grower is screened and selected by Kynetec based on cluster-specific criteria to ensure a comparable group of growers within each cluster. This helps keep variability low.
• Evaluation of the questionnaire: The questionnaire aligns with the global objective of the project and is adapted to the local context (e.g. interviewers and growers should understand what is asked). Each year the questionnaire is evaluated based on several criteria, and updated where needed.
• Briefing of interviewers: Each year, local interviewers - familiar with the local context of farming - are thoroughly briefed so that they fully comprehend the questionnaire and obtain unbiased, accurate answers from respondents.
• Cross-validation of the answers:
o Kynetec captures all growers' responses through a digital data-entry tool. Various logical and consistency checks are automated in this tool (e.g. total crop size in hectares cannot be larger than farm size)
o Kynetec cross-validates the answers of the growers in three different ways:
1. Within the grower (check if growers respond consistently during the interview)
2. Across years (check if growers respond consistently throughout the years)
3. Within cluster (compare a grower's responses with those of others in the group)
o All the above-mentioned inconsistencies are followed up by contacting the growers and asking them to verify their answers. The data is updated after verification. All updates are tracked.
• Check and discuss evolutions and patterns: Global evolutions are calculated, discussed and reviewed on a monthly basis jointly by Kynetec and Syngenta.
• Sensitivity analysis: sensitivity analysis is conducted to evaluate the global results in terms of outliers, retention rates and overall statistical robustness. The results of the sensitivity analysis are discussed jointly by Kynetec and Syngenta.
• It is recommended that users interested in using the administrative level 1 variable in the location dataset use this variable with care and crosscheck it with the postal code variable.
Due to the above-mentioned checks, irregularities in the fertilizer usage data were discovered and had to be corrected:
For the 2014 data collection wave, respondents were asked to give a total estimate of the fertilizer NPK rates applied in the fields. From 2015 onwards, the questionnaire was redesigned to be more precise and to obtain data by individual fertilizer product. The new method of measuring fertilizer inputs leads to more accurate results, but also makes year-on-year comparison difficult. After evaluating several solutions to this problem, 2014 fertilizer usage (NPK input) was re-estimated by calculating a weighted average of fertilizer usage in the following years.