Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: open-ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses, for an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails, and opt-outs (n = 5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
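The reported rates can be reproduced arithmetically. The size of the invited sample is not stated in this excerpt, so the figure below (~158,700 contacted authors) is an assumption chosen to match the reported response rates; note also that 2,492/3,632 ≈ 68.6%, so the stated completion rate appears to be computed on the retained responses.

```python
# Sketch of the response-rate arithmetic; `invited` is an assumed figure,
# not a number taken from the dataset description.
invited = 158_700    # assumed number of contacted corresponding authors
received = 3_632     # responses received
complete = 2_492     # complete responses retained in the dataset
invalid = 5_201      # invalid emails, bounced emails, and opt-outs

completion_rate = complete / received             # ~68.6%
uncorrected_rate = complete / invited             # ~1.57%
corrected_rate = complete / (invited - invalid)   # ~1.62%
```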
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
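A minimal sketch of this kind of recoding step, shown here in pandas rather than Excel/SPSS: the column names and category codes below are illustrative inventions, and only the 999 "not asked" code is documented for this dataset.

```python
import pandas as pd

# Hypothetical raw survey export; variable names are illustrative only.
raw = pd.DataFrame({
    "reuse_freq": ["Never", "Sometimes", "Often", None],
    "citation_methods": ["DOI;URL", "DOI", None, "URL"],
})

# Recode an ordinal item to integer codes.
ordinal_map = {"Never": 1, "Sometimes": 2, "Often": 3}
raw["reuse_freq_c"] = raw["reuse_freq"].map(ordinal_map)

# Split a multiple-choice item into binary indicator columns.
dummies = raw["citation_methods"].str.get_dummies(sep=";")
coded = pd.concat([raw, dummies], axis=1)

# Mark items a respondent was never shown with the documented 999 code.
coded["reuse_freq_c"] = coded["reuse_freq_c"].fillna(999).astype(int)
```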
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in coded CSV format. The codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 = Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
This dataset originates from a series of experimental studies titled "Tough on People, Tolerant to AI? Differential Effects of Human vs. AI Unfairness on Trust." The project investigates how individuals respond to unfair behavior (distributive, procedural, and interactional unfairness) enacted by artificial intelligence versus human agents, and how such behavior affects cognitive and affective trust.

1 Experiment 1a: The Impact of AI vs. Human Distributive Unfairness on Trust
Overview: This dataset comes from an experimental study examining how individuals respond, in terms of cognitive and affective trust, when distributive unfairness is enacted by either an artificial intelligence (AI) agent or a human decision-maker. Experiment 1a focuses on the main effect of the type of decision-maker on trust.
Data Generation and Processing: The data were collected through Credamo, an online survey platform. Initially, 98 responses were gathered from students at a university in China. Additional student participants were recruited via Credamo to supplement the sample. Attention check items were embedded in the questionnaire, and participants who failed them were automatically excluded in real time. Data collection continued until 202 valid responses were obtained. SPSS software was used for data cleaning and analysis.
Data Structure and Format: The data file is named "Experiment1a.sav" and is in SPSS format. It contains 28 columns and 202 rows, where each row corresponds to one participant. Columns represent measured variables, including: grouping and randomization variables; one manipulation check item; four items measuring distributive fairness perception; six items on cognitive trust; five items on affective trust; three items for honesty checks; and four demographic variables (gender, age, education, and grade level). The final three columns contain computed means for distributive fairness, cognitive trust, and affective trust.
Additional Information: No missing data are present. All variable names are labeled in English abbreviations to facilitate further analysis. The dataset can be directly opened in SPSS or exported to other formats.

2 Experiment 1b: The Mediating Role of Perceived Ability and Benevolence (Distributive Unfairness)
Overview: This dataset originates from an experimental study designed to replicate the findings of Experiment 1a and further examine the potential mediating roles of perceived ability and perceived benevolence.
Data Generation and Processing: Participants were recruited via the Credamo online platform. Attention check items were embedded in the survey to ensure data quality. Data were collected using a rolling recruitment method, with invalid responses removed in real time. A total of 228 valid responses were obtained.
Data Structure and Format: The dataset is stored in a file named Experiment1b.sav in SPSS format and can be directly opened in SPSS software. It consists of 228 rows and 40 columns. Each row represents one participant's data record, and each column corresponds to a different measured variable. Specifically, the dataset includes: random assignment and grouping variables; one manipulation check item; four items measuring perceived distributive fairness; six items on perceived ability; five items on perceived benevolence; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The last five columns contain the computed mean scores for perceived distributive fairness, ability, benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be analyzed directly in SPSS or exported to other formats as needed.

3 Experiment 2a: Differential Effects of AI vs. Human Procedural Unfairness on Trust
Overview: This dataset originates from an experimental study examining whether individuals respond differently, in terms of cognitive and affective trust, when procedural unfairness is enacted by artificial intelligence versus human decision-makers. Experiment 2a focuses on the main effect of the decision agent on trust outcomes.
Data Generation and Processing: Participants were recruited via the Credamo online survey platform from two universities located in different regions of China. A total of 227 responses were collected. After excluding those who failed the attention check items, 204 valid responses were retained for analysis. Data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2a.sav in SPSS format and can be directly opened in SPSS software. It contains 204 rows and 30 columns. Each row represents one participant's response record, while each column corresponds to a specific variable. Variables include: random assignment and grouping; one manipulation check item; seven items measuring perceived procedural fairness; six items on cognitive trust; five items on affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for procedural fairness, cognitive trust, and affective trust.
Additional Notes: The dataset contains no missing values. All variables are labeled using standardized English abbreviations to facilitate reuse and secondary analysis. The file can be directly analyzed in SPSS or exported to other formats as needed.

4 Experiment 2b: The Mediating Role of Perceived Ability and Benevolence (Procedural Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 2a and to further examine the potential mediating roles of perceived ability and perceived benevolence in shaping trust responses under procedural unfairness.
Data Generation and Processing: Participants were working adults recruited through the Credamo online platform. A rolling data collection strategy was used, with responses failing attention checks excluded in real time. The final dataset includes 235 valid responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment2b.sav in SPSS format and can be directly opened using SPSS software. It contains 235 rows and 43 columns. Each row corresponds to a single participant, and each column represents a specific measured variable. These include: random assignment and group labels; one manipulation check item; seven items measuring procedural fairness; six items for perceived ability; five items for perceived benevolence; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, and education). The final five columns contain the computed average scores for procedural fairness, perceived ability, perceived benevolence, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variables are labeled using standardized English abbreviations to support future reuse and secondary analysis. The dataset can be directly analyzed in SPSS and easily converted to other formats if needed.

5 Experiment 3a: Effects of AI vs. Human Interactional Unfairness on Trust
Overview: This dataset comes from an experimental study investigating how interactional unfairness, when enacted by either artificial intelligence or human decision-makers, influences individuals' cognitive and affective trust. Experiment 3a focuses on the main effect of the decision-maker type under interactional unfairness conditions.
Data Generation and Processing: Participants were college students recruited from two universities in different regions of China through the Credamo survey platform. After excluding responses that failed attention checks, 203 valid cases were retained from an initial pool of 223 responses. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment3a.sav, in SPSS format and compatible with SPSS software. It contains 203 rows and 27 columns. Each row represents a single participant, while each column corresponds to a specific measured variable. These include: random assignment and condition labels; one manipulation check item; four items measuring interactional fairness perception; six items for cognitive trust; five items for affective trust; three attention check items; and three demographic variables (gender, age, and education). The final three columns contain computed average scores for interactional fairness, cognitive trust, and affective trust.
Additional Notes: There are no missing values in the dataset. All variable names are provided using standardized English abbreviations to facilitate secondary analysis. The data can be directly analyzed using SPSS and exported to other formats as needed.

6 Experiment 3b: The Mediating Role of Perceived Ability and Benevolence (Interactional Unfairness)
Overview: This dataset comes from an experimental study designed to replicate the findings of Experiment 3a and further examine the potential mediating roles of perceived ability and perceived benevolence under conditions of interactional unfairness.
Data Generation and Processing: Participants were working adults recruited via the Credamo platform. Attention check questions were embedded in the survey, and responses that failed these checks were excluded in real time. Data collection proceeded in a rolling manner until 227 valid responses were obtained. All data were processed and analyzed using SPSS software.
Data Structure and Format: The dataset is stored in a file named Experiment3b.sav, in SPSS format and compatible with SPSS software. It includes 227 rows and
https://qdr.syr.edu/policies/qdr-standard-access-conditions
This is an Annotation for Transparent Inquiry (ATI) data project. The annotated article can be viewed on the Publisher's Website.

Data Generation
The research project engages a story about perceptions of fairness in criminal justice decisions. The specific focus involves a debate between ProPublica, a news organization, and Northpointe, the owner of a popular risk tool called COMPAS. ProPublica wrote that COMPAS was racist against blacks, while Northpointe posted online a reply rejecting such a finding. These two documents were the obvious foci of the qualitative analysis because of the further media attention they attracted, the confusion their competing conclusions caused readers, and the power both companies wield in public circles. There were no barriers to retrieval, as both documents have been publicly available on their corporate websites. This public access was one of the motivators for choosing them, as it meant that they were also easily attainable by the general public, thus extending the documents' reach and impact. Additional materials from ProPublica relating to the main debate were also freely downloadable from its website and a third-party, open-source platform. Access to secondary source materials comprising additional writings from Northpointe representatives that could assist in understanding Northpointe's main document, though, was more limited. Because of a claim of trade secrets on its tool and the underlying algorithm, it was more difficult to reach Northpointe's other reports. Nonetheless, largely because its clients are governmental bodies with transparency and accountability obligations, some Northpointe-associated reports were retrievable from third parties who had obtained them, largely through Freedom of Information Act queries. Together, the primary and (retrievable) secondary sources allowed for a triangulation of themes, arguments, and conclusions.
The quantitative component uses a dataset of over 7,000 individuals with information that was collected and compiled by ProPublica and made available to the public on GitHub. ProPublica's gathering of the data directly from criminal justice officials via Freedom of Information Act requests rendered the dataset in the public domain, and thus no confidentiality issues are present. The dataset was loaded into SPSS v. 25 for data analysis.

Data Analysis
The qualitative enquiry used critical discourse analysis, which investigates ways in which parties in their communications attempt to create, legitimate, rationalize, and control mutual understandings of important issues. Each of the two main discourse documents was parsed on its own merit. Yet the project was also intertextual in studying how the discourses correspond with each other and to other relevant writings by the same authors. Several more specific types of discursive strategies attracted further critical examination:
- Testing claims and rationalizations that appear to serve the speaker's self-interest
- Examining conclusions and determining whether sufficient evidence supported them
- Revealing contradictions and/or inconsistencies within the same text and intertextually
- Assessing strategies underlying justifications and rationalizations used to promote a party's assertions and arguments
- Noticing strategic deployment of lexical phrasings, syntax, and rhetoric
- Judging sincerity of voice and the objective consideration of alternative perspectives
Of equal importance in a critical discourse analysis is consideration of what is not addressed, that is, uncovering facts and/or topics missing from the communication. For this project, this included parsing issues that were either briefly mentioned and then neglected, asserted with their significance left unstated, or not suggested at all. This task required understanding common practices in the algorithmic data science literature.
The paper could have been completed with the critical discourse analysis alone. However, because one of its salient findings was that the discourses overlooked numerous definitions of algorithmic fairness, the call to fill this gap seemed obvious. The availability of the same dataset used by the parties in conflict made this opportunity more appealing, since calculating additional algorithmic equity equations would not be troubled by irregularities arising from diverse sample sets. New variables were created as relevant to calculate algorithmic fairness equations. In addition to various SPSS Analyze functions (e.g., regression, crosstabs, means), online statistical calculators were useful to compute z-test comparisons of proportions and t-test comparisons of means.

Logic of Annotation
Annotations were employed to fulfil a variety of functions, including supplementing the main text with context, observations, counter-points, analysis, and source attributions. These fall under a few categories. Space considerations. Critical discourse analysis offers a rich method...
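The z-test comparisons of proportions mentioned above can also be reproduced without an online calculator. The sketch below uses the standard pooled two-proportion z-test; the counts are hypothetical illustrations, not figures from the ProPublica dataset.

```python
import math

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided pooled z-test for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, pval

# Hypothetical counts: e.g., high-risk classifications in two groups.
z, p = two_proportion_ztest(805, 1795, 349, 1139)
```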
https://www.marketreportanalytics.com/privacy-policy
The Structural Equation Modeling (SEM) software market is experiencing robust growth, driven by increasing adoption across diverse sectors like education, healthcare, and the social sciences. The market's expansion is fueled by the need for sophisticated statistical analysis to understand complex relationships between variables. Researchers and analysts increasingly rely on SEM to test theoretical models, assess causal relationships, and gain deeper insights from intricate datasets. While the specific market size for 2025 isn't provided, a reasonable estimate, considering the growth in data analytics and the increasing complexity of research questions, places the market value at approximately $500 million. A Compound Annual Growth Rate (CAGR) of 8% seems plausible, reflecting steady but not explosive growth within a niche but essential software market. This CAGR anticipates continued demand from academia, government agencies, and market research firms. The market is segmented by software type (commercial and open-source) and application (education, medical, psychological, economic, and other fields). Commercial software currently dominates the market due to its advanced features and professional support; however, the open-source segment shows strong potential for growth, particularly within academic settings and among researchers with limited budgets. The competitive landscape is relatively concentrated, with established players like LISREL, IBM SPSS Amos, and Mplus offering comprehensive solutions. However, the emergence of open-source packages such as semopy (Python) and lavaan (R) demonstrates an ongoing shift toward flexible, programmable SEM software, potentially increasing market competition and innovation in the years to come. Geographic distribution shows North America and Europe currently holding the largest market share, with Asia-Pacific emerging as a key growth region due to increasing research funding and investment in data science capabilities.
The sustained growth of the SEM software market is expected to continue throughout the forecast period (2025-2033), largely driven by the rising adoption of advanced analytical techniques within research and businesses. Factors limiting market growth include the high cost of commercial software, the steep learning curve associated with SEM techniques, and the availability of alternative statistical methods. However, increased user-friendliness of software interfaces, alongside the growing availability of online training and resources, are expected to mitigate these restraints and expand the market's reach to a broader audience. Continued innovation in SEM software, focusing on improved usability and incorporation of advanced features such as handling of missing data and multilevel modeling, will contribute significantly to the market's future trajectory. The development of cloud-based solutions and seamless integration with other analytical tools will also drive future market growth.
Data from: Doctoral dissertation; preprint article entitled "Managers' and physicians' perception of palm vein technology adoption in the healthcare industry." Formats of the files associated with the dataset: CSV; SAV. SPSS setup files can be used to generate native SPSS file formats such as SPSS system files and SPSS portable files. SPSS setup files generally include the following SPSS sections:
DATA LIST: Assigns the name, type, decimal specification (if any), and specifies the beginning and ending column locations for each variable in the data file. Users must replace the "physical-filename" with host computer-specific input file specifications. For example, users on Windows platforms should replace "physical-filename" with "C:\06512-0001-Data.txt" for the data file named "06512-0001-Data.txt" located on the root directory "C:".
VARIABLE LABELS: Assigns descriptive labels to all variables. Variable labels and variable names may be identical for some variables.
VALUE LABELS: Assigns descriptive labels to codes in the data file. Not all variables necessarily have assigned value labels.
MISSING VALUES: Declares user-defined missing values. Not all variables in the data file necessarily have user-defined missing values. These values can be treated specially in data transformations, statistical calculations, and case selection.
MISSING VALUE RECODE: Sets user-defined numeric missing values to missing as interpreted by the SPSS system. Only variables with user-defined missing values are included in the statements.
ABSTRACT: The purpose of the article is to examine the factors that influence the adoption of palm vein technology by considering healthcare managers' and physicians' perceptions, using the Unified Theory of Acceptance and Use of Technology as the theoretical foundation. This study used a quantitative approach with an exploratory research design.
A cross-sectional questionnaire was distributed to respondents who were managers and physicians in the healthcare industry and who had previous experience with palm vein technology. The perceived factors tested for correlation with adoption were perceived usefulness, complexity, security, peer influence, and relative advantage. A Pearson product-moment correlation coefficient was used to test the correlation between the perceived factors and palm vein technology adoption. The results showed that perceived usefulness, security, and peer influence are important factors for adoption. Study limitations included purposive sampling from a single industry (healthcare) and the limited literature available on managers' and physicians' perception of palm vein technology adoption in the healthcare industry. Future studies could examine the impact of mediating variables on palm vein technology adoption. The study offers managers insight into the important factors that need to be considered in adopting palm vein technology. With biometric technology becoming pervasive, the study seeks to provide managers with insight into managing the adoption of palm vein technology. KEYWORDS: biometrics, human identification, image recognition, palm vein authentication, technology adoption, user acceptance, palm vein technology
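The SPSS setup-file sections described above might look like the following minimal sketch. The variable names, column positions, and value codes are illustrative inventions, not taken from the actual setup files for this dataset.

```spss
* Illustrative SPSS setup sketch; names, columns, and codes are hypothetical.
DATA LIST FILE="physical-filename" FIXED
  /ID 1-4  USEFUL 5  SECURE 6  PEERINF 7  ADOPT 8.
VARIABLE LABELS
  USEFUL  "Perceived usefulness"
  SECURE  "Perceived security"
  PEERINF "Peer influence"
  ADOPT   "Palm vein technology adoption".
VALUE LABELS
  USEFUL TO ADOPT  1 "Strongly disagree"  5 "Strongly agree".
MISSING VALUES USEFUL TO ADOPT (9).
RECODE USEFUL TO ADOPT (9=SYSMIS).
EXECUTE.
```

Running a setup file like this against the raw fixed-width text file produces a native SPSS dataset with labeled variables, labeled values, and user-defined missing values declared.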
Background and Objectives: Pharmacogenomics (PGx) leverages genomic information to tailor drug therapies, enhancing precision medicine. Despite global advancements, its implementation in Lebanon, Qatar, and Saudi Arabia faces unique challenges in clinical integration. This study aimed to investigate PGx attitudes, knowledge, implementation, associated challenges, and future educational needs, and to compare findings across the three countries.
Methods: This cross-sectional study utilized an anonymous, self-administered online survey distributed to healthcare professionals, academics, and clinicians in Lebanon, Qatar, and Saudi Arabia. The survey comprised 18 questions to assess participants' familiarity with PGx, current implementation practices, perceived obstacles, potential integration strategies, and future educational needs.
Results: The survey yielded 337 responses from healthcare professionals across the three countries. Data revealed significant variations in PGx familiarity an...

Ethical statement and informed consent
Ethical approval for this study was obtained from the institutional review boards of the participating universities: Beirut Arab University (2023-H-0153-HS-R-0545), Qatar University (QU-IRB 1995-E/23), and Alfaisal University (IRB-20270). Informed consent was obtained from all participants online, ensuring their confidentiality and the right to withdraw from the study without any consequences. Participants were informed that all collected data would be anonymous and confidential, with only the principal investigator having access to the data. Completing and submitting the survey was considered an agreement to participate.

Study design
This study utilized a quantitative cross-sectional research design, involving healthcare professionals (pharmacists, nurses, medical laboratory technologists), university academics, and clinicians from Lebanon, Qatar, and Saudi Arabia. Data were collected through a voluntary, anonymous, private survey to gather PGx per...

Integrating pharmacogenomics in three Middle Eastern countries' healthcare (Lebanon, Qatar, and Saudi Arabia)
Description of the data set:
- 1 dataset is included, PGx_database; it includes the raw data of our paper.
- In the data set, each row represents one participant.
- All the variables can contain empty cells. When participants didn't answer, empty cells were added to show the missing data.
- The number in each cell has a specific value depending on the variable.
Listed variables:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background and purpose: Hydrocephalus is a frequent complication following subarachnoid hemorrhage. Few studies have investigated the association between laboratory parameters and shunt-dependent hydrocephalus. This study aimed to investigate the variations of laboratory parameters after subarachnoid hemorrhage and to identify laboratory parameters predictive of shunt-dependent hydrocephalus.
Methods: Multiple imputation was performed to fill the missing laboratory data using Bayesian methods in SPSS. We used univariate and multivariate Cox regression analyses to calculate hazard ratios for shunt-dependent hydrocephalus based on clinical and laboratory factors. The area under the receiver operating characteristic curve was used to determine the laboratory risk values predicting shunt-dependent hydrocephalus.
Results: We included 181 participants with a mean age of 54.4 years. Higher sodium (hazard ratio, 1.53; 95% confidence interval, 1.13–2.07; p = 0.005), lower potassium, and higher glucose levels were associated with a higher risk of shunt-dependent hydrocephalus. The receiver operating characteristic curve analysis showed that the areas under the curve of sodium, potassium, and glucose were 0.649 (cutoff value, 142.75 mEq/L), 0.609 (cutoff value, 3.04 mmol/L), and 0.664 (cutoff value, 140.51 mg/dL), respectively.
Conclusions: Despite the exploratory nature of this study, we found that higher sodium, lower potassium, and higher glucose levels were predictive of shunt-dependent hydrocephalus from postoperative day (POD) 1 to POD 12–16 after subarachnoid hemorrhage. Strict correction of electrolyte imbalance seems necessary to reduce shunt-dependent hydrocephalus. Further large studies are warranted to confirm our findings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: In sub-Saharan African countries, including Ghana, the malaria burden remains unacceptably high and is still a serious health challenge. Evaluating a community's level of knowledge, attitude, and practice (KAP) regarding malaria is essential to enabling appropriate preventive and control measures. This study aimed to evaluate knowledge of malaria, attitudes toward the disease, and adoption of control and prevention practices in some communities across the Eastern Region of Ghana.
Methods: A cross-sectional study was carried out in 13 communities across 8 districts from January to June 2020. Complete data on socio-demographic characteristics and KAP were obtained from 316 randomly selected household respondents using a structured, pre-tested questionnaire. Associations between KAP scores and socio-demographic profiles were tested by chi-square and binary logistic regression. Data analysis was done with SPSS version 26.0.
Results: Most respondents (85.4%) had a good knowledge score about malaria. The preferred place to seek treatment (50.6%) was the health center/clinic. All respondents indicated they would seek treatment within 24 hours. Mosquito coils were the preferred protection (58.9%) against mosquito bites. The majority of households (58.5%) had no bed nets, and bed net usage was poor (10.1%). Nearly half of the respondents (49.4%) had a positive attitude toward malaria, and 40.5% showed good practices. Chi-square analysis showed significant associations between gender and attitude scores (p = 0.033), and between educational status and practice scores (p = 0.023). Binary logistic regression analysis showed that 51–60-year-olds were less likely to have good knowledge (OR = 0.20, p = 0.04) than 15–20-year-olds. Respondents with complete basic schooling were less likely to have good knowledge (OR = 0.33, p = 0.04) than those with no formal schooling. A positive attitude was less likely in men (OR = 0.61, p = 0.04). Good malaria prevention practice was lower (OR = 0.30, p = 0.01) in participants with incomplete basic school education compared to those with no formal schooling.
Conclusion: Respondents' overall knowledge scores, though good, were not reflected in attitudes and levels of practice regarding malaria control and prevention. Behavioral change communication, preferably on radio, should be aimed at attitudes and practice toward the disease.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset is an expanded version of the popular "Sample - Superstore Sales" dataset, commonly used for introductory data analysis and visualization. It contains detailed transactional data for a US-based retail company, covering orders, products, and customer information.
This version is specifically designed for practicing Data Quality (DQ) and Data Wrangling skills, featuring a unique set of real-world "dirty data" problems (like those encountered in tools like SPSS Modeler, Tableau Prep, or Alteryx) that must be cleaned before any analysis or machine learning can begin.
This dataset combines the original Superstore data with 15,000 plausibly generated synthetic records, totaling 25,000 rows of transactional data. It includes 21 columns detailing:
- Order Information: Order ID, Order Date, Ship Date, Ship Mode.
- Customer Information: Customer ID, Customer Name, Segment.
- Geographic Information: Country, City, State, Postal Code, Region.
- Product Information: Product ID, Category, Sub-Category, Product Name.
- Financial Metrics: Sales, Quantity, Discount, and Profit.
This dataset is intentionally corrupted to provide a robust practice environment for data cleaning. Challenges include:
Missing/Inconsistent Values: Deliberate gaps in Profit and Discount, and multiple inconsistent entries (-- or blank) in the Region column.
Data Type Mismatches: Order Date and Ship Date are stored as text strings, and the Profit column is polluted with comma-formatted strings (e.g., "1,234.56"), forcing the entire column to be read as an object (string) type.
Categorical Inconsistencies: The Category field contains variations and typos like "Tech", "technologies", "Furni", and "OfficeSupply" that require standardization.
Outliers and Invalid Data: Extreme outliers have been added to the Sales and Profit fields, alongside a subset of transactions with an invalid Sales value of 0.
Duplicate Records: Over 200 rows are duplicated (with slight financial variations) to test your deduplication logic.
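Taken together, these issues map onto a handful of standard pandas repairs. Below is a minimal sketch on a hypothetical toy frame (column names follow the schema listed above; the real file is larger and messier):

```python
import pandas as pd

# Toy frame reproducing the documented problems.
df = pd.DataFrame({
    "Order Date": ["01/03/2019", "15/07/2019"],  # dates stored as text
    "Profit": ["1,234.56", None],                # comma-formatted strings + gaps
    "Region": ["West", "--"],                    # '--' used as a null marker
    "Category": ["Tech", "OfficeSupply"],        # typos / variants
})

# Parse text dates into datetimes (format is an assumption for this sketch).
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%d/%m/%Y")

# Strip thousands separators so Profit becomes numeric instead of object.
df["Profit"] = pd.to_numeric(df["Profit"].str.replace(",", ""), errors="coerce")

# Treat '--' and blank entries in Region as missing.
df["Region"] = df["Region"].replace({"--": pd.NA, "": pd.NA})

# Map category variants onto canonical labels.
category_map = {"Tech": "Technology", "technologies": "Technology",
                "Furni": "Furniture", "OfficeSupply": "Office Supplies"}
df["Category"] = df["Category"].replace(category_map)

# Drop exact duplicates; the near-duplicates with slight financial variations
# need fuzzier keys (e.g. Order ID + Product ID) instead.
df = df.drop_duplicates()
```

Outlier and zero-Sales handling is deliberately left out here, since the right rule (winsorize, drop, or flag) depends on the downstream task.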
This dataset is ideal for:
Data Wrangling/Cleaning (Primary Focus): Fix all the intentional data quality issues before proceeding.
Exploratory Data Analysis (EDA): Analyze sales distribution by region, segment, and category.
Regression: Predict the Profit based on Sales, Discount, and product features.
Classification: Build an RFM model (Recency, Frequency, Monetary) and create a target variable (HighValueCustomer = 1 if total sales > $1,000) to be predicted by logistic regression or decision trees.
Time Series Analysis: Aggregate sales by month/year to perform forecasting.
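For the classification task above, the RFM features and the HighValueCustomer target can be derived with a single groupby. A sketch on a hypothetical mini-table (column names assumed to match the schema):

```python
import pandas as pd

# Hypothetical transactions for two customers.
tx = pd.DataFrame({
    "Customer ID": ["C1", "C1", "C2"],
    "Order Date": pd.to_datetime(["2019-01-05", "2019-06-10", "2019-03-02"]),
    "Sales": [800.0, 450.0, 120.0],
})

# Snapshot date: day after the last observed order.
snapshot = tx["Order Date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("Customer ID").agg(
    Recency=("Order Date", lambda d: (snapshot - d.max()).days),
    Frequency=("Order Date", "count"),
    Monetary=("Sales", "sum"),
).reset_index()

# Binary target as described: 1 if lifetime sales exceed $1,000.
rfm["HighValueCustomer"] = (rfm["Monetary"] > 1000).astype(int)
```

The resulting `rfm` table is then the design matrix for a logistic regression or decision tree.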
This dataset is an expanded and corrupted derivative of the original Sample Superstore dataset, credited to Tableau and widely shared for educational purposes. All synthetic records were generated to follow the plausible distribution of the original data.
As the UK went into the first lockdown of the COVID-19 pandemic, the team behind the biggest social survey in the UK, Understanding Society (UKHLS), developed a way to capture people's experiences of it. From April 2020, participants from this Study were asked to take part in the Understanding Society COVID-19 survey, henceforth referred to as the COVID-19 survey or the COVID-19 study.
The COVID-19 survey regularly asked people about their situation and experiences. The resulting data gives a unique insight into the impact of the pandemic on individuals, families, and communities. The COVID-19 Teaching Dataset contains data from the main COVID-19 survey in a simplified form. It covers topics such as
The resource contains two data files:
Key features of the dataset
A full list of variables in both files can be found in the User Guide appendix.
Who is in the sample?
All adults (16 years old and over as of April 2020) in households that had participated in at least one of the last two waves of the main Understanding Society study were invited to participate in this survey. From the September 2020 (Wave 5) survey onwards, only sample members who had completed at least one partial interview in any of the first four web surveys were invited to participate. From the November 2020 (Wave 6) survey onwards, those who had only completed the initial survey in April 2020 and none since were no longer invited to participate.
The User guide accompanying the data adds to the information here and includes a full variable list with details of measurement levels and links to the relevant questionnaire.
http://rdm.uva.nl/en/support/confidential-data.html
This database contains data from 480 respondents, collected to measure their intervention choice in community care with the instrument AICN (Assessment of Intervention choice in Community Nursing). Data collection took place at the Faculty of Health of the Amsterdam University of Applied Sciences, the Netherlands. The respondents are all baccalaureate nursing students in the fourth year of study, close to graduation. Data were collected at three timepoints: around May 2016 (group 1215), May 2017 (group 1316), and May 2018 (group 1417). The student cohorts 1215 and 1316 form a historical control group; 1417 is the intervention group. The intervention group underwent a new four-year, more 'community-oriented' curriculum, with five new curriculum themes related to caregiving in people's own homes: (1) fostering patient self-management, (2) shared decision-making, (3) collaboration with the patient's social system, (4) using healthcare technology, and (5) allocation of care. The aim of this study is to investigate the effect of this redesigned baccalaureate nursing curriculum on students' intervention choice in community care. The AICN is a measuring instrument containing three vignettes, each describing a caregiving situation in the patient's home. Each vignette incorporates all five new curriculum themes. The interventions with regard to each theme are a realistic option, while more 'traditional' intervention choices are also possible. To avoid students responding in the way they think is correct, they are not made aware of the instrument's underlying purpose (i.e., determining the five themes). After reading each vignette, the respondents briefly formulate the five interventions they consider most suitable for nursing caregiving. The fifteen interventions yield qualitative information.
To allow for quantitative data analysis, the AICN includes a codebook describing the criteria used to recode each of the qualitative intervention descriptions into a quantitative value. As the manuscript describing the AICN and codebook is still under review, a link to the instrument will be added after publication.
Filesets:
1: SPSS file – 3 cohorts AICN without student numbers
2: SPSS syntax file
Variables in SPSS file (used in analysis):
1: Cohort type
2: Curriculum type (old vs. new)
3-20: Dummy variables of demographics
21-35: CSINV refers to case/intervention; CS1INV2 means case 1, intervention 2
36-50: Dummy variables of 21-35, representing the main outcome old vs. new intervention type
51: Sum of dummy variables (range 1-15) representing the primary outcome AICN
52: Sum of dummies as in 51, but with respondents with missing variables included, used in the regression analysis
53-58: Count of the number of chosen interventions per curriculum theme
59-60: Count of missings (old curriculum = 59, new = 60)
61-62: Count of no intervention theme (old curriculum = 61, new = 62)
Contact
Because of the sensitive nature of the data, the fileset is confidential and will be shared only under strict conditions. For more information contact opensciencesupport@hva.nl
Pitch-elevation crossmodal correspondences data sheet
Excel spreadsheet with dog ID, looking times for each trial (in s), times tracing the stimulus (in s), proportion of time spent looking as % of total looking times
CM1 Elevation data NEW.xlsx
There is no missing data in this dataset. The following models were run using SPSS 25:
1) Linear Mixed Model (LMM) testing the effect of congruency of the visual stimulus (congruent) and the order of presentation (order_in_pres) on the time spent tracing the stimulus (trace_stim) as fixed effects with dog ID as a random effect.
2) Linear Mixed Model (LMM) testing the effect of congruency of the visual stimulus (congruent) and the order of presentation (order_in_pres) on duration of looking at the stimulus (duration) as fixed effects with dog ID as a random effect.
3) Linear Mixed Model (LMM) testing the effect of congruency of the visual stimulus (congruent) and the order of pr...
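The SPSS models above are mixed models with a random intercept per dog. For readers working outside SPSS, here is a hedged sketch of an equivalent specification in Python's statsmodels, fitted on simulated stand-in data (variable names are taken from the description; this is not the authors' dataset or pipeline):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a stand-in for the sheet's columns: 10 dogs, 8 trials each.
rng = np.random.default_rng(0)
n_dogs, n_trials = 10, 8
df = pd.DataFrame({
    "dog_id": np.repeat(np.arange(n_dogs), n_trials),
    "congruent": np.tile([0, 1], n_dogs * n_trials // 2),
    "order_in_pres": np.tile(np.arange(n_trials), n_dogs),
})
# True congruency effect of +0.5 s on tracing time, plus noise.
df["trace_stim"] = 2.0 + 0.5 * df["congruent"] + rng.normal(0, 0.3, len(df))

# Model 1: congruency and presentation order as fixed effects,
# random intercept per dog (the `groups` argument).
model = smf.mixedlm("trace_stim ~ congruent + order_in_pres",
                    data=df, groups=df["dog_id"])
result = model.fit()
```

Models 2 and 3 swap `trace_stim` for the other dependent variables with the same fixed- and random-effect structure.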
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This bundle contains supplementary materials for an upcoming academic publication A Theory of Scrum Team Effectiveness, by Christiaan Verwijs and Daniel Russo. Included in the bundle are the dataset, SPSS syntaxes, and model definitions (AMOS). This replication package is made available by C. Verwijs under a "Creative Commons Attribution Non-Commercial Share-Alike 4.0 International"-license (CC-BY-NC-SA 4.0).
About the dataset
The dataset (SPSS) contains anonymized response data from 4,940 respondents in 1,978 Scrum Teams that participated via https://scrumteamsurvey.org. Data was gathered between June 3, 2020, and October 13, 2021. We removed careless responses from the individual response data and removed all data that could potentially identify teams, individuals, or their parent organizations. Because we wanted to analyze our measures at the team level, we calculated a team-level mean for each item in the survey. Such aggregation is only justified when at least 10% of the variance exists at the team level (Hair, 2019), which was the case (ICC = 51%). Because the percentage of missing data was modest, and to prevent list-wise deletion of cases and loss of information, we performed EM maximum likelihood imputation in SPSS.
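The ICC-based justification for aggregating to team means can be illustrated with a one-way ANOVA ICC(1) on toy data (the 51% figure in the text comes from the real survey items, not from this sketch):

```python
import pandas as pd

# Toy item scores for members nested in teams (assumed balanced structure).
df = pd.DataFrame({
    "team": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "score": [6, 6, 5, 2, 3, 2, 4, 5, 4],
})

k = 3  # members per team in this balanced toy case
grand = df["score"].mean()
team_means = df.groupby("team")["score"].mean()

# One-way ANOVA mean squares.
ms_between = k * ((team_means - grand) ** 2).sum() / (team_means.size - 1)
ms_within = df.groupby("team")["score"].apply(
    lambda s: ((s - s.mean()) ** 2).sum()).sum() / (len(df) - team_means.size)

# ICC(1): share of variance at the team level; aggregation to team means
# is commonly considered defensible when this exceeds ~0.10.
icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Team-level dataset: one mean per team, as in the replication package.
team_level = team_means.reset_index(name="score_mean")
```

In this toy example most of the variance sits between teams, so `icc1` is high and aggregation would clearly be justified.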
The dataset contains question labels and answer option definitions. To conform to the privacy statement of scrumteamsurvey.org, the bundle does not include individual response data from before the team-level aggregation.
About the model definitions
The bundle includes definitions for Structural Equation Models (SEM) for AMOS. We added the four iterations of the measurement model, four models used to test for common method bias, the final path model, and the model used for mediation testing. Mediation testing followed the procedure outlined by Podsakoff (2003) and was performed with the "Indirect Effects" plugin for AMOS by James Gaskin.
About the SPSS syntaxes
The bundle includes the syntaxes we used to prepare the dataset from the raw import, as well as the syntax we used to generate descriptives. These are provided mainly so that other researchers can verify our procedure.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de444364
Abstract (en): The major focus of this Euro-Barometer is the respondent's knowledge of and attitudes toward the nations of the Third World. Topics covered include the culture and customs of these nations, the existence of poverty and hunger, and the respondent's opinions on how best to provide assistance to Third World countries. Individuals answered questions on social and political conditions as well as on the level of economic development in these countries. Additionally, respondents were asked to assess the state of relations between the respondent's country and various Third World nations. Another focus of this data collection concerns energy problems and resources in the countries of the European Economic Community. Respondents were asked to choose which regions of the world are considered to be reliable suppliers of fossil fuel for the future and to evaluate the risks that various industrial installations such as chemical and nuclear power plants pose to people living nearby. Respondents were also asked about solutions to the need for additional energy supplies in the future. Possible solutions included the development or continued development of nuclear power, the encouragement of research into producing renewable energy sources such as solar energy, and the conservation of energy. As in previous surveys in this series, respondents' attitudes toward the Community, life satisfaction, and social goals continued to be monitored. The survey also asked each individual to assess the advantages and disadvantages of the creation of a single common European market and whether they approved or disapproved of current efforts to unify western Europe. In addition, the respondent's political orientation, outlook for the future, and socioeconomic and demographic characteristics were probed. Please review the "Weighting Information" section located in the ICPSR codebook for this Eurobarometer study. 
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: checked for undocumented or out-of-range codes.
Universe: Persons aged 15 and over residing in the 12 member nations of the European Community: Belgium, Denmark, France, Greece, Ireland, Italy, Luxembourg, the Netherlands, Portugal, Spain, the United Kingdom, and West Germany (including West Berlin). Smallest geographic unit: country.
Sampling: Multistage probability samples and stratified quota samples.
2009-04-13: The data have been further processed by GESIS-ZA, and the codebook, questionnaire, and SPSS setup files have been updated. Also, SAS and Stata setup files, SPSS and Stata system files, a SAS transport (CPORT) file, and a tab-delimited ASCII data file have been added.
Funding institution(s): National Science Foundation (SES 85-12100 and SES 88-09098).
The original data collection was carried out by Faits et Opinions on request of the Commission of the European Communities. The GESIS-ZA study number for this collection is ZA1713, as it does not appear in the data. References to OSIRIS, card-image, and SPSS control cards in the ICPSR codebook for this study are no longer applicable, as the data have not been provided in OSIRIS or card-image file formats. Please disregard any reference to column locations, width, or deck in the ICPSR codebook and questionnaire files, as they are not applicable to the ICPSR-produced data file. Correct column locations and LRECL for the ICPSR-produced data file can be found in the SPSS and SAS setup files and the Stata dictionary file.
The full product suite of files produced by ICPSR originated from an SPSS portable file provided by the data producer. Question numbering for Eurobarometer 28 is as follows: Q128-Q180, Q211-Q280, Q313-Q359, and Q60-Q80 (demographic questions). Some question numbers are intentionally skipped; however, neither questions nor data are missing. For country-specific categories, filter information, and other remarks, please see the corresponding variable documentation in the ICPSR codebook. V465 (VOTE INTENTION - DENMARK): Danish respondents who declared for the political party "Venstre" had been coded as falling into the missing value category during the raw data processing for Eurobarometer 28. The original coding for Eurobarome...
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The data were preprocessed using IBM SPSS 19.0 to conduct descriptive statistical and correlation analyses on 540 participants. The community dataset was complete, without missing values. Network model estimation, establishment, and centrality index calculation were then performed. The network was estimated using the EBICglasso function in the qgraph package (Version 1.9.3; Epskamp et al., 2012) in R (Version 4.1.3; R Core Team, 2022). The Glasso procedure estimates a partial correlation network, in which each edge represents the association between two symptoms after controlling for all other symptoms in the model; each item is represented as a node, and the association between items is referred to as an edge.
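The R workflow described above (EBICglasso in qgraph) has a rough Python analogue: estimate a sparse precision matrix with the graphical lasso and convert it to partial correlations. A sketch on simulated data (scikit-learn's GraphicalLasso lacks qgraph's EBIC tuning step, so this is an approximation, not the authors' pipeline):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulated "symptom" items standing in for the 540-participant survey.
rng = np.random.default_rng(42)
n, p = 540, 6
latent = rng.normal(size=(n, 1))
X = 0.6 * latent + rng.normal(size=(n, p))  # correlated items via one factor

# Sparse inverse-covariance (precision) estimate; alpha sets sparsity.
model = GraphicalLasso(alpha=0.05).fit(X)
precision = model.precision_

# Edge weights of the network: partial correlations from the precision matrix.
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
```

Each off-diagonal entry of `partial_corr` is then an edge weight between two symptom nodes; zeros correspond to absent edges.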
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Note: This version supersedes version 1: https://doi.org/10.15482/USDA.ADC/1522654.
In Fall 2019, the USDA Food and Nutrition Service (FNS) conducted the third Farm to School Census. The 2019 Census was sent via email to 18,832 school food authorities (SFAs), including all public, private, and charter SFAs, as well as residential care institutions, participating in the National School Lunch Program. The questionnaire collected data on local food purchasing, edible school gardens, other farm to school activities and policies, and evidence of economic and nutritional impacts of participating in farm to school activities. A total of 12,634 SFAs completed usable responses to the 2019 Census. Version 2 adds the weight variable, "nrweight", which is the non-response weight.
Processing methods and equipment used
The 2019 Census was administered solely via the web. The study team cleaned the raw data to ensure the data were as correct, complete, and consistent as possible. This process involved examining the data for logical errors, contacting SFAs and consulting official records to update some implausible values, and setting the remaining implausible values to missing. The study team linked the 2019 Census data to information from the National Center for Education Statistics (NCES) Common Core of Data (CCD). Records from the CCD were used to construct a measure of urbanicity, which classifies the area in which schools are located.
Study date(s) and duration
Data collection occurred from September 9 to December 31, 2019. Questions asked about activities prior to, during, and after SY 2018-19. The 2019 Census asked SFAs whether they currently participated in, had ever participated in, or planned to participate in any of 30 farm to school activities. An SFA that participated in any of the defined activities in the 2018-19 school year received further questions.
Study spatial scale (size of replicates and spatial scale of study area)
Respondents to the survey included SFAs from all 50 states as well as American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, the U.S. Virgin Islands, and Washington, DC.
Level of true replication
Unknown
Sampling precision (within-replicate sampling or pseudoreplication)
No sampling was involved in the collection of this data.
Level of subsampling (number and repeat or within-replicate sampling)
No sampling was involved in the collection of this data.
Study design (before–after, control–impacts, time series, before–after-control–impacts)
None – non-experimental
Description of any data manipulation, modeling, or statistical analysis undertaken
Each entry in the dataset contains SFA-level responses to the Census questionnaire for SFAs that responded. This file includes information from only SFAs that clicked "Submit" on the questionnaire. (The dataset used to create the 2019 Farm to School Census Report includes additional SFAs that answered enough questions for their response to be considered usable.) In addition, the file contains constructed variables used for analytic purposes. The file does not include weights created to produce national estimates for the 2019 Farm to School Census Report. The dataset identified SFAs, but to protect individual privacy the file does not include any information about the individual who completed the questionnaire.
Description of any gaps in the data or other limiting factors
See the full 2019 Farm to School Census Report [https://www.fns.usda.gov/cfs/farm-school-census-and-comprehensive-review] for a detailed explanation of the study's limitations.
Outcome measurement methods and equipment used
None
Resources in this dataset:
Resource Title: 2019 Farm to School Codebook with Weights. File Name: Codebook_Update_02SEP21.xlsx
Resource Description: 2019 Farm to School Codebook with Weights
Resource Title: 2019 Farm to School Data with Weights CSV. File Name: census2019_public_use_with_weight.csv
Resource Description: 2019 Farm to School Data with Weights CSV
Resource Title: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets. File Name: Farm_to_School_Data_AgDataCommons_SAS_SPSS_R_STATA_with_weight.zip
Resource Description: 2019 Farm to School Data with Weights SAS R Stata and SPSS Datasets
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This syntax file enables the user to reproduce the calculation of the Imitation Index scores using the original SPSS data file, and to also reproduce the analyses reported in the current article.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background [Extract from Related publication]:
Homicide numbers are relatively low in Queensland, less than one incident per week, and in most cases the victim has been located at the scene of the crime. However, approximately 2.5% of victims have never been located. This has flow-on consequences, such as the difficulty of proving death and then murder by the prosecution, the difficulty of gathering forensic evidence when a victim cannot be located, and the grief experienced by the co-victims, family, and friends, who have no closure. There have been limited studies on the disposal of homicide victims, mostly related to sexual serial or familial killings in the United States of America, Canada, and Finland (Beauregard & Field, 2008; Beauregard & Martineau, 2014; DiBiase, 2015; Ferguson & Pooley, 2019; Häkkänen, Hurme & Liukkonen, 2007; Lundrigan & Canter, 2001; Nethery, 2004).
Methods [Extract]:
There was a single source of Queensland homicide data, the Queensland Police Records and Information Exchange (QPRIME). QPRIME is the sole repository of all information pertaining to crime within the state. Permission was obtained from the Queensland Police Service to access the demographic data of all homicide incidents between 2004 and 2020. Within the data it was identified that 149 homicide victims had been moved (disposed) from where they were murdered, and of this number seventeen had never been located. The data relates to the demographics of both the victim and offender in those incidents where a homicide victim has been moved from where they were murdered. This includes the sex, height and weight of both victim and offender, method of homicide, distances moved from scene, method of transport, method of concealment and how these victims had been found in the past. No Queensland homicide incidents were excluded from this study.
The data for the non-Queensland homicide victims was located in the National Missing Person Register and The Red Heart Campaign. The collection of the demographics was identical to the initial Queensland data and was stored in a parallel MS Excel sheet. Among the non-Queensland homicide cases, 149 disposed homicide victim incidents were identified, although all of these victims had eventually been located.
A statistical analysis, using IBM SPSS v26, of the data was undertaken, leading to the development of the Disposed Homicide Victim Matrix (DHVM).
The DHVM has provided police search coordinators with the statistical information on victim disposal directions, distances, locations, concealment methods and type of searching required. This has contributed to seven victims being located from the eight times it has been utilised.
Data sources acknowledgement:
No whole-of-jurisdiction analysis of disposed homicide victims is known to have been undertaken previously.
This dataset consists of:
Software/equipment used to collect and analyse the data: IBM SPSS Statistics v26; Microsoft Excel.
https://search.gesis.org/research_data/datasearch-httpwww-da-ra-deoaip--oaioai-da-ra-de443631
Abstract (en): This study is part of a time-series collection of national surveys fielded continuously since 1952. The election studies are designed to present data on Americans' social backgrounds, enduring political predispositions, social and political values, perceptions and evaluations of groups and candidates, opinions on questions of public policy, and participation in political life. In addition to core items, new content includes questions on values, political knowledge, and attitudes on racial policy, as well as more general attitudes conceptualized as antecedent to these opinions on racial issues. The Main Data File also contains vote validation data that were expanded to include information from the appropriate election office and were attached to the records of each of the respondents in the post-election survey. The expanded data consist of the respondent's post case ID, vote validation ID, and two variables to clarify the distinction between the office of registration and the office associated with the respondent's sample address. The second data file, Bias Nonresponse Data File, contains respondent-level field administration variables. Of 3,833 lines of sample that were originally issued for the 1990 Study, 2,176 resulted in completed interviews, others were nonsample, and others were noninterviews for a variety of reasons. For each line of sample, the Bias Nonresponse Data File includes sampling data, result codes, control variables, and interviewer variables. Detailed geocode data are blanked but available under conditions of confidential access (contact the American National Election Studies at the Center for Political Studies, University of Michigan, for further details). This is a specialized file, of particular interest to those who are interested in survey nonresponse. Demographic variables include age, party affiliation, marital status, education, employment status, occupation, religious preference, and ethnicity. 
ICPSR data undergo a confidentiality review and are altered when necessary to limit the risk of disclosure. ICPSR also routinely creates ready-to-go data files along with setups in the major statistical software formats, as well as standard codebooks to accompany the data. In addition to these procedures, ICPSR performed the following processing steps for this data collection: performed consistency checks; standardized missing values; checked for undocumented or out-of-range codes.
Response Rates: The response rate for this study is 67.7 percent. The study was in the field until January 31, although 67 percent of the interviews were taken by November 25, 80 percent by December 7, and 93 percent by December 31.
Universe: All United States households in the 50 states.
Sampling: National multistage area probability sample.
2015-11-10: The study metadata was updated.
2009-01-09: Part 1, the Main Data File, incorporates errata that were posted separately under the Fourth ICPSR Edition. Part 2, the Bias Nonresponse Data File, has been added to the data collection, along with corresponding SAS, SPSS, and Stata setup files and documentation. The codebook has been updated by adding a technical memorandum on the sampling design of the study previously missing from the codebook. The nonresponse file contains respondent-level field administration variables for those interested in survey nonresponse. The collection now includes files in ASCII, SPSS portable, SAS transport (CPORT), and Stata system formats.
2000-02-21: The data for this study are now available in SAS transport and SPSS export formats in addition to the ASCII data file. Variables in the dataset have been renumbered to the following format: 2-digit (or 2-character) year prefix + 4 digits + [optional] 1-character suffix. Dataset ID and version variables have also been added.
Additionally, the Voter Validation Office Administration Interview File (Expanded Version) has been merged with the main data file, and the codebook and SPSS setup files have been replaced. Also, SAS setup files have been added to the collection, and the data collection instrument is now provided as a PDF file. Two files are no longer being released with this collection: the Voter Validation Office Administration Interview File (Unexpanded Version) and the Results of First Contact With Respondent file.
Funding institution(s): National Science Foundation (SOC77-08885 and SES-8341310).
Mode of data collection: face-to-face interview.
There was significantly more content in this post-election survey than ...
Hypothesis 1: Generic job demands are positively related to a) emotional exhaustion, and b) depersonalization.
Hypothesis 2: GP-specific job demands are positively related to a) emotional exhaustion and b) depersonalization.
Hypothesis 3: Generic job resources are negatively related to a) emotional exhaustion and b) depersonalization.
Hypothesis 4: GP-specific resources are negatively related to a) emotional exhaustion and b) depersonalization.
Hypothesis 5: Time-based negative WHI partially mediates the relationship between generic job demands and a) emotional exhaustion, b) depersonalization.
Hypothesis 6: Time-based negative WHI partially mediates the relationship between GP-specific job demands and a) emotional exhaustion and b) depersonalization.
Hypothesis 7: Strain-based negative WHI partially mediates the relationship between generic job demands and a) emotional exhaustion and b) depersonalization.
Hypothesis 8: Strain-based negative WHI partially mediates the relationship between GP-specific job demands and a) emotional exhaustion and b) depersonalization.
The dataset includes raw data obtained from questionnaires, before single imputation with the EM algorithm in SPSS to deal with missing values.
Description of variables:
WPQ (work pace and quantity; generic job demand, q0001 – q0006)
MENT (mental load, generic job demand, q0007 – q0010)
AUTO (autonomy, generic job resource, q0011 – q0013), not included in the current study
OPPOR (opportunity for development, generic job resource, q0014 – q0016)
FEEDB (feedback, generic job resource, q0017 – q0019)
COLL (collaboration, generic job resource, q0020 – q0022)
SELF (self-efficacy, generic personal resource, q0023 – q0026, not included in the current study)
OPTIM (optimism, generic personal resource, q0027 – q0030, not included in the current study)
STRAIN (strain-based negative work-home interference, q0031, q0032, q0038, q0041)
TIME (time-based negative work-home interference, q0034, q0037, q0039, q0042)
EE (emotional exhaustion, q0044, q0045, q0046, q0049, q0051, q0055, q0056, q0059)
DP (depersonalization, q0048, q0053, q0054, q0061)
PA (personal accomplishment, q0047, q0050, q0052, q0057, q0058, q0060)
JDGP (occupation-specific job demands, q0062 – q0074)
JRGP (occupation-specific job resources, q0075 – q0084)
PRGP (occupation-specific personal resources, q0085 – q0087, not included in the current study)
gender (q0088)
year of birth (q0089)
marital status (q0090)
year of start in present practice (q0091)
number of employees (q0092)
partner with job (q0093)
partner works overtime (q0094)
flexible childcare arrangements (q0095)
non-flexible childcare arrangements (q0096)
practice type (q0097)
care hours (q0098)
work hours (q0099)
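As a minimal illustration of how such item-to-scale mappings are typically scored after imputation (hypothetical responses; only the MENT items are shown), each scale score is the mean of its constituent items:

```python
import pandas as pd

# Hypothetical responses for two participants on the MENT items (q0007-q0010),
# following the item-to-scale mapping listed above.
df = pd.DataFrame({
    "q0007": [3, 4],
    "q0008": [2, 5],
    "q0009": [3, 4],
    "q0010": [4, 3],
})

# Scale definition: name -> item columns (extend with the other scales as needed).
scales = {"MENT": ["q0007", "q0008", "q0009", "q0010"]}

# Score each scale as the mean of its items.
for name, items in scales.items():
    df[name] = df[items].mean(axis=1)
```

The same dictionary pattern extends directly to WPQ, STRAIN, TIME, EE, DP, and the other scales listed above.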
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data:
A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Additional related data collected that was not included in the current data package: Open ended questions asked to respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
Received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses and an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and prepared for analysis in Excel and SPSS by recoding ordinal and multiple-choice questions and by removing missing values.
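The recoding step described above can be sketched as follows. This is an illustrative example only: the column names, answer labels, and the Likert mapping are hypothetical and are not taken from the actual survey instrument; only the missing-data code 999 ("Not asked") reflects the dataset's documented convention.

```python
import csv
from io import StringIO

# Hypothetical mapping from ordinal answer labels to integer codes;
# the real recoding was done in Excel and SPSS.
LIKERT = {"Never": 1, "Rarely": 2, "Sometimes": 3, "Often": 4, "Always": 5}
MISSING = 999  # documented missing-data code: "Not asked"

def recode(value: str) -> int:
    """Map an ordinal label to its numeric code; empty cells become 999."""
    if value == "":
        return MISSING
    return LIKERT[value]

# Illustrative raw export: two questions, two respondents.
raw = "q1,q2\nOften,\nNever,Always\n"
rows = [
    {question: recode(answer) for question, answer in row.items()}
    for row in csv.DictReader(StringIO(raw))
]
```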
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available as a coded CSV file. The codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 94
Number of cases/rows: 2,492
Missing data codes: 999 = Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
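When working with the coded CSV, the missing-data code 999 should be treated as missing rather than as a numeric response, or summary statistics will be inflated. A minimal sketch, using illustrative values only:

```python
# 999 marks questions that were not asked (see "Missing data codes" above).
MISSING = 999

def clean(values):
    """Replace the 999 missing-data code with None so it is excluded from stats."""
    return [None if v == MISSING else v for v in values]

# Illustrative responses, not actual survey data.
responses = [3, 999, 5, 2, 999]
valid = [v for v in clean(responses) if v is not None]
mean = sum(valid) / len(valid)  # computed over the 3 valid answers only
```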