Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Filename: MDCDatacitationReuse2021Codebookv2.pdf Codebook
Filename: MDCDataCitationReuse2021surveydatav2.csv Dataset format in csv
Filename: MDCDataCitationReuse2021surveydatav2.sav Dataset format in SPSS
Filename: MDCDataCitationReuseSurvey2021QNR.pdf Questionnaire
Additional related data collected that was not included in the current data package: Open-ended questions asked of respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final dataset contains 2,492 complete responses, corresponding to an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
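The completion rate above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original data package: the function name is hypothetical, and because the number of invitations sent is not stated in this description, the response rates themselves are not recomputed.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage, rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


responses_received = 3632   # all responses, complete or not (from the text above)
complete_in_dataset = 2492  # complete responses retained in the final dataset

completion_rate = rate_percent(complete_in_dataset, responses_received)
print(completion_rate)  # 68.6
```

With these counts, the stated 68.6% is reproduced by the 2,492 retained responses; the same calculation with 2,509 completed responses gives 69.1%.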
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 95
Number of cases/rows: 2,492
Missing data codes: 999 Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
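When working with the coded CSV file, the documented missing-data code (999, "Not asked") should be treated as missing rather than as a numeric answer. The sketch below illustrates one way to do this with the Python standard library. It is an assumption-laden example: the function name and the column names ID, Q1 and Q2 are hypothetical, and the inline sample stands in for the real file.

```python
import csv
import io

NOT_ASKED = "999"  # missing-data code documented above ("Not asked")


def read_survey_rows(handle):
    """Read coded survey rows, replacing the 999 'Not asked' code with None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({name: (None if value == NOT_ASKED else value)
                     for name, value in row.items()})
    return rows


# Tiny stand-in for the real CSV; the column names are hypothetical.
sample = io.StringIO("ID,Q1,Q2\n1,3,999\n2,999,1\n")
rows = read_survey_rows(sample)
print(rows[0])  # {'ID': '1', 'Q1': '3', 'Q2': None}
```

In practice the replacement should be restricted to question columns, since an identifier column could legitimately contain the value 999; the Codebook remains the authority on which variables use this code.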
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Design
A cross-sectional, web-based survey design was employed, consisting of validated self-report measures designed to capture demographic information, insulin use, diabetes-related distress, disordered eating, and body shape perception.

Inclusion/Exclusion criteria. Participants were eligible to participate if they self-described as being aged 18 or over, with a diagnosis of type 1 diabetes and on a prescribed insulin regimen. They were required to be at least one year post-diagnosis, as people who have been prescribed insulin for less than one year may not have settled into a routine with insulin management and may mismanage their insulin unintentionally. Additionally, participants were required to reside within the UK, as this removed a potential confound of cost or resources as a barrier to accessing insulin. People with a diagnosis of type 2 diabetes were excluded from the study, as the pathophysiology and treatment of the two illnesses are quite different. For example, as those with type 2 diabetes still produce some degree of insulin naturally, non-adherence to an insulin regimen is likely to have less of an immediate impact than for those with type 1 diabetes, who produce no insulin naturally (Peyrot et al., 2010).

Potential participants were given a link to the study, which provided detailed information about the study, details of informed consent and their right to withdraw. When the survey was completed, or participants chose to exit, a debrief page was presented with signposts towards various supports and resources. Participants were offered the opportunity to receive a brief summary of findings from the study and given the chance to win a £25 Amazon gift voucher, both of which required an email address to be supplied through separate surveys, so as to protect the confidentiality of responses.

Ethical approval for this study was granted by the chair of the relevant Ethics Committee.

Statistical Analysis
Prior to beginning the study, an estimate of the minimum number of participants required was calculated using statistical power tables (Clark-Carter, 2010) and G*Power version 3.1. Based on previous research (Ames, 2017), a medium effect size (.5) was used to calculate sample sizes with a power of .8 (Clark-Carter, 2010), which generated a necessary sample size of 208. All analyses were adequately powered.

Data were analysed using IBM SPSS Statistics for Mac version 25.

Measures
Demographic Information. This section collected basic demographic information, including age; gender; country of residence; and current or historical diagnosis of an eating disorder. The data were screened to ensure participants met the inclusion criteria.

Insulin Measure. A 16-item questionnaire was designed to assess rates of and reasons for insulin non-adherence (Ames, 2017).

Eating Disorder Psychopathology. The Eating Disorder Examination-Questionnaire (EDE-Q) assesses eating disorder psychopathology, and data from this measure were key to informing the primary research questions. It was designed as a self-report version of the interview-based Eating Disorder Examination (EDE), which is considered to be the gold standard measure (Fairburn, Wilson, & Schleimer, 1993). The EDE-Q assesses four subscales: Restraint, Eating Concern, Shape Concern, and Weight Concern. It was found to be an adequate alternative to the EDE (Fairburn & Beglin, 1994).

Body Shape Questionnaire (BSQ). The Body Shape Questionnaire is a 34-item self-report measure designed to assess concerns regarding body shape and the phenomenological experience of “feeling fat” (Cooper, Taylor, Cooper, & Fairburn, 1987). The BSQ targets body image as a central feature of both anorexia nervosa (AN) and bulimia nervosa (BN) and thus is a useful supplementary measure of eating disorder psychopathology.

Diabetes Distress. The Diabetes Distress Scale (Polonsky et al., 2005) is a 17-item scale designed to measure diabetes-related emotional distress via four domains: emotional burden, physician distress, interpersonal distress and regimen distress. This measure was included on the basis of results from Ames (2017), which identified diabetes-related emotional distress as a key reason for insulin non-adherence in type 1 diabetes. Inclusion in this study allowed for further investigation of its role.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 11 release notes: Changes release notes description; does not change data.

Version 10 release notes: The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.

Version 9 release notes: For each offense, adds a variable indicating the number of months that offense was reported. These variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by counting the months in which one or more arrests were reported for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests. The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number-of-months variable and the "number_of_months_reported" variable will both be 12 while every other offense's number-of-months variable will be 0. Adds data for 2017 and 2018.

Version 8 release notes: Adds annual data in R format. Changes project name to avoid confusing this data for the ones done by NACJD. Fixes bug where bookmaking was excluded as an arrest category. Changed the number of categories to include more offenses per category to have fewer total files. Added a "total_race" file for each category; this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.

Version 7 release notes: Adds 1974-1979 data. Adds monthly data (only totals by sex and race, not by age categories). All data now from FBI, not NACJD. Changes some column names so all columns are <=32 characters to be usable in Stata. Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported; months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.

Version 6 release notes: Fix bug where juvenile female columns had the same value as juvenile male columns.

Version 5 release notes: Removes support for SPSS and Excel data. Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below. Adds in agencies that report 0 months of the year. Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.

Version 4 release notes: Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.

Version 3 release notes: Add data for 2016. Order rows by year (descending) and ORI.

Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.

The Arrests by Age, Sex, and Race (ASR) data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular information on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1974-2018 into a single file for each group of crimes. Each monthly file is only a single year as my laptop can't handle combining all the years together. These files are quite large and may take some time to load. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each age
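The month-counting logic described in the Version 9 notes ("num_months_[crime]" and "number_of_months_reported") can be sketched as follows. This is an illustration only, not the author's code (the data were prepared in R); the function names and the dictionary layout are assumptions made for the example.

```python
def months_reported(monthly_arrests):
    """Count months with at least one reported arrest for a single offense.

    monthly_arrests maps a month number (1-12) to the arrests reported that
    month. Missing and zero months are both skipped, so non-reporting and a
    true zero cannot be told apart, as the release notes warn.
    """
    return sum(1 for count in monthly_arrests.values() if count and count > 0)


def months_any_offense(arrests_by_offense):
    """Count unique months in which arrests were reported for ANY offense."""
    months = set()
    for monthly_arrests in arrests_by_offense.values():
        months.update(m for m, c in monthly_arrests.items() if c and c > 0)
    return len(months)


# Hypothetical agency-year: assault arrests in January, February, March, August.
assault = {1: 4, 2: 1, 3: 2, 8: 7}
print(months_reported(assault))  # 4

# Murder arrests every month, no other offense reported.
agency = {"murder": {m: 1 for m in range(1, 13)}, "assault": {}}
print(months_reported(agency["murder"]), months_any_offense(agency))  # 12 12
```

The second example mirrors the case in the notes: the murder variable and the overall variable are both 12, while every other offense's count is 0.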
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Online searches through Web of Science and PubMed were conducted on 15 September 2023 for articles published after 1950 using the following terms: TS = (ultra high dose rate OR ultra-high dose rate OR ultrahigh dose rate) AND TS = (in vivo OR animal model OR mice OR preclinical). The queries produced 980 results in total, with 564 results left after removing duplicate entries.

The titles and abstracts were reviewed manually by two authors, and the full text of suitable manuscripts was further screened for factors such as topic, experimental conditions and methods, research objects, and endpoints. The detailed record identification and screening flows, based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), are summarized in Figure 1. Finally, forty articles were included in our analysis.

The FLASH effect was confirmed if there were significant differences in experimental phenomena and data under the two radiation conditions (ultra-high and conventional dose rate). Within the same article, research items with different endpoints but otherwise identical conditions were regarded as one item. As summarized in Table 1, a total of 131 items were extracted from the 40 articles included in the analysis. For each item, the FLASH effect (1 represents a significant sparing effect and 0 represents no sparing effect) and detailed parameters were recorded, including the type and energy of the radiation, dose, dose rate, experimental object, and pulse characteristics (if provided).

Emulating the Quantitative Analyses of Normal Tissue Effects in the Clinic (QUANTEC), the probability of triggering the FLASH effect as a function of mean dose rate or dose was analyzed with a binary logistic regression model. The analysis was done using SPSS. Among the extracted items, there is a large imbalance between the numbers of entries with and without the FLASH effect, since research with positive results is more likely to be reported. Therefore, a more balanced dataset was obtained by oversampling using the K-Means SMOTE algorithm (Figure S1), which was implemented in Python using the imblearn library.

The ROC (receiver operating characteristic) curve was plotted as TPR (True Positive Rate) against FPR (False Positive Rate) at different threshold values. The classification model was validated using the AUC (area under the ROC curve) value, which is threshold- and scale-invariant.
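To make the ROC and AUC description concrete, the sketch below computes ROC points and the area under the curve in plain Python. It is a generic illustration under stated assumptions, not the authors' analysis (which used SPSS and imblearn): the function names and the toy labels and scores are invented for the example, and the input must contain both classes.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores share one threshold and are added together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: 1 = sparing effect observed, 0 = not observed (invented values).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))  # about 0.889 (8/9)
```

Multiplying every score by a positive constant leaves the ranking, and therefore the AUC, unchanged, which is what scale invariance means here.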
This package contains the materials and datasets accompanying the paper “Fostering Constructive Online News Discussions: The Role of Sender Anonymity and Message Subjectivity in Shaping Perceived Polarization, Disinhibition, and Participation Intention in a Representative Sample of Online Commenters”. In this paper we report on an experiment in which we aimed to reduce perceived polarization and increase intention to join online news discussions through manipulating sender anonymity and message subjectivity (i.e., explicit acknowledgements that a statement represents the writer’s perspective, e.g., “I think that is not true”). The data files are not stored in TiU Dataverse but are accessible via the LISS Data Archive.

Data files
Dataset_raw – SPSS raw data file
Dataset_restructured_coding incl – SPSS restructured data file (from variables to cases); the coding of participants’ comments has been included as an additional variable
Dataset_backstructured_for MEMORE – SPSS backstructured data file (from cases to variables), needed to conduct the mediation analysis in MEMORE
Coding participant comments – Excel file with the coding of participants’ comments by the R script, including the manual checking
SPSS Syntax – SPSS syntax with which the variables were constructed in the dataset
R Script – R script for all the analyses except the mediation, which was conducted in SPSS

Supplemental material
Questionnaire
Design lists of stimuli
Stimuli lists (1-4)
Dutch words and phrases for automated subjectivity coding

Structure of the data package
From the raw dataset, we made the restructured dataset, which also includes the calculated variables (see the SPSS Syntax). This restructured dataset was the basis for the analyses in R. The backstructured dataset is based on the restructured dataset and is needed for conducting the repeated measures mediation with SPSS MEMORE. The coding dataset was also analyzed in R, and provides the input for the column “CodingComments” in the restructured dataset.
Method: Survey through the LISS panel

Universe: The sample consisted of 302 participants, but after removing the 8 participants who had not completed the survey, the final sample consisted of 294 participants (M_age = 54.80, SD_age = 15.53, range = 17-88 years; 55.4% male and 44.6% female). Of the sample, 3.1% completed only primary education, 25.6% reported high school as their highest completed education, 31.1% had attained secondary vocational education, 25.6% finished higher professional education, and 14.7% had a university degree as their highest qualification. Notably, although we preselected participants on their online activity, 49.7% of the sample indicated that they no longer respond to online news articles, suggesting that actual participation in online discussions fluctuates over time. Of the people who do react, 54.1% also engage in discussions in online news article threads. Of those, 8.8% almost never discuss, 45% do so multiple times per year, 35% multiple times per month, 10% multiple times per week, and 1.3% multiple times per day.

Country/Nation: The Netherlands
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com.

Version 12 release notes: Adds 2019 data.

Version 11 release notes: Changes release notes description; does not change data.

Version 10 release notes: The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.

Version 9 release notes: For each offense, adds a variable indicating the number of months that offense was reported. These variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by counting the months in which one or more arrests were reported for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests. The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number-of-months variable and the "number_of_months_reported" variable will both be 12 while every other offense's number-of-months variable will be 0. Adds data for 2017 and 2018.

Version 8 release notes: Adds annual data in R format. Changes project name to avoid confusing this data for the ones done by NACJD. Fixes bug where bookmaking was excluded as an arrest category. Changed the number of categories to include more offenses per category to have fewer total files. Added a "total_race" file for each category; this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.

Version 7 release notes: Adds 1974-1979 data. Adds monthly data (only totals by sex and race, not by age categories). All data now from FBI, not NACJD. Changes some column names so all columns are <=32 characters to be usable in Stata. Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported; months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.

Version 6 release notes: Fix bug where juvenile female columns had the same value as juvenile male columns.

Version 5 release notes: Removes support for SPSS and Excel data. Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below. Adds in agencies that report 0 months of the year. Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.

Version 4 release notes: Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.

Version 3 release notes: Add data for 2016. Order rows by year (descending) and ORI.

Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.

The Arrests by Age, Sex, and Race (ASR) data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular information on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1974-2019 into a single file for each group of crimes. Each monthly file is only a single year as my laptop can't handle combining all the years together. These files are quite large and may take some time to load. Col
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This collection contains the 17 anonymised datasets from the RAAAP-2 international survey of research management and administration professionals undertaken in 2019. To preserve anonymity, the data are presented in 17 datasets linked only by AnalysisRegionofEmployment, because many of the textual responses, even though redacted to remove institutional affiliation, could be used to identify some individuals if linked to the other data. Each dataset is presented in the original SPSS format, suitable for further analyses, as well as an Excel equivalent for ease of viewing. There are additional files in this collection showing the questionnaire and the mappings to the datasets, together with the SPSS scripts used to produce the datasets. These data follow on from, but are not directly linked to, the first RAAAP survey undertaken in 2016, data from which can also be found in FigShare.

Errata (16/5/23): an error in v13 of the main Data Cleansing syntax file (now updated to v14) meant that two variables were missing their value labels (the underlying codes were correct). The Main Dataset (SPSS & Excel) has been updated to a new version.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
All experiments were programmed using E-Prime, and undergraduate students were selected for individual testing. After invalid subjects were removed, the data were analysed in SPSS. The independent variables are the encoding method and the concurrent task, and the dependent variable is the subject's free recall score.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The 2022 APS Employee Census was administered to all available Australian Public Service (APS) employees, running from 9 May to 10 June 2022.

The Employee Census provides a comprehensive view of the APS and ensures no eligible respondents are omitted from the survey sample, removing sampling bias and reducing sample error. The Census' content is designed to establish the views of APS employees on workplace issues such as leadership, employee wellbeing, and job satisfaction.

Overall, 120,662 APS employees responded to the Employee Census in 2022, a response rate of 83%.

Please be aware that the very large number of respondents to the employee census means these files are over 200MB in size. Downloading and opening these files may take some time.

TECHNICAL NOTES

Three files are available for download.

* 2022 APS Employee Census - Questionnaire: This contains the 2022 APS Employee Census questionnaire.
* 2022 APS Employee Census - 5 point dataset.csv: This file contains individual responses to the 2022 APS Employee Census as clean, tabular data as required by data.gov.au. This will need to be used in conjunction with the above document.
* 2022 APS Employee Census - 5 point dataset.sav: This file contains individual responses to the 2022 APS Employee Census for use with the SPSS software package.

To protect the privacy and confidentiality of respondents to the 2022 APS Employee Census, the datasets provided on data.gov.au include responses to a limited number of demographic or other attribute questions.

Full citation of this dataset should list the Australian Public Service Commission (APSC) as the author.

A recommended short citation is: 2022 APS Employee Census data, Australian Public Service Commission.

Any queries can be directed to research@apsc.gov.au.
Background and Objectives: Pharmacogenomics (PGx) leverages genomic information to tailor drug therapies, enhancing precision medicine. Despite global advancements, its implementation in Lebanon, Qatar, and Saudi Arabia faces unique challenges in clinical integration. This study aimed to investigate PGx attitudes, knowledge, implementation, and associated challenges; to forecast future educational needs; and to compare findings across the three countries.

Methods: This cross-sectional study utilized an anonymous, self-administered online survey distributed to healthcare professionals, academics, and clinicians in Lebanon, Qatar, and Saudi Arabia. The survey comprised 18 questions to assess participants' familiarity with PGx, current implementation practices, perceived obstacles, potential integration strategies, and future educational needs.

Results: The survey yielded 337 responses from healthcare professionals across the three countries. Data revealed significant variations in PGx familiarity an...

Ethical statement and informed consent
Ethical approval for this study was obtained from the institutional review boards of the participating universities: Beirut Arab University (2023-H-0153-HS-R-0545), Qatar University (QU-IRB 1995-E/23), and Alfaisal University (IRB-20270). Informed consent was obtained from all participants online, ensuring their confidentiality and the right to withdraw from the study without any consequences. Participants were informed that all collected data would be anonymous and confidential, with only the principal investigator having access to the data. Completing and submitting the survey was considered an agreement to participate.

Study design
This study utilized a quantitative cross-sectional research design, involving healthcare professionals (pharmacists, nurses, medical laboratory technologists), university academics, and clinicians from Lebanon, Qatar, and Saudi Arabia.
Data was collected through a voluntary, anonymous, private survey to gather PGx per...

# Integrating pharmacogenomics in three Middle Eastern countries’ healthcare (Lebanon, Qatar, and Saudi Arabia)

Description of the data set:
o 1 dataset is included: PGx_database, which contains the raw data of our paper.
o In the data set, each row represents one participant.
o All the variables can contain empty cells. When participants didn't answer, empty cells were added to show the missing data.
o The number in each cell has a specific value depending on the variable.
Listed variables:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
!!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html) as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!

Version 8 release notes: Adds 2019 data.

Version 7 release notes: Changes release notes description; does not change data.

Version 6 release notes: Adds 2018 data.

Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.

Version 4 release notes: Adds data for 2017. Adds rows for agencies that submitted a zero-report (i.e. the agency reported no hate crimes in the year). This is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian'), which I made consistent. Made the 'population' column, which is the total population in that agency.

Version 3 release notes: Adds data for 2016. Order rows by year (descending) and ORI.

Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.

The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open.

Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).

The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), made all character values lower case, and reordered columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
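The derived columns described above (the "unique_id" column and the incident month, weekday, and month-day variables) could be built along the following lines. This is a hedged sketch, not the author's code, which was written in R: the function names, the underscore separator, the output formats, and the example ORI9 and incident number are all assumptions for illustration.

```python
from datetime import date

MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday")


def make_unique_id(year, ori9, incident_number, sep="_"):
    """Combine year, ORI9 and incident number into a single identifier.

    The separator is an assumption; values are lower-cased to match the
    note that all character values were made lower case.
    """
    return sep.join([str(year), str(ori9).lower(), str(incident_number).lower()])


def date_parts(incident_date):
    """Derive month, weekday and month-day values from an incident date."""
    return {
        "month": MONTHS[incident_date.month - 1],
        "weekday": WEEKDAYS[incident_date.weekday()],  # Monday is 0
        "month_day": f"{incident_date.month:02d}-{incident_date.day:02d}",
    }


# Hypothetical incident; the ORI9 and incident number are placeholders.
print(make_unique_id(2019, "XX0000000", "A123"))  # 2019_xx0000000_a123
print(date_parts(date(2019, 7, 4)))
# {'month': 'july', 'weekday': 'thursday', 'month_day': '07-04'}
```

Explicit name tuples are used instead of locale-dependent date formatting so that the derived values are the same on every machine.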