Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset with results of the poll conducted in the study “Information Scientists’ Motivations for Research Data Sharing and Reuse”.
In terms of the Uses and Gratifications Theory (Questions 1 and 2), the most popular uses relate to the categories of research support and information. Researchers share, or would share, their research data in general for any reusability purposes and especially for combination of different datasets to produce new evidence. Also, the vast majority of study participants associate research data sharing with possibilities to accelerate scientific progress and to increase research efficiency. In the case of research data reuse, all the researchers indicated that they use, or would use, others’ data first of all for inspiration. Interestingly, study participants rank the category of recognition relatively high in the case of sharing, but at the same time they do not associate increased recognition among colleagues and other researchers with research data reuse. The remaining options belonging to the categories of self-esteem and social interaction, i.e. increased citation level and visibility of the research as well as enhanced scientific reputation, possible cooperations and co-authorship, were selected by only a few respondents. Also remarkably, data reuse is more frequently linked to entertainment than data sharing.
In terms of the Self-Determination Theory (Questions 3 and 4), all but one of the interviewees indicated that they have shared or would share their research data because it can accelerate scientific progress, which they consider important and would like to contribute to (i.e., identified regulation). The second most popular motivation turned out to be the obligation by employer, project funder and/or journals (i.e., external regulation). The third most popular option was social influence, i.e. because many other researchers participate in data sharing and they feel obligated to do the same (i.e., external regulation). This way, the participants demonstrate a mixture of identified motivation and external regulation, both material and social. In the case of data reuse, the participants demonstrate more homogeneous results, with identification and intrinsic motivation having most of the votes. The role of external regulation seems to be much less important than in the case of data sharing. So, researchers reuse, or would reuse, research data because it can accelerate scientific progress, which is important for them. Additionally, researchers enjoy exploring and using third party research data. Thus, interviewees participate or would participate in data sharing because they consider it important, but also feel or are obliged to do so. At the same time, study participants do not feel pressure from outside when deciding whether to reuse data or not.
For more information about the study and its results, please read the article “Information Scientists’ Motivations for Research Data Sharing and Reuse” by Shutsko and Stock (2023).
https://spdx.org/licenses/CC0-1.0.html
The incorporation of data sharing into the research lifecycle is an important part of modern scholarly debate. In this study, the DataONE Usability and Assessment working group addresses two primary goals: To examine the current state of data sharing and reuse perceptions and practices among research scientists as they compare to the 2009/2010 baseline study, and to examine differences in practices and perceptions across age groups, geographic regions, and subject disciplines. We distributed surveys to a multinational sample of scientific researchers at two different time periods (October 2009 to July 2010 and October 2013 to March 2014) to observe current states of data sharing and to see what, if any, changes have occurred in the past 3–4 years. We also looked at differences across age, geographic, and discipline-based groups as they currently exist in the 2013/2014 survey. Results point to increased acceptance of and willingness to engage in data sharing, as well as an increase in actual data sharing behaviors. However, there is also increased perceived risk associated with data sharing, and specific barriers to data sharing persist. There are also differences across age groups, with younger respondents feeling more favorably toward data sharing and reuse, yet making less of their data available than older respondents. Geographic differences exist as well, which can in part be understood in terms of collectivist and individualist cultural differences. An examination of subject disciplines shows that the constraints and enablers of data sharing and reuse manifest differently across disciplines. Implications of these findings include the continued need to build infrastructure that promotes data sharing while recognizing the needs of different research communities. Moving into the future, organizations such as DataONE will continue to assess, monitor, educate, and provide the infrastructure necessary to support such complex grand science challenges.
https://www.icpsr.umich.edu/web/ICPSR/studies/37071/terms
This study explores the factors that influence the data reuse behaviors of scientists and identifies the generalized patterns that occur in data reuse across various disciplines. An online survey was distributed to the scientists through Qualtrics. The initial email invitation to the survey was sent to 15,703 scientists within academic institutions on October 5, 2015, with a reminder sent on November 10, 2015. The survey closed on November 30, 2015. 1,987 email messages (12.65%) were returned and a total of 13,716 participants (87.35%) received the email invitation to participate in the survey. This research used the National Science Foundation (NSF) STEM discipline codes (2014) for the respondents to indicate their specific academic disciplines based on their current research activities. Of these participants, 1,528 scientists from 94 specific disciplines (as categorized by NSF STEM discipline codes (2014)), completed the survey with less than 5% of missing values.
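The percentages quoted in this description can be re-derived from the counts it reports. The short Python check below is only an illustrative sketch: the variable names are made up for this example, and the completion rate among delivered invitations is a derived figure that the description itself does not state.

```python
# Counts as reported in the dataset description.
invited = 15_703    # email invitations sent
returned = 1_987    # email messages returned (undelivered)
completed = 1_528   # completed surveys with less than 5% missing values

# Invitations that actually reached a participant.
delivered = invited - returned


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


bounce_rate = pct(returned, invited)         # reported as 12.65%
delivery_rate = pct(delivered, invited)      # reported as 87.35%
completion_rate = pct(completed, delivered)  # derived here, not stated in the description

print(delivered, bounce_rate, delivery_rate, completion_rate)
```

The first two rates reproduce the reported 12.65% and 87.35%; the last line adds the share of delivered invitations that led to a usable response.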
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Keynote presentation by John Burn-Murdoch, Senior Data-Visualisation Journalist at the Financial Times, presented at the Better Science through Better Data event. The video recording and scribes are included.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains supplementary data and R scripts to generate figures for the paper 'The time efficiency gain in sharing and reuse of research data'. This version contains new R scripts to generate Figures of the revised manuscript. Abstract: Among the frequently stated benefits of sharing research data are time efficiency or increased productivity. The assumption is that reuse or secondary use of research data saves researchers time in not having to produce data for a publication themselves. This can make science more efficient and productive. However, if there is no reuse, time costs in making data available for reuse will have been made with no return on this investment. In this paper a mathematical model is used to calculate the break-even point for time spent sharing in a scientific community, versus time gain by reuse. This is done for several scenarios; from simple to complex datasets to share and reuse, and at different sharing rates. The results indicate that sharing research data can indeed cause an efficiency revenue for the scientific community. However, this is not a given in all modeled scenarios. The most efficient scientific community is one that has few sharing researchers, a high reuse rate, and low time investments for sharing and reuse. This suggests it would be beneficial to have a critical selection of datasets that are worth the effort to prepare for reuse in other scientific studies. In addition, stimulating reuse of datasets in itself would be beneficial to increase efficiency in scientific communities.
The aim of this survey was to chart how the universities in Finland have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed. The respondents were professors of human sciences, social sciences and behavioural sciences in Finnish universities, and representatives of some research institutes. Opinions were also queried on the OECD guidelines and principles on open access to research data from public funding. First, the respondents were asked whether there were any guidelines or regulations concerning the depositing of digital research data in their departments, what happened to research data after the completion of the original research, and to what extent the data were reused. Further questions covered how often the data from completed research projects were reused in secondary research projects or for theses. The respondents also estimated what proportion of the data collected in their departments/institutes were reusable at the time of the survey, and why research data were not being reused in their own field of research. Views were also investigated on whether confidentiality or research ethics issues, or problems related to copyright or information technology formed barriers to data reuse. Opinions on the OECD Open Access guidelines on research data were queried. The respondents were asked whether they had earlier knowledge of the guidelines, and to what extent its principles could be implemented in their own disciplines. Some questions pertained to the advantages and disadvantages of open access to research data. The advantages mentioned included reducing duplicate data collection and more effective use of data resources, whereas the disadvantages mentioned included, for example, risks connected to data protection and misuse of data. 
The respondents also suggested ways of implementing the Open Access guidelines and gave their opinions on how binding the recommendations should be, to what extent various bodies should be involved in formulating the guidelines, and how the archiving and dissemination of digital research data should be organised. Finally, the respondents estimated how the researchers in their field would react to enhancing open access to research data, and also gave their opinion on open access to the data they themselves have collected. Background variables included the respondent's gender, university, and research field.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset on data reuse practices amongst clinical researchers
This dataset presents the results from a global survey designed to investigate how individuals involved in research discover and reuse secondary data. The data consist of 1677 complete responses received from individuals in 105 countries. The data are provided in two files: one for researchers and one for those working in research support. The README file provides extensive guidance on using the data files and the associated descriptions of the variables.
https://spdx.org/licenses/CC0-1.0.html
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor "A dataset describing data discovery and reuse practices in research". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for "Initial insight of three modes of data sharing: Prevalence of primary reuse, data integration and dataset release in research articles" is coded as follows:
01 DOI: DOI
02 article number: the accession number in Web of Science
03 article title: title of the articles
04 exclude: if the article was excluded from the sample, assign 1.
05 research_field: the categories of research fields are described in the Appendix (Table S1)
06 target_of_study: the categories of the target of studies are described in the Appendix (Table S1)
29 release_location_nameofpublicarchive: the names of the deposited public archives (comma separated)
The following items, if they occur, are assigned a value of 1:
07 No_datause: The article did not use data
08 primary_reuse: primary reuse
09 primary_data_specificresarchdata: primary reuse of specific research data
10 primary_data_resource: primary reuse of resource
11 primary_source_self: primary reuse from self-constructed data
12 primary_source_citation: primary reuse from citation
13 primary_source_archive: primary reuse from an archive
14 primary_source_others: primary reuse from the other source
15 primary_souce_na: primary reuse source is not available
16 data_integration: data integration
17 integration_type_empirical: data integration as empirical type
18 integration_type_Introductionmaterialresearchmethod: data integration as introduction/material/research methods type
19 integration_type_combinedanalysis: data integration as combined analysis type
20 integration_source_self: data integration from self-constructed data
21 integration_source_citation: data integration from citation
22 integration_source_archive: data integration from an archive
23 integration_source_others: data integration from the other source
24 integration_source_na: data integration source is not available
25 dataset_release: dataset release
26 release_location_publicarchive: dataset deposit to a public archive
27 release_location_supporting: dataset release in Supporting Information
28 release_location_onrequest: dataset release through personal contacts
The appendix includes the following tables:
Table S1. Coding schema for analysis
Table S2. Primary reuse by research field and reused data
Table S3. Primary reuse by target of study and reused data
Table S4. Data integration by research field and reuse type
Table S5. Data integration by target of study and reuse type
Table S6. Dataset release by research field
Table S7. Dataset release by target of study and methods
Table S8. List of names of public data archives for dataset release
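As a rough illustration of how the 0/1 coding scheme above might be used, the following Python sketch tallies three of the flag columns while skipping excluded articles. It assumes the data are available as CSV with the column names listed in the coding scheme; the sample rows, the DOIs, and the helper name `count_flags` are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Three flag columns from the coding scheme; a value of 1 means the item occurs.
FLAG_COLUMNS = ["primary_reuse", "data_integration", "dataset_release"]


def count_flags(rows, flag_columns=FLAG_COLUMNS):
    """Return (number of included articles, per-flag counts of value 1)."""
    counts = {name: 0 for name in flag_columns}
    included = 0
    for row in rows:
        # Articles with exclude == 1 were dropped from the sample.
        if (row.get("exclude") or "").strip() == "1":
            continue
        included += 1
        for name in flag_columns:
            if (row.get(name) or "").strip() == "1":
                counts[name] += 1
    return included, counts


# Hypothetical rows for illustration only; real values come from the dataset file.
SAMPLE_CSV = """DOI,exclude,primary_reuse,data_integration,dataset_release
10.0000/example.1,,1,,1
10.0000/example.2,1,1,1,
10.0000/example.3,,,1,
"""

included, counts = count_flags(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(included, counts)
```

To run this against the real file, the in-memory sample would be replaced by an opened file handle; the actual file name and delimiter are not stated in the description and would need to be checked first.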
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset (anonymized transcripts that were coded based on themes) is from focus groups with scientists (n=25) from five disciplines (atmospheric and earth science, computer science, chemistry, ecology, and neuroscience), where we asked questions about data management to lead into a discussion of what features they think are necessary to include in data repository systems and services to help them implement the data sharing and preservation parts of their data management plans. Participants identified metadata quality control and training as problem areas in data management. Additionally, participants discussed several desired repository features, including: metadata control, data traceability, security, stable infrastructure, and data use restrictions. The dataset was created using MAXQDA and has the .mx20 file extension.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In May-June 2020 PLOS surveyed researchers from Europe and North America to rate tasks associated with data sharing on (i) their importance to researchers and (ii) researchers' satisfaction with their ability to complete those tasks. Researchers were recruited via direct email campaigns, promoted Facebook and Twitter posts, a post on the PLOS Blog, and emails to industry contacts who distributed the survey on our behalf. Participation was incentivized with 3 random prize draws, which were managed separately to maintain anonymity.
This dataset consists of:
1) The survey sent to researchers (pdf).
2) The anonymised data export of survey results (xlsx).
The data export has been processed to retain the anonymity of participants. The comments left in the final question of the survey (question 17) have been removed. Answers to questions 12 to 16 have been recoded to give each answer a numerical value (see 'Scores' tab of spreadsheet). The counts, means, standard deviations and confidence intervals used in the associated manuscript for each factor are given in rows 619-622.
Version 2 contains only the completed responses. Completed responses in the version 2 dataset refer to those who answered all the questions in the survey. The version 1 dataset contains a higher number of responses categorised as 'completed' but this has been reviewed for version 2.
Version 1 data was used for the preprint: https://doi.org/10.31219/osf.io/njr5u.
A survey conducted by SND regarding researchers' attitudes towards open access to and reuse of research data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains survey responses from 3,257 researchers in 20 different disciplines. The survey was conducted in 2020 as part of a PhD research project and explored data generation, sharing and reuse practices across these disciplines.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data were generated for an investigation of research data repository (RDR) mentions in biomedical research articles.
Supplementary Table 1 is a discrete subset of SciCrunch RDRs used to study RDR mentions in biomedical literature. We generated this list by starting with the top 1000 entries in the SciCrunch database, measured by citations, removing entries for organizations (such as universities without a corresponding RDR) or non-relevant tools (such as reference managers), updating links, and consolidating duplicates resulting from RDR mergers and name variations. The resulting list of 737 RDRs is based on a source list of RDRs in the SciCrunch database. The file includes the Research Resource Identifier (RRID), the RDR name, and a link to the RDR record in the SciCrunch database.
Supplementary Table 2 shows the RDRs, associated journals, and article-mention pairs (records) with text snippets extracted from mined Methods text in 2020 PubMed articles. The dataset has 4 components. The first shows the list of repositories with RDR mentions, and includes the Research Resource Identifier (RRID), the RDR name, the number of articles that mention the RDR, and a link to the record in the SciCrunch database. The second shows the list of journals in the study set with at least 1 RDR mention, and includes the Journal ID, name, ESSN/ISSN, the total count of publications in 2020, the number of articles that had text available to mine, the number of article-mention pairs (records), the number of articles with RDR mentions, the number of unique RDRs mentioned, and the percentage of articles with minable text. The third shows the top 200 journals by RDR mention, normalized by the proportion of articles with available text to mine, with the same metadata as the second table. The fourth shows text snippets for each RDR mention, and includes the RRID, RDR name, PubMedID (PMID), DOI, article publication date, journal name, journal ID, ESSN/ISSN, article title, and snippet.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used for the research article "Open research data: a case study into institutional and infrastructural arrangements to stimulate open research data sharing and reuse", published in the Journal of Librarianship & Information Science.
The data entails:
The file contents per item are in principle the same; only the filetype differs.
Full edition for scientific use. As part of a study on factors influencing researcher data reuse and the mechanisms by which these factors are activated, the research team conducted semi-structured oral interviews with a purposive sample of 24 data reusers and intermediaries. This dataset includes de-identified transcripts of 21 of the interviews, as well as written follow-up responses from 8 of the study participants.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Extensive research has taken place over the years to examine the barriers to OER adoption, but few empirical studies have been undertaken to map the amount of OER reuse. The discussion around the actual use of OER, outside the context in which they were developed, remains ongoing. Previous studies have already shown that searching and evaluating resources are barriers to actual reuse. Hence, in this quantitative survey study we explored teachers’ practices with resources in Higher Education Institutes in the Netherlands. The survey had three runs, each in a different context, with a total of 439 respondents. The results show that resources that are hard or time-consuming to develop are most often reused from third parties without adaptations. Resources that need to be more context specific are often created by teachers themselves. To improve our understanding of reuse, follow-up studies must explore reuse with a more qualitative research design in order to explore what these hidden practices of dark reuse look like and how teachers and students benefit from them.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This codebook was used to analyze the interview data (from 11 interviews) in the master thesis project titled "Enhancing Open Research Data Sharing and Reuse via Infrastructural and Institutional Instruments: a Case Study in Epidemiology" which is openly available on TU Delft Education Repository.