https://spdx.org/licenses/CC0-1.0.html
The incorporation of data sharing into the research lifecycle is an important part of modern scholarly debate. In this study, the DataONE Usability and Assessment working group addresses two primary goals: To examine the current state of data sharing and reuse perceptions and practices among research scientists as they compare to the 2009/2010 baseline study, and to examine differences in practices and perceptions across age groups, geographic regions, and subject disciplines. We distributed surveys to a multinational sample of scientific researchers at two different time periods (October 2009 to July 2010 and October 2013 to March 2014) to observe current states of data sharing and to see what, if any, changes have occurred in the past 3–4 years. We also looked at differences across age, geographic, and discipline-based groups as they currently exist in the 2013/2014 survey. Results point to increased acceptance of and willingness to engage in data sharing, as well as an increase in actual data sharing behaviors. However, there is also increased perceived risk associated with data sharing, and specific barriers to data sharing persist. There are also differences across age groups, with younger respondents feeling more favorably toward data sharing and reuse, yet making less of their data available than older respondents. Geographic differences exist as well, which can in part be understood in terms of collectivist and individualist cultural differences. An examination of subject disciplines shows that the constraints and enablers of data sharing and reuse manifest differently across disciplines. Implications of these findings include the continued need to build infrastructure that promotes data sharing while recognizing the needs of different research communities. Moving into the future, organizations such as DataONE will continue to assess, monitor, educate, and provide the infrastructure necessary to support such complex grand science challenges.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset with results of the poll conducted in the study “Information Scientists’ Motivations for Research Data Sharing and Reuse”.
In terms of the Uses and Gratifications Theory (Questions 1 and 2), the most popular uses relate to the categories of research support and information. Researchers share, or would share, their research data for reuse in general, and especially so that different datasets can be combined to produce new evidence. The vast majority of study participants also associate research data sharing with the possibility of accelerating scientific progress and increasing research efficiency. For research data reuse, all the researchers indicated that they use, or would use, others’ data first of all for inspiration. Interestingly, study participants ranked the category of recognition relatively high for sharing, yet they do not associate increased recognition among colleagues and other researchers with research data reuse. The remaining options, which belong to the categories of self-esteem and social interaction (i.e., increased citation level and visibility of the research, enhanced scientific reputation, and possible cooperation and co-authorship), were selected by only a few respondents. Also remarkably, data reuse is more frequently linked to entertainment than data sharing is.
In terms of the Self-Determination Theory (Questions 3 and 4), all but one of the interviewees indicated that they have shared or would share their research data because it can accelerate scientific progress, which they consider important and to which they would like to contribute (i.e., identified regulation). The second most popular motivation turned out to be an obligation imposed by the employer, project funder, and/or journals (i.e., external regulation). The third most popular option was social influence: many other researchers participate in data sharing, and respondents feel obligated to do the same (i.e., external regulation). In this way, the participants demonstrate a mixture of identified motivation and external regulation, both material and social. In the case of data reuse, the results are more homogeneous, with identification and intrinsic motivation receiving most of the votes. The role of external regulation seems to be much less important than in the case of data sharing. Researchers reuse, or would reuse, research data because it can accelerate scientific progress, which is important to them. Additionally, researchers enjoy exploring and using third-party research data. Thus, interviewees participate or would participate in data sharing because they consider it important, but also because they feel or are obliged to do so. At the same time, study participants do not feel outside pressure when deciding whether to reuse data.
For more information about the study and its results, please read the article “Information Scientists’ Motivations for Research Data Sharing and Reuse” by Shutsko and Stock (2023).
https://www.icpsr.umich.edu/web/ICPSR/studies/37071/terms
This study explores the factors that influence the data reuse behaviors of scientists and identifies the generalized patterns that occur in data reuse across various disciplines. An online survey was distributed to the scientists through Qualtrics. The initial email invitation to the survey was sent to 15,703 scientists within academic institutions on October 5, 2015, with a reminder sent on November 10, 2015. The survey closed on November 30, 2015. 1,987 email messages (12.65%) were returned and a total of 13,716 participants (87.35%) received the email invitation to participate in the survey. This research used the National Science Foundation (NSF) STEM discipline codes (2014) for the respondents to indicate their specific academic disciplines based on their current research activities. Of these participants, 1,528 scientists from 94 specific disciplines (as categorized by NSF STEM discipline codes (2014)), completed the survey with less than 5% of missing values.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Keynote presentation by John Burn-Murdoch, Senior Data-Visualisation Journalist at the Financial Times, presented at the Better Science through Better Data event. The video recording and scribes are included.
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset on data reuse practices amongst clinical researchers
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains supplementary data and R scripts to generate figures for the paper 'The time efficiency gain in sharing and reuse of research data'. This version contains new R scripts to generate Figures of the revised manuscript. Abstract: Among the frequently stated benefits of sharing research data are time efficiency or increased productivity. The assumption is that reuse or secondary use of research data saves researchers time in not having to produce data for a publication themselves. This can make science more efficient and productive. However, if there is no reuse, time costs in making data available for reuse will have been made with no return on this investment. In this paper a mathematical model is used to calculate the break-even point for time spent sharing in a scientific community, versus time gain by reuse. This is done for several scenarios; from simple to complex datasets to share and reuse, and at different sharing rates. The results indicate that sharing research data can indeed cause an efficiency revenue for the scientific community. However, this is not a given in all modeled scenarios. The most efficient scientific community is one that has few sharing researchers, a high reuse rate, and low time investments for sharing and reuse. This suggests it would be beneficial to have a critical selection of datasets that are worth the effort to prepare for reuse in other scientific studies. In addition, stimulating reuse of datasets in itself would be beneficial to increase efficiency in scientific communities.
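The break-even idea in the abstract above can be illustrated with a deliberately simplified sketch. This is not the model from the paper or its R scripts: the function names, parameters, and the linear cost structure are all assumptions chosen only to show how a break-even reuse rate can follow from sharing and reuse time costs.

```python
def net_time_gain(n_researchers, sharing_rate, reuse_rate,
                  t_produce, t_share, t_reuse):
    """Net hours saved by a community in one publication cycle.

    Hypothetical toy model (not the paper's): each sharing researcher
    spends t_share hours preparing a dataset, and each reusing
    researcher spends t_reuse hours instead of t_produce hours.
    """
    sharers = n_researchers * sharing_rate
    reusers = n_researchers * reuse_rate
    cost_of_sharing = sharers * t_share
    gain_from_reuse = reusers * (t_produce - t_reuse)
    return gain_from_reuse - cost_of_sharing


def break_even_reuse_rate(sharing_rate, t_produce, t_share, t_reuse):
    """Reuse rate at which the net time gain is exactly zero.

    Returns None when reuse saves no time per dataset, because no
    reuse rate can then recover the cost of sharing.
    """
    saving_per_reuse = t_produce - t_reuse
    if saving_per_reuse <= 0:
        return None
    return sharing_rate * t_share / saving_per_reuse


if __name__ == "__main__":
    # Illustrative numbers only: half the community shares, producing a
    # dataset takes 100 h, sharing takes 10 h, reusing takes 20 h.
    rate = break_even_reuse_rate(0.5, t_produce=100, t_share=10, t_reuse=20)
    print("break-even reuse rate:", rate)
    print("net gain at break-even:",
          net_time_gain(100, 0.5, rate, 100, 10, 20))
```

Under these assumptions the break-even reuse rate falls when fewer researchers share or when sharing and reuse take less time, which matches the direction of the conclusion stated in the abstract.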
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset (anonymized transcripts that were coded based on themes) is from focus groups with scientists (n=25) from five disciplines (atmospheric and earth science, computer science, chemistry, ecology, and neuroscience), where we asked questions about data management to lead into a discussion of what features they think are necessary to include in data repository systems and services to help them implement the data sharing and preservation parts of their data management plans. Participants identified metadata quality control and training as problem areas in data management. Additionally, participants discussed several desired repository features, including metadata control, data traceability, security, stable infrastructure, and data use restrictions. The dataset was created using MAXQDA and has the .mx20 file extension.
The aim of this survey was to chart how the universities in Finland have organised the depositing of digital research data and to what extent the data are reused by the scientific community after the original research has been completed. The respondents were professors of human sciences, social sciences and behavioural sciences in Finnish universities, and representatives of some research institutes. Opinions were also queried on the OECD guidelines and principles on open access to research data from public funding. First, the respondents were asked whether there were any guidelines or regulations concerning the depositing of digital research data in their departments, what happened to research data after the completion of the original research, and to what extent the data were reused. Further questions covered how often the data from completed research projects were reused in secondary research projects or for theses. The respondents also estimated what proportion of the data collected in their departments/institutes were reusable at the time of the survey, and why research data were not being reused in their own field of research. Views were also investigated on whether confidentiality or research ethics issues, or problems related to copyright or information technology formed barriers to data reuse. Opinions on the OECD Open Access guidelines on research data were queried. The respondents were asked whether they had earlier knowledge of the guidelines, and to what extent its principles could be implemented in their own disciplines. Some questions pertained to the advantages and disadvantages of open access to research data. The advantages mentioned included reducing duplicate data collection and more effective use of data resources, whereas the disadvantages mentioned included, for example, risks connected to data protection and misuse of data. 
The respondents also suggested ways of implementing the Open Access guidelines and gave their opinions on how binding the recommendations should be, to what extent various bodies should be involved in formulating the guidelines, and how the archiving and dissemination of digital research data should be organised. Finally, the respondents estimated how the researchers in their field would react to enhancing open access to research data, and also gave their opinion on open access to the data they themselves have collected. Background variables included the respondent's gender, university, and research field.
Full edition for scientific use. As part of a study on factors influencing researcher data reuse and the mechanisms by which these factors are activated, the research team conducted semi-structured oral interviews with a purposive sample of 24 data reusers and intermediaries. This dataset includes de-identified transcripts of 21 of the interviews, as well as written follow-up responses from 8 of the study participants.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains key characteristics about the data described in the Data Descriptor "A dataset describing data discovery and reuse practices in research". Contents:
1. human readable metadata summary table in CSV format
2. machine readable metadata file in JSON format
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains research data and software (re)use indications (formal citations, informal mentions) in scholarly works related to High Energy Physics. 1,411 research data and software (re)use indications were identified by a mix of approaches: use of citation discovery services and multiple search approaches in Google Scholar. The dataset records the approach by which each (re)use indication was found. All identified research data and software (re)use indications were classified according to their purpose, location, and elements.
The data was collected in 2018 for a PhD thesis on research data and software (re)use indications in scholarly works.
Data is currently being used, and reused, in ecological research at unprecedented rates. To ensure appropriate reuse however, we need to ask the question: “Are aggregated databases currently providing the right information to enable effective and unbiased reuse?” We investigate this question, with a focus on designs that purposefully bias the selection of sampling locations (upweighting the probability of selection of some locations). These designs are common and examples are those that have unequal inclusion probabilities or are stratified. We perform a simulation experiment by creating datasets with progressively more bias, and examine the resulting statistical estimates. The effect of ignoring the survey design can be profound, with biases of up to 250% when naive analytical methods are used. The bias is not reduced by adding more data. Fortunately, the bias can be mitigated by using an appropriate estimator or an appropriate model. These are only applicable however, when essential information about the survey design is available: the randomisation structure (e.g. inclusion probabilities or stratification), and/or covariates used in the randomisation process. The results suggest that such information must be stored and served with the data to support inference and reuse. Citation: S.D. Foster, J. Vanhatalo, V.M. Trenkel, T. Schulz, E. Lawrence, R. Przeslawski, and G.R. Hosack. 2021. Effects of ignoring survey design information for data reuse. Ecological Applications 31(6): e02360. 10.1002/eap.2360
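The effect described in the abstract above can be reproduced in miniature. The sketch below is not the authors' simulation: the population, the inclusion probabilities, and the function names are assumptions, and the weighted estimator shown is the standard Hájek (inverse-probability-weighted) mean, used here as one example of "an appropriate estimator" that needs the stored inclusion probabilities.

```python
import random


def naive_mean(values):
    """Unweighted sample mean; ignores the survey design."""
    return sum(values) / len(values)


def hajek_mean(values, inclusion_probs):
    """Hajek estimator: mean weighted by inverse inclusion probability."""
    weights = [1.0 / p for p in inclusion_probs]
    weighted_sum = sum(w * y for w, y in zip(weights, values))
    return weighted_sum / sum(weights)


def simulate(seed=0, n_sites=20000):
    """Sample a toy population with design-induced selection bias.

    Hypothetical setup: half the sites have high values (mean 5) and are
    sampled with probability 0.20; the rest have low values (mean 1)
    and are sampled with probability 0.02.
    Returns (true_mean, naive_estimate, hajek_estimate).
    """
    rng = random.Random(seed)
    population = []
    for i in range(n_sites):
        high = (i % 2 == 0)
        y = rng.gauss(5.0 if high else 1.0, 0.5)
        p = 0.20 if high else 0.02
        population.append((y, p))
    true_mean = sum(y for y, _ in population) / n_sites

    # Poisson sampling: each site enters the sample independently
    # with its own inclusion probability.
    sample = [(y, p) for y, p in population if rng.random() < p]
    ys = [y for y, _ in sample]
    ps = [p for _, p in sample]
    return true_mean, naive_mean(ys), hajek_mean(ys, ps)


if __name__ == "__main__":
    true_mean, naive, weighted = simulate()
    print("true mean:     ", round(true_mean, 3))
    print("naive estimate:", round(naive, 3))
    print("Hajek estimate:", round(weighted, 3))
```

The weighted estimate can only be computed because the inclusion probabilities travel with the sampled values, which is the point the abstract makes about storing design information alongside the data.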
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We designed and organized a one-day workshop in which the following themes were discussed and practiced in the context of FAIR: scientific transparency and reproducibility; how to write a README; data and code licenses; spatial data; programming code; examples of published datasets; data reuse; and discipline and motivation. The intended audience was researchers at the Environmental Science Group of Wageningen University and Research. All workshop materials were designed with further development and reuse in mind and are shared through this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains High Energy Physics related research data and software (re)use indications (formal citations, informal mentions) in scholarly works. All research data and software resources were identified and extracted from INSPIRE-HEP. The (re)use indications were identified by a mix of approaches: use of citation discovery services and multiple search approaches in Google Scholar. All identified research data and software (re)use indications were classified according to their purpose, location, and elements.
The data was collected in 2018 for a PhD thesis on research data and software (re)use indications in scholarly works.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This codebook was used to analyze the interview data (from 11 interviews) in the master's thesis project titled "Enhancing Open Research Data Sharing and Reuse via Infrastructural and Institutional Instruments: a Case Study in Epidemiology", which is openly available in the TU Delft Education Repository.
This dataset presents the results from a global survey designed to investigate how individuals involved in research discover and reuse secondary data. The data consist of 1677 complete responses received from individuals in 105 countries. The data are provided in two files: one for researchers and one for those working in research support. The README file provides extensive guidance on using the data files and the associated descriptions of the variables.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains High Energy Physics related research data and software (re)use indications (formal citations, informal mentions) in scholarly works. All research data and software resources were identified and extracted from Zenodo. The (re)use indications were identified by a mix of approaches: use of citation discovery services and multiple search approaches in Google Scholar. All identified research data and software (re)use indications were classified according to their purpose, location, and elements.
The data was collected in 2018 for a PhD thesis on research data and software (re)use indications in scholarly works.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data were generated for an investigation of research data repository (RDR) mentions in biomedical research articles.
Supplementary Table 1 is a discrete subset of SciCrunch RDRs used to study RDR mentions in biomedical literature. We generated this list by starting with the top 1000 entries in the SciCrunch database, measured by citations, then removing entries for organizations (such as universities without a corresponding RDR) or non-relevant tools (such as reference managers), updating links, and consolidating duplicates resulting from RDR mergers and name variations. The resulting list of 737 RDRs, based on a source list of RDRs in the SciCrunch database, is shown in this table. The file includes the Research Resource Identifier (RRID), the RDR name, and a link to the RDR record in the SciCrunch database.
Supplementary Table 2 shows the RDRs, associated journals, and article-mention pairs (records) with text snippets extracted from mined Methods text in 2020 PubMed articles. The dataset has 4 components. The first shows the list of repositories with RDR mentions, and includes the Research Resource Identifier (RRID), the RDR name, the number of articles that mention the RDR, and a link to the record in the SciCrunch database. The second shows the list of journals in the study set with at least 1 RDR mention, and includes the journal ID, name, ESSN/ISSN, the total count of publications in 2020, the number of articles that had text available to mine, the number of article-mention pairs (records), the number of articles with RDR mentions, the number of unique RDRs mentioned, and the percentage of articles with minable text. The third shows the top 200 journals by RDR mention, normalized by the proportion of articles with available text to mine, with the same metadata as the second table. The fourth shows text snippets for each RDR mention, and includes the RRID, RDR name, PubMed ID (PMID), DOI, article publication date, journal name, journal ID, ESSN/ISSN, article title, and snippet.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used for the research article "Open research data: a case study into institutional and infrastructural arrangements to stimulate open research data sharing and reuse", published in the Journal of Librarianship & Information Science.
The data entails:
The file contents per item are in principle the same; only the filetype differs.