CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains supplementary data and R scripts to generate figures for the paper 'The time efficiency gain in sharing and reuse of research data'. This version contains new R scripts to generate the figures of the revised manuscript. Abstract: Among the frequently stated benefits of sharing research data are time efficiency or increased productivity. The assumption is that reuse or secondary use of research data saves researchers time because they do not have to produce data for a publication themselves. This can make science more efficient and productive. However, if there is no reuse, the time spent making data available for reuse yields no return on this investment. In this paper a mathematical model is used to calculate the break-even point between time spent sharing in a scientific community and time gained by reuse. This is done for several scenarios, from simple to complex datasets to share and reuse, and at different sharing rates. The results indicate that sharing research data can indeed yield an efficiency gain for the scientific community. However, this is not a given in all modeled scenarios. The most efficient scientific community is one that has few sharing researchers, a high reuse rate, and low time investments for sharing and reuse. This suggests it would be beneficial to critically select the datasets that are worth the effort to prepare for reuse in other scientific studies. In addition, stimulating reuse of datasets would in itself be beneficial for increasing efficiency in scientific communities.
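The break-even idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: it assumes a hypothetical community in which every shared dataset costs `share_hours` to prepare, every reuse costs `reuse_hours`, and every reuse saves the `produce_hours` it would have taken to produce the data from scratch. All function names, parameter names, and values are illustrative assumptions.

```python
def net_time_gain(n_shared, reuses_per_dataset, share_hours, reuse_hours, produce_hours):
    """Net hours saved by the community under the simplified, assumed model."""
    # Total time the community spends preparing datasets for sharing.
    cost = n_shared * share_hours
    # Each reuse saves the production time minus the effort of reusing.
    gain = n_shared * reuses_per_dataset * (produce_hours - reuse_hours)
    return gain - cost


def break_even_reuse_rate(share_hours, reuse_hours, produce_hours):
    """Reuses per shared dataset at which time gained equals time spent sharing."""
    saving_per_reuse = produce_hours - reuse_hours
    if saving_per_reuse <= 0:
        raise ValueError("reuse must save time for a break-even point to exist")
    return share_hours / saving_per_reuse


if __name__ == "__main__":
    # Illustrative numbers only: 10 h to share, 5 h to reuse, 25 h to produce.
    rate = break_even_reuse_rate(share_hours=10, reuse_hours=5, produce_hours=25)
    print("break-even reuses per shared dataset:", rate)
    print("net gain at that rate:", net_time_gain(100, rate, 10, 5, 25))
```

In this simplified setting the break-even rate does not depend on how many datasets are shared, because both cost and gain scale with `n_shared`. The paper's own scenarios vary sharing rates and dataset complexity in ways this sketch does not capture.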
Full edition for scientific use. As part of a study on factors influencing researcher data reuse and the mechanisms by which these factors are activated, the research team conducted semi-structured oral interviews with a purposive sample of 24 data reusers and intermediaries. This dataset includes de-identified transcripts of 21 of the interviews, as well as written follow-up responses from 8 of the study participants.
The DDL maintains data on articles referencing the DDL since it was formally established in 2014. Details include article citations, DDL site or data asset citations, and data asset availability statements, in addition to codes indicating whether specific data assets are referenced and whether data is referenced in a citation, which may indicate data reuse. This data asset is updated quarterly.
The incorporation of data sharing into the research lifecycle is an important part of modern scholarly debate. In this study, the DataONE Usability and Assessment working group addresses two primary goals: To examine the current state of data sharing and reuse perceptions and practices among research scientists as they compare to the 2009/2010 baseline study, and to examine differences in practices and perceptions across age groups, geographic regions, and subject disciplines. We distributed surveys to a multinational sample of scientific researchers at two different time periods (October 2009 to July 2010 and October 2013 to March 2014) to observe current states of data sharing and to see what, if any, changes have occurred in the past 3–4 years. We also looked at differences across age, geographic, and discipline-based groups as they currently exist in the 2013/2014 survey. Results point to increased acceptance of and willingness to engage in data sharing, as well as an increase in ac...
PubMed Central reuse of GEO datasets deposited in 2007: This is the raw data behind the analysis. It contains one row for every mention of a 2007 GEO dataset in PubMed Central. Each row identifies the mentioned GEO dataset, the PubMed Central article that mentions the dataset's accession number, whether the authors of the dataset and the attributing article overlap, and whether this is considered an instance of third-party data reuse. File: PMC_reuse_of_2007_GEO_datasets.csv
Aggregate Table Data: Aggregate table data behind the figures and results in the README associated with the main dataset. Includes baseline metrics used for extrapolating PubMed Central (PMC) results to PubMed, the number of mentions of a 2007 GEO dataset by authors who submitted the dataset, and the number of mentions of a dataset by authors who did not submit the dataset across 2007-2010. File: tables.csv
Funding agencies are reluctant to support data archiving, even though large research funders such as the National Science Foundation (NSF) and the National Institutes of Health acknowledge its importance for scientific progress. Our quantitative estimates of data reuse indicate that ongoing financial investment in data-archiving infrastructure provides a high scientific return.
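As a rough illustration of how a one-row-per-mention table like this could be summarized, the sketch below counts third-party reuse mentions per dataset. The column names (`geo_accession`, `pmc_id`, `author_overlap`, `third_party_reuse`), the 0/1 coding, and the sample rows are all hypothetical placeholders; the actual header of PMC_reuse_of_2007_GEO_datasets.csv may differ and should be checked first.

```python
import csv
import io
from collections import Counter

# Invented sample rows with assumed column names; not taken from the real file.
SAMPLE = """geo_accession,pmc_id,author_overlap,third_party_reuse
GSE0001,PMC100,1,0
GSE0001,PMC101,0,1
GSE0002,PMC102,0,1
GSE0002,PMC103,0,0
"""


def count_third_party_reuse(csv_text):
    """Count rows flagged as third-party reuse, grouped by dataset accession."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes the reuse flag is coded as the string "1" for true.
        if row["third_party_reuse"] == "1":
            counts[row["geo_accession"]] += 1
    return counts


if __name__ == "__main__":
    for accession, n in sorted(count_third_party_reuse(SAMPLE).items()):
        print(accession, n)
```

To run this against the real file, the text would be read from disk and the column names adjusted to match its header.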
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Data reuse.
This is the dataset for the research article "Enteric pathogen treatment requirements for non-potable water reuse despite limited exposure data". It contains log reduction results for various exposure conditions as described in the text. This dataset is associated with the following publication: Schoen, M., M. Jahne, and J. Garland. A risk-based evaluation of onsite, non-potable reuse systems developed in compliance with conventional water quality measures. JOURNAL OF WATER AND HEALTH. IWA Publishing, London, UK, 18(3): 331-344, (2020).
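For readers unfamiliar with the term, a log reduction is conventionally the base-10 logarithm of the ratio between the pathogen concentration before and after treatment. The sketch below shows that standard definition only; the function names and example concentrations are illustrative assumptions and are not values from this dataset.

```python
import math


def log_reduction(influent, effluent):
    """Log10 reduction value for concentrations before and after treatment."""
    if influent <= 0 or effluent <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(influent / effluent)


def surviving_fraction(lrv):
    """Fraction of organisms remaining after a given log reduction."""
    return 10 ** (-lrv)


if __name__ == "__main__":
    # Illustrative concentrations only (organisms per litre).
    lrv = log_reduction(influent=1e6, effluent=1e2)
    print("log reduction:", lrv)
    print("surviving fraction:", surviving_fraction(lrv))
```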
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from existing multidisciplinary data repositories, DataCite and Dryad, and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
https://spdx.org/licenses/CC0-1.0.html
Background: With data becoming a centerpiece of modern scientific discovery, data sharing by scientists is now a crucial element of scientific progress. This article aims to provide an in-depth examination of the practices and perceptions of data management, including data storage, data sharing, and data use and reuse by scientists around the world. Methods: The Usability and Assessment Working Group of DataONE, an NSF-funded environmental cyberinfrastructure project, distributed a survey to a multinational and multidisciplinary sample of scientific researchers in a two-wave approach in 2017-2018. We focused our analysis on examining the differences across age groups, sub-disciplines of science, and sectors of employment. Findings: Most respondents displayed what we describe as high and moderate risk data practices by storing their data on their personal computer, departmental servers or USB drives. Respondents appeared to be satisfied with short-term storage solutions; however, only half of them are satisfied with available mechanisms for storing data beyond the life of the process. Data sharing and data reuse were viewed positively: over 85% of respondents admitted they would be willing to share their data with others and said they would use data collected by others if it could be easily accessed. A vast majority of respondents felt that the lack of access to data generated by other researchers or institutions was a major impediment to progress in science at large, yet only about half thought that it restricted their own ability to answer scientific questions. Although attitudes towards data sharing and data use and reuse are mostly positive, practice does not always support data storage, sharing, and future reuse. Assistance through data managers or data librarians, readily available data repositories for both long-term and short-term storage, and educational programs that raise awareness and help engender good data practices are clearly needed.
The dataset includes the inventories, impact assessments, and cost analyses of different scenarios at different scales (building and district) and with different water sources being reused (mixed wastewater, graywater). It also includes inventories for thermal recovery and vertical flow wetlands. This dataset is associated with the following publication: Morelli, B., S. Cashman, X. Ma, J. Garland, D. Bless, and M. Jahne. Life Cycle Assessment and Cost Analysis of Distributed Mixed Wastewater and Graywater Treatment for Water Recycling in the Context of an Urban Case Study. U.S. Environmental Protection Agency, Washington, DC, USA, 2019.
This dataset compares the de facto reuse percentage modeled for the 22 surface water sites sampled in Phase II of the drinking water project with the organic chemical data generated as part of the project. This dataset is associated with the following publication: Nguyen, T., P. Westerhoff, E. Furlong, D. Kolpin, A. Batt, H. Mash, K. Schenck, J.S. Boone, J. Rice, and S. Glassmeyer. Modeled De Facto Reuse and Contaminants of Emerging Concern in Drinking Water Source Waters. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 110(4): E2-E18, (2018).
The presented model and the data are part of the master thesis “Integrating Reuse in MaTrace Models: An implementation and evaluation” by Raphael Elbing, submitted in September 2022 to obtain the Master of Science in Industrial Ecology. Part of the code is explained in the file “model_application_manual.html”. The composition of the data is explained in the thesis itself. The zipped folders need to be unzipped to run the code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
CN: Waste Water Treatment & Reuse: Total Profit: ytd data was reported at 3.929 RMB bn in Oct 2015. This records an increase from the previous number of 3.421 RMB bn for Sep 2015. CN: Waste Water Treatment & Reuse: Total Profit: ytd data is updated monthly, averaging 0.498 RMB bn (median) from Dec 2003 to Oct 2015, with 97 observations. The data reached an all-time high of 4.478 RMB bn in Dec 2014 and a record low of -0.090 RMB bn in Nov 2007. CN: Waste Water Treatment & Reuse: Total Profit: ytd data remains in active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Utility Sector – Table CN.RCA: Financial Data: Water Production and Supply: Waste Water Treatment and Reuse.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Everyday clinical care generates vast amounts of digital data. A broad range of actors are interested in reusing these data for various purposes. Such reuse of health data could support medical research, healthcare planning, and technological innovation, and lead to increased financial revenue. Yet, reuse also raises questions about what data subjects think about the use of health data for different purposes. Based on a survey with 1071 respondents conducted in 2021 in Denmark, this article explores attitudes to health data reuse. Denmark is renowned for its advanced integration of data infrastructures, facilitating data reuse. This is therefore a relevant setting from which to explore public attitudes to reuse, both as authorities around the globe are currently working to facilitate data reuse opportunities, and in the light of the recent agreement on the establishment in 2024 of the European Health Data Space (EHDS) within the European Union (EU). Our study suggests that there are certain forms of health data reuse—namely transnational data sharing, commercial involvement, and use of data as national economic assets—which risk undermining public support for health data reuse. However, these three controversial purposes are among those that the EHDS is supposed to facilitate. Failure to address these public concerns could well challenge the long-term legitimacy and sustainability of the data infrastructures currently under construction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
CN: Waste Water Treatment & Reuse: YoY: Industrial Sales Value: Delivery Value for Export data was reported at -100.000 % in Aug 2011. This stayed constant from the previous number of -100.000 % for Jul 2011. CN: Waste Water Treatment & Reuse: YoY: Industrial Sales Value: Delivery Value for Export data is updated monthly, averaging -100.000 % (median) from Feb 2009 to Aug 2011, with 29 observations. The data reached an all-time high of 22.730 % in Mar 2010 and a record low of -100.000 % in Aug 2011. CN: Waste Water Treatment & Reuse: YoY: Industrial Sales Value: Delivery Value for Export data remains in active status in CEIC and is reported by National Bureau of Statistics. The data is categorized under China Premium Database’s Utility Sector – Table CN.RCA: Financial Data: Water Production and Supply: Waste Water Treatment and Reuse.
An Excel file containing spreadsheets that align with each of the tables in the paper. This dataset is associated with the following publication: Jahne, M. A., M. E. Schoen, J. L. Garland, S. P. Nappier, and J. A. Soller. Microbial Treatment Targets for Potable and Non-Potable Water Reuse – A Comprehensive Update and Harmonization. Environmental Science & Technology Letters. American Chemical Society, Washington, DC, USA, 11(11): 1136-1259, (2024).
This dataset contains information related to the apps developed by community members and using data from this portal. Detailed information about community apps can be found on our Community Built Apps page.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Date: Survey of 2018
https://spdx.org/licenses/CC0-1.0.html
This paper reports on a study exploring ‘metadata capital’ acquired via metadata reuse. Collaborative modeling and content analysis methods were used to study metadata capital in the Dryad data repository. A sample of 20 cases for two Dryad metadata workflows (Case A and Case B) consisting of 100 instantiations (60 metadata objects, 40 metadata activities) was analyzed. Results indicate that Dryad’s overall workflow builds metadata capital, with the total metadata reuse at 50% or greater for 8 of 12 metadata properties, and 5 of these 8 properties showing reuse at 80% or higher. Metadata reuse is frequent for basic bibliographic properties (e.g., author, title, subject), although it is limited or absent for more complex scientific properties (e.g., taxon, spatial, and temporal information). This paper provides background context, reports the research approach and findings, and considers research implications and system design priorities that may contribute to metadata capital—long term.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
As public availability of gene expression profiling data increases, it is natural to ask how these data can be used by neuroscientists. Here we review the public availability of high-throughput expression data in neuroscience and how it has been reused, and tools that have been developed to facilitate reuse. There is increasing interest in making expression data reuse a routine part of the neuroscience tool-kit, but there are a number of challenges. Data must become more readily available in public databases; efforts to encourage investigators to make data available are important, as is education on the benefits of public data release. Once released, data must be better-annotated. Techniques and tools for data reuse are also in need of improvement. Integration of expression profiling data with neuroscience-specific resources such as anatomical atlases will further increase the value of expression data.