CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
While stakeholders in scholarly communication generally agree on the importance of data citation, there is no consensus on where those citations should be placed within the publication – particularly when the publication is citing original data. Recently, CrossRef and the Digital Curation Center (DCC) have recommended as a best practice that original data citations appear in the works cited section of the article. In some fields, such as the life sciences, this contrasts with the common practice of only listing data identifier(s) within the article body (intratextually). We asked whether data citation practice has been changing in light of the guidance from CrossRef and the DCC. We examined data citation practices from 2011 to 2014 in a corpus of 1,125 articles associated with original data in the Dryad Digital Repository. The percentage of articles that include no reference to the original data has declined each year, from 31% in 2011 to 15% in 2014. The percentage of articles that include data identifiers intratextually has grown from 69% to 83%, while the percentage that cite data in the works cited section has grown from 5% to 8%. If the proportions continue to grow at the current rate of 19-20% annually, the proportion of articles with data citations in the works cited section will not exceed 90% until 2030.
Collected in this dataset are the slide set and abstract for the presentation "Toward a Reproducible Research Data Repository", given by the depositar team at the International Symposium on Data Science 2023 (DSWS 2023), hosted by the Science Council of Japan in Tokyo on December 13-15, 2023. The conference was organized by the Joint Support-Center for Data Science Research (DS), Research Organization of Information and Systems (ROIS) and the Committee of International Collaborations on Data Science, Science Council of Japan. The conference programme is also included as a reference.
Toward a Reproducible Research Data Repository
Cheng-Jen Lee, Chia-Hsun Ally Wang, Ming-Syuan Ho, and Tyng-Ruey Chuang
Institute of Information Science, Academia Sinica, Taiwan
The depositar (https://data.depositar.io/) is a research data repository at Academia Sinica (Taiwan) open to researchers worldwide for the deposit, discovery, and reuse of datasets. The depositar software itself is open source and builds on top of CKAN. CKAN, an open source project initiated by the Open Knowledge Foundation and sustained by an active user community, is a leading data management system for building data hubs and portals. In addition to CKAN's out-of-the-box features such as a JSON data API and in-browser preview of uploaded data, we have added several features to the depositar, including dataset keywords sourced from Wikidata, a citation snippet for datasets, in-browser Shapefile preview, and a persistent identifier system based on ARK (Archival Resource Keys). At the same time, the depositar team faces an increasing demand for interactive computing (e.g. Jupyter Notebook), which facilitates not just data analysis but also the replication and demonstration of scientific studies. Recently, we have provided a JupyterHub service (a multi-tenancy JupyterLab) to some of the depositar's users. However, users must still first download the data files (or copy the URLs of the files) from the depositar, then upload the data files (or paste the URLs) to the Jupyter notebooks for analysis. Furthermore, a JupyterHub deployed on a single server is limited by its processing power, which may lower the service level for users. To address these issues, we are integrating BinderHub into the depositar. BinderHub (https://binderhub.readthedocs.io/) is a Kubernetes-based service that allows users to create interactive computing environments from code repositories. Once the integration is completed, users will be able to launch Jupyter Notebooks to perform data analysis and visualization without leaving the depositar by clicking the BinderHub buttons on the datasets.
In this presentation, we will first briefly introduce the depositar and BinderHub and how they relate, then share our experiences in incorporating interactive computation in a data repository. We shall also evaluate the possibility of integrating the depositar with other automation frameworks (e.g. the Snakemake workflow management system) in order to enable users to reproduce data analyses.
BinderHub, CKAN, Data Repositories, Interactive Computing, Reproducible Research
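The abstract above mentions CKAN's JSON data API as the route by which file URLs currently have to be copied into a notebook. The following is a minimal sketch of that lookup, assuming the depositar exposes the standard CKAN Action API route (`api/3/action/package_show`) at its base URL; this is not confirmed by the abstract. The dataset name `example-dataset` and the sample response are hypothetical, and no network request is made.

```python
import json
from urllib.parse import urlencode, urljoin

# Base URL taken from the abstract. The API path is the standard CKAN
# Action API route and is assumed (not confirmed) to be enabled on this site.
BASE_URL = "https://data.depositar.io/"


def package_show_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """Build the CKAN Action API URL that describes one dataset."""
    query = urlencode({"id": dataset_id})
    return urljoin(base_url, "api/3/action/package_show") + "?" + query


def resource_urls(response: dict) -> list:
    """Extract file URLs from a decoded package_show response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    resources = response["result"].get("resources", [])
    return [r["url"] for r in resources if r.get("url")]


# Hypothetical response, trimmed to the fields used above.
SAMPLE = json.loads("""
{"success": true,
 "result": {"name": "example-dataset",
            "resources": [{"name": "points", "format": "CSV",
                           "url": "https://example.org/points.csv"}]}}
""")

if __name__ == "__main__":
    print(package_show_url("example-dataset"))
    print(resource_urls(SAMPLE))
```

In a notebook, the URLs returned by `resource_urls` would be what a user pastes in by hand today; the BinderHub integration described above is meant to remove that step.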
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data to complement the quantitative analysis of data citation practices in digital repositories based on metadata records from the re3data.org repositories registry.
Data were retrieved using the re3data.org API on 23-02-2023 and 06-03-2023 and processed using the OpenRefine software.
Part of "A FAIR-enabling citation model for Cultural Heritage Objects" project activities.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of repositories evaluated in this study.
This document contains brief descriptions of many of the treatments found in the PTSD Repository, organized by treatment category. Note: The download is a .zip file which contains the PDF Reference Guide.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Text patterns considered as PDB URLs.
Project portal for publishing, citing, sharing and discovering research data. Software, protocols, and community connections for creating research data repositories that automate professional archival practices, guarantee long-term preservation, and enable researchers to share, retain control of, and receive web visibility and formal academic citations for their data contributions. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive appropriate credit. Hosts multiple dataverses. Each dataverse contains studies or collections of studies, and each study contains cataloging information that describes the data plus the actual data files and complementary files. Data related to the social sciences, health, medicine, humanities or other sciences with an emphasis on human behavior are uploaded to the IQSS Dataverse Network (Harvard). You can create your own dataverse for free and start adding studies for your data files and complementary material (documents, software, etc.). You may also install your own Dataverse Network for your university or organization.
No description is available. Visit https://dataone.org/datasets/urn%3Auuid%3Afc37b5f2-f69b-497e-bb85-048a19e9950a for complete metadata about this dataset.
https://spdx.org/licenses/CC0-1.0.html
Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Only citations for education. Includes analysis.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Machine-readable metadata available from dataset landing pages facilitate data citation by enabling easy integration with reference managers and other tools used in a data citation workflow. Embedding these metadata using the schema.org vocabulary with JSON-LD is emerging as the community standard. This dataset is a listing of data repositories that have implemented this approach or are in the process of doing so.
This is the first version of this dataset and was generated via community consultation. We expect to update this dataset, as an increasing number of data repositories adopt this approach, and we hope to see this information added to registries of data repositories such as re3data and FAIRsharing.
In addition to the listing of data repositories, we provide information on the schema.org properties supported by these data repositories, focusing on the required and recommended properties from the "Data Citation Roadmap for Scholarly Data Repositories".
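What embedding schema.org metadata as JSON-LD on a landing page amounts to can be sketched as follows. This is a minimal illustration, not any repository's actual output: the property selection is illustrative rather than the roadmap's required or recommended list, and the helper names and all values shown are hypothetical.

```python
import json


def dataset_jsonld(name, identifier, creators, publisher, date_published, url):
    """Return a schema.org Dataset description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "identifier": identifier,
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "publisher": {"@type": "Organization", "name": publisher},
        "datePublished": date_published,
        "url": url,
    }


def as_script_tag(metadata: dict) -> str:
    """Wrap the metadata in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    record = dataset_jsonld(
        name="Example dataset",
        identifier="https://doi.org/10.1234/example",
        creators=["Jane Doe"],
        publisher="Example Repository",
        date_published="2020-01-01",
        url="https://example.org/datasets/1",
    )
    print(as_script_tag(record))
```

A reference manager or crawler can then read the citation fields directly from the landing page's HTML without scraping the visible text.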
This dataset describes how datasets published in the research data repository RADAR are referenced, combining references extracted from Google Scholar, DataCite Event Data and the Data Citation Corpus.
DOIs assigned to RADAR datasets were retrieved from the RADAR API on 2025-01-27. References in the three data sources were then identified using these DOIs. Each research output referencing a RADAR dataset was accessed to determine where the reference occurred in the full text. Author names and publication dates for datasets and referencing objects were added from OpenAlex and DataCite on 2025-02-10. Author names of datasets and referencing objects were compared to determine whether data reuse occurred.
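The author-name comparison described above could be sketched as below. This is an assumption about how such a comparison might work, not the procedure the dataset's creators used: the normalization rules, the function names, and the reading that "no shared author" indicates reuse by others are all hypothetical.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, and sort the name tokens,
    so that 'Doe, Jane' and 'Jane Doe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def shares_author(dataset_authors, citing_authors) -> bool:
    """True if at least one normalized name appears in both author lists."""
    a = {normalize(n) for n in dataset_authors}
    b = {normalize(n) for n in citing_authors}
    return bool(a & b)


def label_reference(dataset_authors, citing_authors) -> str:
    """Assumed labeling: a shared author means the creators cite their own data."""
    if shares_author(dataset_authors, citing_authors):
        return "reference by dataset authors"
    return "possible reuse by others"


if __name__ == "__main__":
    print(label_reference(["Müller, Anna"], ["Anna Müller", "B. Lee"]))
    print(label_reference(["Jane Doe"], ["John Roe"]))
```

Exact matching on normalized strings misses initials and name variants, so a real comparison would likely need persistent author identifiers or manual checking.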
Current strategies for primary data management aimed at effective data publication, retrieval, sharing and citation require enhanced platforms. One driving force is recent developments in biotechnology, which are accompanied by strong growth in scientific primary data. For example, “Next-Generation-Sequencing” and “Plant-Phenotyping” technologies produce huge amounts of primary data. The analysis and publication of these data is one pillar of modern life science research. Consequently, the responsible use and efficient availability of digital resources is an important factor in today's “e-science” age. The Java-based e!DAL-API is a comprehensive storage backend for primary data management. It provides the main features for the long-term preservation of scientific primary data and has been designed and tested using experience from several research projects and literature studies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains 17,324 records of scientific documents (Article, Conference Paper, Data Paper, Review, Book Chapter, Note, Book, Letter, Editorial, Short Survey, Erratum, Retracted) extracted from the Scopus database, covering the period from 2016 to 2023, whose reference lists cite the Figshare repository.
This is the FAO Fishery and Aquaculture Reference Data repository: Codes and reference data for fishing gear, species, currencies, commodities, countries and others.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author information for:
Hyperprolific physics authors by year, 2000-2022 inclusive. An author needs to have 73 or more publications in a year to qualify, and their main field from 2000 to 2022 is Physics & Astronomy.
Hyperprolific non-physics authors by year, 2000-2022 inclusive. An author needs to have 73 or more publications in a year to qualify, and their main field from 2000 to 2022 must not be Physics & Astronomy.
Almost hyperprolific physics authors by year, 2000-2022 inclusive. An author needs to have between 61 and 72 publications in a year to qualify, and their main field from 2000 to 2022 is Physics & Astronomy.
Almost hyperprolific non-physics authors by year, 2000-2022 inclusive. An author needs to have between 61 and 72 publications in a year to qualify, and their main field from 2000 to 2022 must not be Physics & Astronomy.
Data:
year - The year of publication of the articles, reviews, and conference proceedings analyzed in this study
author_field - The primary field assigned to the author based upon publications and share of publications in the field
field - The ScienceMetrix field classification of the journal the publication appears in
Pubs_in_field_year - Total number of publications by the author in the given year in the given field
Pubs_year_total - Total number of publications by the author in the given year
Pubs_in_field_2000_2022 - Total number of publications by the author in the period 2000 to 2022 in the given field
Pubs_2000_2022_total - Total number of publications by the author in the period 2000 to 2022
rank - The author's rank, within their SM subfield, based on citation count
rank (ns) - The author's rank, within their SM subfield, based on citation count with self citations excluded
sm-subfield-1 - The author's SM subfield
rank sm-subfield-1 - The author's rank, within their SM subfield, based upon their citation score
rank sm-subfield-1 (ns) - The author's rank, within their SM subfield, based upon their citation score excluding self citations
sm-subfield-1 count - The number of authors in the SM subfield
Subfield_Percentile - 100 * rank/count
Subfield_Percentile_ns - 100 * rank (ns)/count
author_name - Author's name from their Scopus Author Profile
hidx - The author's career H-Index
affil_name - The name of the author's most recent affiliation
cntry - The country of the author's most recent affiliation
Total_Pubs - The number of articles, reviews, and conference proceedings from the affiliation in the period
Pubs_with_more_than_100_authors - The number of articles, reviews, and conference proceedings from the affiliation in the period with more than 100 authors
Pubs_with_more_than_500_authors - The number of articles, reviews, and conference proceedings from the affiliation in the period with more than 500 authors
Pubs_with_more_than_1000_authors - The number of articles, reviews, and conference proceedings from the affiliation in the period with more than 1000 authors
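The qualification thresholds and the percentile formula given in the description above can be expressed as a short sketch. The function names are hypothetical, and the field label "Clinical Medicine" in the usage lines is only an illustrative non-physics value; the thresholds (73 or more, 61 to 72) and the formula 100 * rank/count come from the description itself.

```python
from typing import Optional


def classify(pubs_year_total: int, author_field: str) -> Optional[str]:
    """Assign an author-year to one of the four groups, or None if it
    falls below the 'almost hyperprolific' threshold of 61 publications."""
    group = "physics" if author_field == "Physics & Astronomy" else "non-physics"
    if pubs_year_total >= 73:
        return "hyperprolific " + group
    if 61 <= pubs_year_total <= 72:
        return "almost hyperprolific " + group
    return None


def subfield_percentile(rank: int, count: int) -> float:
    """Subfield_Percentile as defined in the description: 100 * rank / count."""
    if count <= 0:
        raise ValueError("count must be positive")
    return 100 * rank / count


if __name__ == "__main__":
    print(classify(73, "Physics & Astronomy"))
    print(classify(72, "Clinical Medicine"))  # illustrative field label
    print(subfield_percentile(5, 200))
```

With this formula a lower percentile means a higher-ranked author, since rank 1 is the top of the subfield.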
This collection contains open and publicly funded data sets created by Brown University faculty and student researchers. Increasingly, publishers and funders require that protocols, data sets, metadata, and code underlying published research be retained and preserved, that their locations be cited within publications, and that they be shared with other researchers and the public. The deposits here endeavor to be in line with the FAIR Principles (Findable, Accessible, Interoperable, Reusable). If you would like to deposit a data set into this collection for the purposes of citation or linking within a publication and public dissemination, please log in, zip up and upload your file, and request a digital object identifier (DOI) for your data citation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains a list of relevant references on value of information (VOI) in RIS format. VOI provides a quantitative analysis to evaluate the outcome of the combined technologies (seismology, hydrology, geodesy) used to monitor Brady's Geothermal Field.
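For readers unfamiliar with RIS, the following is a minimal sketch of reading such a reference file, assuming the common RIS layout of a two-character tag, two spaces, a hyphen, and the value, with each record opened by `TY` and closed by `ER`. The sample record is hypothetical and is not taken from the file described above; continuation lines and vendor-specific variations are not handled.

```python
import re
from collections import defaultdict

# Two-character tag, two spaces, hyphen, optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> list:
    """Parse RIS text into a list of {tag: [values]} dicts, one per record."""
    records = []
    current = None
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue  # blank or unrecognized line
        tag, value = match.groups()
        if tag == "TY":
            current = defaultdict(list)
            current["TY"].append(value)
        elif tag == "ER":
            if current is not None:
                records.append(dict(current))
            current = None
        elif current is not None:
            current[tag].append(value)
    return records


# Hypothetical record for illustration only.
SAMPLE = "\n".join([
    "TY  - JOUR",
    "AU  - Doe, Jane",
    "AU  - Roe, John",
    "TI  - A hypothetical article on value of information",
    "PY  - 2020",
    "ER  - ",
])

if __name__ == "__main__":
    for record in parse_ris(SAMPLE):
        print(record["TI"][0], record["AU"])
```

Repeated tags such as `AU` are kept as lists so that multi-author references survive the round trip into a reference manager.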
The DDL maintains data on articles referencing the DDL since it was formally established in 2014. Details include article citations, DDL site or data asset citations, and data asset availability statements, in addition to codes indicating whether specific data assets are referenced and whether data is referenced in a citation, which may indicate data reuse. This data asset is updated quarterly.
The Environmental Data Initiative (EDI) is a trustworthy, stable data repository and data management support organization for the environmental scientist. EDI provides tools and support that allow the environmental researcher to easily integrate data publishing into the research workflow. Almost ten years since going into production, these data and code were used to provide a general description of EDI’s collection of data and its data management philosophy and placement in the repository landscape. They show how comprehensive metadata and the repository infrastructure lead to highly findable, accessible, interoperable, and reusable (FAIR) data by evaluating compliance with specific community proposed FAIR criteria. Finally, they provide measures and patterns of data (re)use, assuring that EDI is fulfilling its stated premise.