CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
While stakeholders in scholarly communication generally agree on the importance of data citation, there is no consensus on where those citations should be placed within the publication – particularly when the publication is citing original data. Recently, CrossRef and the Digital Curation Center (DCC) have recommended as a best practice that original data citations appear in the works cited sections of the article. In some fields, such as the life sciences, this contrasts with the common practice of only listing data identifier(s) within the article body (intratextually). We inquired whether data citation practice has been changing in light of the guidance from CrossRef and the DCC. We examined data citation practices from 2011 to 2014 in a corpus of 1,125 articles associated with original data in the Dryad Digital Repository. The percentage of articles that include no reference to the original data has declined each year, from 31% in 2011 to 15% in 2014. The percentage of articles that include data identifiers intratextually has grown from 69% to 83%, while the percentage that cite data in the works cited section has grown from 5% to 8%. If the proportions continue to grow at the current rate of 19-20% annually, the proportion of articles with data citations in the works cited section will not exceed 90% until 2030.
International, curated, digital repository that makes the data underlying scientific publications discoverable, freely reusable, and citable, particularly data for which no specialized repository exists. Provides the infrastructure for, and promotes the re-use of, data underlying the scholarly literature. Governed by a nonprofit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations. Most data are associated with peer-reviewed articles, although data associated with non-peer-reviewed publications from reputable academic sources, such as dissertations, are also accepted. Used to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. The UC system is a member organization of the Dryad general-subject data repository.
The Dryad Digital Repository is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes. Dryad welcomes data submissions related to published, or accepted, scholarly publications. Dryad's objectives are to serve as a repository for tables, spreadsheets, and all other kinds of data that do not have another discipline-specific repository, and to enable scientists to: validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, perform synthetic studies, and utilize data for educational purposes. Dryad is governed by a nonprofit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations. Publishers are encouraged to facilitate data archiving by coordinating the submission of manuscripts with submission of data to Dryad. Dryad originated from an initiative among a group of leading journals and scientific societies in evolutionary biology and ecology to adopt a joint data archiving policy (JDAP) for their publications, and the recognition that easy-to-use, sustainable, community-governed data infrastructure was needed to support such a policy.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Graphs and data for ten journals sharing data in the Dryad digital repository.
Can we condition native plants to increase drought tolerance and improve restoration success?
Valliere_etal_EcoApps_Data.xlsx
Identifiers of many kinds are the key to creating unambiguous and persistent connections between research objects and other items in the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but many existing resources submitted in the past are missing these identifiers, thus missing the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here. Identifiers for papers (DOIs) connected...
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.
2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.
Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in separate comma-delimited (.csv) or tab-delimited (.tsv) files. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness.
These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.
# Data For: Sustainable Connectivity in a Community Repository
This readme.txt file was generated on 20231110 by Ted Habermann
Principal Investigator Contact Information Name: Ted Habermann (0000-0003-3585-6733) Institution: Metadata Game Changers ORCID: 0000-0003-3585-6733
November 10, 2023
May and June 2023
National Science Foundation (Crossref Funder ID: 100000001) Award 2134956.
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
Diffantom - pkg 1 of 2: diffantom-bids.zip
Diffantom - pkg 2 of 2: diffantom-bids.z01
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
It was recently proposed that long-term population studies be exempted from the expectation that authors publicly archive the primary data underlying published articles. Such studies are valuable to many areas of ecological and evolutionary biological research, and multiple risks to their viability were anticipated as a result of public data archiving (PDA), ultimately all stemming from independent reuse of archived data. However, empirical assessment was missing, making it difficult to determine whether such fears are realistic. I addressed this by surveying data packages from long-term population studies archived in the Dryad Digital Repository. I found no evidence that PDA results in reuse of data by independent parties, suggesting the purported costs of PDA for long-term population studies have been overstated.
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations th...
https://spdx.org/licenses/CC0-1.0.html
Pregnancy and delivery involve dynamic alterations in many physiological systems. However, the physiological dynamics during pregnancy and after delivery have not been systematically analyzed at high temporal resolution in a large human population. Here we present the dynamics of 76 lab tests based on a cross-sectional analysis of roughly 41 million measurements from over 300,000 pregnancies. We analyzed each test at weekly intervals from 20 weeks preconception to 80 weeks postpartum, providing detailed temporal profiles. About half of the tests take three months to a year to return to baseline during postpartum, highlighting the physiologic load of childbirth. The precision of the data revealed the effects of preconception supplements, overshoots after delivery, and intricate temporal responses to changes in blood volume and renal filtration rate. Pregnancy complications – gestational diabetes, pre-eclampsia, and postpartum hemorrhage – showed distinct dynamical changes. These results provide a comprehensive dynamic portrait of the systems physiology of pregnancy.
Methods
Study Population
The study population consisted of individuals from the Clalit healthcare database, Israel's largest health maintenance organization (HMO). We considered all pregnancies of females aged 20 to 35 between 2003 and 2020. Information about pregnancies before 2003 is not available. We estimated the fraction of first pregnancies for the years 2010-2020 to reduce the influence of first pregnancies before 2003 which we cannot account for. For more information, see “stats.csv”.
Data Collection
Medical records were pseudonymized by hashing of personal identifiers and randomization of dates by a random number of weeks uniformly sampled between 0 and 13 weeks for each patient and adding it to all dates in the patient diagnoses, laboratory, and medication records. This randomization does not affect timing relative to delivery.
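The pseudonymization step described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the salted SHA-256 hashing scheme, and the choice of whole-week offsets with inclusive bounds are all assumptions made for illustration. It shows why adding one offset per patient to every date leaves intervals relative to delivery unchanged.

```python
import hashlib
import random
from datetime import date, timedelta


def hash_identifier(patient_id: str, salt: str) -> str:
    """Replace a personal identifier with a salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256((salt + patient_id).encode("utf-8")).hexdigest()


def shift_dates(dates: dict, rng: random.Random) -> dict:
    """Add a single per-patient offset of 0 to 13 whole weeks to every date."""
    # Inclusive integer bounds are an assumption; the source only says
    # "uniformly sampled between 0 and 13 weeks".
    offset = timedelta(weeks=rng.randint(0, 13))
    return {label: d + offset for label, d in dates.items()}


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
original = {"delivery": date(2015, 6, 1), "lab_test": date(2015, 3, 9)}
shifted = shift_dates(original, rng)

# The gap between the lab test and the delivery is identical before and after.
print(original["delivery"] - original["lab_test"])
print(shifted["delivery"] - shifted["lab_test"])
```

Because the same offset is applied to all of a patient's records, any analysis indexed by weeks from delivery is unaffected, while absolute calendar dates are obscured.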
We examined the timeframe of 60 weeks before delivery to 80 weeks after delivery for all documented labors within our study population. Week 0 denotes the week of delivery. We identified deliveries by ICD9 code V27 and confirmed a childbirth record for the individual. We excluded preterm deliveries (≤37 gestational weeks, ICD9 code 644), stillbirths, and labors with more than one newborn. Nonetheless, 12% of deliveries occurred at ≤37 gestational weeks but were missing the 644 code. To mitigate ascertainment bias of the test results, for each test, we removed data from individuals with chronic disease that affected the test if the onset of the disease was up to 6 months after the test. We also removed data from individuals who purchased drugs that affected the tests in the 6 months before the tests. Chronic diseases are defined as non-pediatric ICD9 codes with a Kaplan–Meier survival drop of >10% over 5 years and are assigned above a minimal average drop of 1/3 per year. Drugs that affect a test were defined as drugs with significant effect on the test (false discovery rate < 0.01). This step allowed us to focus on a relatively healthy subset of the pregnant population, reducing the confounding effects associated with specific health conditions listed above or medication usage. To exclude the potential effect of follow-up pregnancies in the 80 weeks following delivery, we excluded lab values from individuals with another delivery within 40 weeks following the measurement. For each pregnancy, we gathered all available test values including standard blood count, kidney and liver function tests, blood coagulation tests, lipid panel, inflammation markers, and hormones. We then discretized test values into time points relative to the time of birth in weekly intervals for each test.
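The weekly discretization relative to delivery might look like the sketch below. The function names and the floor-division convention (days 0 to 6 after delivery map to week 0, the 7 days before map to week -1) are assumptions; the source states only that week 0 is the week of delivery and that the window runs from 60 weeks before to 80 weeks after.

```python
from datetime import date

WINDOW_START, WINDOW_END = -60, 80  # weeks relative to delivery, from the text


def week_relative_to_delivery(test_date: date, delivery_date: date) -> int:
    """Map a test date to a whole-week index, with week 0 starting at delivery."""
    # Floor division keeps negative offsets in the correct earlier week.
    return (test_date - delivery_date).days // 7


def in_window(week: int) -> bool:
    """Check whether a weekly index falls inside the analyzed timeframe."""
    return WINDOW_START <= week <= WINDOW_END


delivery = date(2015, 6, 1)
for test_date in (date(2015, 5, 31), date(2015, 6, 1), date(2015, 6, 8)):
    week = week_relative_to_delivery(test_date, delivery)
    print(test_date, week, in_window(week))
```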
In addition to test values, we also extracted data on patients including age (at measurement, mean, and interquartile range) and BMI (the most proximal BMI measurement in medical records outside pregnancy, mean and interquartile range, if available).
Privacy concerns
Retrospective test results were aggregated and only statistical information was kept. Our ethical agreement with Clalit does not require informed consent for the publication of this aggregated data. Weekly intervals with a single measurement per test were removed. For weekly intervals (per test) with 10 or fewer measurements, only mean values were kept and other values (percentiles, standard deviation) were removed, ensuring individual measurements cannot be inferred from the aggregated data.
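The suppression rules for small weekly bins can be expressed as a short sketch. The function name, the output dictionary keys, and the use of the sample standard deviation are illustrative assumptions; only the thresholds (a single measurement is dropped; 10 or fewer keep the mean only) come from the text above.

```python
from statistics import mean, stdev


def summarize_week(values, mean_only_max=10):
    """Aggregate one test's values for one weekly interval under the privacy rules."""
    n = len(values)
    if n <= 1:
        # Intervals with a single measurement (or none) are removed entirely.
        return None
    summary = {"mean": mean(values)}
    if n > mean_only_max:
        # Spread statistics are released only for sufficiently large bins.
        summary["sd"] = stdev(values)
    return summary


print(summarize_week([5.0]))            # removed
print(summarize_week([1.0, 2.0, 3.0]))  # mean only
print(summarize_week([float(v) for v in range(11)]))  # mean and sd
```

Percentiles would be gated by the same threshold as the standard deviation; they are omitted here to keep the sketch short.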
The need for a names-based cyber-infrastructure for digital biology is based on the argument that scientific names serve as a standardized metadata system that has been used consistently and near universally for 250 years. As we move towards data-centric biology, name-strings can be called on to discover, index, manage, and analyze accessible digital biodiversity information from multiple sources. Known impediments to the use of scientific names as metadata include synonyms, homonyms, mis-spellings, and the use of other strings as identifiers. We here compare the name-strings in GenBank, Catalogue of Life (CoL), and the Dryad Digital Repository (DRYAD) to assess the effectiveness of the current names-management toolkit developed by Global Names to achieve interoperability among distributed data sources. New tools that have been used here include Parser (to break name-strings into component parts and to promote the use of canonical versions of the names), a modified TaxaMatch fuzzy-match...
Microglia morphology is used as a measure of neuroinflammation and pathology, but different methods to quantify microglia morphology are frequently employed across neuroscience. For reliable inference, it is critical that microglial morphology is accurately quantified and that results can be easily interpreted and compared across studies. We applied five of the most commonly used ImageJ-based methods for quantifying the microglial morphological response to a stimulus to identical photomicrographs and isolated microglial cells, which allowed for direct comparisons of the specificity and reliability of each method.
https://spdx.org/licenses/CC0-1.0.html
Sequencing of environmental samples has great potential for biodiversity research, but its application is limited by the lack of reliable DNA barcode databases for species identifications. Such a database has been created for epiphytic lichens of Europe, allowing us to compare the results of environmental sequencing with standard taxonomic surveys. The species undetected by taxonomic surveys (what we term the ghost component) amount to about half of the species actually present in hectare plots of Central European forests. Some of these, which currently occur only as diaspores or weakly developed thalli, are likely to be favoured in the course of global change. The ghost component usually represents a larger fraction in managed forests than in old growth unmanaged forests. The total species composition of different plots is much more similar than suggested by taxonomic surveys alone. On a regional scale, this supports the well-known statement that "everything is everywhere, but, the environment selects".
https://spdx.org/licenses/CC0-1.0.html
For clinical assay validations, well-characterized samples are essential for assessing methodology sensitivity and specificity. To support the community in the development of clinical next-generation sequencing assays for Mycobacterium tuberculosis, we released a comprehensive dataset of 50 whole genome sequences from characterized strains, complete with drug susceptibility and mutation profiles.
Madagascar is one of the world’s foremost biodiversity hotspots with more than 90% of its species endemic to the island. Malagasy carnivorans are one of only four extant terrestrial mammalian clades endemic to Madagascar. Although there are only eight extant species, these carnivorans exhibit remarkable phenotypic and ecological diversity that is often hypothesized to have diversified through an adaptive radiation. Here, we investigated the evolution of skull diversity in Malagasy carnivorans and tested if they exhibited characteristics of convergence and an adaptive radiation. We found that their skull disparity exceeds that of any other feliform family, as their skulls vary widely and strikingly capture a large amount of the morphological variation found across all feliforms. We also found evidence of shared adaptive zones in cranial shape between euplerid subclades and felids, herpestids, and viverrids. Lastly, contrary to predictions of adaptive radiation, we found that Malagasy car...
# Skull evolution and lineage diversification in endemic Malagasy carnivorans
https://doi.org/10.5061/dryad.x95x69psj
cran_data.Rdata contains raw data from cranial analyses:
mand_data.Rdata contains raw data from mandibular analyses:
Site01-01 Site01-02 Site01-03 Site02-01 Site02-02 Site02-03 Site03-01 Site03-02 Site03-03 Site04-01 Site04-02 Site04-03 Site05-01 Site05-02 Site05-03 Site06-01 Site06-02 Site06-03 Site07-01 Site07-02 Site07-03 Site08-01 Site08-02 Site08-03 Site09-01 Site09-02 Site09-03 Site10-01 Site10-02 Site10-03
Environmental DNA (eDNA) sampling is an increasingly important tool for answering ecological questions and informing aquatic species management. Challenges of using eDNA include determining species source location(s) and accurately and precisely measuring low concentration eDNA samples, especially considering inhibitory compounds and multiple sources of ecological and measurement variability. These challenges must be overcome to optimize our use of modeling frameworks like the eDNA Integrating Transport and Hydrology (eDITH) model. To better understand eDNA fate and transport dynamics, our ability to estimate parameters within the eDITH framework, and our ability to reliably quantify low concentration samples, we developed a hierarchical model and used it to evaluate a fate and transport experiment. Our model addresses several low concentration challenges by modeling the number of copies in each PCR replicate as latent variables with a count distribution and conditioning detection an...
# A Hierarchical Model for eDNA Fate and Transport Dynamics Accommodating Low Concentration Samples
https://doi.org/10.5061/dryad.8gtht76wc
All files used for data analysis and resulting files (posteriors, etc.) are in the "Data Analysis" folder on Zenodo.
All files used for simulation analyses are in the "Simulation" folder on Zenodo, as are all simulation results (posteriors, etc.).
Data Description
The field data are located in greenhollow_techrep 12_4_2.csv on Dryad.
Metadata for greenhollow_techrep 12_4_2.csv is in green hollow metadata.csv on Dryad.
Data Analysis
1. The data are in greenhollow_techrep 12_4_23.csv. See files to fit models for data processing.
2. The nimble model files are "Release NimModel X.R", where X is one of the four models.
3. Custom MCMC functions (inhibitor models only) are in "State Samplers.R".
4. Test scripts to run 1 chain for each model are ...
https://spdx.org/licenses/CC0-1.0.html
Butterflies represent a diverse group of insects, playing key ecosystem roles such as pollination and, in their larval form, herbivory. Despite their importance, comprehensive global distribution data for butterfly species is lacking. This gap has hindered many large-scale questions in ecology, evolutionary biology, and conservation at regional and global scales. Here, I use an integrative workflow that combines occurrence records, alpha hull polygons, species’ dispersal capacity, natural habitat and environmental variables within a framework of species distribution models to generate species-level native distributions for butterflies at a global scale in the contemporary period. The database releases native range maps for 10,372 extant species of butterflies at a spatial grain resolution of 5 arcmin (~10 km). This database has the potential to allow unprecedented large-scale analyses in ecology, biogeography, and conservation of butterflies. The maps are available in the WGS84 coordinate reference system (EPSG:4326 code) and stored as vector polygons in GEOPACKAGE format for maximum compression, allowing easy data manipulation using a standard computer. I additionally provide each species’ spatial raster. All maps and R scripts are open access and available for download at Dryad, and are guided by the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. By making these data available to the scientific community, I aim to advance the sharing of biological data to stimulate more comprehensive research in ecology, biogeography, and conservation of butterflies.
https://spdx.org/licenses/CC0-1.0.html
This study focused on fabricating a cellulose aerogel for oil spill cleanup, using common reed (Phragmites australis) as the cellulose source. The process involved isolating cellulose from reed via traditional Kraft pulping, considering the effects of key factors on the isolated cellulose content. After a two-stage HP bleaching sequence, the highest cellulose content achieved was 27.2%, with 80% ISO brightness and 1% ash content under mild Kraft pulping conditions of 30% sulfidity, 20% active alkali, sustained cooking at 165°C for 3 hours, and a liquor-to-reed ratio of 8:1. Subsequently, reed-based cellulose aerogel was fabricated via a freeze-drying method using an eco-friendly NaOH/PEG aqueous solvent system, which was then modified with methyltrimethoxysilane (MTMS). The resulting aerogel exhibited remarkable characteristics, including a low density of 0.04 g/cm³, high porosity of 96%, high hydrophobicity with a water contact angle (WAC) of 141°, and a superior crude oil adsorption capacity of 35 g/g. Comprehensive characterizations of the fabricated materials, including SEM, FTIR, TGA/DSC, and WAC measurements, were evaluated. This interdisciplinary study explores the commercial promise of reed-based cellulose aerogel as a sustainable solution for oil spill cleanup efforts.
Methods
Conducted experimental chemistry at a scaled-up level, collected data, tested physical and chemical properties, and applied the results to crude oil absorption.
R/RStudio