Heading_data.RData provides animal orientation for each individual and event across the study. Data were collected using magnetometers and accelerometers attached to goats. Full methods for transforming these data into animal orientation can be found in the Supplemental Material for the manuscript.
Loc_data.RData provides GPS fixes for each individual at each timestep (1 second) for the events of interest across the study.
Heading data contains the following column names: "event" "id" "time" "heading"
"event" refers to the identified collective decision
"id" to the individual goat
"time" is the number of seconds since January 1st 1970 (standard notation)
"heading" refers to the animal orientation with respect to north
Loc data contains the additional column names: "lat" "lon"
These refer to the latitude and longitude of the individual goat as read from its GPS collar.
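As a rough illustration of how the "time" and "heading" columns might be interpreted once the tables are exported from R, the Python sketch below converts epoch seconds to a UTC timestamp and a heading to a unit vector. It assumes headings are given in degrees measured clockwise from north, which the description above does not state; the function names and sample values are hypothetical.

```python
import math
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Convert seconds since 1970-01-01 (Unix time) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def heading_to_unit_vector(heading_deg):
    """Return (east, north) components of a heading.

    Assumption: heading is in degrees, clockwise from north.
    """
    theta = math.radians(heading_deg)
    return (math.sin(theta), math.cos(theta))

# Hypothetical values, for illustration only.
print(epoch_to_utc(0).isoformat())
print(heading_to_unit_vector(90.0))
```

If the headings turn out to be in radians or measured counter-clockwise, only `heading_to_unit_vector` would need to change.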
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data from surveys conducted as part of PLOS' project on incentivising best practice for data sharing. Surveys were run to assess the impact of two solutions that were being tested: the integration of the Dryad repository with the PLOS Pathogens submission system, and the possible addition of an Accessible Data icon to articles in any PLOS journal using the Dryad, Figshare or OSF repositories. Submitting authors were emailed a survey shortly after completing their submission. This dataset contains the following files:
1) S1_DryadIntegration_Public.xlsx: Results from the survey sent to PLOS Pathogens submitting authors about the Dryad integration.
2) Dryad Integration Survey Instrument.pdf: Survey questions sent to PLOS Pathogens submitting authors.
3) S2_AccessibleDataLinks_Public.xlsx: Results from the survey sent to submitting authors at PLOS Biology, PLOS Computational Biology, PLOS Genetics, PLOS Medicine, PLOS Neglected Tropical Diseases, PLOS ONE and PLOS Pathogens about the Accessible Data feature.
4) Accessible Data Survey Instrument.pdf: Survey questions sent to PLOS Biology, PLOS Computational Biology, PLOS Genetics, PLOS Medicine, PLOS Neglected Tropical Diseases, PLOS ONE and PLOS Pathogens submitting authors.
Note that PLOS Pathogens authors were asked to complete one survey, which comprised both the Dryad integration and Accessible Data questions. Free-text answers have been removed from the survey data for anonymisation purposes. The question on country has also been adjusted to show the author's region.
https://doi.org/10.5061/dryad.bnzs7h4hj
This dataset contains the processed RDS object used to generate the figures in the manuscript, as well as the metadata, raw gene counts and cell locations in csv format.
'AKI_Ctrl_object.rds': This file contains the .Rds object, which has been processed with Seurat as described in the manuscript.
International, curated, digital repository that makes the data underlying scientific publications discoverable, freely reusable, and citable, particularly data for which no specialized repository exists. Provides the infrastructure for, and promotes the re-use of, data underlying the scholarly literature. Governed by a nonprofit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations. Most data are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Used to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. The UC system is a member organization of the Dryad general subject data repository.
The Dryad Digital Repository is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes. Dryad welcomes data submissions related to published, or accepted, scholarly publications. Dryad's objectives are to serve as a repository for tables, spreadsheets, and all other kinds of data that do not have another discipline-specific repository, and to enable scientists to: validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, perform synthetic studies, and utilize data for educational purposes. Dryad is governed by a nonprofit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations. Publishers are encouraged to facilitate data archiving by coordinating the submission of manuscripts with submission of data to Dryad. Dryad originated from an initiative among a group of leading journals and scientific societies in evolutionary biology and ecology to adopt a joint data archiving policy (JDAP) for their publications, and the recognition that easy-to-use, sustainable, community-governed data infrastructure was needed to support such a policy.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Dryad is a general-purpose curated repository for data underlying scholarly publications. Dryad's metadata framework is supported by a Dublin Core Application Profile (DCAP, hereafter referred to as application profile). This paper examines the evolution of Dryad's application profile, which has been revised over time, in an operational system, serving day-to-day needs of stakeholders. We model the relationships between data packages and data files over time, from the profile's initial implementation in 2007 to its current practice, version 3.2, and present a crosswalk analysis. Results covering versions 1.0 to 3.0 show an increase in the number of metadata elements used to describe Dryad's data objects. Results also confirm that Version 3.0, which envisioned separate metadata element sets for data package, data files, and publication metadata, was never fully realized due to constraints in Dryad system architecture. Version 3.1 subsequently reduced the number of metadata elements captured by recombining the publication and data package element sets. This paper documents a real-world application profile implemented in an operational system, noting practical system and infrastructure constraints. Finally, the analysis presented informs an ongoing effort to update the application profile to support Dryad's diverse and expanding community of stakeholders.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Graphs and data for ten journals sharing data in the Dryad digital repository.
This data set consists of Microsoft Excel (.xlsx) and comma-delimited CSV files for the seven supplementary data files published as PDFs with our article.
Supplemental Data File 1: Citations for 308 included papers from 2019
Supplemental Data File 2: Citations for 2214 excluded papers from 2019
Supplemental Data File 3: Journal category analysis
Supplemental Data File 4: Counts for the 855 unique author keywords
Supplemental Data File 5: Frequency data for the top 100 words in the word cloud generated from titles
Supplemental Data File 6: Frequency data for the top 100 words in the word cloud generated from abstracts
Supplemental Data File 7: Counts of unique MeSH terms.
Any program that can open comma-delimited files, such as Microsoft Excel or other spreadsheet programs, can be used. The secondary copies in .xlsx format require Microsoft Excel.
The first several rows of each data table contain the title and description and definitions for that table, so please open the fi...
Identifiers of many kinds are the key to creating unambiguous and persistent connections between research objects and other items in the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but many existing resources submitted in the past are missing these identifiers, thus missing the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here. Identifiers for papers (DOIs) connected...
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.
2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.
Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in a separate comma (.csv) or tab (.tsv) delimited file. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness.
These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.
This readme.txt file was generated on 20231110 by Ted Habermann
Data For: Sustainable Connectivity in a Community Repository
Principal Investigator Contact Information
Name: Ted Habermann (0000-0003-3585-6733)
Institution: Metadata Game Changers
Email:
ORCID: 0000-0003-3585-6733
November 10, 2023
May and June 2023
National Science Foundation (Crossref Funder ID: 100000001) Award 2134956.
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This paper reports on a study exploring 'metadata capital' acquired via metadata reuse. Collaborative modeling and content analysis methods were used to study metadata capital in the Dryad data repository. A sample of 20 cases for two Dryad metadata workflows (Case A and Case B) consisting of 100 instantiations (60 metadata objects, 40 metadata activities) was analyzed. Results indicate that Dryad's overall workflow builds metadata capital, with the total metadata reuse at 50% or greater for 8 of 12 metadata properties, and 5 of these 8 properties showing reuse at 80% or higher. Metadata reuse is frequent for basic bibliographic properties (e.g., author, title, subject), although it is limited or absent for more complex scientific properties (e.g., taxon, spatial, and temporal information). This paper provides background context, reports the research approach and findings, and considers research implications and system design priorities that may contribute to metadata capital—long term.
https://spdx.org/licenses/CC0-1.0.html
Pregnancy and delivery involve dynamic alterations in many physiological systems. However, the physiological dynamics during pregnancy and after delivery have not been systematically analyzed at high temporal resolution in a large human population. Here we present the dynamics of 76 lab tests based on a cross-sectional analysis of roughly 41 million measurements from over 300,000 pregnancies. We analyzed each test at weekly intervals from 20 weeks preconception to 80 weeks postpartum, providing detailed temporal profiles. About half of the tests take three months to a year to return to baseline during postpartum, highlighting the physiologic load of childbirth. The precision of the data revealed the effects of preconception supplements, overshoots after delivery, and intricate temporal responses to changes in blood volume and renal filtration rate. Pregnancy complications – gestational diabetes, pre-eclampsia, and postpartum hemorrhage – showed distinct dynamical changes. These results provide a comprehensive dynamic portrait of the systems physiology of pregnancy.
Methods
Study Population
The study population consisted of individuals from the Clalit healthcare database, Israel's largest health maintenance organization (HMO). We considered all pregnancies of females aged 20 to 35 between 2003 and 2020. Information about pregnancies before 2003 is not available. We estimated the fraction of first pregnancies for the years 2010-2020 to reduce the influence of first pregnancies before 2003 which we cannot account for. For more information, see “stats.csv”.
Data Collection
Medical records were pseudonymized by hashing of personal identifiers and randomization of dates by a random number of weeks uniformly sampled between 0 and 13 weeks for each patient and adding it to all dates in the patient diagnoses, laboratory, and medication records. This randomization does not affect timing relative to delivery.
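The date randomization described above can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: it assumes the offset is a whole number of weeks with both endpoints (0 and 13) included, and the record layout, patient id and dates are hypothetical. It shows why a single per-patient offset leaves the interval between any test and the delivery unchanged.

```python
import random
from datetime import date, timedelta

def shift_patient_dates(records, rng):
    """Shift every date of each patient by one per-patient random offset.

    records: dict mapping a patient id to a list of dates.
    Assumption: the offset is a whole number of weeks drawn uniformly
    from 0..13, endpoints included.
    """
    shifted = {}
    for patient_id, dates in records.items():
        offset = timedelta(weeks=rng.randint(0, 13))
        shifted[patient_id] = [d + offset for d in dates]
    return shifted

# Hypothetical patient with a test date followed by a delivery date.
records = {"p1": [date(2015, 3, 1), date(2015, 6, 14)]}
shifted = shift_patient_dates(records, random.Random(0))

gap_before = records["p1"][1] - records["p1"][0]
gap_after = shifted["p1"][1] - shifted["p1"][0]
print(gap_before == gap_after)  # prints True
```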
We examined the timeframe of 60 weeks before delivery to 80 weeks after delivery for all documented labors within our study population. Week 0 denotes the week of delivery. We identified deliveries by ICD9 code V27 and confirmed a childbirth record for the individual. We excluded preterm deliveries (≤37 gestational weeks, ICD9 code 644), stillbirths, and labors with more than one newborn. Nonetheless, 12% of deliveries were at ≤37 gestational weeks and were missing the 644 code. To mitigate ascertainment bias of the test results, for each test, we removed data from individuals with chronic disease that affected the test if the onset of the disease was up to 6 months after the test. We also removed data from individuals who purchased drugs that affected the tests in the 6 months before the tests. Chronic diseases are defined as non-pediatric ICD9 codes with a Kaplan-Meier survival drop of >10% over 5 years and are assigned above a minimal average drop of 1/3 per year. Drugs that affect a test were defined as drugs with a significant effect on the test (false discovery rate < 0.01). This step allowed us to focus on a relatively healthy subset of the pregnant population, reducing the confounding effects associated with the specific health conditions listed above or with medication usage. To exclude the potential effect of follow-up pregnancies in the 80 weeks following delivery, we excluded lab values from individuals with another delivery within 40 weeks following the measurement. For each pregnancy, we gathered all available test values including standard blood count, kidney and liver function tests, blood coagulation tests, lipid panel, inflammation markers, and hormones. We then discretized test values into time points relative to the time of birth in weekly intervals for each test.
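The weekly discretization relative to delivery could look like the sketch below. The exact binning convention is not given in the description, so this assumes 7-day bins with floor rounding (the 7 days starting on the delivery date form week 0, the 7 days before it form week -1); the function names and dates are hypothetical.

```python
from datetime import date

def week_relative_to_delivery(test_date, delivery_date):
    """Weekly bin index of a test relative to delivery (week 0 = week of delivery).

    Assumption: 7-day bins with floor rounding, so negative values are
    weeks before delivery and positive values are weeks after it.
    """
    return (test_date - delivery_date).days // 7

def in_study_window(week, first_week=-60, last_week=80):
    """True if the week lies within 60 weeks before to 80 weeks after delivery."""
    return first_week <= week <= last_week

delivery = date(2015, 6, 14)  # hypothetical delivery date
print(week_relative_to_delivery(date(2015, 6, 20), delivery))  # 6 days after -> 0
print(week_relative_to_delivery(date(2015, 6, 13), delivery))  # 1 day before -> -1
```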
In addition to test values, we also extracted data on patients including age (at measurement, mean, and interquartile range) and BMI (the most proximal BMI measurement in medical records outside pregnancy, mean and interquartile range, if available).
Privacy concerns
Retrospective test results were aggregated and only statistical information was kept. Our ethical agreement with Clalit does not require informed consent for the publication of this aggregated data. Weekly intervals with a single measurement per test were removed. Mean values were kept for weekly intervals (per test) with 10 measurements or less and other values (percentiles, standard deviation) were removed, ensuring individual measurements cannot be interpreted from the aggregated data.
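The aggregation rule in the privacy paragraph can be expressed as a small function. This is a sketch under stated assumptions: which percentiles were published is not specified, so quartiles are used here as a stand-in, and the returned dictionary layout (including the count "n") is hypothetical.

```python
import statistics

def summarize_bin(values, min_count_for_spread=11):
    """Aggregate one weekly bin of one test following the stated privacy rule.

    - a bin with a single measurement is dropped (returns None)
    - a bin with 10 measurements or fewer keeps only the mean
    - larger bins also keep the standard deviation and quartiles
      (quartiles are an assumption; the source only says "percentiles")
    """
    n = len(values)
    if n <= 1:
        return None
    summary = {"n": n, "mean": statistics.fmean(values)}
    if n >= min_count_for_spread:
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary.update(
            {"sd": statistics.stdev(values), "q1": q1, "median": median, "q3": q3}
        )
    return summary

print(summarize_bin([5.0]))            # None: single measurement removed
print(summarize_bin([1.0, 2.0, 3.0]))  # mean only
```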
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
It was recently proposed that long-term population studies be exempted from the expectation that authors publicly archive the primary data underlying published articles. Such studies are valuable to many areas of ecological and evolutionary biological research, and multiple risks to their viability were anticipated as a result of public data archiving (PDA), ultimately all stemming from independent reuse of archived data. However, empirical assessment was missing, making it difficult to determine whether such fears are realistic. I addressed this by surveying data packages from long-term population studies archived in the Dryad Digital Repository. I found no evidence that PDA results in reuse of data by independent parties, suggesting the purported costs of PDA for long-term population studies have been overstated.
Tweets database, part 1 of 5 (rawdata1): Postgres database of tweets (backup created by pg_dump in pgAdmin). Partitioned into 5 smaller files with 'split' command. To assemble use "cat rawdata[1-5] > combined_rawdata.backup"
Tweets database, part 2 of 5 (rawdata2): Postgres database of tweets (backup created by pg_dump in pgAdmin). Partitioned into 5 smaller files with 'split' command. To assemble use "cat rawdata[1-5] > combined_rawdata.backup"
Tweets database, part 3 of 5 (rawdata3): Postgres database of tweets (backup created by pg_dump in pgAdmin). Partitioned into 5 smaller files with 'split' command. To assemble use "cat rawdata[1-5] > combined_rawdata.backup"
Tweets database, part 4 of 5 (rawdata4): Postgres database of tweets (backup created by pg_dump in pgAdmin). Partitioned into 5 smaller files with 'split' command. To assemble use "cat rawdata[1-5] > combined_rawdata.backup"
Tweets database, part 5 of 5: Postgres database of tweets (backup created by pg_dump in pgAdmin). Partitioned into...
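For readers without a Unix shell, the reassembly performed by the `cat` command quoted above can be approximated in Python. This is a minimal sketch: the helper names are hypothetical, the demonstration uses small stand-in files rather than the real backup pieces, and it assumes the five parts are named rawdata1 to rawdata5 as the shell glob implies.

```python
import shutil
import tempfile
from pathlib import Path

def assemble_parts(part_paths, output_path):
    """Concatenate the part files, in the order given, into one output file."""
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)

def numbered_parts(directory, count=5):
    """Paths rawdata1 ... rawdata<count> in numeric order, mirroring rawdata[1-5]."""
    return [Path(directory) / f"rawdata{i}" for i in range(1, count + 1)]

# Demonstration on small stand-in files (not the real backup pieces).
with tempfile.TemporaryDirectory() as tmp:
    for i, path in enumerate(numbered_parts(tmp), start=1):
        path.write_bytes(f"chunk{i};".encode())
    target = Path(tmp) / "combined_rawdata.backup"
    assemble_parts(numbered_parts(tmp), target)
    combined = target.read_bytes()

print(combined)
```

With the real files in the current directory, the equivalent call would be `assemble_parts(numbered_parts("."), "combined_rawdata.backup")`.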
https://spdx.org/licenses/CC0-1.0.html
For clinical assay validations, well-characterized samples are essential for assessing methodology sensitivity and specificity. To support the community in the development of clinical next-generation sequencing assays for Mycobacterium tuberculosis, we released a comprehensive dataset of 50 whole genome sequences from characterized strains, complete with drug susceptibility and mutation profiles.
Citation location results (citation_locations.txt): Details of the articles analyzed to determine the location of original data identifiers within the article. Only includes results for articles that were Open Access and full-text indexed in EPMC.
Citation location script (citation_locations.py): This Python script imports lists of Dryad metadata and uses them to search Europe PMC and to locate and classify data references appearing in the literature.
The .csv files can be readily accessed with a wide range of open-source and proprietary software.
Global Wood Density Database (GlobalWoodDensityDatabase.xls): Please direct all correspondence to G. Lopez-Gonzalez.
The development of high-fidelity mechanical property prediction models for the design of polycrystalline materials relies on large volumes of microstructural feature data. Concurrently, at these same scales, the deformation fields that develop during mechanical loading can be highly heterogeneous. Spatially correlated measurements of 3D microstructure and the ensuing deformation fields at the micro-scale would provide highly valuable insight into the relationship between microstructure and macroscopic mechanical response. They would also provide direct validation for numerical simulations that can guide and speed up the design of new materials and microstructures. However, to date, such data have been rare. Here, a one-of-a-kind, multi-modal dataset is presented that combines recent state-of-the-art experimental developments in 3D tomography and high-resolution deformation field measurements.
People routinely hear and understand speech at rates of 120–200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words’ meanings in an approximately time-locked fashion. However, in the context of continuous speech, electrophysiological evidence for such time-locked processing has been lacking. Whilst valuable insights into the semantic processing of speech have been provided by the “N400 component” of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, and may not accurately reflect natural, narrative speech comprehension. Building on the discovery that cortical activity “tracks” the dynamics of running speech [7-9], and psycholinguistic work both demonstrating [10-12] and modeling [13-15] how context rapidly impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech compr...
The DRIAMS dataset is a resource intended for antimicrobial resistance prediction from real-world clinical routine MALDI-TOF mass spectra. It comprises four subdatasets collected at different medical institutions across Switzerland. For each site, the data consist of MALDI-TOF mass spectra in the form of .txt files and a meta-data file. (i) The meta-data, incl. species and antimicrobial resistance corresponding to each spectrum, is part of the "id" folder. (ii) The remaining folders store the MALDI-TOF mass spectra in various stages of preprocessing: "raw" all spectra as extracted from the MALDI-TOF MS instrument, "preprocessed" all spectra after the application of an established preprocessing pipeline and "binned_6000" all spectra after the application of an established preprocessing pipeline and binning along the mass-to-charge-ratio axis with a bin size of 3 Da, resulting in 6000 feature bins. For details on the dataset extraction, quality control, preprocessing an...
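The binning step behind the "binned_6000" folder can be illustrated with a short sketch. Only the bin size (3 Da) and the number of bins (6000) come from the description; the lower bound of 2000 Da, the choice to sum intensities within a bin, and the function name are assumptions made here for illustration and may differ from the published pipeline.

```python
def bin_spectrum(mz_values, intensities, mz_min=2000.0, bin_size=3.0, n_bins=6000):
    """Sum intensities into fixed-width bins along the mass-to-charge axis.

    Assumptions: bins start at mz_min (2000 Da is illustrative), intensities
    within a bin are summed, and peaks outside the binned range are ignored.
    """
    binned = [0.0] * n_bins
    for mz, intensity in zip(mz_values, intensities):
        index = int((mz - mz_min) // bin_size)
        if 0 <= index < n_bins:
            binned[index] += intensity
    return binned

# Hypothetical peaks: two fall into the first bin, one lies above the range.
features = bin_spectrum([2000.0, 2002.9, 2003.0, 20000.0], [1.0, 2.0, 4.0, 16.0])
print(len(features), features[0], features[1])
```

With 6000 bins of 3 Da each, the binned range spans 18,000 Da regardless of where it starts.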