43 datasets found

All of Us - NIH - Registered Tier
dataverse.asu.edu
xlsx
Updated Feb 19, 2025
Cite
ASU Library Research Data Repository (2025). All of Us - NIH - Registered Tier [Dataset]. http://doi.org/10.48349/ASU/5FXHQU
Available download formats: xlsx (1180323)
Unique identifier
https://doi.org/10.48349/ASU/5FXHQU
Dataset updated
Feb 19, 2025
Dataset provided by
ASU Library Research Data Repository
License
https://dataverse.asu.edu/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.48349/ASU/5FXHQU
Dataset funded by
National Institutes of Health (NIH)
Description
The All of Us Research Hub contains a wide variety of data types, including survey responses, measurements, biosamples, electronic health records (EHRs), and data from mobile health devices, from participants who are healthy as well as those experiencing illness. The Registered Tier curated dataset contains individual-level data, available only to approved researchers on the Researcher Workbench. The Registered Tier currently includes data from electronic health records, survey answers, and physical measurements taken at the time of participant enrollment. Only authorized users who have registered with the All of Us Research Program can access the Registered Tier data. Authorized users can also access tools such as the Cohort Builder, Jupyter Notebooks, and Dataset Builder.
Tools for research data curation.
plos.figshare.com
xls
Updated Jun 11, 2023
Cite
Dong Joon Lee; Besiki Stvilia (2023). Tools for research data curation. [Dataset]. http://doi.org/10.1371/journal.pone.0173987.t007
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0173987.t007
Dataset updated
Jun 11, 2023
Dataset provided by
PLOS ONE
Authors
Dong Joon Lee; Besiki Stvilia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Tools for research data curation.
Directory of Public Repositories of Geological Materials
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Directory of Public Repositories of Geological Materials [Dataset]. https://catalog.data.gov/dataset/directory-of-public-repositories-of-geological-materials
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Description
Overview

This directory was developed to provide discovery information for anyone looking for publicly accessible repositories that house geological materials in the U.S. and Canada. In addition, this resource is intended to be a tool to facilitate a community of practice. The need for the directory was identified during planning for and follow-up from a drill core repository webinar series in Spring 2020 for public repository curators and staff in the U.S. and Canada hosted by the Minnesota Geological Survey and the Minnesota Department of Natural Resources. Additional supporting sponsors included the U.S. Geological Survey National Geological and Geophysical Data Preservation Program and the Association of American State Geologists Data Preservation Committee. The 10-part webinar series provided overviews of state, provincial, territorial, and national repositories that house drill core, other geoscience materials, and data.

When the series concluded, a small working group of the participants continued to meet to facilitate the development and production of a directory of repositories that maintain publicly accessible geological materials throughout the U.S. and Canada. The group used previous directory efforts described in the next section, Summary of Historical Repository Directory Compilation Efforts, as guides for content during development. The working group prepared and compiled responses from a call for repository information and characterization. This directory is planned to be a living resource for the geoscience community with updates every other year to accommodate changes. The updates will be facilitated through versioned updates of this data release.

Summary of Historical Repository Directory Compilation Efforts

1957 – Sample and Core Repositories of the United States, Alaska, and Canada. Published by AAPG. Committee on Preservation of Samples and Cores. 13 members from industry, academia, and government.
1977 – Well-Sample and Core Repositories of the United States and Canada, C.K. Fisher; M.P. Krupa, USGS Open-File Report 77-567. USGS wanted to update the original index. Includes a map showing core repositories by “State”, “University”, “Commercial”, and “Federal”. Also includes a “Brief Statement of Requirements for the Preservation of Subsurface Material and Data” and referral to state regulations for details on preserved materials.

1984 – Nonprofit Sample and Core Repositories Open to the Public in the United States – USGS Circular 942. James Schmoker, Thomas Michalski, Patricia Worl. The survey was conducted by a questionnaire mailed to repository curators. Information on additions, corrections, and deletions to earlier (1957, 1977) directories from state geologists, each state office of the Water Resources Division of the U.S. Geological Survey, additional government agencies, and colleagues was also used.

1997 – The National Directory of Geoscience Data Repositories, edited by Nicholas H. Claudy – American Geological Institute. To prepare the directory, questionnaires were mailed to state geologists, more than 60 geological societies, private-sector data centers selected from oil and gas directories, and to the membership committee of the American Association of Petroleum Geologists, one of AGI's member societies. The directory contains 124 repository listings, organized alphabetically by state.

2002 – National Research Council 2002. Geoscience Data and Collections: National Resources in Peril. Washington, D.C.: The National Academies Press.

2005 – The National Geological and Geophysical Data Preservation Program (NGGDPP) of the United States Geological Survey (USGS) was established by The Energy Policy Act of 2005, and reauthorized in the Consolidated Appropriations Act, 2021, “to preserve and expose the Nation’s geoscience collections (samples, logs, maps, data) to promote their discovery and use for research and resource development”.
The Program provides “technical and financial assistance to state geological surveys and U.S. Department of the Interior (DOI) bureaus” to archive “geological, geophysical, and engineering data, maps, photographs, samples, and other physical specimens”. Metadata records describing the preserved assets are cataloged in the National Digital Catalog (NDC).

References

American Association of Petroleum Geologists, 1957, Sample and core repositories of the United States, Alaska, and Canada: American Association of Petroleum Geologists, Committee on Preservation of Samples and Cores, 29 p.
American Association of Petroleum Geologists, 2018, US Geological Sample and Data Repositories: American Association of Petroleum Geologists, Preservation of Geoscience Data Committee, Unpublished (Contact: AAPG Preservation of Geoscience Data Committee).
American Geological Institute, 1997, National Geoscience Data Repository System, Phase II. Final report, January 30, 1995–January 28, 1997. United States. https://doi.org/10.2172/598388
American Geological Institute, 1997, National Directory of Geoscience Data Repositories, Claudy, N. H., (ed.), 91 p.
Claudy N., Stevens D., 1997, AGI Publishes first edition of national directory of geoscience data repositories. American Geological Institute Spotlight, https://www.agiweb.org/news/datarep2.html
Consolidated Appropriations Act, 2021 (Public Law 116-260, Sec. 7002)
Davidson, E. D., Jr., 1981, A look at core and sample libraries: Bureau of Economic Geology, The University of Texas at Austin, 4 p. and Appendix.
Deep Carbon Observatory (DCO) Data Portal, Scientific Collections, https://info.deepcarbon.net/vivo/scientific-collections; Keyword Search: sample repository, https://info.deepcarbon.net/vivo/scientific-collections?source=%7B%22query%22%3A%7B%22query_string%22%3A%7B%22query%22%3A%22sample%20repository%20%22%2C%22default_operator%22%3A%22OR%22%7D%7D%2C%22sort%22%3A%5B%7B%22_score%22%3A%7B%22order%22%3A%22asc%22%7D%7D%5D%2C%22from%22%3A0%2C%22size%22%3A200%7D (accessed September 29, 2021)
Fisher, C. K., and Krupa, M. P., 1977, Well-sample and core repositories of the United States and Canada: U.S. Geological Survey Open-File Report 77-567, 73 p. https://doi.org/10.3133/ofr77567
Fogwill, W.D., 1985, Drill Core Collection and Storage Systems in Canada, Manitoba Energy & Mines. https://www.ngsc-cptgs.com/files/PGJSpecialReport_1985_V03b.pdf
Goff, S., and Heiken, G., eds., 1982, Workshop on core and sample curation for the National Continental Scientific Drilling Program: Los Alamos National Laboratory, May 5-6, 1981, LA-9308-C, 31 p. https://www.osti.gov/servlets/purl/5235532
Lonsdale, J. T., 1953, On the preservation of well samples and cores: Oklahoma City Geological Society Shale Shaker, v. 3, no. 7, p. 4.
National Geological and Geophysical Data Preservation Program. https://www.usgs.gov/core-science-systems/national-geological-and-geophysical-data-preservation-program
National Research Council. 2002. Geoscience Data and Collections: National Resources in Peril. Washington, DC: The National Academies Press, 107 p. https://doi.org/10.17226/10348
Pow, J. R., 1969, Core and sample storage in western Canada: Bulletin of Canadian Petroleum Geology, v. 17, no. 4, p. 362-369. DOI: 10.35767/gscpgbull.17.4.362
Ramdeen, S., 2015. Preservation challenges for geological data at state geological surveys, GeoResJ 6 (2015) 213-220, https://doi.org/10.1016/j.grj.2015.04.002
Schmoker, J. W., Michalski, T. C., and Worl, P. B., 1984, Nonprofit sample and core repositories of the United States: U.S. Geological Survey Circular 942. https://doi.org/10.3133/cir942
Schmoker, J. W., Michalski, T. C., and Worl, P. B., 1984, Addresses, telephone numbers, and brief descriptions of publicly available, nonprofit sample and core repositories of the United States: U.S. Geological Survey Open-File Report 84-333, 13 p. (Superseded by USGS Circular 942) https://doi.org/10.3133/ofr84333
The Energy Policy Act of 2005 (Public Law 109-58, Sec. 351)
The National Digital Catalog (NDC). https://www.usgs.gov/core-science-systems/national-geological-and-geophysical-data-preservation-program/national-digital
U.S. Bureau of Mines, 1978, CORES Operations Manual: Bureau of Mines Core Repository System: U.S. Bureau of Mines Information Circular IC 8784, 118 p. https://digital.library.unt.edu/ark:/67531/metadc170848/
Data from: Evolution of Data Creation, Management, Publication, and Curation...
search.dataone.org
borealisdata.ca
Updated Dec 28, 2023
Cite
Matthews (2023). Evolution of Data Creation, Management, Publication, and Curation in the Research Process [Dataset]. https://search.dataone.org/view/sha256%3Ab867bbb1ca5425f76f0d027cd3981fabc6e1091941de53c232f4e348fa535ae3
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
Matthews
Time period covered
Jan 1, 2014
Description
Data relating to the publication. Sharing research data and scholarship is of national importance because of the increased focus on maximizing return on the U.S. government's investment in research programs. Recent government policy changes have directly affected the management and accessibility of publicly funded research. On January 18, 2011, the National Science Foundation, a U.S. agency that supports research and education in nonmedical fields, required that data management plans be submitted with all grant proposals. On February 22, 2013, the U.S. President's Office of Science and Technology Policy extended a similar requirement for all federal agencies with research and development budgets of more than $100 million. These requirements illustrate the need for further coordination and management of data as scholarship with traditional publications. Purdue University Libraries and its Joint Transportation Research Program (JTRP) collaborated to develop a comprehensive workflow that links technical report production with the management and publication of associated data. This paper illustrates early initiatives to integrate discrete data publications with traditional scholarly publications by leveraging new and existing repository platforms and services. The authors review government policies, past data-sharing practices, early pilot initiatives, and workflow integration between Purdue's data repository, the traditional press, and institutional repository. Through the adoption of these workflows, the authors propose best practices for integrating data publishing and dissemination into the research process. The implementation of this model has the potential to assist researchers in meeting the requirements of federal funding agencies, while reducing redundancy, ensuring integrity, expanding accessibility, and increasing the return on research investment.
Data for: Sustainable connectivity in a community repository
data.niaid.nih.gov
search.dataone.org
zip
Updated Dec 7, 2023
Cite
Ted Habermann (2023). Data for: Sustainable connectivity in a community repository [Dataset]. http://doi.org/10.5061/dryad.nzs7h44xr
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.nzs7h44xr
Dataset updated
Dec 7, 2023
Dataset provided by
Metadata Game Changers (United States)
Authors
Ted Habermann
License
https://spdx.org/licenses/CC0-1.0.html
Description
Identifiers of many kinds are the key to creating unambiguous and persistent connections between research objects and other items in the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but many existing resources submitted in the past are missing these identifiers, thus missing the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here. Identifiers for papers (DOIs) connected to datasets in Dryad have long been a critical part of the Dryad metadata creation and curation processes. Since 2019, the percentage of datasets with connected papers has decreased from 100% to less than 40%. This decrease has significant ramifications for the re-curation efforts described above, as connected papers are an important source of metadata. In addition, missing connections to papers make understanding and re-using datasets more difficult. Connections between datasets and papers are often difficult to make because of time lags between submission and publication, lack of clear mechanisms for citing datasets and other research objects from papers, changing focus of researchers, and other obstacles. The Dryad community of members (i.e., users, research institutions, publishers, and funders) has vested interests in identifying these connections and critical roles in the curation and re-curation efforts.
Their engagement will be critical in building on the successes Dryad has already achieved and ensuring sustainable connectivity in the future.

Methods

These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:

1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.

2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.

Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in separate comma-delimited (.csv) or tab-delimited (.tsv) files. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness. These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.
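The per-parameter statistics listed for the connectivity summaries (counts, unique counts, most frequent values, completeness) can be illustrated with a minimal sketch. It assumes a table in which each row is one dataset and an empty cell means the identifier is missing. The column names (`doi`, `ror`, `funder_id`) and the sample rows are hypothetical and are not taken from the Dryad files, and this is not the code used to produce the published summaries.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real Dryad metadata files may use other columns.
SAMPLE = """doi,ror,funder_id
10.5061/dryad.example1,https://ror.org/example,
10.5061/dryad.example2,,
10.5061/dryad.example3,https://ror.org/example,funder-1
10.5061/dryad.example4,,funder-2
"""

def summarize(rows, column):
    """Return count, unique count, most frequent value, and completeness."""
    values = [(row.get(column) or "").strip() for row in rows]
    present = [v for v in values if v]          # non-empty cells only
    counts = Counter(present)
    most_frequent = counts.most_common(1)[0][0] if counts else None
    return {
        "count": len(present),
        "unique": len(counts),
        "most_frequent": most_frequent,
        # Completeness: share of rows that carry a value in this column.
        "completeness": len(present) / len(values) if values else 0.0,
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for column in ("doi", "ror", "funder_id"):
    print(column, summarize(rows, column))
```

Treating completeness as the share of non-empty cells is one plausible reading of the term; the dataset's own definitions should take precedence where they differ.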
United States Gulf Coast Basin Curated Wells and Logs Database (ver. 2.0,...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). United States Gulf Coast Basin Curated Wells and Logs Database (ver. 2.0, November 2023) [Dataset]. https://catalog.data.gov/dataset/united-states-gulf-coast-basin-curated-wells-and-logs-database
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Survey http://www.usgs.gov/
Area covered
Gulf Coast of the United States
Description
The United States Gulf Coast Basin Curated Wells and Logs Database (CWLDB) is an online repository with stratigraphic information for petroleum wells in the United States portion of the onshore Gulf of Mexico Basin that provides several of the following attributes: a) deep penetrations (generally, total depth of 10,000 feet or more), b) high quality and diverse geophysical well log suites, c) lithostratigraphic logs, d) biostratigraphic units (biozones) and reports, and/or e) core or cuttings samples.
United States LEMIS wildlife trade data curated by EcoHealth Alliance
zenodo.org
data.niaid.nih.gov
csv, zip
Updated Jan 24, 2020
Cite
Evan A. Eskew; Allison M. White; Noam Ross; Kristine M. Smith; Katherine F. Smith; Jon Paul Rodríguez; Carlos Zambrana-Torrelio; William B. Karesh; Peter Daszak (2020). United States LEMIS wildlife trade data curated by EcoHealth Alliance [Dataset]. http://doi.org/10.5281/zenodo.3565869
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.3565869
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Evan A. Eskew; Allison M. White; Noam Ross; Kristine M. Smith; Katherine F. Smith; Jon Paul Rodríguez; Carlos Zambrana-Torrelio; William B. Karesh; Peter Daszak
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Shared here are United States Fish and Wildlife Service (USFWS) Law Enforcement Management Information System (LEMIS) data on wildlife and wildlife product imports into the United States. This data was obtained via Freedom of Information Act (FOIA) requests by EcoHealth Alliance.

Data were curated, cleaned, and made accessible via an R package interface: https://github.com/ecohealthalliance/lemis.

Additionally, a summary of a portion of the data can be found in Smith et al. 2017, EcoHealth (https://doi.org/10.1007/s10393-017-1211-7).

lemis_2000_2014_cleaned.csv: This file represents the compiled, cleaned LEMIS data from 2000-2014. This data is identical to the version 1.1.0 dataset available through the lemis R package.

lemis_codes.csv: Full values for all coded values used in the LEMIS data. Identical to the output from the lemis R package function "lemis_codes()".

lemis_metadata.csv: Data fields and field descriptions for all variables in the LEMIS data. Identical to the output from the lemis R package function "lemis_metadata()".

raw_data.zip: This archive contains all of the raw LEMIS data files that are processed and cleaned with the code contained in the 'data-raw' subdirectory of the lemis R package repository.
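As an illustration of how a lookup table such as lemis_codes.csv can expand coded values in the cleaned data, here is a minimal sketch using only the Python standard library. The column names (`field`, `code`, `value`, `description`), the codes, and the sample rows are hypothetical placeholders; the real files should be inspected (for example via lemis_metadata.csv) before adapting it.

```python
import csv
import io

# Hypothetical excerpts; real column names and codes may differ.
CODES_CSV = """field,code,value
description,LIV,Live specimens
description,SKI,Skins
"""

DATA_CSV = """shipment_id,description,quantity
1,LIV,12
2,SKI,3
3,XXX,1
"""

def load_codes(text):
    """Build a {(field, code): full value} lookup from the codes table."""
    return {
        (row["field"], row["code"]): row["value"]
        for row in csv.DictReader(io.StringIO(text))
    }

def decode(rows, lookup, fields):
    """Yield copies of rows with coded fields replaced by full values.

    Codes that are absent from the lookup are left unchanged.
    """
    for row in rows:
        decoded = dict(row)
        for field in fields:
            decoded[field] = lookup.get((field, row[field]), row[field])
        yield decoded

lookup = load_codes(CODES_CSV)
records = list(decode(csv.DictReader(io.StringIO(DATA_CSV)), lookup, ["description"]))
for record in records:
    print(record)
```

Leaving unknown codes untouched, rather than raising an error, keeps the sketch tolerant of values that the lookup table does not cover.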
Research data activities and their corresponding actions in IRs.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Dong Joon Lee; Besiki Stvilia (2023). Research data activities and their corresponding actions in IRs. [Dataset]. http://doi.org/10.1371/journal.pone.0173987.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0173987.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Dong Joon Lee; Besiki Stvilia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Research data activities and their corresponding actions in IRs.
Users’ data activities in IRs.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Dong Joon Lee; Besiki Stvilia (2023). Users’ data activities in IRs. [Dataset]. http://doi.org/10.1371/journal.pone.0173987.t003
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0173987.t003
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Dong Joon Lee; Besiki Stvilia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Users’ data activities in IRs.
GrainGenes- A Global Data Repository for Small Grains
s.cnmilf.com
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
Cite
Agricultural Research Service (2025). GrainGenes- A Global Data Repository for Small Grains [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/graingenes-the-genome-database-for-small-grain-crops-23ca9
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
GrainGenes is an international, centralized crop database and information portal for peer-reviewed small grains data that serves the small grains research and breeding communities (wheat, barley, oat, and rye). The GrainGenes project ensures long-term data curation, accessibility, and sustainability so that small grains researchers can develop new, more nutritious, disease- and pest-resistant, high-yielding cultivars. As a digital platform, GrainGenes houses peer-reviewed and curated genetic, genomic, and protein data. It has been hard-funded by the U.S. Department of Agriculture-Agricultural Research Service to ensure long-term data sustainability through a functional and integrated web interface for wheat, barley, oat, and rye.
Data curation materials in "Daily life in the Open Biologist's second job,...
zenodo.org
bin, tiff, txt
Updated Sep 18, 2024
Cite
Livia C T Scorza; Tomasz Zieliński; Andrew J Millar (2024). Data curation materials in "Daily life in the Open Biologist's second job, as a Data Curator" [Dataset]. http://doi.org/10.5281/zenodo.13321937
Available download formats: tiff, txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.13321937
Dataset updated
Sep 18, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Livia C T Scorza; Tomasz Zieliński; Andrew J Millar
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This is the supplementary material accompanying the manuscript "Daily life in the Open Biologist’s second job, as a Data Curator", published in Wellcome Open Research.

It contains:

- Python_scripts.zip: Python scripts used for data cleaning and organization:

  - add_headers.py: adds specified headers automatically to a list of csv files, creating new output files containing a "_with_headers" suffix.

  - count_NaN_values.py: counts the total number of rows containing null values in a csv file and prints the location of null values in the (row, column) format.

  - remove_rowsNaN_file.py: removes rows containing null values in a single csv file and saves the modified file with a "_dropNaN" suffix.

  - remove_rowsNaN_list.py: removes rows containing null values in a list of csv files and saves the modified files with a "_dropNaN" suffix.

- README_template.txt: a template for a README file to be used to describe and accompany a dataset.

- template_for_source_data_information.xlsx: a spreadsheet to help manuscript authors to keep track of data used for each figure (e.g., information about data location and links to dataset description).

- Supplementary_Figure_1.tif: Example of a dataset shared by us on Zenodo. The elements that make the dataset FAIR are indicated by the respective letters. Findability (F) is achieved by the dataset unique and persistent identifier (DOI), as well as by the related identifiers for the publication and dataset on GitHub. Additionally, the dataset is described with rich metadata, (e.g., keywords). Accessibility (A) is achieved by the ease of visualization and downloading using a standardised communications protocol (https). Also, the metadata are publicly accessible and licensed under the public domain. Interoperability (I) is achieved by the open formats used (CSV; R), and metadata are harvestable using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a low-barrier mechanism for repository interoperability. Reusability (R) is achieved by the complete description of the data with metadata in README files and links to the related publication (which contains more detailed information, as well as links to protocols on protocols.io). The dataset has a clear and accessible data usage license (CC-BY 4.0).
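The behaviour described above for count_NaN_values.py and remove_rowsNaN_file.py can be sketched as follows. This is not the authors' code, which is distributed in Python_scripts.zip; it is a minimal standard-library illustration that assumes empty cells and the tokens in `NULL_TOKENS` count as null, and it reports zero-based (row, column) positions with the header row excluded.

```python
import csv
import io

# Assumption: which cell contents are treated as null.
NULL_TOKENS = {"", "nan", "na", "null"}

def is_null(cell):
    """True if the cell is empty or holds a recognised null token."""
    return cell.strip().lower() in NULL_TOKENS

def find_nulls(rows):
    """Return (number of rows containing a null, list of (row, column) positions)."""
    positions = [
        (i, j)
        for i, row in enumerate(rows)
        for j, cell in enumerate(row)
        if is_null(cell)
    ]
    rows_with_nulls = len({i for i, _ in positions})
    return rows_with_nulls, positions

def drop_null_rows(rows):
    """Keep only rows in which no cell is null."""
    return [row for row in rows if not any(is_null(cell) for cell in row)]

# Hypothetical in-memory sample standing in for a csv file on disk.
SAMPLE = "id,height,weight\n1,10.5,3\n2,,4\n3,NaN,\n4,8.0,2\n"

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
data = list(reader)

n_rows, positions = find_nulls(data)
cleaned = drop_null_rows(data)
print(n_rows, positions)
print(cleaned)
```

To mirror the "_dropNaN" suffix convention, `cleaned` could be written back with `csv.writer` to a new file whose name is derived from the input file name.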
Data from: United States wildlife and wildlife product imports from...
agdatacommons.nal.usda.gov
bin
Updated May 6, 2025
Cite
Evan A. Eskew; Allison M. White; Noam Ross; Kristine M. Smith; Katherine F. Smith; Jon Paul Rodríguez; Carlos Zambrana-Torrelio; William B. Karesh; Peter Daszak (2025). Data from: United States wildlife and wildlife product imports from 2000–2014 [Dataset]. https://agdatacommons.nal.usda.gov/articles/dataset/Data_from_United_States_wildlife_and_wildlife_product_imports_from_2000_2014/24853503
Available download formats: bin
Dataset updated
May 6, 2025
Dataset provided by
Scientific Data
Authors
Evan A. Eskew; Allison M. White; Noam Ross; Kristine M. Smith; Katherine F. Smith; Jon Paul Rodríguez; Carlos Zambrana-Torrelio; William B. Karesh; Peter Daszak
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
The global wildlife trade network is a massive system that has been shown to threaten biodiversity, introduce non-native species and pathogens, and cause chronic animal welfare concerns. Despite its scale and impact, comprehensive characterization of the global wildlife trade is hampered by data that are limited in their temporal or taxonomic scope and detail. To help fill this gap, we present data on 15 years of the importation of wildlife and their derived products into the United States (2000–2014), originally collected by the United States Fish and Wildlife Service. We curated and cleaned the data and added taxonomic information to improve data usability. These data include >2 million wildlife or wildlife product shipments, representing >60 biological classes and >3.2 billion live organisms. Further, the majority of species in the dataset are not currently reported on by CITES parties. These data will be broadly useful to both scientists and policymakers seeking to better understand the volume, sources, biological composition, and potential risks of the global wildlife trade.

Resources in this dataset:

Resource Title: United States LEMIS wildlife trade data curated by EcoHealth Alliance (Version 1.1.0) - Zenodo. File Name: Web Page, url: https://doi.org/10.5281/zenodo.3565869

Over 5.5 million USFWS LEMIS wildlife or wildlife product records spanning 15 years and 28 data fields. These records were derived from >2 million unique shipments processed by USFWS during the time period and represent >3.2 billion live organisms. We provide the final cleaned data as a single comma-separated value file. Original raw data as provided by the USFWS are also available. Although relatively large (~1 gigabyte), the cleaned data file can be imported into a software environment of choice for data analysis. Alternatively, the associated R package provides access to a release of the same cleaned dataset but with a data download and manipulation framework that is designed to work well with this large dataset. Both the Zenodo data repository and the R package contain a metadata file describing each of the data fields as well as a lookup table to retrieve full values for the abbreviated codes used throughout the dataset.

Contents:

lemis_2000_2014_cleaned.csv: This file represents the compiled, cleaned LEMIS data from 2000-2014. This data is identical to the version 1.1.0 dataset available through the lemis R package.

lemis_codes.csv: Full values for all coded values used in the LEMIS data. Identical to the output from the lemis R package function "lemis_codes()".

lemis_metadata.csv: Data fields and field descriptions for all variables in the LEMIS data. Identical to the output from the lemis R package function "lemis_metadata()".

raw_data.zip: This archive contains all of the raw LEMIS data files that are processed and cleaned with the code contained in the 'data-raw' subdirectory of the lemis R package repository.

Resource Software Recommended: R package, url: https://github.com/ecohealthalliance/lemis
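Because the cleaned file is described as roughly 1 gigabyte, one option outside the R package is to stream it row by row instead of loading it whole. The sketch below shows that pattern in Python with an in-memory stand-in for the file; the column names `class` and `quantity` are hypothetical, so the real field names should be taken from lemis_metadata.csv.

```python
import csv
import io
from collections import defaultdict

def total_by_group(lines, group_col, value_col):
    """Sum value_col per group_col while reading one row at a time."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        try:
            value = float(row[value_col])
            key = row[group_col]
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        totals[key] += value
    return dict(totals)

# Hypothetical sample standing in for lemis_2000_2014_cleaned.csv.
SAMPLE = io.StringIO("class,quantity\nAves,10\nReptilia,5\nAves,2\nAves,n/a\n")
totals = total_by_group(SAMPLE, "class", "quantity")
print(totals)

# With the real file, the same function would take an open file handle:
# with open("lemis_2000_2014_cleaned.csv", newline="", encoding="utf-8") as f:
#     totals = total_by_group(f, "class", "quantity")
```

Only the running totals are held in memory, so the approach scales with the number of distinct groups rather than the number of records.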
Northern elephant seal tracking and diving – raw and curated data
data.niaid.nih.gov
datadryad.org
zip
Updated May 14, 2025
Cite
Daniel Costa; Rachel Holser; Theresa Keates; Taiki Adachi; Roxanne Beltran; Cory Champagne; Crocker Daniel; Arina Favilla; Melinda Fowler; Juan Pablo Gallo-Reynoso; Chandra Goetsch; Jason Hassrick; Luis Hückstädt; Jessica Kendall-Bar; Sarah Kienle; Carey Kuhn; Jennifer Maresh; Sara Maxwell; Birgitte McDonald; Elizabeth McHuron; Patricia Morris; Yasuhiko Naito; Logan Pallin; Sarah Peterson; Patrick Robinson; Samantha Simmons; Akinori Takahashi; Nicole Teuschel; Michael Tift; Yann Tremblay; Stella Villegas-Amtman; Ken Yoda (2025). Northern elephant seal tracking and diving – raw and curated data [Dataset]. http://doi.org/10.7291/D10D61
Available download formats: zip
Unique identifier
https://doi.org/10.7291/D10D61
Dataset updated
May 14, 2025
Dataset provided by
United States Geological Survey
Moss Landing Marine Laboratories
University of Exeter
Scripps Institution of Oceanography
Nagoya University
Baylor University
University of St Andrews
University of North Carolina Wilmington
Sonoma State University
NOAA National Marine Fisheries Service
Springfield College
University of Washington
ICF International (United States)
West Chester University
Consolidated Safety Services-Dynamac (United States)
Centro de Investigación en Alimentación y Desarrollo
University of California, Santa Cruz
Marine Biodiversity Exploitation and Conservation
National Institute of Polar Research
Authors
Daniel Costa; Rachel Holser; Theresa Keates; Taiki Adachi; Roxanne Beltran; Cory Champagne; Crocker Daniel; Arina Favilla; Melinda Fowler; Juan Pablo Gallo-Reynoso; Chandra Goetsch; Jason Hassrick; Luis Hückstädt; Jessica Kendall-Bar; Sarah Kienle; Carey Kuhn; Jennifer Maresh; Sara Maxwell; Birgitte McDonald; Elizabeth McHuron; Patricia Morris; Yasuhiko Naito; Logan Pallin; Sarah Peterson; Patrick Robinson; Samantha Simmons; Akinori Takahashi; Nicole Teuschel; Michael Tift; Yann Tremblay; Stella Villegas-Amtman; Ken Yoda
License
https://spdx.org/licenses/CC0-1.0.html
Description
Northern elephant seals (Mirounga angustirostris) have been integral to the development and progress of biologging technology and movement data analysis. Adult female elephant seals at Año Nuevo State Park and other colonies along the west coast of North America were tracked annually from 2004 to 2020 for a total of 653 instrument deployments and 561 recoveries. These high-resolution diving and location data have been compiled, curated, and processed. This repository contains netCDF files with the raw tracking and diving data. The processed data are available in a second repository (https://doi.org/10.7291/D18D7W).

Methods

These data were collected from biotelemetry devices attached to adult female northern elephant seals (Mirounga angustirostris) from 2004 to 2020. The instruments collected locations (Argos and/or GPS) and continuously recorded depth throughout the animals' trips. Data were processed in MATLAB and R using custom code, the IKNOS package for dive data processing, and the aniMotum package for track processing. The details of data collection and processing are documented in the data descriptor paper associated with this dataset. In addition, all code used to process the data is available on GitHub and Zenodo.

The data presented here are freely available for use under the CC0 (Creative Commons Zero) license, and we encourage attribution to the data descriptor (DOI: 10.1038/s41597-024-04084-4) and this Dryad repository. We encourage users to reach out to the data owner for richer insight into the dataset. Subsets of this dataset have been made available through other projects and data portals, and we caution users that these are not independent northern elephant seal datasets. These portals include the AniBOS/MEOP data portal (https://www.meop.net/database/meop-databases/), the Animal Tracking Network (ATN) (https://portal.atn.ioos.us/), Movebank (https://www.movebank.org/cms/movebank-main), and MegaMove (https://megamove.org/data-portal/).

Additional data about the instrumented animals, such as morphometrics, demographics, and other biologging data (e.g., acceleration, jaw motion, temperature), are available for many of these animals but are beyond the scope of this dataset. For more information, contact the author at rholser@ucsc.edu.

Sampling Biases

Generally, we have been careful to select healthy animals for sedation and instrumentation. For animals deployed at Año Nuevo (most of the tracks), typically individuals with known site fidelity to the colony were selected and if age was known it was usually restricted to 4- to 12-year-olds. Furthermore, the data reported here span two decades of work. During this time, different studies prompted additional non-random population sampling. Examples include focusing on one age for a year, repeat tracking the same individuals two trips in a row, and intentionally selecting previously tracked females who had used a coastal foraging strategy. Many individuals in the dataset have been tracked multiple times. We strongly encourage researchers to evaluate the metadata provided carefully and contact the author with inquiries at rholser@ucsc.edu.

Code Availability

All code written for data processing, along with netCDF data import code for MATLAB, R, and Python, is available on GitHub (https://github.com/rholser/NES_TrackDive_DataProcessing) and Zenodo (https://doi.org/10.5281/zenodo.12511548). Extensive documentation of functions and scripts is also provided there. In addition, the authors have provided code in Python, R, and MATLAB for basic access to the netCDF files (https://github.com/rholser/NES-Read-netCDF). These scripts should serve as a model to enable users unfamiliar with the format to access the data.
Data providers and their activities.
figshare.com
xls
Updated May 31, 2023
Cite
Dong Joon Lee; Besiki Stvilia (2023). Data providers and their activities. [Dataset]. http://doi.org/10.1371/journal.pone.0173987.t002
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0173987.t002
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Dong Joon Lee; Besiki Stvilia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data providers and their activities.
IR position titles mapped into identified IR staff’s roles.
plos.figshare.com
figshare.com
xls
Updated Jun 3, 2023
Cite
Dong Joon Lee; Besiki Stvilia (2023). IR position titles mapped into identified IR staff’s roles. [Dataset]. http://doi.org/10.1371/journal.pone.0173987.t004
Explore at:
xls (available download formats)
Unique identifier
https://doi.org/10.1371/journal.pone.0173987.t004
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS ONE
Authors
Dong Joon Lee; Besiki Stvilia
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
IR position titles mapped into identified IR staff’s roles.
Data from: Current and projected research data storage needs of Agricultural...
catalog.data.gov
agdatacommons.nal.usda.gov
+2 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service https://www.ars.usda.gov/
Description
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had either "more than 10 to 100 TB" or "over 100 TB" of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs, we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset: Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
This information is the same data as in the Excel spreadsheet (also provided). Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
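The per-person calculation described above (high end of the reported range, divided by 1 for an individual response or by G for a group response) can be sketched as follows. This is a minimal illustration only: the function name, parameter names, and sample values are hypothetical and are not taken from the survey data or the released spreadsheet.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Estimate per-person storage from the high end of a reported range.

    range_high_tb: upper bound of the storage range a respondent selected (TB).
    group_size: G, the number of individuals covered by the response
                (1 for an individual response).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Hypothetical responses, for illustration only.
individual = per_person_storage_tb(10.0)             # individual response: divide by 1
group = per_person_storage_tb(100.0, group_size=8)   # group response: divide by G = 8
print(individual, group)
```

Using the high end of each range gives an upper-bound estimate per person, which is one reason the description treats Big Data users separately with reported or estimated values.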
science-journal-for-kids-data
huggingface.co
Updated Aug 22, 2024
Cite
stef (2024). science-journal-for-kids-data [Dataset]. https://huggingface.co/datasets/loukritia/science-journal-for-kids-data
Explore at:
Croissant (Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.)
Dataset updated
Aug 22, 2024
Authors
stef
Description
Science Journal for Kids Data

This repository contains a dataset of abstracts from the Science Journal for Kids website and the original academic papers. It includes metadata such as titles, URLs, reading levels, and links to the full academic papers. The dataset is designed to support research and analysis of educational content tailored for young learners.

Data

The dataset is a curated collection of 284 original scientific abstracts and their adapted abstracts for… See the full description on the dataset page: https://huggingface.co/datasets/loukritia/science-journal-for-kids-data.
Forest Service Research Data Archive
cinergi.sdsc.edu
resource url
Cite
Forest Service Research Data Archive [Dataset]. http://cinergi.sdsc.edu/geoportal/rest/metadata/item/405602358f30443094570d3e925c2c8e/html
Explore at:
resource url (available download formats)
Area covered

Description
Link Function: information
Corona Data Scraper
catalog.midasnetwork.us
Updated Jul 6, 2023
Cite
MIDAS Coordination Center (2023). Corona Data Scraper [Dataset]. https://catalog.midasnetwork.us/?object_id=133
Explore at:
Dataset updated
Jul 6, 2023
Dataset authored and provided by
MIDAS Coordination Center
License
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Time period covered
Jan 12, 2020 - Nov 1, 2020
Variables measured
disease, pathogen, case counts, host organism, case counts - mortality data, disease - infectious disease, host organism - Homo sapiens, case counts - diagnostic tests, case counts - hospital stay dataset, disease - infectious disease - COVID-19, and 1 more
Dataset funded by
National Institute of General Medical Sciences
Description
Corona Data Scraper pulls information from a variety of openly available world government data sources and curated datasets. It includes information on the number of cases, deaths, recovered cases, active cases, testing, hospitalization, ICU patients, and discharge pertaining to COVID-19 in a specified country or state. The data are no longer updated.
Data from: Gene Expression Omnibus (GEO)
data.wu.ac.at
healthdata.gov
+2 more
Updated Jul 19, 2016
Cite
U.S. Department of Health & Human Services (2016). Gene Expression Omnibus (GEO) [Dataset]. https://data.wu.ac.at/schema/data_gov/NWFkM2U0MDgtNmQ5My00NzdhLWFiZWYtYjhiN2I4MTBmYWM5
Explore at:
Dataset updated
Jul 19, 2016
Dataset provided by
U.S. Department of Health & Human Services
Description
Gene Expression Omnibus is a public functional genomics data repository supporting MIAME-compliant submissions of array- and sequence-based data. Tools are provided to help users query and download experiments and curated gene expression profiles.

