United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

The goals of this study were to: (1) establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals; (2) compare how much data is in institutional vs. domain-specific vs. federal platforms; (3) determine which repositories are recommended by top journals that require or recommend the publication of supporting data; and (4) ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

Data are provided for journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:
Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:
Resource Title: Journals.
File Name: Journals.csv
Resource Title: Journals - Recommended repositories.
File Name: Repos_from_journals.csv
Resource Title: TDWG presentation.
File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources.
File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory.
File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data.
File Name: general_repos_1.csv
Resource Title: README and file inventory.
File Name: README_InventoryPublicDBandREepAgData.txt
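The evaluation thresholds described for the general repositories (share of the collection at 1% and 5%, and result counts above 100 and 500) can be sketched as a small Python helper. This is a minimal illustration, not the study's own code: the function name, the dictionary keys, and the example counts are all hypothetical.

```python
def flag_search_term(term_hits, total_records):
    """Summarise one search term against one repository.

    term_hits: number of records returned for the search term.
    total_records: total number of records in the repository.
    Both inputs are hypothetical here; the thresholds come from the
    Evaluation paragraph of the description.
    """
    share = 100.0 * term_hits / total_records if total_records else 0.0
    return {
        "percent_of_total": round(share, 2),
        "at_least_1_percent": share >= 1.0,
        "at_least_5_percent": share >= 5.0,
        "over_100_results": term_hits > 100,
        "over_500_results": term_hits > 500,
    }


# Made-up counts for illustration only (not values from the dataset).
example = flag_search_term(term_hits=620, total_records=10000)
```

A repository would match the pattern reported for GBIF, ICPSR, and ORNL DAAC when both `over_500_results` and `at_least_5_percent` are true for at least one ag search term.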
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes rasters of tidal marsh soil properties in the Northeast US for the purpose of blue carbon accounting. Mapping products cover estuarine and emergent wetland classes in the National Wetlands Inventory.
Resources in this dataset:
Resource Title: Northeast Blue Carbon Raster Map - LOI
File Name: Northeast_marsh_LOI.7z
Resource Description: This raster file provides soil organic matter (% LOI) estimates for tidal marshes in the Northeast US.
Resource Title: Northeast Blue Carbon Raster Map - BD
File Name: Northeast_marsh_BD.7z
Resource Description: This raster based data product provides soil dry bulk density estimates for tidal marshes in the Northeast US.
Resource Title: README file for Northeast Blue Carbon Rasters
File Name: README_Northeast_blue_carbon_rasters_LOI_BD.txt
Resource Description: Brief description of the raster properties and methods by which they were created to model soil organic matter, soil dry bulk density, and carbon density for the Northeast US. See Teng et al (in prep., 2023) for more details on the model development.
U.S. Government Works https://www.usa.gov/government-works
This document provides instructions for editing and submitting unit process or product system models to the USDA LCA Commons life cycle inventory (LCI) database. The LCA Commons LCI database uses the openLCA life cycle modeling tool's database schema. Therefore, this document describes how to import and edit data in openLCA and name and classify flows such that they properly import into and operate in the database. This document also describes metadata or documentation requirements for posting models to the LCA Commons. This document is an evolving standard for LCA Commons data. As USDA-NAL continues to gain experience in managing a general purpose LCI database and global conventions continue to evolve, so too will the LCA Commons Submission Guidelines.
Resources in this dataset:
Resource Title: LCA Commons Submission Guidelines_12/09/2015.
File Name: lcaCommonsSubmissionGuidelines_Final_2015-12-09.pdf
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Data dictionary for Gridded National Soil Survey Geographic Database (gNATSGO). https://data.nal.usda.gov/node/23067
gNATSGO has a schema that is very similar to that of SSURGO and STATSGO2. A CSV version of the data dictionary is presented.
A data dictionary typically provides a detailed description for each element or variable in a dataset or data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description.
Dataset citation: (dataset) Soil Survey Staff. Gridded National Soil Survey Geographic (gNATSGO) Database for [State name -or- the Conterminous United States]. United States Department of Agriculture, Natural Resources Conservation Service. Available online at https://nrcs.app.box.com/v/soils. Month, day, year.
U.S. Government Works https://www.usa.gov/government-works
House flies (Musca domestica L.) are vectors of human and animal pathogens at livestock operations. Microbial communities in flies are acquired from, and correlate with, their local environment. However, variation among microbial communities carried by flies from farms in different geographical areas is not well understood. We characterized bacterial communities of female house flies collected from beef and dairy farms in Oklahoma, Kansas, and Nebraska and further evaluated the prevalence of antibiotic resistance genes in bacteria within flies. We evaluated the influence of farm type and farm location on bacterial communities, diversity, pathogenic bacterial strains, and prevalence of antibiotic resistance genes. These data can be used to better understand the abundance and prevalence of bacterial communities in house flies associated with livestock operations. These data were collected in September 2019. Abbreviations used include Operational Taxonomic Units (OTUs), Canonical Correspondence Analysis (CCA), Infectious Bovine Keratoconjunctivitis (IBK), Antimicrobial Resistance (AMR), and Antibiotic Resistance Genes (ARGs).
The raw Illumina MiSeq sequence data for this project can be found here:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA863664
Resources in this dataset:
Resource title: Metadata for Microbiome of House Fly Associated with Cattle Farms
File name: Metadata for Microbiome of House Fly Associated with Cattle Farms.xlsx
Resource description: This spreadsheet links the raw sequence reads on NCBI with data on farm type, farm location and sample type.
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling.
The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly.
From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to delegate it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond.
We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival.
To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
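The per-person calculation above can be written out as a short Python sketch. It assumes the reported range is expressed in terabytes; the function name and the example figures are hypothetical and are not taken from the survey data.

```python
def per_person_storage_tb(range_high_tb, group_size=1):
    """Per-person storage estimate for one survey response.

    range_high_tb: high end of the storage range the respondent selected.
    group_size: 1 for an individual response, or G (the number of
                individuals covered) for a group response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Illustrative values only: an individual and a group of four who both
# selected a range whose high end is 10 TB.
individual = per_person_storage_tb(10)
group_of_four = per_person_storage_tb(10, group_size=4)
```

For Big Data users the description says actual reported or estimated values were used instead of the range's high end, so those values would be passed in directly as `range_high_tb`.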
Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).
Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Data dictionary for "Agricultural Collaborative Research Outcomes System (AgCROS)". https://data.nal.usda.gov/node/5643/
Data dictionary for the growing AgCROS family of products as of 04/2019.
A data dictionary typically provides a detailed description for each element or variable in a dataset or data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description.
Dataset citation: (dataset) USDA Agricultural Research Service. (2017). Agricultural Collaborative Research Outcomes System (AgCROS). USDA Agricultural Research Service. https://data.nal.usda.gov/dataset/agricultural-collaborative-research-outcomes-system-agcros.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This repository provides extended documentation, code, and updated links to access the Soil Landscapes of the United States (SOLUS) 100-meter soil property maps. It provides supporting materials for a peer reviewed paper (Nauman et al., Soil Science Society of America Journal, 1–20. https://doi.org/10.1002/saj2.20769) documenting the theory and novel application of hybridized legacy training datasets used to inform the machine learning models used to create the new soil property maps presented here. The SOLUS dataset includes 20 different soil properties (listed below) with most properties predicted for seven standard depths (0, 5, 15, 30, 60, 100, and 150 cm). Further details on these properties and all included files are available in the README.docx document. Also included is a git repository formatted as a hybrid R package that includes all code used to create the soil property maps.
All SOLUS100 mapping layers are available as cloud optimized geotiffs at: https://storage.googleapis.com/solus100pub/index.html
Metadata: https://storage.googleapis.com/solus100pub/SOLUS100_metadata_pub.html
The files at this URL are listed at: https://storage.googleapis.com/solus100pub/Final_Layer_Table_20231215.csv
Note that many of the raster files are scaled by multipliers of 10, 100, or 1000 to store the values as integers to decrease file size. The ‘scalar’ field of the file list table (Final_Layer_Table_20231215.csv) provides those values. The raster values must be divided by the scalars to get the actual units of the properties.
To download files, simply concatenate the google API URL with a forward slash and the file name listed in the table, and enter the result into a browser (e.g. EC at 15 cm would be https://storage.googleapis.com/solus100pub/ec_15_cm_p.tif). To automate downloads, a loop in python, R or your language of choice that builds file download urls from the file list in the csv can be implemented. Alternatively, some GIS programs (e.g. QGIS) will let you visualize and interact with the files without downloading them by entering the URL.
All raster environmental covariates used in mapping are available here: https://storage.googleapis.com/cov100m/index.html
Properties included in SOLUS100:
Bulk density (oven dry)
Calcium carbonate
Cation Exchange Capacity (pH 7)
Clay
Coarse sand
Electrical Conductivity (sat. paste)
Effective cation exchange capacity
Fine sand
Gypsum (in
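The URL-building loop and the scalar correction described for the SOLUS100 layers might look like the following Python sketch. The base URL and the ‘scalar’ field come from the description; the `filename` column header, the inline sample table, and the scalar value of 10 are assumptions made for illustration and should be checked against Final_Layer_Table_20231215.csv.

```python
import csv
import io

# Base URL taken from the dataset description.
BASE_URL = "https://storage.googleapis.com/solus100pub"


def build_download_urls(rows, name_field="filename"):
    """Join the base URL, a forward slash, and each file name in the table.

    name_field is a hypothetical column header; adjust it to match the
    real header in the file list table.
    """
    return [f"{BASE_URL}/{row[name_field]}" for row in rows]


def to_property_units(stored_value, scalar):
    """Divide a stored integer raster value by its scalar to recover units."""
    return stored_value / scalar


# Stand-in for the real file list table (header names and the scalar
# value are assumed, not copied from the published CSV).
sample_table = "filename,scalar\nec_15_cm_p.tif,10\n"
rows = list(csv.DictReader(io.StringIO(sample_table)))
urls = build_download_urls(rows)
rescaled = to_property_units(25, int(rows[0]["scalar"]))
```

To fetch the files, each URL in `urls` could be passed to a downloader such as `urllib.request.urlretrieve` from the Python standard library; that step is left out here so the sketch runs without network access.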
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Data dictionary and brochure for REAP (Resilient Economic Agricultural Practices). https://data.nal.usda.gov/node/5594
Data Entry Template 2017 includes Excel templates for Experiment description worksheets, Site characterization worksheets, Management worksheets, Measurement worksheets where experimental unit data are reported, and Information that may be useful to the user, including drop down lists of treatment specific information and ranges of expected values. General and introductory instructions, as well as a Data Validation check are also included.
A data dictionary typically provides a detailed description for each element or variable in a dataset or data model. Data dictionaries are used to document important and useful information such as a descriptive name, the data type, allowed values, units, and text description.
Dataset citation: (dataset) USDA Agricultural Research Service. (2017). REAP (Resilient Economic Agricultural Practices). Agricultural Research Service. https://doi.org/10.15482/USDA.ADC/1372394.
This model was originally trained for use in a recommendation system for the Ag Data Commons that will automatically link viewers of one dataset to other directly relevant datasets and research papers that they may be interested in. It was also used to determine the similarities and differences between projects within ARS’ National Programs and create a visualization layer to allow leaders to explore and manage their programs easily.
This model was generated using the Word2Vec model, starting with a set of word vectors trained on Google News articles, and further training it on the titles+abstracts from PubAg and the titles+descriptions from Ag Data Commons. This model was trained using a vector length of 300 and the Continuous Bag of Words version of the algorithm with negative sampling.
This word vector model could be used for any Natural-Language Processing applications involving text with a large amount of agricultural research vocabulary.
Resource Title: Agricultural Word Vectors.
File Name: AgWordVectors-300.zip
Resource Description: Word vectors trained on the full titles/abstracts in PubAg and titles/abstracts in Ag Data Commons. (Part A)
Resource Title: Agricultural Word Vectors Trainables.
File Name: AgWordVectors-300.model.trainables.syn1neg.zip
Resource Description: Word vectors trained on the full titles/abstracts in PubAg and titles/abstracts in Ag Data Commons. (Part B)
Resource Title: Agricultural Word Vector Model.
File Name: AgWordVectors-300.model.wv_.vectors.zip
Resource Description: Word vectors trained on the full titles/abstracts in PubAg and titles/abstracts in Ag Data Commons. (Part C)
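One common way to use word vectors for the kind of dataset recommendation described above is to average the vectors of the words in each title and rank candidates by cosine similarity. The description does not say how the Ag Data Commons recommender compares texts, so the Python sketch below is an assumed approach, and the three-dimensional vectors are toy values invented for illustration (the published model uses a vector length of 300).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(words, vectors):
    """Average the vectors of the words that are in the vocabulary."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def rank_by_similarity(query_words, candidates, vectors):
    """Return (candidate id, similarity) pairs, most similar first."""
    query = mean_vector(query_words, vectors)
    scored = []
    for name, words in candidates.items():
        candidate = mean_vector(words, vectors)
        if query is not None and candidate is not None:
            scored.append((name, cosine_similarity(query, candidate)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vocabulary and candidate titles: invented values, not model output.
toy_vectors = {
    "soil": [1.0, 0.0, 0.0],
    "carbon": [0.9, 0.1, 0.0],
    "maize": [0.0, 1.0, 0.0],
    "irrigation": [0.0, 0.9, 0.1],
}
toy_candidates = {"A": ["carbon"], "B": ["maize", "irrigation"]}
ranked = rank_by_similarity(["soil"], toy_candidates, toy_vectors)
```

With real vectors, `toy_vectors` would be replaced by a lookup into the loaded model, and words missing from the vocabulary are simply skipped by `mean_vector`.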
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These data were collected as part of the Assessing and Expanding Soil Health for Production, Economic, & Environmental Benefits grant to the Soil Health Institute. Because the primary objective was to evaluate, identify, and recommend effective widely-applicable measurements for evaluating soil health across North America, the project is often referenced as the North American Project to Evaluate Soil Health Measurements. The data available are a compilation of data collected and curated by Soil Health Institute from 94 long-term agricultural research sites across the United States and Mexico. Metadata includes location (lat/long) for each site, name, year it became a research site, 10-year summary weather variables based on the Daymet daily surface weather data set, year each treatment began, the categorical management attributes and detailed management data, and other metadata. 508 treatments are represented. Data is provided for 1453 experimental units (plots) that were sampled. 110 analyses (data columns) are reported for each experimental unit. Five more data columns are provided on nutrient analytes (ammonium acetate extraction, DTPA extraction, Olsen P) that were performed conditional on pH. Phospholipid Fatty Acid (PLFA) reports are provided on 109 biomarkers. The NAPESHM project is part of a broader effort titled, “Assessing and Expanding Soil Health for Production, Economic, and Environmental Benefits”. The project is funded by the Foundation for Food and Agricultural Research (grant ID 523926), General Mills, and The Samuel Roberts Noble Foundation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘Current and projected research data storage needs of Agricultural Research Service researchers in 2016’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/e2b7daf0-c8fe-4c68-b62d-891360ba8f96 on 26 January 2022.
The USDA-Agricultural Research Service carried out an experiment on water productivity in response to seasonal timing of irrigation of maize (Zea mays L.) at the Limited Irrigation Research Farm (LIRF) facility in northeastern Colorado (40°26’ N, 104°38’ W) starting in 2012. Twelve treatments involved different water availability targeted at specific growth stages. This dataset includes data from the first two years, which were complete years with intact treatments. Data include canopy growth and development (canopy height, canopy cover and LAI), irrigation, precipitation, and soil water storage measured periodically through the season; daily estimates of crop evapotranspiration; and seasonal measurement of crop water use, harvest index and crop yield. Hourly and daily weather data are also provided from CoAgMET, Colorado’s network of meteorological information (https://coagmet.colostate.edu/; GLY04 station). Additional soil data can be found in a previous dataset (USDA-ARS Colorado Maize Water Productivity Dataset 2008-2011) also available from the Ag Data Commons. This previous dataset included six targeted treatments that were generally uniform through the season. This new dataset can be used to further validate and refine maize crop models. The data are presented in a spreadsheet format in individual sheets within one workbook. The first sheet in the workbook provides a list of data descriptions. Two sheets (one sheet for each of the two years) provide the hourly weather data, with the exception of the precipitation data, which are included in the sheet with daily data per treatment. The weather data are from a weather station on site. Another sheet provides plot level data (harvest index, yield, annual ET, maximum LAI, stand density, total aboveground biomass) taken annually by plot (four plots per treatment). Another sheet provides LAI measured four times over each season per plot.
The final sheet provides daily data per treatment over each season, including data needed to compute daily water balance. This sheet has LAI, crop growth stage, plant height, estimated root depth, interpolated canopy cover, ET coefficients, precipitation, and estimated deep percolation, evaporation, and soil water deficit at four soil depths.
List of files:
LIRF small plots map 2012-2013
LIRF maize annual_daily_hourly data 2012-2013
Resources in this dataset:
Resource Title: LIRF 2012-2013 Maize database.
File Name: 2012-2013_Maize_Compiled database 06012018.xlsx
Resource Title: LIRF 2012-2013 Data Description.
File Name: Data Description 06012018.xlsx
Resource Title: LIRF 2012-2013 Plot Map.
File Name: Plot map 2012 2013.pdf
Resource Title: LIRF Data Dictionary.
File Name: Data_Dictionary_Water_Prod_2012.csv
U.S. Government Works https://www.usa.gov/government-works
In support of nutrition research, concentrations of compounds from different parts of the watermelon plant are provided. The parts of the plant for which data are tabulated include (red) flesh, heart tissue, juice, seed, rind, peel, yellow flesh, seedling, leaf, root, other parts of the plant, and detected but plant part undeclared. The collected data include the low value in the range, the high value in the range, deviation from those values, and units (assumed to be fresh or wet weight unless noted). This table also provides for all compounds the citations to the literature and database sources. The “AFC” identifier represents the Agricultural Research Service (ARS) Food Compound; PubChem refers to the identifier from this resource of chemical compounds.
Resources in this dataset:
Resource Title: Catalog of natural products occurring in watermelon.
File Name: Watermelon_NP_catalog_20210623.tsv
Resource Description: This is a table of chemical compounds found in watermelon.
Resource Title: Data dictionary.
File Name: Data_dictionary_Watermelon_compounds_NAL_20210623.xlsx
Resource Description: This is the data dictionary.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/access
The Long-Term Agroecosystem Research Network, consisting of 18+ research locations, is conducting research on the sustainable intensification of agroecosystems. To enable coordinated network level research, a spatial framework is required to facilitate analysis. However, no suitable spatial framework currently exists to meet the needs for the LTAR Network. To develop a framework for analysis, the LTAR Network initiated the Regionalization Project. Goals also included providing a standardized spatial footprint for LTAR cross-site investigations, estimating the confidence with which results from research plots and fields could reasonably be extrapolated to "represented regions", informing decisions about where additional research sites should be prioritized, and facilitating public outreach of the LTAR Network. To address these goals, a workshop was held in 2018, resulting in the production of three sets of regional boundaries in a geographic information system (GIS). These GIS datasets are intended to be used for mapping the network and for summarizing spatial data relevant to domains of sustainable intensification corresponding with agricultural production, environmental impact, and rural prosperity. This resulted in a geodatabase of three new maps describing regional boundaries for the LTAR Network titled "Long-Term Agroecosystem Research Network regions, 2018 version", archived in the USDA National Agricultural Library's Ag Data Commons repository.
Resources in this dataset:
Resource Title: LTAR_Regions_v2018. File Name: LTAR_Regionsv2018.gdb.zip Resource Description: Geodatabase of the regions pertaining to the Long-Term Agroecosystem Research Network. There are three data layers describing regions associated with indicators of sustainable intensification corresponding with agricultural production (LTAR_Production_v2018), environmental impacts (LTAR_Environment_v2018), and rural prosperity (LTAR_RuralProsp_v2018). These data were produced by the LTAR Regionalization Project as an outcome from the 2018 LTAR Regionalization Project workshop held in March 2018, Tifton, GA. Resource Software Recommended: ArcGIS Pro, url: www.esri.com
The USDA Branded Food Database was integrated into FoodData Central in April 2019. For more information on FoodData Central and the USDA Branded Food Database: Website: https://fdc.nal.usda.gov/ Ag Data Commons link: https://data.nal.usda.gov/dataset/fooddata-central
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
List of Arctic accessions as of 31 July 2015 for Germplasm Resources Information Network (GRIN). https://data.nal.usda.gov/node/140
The USDA National Plant Germplasm System has more than 500,000 accessions (distinct varieties of plants) representing more than 2,000 unique species. Accessions are described in the GRIN database (Germplasm Resources Information Network). Germplasm accessions are available to support research and education objectives. This resource provides direct hyperlinks to 614 accessions related to Alaska.
Search Parameters
Search Type: Accession Area Queries
Location: United States of America; State: Alaska
Additional Criteria: ALL – All Repositories; Improvement Status – Any Status; Reproductive uniformity – Any Status; Any Time; Taxonomic sort
Dataset citation: (dataset) USDA Agricultural Research Service (2015). Germplasm Resources Information Network (GRIN). USDA Agricultural Research Service. https://doi.org/10.15482/USDA.ADC/1212393.
The data are derived from the field monitoring of irrigated furrows from 1998 to 2016 at the research farm of the USDA/ARS-Northwest Irrigation and Water Research Laboratory in Kimberly, Idaho, USA (south-central Idaho). For each monitored furrow, irrigation inflow rates, outflow rates, and sediment concentrations were recorded periodically during the irrigation. A gated pipe conveyed irrigation water across the plots at the head, or inflow-end, of the furrows, and adjustable spigots supplied water to each irrigated furrow. The methodology used to obtain the field data is described by Lentz and Sojka (2009). Inflows were measured by timing the filling rate of a known volume, and runoff was measured with long-throated, v-notch flumes. Outflows were measured and runoff samples collected at 30 min intervals during the first 1-3 hr of an irrigation, and every hour or two for the next 3 to 5 hr. If the set was continued for an additional 12 hr, two to four additional measurements were made. Immediately after each flume reading, sediment concentration in furrow streams was measured by collecting one-liter runoff samples from free-flowing flume discharge. The weight of sediment per liter of runoff was determined from the settled volume of sediment using the Imhoff-cone technique. Three Imhoff-cone sediment samples were collected from each treatment in each irrigation. These were filtered, and the papers dried and weighed. A calibration function relating the 30-min, settled-sediment volume to sediment mass-per-unit-volume of runoff was then calculated and used to convert settled sediment volume in cones to sediment mass. The field data for each study or year were analyzed using the WASHOUT program (Lentz and Sojka, 1995). The WASHOUT program produces output files (filename.out), which become components of this Ag Data Commons data set.
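The measurement arithmetic described above can be illustrated with a minimal Python sketch. It is not the WASHOUT program or the authors' procedure: the description does not give the form of the calibration function, so a straight-line least-squares fit is assumed here, and all function names, units, and sample values are hypothetical.

```python
def inflow_rate_l_per_min(container_volume_l, fill_time_s):
    """Inflow rate from timing how long a known volume takes to fill."""
    if fill_time_s <= 0:
        raise ValueError("fill time must be positive")
    return container_volume_l * 60.0 / fill_time_s


def fit_linear_calibration(settled_volumes_ml, sediment_masses_g):
    """Least-squares line: mass = slope * settled_volume + intercept.

    ASSUMPTION: a linear form; the source only says a calibration
    function was calculated from the filtered, dried, weighed samples.
    """
    n = len(settled_volumes_ml)
    if n < 2 or n != len(sediment_masses_g):
        raise ValueError("need at least two paired observations")
    mean_v = sum(settled_volumes_ml) / n
    mean_m = sum(sediment_masses_g) / n
    sxx = sum((v - mean_v) ** 2 for v in settled_volumes_ml)
    if sxx == 0:
        raise ValueError("settled volumes must not all be identical")
    sxy = sum(
        (v - mean_v) * (m - mean_m)
        for v, m in zip(settled_volumes_ml, sediment_masses_g)
    )
    slope = sxy / sxx
    intercept = mean_m - slope * mean_v
    return slope, intercept


def settled_volume_to_mass(settled_volume_ml, slope, intercept):
    """Convert a 30-min settled volume to sediment mass per liter of runoff."""
    return slope * settled_volume_ml + intercept


if __name__ == "__main__":
    # Hypothetical calibration samples (settled mL per L, dried g per L).
    volumes = [1.0, 2.0, 3.0]
    masses = [2.0, 4.0, 6.0]
    slope, intercept = fit_linear_calibration(volumes, masses)
    print(settled_volume_to_mass(5.0, slope, intercept))
    print(inflow_rate_l_per_min(20.0, 30.0))
```

With real data, the calibration would be fitted separately for each set of Imhoff-cone samples, and a different functional form may be more appropriate than the line assumed here.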
For many years and irrigations, furrows were monitored at one or more locations along the furrow, as well as at the end (bottom) of the furrow. In these cases, data for each position within the furrow are listed in the data set, labelled for example as "Top", "Middle", and "Bottom" (see Data Dictionary tab). For each furrow position the data represent the flow, infiltration, and runoff information for the length of furrow that begins at its inflow end (top of the field) and ends at the defined furrow position. This distance is listed in the field data file for each furrow and irrigation. An Irrigation Data Summary is included as a tab in the data set spreadsheet. This is a summary list of the studies and irrigations that are included in the data set. Also included is a PAM-Application-Codes tab that lists descriptions of the polyacrylamide (PAM) treatments that were employed in some of the included studies.
Resources in this dataset:
Resource Title: Furrow Infiltration and Erosion Data, 1998 to 2016. File Name: IrrigationData.xlsx Resource Description: Furrow irrigation inflow, outflow, infiltration, and sediment load data; summary of studies and irrigations included in the data; data dictionary; description of specific polyacrylamide treatments included in some of the studies
Resource Title: Data Dictionary - Kimberly, ID - Furrow Infiltration and Erosion Data, 1998 to 2016. File Name: Kimberly-ID-Furrow-Inf-Erosion-DataDictionary1998-2016.csv
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to: establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals; compare how much data is in institutional vs. domain-specific vs. federal platforms; determine which repositories are recommended by top journals that require or recommend the publication of supporting data; and ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods
We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects. We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories. Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo. Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA researchers published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals, based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, whether any search term comprised at least 1% or at least 5% of the total collection, and whether any search term returned more than 100 or more than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results
A summary of the major findings from our data review:
Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.
See included README file for descriptions of each individual data file in this dataset.
Resources in this dataset:
Resource Title: Journals. File Name: Journals.csv
Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
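The thresholds used in the evaluation of general repositories (a search term's share of the collection, and its raw result count) can be expressed as a short Python sketch. This is an illustration only, not the study's actual workflow; the function name, the dictionary keys, and the sample counts are hypothetical.

```python
def evaluate_search_term(total_datasets, term_results):
    """Flag one search term against the thresholds described in the evaluation.

    total_datasets: number of datasets in the repository
    term_results:   number of results returned for the search term
    """
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    if term_results < 0:
        raise ValueError("term_results must not be negative")
    percent = 100.0 * term_results / total_datasets
    return {
        "percent_of_collection": percent,
        "at_least_1_percent": percent >= 1.0,
        "at_least_5_percent": percent >= 5.0,
        "more_than_100_results": term_results > 100,
        "more_than_500_results": term_results > 500,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the dataset.
    for term, hits in {"agriculture": 600, "soil": 50}.items():
        print(term, evaluate_search_term(10_000, hits))
```

Applied to each row of a repository inventory, flags like these would separate repositories that are both large and ag-heavy (over 500 results and at least 5% of the collection) from those that only meet one criterion.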