Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
I) SUMMARY
This database contains harmonized time series for the study of crop yields using remote sensing and meteorological data. We collected soybean, corn, and wheat yields (t/ha) over the CONUS (contiguous US) from USDA-NASS for the years 2015–2018 at the county level, and collocated time series for the following variables:
II) CONTACT
For questions, please email Laura Martínez-Ferrer at laura.martinez-ferrer@uv.es
III) DATABASE
For each crop type, we provide CSV files containing the time series of the variables and the yield described above. Additional information for spatial and temporal identification, such as a county identifier and a year, is also included. Lastly, country-shapefiles (.shp) are added for geospatial representation. Further details are in the readme.txt file.
IV) CITE
We kindly encourage you to cite the following works if this database is used:
L. Martínez-Ferrer, M. Piles, G. Camps-Valls, "Crop Yield Estimation and Interpretability With Gaussian Processes," IEEE Geoscience and Remote Sensing Letters, 2020, vol. 18, no. 12, pp. 2043-2047, DOI: 10.1109/LGRS.2020.3016140
A. Mateo-Sanchis, J. E. Adsuara, M. Piles, J. Muñoz-Marí, A. Pérez-Suay, G. Camps-Valls, "Interpretable Long-Short Term Memory Networks for Crop Yield Estimation," IEEE Geoscience and Remote Sensing Letters, DOI: 10.1109/LGRS.2023.3244064
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes sample data for the United States to run the weakly supervised framework described in the paper "A weakly supervised framework for high resolution crop yield forecasts", accessible at
https://doi.org/10.48550/arXiv.2205.09016
The updated paper (including results from the US) is published in Environmental Research Letters:
https://doi.org/10.1088/1748-9326/acf50e
The software implementation of the machine learning baseline is available at: https://github.com/BigDataWUR/MLforCropYieldForecasting/tree/weaksup.
Data
1. County data (county-data.zip) for county-level strongly supervised models:
* CROP_AREA_COUNTY_US.csv: County crop production area statistics (acres). Source: NASS (USDA-NASS, 2022).
* CSSF_COUNTY_US.csv: Crop productivity indicators including total above-ground production (kg ha-1), total weight of storage organs (kg ha-1), development stage (0-2). Source: de Wit et al. (2022).
* METEO_COUNTY_US.csv: Meteo data including maximum, minimum, and average daily air temperature (℃); sum of daily precipitation (PREC) (mm); sum of daily evapotranspiration of short vegetation (ET0) (Penman-Monteith; Allen et al., 1998) (mm); climate water balance = (PREC - ET0) (mm). Source: Boogaard et al. (2022).
* REMOTE_SENSING_COUNTY_US.csv: Fraction of Absorbed Photosynthetically Active Radiation (Smoothed) (FAPAR). Source: Copernicus GLS (2020).
* SOIL_COUNTY_US.csv: Soil water holding capacity. Source: WISE Soil Property Database (Batjes, 2016).
* YIELD_COUNTY_US.csv: County yield statistics (bushels/acre). Source: NASS (USDA-NASS, 2022).
2. 10-km grid data (grid-data.zip) for grid-level strongly supervised models:
* COUNTY_GRIDS_US.csv: Mapping between counties and grids.
* CSSF_GRIDS_US.csv: Crop productivity indicators at 10km grid level (similar to county data above).
* METEO_GRIDs_US.csv: Meteo data at 10km grid level (similar to county data above).
* REMOTE_SENSING_GRIDS_US.csv: FAPAR at 10km grid level (similar to county data above).
* SOIL_GRIDS_US.csv: Soil water holding capacity at 10km grid level (similar to county data above).
* YIELD_GRIDS_US.csv: Grid-level modeled yields (t ha-1). Source: Deines et al. (2021), Lobell et al. (2020).
3. County labels and 10-km grid inputs (dscale-US.zip) for weak supervision:
* COUNTY_GRIDS_US.csv: Mapping between counties and grids.
* CSSF_GRIDS_US.csv: Crop productivity indicators at 10km grid level.
* METEO_GRIDs_US.csv: Meteo indicators at 10km grid level.
* REMOTE_SENSING_GRIDS_US.csv: FAPAR at 10km grid level.
* SOIL_GRIDS_US.csv: Soil water holding capacity at 10km grid level.
* YIELD_GRIDS_US.csv: Grid-level modeled yields (t ha-1). Source: Deines et al. (2021).
* YIELD_COUNTY_US.csv: County yield statistics (bushels/acre). Source: NASS (USDA-NASS, 2022).
* CROP_AREA_COUNTY_US.csv: County crop production area statistics (acres). Source: NASS (USDA-NASS, 2022).
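The county files above report yields in bushels/acre and areas in acres, while the grid-level yields are in t ha-1, so comparing the two needs a unit conversion. The following is a minimal sketch of that conversion, not part of the published software. The function names are hypothetical, and the bushel weights are the commonly used US standard weights (56 lb for corn, 60 lb for soybeans and wheat); which crop the files cover, and whether the grid yields use the same moisture basis, should be checked against the paper before relying on it.

```python
# Exact unit definitions.
KG_PER_LB = 0.45359237
HA_PER_ACRE = 0.40468564224

# Assumption: commonly used US standard bushel weights (lb per bushel).
# The crop in the dataset is not stated in this description.
LB_PER_BUSHEL = {"corn": 56.0, "soybeans": 60.0, "wheat": 60.0}


def bushels_per_acre_to_t_per_ha(yield_bu_ac, crop):
    """Convert a yield in bushels/acre to metric tonnes per hectare."""
    lb_per_bu = LB_PER_BUSHEL[crop]
    kg_per_ha = yield_bu_ac * lb_per_bu * KG_PER_LB / HA_PER_ACRE
    return kg_per_ha / 1000.0


def acres_to_ha(area_acres):
    """Convert an area in acres to hectares."""
    return area_acres * HA_PER_ACRE


if __name__ == "__main__":
    # Hypothetical county yield of 180 bu/acre of corn.
    print(round(bushels_per_acre_to_t_per_ha(180.0, "corn"), 2))
    print(round(acres_to_ha(100.0), 2))
```

The conversion is linear, so it can be applied to a whole column of YIELD_COUNTY_US.csv values at once after loading them with any CSV reader.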
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Data is archived here: https://doi.org/10.5281/zenodo.4818011

Data and code archive provides all the files that are necessary to replicate the empirical analyses that are presented in the paper "Climate impacts and adaptation in US dairy systems 1981-2018" authored by Maria Gisbert-Queral, Arne Henningsen, Bo Markussen, Meredith T. Niles, Ermias Kebreab, Angela J. Rigden, and Nathaniel D. Mueller and published in 'Nature Food' (2021, DOI: 10.1038/s43016-021-00372-z).

The empirical analyses are entirely conducted with the "R" statistical software using the add-on packages "car", "data.table", "dplyr", "ggplot2", "grid", "gridExtra", "lmtest", "lubridate", "magrittr", "nlme", "OneR", "plyr", "pracma", "quadprog", "readxl", "sandwich", "tidyr", "usfertilizer", and "usmap". The R code was written by Maria Gisbert-Queral and Arne Henningsen with assistance from Bo Markussen.

Some parts of the data preparation and the analyses require substantial amounts of memory (RAM) and computational power (CPU). Running the entire analysis (all R scripts consecutively) on a laptop computer with 32 GB physical memory (RAM), 16 GB swap memory, an 8-core Intel Xeon CPU E3-1505M @ 3.00 GHz, and a GNU/Linux/Ubuntu operating system takes around 11 hours. Running some parts in parallel can speed up the computations but bears the risk that the computations terminate when two or more memory-demanding computations are executed at the same time.

This data and code archive contains the following files and folders:

* README
Description: text file with this description
* flowchart.pdf
Description: a PDF file with a flow chart that illustrates how R scripts transform the raw data files to files that contain generated data sets and intermediate results and, finally, to the tables and figures that are presented in the paper.
* runAll.sh
Description: a (bash) shell script that runs all R scripts in this data and code archive sequentially and in a suitable order (on computers with a "bash" shell such as most computers with MacOS, GNU/Linux, or Unix operating systems)
* Folder "DataRaw"
Description: folder for raw data files
This folder contains the following files:
- DataRaw/COWS.xlsx
Description: MS-Excel file with the number of cows per county
Source: USDA NASS Quickstats
Observations: All available counties and years from 2002 to 2012
- DataRaw/milk_state.xlsx
Description: MS-Excel file with average monthly milk yields per cow
Source: USDA NASS Quickstats
Observations: All available states from 1981 to 2018
- DataRaw/TMAX.csv
Description: CSV file with daily maximum temperatures
Source: PRISM Climate Group (spatially averaged)
Observations: All counties from 1981 to 2018
- DataRaw/VPD.csv
Description: CSV file with daily maximum vapor pressure deficits
Source: PRISM Climate Group (spatially averaged)
Observations: All counties from 1981 to 2018
- DataRaw/countynamesandID.csv
Description: CSV file with county names, state FIPS codes, and county FIPS codes
Source: US Census Bureau
Observations: All counties
- DataRaw/statecentroids.csv
Description: CSV file with latitudes and longitudes of state centroids
Source: Generated by Nathan Mueller from Matlab state shapefiles using the Matlab "centroid" function
Observations: All states
* Folder "DataGenerated"
Description: folder for data sets that are generated by the R scripts in this data and code archive. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these generated data files so that parts of the analysis can be replicated (e.g., on computers with insufficient memory to run all parts of the analysis).
* Folder "Results"
Description: folder for intermediate results that are generated by the R scripts in this data and code archive. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these intermediate results so that parts of the analysis can be replicated (e.g., on computers with insufficient memory to run all parts of the analysis).
* Folder "Figures"
Description: folder for the figures that are generated by the R scripts in this data and code archive and that are presented in our paper. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these figures so that people who replicate our analysis can more easily compare the figures that they get with the figures that are presented in our paper. Additionally, this folder contains CSV files with the data that are required to reproduce the figures.
* Folder "Tables"
Description: folder for the tables that are generated by the R scripts in this data and code archive and that are presented in our paper. In order to reproduce our entire analysis 'from scratch', the files in this folder should be deleted. We provide these tables so that people who replicate our analysis can more easily compare the tables that they get with the tables that are presented in our paper.
* Folder "logFiles"
Description: the shell script runAll.sh writes the output of each R script that it runs into this folder. We provide these log files so that people who replicate our analysis can more easily compare the R output that they get with the R output that we got.
* PrepareCowsData.R
Description: R script that imports the raw data set COWS.xlsx and prepares it for the further analyses
* PrepareWeatherData.R
Description: R script that imports the raw data sets TMAX.csv, VPD.csv, and countynamesandID.csv, merges these three data sets, and prepares the data for the further analyses
* PrepareMilkData.R
Description: R script that imports the raw data set milk_state.xlsx and prepares it for the further analyses
* CalcFrequenciesTHI_Temp.R
Description: R script that calculates the frequencies of days with the different THI bins and the different temperature bins in each month for each state
* CalcAvgTHI.R
Description: R script that calculates the average THI in each state
* PreparePanelTHI.R
Description: R script that creates a state-month panel/longitudinal data set with exposure to the different THI bins
* PreparePanelTemp.R
Description: R script that creates a state-month panel/longitudinal data set with exposure to the different temperature bins
* PreparePanelFinal.R
Description: R script that creates the state-month panel/longitudinal data set with all variables (e.g., THI bins, temperature bins, milk yield) that are used in our statistical analyses
* EstimateTrendsTHI.R
Description: R script that estimates the trends of the frequencies of the different THI bins within our sampling period for each state in our data set
* EstimateModels.R
Description: R script that estimates all model specifications that are used for generating results that are presented in the paper or for comparing or testing different model specifications
* CalcCoefStateYear.R
Description: R script that calculates the effects of each THI bin on the milk yield for all combinations of states and years based on our 'final' model specification
* SearchWeightMonths.R
Description: R script that estimates our 'final' model specification with different values of the weight of the temporal component relative to the weight of the spatial component in the temporally and spatially correlated error term
* TestModelSpec.R
Description: R script that applies Wald tests and Likelihood-Ratio tests to compare different model specifications and creates Table S10
* CreateFigure1a.R
Description: R script that creates subfigure a of Figure 1
* CreateFigure1b.R
Description: R script that creates subfigure b of Figure 1
* CreateFigure2a.R
Description: R script that creates subfigure a of Figure 2
* CreateFigure2b.R
Description: R script that creates subfigure b of Figure 2
* CreateFigure2c.R
Description: R script that creates subfigure c of Figure 2
* CreateFigure3.R
Description: R script that creates the subfigures of Figure 3
* CreateFigure4.R
Description: R script that creates the subfigures of Figure 4
* CreateFigure5_TableS6.R
Description: R script that creates the subfigures of Figure 5 and Table S6
* CreateFigureS1.R
Description: R script that creates Figure S1
* CreateFigureS2.R
Description: R script that creates Figure S2
* CreateTableS2_S3_S7.R
Description: R script that creates Tables S2, S3, and S7
* CreateTableS4_S5.R
Description: R script that creates Tables S4 and S5
* CreateTableS8.R
Description: R script that creates Table S8
* CreateTableS9.R
Description: R script that creates Table S9
description:
SNAP (Soil Nutrient Assessment Program), a component of the USDA/ARS Soil and Water Hub, is a web-based tool that provides an estimate of plant-available nutrients that the soil naturally provides.
Soil test fertilizer recommendations have long been predicated upon response curves generated from fertility trials across the country. These response curves have been compared to relative yield, which provides probability ranges for a response to varying fertilizer inputs. Response categories include very low, low, adequate, high, or very high, inversely related to the probability of a response to inputs of nitrogen, phosphate, and potassium (N, P, and K).
New soil test methods, increases in computing power, and access to the internet have enabled development of an interactive tool that is based on plant-available NPK from both the inorganic fraction and the organic pool of the soil. The new methods provide an estimate of the plant-available nutrients that the soil naturally provides, which has largely been ignored for decades.
Since we have access to large datasets, we can calculate the amounts of NPK required to grow crops, in lbs of NPK per bu of the desired crop. For example, it requires 100 lbs of N, 50 lbs of P2O5, and 50 lbs of K2O to grow 100 bu of corn. These are the base numbers from which we subtract the soil test data after converting from the analytical ppm to lbs of P2O5 or lbs of K2O. This is a straight subtraction. It also eliminates the need for "calibration data", since the soil tests reflect the soil's inherent fertility. Using the example above, with 100, 50, and 50 lbs of N, P2O5, and K2O required and soil test results of 25, 35, and 45 lbs, the fertilizer needed would be 75 lbs of N, 15 lbs of P2O5, and 5 lbs of K2O. This is a simple approach that doesn't get lost in the relative yield-crop response curves that have been used for decades from differing geographical areas.
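The straight subtraction described above can be sketched in a few lines. This is an illustration only, not the SNAP implementation: the function names are hypothetical, the per-bushel rates are simply the corn example from the text divided by 100 bu, the soil test values are assumed to be already converted from ppm to lbs, and flooring the result at zero is an added assumption for soils that supply more than the crop needs.

```python
# Per-bushel nutrient requirement implied by the corn example in the text
# (100 lbs N, 50 lbs P2O5, 50 lbs K2O for 100 bu). Illustrative values only.
CORN_LBS_PER_BU = {"N": 1.0, "P2O5": 0.5, "K2O": 0.5}


def requirement_for_yield(yield_goal_bu, lbs_per_bu=CORN_LBS_PER_BU):
    """Nutrients (lbs) required to grow the given number of bushels."""
    return {n: rate * yield_goal_bu for n, rate in lbs_per_bu.items()}


def fertilizer_needed(requirement_lbs, soil_test_lbs):
    """Subtract what the soil supplies from what the crop requires.

    Assumption: a nutrient the soil fully covers needs 0 lbs, not a
    negative amount.
    """
    return {
        n: max(requirement_lbs[n] - soil_test_lbs.get(n, 0.0), 0.0)
        for n in requirement_lbs
    }


if __name__ == "__main__":
    required = requirement_for_yield(100)          # 100 bu corn
    soil_test = {"N": 25, "P2O5": 35, "K2O": 45}   # soil test results, lbs
    print(fertilizer_needed(required, soil_test))
    # prints {'N': 75.0, 'P2O5': 15.0, 'K2O': 5.0}
```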
This tool will include current fertilizer prices, soil test inputs, and crop-based county averages for the last 15 years. It will predict the chances of making the yield goal that the user inputs, compared to historical yield data for their county, and calculate the fertilizer cost with and without soil testing, compared against the user's yield goal and the county average. The tool will give the user, via the internet, a more straightforward approach to realistically planning next year's fertilizer inputs and associated cost. It will also show the benefits of soil testing for increased fertilizer efficiency and reduced environmental impact.
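One possible reading of these two calculations is sketched below. The description does not say how SNAP computes the chance of reaching a yield goal, so the empirical frequency used here (the share of past years in which the county yield met the goal) is an assumption, as are the function names, the prices, and the 15 yield values, all of which are made up for illustration.

```python
def chance_of_reaching_goal(historical_yields, yield_goal):
    """Share of past years in which the county yield met the goal.

    Assumption: a plain empirical frequency; the real tool may use a
    different estimate.
    """
    if not historical_yields:
        raise ValueError("need at least one historical yield")
    hits = sum(1 for y in historical_yields if y >= yield_goal)
    return hits / len(historical_yields)


def fertilizer_cost(rates_lbs, price_per_lb):
    """Total cost of applying the given lbs of each nutrient."""
    return sum(rates_lbs[n] * price_per_lb[n] for n in rates_lbs)


if __name__ == "__main__":
    # Hypothetical 15 years of county corn yields (bu) and a goal of 180 bu.
    history = [150, 162, 171, 158, 180, 175, 168, 190,
               185, 172, 160, 178, 195, 182, 188]
    print(chance_of_reaching_goal(history, 180))

    # Hypothetical prices in $ per lb of nutrient.
    prices = {"N": 0.50, "P2O5": 0.60, "K2O": 0.40}
    # Without a soil test the full requirement is applied; with one, only
    # the remainder after subtracting the soil supply (numbers from the
    # worked corn example in the text).
    without_test = {"N": 100, "P2O5": 50, "K2O": 50}
    with_test = {"N": 75, "P2O5": 15, "K2O": 5}
    print(round(fertilizer_cost(without_test, prices), 2))
    print(round(fertilizer_cost(with_test, prices), 2))
```

The difference between the two costs is the saving attributable to the soil test under these made-up prices.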