Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets (.Rmd, .Rproj, .rds) are ready to use in R, the software environment for statistical programming, with the RStudio graphical user interface (https://posit.co/download/rstudio-desktop/). Please copy the folder structure into a single directory and follow the instructions given in the .Rmd file. Files and data are listed and described as follows:
Main directory files: results_fpath
Population estimation files: wpop_files
Steepness and elevation analysis derived from SRTM and processed in Google Earth Engine for landslides, mountain regions and urban centers in cities: gee_files
Standard deviation analysis derived from SRTM and processed in Google Earth Engine for mean slope in mountain regions and urban centers in cities: gee_sd
This data release supports an analysis of changes in dissolved organic carbon (DOC) and nitrate concentrations in the Buck Creek watershed near Inlet, New York, from 2001 to 2021. The Buck Creek watershed is a 310-hectare forested watershed in the Adirondack region that is recovering from acidic deposition. The data release includes pre-processed model inputs and model outputs for the Weighted Regressions on Time, Discharge and Season (WRTDS) model (Hirsch and others, 2010), which was used to estimate daily flow-normalized concentrations of DOC and nitrate during a 20-year period of analysis. WRTDS uses daily discharge and concentration observations, implemented through the Exploration and Graphics for River Trends (EGRET) R package, to predict solute concentration using decimal time and discharge as explanatory variables (Hirsch and De Cicco, 2015; Hirsch and others, 2010). Discharge and concentration data are available from the U.S. Geological Survey National Water Information System (NWIS) database (U.S. Geological Survey, 2016). The time series data were analyzed for the entire period, water years 2001 (WY2001) to WY2021, where WY2001 is the period from October 1, 2000 to September 30, 2001.
This data release contains five comma-separated values (CSV) files, one R script, and one XML metadata file. There are four input files (“Daily.csv”, “INFO.csv”, “Sample_doc.csv”, and “Sample_nitrate.csv”) that contain site information, daily mean discharge, and mean daily DOC or nitrate concentrations. The R script (“Buck Creek WRTDS R script.R”) uses the four input datasets and functions from the EGRET R package to generate estimates of flow-normalized concentrations. The output file (“WRTDS_results.csv”) contains model output at daily time steps for each sub-watershed and for each solute. Files are automatically associated with the R script when opened in RStudio using the provided R project file (“Files.Rproj”). All input, output, and R files are in the “Files.zip” folder.
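The water-year convention described above (WY2001 runs from October 1, 2000 to September 30, 2001) can be sketched as a small helper. This is an illustrative Python sketch, not part of the data release, whose own script is written in R; the function names `water_year` and `decimal_time` are hypothetical, and the decimal-time formula is a simple day-of-year fraction that may differ in detail from the one used inside EGRET.

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the water year of a date (October 1 through September 30).

    Dates from October onward belong to the following calendar year's
    water year, so October 1, 2000 falls in WY2001.
    """
    return d.year + 1 if d.month >= 10 else d.year


def decimal_time(d: date) -> float:
    """Return the date as a decimal year (assumed day-of-year fraction)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


if __name__ == "__main__":
    for day in (date(2000, 10, 1), date(2001, 9, 30), date(2001, 10, 1)):
        print(day.isoformat(), "-> WY", water_year(day))
```

Grouping daily records by `water_year` rather than by calendar year keeps each winter and spring high-flow season within a single analysis year.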
https://spdx.org/licenses/CC0-1.0.html
Estimates of crop nutrient removal (as crop products and crop residues) are an important component of crop nutrient balances. Crop nutrient removal can be estimated through multiplication of the quantity of crop products or crop residues (removed) by the nutrient concentration of those crop products and crop residue components respectively. Data for quantities of crop products removed at a country level are available through FAOSTAT (https://www.fao.org/faostat/en/), but equivalent data for quantities of crop residues are not available at a global level. However, quantities of crop residues can be estimated if the relationship between quantity of crop residues and crop products is known. Harvest index (HI) provides one such indication of the relationship between quantity of crop products and crop residues. HI is the proportion of above-ground biomass as crop products and can be used to estimate quantity of crop residues based on quantity of crop products. Previously, meta-analyses or surveys have been performed to estimate nutrient concentrations of crop products and crop residues and harvest indices (collectively known as crop coefficients). The challenges for using these coefficients in global nutrient balances include the representativeness of world regions or countries. Moreover, it may be unclear which countries or crop types are actually represented in the analyses of data. In addition, units used among studies differ which makes comparisons challenging. To overcome these challenges, data from meta-analyses and surveys were collated in one dataset with standardised units and referrals to the original region and crop names used by the sources of data. 
Original region and crop names were converted into internationally recognised names, and crop coefficients were summarised into two Tiers of data, representing the world (Tier 1, with single coefficient values for the world) and specific regions or countries of the world (Tier 2, with single coefficient values for each country). This dataset will aid both global and regional analyses for crop nutrient balances.
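The relationships described above can be written out as a short calculation. If harvest index (HI) is the share of above-ground biomass harvested as crop products, then HI = products / (products + residues), so residues = products × (1 − HI) / HI. The following is a minimal Python sketch under that assumption; the function names and the example figures are hypothetical and are not taken from the dataset.

```python
def residue_quantity(product_quantity: float, harvest_index: float) -> float:
    """Estimate crop residue quantity from crop product quantity.

    Assumes HI = products / (products + residues), with both quantities
    expressed on the same basis (e.g. tonnes of dry matter).
    """
    if not 0 < harvest_index <= 1:
        raise ValueError("harvest_index must be in the interval (0, 1]")
    return product_quantity * (1 - harvest_index) / harvest_index


def nutrient_removal(quantity: float, concentration_pct: float) -> float:
    """Nutrient removed = quantity x nutrient concentration (given in percent).

    The quantity and the concentration must share a basis
    (both dry matter or both fresh weight).
    """
    return quantity * concentration_pct / 100


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    grain = 60.0       # tonnes of crop product
    hi = 0.4           # harvest index
    straw = residue_quantity(grain, hi)
    print("Estimated residues (t):", straw)
    print("N removed in grain (t):", nutrient_removal(grain, 2.0))
```

Whether residues count as removed depends on how they are managed, so a nutrient balance would apply `nutrient_removal` only to the share of residues actually taken off the field.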
Methods
Data acquisition
Data were primarily collated from meta-analyses found in scientific literature. Terms used in Ovid (https://ovidsp.ovid.com/), CAB Abstracts (https://www.cabdirect.org/) and Google Scholar (https://scholar.google.com/) were: (crop) AND (“nutrient concentration” OR “nutrient content” OR “harvest index”) across any time. This search resulted in over 245,000 results. These results were refined to include studies that purported to represent crop nutrient concentration and/or harvest index of crops for geographic regions of the world, as opposed to site-specific field experiments. Given the range in different crops grown globally, preference was given to acquiring datasets that included multiple crops. In some cases, authors of meta-analyses were asked for raw data to aid the standardisation process. In addition, the International Fertilizer Association (IFA), and the Food and Agriculture Organization of the United Nations (UN FAO) provided data used for crop nutrient balances (FAOSTAT 2020). The request to UN FAO yielded phosphorus and potassium crop nutrient concentrations in addition to their publicly available nitrogen concentration values (FAOSTAT 2020). In total the refined search resulted in 26 different sources of data.
Data files were converted to separate comma-delimited CSV files for each source of data, whereby a unique ‘source’ was a dataset from an article from the scientific literature or a dataset sent by the UN FAO or IFA. Crop nutrient concentrations were expressed as a percentage of dry matter and/or the percentage of fresh weight depending on which units were reported and whether dry matter percentages of crop fresh weight were reported. Meta-data text files were written to accompany each standardized CSV file. The standardized CSV files for each source of data included information on the name of the original region, the crop coefficients it purported to represent, as well as the original names of the crops as categorised by the authors of the data. If the data related to a meta-analysis of multiple sources, information was included for the primary source of data when available. Data from the separate source files were collated into one file named ‘Combined_crop_data.csv’ using R Studio (version 4.1.0) (hereafter referred to as R) with the scripts available at https://github.com/ludemannc/Tier_1_2_crop_coefficients.git.
Processing of data
When transforming the combined data file (‘Combined_crop_data.csv’) into representative crop coefficients for different regions (available in ‘Tier_1_and_2_crop_coefficients.csv’), crop coefficients that were duplicates from the same primary source of data were excluded from processing. For instance, Zhang et al. (2021) referred to multiple primary sources of data, and the data requested from the UN FAO and the IFA referred (in many cases) to crop coefficients from IPNI (2014). Duplicate crop coefficient data that came from the same primary source were therefore excluded from the summarised dataset of crop coefficients.
Two tiers of data
The data were sub-divided into two Tiers to help overcome the challenge of using these data in a global nutrient balance when data are not available for every country. This follows the approach taken by the Intergovernmental Panel on Climate Change (IPCC 2019). Data were assigned different ‘Tiers’ based on complexity and data requirements.
· Tier 1: crop coefficients at the world level.
· Tier 2: crop coefficients at more granular geographic regions of the world (e.g. at regional, country or sub-country levels).
Crop coefficients were summarised as means for each crop item and crop component based on either ‘Tier 1’ or ‘Tier 2’.
One could also envision a more detailed site-specific level (Tier 3). The data in this dataset did not meet the required level of complexity or data requirements for Tier 3, unlike, say, the site-specific data being collected as part of the Consortium for Precision Crop Nutrition (CPCN) (www.cropnutrientdata.net), which could be described as Tier 3. No data from the current dataset were therefore assigned to Tier 3. It is expected that in the future, site-specific data will be used to improve the crop coefficients further with a Tier 3 approach.
The ‘Tier_1_and_2_crop_coefficients.csv’ file includes mean crop coefficients for the Tier 1 data, and mean crop coefficients for the Tier 2 data. The Tier 1 estimates of crop coefficients were mean values across Tier 1 data that purported to represent the World.
Crop coefficients found in the data sources represent quite different geographic areas or regions. To enable combining data with different spatial overlaps for Tier 2, data were disaggregated to the country level. First, each region was assigned a list of countries (which the regional averages were assumed to represent, as listed in the ‘Original_region_names_and_assigned_countries.csv’ file). Countries were assigned alpha-3 country codes following the ISO 3166 international standards (https://www.iso.org/publication/PUB500001.html). Second, for each country, mean crop coefficients were calculated from the coefficients of all regions listed for that country. For Australia, for example, the mean values for each crop coefficient were calculated from values that represented sub-country (e.g. Australia New South Wales South East), country (Australia), and multi-country (e.g. Oceania) regions. For instance, if there was a harvest index value of 0.5 for wheat for the original region ‘Australia New South Wales South East’, a value of 0.51 for the original region named ‘Australia’ and a value of 0.47 for the original region named ‘Oceania’, then the mean Tier 2 harvest index for wheat for the country Australia would be 0.493, the unweighted mean of the three values. Users of the dataset can instead assign different weights to each entry.
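The Australian wheat example can be reproduced in a few lines. This is a minimal Python sketch, not the authors' R code; the function name `tier2_mean` and the weights in the second call are hypothetical and only illustrate how a user might favour country-level entries over regional ones.

```python
from typing import Optional, Sequence


def tier2_mean(values: Sequence[float],
               weights: Optional[Sequence[float]] = None) -> float:
    """Mean of the coefficients assigned to one country.

    With no weights this is the unweighted mean described in the text;
    otherwise it is a weighted mean using the supplied weights.
    """
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Harvest index values for wheat from the example in the text.
wheat_hi = {
    "Australia New South Wales South East": 0.50,
    "Australia": 0.51,
    "Oceania": 0.47,
}

unweighted = tier2_mean(list(wheat_hi.values()))
print(round(unweighted, 3))  # 0.493

# Hypothetical weights: count the country-level entry twice.
weighted = tier2_mean(list(wheat_hi.values()), weights=[1.0, 2.0, 1.0])
print(round(weighted, 4))
```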
To aid analysis, the names of the original categories of crop were converted into UN FAO crop ‘item’ categories, following UN FAO standards (FAOSTAT 2022) (available in the ‘Original_crop_names_in_each_item_category.csv’ file). These item categories were also assigned categorical numeric codes following UN FAO standards (FAOSTAT 2022). Data related to crop products (e.g. grain, beans, saleable tubers or fibre) were assigned the category “Crop_products” and data related to crop residues (e.g. straw, stover) were assigned the category “Crop_residues”.
Dry and fresh matter weights
In some cases nutrient concentration values from the original sources were available on a dry matter or a fresh weight basis, but not both. Gaps in either the nutrient concentration on a dry matter or fresh weight basis were given imputed values. If the data source mentioned the dry matter percentage of the crop component then this was preferentially used to impute the other missing nutrient concentration data. If dry matter percentage information was not available for a particular crop item or crop component, missing data were imputed using the mean dry matter percentage values across all Tier 1 and Tier 2 data.
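The imputation described above rests on one conversion: a nutrient concentration on a fresh weight basis equals the concentration on a dry matter basis multiplied by the dry matter fraction. The sketch below shows that conversion with the stated order of preference (the source's own dry matter percentage first, a mean value otherwise). It is an assumption-laden Python illustration, not the authors' R code; all names and example values are hypothetical.

```python
from typing import Optional, Tuple


def dm_to_fresh(conc_pct_dm: float, dm_pct: float) -> float:
    """Convert a concentration from % of dry matter to % of fresh weight."""
    return conc_pct_dm * dm_pct / 100


def fresh_to_dm(conc_pct_fresh: float, dm_pct: float) -> float:
    """Convert a concentration from % of fresh weight to % of dry matter."""
    return conc_pct_fresh * 100 / dm_pct


def impute_missing(conc_dm: Optional[float],
                   conc_fresh: Optional[float],
                   source_dm_pct: Optional[float],
                   mean_dm_pct: float) -> Tuple[float, float]:
    """Fill whichever concentration basis is missing.

    The dry matter percentage reported by the source is preferred;
    the mean dry matter percentage is used only as a fallback.
    """
    dm_pct = source_dm_pct if source_dm_pct is not None else mean_dm_pct
    if conc_dm is None and conc_fresh is None:
        raise ValueError("at least one concentration is required")
    if conc_dm is None:
        conc_dm = fresh_to_dm(conc_fresh, dm_pct)
    if conc_fresh is None:
        conc_fresh = dm_to_fresh(conc_dm, dm_pct)
    return conc_dm, conc_fresh


if __name__ == "__main__":
    # Hypothetical grain: 2.0 % N in dry matter, no source dry matter value.
    print(impute_missing(2.0, None, None, mean_dm_pct=88.0))
```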
Global means for the UN FAO Cropland Nutrient Budget
Data were also summarised as means for nitrogen (N), elemental phosphorus (P) and elemental potassium (K) nutrient concentrations of crop products using data that represented the world (Tier 1) for the 2023 UN FAO Cropland Nutrient Budget. These data are available in the file named World_crop_coefficients_for_UN_FAO.csv.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Title: Supporting Data and Analysis for: Average Temperature Variations in Železný Brod
Description:
WARNING: This dataset consists of fictitious data and is intended for testing purposes only. Do not use this dataset for further experiments.
This dataset and R script support the article:
Steiner, L. (2025) "Average Temperature Variations in Jizerka (Liberec Region) During December 2024: A Detailed Analysis", Fiction/non-fiction 21, 1(1), pp. 1–3. doi: 10.5281/zenodo.14289761
---
Contents
- `RDE.csv`: Daily temperature data for March 2025 collected in the city of Železný Brod.
- `temperature_analysis.R`: R script that computes daily averages and summary statistics (mean, median, trimmed mean, SD, IQR, min/max).
- `temperature_statistics.csv`: Output file generated by the script.
- `README.txt`: Usage instructions and metadata.
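
The summary statistics listed for `temperature_analysis.R` can be illustrated with a short sketch. This is not the deposited R script; it is a Python illustration under stated assumptions: the 10 % trim fraction, the sample standard deviation, and the quartile method are guesses at reasonable defaults, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def trimmed_mean(values: Sequence[float], trim: float = 0.1) -> float:
    """Mean after dropping a fraction `trim` of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped per tail
    return statistics.mean(ordered[k:len(ordered) - k])


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Compute mean, median, trimmed mean, SD, IQR, min and max."""
    # "inclusive" interpolates between observed values when locating
    # the quartiles (assumed here; other quantile definitions exist).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "trimmed_mean": trimmed_mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Hypothetical daily mean temperatures in degrees Celsius.
    temps = [1.5, 2.0, 3.5, 4.0, 2.5, 0.5, -1.0, 6.0, 5.5, 3.0]
    for name, value in summarize(temps).items():
        print(f"{name}: {value:.2f}")
```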
---
Data origin
This dataset was measured with a temperature-measuring device at a local measuring station in the city of Železný Brod (Liberec Region).
---
How to Use
1. Open the `temperature_analysis.R` file in RStudio or any R-compatible environment.
2. Ensure that `RDE.csv` is in the working directory.
3. Run the script to generate the statistical output (`temperature_statistics.csv`).
---
License
This deposit is licensed under the **Creative Commons Attribution 4.0 International (CC BY 4.0)** license.
You are free to share and adapt the materials for any purpose, even commercially, provided that you give appropriate credit to the author.
---
If you use this dataset or code, please cite either the Zenodo record or the related publication listed above.
---
Data collection methodology
These data were collected by measuring daily temperatures at a local measuring station in the city of Železný Brod.
---
Contact information
Data administrator: Vojtěch Šorm, Faculty of Mechanical Engineering, Technical University of Liberec, ORCID: 0009-0007-9298-114X
---
Version and updates
This dataset was last updated on 24.4.2025.
---
Version 3 (12/23/24) -- Updated trait data (ComboFin2_new.csv, Supplemental_Table_1_data.xlsx, SR_data_mean.csv).
Version 2 (7/21/23) -- Supplemental materials now include an R Markdown file that outlines all analyses completed in RStudio, as well as two data frames and a tree file to be loaded when running the analyses. These are in addition to the three supplemental files that were previously published. Please note that the .csv files include species mean data for each trait. Also note that Supplemental Table 2 now includes information on which data columns in each .csv file correspond to which traits. Please see the README file for additional details about each published file.
Version 1 (6/21/23) -- Supplemental files include trait data for each specimen, definitions for all measurements taken and calculated, and PCA loadings.
Flow-duration statistics at the 99th, 98th, 95th, 90th, 80th, 70th, 60th, and 50th percent exceedance probabilities and annual n-day low-flow statistics for the 1-, 7-, 14-, and 30-day mean low flows with 2-year (0.5 nonexceedance probability), 5-year (0.2 nonexceedance probability), and 10-year (0.1 nonexceedance probability) recurrence intervals were computed for 28 selected streamflow gaging stations in Puerto Rico. The 28 selected streamflow gaging stations were required to have 10 or more years of daily mean streamflow data through water year 2018. The flow-duration statistics and n-day low-flow frequencies were computed using the U.S. Geological Survey program, SWToolbox. Regional regression equations were developed to estimate flow-duration statistics and n-day low-flow frequencies at ungaged stream locations using selected basin characteristics as explanatory variables. These variables were determined from digital spatial datasets and geographic information systems using the most recent data available, as referenced in the U.S. Geological Survey web application, StreamStats, and published in Kolb and Ryan (2021). An ordinary least-squares procedure in R Studio was used to develop the final regional flow-duration regression equations using drainage area, mean total annual reference evapotranspiration, and minimum basin elevation as the explanatory variables. A generalized least squares procedure in the U.S. Geological Survey program, WREG, was used to account for cross-correlation of sites and develop the final regional n-day low-flow frequency regression equations using drainage area, mean total annual reference evapotranspiration, and minimum basin elevation as the explanatory variables. 
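The statistics named above follow simple definitions that can be sketched directly. A flow at the p-percent exceedance probability is the flow equaled or exceeded p percent of the time, an n-day low flow is the minimum of the n-day moving mean, and a recurrence interval is the reciprocal of the annual nonexceedance probability (0.5 for 2 years, 0.2 for 5 years, 0.1 for 10 years). The Python sketch below uses linear interpolation between ranked values, which is an assumption; SWToolbox may use a different ranking or fitting method, and all function names are hypothetical.

```python
from typing import Sequence


def exceedance_flow(daily_flows: Sequence[float],
                    exceedance_pct: float) -> float:
    """Flow equaled or exceeded `exceedance_pct` percent of the time.

    The p-percent exceedance flow is the (100 - p)th percentile of the
    record, found here by linear interpolation between ranked values.
    """
    ordered = sorted(daily_flows)
    rank = (100 - exceedance_pct) / 100 * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def min_n_day_mean(daily_flows: Sequence[float], n: int) -> float:
    """Minimum n-day moving-average flow within one period (e.g. a year)."""
    if n < 1 or n > len(daily_flows):
        raise ValueError("n must be between 1 and the record length")
    return min(sum(daily_flows[i:i + n]) / n
               for i in range(len(daily_flows) - n + 1))


def recurrence_interval(nonexceedance_prob: float) -> float:
    """Recurrence interval in years for an annual nonexceedance probability."""
    return 1 / nonexceedance_prob


if __name__ == "__main__":
    # Hypothetical daily mean flows for illustration only.
    flows = [12.0, 9.5, 8.0, 7.2, 6.8, 6.5, 7.0, 9.0, 15.0, 11.0]
    print("50 percent exceedance flow:", exceedance_flow(flows, 50))
    print("Minimum 7-day mean flow:", min_n_day_mean(flows, 7))
    print("Recurrence interval for p = 0.1:", recurrence_interval(0.1))
```

Frequency statistics for a given recurrence interval come from fitting a distribution to the series of annual `min_n_day_mean` values across all years of record, a step this sketch does not attempt.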
This data release includes two child pages (Puerto Rico Flow-Duration Regression Files and Puerto Rico N-day Low-Flow Regression Files), a BasinCharacteristics.csv file that contains 47 basin and climatic characteristics considered in the analyses, a BasinCharacteristics_corrrelation_charts folder that contains .pdf files showing correlation matrices, an R_regsubsets_output folder that contains .txt and .pdf files showing results of the "regsubsets" analyses, a Trend_statistics_nday_lowflow_timeseries.csv file that contains the SWToolbox Mann-Kendall tau statistics, and a NWIS_rdb_files folder that contains the .rdb files used in the analyses.
References Cited:
Kolb, K.R., and Ryan, P.J., 2021, Basin Characteristic Rasters for Puerto Rico StreamStats, 2021: U.S. Geological Survey data release, https://doi.org/10.5066/P9HK9SSQ.