Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available is separate .csv files for each land cover type. Oterh data can be found for the minimum, monthly, and seasonal transit time found in their respective folders. These data were produced using the python code found in the "supporting_code" folder given the ease of working with .nc and EASE grid in the xarray python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom functions for this analysis, primarily for importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each land cover type. You load these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each land cover type. Minimum is defined as the lowest monthly estimate. You load these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert the GeoTIFFs to NetCDF. Xarray is a Python package for working with multi-dimensional labeled arrays, and rioxarray is an xarray extension built on rasterio, a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF files, while rioxarray was used to read the GeoTIFFs and save them as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
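A minimal sketch of this GeoTIFF-to-NetCDF step is given below; the file names and the added metadata attribute are placeholders, not the actual workflow files.

import rioxarray

# Read a state-scale GeoTIFF as an xarray DataArray (placeholder file name)
da = rioxarray.open_rasterio("state_les_dataset.tif")

# Add or edit metadata with xarray before export (placeholder attribute)
da.attrs["description"] = "State-scale LES dataset"

# Save the array, including its spatial reference information, to NetCDF
da.to_netcdf("state_les_dataset.nc")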
cmomy is a Python package for calculating central moments and co-moments in a numerically stable and direct way. Behind the scenes, cmomy uses Numba to rapidly calculate moments. cmomy provides utilities to calculate central moments from individual samples, precomputed central moments, and precomputed raw moments. It also provides routines to perform bootstrap resampling based on raw data or precomputed moments. cmomy has numpy array and xarray DataArray interfaces.
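As a point of reference for the quantities involved, the sketch below evaluates low-order central moments directly with numpy; it illustrates what cmomy computes but deliberately does not use cmomy's own API, which should be taken from the package documentation.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

mean = x.mean()
# k-th central moment: average of (x - mean)**k
second = np.mean((x - mean) ** 2)  # variance
third = np.mean((x - mean) ** 3)   # related to skewness
print(mean, second, third)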
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset provides simulated data on plastic and substance flows and stocks in buildings and infrastructure as described in the data article "Plastics in the German Building and Infrastructure Sector: A High-Resolution Dataset on Historical Flows, Stocks, and Legacy Substance Contamination". Besides simulated data, the repository contains input data and model files used to produce the simulated data.
Data & Data Visualization: The dataset contains input data and simulated data for the six main plastic applications in buildings and infrastructure in Germany in the period from 1950 to 2023, which are profiles, flooring, pipes, insulation material, cable insulations, and films. For each application the data are provided in a sub-directory (1_ ... 6_) following the structure described below.
Input Data:
The input data are stored in an xlsx file with three sheets: flows, parameters, and data quality assessment. The data sources for all input data are detailed in the Supplementary Material of the linked Data in Brief article.
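For example, the sheets can be read with pandas roughly as follows; the directory and file name used here are placeholders, since the actual file names are given in the repository itself.

import pandas as pd

# Placeholder path: each application sub-directory (1_ ... 6_) contains its own input file
input_file = "1_profiles/input_data.xlsx"

flows = pd.read_excel(input_file, sheet_name="flows")
parameters = pd.read_excel(input_file, sheet_name="parameters")
quality = pd.read_excel(input_file, sheet_name="data quality assessment")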
Simulated Data:
Simulated data are stored in a sub-folder, which contains:
Note: All files in the [product]/simulated_data folder are automatically replaced with updated model results upon execution of immec_dmfa_calculate_submodels.py.
To reduce storage requirements, data are stored in gzipped pickle files (.pkl.gz), while smaller files are provided as pickle files (.pkl). To open the files, users can use Python with the following code snippet:
import gzip
import pickle

# Load a gzipped pickle file
with gzip.open("filename.pkl.gz", "rb") as f:
    data = pickle.load(f)

# Load a regular pickle file
with open("filename.pkl", "rb") as f:
    data = pickle.load(f)
Please note that opening pickle files requires compatible versions of numpy and pandas, as the files may have been created using version-specific data structures. If you encounter errors, ensure your package versions match those used during file creation (pandas: 2.2.3, numpy: 2.2.4).
Simulated data are provided as Xarray datasets, a data structure designed for efficient handling, analysis, and visualization of multi-dimensional labeled data. For more details on using Xarray, please refer to the official documentation: https://docs.xarray.dev/en/stable/
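Once a file has been unpickled as above, the resulting object can be explored with the standard xarray interface; the dimension name in the selection below is purely illustrative, since the actual dimensions depend on the file.

# 'data' is the xarray Dataset loaded from the pickle file above
print(data)                    # overview of dimensions, coordinates, and variables
print(list(data.data_vars))

# Illustrative selection only: replace 'time' with a dimension that exists in the file
# subset = data.sel(time=2000)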
Core Model Files:
Computational Considerations:
During model execution, large arrays are generated, requiring significant memory. To enable computation on standard computers, Monte Carlo simulations are split into multiple chunks:
Dependencies:
The model relies on the ODYM framework. To run the model, ODYM must be downloaded from https://github.com/IndEcol/ODYM (S. Pauliuk, N. Heeren, ODYM — An open software framework for studying dynamic material systems: Principles, implementation, and data structures, Journal of Industrial Ecology 24 (2020) 446–458. https://doi.org/10.1111/jiec.12952.)
7_Model_Structure:
8_Additional_Data: This folder contains supplementary data used in the model, including substance concentrations, data quality assessment scores, open-loop recycling distributions, and lifetime distributions.
The dataset was generated using a dynamic material flow analysis (dMFA) model. For a complete methodology description, refer to the Data in Brief article (add DOI).
If you use this dataset, please cite: Schmidt, S., Verni, X.-F., Gibon, T., Laner, D. (2025). Dataset for: Plastics in the German Building and Infrastructure Sector: A High-Resolution Dataset on Historical Flows, Stocks, and Legacy Substance Contamination, Zenodo. DOI: 10.5281/zenodo.15049210
This dataset is licensed under CC BY-NC 4.0, permitting use, modification, and distribution for non-commercial purposes, provided that proper attribution is given.
For questions or further details, please contact:
Sarah Schmidt
Center for Resource Management and Solid Waste Engineering
University of Kassel
Email: sarah.schmidt@uni-kassel.de
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Idealized Planar Array study for Quantifying Spatial heterogeneity (IPAQS) is the result of a National Science Foundation (US) funded project that aims at studying the effect of surface thermal heterogeneities of different length scales on the atmospheric boundary layer. The project consisted of a computational effort (the dataset included here) and an experimental effort (dataset being prepared for publication).
Overview of the numerical (Large Eddy) simulations:
The simulations are separated into two sets to study the differences between heterogeneous and homogeneous surfaces. In the first set, a total of seven configurations are considered, all with a homogeneous surface temperature fixed at T_s = 290 K, and for which the geostrophic wind speed is increased from 1 to 15 m s−1 (i.e., Ug = 1, 2, 3, 4, 6, 9, 15 m s−1). These homogeneous cases are referred to as Homog-X, where X indicates the geostrophic wind speed of the corresponding case (see Margairaz et al. 2020a). In the second set, the surface temperature is distributed amongst square patches, where the temperature of each patch is determined by sampling a Gaussian distribution with a mean temperature of 290 K and a standard deviation of 5 K. In this case, three different patch sizes were considered (i.e., lh = 800, 400, and 200 m). The sizes of the heterogeneities were chosen to be of similar size (lh/ld ≈ 1), half the size (lh/ld ≈ 1/2), and about a quarter of the size (lh/ld ≈ 1/4) of the largest flow motions within the represented thermal boundary layer, assuming that this is of the order of the boundary-layer height (ld ∼ zi). These heterogeneities are typically not resolved in NWP models. These cases have been studied for the same geostrophic wind speeds indicated above, and hereafter are referred to as PYYY_UgX_, where X indicates the corresponding geostrophic wind speed and YYY refers to the size of the patches (e.g., P800_Ug1_ is the heterogeneous case with patches of 800 m, forced with Ug = 1 m s−1). Additionally, for the case with larger patches, three different random distributions of the patches were considered to evaluate the potential effect of a given surface distribution for all geostrophic wind speeds. In this dataset we only include case v3. The LES-imposed surface temperature distributions emulate the surface thermal conditions observed in Morrison et al. (2017 QJRMS, 2021 BLM, 2022 BLM), where measurements of the surface temperature were taken with a thermal camera at the SLTEST site of the US Army Dugway Proving Ground in Utah, USA. This is an ideal site with uniform roughness and a large unperturbed fetch, where surface thermal heterogeneities are naturally created by differences in surface salinity. In all studied cases, the surface roughness is assumed homogeneous, with z0 = 0.1 m, representative of a surface with sparse forest or farmland with many hedges (Brutsaert 1982; Stull 1988). The initial boundary-layer height is set to zi = 1000 m. The temperature profile is initialized with a mean air temperature of 285 K. At the top of the initial boundary layer, a capping inversion of 1000 m is used to limit its growth. The strength of this inversion is fixed at Γ = 0.012 K m−1. The atmospheric boundary layer (ABL) is considered dry and the latent heat flux is neglected in all cases. Further, in all simulations, the surface heat flux is computed using MOST, as explained in Margairaz et al. 2020a, where the surface temperature is kept constant in time throughout the simulations. Thus, there is no feedback from the atmosphere to the surface, as the surface temperature does not cool down or warm up with local changes in velocities. As a consequence, the ABL gradually warms up as the simulations progress, and hence becomes less convective over time. However, the runs are not long enough for this to be significant.
In addition, to ensure a degree of homogeneity within each patch and a certain degree of validity of MOST, note that even for the heterogeneous cases with the fewest grid points per patch, a minimum of eight grid points is guaranteed in each horizontal direction. The domain size is set to (Lx, Ly, Lz) = (2π, 2π, 2) km at a grid size of (Nx, Ny, Nz) = (256, 256, 256), resulting in a horizontal resolution of Δx = Δy = 24.5 m and a vertical grid spacing of Δz = 7.8 m. A timestep of Δt = 0.1 s is used to ensure the stability of the time integration. The two sets of simulations span a large range of geostrophic forcing conditions, allowing the study of the effect on the structure of the convective boundary layer (CBL) above a patchy surface compared to a homogeneous surface. The procedure used to spin up the simulations is the following: a spinup phase of four hours of real time is used to achieve converged turbulent statistics, which is then followed by an evaluation phase. During the latter, running averages are computed for the next hour of real time (the dataset published here). Statistics have been computed for averaging times of 5 min to 1 h, showing statistical convergence at 30-min averages with negligible changes between the 30-min and the 60-min averages. The simulations cover a wide range of atmospheric stability regimes ranging from −zi/L < 5 to −zi/L > 700, hence spanning from near-neutral to highly convective scenarios.
Description of the Dataset as included in the NetCDF files:
Data for each study case are included in two files: one for momentum-related variables and one for temperature-related variables. For example, the files "P200_Ug1_Momentum.nc" and "P200_Ug1_Scalar.nc" include the 1-h averaged variables for momentum and temperature for the case of 200 m surface patches with 1 m/s geostrophic winds.
Each corresponding momentum file "PXXX_UgX_Momentum.nc" includes the following variables in a Python Xarray structure:
'avgU' = mean streamwise wind speed; 'avgV' = mean spanwise wind speed; 'avgW' = mean vertical wind speed; 'avgP' = mean dynamic modified pressure field (p*, see Margairaz et al. 2020a).
'avgU2', 'avgV2', 'avgW2' = correspond to \overline{UU}, \overline{VV}, and \overline{WW}, where capital letters indicate the LES-filtered variables.
'avgUV', 'avgUW', 'avgVW' = correspond to \overline{UV}, \overline{UW}, and \overline{VW}. These variables, together with the ones above, are used to compute the Reynolds stress components (e.g., R_{xz} = \overline{U}\,\overline{W} - \overline{UW}).
'avgU3', 'avgV3', 'avgW3', 'avgU4', 'avgV4', 'avgW4' = the equivalent third- and fourth-order moments, i.e., the same quantities cubed and raised to the fourth power.
'avgtxx','avgtyy','avgtzz','avgtxy','avgtxz','avgtyz' = These represent the corresponding averaged subgrid scale (SGS) stress.
'avgdudz','avgdvdz','avgNut','avgCs' = Represent the averaged vertical derivatives, an averaged subgrid Nusselt number, and the Cs coefficient computed in the SGS model.
Overall, there are a total of 26 variables related to the momentum field. The temperature fields, in turn, are included in the "PXXX_UgX_Scalar.nc" files. These files include 10 variables:
'avgT' = mean temperature field; 'avgT2' = corresponds to \overline{TT}; 'avgUT', 'avgVT', 'avgWT' = correspond to \overline{UT}, \overline{VT}, and \overline{WT}. One can use these terms to compute the corresponding Reynolds-averaged turbulent fluxes, as is the case for momentum.
'avgUT_sgs','avgVT_sgs','avgWT_sgs' = These represent the corresponding subgrid scale fluxes.
'avg_nus', 'avg_ds' = the averaged subgrid Nusselt number and the Ds coefficient computed in the scalar SGS model.
All variables output from the LES are normalized: by Tscale = 290 [K] when they include dimensions of temperature, by u_scale = 0.45 [m/s] when they relate to velocity fields, and by zi = 1000 [m] for length scales.
The only output variables that are expressed in dimensional form are those for the surface temperature, included in the files "SurfTemp_DXXX.nc".
Together with the data files, we include a Python script that loads the data into two Xarray structures that one can then use to work with the datasets.
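As a minimal illustration of that workflow (not the bundled script itself), the sketch below opens one momentum file with xarray, computes the R_xz Reynolds stress component from the averaged variables as defined above, and re-dimensionalizes a velocity field with u_scale.

import xarray as xr

u_scale = 0.45  # m/s, velocity scale given above
zi = 1000.0     # m, length scale given above

mom = xr.open_dataset("P200_Ug1_Momentum.nc")

# Reynolds stress component following the definition above: R_xz = avgU*avgW - avgUW
R_xz = mom["avgU"] * mom["avgW"] - mom["avgUW"]

# Dimensional mean streamwise velocity (stored values are normalized by u_scale)
U_dim = mom["avgU"] * u_scale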