Renamed the "Unindexed dimensions" section in the Dataset and DataArray repr (added in v0.9.0) to "Dimensions without coordinates".
The US National Center for Atmospheric Research partnered with the IBS Center for Climate Physics in South Korea to generate the CESM2 Large Ensemble, which consists of 100 ensemble members at 1-degree spatial resolution covering the period 1850-2100 under CMIP6 historical and SSP370 future radiative forcing scenarios. Datasets from this ensemble were made downloadable via the Climate Data Gateway on June 14, 2021. NCAR has copied a subset (currently ~500 TB) of CESM2 LENS data to Amazon S3 as part of the AWS Public Datasets Program. To optimize for large-scale analytics, the data are represented as ~275 Zarr stores accessible through the Python Xarray library. Each Zarr store contains a single physical variable for a given model run type and temporal frequency (monthly or daily).
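A minimal sketch of opening one of these Zarr stores anonymously from S3 with xarray; the store path shown is illustrative only, and the actual object keys are listed in the catalog accompanying the AWS Open Data registry entry:
import s3fs
import xarray as xr
# anonymous access to the public bucket; the key below is illustrative only
fs = s3fs.S3FileSystem(anon=True)
store = fs.get_mapper("ncar-cesm2-lens/atm/monthly/example-variable.zarr")
# open lazily; data are only read when a computation is triggered
ds = xr.open_zarr(store, consolidated=True)
print(ds)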
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio xarray extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
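A minimal sketch of the GeoTIFF-to-NetCDF step described above, using a hypothetical state-scale GeoTIFF named state_les.tif; the actual file names and metadata differ per resource:
import rioxarray
# read the GeoTIFF as a DataArray (band, y, x) with CRS and transform attached
da = rioxarray.open_rasterio("state_les.tif")  # hypothetical file name
# add descriptive metadata before export
da.attrs["long_name"] = "state-scale large extent spatial (LES) dataset"
# save as NetCDF; the spatial reference is preserved in the output
da.to_dataset(name="les").to_netcdf("state_les.nc")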
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Contains the DGM5 geological model and the VELMOD 3.1 velocity model as xarray datasets in UTM31 coordinates.
Original data:
Format:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of Sentinel-1 radiometric terrain corrected (RTC) imagery processed by the Alaska Satellite Facility covering a region within the Central Himalaya. It accompanies a tutorial demonstrating how to access and work with Sentinel-1 RTC imagery using xarray and other open-source Python packages.
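As a sketch of the kind of access the tutorial demonstrates (the scene file names here are hypothetical), individual RTC GeoTIFFs can be opened with rioxarray and stacked along a time dimension:
import pandas as pd
import rioxarray
import xarray as xr
# hypothetical scene files and their acquisition dates
files = ["S1_rtc_20210501_VV.tif", "S1_rtc_20210513_VV.tif"]
dates = pd.to_datetime(["2021-05-01", "2021-05-13"])
# drop the singleton band dimension and concatenate along a new time axis
scenes = [rioxarray.open_rasterio(f).squeeze("band", drop=True) for f in files]
vv = xr.concat(scenes, dim=pd.Index(dates, name="time"))
print(vv)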
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test data for ASTE Release 1 integration with ECCOv4-py.
This resource includes materials for two workshops: (1) FAIR Data Management and (2) Advanced Application of Python for Hydrology and Scientific Storytelling, both prepared for presentation at the NWC Summer Institute BootCamp 2024.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test of EDR data with xarray
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Next Day Wildfire Spread Dataset
This dataset is an xarray version of the original Next Day Wildfire Spread dataset. It comes in three splits: train, eval, and test. Note: since the original dataset does not contain spatio-temporal information, the xarray coordinates have been set to arbitrary ranges (0-63 for the spatial dimensions and 0-number_of_samples for the temporal dimension).
Example
To open a train split of the dataset and show an elevation plot at time=2137:… See the full description on the dataset page: https://huggingface.co/datasets/TheRootOf3/next-day-wildfire-spread.
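The example itself is elided above; a minimal sketch, assuming the train split is available locally as a netCDF file (the file name and exact layout are documented on the dataset page):
import xarray as xr
train = xr.open_dataset("next_day_wildfire_spread_train.nc")  # hypothetical local file
# plot the elevation field at the (arbitrary) temporal coordinate 2137
train["elevation"].sel(time=2137).plot()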
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data for three different experiments presented in the paper:
(1) moose_feet (40 files): The moose leg experiments are labeled as ax_y.nc, where 'a' indicates attached digits and 'f' indicates free digits. The number 'x' is either 1 (front leg) or 2 (hind leg), and the number 'y' is an increment from 0 to 9 representing the 10 samples of each set.
(2) synthetic_feet (120 files): The synthetic feet experiments are labeled as lw_a_y.nc, where 'lw' (Low Water content) can be replaced by 'mw' (Medium Water content) or 'vw' (Vast Water content). The 'a' can be 'o' (Original Go1 foot), 'r' (Rigid extended foot), 'f' (Free digits anisotropic foot), or 'a' (Attached digits). Similar to (1), the last number is an increment from 0 to 9.
(3) Go1 (15 files): The locomotion experiments of the quadruped robot on the track are labeled as condition_y.nc, where 'condition' is either 'hard_ground' for experiments on hard ground, 'bioinspired_feet' for the locomotion of the quadruped on mud using bio-inspired anisotropic feet, or 'original_feet' for experiments where the robot used the original Go1 feet. The 'y' is an increment from 0 to 4.
The files for moose_feet and synthetic_feet contain timestamp (s), position (m), and force (N) data.
The files for Go1 contain timestamp (s), position (rad), velocity (rad/s), torque (Nm) data for all 12 motors, and the distance traveled by the robot (m).
All files can be read using xarray datasets (https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).
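For example, the first sample of the front-leg experiment with attached digits can be opened as follows (following the naming convention above):
import xarray as xr
# 'a' = attached digits, '1' = front leg, '0' = first of the 10 samples
ds = xr.open_dataset("a1_0.nc")
print(ds)  # timestamp (s), position (m), and force (N)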
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource includes materials for three workshops: (1) FAIR data management and collaborating on simulation data in the cloud, (2) Advanced application of Python for working with high-value environmental datasets, and (3) Configuring and running a NextGen simulation and analyzing model outputs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Testing files for the xesmf remapping package.
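For context, a minimal xESMF remapping call looks like the following; the grids below are synthetic placeholders, not the testing files themselves:
import numpy as np
import xarray as xr
import xesmf as xe
# synthetic source grid with one variable
ds_in = xr.Dataset(
    {"t2m": (("lat", "lon"), np.random.rand(18, 36))},
    coords={"lat": np.linspace(-85, 85, 18), "lon": np.linspace(0, 350, 36)},
)
# synthetic target grid
ds_out = xr.Dataset(
    coords={"lat": np.linspace(-89, 89, 90), "lon": np.linspace(0, 358, 180)},
)
regridder = xe.Regridder(ds_in, ds_out, "bilinear")
t2m_regridded = regridder(ds_in["t2m"])
print(t2m_regridded.shape)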
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Normalized variances calculated using the method described in the article, based on experimental data. Data are stored using Xarray in the NetCDF format and can be accessed with the Xarray Python library by calling xarray.open_dataset(). The dataset is structured as follows: there are two N-dimensional DataArrays, one for calculations with time displacements (labeled "time") and one for calculations with phase displacements with the time centroid already picked (labeled "final"). Each DataArray has 5 dimensions: SNR, eps (separation), ph_disp/disp (displacement), sample/sample_time (bootstrapped sample), and supersample (ensemble of bootstrapped samples). Coordinates label the parameters along each dimension.
Usage examples
Opening the dataset
import numpy as np
import xarray as xr
variances = xr.open_dataset("coherent.nc")
Obtaining parameter estimates
def get_centroid_indices(variances):
    return np.bincount(
        variances.argmin(
            dim="disp" if "disp" in variances.dims else "ph_disp"
        ).values.flatten()
    )
def get_centroid_index(variances):
    return np.argmax(get_centroid_indices(variances))
def epsilon_estimator(var):
    # estimate the separation from the normalized variance
    return 4 * np.sqrt(np.clip(var, 0, None))
time_centroid_estimates = variances["time"].idxmin(dim="disp")
phase_centroid_estimates = variances["final"].idxmin(dim="ph_disp")
epsilon_estimates = epsilon_estimator(
    variances["final"].isel(ph_disp=get_centroid_index(variances["final"]))
)
Calculating and plotting precision
def plot(estimates):
    estimator_variances = estimates.var(
        dim="sample" if "sample" in estimates.dims else "sample_time"
    )
    precision = (
        1.0
        / estimator_variances.snr
        / variances.attrs["SAMPLE_SIZE"]
        / estimator_variances
    )
    precision = precision.where(xr.apply_ufunc(np.isfinite, precision), other=0)
    mean_precision = precision.mean(dim="supersample")
    mean_precision = mean_precision.where(np.isfinite(mean_precision), 0)
    precision_error = 2 * precision.std(dim="supersample").fillna(0)
    g = mean_precision.plot.scatter(
        x="eps",
        col="snr",
        col_wrap=2,
        sharex=True,
        sharey=True,
    )
    for ax, snr in zip(g.axs.flat, mean_precision.snr.values):
        ax.errorbar(
            precision.eps.values,
            mean_precision.sel(snr=snr),
            yerr=precision_error.sel(snr=snr),
            fmt="o",
        )
plot(time_centroid_estimates)
plot(phase_centroid_estimates)
plot(epsilon_estimates)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The model built using these datasets can be found at https://github.com/AstexUK/ESP_DNN/tree/master/esp_dnn. The datasets themselves can be opened using the xarray Python library (http://xarray.pydata.org/en/stable/#).
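A minimal sketch of opening one of the files, assuming they are netCDF (the file name below is hypothetical):
import xarray as xr
ds = xr.open_dataset("esp_dnn_dataset.nc")  # hypothetical file name
print(ds.data_vars)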
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present a globally consistent, satellite-derived dataset of CO₂ enhancement (ΔXCO₂), quantifying the spatially resolved excess in atmospheric CO₂ concentrations as a collective consequence of anthropogenic emissions and terrestrial carbon uptake. This dataset is generated from the deviations of NASA's OCO-3 satellite retrievals, comprising 54 million observations across more than 200 countries from 2019 to 2023.
Dear reviewers, please download the datasets here and access using the password enclosed in the review documents. Many thanks!
Data Descriptions
# install prerequisites
! pip install netcdf4
! pip install h5netcdf
# read co2 enhancement data
import xarray as xr
fn = './CO2_Enhancements_Global.nc'
data = xr.open_dataset(fn)
type(data)
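To see which coordinates and variables the file provides without assuming their names:
# list dimensions, coordinates and data variables
print(data)
print(list(data.data_vars))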
Please cite at least one of the following for any use of the CO2E dataset.
Zhou, Y.*, Fan, P., Liu, J., Xu, Y., Huang, B., Webster, C. (2025). GloCE v1.0: Global CO2 Enhancement Dataset 2019-2023 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15209825
Fan, P., Liu, J., Xu, Y., Huang, B., Webster, C., & Zhou, Y*. (Under Review) A global dataset of CO2 enhancements during 2019-2023.
For any data inquiries, please email Yulun Zhou at yulunzhou@hku.hk.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Simulation Data
The waveplate.hdf5 file stores the results of the FDTD simulation that are visualized in Fig. 3 b)-d). The simulation was performed using the Tidy 3D Python library and also utilizes its methods for data visualization. The following snippet can be used to visualize the data:
import tidy3d as td
import matplotlib.pyplot as plt
sim_data: td.SimulationData = td.SimulationData.from_file("waveplate.hdf5")
fig, axs = plt.subplots(1, 2, tight_layout=True, figsize=(12, 5))
for fn, ax in zip(("Ex", "Ey"), axs):
    sim_data.plot_field("field_xz", field_name=fn, val="abs^2", ax=ax).set_aspect(1 / 10)
    ax.set_xlabel(r"x [$\mu$m]")
    ax.set_ylabel(r"z [$\mu$m]")
fig.show()
Measurement Data
Signal data used for plotting Fig. 4-6. The data are stored in NetCDF, a self-describing format that is easy to manipulate using the Xarray Python library, specifically by calling xarray.open_dataset(). Three datasets are provided and structured as follows:
The electric_fields.nc dataset contains data displayed in Fig. 4. It has 3 data variables, corresponding to the signals themselves as well as estimated Rabi frequencies and electric fields. The freq dimension is the x-axis and contains coordinates for the Probe field detuning in MHz. The n dimension labels different configurations of the applied electric field, with the 0th one having no EHF field.
The detune.nc dataset contains data displayed in Fig. 6. It has 2 data variables, corresponding to the signals themselves as well as estimated peak separations, multiplied by the coupling factor. The freq dimension is the same, while the detune dimension labels different EHF field detunings, from -100 to 100 MHz with a step of 10.
The waveplates.nc dataset contains data displayed in Fig. 5. It contains estimated Rabi frequencies calculated for different waveplate positions. The angles are stored in radians. There is the quarter- and half-waveplate to choose from.
Usage examples
Opening the datasets
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
electric_fields_ds = xr.open_dataset("data/electric_fields.nc")
detuned_ds = xr.open_dataset("data/detune.nc")
waveplates_ds = xr.open_dataset("data/waveplates.nc")
sigmas_da = xr.open_dataarray("data/sigmas.nc")
peak_heights_da = xr.open_dataarray("data/peak_heights.nc")
Plotting the Fig. 4 signals and printing parameters
fig, ax = plt.subplots()
electric_fields_ds["signals"].plot.line(x="freq", hue="n", ax=ax)
print(f"Rabi frequencies [Hz]: {electric_fields_ds['rabi_freqs'].values}")
print(f"Electric fields [V/m]: {electric_fields_ds['electric_fields'].values}")
fig.show()
Plotting the Fig. 5 data
(waveplates_ds["rabi_freqs"] ** 2).plot.scatter(x="angle", col="waveplate")
Plotting the Fig. 6 signals for chosen detunings
fig, ax = plt.subplots()
detuned_ds["signals"].sel(detune=[-100, -70, -40, 40, 70, 100]).plot.line(x="freq", hue="detune", ax=ax)
fig.show()
Plotting the Fig. 6 inset plot
fig, ax = plt.subplots()
detuned_ds["separations"].plot.scatter(x="detune", ax=ax)
ax.plot(
    detuned_ds.detune,
    np.sqrt(detuned_ds.detune**2 + detuned_ds["separations"].sel(detune=0) ** 2),
)
fig.show()
Plotting the Fig. 7 calculated peak widths
sigmas_da.plot.scatter()
Plotting the Fig. 8 calculated detuned smaller peak heights
peak_heights_da.plot.scatter()
Data and code for "Spin-filtered measurements of Andreev Bound States"
van Driel, David; Wang, Guanzhong; Dvir, Tom
This folder contains the raw data and code used to generate the plots for the paper Spin-filtered measurements of Andreev Bound States (arXiv: ??). To run the Jupyter notebook, install Anaconda and execute:
conda env create -f environment.yml
followed by:
conda activate spinABS
Finally, run jupyter notebook to launch the notebook called 'zenodo_notebook.ipynb'. Raw data are stored in netCDF (.nc) format. The files are exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.
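A minimal sketch of loading one of these files with xarray (the file name below is hypothetical):
import xarray as xr
ds = xr.open_dataset("raw_data/device_measurement.nc")  # hypothetical file name
print(ds)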
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset provides simulated data on plastic and substance flows and stocks in buildings and infrastructure as described in the data article "Plastics in the German Building and Infrastructure Sector: A High-Resolution Dataset on Historical Flows, Stocks, and Legacy Substance Contamination". Besides simulated data, the repository contains input data and model files used to produce the simulated data.
Data & Data Visualization: The dataset contains input data and simulated data for the six main plastic applications in buildings and infrastructure in Germany in the period from 1950 to 2023, which are profiles, flooring, pipes, insulation material, cable insulations, and films. For each application the data are provided in a sub-directory (1_ ... 6_) following the structure described below.
Input Data:
The input data are stored in an xlsx-file with three sheets: flows, parameters, and data quality assessment. The data sources for all input data are detailed in the Supplementary Material of the linked Data in Brief article.
Simulated Data:
Simulated data are stored in a sub-folder, which contains:
Note: All files in the [product]/simulated_data folder are automatically replaced with updated model results upon execution of immec_dmfa_calculate_submodels.py.
To reduce storage requirements, data are stored in gzipped pickle files (.pkl.gz), while smaller files are provided as pickle files (.pkl). To open the files, users can use Python with the following code snippet:
import gzip
import pickle
# Load a gzipped pickle file
with gzip.open("filename.pkl.gz", "rb") as f:
data = pickle.load(f)
# Load a regular pickle file
with open("filename.pkl", "rb") as f:
data = pickle.load(f)
Please note that opening pickle files requires compatible versions of numpy and pandas, as the files may have been created using version-specific data structures. If you encounter errors, ensure your package versions match those used during file creation (pandas: 2.2.3, numpy: 2.2.4).
Simulated data are provided as Xarray datasets, a data structure designed for efficient handling, analysis, and visualization of multi-dimensional labeled data. For more details on using Xarray, please refer to the official documentation: https://docs.xarray.dev/en/stable/
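A minimal sketch of inspecting one of the loaded objects, assuming data from the snippet above holds an Xarray dataset:
# show dimensions, coordinates and data variables
print(data)
# convert to a pandas DataFrame for further analysis or export
df = data.to_dataframe()
print(df.head())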
Core Model Files:
Computational Considerations:
During model execution, large arrays are generated, requiring significant memory. To enable computation on standard computers, Monte Carlo simulations are split into multiple chunks:
Dependencies
The model relies on the ODYM framework. To run the model, ODYM must be downloaded from https://github.com/IndEcol/ODYM (S. Pauliuk, N. Heeren, ODYM — An open software framework for studying dynamic material systems: Principles, implementation, and data structures, Journal of Industrial Ecology 24 (2020) 446–458. https://doi.org/10.1111/jiec.12952.)
7_Model_Structure:
8_Additional_Data: This folder contains supplementary data used in the model, including substance concentrations, data quality assessment scores, open-loop recycling distributions, and lifetime distributions.
The dataset was generated using a dynamic material flow analysis (dMFA) model. For a complete methodology description, refer to the Data in Brief article (add DOI).
If you use this dataset, please cite: Schmidt, S., Verni, X.-F., Gibon, T., Laner, D. (2025). Dataset for: Plastics in the German Building and Infrastructure Sector: A High-Resolution Dataset on Historical Flows, Stocks, and Legacy Substance Contamination, Zenodo. DOI: 10.5281/zenodo.15049210
This dataset is licensed under CC BY-NC 4.0, permitting use, modification, and distribution for non-commercial purposes, provided that proper attribution is given.
For questions or further details, please contact:
Sarah Schmidt
Center for Resource Management and Solid Waste Engineering
University of Kassel
Email: sarah.schmidt@uni-kassel.de
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.
A description of this dataset, including the methodology and validation results, is available at:
Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.
ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement for a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such a gap-filling method is that it relies only on the original observational record, without the need for ancillary variables or model-based information. Owing to this intrinsic challenge, no global, long-term, univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments to fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.
You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete dataset to the local directory ~/Downloads on Linux or macOS systems.
#!/bin/bash
# Set download directory
DOWNLOAD_DIR=~/Downloads
base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"
# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
echo "Downloading $year.zip..."
wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
rm "$DOWNLOAD_DIR/$year.zip"
done
The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM) on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:
ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc
Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:
Additional information for each variable is given in the netCDF attributes.
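A minimal example of reading one daily file with xarray and inspecting its variables and attributes (the date in the path below is illustrative; any daily file follows the same pattern):
import xarray as xr
fn = "1991/ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-19911105000000-fv09.1r1.nc"
ds = xr.open_dataset(fn)
print(ds.data_vars)   # data variables contained in the file
print(ds.coords)      # longitude, latitude and time coordinates
print(ds.attrs)       # global metadata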
Changes in v9.1r1 (previous version was v09.1):
These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as:
The following records are all part of the Soil Moisture Climate Data Records from satellites community
ESA CCI SM MODELFREE Surface Soil Moisture Record: https://doi.org/10.48436/svr1r-27j77
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for DWD Observations
This dataset is a collection of historical German Weather Service (DWD) weather station observations at 10-minute and hourly resolutions for various parameters. The data have been converted to Zarr for use with Xarray. The data were gathered using the wonderful wetterdienst package.
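A minimal sketch, assuming one of the Zarr stores has been downloaded locally (the store name below is hypothetical):
import xarray as xr
ds = xr.open_zarr("dwd_hourly_observations.zarr")  # hypothetical local copy of one store
print(ds)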
Dataset Details
Dataset Description
Curated by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: … See the full description on the dataset page: https://huggingface.co/datasets/jacobbieker/dwd.