We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed with ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale large extent spatial (LES) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert the GeoTIFFs to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is its rasterio-based extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and the addition of metadata to the NetCDF files, while rioxarray was used to save the GeoTIFFs as NetCDF. These procedures resulted in three HydroShare resources (HS 3, HS 4, and HS 5) for sharing the state-scale LES datasets. Notably, because ArcGIS Pro is commercial GIS software with licensing constraints, the Jupyter notebooks were developed on Windows.
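A minimal sketch of the GeoTIFF-to-NetCDF step described above (the file names, variable name, and metadata values are illustrative assumptions, not the exact ones used in the workflow):

```python
import rioxarray  # rasterio-based xarray extension; registers the .rio accessor

# Read a state-scale GeoTIFF as an xarray DataArray
da = rioxarray.open_rasterio("state_les_dataset.tif")

# Use xarray to rename the variable and attach metadata, then write NetCDF
da = da.rename("les_value")
da.attrs["source"] = "state-scale LES GeoTIFF"
da.to_netcdf("state_les_dataset.nc")
```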
This dataset was created by TAG.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
QLKNN11D training set
This dataset contains a large-scale run of ~1 billion flux calculations of the quasilinear gyrokinetic transport model QuaLiKiz. QuaLiKiz is applied in numerous tokamak integrated modelling suites, and is openly available at https://gitlab.com/qualikiz-group/QuaLiKiz/. This dataset was generated with the 'QLKNN11D-hyper' tag of QuaLiKiz, equivalent to 2.8.1 apart from the negative magnetic shear filter being disabled. See https://gitlab.com/qualikiz-group/QuaLiKiz/-/tags/QLKNN11D-hyper for the in-repository tag.
The dataset is appropriate for the training of learned surrogates of QuaLiKiz, e.g. with neural networks. See https://doi.org/10.1063/1.5134126 for a Physics of Plasmas publication illustrating the development of a learned surrogate (QLKNN10D-hyper) of an older version of QuaLiKiz (2.4.0) with a 300-million-point 10D dataset. The paper is also available on arXiv (https://arxiv.org/abs/1911.05617) and the older dataset on Zenodo (https://doi.org/10.5281/zenodo.3497066). For an application example, see Van Mulders et al. 2021 (https://doi.org/10.1088/1741-4326/ac0d12), where QLKNN10D-hyper was applied for ITER hybrid scenario optimization. For any learned surrogates developed from this QLKNN11D dataset, we recommend effectively adding the alpha_MHD input dimension by rescaling the input magnetic shear s as s = s - alpha_MHD/2, as carried out in Van Mulders et al.
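A minimal sketch of that rescaling, assuming the inputs are held in a table with columns named smag and alphaMHD (the column names are illustrative; see the naming-scheme link below for the actual variable names):

```python
import pandas as pd

# Illustrative input slice; smag is the magnetic shear, alphaMHD the MHD alpha parameter
inputs = pd.DataFrame({"smag": [1.2, 0.8, 0.3], "alphaMHD": [0.4, 0.1, 0.0]})

# Effective addition of the alpha_MHD dimension via rescaled shear,
# as carried out in Van Mulders et al. (2021)
inputs["smag_rescaled"] = inputs["smag"] - inputs["alphaMHD"] / 2
```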
Related repositories:
General QuaLiKiz documentation https://qualikiz.com
QuaLiKiz/QLKNN input/output variables naming scheme https://qualikiz.com/QuaLiKiz/Input-and-output-variables
Training, plotting, filtering, and auxiliary tools https://gitlab.com/Karel-van-de-Plassche/QLKNN-develop
QuaLiKiz related tools https://gitlab.com/qualikiz-group/QuaLiKiz-pythontools
FORTRAN QLKNN implementation with wrapper for Python and MATLAB https://gitlab.com/qualikiz-group/QLKNN-fortran
Weights and biases of 'hyperrectangle style' QLKNN https://gitlab.com/qualikiz-group/qlknn-hype
Data exploration
The data is provided in 43 netCDF files. We advise opening single datasets with xarray, or multiple datasets out-of-core with dask. For reference, the table below gives the load times and in-RAM sizes of a single variable that depends only on the scan-size dimension dimx; this was tested single-core on an Intel Xeon 8160 CPU at 2.1 GHz with 192 GB of DDR4 RAM. Note that more memory is needed during loading than the final in-RAM figure. A short loading sketch follows the table.
Timing of dataset loading:

Amount of datasets | Final in-RAM memory (GiB) | Loading time, single variable (M:SS)
1                  | 10.3                      | 0:09
5                  | 43.9                      | 1:00
10                 | 63.2                      | 2:01
16                 | 98.0                      | 3:25
17                 | out of memory             | x:xx
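A minimal loading sketch for the advice above, with xarray for a single file and dask for out-of-core access to several files (the file names and chunk size are assumptions, not the dataset's actual naming):

```python
import xarray as xr

# Open one netCDF file eagerly
ds_single = xr.open_dataset("qualikiz_part_01.nc")

# Open several files lazily with dask; data are only read when .compute()/.load() is called
ds_multi = xr.open_mfdataset(
    "qualikiz_part_*.nc",
    combine="nested",
    concat_dim="dimx",           # the scan-size dimension mentioned above
    chunks={"dimx": 1_000_000},  # chunk size is an illustrative choice
)
```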
Full dataset
The full dataset of QuaLiKiz input and output data is available on request. Note that this is 2.2 TiB of netCDF files!
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains the model output data used to create the figures of the study "The effects of a storm surge event on salt intrusion: Insights from the Rhine-Meuse Delta". The dataset includes:
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Data and code for "Spin-filtered measurements of Andreev Bound States"
van Driel, David; Wang, Guanzhong; Dvir, Tom
This folder contains the raw data and code used to generate the plots for the paper Spin-filtered measurements of Andreev Bound States (arXiv: ??).
To run the Jupyter notebook, install Anaconda and execute:
conda env create -f environment.yml
followed by:
conda activate spinABS
Finally,
jupyter notebook
to launch the notebook called 'zenodo_notebook.ipynb'.
Raw data are stored in netCDF (.nc) format. The files are exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.
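For example, a QCoDeS-exported file can be opened directly with xarray (the file name is an assumption):

```python
import xarray as xr

# Any of the .nc files in the data folder can be read the same way
ds = xr.open_dataset("raw_measurement.nc")
print(ds)  # inspect dimensions, coordinates, and measured data variables
```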
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
SHNITSEL
The Surface Hopping Nested Instances Training Set for Excited-State Learning (SHNITSEL) is a comprehensive data repository designed to support the development and benchmarking of excited-state dynamics methods.
Configuration Space
SHNITSEL contains datasets for nine organic molecules that represent a diverse range of photochemical behaviors. The following molecules are included in the dataset:
Alkenes: ethene (A01), propene (A02), 2-butene (A03)
Ring structures: fulvene (R01), 1,3-cyclohexadiene (R02), tyrosine (R03)
Other molecules: methylenimmonium cation (I01), methanethione (T01), diiodomethane (H01)
Property Space
These datasets provide key electronic properties for singlet and triplet states, including energies, forces, dipole moments, transition dipole moments, nonadiabatic couplings, and spin-orbit couplings, computed at the multi-reference ab initio level. The data is categorized into static and dynamic data, based on its origin and purpose.
Static data (147,169 data points in total) consists of sampled molecular structures without time-dependent information, covering the relevant vibrational and conformational spaces. These datasets are provided for eight molecules: A01, A02, A03, R01, R03, I01, T01, and H01.
Dynamic data (444,581 data points in total) originates from surface hopping simulations and captures the evolution of molecular structures and properties over time as they propagate on potential energy surfaces according to Newton's equations of motion. These datasets are provided for five molecules: A01, A02, A03, R02, and I01.
Data Structure and Workflow
The data is stored as xarray.Dataset objects for efficient handling of multidimensional data. Key dimensions include electronic states, couplings, atoms, and, for dynamic data, time frames. The datasets are stored in the HDF5-based NetCDF4 format, which scales well to large data volumes. Tools for data processing, visualization, and integration into machine-learning workflows are provided by the shnitsel Python package, published on GitHub as shnitsel-tools (https://github.com/SHNITSEL/shnitsel-tools).
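A minimal sketch of inspecting one SHNITSEL file with plain xarray (the file, variable, and dimension names below are illustrative assumptions; the shnitsel-tools package provides the dedicated loaders):

```python
import xarray as xr

ds = xr.open_dataset("A01_dynamic.nc")   # hypothetical file name
print(ds.dims)                           # e.g. states, atoms, time frames

energy = ds["energy"]                    # hypothetical variable name
print(energy.isel(frame=0))              # first time frame, if 'frame' is the time dimension
```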
An overview of the molecular structures and visualizations of key properties (from trajectory data) are compiled on the SHNITSEL webpage (https://shnitsel.github.io/).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This folder contains the raw data and code used to generate the plots for the paper Singlet and triplet Cooper pair splitting in hybrid superconducting nanowires (arXiv: 2205.03458).
To run the Jupyter notebooks, install Anaconda and execute:
conda env create -f cps-exp.yml
followed by:
conda activate cps-exp
for the experiment data, or
conda env create -f cps-theory.yml
and similarly
conda activate cps-theory
for the theory plots. Finally,
jupyter notebook
to launch the corresponding notebook.
Raw data are stored in netCDF (.nc) format. The files are directly exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
# ERA-NUTS (1980-2018)
This dataset contains a set of time-series of meteorological variables based on Copernicus Climate Change Service (C3S) ERA5 reanalysis. The data files can be downloaded from here while notebooks and other files can be found on the associated Github repository.
This data has been generated with the aim of providing hourly time-series of the meteorological variables commonly used for power system modelling and, more generally, for studies of energy systems.
An example of the analysis that can be performed with ERA-NUTS is shown in this video.
Important: this dataset is still a work in progress; we will add more analyses and variables in the near future. If you spot an error or something strange in the data, please let us know by sending an email or opening an Issue in the associated Github repository.
## Data
The time-series have hourly/daily/monthly frequency and are aggregated following the NUTS 2016 classification. NUTS (Nomenclature of Territorial Units for Statistics) is a European Union standard for referencing the subdivisions of countries (member states, candidate countries and EFTA countries).
This dataset contains NUTS0/1/2 time-series for the following variables obtained from the ERA5 reanalysis data (in brackets the name of the variable on the Copernicus Data Store and its unit measure):
- t2m: 2-meter temperature (`2m_temperature`, Celsius degrees)
- ssrd: Surface solar radiation (`surface_solar_radiation_downwards`, Watt per square meter)
- ssrdc: Surface solar radiation clear-sky (`surface_solar_radiation_downward_clear_sky`, Watt per square meter)
- ro: Runoff (`runoff`, millimeters)
There is also a set of derived variables (a sketch of the wind-speed derivation follows this list):
- ws10: Wind speed at 10 meters (derived from `10m_u_component_of_wind` and `10m_v_component_of_wind`, meters per second)
- ws100: Wind speed at 100 meters (derived from `100m_u_component_of_wind` and `100m_v_component_of_wind`, meters per second)
- CS: Clear-sky index (the ratio between the surface solar radiation and the clear-sky surface solar radiation)
- HDD/CDD: Heating/Cooling Degree Days (derived from the 2-meter temperature following the EUROSTAT definition)
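A sketch of how the derived wind speeds follow from the ERA5 wind components (the netCDF short names u10/v10/u100/v100 and the file name are assumptions):

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset("era5_wind_components.nc")     # hypothetical file name
ws10 = np.sqrt(ds["u10"] ** 2 + ds["v10"] ** 2)     # wind speed at 10 m
ws100 = np.sqrt(ds["u100"] ** 2 + ds["v100"] ** 2)  # wind speed at 100 m
```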
For each variable we have 350 599 hourly samples (from 01-01-1980 00:00:00 to 31-12-2019 23:00:00) for 34/115/309 regions (NUTS 0/1/2).
The data is provided in two formats:
- NetCDF version 4 (all the variables hourly, and CDD/HDD daily). NOTE: the variables are stored as `int16` using a `scale_factor` of 0.01 to minimise file size (see the decoding sketch after this list).
- Comma Separated Value ("single index" format for all the variables and the time frequencies and "stacked" only for daily and monthly)
All the CSV files are stored in a zipped file for each variable.
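By default xarray applies the `scale_factor` while decoding, so the `int16` packing is transparent to the user; a minimal sketch (the file name is an assumption):

```python
import xarray as xr

# Decoded view: values come back as floats with scale_factor 0.01 already applied
ds = xr.open_dataset("t2m_hourly_NUTS2.nc")

# Raw view: disable CF decoding to see the packed int16 values
raw = xr.open_dataset("t2m_hourly_NUTS2.nc", mask_and_scale=False)
```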
## Methodology
The time-series have been generated using the following workflow:
1. The NetCDF files are downloaded from the Copernicus Data Store ("ERA5 hourly data on single levels from 1979 to present" dataset)
2. The data is read in R with the climate4r packages and aggregated using the function `get_ts_from_shp` from panas. All the variables are aggregated at the NUTS boundaries using the average, except for the runoff, which is the sum of all the grid points within the regional/national borders.
3. The derived variables (wind speed, CDD/HDD, clear-sky) are computed and all the CSV files are generated using R
4. The NetCDF are created using `xarray` in Python 3.7.
NOTE: air temperature, solar radiation, runoff and wind speed hourly data have been rounded with two decimal digits.
## Example notebooks
In the folder `notebooks` on the associated Github repository there are two Jupyter notebooks which show how to deal effectively with the NetCDF data in `xarray` and how to visualise it in several ways using matplotlib or the enlopy package.
There are currently two notebooks:
- exploring-ERA-NUTS: shows how to open the NetCDF files (with Dask) and how to manipulate and visualise them.
- ERA-NUTS-explore-with-widget: explores the datasets interactively with Jupyter and ipywidgets.
The notebook `exploring-ERA-NUTS` is also available rendered as HTML.
## Additional files
In the folder `additional files` on the associated Github repository there is a map showing the spatial resolution of the ERA5 reanalysis and a CSV file specifying the number of grid points within each NUTS0/1/2 region.
## License
This dataset is released under CC-BY-4.0 license.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This HydroShare resource was created to share large extent spatial (LES) datasets in North Carolina on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).
Users can access the uploaded LES datasets on HydroShare-GeoServer and THREDDS using this HS resource id. This resource was created using HS 2.
Then, through the RHESSys workflows, users can subset LES datasets using OWSLib and xarray.
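A minimal sketch of that subsetting step under assumed endpoints, layer identifiers, and coordinate names (the actual ones are listed in the HydroShare resource):

```python
from owslib.wcs import WebCoverageService
import xarray as xr

# List the coverages published on the HydroShare GeoServer (endpoint pattern assumed)
wcs = WebCoverageService("https://geoserver.hydroshare.org/geoserver/wcs", version="1.0.0")
print(list(wcs.contents))

# Subset a state-scale NetCDF served by THREDDS via OPeNDAP (URL is illustrative)
url = ("https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/"
       "<resource_id>/data/contents/nc_les.nc")
ds = xr.open_dataset(url)
subset = ds.sel(x=slice(-80.0, -79.0), y=slice(36.5, 35.5))  # example bounding box
```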
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
IAGOS-CARIBIC WSM files collection (v2024.07.17)
Content
IAGOS-CARIBIC_WSM_files_collection_20240717.zip contains merged IAGOS-CARIBIC whole air sampler data (CARIBIC-1 and CARIBIC-2). There is one netCDF file per IAGOS-CARIBIC flight. Files were generated from NASA Ames 1001 source files. For detailed content information, see the global and variable attributes. The global attribute na_file_header_[x] contains the original NASA Ames file header as an array of strings, with [x] being one of the source files.
Data Coverage
The data set covers 22 years of CARIBIC data from 1997 to 2020, flight numbers 8 to 591. There is no data available after 2020. Also note that data is not available for every flight number within that range.
Special note on CARIBIC-1 data
CARIBIC-1 data only contains a subset of the variables found in CARIBIC-2 data files. To distinguish those two campaigns, use the global attribute 'mission'.
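For example, with xarray (the file name is an assumption):

```python
import xarray as xr

ds = xr.open_dataset("IAGOS-CARIBIC_WSM_flight_100.nc")  # hypothetical file name
print(ds.attrs["mission"])  # identifies whether the file belongs to CARIBIC-1 or CARIBIC-2
```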
File format
netCDF v4, created with xarray. Default variable encoding was used (no compression, etc.).
Data availability
This dataset is also available via our THREDDS server at KIT.
Contact
Tanja Schuck (whole air sampling system PI), Andreas Zahn (IAGOS-CARIBIC Coordinator), Florian Obersteiner (IAGOS-CARIBIC data management).
Changelog
2024.07.17: revise ozone data for flights 294 to 591
2024.01.22: editorial changes, add Schuck et al. publications, data unchanged
2024.01.12: initial upload