We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed with ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale large extent spatial (LES) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert the GeoTIFFs to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is its rasterio-based extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and the addition of metadata to the NetCDF files, while rioxarray was used to save the GeoTIFFs as NetCDF. These procedures resulted in three HydroShare resources (HS 3, HS 4, and HS 5) for sharing the state-scale LES datasets. Notably, because ArcGIS Pro is commercial GIS software with licensing constraints, the Jupyter notebooks were developed on Windows.
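A minimal sketch of the GeoTIFF-to-NetCDF step described above (the file names, variable name, and metadata values are illustrative assumptions, not the exact ones used in the workflow):

```python
import rioxarray  # rasterio-based xarray extension; registers the .rio accessor

# Read a state-scale GeoTIFF as an xarray DataArray
da = rioxarray.open_rasterio("state_les_dataset.tif")

# Use xarray to rename the variable and attach metadata, then write NetCDF
da = da.rename("les_value")
da.attrs["source"] = "state-scale LES GeoTIFF"
da.to_netcdf("state_les_dataset.nc")
```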
This dataset was created by TAG.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
QLKNN11D training set
This dataset contains a large-scale run of ~1 billion flux calculations of the quasilinear gyrokinetic transport model QuaLiKiz. QuaLiKiz is applied in numerous tokamak integrated modelling suites, and is openly available at https://gitlab.com/qualikiz-group/QuaLiKiz/. This dataset was generated with the 'QLKNN11D-hyper' tag of QuaLiKiz, equivalent to 2.8.1 apart from the negative magnetic shear filter being disabled. See https://gitlab.com/qualikiz-group/QuaLiKiz/-/tags/QLKNN11D-hyper for the in-repository tag.
The dataset is appropriate for the training of learned surrogates of QuaLiKiz, e.g. with neural networks. See https://doi.org/10.1063/1.5134126 for a Physics of Plasmas publication illustrating the development of a learned surrogate (QLKNN10D-hyper) of an older version of QuaLiKiz (2.4.0) with a 300-million-point 10D dataset. The paper is also available on arXiv (https://arxiv.org/abs/1911.05617) and the older dataset on Zenodo (https://doi.org/10.5281/zenodo.3497066). For an application example, see Van Mulders et al. 2021 (https://doi.org/10.1088/1741-4326/ac0d12), where QLKNN10D-hyper was applied for ITER hybrid scenario optimization. For any learned surrogates developed from this QLKNN11D dataset, we recommend effectively adding the alpha_MHD input dimension by rescaling the input magnetic shear s as s = s - alpha_MHD/2, as carried out in Van Mulders et al.
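A minimal sketch of that rescaling, assuming the inputs are held in a table with columns named smag and alphaMHD (the column names are illustrative; see the naming-scheme link below for the actual variable names):

```python
import pandas as pd

# Illustrative input slice; smag is the magnetic shear, alphaMHD the MHD alpha parameter
inputs = pd.DataFrame({"smag": [1.2, 0.8, 0.3], "alphaMHD": [0.4, 0.1, 0.0]})

# Effective addition of the alpha_MHD dimension via rescaled shear,
# as carried out in Van Mulders et al. (2021)
inputs["smag_rescaled"] = inputs["smag"] - inputs["alphaMHD"] / 2
```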
Related repositories:
General QuaLiKiz documentation https://qualikiz.com
QuaLiKiz/QLKNN input/output variables naming scheme https://qualikiz.com/QuaLiKiz/Input-and-output-variables
Training, plotting, filtering, and auxiliary tools https://gitlab.com/Karel-van-de-Plassche/QLKNN-develop
QuaLiKiz related tools https://gitlab.com/qualikiz-group/QuaLiKiz-pythontools
FORTRAN QLKNN implementation with wrapper for Python and MATLAB https://gitlab.com/qualikiz-group/QLKNN-fortran
Weights and biases of 'hyperrectangle style' QLKNN https://gitlab.com/qualikiz-group/qlknn-hype
Data exploration
The data is provided in 43 netCDF files. We advise opening single datasets with xarray, or multiple datasets out-of-core with dask. For reference, the table below gives the load times and in-RAM sizes of a single variable that depends only on the scan-size dimension dimx; this was tested single-core on an Intel Xeon 8160 CPU at 2.1 GHz with 192 GB of DDR4 RAM. Note that more memory is needed during loading than the final in-RAM figure. A short loading sketch follows the table.
Timing of dataset loading:

Amount of datasets | Final in-RAM memory (GiB) | Loading time, single variable (M:SS)
1                  | 10.3                      | 0:09
5                  | 43.9                      | 1:00
10                 | 63.2                      | 2:01
16                 | 98.0                      | 3:25
17                 | out of memory             | x:xx
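A minimal loading sketch for the advice above, with xarray for a single file and dask for out-of-core access to several files (the file names and chunk size are assumptions, not the dataset's actual naming):

```python
import xarray as xr

# Open one netCDF file eagerly
ds_single = xr.open_dataset("qualikiz_part_01.nc")

# Open several files lazily with dask; data are only read when .compute()/.load() is called
ds_multi = xr.open_mfdataset(
    "qualikiz_part_*.nc",
    combine="nested",
    concat_dim="dimx",           # the scan-size dimension mentioned above
    chunks={"dimx": 1_000_000},  # chunk size is an illustrative choice
)
```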
Full dataset
The full dataset of QuaLiKiz input and output data is available on request. Note that this is 2.2 TiB of netCDF files!
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset contains the model output data used to create the figures of the study "The effects of a storm surge event on salt intrusion: Insights from the Rhine-Meuse Delta". The dataset includes:
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Data and code for "Spin-filtered measurements of Andreev Bound States"
van Driel, David; Wang, Guanzhong; Dvir, Tom
This folder contains the raw data and code used to generate the plots for the paper Spin-filtered measurements of Andreev Bound States (arXiv: ??).
To run the Jupyter notebook, install Anaconda and execute:
conda env create -f environment.yml
followed by:
conda activate spinABS
Finally,
jupyter notebook
to launch the notebook called 'zenodo_notebook.ipynb'.
Raw data are stored in netCDF (.nc) format. The files are exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.
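For example, a QCoDeS-exported file can be opened directly with xarray (the file name is an assumption):

```python
import xarray as xr

# Any of the .nc files in the data folder can be read the same way
ds = xr.open_dataset("raw_measurement.nc")
print(ds)  # inspect dimensions, coordinates, and measured data variables
```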
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
SHNITSEL
The Surface Hopping Nested Instances Training Set for Excited-State Learning (SHNITSEL) is a comprehensive data repository designed to support the development and benchmarking of excited-state dynamics methods.
Configuration Space
SHNITSEL contains datasets for nine organic molecules that represent a diverse range of photochemical behaviors. The following molecules are included in the dataset:
Alkenes: ethene (A01), propene (A02), 2-butene (A03)
Ring structures: fulvene (R01), 1,3-cyclohexadiene (R02), tyrosine (R03)
Other molecules: methylenimmonium cation (I01), methanethione (T01), diiodomethane (H01)
Property Space
These datasets provide key electronic properties for singlet and triplet states, including energies, forces, dipole moments, transition dipole moments, nonadiabatic couplings, and spin-orbit couplings, computed at the multi-reference ab initio level. The data is categorized into static and dynamic data, based on its origin and purpose.
Static data (147,169 data points in total) consists of sampled molecular structures without time-dependent information, covering the relevant vibrational and conformational spaces. These datasets are provided for eight molecules: A01, A02, A03, R01, R03, I01, T01, and H01.
Dynamic data (444,581 data points in total) originates from surface hopping simulations and captures the evolution of molecular structures and properties over time as they propagate on potential energy surfaces according to Newton's equations of motion. These datasets are provided for five molecules: A01, A02, A03, R02, and I01.
Data Structure and Workflow
The data is stored as xarray.Dataset objects for efficient handling of multidimensional data. Key dimensions include electronic states, couplings, atoms, and, for dynamic data, time frames. The datasets are stored in the HDF5-based NetCDF4 format, which scales well to large data volumes. Tools for data processing, visualization, and integration into machine-learning workflows are provided by the shnitsel Python package, published on GitHub as shnitsel-tools (https://github.com/SHNITSEL/shnitsel-tools).
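A minimal sketch of inspecting one SHNITSEL file with plain xarray (the file, variable, and dimension names below are illustrative assumptions; the shnitsel-tools package provides the dedicated loaders):

```python
import xarray as xr

ds = xr.open_dataset("A01_dynamic.nc")   # hypothetical file name
print(ds.dims)                           # e.g. states, atoms, time frames

energy = ds["energy"]                    # hypothetical variable name
print(energy.isel(frame=0))              # first time frame, if 'frame' is the time dimension
```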
An overview of the molecular structures and visualizations of key properties (from trajectory data) are compiled on the SHNITSEL webpage (https://shnitsel.github.io/).
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This folder contains the raw data and code used to generate the plots for the paper Singlet and triplet Cooper pair splitting in hybrid superconducting nanowires (arXiv: 2205.03458).
To run the Jupyter notebooks, install Anaconda and execute:
conda env create -f cps-exp.yml
followed by:
conda activate cps-exp
for the experiment data, or
conda env create -f cps-theory.yml
and similarly
conda activate cps-theory
for the theory plots. Finally,
jupyter notebook
to launch the corresponding notebook.
Raw data are stored in netCDF (.nc) format. The files are directly exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
# ERA-NUTS (1980-2018)
This dataset contains a set of time-series of meteorological variables based on Copernicus Climate Change Service (C3S) ERA5 reanalysis. The data files can be downloaded from here while notebooks and other files can be found on the associated Github repository.
This data has been generated with the aim of providing hourly time-series of the meteorological variables commonly used for power system modelling and, more generally, for studies of energy systems.
An example of the analysis that can be performed with ERA-NUTS is shown in this video.
Important: this dataset is still a work in progress; we will add more analyses and variables in the near future. If you spot an error or something strange in the data, please let us know by sending an email or opening an Issue in the associated Github repository.
## Data
The time-series have hourly/daily/monthly frequency and are aggregated following the NUTS 2016 classification. NUTS (Nomenclature of Territorial Units for Statistics) is a European Union standard for referencing the subdivisions of countries (member states, candidate countries and EFTA countries).
This dataset contains NUTS0/1/2 time-series for the following variables obtained from the ERA5 reanalysis data (in brackets the name of the variable on the Copernicus Data Store and its unit measure):
- t2m: 2-meter temperature (`2m_temperature`, Celsius degrees)
- ssrd: Surface solar radiation (`surface_solar_radiation_downwards`, Watt per square meter)
- ssrdc: Surface solar radiation clear-sky (`surface_solar_radiation_downward_clear_sky`, Watt per square meter)
- ro: Runoff (`runoff`, millimeters)
There is also a set of derived variables (a sketch of the wind-speed derivation follows this list):
- ws10: Wind speed at 10 meters (derived from `10m_u_component_of_wind` and `10m_v_component_of_wind`, meters per second)
- ws100: Wind speed at 100 meters (derived from `100m_u_component_of_wind` and `100m_v_component_of_wind`, meters per second)
- CS: Clear-sky index (the ratio between the surface solar radiation and the clear-sky surface solar radiation)
- HDD/CDD: Heating/Cooling Degree Days (derived from the 2-meter temperature following the EUROSTAT definition)
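A sketch of how the derived wind speeds follow from the ERA5 wind components (the netCDF short names u10/v10/u100/v100 and the file name are assumptions):

```python
import numpy as np
import xarray as xr

ds = xr.open_dataset("era5_wind_components.nc")     # hypothetical file name
ws10 = np.sqrt(ds["u10"] ** 2 + ds["v10"] ** 2)     # wind speed at 10 m
ws100 = np.sqrt(ds["u100"] ** 2 + ds["v100"] ** 2)  # wind speed at 100 m
```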
For each variable we have 350 599 hourly samples (from 01-01-1980 00:00:00 to 31-12-2019 23:00:00) for 34/115/309 regions (NUTS 0/1/2).
The data is provided in two formats:
- NetCDF version 4 (all the variables hourly, and CDD/HDD daily). NOTE: the variables are stored as `int16` using a `scale_factor` of 0.01 to minimise file size (see the decoding sketch after this list).
- Comma Separated Value ("single index" format for all the variables and the time frequencies and "stacked" only for daily and monthly)
All the CSV files are stored in a zipped file for each variable.
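By default xarray applies the `scale_factor` while decoding, so the `int16` packing is transparent to the user; a minimal sketch (the file name is an assumption):

```python
import xarray as xr

# Decoded view: values come back as floats with scale_factor 0.01 already applied
ds = xr.open_dataset("t2m_hourly_NUTS2.nc")

# Raw view: disable CF decoding to see the packed int16 values
raw = xr.open_dataset("t2m_hourly_NUTS2.nc", mask_and_scale=False)
```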
## Methodology
The time-series have been generated using the following workflow:
1. The NetCDF files are downloaded from the Copernicus Data Store ("ERA5 hourly data on single levels from 1979 to present" dataset)
2. The data is read in R with the climate4r packages and aggregated using the function `get_ts_from_shp` from panas. All the variables are aggregated at the NUTS boundaries using the average, except for the runoff, which is the sum of all the grid points within the regional/national borders.
3. The derived variables (wind speed, CDD/HDD, clear-sky) are computed and all the CSV files are generated using R
4. The NetCDF are created using `xarray` in Python 3.7.
NOTE: air temperature, solar radiation, runoff and wind speed hourly data have been rounded with two decimal digits.
## Example notebooks
In the folder `notebooks` on the associated Github repository there are two Jupyter notebooks which show how to deal effectively with the NetCDF data in `xarray` and how to visualise it in several ways using matplotlib or the enlopy package.
There are currently two notebooks:
- exploring-ERA-NUTS: shows how to open the NetCDF files (with Dask) and how to manipulate and visualise them.
- ERA-NUTS-explore-with-widget: explores the datasets interactively with Jupyter and ipywidgets.
The notebook `exploring-ERA-NUTS` is also available rendered as HTML.
## Additional files
In the folder `additional files` on the associated Github repository there is a map showing the spatial resolution of the ERA5 reanalysis and a CSV file specifying the number of grid points within each NUTS0/1/2 region.
## License
This dataset is released under CC-BY-4.0 license.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This HydroShare resource was created to share large extent spatial (LES) datasets in North Carolina on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).
Users can access the uploaded LES datasets on HydroShare-GeoServer and THREDDS using this HS resource id. This resource was created using HS 2.
Then, through the RHESSys workflows, users can subset LES datasets using OWSLib and xarray.
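A minimal sketch of that subsetting step under assumed endpoints, layer identifiers, and coordinate names (the actual ones are listed in the HydroShare resource):

```python
from owslib.wcs import WebCoverageService
import xarray as xr

# List the coverages published on the HydroShare GeoServer (endpoint pattern assumed)
wcs = WebCoverageService("https://geoserver.hydroshare.org/geoserver/wcs", version="1.0.0")
print(list(wcs.contents))

# Subset a state-scale NetCDF served by THREDDS via OPeNDAP (URL is illustrative)
url = ("https://thredds.hydroshare.org/thredds/dodsC/hydroshare/resources/"
       "<resource_id>/data/contents/nc_les.nc")
ds = xr.open_dataset(url)
subset = ds.sel(x=slice(-80.0, -79.0), y=slice(36.5, 35.5))  # example bounding box
```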
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
IAGOS-CARIBIC WSM files collection (v2024.07.17)
Content
IAGOS-CARIBIC_WSM_files_collection_20240717.zip contains merged IAGOS-CARIBIC whole air sampler data (CARIBIC-1 and CARIBIC-2). There is one netCDF file per IAGOS-CARIBIC flight. Files were generated from NASA Ames 1001 source files. For detailed content information, see the global and variable attributes. The global attribute na_file_header_[x] contains the original NASA Ames file header as an array of strings, with [x] being one of the source files.
Data Coverage
The data set covers 22 years of CARIBIC data from 1997 to 2020, flight numbers 8 to 591. There is no data available after 2020. Also note that data is not available for every flight number within that range.
Special note on CARIBIC-1 data
CARIBIC-1 data only contains a subset of the variables found in CARIBIC-2 data files. To distinguish those two campaigns, use the global attribute 'mission'.
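For example, with xarray (the file name is an assumption):

```python
import xarray as xr

ds = xr.open_dataset("IAGOS-CARIBIC_WSM_flight_100.nc")  # hypothetical file name
print(ds.attrs["mission"])  # identifies whether the file belongs to CARIBIC-1 or CARIBIC-2
```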
File format
netCDF v4, created with xarray. Default variable encoding was used (no compression, etc.).
Data availability
This dataset is also available via our THREDDS server at KIT.
Contact
Tanja Schuck (whole air sampling system PI), Andreas Zahn (IAGOS-CARIBIC Coordinator), Florian Obersteiner (IAGOS-CARIBIC data management).
Changelog
2024.07.17: revise ozone data for flights 294 to 591
2024.01.22: editorial changes, add Schuck et al. publications, data unchanged
2024.01.12: initial upload