10 datasets found
  1. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets

    • search.dataone.org
    • hydroshare.org
    Updated Oct 19, 2024
    + more versions
    Cite
    Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Hydroshare
    Authors
    Young-Don Choi
    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF data as NetCDF. These procedures resulted in three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
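The metadata-addition step described above can be sketched with xarray alone; a minimal sketch with illustrative data and attribute names (in the actual workflow, rioxarray's `rio` accessor would additionally carry the CRS and spatial reference from the GeoTIFF before writing NetCDF):

```python
import numpy as np
import xarray as xr

# Hypothetical state-scale raster: a tiny 2-D grid standing in for a GeoTIFF band.
data = np.arange(12, dtype="float32").reshape(3, 4)
da = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": [30.0, 30.1, 30.2], "x": [-80.0, -79.9, -79.8, -79.7]},
    name="elevation",
)

# xarray makes it straightforward to attach metadata before saving to NetCDF.
da.attrs["units"] = "m"
da.attrs["source"] = "merged state-scale GeoTIFF (illustrative)"

ds = da.to_dataset()
# With rioxarray one would instead open the source raster directly, e.g.:
#   da = rioxarray.open_rasterio("state.tif")   # keeps CRS / transform
# and then write the result with ds.to_netcdf("state.nc").
print(ds["elevation"].attrs["units"])
```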

  2. Geospatial Analysis with Xarray

    • kaggle.com
    zip
    Updated Jul 8, 2023
    Cite
    TAG (2023). Geospatial Analysis with Xarray [Dataset]. https://www.kaggle.com/datasets/tagg27/geospatial-analysis-with-xarray
    Available download formats: zip (33082857 bytes)
    Dataset updated
    Jul 8, 2023
    Authors
    TAG
    Description

    This dataset was created by TAG.

  3. QLKNN11D training set

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    Updated Jun 8, 2023
    Cite
    Karel Lucas van de Plassche; Jonathan Citrin (2023). QLKNN11D training set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8011147
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    DIFFER
    Authors
    Karel Lucas van de Plassche; Jonathan Citrin
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    QLKNN11D training set

    This dataset contains a large-scale run of ~1 billion flux calculations of the quasilinear gyrokinetic transport model QuaLiKiz. QuaLiKiz is applied in numerous tokamak integrated modelling suites, and is openly available at https://gitlab.com/qualikiz-group/QuaLiKiz/. This dataset was generated with the 'QLKNN11D-hyper' tag of QuaLiKiz, equivalent to 2.8.1 apart from the negative magnetic shear filter being disabled. See https://gitlab.com/qualikiz-group/QuaLiKiz/-/tags/QLKNN11D-hyper for the in-repository tag.

    The dataset is appropriate for the training of learned surrogates of QuaLiKiz, e.g. with neural networks. See https://doi.org/10.1063/1.5134126 for a Physics of Plasmas publication illustrating the development of a learned surrogate (QLKNN10D-hyper) of an older version of QuaLiKiz (2.4.0) with a 300 million point 10D dataset. The paper is also available on arXiv https://arxiv.org/abs/1911.05617 and the older dataset on Zenodo https://doi.org/10.5281/zenodo.3497066. For an application example, see Van Mulders et al 2021 https://doi.org/10.1088/1741-4326/ac0d12, where QLKNN10D-hyper was applied for ITER hybrid scenario optimization. For any learned surrogates developed for QLKNN11D, the effective addition of the alphaMHD input dimension through rescaling the input magnetic shear (s) by s = s - alpha_MHD/2, as carried out in Van Mulders et al., is recommended.

    Related repositories:

    General QuaLiKiz documentation https://qualikiz.com

    QuaLiKiz/QLKNN input/output variables naming scheme https://qualikiz.com/QuaLiKiz/Input-and-output-variables

    Training, plotting, filtering, and auxiliary tools https://gitlab.com/Karel-van-de-Plassche/QLKNN-develop

    QuaLiKiz related tools https://gitlab.com/qualikiz-group/QuaLiKiz-pythontools

    FORTRAN QLKNN implementation with wrapper for Python and MATLAB https://gitlab.com/qualikiz-group/QLKNN-fortran

    Weights and biases of 'hyperrectangle style' QLKNN https://gitlab.com/qualikiz-group/qlknn-hype

    Data exploration

    The data is provided in 43 netCDF files. We advise opening single datasets using xarray, or multiple datasets out-of-core using dask. For reference, below we give the load times and in-RAM sizes of a single variable that depends only on the scan-size dimension dimx. This was tested single-core on an Intel Xeon 8160 CPU at 2.1 GHz with 192 GB of DDR4 RAM. Note that during loading, more memory is needed than the final in-RAM figure.

    Timing of dataset loading

        Number of datasets | Final in-RAM memory (GiB) | Loading time, single variable (M:SS)
        ------------------ | ------------------------- | ------------------------------------
        1                  | 10.3                      | 0:09
        5                  | 43.9                      | 1:00
        10                 | 63.2                      | 2:01
        16                 | 98.0                      | 3:25
        17                 | Out of memory             | x:xx

    Full dataset

    The full dataset of QuaLiKiz in-and-output data is available on request. Note that this is 2.2 TiB of netCDF files!

  4. Dataset underlying the study "The effects of a storm surge event on salt intrusion: Insights from the Rhine-Meuse Delta"

    • data.4tu.nl
    zip
    Updated Feb 1, 2002
    Cite
    Avelon Gerritsma; Martin Verlaan; Marlein Geraeds; Ymkje Huismans; Julie Pietrzak (2002). Dataset underlying the study "The effects of a storm surge event on salt intrusion: Insights from the Rhine-Meuse Delta" [Dataset]. http://doi.org/10.4121/ba7df652-cf0d-469a-817c-e783b7b2047c.v1
    Available download formats: zip
    Dataset updated
    Feb 1, 2002
    Dataset provided by
    4TU.ResearchData
    Authors
    Avelon Gerritsma; Martin Verlaan; Marlein Geraeds; Ymkje Huismans; Julie Pietrzak
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rhine–Meuse–Scheldt delta
    Description

    Dataset contains the model output data that was used to create the figures of the study "The effects of a storm surge event on salt intrusion: Insights from the Rhine-Meuse Delta". The dataset includes:

    1. README file
    2. Xarray datasets with the simulated water levels and salinities that were used to create figures 4, 5, 8, and 10.
    3. Regridded salinity data used to create figures 7 and 9.
    4. Bed level and distance information of the cross section (figure 6).
    5. Python script to plot the regridded salinity data.

  5. ABS spin

    • zenodo.org
    Updated Jan 28, 2023
    Cite
    David van Driel; David van Driel (2023). ABS spin [Dataset]. http://doi.org/10.5281/zenodo.7220682
    Dataset updated
    Jan 28, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David van Driel; David van Driel
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and code for "Spin-filtered measurements of Andreev Bound States"

    van Driel, David; Wang, Guanzhong; Dvir, Tom

    This folder contains the raw data and code used to generate the plots for the paper Spin-filtered measurements of Andreev Bound States (arXiv: ??).

    To run the Jupyter notebook, install Anaconda and execute:

    conda env create -f environment.yml

    followed by:

    conda activate spinABS

    Finally,

    jupyter notebook

    to launch the notebook called 'zenodo_notebook.ipynb'.

    Raw data are stored in netCDF (.nc) format. The files are exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.
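Reading such files back follows the usual xarray pattern; a minimal round-trip sketch with a toy in-memory dataset standing in for a QCoDeS export (file name and variable names are illustrative; assumes xarray with the scipy NetCDF backend available):

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Toy measurement data standing in for a QCoDeS netCDF export.
ds_out = xr.Dataset(
    {"current": ("bias", np.linspace(0.0, 1.0, 5))},
    coords={"bias": np.linspace(-1.0, 1.0, 5)},
)

path = os.path.join(tempfile.mkdtemp(), "measurement.nc")
ds_out.to_netcdf(path, engine="scipy")  # scipy backend writes classic NetCDF

# Real files from the archive would be opened the same way:
#   ds = xr.open_dataset("some_flight_or_measurement.nc")
ds_in = xr.open_dataset(path, engine="scipy")
assert np.allclose(ds_in["current"].values, ds_out["current"].values)
```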

  6. SHNITSEL - Surface Hopping Nested Instances Training Set for Excited-state Learning

    • data.niaid.nih.gov
    Updated Mar 20, 2025
    Cite
    Curth, Robin; Röhrkasten, Theodor; Müller, Carolin; Westermayr, Julia (2025). SHNITSEL - Surface Hopping Nested Instances Training Set for Excited-state Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14910194
    Dataset updated
    Mar 20, 2025
    Dataset provided by
    Leipzig University
    Friedrich-Alexander-Universität Erlangen-Nürnberg
    Authors
    Curth, Robin; Röhrkasten, Theodor; Müller, Carolin; Westermayr, Julia
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SHNITSEL

    The Surface Hopping Nested Instances Training Set for Excited-State Learning (SHNITSEL) is a comprehensive data repository designed to support the development and benchmarking of excited-state dynamics methods.

    Configuration Space

    SHNITSEL contains datasets for nine organic molecules that represent a diverse range of photochemical behaviors. The following molecules are included in the dataset:

    Alkenes: ethene (A01), propene (A02), 2-butene (A03)

    Ring structures: fulvene (R01), 1,3-cyclohexadiene (R02), tyrosine (R03)

    Other molecules: methylenimmonium cation (I01), methanethione (T01), diiodomethane (H01)

    Property Space

    These datasets provide key electronic properties for singlet and triplet states, including energies, forces, dipole moments, transition dipole moments, nonadiabatic couplings, and spin-orbit couplings, computed at the multi-reference ab initio level. The data is categorized into static and dynamic data, based on its origin and purpose.

    Static data (147,169 data points in total) consists of sampled molecular structures without time-dependent information, covering the relevant vibrational and conformational spaces. These datasets are provided for eight molecules: A01, A02, A03, R01, R03, I01, T01, and H01.

    Dynamic data (444,581 data points in total) originates from surface hopping simulations and captures the evolution of molecular structures and properties over time as they propagate on potential energy surfaces according to Newton's equations of motion. These datasets are provided for five molecules: A01, A02, A03, R02, and I01.

    Data Structure and Workflow

    The data is stored in xarray format, using xarray.Dataset objects for efficient handling of multidimensional data. Key dimensions include electronic states, couplings, atoms, and, for dynamic data, time frames. The dataset is scalable and suited to large data volumes, stored in NetCDF4 (HDF5-based) format for optimal performance. Tools for data processing, visualization, and integration into machine learning workflows are provided by the shnitsel Python package, published on GitHub (https://github.com/SHNITSEL/shnitsel-tools).

    An overview of the molecular structures and visualizations of key properties (from trajectory data) are compiled on the SHNITSEL webpage (https://shnitsel.github.io/).

  7. Data and code for "Singlet and triplet Cooper pair splitting in hybrid superconducting nanowires"

    • data.niaid.nih.gov
    Updated Nov 23, 2022
    Cite
    Guanzhong Wang (2022). Data and code for "Singlet and triplet Cooper pair splitting in hybrid superconducting nanowires" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5774827
    Dataset updated
    Nov 23, 2022
    Dataset provided by
    TU Delft
    Authors
    Guanzhong Wang
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the raw data and code used to generate the plots for the paper Singlet and triplet Cooper pair splitting in hybrid superconducting nanowires (arXiv: 2205.03458).

    To run the Jupyter notebooks, install Anaconda and execute:

    conda env create -f cps-exp.yml

    followed by:

    conda activate cps-exp

    for the experiment data, or

    conda env create -f cps-theory.yml

    and similarly

    conda activate cps-theory

    for the theory plots. Finally,

    jupyter notebook

    to launch the corresponding notebook.

    Raw data are stored in netCDF (.nc) format. The files are directly exported by the data acquisition package QCoDeS and can be read as an xarray Dataset.

  8. ERA-NUTS: time-series based on C3S ERA5 for European regions

    • zenodo.org
    nc, zip
    Updated Aug 4, 2022
    + more versions
    Cite
    M. De Felice; M. De Felice; K. Kavvadias; K. Kavvadias (2022). ERA-NUTS: time-series based on C3S ERA5 for European regions [Dataset]. http://doi.org/10.5281/zenodo.2650191
    Available download formats: zip, nc
    Dataset updated
    Aug 4, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    M. De Felice; M. De Felice; K. Kavvadias; K. Kavvadias
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    # ERA-NUTS (1980-2018)

    This dataset contains a set of time-series of meteorological variables based on Copernicus Climate Change Service (C3S) ERA5 reanalysis. The data files can be downloaded from here while notebooks and other files can be found on the associated Github repository.

    This data has been generated with the aim of providing hourly time-series of the meteorological variables commonly used for power system modelling and, more generally, for studies of energy systems.

    An example of the analysis that can be performed with ERA-NUTS is shown in this video.

    Important: this dataset is still a work in progress; we will add more analyses and variables in the near future. If you spot an error or something strange in the data, please let us know by sending an email or opening an issue in the associated Github repository.

    ## Data
    The time-series have hourly/daily/monthly frequency and are aggregated following the NUTS 2016 classification. NUTS (Nomenclature of Territorial Units for Statistics) is a European Union standard for referencing the subdivisions of countries (member states, candidate countries and EFTA countries).

    This dataset contains NUTS0/1/2 time-series for the following variables obtained from the ERA5 reanalysis data (in brackets the name of the variable on the Copernicus Data Store and its unit measure):

    - t2m: 2-meter temperature (`2m_temperature`, Celsius degrees)
    - ssrd: Surface solar radiation (`surface_solar_radiation_downwards`, Watt per square meter)
    - ssrdc: Surface solar radiation clear-sky (`surface_solar_radiation_downward_clear_sky`, Watt per square meter)
    - ro: Runoff (`runoff`, millimeters)

    There is also a set of derived variables:
    - ws10: Wind speed at 10 meters (derived from `10m_u_component_of_wind` and `10m_v_component_of_wind`, meters per second)
    - ws100: Wind speed at 100 meters (derived from `100m_u_component_of_wind` and `100m_v_component_of_wind`, meters per second)
    - CS: Clear-sky index (the ratio between the surface solar radiation and the clear-sky surface solar radiation)
    - HDD/CDD: Heating/Cooling Degree Days (derived from 2-meter temperature following the EUROSTAT definition)

    For each variable there are 350,599 hourly samples (from 01-01-1980 00:00:00 to 31-12-2019 23:00:00) for 34/115/309 regions (NUTS 0/1/2).

    The data is provided in two formats:

    - NetCDF version 4 (all the variables hourly and CDD/HDD daily). NOTE: the variables are stored as `int16` type using a `scale_factor` of 0.01 to minimise the size of the files.
    - Comma Separated Value ("single index" format for all the variables and the time frequencies and "stacked" only for daily and monthly)

    All the CSV files are stored in a zipped file for each variable.
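The int16 packing noted above follows the standard NetCDF packed-value convention; a sketch in plain Python with an illustrative raw value (xarray undoes this automatically on open when `decode_cf=True`, the default):

```python
# NetCDF packed-value convention: physical = raw * scale_factor (+ add_offset, 0 here).
scale_factor = 0.01

raw = 2153                    # e.g. an int16 stored temperature sample (illustrative)
value = raw * scale_factor    # about 21.53 degrees Celsius

# Storing int16 + scale_factor instead of float64 cuts the file size by ~4x,
# at the cost of two-decimal precision, matching the rounding noted below.
print(value)
```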

    ## Methodology

    The time-series have been generated using the following workflow:

    1. The NetCDF files are downloaded from the Copernicus Data Store from the ERA5 hourly data on single levels from 1979 to present dataset
    2. The data is read in R with the climate4r packages and aggregated using the function `get_ts_from_shp` from panas. All the variables are aggregated at the NUTS boundaries using the average, except for the runoff, which is the sum over all the grid points within the regional/national borders.
    3. The derived variables (wind speed, CDD/HDD, clear-sky) are computed and all the CSV files are generated using R
    4. The NetCDF are created using `xarray` in Python 3.7.

    NOTE: air temperature, solar radiation, runoff and wind speed hourly data have been rounded with two decimal digits.

    ## Example notebooks

    In the folder `notebooks` on the associated Github repository there are two Jupyter notebooks which show how to work effectively with the NetCDF data in `xarray` and how to visualise it in several ways using matplotlib or the enlopy package.

    There are currently two notebooks:

    - exploring-ERA-NUTS: shows how to open the NetCDF files (with Dask) and how to manipulate and visualise them.
    - ERA-NUTS-explore-with-widget: explore the datasets interactively with Jupyter and ipywidgets.

    The notebook `exploring-ERA-NUTS` is also available rendered as HTML.

    ## Additional files

    In the folder `additional files` on the associated Github repository there is a map showing the spatial resolution of the ERA5 reanalysis and a CSV file specifying the number of grid points in each NUTS0/1/2 region.

    ## License

    This dataset is released under CC-BY-4.0 license.

  9. (HS 3) Large Extent Spatial Datasets in North Carolina

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Oct 15, 2024
    + more versions
    Cite
    Young-Don Choi (2024). (HS 3) Large Extent Spatial Datasets in North Carolina [Dataset]. http://doi.org/10.4211/hs.7bd3a773639f40458c22c5ec43ae3bc6
    Available download formats: zip (8.5 GB)
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    HydroShare
    Authors
    Young-Don Choi
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This HydroShare resource was created to share large extent spatial (LES) datasets in North Carolina on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).

    Users can access the uploaded LES datasets on HydroShare-GeoServer and THREDDS using this HS resource id. This resource was created using HS 2.

    Then, through the RHESSys workflows, users can subset LES datasets using OWSLib and xarray.
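The subsetting step can be sketched with xarray's label-based selection on a toy in-memory grid (coordinate values and the bounding box are illustrative; in the actual workflow, OWSLib would first fetch the LES data from GeoServer/THREDDS):

```python
import numpy as np
import xarray as xr

# Toy statewide grid standing in for an LES dataset served via THREDDS.
lat = np.linspace(33.8, 36.6, 8)
lon = np.linspace(-84.3, -75.4, 10)
ds = xr.Dataset(
    {"elevation": (("lat", "lon"), np.random.default_rng(0).random((8, 10)))},
    coords={"lat": lat, "lon": lon},
)

# Subset a watershed-sized bounding box by coordinate labels, not indices.
subset = ds.sel(lat=slice(34.0, 35.0), lon=slice(-81.0, -79.0))
print(subset.sizes)
```

Label-based `sel` with slices keeps the subset aligned with geographic coordinates regardless of the grid resolution, which is why it pairs well with data pulled from a THREDDS endpoint.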

  10. IAGOS-CARIBIC whole air sampler data (v2024.07.17)

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 28, 2024
    + more versions
    Cite
    Schuck, Tanja; Obersteiner, Florian (2024). IAGOS-CARIBIC whole air sampler data (v2024.07.17) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10495038
    Dataset updated
    Oct 28, 2024
    Dataset provided by
    Goethe University Frankfurt
    Karlsruher Institut für Technologie
    Authors
    Schuck, Tanja; Obersteiner, Florian
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    IAGOS-CARIBIC WSM files collection (v2024.07.17)

    Content

    IAGOS-CARIBIC_WSM_files_collection_20240717.zip contains merged IAGOS-CARIBIC whole air sampler data (CARIBIC-1 and CARIBIC-2). There is one netCDF file per IAGOS-CARIBIC flight. Files were generated from NASA Ames 1001 source files. For detailed content information, see the global and variable attributes. The global attribute na_file_header_[x] contains the original NASA Ames file header as an array of strings, with [x] being one of the source files.

    Data Coverage

    The data set covers 22 years of CARIBIC data from 1997 to 2020, flight numbers 8 to 591. There is no data available after 2020. Also, note that data isn't available for all flight numbers within the [1, 591] range.

    Special note on CARIBIC-1 data

    CARIBIC-1 data only contains a subset of the variables found in CARIBIC-2 data files. To distinguish those two campaigns, use the global attribute 'mission'.

    File format

    netCDF v4, created with xarray. Default variable encoding was used (no compression etc.).

    Data availability

    This dataset is also available via our THREDDS server at KIT.

    Contact

    Tanja Schuck, whole air sampling system PI; Andreas Zahn, IAGOS-CARIBIC Coordinator; Florian Obersteiner, IAGOS-CARIBIC data management.

    Changelog

    2024.07.17: revise ozone data for flights 294 to 591

    2024.01.22: editorial changes, add Schuck et al. publications, data unchanged

    2024.01.12: initial upload

