We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray. Rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS package, the Jupyter notebooks were developed on Windows.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Test datasets for use with xmitgcm. These data were generated by running MITgcm in different configurations. Each tar archive contains a folder full of mds *.data / *.meta files.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo repository contains all migration flow estimates associated with the paper "Deep learning four decades of human migration." Evaluation code, training data, trained neural networks, and smaller flow datasets are available in the main GitHub repository, which also provides detailed instructions on data sourcing. Due to file size limits, the larger datasets are archived here.
Data is available in both NetCDF (.nc) and CSV (.csv) formats. The NetCDF format is more compact and pre-indexed, making it suitable for large files. In Python, datasets can be opened as xarray.Dataset objects, enabling coordinate-based data selection.
Each dataset uses the following coordinate conventions:
The following data files are provided:
T summed over Birth ISO). Dimensions: Year, Origin ISO, Destination ISO
Additionally, two CSV files are provided for convenience:
imm: Total immigration flows
emi: Total emigration flows
net: Net migration
imm_pop: Total immigrant population (non-native-born)
emi_pop: Total emigrant population (living abroad)
mig_prev: Total origin-destination flows
mig_brth: Total birth-destination flows, where Origin ISO reflects place of birth
Each dataset includes a mean variable (mean estimate) and a std variable (standard deviation of the estimate).
An ISO3 conversion table is also provided.
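As a rough illustration of the coordinate-based selection mentioned above, the sketch below builds a small synthetic xarray.Dataset with the Year, Origin ISO and Destination ISO dimensions and the mean and std variables named in this description. The values, the year range and the three ISO3 codes are placeholders, not data from the repository; a real file would be opened with xarray.open_dataset instead.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one flow file (all values are made up).
years = np.arange(1990, 1994)
iso = ["DEU", "FRA", "USA"]  # placeholder ISO3 codes
dims = ("Year", "Origin ISO", "Destination ISO")
shape = (len(years), len(iso), len(iso))
rng = np.random.default_rng(0)

flows = xr.Dataset(
    data_vars={
        "mean": (dims, rng.uniform(0, 1e4, shape)),
        "std": (dims, rng.uniform(0, 1e3, shape)),
    },
    coords={"Year": years, "Origin ISO": iso, "Destination ISO": iso},
)

# Coordinate-based selection: one origin-destination pair over all years.
# A dict is used because the dimension names contain spaces.
pair = flows.sel({"Origin ISO": "DEU", "Destination ISO": "USA"})

# A single year, with the estimate and its standard deviation.
est_1992 = float(pair["mean"].sel(Year=1992))
sd_1992 = float(pair["std"].sel(Year=1992))
print(f"DEU -> USA in 1992: {est_1992:.0f} +/- {sd_1992:.0f}")
```

Variables named mean and std have to be accessed with bracket syntax (flows["mean"]), since flows.mean is a Dataset method.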
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.
A description of this dataset, including the methodology and validation results, is available at:
Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.
ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement of a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such gap-filling methods is that they rely only on the original observational record, without the need for ancillary variables or model-based information. Due to this intrinsic challenge, no global, long-term univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments to fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.
You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following script will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.
#!/bin/bash
# Set download directory
DOWNLOAD_DIR=~/Downloads
base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"
# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
    echo "Downloading $year.zip..."
    wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
    unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
    rm "$DOWNLOAD_DIR/$year.zip"
done
The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), each subdirectory containing one netCDF image file for a specific day (DD), month (MM) in a 2-dimensional (longitude, latitude) grid system (CRS: WGS84). The file name has the following convention:
ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc
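The naming convention can be turned into a small path helper. The sketch below is illustrative only: the function name daily_file is hypothetical, and it assumes that the extracted archives leave one YYYY subdirectory per year directly under the download directory, which should be checked against the actual layout.

```python
from datetime import date
from pathlib import Path

# File name pattern taken from the convention above; {stamp} is YYYYMMDD.
FILE_TEMPLATE = (
    "ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-{stamp}000000-fv09.1r1.nc"
)


def daily_file(root, day):
    """Return the expected path of the daily image for `day` (assumed layout: root/YYYY/file)."""
    stamp = day.strftime("%Y%m%d")
    return Path(root).expanduser() / f"{day.year:04d}" / FILE_TEMPLATE.format(stamp=stamp)


path = daily_file("~/Downloads", date(2005, 7, 14))
print(path.name)
```

The resulting path can then be passed to any netCDF reader, for example xarray.open_dataset.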
Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:
Additional information for each variable is given in the netCDF attributes.
Changes in v09.1r1 (previous version was v09.1):
These data can be read by any software that supports the Climate and Forecast (CF) metadata conventions for netCDF files, such as:
The following records are all part of the ESA CCI Soil Moisture science data records community:
| 1 | ESA CCI SM MODELFREE Surface Soil Moisture Record | https://doi.org/10.48436/svr1r-27j77 |
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
onemil1_1.nc is the training dataset.
onemil1_2.nc is the validation dataset.
onemil2.nc, p240.nc, and p390.nc are the test datasets.
These files are in .nc format; use xarray with Python to interface with them.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 5/5/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably in this project.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five-year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available in separate .csv files for each land cover type. Other data can be found for the minimum, monthly, and seasonal transit time in their respective folders. These data were produced using the Python code found in the "supporting_code" folder, given the ease of working with .nc files and the EASE grid in the xarray Python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a particular function:
01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.
02_functions.R: This script contains the custom functions for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.
03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.
04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.
05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.
06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which then get saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
These datasets are from tidal resource characterization measurements collected on the Terrasond High Energy Oceanographic Mooring (THEOM) from 1 July 2021 to 30 August 2021 (60 days) in Cook Inlet, Alaska. The lander was deployed at 60.7207031 N, 151.4294998 W in ~50 m of water. The dataset contains raw and processed data from the following two instruments:
A Nortek Signature 500 kHz acoustic Doppler current profiler (ADCP). Data were recorded at 4 Hz in the beam coordinate system from all 5 beams. Processed data have been averaged into 5-minute bins and converted to the East-North-Up (ENU) coordinate system.
A Nortek Vector acoustic Doppler velocimeter (ADV). Data were recorded at 8 Hz in the beam coordinate system. Processed data have been averaged into 5-minute bins and converted to the Streamwise - Cross-stream - Vertical (Principal) coordinate system. Turbulence statistics were calculated from 5-minute bins, with an FFT length equal to the bin length, and saved in the processed dataset.
Data were read and analyzed using the DOLfYN (version 1.0.2) Python package and saved in MATLAB (.mat) and netCDF (.nc) file formats. Files containing analyzed data (".b1") were standardized using the TSDAT (version 0.4.2) Python package. NetCDF files can be opened using DOLfYN (e.g., dat = dolfyn.load("*.nc")) or the xarray Python package (e.g., dat = xarray.open_dataset("*.nc")). All distances are in meters (e.g., depth, range, etc.), and all velocities are in m/s. See the DOLfYN documentation linked in the submission, and/or the Nortek documentation, for additional details.
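The 5-minute bin averaging described for the processed data can be pictured with a plain NumPy sketch. This is not the DOLfYN implementation; the helper name bin_average and the synthetic velocity series are assumptions made for illustration, and only the 4 Hz sampling rate and 5-minute bin length come from the description.

```python
import numpy as np


def bin_average(samples, fs_hz, bin_seconds=300):
    """Average a 1-D series sampled at fs_hz into consecutive bins of bin_seconds."""
    per_bin = int(round(fs_hz * bin_seconds))
    n_bins = samples.size // per_bin  # an incomplete trailing bin is dropped
    trimmed = samples[: n_bins * per_bin]
    return trimmed.reshape(n_bins, per_bin).mean(axis=1)


fs = 4  # Hz, the ADCP sampling rate quoted above
t = np.arange(0, 3600 * fs) / fs  # one hour of synthetic time stamps (s)
noise = 0.1 * np.random.default_rng(1).standard_normal(t.size)
velocity = 1.5 * np.sin(2 * np.pi * t / 3600) + noise  # made-up velocity (m/s)

binned = bin_average(velocity, fs)
print(binned.shape)  # (12,)
```

One hour at 4 Hz gives 14400 samples, so twelve bins of 1200 samples each.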
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the datasets and code used in the Zastrow and Glotch manuscript entitled "Distinct Lithologies in Jezero Crater, Mars".
Abstract
Jezero crater is the landing site for the Mars 2020 Perseverance rover. The Noachian-aged crater has undergone several periods of fluvial and lacustrine activity and many phyllosilicate- and carbonate-bearing rocks formed as a result. It also contains a large portion of the regional Nili Fossae olivine-carbonate unit. In this work, we performed spectral mixture analysis of visible/near-infrared hyperspectral imagery over Jezero. We modeled carbonate abundances up to ~35% and identified three distinct units containing different carbonate phases. Our work also suggests that the olivine in the regional unit is largely restricted to aeolian deposits overlying the carbonate-bearing rocks. The diversity of carbonate phases in Jezero points to multiple periods of carbonate formation under varying conditions.
Description of Repository Datasets
code_unmixing.zip:
data_input.zip:
data_models.zip:
data_roi.zip:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
For this experiment we measured all of the data using heterodyne detection.
File "fig2" contains the data for the experiment performed by storing the light in the GEM and then reading it out with EIT, together with Gaussian functions fitted to the collected time traces.
Files "fig3_a" and "fig3_c_sim" contain the parameters of the Gaussian functions fitted to the experimental and simulated time traces, respectively.
File "fig4" contains Fourier transforms of the experimentally measured time traces for impulses stored in EIT and read out in GEM.
File "fig5" contains time traces for an impulse with two frequencies stored in GEM and read out in EIT, with fitted Gaussian functions, and the Fourier transform of time traces for two impulses stored in EIT and read out in GEM, with fitted Gaussian functions.
Files were generated with the Xarray (v2025.3.1) Python library and are in the HDF5 format. They can be loaded using the Xarray function "xarray.load_dataset". We analyzed the data using the Python programming language. Data for the theoretical model were created using Python.
The “Quantum Optical Technologies” (FENG.02.01-IP.05-0017/23) project is carried out within the Measure 2.1 International Research Agendas programme of the Foundation for Polish Science, co-financed by the European Union under the European Funds for Smart Economy 2021-2027 (FENG). This research was funded in whole or in part by the National Science Centre, Poland, grant no. 2024/53/B/ST2/04040. Publication co-financed from the state budget funds (Poland), awarded by the Minister of Science under the “Perły Nauki II” program, project No. PN/02/0027/2023, co-financing amount PLN 239,998.00, total project value PLN 239,998.00.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The US National Center for Atmospheric Research partnered with the IBS Center for Climate Physics in South Korea to generate the CESM2 Large Ensemble, which consists of 100 ensemble members at 1 degree spatial resolution covering the period 1850-2100 under CMIP6 historical and SSP370 future radiative forcing scenarios. Data sets from this ensemble were made downloadable via the Climate Data Gateway on June 14, 2021. NCAR has copied a subset (currently ~500 TB) of CESM2 LENS data to Amazon S3 as part of the AWS Public Datasets Program. To optimize for large-scale analytics, we have represented the data as ~275 Zarr stores accessible through the Python Xarray library. Each Zarr store contains a single physical variable for a given model run type and temporal frequency (monthly, daily).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data was prepared as input for the Selkie GIS-TE tool. This GIS tool aids site selection, logistics optimization and financial analysis of wave or tidal farms in the Irish and Welsh maritime areas. Read more here: https://www.selkie-project.eu/selkie-tools-gis-technoeconomic-model/
This research was funded by the Science Foundation Ireland (SFI) through MaREI, the SFI Research Centre for Energy, Climate and the Marine and by the Sustainable Energy Authority of Ireland (SEAI). Support was also received from the European Union's European Regional Development Fund through the Ireland Wales Cooperation Programme as part of the Selkie project.
File Formats
Results are presented in three file formats:
tif: Can be imported into GIS software (such as ArcGIS)
csv: Human-readable text format, which can also be opened in Excel
png: Image files that can be viewed in standard desktop software and give a spatial view of results
Input Data
All calculations use open-source data from the Copernicus store and the open-source software Python. The Python xarray library is used to read the data.
Hourly Data from 2000 to 2019
Wind - Copernicus ERA5 dataset, 17 by 27.5 km grid, 10 m wind speed
Wave - Copernicus Atlantic -Iberian Biscay Irish - Ocean Wave Reanalysis dataset, 3 by 5 km grid
Accessibility
The maximum limits for Hs and wind speed are applied when mapping the accessibility of a site.
The Accessibility layer shows the percentage of time the Hs (Atlantic -Iberian Biscay Irish - Ocean Wave Reanalysis) and wind speed (ERA5) are below these limits for the month.
Input data is 20 years of hourly wave and wind data from 2000 to 2019, partitioned by month. At each timestep, the accessibility of the site was determined by checking if
the Hs and wind speed were below their respective limits. The percentage accessibility is the number of hours within limits divided by the total number of hours for the month.
Environmental data is from the Copernicus data store (https://cds.climate.copernicus.eu/). Wave hourly data is from the 'Atlantic -Iberian Biscay Irish - Ocean Wave Reanalysis' dataset.
Wind hourly data is from the ERA5 dataset.
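The accessibility calculation described above can be sketched in a few lines of NumPy. This is an illustration, not the Selkie tool's code: the function name monthly_accessibility and the six-value synthetic series are assumptions, and the limits of 2 m and 15 m/s are used here only as example inputs.

```python
import numpy as np


def monthly_accessibility(hs, wind, months, hs_limit, wind_limit):
    """Percentage of hours per calendar month with Hs and wind speed both below their limits."""
    accessible = (hs < hs_limit) & (wind < wind_limit)
    return {
        int(m): float(100.0 * accessible[months == m].mean())
        for m in np.unique(months)
    }


# Tiny synthetic hourly series: Hs (m), wind speed (m/s) and the calendar month of each hour.
hs = np.array([1.0, 2.5, 1.5, 3.0, 0.5, 1.9])
wind = np.array([10.0, 12.0, 16.0, 9.0, 8.0, 14.0])
months = np.array([1, 1, 1, 1, 2, 2])

result = monthly_accessibility(hs, wind, months, hs_limit=2.0, wind_limit=15.0)
print(result)  # {1: 25.0, 2: 100.0}
```

With 20 years of data, the months array would simply pool every January hour together, every February hour together, and so on.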
Availability
A device's availability to produce electricity depends on the device's reliability and the time to repair any failures. The repair time depends on weather
windows and other logistical factors (for example, the availability of repair vessels and personnel). A 2013 study by O'Connor et al. determined the
relationship between the accessibility and availability of a wave energy device. The resulting graph (see Fig. 1 of their paper) shows the correlation between
accessibility at Hs of 2m and wind speed of 15.0m/s and availability. This graph is used to calculate the availability layer from the accessibility layer.
The input value, accessibility, measures how accessible a site is for installation or operation and maintenance activities. It is the percentage time the
environmental conditions, i.e. the Hs (Atlantic -Iberian Biscay Irish - Ocean Wave Reanalysis) and wind speed (ERA5), are below operational limits.
Input data is 20 years of hourly wave and wind data from 2000 to 2019, partitioned by month. At each timestep, the accessibility of the site was determined
by checking if the Hs and wind speed were below their respective limits. The percentage accessibility is the number of hours within limits divided by the total
number of hours for the month. Once the accessibility was known, the percentage availability was calculated using the O'Connor et al. graph of the relationship
between the two. The reliability of a mature technology was assumed.
Weather Window
The weather window availability is the percentage of possible x-duration windows where weather conditions (Hs, wind speed) are below maximum limits for the
given duration for the month.
The resolution of the wave dataset (0.05° × 0.05°) is higher than that of the wind dataset
(0.25° x 0.25°), so the nearest wind value is used for each wave data point. The weather window layer is at the resolution of the wave layer.
The first step in calculating the weather window for a particular set of inputs (Hs, wind speed and duration) is to calculate the accessibility at each timestep.
The accessibility is based on a simple boolean evaluation: are the wave and wind conditions within the required limits at the given timestep?
Once the time series of accessibility is calculated, the next step is to look for periods of sustained favourable environmental conditions, i.e. the weather
windows. Here all possible operating periods with a duration matching the required weather-window value are assessed to see if the weather conditions remain
suitable for the entire period. The percentage availability of the weather window is calculated based on the percentage of x-duration windows with suitable
weather conditions for their entire duration. The weather window availability can be considered as the probability of having the required weather window available
at any given point in the month.
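The two steps above (a boolean accessibility series, then a scan over all windows of the required duration) can be sketched as follows. The function name weather_window_availability is hypothetical, and the sketch assumes hourly time steps with overlapping windows starting at every hour; how the actual tool treats windows that cross month boundaries is not stated here.

```python
import numpy as np


def weather_window_availability(accessible, duration_hours):
    """Percentage of all duration_hours-long windows in which every hour is accessible."""
    accessible = np.asarray(accessible, dtype=bool)
    n_windows = accessible.size - duration_hours + 1
    if n_windows <= 0:
        return 0.0
    # A cumulative sum gives the number of accessible hours in each sliding window.
    csum = np.concatenate(([0], np.cumsum(accessible)))
    hours_ok = csum[duration_hours:] - csum[:-duration_hours]
    return 100.0 * float(np.mean(hours_ok == duration_hours))


# Eight hourly accessibility flags with a single unsuitable hour.
series = [True, True, True, False, True, True, True, True]
print(weather_window_availability(series, 3))  # 50.0
```

Of the six possible 3-hour windows in this series, three contain the unsuitable hour, so the availability is 50%.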
Extreme Wind and Wave
The Extreme wave layers show the highest significant wave height expected to occur during the given return period. The Extreme wind layers show the highest wind speed expected to occur during the given return period.
To predict extreme values, we use Extreme Value Analysis (EVA). EVA focuses on the extreme part of the data and seeks to determine a model to fit this reduced
portion accurately. EVA consists of three main stages. The first stage is the selection of extreme values from a time series. The next step is to fit a model
that best approximates the selected extremes by determining the shape parameters for a suitable probability distribution. The model then predicts extreme values
for the selected return period. All calculations use the Python pyextremes library. Two methods are used: Block Maxima and Peaks over Threshold.
The Block Maxima method selects the annual maxima and fits a Generalised Extreme Value distribution (GEVD).
The Peaks over Threshold (peaks_over_threshold) method has two calculation parameters. The first is the percentile that values must exceed to be selected as extreme (0.9 or 0.998). The
second is the minimum time separation between extreme values for them to be considered independent (3 days). A Generalised Pareto Distribution is fitted to the selected
extremes and used to calculate the extreme value for the selected return period.
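The three EVA stages can be illustrated for the Block Maxima case. The sketch below deliberately uses scipy.stats.genextreme as a stand-in rather than pyextremes, so it shows the general technique and not the calls used to produce these layers; the synthetic Weibull-distributed wave heights and the helper return_level are assumptions.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)
n_years, hours_per_year = 20, 8760
# Synthetic hourly significant wave height (m); a placeholder for the reanalysis series.
hs = rng.weibull(1.5, size=(n_years, hours_per_year)) * 2.0

# Stage 1: select extremes (Block Maxima with one-year blocks).
annual_max = hs.max(axis=1)

# Stage 2: fit a GEV distribution to the selected extremes.
shape, loc, scale = genextreme.fit(annual_max)


# Stage 3: the return level for a return period of T years is the (1 - 1/T) quantile.
def return_level(period_years):
    return float(genextreme.ppf(1.0 - 1.0 / period_years, shape, loc=loc, scale=scale))


for period in (10, 50):
    print(f"{period}-year return level: {return_level(period):.2f} m")
```

A Peaks over Threshold variant would instead keep exceedances of a high percentile, decluster them by the chosen separation time, and fit a Generalised Pareto Distribution.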
cmomy is a Python package to calculate central moments and co-moments in a numerically stable and direct way. Behind the scenes, cmomy makes use of Numba to rapidly calculate moments. cmomy provides utilities to calculate central moments from individual samples, precomputed central moments, and precomputed raw moments. It also provides routines to perform bootstrap resampling based on raw data or precomputed moments. cmomy has NumPy array and xarray DataArray interfaces.
This dataset provides an empirical three-dimensional P- and S-wave velocity model covering a 30 x 30 km area and extending to 10 km depth around the Cape Modern EGS and Utah FORGE sites. It incorporates three-dimensional topography and a sediment/basement contact derived from geophysical and geological datasets collected by Utah FORGE or Fervo Energy. Basin velocities were estimated from a logarithmic fit to borehole velocity logs, while basement velocities were assigned constant values of 5.8 km/s for Vp and 3.392 km/s for Vs. No geophysical data inversion was performed in constructing this model. The dataset includes a manuscript describing the methods used to develop the model. The velocity model is provided in NetCDF format. Users will need software capable of reading NetCDF files. Common scientific libraries support this format without requiring proprietary tools. We recommend the Xarray package for Python, where methods like open_dataset() and .sel().plot() can be used to read and plot the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the MESWA (Middle East and Southwest Asia) seismic model and auxiliary data used in the creation of the model (Rodgers, 2023). MESWA is a three-dimensional model of the seismic properties of the crust and upper mantle of the Middle East and Southwest Asia. The MESWA model is provided in NetCDF format (readable by, for example, xarray; Hoyer & Hamman, 2017) and HDF5 format for viewing with ParaView (Ahrens et al., 2005) and interaction with Salvus (Afanasiev et al., 2019).
Also included are the earthquake source parameters for all 327 Global Centroid Moment Tensor events considered in this study in ASCII text format. Also included are lists of the selected 192 inversion events and 66 validation events in ASCII text format. Lastly, we include a list of all receivers used in the creation and validation of MESWA. This is a simple ASCII file with the event name and receiver name (composed of the network_code and station_code).
The following table provides a listing of the files in the dataset:
| File | Description |
|---|---|
| MESWA.nc | MESWA model in NetCDF format |
| MESWA.h5 | MESWA model in HDF5 format, used by Salvus |
| MESWA.xmdf | Auxiliary file for MESWA.h5, used to import model into Paraview |
| events_project.csv | Table of event source parameters for all 327 events considered in the project |
| inversion_events_192.csv | Table of 192 inversion events (ASCII comma separated value) |
| validation_events_66.csv | Table of 66 validation events (ASCII comma separated value) |
| events_receivers_inversion.csv | Table of waveform (event-receiver-channel) data used in the inversion (ASCII comma separated value) |
| events_receivers_validation.csv | Table of waveform (event-receiver-channel) data used in the validation (ASCII comma separated value) |
References
Afanasiev, M, C Boehm, M van Driel, L Krischer, M Rietmann, DA May, MG Knepley, and A Fichtner (2019). Modular and flexible spectral-element waveform modelling in two and three dimensions, Geophys. J. Int., 216(3), 1675–1692, doi: 10.1093/gji/ggy469
Ahrens, J., Geveci, B., & Law, C. (2005). Paraview: An end-user tool for large data visualization. The Visualization Handbook, 717(8). https://doi.org/10.1016/b978-012387582-2/50038-1
Hoyer, S., & Hamman, J. (2017). Xarray: N-D labeled arrays and datasets in Python. Journal of Open Research Software, 5(1). https://doi.org/10.5334/jors.148
Rodgers, A. (2023). Adjoint Waveform Tomography for Crustal and Upper Mantle Structure the Middle East and Southwest Asia for Improved Waveform Simulations Using Openly Available Broadband Data, technical report, LLNL-TR-851939.
Acknowledgements
This project was supported by Lawrence Livermore National Laboratory’s Laboratory Directed Research and Development project 20-ERD-008 and the National Nuclear Security Administration. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-MI-852402
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cropland Data Layer (CDL) data from the US Department of Agriculture's National Agricultural Statistics Service (NASS), subset spatially to cover the Snake River Basin, USA for years 2010-2017, inclusive. These are the raw data used to support initialization of the Janus agent-based model of land use and land cover change. The dataset was developed by downloading CDL data from the USDA NASS site for an area of interest encompassing the Snake River Basin for individual years from 2010 to 2017. Data were converted to a georeferenced GeoTIFF format using the Geospatial Data Abstraction Library (GDAL) command line interface. They were then concatenated into a single dataset using the rioxarray Python library and saved as a CF-compliant NetCDF4 file using the xarray Python library. Note that this file is saved with zlib compression level 1 and, therefore, users may experience a slowdown upon initial reading of the file.
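The concatenate-and-compress step could look roughly like the sketch below. It is not the authors' script: the yearly rasters are replaced by tiny synthetic arrays (real GeoTIFFs would be read with rioxarray), the variable name cdl and the helper save are hypothetical, and CF metadata is not handled. Only the 2010-2017 range and the zlib compression level of 1 come from the description.

```python
import numpy as np
import pandas as pd
import xarray as xr

years = list(range(2010, 2018))  # 2010-2017 inclusive
rng = np.random.default_rng(0)

# Synthetic stand-ins for the yearly rasters (made-up class codes on a 4 x 5 grid).
layers = [
    xr.DataArray(
        rng.integers(0, 255, size=(4, 5)).astype("uint8"),
        dims=("y", "x"),
        coords={"y": np.arange(4), "x": np.arange(5)},
        name="cdl",
    )
    for _ in years
]

# Concatenate along a new "year" dimension to form a single dataset.
cdl = xr.concat(layers, dim=pd.Index(years, name="year")).to_dataset()

# Per-variable encoding requesting zlib compression at level 1.
encoding = {"cdl": {"zlib": True, "complevel": 1}}


def save(path):
    """Write the dataset to NetCDF; needs a netCDF4-capable backend installed."""
    cdl.to_netcdf(path, encoding=encoding)
```

The write is wrapped in a function so the example runs without touching the file system; calling save with a file name writes the compressed file.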
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This database consists of a high-resolution village-level drought dataset for major Indian states for the past 43 years (1981 – 2022) for each month. It was created using the CHIRPS precipitation and GLEAM evapotranspiration datasets. The GLEAM dataset uses the well-recognised Priestley-Taylor equation to estimate potential evapotranspiration (PET) from observations of surface net radiation and near-surface air temperature. The SPEI was calculated on spatial grids of 5x5 km at the 3-month time scale, which is suitable for agricultural drought monitoring. This high-resolution SPEI dataset was integrated with Indian village boundaries and the associated census attribute dataset. This allows researchers to perform multi-disciplinary investigations, e.g., climate migration modelling, drought hazards, and exposure assessment. The dataset was developed with potential users in mind. Therefore, it can be integrated into a GIS system for visualization (using .mid/.mif format) and into Python programming for modelling and analysis (using .csv). For advanced analysis, I have also provided it in netCDF format, which can be read in Python using xarray or the netCDF4 library. More details are in the README.pdf file. Date Submitted: 2023-11-07 Issued: 2023-11-07
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains daily histograms of wind speed at 100 m ("WS100"), wind direction at 100 m ("WD100") and an atmospheric stability proxy ("STAB") derived from the ERA5 hourly data on single levels [1] accessed via the Copernicus Climate Change Service Climate Data Store [2]. The dataset covers six geographical regions (illustrated in regions.png) on a reduced 0.5 x 0.5 degrees regular grid and covers the period 1994 to 2023 (both years included). The dataset is packaged as one zip folder per region, each containing monthly zip folders that follow the convention of zarr ZipStores (more details here: https://zarr.readthedocs.io/en/stable/api/storage.html). Thus, the monthly zip folders are intended to be used with the xarray Python package (no unzipping of the monthly files needed).
Wind speed and wind direction are derived from the U- and V-components. The stability metric uses a 5-class classification scheme [3] based on the Obukhov length, which was computed following [4].
The following bins (left edges) have been used to create the histograms:
Wind speed: [0, 40) m/s (bin width 1 m/s)
Wind direction: [0, 360) deg (bin width 15 deg)
Stability: 5 discrete stability classes (1: very unstable, 2: unstable, 3: neutral, 4: stable, 5: very stable)
Main Purpose: The dataset serves as minimum input data for the CLIMatological REPresentative PERiods (climrepper) Python package (https://gitlab.windenergy.dtu.dk/climrepper/climrepper, in preparation for public release).
References:
[1] Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2023): ERA5 hourly data on single levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), DOI: 10.24381/cds.adbb2d47 (Accessed Nov. 2024)
[2] Copernicus Climate Change Service, Climate Data Store, (2023): ERA5 hourly data on single levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), DOI: 10.24381/cds.adbb2d47 (Accessed Nov. 2024)
[3] Holtslag, M. C., Bierbooms, W. A. A. M., & Bussel, G. J. W. van. (2014). Estimating atmospheric stability from observations and correcting wind shear models accordingly. In Journal of Physics: Conference Series (Vol. 555, p. 012052). IOP Publishing. https://doi.org/10.1088/1742-6596/555/1/012052
[4] Copernicus Knowledge Base, ERA5: How to calculate Obukhov Length, URL: https://confluence.ecmwf.int/display/CKB/ERA5:+How+to+calculate+Obukhov+Length, last accessed: Nov 2024
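For orientation, the derivation of wind speed and direction from U- and V-components and the binning into daily histograms might look like the sketch below. It is not the code used to build this dataset: the helper wind_from_components and the synthetic components are assumptions, and the meteorological convention (direction the wind blows from, 0 deg = north) is assumed rather than stated in the description. The bin edges follow the ranges and widths listed above.

```python
import numpy as np


def wind_from_components(u, v):
    """Wind speed (m/s) and meteorological direction (deg, direction the wind blows from)."""
    speed = np.hypot(u, v)
    direction = (180.0 + np.degrees(np.arctan2(u, v))) % 360.0
    return speed, direction


speed_edges = np.arange(0, 41, 1)        # [0, 40) m/s in 1 m/s bins
direction_edges = np.arange(0, 361, 15)  # [0, 360) deg in 15 deg bins

# Synthetic hourly U/V components for one day at one grid point (made-up values).
rng = np.random.default_rng(3)
u = rng.normal(5.0, 3.0, 24)
v = rng.normal(-2.0, 3.0, 24)

ws, wd = wind_from_components(u, v)
ws_hist, _ = np.histogram(ws, bins=speed_edges)
wd_hist, _ = np.histogram(wd, bins=direction_edges)
print(ws_hist.size, wd_hist.size)  # 40 24
```

Each daily histogram then holds 24 hourly counts spread over 40 speed bins or 24 direction bins.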
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supporting data for the paper "Satellite derived SO2 emissions from the relatively low-intensity, effusive 2021 eruption of Fagradalsfjall, Iceland" by Esse et al. The data files are in netCDF4 format, created using the Python xarray library. Each is a separate xarray Dataset.
2021-05-02_18403_Fagradalsfjall_results.nc contains the analysis results for TROPOMI orbit 18403 shown in Figure 2.
Fagradalsfjall_2021_emission_intensity.nc contains the SO2 emission intensity data shown in Figures 3, 4 and 5.
cloud_effective_altitude_difference.nc contains the daily cloud effective altitude difference shown in Figure 6.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is used as a "ground truth" for investigating the performance of a volumetric reconstruction technique of electric current densities, intended to be applied to the EISCAT 3D radar system. The technique is outlined in a manuscript in preparation, to be referred to here once submitted. The volumetric reconstruction code can be found here: https://github.com/jpreistad/e3dsecs
This dataset contains four files:
1) Dataset file 'gemini_dataset.nc'. This is a dump from the end of a GEMINI model run driven with a pair of up/down FAC above the region around the EISCAT 3D facility. Details of the GEMINI model can be found here: https://doi.org/10.5281/zenodo.3528915 . This is a NetCDF file, intended to be opened with xarray in Python:
import xarray
dataset = xarray.open_dataset('gemini_dataset.nc')
2) Grid file 'gemini_grid.h5'. This file is needed to get information about the grid that the values from GEMINI are represented in. The E3DSECS library (https://github.com/jpreistad/e3dsecs) has the necessary code to open this file and put it into the dictionary structure used in that package.
3) The GEMINI simulation config file 'config.nml' used to produce the simulation.
4) The GEMINI boundary file 'fac_said.py' used to produce the boundary conditions for the simulation.
Together files 3 and 4 could be used to reproduce the full simulation of the GEMINI model, which is freely available at https://github.com/gemini3d
The configuration files for this particular run are also available at this location:
https://github.com/gemini3d/gemini-examples/tree/main/init/aurora_curv
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
QLKNN11D training set
This dataset contains a large-scale run of ~1 billion flux calculations of the quasilinear gyrokinetic transport model QuaLiKiz. QuaLiKiz is applied in numerous tokamak integrated modelling suites, and is openly available at https://gitlab.com/qualikiz-group/QuaLiKiz/. This dataset was generated with the 'QLKNN11D-hyper' tag of QuaLiKiz, equivalent to 2.8.1 apart from the negative magnetic shear filter being disabled. See https://gitlab.com/qualikiz-group/QuaLiKiz/-/tags/QLKNN11D-hyper for the in-repository tag.
The dataset is appropriate for the training of learned surrogates of QuaLiKiz, e.g. with neural networks. See https://doi.org/10.1063/1.5134126 for a Physics of Plasmas publication illustrating the development of a learned surrogate (QLKNN10D-hyper) of an older version of QuaLiKiz (2.4.0) with a 300 million point 10D dataset. The paper is also available on arXiv https://arxiv.org/abs/1911.05617 and the older dataset on Zenodo https://doi.org/10.5281/zenodo.3497066. For an application example, see Van Mulders et al 2021 https://doi.org/10.1088/1741-4326/ac0d12, where QLKNN10D-hyper was applied for ITER hybrid scenario optimization. For any learned surrogates developed for QLKNN11D, the effective addition of the alphaMHD input dimension through rescaling the input magnetic shear (s) by s = s - alpha_MHD/2, as carried out in Van Mulders et al., is recommended.
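The recommended rescaling s = s - alpha_MHD/2 amounts to a one-line preprocessing step on the input magnetic shear before it is passed to a surrogate. A minimal sketch, assuming the shear and alpha_MHD values are available as scalars or NumPy-compatible arrays; the function and argument names are hypothetical and are not taken from the QuaLiKiz or QLKNN code bases:

```python
import numpy as np


def effective_shear(smag, alpha_mhd):
    """Rescale the magnetic shear s by s - alpha_MHD / 2.

    Hypothetical helper: `smag` and `alpha_mhd` are illustrative names
    for the magnetic shear and alpha_MHD inputs.
    """
    smag = np.asarray(smag, dtype=float)
    alpha_mhd = np.asarray(alpha_mhd, dtype=float)
    return smag - alpha_mhd / 2.0


# Example: three radial points with increasing alpha_MHD.
s_eff = effective_shear([1.0, 1.5, 2.0], [0.0, 0.4, 1.0])
```

The rescaled shear would then take the place of the original shear input of a surrogate trained on this dataset; all other inputs stay unchanged.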
Related repositories:
General QuaLiKiz documentation https://qualikiz.com
QuaLiKiz/QLKNN input/output variables naming scheme https://qualikiz.com/QuaLiKiz/Input-and-output-variables
Training, plotting, filtering, and auxiliary tools https://gitlab.com/Karel-van-de-Plassche/QLKNN-develop
QuaLiKiz related tools https://gitlab.com/qualikiz-group/QuaLiKiz-pythontools
FORTRAN QLKNN implementation with wrapper for Python and MATLAB https://gitlab.com/qualikiz-group/QLKNN-fortran
Weights and biases of 'hyperrectangle style' QLKNN https://gitlab.com/qualikiz-group/qlknn-hype
Data exploration
The data is provided in 43 netCDF files. We advise opening single datasets using xarray or multiple datasets out-of-core using dask. For reference, we give the load times and sizes of a single variable that just depends on the scan size dimx below. This was tested single-core on an Intel Xeon 8160 CPU at 2.1 GHz and 192 GB of DDR4 RAM. Note that during loading, more memory is needed than the final number.
Timing of dataset loading
Number of datasets | Final in-RAM memory (GiB) | Loading time single var (M:SS)
1 | 10.3 | 0:09
5 | 43.9 | 1:00
10 | 63.2 | 2:01
16 | 98.0 | 3:25
17 | Out Of Memory | x:xx
Full dataset
The full dataset of QuaLiKiz in-and-output data is available on request. Note that this is 2.2 TiB of netCDF files!