Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Met Office UK Deterministic Dataset (Zarr Format)
Description
This dataset is a subset of the Met Office UK Deterministic Dataset, converted from the original NetCDF format into Zarr format for modern data analysis. The Zarr files are packaged as tar archives for efficient storage and transfer. The subset focuses on specific variables and configurations, which are detailed in the met_office_uk_data_config.yaml file included in this repository. Researchers and… See the full description on the dataset page: https://huggingface.co/datasets/jcamier/met-office-uk-deterministic-zarr.
This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for full reference). We ran five experiments for the analysis: Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and using a different model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and using a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US were used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al. 2014; Addor et al. 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:
The North America CORDEX (NA-CORDEX) dataset contains regional climate change scenario data and guidance for North America, for use in impacts, decision-making, and climate science. This dataset contains output from ... regional climate models (RCMs) run over a domain covering most of North America using boundary conditions from global climate model (GCM) simulations in the CMIP5 archive. These simulations run from 1950 to 2100 with a spatial resolution of 0.22 degree (25 km) or 0.44 degree (50 km). This version of the data is the same as the AWS S3 version. It includes selected variables converted to the Zarr format from the original NetCDF. Only daily data are currently available; all daily data were mapped to the standard calendar.
The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from meteorological retrospective datasets. The output frequency and fields available in this historical NWM dataset differ from those contained in the real-time operational NWM forecast model. Additionally, note that no streamflow or other data assimilation is performed within any of the NWM retrospective simulations.
One application of this dataset is to provide historical context to current near real-time streamflow, soil moisture and snowpack conditions. The retrospective data can be used to infer flow frequencies and perform temporal analyses with hourly streamflow output and 3-hourly land surface output. This dataset can also be used in the development of end user applications which require a long baseline of data for system training or verification purposes.
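For example, flow-frequency estimates can be built from the hourly streamflow output with a few lines of xarray. The sketch below is illustrative only: the store path, variable name ("streamflow"), and dimension names ("feature_id", "time") are assumptions, not the published schema.

import numpy as np
import xarray as xr

# Illustrative names; check the actual NWM retrospective schema.
ds = xr.open_zarr("nwm_retrospective.zarr")        # hypothetical store location
flow = ds["streamflow"].sel(feature_id=101)        # one river reach

annual_max = flow.resample(time="YS").max()        # annual-maximum series

# Empirical exceedance probabilities via Weibull plotting positions:
# P(exceedance) = rank / (n + 1), with the largest flow at rank 1.
sorted_max = np.sort(annual_max.values)[::-1]
exceedance = np.arange(1, sorted_max.size + 1) / (sorted_max.size + 1)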
Details for Each Version of the NWM Retrospective Output
CONUS Domain - CONUS retrospective output is provided by all four versions of the NWM
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset comprises ECMWF ERA5-Land data covering 2014 to October 2022. The data are on a 0.1 degree grid and have fewer variables than the standard ERA5 reanalysis, but at a higher resolution. All data were downloaded as NetCDF files from the Copernicus Data Store, converted to Zarr using xarray, and then uploaded here. Each file is one day and holds 24 timesteps.
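The conversion described above amounts to a short xarray recipe. A minimal sketch, with placeholder file names:

import xarray as xr

# One placeholder day of ERA5-Land data (24 hourly timesteps).
ds = xr.open_dataset("era5_land_2022-10-01.nc")
ds.to_zarr("era5_land_2022-10-01.zarr", mode="w")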
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains results from the Python Coastal Impacts and Adaptation Model (pyCIAM), along with the inputs and source code necessary to replicate those outputs and the results presented in Depsky et al. 2023.
All zipped Zarr stores can be downloaded and accessed locally or can be directly accessed via code similar to the following:
from fsspec.implementations.zip import ZipFileSystem
import xarray as xr

# url_of_file_in_record is a placeholder for the URL of a zipped Zarr store in this record.
ds = xr.open_zarr(ZipFileSystem(url_of_file_in_record).get_mapper())
File Inventory
Products
pyCIAM_outputs.zarr.zip: Outputs of the pyCIAM model, using the SLIIDERS dataset to define socioeconomic and extreme sea level characteristics of coastal regions and the 17th, 50th, and 83rd quantiles of local sea level rise as projected by various modeling frameworks (LocalizeSL and FACTS) and for multiple emissions scenarios and ice sheet models.
pyCIAM_outputs_{case}.nc: A NetCDF version of pyCIAM_outputs, in which the files are divided by adaptation "case" to reduce file size.
diaz2016_outputs.zarr.zip: A replication of the results from Diaz 2016, the model upon which pyCIAM was built, using a configuration identical to that of the original model.
suboptimal_capital_by_movefactor.zarr.zip: An analysis of the observed present-day allocation of capital compared to a "rational" allocation, as a function of the magnitude of non-market costs of relocation assumed in the model. See Depsky et al. 2023 for further details.
Inputs
ar5-msl-rel-2005-quantiles.zarr.zip: Quantiles of local sea level rise as projected by the LocalizeSL model, using a variety of temperature scenarios and ice sheet models developed in Kopp 2014, Bamber 2019, DeConto 2021, and IPCC SROCC. The results contained in pyCIAM_outputs.zarr.zip cover a broader (and newer) range of SLR projections from a more recent projection framework (FACTS); however, those data are more easily obtained from the appropriate Zenodo records and thus are not hosted in this one.
diaz2016_inputs_raw.zarr.zip: The coastal inputs used in Diaz 2016, obtained from GitHub and formatted for use in the Python-based pyCIAM. These are based on the Dynamic Integrated Vulnerability Assessment (DIVA) dataset.
surge-lookup-seg(_adm).zarr.zip: Pre-computed lookup tables estimating average annual losses from extreme sea levels due to mortality and capital stock damage. This is an intermediate output of pyCIAM and is not necessary to replicate the model results. However, it is more time-consuming to produce than the rest of the model and is provided for users who may wish to start from the pre-computed dataset. Two versions are provided: the first contains estimates for each unique intersection of ~50 km coastal segment and state/province-level administrative unit (admin-1), derived from the characteristics in SLIIDERS. The second is estimated on a version of SLIIDERS collapsed over administrative units to vary only over coastal segments. Both are used in the process of running pyCIAM.
ypk_2000_2100.zarr.zip: An intermediate output in creating SLIIDERS that contains country-level projections of GDP, capital stock, and population, based on the Shared Socioeconomic Pathways (SSPs). This is only used in normalizing costs estimated in pyCIAM by country and global GDP to report in Depsky et al. 2023. It is not used in the execution of pyCIAM but is provided to replicate results reported in the manuscript.
Source Code
pyCIAM.zip: Contains the python-CIAM package as well as a notebook-based workflow to replicate the results presented in Depsky et al. 2023. It also contains two master shell scripts (run_example.sh and run_full_replication.sh) to assist in executing a small sample of the pyCIAM model or in fully executing the workflow of Depsky et al. 2023, respectively. This code is consistent with release 1.2.0 in the pyCIAM GitHub repository and is available as version 1.2.0 of the python-CIAM package on PyPI.
Version history:
1.2
- Point data-acquisition.ipynb to the updated Zenodo deposit that fixes the dtype of the subsets variable in diaz2016_inputs_raw.zarr.zip to be bool rather than int8
- Variable name bugfix in data-acquisition.ipynb
- Add NetCDF versions of SLIIDERS and the pyCIAM results to upload-zenodo.ipynb
- Update results in Zenodo record to use SLIIDERS v1.2

1.1.1
- Bugfix to inputs/diaz2016_inputs_raw.zarr.zip to make the subsets variable bool instead of int8

1.1.0
- Version associated with publication of Depsky et al., 2023
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GEOGLOWS is the Group on Earth Observations' Global Water Sustainability Program. It coordinates efforts from public and private entities to make application-ready river data more accessible and sustainably available to underdeveloped regions. The GEOGLOWS Hydrological Model provides a retrospective and daily forecast of global river discharge at 7 million river sub-basins. The stream network is a hydrologically conditioned subset of the TDX-Hydro streams and basins data produced by the United States' National Geospatial-Intelligence Agency. The daily forecast provides 3-hourly average discharge in a 51-member ensemble with a 15-day lead time, derived from the ECMWF Integrated Forecast System (IFS). The retrospective simulation is derived from ERA5 climate reanalysis data and provides daily average streamflow beginning on 1 January 1940. New forecasts are uploaded daily and the retrospective simulation is updated weekly on Sundays to keep the lag time between 5 and 12 days.
The geoglows-v2 bucket contains: (1) model configuration files used to generate the simulations, (2) the GIS streams datasets used by the model, (3) the GIS streams datasets optimized for visualizations used by Esri's Living Atlas layer, and (4) several supporting tables of metadata, including country names, river names, and hydrological properties used for modeling.
The geoglows-v2-forecasts bucket contains: (1) daily 15-day forecasts in Zarr format optimized for time series queries of all ensemble members in the prediction, and (2) CSV-formatted summary files optimized for producing time-series animated web maps for the entire global streams dataset.
The geoglows-v2-retrospective bucket contains: (1) the model retrospective outputs, (1a) in Zarr format optimized for time series queries of up to a few hundred rivers on demand and (1b) in NetCDF format best for bulk downloading the dataset; (2) estimated return period flows for all 7 million rivers, (2a) in Zarr format optimized for reading subsets of the dataset and (2b) in NetCDF format best for bulk downloading; and (3) the initialization files produced at the end of each incremental simulation, useful for restarting the model from a specific date.
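As a rough illustration, the retrospective Zarr store can be queried directly from S3 with xarray. The bucket name comes from the description above, but the store key ("retrospective.zarr"), variable name ("Qout"), dimension name ("river_id"), and river ID below are assumptions.

import s3fs
import xarray as xr

# Anonymous access to the public bucket named above.
fs = s3fs.S3FileSystem(anon=True)
# Store key, variable, and dimension names are assumptions, not the published schema.
store = fs.get_mapper("geoglows-v2-retrospective/retrospective.zarr")
ds = xr.open_zarr(store)
series = ds["Qout"].sel(river_id=110251090).to_series()  # hypothetical river ID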
The NA-CORDEX dataset contains output from high-resolution regional climate models run over North America using boundary conditions from global simulations in the CMIP5 archive. The subset of the NA-CORDEX data on AWS (data volume ~15 TB) includes daily data from 1950-2100 for impacts-relevant variables on a 0.25 degree or 0.50 degree common lat-lon grid. This data is freely available on AWS S3 thanks to the AWS Open Data Sponsorship Program and the Amazon Sustainability Data Initiative, which provide free storage and egress. The data on AWS is stored in Zarr format. This format supports the same data model as netCDF and is well suited to object storage and distributed computing in the cloud using the Pangeo libraries in Python. An Intake-ESM catalog listing all available data can be found at https://ncar-na-cordex.s3-us-west-2.amazonaws.com/catalogs/aws-na-cordex.json. The full dataset (data volume ~35 TB) can be accessed for download or via web services on the NCAR Climate Data Gateway at https://www.earthsystemgrid.org/search/cordexsearch.html.
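As a rough illustration, that catalog can be opened with the intake-esm package. The search keys below follow common Intake-ESM conventions and are assumptions; inspect the catalog's columns for the real ones.

import intake

# Open the Intake-ESM catalog listed above (public, anonymous access).
cat = intake.open_esm_datastore(
    "https://ncar-na-cordex.s3-us-west-2.amazonaws.com/catalogs/aws-na-cordex.json"
)
# Search keys ("variable", "scenario") are assumptions; check cat.df.columns.
subset = cat.search(variable="tmax", scenario="rcp85")
dsets = subset.to_dataset_dict(storage_options={"anon": True})  # dict of xarray Datasets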
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Climate Resilience Information System (CRIS) provides data and tools for developers of climate services. This image service provides access to gridded historical observations for 16 threshold values of precipitation for the contiguous United States for 1950-2013. These services are intended to support analysis of climate exposure for custom geographies and time horizons. More details on how the data were processed can be found in Understanding CRIS Data.
Time Ranges: Pixel values for each variable were calculated for each year from 1950 to 2013.
Variable Definitions: See the variable list and definitions here.
Additional Services: Two versions of the gridded historical observations are available from CRIS. nClimGrid is a 4-km resolution dataset generated by NOAA; it was used to downscale the STAR-ESDM climate projections in CRIS. Livneh is a 6-km resolution dataset generated by Livneh et al.; it was used to downscale the LOCA2 climate projections in CRIS.
Using the Imagery Layer: The ArcGIS Tiled Imagery Service has a multidimensional structure, a data cube with variable and time dimensions. Methods for accessing the different dimensions will depend on the software/client being used. For more details, please see the CRIS Developer’s Hub along with this instructional StoryMap. To run analysis, first use the multidimensional tools Aggregate or Subset in ArcGIS Pro to copy the necessary data locally.
Data Export: Data export is enabled on the services if using an ArcGIS client. NetCDF or Zarr files are also available from the NOAA Open Data Distribution system on Amazon Web Services.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CAFE60 hindcast/forecast datasets were submitted to the WMO decadal forecasting activity in 2020, allowing the CSIRO Climate Science Centre to become a Global Producing Centre for Annual-to-Decadal Climate Prediction (GPC-ADCP) of the World Meteorological Organisation (WMO) Lead Centre for Annual-to-Decadal Climate Prediction. This is an update that includes only the hindcasts/forecasts for 2020. As a variation from the first submission (https://doi.org/10.25919/cngc-hs12), this dataset includes all 96 members. Variables tos (sea surface temperature) and tas (2 m surface air temperature) are also included; although they are not currently used for the model exchange, they support important applications such as model behaviour checking and data validation. Lineage: The CAFE60 model was run on the NCI (gadi) and/or Pawsey (magnus) supercomputers, generating forecasts that were partly downloaded to the CSIRO (pearcey) and/or NCI (gadi) cluster. Standardised NetCDF files were created from the raw model products (in zipped Zarr format) using the CMOR3 Python libraries to form this dataset. These files share the common "CMOR" version name v20210102, the approximate date on which they were created.
The objective of the Coupled Model Intercomparison Project (CMIP) is to better understand past, present and future climate changes arising from natural, unforced variability or in response to changes in radiative forcing in a multi-model context. This understanding includes assessments of model performance during the historical period and quantifications of the causes of the spread in future projections. Idealized experiments are also used to increase understanding of the model responses. In addition to these long time scale responses, experiments are performed to investigate the predictability of the climate system on various time and space scales, as well as making predictions from observed climate states. See the World Climate Research Programme for more details. The Google Cloud CMIP6 data are derived from the original CMIP6 data files, as distributed via the Earth System Grid Federation (ESGF). Consistent with the CMIP6 terms of use, some modifications have been made to render the data more analysis-ready, including concatenation of time slices and conversion from NetCDF to Zarr format. All relevant metadata, including information about how to cite, are provided in the Zarr metadata. The CMIP6 data hosted on Google Cloud are maintained by the Climate Data Science Lab at Lamont-Doherty Earth Observatory (LDEO) of Columbia University, as part of the Pangeo Project. Transferring CMIP6 from ESGF into Google Cloud is ongoing, and only a fraction of the full CMIP6 archive is currently available. Users may request new data to be added via this form. This public dataset is hosted in Google Cloud Storage and available free to use. Use this quick start guide to learn more.
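As a rough illustration, an individual Zarr store in the public bucket can be opened with gcsfs and xarray. The store path below is a placeholder; real paths are listed in the catalog maintained by the Pangeo project.

import gcsfs
import xarray as xr

# Anonymous access to the public Google Cloud Storage bucket.
fs = gcsfs.GCSFileSystem(token="anon")
# Placeholder path; substitute a real store path from the catalog.
store = fs.get_mapper("cmip6/<path-to-zarr-store>")
ds = xr.open_zarr(store, consolidated=True)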
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This record includes the Sea Level Impacts Input Dataset by Elevation, Region, and Scenario (SLIIDERS) dataset. It also includes source code to generate this product as well as necessary inputs that are not available for download elsewhere. Both the dataset and the source code are consistent with version 1.2. Note: The version associated with Depsky et al., 2023 is v1.1.
The zipped SLIIDERS Zarr store can be downloaded and accessed locally or can be directly accessed via code similar to the following:
from fsspec.implementations.zip import ZipFileSystem
import xarray as xr

# url_of_file_in_record is a placeholder for the URL of the zipped Zarr store in this record.
ds = xr.open_zarr(ZipFileSystem(url_of_file_in_record).get_mapper())
File Inventory
Products
sliiders-v1.2.zarr.zip: SLIIDERS. A global dataset containing 18 socioeconomic variables, reflecting present-day socioeconomic and geophysical characteristics of 11,980 coastal regions and projecting capital stock, GDP, and population growth trajectories through 2100 for five SSPs and two economic growth models. These variables are used as inputs to the pyCIAM modeling platform detailed in Depsky et al. 2023.
sliiders-v1.2.nc: Same as the original SLIIDERS dataset, but in NetCDF format.
Inputs
All provided inputs are manually created or adjusted points used to create the coastline segments of SLIIDERS:
ciam_segment_pts_manual_adds.parquet: A list of segment points manually added to those that come from the extreme sea level model CoDEC (Muis et al. 2020)
gtsm_stations_ciam_ne_coastline_snapped.parquet: Stations from CoDEC snapped to coastlines from Natural Earth
gtsm_stations_eur_tothin.parquet: A list of European points in CoDEC to thin. CoDEC provides ~10km resolution in Europe and ~50km elsewhere. For consistency, SLIIDERS uses ~50km spacing for its coastal segments globally.
Source Code
sliiders-1.2.zip: The source code used to generate SLIIDERS v1.2. See the READMEs within this code for more details. This is consistent with release v1.2 of the code maintained on GitHub at https://github.com/ClimateImpactLab/SLIIDERS
The Community Earth System Model (CESM) Large Ensemble Numerical Simulation (LENS) dataset includes a 40-member ensemble of climate simulations for the period 1920-2100 using historical data (1920-2005) or assuming the RCP8.5 greenhouse gas concentration scenario (2006-2100), as well as longer control runs based on pre-industrial conditions. The data comprise both surface (2D) and volumetric (3D) variables in the atmosphere, ocean, land, and ice domains. The total data volume of the original dataset is ~500 TB, traditionally stored as ~150,000 individual CF/NetCDF files on disk or magnetic tape and made available through the NCAR Climate Data Gateway for download or via web services. NCAR has copied a subset (currently ~70 TB) of CESM LENS data to Amazon S3 as part of the AWS Public Datasets Program. To optimize for large-scale analytics, we have represented the data as ~275 Zarr stores accessible through the Python xarray library. Each Zarr store contains a single physical variable for a given model run type and temporal frequency (monthly, daily, 6-hourly).
The NA-CORDEX dataset contains regional climate change scenario data and guidance for North America, for use in impacts, decision-making, and climate science. The NA-CORDEX data archive contains output from regional climate models (RCMs) run over a domain covering most of North America using boundary conditions from global climate model (GCM) simulations in the CMIP5 archive. These simulations run from 1950–2100 with a spatial resolution of 0.22°/25km or 0.44°/50km. This AWS S3 version of the data includes selected variables converted to Zarr format from the original NetCDF. Only daily data are currently available; all daily data were mapped to the Gregorian calendar. Sub-daily data may be added later. Both raw and bias-corrected data are available. Further details about this version of the dataset are available at the documentation link below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Climate Resilience Information System (CRIS) provides data and tools for developers of climate services. This image service provides access to gridded historical observations for 16 threshold values of precipitation for the contiguous United States for 1950-2023. These services are intended to support analysis of climate exposure for custom geographies and time horizons. More details on how the data were processed can be found in Understanding CRIS Data.
Time Ranges: Pixel values for each variable were calculated for each year from 1950 to 2023.
Variable Definitions: See the variable list and definitions here.
Additional Services: Two versions of the gridded historical observations are available from CRIS. nClimGrid is a 4-km resolution dataset generated by NOAA; it was used to downscale the STAR-ESDM climate projections in CRIS. Livneh is a 6-km resolution dataset generated by Livneh et al.; it was used to downscale the LOCA2 climate projections in CRIS.
Using the Imagery Layer: The ArcGIS Tiled Imagery Service has a multidimensional structure, a data cube with variable and time dimensions. Methods for accessing the different dimensions will depend on the software/client being used. For more details, please see the CRIS Developer’s Hub along with this instructional StoryMap. To run analysis, first use the multidimensional tools Aggregate or Subset in ArcGIS Pro to copy the necessary data locally.
Data Export: Data export is enabled on the services if using an ArcGIS client. NetCDF or Zarr files are also available from the NOAA Open Data Distribution system on Amazon Web Services.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Climate Resilience Information System (CRIS) provides data and tools for developers of climate services. This image service provides access to gridded historical observations for 27 threshold values of temperature for the contiguous United States for 1950-2023. These services are intended to support analysis of climate exposure for custom geographies and time horizons. More details on how the data were processed can be found in Understanding CRIS Data.
Time Ranges: Pixel values for each variable were calculated for each year from 1950 to 2023.
Variable Definitions: See the variable list and definitions here.
Additional Services: Two versions of the gridded historical observations are available from CRIS. nClimGrid is a 4-km resolution dataset generated by NOAA; it was used to downscale the STAR-ESDM climate projections in CRIS. Livneh is a 6-km resolution dataset generated by Livneh et al.; it was used to downscale the LOCA2 climate projections in CRIS.
Using the Imagery Layer: The ArcGIS Tiled Imagery Service has a multidimensional structure, a data cube with variable and time dimensions. Methods for accessing the different dimensions will depend on the software/client being used. For more details, please see the CRIS Developer’s Hub along with this instructional StoryMap. To run analysis, first use the multidimensional tools Aggregate or Subset in ArcGIS Pro to copy the necessary data locally.
Data Export: Data export is enabled on the services if using an ArcGIS client. NetCDF or Zarr files are also available from the NOAA Open Data Distribution system on Amazon Web Services.