7 datasets found
  1. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Oct 15, 2024
    + more versions
    Cite: Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Available download formats: zip (2.4 MB)
    Dataset updated: Oct 15, 2024
    Dataset provided by: HydroShare
    Authors: Young-Don Choi
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio-based xarray extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF data as NetCDF. These procedures resulted in three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
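
    As a rough illustration of the GeoTIFF-to-NetCDF step described above, here is a minimal Python sketch; the file names and metadata values are hypothetical, not taken from the resource:

        import rioxarray  # rasterio-backed xarray extension

        # Open a (hypothetical) state-scale GeoTIFF as an xarray DataArray;
        # rioxarray attaches the spatial dimensions, CRS, and transform.
        da = rioxarray.open_rasterio("state_les.tif")

        # Use xarray to attach metadata before export (placeholder values).
        da.attrs["title"] = "State-scale LES dataset"
        da.attrs["source"] = "merged and projected GeoTIFF tiles"

        # Save the raster as NetCDF.
        da.to_netcdf("state_les.nc")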

  2. Data from: Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022)

    • catalog.data.gov
    Updated Nov 11, 2025
    Cite: U.S. Geological Survey (2025). Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022) [Dataset]. https://catalog.data.gov/dataset/multi-task-deep-learning-for-water-temperature-and-streamflow-prediction-ver-1-1-june-2022
    Dataset updated: Nov 11, 2025
    Dataset provided by: United States Geological Survey, http://www.usgs.gov/
    Description

    This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for the full reference). We ran five experiments for the analysis: Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and a separate model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US were used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al., 2014; Addor et al., 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:

    1. input_data_processing.zip: A zip file containing the scripts used to collate the observations, input weather drivers, and catchment attributes for the multi-task modeling experiments.
    2. flow_observations.zip: A zip file containing collated daily streamflow data for the sites used in the multi-task modeling experiments. The streamflow data were originally accessed from the CAMELS dataset. The data are stored in csv and Zarr formats (see the loading sketch after the references below).
    3. temperature_observations.zip: A zip file containing collated daily water temperature data for the sites used in the multi-task modeling experiments. The data were originally accessed via NWIS. The data are stored in csv and Zarr formats.
    4. temperature_sites.geojson: GeoJSON file of the locations of the water temperature and streamflow sites used in the analysis.
    5. model_drivers.zip: A zip file containing the daily input weather driver data for the multi-task deep learning models. These data are from the Daymet drivers and were collated from the CAMELS dataset. The data are stored in csv and Zarr formats.
    6. catchment_attrs.csv: Catchment attributes collated from the CAMELS dataset. These data are used for the Random Forest modeling. For full metadata regarding these data, see the CAMELS dataset.
    7. experiment_workflow_files.zip: A zip file containing the workflow definitions used to run the multi-task deep learning experiments. These are Snakemake workflows. To run a given experiment (for example, Experiment A), one would run 'snakemake -s expA_Snakefile --configfile expA_config.yml'.
    8. river-dl-paper_v0.zip: A zip file containing the Python code used to run the multi-task deep learning experiments. This code was called by the Snakemake workflows contained in 'experiment_workflow_files.zip'.
    9. random_forest_scripts.zip: A zip file containing Python code and a Jupyter Notebook used to prepare data for, train, and visualize feature importance of a Random Forest model.
    10. plotting_code.zip: A zip file containing the Python code and Snakemake workflow used to produce figures showing the results of the multi-task deep learning experiments.
    11. results.zip: A zip file containing the results of the multi-task deep learning experiments. The results are stored in csv and NetCDF formats. The NetCDF files were used by the plotting libraries in 'plotting_code.zip'. These files cover the five experiments 'A', 'B', 'C', 'D', and 'AuxIn'; the experiment name is shown in the file name.
    12. sample_scripts.zip: A zip file containing scripts for creating sample output to demonstrate how the modeling workflow was executed.
    13. sample_output.zip: A zip file containing sample output data. Similar files are created by running the sample scripts provided.
    Newman, A., K. Sampson, M. P. Clark, A. Bock, R. J. Viger, and D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D

    Addor, N., A. Newman, M. Mizukami, and M. P. Clark, 2017. Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. https://doi.org/10.5065/D6G73C3Q

    Sadler, J. M., Appling, A. P., Read, J. S., Oliver, S. K., Jia, X., Zwart, J. A., & Kumar, V. (2022). Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resources Research, 58(4), e2021WR030138. https://doi.org/10.1029/2021WR030138

    U.S. Geological Survey, 2016, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed Dec. 2020.
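
    The collated observations and drivers above ship in both csv and Zarr formats; as a minimal loading sketch (the store, column, and variable names here are hypothetical, since the actual layout is documented inside the zip files):

        import pandas as pd
        import xarray as xr

        # CSV copy of the collated daily streamflow observations.
        flow_df = pd.read_csv("flow_observations.csv", parse_dates=["date"])

        # Zarr copy of the same data; xarray reads Zarr stores natively.
        flow_ds = xr.open_zarr("flow_observations.zarr")

        # Subset one of the 101 CAMELS sites by gauge ID (placeholder value)
        # and summarize its discharge record.
        site = flow_ds.sel(site_id="01013500")
        print(float(site["discharge"].mean()))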

  3. CMAQ Grid Mask Files for 12km CONUS - US States and NOAA Climate Regions

    • dataverse-staging.rdmc.unc.edu
    • datasearch.gesis.org
    Updated Dec 12, 2019
    Cite: UNC Dataverse (2019). CMAQ Grid Mask Files for 12km CONUS - US States and NOAA Climate Regions [Dataset]. http://doi.org/10.15139/S3/XDYYB9
    Dataset updated: Dec 12, 2019
    Dataset provided by: UNC Dataverse
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically.

    Area covered: United States
    Description

    Data Summary: US states grid mask file and NOAA climate regions grid mask file, both compatible with the 12US1 modeling grid domain.

    Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    These files can be used with CMAQ-ISAMv5.3 to track state- or region-specific emissions. See Chapter 11 and Appendix B.4 in the CMAQ User's Guide for further information on how to use the ISAM control file with GRIDMASK files. The files can also be used for state- or region-specific scaling of emissions using the CMAQv5.3 DESID module. See the DESID Tutorial and Appendix B.4 in the CMAQ User's Guide for further information on how to use the Emission Control File to scale emissions in predetermined geographical areas.

    File Location and Download Instructions: Link to GRIDMASK files; link to README text file with information on how these files were created.

    File Format: The grid masks are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

    File Descriptions: These GRIDMASK files can be used with the 12US1 modeling grid domain (grid origin x = -2556000 m, y = -1728000 m; N columns = 459, N rows = 299).

    GRIDMASK_STATES_12US1.nc - This file contains 49 variables for the 48 states in the conterminous U.S. plus DC. Each state variable (e.g., AL, AZ, AR) is a 2D array (299 x 459) providing the fractional area of each grid cell that falls within that state.

    GRIDMASK_CLIMATE_REGIONS_12US1.nc - This file contains 9 variables for the 9 NOAA climate regions based on the Karl and Koss (1984) definition of climate regions. Each climate region variable (e.g., CLIMATE_REGION_1, CLIMATE_REGION_2) is a 2D array (299 x 459) providing the fractional area of each grid cell that falls within that climate region.

    NOAA climate regions:
    CLIMATE_REGION_1: Northwest (OR, WA, ID)
    CLIMATE_REGION_2: West (CA, NV)
    CLIMATE_REGION_3: West North Central (MT, WY, ND, SD, NE)
    CLIMATE_REGION_4: Southwest (UT, AZ, NM, CO)
    CLIMATE_REGION_5: South (KS, OK, TX, LA, AR, MS)
    CLIMATE_REGION_6: Central (MO, IL, IN, KY, TN, OH, WV)
    CLIMATE_REGION_7: East North Central (MN, IA, WI, MI)
    CLIMATE_REGION_8: Northeast (MD, DE, NJ, PA, NY, CT, RI, MA, VT, NH, ME) + Washington, D.C.*
    CLIMATE_REGION_9: Southeast (VA, NC, SC, GA, AL, FL)

    *Note that Washington, D.C. is not included in any of the climate regions on the website but was included with the "Northeast" region for the generation of this GRIDMASK file.
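    As a minimal sketch of applying one of these masks from Python (the file and variable names follow the description above; the emissions array is a made-up stand-in):

        import numpy as np
        import xarray as xr

        # I/O API files are plain NetCDF underneath, so xarray can open them.
        masks = xr.open_dataset("GRIDMASK_STATES_12US1.nc")

        # Fractional coverage of North Carolina; I/O API variables carry
        # (TSTEP, LAY, ROW, COL) dimensions, so squeeze down to (299, 459).
        nc_frac = masks["NC"].squeeze()

        # Hypothetical gridded emissions on the same 12US1 domain.
        emis = np.random.rand(299, 459)

        # State total = grid-cell emissions weighted by fractional area.
        nc_total = float((emis * nc_frac.values).sum())
        print(f"NC-apportioned emissions: {nc_total:.2f}")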

  4. CMAQ Model Version 5.1 Output Data -- 2013 CONUS_12km

    • dataverse-staging.rdmc.unc.edu
    Updated Apr 18, 2019
    + more versions
    Cite: UNC Dataverse (2019). CMAQ Model Version 5.1 Output Data -- 2013 CONUS_12km [Dataset]. http://doi.org/10.15139/S3/FQO7IS
    Dataset updated: Apr 18, 2019
    Dataset provided by: UNC Dataverse
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically.

    Description

    Data Summary: Community Multiscale Air Quality (CMAQ) Model Version 5.1 output data from a 01/01/2013 - 12/31/2013 CONUS simulation.

    Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions: The 2013 model output are available in two forms. The hourly datasets are a set of monthly files with surface-layer hourly concentrations for a model domain that encompasses the contiguous U.S. with a horizontal grid resolution of 12 km x 12 km. The daily average dataset is a single file with a year of daily average data for the same domain. Links to the hourly data, the daily average data, and the download instructions are provided in the metadata.

    File Format: The 2013 model output are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The output files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

    Model Variables

    Variable names in hourly data files (Variable Name, Units, Variable Description):
    CO, ppb, carbon monoxide
    NO, ppb, nitric oxide
    NO2, ppb, nitrogen dioxide
    O3, ppb, ozone
    SO2, ppb, sulfur dioxide
    SO2_UGM3, micrograms/m^3, sulfur dioxide
    AECIJ, micrograms/m^3, aerosol elemental carbon (sum of i-mode and j-mode)*
    AOCIJ, micrograms/m^3, aerosol organic carbon (sum of i-mode and j-mode)*
    ANO3IJ, micrograms/m^3, aerosol nitrate (sum of i-mode and j-mode)*
    TNO3, micrograms/m^3, total nitrate = NO3 (ANO3IJ) + nitric acid (HNO3)
    ANH4IJ, micrograms/m^3, aerosol ammonium (sum of i-mode and j-mode)*
    ASO4IJ, micrograms/m^3, aerosol sulfate (sum of i-mode and j-mode)*
    PMIJ**, micrograms/m^3, total fine particulate matter (sum of i-mode and j-mode)*
    PM10**, micrograms/m^3, total particulate matter (sum of i-mode, j-mode, and k-mode)*

    Variable names in daily data files (note: all daily averages are computed using Local Standard Time, LST):
    CO_AVG, ppb, 24-hr average carbon monoxide
    NO_AVG, ppb, 24-hr average nitric oxide
    NO2_AVG, ppb, 24-hr average nitrogen dioxide
    O3_AVG, ppb, 24-hr average ozone
    O3_MDA8, ppb, maximum daily 8-hr average ozone+
    SO2_AVG, ppb, 24-hr average sulfur dioxide
    SO2_UGM3_AVG, micrograms/m^3, 24-hr average sulfur dioxide
    AECIJ_AVG, micrograms/m^3, 24-hr average aerosol elemental carbon (sum of i-mode and j-mode)*
    AOCIJ_AVG, micrograms/m^3, 24-hr average aerosol organic carbon (sum of i-mode and j-mode)*
    ANO3IJ_AVG, micrograms/m^3, 24-hr average aerosol nitrate (sum of i-mode and j-mode)*
    TNO3_AVG, micrograms/m^3, 24-hr average total nitrate = NO3 (ANO3IJ) + nitric acid (HNO3)
    ANH4IJ_AVG, micrograms/m^3, 24-hr average aerosol ammonium (sum of i-mode and j-mode)*
    ASO4IJ_AVG, micrograms/m^3, 24-hr average aerosol sulfate (sum of i-mode and j-mode)*
    PMIJ_AVG**, micrograms/m^3, 24-hr average total fine particulate matter (sum of i-mode and j-mode)*
    PM10_AVG**, micrograms/m^3, 24-hr average total particulate matter (sum of i-mode, j-mode, and k-mode)*

    + The calculation of the MDA8 O3 variable is based on the current ozone NAAQS and is derived from the highest of the 17 consecutive 8-hr averages beginning with the 8-hr period from 7:00 am to 3:00 pm LST and ending with the 8-hr period from 11:00 pm to 7:00 am the following day.

    * CMAQ represents PM using three interacting lognormal distributions, or modes. Two modes, Aitken (i-mode) and accumulation (j-mode), are generally less than 2.5 microns in diameter, while the coarse mode (k-mode) contains significant amounts of mass above 2.5 microns.

    ** Note that modeled size distributions can also be used to output PM species that represent the aerosol mass falling below a specific diameter, e.g., 2.5 um or 10 um. The output variables based on this sharp cut-off method are typically very similar to the aggregate PMIJ (i+j mode) and PM10 (i+j+k mode) variables included in these files. Further information on particle size-composition distributions in CMAQv5.0 can be found in Nolte et al. (2015), https://doi.org/10.5194/gmd-8-2877-2015.

    Simulation Settings and Inputs:
    CMAQ Model: model version 5.1; bi-directional NH3 air-surface exchange: Massad formulation; chemical mechanism: CB05e51; aerosol module: aero6.
    Domain: Continental U.S. (CONUS) using a 12 km grid size and a Lambert Conformal projection assuming a spherical earth with radius 6370.0 km.
    Vertical Resolution: 35 layers from the surface to the top of the free troposphere, with layer 1 nominally 19 m tall.
    Boundary Condition Inputs: hourly values from a 2013 simulation of GEOS-Chem v9-01-02 with GEOS-5 meteorology inputs.
    Emissions Inputs: anthropogenic emissions from emissions inventory label 2013ej, the 2011 modeling platform version 6.2 with 2013 updates for fires, mobile sources, and mobile source (link...
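
    As an informal illustration of the MDA8 definition in the footnote above, here is a pandas sketch (the hourly series is synthetic, and real NAAQS processing adds data-completeness rules not shown here):

        import numpy as np
        import pandas as pd

        # Synthetic hourly surface ozone in LST, in ppb (two days of data).
        rng = pd.date_range("2013-07-01 00:00", periods=48, freq="h")
        o3 = pd.Series(50 + 20 * np.sin(np.arange(48) / 24.0 * 2 * np.pi), index=rng)

        # 8-hr running means, relabeled by the start hour of each window.
        avg8 = o3.rolling(window=8).mean().shift(-7)

        # Keep the 17 consecutive windows starting 7:00 LST through 23:00 LST.
        starts = avg8.between_time("07:00", "23:00")

        # MDA8 = the highest of those 17 window averages on each day.
        mda8 = starts.resample("D").max()
        print(mda8)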

  5. CMAQ Model Version 5.3 Input Data -- 1/1/2016 - 12/31/2016 12km CONUS

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite: US EPA (2020). CMAQ Model Version 5.3 Input Data -- 1/1/2016 - 12/31/2016 12km CONUS [Dataset]. http://doi.org/10.15139/S3/MHNUNE
    Dataset updated: Jan 22, 2020
    Dataset provided by: Odum Institute Dataverse Network
    Authors: US EPA
    Description

    Data Summary:

    CMAQv5.3 input data for a 01/01/2016 - 12/31/2016 simulation over the Continental US. Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions:

    File Format:

    The 2016 model input data are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).
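
    Because this File Format note also applies to the datasets in the entries below, here is a minimal sketch of inspecting an I/O API header from Python (the file name is a placeholder; the attribute names are standard I/O API global attributes):

        import xarray as xr

        # An I/O API file is plain NetCDF; skip time decoding because
        # I/O API keeps its timestamps in its own TFLAG variable.
        ds = xr.open_dataset("CCTM_sample.nc", decode_times=False)

        # Grid and projection metadata live in the global attributes.
        for key in ("GDTYP", "XORIG", "YORIG", "XCELL", "YCELL",
                    "NCOLS", "NROWS", "NLAYS"):
            print(key, ds.attrs.get(key))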

  6. Hemispheric CMAQ Model Version 5.3beta Output Data – 2016 seasonally averaged 108km for N. Hemisphere

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite: US EPA (2020). Hemispheric CMAQ Model Version 5.3beta Output Data – 2016 seasonally averaged 108km for N. Hemisphere [Dataset]. http://doi.org/10.15139/S3/QJDYWO
    Dataset updated: Jan 22, 2020
    Dataset provided by: Odum Institute Dataverse Network
    Authors: US EPA
    Description

    Data Summary:

    CMAQv5.3 output data for a 2016 seasonal simulation over the Northern Hemisphere. Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions:

    File Format:

    The 2016 model output data are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The output files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

  7. MCIP Version 4.3 output based on WRF Version 3.8 -- 1/2015-12/2015 Continental US_12km

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite: EPA (2020). MCIP Version 4.3 output based on WRF Version 3.8 -- 1/2015-12/2015 Continental US_12km [Dataset]. http://doi.org/10.15139/S3/KISNXI
    Dataset updated: Jan 22, 2020
    Dataset provided by: Odum Institute Dataverse Network
    Authors: EPA
    Area covered: United States
    Description

    Data Summary:

    MCIPv4.3 output data from a January to December 2015 simulation over the Continental US based on WRFv3.8. Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions:

    File Format:

    The 2015 model output data are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The output files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

  Not seeing a result you expected? Learn how you can add new datasets to our index.
