7 datasets found
  1. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets

    • hydroshare.org
    • search.dataone.org
    zip
    Updated Oct 15, 2024
    + more versions
    Cite: Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Available download formats: zip (2.4 MB)
    Dataset updated: Oct 15, 2024
    Dataset provided by: HydroShare
    Authors: Young-Don Choi
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically.

    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio-based xarray extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF data as NetCDF. These procedures resulted in three HydroShare resources (HS 3, HS 4, and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
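
    As a rough illustration of the GeoTIFF-to-NetCDF step described above, here is a minimal Python sketch; the file names and metadata values are hypothetical, not taken from the resource:

        import rioxarray  # rasterio-backed xarray extension

        # Open a (hypothetical) state-scale GeoTIFF as an xarray DataArray;
        # rioxarray attaches the spatial dimensions, CRS, and transform.
        da = rioxarray.open_rasterio("state_les.tif")

        # Use xarray to attach metadata before export (placeholder values).
        da.attrs["title"] = "State-scale LES dataset"
        da.attrs["source"] = "merged and projected GeoTIFF tiles"

        # Save the raster as NetCDF.
        da.to_netcdf("state_les.nc")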

  2. Data from: Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022)

    • catalog.data.gov
    Updated Nov 11, 2025
    Cite: U.S. Geological Survey (2025). Multi-task Deep Learning for Water Temperature and Streamflow Prediction (ver. 1.1, June 2022) [Dataset]. https://catalog.data.gov/dataset/multi-task-deep-learning-for-water-temperature-and-streamflow-prediction-ver-1-1-june-2022
    Dataset updated: Nov 11, 2025
    Dataset provided by: United States Geological Survey, http://www.usgs.gov/
    Description

    This item contains data and code used in experiments that produced the results for Sadler et al. (2022) (see below for the full reference). We ran five experiments for the analysis: Experiment A, Experiment B, Experiment C, Experiment D, and Experiment AuxIn. Experiment A tested multi-task learning for predicting streamflow with 25 years of training data and a separate model for each of 101 sites. Experiment B tested multi-task learning for predicting streamflow with 25 years of training data and a single model for all 101 sites. Experiment C tested multi-task learning for predicting streamflow with just 2 years of training data. Experiment D tested multi-task learning for predicting water temperature with over 25 years of training data. Experiment AuxIn used water temperature as an input variable for predicting streamflow. These experiments and their results are described in detail in the WRR paper. Data from a total of 101 sites across the US were used for the experiments. The model input data and streamflow data were from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) dataset (Newman et al., 2014; Addor et al., 2017). The water temperature data were gathered from the National Water Information System (NWIS) (U.S. Geological Survey, 2016). The contents of this item are broken into 13 files or groups of files aggregated into zip files:

    1. input_data_processing.zip: A zip file containing the scripts used to collate the observations, input weather drivers, and catchment attributes for the multi-task modeling experiments.
    2. flow_observations.zip: A zip file containing collated daily streamflow data for the sites used in the multi-task modeling experiments. The streamflow data were originally accessed from the CAMELS dataset. The data are stored in csv and Zarr formats (see the loading sketch after the references below).
    3. temperature_observations.zip: A zip file containing collated daily water temperature data for the sites used in the multi-task modeling experiments. The data were originally accessed via NWIS. The data are stored in csv and Zarr formats.
    4. temperature_sites.geojson: GeoJSON file of the locations of the water temperature and streamflow sites used in the analysis.
    5. model_drivers.zip: A zip file containing the daily input weather driver data for the multi-task deep learning models. These data are from the Daymet drivers and were collated from the CAMELS dataset. The data are stored in csv and Zarr formats.
    6. catchment_attrs.csv: Catchment attributes collated from the CAMELS dataset. These data are used for the Random Forest modeling. For full metadata regarding these data, see the CAMELS dataset.
    7. experiment_workflow_files.zip: A zip file containing the workflow definitions used to run the multi-task deep learning experiments. These are Snakemake workflows. To run a given experiment (for example, Experiment A), one would run 'snakemake -s expA_Snakefile --configfile expA_config.yml'.
    8. river-dl-paper_v0.zip: A zip file containing the Python code used to run the multi-task deep learning experiments. This code was called by the Snakemake workflows contained in 'experiment_workflow_files.zip'.
    9. random_forest_scripts.zip: A zip file containing Python code and a Jupyter Notebook used to prepare data for, train, and visualize feature importance of a Random Forest model.
    10. plotting_code.zip: A zip file containing the Python code and Snakemake workflow used to produce figures showing the results of the multi-task deep learning experiments.
    11. results.zip: A zip file containing the results of the multi-task deep learning experiments. The results are stored in csv and NetCDF formats. The NetCDF files were used by the plotting libraries in 'plotting_code.zip'. These files cover the five experiments 'A', 'B', 'C', 'D', and 'AuxIn'; the experiment name is shown in the file name.
    12. sample_scripts.zip: A zip file containing scripts for creating sample output to demonstrate how the modeling workflow was executed.
    13. sample_output.zip: A zip file containing sample output data. Similar files are created by running the sample scripts provided.
    Newman, A., K. Sampson, M. P. Clark, A. Bock, R. J. Viger, and D. Blodgett, 2014. A large-sample watershed-scale hydrometeorological dataset for the contiguous USA. Boulder, CO: UCAR/NCAR. https://dx.doi.org/10.5065/D6MW2F4D

    Addor, N., A. Newman, M. Mizukami, and M. P. Clark, 2017. Catchment attributes for large-sample studies. Boulder, CO: UCAR/NCAR. https://doi.org/10.5065/D6G73C3Q

    Sadler, J. M., Appling, A. P., Read, J. S., Oliver, S. K., Jia, X., Zwart, J. A., & Kumar, V. (2022). Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resources Research, 58(4), e2021WR030138. https://doi.org/10.1029/2021WR030138

    U.S. Geological Survey, 2016, National Water Information System data available on the World Wide Web (USGS Water Data for the Nation), accessed Dec. 2020.
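
    The collated observations and drivers above ship in both csv and Zarr formats; as a minimal loading sketch (the store, column, and variable names here are hypothetical, since the actual layout is documented inside the zip files):

        import pandas as pd
        import xarray as xr

        # CSV copy of the collated daily streamflow observations.
        flow_df = pd.read_csv("flow_observations.csv", parse_dates=["date"])

        # Zarr copy of the same data; xarray reads Zarr stores natively.
        flow_ds = xr.open_zarr("flow_observations.zarr")

        # Subset one of the 101 CAMELS sites by gauge ID (placeholder value)
        # and summarize its discharge record.
        site = flow_ds.sel(site_id="01013500")
        print(float(site["discharge"].mean()))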

  3. CMAQ Grid Mask Files for 12km CONUS - US States and NOAA Climate Regions

    • dataverse-staging.rdmc.unc.edu
    • datasearch.gesis.org
    Updated Dec 12, 2019
    Cite: UNC Dataverse (2019). CMAQ Grid Mask Files for 12km CONUS - US States and NOAA Climate Regions [Dataset]. http://doi.org/10.15139/S3/XDYYB9
    Dataset updated: Dec 12, 2019
    Dataset provided by: UNC Dataverse
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically.

    Area covered: United States
    Description

    Data Summary: US states grid mask file and NOAA climate regions grid mask file, both compatible with the 12US1 modeling grid domain.

    Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    These files can be used with CMAQ-ISAMv5.3 to track state- or region-specific emissions. See Chapter 11 and Appendix B.4 in the CMAQ User's Guide for further information on how to use the ISAM control file with GRIDMASK files. The files can also be used for state- or region-specific scaling of emissions using the CMAQv5.3 DESID module. See the DESID Tutorial and Appendix B.4 in the CMAQ User's Guide for further information on how to use the Emission Control File to scale emissions in predetermined geographical areas.

    File Location and Download Instructions: Link to GRIDMASK files; link to README text file with information on how these files were created.

    File Format: The grid masks are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

    File Descriptions: These GRIDMASK files can be used with the 12US1 modeling grid domain (grid origin x = -2556000 m, y = -1728000 m; N columns = 459, N rows = 299).

    GRIDMASK_STATES_12US1.nc - This file contains 49 variables for the 48 states in the conterminous U.S. plus DC. Each state variable (e.g., AL, AZ, AR) is a 2D array (299 x 459) providing the fractional area of each grid cell that falls within that state.

    GRIDMASK_CLIMATE_REGIONS_12US1.nc - This file contains 9 variables for the 9 NOAA climate regions based on the Karl and Koss (1984) definition of climate regions. Each climate region variable (e.g., CLIMATE_REGION_1, CLIMATE_REGION_2) is a 2D array (299 x 459) providing the fractional area of each grid cell that falls within that climate region.

    NOAA climate regions:
    CLIMATE_REGION_1: Northwest (OR, WA, ID)
    CLIMATE_REGION_2: West (CA, NV)
    CLIMATE_REGION_3: West North Central (MT, WY, ND, SD, NE)
    CLIMATE_REGION_4: Southwest (UT, AZ, NM, CO)
    CLIMATE_REGION_5: South (KS, OK, TX, LA, AR, MS)
    CLIMATE_REGION_6: Central (MO, IL, IN, KY, TN, OH, WV)
    CLIMATE_REGION_7: East North Central (MN, IA, WI, MI)
    CLIMATE_REGION_8: Northeast (MD, DE, NJ, PA, NY, CT, RI, MA, VT, NH, ME) + Washington, D.C.*
    CLIMATE_REGION_9: Southeast (VA, NC, SC, GA, AL, FL)

    *Note that Washington, D.C. is not included in any of the climate regions on the website but was included with the "Northeast" region for the generation of this GRIDMASK file.
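    As a minimal sketch of applying one of these masks from Python (the file and variable names follow the description above; the emissions array is a made-up stand-in):

        import numpy as np
        import xarray as xr

        # I/O API files are plain NetCDF underneath, so xarray can open them.
        masks = xr.open_dataset("GRIDMASK_STATES_12US1.nc")

        # Fractional coverage of North Carolina; I/O API variables carry
        # (TSTEP, LAY, ROW, COL) dimensions, so squeeze down to (299, 459).
        nc_frac = masks["NC"].squeeze()

        # Hypothetical gridded emissions on the same 12US1 domain.
        emis = np.random.rand(299, 459)

        # State total = grid-cell emissions weighted by fractional area.
        nc_total = float((emis * nc_frac.values).sum())
        print(f"NC-apportioned emissions: {nc_total:.2f}")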

  4. CMAQ Model Version 5.1 Output Data -- 2013 CONUS_12km

    • dataverse-staging.rdmc.unc.edu
    Updated Apr 18, 2019
    + more versions
    Cite: UNC Dataverse (2019). CMAQ Model Version 5.1 Output Data -- 2013 CONUS_12km [Dataset]. http://doi.org/10.15139/S3/FQO7IS
    Dataset updated: Apr 18, 2019
    Dataset provided by: UNC Dataverse
    License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically.

    Description

    Data Summary: Community Multiscale Air Quality (CMAQ) Model Version 5.1 output data from a 01/01/2013 - 12/31/2013 CONUS simulation.

    Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions: The 2013 model output are available in two forms. The hourly datasets are a set of monthly files with surface-layer hourly concentrations for a model domain that encompasses the contiguous U.S. with a horizontal grid resolution of 12 km x 12 km. The daily average dataset is a single file with a year of daily average data for the same domain. Links to the hourly data, the daily average data, and the download instructions are provided in the metadata.

    File Format: The 2013 model output are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The output files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

    Model Variables

    Variable names in hourly data files (Variable Name, Units, Variable Description):
    CO, ppb, carbon monoxide
    NO, ppb, nitric oxide
    NO2, ppb, nitrogen dioxide
    O3, ppb, ozone
    SO2, ppb, sulfur dioxide
    SO2_UGM3, micrograms/m^3, sulfur dioxide
    AECIJ, micrograms/m^3, aerosol elemental carbon (sum of i-mode and j-mode)*
    AOCIJ, micrograms/m^3, aerosol organic carbon (sum of i-mode and j-mode)*
    ANO3IJ, micrograms/m^3, aerosol nitrate (sum of i-mode and j-mode)*
    TNO3, micrograms/m^3, total nitrate = NO3 (ANO3IJ) + nitric acid (HNO3)
    ANH4IJ, micrograms/m^3, aerosol ammonium (sum of i-mode and j-mode)*
    ASO4IJ, micrograms/m^3, aerosol sulfate (sum of i-mode and j-mode)*
    PMIJ**, micrograms/m^3, total fine particulate matter (sum of i-mode and j-mode)*
    PM10**, micrograms/m^3, total particulate matter (sum of i-mode, j-mode, and k-mode)*

    Variable names in daily data files (note: all daily averages are computed using Local Standard Time, LST):
    CO_AVG, ppb, 24-hr average carbon monoxide
    NO_AVG, ppb, 24-hr average nitric oxide
    NO2_AVG, ppb, 24-hr average nitrogen dioxide
    O3_AVG, ppb, 24-hr average ozone
    O3_MDA8, ppb, maximum daily 8-hr average ozone+
    SO2_AVG, ppb, 24-hr average sulfur dioxide
    SO2_UGM3_AVG, micrograms/m^3, 24-hr average sulfur dioxide
    AECIJ_AVG, micrograms/m^3, 24-hr average aerosol elemental carbon (sum of i-mode and j-mode)*
    AOCIJ_AVG, micrograms/m^3, 24-hr average aerosol organic carbon (sum of i-mode and j-mode)*
    ANO3IJ_AVG, micrograms/m^3, 24-hr average aerosol nitrate (sum of i-mode and j-mode)*
    TNO3_AVG, micrograms/m^3, 24-hr average total nitrate = NO3 (ANO3IJ) + nitric acid (HNO3)
    ANH4IJ_AVG, micrograms/m^3, 24-hr average aerosol ammonium (sum of i-mode and j-mode)*
    ASO4IJ_AVG, micrograms/m^3, 24-hr average aerosol sulfate (sum of i-mode and j-mode)*
    PMIJ_AVG**, micrograms/m^3, 24-hr average total fine particulate matter (sum of i-mode and j-mode)*
    PM10_AVG**, micrograms/m^3, 24-hr average total particulate matter (sum of i-mode, j-mode, and k-mode)*

    + The calculation of the MDA8 O3 variable is based on the current ozone NAAQS and is derived from the highest of the 17 consecutive 8-hr averages beginning with the 8-hr period from 7:00 am to 3:00 pm LST and ending with the 8-hr period from 11:00 pm to 7:00 am the following day.

    * CMAQ represents PM using three interacting lognormal distributions, or modes. Two modes, Aitken (i-mode) and accumulation (j-mode), are generally less than 2.5 microns in diameter, while the coarse mode (k-mode) contains significant amounts of mass above 2.5 microns.

    ** Note that modeled size distributions can also be used to output PM species that represent the aerosol mass falling below a specific diameter, e.g., 2.5 um or 10 um. The output variables based on this sharp cut-off method are typically very similar to the aggregate PMIJ (i+j mode) and PM10 (i+j+k mode) variables included in these files. Further information on particle size-composition distributions in CMAQv5.0 can be found in Nolte et al. (2015), https://doi.org/10.5194/gmd-8-2877-2015.

    Simulation Settings and Inputs:
    CMAQ Model: model version 5.1; bi-directional NH3 air-surface exchange: Massad formulation; chemical mechanism: CB05e51; aerosol module: aero6.
    Domain: Continental U.S. (CONUS) using a 12 km grid size and a Lambert Conformal projection assuming a spherical earth with radius 6370.0 km.
    Vertical Resolution: 35 layers from the surface to the top of the free troposphere, with layer 1 nominally 19 m tall.
    Boundary Condition Inputs: hourly values from a 2013 simulation of GEOS-Chem v9-01-02 with GEOS-5 meteorology inputs.
    Emissions Inputs: anthropogenic emissions from emissions inventory label 2013ej, the 2011 modeling platform version 6.2 with 2013 updates for fires, mobile sources, and mobile source (link...
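
    As an informal illustration of the MDA8 definition in the footnote above, here is a pandas sketch (the hourly series is synthetic, and real NAAQS processing adds data-completeness rules not shown here):

        import numpy as np
        import pandas as pd

        # Synthetic hourly surface ozone in LST, in ppb (two days of data).
        rng = pd.date_range("2013-07-01 00:00", periods=48, freq="h")
        o3 = pd.Series(50 + 20 * np.sin(np.arange(48) / 24.0 * 2 * np.pi), index=rng)

        # 8-hr running means, relabeled by the start hour of each window.
        avg8 = o3.rolling(window=8).mean().shift(-7)

        # Keep the 17 consecutive windows starting 7:00 LST through 23:00 LST.
        starts = avg8.between_time("07:00", "23:00")

        # MDA8 = the highest of those 17 window averages on each day.
        mda8 = starts.resample("D").max()
        print(mda8)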

  5. CMAQ Model Version 5.3 Input Data -- 1/1/2016 - 12/31/2016 12km CONUS

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite: US EPA (2020). CMAQ Model Version 5.3 Input Data -- 1/1/2016 - 12/31/2016 12km CONUS [Dataset]. http://doi.org/10.15139/S3/MHNUNE
    Dataset updated: Jan 22, 2020
    Dataset provided by: Odum Institute Dataverse Network
    Authors: US EPA
    Description

    Data Summary:

    CMAQv5.3 input data for a 01/01/2016 - 12/31/2016 simulation over the Continental US. Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions:

    File Format:

    The 2016 model input data are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).
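
    Because this File Format note also applies to the datasets in the entries below, here is a minimal sketch of inspecting an I/O API header from Python (the file name is a placeholder; the attribute names are standard I/O API global attributes):

        import xarray as xr

        # An I/O API file is plain NetCDF; skip time decoding because
        # I/O API keeps its timestamps in its own TFLAG variable.
        ds = xr.open_dataset("CCTM_sample.nc", decode_times=False)

        # Grid and projection metadata live in the global attributes.
        for key in ("GDTYP", "XORIG", "YORIG", "XCELL", "YCELL",
                    "NCOLS", "NROWS", "NLAYS"):
            print(key, ds.attrs.get(key))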

  6. Hemispheric CMAQ Model Version 5.3beta Output Data – 2016 seasonally averaged 108km for N. Hemisphere

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite: US EPA (2020). Hemispheric CMAQ Model Version 5.3beta Output Data – 2016 seasonally averaged 108km for N. Hemisphere [Dataset]. http://doi.org/10.15139/S3/QJDYWO
    Dataset updated: Jan 22, 2020
    Dataset provided by: Odum Institute Dataverse Network
    Authors: US EPA
    Description

    Data Summary:

    CMAQv5.3 output data for a 2016 seasonal simulation over the Northern Hemisphere. Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions:

    File Format:

    The 2016 model output data are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The output files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

  7. MCIP Version 4.3 output based on WRF Version 3.8 -- 1/2015-12/2015 Continental US_12km

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite: EPA (2020). MCIP Version 4.3 output based on WRF Version 3.8 -- 1/2015-12/2015 Continental US_12km [Dataset]. http://doi.org/10.15139/S3/KISNXI
    Dataset updated: Jan 22, 2020
    Dataset provided by: Odum Institute Dataverse Network
    Authors: EPA
    Area covered: United States
    Description

    Data Summary:

    MCIPv4.3 output data from a January to December 2015 simulation over the Continental US based on WRFv3.8. Note: The datasets are on a Google Drive. The metadata associated with this DOI contain the link to the Google Drive folder and instructions for downloading the data.

    File Location and Download Instructions:

    File Format:

    The 2015 model output data are stored as NetCDF-formatted files using I/O API data structures (https://www.cmascenter.org/ioapi/). Information on the model projection and grid structure is contained in the header of the NetCDF file. The output files can be opened and manipulated using I/O API utilities (e.g., M3XTRACT, M3WNDW) or other software that can read and write NetCDF-formatted files (e.g., Fortran, R, Python).

  Not seeing a result you expected? Learn how you can add new datasets to our index.
