2 datasets found
  1. Data for: A new paradigm for medium-range severe weather forecasts:...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Jan 1, 2023
    Cite
    Aaron J. Hill; Russ S. Schumacher; Israel L. Jirak (2023). Data for: A new paradigm for medium-range severe weather forecasts: Probabilistic random forest-based predictions [Dataset]. http://doi.org/10.5061/dryad.c2fqz61cv
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Colorado State University
    Authors
    Aaron J. Hill; Russ S. Schumacher; Israel L. Jirak
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Historical observations of severe weather and simulated severe weather environments (i.e., features) from the Global Ensemble Forecast System v12 (GEFSv12) Reforecast Dataset (GEFS/R) are used in conjunction to train and test random forest (RF) machine learning (ML) models to probabilistically forecast severe weather out to days 4–8. RFs are trained with ~9 years of the GEFS/R and severe weather reports to establish statistical relationships. Feature engineering is briefly explored to examine alternative methods for gathering features around observed events, including simplifying features using spatial averaging and increasing the GEFS/R ensemble size with time-lagging. Validated RF models are tested with ~1.5 years of real-time forecast output from the operational GEFSv12 ensemble and are evaluated alongside expert human-generated outlooks from the Storm Prediction Center (SPC). Both RF-based forecasts and SPC outlooks are skillful with respect to climatology at days 4 and 5 with diminishing skill thereafter. The RF-based forecasts exhibit tendencies to slightly underforecast severe weather events, but they tend to be well-calibrated at lower probability thresholds. Spatially averaging predictors during RF training allows for prior-day thermodynamic and kinematic environments to generate skillful forecasts, while time-lagging acts to expand the forecast areas, increasing resolution but decreasing overall skill. The results highlight the utility of ML-generated products to aid SPC forecast operations into the medium range.

    Methods

    These data include publicly available local storm reports (from NOAA), publicly available Storm Prediction Center (SPC) outlooks, and forecasts generated from the machine learning prediction system detailed in the manuscript. The local storm reports were retrieved from an online public-facing archive and gridded to NCEP grid 4.
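The report-gridding step can be sketched with simple 2-D binning. The report locations and the 1-degree regular grid below are illustrative assumptions, not the actual NCEP grid 4 definition or the NOAA archive data:

```python
import numpy as np

# Hypothetical storm-report locations (lat, lon); the real reports come
# from NOAA's public archive.
report_lats = np.array([35.2, 35.3, 41.0, 29.9])
report_lons = np.array([-97.5, -97.4, -88.1, -95.3])

# Illustrative 1-degree regular grid standing in for NCEP grid 4
# (the true grid is defined by NCEP; this spacing is an assumption).
lat_edges = np.arange(20.0, 56.0, 1.0)   # 35 latitude bins
lon_edges = np.arange(-130.0, -59.0, 1.0)  # 70 longitude bins

# Count reports falling in each grid box.
counts, _, _ = np.histogram2d(report_lats, report_lons,
                              bins=[lat_edges, lon_edges])

# Binary report-occurrence grid of the kind used as a verification target.
occurrence = (counts > 0).astype(np.int8)
```

Note that two reports close together (here, 35.2°N/-97.5°E and 35.3°N/-97.4°E) collapse into one occupied grid box, which is the usual behavior when verifying against gridded report occurrence rather than raw report counts.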
    The SPC outlooks were originally in a shapefile format, and ArcGIS was used to convert the shapefiles to a netCDF format. Then, the netCDF gridded SPC outlooks were regridded to NCEP grid 4 to conduct verification with local storm reports. Lastly, the machine learning-based forecasts are generated on the NCEP grid. Each of these datasets is then combined into a 'master' netCDF file for easy compression and storage. The master netCDF files additionally have metadata associated with the latitude and longitude points of the grid and forecast day strings.
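The regridding step can be sketched with nearest-neighbour index mapping. The authors' actual pipeline used ArcGIS for the shapefile-to-netCDF conversion; the grids and the 15% outlook contour below are purely illustrative assumptions:

```python
import numpy as np

# Hypothetical coarse outlook grid (a stand-in for the gridded SPC outlooks).
src_lats = np.linspace(25.0, 50.0, 26)
src_lons = np.linspace(-125.0, -65.0, 61)
outlook = np.zeros((src_lats.size, src_lons.size))
outlook[10:14, 20:30] = 0.15  # e.g. a 15% probability contour

# Hypothetical finer target grid standing in for NCEP grid 4.
dst_lats = np.linspace(25.0, 50.0, 51)
dst_lons = np.linspace(-125.0, -65.0, 121)

# Nearest-neighbour regridding: map each target point to its closest
# source index along each axis, then index the source field.
lat_idx = np.abs(src_lats[:, None] - dst_lats[None, :]).argmin(axis=0)
lon_idx = np.abs(src_lons[:, None] - dst_lons[None, :]).argmin(axis=0)
regridded = outlook[np.ix_(lat_idx, lon_idx)]
```

Nearest-neighbour mapping preserves the discrete outlook categories exactly (no interpolated in-between probabilities), which matters when the regridded field is verified against binary report occurrence.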

  2. Data from: Can ingredients based forecasting be learned? Disentangling a...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated May 6, 2024
    Cite
    Alexandra C. Mazurek; Aaron J. Hill; Russ S. Schumacher; Hanna J. McDaniel (2024). Can ingredients based forecasting be learned? Disentangling a random forest's severe weather predictions [Dataset]. http://doi.org/10.5061/dryad.0rxwdbs7w
    Explore at:
    Available download formats: zip
    Dataset updated
    May 6, 2024
    Dataset provided by
    Florida State University
    University of Oklahoma
    Colorado State University
    Authors
    Alexandra C. Mazurek; Aaron J. Hill; Russ S. Schumacher; Hanna J. McDaniel
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Machine learning (ML)-based models have been rapidly integrated into forecast practices across the weather forecasting community in recent years. While ML tools introduce additional data to forecasting operations, there is a need for explainability to be available alongside the model output, such that the guidance can be transparent and trustworthy for the forecaster. This work makes use of the tree interpreter (TI) algorithm to disaggregate the contributions of meteorological features used in the Colorado State University Machine Learning Probabilities (CSU-MLP) system, a random forest-based ML tool that produces real-time probabilistic forecasts for severe weather using inputs from the Global Ensemble Forecast System v12. TI feature contributions are analyzed in time and space for CSU-MLP day-2 and day-3 individual-hazard (tornado, wind, and hail) forecasts and day-4 aggregate severe forecasts over a 2-yr period. For individual forecast periods, this work demonstrates that feature contributions derived from TI can be interpreted in an ingredients-based sense, effectively making the CSU-MLP probabilities physically interpretable. When investigated in an aggregate sense, TI illustrates that the CSU-MLP system's predictions use meteorological inputs in ways that are consistent with the spatiotemporal patterns seen in meteorological fields that pertain to severe storms climatology. This work concludes with a discussion on how these insights could be beneficial for model development, real-time forecast operations, and retrospective event analysis.

    Methods

    Forecast data: These data include publicly available local storm reports (from NOAA), publicly available Storm Prediction Center (SPC) outlooks, and forecasts generated from the machine learning prediction system detailed in the manuscript. The local storm reports were retrieved from an online public-facing archive and gridded to NCEP grid 4.
    The SPC outlooks were originally in a shapefile format, and ArcGIS was used to convert the shapefiles to a netCDF format. Then, the netCDF gridded SPC outlooks were regridded to NCEP grid 4 to conduct verification with local storm reports. Lastly, the machine learning-based forecasts are generated on the NCEP grid. Each of these datasets is then combined into a 'master' netCDF file for each forecast lead time examined in the study (day 2, day 3, and day 4) for easy compression and storage. The master netCDF files additionally have metadata associated with the latitude and longitude points of the grid and forecast day strings. Forecasts span October 2020 through April 2023.

    Feature contributions: Feature contributions were calculated from the machine learning forecasts described above using the treeinterpreter package for Python. For each forecast day for a given lead time and hazard type (tornado, wind, hail, severe), feature contributions are calculated for all environmental predictors in the dataset (~6,600). For each grid point, the feature contributions are summed according to the spatial neighborhood described in the methods of this manuscript for dimensionality-reduction purposes. Thus, for a given forecast, the contributions have dimensions of environmental variable, forecast hour, latitude, and longitude. TI contributions corresponding to two years of machine learning forecasts (2021-2022) are combined into single netCDF files for each forecast hazard and lead time (i.e., 7 total files: day-2 tornado, wind, and hail; day-3 tornado, wind, and hail; and day-4 "any severe"). More details on the methods surrounding each of these datasets can be found in the methods section of the manuscript associated with this work.
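The dataset's contributions were produced with the treeinterpreter package; the decomposition it performs can be sketched directly from scikit-learn's tree internals. The snippet below is a minimal illustration on toy data, using a regressor for simplicity (treeinterpreter generalizes the same idea to classifiers like the CSU-MLP random forests); none of the variable names or data correspond to the actual CSU-MLP predictors:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tree_contributions(est, x):
    """Decompose one tree's prediction for sample x into a root-node bias
    plus per-feature contributions (the idea behind treeinterpreter)."""
    t = est.tree_
    contrib = np.zeros(x.shape[0])
    bias = t.value[0].ravel()[0]  # mean target at the root
    node = 0
    while t.children_left[node] != -1:  # descend until a leaf
        feat = t.feature[node]
        nxt = (t.children_left[node] if x[feat] <= t.threshold[node]
               else t.children_right[node])
        # Attribute the change in node mean along this split to its feature.
        contrib[feat] += t.value[nxt].ravel()[0] - t.value[node].ravel()[0]
        node = nxt
    return bias, contrib

# Toy data: the target depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Average the per-tree decompositions across the forest.
x = X[0]
biases, contribs = zip(*(tree_contributions(est, x) for est in rf.estimators_))
bias = float(np.mean(biases))
contrib = np.mean(contribs, axis=0)
pred = rf.predict(x.reshape(1, -1))[0]
# bias + summed contributions reconstructs the forest's prediction exactly
```

This additivity (prediction = bias + sum of feature contributions) is what lets the study read the CSU-MLP probabilities in an ingredients-based sense, with each environmental predictor's share of the forecast probability recoverable at every grid point.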


