2 datasets found
  1. Data for: A new paradigm for medium-range severe weather forecasts:...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Jan 1, 2023
    Cite
    Aaron J. Hill; Russ S. Schumacher; Israel L. Jirak (2023). Data for: A new paradigm for medium-range severe weather forecasts: Probabilistic random forest-based predictions [Dataset]. http://doi.org/10.5061/dryad.c2fqz61cv
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 1, 2023
    Dataset provided by
    National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
    Colorado State University
    Authors
    Aaron J. Hill; Russ S. Schumacher; Israel L. Jirak
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Historical observations of severe weather and simulated severe weather environments (i.e., features) from the Global Ensemble Forecast System v12 (GEFSv12) Reforecast Dataset (GEFS/R) are used in conjunction to train and test random forest (RF) machine learning (ML) models to probabilistically forecast severe weather out to days 4–8. RFs are trained with ~9 years of the GEFS/R and severe weather reports to establish statistical relationships. Feature engineering is briefly explored to examine alternative methods for gathering features around observed events, including simplifying features using spatial averaging and increasing the GEFS/R ensemble size with time-lagging. Validated RF models are tested with ~1.5 years of real-time forecast output from the operational GEFSv12 ensemble and are evaluated alongside expert human-generated outlooks from the Storm Prediction Center (SPC). Both RF-based forecasts and SPC outlooks are skillful with respect to climatology at days 4 and 5 with diminishing skill thereafter. The RF-based forecasts exhibit tendencies to slightly underforecast severe weather events, but they tend to be well-calibrated at lower probability thresholds. Spatially averaging predictors during RF training allows for prior-day thermodynamic and kinematic environments to generate skillful forecasts, while time-lagging acts to expand the forecast areas, increasing resolution but decreasing overall skill. The results highlight the utility of ML-generated products to aid SPC forecast operations into the medium range.

    Methods

    These data include publicly available local storm reports (from NOAA), publicly available Storm Prediction Center (SPC) outlooks, and forecasts generated from the machine learning prediction system detailed in the manuscript. The local storm reports were retrieved from an online public-facing archive and gridded to NCEP grid 4.
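The report-gridding step can be sketched with simple 2-D binning. The report locations and the 1-degree regular grid below are illustrative assumptions, not the actual NCEP grid 4 definition or the NOAA archive data:

```python
import numpy as np

# Hypothetical storm-report locations (lat, lon); the real reports come
# from NOAA's public archive.
report_lats = np.array([35.2, 35.3, 41.0, 29.9])
report_lons = np.array([-97.5, -97.4, -88.1, -95.3])

# Illustrative 1-degree regular grid standing in for NCEP grid 4
# (the true grid is defined by NCEP; this spacing is an assumption).
lat_edges = np.arange(20.0, 56.0, 1.0)   # 35 latitude bins
lon_edges = np.arange(-130.0, -59.0, 1.0)  # 70 longitude bins

# Count reports falling in each grid box.
counts, _, _ = np.histogram2d(report_lats, report_lons,
                              bins=[lat_edges, lon_edges])

# Binary report-occurrence grid of the kind used as a verification target.
occurrence = (counts > 0).astype(np.int8)
```

Note that two reports close together (here, 35.2°N/-97.5°E and 35.3°N/-97.4°E) collapse into one occupied grid box, which is the usual behavior when verifying against gridded report occurrence rather than raw report counts.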
    The SPC outlooks were originally in a shapefile format, and ArcGIS was used to convert the shapefiles to a netCDF format. Then, the netCDF gridded SPC outlooks were regridded to NCEP grid 4 to conduct verification with local storm reports. Lastly, the machine learning-based forecasts are generated on the NCEP grid. Each of these datasets is then combined into a 'master' netCDF file for easy compression and storage. The master netCDF files additionally have metadata associated with the latitude and longitude points of the grid and forecast day strings.
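The regridding step can be sketched with nearest-neighbour index mapping. The authors' actual pipeline used ArcGIS for the shapefile-to-netCDF conversion; the grids and the 15% outlook contour below are purely illustrative assumptions:

```python
import numpy as np

# Hypothetical coarse outlook grid (a stand-in for the gridded SPC outlooks).
src_lats = np.linspace(25.0, 50.0, 26)
src_lons = np.linspace(-125.0, -65.0, 61)
outlook = np.zeros((src_lats.size, src_lons.size))
outlook[10:14, 20:30] = 0.15  # e.g. a 15% probability contour

# Hypothetical finer target grid standing in for NCEP grid 4.
dst_lats = np.linspace(25.0, 50.0, 51)
dst_lons = np.linspace(-125.0, -65.0, 121)

# Nearest-neighbour regridding: map each target point to its closest
# source index along each axis, then index the source field.
lat_idx = np.abs(src_lats[:, None] - dst_lats[None, :]).argmin(axis=0)
lon_idx = np.abs(src_lons[:, None] - dst_lons[None, :]).argmin(axis=0)
regridded = outlook[np.ix_(lat_idx, lon_idx)]
```

Nearest-neighbour mapping preserves the discrete outlook categories exactly (no interpolated in-between probabilities), which matters when the regridded field is verified against binary report occurrence.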

  2. Data from: Can ingredients based forecasting be learned? Disentangling a...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated May 6, 2024
    Cite
    Alexandra C. Mazurek; Aaron J. Hill; Russ S. Schumacher; Hanna J. McDaniel (2024). Can ingredients based forecasting be learned? Disentangling a random forest's severe weather predictions [Dataset]. http://doi.org/10.5061/dryad.0rxwdbs7w
    Explore at:
    Available download formats: zip
    Dataset updated
    May 6, 2024
    Dataset provided by
    Florida State University
    University of Oklahoma
    Colorado State University
    Authors
    Alexandra C. Mazurek; Aaron J. Hill; Russ S. Schumacher; Hanna J. McDaniel
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Machine learning (ML)-based models have been rapidly integrated into forecast practices across the weather forecasting community in recent years. While ML tools introduce additional data to forecasting operations, there is a need for explainability to be available alongside the model output, such that the guidance can be transparent and trustworthy for the forecaster. This work makes use of the tree interpreter (TI) algorithm to disaggregate the contributions of meteorological features used in the Colorado State University Machine Learning Probabilities (CSU-MLP) system, a random forest-based ML tool that produces real-time probabilistic forecasts for severe weather using inputs from the Global Ensemble Forecast System v12. TI feature contributions are analyzed in time and space for CSU-MLP day-2 and day-3 individual-hazard (tornado, wind, and hail) forecasts and day-4 aggregate severe forecasts over a 2-yr period. For individual forecast periods, this work demonstrates that feature contributions derived from TI can be interpreted in an ingredients-based sense, effectively making the CSU-MLP probabilities physically interpretable. When investigated in an aggregate sense, TI illustrates that the CSU-MLP system's predictions use meteorological inputs in ways that are consistent with the spatiotemporal patterns seen in meteorological fields that pertain to severe storms climatology. This work concludes with a discussion on how these insights could be beneficial for model development, real-time forecast operations, and retrospective event analysis.

    Methods

    Forecast data: These data include publicly available local storm reports (from NOAA), publicly available Storm Prediction Center (SPC) outlooks, and forecasts generated from the machine learning prediction system detailed in the manuscript. The local storm reports were retrieved from an online public-facing archive and gridded to NCEP grid 4.
    The SPC outlooks were originally in a shapefile format, and ArcGIS was used to convert the shapefiles to a netCDF format. Then, the netCDF gridded SPC outlooks were regridded to NCEP grid 4 to conduct verification with local storm reports. Lastly, the machine learning-based forecasts are generated on the NCEP grid. Each of these datasets is then combined into a 'master' netCDF file for each forecast lead time examined in the study (day 2, day 3, and day 4) for easy compression and storage. The master netCDF files additionally have metadata associated with the latitude and longitude points of the grid and forecast day strings. Forecasts span October 2020 through April 2023.

    Feature contributions: Feature contributions were calculated from the machine learning forecasts described above using the treeinterpreter package for Python. For each forecast day for a given lead time and hazard type (tornado, wind, hail, severe), feature contributions are calculated for all environmental predictors in the dataset (~6,600). For each grid point, the feature contributions are summed according to the spatial neighborhood described in the methods of this manuscript for dimensionality-reduction purposes. Thus, for a given forecast, the contributions have dimensions of environmental variable, forecast hour, latitude, and longitude. TI contributions corresponding to two years of machine learning forecasts (2021-2022) are combined into single netCDF files for each forecast hazard and lead time (i.e., 7 total files: day-2 tornado, wind, and hail; day-3 tornado, wind, and hail; and day-4 "any severe"). More details on the methods surrounding each of these datasets can be found in the methods section of the manuscript associated with this work.
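The dataset's contributions were produced with the treeinterpreter package; the decomposition it performs can be sketched directly from scikit-learn's tree internals. The snippet below is a minimal illustration on toy data, using a regressor for simplicity (treeinterpreter generalizes the same idea to classifiers like the CSU-MLP random forests); none of the variable names or data correspond to the actual CSU-MLP predictors:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tree_contributions(est, x):
    """Decompose one tree's prediction for sample x into a root-node bias
    plus per-feature contributions (the idea behind treeinterpreter)."""
    t = est.tree_
    contrib = np.zeros(x.shape[0])
    bias = t.value[0].ravel()[0]  # mean target at the root
    node = 0
    while t.children_left[node] != -1:  # descend until a leaf
        feat = t.feature[node]
        nxt = (t.children_left[node] if x[feat] <= t.threshold[node]
               else t.children_right[node])
        # Attribute the change in node mean along this split to its feature.
        contrib[feat] += t.value[nxt].ravel()[0] - t.value[node].ravel()[0]
        node = nxt
    return bias, contrib

# Toy data: the target depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Average the per-tree decompositions across the forest.
x = X[0]
biases, contribs = zip(*(tree_contributions(est, x) for est in rf.estimators_))
bias = float(np.mean(biases))
contrib = np.mean(contribs, axis=0)
pred = rf.predict(x.reshape(1, -1))[0]
# bias + summed contributions reconstructs the forest's prediction exactly
```

This additivity (prediction = bias + sum of feature contributions) is what lets the study read the CSU-MLP probabilities in an ingredients-based sense, with each environmental predictor's share of the forecast probability recoverable at every grid point.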


