Brightband is releasing Extreme Weather Bench (EWB), a benchmark suite for evaluating AI and traditional weather forecasting models on high-impact events. Verifying AI and traditional models on common ground is crucial. EWB lets users compare models side by side and dig into each event category to see what works well, when, and what needs improvement.
Brightband and a worldwide community of verification experts have developed a set of high-impact weather case studies, along with community-driven metrics, to compare AI weather prediction (AIWP) and numerical weather prediction (NWP) models. Extreme weather events such as tornado outbreaks and heatwaves affect the population differently and should be evaluated using metrics appropriate to the event. EWB provides data and metrics for each category, chosen to answer specific questions about that type of event. For example, the average landfall error matters for tropical cyclones, while for a heat event, errors in temperature and onset time can dramatically affect how well people prepare.
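To make the two example metrics concrete, here is a minimal Python sketch. It is not EWB's code or API. The function names, the input layout, the threshold-based definition of heat onset, and the choice to measure landfall error as great-circle distance are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_landfall_error_km(cases):
    """Average landfall position error over a list of cases.

    Each case is a hypothetical tuple:
    (forecast_lat, forecast_lon, observed_lat, observed_lon).
    """
    errors = [haversine_km(*case) for case in cases]
    return sum(errors) / len(errors)


def onset_error_hours(forecast_temps, observed_temps, threshold, step_hours):
    """Forecast-minus-observed onset time, in hours.

    Onset is assumed to be the first time step at or above `threshold`.
    Returns None if either series never reaches the threshold.
    """
    def first_exceedance(series):
        for i, temp in enumerate(series):
            if temp >= threshold:
                return i
        return None

    f_idx = first_exceedance(forecast_temps)
    o_idx = first_exceedance(observed_temps)
    if f_idx is None or o_idx is None:
        return None
    return (f_idx - o_idx) * step_hours


# Forecast reaches 35 degrees one 6-hour step later than observed.
late_by = onset_error_hours([30, 33, 36, 38], [30, 36, 38, 39], 35, 6)  # 6
```

A positive onset error means the forecast heat event starts late, and a negative one means it starts early. A real evaluation would also need rules for identifying landfall and for defining a heat event, which this sketch leaves out.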
As AI weather models grow in popularity, it is clear that we need a standardized set of community-driven tests that evaluate models across a wide variety of high-impact hazards. Extreme Weather Bench (EWB) builds on the successful approach of WeatherBench. It introduces a set of high-impact weather events spanning multiple spatial and temporal scales and different parts of the weather spectrum. We provide data for testing, the standard metrics that forecasters worldwide use to evaluate each phenomenon, impact-based metrics, and code to evaluate models on the data. EWB is a community system, and we will add phenomena, test cases, and metrics in collaboration with the worldwide weather and forecast verification community.
EWB provides a community-curated set of high-impact weather events, data, and metrics, including code to compute the metrics and plot the results across the full set of test cases. The phenomena span multiple spatial and temporal scales and different parts of the weather spectrum, ranging from short-term, small-scale hazards such as severe storms to long-term, larger-scale hazards such as drought. To make the dataset globally relevant, we provide case studies and data for events around the world. The standard metrics are those commonly used to evaluate each specific high-impact phenomenon, while the impact-based metrics focus on the primary impacts of the event being predicted. For example, when predicting a tropical cyclone (TC), it is important to predict not only the strength of the TC but also how quickly it strengthens, when and where it makes landfall, and the rainfall and other severe impacts it generates.
If you have suggestions for improvements or additions, please contact hello@brightband.com. With this worldwide community involvement, we aim for EWB to become a standard test set for AI weather models.