DTC Ensemble Testbed
Design Plan
Verification, Module 6
Objectives and Functionality
Assessment activities, including quantitative verification, will be a critical role for the DET. The utilities that facilitate that role will be both retrospective and near-real-time in nature: the former to accommodate ambitious re-analysis and sensitivity studies similar to several ongoing DTC tasks, and the latter to complement real-time forecasting demonstration projects such as those currently operating in the HMT and HWT collaborations with the DTC.
The tools to achieve verification tasks for the DET will of necessity focus primarily on probabilistic verification techniques. Within the broad scope of these techniques, three roughly separate areas of development are necessary: 1) ensemble preparation and processing; 2) score definition and computation; and 3) results display. This section focuses on needs specific to verification; other post-processing tasks are described in Modules 4 and 5.
As will be emphasized in a later section, many tasks and techniques are common to the objectives of the DET and to those of several other DTC projects. It is important that these common areas are carefully considered in the course of developing DET plans, both to leverage as much existing effort as possible and to keep the DTC components as inter-connected as possible. With respect to the verification plans of the DET, the HMT and HWT accomplishments and infrastructure are of particular interest. Both have designed and implemented demonstrations of real-time verification systems for forecast exercises in these testbeds that have built heavily on each other and that can be considered prototypes for the DET verification module. Since both have employed MET verification utilities, and MET is a DTC-sponsored and funded package, it is proposed that MET continue as the principal verification software for DET. However, DET will require additional utilities not yet available in MET, so it is also important to investigate other promising packages and techniques that are available either for adaptation or application within MET or for stand-alone use. The next section describes MET and a few other leading candidate products and their potential usefulness for the DET.
Existing Verification Packages
1. MET
Existing utilities in MET include a large set of scoring and data-ingest options for both probabilistic and deterministic forecasts. For probabilistic applications, it now incorporates the Brier score and its decomposition. It also includes computation of the Receiver Operating Characteristic (ROC) curve, the area under the ROC, the points for the reliability diagram, calibration, refinement, likelihood, base rate, and ranked probability scores.
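As a point of reference for these quantities, the base-R sketch below computes the Brier score and its standard reliability/resolution/uncertainty decomposition from binned probability forecasts. The function name, binning, and variable names are illustrative only and do not reflect MET's internal implementation.

    # Illustrative base-R sketch (not MET code): Brier score and its
    # reliability/resolution/uncertainty decomposition over forecast bins.
    brier_decomp <- function(p, o, breaks = seq(0, 1, by = 0.1)) {
      # p: probability forecasts in [0, 1]; o: binary observations (0/1)
      k    <- cut(p, breaks, include.lowest = TRUE)
      n    <- length(o)
      obar <- mean(o)                # sample base rate
      nk   <- tapply(o, k, length)   # cases per forecast bin
      pk   <- tapply(p, k, mean)     # mean forecast probability per bin
      ok   <- tapply(o, k, mean)     # observed relative frequency per bin
      use  <- !is.na(nk)             # drop empty bins
      rel  <- sum(nk[use] * (pk[use] - ok[use])^2) / n  # reliability (penalty)
      res  <- sum(nk[use] * (ok[use] - obar)^2) / n     # resolution (reward)
      unc  <- obar * (1 - obar)                         # sample uncertainty
      # Once forecasts are binned, brier is approximately
      # reliability - resolution + uncertainty.
      c(brier = mean((p - o)^2), reliability = rel,
        resolution = res, uncertainty = unc)
    }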
A Brier skill score can be calculated assuming the sample climatology and using the decomposition of the Brier score. A method to define or ingest a standard climatology field from which skill scores may be produced is not included at this time but is feasible. MET routines require that user probabilities be produced off-line and ingested into the MET workflow. However, basic capability within MET includes a module to process ensemble model data and thereby produce the simple arithmetically defined probabilities and other parameters used in probabilistic forecast verification.
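To illustrate those two pieces, the following sketch derives relative-frequency probabilities from ensemble members and forms a Brier skill score against the sample climatology. The function names and the layout of ens are hypothetical, not MET's actual interface.

    # Illustrative sketch: simple relative-frequency probabilities from an
    # ensemble, and a Brier skill score versus the sample climatology.
    ens_prob <- function(ens, thresh) {
      # ens: matrix [cases x members]; probability = fraction of members > thresh
      rowMeans(ens > thresh)
    }

    bss_sample_clim <- function(p, o) {
      bs     <- mean((p - o)^2)      # Brier score of the forecasts
      obar   <- mean(o)              # sample climatology (base rate)
      bs_ref <- mean((obar - o)^2)   # Brier score of a constant-climatology forecast
      1 - bs / bs_ref                # Brier skill score
    }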
On the evaluation post-processing side, a prototype version of ‘METViewer’, a database and display package based on ‘R’ statistical routines and graphics, is available for use by DET. This package currently allows for the calculation of median values across user-defined stratifications (e.g., time, region of interest, thresholds) for all statistics currently available in MET. METViewer will also display the accumulated rank histograms across user-defined stratifications. Displays of aggregated statistics, as well as reliability, attributes, and ROC diagrams, are still to be added.
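For reference, a rank histogram of the kind METViewer accumulates can be computed along the following lines (a base-R sketch with hypothetical names, not the METViewer implementation):

    # Illustrative sketch: rank of the observation among the ensemble members.
    rank_hist <- function(ens, obs) {
      # ens: matrix [cases x members]; obs: verifying observation per case
      r <- apply(cbind(obs, ens), 1, function(x)
                 rank(x, ties.method = "random")[1])  # obs rank, ties broken randomly
      tabulate(r, nbins = ncol(ens) + 1)              # counts for ranks 1..(members + 1)
    }

A flat histogram suggests the observation is statistically indistinguishable from the ensemble members; a U-shape indicates under-dispersion.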
The DET will have an excellent opportunity to steer development of both MET and METViewer for ensemble-related verification needs. As part of that process, it will probably become incumbent on DET personnel to help with the development of scoring algorithms and with programming tasks related to the display of results. A fairly complete set of probabilistic verification software has already been designed and installed in ‘R’ at RAL for the DTC; an immediate question is how best to integrate these routines for use with MET. It is quite likely that, between existing and planned DTC development, most of the direct needs of the DET can be satisfied.
In addition to its previous use in the DTC, other major advantages of MET are the pool of expertise available locally, its close relationship to development of the WRF model, its close connection to innovative techniques (e.g., object-based verification), and its growing use at other agencies.
MET website: http://www.dtcenter.org/met/users/
Real-time MET and ‘R’ applications:
http://verif.rap.ucar.edu/eval/hmt/2011/graphics/
http://verif.rap.ucar.edu/eval/hmt/2010/graphics/
http://verif.rap.ucar.edu/eval/hwt/2010/graphics/
http://verif.rap.ucar.edu/eval/hmt/2009/graphics/
2. NEVS at the Aviation Branch at GSD
NEVS and its predecessors at GSD have a long record of verification activity, and the present emphasis is on coupling verification utilities to future data and modeling structures (‘the data cube’). Probabilistic verification has not been a focus for NEVS in the past. Previous activity in the Aviation Branch has been almost exclusively directed toward aviation needs, including flight-relevant parameters (icing, turbulence, ceiling, etc.). A few tools have been produced in the past on a project-need basis (e.g., reliability plots for convective probability forecasts). Collaboration with NEVS may become beneficial in the future as it becomes necessary to interact with the data cube.
3. AWIPS and ALPS Capabilities
AWIPS development at GSD related to ensembles has emphasized interactive displays through the Advanced Linux Prototype System (ALPS). Manipulation of ensemble members is available, along with some graphical statistics utilities (e.g., quantile time series) and options for selecting subsets of members. The quick sub-setting, recalculation, and display of ensemble products such as the mean and exceedance probabilities are particularly attractive and could serve as useful ideas for DET projects. However, no scoring capabilities are available or planned, and convenient access to AWIPS is something of an obstacle.
4. National Precipitation Verification Unit (NPVU)
This web-based verification utility has a several-year history at the Office of Hydrologic Development (OHD), but it is not designed for true probabilistic verification or display. During the HMT/DTC winter exercise, parts of the real-time statistical summary displays were designed with the monthly NPVU bar charts in mind, since they were familiar to western forecasters and presented a convenient way to compare the performance of individual ensemble members and to summarize distinct periods of precipitation. As the name suggests, this site and its utilities are relevant only to precipitation.
Website: http://www.hpc.ncep.noaa.gov/npvu/
5. Verification for HRRR convective probability forecasts
In support of HRRR ensemble convective forecasts, display products have been developed to visualize the set of multiple lead times and initialization times for convective probabilities. Tabulations of a wide array of scoring algorithms and skill scores are in development. As yet, the skill scores based on these forecasts use relatively simple (constant) reference probabilities.
Website: http://ruc.noaa.gov/hcpf/hcpf_verif.cgi
6. NCEP/EMC operational and developmental probabilistic forecast verification
The set of probabilistic verification utilities now installed in the real-time product stream at NCEP will be very valuable for DET verification activities. Since initial benchmarking for the DET will focus on NCEP capabilities, the verification infrastructure there will also serve as an initial set of products to emulate. The section below on verification products will describe these in greater detail.
Useful website: https://ams.confex.com/ams/pdfpapers/131645.pdf
7. OHD verification
An important emphasis of the Office of Hydrologic Development (OHD) is the development of ensemble hydrologic forecasts, including their verification. Precipitation is, of course, an overriding factor in hydrologic forecasts, so QPF and QPE verification are necessarily a part of the OHD program. QPF is also an early focus of the DET, so facilitating interactions with OHD and other hydrology agencies is an important element of early planning.
Useful websites describing the OHD forecast and verification system are:
http://hydis8.eng.uci.edu/hepex/Workshops/postprocesswksp/POSTERS/Demargne-33.pdf
http://hydis8.eng.uci.edu/hepex/testbeds/Verification.htm
8. ‘R’ Utilities
‘R’ is an open-source statistical language and environment that serves both computational and display functions. Since it is community-supported, many of its utilities have been developed for specific applications, including probabilistic and other verification topics. An existing set of routines (the ‘verification’ package; see website below) has been developed by Matt Pocernich at RAL/JNT and others specifically for ensemble and other probabilistic meteorological applications. These routines include scoring and display for more-or-less standard scores and diagrams, such as rank (Talagrand) histograms and attributes diagrams; the Brier and other skill scores (Heidke, Kuiper, etc.) and their decomposition products (reliability, resolution, uncertainty); ROC curves and the area under the ROC curve; RPS and CRPS scores; and other more innovative scoring options. Since this package is subject to revision and addition, its documentation will also change intermittently.
Website: http://cran.r-project.org/web/packages/verification/verification.pdf
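As a sketch of typical use, the call sequence below applies the package to synthetic data. The data are purely illustrative, and argument names should be checked against the documentation above, since the package interface changes over time.

    # Illustrative use of the CRAN 'verification' package on synthetic data.
    library(verification)
    set.seed(1)
    obs  <- rbinom(500, 1, 0.3)                          # synthetic binary outcomes
    prob <- pmin(pmax(0.2 + 0.5 * obs + rnorm(500, 0, 0.2), 0), 1)  # toy forecasts
    A <- verify(obs, prob, frcst.type = "prob", obs.type = "binary")
    summary(A)            # Brier score, skill score, and decomposition terms
    roc.plot(obs, prob)   # ROC curve with the area under the curve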
Verification Datasets
Increasingly, the impact of verification dataset choices on verification results has become a topic of interest. In many practical cases, there are no choices to be made. To facilitate comparison with other centers, the DET will initially make available the principal verification data streams available at NCEP including the RTMA, operational radiosondes, radar products, precipitation gages, and eventually satellite products. Where choices are available (e.g., non-operational rain gage networks), options to individually select verification data will be offered. Data quality evaluation for these verification sets will be primarily a user responsibility, but could eventually become a DET activity if determined to be warranted.
Scoring Algorithms and Display
Eventually, most of the probabilistic scores that have proven valuable at other centers should become available as part of the DET verification module. Initially, however, a set of highest-priority scores and display products will be developed with input from the potential user community. These will include, for instance, Brier skill scores, the ranked probability score (RPS) and continuous ranked probability score (CRPS), and decomposition products such as reliability and resolution. The first set of visual displays will include ROC curves, rank (Talagrand) histograms, and reliability curves. Binned spread-skill diagrams, ROC skill scores, and economic value diagrams will be added when available. Additional ensemble displays not specifically related to verification needs, but nonetheless useful for qualitative evaluation, are described in the display products module of this plan.
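As one concrete example among these scores, the CRPS for a single ensemble forecast can be estimated directly from the members using the standard kernel form, sketched below in base R. This is illustrative only; the ‘R’ routines described earlier provide equivalent functionality.

    # Illustrative sketch: CRPS estimate for one ensemble forecast and one
    # observation, using the kernel form CRPS = E|X - y| - 0.5 * E|X - X'|.
    crps_ens <- function(members, y) {
      mean(abs(members - y)) - 0.5 * mean(abs(outer(members, members, "-")))
    }

For example, crps_ens(rnorm(20), 0.5) scores a 20-member Gaussian ensemble against an observation of 0.5; lower values are better.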
User Group Verification Packages
While numerical weather prediction groups at EMC and other operational and research centers have been the principal active proponents of the DTC and the DET, there are several other distinct user groups for which the DET could have potential value. Four such groups are the aviation, hurricane, severe weather, and hydrology communities. For each, verification needs could differ significantly from those of the others and from those of the weather forecasting agencies. For instance, verification for hydrology will be particularly directed toward QPF and will often take the form of time-series validation rather than gridded fields. Aviation users will likely need verification for derived aviation-relevant variables (icing, turbulence, visibility) that may not be routinely produced by numerical weather models. Probabilities of track and intensity are of most interest to the hurricane community. Likewise, the probability of convective initiation leading to hail, wind, and tornadic outbreaks is important to the severe weather community. An additional consideration is that data formats are not identical between user groups. To accommodate these users as much as possible, fact-finding visits and meetings will need to be arranged to identify possible areas of interest and to specify sets of metrics that the DET could provide.
Timeline and Milestones
August 2010 – Verification module plan presented and reviewed by WRF ensemble working group
January 2011 – Initial probabilistic scoring utilities demonstrated as part of HMT winter exercise
March 2011 – Written module 6 plan for verification is completed as part of overall DET planning process
April 2011 – Prototype partial DET verification workflow and display utility that closely emulates the HMT parallel structure is assembled and applied to extended-CONUS test runs of HMT-based ensemble
June 2011 – Prototype real-time web-based verification site with basic capabilities completed
August 2011 – User group meetings in DC are arranged and held to integrate verification subsets for major potential DET users
September 2011 – Verification utility installed as full module in end-to-end DET workflow
Working Group Recommendations
(Mod6.1) Do we need to be discussing the inclusion of MET (or MET-like capability) in AWIPS II?
Based on this recommendation, we will consider NextGen requirements more closely.
(Mod6.2) What is a prioritized list of ensemble-relevant verification products?
DET will explore the verification ideas recommended by the Working Groups, including basic tools, time-space scales, decomposition tools, accounting for the various needs of users, etc.
(Mod6.3) How should we filter through ideas provided by other workshops (i.e., RAL Verification Workshop 11/2010; WMO Verification tutorials; others?)
DET accepts the recommendations to focus on ensemble-verification sessions at workshops and to conduct a literature review by a knowledgeable individual.
(Mod6.4) What analysis fields and/or observation data should we consider for verification?
The Working Group recommended both, including the use of ensemble analyses (e.g., Torn and Hakim) with a disclaimer for their uncertainty, etc. DET will explore the list of recommended ideas.