Calibration Summary Statistics
HEC-HMS calculates and displays several summary statistics to quantify model performance relative to observations. These include the Nash-Sutcliffe Efficiency (NSE), the Ratio of the Root Mean Square Error to the Standard Deviation of the observed data (RSR), and the Percent Bias (PBIAS) (Moriasi, et al., 2007); the Coefficient of Determination (R2) (Legates and McCabe, 1999); and the Modified Kling Gupta Efficiency (MKGE). These statistics are summarized in the Table below.
| Criterion | Equation | Notes |
|---|---|---|
| Nash-Sutcliffe Efficiency (NSE) | NSE = 1 - \frac{\sum_{i=1}^{n}(Y_i^{obs} - Y_i^{sim})^2}{\sum_{i=1}^{n}(Y_i^{obs} - \bar Y_{obs})^2} | |
| Ratio of the Root Mean Square Error to the Standard Deviation of the Observed Data (RSR) | RSR = \frac{\sqrt{\sum_{i=1}^{n}(Y_i^{obs} - Y_i^{sim})^2}}{\sqrt{\sum_{i=1}^{n}(Y_i^{obs} - \bar Y_{obs})^2}} | |
| Percent Bias (PBIAS) | PBIAS = 100 \times \frac{\sum_{i=1}^{n}(Y_i^{sim} - Y_i^{obs})}{\sum_{i=1}^{n} Y_i^{obs}} | NOTE: The PBIAS sign convention in HEC-HMS is opposite from the sign convention in Moriasi, 2007 |
| Coefficient of Determination (R2) | R^2 = \left[\frac{\sum_{i=1}^{n}(Y_i^{obs} - \bar Y_{obs})(Y_i^{sim} - \bar Y_{sim})}{\sqrt{\sum_{i=1}^{n}(Y_i^{obs} - \bar Y_{obs})^2}\sqrt{\sum_{i=1}^{n}(Y_i^{sim} - \bar Y_{sim})^2}}\right]^2 | |
| Modified Kling Gupta Efficiency (MKGE) | MKGE = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2}, \quad \beta = \frac{\bar Y_{sim}}{\bar Y_{obs}}, \quad \gamma = \frac{CV_s}{CV_o} = \frac{\sigma_s/\bar Y_{sim}}{\sigma_o/\bar Y_{obs}} | |
Variables:
- Y_i^{obs} = ith observation
- Y_i^{sim} = ith simulated value
- \bar Y_{obs} = the mean of observed data
- \bar Y_{sim} = the mean of simulated data
- n = total number of observations
- r = correlation coefficient between simulated and observed runoff (dimensionless)
- \beta = bias ratio (dimensionless)
- \gamma = variability ratio (dimensionless)
- CV = coefficient of variation (dimensionless)
- \sigma = standard deviation
- The indices s and o represent simulated and observed runoff values, respectively.
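Using these definitions, the statistics can be sketched in a few lines of code. This is an illustrative implementation, not HEC-HMS source code; the function name and return format are hypothetical.

```python
# Illustrative sketch (not HEC-HMS source code): the five summary statistics
# computed from paired observed/simulated series using the equations above.
import numpy as np

def summary_stats(obs, sim):
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    residual_var = np.sum((obs - sim) ** 2)          # sum of squared residuals
    measured_var = np.sum((obs - obs.mean()) ** 2)   # observed data variance
    nse = 1.0 - residual_var / measured_var
    rsr = np.sqrt(residual_var) / np.sqrt(measured_var)   # RMSE / observed std dev
    pbias = 100.0 * np.sum(sim - obs) / np.sum(obs)  # HEC-HMS sign convention
    r = np.corrcoef(obs, sim)[0, 1]                  # correlation coefficient
    beta = sim.mean() / obs.mean()                   # bias ratio
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())  # variability ratio
    mkge = 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return {"NSE": nse, "RSR": rsr, "PBIAS": pbias, "R2": r ** 2, "MKGE": mkge}
```

A perfect simulation (sim identical to obs) yields NSE = 1, RSR = 0, PBIAS = 0, R2 = 1, and MKGE = 1.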
HEC-HMS also reports observed and computed maximum flow, time of peak and total volume. These measures are also useful in the calibration process.

As a reminder, the following basic statistical measures are useful for this discussion:
- Residual variance = sum of squared differences between the observed and simulated values = \sum_{i=1}^{n}(Y_i^{obs} - Y_i^{sim})^2
- Measured data variance = sum of squared differences between the individual observed values and the mean of the observed values = \sum_{i=1}^{n}(Y_i^{obs} - \bar Y_{obs})^2
- Standard deviation (\sigma) is the square root of variance
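The basic measures above can be verified with a short sketch (the numeric values are chosen only for illustration):

```python
# Illustrative values only: compute the basic measures defined above.
import numpy as np

obs = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical observed series
sim = np.array([2.5, 5.5, 6.5, 9.5])   # hypothetical simulated series

residual_variance = np.sum((obs - sim) ** 2)         # sum of squared residuals -> 1.0
measured_variance = np.sum((obs - obs.mean()) ** 2)  # sum of squared deviations -> 20.0
std_dev = np.sqrt(measured_variance / obs.size)      # population standard deviation
```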
Performance ranges
Suggested model performance ranges of the four summary statistics for evaluating streamflow, adapted from Moriasi et al. (2007, 2015), are summarized in the Table below. Note that these ranges were derived for continuous flow data at daily and monthly time steps at the watershed scale.
| Performance Rating | NSE | RSR | PBIAS (%) | R2 |
|---|---|---|---|---|
| Very Good | 0.80 < NSE ≤ 1.00 | 0.00 < RSR ≤ 0.50 | \|PBIAS\| < 5 | 0.85 < R2 ≤ 1.00 |
| Good | 0.70 < NSE ≤ 0.80 | 0.50 < RSR ≤ 0.60 | 5 ≤ \|PBIAS\| ≤ 10 | 0.75 < R2 ≤ 0.85 |
| Satisfactory | 0.50 < NSE ≤ 0.70 | 0.60 < RSR ≤ 0.70 | 10 < \|PBIAS\| ≤ 15 | 0.60 < R2 ≤ 0.75 |
| Unsatisfactory | NSE ≤ 0.50 | RSR > 0.70 | \|PBIAS\| > 15 | R2 ≤ 0.60 |
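The ratings in the table can be expressed as a simple lookup. This helper is hypothetical (not part of HEC-HMS) and shows only the NSE column; the other columns follow the same pattern.

```python
# Hypothetical helper (not part of HEC-HMS): map an NSE value to the
# qualitative streamflow rating tabulated above (Moriasi et al. thresholds).
def nse_rating(nse: float) -> str:
    if nse > 0.80:
        return "Very Good"
    if nse > 0.70:
        return "Good"
    if nse > 0.50:
        return "Satisfactory"
    return "Unsatisfactory"
```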
Performance ranges for MKGE are not provided in HEC-HMS. Using the mean flow as a predictor results in an NSE = 0 and an MKGE = 1 - \sqrt2 ≈ -0.41. MKGE values greater than -0.41 indicate that the model performs better than the mean flow. NSE and MKGE values cannot be directly compared because the relationship between them depends in part on the coefficient of variation of the observed time series (Knoben et al., 2019). Modelers should analyze the MKGE components (correlation coefficient, bias ratio, and variability ratio) to better understand the model error.
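The mean-flow benchmark values quoted above can be checked directly. This sketch uses illustrative data, and assumes (per the limiting behavior of a constant prediction) that the correlation coefficient and variability ratio are taken as zero:

```python
# Sketch of the mean-flow benchmark described above (illustrative values).
import math
import numpy as np

obs = np.array([2.0, 4.0, 6.0, 8.0])
sim = np.full_like(obs, obs.mean())   # predict the observed mean everywhere

# NSE of the mean-flow predictor is exactly 0, since the residual variance
# equals the measured data variance by construction:
nse = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

# For a constant prediction, r = 0 and gamma = 0 (limiting values),
# while the bias ratio beta = 1, giving MKGE = 1 - sqrt(2):
mkge = 1.0 - math.sqrt((0 - 1) ** 2 + (1 - 1) ** 2 + (0 - 1) ** 2)
```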
"What is an acceptable statistical metric?" is an extremely common modeling question and one that elicits considerable debate within the water resources community. The values presented above and their qualitative interpretations should NOT be blindly used without first considering numerous factors. These considerations include, but are not limited to, the following:
- Which variable(s) are of interest (e.g., flow, reservoir pool elevation, snow water equivalent, sediment concentration, etc.)?
- How often are observations taken (e.g., 15 minutes, 1 day, 2 weeks, etc.)?
- What computational time step was used to compute the model results?
- How much uncertainty is present within the observed data and boundary conditions?
- How will model results be used (e.g., $billion infrastructure investment, flood forecasting, evaluation of parameter sensitivity, etc.)?
- What are the time and funding constraints (i.e., can 6 months be invested to calibrate/validate model processes and parameters or must these results be useable within 15 minutes)?
Each modeling application should document which statistical metrics were used to evaluate model performance, their quantitative/qualitative interpretations, and why this was considered appropriate.