Frequency Function Uncertainty
A frequency function is another way of describing a cumulative distribution function. Often, these functions are expressed in terms of exceedance probability so that the frequency function is actually an inverse cumulative distribution function. We often use frequency functions to represent the distribution of flows, stages, and damage.
Defining uncertainty about the user-provided frequency function is a critical part of the uncertainty analysis in HEC-FDA. This uncertainty will be defined differently, depending on whether the function is analytical or graphical.
Analytical Discharge-Frequency Uncertainty
Analytical discharge-frequency functions represent the relationship between annual maximum streamflow values and the values' probability of being exceeded. In other words, a discharge-frequency function is an inverse cumulative distribution function of annual maximum streamflow values. This relationship is often derived from the statistical properties (mean, standard deviation, and skew) of an annual maximum streamflow record. A streamflow record represents an incomplete sample and therefore a fuzzy picture of the range of possible flow values. The length of this record (a.k.a. period of record or sample size) is used to parameterize the uncertainty about the discharge-frequency function. Why? The smaller the sample, the fuzzier the picture, and the more uncertainty there is about the true range of possible flow values and thus the true inverse cumulative distribution function.
The methods used to compute uncertainty about analytical discharge-frequency functions in HEC-FDA Version 1.4.3 are based on the guidelines for determining flood flow frequency described in USGS Bulletin 17B. The USGS Bulletin 17B method incorporates uncertainty about the mean and standard deviation of the distribution, but not the skew. The guidelines for determining flood flow frequency have since been updated and are documented in USGS Bulletin 17C. The USGS Bulletin 17C approach allows for uncertainty in the skew in addition to the mean and standard deviation.
HEC-FDA Version 2.0 relies on a new method for calculating the uncertainty about an analytical discharge-frequency function that is consistent with the USGS Bulletin 17C approach. In HEC-FDA Version 2.0, a user specifies an analytical discharge-frequency function (the median function) as a Log Pearson Type III distribution based on the mean, standard deviation, and skew of the original period of record data, as well as the record length. The uncertainty about that function is then calculated by fitting a new discharge-frequency function to a bootstrapped sample of flow values drawn from the user-entered Log Pearson Type III distribution. This procedure occurs once within each iteration (realization) of a compute, and has the effect of allowing the mean, standard deviation, and skew of the discharge-frequency distribution to shift from one iteration to the next. The procedure consists of the following steps:
- An array of size n, the size of the period of record, is filled with random numbers p_{i} \forall i = \{1, \ldots, n\} such that 0 < p_{i} < 1. Each random number p_{i} represents a cumulative probability.
- A flow value f_{i} is calculated for each probability p_{i} \forall i = \{1, \ldots, n\} using the inverse cumulative distribution function of the user-entered Log Pearson Type III distribution L, generating a sample of flows of size n. f_{i} is calculated as follows:
- If p_{i} \leq 0, then reset p_{i} = 0.000000000001.
- If p_{i} \geq 1, then reset p_{i} = 0.999999999999.
- The calculation proceeds differently based on the skewness b of distribution L.
If b = 0, then the Log Pearson Type III distribution with mean μ, standard deviation σ, and b = 0 is mathematically equivalent to a Normal distribution with the same mean μ and standard deviation σ. Flow f_{i} is calculated as
f_{i} = 10^{\left(Z_{p_{i}} \times \sigma + \mu\right)} where Z_{p_{i}} is the standard normal deviate of the probability p_{i}.
Otherwise, f_{i} is calculated as follows:
w_{i} = \frac{(Z_{p_{i}} - \frac{b}{6})\times b}{6} + 1 \\ k_{i} = \frac{2}{b} \times \left(w_{i}^{3} - 1\right) \\ f_{i} = 10^{(\mu + k_{i} \times \sigma)} where Z_{p_{i}} is the standard normal deviate of the probability p_{i}.
- The logarithm (base 10) of each flow value f_{i} is calculated.
- The mean, standard deviation, and skew of the sample of logged flows are calculated.
- A new Log Pearson Type III distribution is created based on the sample's mean, standard deviation, and skew. The new distribution is used for one iteration of the compute, then discarded once the iteration's results are stored.
The above steps are repeated in each iteration of the compute, so that the mean, standard deviation, and skew of the discharge-frequency distribution shift from one iteration to the next until the compute converges. In this way, the uncertainty in the skew is propagated into the calculated stage- and damage-frequency functions.
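The bootstrap steps above can be sketched in Python. This is a minimal illustration using only the standard library, not HEC-FDA source code: the function names are hypothetical, and the sample skew estimator shown is one common bias-corrected form that may differ in detail from the Bulletin 17C treatment.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def lp3_inverse_cdf(p, mu, sigma, skew):
    """Inverse CDF of a Log Pearson Type III distribution whose log10
    moments are (mu, sigma, skew), using the frequency-factor formulas
    described in the steps above."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)   # clamp p into (0, 1) as above
    z = _STD_NORMAL.inv_cdf(p)            # standard normal deviate Z_p
    if skew == 0:
        return 10 ** (z * sigma + mu)     # zero skew: Normal in log space
    w = (z - skew / 6) * skew / 6 + 1
    k = (2 / skew) * (w ** 3 - 1)
    return 10 ** (mu + k * sigma)

def bootstrap_lp3_moments(mu, sigma, skew, n, rng=random):
    """One bootstrap realization: draw n flows from the user-entered
    distribution, log them, and refit mean, standard deviation, and skew."""
    logs = [math.log10(lp3_inverse_cdf(rng.random(), mu, sigma, skew))
            for _ in range(n)]
    m = sum(logs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in logs) / (n - 1))
    # one common bias-corrected sample skew (an assumption, see lead-in)
    g = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in logs)
    return m, s, g
```

Calling `bootstrap_lp3_moments` once per iteration yields a slightly different (m, s, g) each time, which is the mechanism by which the three parameters shift between realizations.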
Graphical Frequency Uncertainty
The "Less Simple" method is based on an asymptotic approximation used to extrapolate the order statistics method. This approximation is based more closely on the features of the order statistics method, and provides a result with the same characteristics, such as smaller uncertainty when the frequency curve is flatter; and larger uncertainty when it is steeper.
Before the uncertainty can be calculated about an input graphical frequency function, the input graphical frequency function must first be extrapolated to the 0.0001 and 0.9999 AEPs so that the function is defined for something close to the entire range of probability, and then interpolated between the user-provided coordinates.
- Extrapolation happens differently at the frequent end of the curve than at the infrequent end. At the frequent end, the extrapolated stage or flow is 0.1% less than the most frequent stage or flow provided by the user; in other words, the frequent end of the frequency curve is extrapolated flat. At the infrequent end, the stage or flow at the 0.0001 AEP is identified by extrapolating along the slope between the two least frequent coordinates provided by the user. The extrapolation at the infrequent end, and the calculation of slope it depends on, take place in z-space. In other words, thinking of rise over run, where rise is the change in flow or stage and run is the change in probability, the change in probability is calculated as the change in the corresponding z-scores from a standard normal distribution at those probabilities.
- Interpolation between coordinates provided by the user also takes place in z-space.
- Extrapolation and interpolation happen at the pre-defined required exceedance probabilities, plus any user-provided probabilities not already among the required probabilities. The required probabilities used in the compute can be examined on GitHub.
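The extrapolation and interpolation steps above can be sketched as follows. This is an illustrative standard-library sketch, not HEC-FDA source: function and variable names are assumptions, and the required AEPs are passed in directly rather than read from the pre-defined list.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf   # probability -> standard normal deviate

def extend_and_interpolate(aeps, values, required_aeps):
    """aeps: annual exceedance probabilities ordered most to least
    frequent, e.g. [0.99, 0.5, 0.1, 0.01]; values: paired flows/stages.
    Returns {required AEP: interpolated flow or stage}."""
    zs = [_z(1.0 - p) for p in aeps]   # the "run" axis is z-score space
    vals = list(values)
    # frequent end: extrapolate flat to AEP 0.9999, 0.1% below the most
    # frequent user-provided value
    zs.insert(0, _z(1.0 - 0.9999))
    vals.insert(0, vals[0] * 0.999)
    # infrequent end: extend to AEP 0.0001 along the z-space slope of
    # the two least frequent user-provided coordinates
    slope = (vals[-1] - vals[-2]) / (zs[-1] - zs[-2])
    z_top = _z(1.0 - 0.0001)
    vals.append(vals[-1] + slope * (z_top - zs[-1]))
    zs.append(z_top)
    # linear interpolation in z-space at each required AEP
    out = {}
    for p in required_aeps:
        zp = _z(1.0 - p)
        for i in range(len(zs) - 1):
            if zs[i] <= zp <= zs[i + 1]:
                t = (zp - zs[i]) / (zs[i + 1] - zs[i])
                out[p] = vals[i] + t * (vals[i + 1] - vals[i])
                break
    return out
```

Because both extrapolation and interpolation operate on z-scores rather than raw probabilities, segments that look flat on probability paper keep their intended slope at the extreme tails.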
It is important that the extrapolation and interpolation steps take place before calculating uncertainty because local uncertainty should depend solely on local slope. The completely interpolated and extrapolated function is taken as the fully-specified mean function. Once the user-provided graphical frequency function has been interpolated and extrapolated at the required exceedance probabilities, the uncertainty can be calculated by identifying the standard deviation of flow or stage at each exceedance probability using the Less Simple method. Below, the Less Simple variance of the quantile S_Y^2 is derived from the variance of a sample count for the Binomial distribution, S_X^2 = p(1-p)n. Conversion to a proportion and a first-order Taylor expansion provide:
S_Y^2=\frac{p(1-p)}{n f(y)^2}
where S_Y is the standard deviation of the uncertainty distribution, p is the non-exceedance probability, n is the record length, and f(y) is the probability density function (PDF) for variable Y, derived from the frequency curve of interest. Note that the PDF is the inverse of the slope of the frequency curve, which is a CDF with probability on the horizontal axis. Because the "Less Simple" method is based on the inverse slope of the CDF or frequency curve, the computed standard deviation can get very large at the extreme ends of the curve (where the CDF slope approaches zero). In order to avoid unreasonable results, the standard deviation computed for the 0.01 quantile is used for all larger quantiles (p < 0.01), and the value for the 0.99 quantile is used for all smaller quantiles (p > 0.99). The standard deviations calculated using the equation above, after being held constant beyond the 0.01 and 0.99 AEP stages or flows, are paired with the mean flows or stages from the fully-specified mean function, and used together to specify a Normal distribution for each probability of exceedance.
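As a rough illustration, the Less Simple standard deviation and the clamping beyond the 0.01 and 0.99 AEPs might be computed as below. This is a sketch, not HEC-FDA source: names are hypothetical, and the local slope of the frequency curve is estimated with a simple central difference.

```python
import math

def less_simple_sd(aeps, values, n):
    """aeps: annual exceedance probabilities, most to least frequent;
    values: paired flows or stages; n: record length.
    Returns the Less Simple standard deviation at each coordinate."""
    sd = []
    for i in range(len(aeps)):
        # central-difference slope dy/dP of the frequency curve; the
        # PDF f(y) is the inverse of this slope
        j0, j1 = max(i - 1, 0), min(i + 1, len(aeps) - 1)
        slope = (values[j1] - values[j0]) / (aeps[j0] - aeps[j1])
        p = 1.0 - aeps[i]                    # non-exceedance probability
        # S_Y = sqrt(p(1-p)/n) / f(y) = sqrt(p(1-p)/n) * |dy/dP|
        sd.append(math.sqrt(p * (1.0 - p) / n) * abs(slope))
    # hold S_Y constant beyond the 0.01 and 0.99 AEPs (assumes those
    # required probabilities are present in aeps)
    i99, i01 = aeps.index(0.99), aeps.index(0.01)
    for i, a in enumerate(aeps):
        if a > 0.99:
            sd[i] = sd[i99]
        elif a < 0.01:
            sd[i] = sd[i01]
    return sd
```

Each returned standard deviation would then be paired with the corresponding mean flow or stage to specify the Normal uncertainty distribution at that exceedance probability.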
A final detail of note is the monotonicity forcing mechanism for sampled graphical frequency functions. All summary relationships in HEC-FDA are sampled from their uncertainty distributions and forced to be monotonically increasing in situations where the sampled quantiles are not monotonically increasing over the domain. This monotonicity forcing typically takes place "from the bottom, up", so that decreases in the dependent variable are adjusted to be constant for increases in the independent variable. Graphical frequency uncertainty is unique: for sampled graphical frequency functions above the input function, monotonicity must be forced "from the top, down", so that the maximum of the uncertainty for the least frequent coordinate reflects the maximum of the uncertainty for the entire function. This is particularly impactful for frequency functions that are rather flat across several infrequent AEPs and therefore show little uncertainty at the top end of the curve. If the middle portion of the curve has a lot of uncertainty, then the middle portion must be truncated by the low uncertainty at the high portion; this takes place by using the high portion as the starting place and ensuring that the dependent variable is non-increasing as the independent variable decreases.
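The two forcing directions can be contrasted with a short sketch (hypothetical helper names; values are ordered from the most frequent to the least frequent coordinate):

```python
def force_monotonic_bottom_up(values):
    """Typical forcing: walking up from the frequent end, hold the value
    constant wherever the sampled quantiles would otherwise decrease."""
    out = list(values)
    for i in range(1, len(out)):
        out[i] = max(out[i], out[i - 1])
    return out

def force_monotonic_top_down(values):
    """Forcing used for sampled graphical frequency functions above the
    input function: starting at the least frequent coordinate and walking
    down, cap each value so it never exceeds the one above it."""
    out = list(values)
    for i in range(len(out) - 2, -1, -1):   # walk from the top, down
        out[i] = min(out[i], out[i + 1])
    return out
```

For the same non-monotonic sample, bottom-up forcing raises the later values while top-down forcing lowers the earlier ones, which is why the top-down direction preserves the (possibly small) uncertainty at the least frequent coordinate.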