Distribution fitting is the art of choosing a probability model for an unknown and unknowable population, and calibrating that model using a representative sample from the population. Such a model allows for inferences about the population to be made despite not knowing all of its properties. Uncertainty will always be part of the inference because of a limited sample size. However, choice of an appropriate model for the population can result in better inferences about its properties.

Calibration of the selected model, also called parameter estimation in the distribution-fitting context, relies on the assumption that the observed data are in some way representative of the larger population. HEC-SSP provides users with three distribution fitting methods within the Distribution Fitting Analysis: Standard Product Moments, Linear Moments (L-Moments), and Maximum Likelihood Estimation. For the moments-based methods discussed below, the assumption is that the moments of the sample are equal to the moments of the population. Any method that estimates model parameters for the population under this assumption is called a Method of Moments.

Moments are a numerical description of the shape of a dataset or model, and two types of moments are used within HEC-SSP for this purpose. The first is the more common product moments; the other is linear moments, or L-moments (Hosking, 1990). Both product moments and L-moments describe various shape properties of a dataset or model, as shown in Table 1. For samples, the difference between the two is that product moments give equal weight to a transformation of each observation, while L-moments give unequal weights to the order statistics of the observations based on the rank of each observation. Functionally, the difference is that L-moments give less weight to extreme observations. They are sometimes considered a more robust alternative to product moments, especially for higher-order moments (e.g., skew and kurtosis). Additionally, in cases where higher-order product moments do not exist (e.g., the variance and higher moments of a generalized extreme-value distribution with shape parameter κ ≤ −1/2), L-moments exist as long as the mean is finite. The order of a moment is associated with a particular shape descriptor, and in general, population moments of higher order are harder to estimate from a small sample. Moments of order greater than 4 exist but are rarely used in practice.

Table 1. Description of Properties and Moments

| Order | Property         | Product Moment | Linear Moment |
|-------|------------------|----------------|---------------|
| 1     | Central Tendency | Mean           | L-Mean        |
| 2     | Dispersion       | Variance       | L-CV          |
| 3     | Asymmetry        | Skew           | L-Skew        |
| 4     | Tail Thickness   | Kurtosis       | L-Kurtosis    |
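As a brief illustration of the distinction above, the first two sample L-moments can be computed from a sample's order statistics; the sketch below (function name illustrative, NumPy assumed available) uses the standard probability-weighted-moment estimators, in which the first L-moment is the sample mean and the second is half the expected absolute difference between two random draws.

```python
import numpy as np

def sample_l_moments(data):
    """First two sample L-moments via probability-weighted moments.

    b0 and b1 are the sample probability-weighted moments; l1 equals
    the sample mean, and l2 is half the mean absolute difference of
    two independent draws from the population.
    """
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    b0 = x.mean()
    # b1 weights each ascending order statistic by (rank - 1)/(n - 1)
    b1 = np.sum((np.arange(1, n + 1) - 1) / (n - 1) * x) / n
    l1 = b0
    l2 = 2.0 * b1 - b0
    return l1, l2

rng = np.random.default_rng(0)
draws = rng.exponential(scale=2.0, size=5000)
l1, l2 = sample_l_moments(draws)
```

For an Exponential population, the population L-CV (the ratio l2/l1) is exactly 1/2, which the sample values above should approximate.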

Unique moment-based estimators for the parameters of a probability distribution are obtained by solving a system of equations. Moments of a probability distribution can be computed from the probability density function, which results in equations for the moments in terms of the parameters of the distribution. Solving for the parameters in terms of the moments yields the moments-based estimators. Generally, the number of moments needed to estimate the parameters of a distribution is equal to the number of parameters in the distribution. As a guideline, higher-order moments are generally associated with the shape parameters of a distribution (if it has them), second-order moments with scale parameters, and the first-order moment with the location parameter. Moments of higher order than the number of distribution parameters are usually fixed (with exceptions).
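To make the system-of-equations idea concrete, a minimal Method of Moments sketch for a two-parameter distribution follows (function name illustrative, NumPy assumed available). It uses the standard Gamma moment relations, E[X] = κθ and Var[X] = κθ², and solves them for the two parameters in terms of the sample moments.

```python
import numpy as np

def fit_gamma_mom(data):
    """Method-of-moments fit for the Gamma(kappa, theta) distribution.

    Solves the system  E[X] = kappa*theta,  Var[X] = kappa*theta**2
    for the two parameters using the sample mean and variance.
    """
    x = np.asarray(data, dtype=float)
    m, v = x.mean(), x.var(ddof=1)  # sample mean and variance
    kappa = m**2 / v                # shape
    theta = v / m                   # scale
    return kappa, theta

rng = np.random.default_rng(1)
sample = rng.gamma(shape=3.0, scale=2.0, size=20000)
kappa_hat, theta_hat = fit_gamma_mom(sample)
```

With a large sample, the fitted shape and scale should fall close to the population values used to generate the data.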

The Maximum Likelihood Estimation (MLE) method maximizes a likelihood function such that the observed data are most probable under an assumed statistical model. The set of model parameters that maximizes the likelihood function is called the maximum likelihood estimate. MLE has a few advantages over moment-based methods. First, estimates generated from moment-based methods are not always sufficient statistics. A statistic is sufficient if it provides as much information about the parameters of the distribution as the sample data itself. Second, estimates given by moment-based methods may fall outside the parameter space for small sample sizes. This problem does not occur when using MLE.

Currently, 19 probability distributions are available for use within the Distribution Fitting Analysis. These distribution choices range from simple (e.g. uniform distribution) to complex (e.g. four-parameter beta distribution). Each of the probability distributions available in the Distribution Fitting Analysis is continuous but has varying support. A short description of each distribution and some common uses are contained below. Within each description, common notation for the probability density, cumulative distribution, and quantile functions will be utilized as shown in Table 2. Additionally, common notation for the moments of each distribution will be utilized as shown in Table 3.

Table 2. Notation for Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \theta)=

Cumulative Distribution Function

F(x \mid \theta)=

Quantile Function

F^{-1}(p \mid \theta)=

Table 3. Notation for Moments.

Mean

E[X]=

Variance

Var[X]=

Skewness

Skew[X]=

Beta Distribution

The Beta Distribution is a two-parameter distribution with continuous support on the closed interval [0, 1]. In practice, the Beta Distribution is used to model continuous outcomes where the minimum and maximum values are known to be 0 and 1, respectively. Random variables on any arbitrary closed interval [a, c] can be transformed using an affine transformation:

y=\frac{x-a}{c-a}

Where x is any random variable bounded by the interval [a, c] and y is a random variable bounded by [0, 1]. This transformation makes the Beta useful for effectively truncating a random variable by fixing the interval over which outcomes may occur. Take for example a seasonality analysis in which the frequency of events occurring by month has a bell-curve shape. While the Normal Distribution may provide a good fit to the data due to their shape, outcomes less than 1 or greater than 12 do not make sense, and a Normal model would have to be truncated. Instead, using an affine transformation would allow the Beta Distribution to be used as a model for the outcomes.
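The affine transformation above and its inverse are straightforward to apply in code; the sketch below (function names illustrative) maps the seasonality example's monthly values from [1, 12] onto [0, 1] before any Beta fitting would take place.

```python
def to_unit_interval(x, a, c):
    """Affine map of x in [a, c] onto [0, 1]: y = (x - a)/(c - a)."""
    return (x - a) / (c - a)

def from_unit_interval(y, a, c):
    """Inverse affine map of y in [0, 1] back onto [a, c]."""
    return a + y * (c - a)

# Months 1..12 mapped onto [0, 1] before fitting a Beta model
months = [1, 4, 6, 6, 7, 9, 12]
y = [to_unit_interval(m, 1, 12) for m in months]
```

The endpoints map to exactly 0 and 1, and the inverse map recovers the original values.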

Bayesian inference places great importance on the Beta Distribution as a means to describe information contained in a proportion or probability. This is useful for describing one’s opinion about the probability of an event occurring including uncertainty, also called a prior distribution.

The Beta also appears in a project/program management technique called PERT, which leads to a particular but different parameterization of the Beta Distribution being called the PERT Distribution.

The Beta Distribution has two parameters, α and β, which are both shape parameters. Both α and β are strictly positive (α, β > 0). However, a case that sometimes appears in Bayesian statistics as a prior is when α = β = 0, which is an improper (degenerate) distribution sometimes referred to as the Haldane prior. When α > β, the distribution is left-skewed, when α = β the distribution is symmetrical, and when α < β the distribution is right-skewed. In the case of α = β = 1, the distribution reduces to the Uniform Distribution.

The Beta Distribution uses the Probability Density Function and Cumulative Distribution Function (the Quantile Function has no closed form) as shown in Table 4. Both parameters of the distribution control all of its moments. Fitting the Beta Distribution to a sample using the Method of Moments requires solution of a system of equations. The first three moments of the Beta Distribution are shown in Table 5.

Table 4. Beta Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \alpha, \beta)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}
where B(∙,∙) is the Beta Function.

Cumulative Distribution Function

F(x \mid \alpha, \beta)=\frac{\int_{0}^{x} t^{\alpha-1}(1-t)^{\beta-1} d t}{B(\alpha, \beta)}
where B(∙,∙) is the Beta Function.
The numerator is the Incomplete Beta Function.

Quantile Function

No closed form.

Table 5. Beta Distribution Moments.

Mean

E[X]=\frac{\alpha}{\alpha+\beta}

Variance

\operatorname{Var}[X]=\frac{\alpha \beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}

Skewness

\operatorname{Skew}[X]=\frac{2(\beta-\alpha) \sqrt{\alpha+\beta+1}}{(\alpha+\beta+2) \sqrt{\alpha \beta}}
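The system of equations mentioned above can be solved in closed form using the first two moments in Table 5; the sketch below (function name illustrative, NumPy assumed available) inverts the mean and variance expressions for α and β.

```python
import numpy as np

def fit_beta_mom(data):
    """Method-of-moments fit for the Beta(alpha, beta) distribution on [0, 1].

    Inverts the Table 5 relations
        m = alpha/(alpha + beta)
        v = alpha*beta / ((alpha + beta)**2 * (alpha + beta + 1))
    for the two shape parameters.
    """
    x = np.asarray(data, dtype=float)
    m, v = x.mean(), x.var(ddof=1)
    s = m * (1.0 - m) / v - 1.0   # alpha + beta
    return m * s, (1.0 - m) * s   # (alpha, beta)

rng = np.random.default_rng(2)
sample = rng.beta(2.0, 5.0, size=50000)
alpha_hat, beta_hat = fit_beta_mom(sample)
```

For a large simulated sample, the fitted shape parameters should fall close to the generating values.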

4-Parameter Beta Distribution

The 4-Parameter Beta Distribution is identical to the Beta Distribution above, except that it is defined over an arbitrary interval [a, c] and the interval endpoints a and c are estimated from the data. It is also sometimes called the Pearson type I Distribution. Because this distribution has four parameters, any moments-based parameter estimation method requires the fourth moment (kurtosis). Population kurtosis is exceedingly difficult to estimate from a small sample, even more so than skew (the third moment). Thus, if the user already knows the data endpoints, using the 2-parameter Beta Distribution with an affine transformation of the data may produce better results.

The 4-Parameter Beta Distribution uses the Probability Density Function and Cumulative Distribution Function (the Quantile Function has no closed form) as shown in Table 6. Fitting the 4-Parameter Beta Distribution to a sample using the Method of Moments requires solution of a system of equations that includes the fourth moment (kurtosis). The first three moments are contained within Table 7.

Table 6. 4-Parameter Beta Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \alpha, \beta, a, c)=\frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha, \beta)}
Where
B(∙,∙) is the Beta Function
y=\frac{x-a}{c-a} is the affine transformation.

Cumulative Distribution Function

F(x \mid \alpha, \beta, a, c)=\frac{\int_{0}^{y} t^{\alpha-1}(1-t)^{\beta-1} d t}{B(\alpha, \beta)}
The numerator is the Incomplete Beta Function.

Quantile Function

No closed form.


Table 7. 4-Parameter Beta Distribution Moments.

Mean

E[X]=\frac{\alpha c+\beta a}{\alpha+\beta}

Variance

\operatorname{Var}[X]=\frac{\alpha \beta(c-a)^{2}}{(\alpha+\beta)^{2}(\alpha+\beta+1)}

Skewness

\operatorname{Skew}[X]=\frac{2(\beta-\alpha) \sqrt{\alpha+\beta+1}}{(\alpha+\beta+2) \sqrt{\alpha \beta}}

Empirical Distribution

The Empirical Distribution is a non-parametric method for estimating the Cumulative Distribution Function of a set of data. It should be noted that this distribution is not an analytical distribution. However, it is included here for convenience.

In the literature, the Empirical Distribution is sometimes called the Empirical Cumulative Distribution Function, abbreviated eCDF. Construction of an eCDF assumes that all observations are equally likely and assigns a probability to each based on the sample size of the dataset. A traditional choice is to assign a probability of \frac{1}{n} to each observation, where n is the sample size. Then, the eCDF at each observation is the discrete sum F(x)=\sum_{i=1}^{j}\frac{1}{n}, where j is the rank of the data (ascending, j = 1 is the smallest), which simplifies to F(x)=\frac{j}{n}. This results in the largest observation (j = n) having a non-exceedance probability of 1. In most cases, it is not reasonable to assume that the largest observation on record is the population maximum.

A compromise is to use the plotting position formula p_{j}=\frac{j-A}{n+1-A-B}, where A and B are user-selected constants with specific motivations; see Table 18.3.1 of the Handbook of Hydrology for several examples (Maidment, 1993). The Distribution Fitting Analysis includes three commonly-used default values: A = B = 0 is the Weibull plotting position, A = B = 0.3175 is the Median plotting position, and A = B = 0.5 is the Hazen plotting position. Other defensible choices depending on use include the Gringorten plotting position (A = B = 0.44), the Blom plotting position (A = B = 0.375), and the Cunnane plotting position (A = B = 0.4). Note that this is not an exhaustive list.

Also note that, by default, plotting positions (and other values) are output using exceedance probability, with rank defined descending (j = 1 is the largest), which corresponds to the Complementary Cumulative Distribution Function, also known as the Survival Function.
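The plotting position formula (j − A)/(n + 1 − A − B) is simple to implement; the sketch below (function name illustrative, NumPy assumed available) computes non-exceedance plotting positions for the built-in Weibull and Hazen constants.

```python
import numpy as np

def plotting_positions(data, A=0.0, B=0.0):
    """Non-exceedance plotting positions (j - A)/(n + 1 - A - B).

    j is the ascending rank (j = 1 for the smallest value). A = B = 0
    gives the Weibull formula, A = B = 0.3175 the Median formula, and
    A = B = 0.5 the Hazen formula.
    """
    x = np.asarray(data, dtype=float)
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1   # ascending rank of each value
    return (ranks - A) / (n + 1.0 - A - B)

flows = [30.0, 10.0, 20.0]
p_weibull = plotting_positions(flows)              # j/(n + 1)
p_hazen = plotting_positions(flows, A=0.5, B=0.5)  # (j - 0.5)/n
```

For n = 3, the Weibull positions are 1/4, 2/4, 3/4 and the Hazen positions are 1/6, 3/6, 5/6.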

Exponential Distribution

The Exponential Distribution is a one-parameter, positively-skewed distribution with semi-infinite continuous support for all non-negative real numbers; x ∈ [0, ∞).

Due to its one parameter and fixed positive skewness of 2, the Exponential is not a flexible distribution. However, for particular kinds of modeling, the Exponential possesses a property that no other continuous distribution does, a property called "memorylessness". This property can be very important in the fields of survival analysis, reliability analysis, and stochastic processes. For applications in hydrology, this is less important. For non-negative, highly skewed variables, there is no simpler or more parsimonious model than the Exponential Distribution.
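Memorylessness means that P(X > s + t | X > s) = P(X > t): having already waited s without an event does not change the distribution of the remaining wait. A quick numerical check using the closed-form survival function (function name illustrative):

```python
import math

def expon_sf(x, beta):
    """Survival function P(X > x) of the Exponential with scale beta."""
    return math.exp(-x / beta)

beta, s, t = 2.0, 1.5, 3.0
# P(X > s + t | X > s) = SF(s + t)/SF(s); memorylessness says this equals SF(t)
conditional = expon_sf(s + t, beta) / expon_sf(s, beta)
unconditional = expon_sf(t, beta)
```

Because exp(−(s + t)/β)/exp(−s/β) = exp(−t/β), the two quantities agree exactly.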

Some notational ambiguity may arise with the Exponential Distribution as it is commonly parameterized in two different ways; the first is with a rate parameter λ; the second with a scale parameter β. It is often more convenient to parameterize the distribution in terms of the scale parameter β as it is equal to the mean and the standard deviation of the distribution. It is also the specification commonly used in survival analysis and actuarial science. The λ (rate) parameterization is more common in stochastic processes, and naturally relates to the Poisson distribution. The parameterization used in HEC-SSP is the β (scale) parameterization. β is strictly positive.

The Exponential (β) Distribution uses the Probability Density Function, Cumulative Distribution Function, and Quantile Function shown in Table 8. The moments for this distribution are simple in terms of the parameters, as shown in Table 9.

Table 8. Exponential Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \beta)=\beta^{-1}exp(-\frac{x}{\beta}), \quad x\ge0

Cumulative Distribution Function

F(x \mid \beta)=1-exp(-\frac{x}{\beta}), \quad x\ge0

Quantile Function

F^{-1}(p \mid \beta)=-\beta ln(1-p)

Table 9. Exponential Distribution Moments.

Mean

E[X]=\beta

Variance

Var[X]=\beta^{2}

Skewness

Skew[X]=2

Shifted Exponential Distribution

The Shifted Exponential Distribution is a two-parameter, positively-skewed distribution with semi-infinite continuous support with a defined lower bound; x ∈ [τ, ∞).

By adding a second (location) parameter to the Exponential Distribution, the lower bound of the distribution can be non-zero. The shift changes the mean of the distribution but leaves the higher moments unchanged. The Shifted Exponential is also the special case of the Generalized Pareto Distribution in which the shape parameter (κ) is equal to zero. Adding a shift makes this distribution useful for modeling highly positively skewed data that have a non-zero lower bound. The location parameter can take on any real value; τ ∈ (−∞, ∞).

The same notational ambiguity arises with the Shifted Exponential Distribution as it does with the conventional Exponential Distribution; see the Exponential Distribution section for discussion regarding parameterization. The HEC-SSP implementation of the Shifted Exponential also uses the scale parameter convention.

The Shifted Exponential (β) Distribution uses the Probability Density Function, Cumulative Distribution Function, and Quantile Function shown in Table 10. The moments for this distribution are simple in terms of the parameters, as shown in Table 11.

Table 10. Shifted Exponential Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \beta , \tau)=\beta^{-1}exp(-\frac{x-\tau}{\beta}), \quad x\ge\tau

Cumulative Distribution Function

F(x \mid \beta , \tau)=1-exp(-\frac{x-\tau}{\beta}), \quad x\ge\tau

Quantile Function

F^{-1}(p \mid \beta , \tau)=\tau -\beta ln(1-p)

Table 11. Shifted Exponential Distribution Moments.

Mean

E[X]=\beta + \tau

Variance

Var[X]=\beta^{2}

Skewness

Skew[X]=2
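Because Table 11 gives E[X] = β + τ and Var[X] = β², a Method of Moments fit for the Shifted Exponential has a simple closed form: β is the sample standard deviation and τ is the sample mean minus β. A minimal sketch (function name illustrative, NumPy assumed available):

```python
import numpy as np

def fit_shifted_exponential_mom(data):
    """Method-of-moments fit for the Shifted Exponential(beta, tau).

    From Table 11, E[X] = beta + tau and Var[X] = beta**2, so beta is
    the sample standard deviation and tau = mean - beta.
    """
    x = np.asarray(data, dtype=float)
    beta = x.std(ddof=1)
    tau = x.mean() - beta
    return beta, tau

rng = np.random.default_rng(3)
sample = 5.0 + rng.exponential(scale=2.0, size=50000)
beta_hat, tau_hat = fit_shifted_exponential_mom(sample)
```

With a large simulated sample, the fitted scale and shift should land near the generating values.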

Gamma Distribution

The Gamma Distribution is a two-parameter, positively-skewed distribution with support for all positive real numbers; x ∈ (0, ∞).

Flexible and strictly positive, the Gamma Distribution is a popular choice for modeling many kinds of data with mild positive skew and a lower endpoint of zero. An extension of the distribution with a non-zero lower endpoint is the Shifted Gamma Distribution. A re-parameterization of the Shifted Gamma that uses moments as parameters produces the Pearson type III Distribution.

There are two common parameterizations of the Gamma Distribution which leads to extensive notational ambiguity; HEC-SSP uses the shape-scale parameterization with shape κ and scale θ. Both κ and θ are strictly positive. For κ → ∞ the distribution converges to a Normal distribution. The alternative Gamma(α, β) or shape-rate parameterization is more common in Bayesian inference.

When κ is an integer, the distribution is known as the Erlang Distribution which is used extensively in stochastic processes and queuing theory; it is the distribution of the sum of κ independent Exponential Distributions each with scale parameter θ.
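The Erlang property above is easy to verify by simulation: summing κ independent Exponential(θ) draws should reproduce the Gamma(κ, θ) moments E[X] = κθ and Var[X] = κθ². A quick sketch (NumPy assumed available):

```python
import numpy as np

rng = np.random.default_rng(4)
kappa, theta = 4, 1.5   # integer shape -> Erlang

# Each row sums kappa independent Exponential(theta) draws
erlang_draws = rng.exponential(theta, size=(100_000, kappa)).sum(axis=1)

# Compare simulated moments with Gamma(kappa, theta): E = k*theta, Var = k*theta^2
sim_mean, sim_var = erlang_draws.mean(), erlang_draws.var()
```

The simulated mean and variance should approximate κθ = 6 and κθ² = 9 closely at this sample size.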

The Gamma(κ, θ) Distribution uses the Probability Density Function and Cumulative Distribution Function (the Quantile Function has no closed form) shown in Table 12. The first three moments of the Gamma(κ, θ) Distribution are shown in Table 13.

Table 12. Gamma Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \kappa , \theta)=\frac{x^{\kappa -1}exp(-\frac{x}{\theta})}{\Gamma(\kappa)\theta^{\kappa}}
where Γ(∙) is the complete gamma function.

Cumulative Distribution Function

F(x \mid \kappa, \theta)=\frac{\int_{0}^{\frac{x}{\theta}} t^{\kappa-1}exp(-t)dt}{\Gamma(\kappa)}
where Γ(∙) is the complete gamma function.
The numerator is the Lower Incomplete Gamma Function, sometimes written γ(∙,∙).

Quantile Function

No closed form.


Table 13. Gamma Distribution Moments.

Mean

E[X]=\kappa \theta

Variance

Var[X]=\kappa \theta^{2}

Skewness

Skew[X]=\frac{2}{\sqrt{\kappa}}

Shifted Gamma Distribution

The Shifted Gamma Distribution is a three-parameter distribution with continuous but variable support on the interval x ∈ [τ, ∞), where τ is the shift (location) parameter of the distribution. When τ = 0, the distribution reduces to the 2-parameter Gamma Distribution. If the Shifted Gamma Distribution is parameterized by its mean, standard deviation, and skew instead of location, scale, and shape, it is referred to as the Pearson type III Distribution.

The Shifted Gamma(κ, θ, τ) Distribution uses the Probability Density Function and Cumulative Distribution Function (the Quantile Function has no closed form) shown in Table 14. The first three moments of the Shifted Gamma(κ, θ, τ) Distribution are shown in Table 15.

Table 14. Shifted Gamma Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \kappa , \theta, \tau)=\frac{(x-\tau)^{\kappa -1}exp(-\frac{(x-\tau)}{\theta})}{\Gamma(\kappa)\theta^{\kappa}}
where Γ(∙) is the complete gamma function.

Cumulative Distribution Function

F(x \mid \kappa, \theta, \tau)=\frac{\int_{0}^{\frac{x-\tau}{\theta}} t^{\kappa-1}exp(-t)dt}{\Gamma(\kappa)}
where Γ(∙) is the complete gamma function.
The numerator is the Lower Incomplete Gamma Function, sometimes written γ(∙,∙).

Quantile Function

No closed form.


Table 15. Shifted Gamma Distribution Moments.

Mean

E[X]=\tau +\kappa \theta

Variance

Var[X]=\kappa \theta ^{2}

Skewness

Skew[X]=\frac{2}{\sqrt{\kappa}}
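Because the Table 15 moments can be inverted, a Method of Moments fit for the Shifted Gamma (the Pearson type III form) follows from the sample mean, standard deviation, and skew. The sketch below (function name illustrative, NumPy assumed available) uses a simple biased sample skew for illustration and omits small-sample bias corrections; it assumes positively skewed data.

```python
import numpy as np

def fit_shifted_gamma_mom(data):
    """Method-of-moments fit for the Shifted Gamma (Pearson type III form).

    Inverts Table 15 for positively skewed data:
        Skew = 2/sqrt(kappa)     ->  kappa = (2/skew)**2
        Var  = kappa*theta**2    ->  theta = std/sqrt(kappa)
        E[X] = tau + kappa*theta ->  tau = mean - kappa*theta
    """
    x = np.asarray(data, dtype=float)
    m, s = x.mean(), x.std()
    g = np.mean(((x - m) / s) ** 3)   # simple (biased) sample skew
    kappa = (2.0 / g) ** 2
    theta = s / np.sqrt(kappa)
    tau = m - kappa * theta
    return kappa, theta, tau

rng = np.random.default_rng(5)
sample = 10.0 + rng.gamma(4.0, 2.0, size=200000)
kappa_hat, theta_hat, tau_hat = fit_shifted_gamma_mom(sample)
```

Note that sample skew is the hardest of the three moments to estimate, so the shape estimate carries the most uncertainty.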

Hosking's Location-Scale-Shape Family

The following three distributions (Generalized Extreme Value, Generalized Logistic, and Generalized Pareto) belong to a family of three-parameter distributions with a common parameterization scheme devised by Hosking and Wallis (Hosking & Wallis, 1997). Each has location ξ, scale α and shape κ. It is an unwritten rule in mathematics texts that the Greek letter "xi" (ξ), the most difficult to write and to pronounce, should appear at least once (Klugman, Panjer, & Willmot, 2012). The shape parameter κ is important for determining the behavior of the distributions, including their boundedness and the weight of the right tail. It also indicates which other probability distribution has been subsumed by the generalized distribution. Occasionally, these distributions will appear in literature with other parameterizations but the same name, so it is important to check which parameterization is being used.

Generalized Extreme Value Distribution

The Generalized Extreme Value (GEV) Distribution is a flexible three-parameter distribution which subsumes the three extreme-value distributions: the Gumbel (extreme-value type I), Fréchet (extreme-value type II) and Weibull (extreme-value type III) distributions. Which distribution it represents is dependent on the value of the shape parameter κ (kappa).

Precipitation frequency analysis most frequently employs the GEV distribution, or sometimes one of the other members of this location-scale-shape family. In many nations other than the United States, the GEV is the model of choice for flood frequency analysis. Indeed, the GEV distribution is directly derived by taking the maximum of repeated independent samples from a homogeneous population.

The GEV distribution arises because of the Fisher-Tippett-Gnedenko Theorem, also known as the First Extreme Value Theorem. It states that the distribution of the maximum of repeated samples from a homogeneous population converges in distribution to one of three distributions: extreme value I, II, or III. The extreme value distribution to which the maxima converge depends on the tail behavior of the population from which the samples are drawn. Roughly, maxima drawn from thin-tailed or upper-bounded distributions tend to be in the Weibull (extreme-value III) domain of attraction; maxima drawn from exponential-tailed populations tend to be in the Gumbel (extreme-value I) domain of attraction; and maxima drawn from thick- or heavy-tailed populations tend to be in the Fréchet (extreme-value II) domain of attraction. In practice, the convergence may occur very slowly, requiring many repeated samples from the population for the limiting extreme-value distribution to become apparent.

The convention used in HEC-SSP is that the κ < 0 case is for the extreme-value type II distribution with fixed lower bound and no upper bound and κ > 0 is the extreme-value type III distribution with no lower bound and fixed upper bound. Other sources may adopt the opposite convention, where a negative value for the shape parameter implies an upper bound and the type III extreme value distribution. It is important to check which parameterization is being used.

Among this location-scale-shape family, the GEV can be recognized as the double-exponential form. The double-exponential nature is apparent when inspecting its distribution function. In fact, another name for the Gumbel (extreme-value type I) Distribution is the double-exponential distribution. The Probability Density Function, Cumulative Distribution Function, and Quantile Function for the GEV distribution are shown in Table 16. The first three moments of the GEV distribution are shown in Table 17. It may be noted that the practical range of kappa is limited by the possibility of encountering infinite moments.

Table 16. GEV Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \xi, \alpha, \kappa)=\alpha^{-1} \exp (-(1-\kappa) y-\exp (-y))
where
y=\left\{\begin{array}-\kappa^{-1} \ln \left[1-\frac{\kappa(x-\xi)}{\alpha}\right], \kappa \neq 0 \\ \frac{(x-\xi)}{\alpha}, \kappa=0\end{array}\right.

Cumulative Distribution Function

F(x \mid \xi, \alpha, \kappa)=\exp (-\exp (-y))
where
y=\left\{\begin{array}-\kappa^{-1} \ln \left[1-\frac{\kappa(x-\xi)}{\alpha}\right], \kappa \neq 0 \\ \frac{(x-\xi)}{\alpha}, \kappa=0\end{array}\right.

Quantile Function

F^{-1}(p \mid \xi, \alpha, \kappa)=\left\{\begin{array}\xi+\frac{\alpha\left[1-(-\ln (p))^{\kappa}\right]}{\kappa}, \kappa \neq 0 \\ \xi-\alpha \ln (-\ln (p)), \kappa=0\end{array}\right.

Table 17. GEV Distribution Moments.

Mean

E[X]=\left\{\begin{array}\xi+\alpha \kappa^{-1}\left(g_{1}-1\right), \kappa \neq 0, \kappa>-1 \\ \xi+\alpha \gamma, \kappa=0 \\ \infty, \kappa \leq-1\end{array}\right.
where
g_{r}=\Gamma(1-r \kappa)
Γ(∙) is the gamma function.
γ=0.57721…
γ is the Euler-Mascheroni constant.

Variance

\operatorname{Var}[X]=\left\{\begin{array}\alpha^{2} \kappa^{-2}\left(g_{2}-g_{1}^{2}\right), \kappa \neq 0, \kappa>-\frac{1}{2} \\ \alpha^{2} \frac{\pi^{2}}{6}, \kappa=0 \\ \infty, \kappa \leq-\frac{1}{2}\end{array}\right.
where
π=3.14159…

Skewness

\operatorname{Skew}[X]=\left\{\begin{array}\operatorname{sgn}(\kappa) \frac{g_{3}-3 g_{2} g_{1}+2 g_{1}^{3}}{\left(g_{2}-g_{1}^{2}\right)^{\frac{3}{2}}}, \kappa \neq 0, \kappa>-\frac{1}{3} \\ \frac{12 \sqrt{6} \zeta(3)}{\pi^{3}} \approx 1.14, \kappa=0 \\ \infty, \kappa \leq-\frac{1}{3}\end{array}\right.
where
ζ(∙) is the Riemann zeta function.
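The GEV Quantile Function in Table 16 is straightforward to evaluate directly; the sketch below (function name illustrative) implements both branches and illustrates that the κ = 0 (Gumbel) case is the continuous limit of the κ ≠ 0 case.

```python
import math

def gev_quantile(p, xi, alpha, kappa):
    """Quantile function of the GEV in the Hosking parameterization (Table 16).

    kappa = 0 gives the Gumbel case xi - alpha*ln(-ln p); otherwise
    xi + alpha*(1 - (-ln p)**kappa)/kappa.
    """
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + alpha * (1.0 - (-math.log(p)) ** kappa) / kappa
```

For a very small |κ|, the two branches agree to high precision, which is why the κ = 0 case can be written as a separate limiting expression.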

Generalized Logistic Distribution

The Generalized Logistic Distribution (GLO) is a heavy-tailed probability distribution with support dependent on the shape parameter κ (kappa). Positive κ implies an upper bounded distribution while negative κ implies a lower bound, with the κ = 0 case being unbounded. The κ = 0 case is the Logistic Distribution, which is symmetrical.

Due to the heavy-tailed nature of the GLO, it is useful for modeling data where extreme deviations from the mean are to be expected more often than under other distributions. The κ = 0 case (Logistic Distribution) is often used as a heavy-tailed alternative to the Normal Distribution. GLO most frequently appears in precipitation frequency analysis alongside the GEV and GPA.

There are many distributions referred to as the "Generalized Logistic Distribution"; this is one version of it. As implemented in HEC-SSP, it is meant to align with the definitions of the Generalized Extreme Value and Generalized Pareto Distributions as generalized by Hosking and Wallis (1997), with location ξ, scale α, and shape κ.

The Probability Density Function, Cumulative Distribution Function, and Quantile Function are shown in Table 18. The logit function \ln(\frac{p}{1-p}) is at the heart of the Quantile Function, which is the source of the distribution's name. The first three moments of the GLO are shown in Table 19.

Table 18. GLO Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \xi, \alpha, \kappa)=\frac{\alpha^{-1} \exp (-(1-\kappa) y)}{(1+\exp (-y))^{2}}
where
y=\left\{\begin{array}-\kappa^{-1} \ln \left[1-\frac{\kappa(x-\xi)}{\alpha}\right], \kappa \neq 0 \\ \frac{(x-\xi)}{\alpha}, \kappa=0\end{array}\right.

Cumulative Distribution Function

F(x \mid \xi, \alpha, \kappa)=(1+\exp (-y))^{-1}
where
y=\left\{\begin{array}-\kappa^{-1} \ln \left[1-\frac{\kappa(x-\xi)}{\alpha}\right], \kappa \neq 0 \\ \frac{(x-\xi)}{\alpha}, \kappa=0\end{array}\right.

Quantile Function

F^{-1}(p \mid \xi, \alpha, \kappa)=\left\{\begin{array}\xi+\frac{\alpha\left[1-\left(\frac{1-p}{p}\right)^{\kappa}\right]}{\kappa}, \kappa \neq 0 \\ \xi+\alpha \ln \left(\frac{p}{1-p}\right), \kappa=0\end{array}\right.

Table 19. GLO Distribution Moments.

Mean

E[X]=\left\{\begin{array}\xi+\alpha \kappa^{-1}\left(1-h_{1}\right), \kappa \neq 0, \kappa>-1 \\ {\xi}, \kappa=0 \\ \infty, \kappa \leq-1\end{array}\right.
where
h_{r}=\frac{r \pi \kappa}{\sin (r \pi \kappa)}
π=3.14159…

Variance

\operatorname{Var}[X]=\left\{\begin{array}\alpha^{2} \kappa^{-2}\left(1-2 h_{1}+h_{2}\right), \kappa \neq 0, \kappa>-\frac{1}{2} \\ \frac{\alpha^{2} \pi^{2}}{3}, \kappa=0 \\ \infty, \kappa \leq-\frac{1}{2}\end{array}\right.

Skewness

\operatorname{Skew}[X]=\left\{\begin{array}\operatorname{sgn}(\kappa) \frac{1-3 h_{1}+3 h_{2}-h_{3}}{\left(1-2 h_{1}+h_{2}\right)^{\frac{3}{2}}}, \kappa \neq 0, \kappa>-\frac{1}{3} \\ 0, \kappa=0 \\ \infty, \kappa \leq-\frac{1}{3}\end{array}\right.
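The GLO Quantile Function, obtained by inverting the Cumulative Distribution Function in Table 18, can be evaluated directly; the sketch below (function name illustrative) implements both branches. A useful property visible in the code is that the median is ξ for any κ, since the logit of 0.5 is zero.

```python
import math

def glo_quantile(p, xi, alpha, kappa):
    """Quantile function of the Generalized Logistic (Hosking parameterization).

    kappa = 0 is the ordinary Logistic case, whose quantile is the logit:
    xi + alpha*ln(p/(1 - p)).
    """
    if kappa == 0.0:
        return xi + alpha * math.log(p / (1.0 - p))
    return xi + alpha * (1.0 - ((1.0 - p) / p) ** kappa) / kappa
```

The function is increasing in p for either sign of κ, as any quantile function must be.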

Generalized Pareto Distribution

The Generalized Pareto Distribution (GPA, for consistency with Hosking's notation; sometimes GPD elsewhere) is a flexible three-parameter probability distribution with fixed lower bound and location parameter ξ (xi). The distribution may also be upper bounded when the shape parameter κ (kappa) is positive. When κ is zero, the distribution reduces to a shifted exponential distribution, which makes it useful for modeling exponential variates with lower bound ξ instead of zero.

A feature of the GPA is the control provided by the location parameter ξ, which determines the lower bound of the distribution. In some applications, the GPA is treated as a two-parameter distribution where ξ is known and specified by the user. Alternatively, it may be estimated from the data. ξ may be referred to as the "threshold" for the GPA, which comes from the derivation of the distribution.

The GPA arises as a result of the Pickands-Balkema-de Haan Theorem, also known as the Second Extreme Value Theorem. The theorem states that subsamples exceeding a sufficiently high threshold from repeated samples of a homogeneous population converge in distribution to the GPA Distribution. In other words, if repeated independent samples are taken from a population, and only the values in those samples that are greater than a selected value are retained, then those retained values will follow the GPA distribution.

In hydrology, the GPA allows for straightforward analysis of partial duration series (peaks-over-threshold) flood series. Rainfall frequency analysis frequently makes use of the GPA, especially in cases where only rainfall maxima exceeding a selected threshold are retained. It also gets extensive use in actuarial science as a model for payouts when claims exceed a deductible.

The form of the Probability Density, Cumulative Distribution, and Quantile functions are functionally very similar for the GEV, GLO, and GPA distributions (by design) and are shown in Table 20. The GPA uses the first three moments shown in Table 21.

Table 20. GPA Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \xi, \alpha, \kappa)=\alpha^{-1} \exp (-(1-\kappa) y)
where
y=\left\{\begin{array}-\kappa^{-1} \ln \left[1-\frac{\kappa(x-\xi)}{\alpha}\right], \kappa \neq 0 \\ \frac{(x-\xi)}{\alpha}, \kappa=0\end{array}\right.

Cumulative Distribution Function

F(x \mid \xi, \alpha, \kappa)=1-\exp (-y)
where
y=\left\{\begin{array}-\kappa^{-1} \ln \left[1-\frac{\kappa(x-\xi)}{\alpha}\right], \kappa \neq 0 \\ \frac{(x-\xi)}{\alpha}, \kappa=0\end{array}\right.

Quantile Function

F^{-1}(p \mid \xi, \alpha, \kappa)=\left\{\begin{array}\xi+\frac{\alpha\left[1-(1-p)^{\kappa}\right]}{\kappa}, \kappa \neq 0 \\ \xi-\alpha \ln (1-p), \kappa=0\end{array}\right.

Table 21. GPA Distribution Moments.

Mean

E[X]=\left\{\begin{array}\xi+\alpha(1+\kappa)^{-1}, \kappa>-1 \\ \infty, \kappa \leq-1\end{array}\right.

Variance

\operatorname{Var}[X]=\left\{\begin{array}\frac{\alpha^{2}}{(1+\kappa)^{2}(1+2 \kappa)}, \kappa>-\frac{1}{2} \\ \infty, \kappa \leq-\frac{1}{2}\end{array}\right.

Skewness

\operatorname{Skew}[X]=\left\{\begin{array}\frac{2(1-\kappa) \sqrt{1+2 \kappa}}{(1+3 \kappa)}, \kappa>-\frac{1}{3} \\ \infty, \kappa \leq-\frac{1}{3}\end{array}\right.
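The GPA Quantile Function in Table 20 can be evaluated directly; the sketch below (function name illustrative) shows the κ = 0 branch reducing to the Shifted Exponential quantile τ − β ln(1 − p) with τ = ξ and β = α, the lower bound at p = 0, and the finite upper bound ξ + α/κ when κ > 0.

```python
import math

def gpa_quantile(p, xi, alpha, kappa):
    """Quantile function of the Generalized Pareto (Table 20).

    kappa = 0 reduces to the Shifted Exponential with scale alpha and
    lower bound xi; kappa > 0 gives an upper bound of xi + alpha/kappa.
    """
    if kappa == 0.0:
        return xi - alpha * math.log(1.0 - p)
    return xi + alpha * (1.0 - (1.0 - p) ** kappa) / kappa
```

This is the form used when converting fitted parameters into peaks-over-threshold flood quantiles.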

Gumbel Distribution

The Gumbel, or Extreme-Value type I, Distribution is a two-parameter distribution with continuous infinite support and fixed positive skewness ≈ 1.14. The Generalized Extreme Value (GEV) distribution converges to the Gumbel Distribution as its shape parameter approaches zero. Use of the Gumbel Distribution instead of the GEV is often at the user's discretion. For data with a small sample size and a fitted GEV κ close to zero, the Gumbel is a sensible and parsimonious choice. The Gumbel is a particularly safe choice when modeling maxima of an exponential-tailed population, as those populations tend to lie within the extreme-value I maximum domain of attraction. For the Probability Density Function, Cumulative Distribution Function, Quantile Function, and moments, see the GEV Distribution with κ = 0.

Logistic Distribution

The Logistic Distribution is a two-parameter distribution with continuous infinite support. It is symmetrical and nearly bell-shaped like the Normal Distribution, with heavier tails; it has fixed excess kurtosis = 1.2 (Normal excess kurtosis ≡ 0).

Historically, the Logistic Distribution was sometimes used in place of the Normal Distribution for computational convenience: neither the Cumulative Distribution Function nor the Quantile Function of the Normal Distribution has a closed form, while both do for the Logistic. For non-extreme values of the distributions, inferences made from the Logistic are close to those of the Normal. However, when continuing into the tails of the distributions, the difference between the two can be profound. The Logistic Distribution's primary use in modern statistics is as the random error term in logistic regression.
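The tail divergence can be illustrated by scaling a Logistic to unit variance (s = √3/π, since Var = s²π²/3) and comparing its quantiles against the Standard Normal; this is a sketch, not HEC-SSP output:

```python
import math
from statistics import NormalDist

def logistic_quantile(p, mu, s):
    # Logistic quantile (logit) function: mu + s * ln(p / (1 - p))
    return mu + s * math.log(p / (1.0 - p))

# Scale the Logistic to unit variance so it is comparable to N(0, 1)
s_unit = math.sqrt(3.0) / math.pi
z = NormalDist()  # standard Normal

for p in (0.75, 0.99, 0.9999):
    # The two quantiles agree near the center and diverge in the tail
    print(p, round(logistic_quantile(p, 0.0, s_unit), 4), round(z.inv_cdf(p), 4))
```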

The Logistic Distribution's two parameters are μ (location) and s (scale). It uses the Probability Density Function, Cumulative Distribution Function, and Quantile Function shown in Table 22. The logit function appears in the Quantile Function. The first three moments of the Logistic Distribution are found in Table 23.

Table 22. Logistic Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid \mu, s)=\frac{\exp\left(-\frac{x-\mu}{s}\right)}{s\left(1+\exp\left(-\frac{x-\mu}{s}\right)\right)^{2}}

Cumulative Distribution Function

F(x \mid \mu, s)=\left(1+\exp\left(-\frac{x-\mu}{s}\right)\right)^{-1}

Quantile Function

F^{-1}(p \mid \mu, s)=\mu+s \ln\left(\frac{p}{1-p}\right)

Table 23. Logistic Distribution Moments.

Mean

E[X]=\mu

Variance

Var[X]=\frac{s^{2}\pi^{2}}{3}

Skewness

Skew[X]=0

Log-Logistic Distribution

The Log-Logistic Distribution is a two-parameter positively-skewed distribution that describes random variables whose logarithm is Logistic-distributed. It has continuous support for all non-negative real numbers. It is similar to the Log-Normal Distribution, except it has heavier tails. When used in econometrics it is referred to as the Fisk Distribution and used to model the distribution of income.

HEC-SSP reports the parameters of the Log-Logistic Distribution as the location and scale parameters μ and s of a Logistic Distribution fit to the natural logarithm of the input dataset. Taking y = ln(x), the Probability Density Function, Cumulative Distribution Function, and Quantile Function of the Log-Logistic Distribution are shown in Table 24. The most straightforward way to obtain the real-space moments of the Log-Logistic Distribution is to use the transform \alpha = exp(\mu), \beta =\frac{1}{s}, which changes the parameterization to scale-shape, and then compute the moments as shown in Table 25, noting that a moment exists only if its order is less than β.
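A sketch of that transform and the resulting mean and variance (per Table 25), with illustrative function names; csc(u) is computed as 1/sin(u):

```python
import math

def loglogistic_moments(mu, s):
    """Real-space mean and variance of a Log-Logistic whose ln(x) is
    Logistic(mu, s), via alpha = exp(mu), beta = 1/s.
    A moment of order k exists only when beta > k."""
    alpha, beta = math.exp(mu), 1.0 / s
    t = math.pi / beta
    mean = alpha * t / math.sin(t) if beta > 1 else math.inf
    if beta > 2:
        # Var = alpha^2 * (2t*csc(2t) - (t*csc(t))^2)
        var = alpha**2 * (2.0 * t / math.sin(2.0 * t) - (t / math.sin(t)) ** 2)
    else:
        var = math.inf
    return mean, var
```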

Table 24. Log-Logistic Density, Distribution, and Quantile Functions.

Probability Density Function

f(y \mid \mu, s)=\frac{\exp\left(-\frac{y-\mu}{s}\right)}{s\left(1+\exp\left(-\frac{y-\mu}{s}\right)\right)^{2}}

Cumulative Distribution Function

F(y \mid \mu, s)=\left(1+\exp\left(-\frac{y-\mu}{s}\right)\right)^{-1}

Quantile Function

F^{-1}(p \mid \mu, s)=\mu+s \ln\left(\frac{p}{1-p}\right)


Table 25. Log-Logistic Distribution Moments.

Mean

E[X]=\frac{\pi \alpha}{\beta}\csc\left(\frac{\pi}{\beta}\right)
where
π=3.14159…
csc(∙) is the cosecant function.

Variance

Var[X]=\frac{2\pi \alpha^{2}}{\beta}\csc\left(\frac{2\pi}{\beta}\right)-\frac{\pi^{2}\alpha^{2}}{\beta^{2}}\csc\left(\frac{\pi}{\beta}\right)^{2}

Skewness

Skew[X]=\frac{2\pi^{3}\alpha^{3}}{\beta^{3}}\csc\left(\frac{\pi}{\beta}\right)^{3}-\frac{6\pi^{2}\alpha^{3}}{\beta^{2}}\csc\left(\frac{\pi}{\beta}\right)\csc\left(\frac{2\pi}{\beta}\right)+\frac{3\pi\alpha^{3}}{\beta}\csc\left(\frac{3\pi}{\beta}\right)

Normal Distribution

The ubiquitous Normal Distribution is a two-parameter symmetrical distribution with infinite support. It has two parameters, location μ ∈ ℝ and squared-scale \sigma^{2}>0, which also happen to be the first two moments of the distribution. For symmetrical data, the Normal Distribution is generally a user's first choice as a model and is reasonable for a wide range of applications. Note that the support is infinite, so for strictly positive data, inference using the Normal Distribution will result in non-zero probability assigned to negative-valued outcomes. When μ = 0 and σ² = 1, the distribution is said to be Standard Normal, usually notated as Z. Normally-distributed data with any mean and variance can be transformed to Standard Normal using the transform Z \equiv \frac{x-\mu}{\sigma}. The Probability Density Function, Cumulative Distribution Function, and Quantile Function of the Normal Distribution are shown in Table 26. The moments are straightforward and are shown in Table 27.
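The standardizing transform can be sketched with Python's standard-library `statistics.NormalDist`: a probability computed on the original scale matches the Standard Normal evaluated at the z-score. The numbers here are illustrative:

```python
from statistics import NormalDist

# Illustrative parameters: N(100, 15^2), evaluated at x = 130
mu, sigma, x = 100.0, 15.0, 130.0

# Standardize: Z = (x - mu) / sigma
z = (x - mu) / sigma

p_original = NormalDist(mu, sigma).cdf(x)  # CDF on the original scale
p_standard = NormalDist().cdf(z)           # NormalDist() is N(0, 1)
```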

Table 26. Normal Density, Distribution, and Quantile Functions.

Probability Density Function

f(x\mid \mu,\sigma^{2})=\frac{1}{\sqrt{2\pi\sigma^{2}}}exp(-\frac{(x-\mu)^{2}}{2\sigma^{2}})
π=3.14159…

Cumulative Distribution Function

No closed form.

Quantile Function

No closed form.

Table 27. Normal Distribution Moments.

Mean

E[X]=\mu

Variance

Var[X]=\sigma^{2}

Skewness

Skew[X]=0

Excess kurtosis is defined in terms of the Normal Distribution, which has a standard kurtosis of 3. Thus, excess kurtosis is the amount by which a distribution exceeds the kurtosis of the Normal Distribution. Probability distributions with positive excess kurtosis are heavier-tailed than the Normal Distribution and are called leptokurtic. Distributions with negative excess kurtosis are thinner-tailed than the Normal and are called platykurtic. Those with excess kurtosis close to zero are called mesokurtic.

Log-Normal Distribution

The Log-Normal Distribution is a two-parameter positively-skewed distribution that describes random variables whose logarithm is Normal-distributed. It has continuous support for all non-negative real numbers. For certain applications, a different base of logarithm may be desired, with the most common being base e (natural log) and base 10. The choices are functionally identical, and the formulas below are shown generically using the function logb(∙), where b is the base. The parameters μ and σ are not location and scale parameters for a Log-Normal distributed random variable; rather, they are the location and scale parameters of the Normal-distributed logarithm logb(x). The Probability Density Function, Cumulative Distribution Function, and Quantile Function of the Log-Normal Distribution are shown in Table 28. The moments are shown in Table 29.
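A sketch of a method-of-moments fit in log space, shown for the natural-log case (b = e), where the closed-form real-space mean exp(μ + σ²/2) is exact; the function names are illustrative:

```python
import math

def lognormal_fit_ln(data):
    """Estimate mu and sigma^2 as the sample mean and (population-style)
    variance of y = ln(x)."""
    y = [math.log(v) for v in data]
    n = len(y)
    mu = sum(y) / n
    var = sum((v - mu) ** 2 for v in y) / n
    return mu, var

def lognormal_mean(mu, var):
    # Real-space mean for b = e: exp(mu + sigma^2 / 2)
    return math.exp(mu + var / 2.0)
```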

Table 28. Log-Normal Density, Distribution, and Quantile Functions.

Probability Density Function

f\left(x \mid \mu, \sigma^{2}\right)=\frac{1}{x \sqrt{2 \pi \sigma^{2}}} b^{-\frac{\left(\log _{b}(x)-\mu\right)^{2}}{2 \sigma^{2}}}
where
π=3.14159…

Cumulative Distribution Function

No closed form.

Quantile Function

No closed form.

Table 29. Log-Normal Distribution Moments.

Mean

E[X]=b^{\mu+\frac{\sigma^{2}}{2}}

Variance

\operatorname{Var}[X]=\left[b^{\sigma^{2}}-1\right] b^{2 \mu+\sigma^{2}}

Skewness

\operatorname{Skew}[X]=\left(b^{\sigma^{2}}+2\right) \sqrt{b^{\sigma^{2}}-1}

Pearson Type III Distribution

The Pearson Type III Distribution is one of the seven distributions in the Pearson family. The Pearson Type III (PE3) is a flexible, three-parameter distribution with skewness controlled by its shape parameter γ (gamma). PE3 converges to a Normal Distribution as its shape parameter approaches zero. The distribution is an extension of the Gamma Distribution that gives explicit control over the model's support and, as a result, its symmetry. When the shape parameter is positive, the model is lower-bounded and right (positive) skewed; conversely, when it is negative, the model is upper-bounded and left (negative) skewed. As the shape parameter approaches zero, the distribution becomes more symmetrical and the lower (or upper) bound approaches negative infinity (or infinity) until the result is a Normal Distribution.

PE3 is parameterized by its moments: mean μ, standard deviation σ, and skewness γ. When γ = 0, the PE3 reduces to a Normal Distribution with mean μ and variance σ². For the Probability Density Function, Cumulative Distribution Function, and Quantile Function, use the following transforms and see the Shifted Gamma Distribution: \kappa=\frac{4}{\gamma^{2}}, \theta=\frac{\sigma \gamma}{2}, \tau=\mu-\frac{2 \sigma}{\gamma}.
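The transforms can be sanity-checked by confirming that the implied shifted-Gamma moments (mean = τ + κθ, variance = κθ², skewness = 2/√κ with the sign of θ) recover μ, σ, and γ; this sketch uses an illustrative function name:

```python
import math

def pe3_to_shifted_gamma(mu, sigma, gamma):
    """Map PE3 moment parameters (mu, sigma, gamma != 0) to shifted-Gamma
    shape kappa, scale theta, and shift tau."""
    kappa = 4.0 / gamma**2
    theta = sigma * gamma / 2.0
    tau = mu - 2.0 * sigma / gamma
    return kappa, theta, tau

# Illustrative parameters mu = 10, sigma = 2, gamma = 0.5
kappa, theta, tau = pe3_to_shifted_gamma(10.0, 2.0, 0.5)
```

With these values the transforms give κ = 16, θ = 0.5, τ = 2, whose implied mean, variance, and skewness are 10, 4, and 0.5, matching μ, σ², and γ.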

The distribution has been described occasionally in literature as a "skewed normal" distribution, but this terminology has been used to describe many generalizations or extensions of the normal distribution that allow for a third (shape) parameter. Its most common usage is for modeling annual maximum stream discharge, where it is applied to logarithmically transformed values and referred to as the Log-Pearson Type III Distribution. However, it may be used without first logarithmically transforming the data.

Log-Pearson Type III Distribution

The Log-Pearson Type III Distribution (LP3) is a flexible three-parameter distribution that models random variables whose logarithms are Pearson Type III-distributed. LP3 originally arose as a solution to fitting a model to annual maximum stream discharges that did not form a straight line on normal probability paper with a logarithmically transformed ordinate. Generally, LP3 is applied with the base-10 logarithm, and the moments are expressed as the mean, standard deviation, and skewness of the logarithmically-transformed data. After taking y = log10(x), find the mean, standard deviation, and skewness of the sample y and proceed to fitting the PE3 Distribution.
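The log-space moments can be sketched as follows, using simple (biased, population-style) moment estimates rather than the adjusted estimators of Bulletin 17; the function name is illustrative:

```python
import math

def lp3_log_moments(flows):
    """Mean, standard deviation, and skewness of y = log10(x),
    using simple (biased) moment estimates."""
    y = [math.log10(q) for q in flows]
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    sd = math.sqrt(var)
    skew = sum((v - mean) ** 3 for v in y) / n / sd**3
    return mean, sd, skew
```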

Note that Bulletin 17 procedures include additional considerations for differing types of data in the input, and should be thought of as a parameter estimation procedure similar to, but not the same as, the method of moments for the LP3 Distribution.

Uniform Distribution

The Uniform Distribution is a simple, symmetrical probability distribution that assigns equal probability density to all outcomes on the closed interval [A, B]. HEC-SSP includes the Uniform Distribution as a convenience, but it is rarely used in practice. Its most common usage is in Bayesian inference as a prior distribution when little to no information is known about the parameter. It has no mode and may also be called the Rectangular Distribution, contrasting the Triangular Distribution below. The most common parameterization is A = 0, B = 1, which is usually notated as U(0, 1) and referred to as the Standard Uniform Distribution. Through a property called the Probability Integral Transform, any random variable can be transformed to a Standard Uniform Distribution and vice versa. The consequence of this transform is that random samples can be drawn from any probability distribution with a Quantile Function by supplying samples from a U(0, 1) random variable as the argument p. Random number generation algorithms generally provide U(0, 1) random variables, and inverse transform sampling then allows for random samples of many probability distributions to be generated. The Probability Density Function, Cumulative Distribution Function, and Quantile Function of the Uniform Distribution are shown in Table 30. The moments of the Uniform Distribution are shown in Table 31.

Table 30. Uniform Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid A, B)=\frac{1}{B-A}

Cumulative Distribution Function

F(x \mid A, B)=\begin{cases}0, & x<A \\ \frac{x-A}{B-A}, & x \in[A, B) \\ 1, & x \geq B\end{cases}

Quantile Function

F^{-1}(p \mid A, B)=(B-A) p+A
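The inverse transform sampling described in the Uniform section can be sketched by pushing U(0, 1) draws through a Quantile Function; here the Logistic quantile (Table 22) is used for illustration, with a fixed seed for reproducibility:

```python
import math
import random

def logistic_quantile(p, mu=0.0, s=1.0):
    # Logistic quantile function: mu + s * ln(p / (1 - p))
    return mu + s * math.log(p / (1.0 - p))

# Inverse transform sampling: U(0, 1) draws become Logistic(0, 1) samples
random.seed(12345)
samples = [logistic_quantile(random.random()) for _ in range(10_000)]
sample_mean = sum(samples) / len(samples)  # should be near E[X] = mu = 0
```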

Table 31. Uniform Distribution Moments.

Mean

E[X]=\frac{A+B}{2}

Variance

Var[X]=\frac{(B-A)^2}{12}

Skewness

Skew[X]=0

Triangular Distribution

The Triangular Distribution is a continuous, unimodal distribution on the closed interval [A, B] generally used as a means for expressing subjective probability about outcomes that can be specified by a minimum, maximum and most likely value. It is an extension of the Uniform Distribution that adds a third parameter, 𝐶∈[𝐴,𝐵] which is the location of the mode of the distribution, noting that the mode can be at one of the endpoints. The Probability Density Function, Cumulative Distribution Function, and Quantile Function of the Triangular Distribution are shown in Table 32. The moments of the Triangular Distribution are shown in Table 33.

Table 32. Triangular Density, Distribution, and Quantile Functions.

Probability Density Function

f(x \mid A, B, C)=\begin{cases}\frac{2(x-A)}{(B-A)(C-A)}, & x \leq C \\ \frac{2(B-x)}{(B-A)(B-C)}, & x>C\end{cases}

Cumulative Distribution Function

F(x \mid A, B, C)=\begin{cases}\frac{(x-A)^{2}}{(B-A)(C-A)}, & x \leq C \\ 1-\frac{(B-x)^{2}}{(B-A)(B-C)}, & x>C\end{cases}

Quantile Function

F^{-1}(p \mid A, B, C)=\begin{cases}A+\sqrt{p(B-A)(C-A)}, & p \leq \frac{C-A}{B-A} \\ B-\sqrt{(1-p)(B-A)(B-C)}, & p>\frac{C-A}{B-A}\end{cases}
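A sketch of the Triangular Quantile Function, whose branch point is the CDF value at the mode, F(C) = (C − A)/(B − A); the function name is illustrative:

```python
import math

def triangular_quantile(p, A, B, C):
    """Triangular quantile per Table 32, branching on the CDF value at
    the mode C."""
    Fc = (C - A) / (B - A)
    if p <= Fc:
        return A + math.sqrt(p * (B - A) * (C - A))
    return B - math.sqrt((1.0 - p) * (B - A) * (B - C))
```

At p = F(C) the two branches agree and return the mode C, and p = 0 and p = 1 return the endpoints A and B.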


Table 33. Triangular Distribution Moments.

Mean

E[X]=\frac{A+B+C}{3}

Variance

\operatorname{Var}[X]=\frac{1}{18}\left(A^{2}-A(B+C)+B^{2}-B C+C^{2}\right)

Skewness

\operatorname{Skew}[X]=\frac{1}{270}(A+B-2 C)(2 A-B-C)(A-2 B+C)