The initial sample of all storm observations is known to be a mixture of samples drawn from two different populations (thunderstorms and non-thunderstorms). This means that the observations may not be identically distributed (the “ID” part of IID), which has consequences when a model of the population is fit to the data. In this last task, you will examine the difference between two approaches. The first fits an annual maximum series (AMS) model to the “all storms” dataset. The second fits an AMS model to each storm-type sub-sample and then combines those models into a single model using the Mixed Population Analysis (MPA) in HEC-SSP.

The Mixed Population Analysis computes the population frequency curve when the frequency curve of each part of the mixture is known. The computation takes a sequence of values of the random variable (for this example, wind speeds) and evaluates each of the mixture curves at each value. Then, using the probability of a union, it computes the probability that the wind speed is exceeded because of any part of the mixture. For our example, the annual exceedance probability of a wind speed is the probability that it is exceeded by non-thunderstorm OR thunderstorm causes, which is greater than the exceedance probability from either individual curve. Writing $P(x) = 1 - F(x)$ for the exceedance probability and assuming the two storm types occur independently, the computation for two curves works like this:

$$P_{NT \cup T}(x) = P_{NT}(x) + P_{T}(x) - P_{NT}(x)\,P_{T}(x)$$

which is equivalent to multiplying the CDFs: $1 - P_{NT \cup T}(x) = F_{NT}(x)\,F_{T}(x)$.
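The union computation can be sketched in a few lines of Python. This is a minimal illustration, not HEC-SSP's implementation. It assumes the GEV is written in Hosking's form with location ξ, scale α and shape κ (positive κ giving a bounded upper tail), which should be checked against HEC-SSP's sign convention. The parameter tuples `NON_THUNDERSTORM` and `THUNDERSTORM` are hypothetical placeholders, not the Task 2 results.

```python
import math

# Hypothetical GEV parameters (xi = location, alpha = scale, kappa = shape), in mph.
# These are placeholders, NOT the values recorded in Task 2.
NON_THUNDERSTORM = (45.0, 5.0, 0.05)
THUNDERSTORM = (40.0, 9.0, -0.10)


def gev_cdf(x, xi, alpha, kappa):
    """Non-exceedance probability F(x) of a GEV in Hosking's (xi, alpha, kappa) form (assumed convention)."""
    if abs(kappa) < 1e-12:
        y = (x - xi) / alpha                       # kappa = 0: Gumbel (EV1) case
    else:
        arg = 1.0 - kappa * (x - xi) / alpha
        if arg <= 0.0:
            # Outside the support: above the upper bound (kappa > 0) or below the lower bound (kappa < 0)
            return 1.0 if kappa > 0 else 0.0
        y = -math.log(arg) / kappa
    if y < -700.0:
        return 0.0                                 # exp(-exp(-y)) is effectively 0; avoids OverflowError
    return math.exp(-math.exp(-y))


def union_two(p_nt, p_t):
    """Probability of a union for two independent exceedance probabilities."""
    return p_nt + p_t - p_nt * p_t


def combined_exceedance(x, curves):
    """Exceedance probability of x when any of the (independent) curves can produce the annual maximum."""
    p_none = 1.0
    for params in curves:
        p_none *= gev_cdf(x, *params)              # product of non-exceedance probabilities
    return 1.0 - p_none


if __name__ == "__main__":
    for wind in range(40, 101, 10):
        p_nt = 1.0 - gev_cdf(wind, *NON_THUNDERSTORM)
        p_t = 1.0 - gev_cdf(wind, *THUNDERSTORM)
        p_mix = union_two(p_nt, p_t)
        print(f"{wind:5d} mph  P_NT={p_nt:.4f}  P_T={p_t:.4f}  P_mix={p_mix:.4f}")
```

`combined_exceedance` generalizes the two-curve formula to any number of independent curves as one minus the product of their non-exceedance probabilities; for two curves it gives the same value as `union_two`.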

Create a new Mixed Population Analysis using the menu bar (Figure 1) and title it “AMS Combined” (Figure 2).

Figure 1. Creating a new Mixed Population Analysis through the menu bar.

Figure 2. Heading for the Mixed Population Analysis.

Set the number of curves to combine to “2” (which should be the default value). Under “Extrapolate Input Data,” select “Allow Extrapolation of Input Data.” Under “Output Labeling,” check both the “Data Label” and “Data Units” boxes, entering “WIND-SPEED” and “MPH,” respectively. Under “Log Transform,” select “Do Not Use Log Transform.” For “Data Type,” use the drop-down menu to select “Other.” Under “Confidence Limits,” select “Compute Using Order Statistics” and enter a number of equivalent years of record equal to the smaller of the non-thunderstorm and thunderstorm AMS sample sizes (29 years). Click “Apply” in the lower right corner. Then switch to the “Frequency Curves” tab (Figure 3).

Figure 3. Location of the Frequency Curves tab on the MPA editor.

For Frequency Curve 1, enter “Non-Thunderstorm” in the Name box and check that the “Analytical Distribution” radio button is selected. In the Analytical Distribution drop-down menu, select GEV. For the parameters, enter the location (ξ), scale (α), and shape (κ) that you recorded in Task 2, Table 2, for the non-thunderstorm annual maximum series. Then click the “Compute” button just below the shape parameter.

Repeat the procedure for Frequency Curve 2: enter “Thunderstorm” in the Name box, select “Analytical Distribution” with GEV, and enter the parameters you recorded in Task 2, Table 4, for the thunderstorm annual maximum series. Click the “Compute” button just below the shape parameter.
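To cross-check the curves HEC-SSP computes from these parameters, the GEV quantile function can be evaluated directly. The Python sketch below assumes Hosking's form, x(F) = ξ + (α/κ)[1 - (-ln F)^κ], with F = 1 - AEP; confirm that HEC-SSP reports κ with the same sign convention before comparing numbers. The parameter tuples are hypothetical placeholders to be replaced with your Task 2 values.

```python
import math

# Hypothetical placeholder parameters (xi, alpha, kappa) in mph; substitute your Task 2 values.
NON_THUNDERSTORM = (45.0, 5.0, 0.05)
THUNDERSTORM = (40.0, 9.0, -0.10)


def gev_quantile(aep, xi, alpha, kappa):
    """Value with annual exceedance probability `aep` for a GEV in Hosking's form (assumed convention)."""
    if not 0.0 < aep < 1.0:
        raise ValueError("aep must be strictly between 0 and 1")
    w = -math.log1p(-aep)                          # -ln F, where F = 1 - aep is the non-exceedance probability
    if abs(kappa) < 1e-12:
        return xi - alpha * math.log(w)            # kappa = 0: Gumbel (EV1) quantile
    return xi + alpha / kappa * (1.0 - w ** kappa)


if __name__ == "__main__":
    for aep in (0.5, 0.1, 0.02, 0.01, 0.002):
        nt = gev_quantile(aep, *NON_THUNDERSTORM)
        t = gev_quantile(aep, *THUNDERSTORM)
        print(f"AEP {aep:<6}  non-thunderstorm {nt:6.1f} mph  thunderstorm {t:6.1f} mph")
```

`math.log1p(-aep)` is used instead of `math.log(1 - aep)` so that very small exceedance probabilities keep full precision.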

Click the “Plot Input Frequency Curves” button in the lower left corner.

Question 1: Based on the frequency curves plotted over each other, which storm type causes extremely high winds more frequently?  Which storm type controls the lower portion of the frequency curve?  Which of the two storm types displays more variance?


Click the “Compute” button in the lower left corner, and then switch to the “Results” tab (Figure 4).

Figure 4. Location of the Results tab on the MPA editor.
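The Results tab reports the combined curve as wind speeds at chosen exceedance probabilities. One way to reproduce a point on that curve is to solve 1 - F_NT(x)·F_T(x) = AEP for x. The Python sketch below does this by bisection; that is a swapped-in technique, since HEC-SSP's own numerical procedure is not described here. It assumes Hosking's GEV form (check HEC-SSP's κ sign convention), independent storm types, and hypothetical placeholder parameters; the search interval of 0 to 500 mph is also an assumption.

```python
import math

# Hypothetical placeholder parameters (xi, alpha, kappa) in mph, NOT the Task 2 results.
NON_THUNDERSTORM = (45.0, 5.0, 0.05)
THUNDERSTORM = (40.0, 9.0, -0.10)


def gev_cdf(x, xi, alpha, kappa):
    """Non-exceedance probability of a GEV in Hosking's form (assumed convention)."""
    if abs(kappa) < 1e-12:
        y = (x - xi) / alpha
    else:
        arg = 1.0 - kappa * (x - xi) / alpha
        if arg <= 0.0:
            return 1.0 if kappa > 0 else 0.0       # above the upper bound / below the lower bound
        y = -math.log(arg) / kappa
    if y < -700.0:
        return 0.0                                 # CDF is effectively 0; avoids OverflowError
    return math.exp(-math.exp(-y))


def combined_exceedance(x, curves):
    """1 - product of non-exceedance probabilities (independent mechanisms)."""
    p_none = 1.0
    for params in curves:
        p_none *= gev_cdf(x, *params)
    return 1.0 - p_none


def combined_quantile(aep, curves, lo=0.0, hi=500.0, iters=200):
    """Wind speed whose combined exceedance probability equals `aep`, found by bisection."""
    if not combined_exceedance(hi, curves) < aep < combined_exceedance(lo, curves):
        raise ValueError("[lo, hi] does not bracket the requested AEP")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if combined_exceedance(mid, curves) > aep:
            lo = mid                               # exceeded too often at mid: answer lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    both = [NON_THUNDERSTORM, THUNDERSTORM]
    for aep in (0.5, 0.1, 0.02, 0.01):
        print(f"AEP {aep:<5}  combined {combined_quantile(aep, both):6.1f} mph")
```

Passing a single-curve list, such as `[THUNDERSTORM]`, to `combined_quantile` returns that curve's own quantile, which makes it easy to compare the combined and individual curves at the same AEP.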

Question 2: What is the effect of combining the two curves on a) the less frequent end of the curve and b) the more frequent end of the curve?

Open the “result_plotter.xlsx” Excel workbook. It contains a table for the MPA frequency curve results as well as for the AMS model fit to all wind speed data. Copy the combined wind speed frequency curve from the MPA results into the table, and then copy the median curve from the “AMS All” results. The plot will update with the results, which are plotted on “extreme value type I” paper.
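Extreme value type I paper is conventionally constructed by placing each point at the Gumbel reduced variate y = -ln(-ln(1 - AEP)), so that a Gumbel distribution (a GEV with κ = 0) plots as a straight line and any curvature indicates a nonzero shape. The Python sketch below assumes the workbook follows this convention (its internals are not described here); the function names and the example Gumbel parameters are illustrative.

```python
import math


def gumbel_reduced_variate(aep):
    """Position on extreme value type I (Gumbel) paper: y = -ln(-ln(1 - aep))."""
    if not 0.0 < aep < 1.0:
        raise ValueError("aep must be strictly between 0 and 1")
    return -math.log(-math.log1p(-aep))


def gumbel_quantile(aep, xi, alpha):
    """Gumbel quantile, equal to xi + alpha * y: a straight line on EV1 paper."""
    return xi + alpha * gumbel_reduced_variate(aep)


if __name__ == "__main__":
    for aep in (0.5, 0.2, 0.1, 0.04, 0.02, 0.01):
        y = gumbel_reduced_variate(aep)
        print(f"AEP {aep:<5}  return period {1.0 / aep:6.1f} yr  reduced variate y = {y:6.3f}")
```

Because the Gumbel quantile is linear in y with slope α, two curves that diverge or bend apart on this paper differ in more than location and scale.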

Question 3: If you performed an analysis only on the “All Storms” dataset instead of modeling the individual causal mechanisms of extreme winds separately, which assumption typically made when fitting a model would be violated?

Question 4: How does the result differ between the “All Storms” AMS analysis and the combination of the Thunderstorm and Non-Thunderstorm AMS analyses using a Mixed Population Analysis? If an analysis of extreme wind speeds were a critical piece of a risk assessment, what is the potential consequence of violating the assumption you identified in Question 3?