Helpful Hints About Study Data

The hints provided on this page are useful for any new or existing HEC-FDA Version 2.0 study.

General Advice

This advice applies to the development of any statistical model; these are not requirements of HEC-FDA alone.

When configuring study data, you're building a statistical model of damage that takes the form of a probability distribution. You should configure your data with the objective of painting a complete picture. There are two very common characteristics of study data that cause substantial bias: nonlinearity and truncation.

Nonlinearity

The statistical model of damage that you produce with HEC-FDA is based on discrete data. The functions used to develop the damage-exceedance probability function are not continuous: each consists of a finite number of x and y coordinates. We must therefore interpolate between the coordinates to get as close to continuous as we can, which is what we assume when we integrate a discrete damage-exceedance probability distribution to calculate expected annual damage. HEC-FDA interpolates linearly (an opportunity for innovation), so if the true relationship being modeled is highly nonlinear and we use very few coordinates, the result becomes biased. Any one function among the set of summary relationships can cause substantial bias if too few coordinates are used to model that relationship.
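The effect of too few coordinates can be illustrated with a minimal Python sketch. It is not HEC-FDA code: the curve `true_damage`, its parameters, and the coordinate counts are hypothetical, chosen only to show how linear interpolation between sparse points biases the integral of a strongly nonlinear damage-exceedance curve.

```python
import math

def true_damage(p):
    """Hypothetical, highly nonlinear damage-exceedance curve (illustration only)."""
    return 1000.0 * math.exp(-10.0 * p)

def expected_annual_damage(probs, damages):
    """Integrate with the trapezoid rule, i.e. linear interpolation between coordinates."""
    total = 0.0
    for i in range(len(probs) - 1):
        width = probs[i + 1] - probs[i]
        total += width * (damages[i] + damages[i + 1]) / 2.0
    return total

def ead_with_n_coordinates(n):
    """Sample the hypothetical curve at n evenly spaced exceedance probabilities."""
    probs = [i / (n - 1) for i in range(n)]
    damages = [true_damage(p) for p in probs]
    return expected_annual_damage(probs, damages)

# Analytic integral of true_damage over exceedance probabilities 0 to 1.
exact = 100.0 * (1.0 - math.exp(-10.0))
coarse = ead_with_n_coordinates(4)    # very few coordinates
fine = ead_with_n_coordinates(200)    # many coordinates

print(f"exact={exact:.2f}  coarse={coarse:.2f}  fine={fine:.2f}")
```

With this particular hypothetical curve, the four-coordinate estimate is far above the analytic value, while the 200-coordinate estimate is close to it. The direction and size of the bias in a real study depend on the shape of the actual relationship.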

How do you know if you have a problem? Start by defining the maximum number of coordinates practicable (8 feels like a minimum), then investigate whether any intervals show rapid change. If so, evaluate whether adding another coordinate within that interval is practicable. In other words, if you see a jump that is not likely to be linear, it could be worth adding a point.
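One way to screen a function for such jumps is sketched below in Python. The helper `flag_rapid_change`, its `max_share` threshold, and the system response coordinates are all hypothetical; what counts as "rapid" is a judgment call for the analyst, not something HEC-FDA defines.

```python
def flag_rapid_change(xs, ys, max_share=0.25):
    """Return (x_left, x_right) for each interval whose change in y exceeds
    max_share of the total y range. The threshold is an arbitrary assumption."""
    total = max(ys) - min(ys)
    if total == 0:
        return []
    flagged = []
    for i in range(len(xs) - 1):
        if abs(ys[i + 1] - ys[i]) / total > max_share:
            flagged.append((xs[i], xs[i + 1]))
    return flagged

# Hypothetical system response curve: stage (ft) vs. failure probability.
stages = [20, 22, 24, 26, 28, 30, 32, 34]
failure_prob = [0.0, 0.01, 0.03, 0.08, 0.15, 0.75, 0.95, 1.0]

jumps = flag_rapid_change(stages, failure_prob)
print(jumps)  # intervals that may deserve an extra coordinate
```

Any interval returned is a candidate for an additional coordinate, provided the data needed to define that coordinate can reasonably be obtained.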

An example is provided in the image below. The image is a screenshot of a system response curve obtained from real study data, overlaid with two alternative shapes that are both plausible given the input data. At a stage of 29 feet, there is a 30% difference in failure probability, which is a substantial difference when the consequences are measured in the billions.

Real System Response Curve with Alternatives Overlaid

Truncation

We build a statistical model in the form of a probability distribution because that probability distribution can be integrated to give us an expected value of damage. Several relationships are combined to develop that probability distribution, relating the probabilistic model of the hazard to the model of the performance of any infrastructure system and to the model of consequences in the floodplain. When the relevant ranges and domains of these models do not overlap well, the probability distribution is truncated. Intuitively, when the hazard domain of the model of consequences in the floodplain overlaps with just a portion of the range of the probabilistic model of the hazard, we have to make an assumption about what the consequences look like for the unmatched part of the hazard. HEC-FDA does not extrapolate, so the undefined portion of the model is assumed to be flat. In other words, if the highest stage in the probabilistic model of the hazard is 200 ft but the highest stage in the model of consequences in the floodplain is 195 ft, then consequences in the floodplain are the same for all stages between 195 ft and 200 ft.
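The flat assumption can be mimicked with a short Python sketch. This is not HEC-FDA's implementation: `interpolate_flat` is a hypothetical helper and the stage-damage coordinates are invented. Only the 195 ft and 200 ft stages come from the example above.

```python
def interpolate_flat(x, xs, ys):
    """Linear interpolation inside [xs[0], xs[-1]]; outside that domain the
    nearest end value is held constant (no extrapolation)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            frac = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + frac * (ys[i + 1] - ys[i])

# Hypothetical stage-damage function whose highest defined stage is 195 ft.
damage_stages = [185.0, 190.0, 195.0]
damages = [0.0, 2.0e6, 10.0e6]

# The hazard model reaches 200 ft, so stages above 195 ft fall in the truncated range.
for stage in (187.5, 195.0, 197.0, 200.0):
    print(stage, interpolate_flat(stage, damage_stages, damages))
```

Every stage above 195 ft receives the damage defined at 195 ft, so any additional damage that would really occur between 195 ft and 200 ft is missing from the integrated result.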

Every time you build a study, pop out the summary relationships that you're using for your scenario, and verify that you have good overlap inclusive of the uncertainty distributions. An example with real study data is provided below. An analytical flow-frequency function is displayed alongside the regulated-unregulated flow function applied to model a dam. The highest unregulated flow defined in the regulated-unregulated flow function is 1.4M cfs. On average, that maximum is reached between the 0.002 AEP and 0.001 AEP unregulated flows, but it can be reached at AEPs as frequent as the interval between 0.01 and 0.005. Unregulated flows in the truncated range will be assigned the same regulated flow as the largest defined unregulated flow.

The greatest problem lies with the final coordinate of the regulated-unregulated function. The largest possible regulated flow is 726k cfs. All flows on the bottom right-hand side of the analytical flow-frequency probability distribution table will be assigned a regulated flow no greater than 726k cfs. As a result, our risk estimate will reflect a dam that is incapable of releasing more than 726k cfs.

Tips and Tricks

Before you start building your model, identify all of the data that you'll need. Obtain the data, and inspect the data. Confirm that you're ready to model, and then build. Do not wait until you have built the model to identify problems in the data.
Explore the water surface profiles in geospatial software with the structure inventory. Extract water surface elevation values to the structure points and review the flood depths and frequencies. Identify areas with relatively high magnitude and frequency of flooding and ensure quality estimates of structure attributes. Identify outliers (exceptionally high flood depths) and remove or adjust where appropriate.
Check for monotonicity in the water surface elevations in the water surface profiles. Flagging any non-monotonic values to the HEC-RAS modeler often uncovers errors.
Check that your summary relationships have substantial overlap. In other words, check that the range of flows that corresponds to the probability domain is sufficiently captured in the domain of flows identified in the stage-discharge function, check that the range of stages identified in the stage-discharge function is sufficiently captured in the domain of stages identified in the stage-damage function, and so on. HEC-FDA does not extrapolate functions (except for the summary relationship identified as the frequency function). As such, non-overlapping regions result in unreasonable damage-frequency functions and thus unreasonable expected annual damage (EAD) estimates.
If there is any damage at the 2-year event, you also need a 1-year event.
Specify as much of a summary relationship as possible. For example, include as many coordinates in a graphical frequency or stage-discharge function as possible. The more that you help the software approximate the true curve, the better results you'll get.
Keep track of the units of measurement. Units of measurement are neither tracked nor handled within HEC-FDA Version 2.0. For example, if terrain elevation is measured in meters but the first floor elevation is measured in feet, HEC-FDA will not know about the discrepancy, so it must be resolved outside of HEC-FDA.
Take the time to write out good names and descriptions. If something is preliminary or has been copied from somewhere else or has been estimated externally, mark that in the description. You'll thank yourself later.
Be sure to identify the without-project condition target stage for the with-project scenarios where impact areas do not have levees. Target stages/thresholds are otherwise determined by the software.
Use the 1.4.3 tab-delimited format to speed up the import of the summary relationships.
Scenarios are for EAD and system performance metrics, alternatives are for average annual equivalent damage (AAED), and alternative comparison reports are for EAD reduced and AAED reduced (this final metric is the most common measure of benefits).
Generally, there are five distributional assumptions available for input relationships and uncertainty parameters: normal, log normal, triangular, uniform, and deterministic (no uncertainty). These distributions are specified in their usual way.
Import the structure detail output file in geospatial software and review estimated damages across the study area. Identify outliers and ensure quality control of the structure attributes. Re-run the model if significant changes are made as a result of this quality control exercise.
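
The monotonicity and overlap checks in the list above can be run outside of HEC-FDA before any data is imported. The Python sketch below is one possible approach under stated assumptions: the helper names `non_monotonic_indices` and `uncovered_range` are hypothetical, and all elevations and stages are invented for illustration.

```python
def non_monotonic_indices(values):
    """Return each index i where values[i] is lower than values[i - 1].
    Assumes values are ordered from the most frequent to the least frequent profile."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]

def uncovered_range(required_min, required_max, defined_values):
    """Return the portions of [required_min, required_max] that fall outside
    the domain spanned by defined_values."""
    lo, hi = min(defined_values), max(defined_values)
    gaps = []
    if required_min < lo:
        gaps.append((required_min, min(lo, required_max)))
    if required_max > hi:
        gaps.append((max(hi, required_min), required_max))
    return gaps

# Hypothetical water surface elevations (ft) at one structure, ordered from
# the most frequent profile to the least frequent profile.
wse_at_structure = [101.2, 102.5, 102.1, 104.0]
bad = non_monotonic_indices(wse_at_structure)
print(bad)  # profiles to raise with the hydraulic modeler

# Hypothetical overlap check: the stage-discharge function produces stages from
# 180 ft to 200 ft, but the stage-damage function is defined from 185 ft to 195 ft.
gaps = uncovered_range(180.0, 200.0, [185.0, 190.0, 195.0])
print(gaps)  # stage ranges where damage would be held flat
```

An empty list from either helper means no problem was detected for that input. Any entries returned point to the profiles or stage ranges that should be reviewed before the model is built.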