Task 4 Analyze Results from the Machine Learning Model

Results from the random forest models were evaluated in two ways. First, the HEC-HMS parameter adjustment factors predicted by the random forest models were compared to the factors set through manual calibration. Second, the predicted adjustment factors were used in HEC-HMS model simulations, and the computed hydrographs were compared to observed streamflow.

The Python script was configured to compute the simple linear R² value (zero intercept) between the HEC-HMS adjustment factors predicted by the random forest models and those set through manual calibration. As shown in the table below, these values demonstrate good agreement between the predicted state variables and those set through manual calibration.

Process and zone	R²
Initial Baseflow Lake Mendocino Zone	0.86102
Initial Baseflow Hopland Zone	0.98037
Initial Baseflow Guerneville Zone	0.95937
Initial Baseflow Lake Sonoma Zone	0.98926
Initial Deficit Lake Mendocino Zone	0.92718
Initial Deficit Hopland Zone	0.87681
Initial Deficit Guerneville Zone	0.929496
Initial Deficit Lake Sonoma Zone	0.880974
Constant Loss Rate Lake Mendocino Zone	0.902152
Constant Loss Rate Hopland Zone	0.853695
Constant Loss Rate Guerneville Zone	0.807263
Constant Loss Rate Lake Sonoma Zone	0.849145
GW1 Fraction Lake Mendocino Zone	0.975312
GW1 Fraction Hopland Zone	0.935556
GW1 Fraction Guerneville Zone	0.946215
GW1 Fraction Lake Sonoma Zone	0.958229
GW2 Fraction Lake Mendocino Zone	0.968632
GW2 Fraction Hopland Zone	0.94916
GW2 Fraction Guerneville Zone	0.961453
GW2 Fraction Lake Sonoma Zone	0.966991
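
The zero-intercept R² values in the table above could be computed along the following lines. This is a minimal sketch, not the tutorial's actual script: the function name and the sample values are hypothetical, and it assumes the common convention for a regression forced through the origin, in which R² is measured against the uncentered sum of squares of the predicted values. The tutorial does not state which convention its script uses.

```python
def r2_zero_intercept(manual, predicted):
    """Fit predicted ~ slope * manual (no intercept) and return (slope, R²).

    Assumption: R² uses the uncentered total sum of squares, a common
    convention when the intercept is forced to zero.
    """
    if len(manual) != len(predicted) or len(manual) == 0:
        raise ValueError("inputs must be non-empty and the same length")

    sxx = sum(x * x for x in manual)
    sxy = sum(x * y for x, y in zip(manual, predicted))
    syy = sum(y * y for y in predicted)

    slope = sxy / sxx  # least-squares slope through the origin
    ss_res = sum((y - slope * x) ** 2 for x, y in zip(manual, predicted))
    return slope, 1.0 - ss_res / syy


# Hypothetical adjustment factors for one process and zone (not tutorial data).
manual_factors = [0.80, 1.00, 1.20, 1.50]
predicted_factors = [0.85, 0.95, 1.25, 1.40]

slope, r2 = r2_zero_intercept(manual_factors, predicted_factors)
print(f"slope = {slope:.3f}, R² = {r2:.3f}")
```

With this convention, R² tends to be higher than the ordinary (centered) R² for the same data, so the two should not be compared directly.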

The following figure shows graphically what is in the R² table above for the Calpella gage location. R² values for the first ML model (one ML model for all HEC-HMS adjustment factors, SimpleMLScript_1.py) showed comparable performance. Using a separate random forest model for each hydrologic process did not produce a significant improvement.

R² values from a comparison of random forest and manually set HEC-HMS adjustment factors

The following group of results shows fairly good model performance at all four locations for the January 2, 1997 event. Starting a forecast simulation with the initial adjustment factors from the random forest model gives the modeler a good starting point that would require only slight modification to the initial deficit and constant loss rate adjustment factors for the Hopland, Guerneville, and Lake Sonoma Zones.

Results when using the HEC-HMS adjustment factors generated by the random forest models - January 2, 1997 event

The following group of results does not show good model performance at all four locations for the November 11, 2001 event. However, the initial HEC-HMS adjustment factors set by the random forest models are fairly close to those found through manual calibration. The November 11, 2001 storm was the first major storm of the season, and the watershed was very dry when the storm began. The HEC-HMS adjustment factors estimated by the random forest models are a much better starting point for calibration than average conditions.

Results when using the HEC-HMS adjustment factors generated by the random forest models - November 12, 2001 event

On the whole, the ML model predicted reasonable initial HEC-HMS adjustment factors for most events. Performance for the two February 2009 events was not as good as for the other events; results from the first event are shown in the following figure. January and February 2009 were drier than most years, and the random forest model overpredicted the Initial Deficit adjustment factors. However, it would be relatively quick to manually set better adjustment factors after seeing the following results.

Results when using the HEC-HMS adjustment factors generated by the random forest models - February 17, 2009 event
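
The hydrograph comparisons above are presented visually. One way to put a number on statements such as "fairly good" is a goodness-of-fit statistic. The sketch below uses the Nash-Sutcliffe efficiency and the percent error in peak flow; both are assumptions chosen for illustration, not metrics reported in this tutorial, and the function names and flow values are hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, 0 is no better
    than using the mean observed flow as the prediction."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mean_obs = sum(observed) / len(observed)
    ss_err = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_err / ss_obs


def peak_error_percent(observed, simulated):
    """Percent difference between simulated and observed peak flow."""
    return 100.0 * (max(simulated) - max(observed)) / max(observed)


# Hypothetical hourly flows in cfs (not data from the tutorial).
observed_flow = [500.0, 1200.0, 4800.0, 9500.0, 7200.0, 3900.0, 2100.0]
simulated_flow = [450.0, 1500.0, 5600.0, 8800.0, 6900.0, 4200.0, 2500.0]

print(f"NSE = {nash_sutcliffe(observed_flow, simulated_flow):.3f}")
print(f"Peak error = {peak_error_percent(observed_flow, simulated_flow):.1f}%")
```

Computing such a statistic for each gage and event would make it easier to compare the random forest starting point against a simulation that starts from average conditions.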

Conclusion

This example demonstrates a couple of key points:

An HEC-HMS model can be quickly calibrated for real-time flood forecasting by adjusting only a few model parameters that represent the initial state of the watershed. Even though the initial baseflow adjustment factor was modified for this example, reasonable results would have been obtained by keeping a constant zonal override for all events. This would have left only four adjustment factors to change per event.
The random forest models were able to assimilate past information (precipitation, temperature, and flow) and estimate reasonable initial HEC-HMS adjustment factors. In some cases the adjustment factors did not require further adjustment because the simulated hydrograph already matched the observed hydrograph.

The random forest models took approximately 2 minutes to process the training data and fit the models, and milliseconds to generate the predicted adjustment factors for the 11 test events. If applied to real-time forecasting, the random forest models would already exist, and any time a new forecast is created they would automatically populate the starting HEC-HMS adjustment factors. Periodically, the random forest models would be updated as new events are added to the training dataset.
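
The fit-once, predict-per-forecast workflow described above might look like the following sketch. It assumes scikit-learn's RandomForestRegressor and uses synthetic data; the feature count, the training-set size, the PROCESSES names as dictionary keys, and the starting_factors helper are all hypothetical and are not taken from the tutorial's scripts.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 60 past events, 6 antecedent
# features (e.g. prior precipitation, temperature, flow), 4 zones.
rng = np.random.default_rng(0)
N_EVENTS, N_FEATURES, N_ZONES = 60, 6, 4
PROCESSES = ["Initial Baseflow", "Initial Deficit", "Constant Loss Rate",
             "GW1 Fraction", "GW2 Fraction"]

X_train = rng.random((N_EVENTS, N_FEATURES))

# Slow step, done ahead of time: fit one model per hydrologic process.
# Each model predicts one adjustment factor per zone (multi-output).
models = {}
for name in PROCESSES:
    # Synthetic "manually calibrated" factors for this process.
    y_train = 0.5 + X_train[:, :N_ZONES] + 0.05 * rng.standard_normal(
        (N_EVENTS, N_ZONES))
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)
    models[name] = model


def starting_factors(features):
    """Fast step, done per forecast: predict factors for one new event."""
    row = np.asarray(features).reshape(1, -1)
    return {name: m.predict(row)[0] for name, m in models.items()}


new_event = rng.random(N_FEATURES)  # features for a hypothetical forecast
for name, factors in starting_factors(new_event).items():
    print(name, np.round(factors, 2))
```

In an operational setting the fitted models would be saved to disk and reloaded for each forecast, then refit periodically as newly calibrated events are added to the training set.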