Random Forest Models

The Scikit-learn Python library contains a Random Forest Regression model (documentation is at https://scikit-learn.org/stable/). Initially, one random forest model (SimpleMLScript_1) was used to predict all adjustment factors. The X array held all the predictor variables (precipitation and temperature) for each of the 32 training events, and the Y array held all 20 adjustment factors from the calibrated models. The 3-, 7-, 30-, and 60-day total precipitation and the 7-day average temperature were used as predictor variables in this first ML model.
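The single-model setup described above can be sketched roughly as follows. This is a minimal illustration and not the contents of SimpleMLScript_1: the arrays are filled with random placeholder numbers, and the variable names and the n_estimators and random_state values are assumptions. Only the array shapes (32 events, 5 predictors, 20 adjustment factors) come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data only: random numbers stand in for the real training set.
rng = np.random.default_rng(0)
n_events, n_predictors, n_factors = 32, 5, 20

# Columns: 3-, 7-, 30-, 60-day total precipitation and 7-day average temperature.
X = rng.random((n_events, n_predictors))
# Columns: the 20 calibrated adjustment factors for each training event.
Y = rng.random((n_events, n_factors))

# A single forest fit on a 2-D Y predicts all 20 factors at once.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, Y)

# Predict the adjustment factors for one new (hypothetical) event.
x_new = rng.random((1, n_predictors))
predicted_factors = model.predict(x_new)
print(predicted_factors.shape)
```

Passing a two-dimensional Y makes scikit-learn train one multi-output forest, so every tree shares the same splits across all 20 factors.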

In the second iteration (SimpleMLScript_2), a separate ML model was developed for each of the Initial Deficit, Constant Loss, GW1 Fraction, and GW2 Fraction adjustment factors, and for each individual Initial Baseflow adjustment factor. The 1-day prior flow seemed like the clear predictor variable for setting the zonal Initial Baseflow adjustment factors, so a separate ML model was set up for each initial baseflow zone using the prior day's flow at the reference streamflow gage for that zone. Precipitation, temperature, and flow were used as predictor variables for the Initial Deficit, Constant Loss, GW1 Fraction, and GW2 Fraction adjustment factors.
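One way the second iteration could be organized is sketched below. It is not the contents of SimpleMLScript_2. The data are random placeholders, the dictionary names are hypothetical, and two points are assumptions: that each of the four non-baseflow models predicts one value per zone, and that 10 predictor columns are used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data only: random numbers stand in for the real training set.
rng = np.random.default_rng(1)
n_events = 32
zones = ["Lake Mendocino", "Hopland", "Guerneville", "Lake Sonoma"]

# One single-predictor model per Initial Baseflow zone, driven by the
# prior day's flow at that zone's reference gage.
baseflow_models = {}
for zone in zones:
    prior_day_flow = rng.random((n_events, 1))   # shape (events, 1 predictor)
    baseflow_factor = rng.random(n_events)       # calibrated factor per event
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    baseflow_models[zone] = model.fit(prior_day_flow, baseflow_factor)

# One model per remaining factor type, using precipitation, temperature,
# and flow predictors (10 columns here, an assumed count).
X = rng.random((n_events, 10))
factor_models = {}
for factor in ["Initial Deficit", "Constant Loss", "GW1 Fraction", "GW2 Fraction"]:
    Y = rng.random((n_events, len(zones)))       # assumed: one column per zone
    factor_models[factor] = RandomForestRegressor(
        n_estimators=100, random_state=0
    ).fit(X, Y)

# Predict for one hypothetical event.
print(baseflow_models["Hopland"].predict([[0.4]]))
print(factor_models["Initial Deficit"].predict(rng.random((1, 10))))
```

Keeping the models in dictionaries keyed by zone or factor name makes it easy to swap predictors for one model without touching the others.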

The following table shows the importance factors for each predictor variable used in the different ML models. The ML model used to predict the Initial Moisture Deficit placed the most weight on the 30-day total precipitation, the 60-day total precipitation, the 7-day average flow at Guerneville, and the 7-day average flow at Lake Sonoma.

Predictor variable                                              Initial Deficit  Constant Loss    GW1 Fraction     GW2 Fraction
XP3 (3-day total precipitation)                                 0.01979          0.07848          0.02438          0.03464
XP7 (7-day total precipitation)                                 0.02891          0.03614          0.04162          0.06356
XP30 (30-day total precipitation)                               0.15679          0.04907          0.03834          0.02937
XP60 (60-day total precipitation)                               0.16795          0.17744          0.39115          0.40637
XMonth (month of year at the start of the event)                0.02537          0.03243          0.06262          0.07476
X7Flow_Calpella (7-day average flow at the Calpella Gage)       0.0777           0.25014          0.1173           0.0524
X7Flow_Hopland (7-day average flow at the Hopland Gage)         0.07519          0.06843          0.05556          0.0411
X7Flow_Guerneville (7-day average flow at the Guerneville Gage) 0.16284          0.12234          0.18348          0.19926
X7Flow_LakeSonoma (7-day average flow at the Lake Sonoma Gage)  0.21373          0.04553          0.03557          0.03933
XT7 (7-day average temperature)                                 0.07175          0.14001          0.04997          0.05921
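Importance factors like those in the table above are available from a fitted scikit-learn forest through its feature_importances_ attribute. The sketch below shows how they might be read out; whether the original scripts obtained them this way is an assumption, and the training data here are random placeholders, so the printed weights are meaningless. Only the predictor names are taken from the table.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = [
    "XP3", "XP7", "XP30", "XP60", "XMonth",
    "X7Flow_Calpella", "X7Flow_Hopland", "X7Flow_Guerneville",
    "X7Flow_LakeSonoma", "XT7",
]

# Placeholder data only: random numbers stand in for the real training set.
rng = np.random.default_rng(2)
X = rng.random((32, len(feature_names)))
y = rng.random(32)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds one impurity-based weight per predictor.
importances = dict(zip(feature_names, model.feature_importances_))
for name, weight in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {weight:.5f}")
```

The weights for a given model are normalized to sum to 1, which matches the column totals in the table.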

Different predictor variables could be explored, as could the "n_estimators" parameter, which sets the number of decision trees within the random forest model. The following tables contain the HEC-HMS adjustment factors for the test events as predicted by the random forest models created using the SimpleMLScript_2 script.

Initial Baseflow Scale Factor
Zones           1/2/1997    1/22/1997   11/12/2001  1/1/2002    12/17/2002  12/27/2002  5/20/2005   12/21/2005  2/24/2008   2/18/2009   2/23/2009
Lake Mendocino  0.57        0.31        0.1         0.53        0.36        0.3         0.45        0.12        0.27        0.26        0.27
Hopland         0.24        0.26        0.08        0.2         0.08        0.22        0.19        0.08        0.22        0.22        0.22
Guerneville     0.05        0.05        0.01        0.05        0.02        0.05        0.05        0.02        0.02        0.05        0.05
Lake Sonoma     0.48        0.5         0.39        0.34        0.39        0.5         0.39        0.5         0.39        0.5         0.49


Initial Moisture Deficit Scale Factor
Zones           1/2/1997    1/22/1997   11/12/2001  1/1/2002    12/17/2002  12/27/2002  5/20/2005   12/21/2005  2/24/2008   2/18/2009   2/23/2009
Lake Mendocino  0.35        0.42        2.87        0.39        2.56        0.27        1.02        1.59        0.59        0.81        0.32
Hopland         0.45        0.56        3.01        0.58        2.68        0.5         1.14        1.94        0.69        1.01        0.43
Guerneville     0.38        0.4         2.81        0.4         2.58        0.36        1.05        1.64        0.54        0.88        0.31
Lake Sonoma     0.60        0.71        3.3         0.79        2.92        0.63        1.29        2.21        0.84        1.17        0.53


Constant Loss Rate Scale Factor
Zones           1/2/1997    1/22/1997   11/12/2001  1/1/2002    12/17/2002  12/27/2002  5/20/2005   12/21/2005  2/24/2008   2/18/2009   2/23/2009
Lake Mendocino  0.43        0.42        0.78        0.48        0.85        0.45        0.52        0.57        0.47        0.55        0.43
Hopland         1.45        1.22        1.81        1.43        2.47        1.19        1.58        1.74        1.3         1.71        1.27
Guerneville     0.7         0.72        1.58        0.65        2.15        0.68        0.94        1.66        0.81        1.4         0.67
Lake Sonoma     1.09        1.23        1.95        0.92        1.85        0.99        1.43        1.36        0.99        1.27        1.21


Groundwater 1 Fraction Scale Factor
Zones           1/2/1997    1/22/1997   11/12/2001  1/1/2002    12/17/2002  12/27/2002  5/20/2005   12/21/2005  2/24/2008   2/18/2009   2/23/2009
Lake Mendocino  0.89        0.93        0.38        0.9         0.5         0.93        0.79        0.81        0.82        0.78        0.94
Hopland         0.83        0.98        0.39        0.81        0.37        0.81        0.76        0.59        0.78        0.66        0.91
Guerneville     0.87        0.99        0.31        0.86        0.3         0.8         0.78        0.55        0.81        0.61        0.87
Lake Sonoma     0.91        0.99        0.35        0.91        0.31        0.94        0.78        0.54        0.81        0.61        0.87


Groundwater 2 Fraction Scale Factor
Zones           1/2/1997    1/22/1997   11/12/2001  1/1/2002    12/17/2002  12/27/2002  5/20/2005   12/21/2005  2/24/2008   2/18/2009   2/23/2009
Lake Mendocino  1.04        1.16        0.52        1.05        0.63        1.03        1.07        0.95        1.02        0.92        1.14
Hopland         0.93        0.99        0.43        0.9         0.52        0.95        0.8         0.77        0.8         0.72        0.94
Guerneville     0.95        0.99        0.43        0.94        0.47        0.95        0.83        0.75        0.79        0.69        0.96
Lake Sonoma     0.95        1.01        0.5         0.98        0.51        0.97        0.82        0.76        0.8         0.68        0.94
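
The exploration of the number of decision trees suggested earlier could be approached along the following lines. This is only a sketch under stated assumptions: the data are synthetic (the target is built from one predictor column plus noise), the candidate tree counts are arbitrary, and the out-of-bag score is one possible comparison metric, not something the original study is known to have used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data only: y depends on one column plus noise so the scores
# are not pure noise. Real predictors and factors would replace these.
rng = np.random.default_rng(3)
X = rng.random((32, 10))
y = 2.0 * X[:, 3] + 0.1 * rng.standard_normal(32)

oob_scores = {}
for n_trees in (50, 100, 200, 500):
    model = RandomForestRegressor(
        n_estimators=n_trees,
        oob_score=True,   # score each event using only trees that did not train on it
        random_state=0,
    )
    model.fit(X, y)
    oob_scores[n_trees] = model.oob_score_  # out-of-bag R^2

for n_trees, score in oob_scores.items():
    print(f"n_estimators={n_trees:4d}  OOB R^2={score:.3f}")
```

With only 32 training events, the out-of-bag estimate avoids setting aside a separate validation set, though the scores will still be noisy for a sample this small.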