Many data sets contain erroneous values, which must be corrected before performing mathematical operations with the data.  HEC-DSSVue provides some limited screening criteria, allowing values to be declared as 'missing' and/or quality flags to be set.  This workshop demonstrates four mechanisms available to identify and replace invalid data:

  • Screen Using Minimum and Maximum
  • Screen with Forward Moving Average
  • Manual editing of values inside the Tabulation window
  • Estimate Missing Values

Task 1. Select the data set

  1. Start HEC-DSSVue and open “MathFunctions_1-2.dss” 
  2. Select 'Condensed Catalog' under the View menu.  In the absence of a time window, this makes subsequent operations on a pathname apply to all the records, regardless of the D-part.
  3. Also under the View menu, ensure that the Unit System and Time Zone are rendered “As Stored”, in order to avoid inadvertent conversions when writing Math Functions results back to the database.
  4. Select the data set named
    /FOX RIVER/LUTZ PARK/FLOW-RES OUT//15MIN/USGS-CST/
  5. Launch the Math Functions module under "Tools".
  6. Plot and tabulate the data.

Task 2. Screen data

Note that the data contains many gaps and downward spikes.  The station now has better technology, but this data was produced by a very old USGS Acoustic Velocity Meter.  Each minute it made an instantaneous measurement of mean velocity across the channel, which it multiplied by cross-sectional area to calculate flow.  Every 15 minutes it applied some very coarse quality-control procedures, averaged the samples, and reported the resulting flow to the USGS. 

In situations where the actual data changes gradually, individual fluky values can be easily screened out using a “moving average” technique.  However, this gage is located a short distance downstream of the Lake Winnebago outlet structures and hydropower facilities.  Abrupt flow changes typically indicate gage malfunction but can be difficult to distinguish from actual variations due to gate movements or generating adjustments.

  1. Open the Math Functions menu.
  2. Choose the “General” tab.
  3. Set the “Operator” to “Screen with Forward Moving Average”.
    1. Your engineering judgment and experience with this gage suggest that the flows cannot realistically change more than 500 cfs in an hour.
    2. Screen the data set using the parameters shown in the figure below. 
    3. Note that these screening values could also be used in a script to automate a first pass at data validation.
  4. Plot and tabulate your computed data to review the changes. 
    1. Try selecting 'Original Data with Computed' under the Display menu and re-plotting the data. 
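Step 3.3 notes that these screening values could also drive a script. The sketch below is a plain-Python illustration of the idea behind a moving-average screen: reject a value when it departs from the average of its recent neighbors by more than the change limit. It is not the HEC-DSSVue operator itself, whose windowing details may differ; the function name and the -901 missing sentinel are placeholders chosen for this example.

```python
def screen_forward_moving_average(values, n_avg=4, change_limit=500.0, missing=-901.0):
    """Reject a value when it departs from the average of the previous
    n_avg valid values by more than change_limit.  Rejected values are
    set to the `missing` sentinel."""
    out = list(values)
    for i in range(len(out)):
        # Average the n_avg values immediately before position i,
        # skipping any that are already missing or rejected.
        window = [v for v in out[max(0, i - n_avg):i] if v != missing]
        if len(window) == n_avg and abs(out[i] - sum(window) / n_avg) > change_limit:
            out[i] = missing
    return out
```

With n_avg=4 and change_limit=500 on 15-minute data, this roughly corresponds to the first screening pass (a 500 cfs change over one hour).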


Question 1. If the original curve covers up the “modified” curve in the plot, how can you switch order so that the revised curve is plotted on top of the original?

  • Go to the "Edit" button on the ribbon and select "Plot Properties..."
  • Use the arrows to the right to change the order of the curves
  1. In the “View” then “Quality” menu of the tabulation window, switch between “Symbol” and “Hex”.
    1. Note the three different quality flags associated with the “modified” data:
      1. “3” or an empty cell means an okay value,
      2. “5” or “M” means the value was missing before the test,
      3. “11” or “R” means the value was rejected by the test.
  2. Look again at the plot and note that many dubious values persist.
    1. The revised data is still unfit for use in computations.
    2. Perhaps a more aggressive screening test is needed.
    3. Leave the existing plot open for subsequent comparisons.
  3. Go back to the Math Functions window. 
  4. Choose “Restore Original Data” from the “Edit” menu, and verify that when you plot the data, only the original pathname appears.
  5. Verify that the “Screen with Forward Moving Average” Operator is still selected. 
  6. Leave the “Number To Average Over” at 4, but set the “Change Value Limit” to 200.  
  7. Click “Compute”.  
  8. Verify that 'Original Data with Computed' is turned on, and use the plot/tabulate functions to review the results.
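The Symbol/Hex pairs seen in the tabulation are consistent with quality being stored as a bit field in which bit 0 marks a value as screened, bit 1 okay, bit 2 missing, bit 3 questionable, and bit 4 rejected, so hex 3 = screened + okay, hex 5 = screened + missing, and hex 11 (decimal 17) = screened + rejected. A small sketch of that decoding, under this assumed bit layout:

```python
def decode_quality(q):
    """Decode a DSS-style quality integer into flag names, assuming the
    bit layout: bit 0 screened, bit 1 okay, bit 2 missing,
    bit 3 questionable, bit 4 rejected."""
    flags = []
    if q & 0x01: flags.append("screened")
    if q & 0x02: flags.append("okay")          # tabulates as an empty cell
    if q & 0x04: flags.append("missing")       # tabulates as "M"
    if q & 0x08: flags.append("questionable")
    if q & 0x10: flags.append("rejected")      # tabulates as "R"
    return flags
```

For example, decode_quality(0x11) returns ['screened', 'rejected'], matching the “11”/“R” pair in the tabulation.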


Question 2. Did the stricter test reject any additional bogus data?

Yes. The point near the center of the record, which survived the first screening pass, is now rejected by the stricter test.

Question 3. Did it reject any “good” data?

Yes. Some steep, sudden changes that appear to be real (likely gate movements or generation adjustments) were also removed by the screening.

  1. Again choose “Restore Original Data” from the “Edit” menu.
  2. Verify that when you plot the data, only the original pathname appears.
  3. Screen the data set using the Minimum and Maximum screening function with the values shown in the figure below. Such specific parameters would not be suitable as generalized limits for automated processing, but represent the most effective parameters for the given period of this data.
  4. Plot and tabulate your computed data to review the changes. 
    1. Try selecting 'Original Data with Computed' under the Display menu and re-plotting the data
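The minimum/maximum screen from step 3 is simple to express in code. A plain-Python sketch (not the HEC-DSSVue implementation; the -901 sentinel stands in for the missing value):

```python
def screen_min_max(values, min_limit, max_limit, missing=-901.0):
    """Set any value outside [min_limit, max_limit] to the missing
    sentinel; values already missing are passed through unchanged."""
    return [v if v == missing or min_limit <= v <= max_limit else missing
            for v in values]
```

For instance, with limits of 4500 and 8000 cfs, a reading of 3000 or 9000 would be replaced by the missing sentinel while 4800 would pass through.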


Question 4. If the original curve covers up the revised curve in the plot, how can you switch order so that the revised curve is plotted on top of the original?

  • Go to the "Edit" button on the ribbon and select "Plot Properties..."
  • Use the arrows to the right to change the order of the curves

Alternately, in the Legend click on the covered pathname to temporarily "bold" the line in the plot. 

Question 5. Does the revised data set look ready for use in computations?

Not ready yet, because calculations (and models) generally require all values to be present.  The missing values will need to be addressed in some way.  Also note that the upward spike on 9Oct2000 and a block of dubious data on 25Oct2000 still remain after the initial screening, so further clean-up is needed prior to estimating missing values.

  1. Using 'Save As', save your changes with a new F part of ‘USGS-CST-REV’, then close the Math Functions window.
  2. From the main HEC-DSSVue screen, clear any previously selected pathnames, and tabulate your revised data. 
  3. From the tabulation window, enable the 'Allow Editing' option under the ‘Edit’ menu. 
  4. Scroll down to the values for 0703 and 0718 on 9Oct2000, and highlight the two rows. 
    1. (NOTE: the times might appear as 0715 and 0730 due to a bug in some versions of DSSVue 3.0 that causes the times to be “snapped” to the top of the interval during the save to DSS).  
  5. Right-click on the selected values, and choose 'clear'.  Save your changes to the same pathname, and close the tabulation window.

The downward spike on 25Oct2000 is invalid and needs to be removed.

  7. To accomplish this, set the time window in HEC-DSSVue to 0000-2300 on 25Oct2000.
  8. Select the recently created/revised data set and launch the Math Functions module again.
    1. This time the math operations are confined to data for 25Oct2000.
  9. Plot and tabulate the data to verify that flows below 4500 cfs are likely bogus.
  10. Run the 'Screen Using Minimum and Maximum' function again, with a Minimum Value Limit of 4500.
  11. Save the data in the ‘USGS-CST-REV’ pathname again, and close the Math Functions screen.
  12. Clear the time window in HEC-DSSVue, and plot the data.
    1. All suspicious values should be gone.
  13. Close your plot.

Task 3. Estimate Missing Values

  1. Select the same revised Lutz Park flows and launch the Math Functions module.
  2. Estimate missing values, setting the maximum consecutive number of missing values to 96, so that it will not interpolate across any gap longer than one day (96 fifteen-minute values).
  3. Press ‘Compute’.
  4. Plot and tabulate the new data set, verifying that all of the missing values have been filled in.
  5. Save your data.
  6. Close the math window if it is still open.
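The estimate-missing step is essentially linear interpolation that skips gaps longer than the consecutive-missing limit. A plain-Python sketch under that assumption (again, not the HEC-DSSVue implementation, and using -901 as a placeholder missing sentinel):

```python
def estimate_missing(values, max_consecutive=96, missing=-901.0):
    """Linearly interpolate interior runs of missing values, but only
    when the run is no longer than max_consecutive (e.g. 96 values,
    one day of 15-minute data).  Longer gaps are left as missing."""
    out = list(values)
    i = 0
    while i < len(out):
        if out[i] == missing:
            j = i
            while j < len(out) and out[j] == missing:
                j += 1                       # j = first valid value after the gap
            gap = j - i
            if 0 < i and j < len(out) and gap <= max_consecutive:
                lo, hi = out[i - 1], out[j]  # bracketing valid values
                for k in range(gap):
                    out[i + k] = lo + (hi - lo) * (k + 1) / (gap + 1)
            i = j
        else:
            i += 1
    return out
```

Note that gaps at the very start or end of the record have no bracketing value on one side and are left missing, which matches the general behavior of interpolation-based estimators.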


Your data technician passes by and points out that cleaned up data already exists in /FOX RIVER/LUTZ PARK/FLOW-RES OUT//15Minute/USGS-CST-FIXED/. He thinks it’s great that you walked a mile in his shoes, but really, you should just use the “fixed” version provided by an experienced professional, pointing out that the powerhouse really did have a hiccup on 25Oct2000, and that the parameter type must be "INST-VAL" according to an arbitrary but long-standing convention.

  7. Plot and tabulate your "-REV" data vs. the "-FIXED" data.

Question 6. Does re-defining the parameter type to "INST-VAL" actually change anything?

Yes. The meaning of the timestamps has changed: what was previously the end of an averaging period is now treated as the time of an instantaneous observation.