Reader+ (Dataset Preprocessing)

Each source of data in the Reader+ Editor is referred to as a Data Product. A Data Product is a set of spatially distributed data that are consistent in time and generally come from a single source. For DSS grids, for example, a Data Product would be a set of grids with a consistent set of pathnames. A series of steps is performed on each Data Product to convert it to a consistent set of grids, preparing it for merging with the other Data Products into a single coherent dataset. The steps for processing each Data Product are listed below:

Data Product Operations on each grid:

Read grid
Project grid to specified resolution and extents
Fill temporal gaps
Fill missing grid cells
Transform data temporally to the correct time step and data type
Screen values
Set grid parameter name

Read Grid

Reads the grid based on the specifications in the Reader+ (Data Selector).

Project Grids

To place all Data Products on a consistent spatial domain, each grid from each Data Product is projected onto a user-defined grid. This grid definition specifies a Coordinate System, Lower Left X Coordinate, Lower Left Y Coordinate, Number of Columns, Number of Rows, and Resolution (Cell Size). Using this definition, the Reader+ Editor reprojects each grid onto the output grid with the ProjectTin function in MetCalculator.
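The output grid definition can be thought of as a small record. The following Python sketch is illustrative only: the `GridDefinition` class, its field names, and the bottom-up row convention are assumptions, not the Reader+ or MetCalculator API, and it does not reproduce ProjectTin. It shows how the six settings fix the output extents and the cell centers that a reprojection has to fill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridDefinition:
    """Hypothetical container for the output grid settings (not the Reader+ API)."""
    coordinate_system: str  # assumed to be a label such as an EPSG code
    lower_left_x: float
    lower_left_y: float
    num_columns: int
    num_rows: int
    cell_size: float

    def extents(self):
        """Return (min_x, min_y, max_x, max_y) covered by the output grid."""
        max_x = self.lower_left_x + self.num_columns * self.cell_size
        max_y = self.lower_left_y + self.num_rows * self.cell_size
        return (self.lower_left_x, self.lower_left_y, max_x, max_y)

    def cell_center(self, col, row):
        """Center of cell (col, row); row 0 is assumed to be the bottom row."""
        if not (0 <= col < self.num_columns and 0 <= row < self.num_rows):
            raise IndexError("cell is outside the grid definition")
        x = self.lower_left_x + (col + 0.5) * self.cell_size
        y = self.lower_left_y + (row + 0.5) * self.cell_size
        return (x, y)
```

For example, `GridDefinition("EPSG:5070", -2000000.0, 1500000.0, 100, 50, 2000.0).extents()` returns `(-2000000.0, 1500000.0, -1800000.0, 1600000.0)`: the upper right corner follows directly from the lower left corner, the cell counts, and the cell size.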

Fill Temporal Gaps

For each Data Product, the user can fill missing grids with a default value. This filling applies only within the time window of the available data for that Data Product; if filling is required outside the bounds of the available data, it can be performed after the Data Products are merged. Data filling creates a grid that matches the grid projection settings, with every cell set to the same value. The typical use is a value of zero for precipitation, but any value may be used. By default, filling on a Data Product is set to Leave Missing, so that other Data Products can fill the gaps with real data. In general, Fill Temporal Gaps should be left at Leave Missing during preprocessing, leaving temporal gap filling to the product merging compute.
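The rule that filling stays inside the window of available data can be sketched as follows. This is a minimal illustration, assuming grids are stored as a Python dict keyed by `datetime` on a regular time step and represented as lists of lists; the function name and the use of `None` for "Leave Missing" are hypothetical choices, not Reader+ settings.

```python
from datetime import datetime, timedelta

LEAVE_MISSING = None  # hypothetical stand-in for the "Leave Missing" option


def fill_temporal_gaps(grids, time_step, fill_value=LEAVE_MISSING):
    """Fill missing time steps between the first and last available grid.

    grids: dict mapping datetime -> 2-D grid (list of lists) for one Data Product.
    Missing steps get a constant grid of fill_value, or None when leaving missing.
    Nothing is added before the first or after the last available grid.
    """
    if not grids:
        return {}
    times = sorted(grids)
    start, end = times[0], times[-1]
    for t in times:
        if (t - start) % time_step:
            raise ValueError("grid times must fall on the regular time step")
    # The constant grid copies the shape of an existing (already projected) grid.
    rows, cols = len(grids[start]), len(grids[start][0])
    filled = {}
    t = start
    while t <= end:
        if t in grids:
            filled[t] = grids[t]
        elif fill_value is LEAVE_MISSING:
            filled[t] = None
        else:
            filled[t] = [[fill_value] * cols for _ in range(rows)]
        t += time_step
    return filled
```

With hourly grids at hours 0 and 3 and `fill_value=0.0`, zero grids are created at hours 1 and 2 only; no grids are created before hour 0 or after hour 3.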

Fill Missing Data

Missing cells within a grid may be filled with a default value or with the average of the adjacent non-missing grid cells. Alternatively, the missing grid cells can be left as missing values to be cleaned up by the merging process.
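The three options can be sketched for a single grid as below. This is an illustration under stated assumptions: missing cells are marked with `math.nan`, "adjacent" is taken to mean the 8 surrounding cells, and averages use only original (not newly filled) values. The function and option names are hypothetical; Reader+ may define adjacency differently.

```python
import math


def fill_missing_cells(grid, method="average", default=0.0):
    """Fill NaN cells in a 2-D list.

    method: "default" (constant value), "average" (mean of the adjacent
    non-missing cells, assumed here to be the 8 neighbors), or "leave".
    A cell with no non-missing neighbors stays missing.
    """
    if method not in ("default", "average", "leave"):
        raise ValueError(f"unknown method: {method}")
    out = [row[:] for row in grid]  # never modify the input grid
    if method == "leave":
        return out
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if not math.isnan(grid[r][c]):
                continue
            if method == "default":
                out[r][c] = default
                continue
            # Read neighbors from the original grid so filled cells do not feed other fills.
            neighbors = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c) and not math.isnan(grid[rr][cc])
            ]
            if neighbors:
                out[r][c] = sum(neighbors) / len(neighbors)
    return out
```

For a 3x3 grid holding 1 through 9 with the center cell missing, the "average" option fills the center with 5.0, the mean of its eight neighbors.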

Transform Data Temporally

Temporal transformation converts the grids from the input time step to the output time step. How the data are converted depends on the type of data and the user-defined settings. The transformations performed are described below:

Period Cumulative Data (e.g., precipitation)
- Conserves precipitation mass
  - If converting grids to a longer duration, the grid values are summed.
  - If converting grids to a shorter duration, the grid values are distributed according to a user-specified pattern.
- Transformation patterns can be used for disaggregation
  - Default is uniform, which distributes precipitation mass equally across shorter durations
  - Pattern options are provided that will push more precipitation towards the start, middle or end of the input grid duration.
    - The Triangular distribution distributes a 6Hour grid with 1" of precipitation as follows: .057", .167", .277", .277", .167", .057"
    - The Peaked_Start_1 distribution distributes a 6Hour grid with 1" of precipitation as follows: 1", 0, 0, 0, 0, 0
  - In addition to the pattern, an option to alternate the pattern is provided. Alternating keeps the precipitation from successive input grids together, minimizing the length of time over which precipitation falls.
    - For example, if there are 2 successive 6Hour grids, each with 1" of precipitation, and the Peaked_Start_1 option is applied in an alternating fashion, the resulting twelve 1Hour values will be 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0.
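The summing and pattern-based disaggregation described above can be sketched for one grid cell's time series (a full grid would apply the same steps cell by cell). This is a minimal sketch: the function names are hypothetical, the "Uniform" key is an assumed label, and the pattern weights are copied from the examples above rather than from Reader+. The weights are normalized so each period's total is conserved, since the listed Triangular values are rounded and sum to 1.002. Which input period gets the reversed pattern when alternating was chosen to reproduce the alternating example above.

```python
# Weights for splitting one 6Hour value into six 1Hour values (from the examples above).
PATTERNS = {
    "Uniform": [1, 1, 1, 1, 1, 1],
    "Triangular": [0.057, 0.167, 0.277, 0.277, 0.167, 0.057],
    "Peaked_Start_1": [1, 0, 0, 0, 0, 0],
}


def aggregate_cumulative(values, factor):
    """Sum consecutive values into longer periods (factor=6 for 1Hour -> 6Hour)."""
    if len(values) % factor:
        raise ValueError("number of values must be a multiple of factor")
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]


def disaggregate_cumulative(values, pattern, alternate=False):
    """Split each period value into len(pattern) shorter periods.

    Weights are normalized so that the total of each input period is conserved.
    With alternate=True the pattern is reversed on every other input period,
    starting with the first, so peaks of successive periods end up adjacent.
    """
    total = sum(pattern)
    weights = [w / total for w in pattern]
    out = []
    for i, value in enumerate(values):
        w = weights[::-1] if (alternate and i % 2 == 0) else weights
        out.extend(value * wi for wi in w)
    return out
```

For example, `disaggregate_cumulative([1.0, 1.0], PATTERNS["Peaked_Start_1"], alternate=True)` yields the twelve values 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0 from the alternating example, and `aggregate_cumulative` with `factor=6` sums them back to `[1.0, 1.0]`.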
Period Average Data (e.g., temperature)
- Applies the period-average value to every output time step within the input duration
  - For example, if an input 6Hour grid has 32 degrees, and the output time step is 1Hour, six 1Hour grids will be created, each with 32 degrees
Instantaneous Data (e.g., temperature)
- Grids are interpolated in time between the available instantaneous grids
  - For example, if there is a grid at hour 1, with a value of 10, and a grid at hour 3, with a value of 20, a grid will be created at hour 2 with a value of 15.
- Special Case of Minimum and Maximum Temperature
  - Daily Minimum and Maximum Temperature data can be provided for interpolation
    - Requires specification of an assumed hour at which the minimum and maximum temperature occur
    - Performs a sinusoidal interpolation between minimum and maximum grids
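Both interpolations can be sketched for a single cell, with times in hours. The linear form reproduces the hour 1 / hour 3 example above. The exact sinusoidal curve used by Reader+ is not stated here; the half-cosine transition below is an assumed, commonly used form that starts and ends flat, so the minimum and maximum are reached smoothly. The function names are hypothetical.

```python
import math


def _check_times(t0, t1, t):
    if not (t0 < t1 and t0 <= t <= t1):
        raise ValueError("t must lie between t0 and t1, with t0 < t1")


def interpolate_linear(t0, v0, t1, v1, t):
    """Linear interpolation between instantaneous values (t0, v0) and (t1, v1)."""
    _check_times(t0, t1, t)
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def interpolate_half_cosine(t0, v0, t1, v1, t):
    """Assumed sinusoidal (half-cosine) transition from v0 at t0 to v1 at t1.

    For daily temperature, (t0, v0) would be the assumed hour and value of the
    minimum and (t1, v1) those of the maximum, or the reverse for the falling limb.
    """
    _check_times(t0, t1, t)
    fraction = (1 - math.cos(math.pi * (t - t0) / (t1 - t0))) / 2
    return v0 + (v1 - v0) * fraction
```

`interpolate_linear(1, 10, 3, 20, 2)` returns 15.0, matching the example. With a minimum of 20 degrees at an assumed hour 6 and a maximum of 40 degrees at an assumed hour 15, `interpolate_half_cosine` gives about 30 degrees at hour 10.5, but rises more slowly than the linear form near hours 6 and 15.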

Screen Values

Values within the grids can be screened against a minimum and a maximum value. If the user believes there is a practical minimum or maximum value for the output grids, values outside that range can be screened out. For example, a minimum value of zero is practical for precipitation, as negative precipitation is impossible. When a minimum and/or maximum screening bound is specified, a replacement value can also be specified. The replacement can be a real number, NaN, -Infinity, or Infinity.
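Screening can be sketched as a cell-by-cell replacement. This is a minimal illustration with a hypothetical function name; it assumes separate replacement values for the two bounds and, when no replacement is given, falls back to the bound itself (clamping), which is an assumption rather than documented Reader+ behavior. NaN cells pass through unchanged because comparisons with NaN are false.

```python
import math


def screen_values(grid, minimum=None, maximum=None,
                  min_replacement=None, max_replacement=None):
    """Replace values below `minimum` or above `maximum` in a 2-D list.

    Replacements may be any float, including math.nan, -math.inf, or math.inf.
    If a replacement is omitted, the bound itself is used (assumed clamping).
    """
    if min_replacement is None:
        min_replacement = minimum
    if max_replacement is None:
        max_replacement = maximum
    out = []
    for row in grid:
        new_row = []
        for v in row:
            if minimum is not None and v < minimum:
                v = min_replacement
            elif maximum is not None and v > maximum:
                v = max_replacement
            new_row.append(v)
        out.append(new_row)
    return out
```

For precipitation, `screen_values(grid, minimum=0.0)` replaces any negative value with zero while leaving missing (NaN) cells for later filling.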

Set Parameter Name

To keep all preprocessed Data Products consistent, the preprocessed grids must share a consistent parameter name. Since each Data Product has different metadata, the parameter name setting allows the user to select the desired parameter name. Common parameter names are Precipitation, Temperature, etc.