HEC-DSS uses a block of sequential data as the basic unit of storage. This concept results in more efficient access to time series or other sequentially related data. Each block contains a series of values of a single variable over a time span appropriate for most applications. The basic concept underlying HEC-DSS is the organization of data into records of continuous, application-related elements, as opposed to individually addressable data items. This approach is more efficient for scientific applications than a conventional database system because it avoids the processing and storage overhead required to assemble an equivalent record from a conventional system.


Data is stored in blocks, or records, within a file, and each record is identified by a unique name called a "pathname". Each time data is stored in or retrieved from the file, the data's pathname must be given. Along with the data, information about the data (e.g., units) is stored in a "header array". HEC-DSS automatically stores the name of the program writing the data, the number of times the data has been written, and the date and time it was last written. HEC-DSS documents stored data completely via information contained in the pathname and stored in the header, so no additional information is required to identify the data. The self-documenting nature of the database allows information to be recognized and understood months or years after it has been stored.
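To make the record layout concrete, here is a minimal in-memory sketch of a record that carries its pathname, a header, the data, and the bookkeeping fields HEC-DSS maintains automatically. `Record`, `ToyStore`, and all field names are hypothetical; this is not the HEC-DSS library API or its file format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Record:
    """Hypothetical record: pathname key, header, data, and write bookkeeping."""
    pathname: str
    header: dict = field(default_factory=dict)  # information about the data, e.g. units
    data: list = field(default_factory=list)    # the values themselves
    program: str = ""                           # program that last wrote the record
    write_count: int = 0                        # how many times the record has been written
    last_written: Optional[datetime] = None     # date/time of the last write


class ToyStore:
    """In-memory stand-in for a DSS file: records are found only by pathname."""

    def __init__(self) -> None:
        self.records = {}

    def store(self, pathname: str, data, header: dict, program: str) -> Record:
        key = pathname.upper()  # pathnames are translated to upper case
        rec = self.records.setdefault(key, Record(key))
        rec.data = list(data)
        rec.header = dict(header)
        # Bookkeeping filled in automatically, not supplied as data:
        rec.program = program
        rec.write_count += 1
        rec.last_written = datetime.now(timezone.utc)
        return rec

    def retrieve(self, pathname: str) -> Record:
        # Every retrieval must name the record's pathname.
        return self.records[pathname.upper()]
```

Storing to the same pathname again, in any letter case, updates the one record and increments its write count.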


The pathname is the key to the data's location in the database. HEC-DSS analyzes each pathname to determine a "hash" index number. This index determines where the data set is stored within the database. The design ensures that very few disk accesses are made to retrieve or store data sets. One data set is not directly related to another, so there is no need to update other areas of the database when a new data set is stored.
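The idea of mapping a pathname to a storage location can be illustrated with a toy hash table. This is a minimal sketch only: HEC-DSS's actual hash function, table sizing, and file layout are not described here, so the FNV-1a hash, `TABLE_SIZE`, `hash_index`, and `ToyHashIndex` are hypothetical stand-ins. The point it shows is that a lookup or a new insert touches a single slot, and nothing else in the index has to change.

```python
# Toy sketch of pathname hashing. ASSUMPTION: the FNV-1a hash and the
# table size are illustrative stand-ins, NOT what HEC-DSS uses.

TABLE_SIZE = 1024  # hypothetical number of hash slots


def hash_index(pathname: str, table_size: int = TABLE_SIZE) -> int:
    """Map a pathname to a slot number in the range [0, table_size)."""
    h = 0x811C9DC5  # 32-bit FNV-1a offset basis
    for byte in pathname.upper().encode("utf-8"):  # case-insensitive key
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by FNV prime, keep 32 bits
    return h % table_size


class ToyHashIndex:
    """Slots of (pathname, address) pairs; each operation touches one slot."""

    def __init__(self, table_size: int = TABLE_SIZE) -> None:
        self.table_size = table_size
        self.slots = [[] for _ in range(table_size)]

    def put(self, pathname: str, address: int) -> None:
        key = pathname.upper()
        slot = self.slots[hash_index(key, self.table_size)]
        for i, (existing, _) in enumerate(slot):
            if existing == key:
                slot[i] = (key, address)  # overwrite the existing entry
                return
        slot.append((key, address))  # new record: no other slot is modified

    def get(self, pathname: str):
        key = pathname.upper()
        for existing, address in self.slots[hash_index(key, self.table_size)]:
            if existing == key:
                return address
        return None
```

Here `address` stands for a hypothetical file offset; colliding pathnames simply share a slot's short list.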
Because of the self-documenting nature of the pathname and the conventions adopted, there is no need for a data dictionary or data definition file as required with other database systems. In fact, there are no database creation tasks or any database setup. Both HEC-DSS utility programs and applications that use HEC-DSS will generate and configure HEC-DSS database files automatically. There is no pre-allocation of space; the software automatically expands the file size as needed.


A HEC-DSS database file has a user-specified conventional name with an extension of ".dss". As many database files as desired may be generated, and there are no size limitations apart from available disk space. Corps offices have HEC-DSS files that range from a few datasets to thousands. HEC-DSS adjusts internal tables and hash algorithms to match the database size so that both small and very large databases are accessed efficiently.
HEC-DSS database files are "direct-access" binary files with no published format. Only programs linked with the HEC-DSS software library can be used to access HEC-DSS files. Direct access files allow efficient retrieval and storage of blocks of data compared to sequential files.


A principal feature of HEC-DSS is that many users can read and write data to a single database at the same time. This multi-user access capability is implemented with system record locking and flushing functions. There is no daemon or other background program managing access to a database. A database may reside on a Windows or UNIX server machine and be accessed by users on PCs or other computers via NFS or Microsoft networking, as long as the locking and flushing functions are implemented.
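The text does not say how HEC-DSS calls the operating system's locking and flushing functions. As a generic illustration of those facilities only, the sketch below uses POSIX byte-range locks (`fcntl.lockf`) and `os.fsync` to update one fixed-size record in a shared file without blocking access to other records. `RECORD_SIZE`, `write_record`, and `read_record` are hypothetical, and the code runs only on POSIX systems.

```python
import fcntl  # POSIX only
import os

RECORD_SIZE = 64  # hypothetical fixed record length in bytes


def write_record(path: str, record_no: int, payload: bytes) -> None:
    """Write one record while holding an exclusive lock on just its bytes."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload exceeds RECORD_SIZE")
    offset = record_no * RECORD_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Byte-range lock: other processes may still lock other records.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            os.pwrite(fd, payload.ljust(RECORD_SIZE, b"\0"), offset)
            os.fsync(fd)  # flush the write to stable storage before unlocking
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)


def read_record(path: str, record_no: int) -> bytes:
    """Read one record under a shared lock so no writer is mid-update."""
    offset = record_no * RECORD_SIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            return os.pread(fd, RECORD_SIZE, offset)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Locking only the affected byte range, rather than the whole file, is what lets several users work in the same file at once; no background process is involved.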

Pathnames


HEC-DSS references datasets, or records, by their pathnames. A pathname may consist of up to 391 characters and is, by convention, separated into six parts, which may be up to 64 characters each. Pathnames are automatically translated into all upper case characters. The parts are delimited by slashes ("/") and labeled "A" through "F", as follows:
/A/B/C/D/E/F/
The naming convention for pathname parts is listed in the table below:

Part    Description
A       Project, river, or basin name
B       Location
C       Data parameter
D       Starting date of block, in a nine-character military format
E       Time interval
F       Additional user-defined descriptive information


An example pathname for regular-interval time series might be:
/RED RIVER/BEND MARINA/FLOW/01JAN1995/1DAY/OBS/
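The pathname rules above (upper-case translation, six slash-delimited parts, 64 characters per part, 391 characters overall) can be checked with a small parser. `parse_pathname` is a hypothetical helper written for illustration, not a function from any HEC-DSS library.

```python
MAX_PATHNAME = 391  # total characters, including slashes
MAX_PART = 64       # characters per part
PART_LETTERS = "ABCDEF"


def parse_pathname(pathname: str) -> dict:
    """Upper-case a pathname and split it into parts A-F, checking length limits."""
    path = pathname.upper()
    if len(path) > MAX_PATHNAME:
        raise ValueError(f"pathname exceeds {MAX_PATHNAME} characters")
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("pathname must begin and end with '/'")
    parts = path[1:-1].split("/")  # drop the outer slashes, split on the rest
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, found {len(parts)}")
    for letter, part in zip(PART_LETTERS, parts):
        if len(part) > MAX_PART:
            raise ValueError(f"part {letter} exceeds {MAX_PART} characters")
    return dict(zip(PART_LETTERS, parts))
```

Applied to the example, the parser returns A = RED RIVER, B = BEND MARINA, C = FLOW, D = 01JAN1995, E = 1DAY, F = OBS. The two limits are consistent: six parts of at most 64 characters plus seven slashes give 6 * 64 + 7 = 391.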

Catalogs


HEC-DSS utility programs, including HEC-DSSVue, will generate a list of the pathnames in a HEC-DSS file and store that list in a "catalog" file. The catalog file is a list of the record pathnames in the file, their last written date and time, and the name of the program that wrote that record. The catalog is usually sorted alphabetically by pathname parts. Each pathname has a record tag and a reference number, either of which may be used in place of the pathname in several of the utility programs. The name given to the catalog file is the HEC-DSS file's name with an extension of ".dsc".
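A catalog of this kind amounts to a sorted, numbered listing. The sketch below derives the ".dsc" name and sorts records by pathname parts, assigning 1-based reference numbers. `catalog_filename` and `build_catalog` are hypothetical; the default part order (A through F) and the sequential numbering are assumptions, and HEC-DSS utilities may order parts or number entries differently.

```python
from pathlib import Path


def catalog_filename(dss_file: str) -> str:
    """The catalog uses the DSS file's name with a .dsc extension."""
    return str(Path(dss_file).with_suffix(".dsc"))


def build_catalog(records, part_order: str = "ABCDEF"):
    """Return (reference number, pathname, last written, program) rows.

    records: iterable of (pathname, last_written, program) tuples.
    part_order: parts to sort on, in priority order (an assumption).
    """
    position = {letter: i for i, letter in enumerate("ABCDEF")}

    def sort_key(rec):
        parts = rec[0].upper()[1:-1].split("/")  # parts A..F
        return [parts[position[letter]] for letter in part_order]

    rows = sorted(records, key=sort_key)
    # Reference numbers here are simply 1-based positions in the sorted list.
    return [(n, p.upper(), written, prog)
            for n, (p, written, prog) in enumerate(rows, start=1)]
```

Note that sorting the D part as text is alphabetical, not chronological, which is one reason a utility might choose a different part order.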


A special catalog file, the "condensed catalog", is useful mainly for time series data. In this type of catalog, pathname parts are displayed in columns, and time series pathnames that differ only in the date (Part D) are listed on a single line.
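Condensing amounts to grouping pathnames on every part except D. The sketch below is a hypothetical `condensed_catalog` helper, not HEC-DSS code; it assumes every D part is a nine-character date such as 01JAN1995 and shows each group's date range in the D column. The " | " column separator is an arbitrary display choice.

```python
from datetime import datetime


def condensed_catalog(pathnames):
    """One line per group of pathnames that differ only in the D part."""
    groups = {}  # (A, B, C, E, F) -> list of D parts; dicts keep insertion order
    for p in pathnames:
        parts = p.upper()[1:-1].split("/")
        key = (parts[0], parts[1], parts[2], parts[4], parts[5])
        groups.setdefault(key, []).append(parts[3])
    lines = []
    for (a, b, c, e, f_part), dates in groups.items():
        # Sort the block dates chronologically rather than alphabetically.
        dates.sort(key=lambda d: datetime.strptime(d, "%d%b%Y"))
        d_range = dates[0] if len(dates) == 1 else f"{dates[0]} - {dates[-1]}"
        lines.append(" | ".join([a, b, c, d_range, e, f_part]))
    return lines
```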

Data Conventions


HEC-DSS stores different types of data structures by using an appropriate pathname. To facilitate the ability of application and utility programs to work with and display data, standard record conventions were developed. These conventions define what should be contained in a pathname, how data is stored, and what additional information is stored along with the data. For regular-interval time series data (e.g., hourly data), the conventions specify that data is stored in blocks of a standard length, uniform for that time interval, with a pathname that contains the date of the beginning of the block and the time interval. The conventions identify how a pathname for the data should be constructed. Conventions have been defined for regular and irregular interval time series data, paired (curve) data, gridded data (such as NEXRAD radar data), and text (alphanumeric) data.


Regular-interval time series data is data that occurs at a standard time interval. This data is divided into blocks whose length depends on the time interval. For example, hourly data is stored with a block length of one month, while daily data is stored with a block length of one year. Only the date and time of the first value in a block are stored; the times of the other data elements are implied by their location within the block. If a data element, or a set of elements, does not exist for a particular time, a missing data flag is placed in that element's location. Data quality flags may optionally be stored along with a regular-interval time series record.
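The block arithmetic can be sketched as follows, using only the two pairings stated above (hourly data in monthly blocks, daily data in yearly blocks). All names are hypothetical, `MISSING = None` is a placeholder rather than HEC-DSS's actual missing-data flag, and for simplicity the first slot is placed exactly at the block start; HEC-DSS's own time-stamp alignment conventions are not covered here.

```python
from datetime import datetime, timedelta

# Only the two interval/block pairings stated in the text.
BLOCK_RULES = {
    "1HOUR": (timedelta(hours=1), "month"),
    "1DAY": (timedelta(days=1), "year"),
}
MISSING = None  # placeholder, not the real HEC-DSS missing-data flag


def block_start(t: datetime, interval: str) -> datetime:
    """Start of the standard block that contains time t."""
    _, block = BLOCK_RULES[interval]
    return datetime(t.year, t.month, 1) if block == "month" else datetime(t.year, 1, 1)


def block_end(start: datetime, interval: str) -> datetime:
    """Start of the following block."""
    _, block = BLOCK_RULES[interval]
    if block == "month":
        return datetime(start.year + start.month // 12, start.month % 12 + 1, 1)
    return datetime(start.year + 1, 1, 1)


def d_part(start: datetime) -> str:
    """Nine-character military date for the D part, e.g. 01JAN1995."""
    return start.strftime("%d%b%Y").upper()


def to_block(samples, interval: str):
    """Place (time, value) samples into one block; unfilled slots stay MISSING."""
    step, _ = BLOCK_RULES[interval]
    start = block_start(samples[0][0], interval)
    n_slots = (block_end(start, interval) - start) // step
    values = [MISSING] * n_slots
    for t, v in samples:
        i, remainder = divmod(t - start, step)
        if remainder or not 0 <= i < n_slots:
            raise ValueError(f"{t} is not on the {interval} grid of this block")
        values[i] = v  # position alone encodes the time
    return start, values


def implied_time(start: datetime, interval: str, i: int) -> datetime:
    """Time of element i, recovered from its position in the block."""
    step, _ = BLOCK_RULES[interval]
    return start + i * step
```

Only `start` (via the D part) and the interval need to be stored; a January block of hourly data has 31 * 24 = 744 slots, and a leap-year block of daily data has 366.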


Irregular-interval time series data does not have a constant time interval between values. This type of data is stored with a date/time stamp for each element. The user-selectable block size is based on the amount of data that is to be stored. For example, the user may select a block length of a month or a year. Because a date/time stamp is stored with each data element, approximately twice the amount of space is required compared with regular-interval time series data. Data quality flags may optionally be stored along with an irregular-interval time series record.
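The "approximately twice the space" figure follows from storing a time stamp next to every value. The sketch below assumes a 4-byte integer time stamp (minutes since the block start) and a 4-byte float value per element; this encoding and the function names are hypothetical, not HEC-DSS's actual record layout.

```python
import struct
from datetime import datetime, timedelta


def pack_regular(values) -> bytes:
    """Values only; each value's time is implied by its position."""
    return struct.pack(f"<{len(values)}f", *values)


def pack_irregular(block_start: datetime, samples) -> bytes:
    """Each element stores its own time stamp and its value."""
    out = bytearray()
    for t, v in samples:
        minutes = int((t - block_start).total_seconds() // 60)
        out += struct.pack("<if", minutes, v)  # 4-byte int stamp + 4-byte float
    return bytes(out)


def unpack_irregular(block_start: datetime, blob: bytes):
    """Recover (time, value) pairs from pack_irregular output."""
    return [(block_start + timedelta(minutes=m), v)
            for m, v in struct.iter_unpack("<if", blob)]
```

Three values take 12 bytes in the regular layout and 24 bytes in the irregular one under these assumptions.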


A convention for paired data has been defined for data that generally defines a curve. Paired data is used for rating tables, flow-frequency curves, and stage-damage curves. One paired data record may contain several curves as long as they share a common set of ordinates. For example, a stage-damage record will contain a set of stages and may have associated residential, commercial, and agricultural damages; however, a stage-damage curve and a stage-flow curve should be stored in separate records.
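A paired-data record can be modeled as one shared ordinate list plus any number of curves of the same length. `PairedData` is a hypothetical class, not the HEC-DSS API; the linear interpolation in `value_at` is just one typical way such a curve might be read, not a documented HEC-DSS behavior.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class PairedData:
    """One paired-data record: shared ordinates plus one or more curves."""
    ordinates: list  # e.g., stages, in increasing order
    curves: dict = field(default_factory=dict)  # label -> one value per ordinate

    def add_curve(self, label: str, values) -> None:
        values = list(values)
        if len(values) != len(self.ordinates):
            # A curve needing different ordinates belongs in its own record.
            raise ValueError("curve must supply one value per ordinate")
        self.curves[label] = values

    def value_at(self, label: str, x: float) -> float:
        """Linearly interpolate one curve at x."""
        xs, ys = self.ordinates, self.curves[label]
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x is outside the range of the ordinates")
        i = bisect.bisect_left(xs, x)
        if xs[i] == x:
            return ys[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

For example, with illustrative stages 10, 12, 14 and residential damages 0, 50, 200, the interpolated residential damage at stage 13 is 125.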


Gridded Data Conventions


Grid records in DSS are named according to a naming convention that differs slightly from the convention for time-series or paired-data records.  Grids represent data over a region instead of at a single location and one grid record contains data for a single time interval or instantaneous value.  The naming convention assigns the six pathname parts as follows.


  • A-part:  Refers to the grid reference system.  At present, GageInterp supports only the HRAP and SHG grid systems (see appendices D and E).  Other grid systems will be necessary for work outside the conterminous United States.
  • B-part:  Contains the name of the region covered by the grid.  For radar grids, this could be the name of the NWS River Forecast Center that produces the grid.  For interpolated grids, this could be the name of a watershed.
  • C-part:  Refers to the parameter represented by the grid.  Examples include PRECIP for precipitation, AIRTEMP for air temperature, SWE for snow-water equivalent, and ELEVATION for ground surface elevation.
  • D-part:  Contains the start time.  This is the starting time of the interval covered by the grid.  The date and time are given military-style (DDMMMYYYY for date and HHMM for time on a twenty-four hour clock) and the date and time are separated by a colon (:).  All times for grids should be given as UTC.  Midnight is represented by 0000 if it is a starting time and 2400 if it is an ending time.
  • E-part:  Contains the end time.  This is the ending time of the interval covered by the grid.  The E part is blank for grids of instantaneous values.
  • F-part:  Refers to the version of the data.  The version identifies the source of the data or otherwise distinguishes one set of grids from another.  Version labels include STAGEIII for NWS stage III radar products, and INTERPOLATED for grids produced by GageInterp.
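The D- and E-part rules (UTC times, DDMMMYYYY:HHMM format, midnight written as 2400 when it ends an interval, blank E part for instantaneous grids) can be sketched as a small pathname builder. `grid_time` and `grid_pathname` are hypothetical helpers; treating naive datetimes as already being UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def grid_time(t: datetime, ending: bool = False) -> str:
    """Format a time as DDMMMYYYY:HHMM in UTC.

    Naive datetimes are assumed to already be UTC. A midnight ending
    time is written as 2400 on the previous day.
    """
    if t.tzinfo is not None:
        t = t.astimezone(timezone.utc).replace(tzinfo=None)
    if ending and t.hour == 0 and t.minute == 0:
        return (t - timedelta(days=1)).strftime("%d%b%Y").upper() + ":2400"
    return t.strftime("%d%b%Y:%H%M").upper()


def grid_pathname(grid_system: str, region: str, parameter: str,
                  start: datetime, end: Optional[datetime] = None,
                  version: str = "") -> str:
    """Assemble /A/B/C/D/E/F/; E is left blank for instantaneous grids."""
    d = grid_time(start)
    e = "" if end is None else grid_time(end, ending=True)
    return f"/{grid_system}/{region}/{parameter}/{d}/{e}/{version}/".upper()
```

Passing a time-zone-aware local time converts it to UTC first; for example, noon Pacific Daylight Time (UTC-7) becomes 1900 UTC.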


Some examples of DSS grid pathnames follow.


The pathname below names an SHG precipitation grid for the Rogue River basin for the hour ending at 1900 UTC on May 3, 2003.  The Rogue basin is in Oregon, so the grid represents the hour ending at noon local (Pacific Daylight) time.  This grid was generated by GageInterp.


/SHG/ROGUE/PRECIP/03MAY2003:1800/03MAY2003:1900/GAGEINTERP/


The pathname below names a precipitation grid for the Missouri River basin for the hour ending at 0100 UTC on September 22, 2004.  The grid comes from the Missouri Basin River Forecast Center and is a product of the NWS’s multi-sensor precipitation estimate (MPE).


/HRAP/MBRFC/PRECIP/22SEP2004:0000/22SEP2004:0100/MPE/


The pathname below names an HRAP grid representing a Quantitative Precipitation Forecast over the Ohio River Basin.  This is an NWS product from the Ohio River RFC, and it covers the twenty-four hour period from noon June 1, 2001 to noon June 2, 2001 (UTC).


/HRAP/OHRFC/PRECIP/01JUN2001:1200/02JUN2001:1200/QPF/


The pathname below names an SHG temperature grid for the Rogue River basin.  The cell size in this grid is 1000 meters instead of the SHG default of 2000 meters, which is reflected in the A-part SHG1K.  Because the temperature is an instantaneous value at 0800 UTC on February 22, 2002, the E-part of the pathname is blank.


/SHG1K/ROGUE/AIRTEMP/22FEB2002:0800//GAGEINTERP/
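Going the other way, the example pathnames above can be decoded to recover each grid's time span or instant. `parse_grid_time` and `describe_grid` are hypothetical helpers; adding the HHMM value as a `timedelta` makes an ending time of 2400 roll over to midnight of the next day, matching the convention described earlier.

```python
from datetime import datetime, timedelta


def parse_grid_time(text: str) -> datetime:
    """Parse DDMMMYYYY:HHMM (UTC); 2400 becomes midnight of the next day."""
    date_part, hhmm = text.split(":")
    day = datetime.strptime(date_part, "%d%b%Y")
    return day + timedelta(hours=int(hhmm[:2]), minutes=int(hhmm[2:]))


def describe_grid(pathname: str) -> dict:
    """Split a grid pathname and report its time span, or its instant."""
    a, b, c, d, e, version = pathname.upper()[1:-1].split("/")
    info = {"grid_system": a, "region": b, "parameter": c,
            "version": version, "start": parse_grid_time(d)}
    if e:
        info["end"] = parse_grid_time(e)
        info["duration"] = info["end"] - info["start"]
    else:
        info["instantaneous"] = True  # blank E part
    return info
```

For the Ohio River QPF pathname this reports a 24-hour duration, and for the SHG1K temperature grid it reports an instantaneous value at 0800 UTC.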