HEC-DSS Basics

HEC-DSS is designed for efficient storage and retrieval of large sets, or series, of data. It combines a modified hashing algorithm with a hierarchical design for database access, which provides quick access to data sets and an efficient means of adding new data sets to the database. HEC-DSS is not a relational database. It is designed to store and retrieve large amounts of data quickly, and its data sets are not necessarily linked to one another, as they are in a relational database. Additionally, HEC-DSS provides a flexible set of utility programs and is easy to add to a user's application program. These features distinguish HEC-DSS from most commercial relational database programs and make it well suited to scientific applications.

HEC-DSS uses a block of sequential data as the basic unit of storage. Each block contains a series of values of a single variable over a time span appropriate for most applications. The basic concept underlying HEC-DSS is the organization of data into records of continuous, applications-related elements as opposed to individually addressable data items. This approach is more efficient for scientific applications than a relational database system because it avoids the processing and storage overhead required to assemble an equivalent record from a relational database.

Data is stored in blocks, or "records", within a file, and each record is identified by a unique name called a "pathname." Each time data is stored in or retrieved from the file, its pathname is used to access it. Information about the record (e.g., units) is stored in a "header array." The header also includes the name of the program that wrote the data, the number of times the data has been written, and the date and time it was last written. HEC-DSS documents stored data completely through the information in the pathname and the header, so no additional information is required to identify it. The self-documenting nature of the database allows information to be recognized and understood months or years after it was stored.
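
As a rough illustration of the record described above, the following Python sketch models a record as a pathname, a block of values, and a header. The names Header, Record, write_record, and the individual field names are hypothetical; they are chosen for this example and do not reflect the actual HEC-DSS file layout or programming interface.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Header:
    """Descriptive information kept with a record (hypothetical field names)."""
    units: str
    program: str = ""                      # name of the program that wrote the data
    write_count: int = 0                   # number of times the record was written
    last_written: Optional[datetime] = None


@dataclass
class Record:
    """One block of values identified by its pathname."""
    pathname: str
    values: List[float] = field(default_factory=list)
    header: Header = field(default_factory=lambda: Header(units=""))


def write_record(store: Dict[str, Record], pathname: str, values: List[float],
                 units: str, program: str,
                 now: Optional[datetime] = None) -> Record:
    """Store a record under its pathname and update the header bookkeeping."""
    previous = store.get(pathname)
    count = previous.header.write_count + 1 if previous else 1
    header = Header(units=units, program=program, write_count=count,
                    last_written=now or datetime.now())
    record = Record(pathname=pathname, values=list(values), header=header)
    store[pathname] = record
    return record
```

Writing the same pathname twice replaces the values and increments the write count, so the header alone shows what the data is and when it was last touched.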

The pathname is the key to the data's location in the database. HEC-DSS uses the pathname to determine a "hash" index number. This index determines where the data set is stored within the database. The design ensures that few disk accesses are needed to retrieve or store data sets. One data set is not directly related to another so there is no need to update other areas of the database when a new data set is stored.
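
The idea of deriving a storage location from the pathname can be sketched in a few lines of Python. This is a toy model only: it uses a CRC-32 checksum as a stand-in hash, a hypothetical table size of 4096 slots, and assumes pathnames are compared without regard to case. It is not the modified hashing algorithm that HEC-DSS itself uses.

```python
import zlib

TABLE_SIZE = 4096  # hypothetical number of hash slots


def hash_index(pathname: str, table_size: int = TABLE_SIZE) -> int:
    """Map a pathname to a slot number (stand-in hash, not the HEC-DSS one)."""
    # Assumption: pathnames are matched case-insensitively.
    key = pathname.upper().encode("utf-8")
    return zlib.crc32(key) % table_size


class ToyStore:
    """In-memory stand-in for a file indexed by pathname hash."""

    def __init__(self, table_size: int = TABLE_SIZE):
        self.table_size = table_size
        self.slots = {}  # slot index -> {pathname: record}

    def put(self, pathname: str, record) -> None:
        slot = self.slots.setdefault(hash_index(pathname, self.table_size), {})
        slot[pathname.upper()] = record  # no other slot needs updating

    def get(self, pathname: str):
        slot = self.slots.get(hash_index(pathname, self.table_size), {})
        return slot.get(pathname.upper())
```

Because each record lands in the slot computed from its own pathname, storing a new data set touches only that slot, which mirrors the point above that other areas of the database do not need to be updated.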

Because of the self-documenting nature of the pathname and the conventions adopted, there is no need for a data dictionary or data definition file as required with other database systems. In fact, there are no database creation tasks or any database setup. Both HEC-DSS utility programs and applications that use HEC-DSS will generate and configure HEC-DSS database files automatically. There is also no pre-allocation of space; the software automatically expands the file size as needed.

HEC-DSS references data sets, or records, by their pathnames. A pathname may consist of up to 391 characters and is, by convention, separated into six parts, each of which may be up to 64 characters long. The parts are delimited by slashes ("/") and are labeled "A" through "F," as follows:
/A/B/C/D/E/F/
HEC-DSS has conventions for storing different kinds of data. These conventions cover time-series data, curve or paired data, gridded data, text, and other types of data. Time-series data is one of the more common types of data stored in HEC-DSS. The time-series convention has a pathname with parts as follows:

/Basin/Location/Parameter/Block Date/Interval/Version/
An example of a time-series pathname is:
/Green River/Bend/Flow/01Jan2002/1Day/Observed/
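
The six-part structure can be checked mechanically. The following Python sketch splits a pathname into its A through F parts and enforces the length limits quoted above. The names parse_pathname and Pathname are hypothetical helpers written for this example, not functions from any HEC-DSS library.

```python
from typing import NamedTuple

MAX_PATHNAME_LEN = 391  # overall limit quoted in the text
MAX_PART_LEN = 64       # per-part limit quoted in the text


class Pathname(NamedTuple):
    a: str  # Basin (time-series convention)
    b: str  # Location
    c: str  # Parameter
    d: str  # Block date
    e: str  # Interval
    f: str  # Version


def parse_pathname(text: str) -> Pathname:
    """Split '/A/B/C/D/E/F/' into six parts, raising ValueError if malformed."""
    if len(text) > MAX_PATHNAME_LEN:
        raise ValueError("pathname is longer than 391 characters")
    if len(text) < 2 or not (text.startswith("/") and text.endswith("/")):
        raise ValueError("pathname must begin and end with '/'")
    parts = text[1:-1].split("/")
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, found {len(parts)}")
    for label, part in zip("ABCDEF", parts):
        if len(part) > MAX_PART_LEN:
            raise ValueError(f"part {label} is longer than 64 characters")
    return Pathname(*parts)


example = parse_pathname("/Green River/Bend/Flow/01Jan2002/1Day/Observed/")
print(example.b, example.c, example.e)
```

An empty part, such as a blank D part, is accepted by this sketch because the parts are only checked for count and length.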
A time-series dataset includes both a pathname and a time window. For example, a dataset of 75 years of flow data at Bend might be specified as:
/Green River/Bend/Flow//1Day/Observed/
Start: 01Oct1940
End: 30Sep2015
Or, as a "condensed pathname":
/Green River/Bend/Flow/01Oct1940 - 30Sep2015/1Day/Observed/
Note that neither
"/Green River/Bend/Flow//1Day/Observed/"
nor
"/Green River/Bend/Flow/01Oct1940 - 30Sep2015/1Day/Observed/"
is a valid pathname by itself. A time-series function takes the time window, builds the unique pathname of each record, reads those records, and combines them into a complete dataset. The following records are implied:
/Green River/Bend/Flow/01Jan1940/1Day/Observed/
/Green River/Bend/Flow/01Jan1941/1Day/Observed/
…
/Green River/Bend/Flow/01Jan2015/1Day/Observed/
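
The expansion from a time window to individual record pathnames can be sketched as follows. This Python example assumes that 1Day data is stored in one record per calendar year, with each block date written as 01Jan of that year, as the list above implies. Other intervals use different block lengths and are not handled here. The function name daily_block_pathnames is hypothetical.

```python
from datetime import date
from typing import List


def daily_block_pathnames(basin: str, location: str, parameter: str,
                          version: str, start: date, end: date) -> List[str]:
    """Build one pathname per calendar year touched by the time window.

    Assumes 1Day data is blocked by calendar year (as in the example above).
    """
    if end < start:
        raise ValueError("end of time window precedes its start")
    return [
        f"/{basin}/{location}/{parameter}/01Jan{year}/1Day/{version}/"
        for year in range(start.year, end.year + 1)
    ]


paths = daily_block_pathnames("Green River", "Bend", "Flow", "Observed",
                              date(1940, 10, 1), date(2015, 9, 30))
print(paths[0])
print(paths[-1])
print(len(paths))
```

Note that the window covers 75 years of data but touches 76 calendar-year blocks, because it begins and ends partway through a year. A reader function would trim the first and last blocks to the requested window after reading them.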

A list of the pathnames in a DSS file is called a "catalog". In version 6, the catalog was a separate file; in version 7, the catalog is constructed directly from pathnames in the file.
Refer to Chapter 1 of the HEC-DSSVue User's Guide for further information.