Appendix - HEC-DSS VERSION 7 FILE STRUCTURE
HEC-DSS files are binary files that cannot be accessed outside of the DSS library. The file “word” size is 64 bits (longs). All addressing is by 64-bit word, not by byte, meaning that an item with a word address of 1,000 occurs at byte 8,000 (1,000 words of 8 bytes each). Although the base word is the size of a double, floats are stored just as easily by putting two floats in one 64-bit word. Similarly, integers and characters are combined into 64-bit words. Although all file addressing uses 64-bit words, the compiled library may be either 32-bit or 64-bit; that choice is independent of the file addressing.
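The word addressing and float packing described above can be illustrated with a short Python sketch. It is not taken from the DSS library: the function names are hypothetical, word addresses are assumed to start at zero, and little-endian byte order is assumed for the packing.

```python
import struct

WORD_BYTES = 8  # one 64-bit file word

def word_to_byte_offset(word_address):
    """Convert a 64-bit word address to a byte offset (addresses assumed zero-based)."""
    return word_address * WORD_BYTES

def pack_two_floats(a, b):
    """Pack two 32-bit floats into one 64-bit word (little-endian assumed)."""
    return struct.pack("<2f", a, b)

def unpack_two_floats(word):
    """Recover the two 32-bit floats stored in one 64-bit word."""
    return struct.unpack("<2f", word)

print(word_to_byte_offset(1000))        # byte offset of word address 1,000
word = pack_two_floats(1.5, 2.5)
print(len(word), unpack_two_floats(word))
```

The same idea applies to pairs of 32-bit integers or groups of eight characters: each group occupies exactly one addressable word.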
There are four main components in a DSS file: 1) the permanent section, which contains primary addresses and file information; 2) the “hash table”, which contains a “pathname bin” address for each pathname “table hash”; 3) multiple “pathname bins”, each of which contains, for a single table hash, the pathnames and their data addresses; and 4) data areas.
The permanent section, which is set up by the function “zinit”, contains addresses of the various components along with all the file information for that DSS file. This includes the address of the hash table, first pathname bin, catalog sort information, the reclamation table, etc. The permanent section is about 100 64-bit words in size and can be viewed with HEC-DSSVue under the Tools menu, Debug, File Header.
Following the permanent section is the “hash table”. The hash table can vary in size when a DSS file is “squeezed” (rebuilt), according to the number of records (pathnames) in the file. When a record is to be accessed, its pathname is “hashed” by function zcheck into two different hash codes, the table hash and the bin hash. Both hashes are case insensitive. The table hash is a positive integer from about 0 to 10,000, depending on file parameters. It points to an address in the main hash table. The bin hash is a quasi-unique long integer used to quickly check if two pathnames are the same.
zcheck does a pretty good job of evenly distributing table hashes for pathnames across the size of the table. The table hash is used to form a hash table address (for example, if it were 2000, that would be file word address 2000 + 100 = 2100). The word at that address holds the file address of the corresponding pathname bin. If that bin address is zero, then that record does not exist.
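A minimal sketch of that address arithmetic follows. It assumes the hash table starts at word 100, which is what the example above implies but is not stated as a fixed rule. The toy "file" is a Python dict from word address to stored value, and the bin address 48213 is made up for illustration.

```python
# Assumption: the hash table begins at word 100, as in the example above.
HASH_TABLE_START = 100

def hash_table_slot(table_hash, hash_table_start=HASH_TABLE_START):
    """Word address of the hash table entry for a given table hash."""
    return hash_table_start + table_hash

def pathname_bin_address(file_words, table_hash):
    """Return the bin address stored in the slot, or None when the slot holds zero."""
    address = file_words.get(hash_table_slot(table_hash), 0)
    return address if address != 0 else None

# Toy file: only slot 2100 holds a (made-up) pathname bin address.
file_words = {2100: 48213}
print(hash_table_slot(2000))
print(pathname_bin_address(file_words, 2000))
print(pathname_bin_address(file_words, 2001))  # empty slot: record does not exist
```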
The pathname bin is an area in the file that contains the pathnames that share a table hash. Its size can vary according to file size parameters, but is usually around 300 long words and often contains about 5 to 8 pathnames, along with other information. When searching for a record, DSS compares the bin hash stored with each pathname in the bin against the bin hash of the pathname being searched for. The algorithm for computing the bin hash was borrowed from Java; the result usually fills a long word and is almost always unique within a table hash. This allows the code to quickly identify whether it has the correct pathname. If a bin hash collision ever occurs (a very rare occurrence), the permanent section records it, and the pathnames themselves are then compared after the bin hash codes. If the pathname is not found in the pathname bin, DSS searches the next (overflow) bin for that table hash. This continues until the bin indicates that no overflow bins follow. For a “regular-sized” DSS file, there will be 0-2 pathname bins for each table hash. For a very large DSS file, there might be 10 pathname bins for each table hash. This allows the DSS software to find pathnames in the DSS file very quickly.
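The text says only that the bin hash algorithm was borrowed from Java and that it is case insensitive. As an illustration, the sketch below assumes a 64-bit variant of Java's String.hashCode recurrence (h = 31 * h + c) applied to the upper-cased pathname. The function name and the exact recurrence are assumptions; the real DSS routine may differ.

```python
MASK64 = (1 << 64) - 1  # keep the running hash within one 64-bit word

def bin_hash(pathname):
    """Hypothetical case-insensitive, Java-style polynomial hash of a pathname."""
    h = 0
    for ch in pathname.upper():          # upper-casing makes the hash case insensitive
        h = (31 * h + ord(ch)) & MASK64  # Java-style recurrence, widened to 64 bits
    return h

# Pathnames that differ only in case hash to the same value.
print(bin_hash("/basin/loc/flow//1hour/obs/") == bin_hash("/BASIN/LOC/FLOW//1HOUR/OBS/"))
```

Because the hash is only quasi-unique, a match on the hash alone is treated as sufficient until a collision has been recorded for the file, at which point the pathnames themselves must also be compared.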
When a pathname is found in a bin, the file address to the record’s “info area” is stored in the bin. That address points to a file address where the information about the record is stored, followed by the record’s header and data areas. (The info, header and data areas are always stored together in a file.) For a typical record read, this results in only a few file reads before reading its data from the file:
- Compute the table hash and read the pathname bin address from the hash table
- Read pathname bin and compare bin hashes for those pathnames in bin
- If the pathname is not in that bin and the bin has an overflow, read the next overflow bin, and so forth until found.
- Using file address from bin, read the record’s info, header and data area.
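The read sequence above can be sketched with an in-memory model. Everything here is illustrative: the PathnameBin class, its field names, and the addresses are hypothetical, and the real bins are fixed-size areas of 64-bit words in the file rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PathnameBin:
    # Each entry: (bin_hash, pathname, address of the record's info area)
    entries: List[Tuple[int, str, int]] = field(default_factory=list)
    next_bin: int = 0  # address of the overflow bin; 0 means none follows

def find_record(hash_table, bins, table_hash, bin_hash, pathname,
                collision_recorded=False):
    """Return (info_address or None, number of file reads in this model)."""
    reads = 1  # reading the hash table slot
    bin_address = hash_table.get(table_hash, 0)
    while bin_address != 0:
        pbin = bins[bin_address]
        reads += 1  # reading one pathname bin
        for entry_hash, entry_path, info_address in pbin.entries:
            if entry_hash != bin_hash:
                continue
            # Pathnames are compared only if the file has a recorded collision.
            if collision_recorded and entry_path.upper() != pathname.upper():
                continue
            return info_address, reads
        bin_address = pbin.next_bin  # follow the overflow chain
    return None, reads

hash_table = {2000: 500}
bins = {
    500: PathnameBin([(11, "/A/", 9000)], next_bin=800),
    800: PathnameBin([(22, "/B/", 9500)]),
}
print(find_record(hash_table, bins, 2000, 22, "/B/"))
```

In this model a record in the first bin costs two reads before its info area is read, and each overflow bin adds one more.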
Anatomy of a Write
Assuming the DSS file has been opened, etc., the calling process generates a pathname, the dataset and header arrays. For this exercise, we will assume that “zwrite” is being called.
Note: At this level, the six-part pathname and associated conventions are not required; the pathname can be just a string of characters. However, no utility routines (including the catalog?) will be able to access that dataset.
- zwrite calls zwriteInternal.
- zwriteInternal checks for write access and then locks the file (zlocking) to keep another process from changing internal addresses while it is changing them. zlocking also loads the file header (perm section).
- The pathname is “checked” by zcheck to determine if it exists:
- zcheck computes both a “table hash” and a “bin hash” for the pathname (bin as in container, not binary). Both hashes are case insensitive.
- The table hash is a positive integer from about 0 to 10,000, depending on file parameters. It points to an address in the main hash table.
- The bin hash is a quasi-unique long integer used to quickly check if two pathnames are the same.
- zcheck compares the table hash and bin hash against the table and bin hashes of the pathname last accessed. Most processes do a check just before reading or writing, and this comparison avoids having to do a full check. If the hashes are the same and the file has a collision reported (bin hashes and table hashes the same for more than one pathname; extremely rare, but possible), the pathname being written and the last one accessed are compared.
- If the current and last pathname are the same, then zcheck returns whether the last record was found. (If found, the address pointing to the record info area is valid.)
- If the hashes are not the same, zcheck performs a “normal” check.
- The table hash code is used to read the address from the main hash table. If that address is zero, zcheck returns not found.
- If the table hash address is non-zero, that address points to the pathname bin for that table hash.
- The pathname bin is essentially a list of pathnames and their record addresses for that table hash.
- The pathname bin is read (around 200 longs) and searched for this pathname’s bin hash.
- If the bin hash is found, the address of the record (the info area) is retained in ifltab and zcheck returns found.
- If the bin hash is not found, the address of the next (overflow) bin is checked to determine whether it is zero or points to a further bin for this table hash.
- If the next bin address is zero, zcheck returns not found.
- If the next bin address is positive, that bin is read and searched, repeating the above process until the pathname is either found or there are no more bins for that table hash.
- Note: A well-tuned DSS file would only have one or two pathname bins for each table hash.
- If the record was found, its “information block” is read. The information block is the record header, containing information about the record as well as the addresses of the data and headers. The information block and data areas are always kept adjacent on disk for efficiency. zoldWrite is called to update the information block and check for record expansion.
- zoldWrite checks if the record size is the same or less than the allocated space. If so, that space is reused for the new (updated) data.
- If the record size is larger than the allocated space, the existing information block is marked as moved, and is re-written at the end of the file. Space for the data is allocated at the end of the file, right after the information block.
- When expanding (new size larger than old), logic is employed to attempt to determine the appropriate expansion size. Expansions are relatively expensive, so extra space may be allocated so that the dataset size can grow without having to expand the record right away.
- If the record was not found, znewWrite is called to add this pathname to the hash table and bin.
- If the hash table address for this pathname is zero, znewBin is called to create a new pathname bin and put its address in the hash table.
- znewWrite creates the information block and writes it at the end of the file.
- zupdateBin saves the address of the information block along with the pathname, hash, etc. in the pathname bin.
- Space for the data is allocated at the end of the file, right after the information block.
- zwriteInternal then allocates (memory) buffer space so that all data can be written in one write, if reasonable. For a smaller dataset, the various headers and dataset are combined into one array and then written together. For larger datasets, the headers and data are written separately.
- zwriteInternal calls zput4Buff to write each of the header arrays and the data array. zput4Buff will either save the data in the buffer, if large enough, or to the file, if not.
- If at the end of the file, an end-of-file flag is written.
- The permanent section of the file is updated with last write time, number of records, file size, etc.
- All buffers are flushed (written) to the file. On Windows in regular mode, the file is still cached in memory by the OS (independent of DSS). In multi-user access mode, the buffers are physically written to disk, overriding any OS caching. Tests on Sun Solaris show little difference between modes, so data is always flushed. (On Windows, there is about a 100-fold difference in speed.)
- Buffers are freed and the file is unlocked.
- Returns with the status.
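The space allocation decision made during a write (reuse the existing space, expand at the end of the file, or create a new record) can be sketched as follows. The plan_write function and the WritePlan type are hypothetical, and the fixed 25 percent headroom is an assumption standing in for the library's expansion logic, which the text does not spell out.

```python
from typing import NamedTuple, Optional

class WritePlan(NamedTuple):
    action: str        # "reuse", "expand", or "new"
    info_address: int  # where the information block will be written
    allocated: int     # space reserved for the data, in words

def plan_write(existing_address: Optional[int], allocated: int,
               new_size: int, file_end: int, headroom: float = 0.25) -> WritePlan:
    """Decide where a record goes; headroom is an assumed expansion factor."""
    if existing_address is None:
        # Record not found: new information block and data at the end of the file.
        return WritePlan("new", file_end, new_size)
    if new_size <= allocated:
        # Same size or smaller: the existing space is reused.
        return WritePlan("reuse", existing_address, allocated)
    # Larger: the old block is marked as moved and the record is rewritten at the
    # end of the file with extra room so it can grow without expanding again soon.
    return WritePlan("expand", file_end, new_size + int(new_size * headroom))

print(plan_write(None, 0, 100, 5000))
print(plan_write(1200, 100, 80, 5000))
print(plan_write(1200, 100, 200, 5000))
```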
Low Level Writing in DSS-7
Lower level writes in DSS-7 use derivations of the “zput” function. Those functions are:
zput4Buff or zput8Buff: buffers adjacent data together so that one disk write can be made from several write calls. The “4” and “8” correspond to int 4 (32-bit) words and int 8 (64-bit) words. Generally, addressing and internal tables use int 8 words and user data values use int 4 words.
These call “zput”, which will either write the data or buffer it. zput4Buff and zput8Buff set up the “buffer control” that indicates the address to be written to, the number of values, etc. If the data is buffered, zput is eventually called with a flag to write the buffer area to disk.
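The idea of buffering adjacent data so that several write calls become one disk write can be sketched as below. The WordBuffer class is hypothetical and much simpler than the library's buffer control: it handles only 64-bit words, assumes little-endian byte order, and flushes whenever a new block does not directly follow the buffered one.

```python
import io
import struct

class WordBuffer:
    """Collects contiguous 64-bit words and writes them in a single call."""

    def __init__(self, handle):
        self.handle = handle
        self.start = None      # word address of the first buffered word
        self.words = []
        self.disk_writes = 0   # counts physical write calls, for illustration

    def put(self, address, values):
        # A block that is not adjacent to the buffered data forces a flush first.
        if self.words and address != self.start + len(self.words):
            self.flush()
        if not self.words:
            self.start = address
        self.words.extend(values)

    def flush(self):
        if not self.words:
            return
        self.handle.seek(self.start * 8)  # word address to byte offset
        self.handle.write(struct.pack(f"<{len(self.words)}q", *self.words))
        self.disk_writes += 1
        self.words = []
        self.start = None

f = io.BytesIO()
buf = WordBuffer(f)
buf.put(10, [1, 2])   # e.g. an information block
buf.put(12, [3])      # adjacent data: joins the same buffer
buf.flush()
print(buf.disk_writes)
```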
Everything that is written to disk is written in int 8 words. If an int 4 block has an odd length, so that its last value fills only one half of an int 8 word, the leftover int 4 word is set to zero. This ensures that everything written is in whole int 8 words, although the actual write is expressed in bytes (int 1). As a convenience, zput’s array length is given in int 4 units, even though int 8 words are always written (by both zput4 and zput8).
When zput writes to disk, it calls “zwriteDisk”, which is where the values are actually written and where ints are swapped for big-endian machines. The zwriteDisk array length is given in int 4 units but is changed to bytes for the actual write.
In MS Windows file management, while in “non-multi-user access mode” (multi-user advisory, or exclusive), Windows writes to memory and the information does not end up on disk until the file is closed or similar. In “multi-user access mode”, the information has to be physically on disk to prevent collisions (from other users). This requires a flush or commit after each full write and slows the write down by a factor of 100 or more. Thus every effort is made to keep the writes out of multi-user access mode. Programmers need to be aware of this OS inefficiency (or efficiency, depending on your viewpoint), and should know not to close DSS files excessively. No speed differences have been seen on Sun Unix and similar systems. As such, files on those systems are usually opened in multi-user access mode.
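The padding and length conversion described above can be sketched as follows. The to_disk_bytes function is hypothetical. It pads an odd-length int 4 block with a zero, converts the int 4 length to bytes, and packs with an explicit little-endian byte order. The explicit byte order is an assumption used here to get the same on-disk layout on any host; the text does not say exactly what zwriteDisk swaps on big-endian machines.

```python
import struct

def to_disk_bytes(int4_values, length_int4):
    """Return the bytes that would be written for a block of int 4 values."""
    values = list(int4_values[:length_int4])
    if len(values) % 2:
        values.append(0)          # zero the unused half of the last int 8 word
    byte_count = len(values) * 4  # length given in int 4 units, converted to bytes
    # "<" fixes the byte order, so the result does not depend on the host machine.
    payload = struct.pack(f"<{len(values)}i", *values)
    assert len(payload) == byte_count
    return payload

block = to_disk_bytes([1, 2, 3], 3)
print(len(block), len(block) % 8)
```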