hash database (hdb32) multitool
hdb(5)                                hdb                               hdb(5)



NAME
       hdb - hash database (hdb32) file format

DESCRIPTION
       An  hdb file is implemented as a partitioned hash table.  It provides a
       set of mappings for record ``keys'' to record  ``values'',  where  keys
       and records are any arbitrary strings of bytes.

       An  hdb  file  is implemented as a ``write-once, read-many'' datastore.
       It is generally used by applications  requiring  fast  read  access  to
       ``constant''  data,  and  is modeled after the cdb(5) constant database
       specification.

       A hdb file is comprised of six sections:

           I M P C R H

       Where:

       I: identifier
              A file type identifier of 16 bytes.  The  identifier  for  files
              conforming to this specification is ``hdb32/1.0\0\0\0\0\0\0\0''.

       M: metadata
              File metadata of 8 bytes consisting  of:  4  bytes  nrecs  total
              records  in database; and 4 bytes offset to start of record sec-
              tion R.

       P: hash subtable pointers
              An array of 8 pointer elements providing a ``table of contents''
              for  the 8 hash subtables in section H.  Each element P[ntab] is
              eight bytes: 4 bytes number of slots in hash  subtable  ntab;  4
              bytes  offset  to  start  of hash subtable ntab.  A subtable may
              have zero slots.

       C: descriptive comment
              An optional comment of any length, normally used to describe the
              dataset.

       R: records
              The  sequential  collection  of  nrecs  records in the database.
              Storage for each record R[nrec] is a concatenation of:  3  bytes
              key  length; 3 bytes value length; record key; and record value.

       H: hash tables
              An array of 8 hash subtables.  Each  subtable  H[ntab]  is  com-
              prised  of  P[ntab].tslots  slots.   Each slot is eight bytes: 4
              bytes hash value; 4 bytes offset to associated record in section
              R.  An offset value of 0 indicates an empty slot, and no associ-
              ated record exists.

       Hash values are calculated for any given byte string key of length klen
       according to the following algorithm:

           uint32_t
           hdb32_hash(unsigned char *key, uint32_t klen)
           {
             uint32_t  h = 0;

             while(klen){
               h = h ^ *key;
               h = h * 37;
               ++key; --klen;
             }

             return h;
           }

       The subtable ntab for a given key is calculated from the relation:

           ntab = (hash % 8).

       The  subtable offset and number of subtable slots is read from P[ntab].
       The target slot for a given key in a subtable is computed from:

           target = ((HMIX(hash) >> 3) % tslots)

       where tslots is the number of  slots  in  the  subtable  as  read  from
       P[ntab], and HMIX(hash) is a bit operation defined as:

           (hash >> 13) ^ hash

       A record is located according to an open-addressing hash scheme as fol-
       lows.  First compute the hash value, subtable, and target slot for  the
       record  key.  Begin probing in subtable H[ntab], starting with the tar-
       get slot, then the next slot, and so forth, while checking at each step
       in section R for a matching key.  Continue until either the record with
       matching key is found, or until an empty slot is encountered.

       All offset and hash values  are  stored  as  32-bit  unsigned  integers
       packed  into a portable, little-endian form, 4 bytes per value.  Length
       values for keys and values are  stored  as  24-bit  unsigned  integers,
       packed i