hdb(1)                                hdb                               hdb(1)



NAME
       hdb - hash database (hdb32) multitool

SYNOPSIS
       hdb [-hV] command opts args

       hdb about [-h] [-acin] [ hdb ]
       hdb cross [-h] [-p perms ] [-t tmpfile ] result.hdb A.hdb B.hdb
       hdb dump [-h] [-d del | -g | -y] [ hdb ]
       hdb dupes [-h] [-d del | -g | -y] [-q] [ hdb ]
       hdb get [-h] [-a | -j num ] [-n] key [ hdb ]
       hdb grep [-h] [-d del | -g | -y] [-i] [-k] [-!]  regex [ hdb ]
       hdb keys [-h] [-g | -y] [-X] [ hdb ]
       hdb make [-h] [-d del | -g | -y] [-p perms ] [-t tmpfile ] hdb
       hdb match [-h] [-d del | -g | -y] [-i] [-k] [-!]  pattern [ hdb ]
       hdb merge [-h] [-p perms ] [-t tmpfile ] [-a] result.hdb A.hdb B.hdb
       hdb purge [-h] [-p perms ] [-t tmpfile ] result.hdb A.hdb B.hdb
       hdb stats [-h] [-v] [ hdb ]

DESCRIPTION
       hdb  is  used to generate, query, analyze, and operate on hash database
       files in hdb(5) format.  The following operations are supported:

   about
       hdb about reports selected metadata within hdb (or  seekable  input  on
       stdin) to stdout.  Each item reported begins on a new line, prefixed by
       the item-specific string described below.   Without  explicit  options,
       the  default report displays the number of records and database comment
       (as if options -cn selected).  Options:

       -a     All.   Display  all  of  the  metadata  (identifier,  number  of
              records, and comment).

       -c     Comment.  Display the internal comment.  The display is prefixed
              by the string ``comment: ''.  Note that the hdb(5) specification
              places  no  constraints on the length or contents of the comment
              area.

       -i     Identifier.  Display the hdb32 header  identifier  string.   The
              display is prefixed by the string ``identifier: ''.

       -n     Number.   Display  the  total  number of records in the dataset.
              The display is prefixed by the string ``records: ''.

   cross
       hdb cross performs a set intersect operation with A.hdb and  B.hdb  and
       compiles the result in result.hdb.  The cross operation selects records
       in A with matching keys in B.  The record sequence in  result.hdb  pre-
       serves  the original sequence among records in A.  Any/all of the named
       argument files may ``overlap''.  Options:

       -p perms
              Permissions.  Set the file creation permissions  for  result.hdb
              as explicitly given in the octal argument perms.  Otherwise, the
              result.hdb file permissions will be set to mode 0666  and  modi-
              fied by the process umask.

       -t tmpfile
              Tempfile.   Use  the  path specified by the argument tmpfile for
              the temporary file used during the creation of result.hdb.  Nor-
              mally the temporary file name is constructed from the result.hdb
              argument as result.hdb.{new}.
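
       As an illustration of the default case (no -p option), the mode that
       results from 0666 and a given process umask can be worked out as
       below.  This is a sketch of the usual POSIX arithmetic only, not of
       hdb internals; the function name default_mode is hypothetical.

```python
def default_mode(umask, requested=0o666):
    """Return the permission bits that remain after the umask clears its bits."""
    return requested & ~umask & 0o777


if __name__ == "__main__":
    # A common umask of 022 turns the default 0666 into 0644.
    print(oct(default_mode(0o022)))  # prints 0o644
```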

   dump
       hdb dump lists to stdout all records found in the given  hdb  file  (or
       seekable  input  on  stdin).  Output format is under control of options
       and described in the FORMATS section.  Options:

       -d del Delimiter.  Output records  in  ``getline''  format,  using  the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -y     Legacy.  Output records in cdb ``legacy'' format.

   dupes
       hdb dupes lists to stdout all records with duplicate keys found in the
       given hdb file (or seekable input on stdin), and prints a summary
       report to stderr.  Because the dupes command sequentially scans each
       record
       and  hash  value  in  the  file,  it  may  also be used to validate the
       integrity of the database.  Output format is under control  of  options
       and described in the FORMATS section.  Options:

       -d del Delimiter.   Output  records  in  ``getline''  format, using the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -y     Legacy.  Output records in cdb ``legacy'' format.

       -q     Quick/quiet.    Suppress   any  output  and  summary  reporting.
              Short-circuit scan and return exit status 1 on  first  duplicate
              key found in hdb.  Otherwise, return exit status 0 for no dupli-
              cates found.

   get
       hdb get looks up key in hdb (or seekable input on stdin) and, if found,
       writes  the associated record value to stdout.  Exits zero if lookup is
       successful.  Exits non-zero (1) if key not found.  Options:

       -a     All.  Write all record values with matching key.  A  newline  is
              appended  to each record value output.  Normally, only the first
              record matching key is output.

       -j num Jump.  Skip the first num matches for key before writing the
              value of the next (num+1th) matching record, if any.

       -n     Newline suppressed.  Write only the record value without append-
              ing a newline.  Normally, a newline is appended to  each  record
              in the output.
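
       The way -a and -j select among records that share a key can be
       modeled as follows.  This is only a sketch of the selection rules
       described above, applied to an in-memory list of (key, value) pairs
       held in file order; the function lookup and its parameters are
       hypothetical and are not part of hdb.

```python
def lookup(records, key, jump=0, all_values=False):
    """Return the values selected for key, in the order the records appear."""
    matches = [value for k, value in records if k == key]
    if all_values:
        # Models -a: every value stored under key.
        return matches
    # Models -j num: skip num matches, then take the next one, if any.
    return matches[jump:jump + 1]


if __name__ == "__main__":
    records = [("a", "1"), ("b", "2"), ("a", "3")]
    print(lookup(records, "a"))                   # first match only
    print(lookup(records, "a", jump=1))           # second match
    print(lookup(records, "a", all_values=True))  # every match
```

       An empty result in this model corresponds to the ``key not found''
       case, for which hdb get exits with status 1.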

   grep
       hdb  grep  lists  to stdout all records found in the given hdb file (or
       seekable input on stdin) matching  the  re_format(7)  extended  regular
       expression given in regex.  Note that the regex argument may need to be
       quoted to inhibit unwanted expansion by the shell.   Output  format  is
       under  control  of options and described in the FORMATS section.  Exits
       zero if one or more matches found.  Exits  non-zero  (1)  if  no  match
       found.  Options:

       -d del Delimiter.   Output  records  in  ``getline''  format, using the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -y     Legacy.  Output records in cdb ``legacy'' format.

       -i     Case  insensitive.  Perform case insensitive matching.  Normally
              the regular expression is matched  explicitly  with  respect  to
              case.

       -k     Key  match.   Perform  the  regular  expression matching against
              record keys.  Normally the regular expression is matched against
              record values.

       -!     Invert (logical not).  Select and output the records that do not
              match the  given  regular  expression.   Normally  the  matching
              records are output.

   keys
       hdb keys lists to stdout all keys found in the given hdb file (or seek-
       able input on stdin).  Output format is under control  of  options  and
       modified  slightly  from  the  descriptions in the FORMATS section.  By
       default, keys are listed in a modified default format:

           klen:key\n

       An empty line terminates the default output sequence.  Options:

       -g     Getline.  Output keys listed in a modified ``getline'' format:
                   key\n

       -y     Legacy.  Output keys in a modified cdb ``legacy'' format:
                   +klen:key\n

       -X     Hash (hexadecimal).  Following each key, display the hash  value
              computed for the key in hexadecimal format.

   make
       hdb make generates the hdb file hdb from formatted input read on stdin.
       Input format is under control of options and described in  the  FORMATS
       section.  Options:

       -d del Delimiter.  Input records in ``getline'' format, using the first
              character in the argument del as separator between the  key  and
              value parts of the record.  Implies -g.

       -g     Getline.  Input records in ``getline'' format.

       -y     Legacy.  Input records in cdb ``legacy'' format.

       -p perms
              Permissions.   Set  the  file  creation  permissions  for hdb as
              explicitly given in the octal argument  perms.   Otherwise,  the
              hdb  file  permissions  will be set to mode 0666 and modified by
              the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the  temporary  file  used during the creation of hdb.  Normally
              the temporary file name is constructed from the hdb argument  as
              hdb.{new}.

   match
       hdb  match  lists to stdout all records found in the given hdb file (or
       seekable input on stdin) matching the simple wildcard expression  given
       in  pattern.   A pattern is a character string composed in any combina-
       tion of:

              o   any character (excepting `?' and `*'), matched explicitly

              o   the `?' (question-mark) character, matching any single char-
                  acter

              o   the  `*' (asterisk) character, matching any sequence of zero
                  or more characters

       Note that the pattern syntax is a simplified subset of the sh(1)
       globbing rules provided by fnmatch(3), and does not provide for
       range expressions or escaping of metacharacters.  (The hdb grep command
       may  be  used  whenever more sophisticated pattern matching expressions
       are required.)

       Note also that the pattern argument may need to be  quoted  to  inhibit
       unwanted  expansion  by  the  shell.  Output format is under control of
       options and described in the FORMATS section.  Exits  zero  if  one  or
       more matches found.  Exits non-zero (1) if no match found.  Options:

       -d del Delimiter.   Output  records  in  ``getline''  format, using the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -y     Legacy.  Output records in cdb ``legacy'' format.

       -i     Case  insensitive.  Perform case insensitive matching.  Normally
              the wildcard expression is matched explicitly  with  respect  to
              case.

       -k     Key  match.   Perform the wildcard matching against record keys.
              Normally the wildcard expression is matched against record  val-
              ues.

       -!     Invert (logical not).  Select and output the records that do not
              match the given  wildcard  expression.   Normally  the  matching
              records are output.

   merge
       hdb  merge performs a set union operation with A.hdb and B.hdb and com-
       piles the result in  result.hdb.   The  merge  operation  combines  all
       records in A and B, excluding by default any records in A with matching
       keys in B.  Note that the -a option may be used  for  a  ``union  all''
       operation.  The record sequence in result.hdb includes A records before
       B records, and original sequence is preserved among records in A and B.
       Any/all of the named argument files may ``overlap''.  Options:

       -p perms
              Permissions.   Set  the file creation permissions for result.hdb
              as explicitly given in the octal argument perms.  Otherwise, the
              result.hdb  file  permissions will be set to mode 0666 and modi-
              fied by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the temporary file used during the creation of result.hdb.  Nor-
              mally the temporary file name is constructed from the result.hdb
              argument as result.hdb.{new}.

       -a     All.  Merge all records from A and B, not excluding any matching
              keys.  Normally any records in A with matching  keys  in  B  are
              excluded.

   purge
       hdb purge performs a set minus (exclude) operation with A.hdb and B.hdb
       and compiles the result in result.hdb.   The  purge  operation  selects
       only  records  in A without matching keys in B.  The record sequence in
       result.hdb preserves the original sequence among records in A.  Any/all
       of the named argument files may ``overlap''.  Options:

       -p perms
              Permissions.   Set  the file creation permissions for result.hdb
              as explicitly given in the octal argument perms.  Otherwise, the
              result.hdb  file  permissions will be set to mode 0666 and modi-
              fied by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the temporary file used during the creation of result.hdb.  Nor-
              mally the temporary file name is constructed from the result.hdb
              argument as result.hdb.{new}.

   stats
       hdb  stats  scans  the hdb file (or seekable input on stdin) and prints
       some summary statistics to stdout.  Options:

       -v     Verbose.  Some additional information is included  in  the  sum-
              mary.

FORMATS
       The hdb utility accepts the following formats for record input/output:

   default
       The default record format is described as:

           klen:dlen\tkey:data\n

       Where:  klen and dlen are the key length and data length, respectively,
       in decimal ascii notation; key and data  are  any  arbitrary  character
       sequences for the record key and data; each record ends with a newline;
       and the `:' and tab separators are literal characters.  A  sequence  of
       records is terminated by eof.
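
       Because each record carries explicit lengths, keys and data in the
       default format may contain any byte, including newline.  The
       following sketch writes and reads records in this layout.  It follows
       the description above and has not been checked against hdb itself;
       the names format_default and parse_default are hypothetical.

```python
import io


def format_default(key: bytes, data: bytes) -> bytes:
    """Encode one record as klen:dlen<TAB>key:data<NEWLINE>."""
    return b"%d:%d\t%s:%s\n" % (len(key), len(data), key, data)


def _read_number(stream, stop, eof_ok=False):
    """Read decimal digits up to the stop byte; None means clean end of input."""
    digits = b""
    while True:
        ch = stream.read(1)
        if ch == stop and digits:
            return int(digits)
        if ch == b"" and not digits and eof_ok:
            return None
        if not ch.isdigit():
            raise ValueError("malformed length field")
        digits += ch


def _read_exact(stream, n):
    """Read exactly n bytes or fail."""
    chunk = stream.read(n)
    if len(chunk) != n:
        raise ValueError("truncated record")
    return chunk


def parse_default(stream):
    """Yield (key, data) pairs from a binary stream until end of input."""
    while True:
        klen = _read_number(stream, b":", eof_ok=True)
        if klen is None:
            return
        dlen = _read_number(stream, b"\t")
        key = _read_exact(stream, klen)
        if _read_exact(stream, 1) != b":":
            raise ValueError("missing ':' between key and data")
        data = _read_exact(stream, dlen)
        if _read_exact(stream, 1) != b"\n":
            raise ValueError("missing newline after data")
        yield key, data


if __name__ == "__main__":
    encoded = format_default(b"key", b"two\nlines")
    print(list(parse_default(io.BytesIO(encoded))))
```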

   getline [-g]
       The  ``getline''  record  format  is selected with the -g option and is
       described as:

           key\tdata\n

       Where: key and data are any arbitrary  character  sequences  (excepting
       nul and newline) for the record key and data; separated by default with
       a single tab character; and each record is terminated with  a  newline.
       The  separator  must  itself  not appear within any key, but may appear
       within data.  An alternative separator character between key  and  data
       may  be  specified by the first character in the del argument to the -d
       option.  Lines beginning with a `#' character are ignored.  A  sequence
       of  records  is  terminated  by eof.  Note that this format will not be
       usable in cases where the input records may themselves contain the  nul
       or newline characters.
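
       A reader for the ``getline'' format can be sketched as below.  The
       split on the first separator reflects the rule that the separator
       may appear in data but not in a key.  Rejecting a line that has no
       separator is an assumption of this sketch, since the text above does
       not say how hdb treats such lines; the name parse_getline is
       hypothetical.

```python
def parse_getline(lines, delimiter="\t"):
    """Yield (key, data) pairs from an iterable of text lines."""
    sep = delimiter[0]  # only the first character of the del argument is used
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # comment lines are ignored
        key, found, data = line.partition(sep)
        if not found:
            # Assumption: treat a line without a separator as an error.
            raise ValueError(f"no separator in line: {line!r}")
        yield key, data


if __name__ == "__main__":
    sample = ["# a comment\n", "colour\tred\tand more\n"]
    print(list(parse_getline(sample)))
```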

   legacy [-y]
       The  cdb ``legacy'' record format is selected with the -y option and is
       described as:

           +klen,dlen:key->data\n

       Where: klen and dlen are the key length and data length,  respectively,
       in  decimal  ascii  notation;  key and data are any arbitrary character
       sequences for the record key and data; each record ends with a newline;
       and the `+', `,', `:' and `->' are literal characters.  A sequence of
       records is terminated by an empty line.
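
       The ``legacy'' layout, including its terminating empty line, can be
       sketched as follows.  As with the other format examples, this follows
       the description above rather than the hdb source, and the names
       format_legacy and parse_legacy are hypothetical.

```python
def format_legacy(records) -> bytes:
    """Encode (key, data) byte pairs as +klen,dlen:key->data lines."""
    body = b"".join(
        b"+%d,%d:%s->%s\n" % (len(key), len(data), key, data)
        for key, data in records
    )
    return body + b"\n"  # an empty line terminates the sequence


def parse_legacy(buf: bytes):
    """Decode a complete legacy-format buffer into a list of (key, data)."""
    records = []
    pos = 0
    while buf[pos:pos + 1] == b"+":
        comma = buf.index(b",", pos)
        colon = buf.index(b":", comma)
        klen = int(buf[pos + 1:comma])
        dlen = int(buf[comma + 1:colon])
        key_end = colon + 1 + klen
        data_start = key_end + 2          # skip the literal "->"
        data_end = data_start + dlen
        if buf[key_end:data_start] != b"->":
            raise ValueError("missing '->' between key and data")
        if buf[data_end:data_end + 1] != b"\n":
            raise ValueError("missing newline after data")
        records.append((buf[colon + 1:key_end], buf[data_start:data_end]))
        pos = data_end + 1
    if buf[pos:pos + 1] != b"\n":
        raise ValueError("missing terminating empty line")
    return records


if __name__ == "__main__":
    print(format_legacy([(b"one", b"Hello")]))
```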

MISCELLANEOUS
       In this section are some additional notes and comments regarding opera-
       tions on constant databases.

   Duplicate Keys
       The hdb format imposes no constraints on the presence of duplicate keys
       in a database.  For example, applications may  use  duplicate  keys  to
       represent one-to-many relationships between keys and values.

       Other  applications  may  require  a  unique key constraint, modeling a
       strict one-to-one relationship between keys and values.  In such cases,
       the hdb make operation will not itself screen input for duplicate keys.
       However, the hdb dupes operation may be used to test for the presence
       of duplicate keys in an hdb file, and executes in a manner that is
       generally efficient when compared to other methods of pre-screening
       duplicates on input.

       A  simple  front-end  script  is  suggested as one means to implement a
       duplicate key constraint on hdb make by checking its  output  with  hdb
       dupes  before  atomically  moving the file (with mv(1) or rename(2)) to
       its intended destination.
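
       One possible shape for such a front-end is sketched below.  It relies
       only on behavior documented in this page: hdb make reads records on
       stdin, and hdb dupes -q exits 1 when a duplicate key is found and 0
       otherwise.  The function name make_unique, the ``.staging'' suffix,
       and the injectable run parameter are all hypothetical choices made
       for the example.

```python
import os
import subprocess


def make_unique(src_path, dest_path, run=subprocess.run):
    """Build dest_path from src_path with hdb make, rejecting duplicate keys.

    Returns True if the database was installed, False if duplicates were found.
    """
    staging = dest_path + ".staging"  # hypothetical staging file name
    with open(src_path, "rb") as src:
        # hdb make reads formatted records on stdin and writes the named file.
        run(["hdb", "make", staging], stdin=src, check=True)
    # hdb dupes -q exits 1 on the first duplicate key, 0 if there are none.
    status = run(["hdb", "dupes", "-q", staging]).returncode
    if status == 0:
        # rename(2) semantics: atomic when both paths share a filesystem.
        os.replace(staging, dest_path)
        return True
    os.remove(staging)
    if status == 1:
        return False
    raise RuntimeError(f"hdb dupes failed with exit status {status}")
```

       Keeping the staging file in the same directory as the destination
       keeps both paths on one filesystem, so the final rename is atomic.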

   Set Operations
       The hdb utility provides some basic set operations on  hdb  files  with
       the cross, merge, and purge commands.  Let ``0'' represent the
       null/empty set.  With a unique key constraint imposed on each of the
       set operands A and B, the following relations hold:

           A merge A = A

           A merge 0 = A

           A purge A = 0

           A purge 0 = A

           A cross A = A

           A cross 0 = 0

           for((A  cross B) == 0): (A merge B) == (B merge A), with respect to
           membership but not to order

           (A merge all B) == (B merge all A), with respect to membership  but
           not to order

           for(A != B): (A purge B) != (B purge A)

           for((A cross B) == 0): (A purge B) = A, (B purge A) = B

       All  hdb  operations  are  stable  with respect to maintaining original
       insertion order, and records in operand A are inserted  before  records
       in operand B.

       When  the unique key constraint is relaxed for A and B, the results for
       set operations are predictable and exactly described,  but  some  addi-
       tional  consideration  may  be  needed  to understand the outcome.  For
       example: given the operation A merge B, whenever a duplicate key exists
       across  multiple  records  in A, and such key is found in only a single
       record in B, all records in A with  that  key  are  excluded  from  the
       result,  and the result will contain only the single record with match-
       ing key from B.  Considering the inverse B merge A operation  for  this
       example,  the  result  will now exclude the single record with matching
       key from B, and all records with that key in A will now be included.
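
       The three set operations, including the duplicate-key example just
       given, can be modeled on in-memory lists of (key, value) pairs.  This
       is a sketch of the semantics described in this page, not of how hdb
       implements them; the function names simply mirror the command names,
       and the union_all parameter stands in for the merge -a option.

```python
def cross(a, b):
    """Records of a whose key also occurs in b, in a's original order."""
    keys_b = {key for key, _ in b}
    return [(key, value) for key, value in a if key in keys_b]


def purge(a, b):
    """Records of a whose key does not occur in b, in a's original order."""
    keys_b = {key for key, _ in b}
    return [(key, value) for key, value in a if key not in keys_b]


def merge(a, b, union_all=False):
    """Records of a followed by records of b.

    By default, records of a whose key occurs in b are excluded.
    """
    if union_all:
        return list(a) + list(b)
    return purge(a, b) + list(b)


if __name__ == "__main__":
    A = [("x", "a1"), ("x", "a2"), ("y", "a3")]
    B = [("x", "b1")]
    print(merge(A, B))  # both "x" records of A give way to the one from B
    print(merge(B, A))  # the "x" record of B gives way to both from A
```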

   Grep vs. Match
       hdb provides two ways to scan a constant database for pattern  matches,
       with  the commands grep and match.  While neither of these is nearly as
       fast as performing exact key matches with the  get  command,  grep  and
       match  do  permit  useful  constant database queries in those instances
       where exact key matches are not otherwise possible.

       In most cases, the match command will be preferred to grep  because  it
       is  simpler  to use and faster in execution.  Especially for the novice
       user, match patterns are easy to compose and resemble  common  wildcard
       globbing  expressions.   But  grep  is  much  more capable when complex
       matches are required, and may be useful  for  performing  sophisticated
       queries.

       One  additional  difference  between  grep  and  match should be noted.
       match patterns are  implicitly  anchored  to  match  against  the  full
       record,  whereas grep patterns require the explicit use of the  `^' and
       `$'  metacharacters  whenever  matching  against  the  full  record  is
       required.   Otherwise,  a grep pattern will match against any substring
       found within a record, while, conversely, a  match  pattern  will  need
       leading  and trailing `*' asterisk characters to match against any sub-
       string within a record.
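
       The anchoring difference can be illustrated with Python's re module.
       This is an approximation only: Python regular expressions are not
       identical to the re_format(7) extended syntax that hdb grep uses, and
       the functions wildcard_match and grep_match are hypothetical names
       for the two behaviors.

```python
import re


def wildcard_match(pattern, text):
    """Match text against a pattern using only '?' and '*', anchored at both ends."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def grep_match(regex, text):
    """Match text against a regular expression anywhere within it."""
    return re.search(regex, text) is not None


if __name__ == "__main__":
    record = "a foobar value"
    print(wildcard_match("foo", record))    # anchored: no match
    print(wildcard_match("*foo*", record))  # wildcards on both sides: match
    print(grep_match("foo", record))        # substring search: match
    print(grep_match("^foo$", record))      # explicitly anchored: no match
```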

       Neither grep nor match will be particularly useful in querying any  hdb
       file  that  includes  records containing the nul and/or newline charac-
       ters.

   Seekable Input
       The commands about, dump, dupes, get, grep, keys, match, and stats
       will read either an hdb file argument directly, or stdin.  If input is
       read on stdin, it must be ``seekable'', that is, permit the lseek(2)
       operation.  Piped input is normally not seekable and the command will
       fail with ESPIPE.
       On the other hand, the sh(1) input redirection operator (`<') will usu-
       ally succeed if the underlying device supports lseek(2) operations.

LIMITS
       The hdb(5) file format is constrained to a maximum file size of (2^32
       - 1) bytes (4 gigabytes).  Individual record keys and values are
       limited to a maximum length of (2^24 - 1) bytes (16 megabytes).  Unless
       the ``getline'' input format is used, hdb make does not require
       individual keys and values to fit into local memory.  With the
       ``getline'' input format, local memory must be large enough to hold
       the longest single line (key\tvalue\n) encountered in the input.
       Apart from that, hdb make requires about 10 bytes of memory overhead
       per input record.
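
       For quick estimates, the limits above can be written out as
       constants.  The sketch below is arithmetic only; the names are
       hypothetical, and the 10-byte figure is the approximate per-record
       overhead quoted above rather than a measured value.

```python
MAX_FILE_BYTES = 2**32 - 1    # 4294967295, just under 4 gigabytes
MAX_FIELD_BYTES = 2**24 - 1   # 16777215, just under 16 megabytes
OVERHEAD_PER_RECORD = 10      # approximate, per the text above


def fits_limits(key: bytes, data: bytes) -> bool:
    """Return True if both key and data are within the per-field limit."""
    return len(key) <= MAX_FIELD_BYTES and len(data) <= MAX_FIELD_BYTES


def make_overhead(n_records: int) -> int:
    """Estimate the memory overhead of hdb make for n_records, in bytes."""
    return n_records * OVERHEAD_PER_RECORD


if __name__ == "__main__":
    # One million input records imply roughly 10 megabytes of overhead.
    print(make_overhead(1_000_000))
```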

OPTIONS
       The specific options relative to each command are described above.  hdb
       also recognizes the following general options:

       -h     Help.  Display a brief help message to stderr  and  exit.   When
              the  -h  option follows command, the help message is specific to
              the requested operation.

       -V     Version.  Display version  information  and  hdb32  file  format
              specification to stderr and exit.

EXIT STATUS
       hdb exits with one of the following values:

       0      Success.

       1      Boolean negative result.  Interpretation depends on the command:
              for get, grep, and match, no match was found; for dupes in -q
              (``quick/quiet'') mode, duplicates were found.

       100    Usage  error.   An  error was encountered among options or argu-
              ments to the command.  In this case, hdb prints a brief diagnos-
              tic message to stderr on exit.

       111    System failure.  The command failed to complete due to some sys-
              tem, protocol, or resource error.  In this case,  hdb  prints  a
              brief diagnostic message to stderr on exit.
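
       A caller that wraps hdb may want to separate the boolean outcomes (0
       and 1) from real failures (100 and 111).  The sketch below shows one
       way to do so; the names HdbError and boolean_result are hypothetical,
       and the mapping covers only the exit values listed above.

```python
class HdbError(Exception):
    """Raised for exit statuses that indicate a failure rather than an answer."""


def boolean_result(status):
    """Map an hdb exit status to True (0) or False (1); raise otherwise."""
    if status == 0:
        return True
    if status == 1:
        return False
    kind = {100: "usage error", 111: "system failure"}.get(
        status, "unexpected exit status"
    )
    raise HdbError(f"{kind} ({status})")
```

       For dupes -q the sense is inverted relative to get, grep, and match:
       a False result there means that duplicates were found.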

AUTHOR
       Wayne Marshall, http://b0llix.net/hdb/

SEE ALSO
       hdb(5)



hdb-0.03                         January 2013                           hdb(1)