hdb(1) hdb hdb(1)
NAME
hdb - hash database (hdb32) multitool
SYNOPSIS
hdb [-hV] command opts args
hdb about [-h] [-acin] [ hdb ]
hdb cross [-h] [-p perms ] [-t tmpfile ] result.hdb A.hdb B.hdb
hdb dump [-h] [-d del | -g | -y] [ hdb ]
hdb dupes [-h] [-d del | -g | -y] [-q] [ hdb ]
hdb get [-h] [-a | -j num ] [-n] key [ hdb ]
hdb grep [-h] [-d del | -g | -y] [-i] [-k] [-!] regex [ hdb ]
hdb keys [-h] [-g | -y] [-X] [ hdb ]
hdb make [-h] [-d del | -g | -y] [-p perms ] [-t tmpfile ] hdb
hdb match [-h] [-d del | -g | -y] [-i] [-k] [-!] pattern [ hdb ]
hdb merge [-h] [-p perms ] [-t tmpfile ] [-a] result.hdb A.hdb B.hdb
hdb purge [-h] [-p perms ] [-t tmpfile ] result.hdb A.hdb B.hdb
hdb stats [-h] [-v] [ hdb ]
DESCRIPTION
hdb is used to generate, query, analyze, and operate on hash database
files in hdb(5) format. The following operations are supported:
about
hdb about reports selected metadata within hdb (or seekable input on
stdin) to stdout. Each item reported begins on a new line, prefixed by
the item-specific string described below. Without explicit options,
the default report displays the number of records and the database
comment (as if options -cn were selected). Options:
-a All. Display all of the metadata (identifier, number of
records, and comment).
-c Comment. Display the internal comment. The display is prefixed
by the string ``comment: ''. Note that the hdb(5) specification
places no constraints on the length or contents of the comment
area.
-i Identifier. Display the hdb32 header identifier string. The
display is prefixed by the string ``identifier: ''.
-n Number. Display the total number of records in the dataset.
The display is prefixed by the string ``records: ''.
cross
hdb cross performs a set intersection operation on A.hdb and B.hdb and
compiles the result in result.hdb. The cross operation selects records
in A with matching keys in B. The record sequence in result.hdb
preserves the original sequence among records in A. Any/all of the
named argument files may ``overlap''. Options:
-p perms
Permissions. Set the file creation permissions for result.hdb
as explicitly given in the octal argument perms. Otherwise, the
result.hdb file permissions will be set to mode 0666 and modified
by the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of result.hdb. Normally
the temporary file name is constructed from the result.hdb argument
as result.hdb.{new}.
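As an illustration only, the cross selection rule can be modelled over in-memory lists of (key, value) pairs. The function name cross() and the list representation are assumptions of this sketch, not part of hdb; it shows which records survive and in what order, not how hdb builds the file.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def cross(a: List[Record], b: List[Record]) -> List[Record]:
    """Hypothetical model of `hdb cross`: keep each A record whose key
    occurs in B, in the original order of A. B's values are never used."""
    b_keys = {key for key, _value in b}
    return [rec for rec in a if rec[0] in b_keys]


if __name__ == "__main__":
    a = [(b"x", b"1"), (b"y", b"2"), (b"z", b"3")]
    b = [(b"z", b"9"), (b"x", b"8")]
    print(cross(a, b))  # [(b'x', b'1'), (b'z', b'3')]
```

B contributes only its set of keys, which is why the result follows the order of A.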
dump
hdb dump lists to stdout all records found in the given hdb file (or
seekable input on stdin). Output format is under control of options
and described in the FORMATS section. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-y Legacy. Output records in cdb ``legacy'' format.
dupes
hdb dupes lists to stdout all records with duplicate keys found in the
given hdb file (or seekable input on stdin), and prints a summary report
to stderr. Because the dupes command sequentially scans each record
and hash value in the file, it may also be used to validate the
integrity of the database. Output format is under control of options
and described in the FORMATS section. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-y Legacy. Output records in cdb ``legacy'' format.
-q Quick/quiet. Suppress all output and summary reporting.
Short-circuit the scan and return exit status 1 on the first
duplicate key found in hdb. Otherwise, return exit status 0 if no
duplicates are found.
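The -q exit-status rule amounts to a single pass that stops at the first repeated key. The sketch below is hypothetical: the function name and the in-memory set of seen keys are assumptions made for illustration, and this manual does not say how hdb itself detects duplicates.

```python
from typing import Iterable, Tuple


def dupes_quick(records: Iterable[Tuple[bytes, bytes]]) -> int:
    """Model of the `hdb dupes -q` exit status: 1 as soon as a key
    repeats, 0 if the whole scan completes without a repeat."""
    seen = set()
    for key, _value in records:
        if key in seen:
            return 1  # short-circuit: remaining records are never read
        seen.add(key)
    return 0
```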
get
hdb get looks up key in hdb (or seekable input on stdin) and, if found,
writes the associated record value to stdout. Exits zero if the lookup
is successful. Exits non-zero (1) if key is not found. Options:
-a All. Write all record values with matching key. A newline is
appended to each record value output. Normally, only the first
record matching key is output.
-j num Jump. Skip the first num matches for key before writing the
(num+1)th matching record value, if any.
-n Newline suppressed. Write only the record value without appending
a newline. Normally, a newline is appended to each record value
in the output.
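Which values -a and -j select can be modelled over a list of records. This is a sketch under stated assumptions: get_values() is a hypothetical name, and it assumes that matches for a key are visited in the order the records were added to the database.

```python
from typing import List, Sequence, Tuple


def get_values(records: Sequence[Tuple[bytes, bytes]], key: bytes,
               all_matches: bool = False, jump: int = 0) -> List[bytes]:
    """Model of the values `hdb get` writes for KEY.

    all_matches stands for -a and jump for -j num; the SYNOPSIS makes
    the two options mutually exclusive.
    """
    matches = [value for k, value in records if k == key]
    if all_matches:
        return matches
    # Skip the first `jump` matches and keep the next one, if any.
    return matches[jump:jump + 1]
```

An empty result corresponds to nothing being written for the key.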
grep
hdb grep lists to stdout all records found in the given hdb file (or
seekable input on stdin) matching the re_format(7) extended regular
expression given in regex. Note that the regex argument may need to be
quoted to inhibit unwanted expansion by the shell. Output format is
under control of options and described in the FORMATS section. Exits
zero if one or more matches are found. Exits non-zero (1) if no match
is found. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-y Legacy. Output records in cdb ``legacy'' format.
-i Case insensitive. Perform case-insensitive matching. Normally
the regular expression is matched case-sensitively.
-k Key match. Perform the regular expression matching against
record keys. Normally the regular expression is matched against
record values.
-! Invert (logical not). Select and output the records that do not
match the given regular expression. Normally the matching
records are output.
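The way -i, -k and -! combine can be shown with a small selection model. Two assumptions apply: grep() is a hypothetical name, and Python's re module stands in for re_format(7) extended regular expressions. The two dialects agree on simple patterns like those used here, but not on all syntax.

```python
import re
from typing import List, Sequence, Tuple


def grep(records: Sequence[Tuple[str, str]], regex: str, *,
         ignore_case: bool = False, on_key: bool = False,
         invert: bool = False) -> List[Tuple[str, str]]:
    """Model of `hdb grep` record selection.

    ignore_case ~ -i, on_key ~ -k, invert ~ -! (Python `re` syntax
    is used in place of POSIX EREs).
    """
    compiled = re.compile(regex, re.IGNORECASE if ignore_case else 0)
    selected = []
    for key, value in records:
        subject = key if on_key else value
        # re.search is unanchored: the pattern may match any substring.
        hit = compiled.search(subject) is not None
        if hit != invert:
            selected.append((key, value))
    return selected
```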
keys
hdb keys lists to stdout all keys found in the given hdb file (or
seekable input on stdin). Output format is under control of options and
modified slightly from the descriptions in the FORMATS section. By
default, keys are listed in a modified default format:
klen:key\n
An empty line terminates the default output sequence. Options:
-g Getline. Output keys listed in a modified ``getline'' format:
key\n
-y Legacy. Output keys listed in a modified cdb ``legacy'' format:
+klen:key\n
-X Hash (hexadecimal). Following each key, display the hash value
computed for the key in hexadecimal format.
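The default and -g key listings can be rendered as below. This sketch is hypothetical: format_keys() is an illustrative name; it omits -X because the hash function is defined in hdb(5), not here; and it omits -y because this page does not say whether that listing also ends with an empty line. It also assumes the -g listing has no terminating empty line.

```python
from typing import Iterable


def format_keys(keys: Iterable[bytes], getline: bool = False) -> bytes:
    """Render keys as `hdb keys` describes them.

    Default: a `klen:key` line per key, then an empty terminating line.
    getline=True models -g: a bare `key` line per key.
    """
    out = bytearray()
    for key in keys:
        if getline:
            out += key + b"\n"
        else:
            out += str(len(key)).encode("ascii") + b":" + key + b"\n"
    if not getline:
        out += b"\n"  # empty line terminates the default sequence
    return bytes(out)
```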
make
hdb make generates the hdb file hdb from formatted input read on stdin.
Input format is under control of options and described in the FORMATS
section. Options:
-d del Delimiter. Input records in ``getline'' format, using the first
character in the argument del as separator between the key and
value parts of the record. Implies -g.
-g Getline. Input records in ``getline'' format.
-y Legacy. Input records in cdb ``legacy'' format.
-p perms
Permissions. Set the file creation permissions for hdb as
explicitly given in the octal argument perms. Otherwise, the
hdb file permissions will be set to mode 0666 and modified by
the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of hdb. Normally
the temporary file name is constructed from the hdb argument as
hdb.{new}.
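The resulting permission bits can be worked out as follows; the same rule is described for -p under cross, merge and purge. The helper name is hypothetical, and the sketch reads ``as explicitly given'' to mean that an explicit perms value is not further masked by the umask.

```python
from typing import Optional


def creation_mode(umask: int, perms: Optional[int] = None) -> int:
    """Mode bits for a newly created hdb file under the -p rule.

    With -p, the octal perms value is used as given (assumption: no
    umask masking). Without it, 0666 is masked by the process umask.
    """
    if perms is not None:
        return perms
    return 0o666 & ~umask
```

For example, with a typical umask of 022 the default works out to 0644, and `-p 640` corresponds to `creation_mode(0o022, int("640", 8))`.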
match
hdb match lists to stdout all records found in the given hdb file (or
seekable input on stdin) matching the simple wildcard expression given
in pattern. A pattern is a character string composed of any
combination of:
o any character (excepting `?' and `*'), matched explicitly
o the `?' (question-mark) character, matching any single character
o the `*' (asterisk) character, matching any sequence of zero
or more characters
Note that the pattern expression argument is a simplified subset of the
sh(1) globbing rules provided by fnmatch(3), and does not provide for
range expressions or escaping of metacharacters. (The hdb grep command
may be used whenever more sophisticated pattern matching expressions
are required.)
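The wildcard rules above can be expressed as a short anchored matcher. This is a hypothetical model written for illustration, using the common greedy-with-backtracking technique for `*`; it is not taken from hdb's source. As described above, `[` and `\` have no special meaning.

```python
def wildmatch(pattern: str, text: str, ignore_case: bool = False) -> bool:
    """Hypothetical model of an hdb match pattern test.

    `?` matches any single character and `*` any run of zero or more
    characters; every other character, including `[` and `\\`, matches
    only itself. The match is anchored: all of TEXT must be consumed.
    """
    if ignore_case:
        pattern, text = pattern.lower(), text.lower()
    p = t = 0
    star_p = -1  # index of the most recent `*` in pattern, if any
    star_t = 0   # text index where that `*` currently stops matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star_p, star_t = p, t   # first let `*` match nothing
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star_p >= 0:
            star_t += 1             # backtrack: `*` absorbs one more char
            p, t = star_p + 1, star_t
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1                      # trailing `*`s may match the empty string
    return p == len(pattern)
```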
Note also that the pattern argument may need to be quoted to inhibit
unwanted expansion by the shell. Output format is under control of
options and described in the FORMATS section. Exits zero if one or
more matches are found. Exits non-zero (1) if no match is found. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-y Legacy. Output records in cdb ``legacy'' format.
-i Case insensitive. Perform case-insensitive matching. Normally
the wildcard expression is matched case-sensitively.
-k Key match. Perform the wildcard matching against record keys.
Normally the wildcard expression is matched against record
values.
-! Invert (logical not). Select and output the records that do not
match the given wildcard expression. Normally the matching
records are output.
merge
hdb merge performs a set union operation on A.hdb and B.hdb and
compiles the result in result.hdb. The merge operation combines all
records in A and B, excluding by default any records in A with matching
keys in B. Note that the -a option may be used for a ``union all''
operation. The record sequence in result.hdb includes A records before
B records, and original sequence is preserved among records in A and B.
Any/all of the named argument files may ``overlap''. Options:
-p perms
Permissions. Set the file creation permissions for result.hdb
as explicitly given in the octal argument perms. Otherwise, the
result.hdb file permissions will be set to mode 0666 and modified
by the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of result.hdb. Normally
the temporary file name is constructed from the result.hdb argument
as result.hdb.{new}.
-a All. Merge all records from A and B, not excluding any matching
keys. Normally any records in A with matching keys in B are
excluded.
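The merge rule, with and without -a, can be modelled over in-memory lists of (key, value) pairs. The function name and representation are assumptions of this sketch; it illustrates the documented result, not hdb's implementation.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def merge(a: List[Record], b: List[Record],
          union_all: bool = False) -> List[Record]:
    """Hypothetical model of `hdb merge` (union_all=True models -a).

    A records come first, then B records, each in original order.
    """
    if union_all:
        return a + b
    b_keys = {key for key, _value in b}
    # Drop every A record whose key appears anywhere in B, then append B.
    return [rec for rec in a if rec[0] not in b_keys] + b
```

Calling merge(a, b) and merge(b, a) on operands with a key duplicated in A reproduces the asymmetric outcome described under Set Operations in MISCELLANEOUS.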
purge
hdb purge performs a set minus (exclude) operation with A.hdb and B.hdb
and compiles the result in result.hdb. The purge operation selects
only records in A without matching keys in B. The record sequence in
result.hdb preserves the original sequence among records in A. Any/all
of the named argument files may ``overlap''. Options:
-p perms
Permissions. Set the file creation permissions for result.hdb
as explicitly given in the octal argument perms. Otherwise, the
result.hdb file permissions will be set to mode 0666 and modified
by the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of result.hdb. Normally
the temporary file name is constructed from the result.hdb argument
as result.hdb.{new}.
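The purge selection is the complement of cross with respect to A. As with the other set-operation sketches, purge() and the list-of-pairs representation are illustrative assumptions, not hdb internals.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def purge(a: List[Record], b: List[Record]) -> List[Record]:
    """Hypothetical model of `hdb purge`: keep each A record whose key
    does NOT occur in B, in the original order of A."""
    b_keys = {key for key, _value in b}
    return [rec for rec in a if rec[0] not in b_keys]
```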
stats
hdb stats scans the hdb file (or seekable input on stdin) and prints
some summary statistics to stdout. Options:
-v Verbose. Some additional information is included in the
summary.
FORMATS
The hdb utility accepts the following formats for record input/output:
default
The default record format is described as:
klen:dlen\tkey:data\n
Where: klen and dlen are the key length and data length, respectively,
in decimal ASCII notation; key and data are any arbitrary character
sequences for the record key and data; each record ends with a newline;
and the `:' and tab separators are literal characters. A sequence of
records is terminated by eof.
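Because the lengths are given explicitly, keys and data in the default format may contain any bytes, including tab, `:', nul and newline. The sketch below writes and reads this format. The function names are hypothetical, and rejecting malformed or truncated input with ValueError is a choice of the sketch, not documented hdb behaviour.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple

Record = Tuple[bytes, bytes]


def encode_default(records: List[Record]) -> bytes:
    """Serialize records as `klen:dlen<TAB>key:data<NL>`; EOF ends the sequence."""
    out = bytearray()
    for key, data in records:
        out += b"%d:%d\t" % (len(key), len(data))
        out += key + b":" + data + b"\n"
    return bytes(out)


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    chunk = stream.read(n)
    if len(chunk) != n:
        raise ValueError("truncated record")
    return chunk


def _expect(stream: BinaryIO, literal: bytes) -> None:
    if _read_exact(stream, len(literal)) != literal:
        raise ValueError(f"expected {literal!r}")


def _read_number(stream: BinaryIO, terminator: bytes, ch: bytes = b"") -> int:
    """Read decimal ASCII digits up to TERMINATOR; CH is an already-read byte."""
    digits = bytearray()
    if not ch:
        ch = stream.read(1)
    while ch != terminator:
        if not ch.isdigit():  # also rejects b"" at EOF
            raise ValueError("malformed length field")
        digits += ch
        ch = stream.read(1)
    if not digits:
        raise ValueError("empty length field")
    return int(digits)


def decode_default(stream: BinaryIO) -> Iterator[Record]:
    """Parse default-format records until a clean EOF between records."""
    while True:
        first = stream.read(1)
        if not first:
            return
        klen = _read_number(stream, b":", first)
        dlen = _read_number(stream, b"\t")
        key = _read_exact(stream, klen)
        _expect(stream, b":")
        data = _read_exact(stream, dlen)
        _expect(stream, b"\n")
        yield key, data


if __name__ == "__main__":
    blob = encode_default([(b"one", b"1"), (b"k\ney", b"tab\there")])
    print(list(decode_default(io.BytesIO(blob))))
```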
getline [-g]
The ``getline'' record format is selected with the -g option and is
described as:
key\tdata\n
Where: key and data are any arbitrary character sequences (excepting
nul and newline) for the record key and data; separated by default with
a single tab character; and each record is terminated with a newline.
The separator must not itself appear within any key, but may appear
within data. An alternative separator character between key and data
may be specified by the first character in the del argument to the -d
option. Lines beginning with a `#' character are ignored. A sequence
of records is terminated by eof. Note that this format will not be
usable in cases where the input records may themselves contain the nul
or newline characters.
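Since the separator may appear in data but not in a key, each line splits at its first separator. A minimal parsing sketch follows. parse_getline() is a hypothetical name, and two behaviours are assumptions not stated above: a missing final newline is tolerated, and a non-comment line without any separator is rejected.

```python
from typing import Iterable, Iterator, Tuple


def parse_getline(lines: Iterable[bytes],
                  delim: bytes = b"\t") -> Iterator[Tuple[bytes, bytes]]:
    """Hypothetical parser for getline-format input.

    Only the first byte of DELIM is used, as with `-d del`. Each line is
    split at the FIRST separator, so later separators stay in the data.
    """
    sep = delim[:1]
    if not sep:
        raise ValueError("empty delimiter")
    for lineno, raw in enumerate(lines, 1):
        # Assumption: tolerate a last line that lacks its newline.
        line = raw[:-1] if raw.endswith(b"\n") else raw
        if line.startswith(b"#"):
            continue  # comment lines are ignored
        key, found, data = line.partition(sep)
        if not found:
            # Assumption: behaviour for such lines is not documented here.
            raise ValueError(f"line {lineno}: no separator")
        yield key, data


if __name__ == "__main__":
    import io
    src = io.BytesIO(b"# comment\nk1\tv1\nk2\tv\twith\ttabs\n")
    print(list(parse_getline(src)))
```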
legacy [-y]
The cdb ``legacy'' record format is selected with the -y option and is
described as:
+klen,dlen:key->data\n
Where: klen and dlen are the key length and data length, respectively,
in decimal ASCII notation; key and data are any arbitrary character
sequences for the record key and data; each record ends with a newline;
and the `+', `,', `:' and `->' are literal characters. A sequence of
records is terminated by an empty line.
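A round trip through the legacy format shows how the explicit lengths let keys contain `->' and data contain newlines. The names are hypothetical. This sketch treats end of input before the terminating empty line as an error, which is a choice made here rather than documented hdb behaviour.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple

Record = Tuple[bytes, bytes]


def encode_legacy(records: List[Record]) -> bytes:
    """Serialize records as `+klen,dlen:key->data<NL>` plus a final empty line."""
    out = bytearray()
    for key, data in records:
        out += b"+%d,%d:" % (len(key), len(data))
        out += key + b"->" + data + b"\n"
    out += b"\n"  # the empty line that terminates the sequence
    return bytes(out)


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    chunk = stream.read(n)
    if len(chunk) != n:
        raise ValueError("truncated input")
    return chunk


def _expect(stream: BinaryIO, literal: bytes) -> None:
    if _read_exact(stream, len(literal)) != literal:
        raise ValueError(f"expected {literal!r}")


def _read_number(stream: BinaryIO, terminator: bytes) -> int:
    digits = bytearray()
    ch = stream.read(1)
    while ch != terminator:
        if not ch.isdigit():  # also rejects b"" at EOF
            raise ValueError("malformed length field")
        digits += ch
        ch = stream.read(1)
    if not digits:
        raise ValueError("empty length field")
    return int(digits)


def decode_legacy(stream: BinaryIO) -> Iterator[Record]:
    """Parse legacy records up to the terminating empty line."""
    while True:
        lead = _read_exact(stream, 1)  # EOF before the empty line is an error
        if lead == b"\n":
            return
        if lead != b"+":
            raise ValueError("expected '+' or the terminating empty line")
        klen = _read_number(stream, b",")
        dlen = _read_number(stream, b":")
        key = _read_exact(stream, klen)
        _expect(stream, b"->")
        data = _read_exact(stream, dlen)
        _expect(stream, b"\n")
        yield key, data


if __name__ == "__main__":
    blob = encode_legacy([(b"a->b", b"x\ny")])
    print(list(decode_legacy(io.BytesIO(blob))))
```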
MISCELLANEOUS
This section contains some additional notes and comments regarding
operations on constant databases.
Duplicate Keys
The hdb format imposes no constraints on the presence of duplicate keys
in a database. For example, applications may use duplicate keys to
represent one-to-many relationships between keys and values.
Other applications may require a unique key constraint, modeling a
strict one-to-one relationship between keys and values. In such cases,
the hdb make operation will not itself screen input for duplicate keys.
However, the hdb dupes operation may be used to test for the presence
of duplicate keys in a hdb file, and is generally efficient compared
to other methods of pre-screening the input for duplicates.
A simple front-end script is suggested as one means to implement a
duplicate key constraint on hdb make by checking its output with hdb
dupes before atomically moving the file (with mv(1) or rename(2)) to
its intended destination.
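One possible shape for such a front-end is sketched below in Python. It uses only the commands and exit statuses documented in this page (hdb make reading stdin, and hdb dupes -q returning 1 on duplicates). The function name make_unique(), the staging name DEST.staging, and the injectable run parameter are all hypothetical choices for illustration.

```python
import contextlib
import os
import subprocess
from typing import Callable


def make_unique(input_path: str, dest: str,
                run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> None:
    """Hypothetical front-end: build DEST with `hdb make`, then install it
    only if `hdb dupes -q` reports no duplicate keys.

    The staging name DEST + ".staging" is an arbitrary choice. RUN defaults
    to subprocess.run; it is a parameter only so the logic can be exercised
    without an hdb binary.
    """
    staging = dest + ".staging"
    with open(input_path, "rb") as src:
        made = run(["hdb", "make", staging], stdin=src)
    if made.returncode != 0:
        raise RuntimeError(f"hdb make exited with status {made.returncode}")
    check = run(["hdb", "dupes", "-q", staging])
    if check.returncode == 0:
        os.replace(staging, dest)  # rename(2): atomic within one filesystem
        return
    with contextlib.suppress(FileNotFoundError):
        os.remove(staging)
    if check.returncode == 1:
        raise ValueError(f"{input_path}: duplicate keys; {dest} left unchanged")
    # Any other status (100 usage error, 111 system failure) is unexpected here.
    raise RuntimeError(f"hdb dupes exited with status {check.returncode}")
```

Keeping the staging file in the same directory as the destination keeps the final rename on one filesystem, which is what makes the replacement atomic.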
Set Operations
The hdb utility provides some basic set operations on hdb files with
the cross, merge, and purge commands. Let ``0'' represent the null
(empty) set. With a unique key constraint imposed on each of the set
operands A and B, the following relations hold:
A merge A = A
A merge 0 = A
A purge A = 0
A purge 0 = A
A cross A = A
A cross 0 = 0
for((A cross B) == 0): (A merge B) == (B merge A), with respect to
membership but not to order
(A merge all B) == (B merge all A), with respect to membership but
not to order
for(A != B): (A purge B) != (B purge A)
for((A cross B) == 0): (A purge B) = A, (B purge A) = B
All hdb operations are stable with respect to maintaining original
insertion order, and records in operand A are inserted before records
in operand B.
When the unique key constraint is relaxed for A and B, the results of
set operations are predictable and exactly described, but some
additional consideration may be needed to understand the outcome. For
example: given the operation A merge B, whenever a duplicate key exists
across multiple records in A, and that key is found in only a single
record in B, all records in A with that key are excluded from the
result, and the result will contain only the single record with the
matching key from B. Considering the inverse B merge A operation for
this example, the result will now exclude the single record with the
matching key from B, and all records with that key in A will now be
included.
Grep vs. Match
hdb provides two ways to scan a constant database for pattern matches,
with the commands grep and match. While neither of these is nearly as
fast as performing exact key matches with the get command, grep and
match do permit useful constant database queries in those instances
where exact key matches are not otherwise possible.
In most cases, the match command will be preferred to grep because it
is simpler to use and faster in execution. Especially for the novice
user, match patterns are easy to compose and resemble common wildcard
globbing expressions. But grep is much more capable when complex
matches are required, and may be useful for performing sophisticated
queries.
One additional difference between grep and match should be noted.
match patterns are implicitly anchored to match against the full
record, whereas grep patterns require the explicit use of the `^' and
`$' metacharacters whenever matching against the full record is
required. Otherwise, a grep pattern will match against any substring
found within a record, while, conversely, a match pattern will need
leading and trailing `*' asterisk characters to match against any
substring within a record.
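The anchoring difference can be made concrete by translating a match pattern into an extended regular expression that grep would apply to the same records (with the same -i, -k and -! options). match_to_ere() is a hypothetical helper. It escapes the characters POSIX lists as special in EREs; Python's re is used in the checks only as a stand-in for re_format(7).

```python
# Characters POSIX treats as special in extended regular expressions.
_ERE_SPECIALS = set(".[\\()*+?{|^$")


def match_to_ere(pattern: str) -> str:
    """Translate an hdb match pattern into an equivalent anchored ERE."""
    parts = ["^"]                 # match patterns are implicitly anchored
    for ch in pattern:
        if ch == "*":
            parts.append(".*")    # any sequence of zero or more characters
        elif ch == "?":
            parts.append(".")     # any single character
        elif ch in _ERE_SPECIALS:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    parts.append("$")
    return "".join(parts)
```

For instance, the match pattern `foo*' corresponds to the grep expression `^foo.*$'.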
Neither grep nor match will be particularly useful in querying any hdb
file that includes records containing the nul and/or newline
characters.
Seekable Input
The commands dump, dupes, get, grep, keys, match, and stats will read
either a hdb file argument directly, or stdin. If input is read on
stdin, it must be ``seekable'', that is, permit the lseek(2) operation.
Piped input is normally not seekable, and the command will fail with
ESPIPE. On the other hand, the sh(1) input redirection operator (`<')
will usually succeed if the underlying device supports lseek(2)
operations.
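Whether a given stdin is seekable can be checked the same way, with a zero-offset lseek(2) that leaves the file position unchanged. A minimal Python sketch; is_seekable() is a hypothetical helper, not part of hdb.

```python
import errno
import os


def is_seekable(fd: int) -> bool:
    """True if lseek(2) succeeds on FD, False if it fails with ESPIPE."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)  # no-op seek: offset is unchanged
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise
    return True


if __name__ == "__main__":
    import sys
    print("stdin seekable:", is_seekable(sys.stdin.fileno()))
```

Saved as a script (any name), running it with `< file.hdb' would be expected to report True, while feeding it through a pipe would be expected to report False.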
LIMITS
The hdb(5) file format is constrained to a maximum file size of
(2^32 - 1) bytes (just under 4 gigabytes). Individual record keys and
values are limited to a maximum length of (2^24 - 1) bytes (just under
16 megabytes). Unless the ``getline'' input format is used, the hdb
make operation does not require single keys and values to fit into
local memory. When the ``getline'' input format is used, local memory
must be large enough to contain the largest single line (key\tvalue\n)
encountered in the input. Apart from that, the hdb make operation
requires about 10 bytes of memory overhead per input record.
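The limits work out as follows. The constant and helper names are hypothetical, and since the 10-byte figure is approximate, the overhead estimate is only a rough guide.

```python
MAX_FILE_SIZE = 2**32 - 1        # 4294967295 bytes: the whole hdb file
MAX_FIELD_LEN = 2**24 - 1        # 16777215 bytes: any single key or value
MAKE_OVERHEAD_PER_RECORD = 10    # "about 10 bytes", an approximation


def check_record(key: bytes, value: bytes) -> None:
    """Reject a record whose key or value exceeds the per-field limit."""
    for name, field in (("key", key), ("value", value)):
        if len(field) > MAX_FIELD_LEN:
            raise ValueError(f"{name} of {len(field)} bytes exceeds {MAX_FIELD_LEN}")


def make_overhead_bytes(n_records: int) -> int:
    """Rough estimate of the per-record memory overhead of hdb make."""
    return MAKE_OVERHEAD_PER_RECORD * n_records


if __name__ == "__main__":
    print(make_overhead_bytes(10_000_000))  # 100000000, roughly 100 MB
```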
OPTIONS
The specific options relative to each command are described above. hdb
also recognizes the following general options:
-h Help. Display a brief help message to stderr and exit. When
the -h option follows command, the help message is specific to
the requested operation.
-V Version. Display version information and hdb32 file format
specification to stderr and exit.
EXIT STATUS
hdb exits with one of the following values:
0 Success.
1 Boolean negative result. Interpretation depends on the command:
for get, grep, and match it indicates no match was found; for dupes
in -q ``quick/quiet'' mode it indicates duplicates were found.
100 Usage error. An error was encountered among options or
arguments to the command. In this case, hdb prints a brief
diagnostic message to stderr on exit.
111 System failure. The command failed to complete due to some
system, protocol, or resource error. In this case, hdb prints a
brief diagnostic message to stderr on exit.
AUTHOR
Wayne Marshall, http://b0llix.net/hdb/
SEE ALSO
hdb(5)
hdb-0.03 January 2013 hdb(1)