.TH htdig 1 "21 July 1997"
.\" NAME should be all caps, SECTION should be 1-8, maybe w/ subsection
.\" other parms are allowed: see man(7), man(1)
.SH NAME
htdig \- retrieve HTML documents for ht://Dig search engine
.SH SYNOPSIS
.B htdig
.I "[options]"
.SH "DESCRIPTION"
Htdig retrieves HTML documents using the HTTP protocol and gathers
information from them that can later be used to search them.
This program can be referred to as the search robot.
.SH OPTIONS
.TP
.B \-
Get the list of URLs to start indexing from standard input.
This overrides both the default parameter \fIstart_url\fR
specified in the config file and the file supplied to the \fI\-m\fR
option.
.TP
.B \-a
Use alternate work files. Tells htdig to append
.I .work
to database files, causing a second copy of the database to be built.
This allows the original files to be used by htsearch during the
indexing run.
.TP
.B \-c \fIconfigfile\fR
Use the specified
.I configfile
instead of the default.
.TP
.B \-h \fImaxhops\fR
Restrict the dig to documents that are at most
.I maxhops
links away from the starting document. This only works if option
\fI\-i\fR is also given.
.TP
.B \-i
Initial. Do not use any old databases. Old databases will be erased
before running the program.
.TP
.B \-m \fIfilename\fR
Minimal run. Only index the URLs given in the file \fIfilename\fR,
ignoring all others. The file should contain one URL per line.
.TP
.B \-s
Print statistics about the dig after completion.
.TP
.B \-t
Create an ASCII version of the document database.
This database is easy to parse with other programs, so
information can be extracted from it for purposes
other than searching, such as gathering statistics.
.TS
cB cB
c l .
Fieldname	Value
u	URL
t	Title
a	State (0 normal, 1 not found, 2 not indexed, 3 obsolete)
m	Time of last modification reported by the server
s	Document size in bytes
H	Excerpt of the document
h	Meta description
l	Time of last retrieval
L	Count of links in the document, also called \fIoutgoing\fR links
b	Number of links to the document, also called \fIincoming\fR links or \fIbacklinks\fR
c	Hop count of this document
g	Signature of this document (used to detect duplicates)
e	E-mail address to use for a notification from \fIhtnotify\fR
n	Date on which such notification is sent
S	Subject of the notification message
d	The text of incoming links pointing to this document (e.g. description)
A	Anchors in the document (i.e.