<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>
ht://Dig: Overview of Programs
</title>
</head>
<body bgcolor="#eef7ff">
<h1>
Overview of Programs
</h1>
<p>
ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
Please see the file <a href="COPYING">COPYING</a> for
license information.
</p>
<hr size="4" noshade>
<p>
There are several programs in the ht://Dig package.
</p>
<h3>
<a href="htdig.html">htdig</a>
</h3>
<p>
Digging is the first step in creating a search database. This
system uses the word <em>digging</em> for what other systems call
<em>harvesting</em> or <em>gathering</em>. In the ht://Dig
system, the program <a href="htdig.html">htdig</a> performs
the information gathering stage. In this process, the program
acts like a regular web user, except that it follows
every hyperlink it comes across that lies within the
domain it has been asked to gather information on. Links
that lead outside that domain are not followed.<br>
Each document it visits is examined, and all the unique
words in the document are extracted and stored.
</p>
<p>
The digging process <em>only</em> follows links; it has
no notion of JavaScript, applets, or user-input forms.
</p>
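<p>
The behaviour described above can be illustrated with a minimal Python
sketch. It is not htdig's actual implementation: the <code>dig</code>
function, the <code>fetch</code> callable, and the in-memory
<code>PAGES</code> table are hypothetical names introduced only for this
example, and "domain" is assumed here to mean the host of the start URL.
</p>

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects the href of every <a> tag and the text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def dig(start_url, fetch):
    """Breadth-first crawl restricted to the host of start_url.

    fetch is a hypothetical callable that maps a URL to its HTML text,
    or to None when the document cannot be retrieved.
    Returns {url: set of unique lower-cased words}.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    index = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        words = re.findall(r"[a-z0-9]+", " ".join(parser.text).lower())
        index[url] = set(words)
        for href in parser.links:
            target = urljoin(url, href).split("#")[0]
            # Only links inside the start host are followed.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Two in-memory pages stand in for a real web server (made-up data).
PAGES = {
    "http://example.com/": (
        '<a href="/about.html">About</a> '
        '<a href="http://other.org/">Elsewhere</a> Welcome welcome'
    ),
    "http://example.com/about.html": '<a href="/">Home</a> About this site',
}

index = dig("http://example.com/", PAGES.get)
```

<p>
In this sketch the link to <code>other.org</code> is never fetched, and
each visited page ends up with a set of its unique words, which mirrors
the two properties the text attributes to digging.
</p>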
<hr noshade>
<h3>
<a href="htsearch.html" target="_top">htsearch</a>
</h3>
<p>
Searching is where users actually get to use all the
information that was gathered during the dig and merge
stages. The <a href="htsearch.html" target="_top">htsearch</a>
program performs the actual searches. It typically
produces <code>HTML</code> output for the users to view,
though other text formats can be generated by
editing the output templates.
</p>
<hr noshade>
<h3>
<a href="htmerge.html">htmerge</a>
</h3>
<p>
Merging does exactly what the name says: it merges one database
into another. In previous versions of ht://Dig, the htmerge
program also built the databases used by htsearch from the
htdig output. That step is now largely unnecessary. The one
remaining task, removal of invalid URLs, is now done by the
htpurge program.
</p>
<hr noshade>
<h3>
<a href="htpurge.html">htpurge</a>
</h3>
<p>
Purging removes documents and their associated words from the
databases. It should be done after running htdig, to remove
invalid URLs, documents marked not to be indexed, old
versions of modified documents, etc. You can also name
specific URLs for htpurge to remove explicitly.
</p>
<hr noshade>
<h3>
<a href="htload.html">htload</a>
</h3>
<p>
Loading imports the contents of the databases
from formatted ASCII text documents, as created by htdump or by
the -t flag of htdig. This is, of course, destructive by
nature: data from the text files will replace any
conflicting data in the databases.
</p>
<hr noshade>
<h3>
<a href="htdump.html">htdump</a>
</h3>
<p>
Dumping exports the contents of the databases to
formatted ASCII text documents. This can be useful for
backups, transferring databases between different operating
systems, changing the compression or encodings in the
ht://Dig configuration, and parsing by external utilities.
Editing these files by hand is <em>not</em> recommended,
although minor edits will probably be fine.
</p>
<hr noshade>
<h3>
<a href="htstat.html">htstat</a>
</h3>
<p>
The htstat program returns statistics on the databases,
similar to the output of the -s flag of some of the other
programs. In addition, it can return a list of the URLs in
the databases.
</p>
<hr noshade>
<h3>
<a href="htnotify.html">htnotify</a>
</h3>
<p>
The ht://Dig system includes a handy reminder service, which
allows HTML authors to add some ht://Dig-specific <a href="meta.html">meta
information</a> to their HTML documents. This meta information is
used to email the authors after a specified date. This is very
useful for maintaining lists that contain those annoying "new"
graphics next to new items. (Hint: Things really aren't all
that new anymore after 6 months!)
</p>
<hr noshade>
<h3>
<a href="htfuzzy.html">htfuzzy</a>
</h3>
<p>
So that searches can use "fuzzy" algorithms to match
words, the <a href="htfuzzy.html">htfuzzy</a> program
creates indexes for several different algorithms.
</p>
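<p>
To give a feel for what a fuzzy index is, here is a minimal Python sketch
that groups words by their Soundex code, so that a misspelled query can
still find similar-sounding words. Soundex is used purely as a familiar
example of a fuzzy algorithm. The function names and the word list are
hypothetical, and nothing here reflects htfuzzy's actual index format.
</p>

```python
from collections import defaultdict

# Standard Soundex digit for each consonant group.
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of word ('' if no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate two letters with the same code.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def build_fuzzy_index(words):
    """Map each Soundex code to the set of indexed words that share it."""
    index = defaultdict(set)
    for w in words:
        index[soundex(w)].add(w)
    return index


# Made-up word list standing in for the words found during a dig.
fuzzy = build_fuzzy_index(["robert", "rupert", "rabbit", "digging"])

# A misspelled query is matched through its code, not its spelling.
matches = sorted(fuzzy[soundex("Robbert")])
```

<p>
The expensive grouping work happens once, when the index is built. At
search time a query word only needs its code computed and looked up,
which is the general reason for preparing such indexes ahead of time.
</p>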
<hr size="4" noshade>
Last modified: $Date: 2004/05/28 13:15:17 $
</body>
</html>