<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>
ht://Dig: htfuzzy
</title>
</head>
<body bgcolor="#eef7ff">
<h1>
htfuzzy
</h1>
<p>
ht://Dig Copyright &copy; 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
Please see the file <a href="COPYING">COPYING</a> for
license information.
</p>
<hr size="4" noshade>
<dl>
<dd>
<h2>
Synopsis
</h2>
</dd>
<dd>
htfuzzy [-c <em>configfile</em>] [-v] <em>algorithm</em> ...
</dd>
</dl>
<dl>
<dd>
<h2>
Description
</h2>
</dd>
<dd>
Htfuzzy creates indexes for different "fuzzy" search
algorithms. These indexes can then be used by the
<a href="htsearch.html" target="_top">htsearch</a> program.
</dd>
</dl>
<dl>
<dd>
<h2>
Options
</h2>
</dd>
<dd>
<dl compact>
<dt>
-c <em>configfile</em>
</dt>
<dd>
Use the specified configuration file instead of the
default.
</dd>
<dt>
-v
</dt>
<dd>
Verbose mode. Used once, it provides progress feedback;
used more than once, it will overflow even the biggest
buffers. :-)
</dd>
</dl>
</dd>
</dl>
<dl>
<dd>
<h2>
Algorithms
</h2>
</dd>
<dd>
Indexes for the following search algorithms can currently
be created:
<dl>
<dt>
<strong>soundex</strong>
</dt>
<dd>
Creates a slightly modified <a href="http://www.sog.org.uk/cig/vol6/605tdrake.pdf">soundex</a> key database.
A soundex key encodes letters as digits, with
similar-sounding letters (such as c, k and q) given the
same digit. Vowels are not coded.
The differences from the standard soundex algorithm are:
<ul>
<li>
Keys are 6 digits long, rather than the standard letter followed by 3 digits.
</li>
<li>
The first letter is encoded as a digit too, instead of being kept as a letter.
</li>
</ul>
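<p>
As an illustration only, the following Python sketch computes a key with the
two differences listed above. The function name <code>soundex_key</code> is
hypothetical, and the letter groups are those of standard soundex. Padding
short keys with zeros and letting vowels, h, w and y break runs of identical
digits are assumptions too, so these keys may not match the ones htfuzzy
actually stores.
</p>
<pre>
```python
# Hypothetical sketch of a 6-digit soundex variant (not ht://Dig's actual code).
# Standard soundex letter groups; vowels, h, w, y and non-letters are not coded.
CODES = {}
for digit, letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                       ("4", "l"), ("5", "mn"), ("6", "r")):
    for letter in letters:
        CODES[letter] = digit


def soundex_key(word, length=6):
    """Encode every letter (including the first), then pad or truncate to `length` digits."""
    digits = []
    last = None
    for ch in word.lower():
        code = CODES.get(ch)
        if code is None:
            last = None  # assumption: any uncoded letter breaks a run of identical codes
            continue
        if code != last:
            digits.append(code)  # collapse adjacent letters that share a code
        last = code
    # Assumption: short keys are padded with zeros, as in standard soundex.
    return "".join(digits).ljust(length, "0")[:length]


if __name__ == "__main__":
    for w in ("Robert", "Rupert", "cat", "kat"):
        print(w, soundex_key(w))
```
</pre>
<p>
In this sketch "Robert" and "Rupert" both map to 616300. A key database built
from it would therefore list both words under the same key.
</p>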
</dd>
<dt>
<strong>metaphone</strong>
</dt>
<dd>
Creates a metaphone key database. This algorithm is
more specific to English, but will get fewer "weird"
matches than the soundex algorithm.
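<p>
The full set of metaphone rules is too long to reproduce here. The following
Python sketch implements only a small, simplified subset of them, to show how
letter clusters such as PH and TH are reduced to single sound codes. The name
<code>metaphone_sketch</code> is hypothetical. Several rules are deliberately
omitted (they are listed in the comments), and the output is not guaranteed to
match htfuzzy's metaphone keys.
</p>
<pre>
```python
# Hypothetical, simplified subset of Lawrence Philips' metaphone rules.
# Omitted: GH, DG, MB, TIA/TIO, SIA/SIO, SCH, silent K after C, and others.
VOWELS = set("aeiou")
INITIAL_SILENT = ("ae", "gn", "kn", "pn", "wr")  # the first letter is dropped
DIGRAPHS = {"p": "F", "t": "0", "s": "X", "c": "X"}  # PH, TH ("0" = theta), SH, CH
SIMPLE = {"q": "K", "v": "F", "z": "S", "d": "T"}


def metaphone_sketch(word):
    w = "".join(ch for ch in word.lower() if ch.isalpha())
    if w[:2] in INITIAL_SILENT:
        w = w[1:]
    elif w[:1] == "x":
        w = "s" + w[1:]  # an initial X sounds like S
    out = []
    skip_next = False
    for i, ch in enumerate(w):
        if skip_next:
            skip_next = False
            continue
        prev = w[i - 1:i]  # "" for the first letter
        nxt = w[i + 1:i + 2]  # "" for the last letter
        if ch == prev and ch != "c":
            continue  # doubled letters (except C) are coded once
        if ch in VOWELS:
            if i == 0:
                out.append(ch.upper())  # vowels are kept only at the start
        elif ch in DIGRAPHS and nxt == "h":
            out.append(DIGRAPHS[ch])
            skip_next = True  # the H of the digraph is consumed
        elif ch in ("c", "g"):
            soft = nxt in ("e", "i", "y")
            out.append(("S" if ch == "c" else "J") if soft else "K")
        elif ch in ("w", "y"):
            if nxt in VOWELS:
                out.append(ch.upper())  # W and Y are silent unless a vowel follows
        elif ch == "h":
            if nxt in VOWELS and prev not in VOWELS:
                out.append("H")  # simplified: keep H only before a vowel, not after one
        elif ch == "x":
            out.append("KS")
        else:
            out.append(SIMPLE.get(ch, ch.upper()))
    return "".join(out)


if __name__ == "__main__":
    for w in ("phone", "fone", "Smith", "Smyth", "knee"):
        print(w, metaphone_sketch(w))
```
</pre>
<p>
With these rules, "phone" and "fone" both yield FN. "Smith" and "Smyth" both
yield SM0, where "0" stands for the TH sound.
</p>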
</dd>
<dt>
<strong>accents</strong>
</dt>
<dd>
Creates an accents key database. This algorithm
maps all accented letters to their unaccented
counterparts, so that a search for the unaccented
form of a word also finds all of its accented
variants.
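<p>
A minimal Python sketch of the idea, assuming Unicode input. It removes
accents by applying canonical decomposition (NFD) and dropping the resulting
combining marks. This is a swapped-in technique; htfuzzy may instead use its
own table of accented characters. The functions <code>strip_accents</code> and
<code>build_accent_index</code> are hypothetical.
</p>
<pre>
```python
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Map accented letters to unaccented ones via Unicode canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accent_index(words):
    """Group words under their unaccented key: key -> set of indexed spellings."""
    index = defaultdict(set)
    for word in words:
        index[strip_accents(word)].add(word)
    return index


if __name__ == "__main__":
    index = build_accent_index(["resume", "r\u00e9sum\u00e9", "r\u00e9sume"])
    # A search for the unaccented "resume" finds all three spellings.
    print(sorted(index[strip_accents("resume")]))
```
</pre>
<p>
Letters that have no canonical decomposition, such as &oslash;, are left
unchanged by this approach.
</p>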
</dd>
<dt>
<strong>endings</strong>
</dt>
<dd>
Creates two databases which can be used to match common
word endings. The creation of these databases requires
a list of affix rules and a dictionary which uses those
affix rules. The affix rules and dictionary must be in
the formats used by the
<a href="http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html">
ispell</a> program. Included with the distribution are
the affix rules for English and a fairly small English
dictionary. Other languages can be supported by getting
the appropriate affix rules and dictionaries. These are
available for many languages; check the ispell
distribution for more details.
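<p>
The following Python sketch shows the general idea under simplifying
assumptions. Dictionary entries use ispell's <code>word/FLAGS</code> form, and
each flag expands to suffix rules, written here as (condition, strip, append)
tuples. The rules are a small hand-written subset loosely modeled on the D, G
and S flags of ispell's English affix file; they are not read from
english.aff. The names <code>SUFFIX_RULES</code>, <code>expand</code>,
<code>build_endings</code> and <code>endings_matches</code> are hypothetical.
The two in-memory maps play the roles of word2root.db and root2word.db.
</p>
<pre>
```python
import re
from collections import defaultdict

# Hypothetical suffix rules loosely modeled on ispell's English D, G and S flags:
# flag -> list of (condition, strip, append). The condition is a regex that
# must match the end of the root; `strip` is removed before `append` is added.
SUFFIX_RULES = {
    "D": [("e", "", "d"), ("[^ey]", "", "ed"),
          ("[^aeiou]y", "y", "ied"), ("[aeiou]y", "", "ed")],
    "G": [("e", "e", "ing"), ("[^e]", "", "ing")],
    "S": [("[^aeiou]y", "y", "ies"), ("[aeiou]y", "", "s"),
          ("[sxzh]", "", "es"), ("[^sxzhy]", "", "s")],
}


def expand(entry):
    """Expand an ispell-style 'root/FLAGS' entry into its root and all derived words."""
    root, _, flags = entry.partition("/")
    words = {root}
    for flag in flags:
        for cond, strip, append in SUFFIX_RULES.get(flag, []):
            if re.search(cond + "$", root):
                words.add(root[: len(root) - len(strip)] + append)
    return root, words


def build_endings(dictionary):
    """Build word -> roots and root -> words maps (stand-ins for the two databases)."""
    word2root = defaultdict(set)
    root2word = defaultdict(set)
    for entry in dictionary:
        root, words = expand(entry)
        for word in words:
            word2root[word].add(root)
            root2word[root].add(word)
    return word2root, root2word


def endings_matches(word, word2root, root2word):
    """Return every word that shares a root with `word`."""
    matches = set()
    for root in word2root.get(word, ()):
        matches |= root2word[root]
    return matches


if __name__ == "__main__":
    w2r, r2w = build_endings(["walk/DGS", "carry/DS"])
    print(sorted(endings_matches("walked", w2r, r2w)))
```
</pre>
<p>
Looking up "walked" finds the root "walk". That root then yields walk, walked,
walking and walks.
</p>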
</dd>
<dt>
<strong>synonyms</strong>
</dt>
<dd>
Creates a database of synonyms for words. It reads a
text database of synonyms and creates a database that
htsearch can then use. Each line of the text database
is a list of words; the first word on the line is given
all the other words on that line as its synonyms.
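<p>
A minimal Python sketch of how such a text database could be read. It assumes
that words are separated by whitespace and that a first word appearing on
several lines collects the synonyms from all of them. The name
<code>parse_synonyms</code> is hypothetical, and the in-memory dictionary
stands in for the synonyms.db file that htfuzzy writes.
</p>
<pre>
```python
def parse_synonyms(text):
    """Map the first word on each line to the remaining words on that line."""
    table = {}
    for line in text.splitlines():
        words = line.split()
        if not words[1:]:
            continue  # blank lines and lone words define no synonyms
        # Assumption: repeated first words accumulate synonyms across lines.
        table.setdefault(words[0], []).extend(words[1:])
    return table


if __name__ == "__main__":
    table = parse_synonyms("car auto automobile\nbig large huge\n")
    print(table["car"])
```
</pre>
<p>
As the description states, the mapping is one-way. "car" lists "auto" and
"automobile" as synonyms, but "auto" gets no entry of its own.
</p>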
</dd>
</dl>
</dd>
</dl>
<dl>
<dd>
<h2>
Files
</h2>
</dd>
<dd>
<dl>
<dt>
<a href="attrs.html#config_dir">CONFIG_DIR</a>/htdig.conf
</dt>
<dd>
The default configuration file.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#database_dir">DATABASE_DIR</a>/db.accents.db
</dt>
<dd>
(Output) Maps between characters with and without
accents for the accents fuzzy rule
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#database_dir">DATABASE_DIR</a>/db.metaphone.db
</dt>
<dd>
(Output) Database of similar-sounding words for the
metaphone fuzzy rule
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#database_dir">DATABASE_DIR</a>/db.soundex.db
</dt>
<dd>
(Output) Database of similar-sounding words for the
soundex fuzzy rule
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/english.0, <a href="attrs.html#common_dir">COMMON_DIR</a>/english.aff
</dt>
<dd>
(Input) List of words and affix rules used to generate
the endings databases
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/root2word.db, <a href="attrs.html#common_dir">COMMON_DIR</a>/word2root.db
</dt>
<dd>
(Output) Databases used for the endings fuzzy rule
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/synonyms
</dt>
<dd>
(Input) List of groups of words considered synonymous
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/synonyms.db
</dt>
<dd>
(Output) Database used for the synonyms fuzzy rule
</dd>
</dl>
</dd>
</dl>
<dl>
<dd>
<h2>
See Also
</h2>
</dd>
<dd>
<a href="htdig.html">htdig</a>,
<a href="htmerge.html">htmerge</a>,
<a href="htsearch.html" target="_top">htsearch</a>,
<a href="attrs.html">Configuration file format</a>, and
<a href="http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html">
ispell</a>.
</dd>
</dl>
<hr size="4" noshade>
Last modified: $Date: 2004/06/12 13:39:13 $
</body>
</html>