<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>
ht://Dig: htfuzzy
</title>
</head>
<body bgcolor="#eef7ff">
<h1>
htfuzzy
</h1>
<p>
ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
Please see the file <a href="COPYING">COPYING</a> for
license information.
</p>
<hr size="4" noshade>
<dl>
<dd>
<h2>
Synopsis
</h2>
</dd>
<dd>
htfuzzy [-c <em>configfile</em>] [-v] <em>algorithm</em> ...
</dd>
</dl>
<dl>
<dd>
<h2>
Description
</h2>
</dd>
<dd>
Htfuzzy creates indexes for different "fuzzy" search
algorithms. These indexes can then be used by the
<a href="htsearch.html" target="_top">htsearch</a> program.
</dd>
</dl>
<dl>
<dd>
<h2>
Options
</h2>
</dd>
<dd>
<dl compact>
<dt>
-c <em>configfile</em>
</dt>
<dd>
Use the specified configuration file instead of the
default.
</dd>
<dt>
-v
</dt>
<dd>
Verbose mode. Used once, it provides progress feedback;
used more than once, it will overflow even the biggest
buffers. :-)
</dd>
</dl>
</dd>
</dl>
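<dl>
<dd>
<p>
As an illustration of the synopsis and options above, the following
Python sketch assembles an htfuzzy command line as an argument list.
The helper name htfuzzy_argv and its parameters are hypothetical and
not part of ht://Dig. The sketch assumes that extra verbosity is
requested by repeating -v and that at least one algorithm is named.
</p>
<pre>
```python
def htfuzzy_argv(algorithms, config_file=None, verbosity=0):
    """Build an argument list matching the htfuzzy synopsis.

    Hypothetical helper for illustration only; it does not run htfuzzy.
    """
    algorithms = list(algorithms)
    if not algorithms:
        raise ValueError("at least one algorithm is required")

    argv = ["htfuzzy"]
    if config_file is not None:
        # -c selects a configuration file other than the default.
        argv += ["-c", config_file]
    # Assumption: each extra level of verbosity is one more -v flag.
    argv += ["-v"] * verbosity
    argv += algorithms
    return argv
```
</pre>
<p>
The resulting list could be passed to a process launcher such as
Python's subprocess.run; the sketch itself stops short of running
anything.
</p>
</dd>
</dl>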
<dl>
<dd>
<h2>
Algorithms
</h2>
</dd>
<dd>
Indexes for the following search algorithms can currently
be created:
<dl>
<dt>
<strong>soundex</strong>
</dt>
<dd>
Creates a slightly modified <a href="http://www.sog.org.uk/cig/vol6/605tdrake.pdf">soundex</a> key database.
A soundex key encodes letters as digits, with
similar-sounding letters (such as c, k, and q) given the
same digit. Vowels are not coded.
The differences from the standard soundex algorithm are:
<ul>
<li>
Keys are 6 digits.
</li>
<li>
The first letter is also encoded.
</li>
</ul>
</dd>
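<dd>
<p>
The key scheme described above can be sketched in Python as follows.
This is a minimal illustration, not htfuzzy's actual code: the digit
groups are those of standard soundex, and the collapsing of repeated
codes, the treatment of h and w as uncoded letters, and the zero
padding are all assumptions.
</p>
<pre>
```python
# Standard soundex digit groups (an assumption; htfuzzy's table may differ).
_GROUPS = {
    "bfpv": "1",
    "cgjkqsxz": "2",
    "dt": "3",
    "l": "4",
    "mn": "5",
    "r": "6",
}
_CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex_key(word, length=6):
    """Return an all-digit, fixed-length soundex-style key for `word`.

    Unlike standard soundex, the first letter is encoded as a digit too
    and the key is 6 digits long, as the description above states.
    """
    digits = []
    previous = None
    for ch in word.lower():
        code = _CODES.get(ch)  # None for vowels and other uncoded characters
        if code is not None and code != previous:
            digits.append(code)
        previous = code  # an uncoded letter separates repeated codes
    # Truncate to the key length, or pad short keys with zeros (assumption).
    return "".join(digits)[:length].ljust(length, "0")
```
</pre>
<p>
With this sketch, soundex_key("cat") and soundex_key("kat") both
yield 230000, so the two spellings would share a key.
</p>
</dd>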
<dt>
<strong>metaphone</strong>
</dt>
<dd>
Creates a metaphone key database. This algorithm is
more specific to English, but produces fewer "weird"
matches than the soundex algorithm.
</dd>
<dt>
<strong>accents</strong>
</dt>
<dd>
Creates an accents key database. This algorithm maps
all accented letters to their unaccented
counterparts, so that a search for the unaccented
word will yield all variations of this word with
accents.
</dd>
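<dd>
<p>
One way to picture this mapping is the Python sketch below, which
strips accents through Unicode decomposition and groups words by
their unaccented form. The function names are hypothetical, and the
Unicode-based approach is an assumption made for illustration;
htfuzzy's own character mapping may differ.
</p>
<pre>
```python
import unicodedata


def strip_accents(word):
    """Map accented letters to their unaccented counterparts."""
    # NFD splits a letter such as "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_accent_index(words):
    """Group words under their unaccented key (hypothetical index layout)."""
    index = {}
    for word in words:
        index.setdefault(strip_accents(word), set()).add(word)
    return index
```
</pre>
<p>
Looking up the unaccented key in such an index returns every
accented variation that was indexed under it.
</p>
</dd>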
<dt>
<strong>endings</strong>
</dt>
<dd>
Creates two databases which can be used to match common
word endings. The creation of these databases requires
a list of affix rules and a dictionary which uses those
affix rules. The affix rules and dictionary files use
the formats of the
<a href="http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html">ispell</a> program. Included with the distribution are
the affix rules for English and a fairly small English
dictionary. Other languages can be supported by getting
the appropriate affix rules and dictionaries. These are
available for many languages; check the ispell
distribution for more details.
</dd>
<dt>
<strong>synonyms</strong>
</dt>
<dd>
Creates a database of synonyms for words. It reads a
text database of synonyms and creates a database that
htsearch can then use. Each line of the text database
consists of words where the first word will have the
other words on that line as synonyms.
</dd>
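<dd>
<p>
The text format described above could be read as in the following
Python sketch. It assumes whitespace-separated words and ignores
lines with fewer than two words; the function name parse_synonyms is
hypothetical, and the code only illustrates the file format, not
htfuzzy's implementation.
</p>
<pre>
```python
def parse_synonyms(lines):
    """Map the first word of each line to the remaining words on that line."""
    table = {}
    for line in lines:
        words = line.split()  # assumption: words are separated by whitespace
        if len(words) >= 2:
            head, *others = words
            # Repeated head words accumulate their synonyms.
            table.setdefault(head, []).extend(others)
    return table
```
</pre>
<p>
For example, a line reading "car automobile auto" would make
automobile and auto synonyms of car.
</p>
</dd>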
</dl>
</dd>
</dl>
<dl>
<dd>
<h2>
Files
</h2>
</dd>
<dd>
<dl>
<dt>
<a href="attrs.html#config_dir">CONFIG_DIR</a>/htdig.conf
</dt>
<dd>
The default configuration file.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#database_dir">DATABASE_DIR</a>/db.accents.db
</dt>
<dd>
(Output) Maps between characters with and without
accents for the accents fuzzy rule.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#database_dir">DATABASE_DIR</a>/db.metaphone.db
</dt>
<dd>
(Output) Database of similar-sounding words for the
metaphone fuzzy rule.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#database_dir">DATABASE_DIR</a>/db.soundex.db
</dt>
<dd>
(Output) Database of similar-sounding words for the soundex
fuzzy rule.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/english.0, <a href="attrs.html#common_dir">COMMON_DIR</a>/english.aff
</dt>
<dd>
(Input) List of words and affix rules used to generate
endings.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/root2word.db, <a href="attrs.html#common_dir">COMMON_DIR</a>/word2root.db
</dt>
<dd>
(Output) Databases used for the endings fuzzy rule.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/synonyms
</dt>
<dd>
(Input) List of groups of words considered synonymous.
</dd>
</dl>
<dl>
<dt>
<a href="attrs.html#common_dir">COMMON_DIR</a>/synonyms.db
</dt>
<dd>
(Output) Database used for the synonyms fuzzy rule.
</dd>
</dl>
</dd>
</dl>
<dl>
<dd>
<h2>
See Also
</h2>
</dd>
<dd>
<a href="htdig.html">htdig</a>,
<a href="htmerge.html">htmerge</a>,
<a href="htsearch.html" target="_top">htsearch</a>,
<a href="attrs.html">Configuration file format</a>, and
<a href="http://fmg-www.cs.ucla.edu/fmg-members/geoff/ispell.html">ispell</a>.
</dd>
</dl>
<hr size="4" noshade>
Last modified: $Date: 2004/06/12 13:39:13 $
</body>
</html>