Thu Jan 31 17:32:33 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* Release of 3.1.6.
* htdoc/confindex.html, htdoc/htsearch.html, htdoc/index.html,
htdoc/mailarchive.html: Remove CSS link, not needed in these
frameset pages.
* htdoc/howto-mirror.html: Update with Jesse's latest version.
Thu Jan 31 15:13:07 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* Makefile.in: Fixed install-strip target to properly handle relative
paths in INSTALL_PROGRAM when passing it to subdirectories.
Thu Jan 31 11:41:39 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: Updated questions 4.8 & 4.9 to emphasize use of
doc2html over parse_doc.pl. Further clarified question 2.1.
Thu Jan 31 10:14:23 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/parse_doc.pl: Added comments explaining why you should
not be using this script.
Wed Jan 30 17:20:51 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html: Updated to mention 3.1.6 as the newest version
and --with-rx as a fix for regex problems on BSDI.
Wed Jan 30 17:15:49 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* installdir/synonyms: Updated with the version contributed by
David Adams, with minor changes. Kept old one as synonyms.original.
* installdir/english.0: Changed lots more dubious uses of suffixes to
get more appropriate and correct fuzzy endings expansions.
Wed Jan 30 12:30:16 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/Connection.cc (connect): Fixed bug with allow_EINTR and
added support for looping when the connection returns EAGAIN (no
more free local ports). Thanks to Ahmon Dancy <dancy@franz.com>
for pointing out the EAGAIN issue.
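A rough illustration of the retry behaviour described in the entry above. This is a sketch only, not the actual Connection.cc code: the helper name connectWithRetry, its parameters and the retry limit are all assumptions made for the example.

```cpp
#include <cerrno>
#include <functional>

// Hypothetical helper, not part of ht://Dig: retry a connect-style call
// while it fails with EINTR (only if allowed) or EAGAIN, up to maxRetries
// additional attempts. `attempt` must return 0 on success, or -1 with errno
// set, in the same way connect(2) does.
int connectWithRetry(const std::function<int()> &attempt,
                     bool allowEINTR, int maxRetries)
{
    for (int tries = 0; ; ++tries) {
        errno = 0;
        if (attempt() == 0)
            return 0;                        // connected

        bool interrupted = (errno == EINTR && allowEINTR);
        bool noFreePorts = (errno == EAGAIN);
        if (!(interrupted || noFreePorts) || tries >= maxRetries)
            return -1;                       // give up; errno says why

        // A real implementation would probably pause briefly here
        // before trying again.
    }
}
```

Passing the connect call in as a callable keeps the loop independent of any socket details, which also makes it easy to exercise with a fake that fails a fixed number of times.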
Tue Jan 29 09:59:58 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: Updated with today's changes to maindocs FAQ.
Mon Jan 28 16:54:15 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/README: Added mentions of examples & xmlsearch, fixed typo.
Sun Jan 27 23:13:11 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/*.html: Final batch of documentation updates.
Sat Jan 26 23:28:25 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/*: More documentation updates from merging with the
current maindocs CVS.
Fri Jan 25 21:36:21 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* acconfig.h, include/htconfig.h.in: Add USE_RX to potential
configure #include macros.
* htlib/gregex.h: Rename regex.h to prevent conflicts with system
version.
* htlib/regex.c, htlib/HtRegex.h: Ditto.
* htfuzzy/EndingsDB.cc: Use same tests as HtRegex.h for rxposix.h,
gregex.h or regex.h depending on configure results.
* configure.in: Implement more flexible test for rx/regex, which
will check for rxposix.h if --with-rx is supplied, will "fall
back" to regex test if rxposix.h isn't available and will only use
the htlib/ code and header for regex compile.
* configure: Update using autoconf.
Fri Jan 25 12:14:26 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/whatsnew/README, contrib/whatsnew/whatsnew.html: Added
an example of how to get a what's new listing from the new features
in htsearch.
Thu Jan 24 22:43:28 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Add ignore_dead_servers attribute to
control whether indexing will continue to try to contact a dead
server.
* htdig/Retriever.cc: Only mark a server as dead if the
ignore_dead_servers attribute is set.
* htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html:
Documentation updates.
Thu Jan 24 15:32:59 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure, configure.in: Add --with-rx option to switch to
system rx code (e.g. on BSDI). Needs some touchups still,
including checking that rxposix.h exists and if --without-rx was
supplied for some reason.
* htlib/HtRegex.h: Add conditional <rxposix.h> header for systems
where rx is better than regex.
* htlib/Makefile.in: Make sure regex.o is only compiled if it
works on a given system via LIBOBJS as supplied by the configure
script.
Mon Jan 21 22:33:30 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/RELEASE.html: Add first shot at the release notes for
3.1.6. Still need to finish some of the htdoc/ merges, including
the SF icons and such.
* htdoc/*.html: First stab at many of the htdoc/ merges including
the new Copyright line. (It is 2002, after all.)
Fri Jan 18 18:17:34 2002 Geoff Hutchison <ghutchis@wso.williams.edu>
* htmerge/docs.cc: Add a test if the DB database has no URLs
before proceeding.
* htmerge/words.cc: Add a slightly more user-friendly error
message if the word list file doesn't exist. Remove exit()
statements since reportError does this for us.
Fri Jan 18 16:47:50 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Rewrote description of prefix_match_character
to make it more clear, with crosslinks to related attributes, and
described new wildcard matching feature. Added more explanations
for relative days & months in startday et al. to make it clearer.
Added more notes about to-strings in the url_part_aliases description
and explained the example even more, as well as adding crosslinks
to the new *_rewrite_rules.
Fri Jan 18 15:56:11 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/htsearch.cc (setupWords), htsearch/parser.cc (perform_push):
Added support for a wildcard word of "*" (or prefix_match_character
if set and not empty) which returns all documents.
Wed Jan 16 17:21:26 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html, htdoc/hts_form.html: Described how to use
relative dates for startyear et al.
Wed Jan 16 16:58:05 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (buildMatchList): Fixed startday et al. to
allow relative days, months & years if values are negative.
Fri Jan 11 20:57:51 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Updated descriptions for translate_* attributes
to match the new default behavior.
Fri Jan 11 17:48:54 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/SGMLEntities.cc (translateAndUpdate): Added support for
translate_latin1 attribute, to turn off ISO-8859-1-specific entities.
* htcommon/defaults.cc: Added translate_latin1 attribute.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it.
Fri Jan 11 17:14:54 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/xmlsearch.{README,tar.gz}: Removed older xmlsearch package.
Fri Jan 11 17:06:09 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/xmlsearch/*: Added files contributed by Nathan Hand and
me to implement XML output from htsearch, including DTD, templates
and config file.
Wed Jan 9 22:08:21 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* CONFIG.in: Fixed to allow setting BIN_DIR by configure option.
* contrib/htdig-3.1.6.spec: Fixed to make use of new ./configure
options for pathnames, do away with patch file. Used variables for
many pathnames to allow easy changes.
Wed Jan 9 16:22:32 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc (parse): Added support for max_keywords
attribute.
Wed Jan 9 16:10:44 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc (HTML, do_tag), htdig/ExternalParser.cc (parse):
Added support for description_meta_tag_names attribute.
Ensure external parser interface accepts META descriptions even if
'description' is added to the keyword list.
* htcommon/defaults.cc: Added description_meta_tag_names attribute.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it.
Tue Jan 8 17:39:24 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc (parse): Added support for use_doc_date
attribute.
Thu Jan 3 17:10:50 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Makefile.in, htlib/lib.h: Removed references to timegm,
mytimegm and strptime functions. Removed C source for these.
Thu Jan 3 16:43:31 2002 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/htmerge.html: Added extra description for -m option to clear
up common points of confusion, added note about LC_COLLATE environment
variable.
Fri Dec 21 18:52:32 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc: Added parsedcdate function, used by got_time,
to parse DC date meta tags without requiring strptime or timegm.
Thu Dec 20 12:25:47 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc: Added parsedate function, used by getdate, to
parse date headers without requiring strptime or timegm, which have
caused problems on some systems.
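For readers wondering how a date can be turned into a time value without timegm, the sketch below shows one well-known approach: count the days since 1970-01-01 arithmetically and add the time of day. It is not the parsedate or parsedcdate code itself; the function names daysFromCivil and utcToEpoch are assumptions for the example, and a parser would still have to extract the numeric fields from the header or meta tag first.

```cpp
#include <ctime>

// Days between 1970-01-01 and the given (proleptic Gregorian) date.
// month is 1..12, day is 1..31.
long long daysFromCivil(int year, int month, int day)
{
    year -= (month <= 2);                     // count Jan/Feb with the previous year
    long long era = (year >= 0 ? year : year - 399) / 400;
    long long yoe = year - era * 400;         // year of era, 0..399
    long long doy = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5
                    + day - 1;                // day of year starting 1 March, 0..365
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;   // day of era
    return era * 146097 + doe - 719468;       // shift so that 1970-01-01 is day 0
}

// Hypothetical stand-in for timegm(): UTC calendar fields to seconds
// since the epoch, with no dependence on the local time zone.
time_t utcToEpoch(int year, int month, int day, int hour, int min, int sec)
{
    long long seconds = daysFromCivil(year, month, day) * 86400LL
                        + hour * 3600LL + min * 60LL + sec;
    return (time_t) seconds;
}
```

Because the arithmetic never calls mktime or consults TZ, the result does not depend on the locale or time zone of the machine doing the indexing.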
Thu Dec 20 11:51:26 CET 2001 Gabriele Bartolini <angusgb@users.sourceforge.net>
* configure.in: reviewed directory settings
* Makefile.in: ditto (for 'make install' of htdig.conf and rundig)
Wed Dec 19 23:05:09 2001 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in: Add tests for ostream.h and iostream.h.
* htlib/htString.h: Add HAVE_OSTREAM_H and HAVE_IOSTREAM_H
preprocessor statements to deal with portability issues around the
C++ header files.
Wed Dec 19 13:33:55 2001 Gabriele Bartolini <angusgb@users.sourceforge.net>
* configure.in: fixed bug in customisation of configure parameters
* CONFIG.in: ditto
* configure: re-generated with autoconf
Tue Dec 18 16:12:17 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (displayMatch): Fixed to clear out old values
of ANCHOR template variable for each result.
Thu Dec 6 13:14:22 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/examples/rundig.sh: Fixed to make use of DBDIR variable.
Wed Nov 21 12:54:42 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/rundig.html: Added note about effect of changing database_base.
* htmerge/docs.cc (convertDocs): Changed confusing message about
total doc db size in stats.
Wed Nov 21 11:37:52 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/TemplateList.cc (createFromString), htdoc/attrs.html:
Treat template_map as a _quoted_ string list. Change <i> tags to
the HTML-4.0 compliant <em> tags in builtin-long template.
Tue Nov 20 17:13:27 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/String.cc (String, append, sub): Added checks for negative
lengths or start position to make code more fault-tolerant.
Tue Nov 20 16:37:26 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htfuzzy/Synonym.cc (createDB): Check for lines with fewer than
2 words, to avoid segfault caused by calling Database::Put() with
negative length for data field.
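The kind of guard described above might look roughly like this. It is a sketch under assumptions, not the Synonym.cc code: usableSynonymLine is a hypothetical helper, and it uses std::string rather than the htlib String class.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line of a synonyms file into words and
// report whether it holds a word plus at least one synonym. Lines that
// fail this test would simply be skipped instead of being stored.
bool usableSynonymLine(const std::string &line, std::vector<std::string> &words)
{
    words.clear();
    std::istringstream in(line);
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words.size() >= 2;
}
```

Rejecting short lines before any length is computed means the later database write never sees an empty or negative-length data field.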
Sat Nov 3 23:55:00 2001 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/htString.h: Add #include for ostream.h to solve compile
problems with gcc3.
* htlib/Connection.h, htlib/Connection.cc: Backport Connection
class from 3.2 code--installs alarm() call to timeout connections
and will retry connections a few times before giving up.
Fri Nov 2 12:28:35 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc, htdoc/attrs.html: Added support for dc.date,
dc.date.created and dc.date.modified to use_doc_date handling.
Fri Nov 2 12:12:59 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/xmlsearch.README, contrib/xmlsearch.tar.gz: Added files
contributed by Nathan Hand and me to implement XML output from
htsearch, including DTD, templates and config file.
Fri Nov 2 12:05:49 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc (do_tag), htcommon/defaults.cc: Added ignore_alt_text
attribute to avoid indexing alt text in img tags.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it.
Thu Nov 1 14:43:13 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/htsearch.cc (main): Fixed to only show file names in
error messages when REQUEST_METHOD not set and -v option given,
for security.
Thu Nov 1 10:19:27 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc, htsearch/Display.h: Added a localized
method for outputting HTTP headers, added support for a new
search_results_contenttype attribute to control that header.
* htcommon/defaults.cc: Added default for it.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it.
Wed Oct 31 13:31:18 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* installdir/english.0: Changed lots of dubious uses of suffixes to
get more appropriate and correct fuzzy endings expansions.
Tue Oct 23 14:06:37 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc (RetrievedDocument): Fixed handling of null
return from getParsable(), to avoid segfault problem introduced
by text/css conditional added Jul 25.
Fri Oct 19 17:24:19 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (hilight): Added Stefan Nehlsen's idea for
anchor_target attribute.
* htcommon/defaults.cc: Added default for it.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it.
Sun Oct 14 22:05:30 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html (external_parsers): Documented external converter
chaining to same content-type, e.g. text/html->text/html-internal.
Sun Oct 14 21:54:24 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html, htdoc/cf_byprog.html, htdoc/cf_byname.html,
htcommon/defaults.cc: Documented and declared startyear, etc.
attributes used by htsearch.
Sun Oct 14 21:16:19 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/htdump.html, htdoc/htload.html, htdoc/attrs.html,
htdoc/cf_byprog.html, htdoc/contents.html: Documented htdump and
htload, indicating which attributes are used by them.
Fri Oct 12 14:58:15 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/URL.cc (removeIndex): Fixed to make sure the matched file
name is at the end of the URL.
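To illustrate why the position of the match matters, here is a minimal sketch that strips an index file name only when it is the last path component. The helper stripTrailingIndex and the example URLs are assumptions for illustration; this is not the URL::removeIndex implementation.

```cpp
#include <string>

// Hypothetical helper: remove `indexName` (for example "index.html") from
// `url` only when it is the final path component, keeping the trailing
// slash. URLs that merely contain the name elsewhere are left untouched.
std::string stripTrailingIndex(const std::string &url,
                               const std::string &indexName)
{
    const std::string tail = "/" + indexName;
    if (url.size() >= tail.size() &&
        url.compare(url.size() - tail.size(), tail.size(), tail) == 0)
        return url.substr(0, url.size() - indexName.size());
    return url;
}
```

With this check, http://www.example.com/docs/index.html is shortened to http://www.example.com/docs/, while http://www.example.com/index.html/page.html is returned unchanged.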
Tue Oct 2 09:34:43 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html (start_url): Added a reference and link to
limit_urls_to, explaining how the two are tied together.
Fri Sep 28 17:19:45 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/htdig-3.1.6.spec: Fixed %install to make symlinks for
htdump & htload, added these to %files list.
Fri Sep 28 15:38:00 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (displayMatch): Save rewritten URL in DocumentRef
so it'll be used for star_patterns and template_patterns matching.
Fri Sep 28 14:25:29 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (buildMatchList, displayMatch),
htsearch/htsearch.cc (main): Added calls to pass search_rewrite_rules
to HtURLRewriter class and use it to rewrite URLs in results.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htcommon/defaults.cc: Added search_rewrite_rules attribute.
Thu Sep 27 16:34:51 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Makefile.in, htlib/HtRegex.cc, htlib/HtRegex.h,
htlib/HtRegexReplace.cc, htlib/HtRegexReplace.h,
htlib/HtRegexReplaceList.cc, htlib/HtRegexReplaceList.h,
htlib/HtURLRewriter.cc, htlib/HtURLRewriter.h: Added new classes to
support regular expressions and implement url_rewrite_rules attribute,
using Geoff's variation of Andy Armstrong's implementation of this.
* htlib/URL.h, htlib/URL.cc: Added URL::rewrite() method.
* htlib/htString.h: Added Nth() method for HtRegex class.
* htdig/Retriever.cc (got_href, got_redirect): Added calls to
url.rewrite(), and debugging output for this.
* htdig/htdig.cc (main): Added calls to make instance of
HtURLRewriter class.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htcommon/defaults.cc: Added url_rewrite_rules attribute.
Mon Sep 17 16:52:07 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/running.html: New documentation on how to run after configuring.
* htdoc/rundig.html: New manual page for rundig script.
* htdoc/install.html: Added link to running.html.
* htdoc/contents.html: Added link to running.html, rundig.html, related
projects. Updated links to contrib and developer site. Got rid of
link to web site stats.
Fri Sep 14 09:18:38 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc (RetrieveHTTP): Add port to Host: header when
port is not default, as per RFC2616(14.23). Fixes bug #459969.
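A minimal sketch of the rule from RFC 2616 section 14.23 that this entry refers to: the port is appended to the Host: header only when it differs from the default for the scheme. The function hostHeader and its parameters are hypothetical, not taken from Document.cc.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: build the Host: request header line, adding
// ":port" only when the port is not the default one.
std::string hostHeader(const std::string &host, int port, int defaultPort = 80)
{
    std::ostringstream out;
    out << "Host: " << host;
    if (port != defaultPort)
        out << ':' << port;
    out << "\r\n";
    return out.str();
}
```

Servers that host several name-based virtual sites on a non-standard port may rely on the port being present to pick the right site, which is presumably why its absence was reported as a bug.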
Sat Sep 8 22:04:47 2001 Geoff Hutchison <ghutchis@wso.williams.edu>
* acconfig.h, include/htconfig.h.in: Add undef for
ALLOW_INSECURE_CGI_CONFIG, which if defined does about what you'd
expect. (This is for any wrapper authors who don't want to rewrite
but are willing to run insecure.)
* htsearch/htsearch.cc: Only allow the -c flag to work when
REQUEST_METHOD is undefined. Fixes PR#458013.
Fri Aug 31 16:00:37 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/URL.cc (URL): Fixed to call normalizePath() even if URL
is relative but with absolute path. Should fix bug #408586.
Fri Aug 31 15:21:49 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.h, htdig/HTML.cc (HTML, parse, do_tag): Fixed buggy
handling of nested tags that independently turn off indexing, so
</script> doesn't cancel <meta name=robots ...> tag. Add handling
of <noindex follow> tag.
Fri Aug 31 14:33:41 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
[ Backport some 3.2.0b4 HTML parser changes. ]
* htdig/HTML.cc (do_tag): Rewrite using Configuration class to
separate tag attributes. Parse <object> tags properly, looking
for data= attribute rather than src=. Add support for TITLE
attributes in anchor and related tags. Treat <script></script>
tags as noindex tags, much like <style></style> as suggested
by Torsten.
* htdig/HTML.cc (parse): Fix to prevent closing ">" from being passed
to do_tag().
Wed Aug 29 10:20:55 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html (allow_in_form, build_select_lists,
limit_normalized, server_aliases, server_max_docs, server_wait_time,
url_part_aliases): Added clarifications to allow_in_form,
server_aliases and url_part_aliases descriptions. Changed word
"directive" to "attribute" where appropriate. Added cross-link to
server_aliases from limit_normalized, and to allow_in_form from
build_select_lists.
Mon Aug 27 17:22:56 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc (do_tag): Improve handling of whitespace in META
refresh handling. Fixes bug #406244.
Mon Aug 27 16:38:43 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc (parse): Fixed delete [] text (was missing []), added
simple optimizations for comment & noindex_start skipping, handle
decoded &lt; entity correctly.
Mon Aug 27 15:31:01 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
[ Backport 3.2.0b4 config files. ]
* installdir/htdig.conf: Added .css to bad_extensions default,
added missing closing ">", added mentions of accents & substring,
fixed a couple typos in comments.
* installdir/search.html: Add DTD tag for HTML 4 compliance.
* installdir/{long, syntax, header, footer, wrapper, nomatch}.html:
Add DTD tags, ALT attributes and remove bogus </select> tags to
fix invalid HTML pointed out in PR#901. Change all <b> and <i> tags
to the HTML-4.0 compliant <strong> and <em> tags.
* htdoc/config.html: Updated with sample of latest htdig.conf and
installdir/*.html, added blurb on wrapper.html.
Thu Jul 26 15:05:29 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc, htsearch/parser.cc (perform_or),
htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added new attribute
multimatch_method and used it to boost score on 'or' method with
multiple matches.
Thu Jul 26 14:25:01 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc, htsearch/parser.cc, htdoc/attrs.html,
htdoc/cf_by{name,prog}.html: Added new attribute boolean_syntax_errors
and used it to generate syntax error messages for boolean method.
Wed Jul 25 23:39:00 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htnotify/htnotify.cc: Changed calls to EmailNotification class
to avoid compiler warnings.
Wed Jul 25 23:15:24 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc, htsearch/htsearch.cc, htdoc/attrs.html,
htdoc/cf_by{name,prog}.html: Added new attribute boolean_keywords
and used it to make LOGICAL_WORDS and parse "words" using boolean
method.
Wed Jul 25 22:31:19 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Dictionary.cc (Remove): Fixed so it doesn't clobber rest of
chain when removing an entry, as suggested by Yariv Tal.
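The class of bug fixed here is common in chained hash tables: removing one entry must splice it out of its bucket's linked list rather than cutting the list short. The sketch below shows the usual pointer-to-link idiom; the Entry struct and removeFromChain are hypothetical and much simpler than the real Dictionary class.

```cpp
#include <string>

// Hypothetical node in one hash bucket's chain.
struct Entry {
    std::string key;
    Entry *next;
};

// Unlink and delete the first entry whose key matches, leaving every
// other entry in the chain reachable. Returns true if one was removed.
bool removeFromChain(Entry **head, const std::string &key)
{
    for (Entry **link = head; *link != nullptr; link = &(*link)->next) {
        if ((*link)->key == key) {
            Entry *victim = *link;
            *link = victim->next;    // splice out; successors stay linked
            delete victim;
            return true;
        }
    }
    return false;
}
```

Walking with a pointer to the previous link handles removal of the first, a middle, or the last entry with the same code path, so no case can accidentally drop the tail of the chain.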
Wed Jul 25 22:06:08 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc: Add new attributes htnotify_replyto,
htnotify_webmaster, htnotify_prefix_file, htnotify_suffix_file.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Document them.
* htnotify/htnotify.cc, htnotify/EmailNotification.{h,cc},
htnotify/Makefile.in: Added in code from Richard Beton
<richard.beton@roke.co.uk> to collect multiple URLs per e-mail
address and allow customization of notification messages by
reading in header/footer text as designated by the new attributes
above.
* htdoc/THANKS.html: Credit where due.
Wed Jul 25 21:38:21 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc: Added .css to bad_extensions, for consistency
with 3.2.
* htdoc/attrs.html: Ditto for default value. Also set examples for
translate_* and modification_time_is_now to false so the example is
different than default.
Wed Jul 25 17:26:07 2001 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Document.cc (getParsable): Add conditional to catch
text/css files to prevent these from being parsed as Plaintext.
* htdig/htdig.cc: Quick fix to make the logging -l flag the
default behavior. (Set to Retriever_logUrl from the start.)
* htcommon/defaults.cc: Set modification_time_is_now to default to
true (now that it works correctly). Also set translate_*
attributes to true.
* htdoc/htdig.html: Remove documentation for -l flag--now no
longer used.
* htdoc/attrs.html: Correct new default values for
modification_time_is_now and translate_* attributes.
Tue Jul 24 16:12:45 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Added reference to maximum_page_buttons in the
section on maximum_pages.
Tue Jul 24 15:38:39 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (generateStars): Add NSTARS variable for
template output as suggested by Caleb Crome
<ccrome@users.sourceforge.net> (except here precision is 0). Fixes
feature request #405787.
* htdoc/hts_templates.html: Add description of NSTARS variable
above. (Actually copied hts_templates.html from 3.2.0b4.)
Tue Jul 24 14:21:53 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (expandVariables, outputVariable),
htdoc/hts_templates.html: Add support for $=(var) template variable
references, as suggested by Quim Sanmarti.
Tue Jul 24 14:12:06 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (readFile): Added missing fclose() call, and
debugging message for when file can't be opened.
* htsearch/Display.cc (displayParsedFile): Added debugging message
for when file can't be opened.
Tue Jul 24 14:03:12 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (setVariables), htcommon/defaults.cc: Added
maximum_page_buttons attribute, to limit buttons to less than
maximum_pages. Fixes PR#731 & PR#781.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it.
Tue Jul 24 13:42:56 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/hts_templates.html, htsearch/Display.cc (displayMatch):
Add METADESCRIPTION variable.
Tue Jul 24 13:20:24 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/DocumentDB.{h,cc}: Added FindCoded() method to lookup
docdb record with URL that's still encoded.
* htsearch/Display.cc (display, displayMatch, buildMatchList):
Use new method to avoid problems with URLs that are decoded and
reencoded with another, more ambiguous url_part_aliases setting.
Also fixed a problem with date range checking looking at ref before
checking if it's null.
Thu Jul 12 11:45:05 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/conv_doc.pl, contrib/parse_doc.pl: Fixed EOF handling in
dehyphenation, fixed to handle %xx codes in title made from URL.
* contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl,
contrib/doc2html/swf2html.pl: Fixed to handle %xx codes in URL title.
Thu Jul 5 11:23:40 2001 Geoff Hutchison <ghutchis@wso.williams.edu>
* db/dist/config.guess: Update with more recent GNU version that
recognizes various flavors of Mac OS X automatically.
* htlib/DB2_db.cc: Only #include <malloc.h> if we have it. Fixes
compilation problems on Mac OS X.
* htlib/String.cc: Include <iostream.h> instead of deprecated
<stream.h>. Fixes compilation problems with Mac OS X.
* htlib/Configuration.cc: Make sure we never try to operate on
strings of no length--accessing string[-1] is a bug--exposed on
Mac OS X.
Fri Jun 29 11:56:25 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc (got_redirect): Allow the redirect to accept
relative redirects instead of just full URLs.
Fri Jun 22 16:25:21 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/THANKS.html: Credit Marc Pohl and Robert Marchand.
* htsearch/Display.cc (buildMatchList): Fix date_factor calculation
to avoid 32-bit int overflow after multiplication by 1000, and avoid
repetitive time(0) call, as contributed by Marc Pohl. Also move the
localtime() call up before gmtime() call, to avoid clobbering gmtime's
returned static structure (my thinko).
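On the overflow mentioned above: with a 32-bit int, any count of seconds larger than about 2.1 million (roughly 25 days) overflows once multiplied by 1000. The sketch below shows one way to avoid that by doing the scaling in floating point. It does not reproduce the actual date_factor formula in Display.cc; scaledPerMille and its arguments are assumptions for illustration.

```cpp
// Hypothetical helper: scale an elapsed time to thousandths of a span.
// Converting to double before multiplying keeps the intermediate value
// well inside range, whereas int arithmetic such as
// (elapsed * 1000) / span can overflow on 32-bit ints.
double scaledPerMille(long elapsedSeconds, long spanSeconds)
{
    return 1000.0 * (double) elapsedSeconds / (double) spanSeconds;
}
```

Dividing before multiplying would also avoid the overflow in integer arithmetic, but at the cost of truncating the fraction, so a floating-point intermediate is usually the simpler choice.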
Tue Jun 19 17:07:01 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (setVariables): Fixed handling of
build_select_lists attribute, to deal with new restrict & exclude
attributes.
Fri Jun 15 17:45:40 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/require.html: Added mentions of accents, prefix & substring,
taken from 3.2.0b4.
* htdoc/htfuzzy: Added blurb on accents algorithm, taken from 3.2.0b4.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added entry for
accents_db attribute for htfuzzy and htsearch. Mentioned accents
algorithm in description of search_algorithm. Noted effect of
locale setting on floating point numbers in search_algorithm
and locale descriptions.
Fri Jun 15 16:47:09 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htfuzzy/Accents.{h,cc}, htfuzzy/Fuzzy.c (getFuzzyByName),
htfuzzy/htfuzzy.cc (main, usage), htfuzzy/Makefile.in: Added
latest version of Robert Marchand's accents fuzzy match algorithm.
* htcommon/defaults.cc: Added accents_db attribute for this.
* htsearch/htsearch.cc: Fixed parsing of search_algorithm not to
use comma as separator, because it may be needed as decimal point
in some locales.
Fri Jun 15 16:30:19 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htfuzzy/Endings.cc (getWords): Undid change introduced in 3.1.3,
in part. It now gets permutations of word whether or not it has
a root, but it also gets permutations of one or more roots that
the word has, based on a suggestion by Alexander Lebedev.
* htfuzzy/EndingsDB.cc (createRoot): Fixed to handle words that have
more than one root.
* installdir/english.0: Removed P flag from wit, like and high, so
they're not treated as roots of witness, likeness and highness, which
are already in the dictionary.
Thu Jun 7 17:09:46 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc: Add new attribute use_doc_date to use
document meta information for the DocTime() field.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Document it.
* htdig/HTML.cc (do_tag): Call Retriever::got_time if use_doc_date
is set and we run across a META date tag.
* htdig/Retriever.h, htdig/Retriever.cc: Add new got_date
function. When called, sets the DocTime field of the DocumentRef
after parsing is completed. Currently assumes ISO 8601 format for
the date tag.
Thu Jun 7 16:48:13 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc: Add new attribute any_keywords to allow
ORing of keywords input parameter.
* htsearch/htsearch.cc (addRequiredWords): Use it. Fix handling
of empty search word list.
* htsearch/Display.cc (excerpt, highlight): Fix handling of case
where "words" is empty but "keywords" isn't.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Document any_keywords.
Thu Jun 7 16:34:41 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc: Add new attribute plural_suffix to set the
language-dependent suffix for PLURAL_MATCHES contributed by Jesse.
* htsearch/Display.cc (setVariables): Use it.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Document it.
Thu Jun 7 16:03:17 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.{h,cc}, htcommon/defaults.cc: Added multi-excerpt
feature and max_excerpts attribute, as contributed by Jim Cole.
* htdoc/THANKS.html, htdoc/attrs.html, htdoc/cf_byname.html,
htdoc/cf_byprog.html: Credit where due, and document attribute.
Thu Jun 7 15:27:33 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc: Backported from 3.2.0b3, fixing these
problems: no longer confused by "; charset=..." in Content-Type,
avoids security problems with popen() and shell parsing untrusted URL
(PR#542, PR#951), avoids predictable temporary file name if mkstemp()
exists, binary output from external converter no longer mangled,
less ambiguous error messages, opens temp. file in binary mode on
non-Unix systems.
Thu Jun 7 15:10:14 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/DocumentDB.{h,cc}: Replace CreateSearchDB() with DumpDB(),
add LoadDB(), both backported from 3.2.0b3.
* htdig/htdig.cc (main, usage), htdig/Makefile.in, htdoc/htdig.html:
Add handling of -m (minimal) option, file input for URLs, and arg 0
handling for htdump & htload.
* htdig/HTML.cc (do_tag): Change all white space to blanks in meta
description tag, for proper ASCII record dumps by htdump, and to fix
bug #405771.
* htlib/String.cc (= operator), htlib/htString.cc: change handling
of 0 length strings. Add readLine() for htload support.
Thu Jun 7 14:41:42 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc (got_href): Fix hop count mishandling.
Thu Jun 7 14:23:47 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htmerge/db.cc (mergeDB), htmerge/words.cc (mergeWords),
installdir/rundig: Fix various htmerge bugs. Quotes the temp.
directory name and word_list name (PR#872). Correctly handles
words beginning with +, - and ! when in extra_word_characters
(PR#952). Corrects problems with bad wordlists generated by
htmerge -m causing it to lose entries in words.db and problems
with the sort program using non-ASCII collating having a similar
effect.
Thu Jun 7 14:13:56 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/htsearch.cc (main), htsearch/Display.cc (setVariables,
createURL, buildMatchList), htdoc/THANKS.html, htdoc/hts_form.html,
htdoc/hts_templates.html: Add Mike Grommet's date range search
feature.
Thu Jun 7 13:57:06 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc (GetLocal, GetLocalUser): Fix to allow compiling
on AIX & other non-GNU compilers.
Thu Jun 7 13:52:20 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (setVariables): Extend the handling of
build_select_lists to handle select multiple, radio buttons and
checkboxes.
* htdoc/attrs.html, htdoc/hts_selectors.html: Describe this.
Thu Jun 7 13:40:13 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htfuzzy/Exact.cc (Exact), htfuzzy/Prefix.cc (Prefix): Set the
name field to the class name, as suggested by Jesse.
Thu Jun 7 13:27:35 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/htdig-3.1.6.spec, contrib/htdig-3.1.6-conf.patch,
htdoc/where.html, .version, README: Bump to version 3.1.6.
Thu Jun 7 11:58:28 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/multidig/*: Backport from 3.2.0b3, including fixes below.
* contrib/multidig/Makefile, gen-collect, db.conf, multidig.conf:
Add missing trailing newlines as pointed out by Doug Moran
<dmoran@dougmoran.com>.
* contrib/multidig/Makefile (install): Make sure scripts have a+x
permissions. Pointed out by Doug Moran.
* contrib/multidig/new-collect: Fix typo to ensure MULTIDIG_CONF
is set correctly.
Thu Jun 7 11:37:52 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/README: Add in descriptions for web site contrib directory,
acroconv.pl & conv_doc.pl.
* contrib/examples/rundig.sh: Update to most recent version for 3.1.x.
* contrib/htparsedoc/htparsedoc: Add in contributed bug fixes from
Andrew Bishop to work on SunOS 4.x machines.
* contrib/acroconv.pl: Added external converter script to convert
PDFs with acroread.
Thu Jun 7 10:41:05 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/ParsedString.cc (get), htsearch/Display.cc (expandVariables):
Use isalnum() instead of isalpha() to allow digits in attribute and
variable names, allow '-' in variable names too for consistency.
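Illustration only, not taken from the ht://Dig sources: a minimal sketch of a name scan that accepts digits through isalnum() and also accepts '-'. The function name variableNameLength and the acceptance of '_' are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of the variable name that starts
// at text[pos]. Letters and digits are accepted via isalnum(); '_' and '-'
// are accepted explicitly ('_' is an assumption, '-' follows the entry above).
std::size_t variableNameLength(const std::string &text, std::size_t pos)
{
    std::size_t end = pos;
    while (end < text.size()) {
        // Cast to unsigned char so isalnum() is well defined for 8-bit input.
        unsigned char c = static_cast<unsigned char>(text[end]);
        if (!std::isalnum(c) && c != '_' && c != '-')
            break;
        ++end;
    }
    return end - pos;
}
```

With isalpha() in place of isalnum(), a name such as max-hits2 would stop at the digit.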
Wed Jun 6 17:13:49 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc (do_tag): Make parsing of meta robots tag case
insensitive.
Wed Jun 6 15:31:00 2001 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/doc2html/DETAILS, contrib/doc2html/README,
contrib/doc2html/doc2html.cfg, contrib/doc2html/doc2html.sty,
contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl,
contrib/doc2html/swf2html.pl: Added version 3.0 of doc2html,
contributed by David Adams <D.J.Adams@soton.ac.uk>.
Mon Jun 4 10:31:45 CEST 2001 Gabriele Bartolini <angusgb@users.sourceforge.net>
* htdoc/cf_byname.html: I forgot to insert the 'restrict' attribute.
Wed May 30 11:30:43 2001 Gabriele Bartolini <angusgb@users.sourceforge.net>
* htsearch/htsearch.cc: Added two new attributes used by htsearch,
restrict and exclude. They give more control over template
customisation through configuration files, making it possible to
restrict or exclude URLs from a search without passing any CGI
variables (a value passed as a CGI variable still overrides the
configured one).
* htcommon/defaults.cc: ditto
* htdoc/attrs.html: ditto
* htdoc/cf_byname.html: ditto
* htdoc/cf_byprog.html: ditto
* htdoc/hts_form.html: ditto
Sat May 5 21:43:32 2001 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in, configure: Add tests for wait.h, sys/wait.h,
mkstemp() and malloc.h.
* acconfig.h, include/htconfig.h.in: Update with autoheader for
new tests.
* htlib/regex.[h,c]: Update with backports from 3.2.0b4 development.
Tue Feb 29 23:04:04 2000 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/DB2_db.cc (Error): Simply fprint the error message on
stderr. This is not a method since the db.h interface expects a C
function.
(db_init): Don't set db_errfile, instead set errcall to point to
the new Error function.
Fri Feb 25 10:11:50 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html (maximum_pages): Describe new behaviour (as of
3.1.4), where this limits total matches shown.
Thu Feb 24 20:24:24 2000 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html: Update to refer to 3.1.5 and edit comments about 3.2.
Thu Feb 24 15:20:08 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/RELEASE.html, htdoc/main.html: Updated notes for 3.1.5 release.
Thu Feb 24 10:37:45 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html (external_parsers): Add references to FAQ 4.8 & 4.9.
(local_default_doc): Give an expanded example.
(logging): Explain log entry format.
(star_blank): Fix some old typos (incorrect references to other attrs.)
Wed Feb 23 13:58:24 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/cgi.cc(init): Fixed bug: array must be freed with
delete [] buf, not just delete buf (from Vadim).
* installdir/syntax.html: Fixed a $(WORDS) I'd missed earlier.
Tue Feb 22 12:40:22 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/RELEASE.html, htdoc/main.html: Updated notes for 3.1.5 release.
* htlib/URL.cc (URL, normalizePath): Fix PR#779, to handle relative
URLs correctly when there's a trailing ".." or leading "//".
Thu Feb 17 15:58:53 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/RELEASE.html, htdoc/main.html: Add notes for 3.1.5 release.
* htdoc/TODO.html, htdoc/author.html, htdoc/bugs.html,
htdoc/cf_general.html, htdoc/cf_types.html, htdoc/cf_variables.html,
htdoc/config.html, htdoc/howitworks.html, htdoc/htdig.html,
htdoc/htfuzzy.html, htdoc/htmerge.html, htdoc/htnotify.html,
htdoc/hts_form.html, htdoc/hts_general.html, htdoc/hts_method.html,
htdoc/install.html, htdoc/isp.html, htdoc/mailing.html,
htdoc/meta.html, htdoc/notification.html, htdoc/require.html,
htdoc/uses.html, htdoc/where.html: Update copyright date and fix
last modified date for automatic CVS update.
Thu Feb 17 14:37:18 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* installdir/htdig.conf: quote all HTML tag parameters.
* htsearch/TemplateList.cc (createFromString), installdir/long.html,
installdir/short.html: Use $&(URL) in templates.
Thu Feb 17 14:01:34 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/htdig-3.1.5.spec: Fix silly typos in %post script,
make cron script a %config file.
Thu Feb 17 10:34:05 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
[ Improve htsearch's HTML 4.0 compliance ]
* htsearch/TemplateList.cc (createFromString): Use file name rather
than internal name to select builtin-* templates, use $&(TITLE) in
templates and quote HTML tag parameters.
* installdir/long.html, installdir/short.html: Use $&(TITLE) in
templates and quote HTML tag parameters.
* htsearch/Display.cc (setVariables): quote all HTML tag parameters
in generated select lists.
* installdir/footer.html, installdir/header.html,
installdir/nomatch.html, installdir/search.html,
installdir/syntax.html, installdir/wrapper.html:
Use $&(var) where appropriate, and quote HTML tag parameters.
Thu Feb 17 10:00:26 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/htdig-3.1.5.spec: Fix %post script to add more descriptive
htdig.conf entries.
Wed Feb 16 16:26:05 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/htdig-3.1.5.spec, contrib/htdig-3.1.5-conf.patch,
htdoc/where.html, .version, README: Bump to version 3.1.5.
* htdoc/THANKS.html: Added new contributors.
* htdoc/FAQ.html, htdoc/main.html: Updated to versions from web site.
Wed Feb 16 15:49:28 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Configuration.h, htlib/Configuration.cc: split Add() method
into Add() and AddParsed(), so that only config attributes get parsed.
Use AddParsed() only in Read() and Defaults().
Wed Feb 16 15:02:47 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/URL.h (encodeURL): Change list of valid characters to
include only unreserved ones.
* htlib/cgi.cc (init): Allow "&" and ";" as input parameter separators.
* htsearch/Display.cc (createURL): Encode each parameter separately,
using new unreserved list, before piecing together query string, to
allow characters like "?=&" within parameters to be encoded.
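Illustration only, not taken from the ht://Dig sources: a minimal sketch of encoding each parameter separately before joining the query string. The unreserved set used here (alphanumerics plus the RFC 2396 mark characters) and the names isUnreserved, percentEncode and buildQuery are assumptions.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Characters left unencoded: alphanumerics plus the RFC 2396 "mark" set.
// This list is an assumption, not copied from htlib/URL.h.
static bool isUnreserved(unsigned char c)
{
    static const std::string marks = "-_.!~*'()";
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           marks.find(static_cast<char>(c)) != std::string::npos;
}

// Hex-encode every character that is not unreserved.
std::string percentEncode(const std::string &in)
{
    std::string out;
    for (unsigned char c : in) {
        if (isUnreserved(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Encode names and values one by one, then join them with '&' and '='.
// Encoding the finished string instead would leave "?=&" inside values
// indistinguishable from the separators.
std::string buildQuery(
    const std::vector<std::pair<std::string, std::string> > &params)
{
    std::string query;
    for (const auto &p : params) {
        if (!query.empty())
            query += '&';
        query += percentEncode(p.first) + '=' + percentEncode(p.second);
    }
    return query;
}
```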
Wed Feb 16 14:42:02 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (encodeSGML, excerpt): Add encoding for
characters that could pose problems in HTML output.
* htsearch/Display.cc (expandVariables, outputVariables): Add support
for $&(var) and $%(var) template variable references. This should
fix PR#750, once we use this in common/*.html.
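Illustration only, not taken from the ht://Dig sources: a minimal sketch of escaping characters that could pose problems in HTML output. The function name escapeForHtml and the exact set of characters escaped are assumptions.

```cpp
#include <string>

// Hypothetical stand-in for an SGML/HTML escaping routine.
std::string escapeForHtml(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;";  break;  // must not start a bogus entity
        case '<': out += "&lt;";   break;  // must not open a tag
        case '>': out += "&gt;";   break;
        case '"': out += "&quot;"; break;  // safe inside quoted tag parameters
        default:  out += c;        break;
        }
    }
    return out;
}
```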
Tue Feb 15 17:21:08 2000 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
[ Applied a whole collection of patches and fixes from the archives ]
* htdig/Server.cc (robotstxt): apply more rigorous parsing of
multiple user-agent fields, and use only the first one.
* htdig/Retriever.cc(GetLocal, GetLocalUser): Add URL-decoding
enhancements to local_urls, local_default_urls & local_default_doc,
to allow hex encoding of special characters.
* htdoc/attrs.html: Document these.
* htdig/Retriever.cc (IsValidURL): Fix problem with
valid_extensions when an "extension" would include part of a
directory path or server name, as contributed by Warren Jones.
Also fix problem with valid_extensions matching failure when URL
parameters follow extension, as reported by fxbois@cybercable.fr.
* htdig/Document.cc (RetrieveLocal), htdig/Document.h,
htdig/Retriever.cc(Initial, parse_url, GetLocal, GetLocalUser,
IsLocalURL, got_href, got_redirect), htdig/Retriever.h,
htdig/Server.cc(Server), htdig/Server.h: Apply Paul B. Henson's
enhancements to local_urls, local_user_urls & local_default_doc.
* htdoc/attrs.html: Document these.
* htsearch/htsearch.cc (setupWords): Fix problem reported by
D.J. Adams, in which bad_words removal failed on upper-case
search words.
* htsearch/Display.cc(setVariables), htcommon/defaults.cc: Added
build_select_lists attribute, to generate selector menus in forms.
* htdoc/hts_selectors.html: Added this page to explain this new
feature, plus other details on select lists in general.
* htdoc/hts_templates.html: Added relevant links to related attributes
and selectors documentation.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added relevant
explanations and links to selectors documentation.
* htlib/QuotedStringList.cc (Create): fix PR#743, where quoted string
lists didn't allow embedded quotes of opposite sort in strings
(e.g. "'" or '"'), and fix to avoid overrunning end of string
if it ends with backslash.
* htcommon/WordList.cc (valid_word): Applied Marc Pohl's fix to make
this 8-bit clean on Solaris.
* contrib/conv_doc.pl, contrib/parse_doc.pl: Applied Warren Jones's
changes to these scripts.
* htdig/PDF.cc (parseNonTextLine): Fix bogus escape sequences
around Title parsing. (Fixes PR#740)
* htsearch/Display.cc (display, displaySyntaxError),
htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htcommon/defaults.cc: Add new attribute "nph" to send out
non-parsed headers for servers that do not supply HTTP headers on
CGI output (e.g. IIS). If nph is set, send out HTTP OK header,
as suggested by Matthew Daniel <mdaniel@scdi.com> (PR#727)
* htdig/Document.cc (getdate): avoid strftime() altogether on
filled-in tm structure, to avoid recurring segfault problems. (PR#734)
* htlib/strptime.cc (mystrptime): Use Warren Jones's fix to deal
with a web server that returns dates with a two digit year field.
(Fixes PR#770)
* htdig/HTML.cc (HTML, parse, do_tag), htcommon/defaults.cc,
htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Add max_keywords attribute to limit meta keyword spamming.
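Illustration only, not taken from the ht://Dig sources: a minimal sketch of the two QuotedStringList behaviours named in the entry above, quotes of the opposite sort embedded in a quoted string and a string that ends with a backslash. The function name splitQuoted and the whitespace-separated input format are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tokenizer, not the QuotedStringList::Create code.
std::vector<std::string> splitQuoted(const std::string &in)
{
    std::vector<std::string> tokens;
    std::string current;
    char quote = 0;          // active quote character, or 0 outside quotes
    bool haveToken = false;  // true once the current token has started

    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\\') {
            // Copy the escaped character only if one exists; a trailing
            // backslash ends the scan instead of reading past the end.
            if (i + 1 >= in.size())
                break;
            current += in[++i];
            haveToken = true;
        } else if (quote != 0) {
            if (c == quote)
                quote = 0;       // closing quote of the same sort
            else
                current += c;    // the opposite quote is ordinary text here
        } else if (c == '"' || c == '\'') {
            quote = c;
            haveToken = true;    // an empty quoted string is still a token
        } else if (c == ' ' || c == '\t') {
            if (haveToken) {
                tokens.push_back(current);
                current.clear();
                haveToken = false;
            }
        } else {
            current += c;
            haveToken = true;
        }
    }
    if (haveToken)
        tokens.push_back(current);
    return tokens;
}
```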
Wed Dec 8 18:19:32 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html, htdoc/bugs.html: Update to refer to latest versions.
(Update for 3.1.4 release.)
Wed Dec 8 18:10:27 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/QuotedStringList.cc (Create): Make sure that an empty
token isn't ignored.
Tue Dec 7 10:26:58 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc (setVariables): Fix a compilation error by
making a statement with '?' an explicit if-else statement.
* htdoc/RELEASE.html: Change case_sensitive fix to a bug-fix,
update release date for 12/9/99. (We certainly didn't release yesterday!)
Mon Dec 6 22:17:21 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(Display): Add missing call to setupTemplates(),
for handling template_patterns. Oops!
* htdoc/attrs.html: Fixed a couple typos in new attributes.
* htdoc/ChangeLog: Update to latest version.
Mon Dec 6 16:41:04 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/main.html: Update news with latest version.
* htdig/htdig.cc(main), htdig/Document.cc(Document),
htcommon/defaults.cc, htdoc/attrs.html, htdoc/cf_byname.html,
htdoc/cf_byprog.html: Add authorization attribute, settable by
htdig -u. Also fixes PR#490, by setting authentication before
robots.txt fetched.
* htdoc/RELEASE.html: Update with latest fix.
Fri Dec 3 17:31:47 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/DocumentRef.cc(Clear): Set docHopCount & docSig to 0,
and clear docEmail, docNotification & docSubject strings to have
a clean slate for Deserialize(), which assumes 0/empty for these.
Fixes problem with hop counts getting clobbered.
* htdoc/RELEASE.html: Update with latest fix.
* htdoc/ChangeLog: Update to latest version.
Fri Dec 3 12:12:19 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc: removed vestiges of internal Postscript
support that never worked, and removed test for application/msword,
which is handled only by external parser.
* htdig/Makefile.in: removed Postscript.o from list.
* htdig/Retriever.cc(parse_url): Fix compilation error;
(Initial, got_href, got_redirect): Try to get the local filename
for a server's robots.txt file and pass it along to the newly
generated server.
* htdig/Server.cc(Server): Retrieve the robots.txt file from the
filesystem when possible; fix compilation error.
* htdig/Server.h(Server): Add local_robots_file parameter to Server().
* htlib/HtWordType.h, htlib/HtWordType.cc: fix compilation errors.
Fri Dec 3 10:52:57 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc(parse, do_tag): Add handling of <img alt=...> text,
fix parsing of words in meta tags, disable indexing of meta tags
when "noindex" state in effect, fix calculations of word positions
to more accurately reflect relative positions.
* htlib/HtWordType.h, htlib/HtWordType.cc: Add HtWordToken() function,
to replace strtok() in HTML parser.
* htdoc/RELEASE.html: Update with latest fixes.
Fri Dec 3 09:02:55 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Configuration(Add): handle strings in single quotes, as in
parm='value'.
Thu Dec 2 16:14:28 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Add Tom Metro's suggested revisions for pdf_parser
and external_parsers.
Thu Dec 2 15:15:03 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/mailing.html: Updated to version from htdig.org web site.
* htcommon/defaults.cc: Add missing no_page_number_text and
page_number_text attribute definitions.
* htdoc/attrs.html(modification_time_is_now): Make the description
a bit clearer as to how it may cut down on reindexing.
Thu Dec 2 13:46:11 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(parse_url), htdig/Server.cc(Server),
htcommon/defaults.cc, htdoc/attrs.html, htdoc/cf_byname.html,
htdoc/cf_byprog.html: Add support for local_urls_only attribute.
* htdoc/RELEASE.html: Update with latest feature.
Thu Dec 2 11:02:07 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/URL.cc(ServerAlias): Fix server_aliases processing to prevent
infinite loop (as for local_urls in PR#688).
Wed Dec 1 17:23:24 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(parse_url), htdig/Server.h: add IsDead() methods
to query and set server status, use them in Retriever to avoid repeated
HTTP requests to a dead server. (Needed for persistent local stuff.)
Wed Dec 1 16:56:28 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(GetLocal): Fix error in GetLocalUser() return
value check, as suggested by Vadim.
* contrib/conv_doc.pl: Added a sample external converter script.
* htdoc/THANKS.html: A couple more additions.
Tue Nov 30 15:02:25 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(IsValidURL): Fix compilation error in
valid_extensions list handling.
* contrib/htdig-3.1.4.spec, contrib/htdig-3.1.4-conf.patch:
Added sample RPM spec file and config patch for it.
Tue Nov 30 14:01:51 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/where.html: Bump to version 3.1.4.
* htdoc/THANKS.html: Added new contributors.
* htdoc/isp.html, htdoc/uses.html, htdoc/main.html, htdoc/mailing.html:
Updated to versions from htdig.org web site.
Tue Nov 30 13:01:20 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/RELEASE.html: Add release notes for 3.1.4 release.
* .version, README: Bump for 3.1.4.
Tue Nov 30 11:03:34 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html(backlink_factor): Added Geoff's clarification of
what this attribute does.
Tue Nov 30 09:47:05 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc(RetrieveLocal): Handle common extensions for
text/plain, application/pdf & application/postscript.
* htdig/Retriever.cc(IsValidURL): Add valid_extensions list handling,
make it and bad_extensions case insensitive.
* htcommon/defaults.cc: Add config attribute valid_extensions,
with default as empty.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Document it.
Tue Nov 30 09:02:02 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(got_href & got_redirect): remove all of Patrick's
case insensitive server code, to replace it with Geoff's fix to URL.cc
* htlib/URL.cc(normalizePath, path): If not case_sensitive,
lowercase the URL. Should ensure that all URLs are appropriately
lowercased, regardless of where they're generated.
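Illustration only, not taken from the ht://Dig sources: a minimal sketch of lowercasing a URL in one place when case sensitivity is off. The function name normalizeCase is an assumption, and its caseSensitive parameter stands in for the case_sensitive attribute.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the URL unchanged when caseSensitive is true,
// otherwise returns it with every character lowercased.
std::string normalizeCase(std::string url, bool caseSensitive)
{
    if (!caseSensitive) {
        for (char &c : url)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return url;
}
```

Doing this inside URL normalization, rather than at each call site, means hrefs and redirects are treated the same way.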
Mon Nov 29 20:25:01 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc, htdig/Retriever.h, htdig/Server.cc(push),
htdig/Server.h: added Alexis's patch for persistent local digging
even if HTTP server is down. Also made new GetLocal() method
call GetLocalUser() itself, to simplify its use, and made it
non-private, for eventual use by Server code.
Mon Nov 29 19:18:20 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(got_href & got_redirect): corrections to case
insensitive server fix, to handle redirects, to make more thorough
use of mapped URL, and to update it after normalization.
Fri Nov 26 17:14:46 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc(RetrieveHTTP): always c.close() the connection
when returning.
* htdig/HTML.cc(HTML & do_tag): add code to turn off indexing between
<style> and </style> tags.
Fri Nov 26 16:31:06 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Configuration.cc(Read): fixed to allow final line without
terminating newline character, rather than ignoring it.
* htlib/String.cc(write): added Alexis Mikhailov's fix to bump up
pointer after writing a block.
* htsearch/Display.cc(setVariables): added Alexis Mikhailov's fix
to check the number of pages against maximum_pages at the right time.
(Put it even earlier, to make sure nPages is at least 1.)
* htsearch/Display.cc(generateStars): Remove extra newline after
STARSRIGHT and STARSLEFT variables, noted by Torsten Neuer
<tneuer@inwise.de>.
Wed Nov 24 20:33:13 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* installdir/htdig.conf: Add bad_extensions to make it
more obvious to users how to exclude certain document types.
Fix the comments for search_algorithm to refer to all the current
possibilities. Add example of no_excerpt_show_top attribute in
line with most users' expectations. (Geoff's changes)
Wed Nov 24 20:02:32 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* installdir/search.html (Match): Add Boolean to default search
form, as suggested by PR#561.
Tue Nov 23 23:03:45 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(setupTemplates), htsearch/Display.h: fixed a
couple of compilation errors in template_patterns code.
Tue Nov 23 22:16:31 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(got_href): Applied Patrick's case insensitive
server fix, to lowercase all URLs if case_sensitive is false.
Tue Nov 23 22:08:22 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/StringList.cc(Join): Applied Loic's patch to fix memory leak.
Tue Nov 23 21:52:18 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
[Applied patch from Hanno Mueller <kontakt@hanno.de>, which includes...]
* contrib/README: Add scriptname directory.
* contrib/scriptname/*: An example of using htsearch within
dynamic SSI pages
* htcommon/defaults.cc: Add script_name attribute to override
SCRIPT_NAME CGI environment variable.
* htdoc/FAQ.html: Update question 4.7 based on including htsearch
as a CGI in SSI markup.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htdoc/hts_templates.html: Update based on behavior of script_name
attribute.
* htsearch/Display.cc: Set SCRIPT_NAME variable to attribute
script_name if set and CGI environment variable if undefined.
Tue Nov 23 21:29:03 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: Added the past few months' updates to the FAQ.
Tue Nov 23 21:20:35 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc, htsearch/Display.h, htsearch/Display.cc,
htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htdoc/hts_templates.html: add template_patterns attribute, to select
result templates based on URL patterns.
Tue Nov 23 20:52:38 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/cgi.h, htlib/cgi.cc(cgi & init), htsearch/htsearch.cc
(main & usage): allow a query string to be passed as an argument.
Tue Nov 23 20:35:05 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(setVariables & createURL),
htsearch/htsearch.cc(main), htdoc/hts_templates.html: handle keywords
input parameter like others, and make it propagate to followups.
Tue Nov 23 20:25:45 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: removed vestigial references to MAX_MATCHES
template variables in search_results_{header,footer}.
* htdoc/hts_form.html: add disclaimer about keywords parameter not
being limited to meta keywords.
* htdoc/meta.html: add description of "keywords" meta tag property.
add links to keywords_factor & meta_description_factor attributes.
Tue Nov 23 20:07:20 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(setVariables & hilight): added Sergey's idea
for start_highlight, end_highlight & page_number_separator attributes.
* htcommon/defaults.cc: added defaults for these.
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: documented them.
Tue Nov 23 19:58:28 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc: added support for external converters
as extension to external_parsers attribute.
* htdoc/attrs.html: Updated external_parsers with new description
and examples of external converters.
Tue Nov 23 19:52:27 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc(transSGML), htdig/SGMLEntities.cc(translateAndUpdate):
Fix the infamous problem in htdig 3.1.3 of mangling URL parameters that
contain bare ampersands (&), and not converting &amp; entities in URLs.
* htdig/Retriever.cc(IsLocal & IsLocalUser): Fix PR#688, where
htdig goes into an infinite loop if an entry in local_urls
(or local_user_urls) is missing a '=' (or a ',').
* htcommon/cgi.cc(cgi): Fix bug in reading long queries via POST
method (PR#668).
* htnotify/htnotify.cc(send_notification): apply Jason Haar's fix
to quote the sender name "ht://Dig Notification Service".
Wed Sep 22 11:12:38 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/ChangeLog, htdoc/isp.html, htdoc/FAQ.html,
htdoc/RELEASE.html, htdoc/THANKS.html, htdoc/attrs.html,
htdoc/bugs.html, htdoc/contents.html, htdoc/main.html,
htdoc/require.html, htdoc/uses.html, htdoc/where.html: Update for
3.1.3 release and synch with latest versions from the website.
Wed Sep 15 17:54:31 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at>
A few changes to satisfy the AIX xlC compiler:
* htdig/htdig.cc: Moved variable declaration out of case block.
* configure.in, htconfig.in: Add check for sys/select.h.
Add "long unsigned int" to the possible getpeername_length types.
* htlib/Connection.cc: Include sys/select.h.
Sun Sep 12 15:02:19 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* .version: Bump for 3.1.3.
* README: Bump first line for 3.1.3 release, remove mention of rx
directory.
* htdoc/ChangeLog: Update with latest version.
* htdoc/RELEASE.html: Add release notes for 3.1.3 release.
Thu Sep 9 14:52:19 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/parse_doc.pl: fix bug in pdf title extraction.
Wed Sep 1 15:58:14 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc(got_word): add code to check for compound words
and add their component parts to the word database.
* htdig/PDF.cc(parseString), htdig/Plaintext.cc(parse): Don't strip
punctuation or lowercase the word before calling got_word. That
should be left up to got_word & Word methods.
* htlib/StringMatch.h, htlib/StringMatch.cc(Pattern, IgnoreCase):
Add an IgnorePunct() method, which allows matches to skip over valid
punctuation, change Pattern() and IgnoreCase() to accommodate this.
* htsearch/htsearch.cc(main, createLogicalWords): use IgnorePunct()
to highlight matching words in excerpts regardless of punctuation,
toss out old origPattern, and don't add short or bad words to
logicalPattern.
* htlib/HtWordType.h, htlib/HtWordType.cc(Initialize): set up and
use a lookup table to speed up HtIsWordChar() and HtIsStrictWordChar().
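Illustration only, not taken from the ht://Dig sources: a minimal sketch of a 256-entry lookup table that replaces a per-character string search. The class name WordCharTable and its constructor argument, which stands in for extra_word_characters, are assumptions.

```cpp
#include <cctype>
#include <string>

// Hypothetical table: built once, then each query is a single array read.
class WordCharTable {
public:
    explicit WordCharTable(const std::string &extraChars)
    {
        // Alphanumerics are word characters by default.
        for (int i = 0; i < 256; ++i)
            table_[i] = std::isalnum(i) != 0;
        // Mark the configured extra characters as word characters too.
        for (unsigned char c : extraChars)
            table_[c] = true;
    }

    bool isWordChar(char c) const
    {
        return table_[static_cast<unsigned char>(c)];
    }

private:
    bool table_[256];
};
```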
Wed Sep 1 15:48:13 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/PDF.cc(parse), htcommon/defaults.cc, htdoc/attrs.html:
Fix PDF.cc to handle acroread in Acrobat 4, which has a bug with
the -pairs option. It turns out that even without the -pairs
option, acroread 4 is still prone to segmentation violations when
generating PostScript, so acroread 3 is a better choice anyway.
* htdoc/FAQ.html: Added the past few months' updates to the FAQ.
* contrib/parse_doc.pl: Updated to latest version, adapted for
xpdf 0.90.
Wed Sep 1 15:39:41 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
Applied "bugfixes" patch collection, which I had posted to
htdig@htdig.org mailing list in August. Changes include...
* htsearch/Display.cc(expandVariables): Fix problem with $(VAR)
at end of template string not being expanded.
* htlib/URL.cc(URL): Fix PR#566 by setting the correct length of the
string being matched. 'http://' is 7 characters. Submitted by
<wolfgang.pichler@creditanstalt.co.at>.
* htdig/HTML.h, htdig/HTML.cc(do_tag, transSGML): Fix the HTML parser
to decode SGML entities within tag attributes.
* htlib/URL.cc(ServerAlias): Fix server_aliases entries so port
defaults to 80 if omitted.
* htlib/URL.cc(removeIndex): Fix the infamous problem with files
like left_index.html not getting indexed. PR#543 & PR#585.
* htdig/PDF.cc(parseNonTextLine): Fixed a bug in the PDF parser:
when the Title header was just the temporary file name, it
wouldn't be used, but it also wouldn't be cleared from the
_parsedString variable, so it ended up polluting the document
excerpt.
* htdig/Document.cc(RetrieveHTTP): Added error messages for unknown
hosts.
* htlib/cgi.cc(cgi): Fix PR#572, where htsearch crashed if
CONTENT_LENGTH was not set but REQUEST_METHOD was.
* htdig/HTML.cc(do_tag): Fix <meta> robots parsing to allow
multiple directives to work correctly. Fixes PR#578, as provided
by Chris Liddiard <c.h.liddiard@qmw.ac.uk>.
* htsearch/htsearch.cc(main): Allow multiple keywords input
parameters in search forms.
* htdig/Document.cc(Reset, readHeader): Fix the bug in the handling
of modification_time_is_now.
* htfuzzy/Fuzzy.cc(getWords), htfuzzy/Metaphone.cc(vscode,generateKey):
Should fix PR#514 in the bug database. It's Geoff's first attempt,
with a minor correction, plus an added test in the vscode macro,
which is where the problem seemed to be happening. This won't
map accented vowels to their unaccented counterparts, but
it should hopefully put an end to the segmentation faults.
* include/htconfig.h.in, htcommon/WordReference.h,
htcommon/WordList.cc(Word, Flush, BadWordFile),
htcommon/DocumentRef.cc(AddDescription), htcommon/defaults.cc,
htsearch/parser.cc(perform_push), htdoc/attrs.html,
htdoc/cf_byname.html, htdoc/cf_byprog.html: Change the maximum word
length into a run-time option, rather than compile-time.
* htsearch/Display.cc(displayMatch): Applied Torsten Neuer's
<tneuer@inwise.de> fix for PR#554.
* htdig/HTML.cc(HTML, do_tag): Added support for <embed>, <object>
and <link> tags.
* htdig/htdig.cc(main): Applied Geoff's patch to hide the
username/password in the command line arguments.
* htdig/Document.cc(readHeader): Fixed a few problems with header
parsing, including PR#535 & PR#557.
* htdig/Document.cc(getdate): This should help with PR#81 & PR#472,
where strftime() would crash on some systems. Idea submitted
by benoit.sibaud@cnet.francetelecom.fr.
* COPYING, htdoc/COPYING, Makefile.in: Updated the FSF address
in COPYING & Makefile.in. PR#595.
* htdig/Retriever.cc(IsValidURL): Fix PR#493, to avoid rejecting
a valid URL with ".." in it.
* htlib/URL.cc(parse): Fix PR#348, to make sure a missing
or invalid port number will get set correctly.
* htsearch/Display.h, htsearch/Display.cc(excerpt): Fix declaration
to refer to "first" as reference--ensures ANCHOR is properly set.
Fixes PR#541 as suggested by <pmb1@york.ac.uk>.
* htdig/ExternalParser.cc(parse): Quote the filename before passing
it to the command-line to prevent shell escapes. Fixes PR#542.
Also make error messages more useful.
* htfuzzy/Endings.cc(getWords): Suffix-handling improvement (PR#560),
to prevent inappropriate suffix stripping in endings fuzzy matches.
* htlib/URLTrans.cc(encodeURL): Fix encoding so all non-ascii
characters get hex-encoded. I think this is what PR#339 was all about.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Added descriptions for attributes that were missing, added
a few clarifications, and corrected a few defaults and typos.
Covers PR#558, PR#626, and then some.
* configure.in, configure, include/htconfig.h.in, htlib/regex.c:
Fix PR#545, to test for presence of alloca.h
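Illustration only, not taken from the ht://Dig sources: a minimal sketch of quoting a file name before it is placed on a shell command line, as in the ExternalParser.cc item in the entry above. The function name shellQuote and the single-quote strategy for a Bourne-style shell are assumptions.

```cpp
#include <string>

// Hypothetical helper: wraps an argument in single quotes for /bin/sh.
std::string shellQuote(const std::string &arg)
{
    std::string out = "'";
    for (char c : arg) {
        if (c == '\'')
            out += "'\\''";  // close the quote, add an escaped ', reopen
        else
            out += c;        // everything else is literal inside '...'
    }
    out += '\'';
    return out;
}
```

Inside single quotes the shell does not interpret characters such as ;, $ or backquotes, so only the single quote itself needs special handling.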
Wed Apr 21 22:45:16 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* .version: Bump for final 3.1.2 release.
* htdoc/where.html, htdoc/FAQ.html: Update to mention the new release.
Tue Apr 20 13:34:22 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/RELEASE.html: Fixed a few typos, updated modification date.
Tue Apr 20 10:54:59 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/RELEASE.html: Add notes on changes in the 3.1.2 release.
* htdoc/contents.html, htdoc/mailarchive.html, htdoc/where.html,
htdoc/uses.html: Update with versions from maindocs.
* installdir/htdig.conf: Add example max_doc_size attribute to cut
down on FAQ, also add comment on including a file for start_url.
Mon Apr 19 15:40:24 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/WordList.cc(valid_word): fixed to avoid having the new
HtIsStrictWordChar() test circumvent the allow_numbers option by
allowing numbers all the time. Also fixed to allow HtIsStrictWordChar()
to override iscntrl(), so extra_word_characters can define characters
that a broken locale would define as control characters.
Mon Apr 19 15:17:12 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/WordList.cc(valid_word): fixed bug introduced Jan 9,
where it stopped scanning for control characters prematurely.
Now also use iscntrl() to detect all control characters.
Fri Apr 16 10:30:42 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: fixed typo - use_meta_description was plural.
Wed Apr 14 20:22:31 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* htlib/regex.h: fixed compile problem with AIX xlc compiler
Tue Apr 13 13:01:04 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(generateStars): Set status to -1 if
URLimage.hasPattern() fails, to avoid empty URLimageList.
(Fix to Mar 31 change.)
Tue Apr 13 11:27:45 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.h(class Display): move enum SortType up to public
section, to avoid problem compiling on IBM AIX C++ compiler.
Mon Apr 12 17:36:20 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: added sections on indexing docs in other languages,
practical & theoretical limits of ht://Dig.
Fri Apr 9 16:47:34 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: Fixed a few typos.
Fri Apr 9 16:24:21 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc(RetrieveHTTP): Show "Unable to build connection"
message at lower debug level.
Fri Apr 9 15:17:53 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/FAQ.html: Added changes in maindocs from Mar 18, a few
clarifications, and four new questions.
Wed Apr 7 19:41:12 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/htsearch.cc (usage): Remove bogus -w flag.
Thu Apr 1 11:58:20 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/htsearch.cc(main): Apply Gabriele's patch to avoid using an
invalid matchesperpage CGI input variable.
* htsearch/Display.cc(display) & (setVariables): Correct any invalid
values for matches_per_page attribute to avoid div. by 0 error.
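
The guard described in this entry can be sketched roughly as follows. This is a minimal illustration, not the code from Display.cc: the helper names `sanitizeMatchesPerPage` and `pageCount`, the fallback of 10, and the upper cap are all assumptions made for the example.

```cpp
#include <cstdlib>

// Hypothetical helper, not the actual ht://Dig code: parse a
// matches-per-page value and fall back to a default when it is
// missing, non-numeric, or less than 1, so that later page-count
// arithmetic never divides by zero. The cap of 100000 is arbitrary.
int sanitizeMatchesPerPage(const char *value, int fallback = 10)
{
    if (value == nullptr)
        return fallback;
    char *end = nullptr;
    long n = std::strtol(value, &end, 10);
    if (end == value || n < 1 || n > 100000)
        return fallback;
    return static_cast<int>(n);
}

// Number of result pages, rounding up; perPage is assumed to have
// been sanitized already and is therefore at least 1.
int pageCount(int matches, int perPage)
{
    return (matches + perPage - 1) / perPage;
}
```

Sanitizing once, at the point where the value enters the program, keeps every later division free of special cases.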
Wed Mar 31 18:21:21 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/htdig.cc: Undo March 30 change.
* htdig/Retriever.cc: Use excludes.hasPattern before using the
exclude list. (More elegant solution to problem, as pointed out by
Gilles.)
* htsearch/Display.cc: Remove code setting URLimage to a bogus
pattern. Instead, check that URLimage.hasPattern() before using
it.
Wed Mar 31 15:16:36 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htfuzzy/Synonym.cc: Fix previous fix of minor memory leak.
(db pointer wasn't properly set)
Tue Mar 30 20:08:18 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/htdig.cc: If exclude_urls attribute is set to empty, set
it to something that will never match a URL to ensure nothing is
excluded.
* Makefile.config.in: Fix typo leading to HTLIBS referring to itself.
Mon Mar 29 16:47:48 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(excerpt): Added patch from Gabriele to
improve display of excerpts--show top of description always,
otherwise try to find the excerpt.
Mon Mar 29 15:57:06 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/htdig.cc: Renamed from main.cc for consistency with other
directories.
* htdig/Makefile.in: Use it.
Mon Mar 29 12:53:17 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/HtWordType.h (HtIsWordChar): Avoid matching 0 when using
strchr.
(HtIsStrictWordChar): Ditto. (Patch from Hans-Peter Nilsson)
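
For context on why 0 needs special handling: strchr() treats the terminating NUL as part of the string, so searching for 0 always succeeds. A minimal sketch of the guard, using a hypothetical function `inCharSet` rather than the real HtIsWordChar (whose exact form is not shown in this log):

```cpp
#include <cstring>

// Hypothetical stand-in for a word-character test; the real
// HtIsWordChar may differ. std::strchr(s, 0) returns a pointer to the
// terminating NUL of s (never a null pointer), so a naive test would
// treat the character 0 as a member of every set.
bool inCharSet(const char *set, int c)
{
    return c != 0 && std::strchr(set, c) != nullptr;
}
```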
Mon Mar 29 10:51:54 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/regex.h, htlib/regex.c: Include glibc versions of the
regex functions to override possibly buggy system versions.
* htlib/Makefile.in: Use them.
* htfuzzy/EndingsDB.cc: Use glibc regex functions instead of rx
for massive speedups on non-English affix files.
* configure, configure.in: Use the system timegm function if present.
Don't configure rx since we don't use it any more. Don't worry
about tsort since that was only needed for rx.
* Makefile.in, Makefile.config.in: Ignore the rx directory if present.
Thu Mar 25 12:24:18 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* installdir/long.html, installdir/short.html: Remove backslashes
before quotes in HTML versions of the builtin templates.
* Makefile.in: Add long.html & short.html to COMMONHTML list, so
they get installed in common_dir.
Thu Mar 25 11:45:59 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(displayMatch), htcommon/defaults.cc,
htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Add date_format attribute suggested by Marc Pohl.
Thu Mar 25 09:49:33 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(displayMatch): Avoid segfault when DocAnchors
list has too few entries for current anchor number.
Wed Mar 24 12:20:02 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/main.cc (main): Call HtWordType::Initialize. (Missed this
one yesterday. Oops!)
Tue Mar 23 17:11:46 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* backport Hans-Peter Nilsson's suite of changes for HtWordType
and extra_word_characters support, to 3.1.2...
* htlib/HtWordType.h (class HtWordType): New.
* htlib/HtWordType.cc: New.
* htlib/Makefile.in (OBJS): Add HtWordType.o
* htdoc/attrs.html: Document attribute extra_word_characters.
* htdoc/cf_byprog.html: Ditto.
* htdoc/cf_byname.html: Ditto.
* htcommon/defaults.cc (defaults): Add extra_word_characters.
* htsearch/htsearch.h: Lose spurious extern declaration of unused
variable valid_punctuation.
* htsearch/htsearch.cc (main): Call HtWordType::Initialize.
(setupWords): Use HtIsWordChar, HtIsStrictWordChar and
HtStripPunctuation. Do not read valid_punctuation.
* htsearch/Display.cc (excerpt): Use HtIsStrictWordChar.
* htlib/StringMatch.cc (FindFirstWord): Ditto.
(CompareWord): Ditto.
* htdig/Retriever.h (class Retriever): Lose member
valid_punctuation.
* htdig/Retriever.cc (Retriever): Lose its initialization.
* htdig/Postscript.h (class Postscript): Lose member
valid_punctuation.
* htdig/Postscript.cc (Postscript): Lose its initialization.
(flush_word): Use HtStripPunctuation.
(parse_string): Use HtIsWordChar,
HtIsStrictWordChar and HtStripPunctuation.
* htdig/Parsable.h (class Parsable): Lose member
valid_punctuation.
* htdig/Parsable.cc (Parsable): Lose its initialization.
* htcommon/WordList.cc (valid_word): Use HtIsStrictWordChar.
(BadWordFile): Use HtStripPunctuation. Do not read
valid_punctuation.
* htcommon/DocumentRef.cc (AddDescription): Use HtIsWordChar,
HtIsStrictWordChar and HtStripPunctuation. Do not read
valid_punctuation.
* htdig/PDF.cc (parseString): Similar.
* htdig/HTML.cc (parse): Similar.
* htdig/Plaintext.cc (parse): Similar.
Tue Mar 23 15:52:33 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* .version: Bump to 3.1.2-dev.
Tue Mar 23 14:50:37 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/String.cc: Fix up code to be cleaner with memory
allocation, inline next_power_of_2, fix some memory leaks.
(Geoff's changes of Feb 22-25)
Tue Mar 23 14:35:37 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/HtWordCodec.cc(HtWordCodec): Fix bug with constructing from
uninitialized variables!
* htlib/HtURLCodec.cc (~HtURLCodec): Add missing deletion of
myWordCodec.
Tue Mar 23 14:18:16 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/PDF.cc(parseString): Use minimum_word_length instead of
hardcoded constant.
Tue Mar 23 12:02:00 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(generateStars): Add in support for use_star_image
which was lost when template support was put in way back when.
Tue Mar 23 11:47:52 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* Makefile.in: add missing ';' in for loops, between fi & done
Mon Mar 22 19:26:56 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/DocumentRef.cc(AddDescription): Check that the
description is neither a null string nor only whitespace before
doing anything.
Mon Mar 22 19:21:16 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Fix #ifdef
problems with zlib.
Mon Mar 22 19:14:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html (template_name): Typo; used by htsearch, not htdig.
Mon Mar 22 19:10:56 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Retriever.cc (got_href): Check if the ref is for the
current document before adding it to the db. (From H-P Nilsson, Mar 8)
Mon Mar 22 19:03:23 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Rephrase and clarify entry for url_part_aliases.
(From Hans-Peter Nilsson, Mar 2)
Mon Mar 22 18:48:10 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htfuzzy/Synonym.cc: Fix minor memory leak.
* htlib/Dictionary.h, htlib/Dictionary.cc(hashCode): Check if key
can be converted to an integer using strtol. If so, use the
integer as the hash code. (Geoff's patch)
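
The idea in the Dictionary entry above can be sketched as follows. This is an assumption-laden illustration, not the actual Dictionary::hashCode: the name `hashCodeSketch` and the fallback string hash (multiply by 31) are invented for the example.

```cpp
#include <cstdlib>

// Hypothetical sketch, not the actual Dictionary::hashCode: if the
// whole key parses as a decimal integer, use that integer as the
// hash; otherwise fall back to a simple multiplicative string hash.
unsigned int hashCodeSketch(const char *key)
{
    char *end = nullptr;
    long n = std::strtol(key, &end, 10);
    if (end != key && *end == '\0')
        return static_cast<unsigned int>(n);

    unsigned int h = 0;
    const unsigned char *p = reinterpret_cast<const unsigned char *>(key);
    for (; *p; ++p)
        h = h * 31u + *p;
    return h;
}
```

Keys that are really numbers (document IDs, for instance) then spread across buckets according to their numeric value instead of their digit characters.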
Mon Mar 22 18:23:11 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/List.cc(Nth): Check for out-of-bounds requests before
doing anything.
Mon Mar 22 17:50:47 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc(display): Free DocumentRef memory after
displaying them.
(displayMatch): Fix memory leak when documents did not have anchors,
fix problems when documents did not have descriptions.
Mon Mar 22 17:32:14 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htmerge/docs.cc(convertDocs): Replace previous verbose patch
with H-P Nilsson's.
Mon Mar 22 17:13:35 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Plaintext.cc, htmerge/words.cc: removed Log lines.
Mon Mar 22 16:11:31 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/htsearch.cc: Add patch from Jerome Alet <alet@unice.fr>
to allow '.' in config field but NOT './' for security reasons.
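
A minimal sketch of the rule stated in this entry, assuming a hypothetical function `configNameAllowed`; it encodes only what the entry says (dots permitted, "./" rejected) and is not the actual htsearch check, which may apply further restrictions.

```cpp
#include <cstring>

// Hypothetical check, not the actual htsearch code: accept a config
// name containing '.', but reject any name containing "./" (which
// also covers "../") so it cannot be used to step through directories.
bool configNameAllowed(const char *name)
{
    return name != nullptr && *name != '\0' &&
           std::strstr(name, "./") == nullptr;
}
```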
Mon Mar 22 15:56:55 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* installdir/long.html, installdir/short.html: Write out HTML
versions of the builtin templates. (committed to 3.1.2 by Gilles)
* installdir/htdig.conf: Add commented-out template_map and
template_name attributes to use the on-disk versions.
Mon Mar 22 15:13:33 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc, htdoc/attrs.html: Change default locale
to "C", as H-P Nilsson recommended.
* htlib/Configuration.cc(Add): Fix small memory leak in locale code,
as Geoff discovered.
Mon Mar 22 15:03:10 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/parse_doc.pl: uses pdftotext to handle PDF files,
generates a head record with punctuation intact, extra checks
for file "wrappers" & check for MS Word signature (no longer
defaults to catdoc), strip extra punct. from start & end of words,
rehyphenate text from PDFs, fix handling of minimum word length.
Mon Mar 22 14:38:01 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Plaintext.cc(parse): Use minimum_word_length instead of
hardcoded constant.
Mon Mar 22 14:33:45 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Configuration.cc(Add): Fix function to avoid infinite loop
on some systems, which don't allow all the letters in isalnum() that
isalpha() does, e.g. accented ones.
* htdig/HTML.cc: Fix three reported bugs about inconsistent
handling of space and punctuation in title, href description & head.
Now makes a distinction between tags that cause word breaks and those
that don't, and which of the latter add space.
Mon Mar 22 14:25:34 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htmerge/docs.cc: Make htmerge -vv report reasons for deleting docs.
* htmerge/words.cc(mergeWords): Fix to prevent description text
words from clobbering anchor number of merged anchor text words.
Fri Mar 19 17:09:21 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc: Fix bug where noindex_start was empty, allow case
insensitive matching of noindex_start & noindex_end.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Fix inconsistencies in documentation for noindex_start & noindex_end.
Fri Mar 19 17:05:16 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc: Add check for <a href=...> tag that is missing a
closing </a> tag, terminating it at next href.
Fri Mar 19 17:00:18 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Document.cc: Fix check of Content-type header in readHeader(),
correcting bug introduced Jan 10 (for PR#91), and check against
allowed external parsers.
* htdig/HTML.cc: More lenient comment parsing, allows extra dashes.
Fri Mar 19 16:52:51 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc: Check for presence of more than one <title> tag.
* htlib/mytimegm.cc: Fix Y2K problems.
Fri Mar 19 16:43:28 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc: Add patch from Gabriele to ensure META
descriptions are parsed, even if 'description' is added to the
keyword list.
Fri Mar 19 16:37:08 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/parser.h, htsearch/parser.cc: Clean up patch made for
error messages, made on Feb 16.
Tue Feb 16 23:48:09 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in, configure: Default to 'int' when we cannot
establish type used by getpeername.
* htdoc/RELEASE.html: Additional notes on everything fixed in 3.1.1.
Tue Feb 16 23:45:26 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* contrib/parse_doc.pl: Add replacement for less-capable (and
buggy) parse_word_doc.pl script. Handles Word, PS, RTF, and
WordPerfect files, with appropriate file->text converters.
* htsearch/parser.cc, htsearch/parser.h: Add more error messages
when the boolean expression is invalid.
Mon Feb 15 21:02:24 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Document.cc(RetrieveLocal): Fix to ensure we report
reading only max_doc_size bytes, even when the document is larger.
* configure.in, configure: Add 'socklen_t' to getpeername check to
prevent problems configuring on Solaris 7.
* htdoc/RELEASE.html: Minor changes for 3.1.1 release.
Sun Feb 14 16:29:48 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Document.cc(retrieveHTTP, retrieveLocal): Fix document
size when the document is larger than max_doc_size. Size should be
that sent by the server or as given by stat().
* htdoc/*.html: More cleanups from Marjolein.
Sat Feb 13 20:53:34 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc(got_word): Ensure heading is in a normal range.
* htdoc/RELEASE.html: Added information on the bugs fixed in 3.1.1.
* htdoc/attrs.html: Added info on the changed syntax of the pdf_parser
attribute in 3.1.0 and later.
Sat Feb 13 20:29:26 1999 Marjolein Katsma <webmaster@javawoman.com>
* htdoc/*.html: Cleaned up HTML, fixed typos, added appropriate
HTML 4.0 syntax, added DTDs to files, other minor fixes.
Fri Feb 12 19:58:28 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* .version: Bump for version 3.1.1.
* configure.in, configure: Fix problems determining getpeername
syntax under IRIX.
* db/os/os_map.c: Fixed problems on AlphaLinux pointed out by Paul
J. Meyer.
Fri Feb 12 12:00:25 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc: Fix crashes noted by Frank Richter.
* contrib/htparsedoc/parse_word_doc.pl: Use updated version (with
fixed line breaks).
* htnotify/htnotify.cc: Add patch mentioned in Feb 8 documentation
change.
Thu Feb 11 00:29:42 1999 Hans-Peter Nilsson <hp@axis.se>
* htcommon/DocumentRef.cc (NUM_ASSIGN): Expand from unsigned types.
(getnum): Use temporary for "unsigned short", and memcpy data into
it instead of assignment.
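
The memcpy technique mentioned for getnum can be illustrated as below. The function name `readUShort` is hypothetical and the real getnum works differently in detail; the sketch only shows why copying into a temporary is safer than assigning through a cast pointer when the serialized data may be unaligned.

```cpp
#include <cstring>

// Hypothetical sketch of the technique: read an unsigned short from a
// possibly unaligned position in a byte buffer by copying into a
// temporary with memcpy, instead of dereferencing a cast pointer
// (which can fault on architectures with strict alignment).
unsigned short readUShort(const char *buffer)
{
    unsigned short tmp;
    std::memcpy(&tmp, buffer, sizeof tmp);
    return tmp;
}
```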
Tue Feb 9 19:21:55 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html, htdoc/where.html: Update for 3.1.0 release.
* htdoc/uses.html: Added remaining backlog.
* htdoc/RELEASE.html: Finish up release notes for 3.1.0.
Tue Feb 9 19:19:13 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc: Ensure we remove the temporary file.
Mon Feb 8 20:28:07 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/ma_menu: Change relative URLs to absolute URLs to
www.htdig.org to reflect the changing mail archive.
* htdoc/install.html: Add notes on new configure flags to set
CONFIG variables.
* htdoc/*.html: Ensure Last Modified date stamps are up-to-date.
Mon Feb 8 20:26:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/meta.html, htdoc/notification.html: Add info on date
formats for the htnotify-date tag, esp. in relation to ISO 8601.
Sat Feb 6 23:24:19 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentRef.cc: Fixed compile problem when zlib is disabled.
* htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html: Added
entries for url_log, compression_level, noindex_start, noindex_end,
allow_in_form, bad_querystr, no_title_text.
* htdoc/THANKS.html: Added Gabriele Bartolini.
* htdoc/uses.html, htdoc/FAQ.html, htdoc/bugs.html: Synch with the
latest versions from the website tree.
Fri Feb 5 19:57:39 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htnotify/htnotify.cc: Add function parse_date() to parse date
strings from htnotify-date tags. It tries to be as flexible as
possible about formatting and will report invalid dates. Based in
part on code contributed by Gabriele Bartolini.
Fri Feb 5 19:28:24 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure, configure.in: Add a test to ensure the zlib.h header
file exists.
* include/htconfig.h.in: Added definition for HAVE_ZLIB_H.
* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Add checks for
HAVE_ZLIB_H in addition to HAVE_LIBZ. Ensures the library is
actually accessible, not just present.
* htfuzzy/Soundex.cc: Fix typo.
Thu Feb 4 22:51:37 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* Makefile.in: Clean up previous patch and tidy up HTML and
dictionary installation.
Thu Feb 4 22:31:35 1999 Ric Klaren <klaren@telin.nl>
* Makefile.in, */Makefile.in: Add support for
$INSTALL_ROOT, making it easier to build packages (e.g. RPMs) into
directories for later processing.
* htsearch/Display.cc: Tiny patch to silence a compiler warning.
Thu Feb 4 13:03:44 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htfuzzy/Soundex.cc(generateKey): Skip initial non-alphabetic
characters and explicitly skip characters without values.
* htfuzzy/Metaphone.cc(generateKey): General bug-fixing, fixing a
bug that corrupted the string to be processed, fixing typos, and
ensuring keys generated fit the metaphone algorithm.
* htfuzzy/Fuzzy.cc(getWords): Add debugging output of the fuzzy
key used.
* contrib/doclist/doclist.pl, contrib/doclist/listafter.pl,
contrib/whatsnew/whatsnew.pl, contrib/urlindex.pl: Change to
support additions to ht://Dig database format.
Thu Feb 4 02:09:22 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/htsearch.cc: Add debugging information on words
returned from fuzzy matching.
* htfuzzy/Metaphone.cc(addWord): Fix bug where only one word would be
stored per key in the database.
* htfuzzy/Soundex.cc(addWord): Ditto.
(generateKey): Rewrite to generate keys correctly.
Wed Feb 3 19:24:36 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/htdig.html: Added documentation on the -l log and restart
feature.
* htdoc/htmerge.html: Added documentation on the -m merge database
feature.
* htdig/main.cc: Added documentation on the -l flag to the usage
message.
* .version: Bump to 3.1.0.
Wed Feb 3 19:09:31 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc: Add check for URLs with no / in the
no_title code.
* htdig/Document.cc: Fix problems with dates returned from servers
with incorrect formats. Those simply missing the day of week are
parsed correctly, otherwise output an error, use the current date,
and keep going.
Wed Feb 3 09:57:14 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* installdir/nomatch.html: Fix small typo.
* htdoc/RELEASE.html: Finish up 3.1.0 release notes.
* htdoc/TODO.html: Update with status and new directions.
Wed Feb 3 14:22:11 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* htsearch/Display.cc(setVariables): Removed some of yesterday's
changes. Thanks to Gilles!
Tue Feb 2 17:26:06 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/PDF.h, htdig/PDF.cc: Fix problems with PDFs generated by
CorelDraw.
* htdoc/attrs.html: Fixed small typo.
Tue Feb 2 21:02:25 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* htsearch/Display.cc(setVariables,createURL): As pointed out by
Gilles, append allow_in_form variables to the query strings only
if they are given as input parameters.
Tue Feb 2 10:29:09 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure, configure.in: Rewrite getpeername_length_t detection
to use prototypes to eliminate type conversion.
* htsearch/Display.cc(buildMatchList): Ensure scores are always
positive or zero.
Mon Feb 1 22:54:02 1999 Hans-Peter Nilsson <hp@axis.se>
* htdoc/attrs.html: Correct "default" for "nothing_found_file".
Mon Feb 1 14:44:32 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc(displayMatch): Remove compiler warnings.
* */Makefile.in: Define INSTALL_PROGRAM from configure script.
Mon Feb 1 14:04:18 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/ExternalParser.cc: Add checks to prevent wayward parsers
from bringing down the dig.
Sun Jan 31 23:15:36 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/WeightWord.cc(set): Ensure word is lowercased for
accurate fuzzy comparisons.
* htfuzzy/Fuzzy.cc(openIndex): Destroy the database reference if
we cannot open the database. Fixes a coredump in classes that
inherit this method.
* Makefile.config.in: Remove bogus definitions of INSTALL.
* Makefile.in: Define INSTALL, INSTALL_PROGRAM, INSTALL_SCRIPT,
and INSTALL_DATA as defined by configure. Use them.
* htdoc/RELEASE.html: Started release notes for version 3.1.0.
Mon Feb 1 04:36:29 1999 Hans-Peter Nilsson <hp@axis.se>
* htsearch/Display.cc (displayMatch): Fix leaking user of
String(String *).
* htfuzzy/Prefix.cc (getWords): Ditto.
* htlib/htString.h, htlib/String.cc (String(const String &)): New.
* htlib/htString.h, htlib/String.cc (String(const String &, int)):
No default argument.
* htlib/htString.h, htlib/String.cc (String(String *)): Removed.
Sun Jan 31 21:46:52 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* htlib/Connection.cc: Include sys/time.h needed by select, fixes
PR #322.
Sun Jan 31 20:50:38 1999 Hans-Peter Nilsson <hp@axis.se>
* htdig/Retriever.cc (Initial, GetRef, Need2Get, IsValidURL,
got_href, got_redirect): Do not lowercase URLs.
* htlib/HtURLCodec.h (class HtURLCodec): Fake a friend function.
Sat Jan 30 22:29:50 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure, configure.in: Add support for program name
transformations.
* */Makefile.in: Do it.
Sat Jan 30 21:16:50 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htmerge/docs.cc: Added translation of Dutch comment for us ignorant
Americans. ;-)
* installdir/rundig: As mentioned by Gilles, use sed with ls -t
test. Add more comments for FAQs.
* configure.in, configure: Add --disable-zlib to turn off compiling
compression entirely. Add --with-cgi-bin-dir,
--with-image-dir and --with-search-dir flags to set CONFIG
variables.
* CONFIG.in: Use them.
Sat Jan 30 21:05:35 1999 Randy Winch <gumby@cafes.net>
* htcommon/DocumentRef.h: If using compressed document databases,
declare compress and decompress functions and the current state of
the head (excerpt).
* htcommon/DocumentRef.cc: Change document compression to only
compress the DocHead field and only decompress when necessary.
Sat Jan 30 03:49:21 1999 Hans-Peter Nilsson <hp@axis.se>
* htcommon/DocumentRef.h: Add #ifdef around declaration of
c_buffer.
* htcommon/DocumentRef.cc: Remove spurious extra "static" from
c_buffer definition. Add #ifdef HAVE_LIBZ around it.
Fri Jan 29 13:30:11 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/htsearch.cc: Construct the StringMatch used for finding
excerpts in two pieces--user input and post-fuzzy matching. Fixes
problems with matching searches with punctuation.
* htlib/StringMatch.cc(IgnoreCase): Fix small memory leak pointed
out by Gilles.
Thu Jan 28 21:36:03 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/*.html: Changed copyright information to mention the
ht://Dig group, removing Andrew's name.
* README, configure.in, Makefile.in: Ditto.
* configure: Change mention of libg++ -> libstdc++.
Thu Jan 28 12:53:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Document new remove_default_doc attribute.
* Makefile.in: Make sure we put the wrapper file in the right place.
Make sure dictionaries are installed with the correct permissions.
* installdir/rundig: Use a portable test for testing the endings
and synonym databases. Also enhanced support for flags (-a, -s,
-vvv, -c config).
* htsearch/Display.cc: Fix bug when sorting results would cause a
coredump.
Wed Jan 27 20:00:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/HTML.cc, htdig/SGMLEntities.cc, htdig/ExternalParser.cc,
htcommon/WordList.cc, htcommon/DocumentRef.cc: Speedup by
converting many config lookups into static variables.
* htdoc/attrs.html, htdoc/hts_templates.html, htdoc/cf_byname.html,
htdoc/cf_byprog.html: Various minor fixes.
* htsearch/Display.cc: Fix problems with star_patterns attribute.
Wed Jan 27 13:02:39 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/SGMLEntities.cc: Use StringMatch class for matching
&quot; &amp; &lt; and &gt; as defined by config options. Should
speed up translation.
* htdoc/THANKS.html: Minor updates for contributions towards 3.1.0.
Tue Jan 26 19:29:08 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* include/htconfig.h.in: Define TRUE and FALSE if not
defined. Change default of NO_WORD_COUNT (now undefined) for
compatibility.
* htdig/htdig.h: Remove definition of TRUE and FALSE (for consistency).
* htcommon/DocumentDB.cc(Add, Delete, Exists, []): Do not
lowercase the URL before storing it. URLs can be case-sensitive.
Tue Jan 26 19:07:03 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htcommon/defaults.cc: Define remove_default_doc as option of
default document to strip off URLs (e.g. /index.html -> /).
* htlib/URL.cc(removeIndex): Use it.
(normalizePath): Fix bug with stripping double slashes and the
like from a query string.
* htdig/Document.h, htdig/Document.cc: Add new variable
contentLength and consider content-length headers when reading in
documents.
* htdig/PDF.cc: Fix broken code calling acroread.
* htsearch/Display.cc: Allow braces in wrapper file.
* htdoc/hts_general.html, htdoc/hts_templates.html: Add info on
the wrapper alternative to separate header and footer files.
* htdoc/config.html, installdir/header.html,
installdir/nomatch.html, installdir/wrapper.html,
installdir/search.html: Change sort option to be more grammatically
correct.
Tue Jan 26 21:19:02 1999 Hans-Peter Nilsson <hp@axis.se>
* htmerge/docs.cc (convertDocs): Use HtURLCodec to encode URLs
going into the doc_index database.
* htsearch/Display.cc (buildMatchList): Use HtURLCodec to decode
URLs from docIndex.
* htcommon/defaults.cc (defaults): Fix typo with "case_sensitive".
Tue Jan 26 18:08:19 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* include/htconfig.h.in: Added HAVE_STRINGS_H. (I forgot that when
I added the configure check.)
* htdig/Retriever.h: Fix small compiler error. Removed Log-lines.
Tue Jan 26 02:22:45 1999 Hans-Peter Nilsson <hp@axis.se>
* htdig/main.cc (main): Fix typo "uncoded_db_compatbile".
Mon Jan 25 19:38:31 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/Configuration.cc(Find): Make error message for missing
entries conditional on the DEBUG symbol. Removes odd error messages
under normal use.
Sun Jan 24 23:55:57 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htmerge/db.cc, htmerge/docs.cc: Fix compiler errors.
* htnotify/htnotify.cc: Similar.
Sun Jan 24 14:13:37 1999 Hans-Peter Nilsson <hp@axis.se>
* htcommon/WordRecord.h (struct WordRecord): Remove member count
if NO_WORD_COUNT defined.
* htmerge/db.cc (mergeDB): Remove handling.
* htmerge/words.cc (mergeWords): Similar.
* include/htconfig.h.in: Define NO_WORD_COUNT by default.
Sun Jan 24 14:13:37 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc(logSearch): Added fix from Gilles in case
REMOTE_ADDR is NULL as well.
* htnotify/htnotify.cc: Fix compiler warnings.
* htlib/String.cc(indexOf): Use autoconf check for strstr, fix
compiler warnings.
* htlib/Configuration.cc(Find): Complain when option is not in the
list.
* htdig/HTML.cc(parse): Move declarations out of the loop.
(parse): Don't add non-word characters to the excerpt if they're
in the title. Fixes PR #80.
Mon Jan 25 02:17:58 1999 Hans-Peter Nilsson <hp@axis.se>
* htcommon/defaults.cc (defaults): New option
"uncoded_db_compatible", default true.
* htcommon/DocumentDB.h (DocumentDB::SetCompatibility): New
function.
(DocumentDB::myTryUncoded): New member.
* htcommon/DocumentDB.cc (Constructor, Add(), operator[],
Exists(), Delete()): Handle uncoded URL in database if
myTryUncoded.
* htdig/main.cc (main): Call (DocumentDB::)SetCompatibility() with
option "uncoded_db_compatible".
* htsearch/Display.cc (Display): Likewise.
* htnotify/htnotify.cc (main): Likewise.
* htmerge/docs.cc (convertDocs): Likewise.
* htmerge/db.cc (mergeDB): Likewise.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Document option "uncoded_db_compatible".
Sun Jan 24 15:21:02 1999 Hans-Peter Nilsson <hp@axis.se>
* htlib/HtWordCodec.cc (HtWordCodec(StringList &, etc)): Check
limits separately for "to" and "from". Do not calculate
string-lengths separately for limit-checking; use methods Count()
and length() on data near the final result.
* htlib/HtWordCodec.cc (HtWordCodec constructors): Do not
explicitly add '\0' to the pattern strings.
* htlib/HtWordCodec.cc (code): Check for zero-length replacement
list.
Sat Jan 23 22:18:18 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc(parse_url): If a server ignores the
If-Modified-Since request, still compare the retrieved date to the
stored date to see if it has been modified.
Sat Jan 23 13:09:03 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htmerge/htmerge.cc: Unlink the db.docs.index file before we
build it again. This ensures we have a clean copy and don't
duplicate URLs.
Fri Jan 22 23:12:12 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* include/htconfig.h.in: Cleaned up preprocessor definitions.
* configure.in, configure: Fix NEED_PROTO_GETHOSTNAME check and
make check for GETPEERNAME_LENGTH_T more flexible.
* htlib/Connection.cc: Change __sun__ to NEED_PROTO_GETHOSTNAME
since we prefer feature tests.
Sat Jan 23 02:38:08 1999 Hans-Peter Nilsson <hp@axis.se>
* htsearch/Display.cc (logSearch): Fix simple typo in last change.
Sat Jan 23 01:18:05 1999 Hans-Peter Nilsson <hp@axis.se>
* htlib/String.cc (operator =): Add const modifier: const String &.
* htlib/htString.h (String::operator=(const String &)): Ditto.
* htlib/DB2_db.h (class DB2_db): Make Put(), Get(), Exists() and
Delete() use const modifiers on appropriate parameters.
* htlib/DB2_db.cc: Ditto.
* htlib/GDBM_db.h (class GDBM_db): Ditto.
* htlib/GDBM_db.cc: Ditto.
* htlib/Database.h (class Database): Ditto.
* htlib/Database.cc (Put): Similar.
* htlib/BTree.h (class BTree): Make Put(), Get() and Exists() use
const modifiers on appropriate parameters.
* htlib/BTree.cc: Ditto.
* htcommon/DocumentDB.cc (Add, operator[], Exists, Delete): Remove
needless temporary String.
* htcommon/DocumentRef.cc (Deserialize): Ditto.
Fri Jan 22 21:10:12 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/Configuration.cc: Add support for keyword "include" to
include other config files.
* htdoc/cf_general.html: Document it.
Thu Jan 21 23:25:37 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc(logSearch): Check if HTTP_REFERER is NULL,
if so, use a dash. (Otherwise we'll kill some syslog() services).
Thu Jan 21 05:30:40 1999 Hans-Peter Nilsson <hp@axis.se>
* htlib/HtURLCodec.h, htlib/HtURLCodec.cc, htlib/HtWordCodec.cc,
htlib/HtWordCodec.h, htlib/HtCodec.cc, htlib/HtCodec.h: New files.
* htlib/Makefile.in (OBJS): Add the corresponding *.o files
* htcommon/DocumentDB.cc (Open, Read, Add, operator[], Exists,
Delete, CreateSearchDB, URLs): Use HtURLCodec; ::encode() and
::decode() the URL used as a key.
* htcommon/DocumentRef.cc (Serialize): Encode the URL using
HtURLCodec.
(Deserialize): Decode it.
* htmerge/htmerge.h: #include <HtURLCodec.h>
* htmerge/htmerge.cc (main): Check HtURLCodec for errors.
* htnotify/htnotify.cc (main): Ditto.
* htsearch/htsearch.cc (main): Ditto.
* htdig/main.cc (main): Ditto.
* htcommon/defaults.cc (defaults): Add common_url_parts and
url_part_aliases.
* htdoc/cf_byprog.html, htdoc/cf_byname.html,
htdoc/attrs.html: Document url_part_aliases and
common_url_parts.
* htlib/StringMatch.h (StringMatch::Pattern): Add default
parameter sep = '|'.
* htlib/StringMatch.cc (Pattern): Similar.
Wed Jan 20 20:20:35 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc(logSearch): Use REMOTE_ADDR when REMOTE_HOST
is unavailable (otherwise we silently dump core). Fixes PR #138.
* htcommon/WordList.cc(valid_word): Words cannot be valid if
they're shorter than minimum_word_length! Fixes PR #139.
* htsearch/Display.cc(expandVariables): Allow variables of the
form ${VAR}, fixes PR #121.
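
The braced form matters when a variable is followed directly by text that could be read as part of its name. The sketch below illustrates the idea only; `expandSketch`, its use of std::map, and the choice to expand unknown names to the empty string are assumptions for the example, not a description of Display::expandVariables.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical sketch, not Display::expandVariables itself: replace
// $NAME and ${NAME} with values from a map; unknown names expand to
// the empty string. Names are letters, digits and underscores.
std::string expandSketch(const std::string &in,
                         const std::map<std::string, std::string> &vars)
{
    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        if (in[i] != '$') {
            out += in[i++];
            continue;
        }
        ++i;                                    // skip '$'
        bool braced = i < in.size() && in[i] == '{';
        if (braced)
            ++i;                                // skip '{'
        std::string name;
        while (i < in.size() &&
               (std::isalnum(static_cast<unsigned char>(in[i])) ||
                in[i] == '_'))
            name += in[i++];
        if (braced && i < in.size() && in[i] == '}')
            ++i;                                // skip '}'
        std::map<std::string, std::string>::const_iterator it =
            vars.find(name);
        if (it != vars.end())
            out += it->second;
    }
    return out;
}
```

With a variable TITLE defined, "${TITLE}page" expands the variable and keeps "page" as literal text, whereas "$TITLEpage" is read as a reference to a different, longer name.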
Wed Jan 20 17:21:33 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htmerge/docs.cc: Fix logic to remove documents--missing else
statements allowed some "deleted" documents to escape removal.
Wed Jan 20 11:52:18 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/good_strtok.h, htlib/good_strtok.cc: Added fixes and speed
improvements contributed by Andrew Bishop.
* htdig/ExternalParser.cc, htdig/Server.cc, htlib/cgi.cc,
htmerge/db.cc, htmerge/words.cc: Call good_strtok with appropriate
parameters (explicitly include NULL first parameter, second param
is char, not char *).
* htcommon/WordList.cc(Word): Added check for adding words with
weight zero.
* htsearch/Display.h, htsearch/Display.cc: Revised setting ANCHOR
variable: it will be empty if there is no excerpt which matches
the search formula. Fixes problems with META descriptions. Based
on a patch contributed by Marjolein.
Wed Jan 20 00:30:12 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/SGMLEntities.cc: Declare extern config, since we now use
config options.
* htsearch/Display.cc: Fix typo causing compile problems.
Tue Jan 19 23:51:38 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Added options translate_amp, _lt_gt, _quot as
suggested by Marjolein to control SGML translation of these
entities.
* htdig/SGMLEntities.cc: Use them as contributed by Marjolein.
Tue Jan 19 12:55:36 1999 Hans-Peter Nilsson <hp@axis.se>
* htlib/StringMatch.cc (Pattern): Always set PreviousState before
checking PreviousValue.
* htlib/StringMatch.cc (FindFirst): Be "greedy"; match longest.
(Compare): Ditto.
* htcommon/DocumentRef.cc (MEMCPY_ASSIGN, NUM_ASSIGN): New macros
for assigning portably to some possibly-enum numeric type.
(getnum): Use them.
* htlib/StringMatch.cc (FINAL): Remove.
(MATCH_INDEX_MASK): Include highest bit.
(Pattern, FindFirst, Compare, FindFirstWord, CompareWord): Do not
use FINAL.
(FindFirst, Compare, FindFirstWord, CompareWord): When shifting by
INDEX_SHIFT, cast to unsigned.
Mon Jan 18 17:43:29 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Added no_title_text option to allow
configuration of the text when no title is available. Default is
the filename.
* htsearch/Display.cc: Use no_title_text to set the title
appropriately, as contributed by Marjolein.
* htsearch/Display.cc: Ensure PERCENT variable has a minimum of 1.
Mon Jan 18 17:41:44 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdig/Server.cc: Use max_doc_size when retrieving robots.txt
files instead of a hard-coded 10k limit.
* htdig/Document.cc: When reading chunks of document, if a chunk
puts us over the max_doc_size limit, take everything up to that
limit (rather than discarding the entire chunk).
* htcommon/DocumentRef.cc: Fix thinko with compression_level.
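The htdig/Document.cc change above (keep everything up to max_doc_size rather than discarding the chunk that crosses the limit) can be sketched as follows. The function name appendChunk and the std::string buffer are assumptions for illustration, not the actual Document code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends chunk to doc without letting doc grow
// beyond maxDocSize bytes. Returns false once the limit has been reached,
// so the caller knows it can stop reading.
bool appendChunk(std::string &doc, const std::string &chunk,
                 std::size_t maxDocSize)
{
    if (doc.size() >= maxDocSize)
        return false;                       // already full
    std::size_t room = maxDocSize - doc.size();
    if (chunk.size() >= room) {
        doc.append(chunk, 0, room);         // keep only what still fits
        return false;
    }
    doc += chunk;                           // whole chunk fits
    return true;
}
```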
Sun Jan 17 21:48:05 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/(attrs.html, cf_byname.html, cf_byprog.html, config.html,
hts_form.html, hts_templates.html): Add documentation for "sort"
config and form input.
* htcommon/defaults.cc: Added options "sort" and "sort_names" to
pick result sorting order and text names for sort options.
* htsearch/Display.cc: Added variable SORT to render a form menu
for sort options, based on "sort" and "sort_names" options.
* installdir/(wrapper.html, header.html, nomatch.html,
footer.html, search.html, syntax.html): Add in sort option to form.
Sun Jan 17 14:03:54 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/TemplateList.h,
htsearch/TemplateList.cc(createFromString): Ensure
template_map config has three members for each template we add,
contributed by Gabriele Bartolini <tlm@mbox.comune.prato.it>.
* htsearch/Display.cc(Display): Take advantage of createFromString
returning an error value to bail out of poorly-constructed
template_maps, based on code contributed by <tlm@mbox.comune.prato.it>.
* htdig/PDF.cc: Add debugging output of URLs causing
problems. Also, switch system call to make it easier to call xpdf
instead of acroread.
* htcommon/defaults.cc: Change default pdf_parser attribute to
include acrobat-specific flags. Fix mismatched naming of
compression_level (was compression_factor).
* htdig/Retriever.cc: Fix compiler warnings.
* contrib/examples/updatedig: Added contributed rundig-type script
from David Robley <webmaster@www.nisu.flinders.edu.au>.
Sun Jan 17 13:42:43 1999 Didier Gautheron <dgautheron@magic.fr>
* htcommon/defaults.cc: add url_log parameter for save and restart
function.
* htdig/Retriever.cc, htdig/Retriever.h: Add save and restart
function.
* htdig/main.cc: Add option -l for save and restart
function.
* htdig/PDF.cc: Check to see if we have acroread before copying
the pdf into TMPDIR!
Fri Jan 15 07:23:30 1999 Hans-Peter Nilsson <hp@axis.se>
* htcommon/DocumentRef.cc(Serialize): Save
space when lengths can fit in an unsigned char or unsigned short.
* htcommon/DocumentRef.cc(Deserialize): Handle expansion.
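As a rough illustration of storing a length in the smallest width that fits, the sketch below writes a width byte followed by 1, 2 or 4 bytes. This layout and the names putLength/getLength are assumptions for the example only; they are not the on-disk format used by DocumentRef, and lengths are assumed to fit in 32 bits.

```cpp
#include <cstddef>
#include <string>

// Assumed layout (illustration only): one width byte (1, 2 or 4) followed
// by that many bytes of the length, least significant byte first.
void putLength(std::string &out, unsigned long len)
{
    int width = (len <= 0xFFUL) ? 1 : (len <= 0xFFFFUL) ? 2 : 4;
    out += static_cast<char>(width);
    for (int i = 0; i < width; ++i)
        out += static_cast<char>((len >> (8 * i)) & 0xFF);
}

// Reads a length written by putLength starting at pos, expanding it back
// to unsigned long, and advances pos past the bytes consumed.
unsigned long getLength(const std::string &in, std::size_t &pos)
{
    int width = static_cast<unsigned char>(in[pos++]);
    unsigned long len = 0;
    for (int i = 0; i < width; ++i)
        len |= static_cast<unsigned long>(
                   static_cast<unsigned char>(in[pos++])) << (8 * i);
    return len;
}
```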
Thu Jan 14 23:37:29 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Added options noindex_start and
noindex_end to enable NOT indexing some sections of
HTML. Contributed by Marjolein.
* htdig/HTML.cc: Use them.
* contrib/examples/rundig.sh: Add rundig example from Colin
Viebrock with a few modifications for using less disk space.
Thu Jan 14 23:27:24 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htlib/URL.cc: Fix parent path logic to ignore slashes in query
string. Noted by Adam Coyne <adam@criticalmass.com>.
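A minimal sketch of the idea behind this fix: when looking for the parent directory of a path, only slashes before the '?' should count. The function name parentPath is hypothetical and this is not the actual URL class code.

```cpp
#include <string>

// Hypothetical helper: returns the directory part of path (including the
// trailing slash), ignoring any '/' that appears inside the query string.
std::string parentPath(const std::string &path)
{
    // find() returns npos when there is no query string; rfind() then
    // searches the whole path.
    std::string::size_type query = path.find('?');
    std::string::size_type slash = path.rfind('/', query);
    if (slash == std::string::npos)
        return "/";
    return path.substr(0, slash + 1);
}
```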
Thu Jan 14 00:04:03 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* README: Fix for upcoming 3.1.0 release.
* htcommon/defaults.cc: Set compression_factor to 0 for default
(no compression).
Thu Jan 14 03:16:15 1999 Hans-Peter Nilsson <hp@axis.se>
* htdig/ExternalParser.cc (parse): Added support for 'm': meta element.
* htdoc/attrs.html: Document it.
Wed Jan 13 21:31:38 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* Makefile.in(install): Add wrapper.html to the common directory
when installing.
* contrib/examples: Added directory for example common files
(e.g. badwords, dictionaries, templates, etc.)
* contrib/examples/badwords: Added example bad_words file by Marjolein.
* .version: Bump to 3.1.0dev.
* htdig/HTML.cc(parse): Added slight fixes to the comment parsing
code, contributed by Marjolein.
Wed Jan 13 20:11:26 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Fix typo with META example.
* htdig/Document.cc: Use new StringList::Join function for
http_proxy_exclude.
* htnotify/htnotify.cc: Bring latest security patch from 3.1.0b4
onto the mainline source.
* installdir/wrapper.html: New file to merge header and footer files.
* htcommon/defaults.cc: Added search_results_wrapper for the
location of the wrapper file, if used. (The default is empty,
which uses header.html and footer.html)
* htsearch/Display.cc: Added support for using the wrapper instead
of header and footer if search_results_wrapper is set.
* htsearch/htsearch.cc: Added check for sort config.
* htsearch/Display.cc, htsearch/Display.h: Added support for
sorting and reverse sorting by date, time, and score.
Wed Jan 13 18:45:17 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Removed use_document_compression
(redundant) and fixed problem with missing comma. Setting
compression_factor to 0 is the equivalent of turning off
use_document_compression.
* htcommon/DocumentRef.cc(Serialize, Deserialize): Update from
Randy Winch to eliminate use_document_compression and fix
compilation problems noted by Hans-Peter.
* htmerge/db.cc: Fixed problem with db.NextDocID() being set
incorrectly, reported by Roman Dimov <roman@mark-itt.ru>.
* htcommon/DocumentDB.h: Added IncNextDocID to allow big changes
in db.NextDocID(), such as those above.
* htdoc/THANKS.html: Added Akos Domotor.
Wed Jan 13 07:07:35 1999 Hans-Peter Nilsson <hp@axis.se>
* htsearch/htsearch.cc (setupWords): Remove parsedWords parameter
with associated processing of original words - deletion of
bad_words, spacing and on-the-fly modifiers.
(main): Create originalWords from input, not via setupWords().
Tue Jan 12 09:16:49 1999 Didier Gautheron <dgautheron@magic.fr>
* htcommon/WordList.cc, htmerge/words.cc: Changed field order
in db.wordlist. With the old order, words from HTML body and words
from links to that url weren't merged sometimes.
* htdig/Document.cc, htmerge/words.cc: Small speed improvements.
* htdig/HTML.cc: Fixed small memory leak with bogus HTML and small
speedups.
* htdig/Retriever.cc(got_href): If ref exists, we have to call
AddDescription even if max_hop_count is reached. This is important
for wwwoffle (URLs in the cache are restricted by max_hop_count).
* htcommon/DocumentDB.cc, htcommon/DocumentDB.h, htdig/Retriever.cc,
htlib/Dictionary.cc, htlib/Dictionary.h, htlib/Object.cc,
htlib/Object.h, htlib/String.cc, htlib/htString.h,
htcommon/WordList.cc: Speedups after gprof data.
Tue Jan 12 07:23:35 1999 Didier Gautheron <dgautheron@magic.fr>
* htlib/Configuration.cc: Fixed time format to standard to avoid
sending If-Modified-Since http headers in native format (which
would be incorrect behavior). Use C locale.
* htlib/Dictionary.h, htlib/Dictionary.cc: Add new method
GetNextElement to directly return next object when iterating.
Tue Jan 12 12:56:26 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentRef.h, htcommon/DocumentRef.cc(serialize,
deserialize): Added support for compressing data using zlib if
available, contributed by Randy Winch <gumby@cafes.net>.
* htcommon/defaults.cc: Added config options
use_document_compression and compression_factor for zlib support.
* configure.in, include/htconfig.h.in: Added autoconf check for
libz and deflate function.
* configure: Generated from above change.
Mon Jan 11 22:48:17 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htmerge/db.cc: Fixed thinko with setting the docIDs of new words
in the destination wordlist.
* htdoc/FAQ.html, htdoc/THANKS.html, htdoc/contents.html: Minor
cleanups.
* htdoc/RELEASE.html: Added release info from 3.1.0b4.
* htdoc/uses.html: Alphabetized, added a form for requests, and
added in lots of new sites.
Mon Jan 11 02:42:51 1999 Hans-Peter Nilsson <hp@axis.se>
* htsearch/htsearch.cc (setupWords): Do not skip words if
"boolean" search.
Mon Jan 11 00:42:51 1999 Hans-Peter Nilsson <hp@axis.se>
* htdoc/hts_method.html: Add explanation of operator "not".
* installdir/syntax.html: Added examples of correct logical
expressions.
Mon Jan 11 00:23:58 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/attrs.html(search_algorithm): Added prefix and substring
matching--somehow slipped through the cracks!
* htdoc/THANKS.html: Update to be more accurate as far as recent
contributions.
Sun Jan 10 00:06:59 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Document.cc(readHeader): Added check for header status
when considering content-types. Fixed PR #91.
Sat Jan 9 20:52:49 1999 Didier Gautheron <dgautheron@magic.fr>
* htcommon/WordList.cc(valid_word): Break out of looping once
we're sure the word is invalid.
* htlib/Dictionary.cc(Remove, Exists): Remember special case of an
empty dictionary.
Sat Jan 9 20:16:25 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/HTML.cc(parse): Don't capitalize headers--this creates
problems with non-ASCII values, since String::uppercase doesn't
know how to capitalize them. Fixes PR #100.
Sat Jan 9 14:47:17 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Document.cc(getdate): Strip off weekday before calling
strptime since some servers return invalid weekdays. Fixes PR #79.
* htmerge/htmerge.h: Declare new mergeDB code.
* htmerge/htmerge.cc: Set up merge_config file and add options for
mergeDB code.
* htmerge/db.cc: New file. Implements merging of two database sets
specified by the merge_config and config variables.
* htmerge/Makefile.in: Add db.o as an object to be compiled.
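The getdate change above (strip the weekday before parsing) might look roughly like the sketch below. The name skipWeekday is an assumption; it only handles dates where the weekday is followed by a comma, as in "Sun, 06 Nov 1994 08:49:37 GMT", and returns other strings unchanged. The remainder could then be handed to strptime with a format such as "%d %b %Y %H:%M:%S".

```cpp
#include <cstring>

// Hypothetical helper: returns a pointer just past a leading "Weekday, "
// prefix, or date itself when no comma is present.
const char *skipWeekday(const char *date)
{
    const char *comma = std::strchr(date, ',');
    if (!comma)
        return date;
    ++comma;
    while (*comma == ' ')       // skip the blanks after the comma
        ++comma;
    return comma;
}
```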
Fri Jan 8 20:11:56 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at>
* htdig/Plaintext.cc: fixed bug that inhibited compressing of
whitespace
* htlib/URL.cc: fixed problem in stripping anchors from URLs
Thu Jan 7 23:29:32 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/HTML.cc(parse): Corrected problems with parsing comments,
as contributed by Marjolein Katsma <webmaster@javawoman.com> and
Gilles.
* htsearch/Display.cc, htsearch/Display.h: Implement
add_anchors_to_excerpt option and new variable ANCHOR as
contributed by Marjolein.
* htdoc/THANKS.html: Added new contributors.
* README: Update for 1999 copyright, version, etc.
Thu Jan 7 17:29:52 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/(attrs.html, cf_byname.html, cf_byprog.html): Fix typo
noted by Joe Jah: keyword_factor -> keywords_factor.
Thu Jan 7 14:32:34 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htsearch/Display.cc (display): The start template, if provided,
should come out after the header, not before.
* htcommon/defaults.cc, installdir/footer.html: Use the
no_page_list_header stuff.
Thu Jan 7 11:09:08 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* installdir/*.png: Add PNG versions of the default GIF graphics.
Wed Jan 6 22:03:54 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htfuzzy/Synonym.cc, htfuzzy/htfuzzy.cc, htmerge/docs.cc,
htmerge/words.cc, htdig/SGMLEntities.cc: Fix minor memory leaks.
* htcommon/defaults.cc: Add .bin, .tgz, .rpm, .mov, .mpg, .avi to
bad_extensions.
* htdoc/attrs.html: Update documentation on default.
* installdir/rundig: Removed check for age of synonym and endings
DB. Nice feature, but it broke under too many shells.
* htlib/DB2_db.cc: Change allocation of database cursors to match
API in new version.
* htdig/Retriever.cc(got_word): Skip changing to lowercase, we do
it in WordList::Word.
Wed Jan 6 14:49:47 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca>
* htdoc/attrs.html: Added four new attributes, fixed defaults & typos.
* htdoc/cf_byname.html: Added four new attributes.
* htdoc/cf_byprog.html: Added four new attributes.
Wed Jan 6 14:37:06 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in: Changed to require Autoconf 2.13 to eliminate bugs
observed by users with older autoconf versions.
* configure: Regenerated using Autoconf 2.13.
Wed Jan 6 13:08:26 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentRef.cc: Applied fix from Dave Alden
<alden@math.ohio-state.edu> to compile under SunPRO compilers
by eliminating trailing comma in enum.
Wed Jan 6 17:50:55 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at>
* {.,htcommon,htdig,htfuzzy,htlib,htmerge,htnotify,htsearch}/
Makefile.in, Makefile.config.in: fixed relative path problem if
install-sh is used.
Wed Jan 6 17:12:04 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at>
* htlib/StringList.cc: fixed bug in StringList::Join (oops!)
Wed Jan 6 10:34:45 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentRef.cc(AddDescription): Remove delete
instruction that fouls up everything (it was removing descriptions
as we add them!).
Wed Jan 6 14:52:11 1999 Hans-Peter Nilsson <hp@axis.se>
* htlib/String.cc (allocate_space): Add missing [] to delete.
Wed Jan 6 05:53:02 1999 Hans-Peter Nilsson <hp@axis.se>
* htcommon/DocumentRef.cc(AddDescription): Do not add non-word
characters to the wordlist.
Wed Jan 6 00:28:19 1999 Hans-Peter Nilsson <hp@axis.se>
* htdoc/cf_byname.html: Fixed html syntax "<br" and "/a>".
Tue Jan 5 22:40:58 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Check if we need to do backlink and date
factoring (e.g. we don't if they're zero!), from a patch by Gilles.
Tue Jan 5 20:57:02 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at>
* configure.in, htlib/Connection.cc: Check for strings.h for those
platforms that don't have it.
Tue Jan 5 14:24:52 1999 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentRef.h: Added comments on the members (fields)
of DocumentRef objects.
* htcommon/defaults.cc: Added new option max_descriptions for
limit on the number of descriptions to store (default 5, matches
behavior pre 3.1.0b3).
* htcommon/DocumentRef.cc: Support restriction of max_descriptions.
* .version: Bump to 3.1.0b5dev.
Tue Jan 5 20:07:05 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at>
* htdig/Retriever.cc: fixed bug in bad_querystring detection
Sat Jan 2 16:39:34 1999 Alexander Bergolth <leo@strike.wu-wien.ac.at>
* htdig/main.cc, htlib/Configuration.cc: Added warning message if
the locale selection was not successful. (e.g. because the locale
definition is not installed) config["locale"] is now set to the
return string of setlocale.
* {.,htcommon,htdig,htfuzzy,htlib,htmerge,htnotify,htsearch}/
Makefile.in, Makefile.config.in, configure.in: Changed to allow
compiling in separate build directories.
Fri Jan 1 05:49:19 1999 Hans-Peter Nilsson <hp@axis.se>
* htdoc/attrs.html: Describe more thoroughly how "pdf_parser"
is used.
* htdoc/attrs.html: Fix typo for anchor/attribute
"allow_virtual_hosts".
* htdoc/attrs.html: Correct and add more verbose description of
external parser program parameters and fields.
Sun Dec 27 14:52:45 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at>
* htlib/URL.cc: Small change in URL::removeIndex so that URLs are not
stripped if a query string ends with /index.html
* htsearch/Display.cc, htnotify/htnotify.cc: Added patches from
Gilles Detillieux <grdetil@scrc.umanitoba.ca> to fix memory leaks.
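The URL::removeIndex change above can be illustrated with a small sketch: a trailing index document is stripped only when the URL has no query string. The name stripIndex and the single indexName parameter are assumptions; the real method lives in htlib/URL.cc and is not reproduced here.

```cpp
#include <string>

// Hypothetical helper: removes a trailing "/<indexName>" document name from
// url (keeping the slash), but leaves URLs with a query string untouched.
std::string stripIndex(const std::string &url, const std::string &indexName)
{
    if (url.find('?') != std::string::npos)
        return url;                         // never touch query strings
    std::string tail = "/" + indexName;
    if (url.size() >= tail.size() &&
        url.compare(url.size() - tail.size(), tail.size(), tail) == 0)
        return url.substr(0, url.size() - indexName.size());
    return url;
}
```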
Sat Dec 19 17:53:44 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at>
* htdig/main.cc, htdig/htdig.h, htdig/Retriever.cc: Added new option
bad_querystr. Allows exclusion when digging CGI-Scripts.
* htsearch/htsearch.cc, htsearch/Display.cc: Added new option
allow_in_form. It does not currently work with some special variable
names!
* htcommon/defaults.cc: Added the two new options.
Sat Dec 19 11:21:38 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* contrib/htparsedoc/parse_word_doc.pl: Update from Jesse.
* .version: Bump for 3.1.0b4.
* README: Ditto.
* Makefile.in: Remove references to version number.
* htnotify/htnotify.cc: Fix nasty security hole found by Werner
Hett <hett@isbiel.ch>.
Sat Dec 19 15:22:38 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at>
* htlib/StringList.cc, htlib/StringList.h: Added StringList::Join
to simplify the creation of patterns for StringMatch.
* htlib/String.cc: lastIndexOf(char ch) added
* htlib/URL.cc: Changed URL::removeIndex to use local_default_doc.
(index.html was hardcoded) local_default_doc can be a list.
* htdig/main.cc, htlib/URL.cc: Use StringList::Join.
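For illustration, joining a list of strings with a separator to form a single alternation pattern could look like this. The free function join and the std::vector argument are assumptions for the example; the actual StringList::Join signature is not shown in this log.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a Join-style helper: concatenates items with
// sep between them, e.g. to build a "a|b|c" pattern.
std::string join(const std::vector<std::string> &items, char sep)
{
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0)
            out += sep;
        out += items[i];
    }
    return out;
}
```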
Sun Dec 13 23:06:35 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Fix potential coredump when calculating
date_factor and backlink_factor on docs that aren't in the
database.
Sat Dec 12 23:17:56 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html:
Added docs for new options since version 3.1.0b2.
* htdoc/RELEASE.html: Added notes on changes since 3.1.0b2 (we
should keep this up rather than all-at-once).
* htdoc/hts_templates: Include documentation on using CGI
environment variables in templates with this version.
* htdig/Retriever.cc(got_href): Added check to prevent
currenthopcount from becoming -1.
* htcommon/WordList.cc: Change undefined minimumWordLength to
config("minimum_word_length").
Sat Dec 12 12:01:55 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* Makefile.in, Makefile.config.in, */Makefile.in: Added target
mostlyclean to clean up, but leave compile-intensive targets
(e.g. db, rx code). General cleanup too.
* htdoc/where.html: Updated for eventual 3.1.0b3 release.
* htcommon/WordList.cc: Added additional cleanups for the words in
the bad word file, in case they have invalid punctuation, etc.
Sat Dec 12 18:41:29 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at>
* htmerge/words.cc: Fix last update so that it compiles on AIX.
Fri Dec 11 10:40:48 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc: Added additional debugging info on the
reason for excluding a URL, based on a patch by Benoit Majeau
<Benoit.Majeau@nrc.ca>.
* htmerge/words.cc: Fixed a bug where pointers, rather than strings,
were assigned. Silly references...
* htsearch/Display.cc, htsearch/Display.h: Added patch from Gilles
to allow CGI environment variables in templates.
* htdig/HTML.cc: Fix core dump when META refresh tags don't have
content portions.
Thu Dec 10 22:28:44 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc, htdig/Server.cc, htdig/Server.h:
Changed support for server_wait_time to use delay() method in
Server. Delay is from beginning of last connection to this
one. Currently this also delays local digging, which may not be ideal.
* htcommon/defaults.cc: Added option for server_max_docs as a
limit on the number of docs returned from a server.
* contrib/htparsedoc/parse_word_doc.pl: New version from
Jesse. New code speedups and better matching of punctuation.
* htdig/Document.cc: Check http_proxy_exclude to see if it's
empty. If so, use the proxy.
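The server_wait_time behavior described above (the delay is measured from the beginning of the last connection) can be sketched as a small calculation. The function name remainingDelay and its parameters are assumptions, not the actual Server::delay() code.

```cpp
#include <ctime>

// Hypothetical helper: number of seconds still to wait before the next
// request, given when the previous connection started.
long remainingDelay(std::time_t lastStart, std::time_t now, long waitTime)
{
    double elapsed = std::difftime(now, lastStart);
    if (elapsed < 0)
        elapsed = 0;                        // clock moved backwards
    if (elapsed >= waitTime)
        return 0;                           // enough time has already passed
    return waitTime - static_cast<long>(elapsed);
}
```

A caller would sleep for the returned number of seconds, so a slow previous request shortens or removes the pause.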
Mon Dec 7 21:46:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/htsearch.cc: Fix thinko with multiple excludes and
restricts. Pointed out by Gilles.
* htcommon/defaults.cc: Add new option server_wait_time for the
number of seconds to wait between requests.
* htdig/Retriever.cc: Use server_wait_time to call sleep() before
requests. Should help prevent server abuse. :-)
* htcommon/WordList.cc(valid_word): Remove unnecessary code.
* htcommon/DocumentRef.cc: Fix typo that added description text
that contained punctuation or was too short.
Sun Dec 6 13:12:55 1998 Geoff Hutchison <ghutchis@ethel.williams.edu>
* htsearch/parser.cc: Check for empty boolean searches and report
an error. Fixes bug reported by Chuck O'Donnell <cao@bus.net>.
* install-sh, mkinstalldirs: Import latest version from autoconf.
* htcommon/DocumentRef.cc: Add the text of descriptions to the
word database with weight description_factor.
* htcommon/WordList.cc: Ensure duplicate words have minimum
location and anchor attributes.
* htcommon/WordRecord.h: Ensure blank WordRecords have a default
count of 1 since a word has to exist to have a WordRecord!
* htdig/ExternalParser.cc, htdig/PDF.cc, htfuzzy/EndingsDB.cc:
Ensure temporary files are placed in TMPDIR if it's set.
* htdig/Retriever.cc: Don't add the text of descriptions to the
word db here, it's better to do it in the DocumentRef itself.
* htmerge/words.cc: Check for word entries that are essentially
duplicates and compact them.
Sat Dec 5 01:10:46 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/THANKS.html: Updated for recent submissions.
* htdoc/FAQ.html: Cleaned up title.
* htdoc/uses.html: Added more sites and cleaned up the HTML.
Fri Dec 4 20:15:41 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* db/os/os_fsync.c, db/mutex/mutex.c: Patch from Klaus Mueller
<K.Mueller@intershop.de> to compile under CygWinB20.
* htdig/HTML.cc: Fix mistake in last update--file was included
twice.
* htdig/Retriever.cc: Do a check for blank URLs before adding them
to the list to be retrieved.
Fri Dec 4 19:21:17 1998 Didier Gautheron <dgautheron@magic.fr>
* htdig/HTML.cc: Fix parser bug with &lt; becoming a tag.
* htlib/Dictionary.cc: Added check for empty dictionaries.
* htlib/URL.cc: Allow server_aliases to work under virtual hosts.
* htmerge/htmerge.cc: Remove previous db.words.db file before
doing a word merging. Fixes bug with deleted documents keeping
entries.
* htdig/main.cc, htdig/Retriever.h, htdig/Retriever.cc: Added
parameter to Initial function to prevent URLs from being checked
twice during an update dig.
* htcommon/WordList.cc, htmerge/words.cc: Don't store c:1 and a:0
entries in db.wordlist to save space.
Fri Dec 4 19:08:28 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in, Makefile.in, Makefile.config.in: Remove DB_DIR and
RX_DIR.
* configure: Regenerated for configure.in changes.
* htsearch/htsearch.cc: Added usage message for the command line.
Fri Dec 4 18:52:55 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html: Added question about phrase matching.
Fri Dec 4 21:21:00 1998 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* configure.in: Check if the third argument of getpeername is a
size_t* or an unsigned int*.
* include/htconfig.h.in: Define GETPEERNAME_LENGTH_T.
* htlib/Connection.cc: Use GETPEERNAME_LENGTH_T as the type of the
third getpeername argument. Included strings.h which is needed for
FD_ZERO on AIX.
Thu Dec 3 23:03:15 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in: Check for getopt.h for those platforms that don't
have it. Fix checks for db and rx dirs since these names won't
change.
* include/htconfig.h.in: Define HAVE_GETOPT_H.
* configure: Generate from configure.in with latest autoconf
(2.12.2).
* htdig/Plaintext.cc: Removed compiler warnings.
* htdig/main.cc, htfuzzy/htfuzzy.cc, htmerge/htmerge.cc,
htnotify/htnotify.cc, htsearch/htsearch.cc: Use configure check to
only include getopt.h when it exists.
* htcommon/defaults.cc: Add new option http_proxy_exclude for
servers that shouldn't use the proxy, from a patch by Gilles
Detillieux.
* htdig/Document.h, htdig/Document.cc: Use it, from a patch by Gilles.
Tue Dec 1 21:36:37 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* Makefile.in: Fixed bug with "make depend," noted by Morgan Davis
<mdavis@cts.com>.
* htdig/main.cc, htfuzzy/htfuzzy.cc, htmerge/htmerge.cc,
htnotify/htnotify.cc, htsearch/htsearch.cc: Add include <getopt.h>
to help compiling under Win32 with CygWinB20.
* htdig/Retriever.cc: Update hopcount correctly by taking the
shortest paths to documents.
* htlib/DB2_db.cc: Added fix from Alexander Bergolth for Berkeley
DB under AIX.
* htlib/StringMatch.cc: Added fix from Christian Schneider
<cschneid@relog.ch>, discovered from behavior with limit_urls_to.
Tue Dec 1 18:06:33 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/hts_form.html: Explained why config fields reject periods.
* htdoc/FAQ.html: Added information about Internal Server Errors.
* htdoc/uses.html: Updated with more sites, change e-mail to Geoff.
Sun Nov 29 21:26:56 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/htsearch.cc: Fix last update so it compiles (oops!).
* htdig/Document.cc: As above!
Sun Nov 29 20:06:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/htsearch.cc: Improved support for multiple restrict and
exclude patterns, based on code from Gilles Detillieux
and William Rhee <willrhee@umich.edu>.
* htdig/Document.cc, htdig/PDF.cc: Fixed problems under FreeBSD
where <sys/types.h> needed to be before <sys/stat.h>, noted by
Gilles.
* htdig/Server.cc: Fixed bug with robots.txt files containing
tabs, based on patch from Christian Schneider <cschneid@relog.ch>.
* htdig/Document.cc: Fixed core dumps caused by mystrptime
returning NULL. Instead, we'll use the current timestamp. Noted by
Michael Hauber <mhauber@datacore.ch> and
<MARK_ALLEYNE@Non-HP-UnitedKingdom-om8.om.hp.com>.
Fri Nov 27 19:09:33 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* db/*: Import of Sleepycat's Berkeley DB 2.5.9
* rx/*: Import of FSF rx 1.5
* configure, configure.in: Updated to deal with changes in db, rx
directories.
* Attic/db-2.4.14.tar.gz: Removed old db package for update.
* htsearch/parser.cc: Removed bogus code with "%01" -> "|"
* htlib/URL.cc: Considers URLs with "%7E" to be equivalent to "~"
* htlib/String.cc: Changed MinimumAllocationSize to cut down on
memory usage on small strings.
* htdig/Retriever.h, htdig/Retriever.cc, htdig/HTML.cc: Changed
Retriever::got_word to check for small words, valid_punctuation to
remove bugs in HTML.cc.
* htcommon/defaults.cc: Changed backlink_factor to 1000,
description_factor to 150, match_method to and, and
meta_description_factor to 50. Should produce more accurate search
results.
* htcommon/WordList.cc: Fixed bug with bad_words and
MAX_WORD_LENGTH, noted by Jeff Breidenbach <jeff@alum.mit.edu>.
* README: Updated to reflect bug-tracking system.
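The htlib/URL.cc change above treats "%7E" as equivalent to "~". A minimal sketch of such a normalization follows; the name normalizeTilde is hypothetical, and accepting the lowercase form "%7e" is an assumption of this example rather than something stated in the entry.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rewrites every "%7E" (or "%7e") in url as "~" so
// that both spellings compare equal.
std::string normalizeTilde(const std::string &url)
{
    std::string out;
    for (std::size_t i = 0; i < url.size(); ++i) {
        if (url[i] == '%' && i + 2 < url.size() && url[i + 1] == '7' &&
            (url[i + 2] == 'E' || url[i + 2] == 'e')) {
            out += '~';
            i += 2;                         // skip the two hex digits
        } else {
            out += url[i];
        }
    }
    return out;
}
```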
Tue Nov 24 15:57:28 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc: Added patch to use local_default_doc with
local_user_urls from Gilles Detillieux
<grdetil@scrc.umanitoba.ca>.
Mon Nov 23 18:57:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/RELEASE.html, htdoc/bugs.html, htdoc/contents.html,
htdoc/where.html: Updated for new bug reporting system.
* htdoc/TODO.html: Updated To Do w/ current status.
Sun Nov 22 14:03:06 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* installdir/rundig: Added checks for synonym databases older than
the synonym files.
* htcommon/defaults.cc: New config options "description_factor"
for weighting words added as link descriptions, and
"no_excerpt_show_top" to show the top of an excerpt instead of the
"no_excerpt_text".
* htdig/Retriever.cc: Use "description_factor" to weight link
descriptions with the documents at the end of the link.
* htsearch/Display.cc: Adjust date_factor and backlink_factor
rankings to produce better results.
* htsearch/Display.cc: Use "no_excerpt_show_top."
* htsearch/htsearch.cc: Don't remove boolean operators from
boolean search strings!
Thu Nov 19 01:31:37 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html: Update for -ldb problem on Digital UNIX.
Wed Nov 18 05:14:53 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/FAQ.html: Update FAQ w/ new questions, better responses.
* htdoc/mailing.html: Mention additional archive at
www.mail-archive.com.
* htdoc/require.html: Update requirements (libstdc++ instead of libg++).
Tue Nov 17 23:13:04 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* contrib/wordfreq/wordfreq.pl: Added changes by Isoif.
* htsearch/Display.cc: Added HTTP_REFERER to htsearch logging
* htdig/Document.cc: Fixed memory leak as a result of thinko.
* htcommon/DocumentRef.cc: Removed limit on number of link
descriptions.
Mon Nov 16 22:30:07 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Declare new config options backlink_factor
and date_factor for counting document backlink counts and modified
dates in rankings.
* htsearch/Display.cc: Use above factors.
* htsearch/ResultMatch.cc: Clarify getScore() comments.
* htlib/mktime.c: Import new version.
* installdir/htdig.conf: Add max_doc_size example (to help w/FAQ).
Mon Nov 16 10:46:15 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/ExternalParser.cc: Add checks for null tokens, adapted
from patch by Vadim Chekan.
* htdig/Retriever.cc: Count docBackLinks accurately (previously
all docs had count of 2!).
Sun Nov 15 17:04:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/HTML.cc(do_tag): Fix for refresh tags w/o URLs.
* htmerge/docs.cc, htmerge/words.cc: Change \r to \n, as mentioned
by Andrew Bishop.
* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Define new fields
docBackLinks (backlink count) and docSig (document signature).
* htdig/Retriever.cc: Keep track of docBackLinks.
* htsearch/Display.cc: Add variable BACKLINKS to display the count.
Sat Nov 14 20:30:18 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/HTML.cc(parse, do_tag): Ensure links respect META robot
settings. Patch contributed by Michael Spann
<mikes@mail.sv.dialogic.com>.
* htdig/HTML.cc(do_tag): Eliminate bug that ignores "?" in URLs
* htdig/HTML.cc(do_tag): Add support for META refresh tags as
"redirects", submitted by Aidas Kasparas
<kaspar@dobilas.infosistema.lt>.
Thu Nov 12 04:13:26 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/contents.html: Added link to jitterbug bug db.
Sun Nov 8 21:10:19 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/ChangeLog, htdoc/RELEASE.html, htdoc/THANKS.html:
Correct spelling error with Rene' Seindal's name.
* htdoc/hts_templates.html: Update to improve clarity.
Sun Nov 8 20:33:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Document.cc: Changed reset to keep proxy settings--fixes
bug noted by Didier Gautheron <dgautheron@magic.fr>
Fri Nov 6 17:07:00 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* contrib/wordfreq/wordfreq.pl: Updated with patch from Isoif
Fettich <ifettich@netsoft.ro> to use Berkeley DB.
* contrib/whatsnew/whatsnew.pl: Fixed mistake from Oct 26 change.
* contrib/htparsedoc/parse_word_doc.pl: Added file contributed by
Jesse.
* contrib/README: Updated to include short descriptions of the scripts.
* contrib/multidig/*: New scripts to make working with multiple DB
a little easier.
* configure, configure.in: Added changes to support snapshots.
* .version: Resurrected to automate snapshot versions.
Wed Nov 4 20:13:10 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdoc/contents.html: Added "Contributors" for THANKS.html
* htdoc/THANKS.html: Added acknowledgement to contributors.
Wed Nov 4 15:02:43 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htnotify/htnotify.cc: Fixed buglet with -F flag to sendmail.
* htdig/Plaintext.cc: Added patch from Vadim Chekan to change char
to unsigned char to fix reading Cyrillic plaintext files.
Mon Nov 2 15:34:53 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htnotify/htnotify.cc, Makefile.config.in, README:
Changed "HTDig" to "ht://Dig."
Sun Nov 1 20:34:14 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* Makefile.in: Fixed buglet with dist target.
* htdig/Makefile.in: Fixed buglet with distclean target.
* htdoc/FAQ.html, htdoc/RELEASE.html, htdoc/attrs.html,
htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/htdig.html,
htdoc/hts_templates.html: Updated documentation for new features,
bug-fixes in ht://Dig 3.1.0b2.
* htlib/Makefile.in, htlib/lib.h: Call mytimegm.cc instead of timegm.c.
* Attic/makedp: Remove file generated by configure
* htdig/Document.cc: Remove const from *ext to fix compiler warning.
Sun Nov 1 00:17:08 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Added template var DESCRIPTION as first
item in DESCRIPTIONS, as requested by Ryan Scott
<test@netcreations.com>.
* htlib/mytimegm.cc: Resurrected mytimegm() until problems with
glibc version can be solved.
* htdig/Document.cc, htdig/Retriever.cc, htfuzzy/Prefix.cc,
htsearch/WeightWord.cc, htsearch/htsearch.cc: Replaced system
calls with htlib/my* functions.
Sat Oct 31 23:58:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/URL.cc: Fixed compiler warning.
* rx-1.5/Attic/Makefile, rx-1.5/Attic/config.log:
Removed useless Makefile and config.log file.
Tue Oct 27 22:53:03 1998 Andrew Scherpbier <andrew@contigo.com>
* */Makefile.in (depend): Fixed so that 'make depend' works
again. (Not sure exactly how long it was broken!)
Tue Oct 27 20:00:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* Makefile.in: Fix buglet with distclean target
* configure, configure.in: Added check for LOCALTIME_R, removed
test for timegm replacement, changed compiler for most tests to
$CC.
* include/htconfig.in: Added option for LOCALTIME_R.
* htlib/timegm.c, htlib/mktime.c: Fixed some compilation problems.
* htlib/Makefile.in: Remove mktime.o since source is included in
timegm.o.
Tue Oct 27 13:31:25 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/mktime.c: Imported new version from glibc-2.0.99.
* htcommon/DocumentDB.cc: Fixed bug noted by Vadim Chekan with
CreateSearchDB.
Mon Oct 26 15:27:28 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* Makefile.config.in, configure.in, configure: Fixed problem with
-ldb, -lrx, etc. not being declared in $LIBS
* htdoc/install.html: Added remarks about using ./configure
--prefix=
* README: Cleaned up for new URLs, version numbers, etc.
* htsearch/htsearch.cc: Added patch by Esa Ahola fixing bug with
not ignoring bad_words properly.
* contrib/whatsnew/whatsnew.pl: Added fix from Jacques Reynes
<Jacques.Reynes@cict.fr> to get whatsnew to work with Berkeley DB.
* htdig/Retriever.cc, htdig/Document.cc: Fixed bug introduced by
Oct 18 change. Authorization will not be cleared.
* htlib/URL.cc: Fixed new -Wall warnings.
Wed Oct 21 13:30:05 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/timegm.c: Corrected Oct 17 change. Should now work. :-)
* htcommon/defaults.cc: Added defaults for new directives
server_aliases and limit_normalized.
* htdig/HTML.cc: Cleaned up HTML parsing based on patch by Rene'
Seindal.
Wed Oct 21 18:31:00 1998 Alexander Bergolth <leo@leo.wu-wien.ac.at>
* htlib/URL.cc, htlib/URL.h: Added patch to support translation of
server names. (Configuration directive: server_aliases)
* htdig/Retriever.cc, htdig/htdig.h, htdig/main.cc:
Additional limiting after normalization of the URL.
(Configuration directive: limit_normalized)
Sun Oct 18 17:19:51 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/Connection.h, htlib/Connection.cc: Define new function
timeout() as adapted from a patch by Rene' Seindal.
* htdig/Document.cc: Use it as adapted from a patch by Rene' Seindal.
Sun Oct 18 16:33:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentDB.cc: Changed deserialize function to
explicitly delete DocumentRef.
* htcommon/DocumentRef.cc: Added trap for DOC_STRING value.
* htdig/Retriever.cc: Delete and reallocate Document variable
before retrieving. (Fixes database corruption bug) Removed code to
add a "/" to every URL with a 404--servers should send a redirect
in this case.
Sat Oct 17 20:15:44 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/timegm.c: Declare __gmtime_r if not defined
Sat Oct 17 10:15:57 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* configure.in: Fixed problem with configuring DB_DIR introduced
by Oct 11 change.
* configure: Regenerated by autoconf for above fix.
* htlib/Connection.h, htlib/Connection.cc: Included fixes sent by
Paul J. Meyer <pmeyer@rimeice.msfc.nasa.gov> to fix connections on
Dec Alpha environments.
* htsearch/Display.cc, htsearch/Display.h,
htdoc/hts_templates.html: Added variable CURRENT as the number of
the current match, adapted from a patch by Rene' Seindal
<seindal@webadm.kb.dk>
* htcommon/defaults.cc: Changed htdig.sdsu.edu to www.htdig.org in
start_urls
Wed Oct 14 03:43:22 1998 turtle <turtle@kiwi>
* installdir/htdig.conf: fixed broken link pointed out by
chris@impulsedata.net, moved maintainer stuff up in the file
Sun Oct 11 22:16:27 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/DB2_db.cc: Added fix suggested by Domotor Akos
<dome@impulzus.sch.bme.hu> with (char *)NULL cast.
* htlib/Attic/mytimegm.cc: Removed old mytimegm function.
* installdir/syntax.html: Improved boolean method error
message. It now gives examples of boolean expressions.
* htcommon/defaults.cc, htsearch/Display.cc, htsearch/Display.h,
htsearch/parser.cc: Added htsearch logging patch from Alexander
Bergolth.
* */Makefile.in, include/htconfig.h.in, htdig/Document.cc,
htdig/Images.cc, Attic/.version, Makefile.config.in, Makefile.in,
configure, configure.in, mkinstalldirs: Updated Makefiles and
configure variables.
* htfuzzy/Endings.cc, htfuzzy/Fuzzy.cc, htfuzzy/Prefix.cc,
htfuzzy/htfuzzy.cc, htlib/DB2_db.cc, htcommon/DocumentDB.cc:
Removed more -Wall warnings.
Fri Oct 9 00:29:18 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc: Fixed typo with "meta_desription_factor".
* htdig/Images.cc: Use user_agent config in GET request.
Thu Oct 8 09:05:41 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* installdir/syntax.html: Improved Boolean search description.
Mon Oct 5 11:30:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* contrib/ewswrap/ewswrap.cgi, contrib/ewswrap/htwrap.cgi,
contrib/ewswrap/README: New scripts, contributed by John Grohol
PsyD <johngr@cmhcsys.com>.
Fri Oct 2 13:11:24 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc: Added check for docs removed with
noindex. Now words in these docs should be ignored for the word
db.
Fri Oct 2 13:09:04 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* CONFIG, Makefile.config.in, Makefile.in, */Makefile.in,
htcommon/defaults.cc, htdig/main.cc, htfuzzy/htfuzzy.cc,
htmerge/htmerge.cc, htnotify/htnotify.cc, include/htconfig.h.in:
More configure improvements--use top_srcdir instead of
HTDIG_TOP, use PACKAGE, VERSION, etc.
Fri Oct 2 11:32:59 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/StringList.cc: Added patch by Alexander Bergolth for bug
with multiple delimiter characters
Fri Oct 2 15:22:06 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* installdir/rundig, configure.in, CONFIG, CONFIG.in, aclocal.m4,
configure: Improvements in configure.in, notably using --prefix=
and --exec-prefix=
Tue Sep 29 19:26:11 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/HTML.cc: Added patch from Tim Frost <tim@nz.eds.com> for
single quotes around URLs.
* htfuzzy/Prefix.cc: Added patch from Esa to fix Prefix matching
for capitalization.
* htcommon/defaults.cc: Added modification_time_is_now config
* htdig/Document.cc, htdig/Retriever.cc: Added patch from Andrew
Bishop <amb@gedanken.demon.co.uk> for above to use modification
times when servers do not supply them.
* htsearch/htsearch.cc: Added patch from Andrew Bishop for -c switch.
Wed Sep 23 14:46:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc, htdig/Server.cc: Added case_sensitive
attribute to work on case insensitive servers.
Wed Sep 23 11:58:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: re-fixed bug noted by Alexander Bergolth
* htlib/Attic/timegm.cc, htlib/Makefile.in, htlib/mktime.c,
htlib/mytimegm.cc, htlib/timegm.c: Switched to using glibc timegm
replacement.
* configure, configure.in, Makefile.config.in: Add configure
searches for acroread and sendmail programs.
* htnotify/Makefile.in, htnotify/htnotify.cc,
htcommon/Makefile.in, htcommon/defaults.cc: Use them.
* htdig/HTML.cc: Fix thinko in META robots tag.
* htcommon/defaults.cc: Define iso_8601 date formatting option
* htsearch/Display.cc, htnotify/htnotify.cc: Use it as suggested
by Knut A. Syed <Knut.Syed@nhh.no>
Fri Sep 18 14:35:02 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Fixed bug noted by Alexander Bergolth
<leo@strike.wu-wien.ac.at> in exclude logic
* htdig/HTML.cc: Fixed bug in comma-separated keywords noted by
<C.H.Liddiard@qmw.ac.uk>
* installdir/synonyms: New version contributed by John Banbury
<lijab@flinders.edu.au>
Fri Sep 18 00:38:09 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* .version: Bump to 3.1.0b2
* htsearch/Makefile.in, htdig/Makefile.in, htfuzzy/Makefile.in,
htlib/Makefile.in, htmerge/Makefile.in,
htnotify/Makefile.in, htcommon/Makefile.in: Remove include
.sniffdir directive.
* htdig/HTML.cc: Fix horrible META description coding.
* htfuzzy/EndingsDB.cc, htfuzzy/Fuzzy.cc, htfuzzy/Synonym.cc,
htfuzzy/htfuzzy.cc: Change "\r" to "\n" in statistics on
suggestion of Andrew M. Bishop <amb@gedanken.demon.co.uk>
* Makefile.config.in: Remove -ggdb from LDFLAGS.
Tue Sep 15 22:31:48 1998 turtle <turtle@kiwi>
* Makefile.in: add substitution for @DATABASE_DIR@
Thu Sep 10 00:06:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/HTML.cc: Change debug level of META tags.
* htsearch/TemplateList.cc, htsearch/htsearch.cc, htsearch/Display.cc,
htsearch/Display.h: Backed out builtin-long default from Monday, now
use error handler
Mon Sep 7 23:19:12 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* contrib/htparsedoc: Added contributed external parser for MS
Word documents by Richard Jones <rjones@imcl.com>.
* htdig/Document.cc: Added fix to use htparsedoc.
* htdoc/*.html: Merged in new documentation for htdig-3.1.0b1.
* htdig/HTML.cc: Extended "noindex" behavior in previous patch.
* htcommon/defaults.cc: Added user_agent config option.
* htdig/Document.cc: Use it.
Mon Sep 7 00:34:19 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/DocumentRef.h: Added DocState for documents marked as
"noindex".
* htdig/HTML.cc, htdig/Retriever.h, htdig/Retriever.cc,
htmerge/docs.cc: Use it to remove them.
* htsearch/TemplateList.cc: Add default template of builtin-long
to slot 0 in case of an error.
* htsearch/Display.cc: Use it.
Sun Sep 6 21:36:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htcommon/defaults.cc: Sorted the current list of defaults, added
"pdf_parser" for the program to use in PDF.cc.
* htdig/PDF.cc: Use it, checking for the file before calling
system to fail gracefully.
* htlib/URL.cc: Bug fix for http:/ v. http://
Sat Sep 5 23:11:48 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/String.cc: Added patch by Zvi Har'El
<rl@math.technion.ac.il> to indexOf function to prevent "false
positive" matches.
* installdir/nomatch.html, installdir/syntax.html: Fixed reference
to ht://Dig 3.0.
* htdig/Document.cc: Use robotstxt_name as user-agent as a more
consistent approach.
* htsearch/parser.cc: Convert "%01" to "|" to support <SELECT
... MULTIPLE> tags.
Thu Sep 3 20:53:51 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Makefile.in: Remove reference to -lgdbm
* htsearch/Display.cc: Send Content-type header after all variable
expansion is completed.
* htcommon/WordList.cc: Removed warning under egcs-1.1
Tue Aug 11 08:58:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc, htdig/Retriever.h,
htdig/Retriever.cc, htdig/Parsable.h, htdig/Parsable.cc,
htdig/HTML.h, htdig/HTML.cc, htcommon/defaults.cc,
htcommon/DocumentRef.h, htcommon/DocumentRef.cc,
htcommon/DocumentDB.cc:
Second patch for META description tags. New field in DocDB for the
desc., space in word DB w/ proper factor.
* htmerge/docs.cc: Added statistic for total size of docs in DB.
Thu Aug 6 10:15:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Retriever.cc: Added "local_dir_doc" config option,
the default filename in a directory.
* htcommon/defaults.cc: Fixed "elipses" spelling mistake,
local_dir_doc as above
Tue Aug 4 11:34:46 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htlib/Configuration.cc: Added fix by Philippe Rochat
<prochat@lbdsun.epfl.ch> to remove whitespace after config
options.
* htdig/HTML.cc, htdig/HTML.h: Added support for META robots tags.
Mon Aug 3 16:50:46 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/ResultList.cc, htnotify/htnotify.cc,
htmerge/htmerge.cc, htmerge/docs.cc, htlib/String.cc,
htlib/ParsedString.cc, htfuzzy/Substring.cc,
htfuzzy/Prefix.cc, htfuzzy/Exact.cc,
htdig/SGMLEntities.cc, htdig/Retriever.cc, htdig/PDF.cc,
htdig/HTML.cc, htdig/Document.cc:
Fixed compiler warnings under -Wall
Mon Aug 3 05:56:23 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Spelling correction for "ellipses"
Thu Jul 23 12:14:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/PDF.cc, htdig/PDF.h, htdig/Document.cc: Added files (and
patch) from Sylvain Wallez for PDF parsing. Incorporates fix for
non-Adobe PDFs.
* htcommon/defaults.cc: Removed .pdf extension from bad_extensions.
Wed Jul 22 10:04:31 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Added patch from Sylvain Wallez
<s.wallez.alcatel@e-mail.com> to use the filename if no title is found.
* htnotify/htnotify.cc: Added patch from Chris Jason
Richards <richards@cs.tamu.edu> to fix problems with sendmail.
Tue Jul 21 09:56:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htsearch/Display.cc: Added patch by Rob Stone
<rob@psych.york.ac.uk> to create new environment variables to
htsearch: SELECTED_FORMAT and SELECTED_METHOD.
Sun Jul 19 09:51:47 1998 Andrew Scherpbier <andrew@contigo.com>
* configure.in (berkeley db stuff): Added the berkeley db .tar.gz
to the distribution and modified configure.in to extract it if it
needs to.
Thu Jul 9 09:39:01 1998 Geoff Hutchison <ghutchis@wso.williams.edu>
* htdig/Server.cc, htdig/Retriever.h, htdig/Retriever.cc,
htdig/Document.h, htdig/Document.cc, htcommon/defaults.cc: Added
support for local file digging using patches by Pasi Eronen
<pe@iki.fi>. Patches include support for local user (~username)
digging.
* htdig/HTML.h, htdig/HTML.cc, htcommon/defaults.cc:
Added support for META name=description tags. Uses new config-file
option "use_meta_description" which is off by default.
Mon Jun 22 05:02:01 1998 turtle <turtle@kiwi>
* configure.in:
Added test to make sure that the berkeley db library is present
* .cvsignore: Ignore the berkeley db library
* configure: changed
* Makefile.config.in: Removed GDBM references
* Makefile.in: Removed GDBM references
* .version: updated version to 3.1.0b1
* README: Updated version # and website location
* htdig/HTML.cc: Applied patch that prevented SGML entities that
translate to valid_punctuation characters from becoming part of
words
* configure.in: Removed references to GDBM
* htcommon/defaults.cc: Got rid of my email address as the default
maintainer
* htdig/htdig.conf: simple config file for development
* htlib/String.cc, htlib/Attic/SDSU.h, htlib/Attic/SDSU.cc,
htlib/DB2_db.cc, htlib/Connection.cc, htlib/Configuration.cc,
htlib/BTree.cc: New Berkeley database stuff
* htlib/.sniffdir/ofiles.incl: removed SDSU.*
* installdir/syntax.html, installdir/search.html,
installdir/rundig, installdir/nomatch.html, installdir/htdig.conf,
installdir/footer.html: Changed to use the new
http://www.htdig.org/ instead of the sdsu site
Sun Jun 21 23:20:14 1998 turtle <turtle@kiwi>
* rx-1.5/rx/Attic/config.log, htsearch/htsearch.cc,
htsearch/Attic/display.cc, htsearch/Display.cc, htmerge/docs.cc,
htlib/.sniffdir/ofiles.incl, htlib/Database.h, htlib/DB2_db.cc,
htlib/DB2_db.h, htlib/Database.cc, htfuzzy/.sniffdir/ofiles.incl,
htfuzzy/Prefix.cc, htfuzzy/Prefix.h, htfuzzy/Makefile.in,
htfuzzy/Fuzzy.cc, htcommon/defaults.cc, configure.in, Makefile.in,
Makefile.config.in: patches by Esa and Jesse to add BerkeleyDB and
Prefix searching
Mon Jun 15 18:15:50 1998 turtle <turtle@kiwi>
* htdig/HTML.cc: Added suggestion by Chris Liddiard to add ',' to
the list of separator characters for meta keyword parsing
Tue May 26 03:58:14 1998 turtle <turtle@kiwi>
* rx-1.5/rx/Attic/config.log, htlib/htString.h, htlib/cgi.cc,
htlib/URL.cc, htlib/String.cc, htlib/ParsedString.cc,
htlib/Database.cc, htlib/Connection.cc: Got rid of compiler
warnings.
* rx-1.5/rx/.cvsignore: added config.log
Fri Apr 3 17:10:44 1998 turtle <turtle@kiwi>
* htsearch/Display.cc: Patch to make excludes work
Tue Mar 10 16:02:32 1998 turtle <turtle@kiwi>
* htlib/strcasecmp.cc: Applied patch by Bernhard Griener to add
arguments checks in the mystrncasecmp() function
Sun Feb 22 17:43:49 1998 turtle <turtle@kiwi>
* htdoc/mailing.html: New mailing list archive location
Tue Feb 17 18:05:40 1998 turtle <turtle@kiwi>
* htdoc/uses.html: added new one
Thu Feb 12 22:22:15 1998 turtle <turtle@kiwi>
* htdoc/uses.html: Added more sites
Mon Jan 5 06:14:11 1998 turtle <turtle@kiwi>
* configure, configure.in: Added check for fstream.h to get rid of
the annoying emails about ht://Dig not compiling...
* Makefile.config.in: Added include of the GDBM library back
* .version: Now at version 3.0.9
* include/htconfig.h.in: Changed refs to time related stuff
* htmerge/htmerge.cc, htmerge/docs.cc: format changes
* htdig/Document.cc: Changed tm from pointer to real structure
* htlib/.sniffdir/ofiles.incl, htlib/timegm.cc: Our own timegm
function
* rx-1.5/rx/.cvsignore, rx-1.5/rx/Attic/Makefile: cvs cleanup
* htmerge/docs.cc: Fixed memory leak
* htlib/lib.h: Added own replacement of timegm()
* htlib/Dictionary.cc: Fixed memory leaks
* htlib/Connection.cc: Fix by Pontus Borg for AIX. Changed
'size_t' to 'unsigned long' for the length parameter for
getpeername()
* htfuzzy/Metaphone.cc: formatting changes
* htdig/Retriever.cc: fixed memory leak
* htdig/Document.cc: Alarm was not cancelled if readHeader
returned anything but OK. Use our own timegm() replacement if
necessary.
* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: format changes
* htcommon/DocumentDB.h: reformatting
* htcommon/DocumentDB.cc: Fixed major memory leak
* include/.cvsignore, include/Attic/htconfig.h, rx-1.5/.cvsignore,
rx-1.5/Attic/config.cache, rx-1.5/Attic/config.status,
rx-1.5/rx/.cvsignore, rx-1.5/rx/Attic/config.status,
htlib/Attic/htlib.proj, htmerge/.cvsignore,
htmerge/Attic/htmerge.proj, htnotify/.cvsignore,
htnotify/Attic/htnotify.proj, htsearch/.cvsignore,
htsearch/Attic/htsearch.proj, Attic/config.cache,
htcommon/Attic/htcommon.proj, htfuzzy/.cvsignore,
htfuzzy/Attic/htfuzzy.proj, lookfor: General cleanup of archived
stuff
* .cvsignore: config.cache added
* htdig/.cvsignore: Added htdig
Tue Dec 16 15:57:22 1997 turtle <turtle@kiwi>
* htdig/Document.cc: Added little patch by Tobias Oetiker
<oetiker@ee.ethz.ch> that should fix problems with timeouts.
Thu Dec 11 00:28:59 1997 turtle <turtle@kiwi>
* htlib/URL.h, htlib/URL.cc: Added double slash removal code.
These were causing loops.
Thu Oct 23 18:01:10 1997 turtle <turtle@kiwi>
* htlib/Connection.cc: Fix by Pontus Borg for AIX. Changed
'size_t' to 'unsigned long' for the length parameter for
getpeername()
Mon Oct 13 02:13:52 1997 turtle <turtle@kiwi>
* htdig/Attic/Makefile, htdig/Attic/htdig.proj: remove files that
shouldn't be in the repository
* htdig/.cvsignore: Ignore Makefile
* htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html,
htdoc/ChangeLog: Added documentation for the external_parsers
attribute.
Mon Jul 14 15:32:22 1997 turtle <turtle@kiwi>
* htdoc/uses.html: added cambridge
Wed Jul 9 15:57:30 1997 turtle <turtle@kiwi>
* htdoc/uses.html: added the rhodos project
Mon Jul 7 22:15:45 1997 turtle <turtle@kiwi>
* htdig/Document.cc: Removed old getdate() code that replaced '-'
with ' '.
* htlib/URL.cc: Sequences of "/./" are now replaced with "/" to
reduce the chance of infinite loops
* htdig/Document.cc: Added better date parsing. Now also supports
the old RFC 850 format
Thu Jul 3 17:44:39 1997 turtle <turtle@kiwi>
* htdoc/cf_byname.html, htdoc/cf_byprog.html,
htcommon/defaults.cc, htdig/htdig.h, htdoc/attrs.html,
htlib/Configuration.h, htlib/URL.cc, htdig/Attic/Makefile,
htdig/Document.cc: Added support for virtual hosts
Mon Jun 30 17:07:49 1997 turtle <turtle@kiwi>
* htdoc/uses.html: Added Depaul university
Tue Jun 24 14:59:45 1997 turtle <turtle@kiwi>
* Makefile.in: Fixed syntax error in the installation target.
Mon Jun 23 17:33:14 1997 turtle <turtle@kiwi>
* htdig/Attic/teamball.conf, htdig/Attic/tsdsu.conf,
htdig/Attic/rohan.conf, htdig/Attic/sdsu.conf, htdig/Attic/t.conf,
htdig/Attic/nsdsu.conf, htdig/Attic/daztec.conf,
htdig/Attic/max.conf, htdig/htdig.conf, htdig/Attic/Makefile,
htdig/Attic/catalog.conf: Removed old config files
* htdoc/FAQ.html: FAQ initial
* htdoc/contents.html: Added link to the new FAQ
* htdoc/FAQ.html: *** empty log message ***
* htnotify/htnotify.cc: Added version info to the usage output
* htfuzzy/htfuzzy.cc: Added version info to the usage output
* htmerge/htmerge.cc: Added version info to usage message
* htdig/main.cc: Added version info to the usage message
Mon Jun 16 15:35:56 1997 turtle <turtle@kiwi>
* installdir/footer.html: Changed the hardcoded version number to
the new VERSION variable
* htdoc/hts_templates.html: Added docs for the VERSION and PERCENT
variables
* htsearch/Display.cc: Added PERCENT and VERSION variables for the
output templates
Sat Jun 14 18:52:42 1997 turtle <turtle@kiwi>
* htdig/Document.cc: Made redirect detection code more general
Fri Jun 13 05:31:17 1997 turtle <turtle@kiwi>
* htdoc/cf_general.html: Fixed typo
Thu Jun 5 15:00:53 1997 turtle <turtle@kiwi>
* htdoc/uses.html: added VG Gas Analysis Systems
Tue Jun 3 17:49:05 1997 turtle <turtle@kiwi>
* installdir/english.0.original, installdir/english.0: Added new
english dictionary for the endings algorithm
Thu May 29 14:56:40 1997 turtle <turtle@kiwi>
* htdoc/uses.html: Added Indiana University Computer Security
Office
Wed May 28 14:47:25 1997 turtle <turtle@kiwi>
* htdoc/main.html: Fixed typo
Mon May 19 15:23:18 1997 turtle <turtle@kiwi>
* htdoc/uses.html: Added daily californian online
Tue May 13 19:28:32 1997 turtle <turtle@kiwi>
* htdoc/uses.html: Added The Reohr Group
* htdoc/uses.html: Added the Linux Documentation Project
Sun May 11 17:52:05 1997 turtle <turtle@kiwi>
* htdoc/index.html: Made the contents frame a little wider so that
text doesn't wrap
* htdoc/uses.html: Added NOVA and Gajo & Associati
Fri May 2 23:35:56 1997 turtle <turtle@kiwi>
* htdoc/uses.html: added www.bajan.org
Wed Apr 30 22:28:28 1997 turtle <turtle@kiwi>
* htdoc/uses.html: Added Caldera, Inc.
Sun Apr 27 14:43:31 1997 turtle <turtle@kiwi>
* htsearch/parser.cc, htsearch/parser.h, include/Attic/htconfig.h,
htdoc/RELEASE.html, htdoc/uses.html, htdoc/where.html,
htlib/URL.cc, htlib/strcasecmp.cc, htsearch/htsearch.cc, .version,
README, htdig/Attic/Makefile, htdoc/ChangeLog: changes
Mon Apr 21 15:44:39 1997 turtle <turtle@kiwi>
* htsearch/htsearch.cc: Added code to check the search words
against the minimum_word_length attribute
Sun Apr 20 15:27:37 1997 turtle <turtle@kiwi>
* CONFIG: Made paths more generic
* htdig/Document.cc: Added include for ctype.h
* htdig/Plaintext.cc: Fixed bug
Tue Apr 1 17:56:57 1997 turtle <turtle@kiwi>
* htdoc/uses.html: added ukc
Sun Mar 30 01:18:16 1997 turtle <turtle@kiwi>
* htdig/Attic/Makefile, htdoc/uses.html, Attic/Makefile.config,
Attic/config.log, Attic/config.status, .cvsignore, Attic/Makefile,
htsearch/Attic/Makefile, htsearch/.cvsignore,
htnotify/Attic/Makefile, htnotify/.cvsignore, htmerge/.cvsignore,
htmerge/Attic/Makefile, htlib/.cvsignore, htlib/Attic/Makefile,
htfuzzy/.cvsignore, htfuzzy/Attic/Makefile, htcommon/.cvsignore,
htcommon/Attic/Makefile: update
Thu Mar 27 00:06:05 1997 turtle <turtle@kiwi>
* htdig/Plaintext.cc: Applied patch supplied by Peter Enderborg
<pme@ufh.se> to fix a problem with a pointer running off the end
of a string.
Mon Mar 24 04:33:26 1997 turtle <turtle@kiwi>
* rx-1.5/rx/Attic/config.log, rx-1.5/rx/Attic/config.status,
htsearch/htsearch.h, htsearch/parser.h, include/Attic/htconfig.h,
rx-1.5/Attic/config.status, htsearch/Attic/Makefile,
htsearch/ResultList.cc, htsearch/ResultMatch.h,
htsearch/Template.h, htsearch/WeightWord.h, htlib/cgi.cc,
htlib/htString.h, htlib/io.cc, htmerge/Attic/Makefile,
htmerge/htmerge.h, htnotify/Attic/Makefile, htlib/StringList.cc,
htlib/StringList.h, htlib/String_fmt.cc, htlib/URL.h,
htlib/URLTrans.cc, htlib/Attic/SDSU.cc, htlib/Attic/String.h,
htlib/ParsedString.h, htlib/String.cc, htfuzzy/htfuzzy.cc,
htlib/Attic/Makefile, htlib/Configuration.cc, htlib/Connection.cc,
htlib/Database.h, htdig/URLRef.h, htfuzzy/Attic/Makefile,
htfuzzy/Exact.cc, htfuzzy/Fuzzy.h, htfuzzy/Substring.cc,
htfuzzy/SuffixEntry.h, htdig/Plaintext.cc, htdig/Postscript.cc,
htdig/SGMLEntities.cc, htdig/Server.cc, htdig/Server.h,
htdig/Attic/Makefile, htdig/ExternalParser.cc,
htdig/ExternalParser.h, htdig/Parsable.h, htcommon/Attic/Makefile,
htcommon/DocumentRef.h, htcommon/WordList.cc, htcommon/WordList.h,
htcommon/WordReference.h, htdig/Document.h, Attic/config.status,
configure, configure.in, Attic/Makefile, Attic/Makefile.config,
Attic/config.cache, Attic/config.log, Makefile.config.in: Renamed
the String.h file to htString.h to help compiling under win32
* Makefile.in: Updated "make dist" to remove CVS stuff
Fri Mar 14 17:15:32 1997 turtle <turtle@kiwi>
* htcommon/defaults.cc: Changed default value for remove_bad_urls
to true
Thu Mar 13 18:37:50 1997 turtle <turtle@kiwi>
* htnotify/htnotify.cc, Attic/Makefile.config,
htdig/SGMLEntities.cc, htdoc/uses.html: Changes
Thu Feb 27 00:52:52 1997 turtle <turtle@kiwi>
* htdoc/uses.html: new uses
Mon Feb 24 17:52:55 1997 turtle <turtle@kiwi>
* htsearch/htsearch.cc, htnotify/Attic/Makefile,
htsearch/Attic/Makefile, htlib/strcasecmp.cc,
htmerge/Attic/Makefile, htlib/Attic/Makefile, htlib/String.cc,
htlib/StringMatch.cc, htdig/SGMLEntities.cc,
htfuzzy/Attic/Makefile, htdig/Attic/Makefile,
htcommon/Attic/Makefile, htcommon/WordList.cc: Applied patches
supplied by "Jan P. Sorensen" <japs@garm.adm.ku.dk> to make
ht://Dig run on 8-bit text without the global unsigned-char option
to gcc.
Sun Feb 23 17:29:38 1997 turtle <turtle@kiwi>
* htdoc/uses.html: *** empty log message ***
Tue Feb 18 15:03:03 1997 turtle <turtle@kiwi>
* htdoc/uses.html: New uses of ht://Dig
Tue Feb 11 00:38:48 1997 turtle <turtle@kiwi>
* htsearch/htsearch.cc: Renamed the very bad wordlist variable to
badWords
Mon Feb 10 17:32:47 1997 turtle <turtle@kiwi>
* htlib/Connection.cc, htdig/Document.h, htdig/Document.cc,
htcommon/DocumentRef.cc, htcommon/DocumentRef.h: Applied AIX
specific patches supplied by Lars-Owe Ivarsson
<lars-owe.ivarsson@its.uu.se>
Fri Feb 7 18:04:13 1997 turtle <turtle@kiwi>
* htlib/URL.cc: Fixed problem with anchors without a URL
Mon Feb 3 17:37:59 1997 turtle <turtle@kiwi>
* .version, README: updated stuff to 3.0.8
* Many files: Initial CVS
Local Variables:
add-log-time-format: current-time-string
End: