Mon Jun 14 10:08:01 CEST 2004 Gabriele Bartolini <angusgb@users.sourceforge.net>

	* Tagged release htdig-3-2-0b6

Sun 13 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* db/os_abs.c, (db/os_abs.c.win32 removed):
	  Re-fix Cygwin bug (#814268, fixed 25 Apr) so that it won't be
	  clobbered by autotools.

Sat 12 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htdoc/RELEASE.html: Separated bug fixes from new features

	* htdoc/{htdig,htfuzzy}.html, installdir/{htdig,htfuzzy}.1.in:
	  Added list of database files used

	* htdoc/{htdump,htmerge,htnotify,htpurge,hts_general,htstat,rundig}.html:
	  Hyperlinked COMMON_DIR, BIN_DIR, DATABASE_DIR to attrs.html.

	* htcommon/defaults.cc, htdoc/attrs.html.in:
	  Remove reference to deprecated '-l' option (generate URL log) of htdig.

Fri Jun 11 11:48:40 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/parser.cc (phrase): Applied Lachlan's patch to prevent endless
	  loop when boolean keywords appear in a phrase in boolean match method.

Fri Jun 11 11:26:56 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* db/hash.c (CDB___ham_open): Applied Red Hat's h_hash patch, to ensure
	  that the hash function is always set to something valid.

Fri Jun 11 10:53:49 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* installdir/HtFileType: Added -f to rm command.

	* htsearch/parser.cc (perform_or): Added missing & in if clause.

	* contrib/htdig-3.2.0.spec: Updated for 3.2.0b6.

	* installdir/Makefile.{am,in}: Don't stick $(DESTDIR) in HtFileType.

Thu Jun 10 16:39:36 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/conf_{lexer.lxx,parser.yxx}: applied Gilles' patch (April 22)
	  which features:
	  - improved error handling, gives file name and correct line number,
	    even if using include files
	  - allows space before comment, because otherwise it would just complain
	    about the "#" character and go on to parse the text after it as a
	    definition
	  - allows config file with an unterminated line at end of file, by
	    pushing an extra newline token to the parser at EOF
	  - parser correctly handles extra newline tokens, by moving this
	    handling out of simple_expression, and into simple_expression_list
	    and block, as simple_expression must return a new ConfigDefaults
	    object and a newline token doesn't cut it (caused segfaults when
	    dealing with the fix above)
	* htcommon/conf_lexer.cxx: Regenerate using flex 2.5.31.
	* htcommon/conf_parser.cxx: Regenerate using bison 1.875a.

Wed Jun 9 12:32:47 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (do_tag): Fixed meta date handling fix of June 3 to
	  ensure null byte gets put in by get() call.

Wed 9 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* contrib/doc2html/doc2html.pl, installdir/mime.types:
	  Add support for OpenOffice.org documents (#957305)

Sat 5 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* test/t_htdig, test/t_factors: fix tests for non-gnu/linux systems.

Sat 5 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htdoc/cf_generate.pl: Hyperlink to simplify finding the defaults of
	  attributes defined in terms of others (e.g.,
	  accents_db->database_base->database_dir).
	* htdoc/attrs.html.in: regenerated using cf_generate.pl

Sat 5 Jun 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/defaults.cc: Escaped new-line in "allow_space_in_url" entry.
	  Set no_next_page_text to ${next_page_text}; likewise no_prev_page_text.

Fri Jun 4 10:23:53 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/URL.cc: added "allow_space_in_url" (from fileSpace.1 patch)
	* htcommon/defaults.[cc,xml]: added documentation of allow_space_in_url
	* htdoc/attrs.html.in: regenerated using cf_generate.pl
	* htdoc/cf_byname.html: ditto
	* htdoc/cf_byprog.html: ditto
	* htdoc/RELEASE.html: updated with info regarding this attribute

Thu Jun 3 16:04:23 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (do_tag): Fixed meta date handling to avoid inadvertently
	  matching names like DC.Date.Review.

Thu Jun 3 10:01:50 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdoc/RELEASE.html: updated release notes and changes
	* htdoc/THANKS.html: updated the 'thanks' section

Thu Jun 3 09:32:52 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* global: updated with 'autoreconf -if' (autoconf 2.59, libtool 1.5.6
	  and automake 1.7.9)

Wed Jun 2 19:03:14 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* contrib/rtf2html: added the rtf2html.c source as modified by David Lippi
	  and Gabriele Bartolini of the Comune di Prato. The source code is now
	  released under GNU GPL and included in the ht://Dig package.

Tue Jun 1 20:23:40 CEST 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/HtSGMLCodec.cc: changed ¤ to €

Fri 28 May 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* Most files: Update copyright to 2004

Sun 23 May 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htdoc/FAQ.html: Sync with maindocs

Sun 23 May 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* configure, configure.in:
	  Resolve variables (e.g., BINDIR) copied into attrs.html,
	  without introducing "NONE" prefix detected by Gabriele.

Sun 23 May 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* .version, htdoc/RELEASE.html, htdoc/where.html,
	  htdoc/attrs.html.in, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	  Prepare docs for release of 3.2.0b6.

Mon Apr 26 15:12:22 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htfuzzy/Soundex.cc (generateKey): Applied Alex Kiesel's fix to prevent
	  segfaults when word has no letters.
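
The Soundex fix guards against words that contain no letters. The sketch below shows a key generator with the same kind of guard. It assumes standard American Soundex. `soundex_key` and `soundex_digit` are hypothetical names and are not taken from htfuzzy/Soundex.cc.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Soundex digit for a lowercase letter. '0' marks vowels (and y), which
// separate runs of equal codes; h and w return '\0' and are skipped.
char soundex_digit(char c) {
    switch (c) {
        case 'b': case 'f': case 'p': case 'v': return '1';
        case 'c': case 'g': case 'j': case 'k':
        case 'q': case 's': case 'x': case 'z': return '2';
        case 'd': case 't': return '3';
        case 'l': return '4';
        case 'm': case 'n': return '5';
        case 'r': return '6';
        case 'h': case 'w': return '\0';
        default: return '0';  // a, e, i, o, u, y
    }
}

// Returns a four-character Soundex key, or "" if `word` has no letters.
std::string soundex_key(const std::string& word) {
    std::string letters;
    for (unsigned char c : word)
        if (std::isalpha(c)) letters += static_cast<char>(std::tolower(c));
    if (letters.empty()) return "";  // the guard: nothing to encode

    std::string key(1, static_cast<char>(std::toupper(static_cast<unsigned char>(letters[0]))));
    char last = soundex_digit(letters[0]);
    for (std::string::size_type i = 1; i < letters.size() && key.size() < 4; ++i) {
        char d = soundex_digit(letters[i]);
        if (d == '\0') continue;              // h/w: do not separate codes
        if (d != '0' && d != last) key += d;  // skip vowels and repeats
        last = d;
    }
    while (key.size() < 4) key += '0';
    return key;
}
```

In this sketch, a word such as "1234" returns an empty key instead of reading `letters[0]` from an empty string. This entry does not say how the real generateKey reports that case.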

Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/HTML.cc: Handle empty noindex_start/noindex_end lists.
	* htlib/StringList.{cc,h}: const-correctness of Add/Insert/Assign(char*)

	* redo mistakenly backed out patch...

Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/parser.cc: Address (but not fix) bug #934739
	  If collection->getDocumentRef() on line 889 returns NULL, don't crash.
	  I'm still trying to work out why it does return NULL -- I don't think
	  it ever should.

	* mistakenly back out previous patch :(

Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/Retriever.{h,cc}, htcommon/defaults.cc, htdoc/FAQ.html:
	  Add store_phrases attribute. If it is false, htdig only stores the
	  first occurrence of each word in a document. This reduces the database
	  size dramatically, and slightly increases digging speed.

Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* db/{aclocal.m4,configure,os_abs.c.win32}, STATUS, htdoc/THANKS.html:
	  Correctly detect paths beginning with C: as absolute paths in Cygwin/Win32.
	  Fixes bug #814268.
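
Bug #814268 was about recognising drive-letter paths such as C:/htdig/db as absolute under Cygwin/Win32. Below is a minimal sketch of such a check. `is_absolute_path` is a hypothetical helper and not the code in db/os_abs.c, which is written in C.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True for POSIX-rooted ("/..."), backslash-rooted ("\...") and
// drive-letter ("C:...") paths. Following the entry, any "X:" prefix
// counts as absolute, although Windows treats "C:foo" as drive-relative.
bool is_absolute_path(const std::string& path) {
    if (path.empty()) return false;
    if (path[0] == '/' || path[0] == '\\') return true;
    return path.size() >= 2 &&
           std::isalpha(static_cast<unsigned char>(path[0])) &&
           path[1] == ':';
}
```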

Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/Retriever.cc:
	  Gilles's patch to avoid regex compile for every URL encountered.

Sun 25 Apr 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* contrib/htdig-3.2.0.spec:
	  Karl Eichwalder's patch to use mktemp to create safe temp file.

Wed Apr 7 17:12:33 2004 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc (IsValidURL): Fixed bug #931377 so bad_extensions
	  and valid_extensions not thrown off by periods in query strings.
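
The idea behind the #931377 fix is that a period inside a query string (for example /cgi-bin/show?file=report.pdf) must not be taken as the document's extension. Below is a minimal sketch that takes the extension from the path part only. `url_extension` is a hypothetical helper, not the actual Retriever::IsValidURL logic. It assumes an absolute URL that has a path.

```cpp
#include <cassert>
#include <string>

// Returns the extension (with its leading '.') of the last path segment
// of `url`, ignoring any query string or fragment; "" if there is none.
std::string url_extension(const std::string& url) {
    // Drop the query string and fragment first.
    std::string path = url.substr(0, url.find_first_of("?#"));
    // Look only at the final path segment.
    std::string::size_type slash = path.rfind('/');
    std::string segment = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::string::size_type dot = segment.rfind('.');
    return (dot == std::string::npos) ? "" : segment.substr(dot);
}
```

A bad_extensions check built on this sketch would judge "/page.html?x=a.pdf" by ".html" rather than ".pdf".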

Mon Mar 15 11:56:04 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htsearch/Display.cc: changed (and fixed) the date factor formula as
	  Lachlan and David Lippi suggested, in order not to give negative results.

Fri Mar 12 09:13:28 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* configure.in: removed 'eval' expressions which caused the 'NONE' prefix
	  path to be instantiated and the make script to hang
	* acinclude.in: fixed AC_DEFINEs for SSL and ZLIB check macros, which prevented
	  autoheader (and therefore autoreconf) from working correctly
	* moved manual pages from htdoc to installdir
	* htdoc/[manpages].in: removed
	* installdir/*.[1,8]: removed man pages (htdig-pdfparser.1, htdig.1,
	  htdump.1, htfuzzy.1, htload.1, htmerge.1, htnotify.1, htpurge.1,
	  htsearch.1, htstat.1, rundig.1, htdigconfig.8)
	* installdir/*.[1,8].in: added pre-configure man pages (htdig-pdfparser.1.in,
	  htdig.1.in, htdump.1.in, htfuzzy.1.in, htload.1.in, htmerge.1.in, htnotify.1.in,
	  htpurge.1.in, htsearch.1.in, htstat.1.in, rundig.1.in, htdigconfig.8.in)
	* regenerated configure scripts with autoreconf
	* fixes bug #909674

Sat 21 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* installdir/HtFileType: Use mktemp to create safe temp file (bug #901555)
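
HtFileType is a shell script, so this fix relies on the mktemp(1) utility. For comparison only, the C++ equivalent is POSIX mkstemp(3). It picks a unique name and creates the file atomically with owner-only permissions, so another user cannot pre-create or symlink the name. The sketch below is minimal: the helper name `make_safe_temp` and the "htfiletype." prefix are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Creates a unique temporary file in `dir`. Returns an open file
// descriptor (or -1 on failure) and stores the chosen path in `name`.
int make_safe_temp(std::string& name, const std::string& dir = "/tmp") {
    std::string tmpl = dir + "/htfiletype.XXXXXX";
    // mkstemp rewrites the trailing XXXXXX in place, so pass a writable buffer.
    int fd = mkstemp(&tmpl[0]);
    if (fd != -1) name = tmpl;
    return fd;
}
```

The caller is responsible for calling close(fd) and unlink(name.c_str()) once the file is no longer needed.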

Wed Feb 25 11:14:45 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdoc/THANKS.html: added Robert Ribnitz to the 'thanks' page and fixed
	  Nenciarini's position (it was not in alphabetical order - sorry!).

Wed Feb 25 11:02:37 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* installdir/*.[1,8]: added man pages (htdig-pdfparser.1, htdig.1,
	  htdump.1, htfuzzy.1, htload.1, htmerge.1, htnotify.1, htpurge.1,
	  htsearch.1, htstat.1, rundig.1, htdigconfig.8) provided by
	  Robert Ribnitz <ribnitz at linuxbourg.ch> of the Debian Project
	* installdir/Makefile.am: prepared the automake script for correctly
	  handling the man pages

Sat 21 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/htsearch.cc:
	  Back out change of 21 December, as it causes problems with characters
	  which *should* be unencoded, like /

Thu 19 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* aclocal.m4, acinclude.m4, configure.in:
	  Remove duplicate tests for zlib
	  Fix tests for SSL (Fixes bug #829081)
	  Fix configure --help formatting

	* htdoc/*.[18].in, htdoc/Makefile.am, configure.in: Added man pages

	* htdoc/attrs.html.in, htdoc/cf_generate.pl, htdoc/Makefile.am:
	  Fill in #define'd attribs (Fixes bug #692125)

	* test/Makefile.am: Incorporate new tests in make check

	* test/t_htdig, test/t_parsing: suppress unwanted diagnostics

	* STATUS: list Cygwin bug (#814268)

	* htcommon/defaults.cc:
	  added wordlist_cache_inserts, remove wordlist_cache_dirty_level

	* configure, */Makefile.in, */Makefile, htdoc/cf_by{name,prog}.html:
	  regenerated

Fri 13 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* db/mp_cmpr.c: Fix bug with --without-zlib

Sun 8 Feb 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/URL.cc: Make server_alias case insensitive.

	* htdig/Document.cc: Don't hex-decode twice. (Caused problems with names
	  like file%20name)

	* htdig/Retriever.cc: Test validity of URL value *before* calling
	  signature(), as that implicitly normalises, and confuses
	  limit_normalised vs limit_urls_to

	* htdig/htdig.cc: Remove stale md5_db if -i specified

	* installdir/htdig.conf: Set common_url_parts to contain all strings
	  which *must* be in a valid URL. This probably includes the whole domain
	  name, so it gives more compression than the standard strings.

	* htcommon/defaults.cc: Update docs. Remove default "bad_extensions"
	  from common_url_parts, and add .shtml

	* test/t_htdig, test/t_htdig_local: Update self-tests
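
The htdig/Document.cc item in this entry explains why URLs must be decoded only once. A file literally named file%20name is requested as file%2520name, and a second decoding pass turns that into "file name". Below is a minimal one-pass decoder that illustrates this. `url_decode` is a hypothetical helper, not ht://Dig's own routine.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Decodes each %XX escape exactly once; malformed escapes are copied as-is.
std::string url_decode(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size() &&
            std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
            out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += s[i];
        }
    }
    return out;
}
```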

Tue Feb 3 18:06:38 CET 2004 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/HtConfiguration.cc: changed the Find method in order not to
	  ignore empty string results for string attributes whenever they are
	  defined in the configuration file by the user
	* htdig/Document.cc: fixed bugs in handling the http_proxy,
	  http_proxy_authorization, authorization attributes
	* htlib/Configuration.[h,cc]: added the Exists method in order to query
	  whether an attribute's definition is present in the configuration
	  dictionary (before, it was checked against its string's length, which
	  prevented empty attributes from being used correctly)
	* these changes fix bug #887552

Sun 18 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/URL.cc, test/url.cc:
	  Rename "allow_dbl_slash" to "allow_double_slash", to match defaults.cc

	* htcommon/defaults.cc, htdoc/{hts_templates,attrs}.html:
	  Explain that keywords_factor applies to meta keywords. Fix old typo.

	* test/t_{factors,templates}, test/htdocs/set1/{title.html,bad_local.htm}
	* test/conf/entry-template:
	  Expanded test suite.

Sat 17 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* test/t_{parsing,htdig_local,factors,templates},
	* test/htdocs/set1/title.html:
	  Expanded test suite.

Sat 17 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/DocumentRef.cc:
	  Fix old-style use of HtConfiguration, so defaults are read correctly.
	  Causes max_descriptions to be treated correctly.

	* htcommon/defaults.cc, htdoc/{hts_templates,attrs,cf_byname,cf_byprog}.html:
	  Explain that max_description{s,_length} don't affect indexing -- only
	  text used to fill in template variables.

Mon 12 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* Very many files: Fix bug #873965
	  Replace C++ style comments with C style comments in all C files, and .h
	  files they include.
	  Also, change //_WIN32 to /* _WIN32 */ in .cc files for uniformity.

Mon 12 Jan 2004 Lachlan Andrew <lha at users.sourceforge.net>

	* test/t_parsing, test/test_functions.in: Add new tests
	* htcommon/defaults.cc, htdoc/hts_templates.html: Cross-ref documentation.

Mon Dec 29 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/Retriever.cc:
	  Fix bug in which validity of first URL from each server was not checked.

Mon Dec 29 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/htdig.cc, htdoc/htdig.html: Fix bug #845054
	  Fix behaviour of -m and additional list of urls at the end of a command.
	  In either case, "-" denotes stdin.
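
The convention described in this entry, where "-" stands for standard input, can be sketched as follows. `read_url_list` is hypothetical. It only illustrates the convention and is not htdig's actual option handling.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <istream>
#include <string>
#include <vector>

// Reads one URL per line from `source`, where "-" means standard input.
// Blank lines are skipped. Returns false if a named file cannot be opened.
bool read_url_list(const std::string& source, std::vector<std::string>& urls) {
    std::ifstream file;
    std::istream* in = &std::cin;
    if (source != "-") {
        file.open(source);
        if (!file) return false;
        in = &file;
    }
    std::string line;
    while (std::getline(*in, line))
        if (!line.empty()) urls.push_back(line);
    return true;
}
```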

Mon Dec 29 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* installdir/rundig, installdir/Makefile.{in,am}: Address bug #860708
	  Make bin/rundig -a handle multiple database directories

Sun Dec 21 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/htsearch.cc:
	  Improve handling of restrict/exclude URLs with spaces or encoded chars

Sun Dec 21 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/HtURLSeedScore.cc, htsearch/SplitMatches.cc: Fix bug #863860
	  Split patterns at "|".
	  For SplitMatches, make "*" only match if all other patterns fail.
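
The SplitMatches rule works in three steps:

- split each pattern list at "|";
- try every ordinary group in order;
- fall back to a "*" group only when nothing else matched.

The sketch below simplifies this by assuming plain substring patterns instead of regular expressions. `split_alternatives` and `pick_group` are hypothetical names, not the SplitMatches interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits `spec` at every '|' into its alternatives.
std::vector<std::string> split_alternatives(const std::string& spec) {
    std::vector<std::string> parts;
    std::string::size_type start = 0, bar;
    while ((bar = spec.find('|', start)) != std::string::npos) {
        parts.push_back(spec.substr(start, bar - start));
        start = bar + 1;
    }
    parts.push_back(spec.substr(start));
    return parts;
}

// Returns the index of the first non-"*" group with an alternative that
// occurs in `url`; otherwise the index of the first "*" group; otherwise -1.
int pick_group(const std::vector<std::string>& specs, const std::string& url) {
    int fallback = -1;
    for (std::size_t i = 0; i < specs.size(); ++i) {
        if (specs[i] == "*") {
            if (fallback == -1) fallback = static_cast<int>(i);
            continue;  // "*" only wins after every other group has failed
        }
        for (const std::string& alt : split_alternatives(specs[i]))
            if (!alt.empty() && url.find(alt) != std::string::npos)
                return static_cast<int>(i);
    }
    return fallback;
}
```

Because of the deferred fallback, a "*" group listed first no longer swallows URLs that a later, more specific group would match.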

Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/Server.cc: Fix bug #851303.
	  Allow indexing if robots.txt has an empty "disallow".

	* test/t_htdig, test/t_htsearch, test/htdocs/robots.txt:
	  Tests for the above.
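
Under the robots exclusion convention, an empty Disallow value excludes nothing; bug #851303 restores that behaviour. The sketch below is minimal and assumes the Disallow values of the applicable record have already been parsed. `is_allowed` is hypothetical. It does plain prefix matching, which is simpler than htdig/Server.cc: judging by the Oct 15 2003 entry, that code matches robots.txt paths through a regular expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if `path` may be fetched, given the Disallow values of
// the matching robots.txt record. An empty value disallows nothing.
bool is_allowed(const std::string& path, const std::vector<std::string>& disallows) {
    for (const std::string& prefix : disallows) {
        if (prefix.empty()) continue;                     // "Disallow:" with no value
        if (path.compare(0, prefix.size(), prefix) == 0)  // path starts with prefix
            return false;
    }
    return true;
}
```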

Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/htdig.cc, test/t_factors: Warn if config file has obsolete fields.

Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/Display.cc: Apply Gilles's patch for ellipses bug #844828.

Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* test/{t_validwords,t_templates,t_fuzzy,t_factors}
	* test/{set_attr,synonym_dict,dummy.stems,dummy.affixes,bad_word_list}
	* test/conf/main-template test/htdocs/set1/{site2.html,site4.html}:
	  Added four new tests to test suite. Not included in "make check",
	  but can be run explicitly by "make TESTS=t_... check".

Sun Dec 14 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/conf_lexer.{lxx,cxx}:
	  Back out changes to try to accept files without EOL :(

Sat Dec 13 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/defaults.{cc,xml}, htdoc/{attrs,cf_byprog}.html:
	  Fix "used by" for max_excerpts, and resulting hyperlinks.

Sat Nov 22 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/conf_lexer.{lxx,cxx}, htcommon/conf_parser.{yxx,cxx}:
	  Partially address bug #823455.
	  Don't complain if config file doesn't end in EOL.
	  Should the grammar be fixed not to need EOL?
	  Report errors to stderr, not stdout, as they confuse the web server.

Sun Nov 9 14:44:02 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* Tagged release htdig-3-2-0b5

Sat Nov 8 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/defaults.cc, htsearch/parser.cc: Fix bug #825877
	  Reduce backlink_factor to a value comparable with other factors, and
	  interpret multimatch_factor as the *bonus* given for multiple matches.

Sat Nov 1 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/parser.cc: Fix bug #806419. Ignore bad words at start of phrase.

Tue Oct 28 11:58:06 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdig/htdig.cc: set the debug level when we are importing a cookie file.
	  Fix bug #831478.

Mon Oct 27 17:13:02 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Server.cc: Fix bug #831407. Make sure the time is properly reset
	  after the delay completes, so that it doesn't allow 2 connections per delay.

Mon Oct 27 15:57:38 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/THANKS.html: Added Lachlan, Jim and Neal to the active developers
	  list.

Sun Oct 26 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdoc/hts_templates.html: Clarify that PREV/NEXTPAGE template variables
	  are empty if there is only one page, ignoring no_{prev,next}_page_text.

Sun Oct 26 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/defaults.cc: Fixed documentation to close bug #829767
	  Clarified that noindex_start/end do not get replaced by whitespace.
	  Also removed spurious '>' from start of boolean_syntax_errors, and
	  added missing '#' to many local <a href> tags.

Sun Oct 26 12:42:27 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/defaults.cc: Fixed description of 'head_before_get' after
	  Lachlan's fixes.
	* htdoc/attrs.html: rerun cf_generate.pl

Sat Oct 25 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/Display.cc: Fix #829761.
	  If last component of the URL is used as a title, URL-decode it.

Sat Oct 25 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/Server.cc: Fix #829754. Avoid calculations with negative time.

Fri Oct 24 17:17:15 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/htdig.html, htdoc/meta.html, htdoc/require.html: Update URL for
	  the Standard for Robot Exclusion.

	* htdoc/htmerge.html: Added two clarifications to -m option description.

	* htdoc/cf_types.html: Make clear distinction between String List and
	  Quoted String List.

Fri Oct 24 15:30:08 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc: Fix bug #829746. Applied Niel Kohl's fix for this,
	  which checks whether the words input was given before trying to use it,
	  to avoid passing a NULL argument to syslog().

Fri Oct 24 15:15:53 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc: Fix bug #578570. The enddate handling now works
	  correctly for a large, negative startday value.

Fri Oct 24 12:47:51 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (ctor): Fix obvious typo in metadatetags.Pattern setting.

Thu Oct 23 10:27:18 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/defaults.cc: Fix bug #828808. Default startyear to empty.
	  Document "startyear defaults to 1970 if a start/end date set".

Thu Oct 23 12:14:30 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdig/htdig.cc: restored the code before Oct 21 (fixes #828628)

Thu Oct 23 11:41:15 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdig/Retriever.[h,cc]: removed 'head_before_get' overriding by
	  restoring the code before Oct 21.
	* htdig/Document.[h,cc]: ditto, with the exception of detaching the HEAD
	  before GET mechanism from the persistent connections one.
	* htcommon/defaults.cc: improved documentation (even though it needs
	  corrections by an English-speaking developer).
	* These changes fix bug #828628

Wed Oct 22 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/parser.cc: Applied Neal's patch to fix bug #823403
	  Documents only added to search list if they were successfully dug.
	  Lines 237-238 of htsearch/Display.cc
	      if (!ref || ref->DocState() != Reference_normal)
	          continue;
	  should now be redundant. (Left in to be defensive.)

Tue Oct 21 11:04:56 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdig/Retriever.h: added the 'RetrieverType' enum and an object variable
	  for storing the type of dig we are performing (default: initial);
	* htdig/Retriever.cc: changed constructor in order to handle the type,
	  added some debugging explanation regarding the override of the
	  'head_before_get' attribute, added checks regarding an empty
	  database of URLs to be updated (set the type to initial).
	* htdig/Document.h: added the attribute 'is_initial' which stores the
	  information regarding the type of indexing (initial or incremental)
	  we are currently performing. Added access methods (get-and-set-like)
	* htdig/Document.cc: modified the logic of the HeadBeforeGet settings during
	  the retrieval phase, in order to always override the user's settings in
	  an incremental dig and automatically set the 'HEAD' call in this case.
	* htcommon/defaults.cc: modified the default value of 'head_before_get' and a bit
	  of its explanation.
	* htnet/HtHTTP.cc: detached the HEAD before GET mechanism from the persistent
	  connections one
	* htdig/Server.cc: added one level of debugging to the display of the
	  server settings in the server constructor

Fri Oct 17 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htword/WordType.cc, htcommon/defaults.cc: Patched to fix bug #823083
	  Don't assume IsStrictChar returns false for digits.
	  Clarify behaviour of allow_numbers in the documentation.

Fri Oct 17 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htcommon/defaults.cc: Patched to fix bug #823455
	  Escaped "$" in valid_punctuation, and added warnings about $, \ and `.

Wed Oct 15 11:12:52 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Server.cc (robotstxt): Patched to fix bug #765726.
	  Don't block paths with subpaths excluded by robots.txt, and make
	  sure any regex meta characters are properly escaped.
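
Escaping regex metacharacters ensures that a Disallow path such as /cgi-bin/a.b?x matches only literally once it is combined into a pattern. Below is a minimal sketch for POSIX extended regular expressions. `escape_regex` is a hypothetical helper, not the function used in htdig/Server.cc.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Backslash-escapes every character that is special in a POSIX
// extended regular expression, so the result matches `literal` only.
std::string escape_regex(const std::string& literal) {
    static const char special[] = ".[]{}()\\*+?^$|";
    std::string out;
    for (char c : literal) {
        // strchr also "finds" the terminating '\0', so exclude it explicitly.
        if (c != '\0' && std::strchr(special, c) != nullptr) out += '\\';
        out += c;
    }
    return out;
}
```

The escaped paths could then be joined with "|" into a single alternation anchored with "^" without any path accidentally acting as a wildcard.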

Tue Oct 14 11:54:07 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.cc: add an empty Accept-Encoding header - this informs the
	  server that htdig is only able to manage documents that are not encoded
	  (if no Accept-Encoding is sent, the server assumes that the client is
	  capable of handling every content encoding - e.g. zipped documents with
	  Apache's mod_gzip module). Partial fix of bug #594790 (which now becomes a
	  feature request)

Mon Oct 13 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htfuzzy/Regex.cc: Search for regular expression. (Used to ignore it!)

	* htfuzzy/Speling.cc, htword/{WordList.cc,WordList.h,WordKey.cc,WordKey.h}:
	  When looking in word database for misspelt words, don't ask to match
	  trailing numeric fields in database key.

	* htcommon/defaults.cc, htdoc/htfuzzy.html: Update docs.

Sun Oct 12 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/htsearch.cc:
	  Fix bug if fuzzy algorithms produced no search words.
	  Send all debugging output to cerr not cout. More debugging output.

Sun Oct 12 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/{Retriever,Server}.cc: Back out the previous.
	  Gilles pointed out inconsistency with Retriever::IsValidURL().

Sun Oct 5 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htdig/{Retriever,Server}.cc: Jim Cole's patch to bug #765726.
	  Don't block paths with subpaths excluded by robots.txt.

Sun Oct 5 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htsearch/htsearch.cc: Highlight phrases containing stop words
	* test/t_htsearch, test/conf/htdig.conf.in: Tests for the above

Sat Sep 27 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* test/{test_functions.in,t_htdig,t_htdig_local,t_htnet}:
	  Don't assume shell "." command passes arguments. (Doesn't on FreeBSD.)

Sat Sep 27 2003 Lachlan Andrew <lha at users.sourceforge.net>

	* htlib/HtDateTime.h, htnet/HtCookie.cc:
	  Avoid ambiguous function call on systems (HP-UX) where time_t=int

Fri Aug 29 09:35:46 MDT 2003 Neal Richter <nealr at rightnow.com>

	* removed references to CDB___mp_dirty_level, CDB_set_mp_diry_level()
	  & CDB_get_mp_diry_level()

	* The config verb 'wordlist_cache_dirty_level' was left for possible use in
	  the future.

Thu Aug 28 15:11:21 MDT 2003 Neal Richter <nealr at rightnow.com>

	* Changed db/LICENSE file to new LGPL-compatible license from Sleepycat
	  Software -- Thanks Sleepycat!

	* Reverted to revision 1.2 of db/mp_alloc.c. The recent changes caused
	  large DB growth. Strangely, the files contained no 'new' data; they were
	  just much larger. Perhaps the pages were being flushed too often.

Thu Aug 28 12:41:22 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* global: updated with 'autoreconf -if' (autoconf 2.57, libtool 1.5.0a and
	  automake 1.7.6)
	* 'make check' successful on: AMD64 Linux 2.4, Alpha Linux 2.2,
	  RedHat Linux 7.3 (2.4), SPARC Ultra60 Linux 2.4,
	  Sparc R220 Sun Solaris (5.8).
	* README.developer: added further info

Thu Aug 28 12:00:10 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* db/[config.guess,config.sub,install-sh,ltmain.sh,missing]: added in the
	  database directory (this way 'make dist' can proceed); I have not been
	  able to tell the db/configure script to get the 'top_srcdir' ones (which
	  should be the default behaviour). Maybe in the future we'll look into this.

Thu Aug 28 11:53:48 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* db/configure.in: changed AC_PROG_INSTALL() to AC_PROG_INSTALL and removed
	  AC_CONFIG_AUX_DIR; this implies that autotools copies will be made for the
	  db directory as well.

Thu Aug 28 11:36:42 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* [htcommon,htdb,htdig,htfuzzy,htlib,htnet,htsearch,httools,htword,test]/Makefile.am:
	  added the option above to every *_LDFLAGS

Thu Aug 28 11:30:39 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* Makefile.am: removed acconfig.h from the EXTRA_DIST list

Thu Aug 28 11:25:07 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* configure.in: removed portability checks for error, stat and lstat that
	  caused compile errors on Solaris. Added the '-mimpure-text'
	  extra ld flag for GCC on Solaris systems (a linkage error occurs
	  when libstdc++ is not shared)

Thu Aug 28 11:22:57 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* include/Makefile.am: changed htconfig.h.in into config.h.in

Thu Aug 28 11:16:19 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htlib/error.[h,c]: removed for now, until the replacement functions
	  are implemented correctly.

Thu Aug 28 11:11:32 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdoc/cf_generate.pl: fixed an error when opening tail and head files
	* Makefile.am: enabled rebuild from a different directory (it is used
	  by 'make dist')
|
||
|
|
||
|
Thu Aug 28 10:46:35 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
|
||
|
|
||
|
* htlib/malloc.c: modified according to autoconf specifications as far
|
||
|
as replacement functions are regarded
|
||
|
* htlib/[lstat, stat].c: removed for now
|
||
|
|
||
|
Thu Aug 28 10:40:58 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>
|
||
|
|
||
|
* htdoc/cf_generate.pl: accept an optional parameter (top source directory)
|
||
|
* htcommon/defaults.cc: fixed some broken lines which prevented
|
||
|
cf_generate.pl from correctly working
|
||
|
* htdoc/Makefile.am: modified the automake file for passing the top
|
||
|
source directory to cf_generate.pl
|
||
|
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
|
||
|
Regenerated using cf_generate.pl.
|
||
|
|
||
|
Tue Aug 26 12:25:40 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* configure.in: removed AC_FUNC_MKTIME because it may not work properly
and added default replacement directory (htlib) for future use
* htlib/Makefile.am: reverted to including mktime.c in the
list of files that are always compiled (leaving it out caused linking errors
for the __mktime_internal function)
* global: updated with 'autoreconf -if'

Sun Aug 24 12:44:29 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* updated with 'autoreconf -if': autoconf 2.57, automake 1.7.6 and
libtool 1.5.0a (autotools that come with Debian SID)

Sun Aug 24 12:39:34 EST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* configure.in: replaced AC_PROG_LEX with AM_PROG_LEX
* db/configure.in: enabled AM_MAINTAINER_MODE, the lack of which prevented
users without autotools from configuring and compiling the program (as far
as the db directory is concerned)
* include/htconfig.h: previously excluded from the branch (severe error!)

Mon Jul 21 20:54:47 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htlib/(malloc|error|lstat|stat|realloc).c: added for cross-compiling
reasons (as suggested by automake)
* htlib/error.h: ditto
* db/acconfig.h: removed as suggested by newer versions of the autotools
* configure.in: removed AC_PROG_RANLIB (overridden by AC_PROG_LIBTOOL)
* updated by rerunning 'autoreconf -if'

Mon Jul 21 10:08:24 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* Patch provided by Marco Nenciarini <mnencia at linux.it> has been
completely applied; the patch adds support for detection
of the standard C++ library
* all sources using <iostream.h> <fstream.h> <iomanip.h>: modified
to use the standard ISO C++ library, if present
* db/configure scripts: modified for autoconf 2.57

Mon Jul 21 09:59:16 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* [.,*]/Makefile.in: regenerated by new automake against new configure.in
* Makefile.config: now looking for the global configuration file
in the source directory

Mon Jul 21 09:49:22 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* configure.in: completely rewritten; deprecated directives have
been removed and autoconf 2.57 is now a prerequisite.
* acinclude.m4: moved all the macros here
* aclocal.m4, configure: regenerated by aclocal and autoconf
* acconfig.h: removed as it is now deprecated
* include/htconfig.h.in: removed, as 'config.h.in' is preferred
and auto-generated
* config.[guess,sub]: updated with newer versions

Tue Jul 8 16:29:44 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/parser.cc (checkSyntax): Fixed boolean_syntax_errors
handling to work over multiple config files.

Mon Jul 7 00:41:55 CEST 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* Updated to autoconf 2.57, libtool 1.5 and automake 1.7.5
* removed acconfig.h files
* autoconf include file is now include/config.h (for autoheader)
* include/htconfig.h.in renamed to include/htconfig.h: now includes
config.h and redefines the bool types
* htlib/HtRegexList.cc, htdig/(Document.cc|ExternalParser.cc): removed
TRUE and FALSE and converted to C++ standard values

Sat Jul 5 2003 Lachlan Andrew <lha at users.sourceforge.net>

* test/test_functions.in: Fix bugs starting/killing apache

Sat Jul 5 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htcommon/defaults.cc: Disable cache flushing to avoid "page leak".

Tue Jun 24 2003 Neal Richter <nearl at rightnow.com>

* Update Copyright Notices in code & documentation to 2003

* Changed License Notice GPL -> LGPL (License change decided by HtDig
Board & Membership, October 2002)

Mon Jun 23 2003 Neal Richter <nearl at rightnow.com>

* Raft of changes, mostly to do with Native Win32 support

* TODO: ExternalTransport & ExternalParser are effectively disabled with
#ifdefs for Native WIN32

* remove global CDB___mp_dirty_level variable and substitute functions to set/get the variable

* Added local copies of GNU LGPL regex, POSIX-like dirent routines, getopt
library and filecopy routines - mainly for Native WIN32 support

* improve IsValidURL with return codes (htdig/Retriever.cc)

* lots of improvements/new features to libhtdig

Sun Jun 22 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/mp_cmpr.c (CDB___memp_cmpr_open):
Make weak compression database standalone to avoid recursion.
This *should* fix all of the recent problems with dirty cache etc.

* test/search.cc: Don't take sizeof zero sized array

Fri Jun 20 2003 Lachlan Andrew <lha at users.sourceforge.net>

* configure,aclocal.m4,acinclude.m4: --with-ssl sets CPPFLAGS, not CFLAGS

Fri Jun 20 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/configure: Hack which should allow select to be detected on HP/UX

* db/db.c: Replace HAVE_ZLIB with HAVE_LIBZ (as set by configure)

* htword/wordKey.cc: More descriptive error message

(Changes to compile with Sun's C++)
* htnet/{HtCookie.cc,HtFTP.cc,Transport.cc}:
Assign substring of const string to const pointer.
* htsearch/ResultMatch.h:
Allow use of SortType in ResultMatch::setSortType()
* test/search.cc: Don't take sizeof(variable size array)
* htdb/htdb_stat.cc: avoid name clash for global var internal
* htcommon/URL.h, htlib/HtTime.h, htlib/htString.h, htnet/Connection.h,
htword/WordBitCompress.h:
Cast default args of type string literal to type (char*)

* htdocs/require.html: Remove email address.

* htlib/gregex.h: Avoid warning if __restrict_arr already defined

Sun Jun 14 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htcommon/defaults.cc:
Set wordlist_cache_dirty_level to 1 (its most conservative value).
Miscellaneous reformatting.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Regenerated using cf_generate.pl.

* htdoc/{require.html,meta.html,all.html,meta.html}:
Update disk usage for phrase searching.
Updated list of supported platforms. More hyperlinks.

Fri Jun 13 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htsearch/Display.cc (setVariables), htdocs/hts_template.html:
Set MATCH_MESSAGE from method_names (for internationalisability).
Removed all trace of hack for config attribute...

Thu Jun 12 14:16:05 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/htsearch.cc (main): Fixed boolean_keywords handling to
work over multiple config files (must destroy old list before
creating new one).

* htcommon/defaults.cc, htsearch/Display.cc (setVariables): Removed
incorrect default value for "config" attribute, and removed hack
that attempted to correct it.

* htdoc/attrs.html: Regenerated using cf_generate.pl.

Thu Jun 12 13:28:01 2003 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/defaults.cc, htcommon/HtSGMLCodec.cc (ctor): Added
translate_latin1 option to allow disabling Latin 1 specific SGML
translations.

* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Regenerated using cf_generate.pl.

Mon Jun 9 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htsearch/htsearch.cc: Fixed setupWords loop for junk at end of query

Mon Jun 9 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htsearch/Display.cc: Set CONFIG template variable to the base name
of the config file (no directory or .conf), as expected by htsearch

Mon Jun 9 2003 Lachlan Andrew <lha at users.sourceforge.net>

* test/test_functions.in: avoid trying to kill apache multiple times

* configure,configure.in: Reformat --help output
* htdoc/FAQ.html: Brought up-to-date with main docs
* htdoc/hts_templates.html: added hyperlinks.
* installdir/search.html: Display version

Sun Jun 8 2003 Lachlan Andrew <lha at users.sourceforge.net>

* configure: Hack to set --disable-bigfile for Solaris (with Sun cc)
and --disable-shared --enable-static for Mac OS X

* test/{test_functions.in,t_htdig,t_htdig_local,t_htnet}:
Only start Apache for tests which need it, and kill it after the test

* contrib/parse_doc.pl: Allow file names containing spaces (from .deb)

Mon Jun 2 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/mp_cmpr.c: Add default zlib setting to default_cmpr_info
* htcommon/defaults.cc, htword/WordDBCompress.cc: Fix docs to say
default compression by 8 (not by 3, which I had "fixed" it to...)

* htcommon/conf_lexer.{cxx,lxx}: Avoid warnings, and document hack.

Thu May 29 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/mp_cmpr.c: Fix comparison of -1 and unsigned which broke SunOS cc
* htdoc/install.html: Warn SunOS cc users to --disable-bigfile

* htcommon/conf_lexer.cxx: Suppress warnings of unused identifiers
* test/conf/htdig.conf2.in: Disable testing of content_classifier
attribute, as it didn't work until after installation

Tue May 27 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/configure, db/ac{local,include}.m4:
Stop test for zlib from adding -I/default/path (*this* time...)

* htword/DBPage.h: Fix bug introduced in previous patch

* test/Makefile.{in,am}: Replace non-portable make -C X by cd X; make

Tue May 27 2003 Lachlan Andrew <lha at users.sourceforge.net>

* {,db/}configure, {,db/}ac{local,include}.m4:
Stop test for zlib from adding -I/default/path (broke SunOS cc)
Fix -Wall test if CCC is g++ but CC is not gcc

* test/dbbench.cc: #include <fcntl.h> later, to avoid #define open
causing problems

* includedir/synonyms: Remove trailing blank line which caused warning
* htnet/HtCookieInFileJar.cc,htfuzzy/Synonym.cc: .get() to stop warnings
* htlib/mhash_md5.c: char -> unsigned char to stop warnings
* test/search.cc, htword/WordDBPage.h:
Casts to (int) to stop printf warnings. ALLIGN -> ALIGN

Sat May 24 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htcommon/defaults.cc: Keep more wordlist cache pages clean

* {,db/}configure{,.in}, {,db/}ac{local,include}.m4:
Patch by Richard Munroe to test if -Wno-deprecated is needed.
Many bug fixes / extra search paths added.

* include/htconfig.h.in, db/db_config.h.in:
Only '#define const' if not C++ (htword/WordDB.cc uses db_config.h)
* test/dbbench.cc: check for alloca even if gcc
* test/t_url: used grep -C instead of grep -c (for portability)
* db/mp_{alloc,cmpr}.c: Removed/replaced C++ style comments

* htdoc/require.html: Revised list of supported platforms

Thu May 22 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htnet/HtFile.cc: Fix previous .get() patch...

Thu May 22 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htlib/DB_2.cc: Set wordlist_cache_dirty_level before opening
database, to avoid database memory allocation problem.

* db/db_err.c: Make 'fatal' errors actually exit.

* htdig/Document.cc, htsearch/parser.cc, htdig/htdig.cc,
* htnet/Ht{HTTP,File}.cc:
Add .get() to use of strings to avoid compiler warnings (FreeBSD).

Thu May 22 2003 Lachlan Andrew <lha at users.sourceforge.net>

* ltmain.sh, test/Makefile.in: Hack to list library dependencies
multiple times in g++ command, to get MacOS X to 'make check'.

* test/{search,word}.cc: cast sizeof() to (int) to avoid warnings.

* htdoc/install.html: Documented MacOS X's shared libraries problem.

Sun May 18 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/mp_alloc.c: Hopefully the *last* fix for this morning's patch...

* configure, aclocal.m4, acinclude.m4:
Look for httpd modules in .../libexec/httpd for OS X
* test/conf/httpd.conf: Disabled mod_auth_db, mod_log{agent,referer}.

Sun May 18 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/db.h.in: Declare variable introduced in db/mp_cmpr.c patch

Sun May 18 2003 Lachlan Andrew <lha at users.sourceforge.net>

* db/mp.h, db/mp_{alloc,bh,cmpr,region}.c,
* htword/WordDB.cc, htdig/htdig.cc:
Avoid infinite loop if memp_alloc has only dirty,
"weakly compressed" (i.e. overflow) pages.
* htcommon/defaults.cc: Document the above, plus misc updates.

* htword/WordDBPage.h:
Cast sizeof() to (int) in printf()s to avoid compiler warnings.

Sun Apr 20 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htdig/htdig.cc: delete db.words.db_weakcmpr if -i specified.

Wed Feb 26 22:10:40 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtHTTP.cc: fixed colon (':') problem with HTTP header parsing,
as suggested by Frank Passek, Gilles and others: a space is not
mandatory between the field name and the field value returned
by the server

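The header-parsing fix above comes down to splitting each response header line at the first colon and then skipping any optional whitespace, so that "Name: value" and "Name:value" are read the same way. A minimal C++ sketch of that rule follows; ParseHeaderLine is a hypothetical helper written for illustration, not the actual HtHTTP code.

```cpp
#include <string>

// Hypothetical helper, not the HtHTTP implementation: split an HTTP header
// line at the first ':' and accept an optional run of spaces/tabs before
// the value, so "Name:value" and "Name: value" parse the same way.
bool ParseHeaderLine(const std::string &line,
                     std::string &name, std::string &value)
{
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos || colon == 0)
        return false;                       // not a "name: value" line

    name = line.substr(0, colon);

    // Skip optional whitespace (and a stray CR) after the colon.
    std::string::size_type start = line.find_first_not_of(" \t\r", colon + 1);
    if (start == std::string::npos) {
        value.clear();                      // empty field value
        return true;
    }
    // Trim trailing whitespace and the CR of a CRLF line ending.
    std::string::size_type end = line.find_last_not_of(" \t\r");
    value = line.substr(start, end - start + 1);
    return true;
}
```

Splitting at the first colon, rather than at ": ", also leaves colons inside the value intact, as in a "Location: http://..." header.
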
Sun Feb 23 10:20:58 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htcommon/defaults.[cc,xml]: added the 'cookies_input_file'
configuration attribute for pre-loading cookies in memory
* htdig/htdig.cc: added the feature above; the code automatically
loads the cookies from the input file into the 'jar' that will be
used during the crawl.

Sun Feb 23 10:16:08 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtHTTP.h: removed the NULL pointer check before assigning a
new jar to the HTTP code

Tue Feb 11 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htcommon/defaults.cc: Set default compression_level to 6,
which enables Neal's wordlist_compression_zlib flag.

Tue Feb 11 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htcommon/{DocumentRef.h, HtWordReference.h},
htsearch/WeightWord.{cc,h},
htsearch/parser.{cc,h}, htsearch/htsearch.cc:
Added field-restricted searching, by title:word or author:word

* htdig/ExternalParser.cc, htdig/HTML.{cc,h}, htdig/Parsable.{cc,h},
htdig/Retriever.{cc,h}:
Parse author from <meta ...> tags. Also moved some common
functionality from HTML/ExternalParser into Parsable.

* test/t_htsearch, htcommon/defaults.cc,
htdoc/{TODO.html,hts_general.html,hts_method.html}:
Test and document the above

Sun Feb 9 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htdig/HTML.cc: fix bug in detection of deprecated noindex_start/end
* htsearch/Display.cc: try harder to find value for DBL_MAX #680836
* htcommon/defaults.cc: fixed typos.

Sat Feb 1 13:57:17 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtCookie.[h,cc]: allowed printDebug to be passed an ostream object
* htnet/HtCookieMemJar.cc: removed a debug call

Thu Jan 30 19:28:32 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* configure.in: used AC_LIBOBJ instead of deprecated LTLIBOBJS's workaround
* ltconfig: removed as not needed anymore since libtool 1.4
* db/configure.in: added AC_CONFIG_AUX_DIR(../) for letting automake know to use
the main ltmain.sh file
* configure, aclocal.m4, Makefile.in, */Makefile.in, config.guess, config.sub,
install-sh, ltmain.sh, missing, mkinstalldirs: re-generated by autotools:
aclocal, autoconf 2.57, automake 1.6.3 and libtool 1.4.3
* db/aclocal.m4, db/configure, db/mkinstalldirs: ditto

Thu Jan 30 00:16:51 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htsearch/htsearch.cc: removed a warning due to an uninitialized pointer

Wed Jan 29 22:53:25 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* acinclude.m4: included the function for checking against SSL, as
found in the ac-archive.

Tue Jan 28 12:23:16 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/Makefile.am: added HtCookieInFileJar.[h,cc] files
* installdir/cookies.txt: example file for pre-loading HTTP cookies
* installdir/Makefile.am: added cookies.txt

Tue Jan 28 12:16:28 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtCookieMemJar.[h,cc]: performed deep copy of the jar in the copy constructor

Tue Jan 28 12:13:44 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtCookie.[h,cc]: added the constructor of a cookie object from a line
of a cookie input file (Netscape's way): if an expiration value of '0' is set
through the cookies input file, the cookie is managed as a session cookie.
Improved copy constructor, solving a bug related to the expires field.

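For reference, a line of a Netscape-style cookies file usually holds seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry time (seconds since the epoch), name and value; lines starting with '#' are comments. The sketch below parses one such line and, as described above, treats an expiry of 0 as a session cookie. The CookieLine struct and ParseCookieLine function are hypothetical illustrations, not the HtCookie API.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one cookies-file line (not htdig's HtCookie).
struct CookieLine {
    std::string domain, path, name, value;
    bool includeSubdomains = false;
    bool secure = false;
    long expires = 0;                 // seconds since the epoch; 0 = session
    bool IsSessionCookie() const { return expires == 0; }
};

// Parse "domain\tflag\tpath\tsecure\texpires\tname\tvalue".
// Returns false for comments, blank lines and malformed lines.
bool ParseCookieLine(const std::string &line, CookieLine &c)
{
    if (line.empty() || line[0] == '#')
        return false;

    std::vector<std::string> f;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t'))
        f.push_back(field);
    if (f.size() != 7)
        return false;

    c.domain = f[0];
    c.includeSubdomains = (f[1] == "TRUE");
    c.path = f[2];
    c.secure = (f[3] == "TRUE");
    c.expires = std::strtol(f[4].c_str(), nullptr, 10);
    c.name = f[5];
    c.value = f[6];
    return true;
}
```

A production reader would also need to strip a trailing CR and accept an empty value field, which std::getline silently drops when it is the last field on the line.
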
Tue Jan 28 12:11:27 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtCookieInFileJar.[h,cc]: class for importing cookies from a text file

Tue Jan 28 12:08:20 CET 2003 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htlib/HtDateTime.h: added the constructor HtDateTime(const int)

Sat Jan 25 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htsearch/Display.cc: Convert "<br>\n" in $(DESCRIPTION) to "<br>"
so it can be used in Javascript (feature request #529926).

Tue Jan 21 2003 Lachlan Andrew <lha at users.sourceforge.net>

* HTML.cc (HTML, parse): Handle noindex_start/end as string lists.

* test/{t_htsearch,htdocs/set1/script}: Test the above

* htcommon/defaults.cc:
Add "<SCRIPT" to default noindex_start/end (feature request #586359).

* htlib/String.cc (operator>> (istream&,String&) ):
Exit loop when getline fails for reasons other than a full buffer.

* htnet/HtFile.cc (File2Mime), installdir/HtFileType:
Allow file names containing spaces.

Sat Jan 11 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htnet/HtFile.cc (Request), htdig/Document.cc (RetrieveLocal),
htcommon/URL.h htcommon/URLTrans.cc:
Decode URL paths before use as local filenames (file:/// & local_urls).

* test/{t_htdig,t_htdig_local,t_htsearch}, test/conf/htdig.conf2.in,
test/htdocs/set1/{index.html,site 1,sub%20dir/empty file.html}:
Tests for the above.

* htcommon/HtConfiguration.cc: brackets around assignment in 'if'.
* test/search.cc (LocationCompare): Only specify default arg once.

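Decoding a URL path before using it as a local filename means turning each %XX escape back into the byte it encodes, so that a path such as /sub%20dir/file.html refers to a local directory called "sub dir". The sketch below assumes a hypothetical DecodeURLPath helper (it is not the URLTrans.cc code): malformed escapes are copied through unchanged, and '+' is deliberately left alone because it only stands for a space in query strings.

```cpp
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not one.
static int HexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode %XX escapes in a URL path so it can be used as a local filename.
// Malformed or truncated escapes are copied unchanged; '+' is not treated
// as a space, since that convention applies to query strings, not paths.
std::string DecodeURLPath(const std::string &path)
{
    std::string out;
    out.reserve(path.size());
    for (std::string::size_type i = 0; i < path.size(); ++i) {
        if (path[i] == '%' && i + 2 < path.size()) {
            int hi = HexValue(path[i + 1]);
            int lo = HexValue(path[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;               // skip the two hex digits
                continue;
            }
        }
        out += path[i];
    }
    return out;
}
```
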
Fri Jan 10 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htlib/String.cc (operator>> (istream&,String&) ):
Check status of stream, not return value of get().
Fixes bug (for some C++ libs) where reading stops at a blank line.

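The stream-reading fixes to String::operator>> rely on telling apart the ways istream::getline() can stop: it found the newline, it filled the buffer (failbit set without eofbit), or it hit end of file or an error. The sketch below reads a line of any length through a small fixed buffer, keeps looping only in the buffer-full case, and leaves the decision to the caller by returning the stream, whose state is then tested. ReadLine is a hypothetical function, not the htlib String code.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying ReadLine on strings
#include <string>

// Hypothetical sketch, not the htlib String::operator>> source: read one
// line of arbitrary length through a deliberately small buffer.
std::istream &ReadLine(std::istream &in, std::string &out)
{
    char buf[8];
    out.clear();
    for (;;) {
        in.getline(buf, sizeof(buf));
        out += buf;                 // getline always NUL-terminates buf
        // failbit without eofbit/badbit means the buffer filled before
        // the newline: clear the flag and keep reading the same line.
        if (in.fail() && !in.eof() && !in.bad()) {
            in.clear();
            continue;
        }
        // Newline found, end of file, or a real error: stop here and let
        // the caller inspect the stream state.
        return in;
    }
}
```

Because the caller tests the stream rather than a character count, an empty line comes back as an empty string with the stream still good, instead of ending the loop.
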
Fri Jan 1 2003 Lachlan Andrew <lha at users.sourceforge.net>

* htnet/HtFile.cc(Ext2Mime,Request), htdig/Document.cc(RetrieveLocal):
Determine local files' MIME types from mime.types, not hard-coded.
URLs matching attribute "bad_local_extensions" must use their true
transport protocol (HTTP for http://, filesystem for file:///).

* htnet/HtFile.cc (File2Mime, Request): For file:/// URLs only,
files without (or with unrecognised) extensions are checked by
the program specified by the "content_classifier" attribute.

* htnet/HtFile.cc (Request): Symbolic links are treated as
redirects, to avoid problems with relative references.

* htcommon/defaults.cc: Documented the above (and added crossrefs).

* test/t_ht{dig,dig_local,search}, test/htdocs/set1/*,
test/conf/htdig.conf2.in: Add tests for bad_local_extensions.

Mon Dec 31 2002 Lachlan Andrew <lha at users.sourceforge.net>

* configure.in,htfuzzy/EndingsDB.cc,htlib/{HtR,r}egex.h,Makefile.am:
Renamed regex.h to gregex.h and allow use of rx instead.

* htcommon/defaults.cc,htdocs/{attrs,cf_byprog,cf_byname}.html:
Fixed typo in cross-references to restrict and limit_urls_to.

* test/t_htmerge: Re-enabled htmerge command (discarding output).

* test/Makefile,test/conf/htdig.conf3.in: Added conf3 and fixed db path.

Mon Dec 30 2002 Lachlan Andrew <lha at users.sourceforge.net>

* contrib/doc2html/*: Incorporated David Adams' latest version, 3.0.1.

Mon Dec 30 2002 Lachlan Andrew <lha at users.sourceforge.net>

Forward-ported several patches from 3.1.6:

* htdig/ExternalParser.cc: Added "description_meta_tag_names" attrib.
Added "dc.date|dc.date.created|dc.date.modified" synonyms for "date".
Allow spaces between "url" and "=" in refresh.
Fixed bug in flag positions.
Added "use_doc_date" attribute.

* htdig/HTML.cc: Added "description_meta_tag_names" attribute.
Added "dc.date|..." synonyms.
Added "ignore_alt_text" attribute.

* htdig/Retriever.cc: Added "ignore_dead_servers" attribute.
Added call to url.rewrite() in got_href().

* htdig/FAQ.html: Latest version now 3.1.6. Mention old security hole.
Describe external converters for PostScript etc.
Mention pdf_parser not supported in 3.2.

* htdoc/{attrs,cf_byname,cf_byprog}.html: New attributes added
(automatically from defaults.cc).

* htdoc/htmerge.html: Update for multiple database support.

* htdoc/hts_form.html: Describe relative/incomplete dates.

* htdoc/require.html: Describe phrase searching, external parsers,
external transports.
Added some new supported systems. (Commented out as testing
incomplete.)

* htfuzzy/Synonym.cc: Protect against "synonym" entries with one word.

* htlib/String.cc: Protect against negative string lengths.

* htsearch/Display.{cc,h}: Added "search_result_contenttype" attribute,
and corresponding displayHTTPheaders() function.
Rewrite URLs.
Remove old "ANCHOR" variable.
Handle relative dates.
Added "max_excerpts" attribute and buildExcerpts() function.
Added "anchor_target" attribute.

* htsearch/DocMatch.h: Added "orMatches"

* htsearch/htsearch.cc: Added "boolean_keywords" attribute.
Rewrite URLs.

* htsearch/parser.cc: Added "boolean_syntax_errors" attribute.
Added wildcard search.
Fixed bug in perform_phrase() so it now handles "bad words" and
short words properly.
Added "multimatch_factor" to give greater weight to documents matching
multiple "OR" terms.

* htsearch/htparser.h: Added boolean_keywords support.

* htcommon/defaults.{cc,xml}: New attributes added, and enhanced
descriptions

Cleaned code to remove some compiler warnings/errors:

* htcommon/HtConfiguration.cc: Brackets around assignment 'path='
inside 'if'

* htdig/Server.cc, htsearch/Display.cc:
Added ".get()" when strings passed as arguments.

* htlib/StringMatch.h, htword/WordBitCompress.h:
Explicit cast of NULL to (char*)NULL for broken C++ compilers.

Also:

* STATUS: Removed "not all htsearch input parameters handled properly",
"Return all URLs", "Turn on URL parser test",
"htsearch phrase support tests".
Reduced list of things to do for "require.html".

* test/t_htsearch, test/conf/htdig.conf3.in:
Added testing of phrases and boolean_keywords / boolean_syntax_errors.

Thu Nov 28 09:02:46 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* installdir/english.0: Removed S flag from birth, because it doesn't
do what we want (birthes, not births).

Tue Nov 26 23:16:08 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdoc/hts_form.html: Fixed typo in link & description for restrict.

Tue Nov 26 22:30:06 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* installdir/english.0: Patched with Lachlan Andrew's changes, fixing
lots of dubious uses of suffixes to get more appropriate and correct
fuzzy endings expansions.

* installdir/synonyms: Updated with the version contributed by
David Adams, with minor changes. Kept old one as synonyms.original.

Mon Nov 4 10:44:35 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htcommon/URL.[h,cc]: added the assignment operator

Sun Oct 27 09:29:02 2002 Geoffrey Hutchison <ghutchis at localhost>

Merge in word DB zlib patch from Neal Richter.

* db/db.h.in, db/mp_cmpr.c, htword/WordList.cc,
htword/WordDBCompress.h, htword/WordDBCompress.cc: Add support for
using the zlib compression (and compression level) if specified by
the new wordlist_compress_zlib, which is "true" by default.

* htcommon/defaults.cc: Add attribute wordlist_compress_zlib as
above.

* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Update using cf_generate.pl.

Sat Oct 26 21:59:01 2002 Geoffrey Hutchison <ghutchis at localhost>

Merge in fixes from Lachlan Andrew

* test/Makefile.am, test/Makefile.in, test/t_url, test/url.cc,
test/url.children, test/url.parents, test/url.output: Add URL
tests to the automatic test suite (rather than requiring them to
be run manually).

* */Makefile.in: Regenerate using automake-1.4p6.

* htcommon/URL.cc, htcommon/URL.h: Add new configuration attribute
allow_double_slash to only remove // marks when requested (since
some server-side code uses them), handle initial protocols
without double slashes, and only remove the default doc string
from appropriate protocol URLs (e.g. not file), treat ".//" as a
relative path, and collapse /../ *after* // and /./ handling.

* htcommon/defaults.cc: Add documentation for allow_double_slash,
as well as various documentation cleanups.

* htdig/ExternalTransport.cc: Fix minor bug--recognize service
specified as https:// rather than https.

* htdoc/hts_form.html, htdoc/hts_templates.html: Documentation fixes.

* htsearch/htsearch.cc: Create valid boolean query if "exact" not
specified in search_algorithms by adding the exact word with low
weight. Solves PR#405294.

Fri Oct 4 17:05:06 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/defaults.xml: Added first-draft XML version of defaults
file. This will eventually be used to generate defaults.cc and
documentation automatically. (As pointed out by Brian White, this
will make the binaries smaller.)

Wed Sep 25 13:56:31 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc (parse): Fixed handling of JavaScript skipping so it
doesn't get confused by "<" in code.

Thu Sep 19 09:04:50 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/HtHTTP.cc : another check for cookie jar's null pointer

Tue Sep 17 17:41:51 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/defaults.cc (external_protocols): Fixed table formatting
as suggested by Lachlan Andrew.

Thu Aug 29 21:21:34 CEST 2002 Soeren Vejrup Carlsen <svc at users.sourceforge.net>

* htdig/Document.[h,cc]: first steps in FTP handling. HtFTP.h included and
we now test for the 'ftp' protocol in the Document::Retrieve function.
Has not yet been tested!

* htnet/HtFTP.[h,cc]: added class to handle the FTP-protocol. Very
experimental (has not been tested yet).

Fri Aug 9 13:01:05 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* httools/htnotify.cc (readPreAndPostamble): Check for empty strings
in file names, not just NULL, as suggested by Martin Kraemer.

Wed Aug 7 12:11:31 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/ExternalParser.cc (parse): Fixed to impose max_doc_size
restriction on external converter output which it reads in.

Tue Aug 6 18:21:11 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

* these changes were suggested by David Reed <DReed1 at citgo.com> (thanks)

* htdig/Document.cc: manage cookies via SSL

* htnet/HtCookie.[h,cc]: supports both the RFC2109 and the Netscape version

* htnet/HtCookieJar.cc: ditto

Tue Aug 6 17:12:22 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htcommon/defaults.cc: added the 'http_proxy_authorization' attribute.
Needs revision due to my usual *spaghetti* English. :-)

* htdig/Document.[h,cc]: proxy authorization is now enabled

Tue Aug 6 09:28:39 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

* htnet/Connection.[h,cc]: IP address now stored as a string (sync with ht://Check)

* htnet/Transport.[h,cc]: HTTP Proxy and Basic credentials handling moved here (ditto)
through the use of a protected static method

* htnet/HtHTTP.h: SetCredentials declared to be virtual (unnecessary because inherited,
but gives better understanding); new method SetProxyCredentials for
proxy authorization.

* htnet/HtHTTP.cc: HTTP header Proxy-Authorization is now handled. The
SetCredentials and SetProxyCredentials methods now make use of the
Transport::SetHTTPBasicAccessAuthorizationString method, in order to
write the string for negotiating the access.

Fri Aug 2 15:40:18 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc (Retrieve): Allow redirects from HTTPSConnect.

Tue Jul 30 12:46:56 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/md5.cc: Added missing include of stdlib.h, as Geoff suggested.

Sat Jul 27 11:57:25 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/SSLConnection.cc: Add fix for segfault on SSL connections
	noticed by several users. Fix contributed by Andy Bach
	<afbach at users.sourceforge.net>.

Tue Jun 18 10:22:01 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Retriever.cc (got_word): Check that the word length meets
	the minimum word length before doing any processing.

Fri Jun 14 17:26:21 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (buildMatchList), htsearch/HtURLSeedScore.cc
	(Match), htsearch/SplitMatches.cc (Match): Added Jim Cole's fix for
	bugs in handling of search_results_order.

Wed May 15 09:45:40 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/Retriever.cc: fixed the bug regarding the server_wait_time
	feature after the maximum number of requests per connection has been
	reached.

Tue Apr 9 16:41:33 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookie*.[h,cc]: RFC2109 compliant.
	* htlib/HtDateTime.[h,cc]: Add const-ness to the DiffTime static method

Tue Apr 9 12:52:30 CEST 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookie.cc: fixed a bug regarding expiry date recognition

Fri Apr 5 14:08:39 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalTransport.cc (Request): Fixed to strip CR from
	header lines, and to output header lines with -vvv.

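Stripping the CR matters because HTTP-style header lines end in CRLF. When such a line is read one at a time, it otherwise keeps a trailing '\r' that breaks later comparisons. A minimal sketch follows, assuming a hypothetical stripLineEnding() helper. It also drops the newline, which the real ExternalTransport code may handle elsewhere.

```cpp
#include <string>

// Hypothetical helper: drop any trailing CR/LF characters from a header
// line, so "Content-Type: text/html\r\n" becomes "Content-Type: text/html".
std::string stripLineEnding(std::string line)
{
    while (!line.empty() && (line.back() == '\r' || line.back() == '\n'))
        line.pop_back();
    return line;
}
```
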
Tue Mar 19 08:40:54 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookie.cc: enhanced checks on the expires setting
	when no expires value is returned. Prevents NULL pointer exceptions
	from arising.

Mon Mar 18 11:28:02 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htlib/HtDateTime.h: added the copy constructor
	* htnet/HtCookie.cc: fixed a NULL pointer bug regarding 'datestring'
	management, and the HtDateTime copy constructor is now used

Tue Mar 12 18:19:49 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/HtDateTime.cc (Parse, SetFTime): Added Parse method for
	more flexible parsing of LOOSE/SHORT formats, use it in SetFTime.
	Also skip unexpected leading spaces in SetFTime, as these frequently
	cause problems with some strptime() implementations.

Mon Feb 11 23:28:37 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Retriever.h (got_redirect): Add referer to properly handle
	broken links through a redirect as reported by Joe Jah.

	* htdig/Retriever.cc: As above.

	* htdig/Document.cc (Retrieve): Fix bug that prevented external
	transport methods from reporting redirects as reported by Jamie
	Anstice <Jamie.Anstice at sli-systems.com>.

	* htlib/Dictionary.cc (hashCode): Trial of hash function suggested
	by Jamie Anstice.

Sat Feb 9 18:06:29 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htsearch/DocMatch.[h,cc]: Add scoring code for the new htsearch
	framework.

Thu Feb 7 11:32:14 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.cc (ReadChunkedBody): checks the return values of the
	Read_Line methods (returns an error when they fail).

Fri Feb 1 17:12:31 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* Merged htdig-3-2-x branch back into CVS mainline.

	* ChangeLog.0: Update with current 3.1.6 ChangeLog.

Thu Jan 24 18:06:04 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* configure.in, aclocal.m4: Use new CHECK_SSL macro from the
	autoconf archive.

	* configure: Generate via autoconf.

Fri Jan 18 11:15:29 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/Transport.h (class Transport): Add const to SetCredentials
	method declaration as pointed out by Roman Maeder.

Wed Jan 16 13:35:26 2002 Geoff Hutchison <ghutchis at wso.williams.edu>

	* db/db.h.in: Add #include <sys/stat.h>, which seems to help with
	stat64 conflicts on Solaris, as suggested by Gilles.

Sat Jan 12 16:19:55 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: A few changes to the wording and formatting
	of the 'accept_language' attribute description.
	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Fri Jan 11 21:18:00 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/defaults.cc: added the 'accept_language' attribute

Fri Jan 11 20:53:36 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.[h,cc]: management of the accept-language directive added
	* htcommon/URL.[h,cc]: const-ness in copy constructor and other cosmetic changes
	* htlib/Server.[h,cc]: management of the 'accept_language' attribute as
	a server block configuration directive.
	* htlib/Document.cc: sets the attribute above for the HTTP layer

Fri Jan 11 13:25:49 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalTransport.cc (Request): Fixed to allocate access_time
	object before setting it.

Fri Jan 4 12:31:34 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtCookie.cc, htword/WordKeyInfo.cc, htword/WordMonitor.cc,
	test/search.cc: changed all uses of strcasecmp to mystrcasecmp for
	consistency and portability.

Fri Jan 4 12:17:10 2002 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtHTTP.cc (HTTPRequest): make the second comparison of the
	transfer-encoding header the same as the first, i.e. case insensitive
	and limited to 7 characters.

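Both comparisons now amount to a case-insensitive test of the first 7 characters of the Transfer-Encoding value against "chunked". Below is a sketch of that check. isChunked() is a hypothetical helper, not the code actually used in HtHTTP.cc.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true if the Transfer-Encoding value starts with
// "chunked", compared case-insensitively and limited to 7 characters.
bool isChunked(const std::string &value)
{
    static const char token[] = "chunked";
    const std::string::size_type len = sizeof(token) - 1;  // 7
    if (value.size() < len)
        return false;
    for (std::string::size_type i = 0; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(value[i]);
        if (std::tolower(c) != token[i])
            return false;
    }
    return true;
}
```

Only 7 characters are compared, so a value such as "chunkedX" is also accepted. This mirrors a length-limited, strncasecmp-style comparison.
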
Fri Jan 4 15:13:13 CET 2002 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.cc: parse the transfer-encoding header case-insensitively.
	[fix htdig-Bugs-499388 by Matthias Emmert <Matthias.Emmert2 at start.de>]

Sun Dec 30 15:47:35 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* HtHTTP.[h,cc]: management of the Content-Language directive for the response

Sat Dec 29 13:07:08 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookie.[h,cc]: new fields (srcURL and isDomainValid) and
	a more robust class with initialization list and copy constructor

	* htnet/HtCookieJar.[h,cc]: method for calculating the minimum number
	of periods that a domain specification of a cookie must have,
	according to the Netscape cookie specification.

	* htnet/HtCookieMemJar.cc: Management of the domain field of the cookie

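The Netscape cookie specification sets a minimum number of periods for a cookie's domain. A domain in one of the seven special top-level domains (COM, EDU, NET, ORG, GOV, MIL, INT) needs at least two periods, and any other domain needs at least three. This stops cookies being set for domains like ".com" or "va.us". Below is a sketch of that rule, using hypothetical minDomainPeriods() and countPeriods() helpers. The actual HtCookieJar method name and signature are not given in this log.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: minimum number of periods a cookie domain needs,
// per the Netscape rule (2 for the seven special top-level domains,
// 3 for everything else).
int minDomainPeriods(const std::string &domain)
{
    static const char *special[] = {"com", "edu", "net", "org", "gov", "mil", "int"};

    // Top-level domain = text after the last period, lower-cased.
    std::string::size_type dot = domain.rfind('.');
    std::string tld = (dot == std::string::npos) ? domain : domain.substr(dot + 1);
    std::transform(tld.begin(), tld.end(), tld.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    for (const char *s : special)
        if (tld == s)
            return 2;
    return 3;
}

// Hypothetical helper: number of periods actually present in the domain.
int countPeriods(const std::string &domain)
{
    return static_cast<int>(std::count(domain.begin(), domain.end(), '.'));
}
```

A cookie would then be accepted only if countPeriods(domain) >= minDomainPeriods(domain). For example, ".va.us" has two periods but needs three, so it is rejected.
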
Mon Dec 17 06:45:02 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htdig/htdig.cc: fixed a bug in cookie jar creation. It is done
	here, because there is only one jar for the whole process. However
	it can be moved anywhere else. :-)

Mon Dec 17 06:40:25 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.cc: check for null pointer of cookie jar

Sun Dec 16 19:55:07 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/Connection.[h,cc]: default constructor is changed and accepts
	a socket value (-1 by default)
	* htnet/HtCookieJar.[h,cc]: added a simple iterator
	* htnet/HtCookieMemJar.[h,cc]: ditto
	* htnet/HtFile: removed the management of modification_time (constructor)
	* htnet/HtHTTP.[h,cc]: constructor with initialization list and without
	a default constructor (the construction is now forced to pass a valid
	connection object). Removed any memory deletion from the destructor.
	The class is now abstract (see the pure virtual destructor).
	* htnet/HtHTTPBasic.cc: creates a Connection object in the initialization
	and the destructor is not responsible for deleting it
	* htnet/HtHTTPSecure.cc: creates an SSLConnection object in the initialization
	and the destructor is not responsible for deleting it
	* htnet/HtNNTP.cc: creates a Connection object in the initialization
	and the destructor is not responsible for deleting it
	* htnet/Transport.[h,cc]: default constructor accepts a pointer to a
	Connection object and the destructor carries out the deletion of it

Thu Dec 6 13:24:30 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/examples/rundig.sh: Fixed to make use of DBDIR variable,
	and to test for and copy db.words.db.work_weakcmpr if it's there.

Fri Oct 19 11:07:33 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc (IsValidURL): Fixed discrepancies in debug
	levels for messages giving cause of rejection, inadvertently
	changed when regex support added.

Wed Oct 17 15:48:23 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalTransport.h: Added missing class keyword on friend
	declaration.

Tue Oct 16 14:35:16 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (external_parsers): Documented external converter
	chaining to same content-type, e.g. text/html->text/html-internal.
	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Mon Oct 15 22:25:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Document.cc, htdig/htdig.cc, htdig/Retriever.cc: Make sure
	setEscaped is called with the current value of
	case_sensitive. Fixes bug pointed out by Phil Glatz.

Fri Oct 12 17:14:08 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/htdump.html, htdoc/htload.html: Fixed 3 little typos.

Fri Oct 12 15:11:45 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtHTTP.cc (ParseHeader): Show header lines in debugging
	output at verbosity level 3, not 4, for consistency with 3.1.x.

	* htcommon/URL.cc (removeIndex): Fixed to make sure the matched
	file name is at the end of the URL.

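The removeIndex fix amounts to checking that the matched default document name is the last path component, not merely a substring somewhere in the URL. Below is a sketch under that reading. It uses a hypothetical free function removeIndex() with an explicit list of names; the real method is a member of the URL class.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: strip a trailing default document name (for
// example "index.html") from a URL path, but only when that name is the
// final path component, so "/docs/index.html.bak" and
// "/docs/myindex.html" are left alone.
std::string removeIndex(const std::string &path,
                        const std::vector<std::string> &indexNames)
{
    for (const std::string &name : indexNames) {
        if (path.size() <= name.size())
            continue;
        std::string::size_type start = path.size() - name.size();
        // The name must sit at the very end and be preceded by a '/'.
        if (path.compare(start, name.size(), name) == 0 && path[start - 1] == '/')
            return path.substr(0, start);
    }
    return path;
}
```
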
Fri Oct 12 10:39:54 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/HtRegexList.cc (setEscaped): Fixed to set compiled flag to
	FALSE when there's no pattern, so match() can detect this condition.
	Fixes handling of empty lists in bad_querystr, exclude_urls, etc.

	* htdig/Retriever.cc (IsValidURL): Fixed bad_querystr matching to
	look at right part of URL, not whole URL.

Mon Sep 24 11:47:15 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtHTTP.cc (SetRequestCommand): Put If-Modified-Since header
	out in GMT, not local time, and only put it out if existing document
	time > 0.

	* htsearch/parser.cc (perform_phrase): Optimized phrase search handling
	to use a linear algorithm with Dictionary lookups instead of an n**2
	algorithm, as suggested by Toivo Pedaste.

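HTTP dates are always expressed in GMT (RFC 1123 format), which is why the header must not be written in local time. Below is a sketch that assumes a hypothetical ifModifiedSinceHeader() helper and the default "C" locale, so that %a and %b produce English day and month names.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: build an If-Modified-Since header line for a
// document whose stored modification time is 'modified' (seconds since
// the epoch). Returns an empty string when the time is not > 0, in which
// case no header should be sent.
std::string ifModifiedSinceHeader(std::time_t modified)
{
    if (modified <= 0)
        return std::string();

    // gmtime() converts to UTC; HTTP dates must be expressed in GMT.
    std::tm *gmt = std::gmtime(&modified);
    if (gmt == nullptr)
        return std::string();

    // RFC 1123 format, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
    char buf[64];
    std::strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", gmt);
    return std::string("If-Modified-Since: ") + buf;
}
```
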
Tue Sep 18 10:50:40 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/running.html: New documentation on how to run after configuring.
	* htdoc/rundig.html: New manual page for rundig script.
	* htdoc/install.html: Added link to running.html.
	* htdoc/contents.html: Added link to running.html, rundig.html, related
	projects. Updated links to contrib and developer site.

Fri Sep 14 22:12:56 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/URL.h: Moved DefaultPort() from private to public for
	use in HtHTTP.cc.

Fri Sep 14 09:25:20 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtHTTP.cc (SetRequestCommand): Add port to Host: header when
	port is not default, as per RFC2616(14.23). Fixes bug #459969.

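RFC 2616 section 14.23 says the Host header carries the port only when it is not the default for the scheme. Below is a minimal sketch with a hypothetical hostHeaderValue() helper. The defaultPort argument stands in for the value that URL::DefaultPort() supplies; that method was made public for use in HtHTTP.cc.

```cpp
#include <string>

// Hypothetical helper: value of the Host: request header. The port is
// appended only when it differs from the scheme's default port.
std::string hostHeaderValue(const std::string &host, int port, int defaultPort)
{
    if (port == defaultPort)
        return host;
    return host + ":" + std::to_string(port);
}
```
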
Sat Sep 8 22:15:33 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* acconfig.h, include/htconfig.h.in: Add undef for
	ALLOW_INSECURE_CGI_CONFIG, which if defined does about what you'd
	expect. (This is for any wrapper authors who don't want to rewrite
	but are willing to run insecure.)

	* htsearch/htsearch.cc: Only allow the -c flag to work when
	REQUEST_METHOD is undefined. Fixes PR#458013.

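The restriction presumably exists so that a request arriving through the web server cannot choose htsearch's configuration file with -c. A set REQUEST_METHOD indicates that the program is running as a CGI. Below is a sketch of the check, assuming a hypothetical allowConfigOverride() function. Defining ALLOW_INSECURE_CGI_CONFIG re-enables -c under CGI, as the entry describes.

```cpp
#include <cstdlib>

// Hypothetical check: honour the -c (configuration file) option only when
// not running as a CGI program, i.e. when REQUEST_METHOD is not set.
bool allowConfigOverride()
{
#ifdef ALLOW_INSECURE_CGI_CONFIG
    return true;  // builder explicitly opted into the insecure behaviour
#else
    return std::getenv("REQUEST_METHOD") == nullptr;
#endif
}
```
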
Tue Sep 4 18:58:31 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htsearch/DocMatch.cc: Add scoring for Quim's new parser
	framework. Only the normal word scoring is currently done, not
	backlink_factor or other "Document" methods.

Fri Aug 31 15:34:28 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.h, htdig/HTML.cc (ctor, parse, do_tag): Fixed buggy
	handling of nested tags that independently turn off indexing, so
	</script> doesn't cancel <meta name=robots ...> tag. Add handling
	of <noindex follow> tag. Added <> delim. to tag debugging output.
	Fixed a few typos.

Wed Aug 29 10:33:01 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (url_part_aliases): Added clarification
	explaining how to use the example.

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Mon Aug 27 15:05:09 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* installdir/search.html: Add DTD tag for HTML 4 compliance.
	* installdir/htdig.conf: Added .css to bad_extensions default,
	added missing closing ">".
	* htdoc/config.html: Updated with sample of latest htdig.conf and
	installdir/*.html.

Wed Jul 25 22:16:06 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: Put new htnotify_* entries in alphabetical
	order. Removed superfluous quotes from htnotify_webmaster example
	(htnotify.cc adds in the quotes).
	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Tue Jul 24 16:07:01 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: Changed references in (no_)page_number_text
	entries from maximum_pages to maximum_page_buttons.
	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Tue Jul 24 14:38:22 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/hts_templates.html: Document Quim Sanmarti's URL decoding
	feature for template variables.

Thu Jul 12 14:12:02 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtFile.cc (Request): Fixed so it doesn't remove newlines
	from documents, and so it only tries to open mime.types once even
	if the open fails.

Thu Jul 12 11:40:07 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/conv_doc.pl, contrib/parse_doc.pl: Fixed EOF handling in
	dehyphenation, fixed to handle %xx codes in title made from URL.

	* contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl,
	contrib/doc2html/swf2html.pl: Fixed to handle %xx codes in URL title.

Wed Jul 11 15:05:47 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (readFile): Added missing fclose() call, and
	debugging message for when file can't be opened.

Wed Jul 11 14:26:28 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (displayParsedFile): Added debugging message
	when file can't be opened.

	* htsearch/Display.cc (buildMatchList): Fixed while loop to avoid
	warning.

	* htsearch/htsearch.cc (main): Fixed handling of syntax error message
	to use String class instead of strdup().

	* htsearch/parser.cc (setError): Added debugging message when error
	is set.

	* htsearch/parser.cc (parse): Fixed not to clear error message after
	it's set.

Sat Jul 7 22:19:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* */Makefile.in: Update using current production automake
	(1.4-p4).

	* htfuzzy/Regexp.[cc,h]: Change class name to Regexp to prevent
	further namespace clashes.

	* htfuzzy/Fuzzy.c: #include "Regexp.h" now and make sure we create
	the right class when needed.

	* htlib/mktime.c: Change included mktime declaration to mymktime
	to avoid conflict on Mac OS X. (For some reason, autoconf's
	AC_FUNC_MKTIME doesn't work for Mac OS X. So this is a hack in the
	meantime.)

	* htfuzzy/Makefile.am: Rename Regex files. Oops!

Fri Jul 6 18:38:58 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htfuzzy/Regexp.cc, htfuzzy/Regexp.h: Rename Regex class to
	prevent problems on case-insensitive systems.

	* htlib/HtRegexReplaceList.cc, htlib/String.cc, htdig/htdig.cc:
	Change #include of <stream.h> to the more modern <iostream.h>.

	* htlib/Configuration.cc (Read): Make sure we never reference a
	negative position when trimming off whitespace.

	* config.guess, config.sub: Update with new versions from GNU to
	recognize various flavors of Mac OS X/Rhapsody.

	* htlib/strptime.cc: Make sure len is initialized.

Fri Jul 6 12:04:52 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/HtRegexList.cc (setEscaped): Fixed a potential problem
	with list building. When we go back a step, we still have to
	compile the new pattern in case it's the last one.

Wed Jul 4 23:39:19 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/URL.cc (parse, ServerAlias): Fixed two problems that
	caused incorrect signatures to be generated.

Wed Jul 4 13:52:54 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* test/document.cc (dodoc), test/url.cc (dourl),
	test/testnet.cc (Retrieve): Fixed up handling of config to match
	David Graff's changes of May 16, and handling of HtHTTPBasic class
	to match Joshua Gerth's changes of Mar 17.

Tue Jul 3 16:20:56 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc (GetLocal): Fixed to use URL class on given
	URL, so that default port numbers are stripped off. This was needed
	to allow local fetching of robots.txt.

	* htnet/Connection.cc (ctors, dtor, Assign_Server, Get_Peername),
	htnet/Connection.h: Got rid of strdup stuff, used String class for
	peer & server_name.

	* htnet/Connection.cc (Get_PeerIP): Used unambiguous name for structure.

	* htnet/HtHTTP.cc (ctor, dtor): Don't allocate a 2nd Connection, as
	child classes already do this, and set pointer to null when connection
	is deleted, so we don't try to delete it twice. This was messing up
	the heap and causing segfaults. Call Transport::CloseConnection before
	deleting connection.

	* htnet/HtHTTPBasic.cc (dtor), htnet/HtHTTPSecure.cc (dtor),
	htnet/HtNNTP.cc (dtor): Only delete connection if non-null, & set
	to null after deleting. Call Transport::CloseConnection before
	deleting connection.

	* htnet/Transport.cc (CloseConnection): Don't exit if connection
	pointer is null, as this may be normal when called from destructor.

Fri Jun 29 11:14:36 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htfuzzy/Endings.cc (getWords): Undid change introduced in 3.1.3,
	in part. It now gets permutations of word whether or not it has
	a root, but it also gets permutations of one or more roots that
	the word has, based on a suggestion by Alexander Lebedev.
	* htfuzzy/EndingsDB.cc (createRoot): Fixed to handle words that have
	more than one root.
	* installdir/english.0: Removed P flag from wit, like and high, so
	they're not treated as roots of witness, likeness and highness, which
	are already in the dictionary.

Mon Jun 25 12:50:47 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/htsearch.cc (main): Got rid of last remnants of 'urllist'
	and used the 'l' StringList as was used in the code before, to make
	restrict and exclude handling work properly.

Mon Jun 25 15:52:19 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htsearch/htsearch.cc: defined 'urllist' in order to remove the
	compilation error (as Jesse suggested).

Fri Jun 22 16:28:13 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (buildMatchList): Fix date_factor calculation
	to avoid 32-bit int overflow after multiplication by 1000, and avoid
	repetitive time(0) call, as contributed by Marc Pohl. Also move the
	localtime() call up before gmtime() call, to avoid clobbering gmtime's
	returned static structure (my thinko).

	* htdig/htdig.cc (main): Use .work file for md5_db, if -a given,
	as contributed by Marc Pohl.

	* htcommon/URL.cc (constructURL): Ensure that the _host is set if we
	are constructing non-file urls, as contributed by Marc Pohl.

	* htdoc/THANKS.html: Credit Marc Pohl for patches.

Tue Jun 19 17:14:05 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* README: Bump up to 3.2.0b4, fix note about bug report submissions.

Tue Jun 19 17:01:16 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (setVariables): Fixed handling of
	build_select_lists attribute, to deal with new restrict & exclude
	attributes.

Mon Jun 18 12:16:27 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* configure.in, configure: Fix "hdig" typo in help.

Fri Jun 15 17:57:19 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: Noted effect of locale setting on floating
	point numbers in search_algorithm and locale descriptions.
	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Fri Jun 15 15:36:51 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/cf_generate.pl: Fixed to handle new defaults.cc format
	with trailing backslashes.

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Fri Jun 15 14:57:21 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdb/htdb_dump.cc, htdb/htdb_load.cc, htdb/htdb_stat.cc: Added a
	conditional include of <getopt.h> if HAVE_GETOPT_H is defined.

Fri Jun 15 11:25:24 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/htsearch.cc (main), htcommon/defaults.cc,
	htdoc/hts_form.html: two new attributes, used by htsearch, have
	been added: restrict and exclude. They give more control over
	template customisation through configuration files, allowing
	URLs to be restricted or excluded from the search without passing
	any CGI variables (although CGI variables, when given, override
	the configuration settings).

Fri Jun 15 09:34:23 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/htsearch.cc (main): Changed ridiculously outdated question
	"Did you run htmerge?" to "Did you run htdig?".

Fri Jun 8 11:07:04 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htsearch/Display.cc: Add <float.h> header, now needed for RH 7.1.

Thu Jun 7 12:05:09 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/htdig-3.2.0.spec: Updated to 3.2.0b4.

	* contrib/README: Mention acroconv.pl script.

Thu Jun 7 10:46:19 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (expandVariables): Use isalnum() instead of
	isalpha() to allow digits in variable names, allow '-' in variable
	names too for consistency with attribute name handling.

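After this change a template variable name may contain letters, digits and '-'. Below is a sketch of the name scan, assuming a hypothetical readVarName() helper that starts just after the "$(" of a reference. Accepting '_' as well is an assumption here, since the entry does not mention it.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: characters accepted in a template variable name.
// Letters, digits and '-' follow the entry above; '_' is an assumption.
bool isVarNameChar(char ch)
{
    unsigned char c = static_cast<unsigned char>(ch);
    return std::isalnum(c) || c == '_' || c == '-';
}

// Read the variable name that starts at text[pos]; returns "" if none.
std::string readVarName(const std::string &text, std::string::size_type pos)
{
    std::string::size_type end = pos;
    while (end < text.size() && isVarNameChar(text[end]))
        ++end;
    return text.substr(pos, end - pos);
}
```
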
Wed Jun 6 16:14:06 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* httools/htpurge.cc (main): Added missing "u:" declaration in
	getopt() call.

Wed Jun 6 15:24:04 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/doc2html/DETAILS, contrib/doc2html/README,
	contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl,
	contrib/doc2html/swf2html.pl: Update to version 3.0 of doc2html,
	contributed by David Adams <D.J.Adams at soton.ac.uk>.

Wed May 16 11:23:04 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	Added a pile of changes contributed by David Graff
	<phlat at mindspring.com> fixing compilation problems with
	non-gcc/g++ compilers (e.g. Sun's compiler).

	* Makefile.config, db/Makefile.am: Added no-dependencies to
	AUTOMAKE_OPTIONS for those not on GNU C/C++

	* configure.in: Changed AM_PROG_YACC to AC_PROG_YACC as autoconf
	and autoreconf both complain that AM_PROG_YACC is not in the
	library.

	* htcommon/DocumentDB.cc: Removed default parameters as they are
	already declared in the header

	* htcommon/HtConfiguration.cc: Changed some of the loop
	declarations so that Sparc C 4.2 is happy. Removed default
	parameters as they are already declared in the header. Moved inline
	ParseString to header where it belongs. Added initialization for
	HtConfiguration::_config static member variable. Added
	implementation of HtConfiguration::config() static class member.

	* htcommon/HtConfiguration.h: Added include for ParsedString.h.
	Added declaration of static member function ::config().
	Added private static member variable _config.
	Added inline ParseString from implementation.

	* htcommon/HtURLCodec.cc, htcommon/HtURLRewriter.cc,
	htcommon/HtZlibCodec.cc, htcommon/URL.cc, htcommon/conf_lexer.lxx,
	htdig/Document.cc, htdig/ExternalParser.cc,
	htdig/ExternalTransport.cc, htdig/HTML.cc, htdig/Parsable.cc,
	htdig/Plaintext.cc, htdig/Retriever.cc:
	Changed to use new global configuration semantics.

	* htcommon/conf_parser.yxx: Added a return to yyerror to quiet
	Sparc C 4.2. Should really return a value here. Is it normal to
	return a YY_something or just -1, 0, ...?

	* htcommon/defaults.cc: Added line continuation characters at the
	end of all the string lines that were not terminated by a quote.

	* htcommon/defaults.h, htdig/htdig.h: Removed extern
	HtConfiguration config in favor of HtConfiguration::config().

	* htdig/ExternalTransport.h: Changed return type of GetResponse to
	match superclass.

	* htdig/Server.cc, htdig/htdig.cc, htfuzzy/htfuzzy.cc, htnet/HtFile.cc,
	htsearch/Display.cc, htsearch/QueryLexer.cc, htsearch/WordSearcher.cc,
	htsearch/htsearch.cc, htsearch/parser.cc, htsearch/qtest.cc,
	httools/htdump.cc, httools/htload.cc, httools/htmerge.cc,
	httools/htnotify.cc, httools/htpurge.cc, httools/htstat.cc,
	htlib/Configuration.cc, htlib/HtRegex.cc:
	Changed constructor to use initializers

	* htlib/HtDateTime.cc: Moved inlines to header

	* htlib/HtDateTime.h: Added inlines from implementation

	* htlib/HtHeap.cc, htlib/HtHeap.h, htlib/HtVector.cc, htlib/HtVector.h,
	htlib/HtVectorGeneric.h, htlib/HtVectorGenericCode.h:
	Changed Copy member to return same type as superclass

	* htlib/HtRegexReplace.cc, htlib/HtRegexReplaceList.cc: Removed
	default parameters as they are declared already in the header

	* htlib/myqsort.h: Changed comment in header to use C-style
	comments as it's compiled using a C compiler.

	* htlib/regex.h: Changed #if __STDC__ to #if defined(__STDC__)

	* htword/WordKey.h: Corrected const'ness

Wed May 9 07:50:19 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookieJar.h: ShowSummary makes the class abstract

Sat May 5 20:51:00 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdoc/cf_blocks.html: Add colon in example and description of
	blocks to match code for the moment. The parser can be changed
	later if we like.

Sat May 5 20:38:44 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/ParsedString.cc (get): Use isalnum() instead of isalpha()
	for looking up--allows names that contain digits too.

Sat May 5 20:36:29 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/htString.h (class String): Remove now-obsolete and
	confusing int() casting operator. This was previously used to make
	a string of a certain length. Use String(int) as a ctor instead.

Sat May 5 20:30:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htword/WordContext.[h,cc]: Change Initialize to supply a config
	that can be modified (i.e. if we don't have ZLIB_H).

Sat May 5 23:30:55 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookieJar.h: ShowSummary, printing cookies (to be derived)
	* htnet/HtCookieMemJar.[h,cc]: ShowSummary, printing cookies

Thu May 3 23:14:14 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.[h,cc]: connection object is now created and destroyed.
	NULL pointers converted to C++ standard (0).
	* htnet/Transport.[h,cc]: NULL pointers converted to C++ standard (0).
	* htnet/Connection.[h,cc]: ditto

Thu May 3 23:09:33 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htlib/HtDateTime.[h,cc]: Timestamp format added (used by ht://Check
	for MySQL interfacing) - keeping them equal helps me maintain
	both of them!

Thu May 3 10:28:56 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/parser.cc (perform_and): Add missing return statement,
	as suggested by Quim Sanmarti.

Fri Mar 30 15:50:42 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/ResultMatch.h, htsearch/ResultMatch.cc (setTitle): Changed
	argument type to char * to fix problem with sort by title not working,
	as reported by Adam Lewenberg.

Fri Mar 30 14:08:51 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>
|
||
|
|
||
|
* htdig/Document.h, htdig/Retriever.cc (parse_url): Define and use
|
||
|
Document::StoredLength() method to get actual length of data
|
||
|
retrieved and given to md5(), which may be less than original
|
||
|
length. Fixes bug reported by Michael Haggerty.

Wed Mar 21 22:22:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htsearch/Display.cc (generateStars): Add NSTARS variable for
	template output as suggested by Caleb Crome
	<ccrome at users.sourceforge.net> (except here precision is 0). Fixes
	feature request #405787.

	* htdoc/hts_templates.html: Add description of NSTARS variable
	above.

	* htlib/HtRegex.cc (set): Make sure we free memory if we've
	already compiled a pattern.

	* htdig/Retriever.cc (got_href): Fix bug pointed out by Gilles
	with hopcounts and don't bother to update the DocURL unless we
	have a new doc.
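
The HtRegex.cc (set) fix in this entry is an instance of a common pattern: when a wrapper object is reused, it must release the regex it compiled earlier before compiling a new one. Below is a minimal C++ sketch of that pattern, assuming the POSIX <regex.h> API; the class name RegexHolder and its methods are hypothetical and are not ht://Dig's actual HtRegex interface.

```cpp
#include <cassert>
#include <string>
#include <regex.h>

// Hypothetical wrapper around a POSIX regex_t. set() releases any pattern
// compiled by a previous call before compiling the new one, so reuse
// does not leak the old compiled state.
class RegexHolder {
public:
    RegexHolder() : compiled_(false) {}
    ~RegexHolder() { release(); }
    RegexHolder(const RegexHolder&) = delete;
    RegexHolder& operator=(const RegexHolder&) = delete;

    // Returns true if `pattern` compiled as an extended regex.
    bool set(const std::string& pattern) {
        release();  // free the old compiled pattern first
        compiled_ = regcomp(&re_, pattern.c_str(), REG_EXTENDED | REG_NOSUB) == 0;
        return compiled_;
    }

    bool match(const std::string& text) const {
        return compiled_ && regexec(&re_, text.c_str(), 0, nullptr, 0) == 0;
    }

private:
    void release() {
        if (compiled_) {  // regfree() only after a successful regcomp()
            regfree(&re_);
            compiled_ = false;
        }
    }

    regex_t re_;
    bool compiled_;
};
```

Calling set("^/cgi-bin/") and then set("\\.css$") on the same object leaves only the second pattern compiled; the first regex_t has already been passed to regfree().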

Mon Mar 19 18:00:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/URL.cc (URL): Make sure even absolute relative URLs are
	run through normalizePath() as pointed out by Gilles. Allows
	backout of previous fix of #408586, which does extra re-parsing of
	URL.

	* htdig/Retriever.cc (Need2Get): Back out change of Mar. 17 for above.

	* htcommon/conf_lexer.[cxx, lxx]: Apply change suggested by Jesse
	to remove empty statements.

Mon Mar 19 11:33:25 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtRegexList.cc (setEscaped): Fix assorted bugs, including
	obvious segfault, incorrect creation of limits, and failure to set
	"compiled" flag before return().

	* htdig/Retriever.cc (IsValidURL): Make sure the tmpList is
	cleared before attempting to parse the bad_querystr
	config--otherwise we'll just Add to the end of the list.

Sun Mar 18 14:01:56 CET 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/Transport.[h,cc], htnet/HtHTTP.cc: In order to modularize
	the net code, the default parser string for the content-type has
	been added to the Transport class.
	* htdig/Document.cc: modified for the changes above.

Sat Mar 17 16:38:27 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* configure.in, configure, include/htconfig.h.in: Add tests for
	libssl, libcrypto, and ssl.h.

	* htnet/SSLConnection.[cc,h], htnet/HtHTTPBasic.[cc,h],
	htnet/HTTPSecure.[cc,h]: New files. Contributed by Joshua Gerth
	<jgerth at hmsoaps.com>.

	* htnet/Transport.[cc,h], htnet/HtNNTP.cc, htnet/HtHTTP.cc,
	htnet/Connection.h: Changes needed to support SSLConnection class.

	* htdig/Document.cc, htdig/Document.h: Ditto.

	* htnet/Makefile.am, htnet/Makefile.in: Add above for compilation.

	* htdoc/THANKS.html: Updated with new contributors.

Sat Mar 17 15:28:20 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htword/WordContext.cc (Initialize): If HAVE_LIBZ or HAVE_ZLIB_H
	are not defined, make sure wordlist_compress is set to false. This
	semi-hack will not be necessary with new mifluz code which does
	not necessarily need zlib. Fixes bug #405761.

Sat Mar 17 14:39:17 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/HTML.cc (do_tag): Fixed problems with META descriptions
	containing newlines, returns or tabs. They are now replaced with
	spaces. Fixes bug #405771.
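
The META description fix above amounts to mapping each newline, carriage return and tab to a space. A minimal C++ sketch under that reading; the function name flattenDescription is hypothetical, and the entry does not say whether the real HTML::do_tag code also collapses runs of spaces (this sketch does not).

```cpp
#include <cassert>
#include <string>

// Replace newline, carriage-return and tab characters with plain spaces,
// one for one, so a multi-line META description becomes a single line.
std::string flattenDescription(std::string text) {
    for (char& c : text) {
        if (c == '\n' || c == '\r' || c == '\t') c = ' ';
    }
    return text;
}
```

Because the mapping is one for one, a "\r\n" pair becomes two spaces.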

Sat Mar 17 14:26:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/HTML.cc (do_tag): Improve handling of whitespace in META
	refresh handling. Fixes bug #406244.

	* htlib/HtRegexList.cc (setEscaped): Make this more efficient by
	building up larger and larger patterns--when we fail, go back a
	step and add the pattern in the next loop. This ensures we have a
	list of the maximum allowable length regexp.

	* htdig/Retriever.cc (Need2Get): Add change suggested by Yariv Tal
	to run URLs through the URL parser for cleanup before comparing to
	the visited list. Fixes bug #408586.
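
The HtRegexList::setEscaped() change can be read as a greedy packing loop: keep appending escaped strings to one alternation while it still compiles, and when it stops compiling, keep the previous pattern and start a new one. The C++ sketch below assumes that reading. Because the real failure point depends on the regex library, a fixed length limit (maxLen) stands in for a failed regcomp(); all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for "regcomp() failed on the combined pattern": here a pattern
// is rejected once it exceeds maxLen characters (hypothetical limit).
static bool fitsLimit(const std::string& pattern, std::size_t maxLen) {
    return pattern.size() <= maxLen;
}

// Backslash-escape POSIX ERE metacharacters so each item matches literally.
std::string escapeForRegex(const std::string& s) {
    static const std::string meta = "\\^$.|?*+()[{";
    std::string out;
    for (char c : s) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Greedily join escaped items into "a|b|c" alternations: keep extending the
// current pattern while it still fits; when adding an item fails, keep the
// last good pattern and start the next one with that item.
std::vector<std::string> packPatterns(const std::vector<std::string>& items,
                                      std::size_t maxLen) {
    std::vector<std::string> patterns;
    std::string current;
    for (const std::string& item : items) {
        const std::string piece = escapeForRegex(item);
        const std::string candidate = current.empty() ? piece : current + "|" + piece;
        if (fitsLimit(candidate, maxLen)) {
            current = candidate;
        } else {
            if (!current.empty()) patterns.push_back(current);
            current = piece;  // an over-long single item is kept on its own
        }
    }
    if (!current.empty()) patterns.push_back(current);
    return patterns;
}
```

With a limit of 8 characters, the items a.b, cd and ef pack into two patterns, a\.b|cd and ef, which would then be compiled and tried in turn.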

Mon Mar 12 13:28:56 2001 Michael Haggerty <mhagger at alum.mit.edu>

	* htdig/Retriever.cc, htdig/Retriever.h:
	Fixed two off-by-one errors related to Retriever::factor table.

Mon Mar 12 11:25:31 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/Dictionary.cc (Add): Fix comments about add method--it
	will replace existing keys. Fixes report #407940.

Thu Mar 8 15:31:45 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.cc: removed a useless <else>

Tue Mar 6 11:42:10 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/regex.[c,h]: Update with versions from glibc 2.2.2.

Mon Mar 5 13:47:30 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* ltconfig (host_os): Add test to solve problems building C++
	shared libraries on some platforms. Currently should only make
	--enable-shared the default on Linux and *BSD* unless specified
	explicitly by the user.

Mon Mar 5 12:52:57 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/String.cc (operator =): Add fix contributed by Yariv Tal
	<YarivT at webmap.com>, fixed bug #406075.

Mon Mar 5 12:06:26 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtRegexList.cc (match): Ignore rearrangement code for the
	moment--may or may not be the culprit for bug #405277, but is a
	start to debugging the problem.

	* htlib/List.[cc,h]: Remove *prev pointer from listnode
	structure and add a *prev pointer to the cursor structure. Saves
	one pointer per item in the list, plus overhead.
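
The List.[cc,h] change trades a per-node prev pointer for one kept in the iteration cursor. A minimal C++ sketch of that layout, assuming removal only ever happens at the cursor position; SList, Cursor and the method names are hypothetical, not ht://Dig's List API.

```cpp
#include <cassert>
#include <cstddef>

// Singly linked list whose nodes hold only a `next` pointer. The
// iteration cursor remembers the predecessor of the current node, so the
// current node can still be unlinked in O(1) without a per-node `prev`.
template <typename T>
class SList {
    struct Node {
        T value;
        Node* next;
    };

public:
    struct Cursor {
        Node* current = nullptr;
        Node* prev = nullptr;  // predecessor of current; nullptr at the head
    };

    SList() = default;
    SList(const SList&) = delete;
    SList& operator=(const SList&) = delete;
    ~SList() {
        while (head_) {
            Node* n = head_;
            head_ = head_->next;
            delete n;
        }
    }

    void add(const T& v) {
        Node* n = new Node{v, nullptr};
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
        ++size_;
    }

    void start(Cursor& c) const { c.current = head_; c.prev = nullptr; }
    bool done(const Cursor& c) const { return c.current == nullptr; }
    const T& get(const Cursor& c) const { return c.current->value; }
    void advance(Cursor& c) const { c.prev = c.current; c.current = c.current->next; }

    // Unlink the node under the cursor; the cursor moves to its successor
    // while `prev` stays valid.
    void remove(Cursor& c) {
        Node* victim = c.current;
        if (c.prev) c.prev->next = victim->next; else head_ = victim->next;
        if (victim == tail_) tail_ = c.prev;
        c.current = victim->next;
        delete victim;
        --size_;
    }

    std::size_t size() const { return size_; }

private:
    Node* head_ = nullptr;
    Node* tail_ = nullptr;
    std::size_t size_ = 0;
};
```

The saving is one pointer per node; the cost is that removing an arbitrary node without a cursor needs a linear search for its predecessor.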

Mon Mar 5 11:56:16 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc (bad_extensions): Add .css to ignore CSS docs.

	* htdig/Document.cc (getParsable): Ignore CSS documents -- they
	aren't very useful to parse. Solves bug report #405772.

Sun Mar 04 11:32:43 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.cc: fixed a <no header> bug that occurred with persistent
	connections enabled but the HEAD call before the GET disabled.
	Sourceforge.net's bug reference: 405275 - fixed.

Sat Mar 3 21:09:55 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* .version: Bump to 3.2.0b4 so snapshots have right versioning.

Thu Mar 1 16:51:09 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* configure.in: Added test for alloca.h, which is needed for the
	regex.c code.

Wed Feb 28 12:54:43 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htcommon/defaults.cc: 'disable_cookies' option has been added, with
	a 'server' scope. By default it is set to 'false'.
	* htdig/Server.h, cc: management of the option above has been enhanced.
	* htnet/HtHTTP.h, cc: now an HTTP connection can disable/enable cookies
	through the configuration attribute 'disable_cookies'.
	* htdig/Document.cc: management of cookies enabling/disabling is here.
	* Cookie classes: now support the expiration time. Only subdomain
	handling is still missing.

Mon Feb 26 16:37:30 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/conf_lexer.lxx: Don't directly call exit(1) on an error
	condition! That seems a harsh response to an unknown character.

	* htcommon/conf_parser.yxx: Ditto. (Running out of memory is a
	much more fatal condition, of course.)

	* htcommon/conf_lexer.cxx: Regenerate using flex 2.5.4.

	* htcommon/conf_parser.cxx: Regenerate using bison 1.28.

Sun Feb 25 19:46:01 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtHTTP.h, cc: support for cookies enabled
	* htnet/Makefile.am: files for cookies have been added to make.

Sun Feb 25 19:27:18 CEST 2001 Gabriele Bartolini <angusgb at users.sourceforge.net>

	* htnet/HtCookie.h,cc: HTTP cookie class
	* htnet/HtCookieJar.h,cc: abstract class for managing the
	'jar' of cookies. In this way, we can use different methods
	for storing them.
	* htnet/HtCookieMemJar.h,cc: class for managing the 'jar' of
	cookies in memory, without persistent storage (no db or file).
	* Many thanks to Robert LaFerla for his coding on this! Yeah,
	really really thanks Robert! <robertlaferla at mediaone.net>

Thu Feb 22 16:43:18 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdoc/ChangeLog, htdig/RELEASE.html, README: Update to roll the
	release of 3.2.0b3.

Thu Feb 22 16:22:05 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/htsearch.cc (main), htsearch/Display.cc (setVariables,
	createURL, buildMatchList), htdoc/hts_form.html,
	htdoc/hts_templates.html: Add Mike Grommet's date range search
	feature.

Mon Feb 19 18:24:42 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htfuzzy/Synonym.cc (createDB): Create database in a temporary
	directory before we move it into place, much like the endings
	code. This should prevent problems when we just append to the DB
	instead of making a new one.

	* htdig/htdig.cc (main): Fix bug discovered by Gilles--htword
	should be initialized *after* we are finished modifying config
	attributes based on flags and unlink with -i.

	* installdir/rundig: Fix bug with calling htpurge with -s option.
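
The Synonym.cc (createDB) change in this entry uses the build-then-move-into-place idiom. A minimal C++ sketch, assuming a plain text file rather than a Berkeley DB file, and a temporary name in the same directory (rename() cannot move a file across filesystems); replaceFile and the .work suffix are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Write the complete new contents under a temporary name first, then
// rename() it over the final path. Readers see either the old file or the
// finished new one, and old contents are replaced rather than appended to.
bool replaceFile(const std::string& path, const std::vector<std::string>& lines) {
    const std::string tmp = path + ".work";  // hypothetical temp-name scheme
    std::ofstream out(tmp, std::ios::out | std::ios::trunc);
    if (!out) return false;
    for (const std::string& line : lines) out << line << '\n';
    out.close();  // flush; sets failbit on error
    if (!out) {
        std::remove(tmp.c_str());
        return false;
    }
    // On POSIX, rename() atomically replaces an existing target when both
    // names are on the same filesystem.
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Writing a fresh file each run, instead of opening the existing one, is what prevents stale entries from a previous run from surviving.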

Thu Feb 15 11:03:42 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdoc/*.html: Update with 2001 copyrights and various changes
	with the website move for the pending 3.2.0b3 release.

Thu Feb 15 10:41:47 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtRegexList.cc (match): Fix thinko with logic for matching
	and add code to rearrange matching nodes for hopefully better
	performance.

Sun Feb 11 16:42:11 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtRegexList.h, htlib/HtRegexList.cc (class HtRegexList):
	Simple List(HtRegex) object with similar calling conventions to
	HtRegex class. This version is not as sophisticated as it could
	be, but it's not likely to drop objects when reorganizing.

	* htlib/Makefile.[in,am]: Add HtRegexList files to list for
	compilation.

	* htdig/htdig.h, htdig/htdig.cc, htdig/Retriever.cc: Use
	HtRegexList instead of HtRegex for setting escaped values--should
	never fail (since each String item is short).

	* htlib/HtDateTime.cc: Put back timezone specs into the output
	formats so we give everything even if we ignore it when reading
	input.

Mon Feb 5 11:47:07 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtDateTime.cc: Remove the timezone specs in the date
	formats--these are not required in the RFCs because many dates are
	in GMT anyway.

Wed Jan 17 08:48:30 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalTransport.cc (Request): Oops, fixed a holdover from
	code borrowed from ExternalParser.cc's fork handling.

Mon Jan 15 23:09:37 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/Connection.cc: Back out previous change--this should not
	in any way be needed since the configure script should set
	FD_SET_T.

	* configure.in, configure: Add more lenient prototyping for
	select() test--now allows "const struct timeval" for compilation
	on BSDI.

	* htdoc/RELEASE.html: Update with Gilles's changes.

	* htdoc/cf_blocks.html: New file describing <server ...></server>
	and <url ...></url> blocks.

	* htdoc/cf_general.html, htdoc/confmenu.html: Refer to the above.

Mon Jan 15 17:46:07 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/TemplateList.cc (createFromString), htcommon/defaults.cc:
	Treat template_map as a _quoted_ string list.

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Mon Jan 15 17:40:45 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/hts_templates.html: Add METADESCRIPTION variable.

	* htsearch/Display.cc (displayMatch): Add METADESCRIPTION variable.

	* htdig/ExternalParser.cc (parse): Fix up handling of arguments.

	* htdig/ExternalTransport.cc (Request): Fix up handling of fork/exec
	and command arguments, add wait() call.

Wed Jan 10 19:23:36 2001 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* installdir/rundig: Fix -a handling to move db.words.db.work_weakcmpr
	into place if it exists

Sat Jan 6 21:50:58 2001 Geoff Hutchison <ghutchis at wso.williams.edu>

	* configure.in: Add checks for <sys/wait.h> and <wait.h> for
	ExternalParser.

	* include/htconfig.h.in: Regenerate using autoheader.

	* configure: Regenerate using configure.

	* htnet/Connection.cc: Add definition for FD_SET_T to fix problems
	compiling on BSDI mentioned by Joe.

	* htdig/ExternalParser.cc: Use <sys/wait.h> or <wait.h> as
	appropriate. Should fix problems with compilation mentioned by
	Jesse on HP/UX.

	* README, htdoc/RELEASE.html: Adjust dates for the new year.

	* htdoc/upgrade.html: A few "remaining features" have been implemented.

Sun Dec 06 19:46:15 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

	* htnet/HtHTTP.cc: Fixed bug for Read_Line function call in
	ReadChunkedBody method. Many thanks to Robert LaFerla. ;-)

Tue Dec 12 13:24:49 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Fixed to properly handle binary
	output from an external converter. Fixed some compilation errors.

Tue Dec 12 12:52:14 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Handle parser command string
	as a string list again to allow arguments, build up argv and
	use execv instead of execl.

Tue Dec 12 12:25:04 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Add call to wait for child process,
	to avoid zombie buildup.

Mon Dec 11 23:57:43 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Fix up handling of fds in child
	process, more fault-tolerant handling of pipe or fork errors.

Mon Dec 11 23:30:55 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Fix up handling of creation
	of temporary file, check for proper return code, give error if
	appropriate.

Mon Dec 11 23:19:28 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Lowercase content-types and
	strip off any trailing semicolons, at one last spot. This reinserts
	code added Sep 11, which was dropped Oct 9, probably inadvertently
	during mifluz back-out.

Sun Dec 10 15:28:44 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/ExternalTransport.cc: Use fork/exec instead of calling
	popen, which bypasses any shell escape problems.

	* htdig/ExternalParser.cc: Ditto, plus use of mkstemp where
	available to pick the filename.

	* configure, configure.in: Check for mkstemp where available.

	* include/htconfig.h.in: Define it as above.

	* htlib/Makefile.am: Omit regex.c from SOURCES--this is included
	when necessary by the configure script. Otherwise this produces
	duplicate declarations, etc.

	* htlib/Makefile.in: Regenerate using automake --foreign.

	* htcommon/URL.cc: Fix bug with ports of 0 showing up in URLs like
	mailto: or other less-common protocols.
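
The ExternalTransport.cc and ExternalParser.cc changes in this entry, together with the Dec 11 and Dec 12 entries above (fd handling in the child, execv with an argv list, waiting for the child), describe a standard POSIX pattern. A minimal C++ sketch of it follows, assuming a POSIX system; runAndCapture is a hypothetical helper, and the temporary-file handling with mkstemp is left out.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run args[0] directly via execv (no shell involved, so arguments need no
// shell quoting), capture its standard output through a pipe, and reap the
// child with waitpid() so no zombie process is left behind.
// Returns true only if the child ran and exited with status 0.
bool runAndCapture(const std::vector<std::string>& args, std::string& output) {
    if (args.empty()) return false;

    // Build the NULL-terminated argv before fork(), so the child only
    // has to call async-signal-safe functions.
    std::vector<char*> argv;
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    int fds[2];
    if (pipe(fds) == -1) return false;

    pid_t pid = fork();
    if (pid == -1) {  // fork failed: clean up and report
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {  // child: stdout -> pipe write end
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execv(argv[0], argv.data());
        _exit(127);  // reached only if execv failed
    }

    close(fds[1]);  // parent: read until EOF
    output.clear();
    char buf[4096];
    for (;;) {
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n > 0) output.append(buf, static_cast<std::size_t>(n));
        else if (n == -1 && errno == EINTR) continue;
        else break;
    }
    close(fds[0]);

    int status = 0;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because no shell parses the command line, an argument containing quotes or semicolons (for example a URL) reaches the program as a single argv entry.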

Fri Dec 1 14:45:33 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/htdig-3.2.0.spec: Updated to 3.2.0b3.

Fri Dec 1 13:59:09 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/Makefile.am: Fix pkginclude_HEADERS to list missing headers
	ber.h, libdefs.h, myqsort.h, mhash_md5.h, omit unneeded langinfo.h;
	fix libht_la_SOURCES to list missing sources regex.c, myqsort.c.

	* htlib/Makefile.in: Regenerate using automake --foreign

	* htlib/langinfo.h, htlib/nl_types.h: Removed as they're now unused.

Fri Dec 1 13:22:47 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/strptime.cc (mystrptime): make ptr const and use cast on
	return value to avoid warnings.

	* htlib/Makefile.am: Fix pkginclude_HEADERS to list HtRegexReplace*.h
	rather than .cc.

	* htlib/Makefile.in: Regenerate using automake --foreign

Fri Dec 1 11:58:21 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* Makefile.in, [hit]*/Makefile.in: Regenerate using automake --foreign
	after fixing bug with cp -pr in automake.

Thu Nov 30 14:41:58 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/Makefile.am: Removed howitworks.html from EXTRA_DIST.

	* Makefile.in (distdir): Added missing variable name 'd' to cp -pr.

Thu Nov 30 14:01:48 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/strptime.cc, htlib/lib.h: make first 2 args to strptime
	const to avoid warnings, use cast in asizeof to avoid warnings.

	* htsearch/qtest.cc: Change include from iostream to iostream.h

	* htsearch/DocMatch.cc: Change include from iostream to iostream.h

	* htsearch/Display.cc (createURL, buildMatchList, excerpt, hilight):
	Clean up code to get rid of warnings, especially resulting from
	NULLs in ternary operators.

Thu Nov 30 10:55:09 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/String_fmt.cc (form, vform): Use vsnprintf rather than
	vsprintf, for buffer overflow prevention if vsnprintf available.

	* htdig/Retriever.cc: Remove unused strptime declaration.

	* htlib/HtDateTime.cc: Use mystrptime if HAVE_STRPTIME not set.
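
The String_fmt.cc change in this entry swaps vsprintf for vsnprintf so formatting can never write past the buffer. A minimal C++ sketch, assuming a C99-conforming vsnprintf that returns the length it needed when the buffer is too small (some pre-C99 implementations returned -1 instead, which is one reason the entry makes the change conditional on availability); the free functions form and vform here are hypothetical stand-ins for the String methods.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// printf-style formatting into a std::string. vsnprintf never writes more
// than the given size, so the buffer cannot overflow the way vsprintf can.
std::string vform(const char* fmt, va_list args) {
    va_list copy;
    va_copy(copy, args);  // the sizing pass consumes a copy of the list
    const int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (needed < 0) return std::string();  // formatting/encoding error
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}

std::string form(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string result = vform(fmt, args);
    va_end(args);
    return result;
}
```

The two-pass sizing means arbitrarily long output is still formatted completely, rather than being silently truncated at a fixed buffer size.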

Wed Nov 29 23:31:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdb/htdb_stat.cc, htdb_load.cc, htdb_dump.cc: Make sure we
	include htconfig.h to include proper declarations.

	* htlib/strptime.cc: Change to strptime.cc from the htdig-3.1 series,
	hopefully more portable until I can find a more suitable
	replacement.

	* htlib/Makefile.am, htlib/Makefile.in: As above.

	* htlib/clib.h, htlib/lib.h: Ditto.

	* htdoc/all.html: Add a first draft of program summaries.

Wed Nov 29 18:00:15 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc (parse_url): Remove undeclared "dup" variable,
	add missing calls to words.Skip().

Wed Nov 29 17:44:56 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/htdig.html: Add description of -v output.

Mon Nov 27 12:03:34 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/md5.cc: Added missing include of time.h

Fri Nov 24 00:56:01 2000 Toivo Pedaste <toivo at ucs.uwa.edu.au>

	* htsearch/Display.cc: Some extra debugging for scoring

Sun Nov 19 00:56:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/HtFile.cc (Request): Use opendir/readdir instead of
	scandir for generating directory listings on-the-fly.

	* htdoc/RELEASE.html: Write up release notes for 3.2.0b3.

	* htdoc/THANKS.html: Update list of contributors for 3.2.0b3 as
	current.

Fri Nov 17 14:52:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/acroconv.pl: Added external converter script to convert
	PDFs with acroread.

Mon Nov 6 12:13:13 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc (GetLocal, GetLocalUser): move String definition
	out of while statement for AIX xlC compiler.

Mon Oct 30 21:50:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Server.h, htdig/Server.cc (push): Add newDoc parameter that
	will allow redirects (old docs) to be followed and not count
	against the maxDoc restrictions.

	* htdig/Retriever.cc (got_redirect): Use new parameter so we don't
	count against a server's max documents since it's a redirect.

	* htlib/nl_types.h: Add for systems missing this header file.

Sun Oct 29 21:36:51 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc: Updated per-server and per-URL fields to
	match code. I still have a "wish list" of additional attributes
	that should work this way eventually.

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Sun Oct 22 17:13:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/HtWordList.h: Add missing include for stdlib.h needed for
	abort().

	* htsearch/BooleanQueryParser.cc (ParseAnd): Fix problems with RH7
	compiler -- shouldn't use "not" as a variable name!

Thu Oct 19 22:19:16 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* ltmain.sh, ltconfig: Update with versions from libtool
	1.3.5, which may fix some problems building libraries.

Mon Oct 9 21:59:11 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* */* [many, many files]: Backed out mifluz merge by reverting
	modified files to the 091000 snapshot.

	* configure: Regenerated from configure.in.

	* */Makefile.in: Regenerated using automake.

Fri Oct 6 11:03:14 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (do_tag): Parse <object> tags properly, looking
	for data= attribute rather than src=.

	* htcommon/defaults.cc (server_aliases): Additional clarification
	to server_aliases description of port numbers.

Wed Oct 4 12:12:31 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (limit_normalized, server_aliases,
	server_max_docs, server_wait_time): Added clarification
	to server_aliases description. Changed word "directive" to
	"attribute" where appropriate. Added cross-link to server_aliases
	from limit_normalized.

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Wed Sep 27 00:05:41 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdb/mifluz[dict, dump, load].cc, htdb/util_sig.h,
	htdb/util_sig.cc: New files from mifluz merge. (Whoops, missed a
	directory).

	* htdb/*.cc: Change config.h references to htconfig.h.

	* htlib/myqsort.c: Ditto.

	* htcommon/HtWordReference.h, htcommon/HtWordReference.cc: Ensure
	we keep the WordContext object around--unfortunately this also
	requires that callers initialize us with a WordContext (e.g. from
	the HtWordList class).

	* htlib/StringMatch.h, htlib/StringMatch.cc: Changes to use
	WordType directly instead of HtWordType.

	* htfuzzy/*: Ditto. Additionally make sure HtWordReference objects
	are instantiated properly.

	* htcommon/DocumentRef.cc, htcommon/HtWordList.cc: As above.

	* htdig/*: As above.

	* htsearch/*: As above.

	* httools/*: Don't bother initializing WordContext--this is done
	in the HtWordList class now.

	* htdig/htdig.cc: Ditto.

	* htsearch/htsearch.cc, htsearch/qtest.cc: Ditto.

	* htfuzzy/htfuzzy.cc: Ditto.

	* db/Makefile.am, db/Makefile.in: Update to build libhtdb instead
	of libdb to prevent conflicts.

Sun Sep 24 22:50:22 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htword/HtWordList.h, htword/HtWordList.cc: Keep a WordContext
	object private that is associated with this word database and
	provide accessor.

	* htword/WordType.h, htword/WordType.cc: Add WordToken function,
	migrated from HtWordType class.

	* htcommon/HtWordType.cc: WordType class no longer has Instance()
	method, so just pass along the calls.

	* htlib/DB2_db.cc (db_init): Remove unnecessary NULL parameter.

	* htlib/Makefile.am, htlib/Makefile.in: Remove HtVectorGeneric and
	derived files as well as HtWordType as these are deprecated.

Wed Sep 20 22:47:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* aclocal.m4: Add in missing autoconf macros that somehow didn't
	make the merge before. (No idea why I didn't catch this earlier.)

	* acinclude.m4: Use newer CHECK_ZLIB macro.

	* */Makefile.in: Updated with automake for new build changes.

	* configure, include/htconfig.h.in: Updated using autoconf.

	* test/dbbench.cc, test/word.cc, test/search.cc: Fix #include to
	point to htconfig.h not non-existent config.h.

	* htlib/Configuration.h: Fix copy ctor, removing code in header file.

	* htword/*.cc: Ditto.

	* htword/Makefile.am: Update from mifluz version.

	* htlib/myqsort.h, htlib/myqsort.c: Additional system library
	replacement code.

Sat Sep 16 20:14:32 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* configure.in, configure, acinclude.m4, aclocal.m4, acconfig.h,
	include/htconfig.h.in: Merged with mifluz versions. Main
	difference is that top-level configure script now also configures
	db/ directory as well.

	* Makefile.am, */Makefile.in: Updated with automake for new build
	environment (with db/ run through top-level configure).

	* db/*.c: Updated to use htconfig.h instead of config.h.

Wed Sep 13 22:05:33 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* Merged in mifluz-0.19 branch. Everything will break
	temporarily. Loic and I will clean up tomorrow.

	* htdoc/RELEASE.html, htdoc/THANKS.html, htdoc/TODO.html: Get a
	start on updating these files for the next release.

	* htdoc/cf_generate.pl: Revert change of Sep. 9 to ignore links to
	all.html in cf_byprog.html file.

	* htdoc/all.html: New file, moved from howitworks.html and not
	updated yet.

	* htdoc/contents.html: Change link from howitworks.html to all.html

Tue Sep 12 17:00:00 CEST 2000 Quim Sanmarti <qss at gtd.es>

	* htsearch: added AndQuery.cc BooleanLexer.cc BooleanQueryParser.cc
	ExactWordQuery.cc GParser.cc NearQuery.cc NotQuery.cc
	OperatorQuery.cc OrFuzzyExpander.cc OrQuery.cc
	PhraseQuery.cc Query.cc QueryLexer.cc QueryParser.cc
	SimpleQueryParser.cc VolatileCache.cc WordSearcher.cc
	qtest.cc WordSearcher.h AndQuery.h AndQueryParser.h
	BooleanLexer.h BooleanQueryParser.h ExactWordQuery.h
	FuzzyExpander.h GParser.h NearQuery.h NotQuery.h
	OperatorQuery.h OrFuzzyExpander.h OrQuery.h OrQueryParser.h
	PhraseQuery.h Query.h QueryCache.h QueryLexer.h
	QueryParser.h SimpleLexer.h SimpleQueryParser.h VolatileCache.h.
	This is the new query parsing/evaluation framework.

	* Modified DocMatch.{cc,h} and ResultList.{cc,h} for compatibility.

	* Removed the previous {And,Or,Exact,}ParseTree.{cc,h} files.

	* Modified Makefile.{am,in} consequently.

Mon Sep 11 11:56:44 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc (parse): Lowercase content-types and
	strip off any trailing semicolons, at one last spot which Geoff missed.

Sat Sep 9 21:28:29 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Document.cc (getParsable): Fix a bug with earlier
	change--if no parser is found and the MIME type is not text/* then
	return a NULL parser.

	* htdig/Retriever.cc (RetrievedDocument): If a NULL parser is
	returned, mark the document as noindex and move on.

	* configure.in, configure (enable-tests): Fix bug that would run
	the 'yes' program inside the configure script if --enable-tests
	was set.

Sat Sep 9 17:50:11 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc: Add "all" program listing for common
	attributes--seems more logical esp. now with many httool programs.

	* htdoc/cf_generate.pl (cf_byprog): Do not output a link when
	'prog' is 'all.'

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Sat Sep 9 11:44:47 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* aclocal.m4 (AM_CHECK_YACC): New macro to check for bison/yacc
	and use "missing yacc" if not found.

	* configure.in (enable_tests): Fix buglet where --enable-tests=no
	or --disable-tests would not work and set the default to enabled
	tests. Since the tests do not build unless the user does a "make
	check" this should not be confusing and should help debugging.
	Also use AM_CHECK_YACC instead of AC_CHECK_YACC.

	* configure: Regenerate using autoconf.

Sat Sep 9 11:01:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/ExternalParser.cc (canParse): Lowercase content-types and
	strip off any trailing semicolons. Should prevent problems with
	combined content-type; charset values.
	(ctor): As above.

	* htdig/Document.cc (getParsable): Only assume plain text if MIME
	code starts with text/. Should prevent problems with retrieving
	things like image/png or application/postscript as text.
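
The two changes in this entry (normalize the content-type before matching parsers, and treat only text/* as plain text) can be sketched in C++ as below. This is a minimal illustration assuming any parameters always follow the first ';'; the function names are hypothetical, not ExternalParser or Document methods.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Normalize a Content-Type header value: drop any parameters after the
// first ';' (e.g. "; charset=ISO-8859-1"), trim blanks, and lowercase.
std::string normalizeContentType(const std::string& raw) {
    std::string type = raw.substr(0, raw.find(';'));  // npos keeps it all
    const std::size_t first = type.find_first_not_of(" \t");
    if (first == std::string::npos) return std::string();
    const std::size_t last = type.find_last_not_of(" \t");
    type = type.substr(first, last - first + 1);
    std::transform(type.begin(), type.end(), type.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return type;
}

// Fall back to a plain-text parser only for text/* types, so that
// e.g. image/png or application/postscript are not indexed as text.
bool treatAsPlainText(const std::string& contentType) {
    return normalizeContentType(contentType).compare(0, 5, "text/") == 0;
}
```

Normalizing first means a parser registered for "text/html" still matches a server that sends "Text/HTML; charset=ISO-8859-1".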

Fri Sep 8 22:59:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc: Add new attributes htnotify_replyto,
	htnotify_webmaster, htnotify_prefix_file, htnotify_suffix_file.

	* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

	* httools/htnotify.cc: Added in code from Richard Beton
	<richard.beton at roke.co.uk> to collect multiple URLs per e-mail
	address and allow customization of notification messages by
	reading in header/footer text as designated by the new attributes
	above.

Fri Sep 8 15:15:00 2000 Quim Sanmarti <qss at gtd.es>

	* htsearch/Display.cc: Fixed tiny date_format bug;
	added url-decoding template variable expansion.
|
||
|
|
||
|
Thu Sep 7 23:45:25 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc (Retriever): Only open up md5 database if
check_unique_md5 attribute is set.

Thu Sep 7 22:56:19 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/URL.cc (DefaultPort): Add file default port of 0.

* htnet/HtFile.cc (Request): Handle directory listings by using
scandir and generating a minimal HTML file with an appropriate
noindex listing.

Wed Sep 06 10:00:50 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* htlib/URL.h, htlib/URL.cc: Restored corrected versions of URL.*
* htnet/HtNNTP.h: Removed the error in the NNTP class declaration

Mon Sep 04 13:43:40 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtHTTP.cc: Restored previous version of HtHTTP. I removed
an initialization in the constructor (_modification_time). Sorry.

Sun Sep 3 16:51:24 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc, htdig/Server.cc: Fix compiler warnings about
String conversions.

* configure, configure.in, db/configure, db/configure.in,
db/acinclude.m4, db/aclocal.m4: Ensure --enable-bigfile is handled
correctly by the configure scripts as pointed out by Jesse.

Fri Sep 01 23:28:43 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* URL.cc: Added DefaultPort() method and changed the NNTP default port
from 523 to 119.
* Document.cc: Manage retrieval of NNTP documents.

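Below is a sketch of a per-scheme default port lookup like the one `DefaultPort()` provides. The function name `defaultPort` and the `-1` result for unknown schemes are assumptions. The nntp value (119) comes from this entry and the file value (0) from the Sep 7 entry. The http, https and ftp values are the standard IANA ports:

```cpp
#include <map>
#include <string>

// Hypothetical per-scheme default port table.
int defaultPort(const std::string &scheme)
{
    static const std::map<std::string, int> ports = {
        {"http", 80}, {"https", 443}, {"ftp", 21},
        {"nntp", 119}, {"file", 0}};
    auto it = ports.find(scheme);
    return it == ports.end() ? -1 : it->second;  // -1: unknown scheme (assumption)
}
```
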
Fri Sep 01 19:05:02 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtNNTP.*: New files.
* htnet/HtHTTP.cc: Removed modification_time deletion in the
class destructor.

Thu Sep 01 12:00:00 2000 Toivo Pedaste <toivo at ucs.uwa.edu.au>

* htdig/Retriever.cc: Allow for modify time being set to
current time if not available.

Thu Aug 31 13:21:12 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/defaults.cc (allow_in_form, build_select_lists):
Add clearer instructions to allow_in_form description, add
cross-links between these two sections.

* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Wed Aug 30 10:01:59 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* Substitution of 'char *' return types with 'const String &' in the
URL and Server classes. This required many changes in other files:
HtFile.cc, HtHTTP.cc, HtConfiguration.*, Document.*, ExternalParser.*,
Retriever.*.

Tue Aug 30 12:00:00 2000 Toivo Pedaste <toivo at ucs.uwa.edu.au>

* htlibs/md5.cc, htlibs/md5.h: Generate an MD5 hash of
a page, optionally including the modification date.

* htlibs/mhash_md5.h, htlibs/mhash_md5.c, htlibs/libdefs.h:
MD5 hash code from libmhash.

* htdig/Retriever.cc: Allow storing MD5 hashes of pages
in order to reject aliases.

* htcommon/defaults.cc: Options "check_unique_md5" and
"check_unique_date".

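A hedged sketch of rejecting aliases by content digest. Each page body is digested, optionally together with its modification date, and a page is rejected when its digest has been seen before. The class `AliasFilter` is hypothetical. Mapping `includeDate` to check_unique_date is an assumption about that option's meaning. `std::hash` stands in for the real MD5 digest here and, unlike MD5, is not collision resistant:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical alias filter. std::hash is only a stand-in for MD5,
// used to illustrate the bookkeeping.
class AliasFilter {
public:
    explicit AliasFilter(bool includeDate) : includeDate_(includeDate) {}

    // Returns true if the page is new, false if it looks like an alias.
    bool accept(const std::string &body, const std::string &modDate)
    {
        std::string key = includeDate_ ? body + '\n' + modDate : body;
        return seen_.insert(std::hash<std::string>{}(key)).second;
    }

private:
    bool includeDate_;
    std::unordered_set<std::size_t> seen_;
};
```

With the date included, identical bodies with different modification dates are treated as distinct pages.
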
Tue Aug 29 08:51:39 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdoc/upgrade.html: Add description of the difference between
htmerge and htpurge. Mention other httools.

* htsearch/parser.cc, htsearch/parser.h: Merge in patch by Quim
Sanmarti <qss at gtd.es> to fix problems with phrase searching and
AND searches and improve performance.

Sun Aug 27 22:41:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/AndParseTree.cc, htsearch/OrParseTree.cc (Parse):
Rewrote using new WordToken inherited method. Fixes a bug where
the user entered two phrases next to each other.

* htsearch/ParseTree.cc (Parse): Fix bug where phrases would
"absorb" prior query words. Also fix bug where operators were
incorrectly popped off the stack. Should (hopefully) solve all
parsing problems.

* htsearch/*ParseTree.cc (GetLogicalWords): Test for empty list of
children to prevent potential segfault.

Sat Aug 26 18:40:50 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* installdir/{syntax, header, footer, wrapper, nomatch}.html:
Add DTD tags, ALT attributes and remove bogus </select> tags to
fix invalid HTML pointed out in PR#901.

Wed Aug 23 23:39:18 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/ParseTree.cc (Parse): Get rid of compiler warnings, use
new private tokenizer to ensure parens and quotes aren't
removed. Also, when popping an operator off the parens stack, make
sure it's adopted by a new ParseTree object so we get the parens
back in the tree hierarchy.

Wed Aug 23 23:34:44 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/AndParseTree.cc (Parse): Fix nasty infinite loop when
phrases hit in AND searches.

* htsearch/OrParseTree.cc (Parse): Ditto.

Wed Aug 23 13:24:31 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtHTTP.*, htnet/Transport.h: Where possible, all 'char *'
types have been changed into 'const String &' types.

Sun Aug 20 23:25:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/htpurge.cc (purgeDocs): Add error message when document
database is completely empty. Should take care of PR#672 (and others).

Sun Aug 20 20:37:53 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtRegex.h, htlib/HtRegex.cc: Made destructor virtual,
added lastError() and associated support. Changed return type of
set*() to int. They now return the value of |compiled|.

* htcommon/defaults.cc (url_rewrite_rules): Add new attribute to
support patch by Andy Armstrong <andy at tagish.com> for permanent
URL rewriting.

* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

* htlib/HtRegexReplace.cc, htlib/HtRegexReplaceList.cc,
htlib/HtRegexReplace.h, htlib/HtRegexReplaceList.h,
htcommon/HtURLRewriter.cc, htcommon/HtURLRewriter.h: New classes.

* htcommon/Makefile.am, htcommon/Makefile.in: Add compilation for
HtURLRewriter.

* htlib/Makefile.am, htcommon/Makefile.in: Ditto for
HtRegexReplace*

* htcommon/URL.h, htcommon/URL.cc (rewrite): New method for
transforming URLs based on HtURLRewriter.

* htdig/Retriever.cc (got_href): Rewrite the URL before we do
anything with it.

* htdig/htdig.cc: Include HtURLRewriter headers and check rewrite
rules for errors.

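A minimal sketch of regex-based URL rewriting in the spirit of url_rewrite_rules. The class `UrlRewriter` is hypothetical and uses `std::regex_replace`, not htdig's HtRegexReplace classes. It applies every (pattern, replacement) pair in order to the output of the previous one; the real HtURLRewriter may instead stop at the first matching rule. An invalid pattern throws `std::regex_error` when the rule is added, which is one way to "check rewrite rules for errors" before crawling:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ordered list of (pattern, replacement) rewrite rules.
class UrlRewriter {
public:
    // Throws std::regex_error if the pattern is invalid.
    void addRule(const std::string &pattern, const std::string &replacement)
    {
        rules_.emplace_back(std::regex(pattern), replacement);
    }

    std::string rewrite(const std::string &url) const
    {
        std::string result = url;
        for (const auto &rule : rules_)
            result = std::regex_replace(result, rule.first, rule.second);
        return result;
    }

private:
    std::vector<std::pair<std::regex, std::string>> rules_;
};
```

For example, the rule `[?&]sid=[0-9]+` with an empty replacement strips a session ID from `http://www.example.com/a.html?sid=42`.
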
Sat Aug 19 17:01:36 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/conf_lexer.lxx: Patched to fix the bug with relative
filename includes. Keeps a separate stack with the filenames and
adjusts accordingly.

* htcommon/conf_lexer.cxx: Updated using flex 2.5.4.

Thu Aug 17 23:59:26 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/conf_lexer.lxx: Patched to fix a bug reported by Abel
Deuring -- config filename stack was decremented too many times.

* htcommon/conf_lexer.cxx: Updated using flex 2.5.4.

Thu Aug 17 23:40:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htword/WordType.h (WordToken): Add non-destructive version of
HtWordToken using a passed int as an index into the
string. Add virtual destructor so class can be sub-classed.

* htword/WordType.cc (WordToken): Implement it.

* httools/htmerge.cc (mergeDB): Back out change of Aug. 9th --
WordSearchDescription has disappeared from htword
interfaces. Should be restored when Loic comes back and can
suggest an alternative.

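A sketch of a non-destructive tokenizer of the kind this entry adds. Unlike strtok-style tokenizers, it never modifies the input. It reads from a caller-owned position and advances it past each word it returns. The name `wordToken` and the alphanumeric-only word rule are assumptions; htdig's WordType rules are configurable:

```cpp
#include <cctype>
#include <string>

// Hypothetical non-destructive tokenizer: returns the next word at or
// after `pos` and advances `pos` past it; returns "" when exhausted.
std::string wordToken(const std::string &s, int &pos)
{
    const int n = static_cast<int>(s.size());
    if (pos < 0) pos = 0;
    if (pos > n) pos = n;
    while (pos < n && !std::isalnum(static_cast<unsigned char>(s[pos])))
        ++pos;  // skip separators, quotes and parentheses
    const int start = pos;
    while (pos < n && std::isalnum(static_cast<unsigned char>(s[pos])))
        ++pos;  // consume the word
    return s.substr(start, pos - start);
}
```

Because the input stays intact, a parser can still inspect the quotes and parentheses around each token, which was the concern in the ParseTree entries.
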
Thu Aug 17 16:59:05 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc (createURL): Get rid of extra "config="
parameter that was inserted before collections stuff.

Thu Aug 17 15:47:58 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtHTTP.cc: Request the document again after the HTTPRequest()
method returns a <NoHeader> response.

Thu Aug 17 12:25:33 CEST 2000 Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtHTTP.*, htnet/Transport.*: Fixed bug with HTTP/1.1 management.
The "Connection: close" directive is now handled and forces the
connection to be closed. Also fixed other minor bugs and string
initializations.

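The persistence rule behind this fix can be sketched as a small decision function. An HTTP/1.1 connection stays open unless the server sends "Connection: close". An HTTP/1.0 connection closes unless the server sends "Connection: keep-alive". The function name `keepConnectionOpen` is hypothetical, and for simplicity the sketch handles only a single-token Connection value:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: may the connection be reused after this response?
bool keepConnectionOpen(const std::string &httpVersion,
                        const std::string &connectionHeader)
{
    std::string value = connectionHeader;
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (value == "close")
        return false;              // explicit close always wins
    if (httpVersion == "HTTP/1.1")
        return true;               // HTTP/1.1 is persistent by default
    return value == "keep-alive";  // HTTP/1.0 needs an explicit keep-alive
}
```
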
Tue Aug 15 00:24:33 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* contrib/multidig/Makefile, gen-collect, db.conf, multidig.conf:
Add missing trailing newlines as pointed out by Doug Moran
<dmoran at dougmoran.com>.

* contrib/multidig/Makefile (install): Make sure scripts have a+x
permissions. Pointed out by Doug Moran.

* contrib/multidig/new-collect: Fix typo to ensure MULTIDIG_CONF
is set correctly.

Sun Aug 13 23:17:30 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Server.h, htdig/Server.cc (Server): Add support for
per-server user_agent configuration.

* htdig/Document.cc (Retrieve): Ditto.

* httools/htpurge.cc (purgeDocs): Set remove_* attributes on a
per-server basis.

* htcommon/defaults.cc: Fix remove_bad_urls and
remove_unretrieved_urls to point to htpurge and not htmerge.

Sat Aug 12 23:03:32 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdoc/cf_generate.pl (html_escape): Fix mindless thinko with
perl stringwise-equal operator. Documentation is now generated
with block: portion appropriate to defaults.cc.

* htdoc/attrs.html, cf_by{name,prog}.html: Reran cf_generate.pl.

Fri Aug 11 16:03:18 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc (parse): fix problem with & not being translated.

Fri Aug 11 10:48:54 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc (setVariables), htcommon/defaults.cc: Added
maximum_page_buttons attribute, to limit buttons to fewer than
maximum_pages. Fixes PR#731 & PR#781.
* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl

Wed Aug 9 23:04:39 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/htmerge.cc (mergeDB): Add fix, contributed by Lorenzo,
to prevent duplicate documents when you merge a database with a
copy of itself.

Wed Aug 9 22:58:39 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/parser.cc (score): Merged in patch contributed by
Lorenzo Campedelli <lorenzo.campedelli at libero.it> and Arthur
Prokosch <prokosch at aptima.com> to fix problems with AND operators
and phrase matches.

Wed Aug 2 11:44:11 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc (setVariables), htcommon/defaults.cc: Enhanced
build_select_lists attribute, to generate not only single-choice
select lists, but also select multiple lists, radio button lists
and checkbox lists. Added explanation and examples in documentation.
* htdoc/hts_selectors.html: Added detailed explanation of new feature.
* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl

Tue Aug 1 21:50:22 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/ParseTree.cc (Parse): Fix problems with token
comparisons and fix thinko with HtWordToken parsing--previously
didn't advance the parse step at all.

* htsearch/*ParseTree.cc (Parse): Fix thinko with HtWordToken as
above--here it acted as an infinite loop.

* htdig/ExternalParser.cc (parse): Add shell quoting around
content-type. Hard to exploit, but a server could potentially
return a strange value that could then be executed locally.

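A common way to make a server-supplied value safe on a POSIX shell command line is single-quote quoting. Inside single quotes nothing is special except the quote itself, so each embedded `'` is written as `'\''` (close quote, escaped quote, reopen). The helper `shellQuote` is a hypothetical sketch; htdig's actual quoting code may differ:

```cpp
#include <string>

// Hypothetical helper: quote a value for safe use as one POSIX shell word.
std::string shellQuote(const std::string &value)
{
    std::string quoted = "'";
    for (char c : value) {
        if (c == '\'')
            quoted += "'\\''";  // end quote, literal ', restart quote
        else
            quoted += c;
    }
    quoted += '\'';
    return quoted;
}
```

A hostile content-type such as `x; rm -rf /` then reaches the external parser as a single literal argument instead of a second command.
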
Thu Jun 29 23:33:51 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/ParseTree.h, htsearch/ParseTree.cc: New parent class
for the new htsearch framework. Still needs work.

* htsearch/*ParseTree.*: Derived classes appropriate to the method
indicated.

* htsearch/parsetest.cc: New program to allow initial
command-line testing of ParseTree classes.

* htsearch/Makefile.am, htsearch/Makefile.in: Build parsetest in
addition to htsearch. Eventually, parsetest is probably best
modified slightly and moved into the tests directory.

Tue Jun 20 22:29:57 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/htmerge.cc (mergeDB): Merge in patch contributed by
Lorenzo Campedelli <lorenzo.campedelli at libero.it> to greatly
reduce memory usage.

Sun Jun 18 13:15:43 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Object.h (class Object): Fix problems with retrieval order
by ensuring the compare() method is declared const.

Tue Jun 13 22:57:10 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc (GetLocal): Fix bug that would cause a
coredump when local_urls was used and local_default_docs was
needed. The list of default filenames was freed before it should
have been.

Tue Jun 13 19:30:28 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/HtWordReference.h, htcommon/HtWordReference.cc (Load,
LoadHeaders): New methods to check the header of an ASCII
representation and read it in.

* htcommon/HtWordList.h, htcommon/HtWordList.cc (Load): Add load
method to read in data. Calls the new methods above.

* httools/htload.cc: Open word databases read-write and call
HtWordList::Load().

Sun Jun 11 14:39:28 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.cc (generateStars): Fix problem when maxScore
== minScore as reported by Rajendra. Fixes PR#858.
(displayMatch): Ditto.

* htsearch/htsearch.cc: Fix memory corruption problem in reporting
syntax errors pointed out by Rajendra. Fixes PR#860.

Thu Jun 8 09:31:15 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htfuzzy/Accents.h, htfuzzy/Accents.cc: Apply Robert Marchand's
patch to his algorithm. Gets rid of writeDB function (falls back
on default one in Fuzzy.cc), changes addWord, and adds a new
getWords function to override default. These avoid overhead of
unaccented forms of words in accents database, but ensure that
unaccented form of search word is always searched.

Thu Jun 8 09:00:02 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/DocumentRef.h(DocScore, docScore),
htsearch/ResultMatch.cc(ScoreMatch::compare),
htsearch/ResultMatch.h(setScore, getScore, score),
htsearch/Display.cc(displayMatch, generateStars, buildMatchList):
Apply Terry Luedtke's patch for score calculations, to calculate
min & max from log(score).

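Terry Luedtke's patch is not reproduced here. The following is a hedged sketch of the idea: place log(score) on a linear scale between log(minScore) and log(maxScore) and map it onto 1..maxStars. It also guards the maxScore == minScore case mentioned in the PR#858 fix, where the range is zero and a naive division fails. The function `starCount`, its rounding, and the "full rating when all scores are equal" choice are all assumptions:

```cpp
#include <cmath>

// Hypothetical star-rating helper. Scores are assumed to be positive.
int starCount(double score, double minScore, double maxScore, int maxStars)
{
    const double lo = std::log(minScore);
    const double hi = std::log(maxScore);
    if (hi <= lo)
        return maxStars;  // all matches scored the same: avoid dividing by zero
    const double fraction = (std::log(score) - lo) / (hi - lo);
    int stars = 1 + static_cast<int>(fraction * (maxStars - 1) + 0.5);
    if (stars < 1) stars = 1;
    if (stars > maxStars) stars = maxStars;
    return stars;
}
```

Working on the log scale keeps one very high score from compressing every other match into the lowest star bucket.
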
Thu Jun 8 08:47:03 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/doc2html/doc2html.pl: Apply David Adams' fix for missing
quote.

Wed Jun 07 10:53:53 2000 Loic Dachary <loic at senga.org>

* db/db.c (CDB___db_dbenv_setup): open mode is 0666 instead
of 0, otherwise the weakcmpr file is not opened with the proper
mode.

Tue Jun 6 23:48:48 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/htpurge.cc: Fix coredump problems by passing
dictionaries as pointers rather than full objects (this is
preferred anyway).

Sun Jun 4 22:17:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* test/t_htdig_local: Added test for local filesystem support.

* test/config/htdig.conf2.in: Change to be a config file for
local_urls testing.

* test/Makefile.am: Add t_htdig_local to list.

Tue May 30 23:52:45 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/htmerge.cc: Move to httools directory, remove "cleanup"
functionality now in htpurge and merge in htmerge.h and db.cc files.

* httools/Makefile.am: Add htmerge now moved to this directory.

* */Makefile.in: Update with automake.

* Makefile.am (SUBDIRS): Remove htmerge, now found in httools.

* configure.in: Ditto.

* configure: Update with autoconf.

* test/test_functions.in: Add paths for htpurge, htstat, htload,
htdump and update path for htmerge.

* test/t_htdig: Change htmerge to htpurge to clean out incorrect URLs.

* installdir/rundig: Change htmerge to htpurge. This needs serious
additional cleanup for use in 3.2 since many conventions have changed!

Tue May 23 22:21:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* README: Fix for 3.2.0b3 and clean up organization a bit for new
directory structure.

Wed May 17 23:22:31 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HTML.cc (do_tag): Add support for TITLE attributes in
anchor and related tags.

Fri May 12 17:54:09 2000 Loic Dachary <loic at senga.org>

* db/acinclude.m4: bigfile support is disabled by default.

* db/mp_region.c (CDB___memp_close): clear weakcmpr pointer
when closing region so that memory pool files are not
released twice.

Wed May 10 22:26:21 2000 Loic Dachary <loic at senga.org>

* */*.cc: all include htconfig.h

* htlib/HtTime.h: remove htconfig.h inclusion (never in headers)

* htlib/*.h,*.cc: Fix copyright GNU Public -> Gnu General Public
and 1999, 2000 instead of 1999.

Tue May 09 16:38:07 2000 Loic Dachary <loic at senga.org>

* htsearch/Collection.cc (Collection): set searchWords and
searchWordsPattern to null in constructor. Delete in destructor.
Also delete matches in destructor.

* test/word.cc (doskip_harness): free cursor after use.

* test/word.cc (doskip_overflow): free cursor after use.

* test/dbbench.cc (find): free cursor after use.

* htsearch/htsearch.cc (main): free searchWords and searchWordsPattern
after usage.

* htdb/htdb_{load,dump,stat}.cc (main): call WordContext::Finish
to free global context for inverted index.

* htdb/htdb_stat.cc (btree_stats): free stat structure.

* htlib/List.h (class List): Add Shift/Unshift/Push/Pop methods.

* htlib/List.h (class List): Add Remove(int position) method.

Tue May 09 00:22:33 2000 Loic Dachary <loic at senga.org>

* htsearch/htsearch.cc (main): kill useless call to
StringList::Release

* htsearch/HtURLSeedScore.cc (ScoreAdjustItem): remove useless
call to StringList::Destroy.

* htlib/HtWordCodec.cc (HtWordCodec): Fix usage of StringList
that was inserting pointers to volatile strings instead of
permanent copies. I suspect that the tweak on StringList was
primarily done to satisfy this piece of code. After reviewing
all the usage of StringList, it's the only one to use it in this
fashion.

* htlib/QuotedStringList.h (class QuotedStringList): remove
noop destructor to enable Destroy of the underlying StringList
when deleted.

Mon May 08 18:17:02 2000 Loic Dachary <loic at senga.org>

* htlib/StringList.h (class StringList): change methods
Add/Insert/Assign that were copying the String* given in argument.
This behaviour is confusing since it has different semantics
from the base class List.

Mon May 08 17:16:00 2000 Loic Dachary <loic at senga.org>

* htdig/Retriever.cc (GetLocal): fix leaked defaultdocs

Mon May 08 04:27:47 2000 Loic Dachary <loic at senga.org>

* htlib/StringList.cc (Create): remove SRelease. Deleting
the strings is taken care of by the destructor through
Destroy. If destruction of the Strings is not desirable,
Release should be used. SRelease was apparently added after
a virtual destructor doing nothing was added to hide the
default call to Destroy, thereby leaking memory.

Mon May 08 01:28:25 2000 Loic Dachary <loic at senga.org>

* test/txt2mifluz.cc,word.cc,search.cc: fix minor memory leaks.

Sun May 07 19:24:12 2000 Loic Dachary <loic at senga.org>

* Makefile.config (HTLIBS): add libht at end because htdb
now depends on htlib.

* configure.in,htlib/Makefile.am: use LTLIBOBJS as suggested
by the libtool documentation.

Sun May 07 17:09:22 2000 Loic Dachary <loic at senga.org>

* test/Makefile.am (clean-local): clean conf to prevent
inconsistencies when re-configuring in a directory that
is not the source directory.

Sun May 07 05:07:23 2000 Loic Dachary <loic at senga.org>

* db/mkinstalldir,test/benchmark: Add for installation purposes.

Sun May 07 02:17:03 2000 Loic Dachary <loic at senga.org>

* Makefile.am (distclean-local): Xtest instead of test
that confuse some shells.

Sun May 07 02:02:46 2000 Loic Dachary <loic at senga.org>

* htword/WordDB.cc: Move Open to WordDB.cc.

Sun May 07 01:32:47 2000 Loic Dachary <loic at senga.org>

* test/t_*: check/fix scripts. All regression tests pass
on RedHat-6.2.

Sun May 07 00:54:30 2000 Loic Dachary <loic at senga.org>

* */*.cc: fix warnings and large file support inclusion
files on Solaris.

Sat May 06 21:55:58 2000 Loic Dachary <loic at senga.org>

* test/: import regression tests from mifluz

* htlib/DB2_db.cc (db_init): fix flags used when creating the
environment to include a memory pool.

* htcommon/defaults.cc: change wordkey_description format.
update all wordlist_* attributes

Sat May 06 04:46:03 2000 Loic Dachary <loic at senga.org>

* htmerge/words.cc (mergeWords): WordSearchDescription becomes
WordCursor.

* httools/htpurge.cc (purgeWords): WordSearchDescription becomes
WordCursor.

Sat May 06 02:01:40 2000 Loic Dachary <loic at senga.org>

* htdb/*: upgrade to Berkeley DB 3.0.55. Very different.

* htlib/getcwd.c,memcmp.c,memcpy.c,memmove.c,raise.c,snprintf.c,
strerror.c,vsnprintf.c,clib.h: Add compatibility support

* htcommon/DocumentDB.cc (LoadDB): remove unused variable

* htlib/DB2_db.cc: adapt to Berkeley DB 3.0.55 syntax.

* htlib/Database.h (class Database): remove DB_INFO, does
not exist in Berkeley DB 3.0.55

* htlib/*: run ../db/prefix-symbols.sh

* Makefile.config (INCLUDES): fix db include dirs

* acconfig.h: Big file support + replacement functions

* acinclude.m4,configure.in : db instead of db/dist + bug fixes

Fri May 5 08:33:59 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* db/*: Merge in changes from Loic's mifluz tree. This will break
everything, but Loic promises he'll fix it ASAP after I make this
change.

Mon Apr 24 21:58:22 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/htdig.cc (main): Make the -l stop & restart mode the
default. This will catch signals and quit gracefully. The
command-line parser will still accept -l but will ignore it.
(usage): Remove -l portion.
(main): Fix -m option to read in a file as it's
supposed to do! Also set max_hops correctly so that only the
URLs in that file are indexed.

* htdoc/htdig.html: Remove -l from documentation since it's now
the default.

Mon Apr 24 21:22:53 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Server.cc (push): Fix bug where changes in the robots.txt
would be ignored. If a URL was indexed and later the robots.txt
changed to forbid it, the URL would still be updated.

Wed Apr 19 22:13:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* Merging in changes from mifluz 0.14 from Loic.

* htlib/Configuration.cc (Read): Removed dependency on fstream.h,
use fopen, fprintf, fgets, fclose instead of iostream.

* htlib/HtPack.cc, htlib/HtVectorGeneric.h, htlib/Object.h,
htlib/ParsedString.cc, htlib/String.cc: Remove use of cerr,
instead use fprintf(stderr ...).

* htlib/Dictionary.cc, htlib/HtVectorGeneric.cc, htlib/List.cc,
htlib/Object.cc, htlib/StringList.cc, htlib/htString.h,
htlib/strcasecmp.cc: Add #ifdef blocks for htconfig.h

Wed Apr 12 19:09:40 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* .version: Bump to 3.2.0b3.

* htdoc/htload.html, htdoc/htpurge.html, htdoc/htstat.html: Fix
typos in headers.

* htdoc/main.html: Fix link to download to actually point to 3.2.0b2.

Tue Apr 11 00:21:48 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc (setupWords): Do not apply fuzzy
algorithms to phrase queries. This helps prevent the infinite
loops described on the mailing list.

* htcommon/conf_parser.yxx (list): Add conditions for lists
starting with string-number, number-string, and number-number.

* htcommon/conf_parser.cxx: Regenerate using bison.

* htdoc/RELEASE.html: Update release notes for recent bug fixes
and likely release date for 3.2.0b2.

* htdoc/main.html: Add a blurb about the 3.2.0b2 release.

* htdoc/*.html: Remove author notes in the footer as requested by
Andrew. To balance it out, the copyright notice at the top links
to THANKS.html.

Sun Apr 9 15:21:12 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/conf_parser.yxx (list): Fix problem with
build_select_lists--parser didn't support lists including numbers.

* htcommon/conf_parser.cxx: Regenerate using bison.

Sun Apr 9 12:53:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdoc/RELEASE.html: Add a first draft of 3.2.0b2 release notes.

Sun Apr 9 12:31:13 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/Makefile.am, httools/Makefile.in: Add htload to
compilation list.

* htcommon/DocumentDB.h: Add optional verbose options to DumpDB
and LoadDB.

* htcommon/DocumentDB.cc (LoadDB): Implement loading and parsing
an ASCII version of the document database. Records on disk will
replace any matching records in the db.
(DumpDB): Add all fields in the DocumentRef to ensure the entire
database is written out.

* htcommon/DocumentRef.h: Add new method for setting DocStatus
from an int type.

* htcommon/DocumentRef.cc (DocStatus): Set it using a switch
statement. (It's not pretty, but it works.)

* httools/htload.cc: New file. Loads in ASCII versions of the
databases, replacing existing records if found.

* httools/htdump.cc: Pass verbose flags to DumpDB method. Make
sure to close the document DB before quitting.

* httools/htpurge.cc: Add -u option to specify a URL to purge from
the command-line.

* httools/htstat.cc: Add -u option to output the list of URLs in
the document DB as well.

Sat Apr 8 16:35:55 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/defaults.cc: Change all <b>, <i>, and <tt> tags to the
HTML-4.0 compliant <strong>, <em>, and <code> tags.

* installdir/long.html, installdir/header.html,
installdir/nomatch.html, installdir/syntax.html,
installdir/wrapper.html: Ditto.

* htdoc/*.html: Ditto. (Don't you just love sed?)

* htsearch/TemplateList.cc (createFromString): Ditto.

* htdoc/htpurge.html, htdoc/htdump.html, htdoc/htload.html,
htdoc/htstat.html: New files documenting usage of httools
programs.

* htdoc/contents.html: Add links to above.

* htdoc/htdig.html: Update table with -t format to match htdump.

Fri Apr 7 00:30:01 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* README: Update to mention 3.2.0b2 and use correct copyright. (It
is 2000 after all!)

* htdoc/FAQ.html, htdoc/where.html, htdoc/uses.html,
htdoc/isp.html: Update with most recent versions from maindocs.

* htdoc/RELEASE.html: Add release notes for 3.1.5 to the
top. (It's out of version ordering, but it is in correct
chronological order.)

Fri Apr 7 00:11:29 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* httools/htpurge.cc (main): Read in URLs from STDIN for purging,
one per line. Pass them along to purgeDocs for removal. Also, make
discard_list into a local variable and pass it from purgeDocs to
purgeWords.
(purgeDocs): Accept a hash of URLs to delete (user input) and
return the list of doc IDs deleted.
(usage): Note the - option to read in URLs to be deleted from STDIN.

Thu Apr 6 00:10:23 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc (got_redirect): Allow the redirect to accept
relative redirects instead of just full URLs.

Wed Apr 5 15:07:52 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc: Added #if test to make sure DBL_MAX is
defined on Solaris, as reported by Terry Luedtke.

Tue Apr 4 12:46:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/doc2html/*: Added parser submitted by D.J.Adams at soton.ac.uk

Mon Apr 3 13:48:59 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/defaults.cc: Fix error in description of new attribute
plural_suffix.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Regenerate using cf_generate.pl.

Fri Mar 31 21:48:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* configure.in, configure: Add test using AC_TRY_RUN to compile
against the htlib/regex.c and attempt to compile a regexp. This
should allow us to find out if the included regex code causes
problems.

* acconfig.h: Add HAVE_BROKEN_REGEX as a result of the configure
script to conditionally include the appropriate regex.h file.

* include/htconfig.h.in: Regenerate using autoheader.

* htlib/regex.c: Move #include "htconfig.h" inside HAVE_CONFIG_H
tests. This file is only created when this is true anyway. This
prevents problems with the configure test.

* htlib/HtRegex.h, htfuzzy/EndingsDB.cc: Use HAVE_BROKEN_REGEX
switch to use the system include instead of the local include
where appropriate.

* htlib/Makefile.am, htlib/Makefile.in: Only compile regex.lo if
the configure script added it to LIBOBJS.

Thu Mar 30 22:41:38 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/URL.cc (normalizePath): Remove Gilles's loop to add
	back ../ components to a path that would go above the top
	level. Now we simply discard them. Both are allowed under the RFC,
	but this should have fewer "surprises."

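The normalizePath change above can be illustrated with a small standalone sketch. `normalize_path` is a hypothetical helper working on a bare absolute path string, not htdig's URL class; for simplicity it also collapses repeated slashes, which the real code may not do.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collapse "." and ".." segments in an absolute path. A ".." that would
// climb above the top level is discarded rather than kept.
// Simplification: empty segments (repeated slashes) are dropped too.
std::string normalize_path(const std::string& path)
{
    std::vector<std::string> segments;
    std::stringstream in(path);
    std::string seg;
    while (std::getline(in, seg, '/')) {
        if (seg.empty() || seg == ".")
            continue;                    // skip empty and "current dir" segments
        if (seg == "..") {
            if (!segments.empty())
                segments.pop_back();     // go up one level
            continue;                    // at the top level: just drop it
        }
        segments.push_back(seg);
    }

    // A path ending in "/", "/." or "/.." refers to a directory, so keep
    // a trailing slash in that case.
    std::string last = path.substr(path.rfind('/') + 1);
    bool is_dir = last.empty() || last == "." || last == "..";

    std::string out;
    for (const std::string& s : segments)
        out += "/" + s;
    if (out.empty())
        return "/";
    if (is_dir)
        out += "/";
    return out;
}
```

For example, `/a/b/../c` becomes `/a/c`, while `/../../x` becomes `/x` because the two leading `..` segments have nowhere to go.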
Tue Mar 28 21:57:49 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/Connection.cc (Read_Partial): Fix bug reported by Valdas
	where a zero value returned by select would result in an infinite
	loop.

	* htcommon/defaults.cc: Add new attribute plural_suffix to set the
	language-dependent suffix for PLURAL_MATCHES contributed by Jesse.

	* htsearch/Display.cc (setVariables): Use it.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Mon Mar 27 22:28:20 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/DocumentRef.cc (Deserialize): Add back stub for
	DOC_IMAGESIZE to prevent decoding errors. This just throws away
	that field.

	* htcommon/HtSGMLCodec.h (class HtSGMLCodec): Differentiate
	between codec used for &foo; and numeric form &#nnn; Make sure
	encoding goes through both but decoding only goes through the
	preferred text form.

	* htcommon/HtSGMLCodec.cc (HtSGMLCodec): When constructing the
	private HtWordCodec objects, create separate lists for the number
	and text codecs.

Mon Mar 27 21:25:27 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htsearch/HtURLSeedScore.cc (ScoreAdjustItem): Change to use
	HtRegex for flexibility and to get around const char * -> char *
	problems.

	* htsearch/SplitMatches.cc (MatchArea): Ditto.

	* htsearch/Makefile.am, htsearch/Makefile.in: Add SplitMatches.cc
	and HtURLSeedScore.cc to compilation list!

Mon Mar 27 21:03:12 2000 Hans-Peter Nilsson <hp at bitrange.com>

	* htcommon/defaults.cc (defaults): Add default for
	search_results_order, url_seed_score.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerated using cf_generate.pl.

	* htlib/List.h (List): New method AppendList.
	* htlib/List.cc (List::AppendList): Implement it.

	* htsearch/SplitMatches.h, htsearch/SplitMatches.cc: New.

	* htsearch/HtURLSeedScore.cc, HtURLSeedScore.h: New.

	* htsearch/Display.h (class Display): Add member minScore.
	Change maxScore type to double.

	* htsearch/Display.cc: Include SplitMatches.h and HtURLSeedScore.h
	(ctor): Initialize minScore, change init value for
	maxScore to -DBL_MAX.
	(buildMatchList): Use a SplitMatches to hold search results and
	iterate over its parts when sorting scores.
	Ignore Count() of matches when setting minScore and maxScore.
	Use a URLSeedScore to adjust the score after other calculations.
	Calculate minScore.
	Correct maxScore adjustment for change to double.
	(displayMatch): Use minScore in calculation of score to adjust for
	negative scores.
	(sort): Calculation of maxScore moved to buildMatchList.

Mon Mar 27 20:22:24 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Remove
	DocImageSize field since it is not used anywhere and is never updated.

	* htdig/Retriever.h (class Retriever): Remove references to Images class.

	* htcommon/DocumentDB.cc (DumpDB): Ignore DocImageSize field.

	* htdig/Makefile.am, htdig/Makefile.in: Remove Images.cc since
	this is no longer used.

	* htdig/Plaintext.cc: Do not insert SGML equivalents into the
	excerpt; these are decoded by HtSGMLCodec automatically.

Sat Mar 25 21:58:36 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdoc/cf_generate.pl (html_escape): Changed <b></b> and <i></i>
	tags to HTML 4.0 <strong> and <em> tags.

Sat Mar 25 17:23:46 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdb/Makefile.am, htdb/Makefile.in: Change the names of the htdb
	utility programs to avoid name conflicts with the httools programs.

	* htdb/htdb_load.cc: Rename htload.cc to avoid a name conflict and
	more closely match the original db_load program name.

	* htdb/htdb_dump.cc, htdb/htdb_stat.cc: Ditto.

	* htfuzzy/Prefix.cc (getWords): Add code to "weed out" duplicates
	returned from WordList::Prefix. We only want to add unique words
	to the search list.

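A minimal sketch of the duplicate filtering in Prefix::getWords, assuming the candidate words arrive as a plain vector of strings; `unique_words` is a hypothetical name, and htdig itself works with its own List and String classes.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Keep only the first occurrence of each word, preserving the original
// order in which the prefix lookup returned them.
std::vector<std::string> unique_words(const std::vector<std::string>& words)
{
    std::unordered_set<std::string> seen;
    std::vector<std::string> result;
    for (const std::string& w : words)
        if (seen.insert(w).second)   // insert() reports whether w was new
            result.push_back(w);
    return result;
}
```

Using a hash set keeps the check linear in the number of returned words instead of rescanning the result list for every candidate.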
Fri Mar 24 22:33:20 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Document.cc (Document): Fix bug reported by Mentos
	Hoffman; fix contributed by Atlee Gordy <agordy at moonlight.net>.

Mon Mar 20 23:14:26 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/DocumentDB.cc (Delete): Fix bug reported by Valdas
	where duplicate document records could "sneak in" because the
	doc_index entry was removed incorrectly.

Mon Mar 20 19:08:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc: Added block field and added appropriate blocks.

	* htlib/Configuration.h (struct ConfigDefaults): Add block field.

	* htdoc/cf_generate.pl: Parse the new block field.

	* htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html:
	Regenerate using above.

	* htcommon/DocumentDB.cc (DumpDB): Make sure we decompress the
	DocHead field before we write it to disk!

	* httools/htdump.cc, httools/htstat.cc: Call
	WordContext::Initialize() before doing any htword calls.

Mon Mar 20 14:10:30 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* httools/htpurge.cc: Whoops! Left some references to htmerge in
	the error messages and usage message.

	* httools/htstat.cc: New program. Simply spits out the total number
	of documents, words and unique words in the databases.

	* httools/htdump.cc: New program. Simply dumps the contents of the
	document DB and the word DB to doc_list and word_dump files
	respectively. Also has flags -w and -d to pick one or the other.

	* httools/Makefile.am, httools/Makefile.in: Add htdump and htstat
	programs to compilation list.

	* htcommon/DocumentDB.cc (DumpDB): Rename from CreateSearchDB
	and add fields for DocBackLinks, DocSig, DocHopCount, DocEmail,
	DocNotification, and DocSubject. This should now export every
	portion of the document DB.

	* htcommon/DocumentDB.h: Rename CreateSearchDB to DumpDB and add
	stub for LoadDB, to be written shortly.

	* htdig/htdig.cc: Call DumpDB instead of CreateSearchDB when
	creating an ASCII version of the DB.

Sat Mar 18 22:57:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* httools/Makefile.am, httools/Makefile.in: New directory for
	useful database utilities.

	* httools/htnotify.cc: Moved htnotify to httools directory.

	* httools/htpurge.cc: New program; currently it just purges documents
	(and corresponding words) in the databases. It will shortly also
	allow deletion of specified URLs.

	* Makefile.am, configure.in: Remove htnotify directory in favor of
	httools directory.

	* configure: Regenerate using autoconf.

	* Makefile.in: Regenerate using automake --foreign.

Fri Mar 17 16:47:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (excerpt, hilight): Correctly handle case
	where there is no pattern to highlight.
	* htsearch/htsearch.cc (addRequiredWords), htcommon/defaults.cc:
	Add any_keywords attribute, to OR keywords rather than AND them;
	fix addRequiredWords so it doesn't mess up the expression when there
	are no search words but required words are given.
	* htdoc/hts_form.html: Mention new attribute, add links to all
	mentioned attributes.
	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Fri Mar 17 15:48:12 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htfuzzy/Accents.cc (generateKey): Truncate words to
	maximum_word_length, for consistency with what's found in word DB.

Fri Mar 17 10:56:17 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (do_tag): Use case-insensitive parsing of META
	robots tag content.
	* htlib/String.cc (uppercase): Fix misplaced cast for islower().

Mon Mar 6 17:31:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/htsearch.cc (setupWords): Don't allow comma as string
	list separator, as it can be a decimal point in some locales.

Mon Mar 06 00:58:00 2000 Loic Dachary <loic at ceic.com>

	* db/mp/mp_bh.c (__memp_bhfree): Always free the chain, if
	any. The bh is reset to null after free and we lose the
	pointer anyway, eventually filling the pool with leaked chains.

	* db/mp/mp_cmpr.c (__memp_cmpr_write): Use i < CMPR_MAX - 1 instead
	of i < CMPR_MAX, otherwise we go beyond the array limits. This fixes
	a major problem when handling large files.

Sat Mar 04 19:41:49 2000 Loic Dachary <loic at ceic.com>

	* db/mp/mp_cmpr.c (__memp_cmpr_free_chain): Clear BH_CMPR
	flag. This was causing core dumps; thanks to
	Peter Marelas maral at phase-one.com.au for providing
	a simple case to reproduce the error.

Fri Mar 3 11:32:34 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* Fixed bugs regarding yesterday's changes. Even Leonardo da Vinci
	used to make mistakes, so ...

Fri Mar 3 11:25:42 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* testnet.cc: Added the -r and -w options to set how many times it
	retries the connection after a timeout occurs, and how long it
	waits before each retry.

Thu Mar 2 18:45:15 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* htnet/Connection.*: Manage the wait time and number of retries
	after a timeout occurs.

	* htnet/Transport.*: Manage the connection attributes above.

	* htdig/Server.*: Set members for managing timeout retries taken from
	the configuration file ("timeout", "tcp_max_retries", "tcp_wait_time").

	* htdig/Document.cc: Allow "persistent_connections",
	"head_before_get", "timeout", "tcp_max_retries" and "tcp_wait_time"
	to be configured on a per-server basis. Changed the Retrieve method
	to accept a server object pointer: Retrieve (server*, HtDateTime).

	* htdig/Retriever.cc: Allow the "max_connection_requests" attribute
	to be configured on a per-server basis.

	* htcommon/defaults.cc: Added "tcp_max_retries", "tcp_wait_time".
	These need to be gone over by someone who speaks English better than
	me. Not hard work!!! ;-)

Wed Mar 1 17:01:09 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc (excerpt, hilight): Move SGML encoding into
	hilight() function, because when it's done earlier it breaks
	highlighting of accented characters.

Wed Mar 1 16:02:49 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htfuzzy/htfuzzy.cc (main): Correctly test return value on Open()
	of word database, include db name in error message if Open() fails,
	do a WordContext::Initialize() before we need htword functions.
	(Obviously I'm the first to test htfuzzy in 3.2!)
	* htfuzzy/Accents.cc (generateKey): Cast characters to unsigned char
	before using as array subscripts.

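Why the unsigned char cast matters: where plain char is signed, a byte such as 0xE9 (é in ISO-8859-1) converts to a negative int and indexes before the start of a lookup table. The sketch below assumes ISO-8859-1 input and a small, hypothetical folding table; it is not the table contributed for the accents algorithm, and both function names are illustrative.

```cpp
#include <string>

// Build a 256-entry table mapping each byte to itself, except a few
// ISO-8859-1 accented lowercase vowels, which map to their base letter.
static std::string make_fold_table()
{
    std::string table(256, '\0');
    for (int c = 0; c < 256; ++c)
        table[c] = static_cast<char>(c);
    const char* from = "\xE0\xE1\xE2\xE8\xE9\xEA\xEC\xED\xEE\xF2\xF3\xF4\xF9\xFA\xFB";
    const char* to   = "aaaeeeiiiooouuu";
    for (int i = 0; from[i] != '\0'; ++i)
        table[static_cast<unsigned char>(from[i])] = to[i];
    return table;
}

// Fold accented characters to their unaccented equivalents.
std::string fold_accents(const std::string& word)
{
    static const std::string table = make_fold_table();
    std::string out;
    out.reserve(word.size());
    for (char ch : word)
        // Cast first: a signed char >= 0x80 would be a negative index.
        out += table[static_cast<unsigned char>(ch)];
    return out;
}
```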
Wed Mar 1 13:27:26 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: Added accents_db attribute, mentioned accents
	algorithm in search_algorithms section.
	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.
	* installdir/htdig.conf: Added mentions of accents, speling & substring,
	fixed a couple of typos in comments.
	* htdoc/htfuzzy.html: Added blurb on accents algorithm.
	* htdoc/require.html: Added mentions of accents, speling, substring,
	prefix & regex.
	* htdoc/config.html: Updated with sample of latest htdig.conf and
	installdir/*.html, added blurb on wrapper.html.

Wed Mar 1 00:30:19 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* configure.in, configure: Add test for FD_SET_T, the type of the
	second (also third and fourth) argument in calls to select(). Should
	solve PR#739.

	* acconfig.h, include/htconfig.h.in: Add declaration for FD_SET_T.

	* htnet/Connection.cc (ReadPartial): Change declaration of fds to
	use the FD_SET_T define set by the configure script.

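The FD_SET_T check and the Mar 28 fix (a zero return from select() causing an infinite loop) can be sketched together. This assumes FD_SET_T names the type that select() expects its set pointers to point at, with fd_set as the fallback when configure does not define it; `read_partial` is a hypothetical stand-in for Connection::ReadPartial, not its real code.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cerrno>

// Fallback if the configure script did not define FD_SET_T.
#ifndef FD_SET_T
#define FD_SET_T fd_set
#endif

// Wait up to timeout_sec for fd to become readable, then read at most
// len bytes. Returns the byte count, 0 at end of file, or -1 on
// timeout or error. A zero return from select() means "timed out" and
// must end the wait; retrying it unconditionally would loop forever.
ssize_t read_partial(int fd, char* buf, size_t len, int timeout_sec)
{
    for (;;) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        timeval tv;
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        int ready = select(fd + 1, reinterpret_cast<FD_SET_T*>(&fds),
                           nullptr, nullptr, &tv);
        if (ready < 0 && errno == EINTR)
            continue;            // interrupted by a signal: wait again
        if (ready <= 0)
            return -1;           // timeout (0) or select() failure (<0)
        return read(fd, buf, len);
    }
}
```

Only EINTR restarts the wait; a timeout is reported to the caller, which can then decide whether to retry.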
Tue Feb 29 23:11:49 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/DB2_db.cc (Error): Simply fprintf the error message on
	stderr. This is not a method since the db.h interface expects a C
	function.
	(db_init): Don't set db_errfile; instead set errcall to point to
	the new Error function.

Tue Feb 29 15:09:41 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htfuzzy/Accents.h, htfuzzy/Accents.cc: Adapted writeDB() for 3.2.

Tue Feb 29 14:29:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htfuzzy/Accents.h, htfuzzy/Accents.cc: Added these, as contributed
	by Robert Marchand, to implement accents fuzzy match. Adapted to 3.2.
	* htfuzzy/Fuzzy.cc, htfuzzy/htfuzzy.cc, htfuzzy/Makefile.am,
	htfuzzy/Makefile.in: Added in accents algorithm, as for soundex.

Tue Feb 29 11:31:53 2000 Loic Dachary <loic at ceic.com>

	* test/testnet.cc (Listen): Add -b port option to listen on a
	specific port. This is to test connect timeout conditions.

	* htnet/Connection.cc (Connect): Added SIGALRM signal handler;
	Connect() always allows EINTR to occur.

Mon Feb 28 15:32:46 2000 Loic Dachary <loic at ceic.com>

	* htword/WordKey.h (class WordKey): Explicitly add inline keyword
	for all inline functions.

Mon Feb 28 13:10:34 2000 Loic Dachary <loic at ceic.com>

	* htword/WordKey.h (class WordKey): nfields data member caches
	result of NFields() method.

	* htword/WordDBPage.h (class WordDBPage): nfields data member caches
	result of WordKey::NFields() method.

	* acinclude.m4 (APACHE): Check in lib/apache for modules.

Sat Feb 26 22:05:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htsearch/Collection.h, htsearch/Collection.cc: New files
	contributed by Rajendra Inamdar <inamdar at beasys.com>.

	* htsearch/Makefile.am, htsearch/Makefile.in: Compile them.

	* htcommon/defaults.cc: Add new collection_names attribute as
	described by Rajendra.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

	* htsearch/Display.h, htsearch/Display.cc: Loop through
	collections as we are assembling results.
	(buildMatchList): Use 1.0 as minimum score and take log(score) as
	the final score. This requires an increase in magnitude in weight
	to correspond to a factor of increase in score.

	* htsearch/DocMatch.h, htsearch/DocMatch.cc: Keep track of the
	collection we're in.

	* htsearch/ResultMatch.h: Ditto.

	* htsearch/htsearch.h, htsearch/htsearch.cc: Wrap results in
	collections.

	* htsearch/parser.h, htsearch/parser.cc: Set the collection for
	the results; we use this to get to the appropriate word DB.
	(score): Divide word weights by word frequency to calibrate for
	expected Zipf's law. Rare words should count more.

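The two scoring changes reduce to small formulas. This sketch assumes "frequency" is the number of times a word occurs in the database and uses the natural logarithm; a different base would only rescale every final score by the same constant. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Per-word contribution: dividing the weight by the word's frequency
// makes rare words count more than common ones.
double word_score(double weight, double frequency)
{
    return frequency > 0 ? weight / frequency : 0.0;
}

// Final document score: clamp the raw score to a minimum of 1.0, then
// take its logarithm, so the result is never negative.
double final_score(double raw)
{
    return std::log(std::max(raw, 1.0));
}
```

Because the final score is logarithmic, raising a raw score from 10 to 100 adds the same amount as raising it from 100 to 1000.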
Fri Feb 25 11:19:47 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (maximum_pages): Describe new behaviour (as of
	3.1.4), where this limits total matches shown.
	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Thu Feb 24 14:43:06 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtFile.cc (Request): Fix silly typo.

	* htlib/DB2_db.cc: Remove include of malloc.h, as it causes problems
	on some systems (e.g. Mac OS X), and all we need should be in stdlib.h.

Thu Feb 24 13:11:15 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnet/HtFile.cc (Request): Don't append more than _max_document_size
	bytes to _contents string; set _content_length to size returned by
	stat().
	* htnet/HtHTTP.cc (HTTPRequest): Add extra tests for the case where
	Content-Length is not given for non-chunked input, and don't close the
	persistent connection when chunked input exceeds _max_document_size.
	(ReadChunkedBody): Don't append more than _max_document_size bytes
	to _contents string.

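The _max_document_size cap amounts to clamping each append. A sketch with hypothetical names, where `contents` plays the role of _contents; in the HtFile case the true size from stat() would still be recorded separately as _content_length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Append at most enough of buf to bring contents up to max_size bytes.
// Returns the number of bytes actually appended; any excess is dropped.
std::size_t append_capped(std::string& contents, const char* buf,
                          std::size_t len, std::size_t max_size)
{
    if (contents.size() >= max_size)
        return 0;                               // already full
    std::size_t room = max_size - contents.size();
    std::size_t n = std::min(len, room);
    contents.append(buf, n);
    return n;
}
```

Checking `contents.size() >= max_size` before subtracting avoids unsigned underflow when the buffer is already at the limit.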
Thu Feb 24 11:40:24 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (do_tag): Fix handling of img alt text to be consistent
	with body text, rather than keywords.
	* htdig/Retriever.cc (ctor): Treat alt text as plain text, until it has
	its own FLAG and factor.

Thu Feb 24 11:16:37 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (version): Moved example over to correct field.
	(defaults[] terminator): Padded zeros to new number of fields.
	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Thu Feb 24 19:08:41 2000 Loic Dachary <loic at ceic.com>

	* htmerge/words.cc: Only display Word in verbose message instead
	of complete key if verbosity < 3.

Thu Feb 24 10:43:12 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (external_protocols, external_parser):
	Swapped these two entries to put them in alphabetical order.
	(star_blank): Fixed old typo (incorrect reference to image_star).
	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Wed Feb 23 16:53:40 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc (backlink_factor, external_parser,
	local_default_doc, local_urls, local_urls_only, local_user_urls):
	Add some updates from 3.1.5's attrs.html.
	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Wed Feb 23 15:11:51 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	[ Improve htsearch's HTML 4.0 compliance ]
	* htsearch/TemplateList.cc (createFromString): Use file name rather
	than internal name to select builtin-* templates, use $&(TITLE) and
	$&(URL) in templates and quote HTML tag parameters.
	* installdir/long.html, installdir/short.html: Use $&(TITLE) and
	$&(URL) in templates and quote HTML tag parameters.
	* htsearch/Display.cc (setVariables): Quote all HTML tag parameters
	in generated select lists.
	* installdir/footer.html, installdir/header.html,
	installdir/nomatch.html, installdir/search.html,
	installdir/syntax.html, installdir/wrapper.html:
	Use $&(var) where appropriate, and quote HTML tag parameters.
	* installdir/htdig.conf: Quote all HTML tag parameters.

Wed Feb 23 13:40:27 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/URL.h (encodeURL): Change list of valid characters to
	include only unreserved ones.
	* htcommon/cgi.cc (init): Allow "&" and ";" as input param. separators.
	* htsearch/Display.cc (createURL): Encode each parameter separately,
	using new unreserved list, before piecing together query string, to
	allow characters like "?=&" within parameters to be encoded.

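Encoding each parameter on its own is what lets a value containing "?", "=" or "&" survive the trip; splitting on both "&" and ";" mirrors the cgi.cc change. This sketch assumes the RFC 3986 unreserved set (letters, digits, "-", "_", ".", "~"); the actual list in URL.h may differ, and both function names are hypothetical.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Percent-encode everything except unreserved characters, so reserved
// ones such as '?', '=' and '&' inside a value are escaped.
std::string encode_param(const std::string& value)
{
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char hex[4];
            std::snprintf(hex, sizeof hex, "%%%02X", static_cast<unsigned>(c));
            out += hex;
        }
    }
    return out;
}

// Split a query string into "name=value" pairs, accepting both '&'
// and ';' as separators and ignoring empty pieces.
std::vector<std::string> split_query(const std::string& query)
{
    std::vector<std::string> pairs;
    std::string current;
    for (char c : query) {
        if (c == '&' || c == ';') {
            if (!current.empty())
                pairs.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty())
        pairs.push_back(current);
    return pairs;
}
```

For example, `encode_param("a&b")` yields `a%26b`, so the encoded "&" is no longer mistaken for a separator when the query string is split again.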
Wed Feb 23 13:22:29 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/URL.cc (ServerAlias): Fix server_aliases processing to prevent
	infinite loop (as for local_urls in PR#688).

Wed Feb 23 12:49:52 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/HtDateTime.h, htlib/HtDateTime.cc: Change Httimegm() method
	to HtTimeGM(), to avoid conflict with Httimegm() C function, so we
	don't need "::" override, for Mac OS X.
	* htlib/htString.h, htlib/String.cc: Change write() method to
	Write(), to avoid conflict with write() function, so we don't need
	"::" override, for Mac OS X.

Wed Feb 23 12:17:46 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/Configuration.cc (Read): Fixed to accept a final line without
	a terminating newline character, rather than ignoring it.

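With C++ streams, the final-line case comes for free: std::getline() returns the last line even when no newline follows it. A sketch with a hypothetical `read_lines` helper; Configuration::Read uses its own reading code.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line of a configuration stream. The final line is kept
// even when the input ends without a terminating newline.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

A trailing newline does not create an extra empty entry: input `"x\n"` yields the single line `x`.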
Wed Feb 23 12:01:01 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc (GetLocal, GetLocalUser): Add URL-decoding
	enhancements to local_urls, local_user_urls & local_default_doc,
	to allow hex encoding of special characters.

Wed Feb 23 19:14:29 2000 Loic Dachary <loic at ceic.com>

	* htcommon/conf_parser.cxx: Regenerated from conf_parser.yxx.

Wed Feb 23 19:04:16 2000 Loic Dachary <loic at ceic.com>

	* test/test_functions.in: Unconditionally remove existing test/var
	directory before running tests to prevent accidents.

	* htcommon/URL.cc (URL): Fixed String->char warning.

	* htcommon/defaults.cc (wordlist_compress): Defaults to true.

Tue Feb 22 17:09:10 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc (parse, do_tag): Fix handling of <img alt=...> text
	and parsing of words in meta tags, to do proper word separation.
	* htlib/HtWordType.h, htlib/HtWordType.cc: Add HtWordToken() function,
	to replace strtok() in HTML parser.

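A reentrant replacement for strtok() in the spirit of HtWordToken(). The signature here is hypothetical: the caller passes the extra characters to treat as part of a word, loosely modeled on a configurable list of extra word characters.

```cpp
#include <cctype>
#include <string>
#include <vector>

// A word character is a letter or digit, or one of the caller's extras.
static bool is_word_char(unsigned char c, const std::string& extra)
{
    return std::isalnum(c) || extra.find(static_cast<char>(c)) != std::string::npos;
}

// Split text into maximal runs of word characters. Unlike strtok(),
// this keeps no hidden static state and does not modify its input.
std::vector<std::string> word_tokens(const std::string& text, const std::string& extra)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char ch : text) {
        if (is_word_char(static_cast<unsigned char>(ch), extra)) {
            current += ch;
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

Because no state survives between calls, two tokenizations can be interleaved (for example, alt text inside a larger parse) without one clobbering the other.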
Tue Feb 22 16:21:25 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htlib/URL.cc (ctor, normalizePath): Fix PR#779, to handle relative
	URLs correctly when there's a trailing ".." or leading "//".

Tue Feb 22 14:09:26 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Document.cc (RetrieveLocal): Handle common extensions for
	text/plain, application/pdf & application/postscript.

Mon Feb 21 17:25:21 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/htdig-3.2.0.spec: Fixed %post script to add more
	descriptive entries in htdig.conf, made cron script a config file,
	updated to 3.2.0b2.

	* contrib/conv_doc.pl, contrib/parse_doc.pl: Added comments to show
	Warren Jones's updates in change history.

Mon Feb 21 17:09:13 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/HtConfiguration.h, htcommon/conf_parser.yxx,
	htlib/Configuration.h, htlib/Configuration.cc: Split Add() method
	into Add() and AddParsed(), so that only config attributes get parsed.
	Use AddParsed() only in Read() and Defaults().

Fri Feb 18 22:50:54 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/Connection.h, htnet/Connection.cc: Renamed methods with
	capitals to remove the need to use ::-escaped library calls.

	* htnet/Transport.h, htnet/Transport.cc, htnet/HtHTTP.cc,
	htdig/Images.cc: Fix code using Connection to use the newly
	capitalized methods.

Fri Feb 18 14:40:50 2000 Loic Dachary <loic at ceic.com>

	* test/conf/access.conf.in: Removed cookies. They are not used, and
	some httpd builds are not compiled with usertrack.

Wed Feb 16 12:15:08 2000 Vadim Chekan <vadim at etc.lviv.ua>

	* htcommon/Makefile.am: Replaced conf.tab.cc.h with conf_parser.h in
	noinst_HEADERS.

	* htcommon/conf_parser.yxx, conf_parser.lxx, HtConfiguration.cc,
	HtConfiguration.h: Added copyright and Id:

	* htcommon/cgi.cc (init): Fixed bug: the array must be freed with
	delete [] buf, not just delete buf;

Tue Feb 15 23:16:14 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/HtHTTP.cc (isParsable): Remove application/pdf as a
	default type; it is now handled through the ExternalParser
	interface, if at all.

	* htcommon/defaults.cc: Remove pdf_parser attribute.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

	* htdig/Document.cc (getParsable): Remove PDF once and for all
	(hopefully).

	* htdig/ExternalParser.cc (parse): Ditto.

	* configure.in: Remove check for PDF_PARSER.

	* configure: Regenerate using autoconf.

	* htdig/Makefile.am: Remove PDF.cc and PDF.h.

	* Makefile.in, */Makefile.in: Regenerate using automake --foreign.

Tue Feb 15 12:02:39 EET 2000 Vadim Chekan <vadim at etc.lviv.ua>

	* htcommon/HtConfiguration.cc, HtConfiguration.h: Fixed bug discovered
	by Gilles. HtConfiguration was able to get info only from "url" and
	"server" blocks.

	* htcommon/conf_parser.yxx: Deleted the 1st parameter for new char[],
	left over from when realloc was replaced by new char[]. Removed a few
	unused variable declarations.

	* htcommon/Makefile.am: Added -d flag to bison to generate
	conf_parser.h template from conf_parser.yxx;
	conf_lexer.lxx uses #include conf_parser.h;
	conf.tab.cc.h removed.

Sun Feb 13 21:19:04 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc: Get rid of uncoded_db_compatible since
	the current DB format has clearly broken backwards compatibility.

	* htsearch/Display.cc (Display), htnotify/htnotify.cc (main),
	htmerge/docs.cc (convertDocs), htmerge/db.cc (mergeDB),
	htdig/htdig.cc (main): Remove call to DocumentDB::setCompatibility().

	* htcommon/DocumentDB.h (class DocumentDB): Remove
	setCompatibility and related private variable.

	* htcommon/DocumentDB.cc ([], Delete): Don't bother checking for
	an unencoded URL; at this point all URLs will be encoded using
	HtURLCodec.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

Sat Feb 12 21:29:20 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/HtSGMLCodec.cc (HtSGMLCodec): Always translate ",
	&, < and >.

	* htcommon/defaults.cc: Remove translate_* and word_list
	attributes since they're no longer used.

	* htdig/PDF.cc (parseNonTextLine): Fix bogus escape sequences
	around Title parsing. Fixes PR#740.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate using cf_generate.pl.

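A standalone sketch of translating the four always-handled characters, assuming a simple per-character pass; HtSGMLCodec does this through HtWordCodec translation lists, and `sgml_encode` is a hypothetical name. Handling "&" in the same single pass means entities that were just produced are never encoded a second time.

```cpp
#include <string>

// Replace the four always-translated characters with their named
// SGML/HTML entities; every other character is copied unchanged.
std::string sgml_encode(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
        case '"': out += "&quot;"; break;
        case '&': out += "&amp;";  break;
        case '<': out += "&lt;";   break;
        case '>': out += "&gt;";   break;
        default:  out += c;        break;
        }
    }
    return out;
}
```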
Fri Feb 11 11:41:36 2000 Loic Dachary <loic at ceic.com>

	* htlib/Makefile.am: Removed CFLAGS=-g (use make CXXFLAGS=-g all
	instead).

	* htdoc/install.html: Specify that the header/lib install directories
	are now prefix/include/htdig and prefix/lib/htdig.

	* Makefile.am (distclean-local): Use TESTDIR instead of deprecated
	HTDIGDIRS.

	* */Makefile.am: Install libraries in prefix/lib/htdig and
	includes in prefix/include/htdig. Just prepend pkg to the
	automake targets.

	* include/Makefile.am: Install htconfig.h.

Thu Feb 10 23:18:37 2000 Loic Dachary <loic at ceic.com>

	* Connection.cc (Connection): Set retry_value to 1 instead of
	0 as suggested by Geoff.

Thu Feb 10 17:36:09 2000 Loic Dachary <loic at ceic.com>

	* htdig/Document.cc: Fix (String)->(char*) conversion warnings.

	* htword/WordList.cc: Kill Collect(WordSearchDescription), which
	was useless and error prone.

	* htword/WordDB.h (WordDBCursor::Get): Small performance improvement
	by copying values only if the key is found.

	* htword/WordDB.h, WordList.cc: Fix reference counting bug when
	using Override (+1 even if entry existed). Make WordDB.h return
	values follow the standard Berkeley DB convention instead of the
	mixture with OK/NOTOK, which was a stupid idea. This allows Put
	errors to be detected and handled properly, fixing the Override bug
	without performance loss.

	* test/conf/httpd.conf.in: Comment out loading of mod_rewrite
	since not everyone has it.

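The Override fix hinges on telling "key already exists" apart from success. A sketch using a std::map as a stand-in for the Berkeley DB file; the return codes, `WordStore`, and its methods are all hypothetical, chosen only to show why the reference count must be bumped for new keys alone.

```cpp
#include <map>
#include <string>

// Hypothetical Berkeley DB style return codes: 0 on success, a
// distinct non-zero code when the key is already present.
constexpr int kOk = 0;
constexpr int kKeyExists = 1;

struct WordStore {
    std::map<std::string, int> entries;   // full key -> stored value
    std::map<std::string, int> refcount;  // word -> number of distinct keys

    // Insert without overwriting; report kKeyExists if the key is present.
    int put_no_overwrite(const std::string& key, int value)
    {
        return entries.emplace(key, value).second ? kOk : kKeyExists;
    }

    // Store key/value, replacing any previous value, and bump the word's
    // reference count only when the key was genuinely new.
    int override_entry(const std::string& word, const std::string& key, int value)
    {
        int ret = put_no_overwrite(key, value);
        if (ret == kKeyExists) {
            entries[key] = value;   // existing entry: replace, don't count
            return kOk;
        }
        if (ret == kOk)
            ++refcount[word];       // new entry: count it exactly once
        return ret;
    }
};
```

With an OK/NOTOK-only interface the caller cannot distinguish "already there" from a real failure, which is how the count ended up incremented for existing entries.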
Thu Feb 10 00:26:02 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htcommon/defaults.cc: Add new attribute "nph" to send out
	non-parsed headers for servers that do not supply HTTP headers on
	CGI output (e.g. IIS).

	* htsearch/Display.cc (display): If nph is set, send out an HTTP OK
	header as suggested by Matthew Daniel <mdaniel at scdi.com>.
	(displaySyntaxError): Ditto.

	* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
	Regenerate from current defaults.cc file.

Thu Feb 10 00:21:58 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/HTML.cc (do_tag): Treat <script></script> tags as noindex
	tags, much like <style></style> as suggested by Torsten.

Thu Feb 10 00:02:41 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

	* .version: Bump for 3.2.0b2.

	* htcommon/defaults.cc: Add category fields for each
	attribute. Though these are currently unused, they could allow the
	documentation to be split into multiple files based on logical
	categories and subcategories.

Wed Feb 9 23:52:55 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htnet/Connection.cc (connect): Add alarm(timeout) ... alarm(0)
around the ::connect() call to ensure it times out as appropriate,
as suggested by Russ Lentini <rlentini at atl.lmco.com>, to resolve
PR#762 (and probably others as well).
(connect): Add a retry loop as suggested by Wilhelm Schnell
<Wilhelm.Schnell at mn.man.de> to resolve PR#754.

* htnet/HtHTTP.cc (HTTPRequest): Add CloseConnection() before
returning from the method when the connection fails to open. Should
take care of PR#670 for htdig-3-2-x.

Wed Feb 09 17:20:50 2000 Loic Dachary <loic at ceic.com>

* db/dist/Makefile.in (libhtdb.so): move dependent libraries
*after* the list of objects, otherwise it's useless.

* htword/WordKey.h (class WordKey): move #if SWIG around to
please swig (www.swig.org).

* htword/WordList.h (class WordList): allow SWIG to see Walk*
functions (#if SWIG).

Wed Feb 9 09:21:00 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Server.cc (robotstxt): apply more rigorous parsing of
multiple user-agent fields, and use only the first one.

* htlib/HtRegex.cc (set): apply the fix from Valdas Andrulis to
properly compile case_sensitive expressions.

Mon Feb 09 09:43:59 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtHTTP.cc: changed "<<" to append() for content_length
assignment in ReadChunkedBody() function (as Gilles suggested).

Tue Feb 08 10:54:08 2000 Loic Dachary <loic at ceic.com>

* db/dist/configure.in: Added AC_PREFIX_DEFAULT(/opt/www)
so that headers and libraries are installed in the proper
directory when no --prefix is given.

Tue Feb 08 10:32:48 2000 Loic Dachary <loic at ceic.com>

* test/t_wordskip: copy $srcdir/skiptest_db.txt to allow running
outside the source tree.

* configure.in: use '${prefix}/...' instead of "$ac_default_prefix/...",
which did not carry the --prefix value.

* configure.in: run CHECK_USER and AC_PROG_APACHE if --enable-tests
is specified.

Mon Feb 07 17:40:47 2000 Loic Dachary <loic at ceic.com>

* htlib/htString.h (last): make const.

Mon Feb 07 14:05:37 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

* htnet/HtHTTP.cc: fixed a bug in ReadChunkedBody() function
regarding document size assignment (raised by Valdas Andrulis).

Sun Feb 06 19:11:05 2000 Loic Dachary <loic at ceic.com>

* configure.in: Fix inconsistencies between default values
shown by ./configure and actual defaults.

* htdoc/install.html: change example version 3.1 to 3.2.
Commented out warning about libguile.
Replace CONFIG variables by configure.in options.
Specify default value for each of them.
Replace (and move) make depend with automake (distributed
Makefiles do not include dependency generation).
Added section for running tests.
Added section on shared libraries.

* configure.in: use AM_CONDITIONAL for --enable-tests.

* Makefile.am: use automake conditionals for subdir so
that make dist knows what to distribute whether or not
--enable-tests is specified.

* db/Makefile.in: allow make dist to work outside the source
tree.

Sat Feb 05 18:31:04 2000 Loic Dachary <loic at ceic.com>

* test/word.cc (SkipTestEntries): The fix of
WordList::SkipUselessSequentialWalking actually saves us
a few hops when walking lists of words.

Fri Feb 04 17:28:32 2000 Loic Dachary <loic at ceic.com>

* htword/WordKey.cc,WordReference.cc,WordRecord.cc (Print): use
cerr instead of cout for immediate printing under the debugger.

Thu Feb 3 16:06:45 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc (RetrieveLocal): fix bug that prevented local
filesystem digging, because max_doc_size was initialized to 0.
Now sets it to max_doc_size for the current URL.

Thu Feb 3 12:36:56 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* installdir/Makefile.{am,in}: install mime.types as mime.types,
not as htdig.conf.

* htfuzzy/EndingsDB.cc (createDB): fix code to use MV macro in
system() command, not hard-coded "MV" string literal, and use
get() on config objects to avoid passing String objects to form().

Wed Feb 2 19:44:33 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtDateTime.cc (SetRFC1123): Strip off weekday, if present,
and use LOOSE format.
(SetRFC850): Ditto.

* configure.in, configure: Add configure check for "mv".

* htfuzzy/Makefile.am: Use it.

* */Makefile.in: Regenerate using automake.

* htfuzzy/EndingsDB.cc (createDB): Use the detected mv, or
whatever mv is in the path, to move the endings DBs when they're
finished.

Wed Feb 2 15:49:14 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc (RetrieveLocal), htdig/Retriever.cc (GetLocal):
Fix compilation errors. Oops!

Wed Feb 2 13:53:27 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Retriever.cc (IsValidURL): fix valid_extensions matching
failure when URL parameters follow the extension.

Wed Feb 2 13:29:48 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/QuotedStringList.cc (Create): fix PR#743, where quoted string
lists didn't allow embedded quotes of the opposite kind in strings
(e.g. "'" or '"'), and avoid overrunning the end of the string
if it ends with a backslash.

Wed Feb 2 13:23:16 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc (ctor, parse, do_tag), htcommon/defaults.cc:
Add max_keywords attribute to limit meta keyword spamming.
* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Regenerate using cf_generate.pl.

Wed Feb 2 12:57:40 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc (RetrieveLocal), htdig/Document.h,
htdig/Retriever.cc (Initial, parse_url, GetLocal, GetLocalUser,
IsLocalURL, got_href, got_redirect), htdig/Retriever.h,
htdig/Server.cc (ctor), htdig/Server.h: Add in Paul Henson's
enhancements to local_urls, local_default_urls & local_default_doc.
* htcommon/defaults.cc: Document these.

Wed Feb 02 10:14:57 2000 Loic Dachary <loic at ceic.com>

* htword/WordKeyInfo.h,WordKey.{cc,h}: fix overflow bug with 32-bit
fields. For that purpose implement Outbound/Overflow/Underflow
methods in WordKey, MaxValue in WordKey/WordKeyInfo.
(WordKey::SetToFollowing) was FUBAR: overflow of field1 was tested
with the number of bits in the next field, and overflow was not
handled. Re-implemented.
(WordKey::Set) Change atoi to strtoul.
(WordList::SkipUselessSequentialWalking) was much too broken
to explain. Re-implemented.
(WordKey::Diff) Added as a support function of
SkipUselessSequentialWalking.
Implement consistent verbosity.

* htword/WordList.cc (operator >>): explicit error message, with
line number, when insert fails.

Wed Feb 2 00:11:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdoc/RELEASE.html: Finish up with notes on all significant
new attributes.

* htdoc/FAQ.html, htdoc/where.html: Mention new 3.2.0b1 release
as a beta.

* contrib/README: Update to mention new scripts.

* installdir/mime.types: Add default Apache mime.types file for
systems that do not already have one.

* installdir/Makefile.am: Make sure it is installed by default.

* installdir/Makefile.in: Regenerate using automake.

* htcommon/defaults.cc: Add documentation for mime_types
attribute, remove currently unused image_alt_factor, and add
documentation for external_protocols.

* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Regenerate using cf_generate.pl.

Tue Feb 1 10:24:19 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/parser.cc (score): fix up score calculations for
correctness and efficiency.

Mon Jan 31 16:29:20 2000 Marcel Bosc <bosc at ceic.com>

* htword/WordBitCompress.cc: fixed endian bug in compression.

Sat Jan 29 21:14:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/parser.cc (score): Change config.Value (which returns
int) to config.Double to preserve accuracy of attributes.

* htcommon/defaults.cc: Updated documentation for attributes that now
allow regexes and for search_algorithms (for the new fuzzy code), and
added documentation for the overlooked remove_unretrieved_urls.

* htdoc/*.html: Updated copyright notice for 2000, changed footer
to use CVS's magic Date keyword. Regenerated documentation from
defaults changes.

Sat Jan 29 16:32:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* contrib/htdig-3.1.4.spec, contrib/htdig-3.1.4-conf.patch: Remove
these since they don't apply to the 3.2.x releases.

* htfuzzy/Synonym.cc (openIndex): Change database format from
DB_BTREE to DB_HASH, since there is no reason for the synonym database
to be a btree. This was probably overlooked when I switched the rest of
the fuzzy databases over to DB_HASH.

Sat Jan 29 05:34:26 2000 Loic Dachary <loic at ceic.com>

* htword/WordKey.h (UnpackNumber): Very nasty bug. The optimization
dated Dec 29 broke endianness on Solaris. Restore previous version.

Fri Jan 28 18:17:08 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Configuration.h (struct ConfigDefaults): Add version and
category fields for more accurate documentation.

* htcommon/defaults.cc: Add blank category fields and start
filling in version field. Killed modification_time_is_now attribute.

* htdig/Document.cc (Document): Kill attribute
modification_time_is_now since it can cause more harm than good.

* htnet/HtHTTP.cc (ParseHeader): Ditto.

* htdoc/cf_generate.pl: Added support for new version and category
fields. Currently category does nothing, but it could split the
documentation into categories.

Sat Jan 29 01:37:45 2000 Loic Dachary <loic at ceic.com>

* .version: remove the trailing -dev.

Thu Jan 27 12:22:57 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.cc: cdebug replaced by cerr. Replace lverbose
by verbose > 2. Remove shutup.
(WordList): monitor = 0
(Open): create monitor only if wordlist_monitor = true
(Close): delete monitor if set, delete compressor if set

* htword/WordDBCompress.cc,WordList.cc: only activate monitoring code
if monitor is set. No interaction with the monitor is therefore possible
if wordlist_monitor is false.

* htword/WordMonitor.cc: remove useless test of wordlist_monitor (done by
WordList now).

* htword/WordDBCompress.cc (TestCompress): remove redundant debuglevel argument.

* htword/WordDBCompress.cc (WordDBCompress): init cmprInfo to 0.

* db/include/db_cxx.h: Add get_mp_cmpr_info method.

* htword/WordDBCompress.cc (WordDBCompress): set default debug level to 0.

* htword/WordDB.h: CmprInfo returns the current CmprInfo and is no longer
static; overloaded to set CmprInfo if an argument is given.

* htword/WordDBCompress.h: new CmprInfo() method returns DB_CMPR_INFO object
for Berkeley DB database.

* htword/WordList.h: add compressor member, kill cmprInfo member.

* htword/WordList.cc:

Wed Jan 26 20:05:33 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.cc,htword/WordList.h: get rid of obsolete WordBenchmarking.

Wed Jan 26 9:14:32 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

* htcommon/defaults.cc: added "max_connection_requests".

* htdig/Retriever.cc: now manages the attribute above.

Tue Jan 25 12:59:01 2000 Loic Dachary <loic at ceic.com>

* htsearch/Display.cc (setVariables): fixed the warning
Display.cc:505: warning: multiline `//' comment

Tue Jan 25 8:37:15 2000 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

* htdig/Document.h: Added the "HtHTTP *GetHTTPHandler()" method, in
order to be able to control an HTTP object outside the Document class.
This is useful for the Server class, after the request for robots.txt.
We can check the response of a server and see if it supports
persistent connections.

* htdig/Server.cc: inside the constructor, the persistent_connections var
is initialized to the configuration parameter value, instead of <true>.
Besides, after requesting robots.txt, it checks and sets
the attribute for persistent connections, depending on whether the
server supports them or not.

* htdig/Retriever.cc: modified the Start() method. Now the loop manages
HTTP persistent connections on a per-server basis. Indeed, it's the
Server object that decides whether persistent connections are allowed on
that server or not (depending on configuration or capabilities of
the remote http server).

Mon Jan 24 12:57:45 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc (setVariables): Added double quotes around
default selection value in build_select_lists handling.

Mon Jan 24 12:37:22 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc (setVariables), htcommon/defaults.cc: Added
build_select_lists attribute, to generate selector menus in forms.
Added relevant explanations and links to selectors documentation.
* htdoc/hts_selectors.html: Added this page to explain this new
feature, plus other details on select lists in general.
* htdoc/hts_templates.html: Added relevant links to related attributes
and selectors documentation.
* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl.

Fri Jan 21 18:57:58 EET 2000 Vadim Chekan <vadim at etc.lviv.ua>

* htcommon/HtConfiguration.cc: added HtConfiguration::ParseString(char*)
method to allow the lexer to handle the "include: ${var}/file.inc"
construction.

* htcommon/conf_lexer.lxx: fixed bug in handling "include: ${var}file.inc".

Fri Jan 21 17:04:28 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.cc (WalkFinish,WalkInit,WalkNextStep): fix typos in error messages
and a misleading comment.

* htword/WordList.h,WordList.cc: move part of WalkInit into WalkRewind so that
we have a function to go back to the beginning of possible matches.

Wed Jan 19 21:49:57 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HTML.cc (do_tag): Only add words for META descriptions,
keywords, and IMG ALT attributes if doindex is set.

* htcommon/DocumentRef.h: Added Reference_obsolete for documents
that should be removed (but haven't been yet).

* htdig/Retriever.cc (parse_url): Flag documents that have been
modified as Reference_obsolete and update the database. Flag all
documents with various errors as something other than
Reference_normal, as appropriate; these probably should be pruned.

* htdig/Retriever.h: Get rid of GetRef() method; it's only used once!

* htsearch/Display.cc (display): Don't show DocumentRefs with
states other than Reference_normal; these documents have various
errors.

* htmerge/docs.cc: If a document has a state of Reference_obsolete, ignore it.

* htcommon/HtWordList.h, htcommon/HtWordList.cc (Skip): Change
MarkGone() to Skip() to emphasize that this document should be ignored.

Wed Jan 19 14:11:51 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.cc (SkipUselessSequentialWalking): return OK if skipping,
NOTOK if not skipping.

* htword/WordReference.h: remove useless Clear in WordReference(key, record)
constructor.

* htword/WordList.h,WordList.cc: Split Walk into three separate functions:
WalkInit, WalkNext and WalkFinish. Much clearer. Fill the status field
of WordSearchDescription to give more information about the error condition.
Add found field to WordSearchDescription for the WalkNext result. Add
cursor_get_flags and searchKeyIsSamePrefix fields to WordSearchDescription
as internal state information.

* htword/WordList.h,WordList.cc: WalkInit creates and prepares the cursor,
WalkNext moves to the next match,
WalkNextStep moves to the next index entry, be it a match or not,
WalkFinish releases the cursor.

* htword/WordList.h: WordSearchDescription::ModifyKey added to jump
while walking.

* htword/WordList.cc (WalkNext): it is now legal to step without
collection or callback because search contains the last match (found
field), so stepping is therefore not useless.

Mon Jan 17 12:15:45 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/htdig-3.2.0.spec: added sample RPM spec file for 3.2.

Sat Jan 15 11:53:35 2000 Loic Dachary <loic at ceic.com>

* htdb/htstat.cc,htdb/htdump.cc: remove useless -S option since
the page size is found in the header of the file.

* htdb/htstat.cc,htdump.cc,htload.cc: only call WordContext::Initialize
if the -W flag is specified.

Fri Jan 14 18:39:12 2000 Marcel Bosc <bosc at ceic.com>

* htword/WordBitCompress.cc: speedup: VlengthCoder::code()
finds the appropriate coding interval much faster.

Fri Jan 14 11:30:41 2000 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Retriever.cc (IsValidURL): Fix problem with valid_extensions,
which got lost in the shuffle yesterday.

Fri Jan 14 15:56:49 2000 Loic Dachary <loic at ceic.com>

* htword/WordType.cc,WordRecord.cc,WordKeyInfo.cc (Initialize): change
inverted test on instance (== instead of !=).

* htword/WordRecord.cc (WordRecordInfo): change inverted test on compare.

Fri Jan 14 14:24:39 2000 Loic Dachary <loic at ceic.com>

* htdig/htdig.cc,htmerge/htmerge.cc,htsearch/htsearch.cc: Use Initialize(defaults)
to load configuration file if provided.

* htword/WordDBCompress.cc (Compress): initialize monitor to null in
constructor and check if null before usage. Core dumped in htdb/htload.

* htword/WordContext.h (class WordContext): Add
Initialize(const ConfigDefaults* config_defaults = 0)
that probes configuration files. Useful when htword is used as a standalone library.

Thu Jan 13 19:52:27 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc: Fix problem with valid_extensions when an
"extension" would include part of a directory path or server
name, as contributed by Warren Jones.

Thu Jan 13 19:22:25 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htnet/Makefile.am, htnet/Makefile.in: Add HtFile to the build process.

Thu Jan 13 18:58:03 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htnet/HtFile.h, htnet/HtFile.cc: New Transport classes
contributed by Alexis Mikhailov to allow file:// access.

* htdig/Document.h, htdig/Document.cc: Add logic to call HtFile
objects for URLs.

* htcommon/URL.cc: Don't remove a trailing index.html (removeIndex)
if the URL is a file:// URL.

Thu Jan 13 18:49:41 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* contrib/conv_doc.pl, contrib/parse_doc.pl: Replace "break" with
"last" for correct Perl syntax, plus additional cleanups and
simplifications, as contributed by Warren Jones.

Thu Jan 13 18:42:29 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* htword/WordType.h, htword/WordType.cc: Implementation of new
methods IsDigit() and IsCntrl() as contributed by Marc Pohl
<marc.pohl at wdr.de>. Fixes some problems with 8-bit characters.

Thu Jan 13 17:17:47 2000 Geoff Hutchison <ghutchis at wso.williams.edu>

* ChangeLog.0, configure, configure.in, htfuzzy/Endings.cc,
htlib/String.cc, htlib/Configuration.cc,
htlib/QuotedStringList.cc, htlib/regex.c, htcommon/defaults.cc,
htdig/ExternalParser.cc, htdig/Retriever.h, htsearch/Display.cc,
include/htconfig.h.in, installdir/htdig.conf: Merge in changes from
3.1.x releases.

* htdoc/: Merge in documentation changes from 3.1.x releases.

Thu Jan 13 20:12:42 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.cc (Walk): close the cursor before returning.
Otherwise the cursor might be closed after the database is
closed, leading to a double free of the cursor. Bad bug.

Thu Jan 13 13:23:17 2000 Loic Dachary <loic at ceic.com>

* htword/WordContext.h (class WordContext): simplified a lot. WordContext is
no longer a repository for pointers to class instances, only a place to call
Initialize for classes that have a single instance.

* htlib/HtWordType.cc: added to include definitions of function shortcuts for
WordType.

* htword/WordRecord.h,WordType.h,WordKeyInfo.h: implement a homogeneous scheme to
handle the unique instance of each class:
- constructor takes a const Configuration& argument and initializes the object
with config values
- static member instance
- static method Initialize initializes the static member instance
- static method Instance returns the pointer in the instance data member

* htword/WordRecord.cc: add constructor for WordRecordInfo, and Instance static
function. Add WORD_RECORD_INVALID to depict an uninitialized WordRecordInfo object.

* htword/WordKeyInfo.h: rename SetKeyDescriptionFromFile and SetKeyDescriptionFromString
to InitializeFromFile and InitializeFromString and implement them by calling Initialize.
Rename SetKeyDescriptionRandom to InitializeRandom.
Rename Initialize(String& line) to GetNFields(String& line).
Rename Initialize(int nfields) to Alloc(int nfields).

* htdig/htdig.cc,htmerge/htmerge.cc,htsearch/htsearch.cc,test/word.cc: replace
WordList::Initialize with WordContext::Initialize and run it immediately after
config is read. Otherwise WordType fails to work and configuration value
extraction will fail.

* htmerge/htmerge.cc: move initialization.

* test/conf/htdig.conf2.in: reorder so that it looks as much as possible like conf.in.

Thu Jan 13 12:33:46 2000 Loic Dachary <loic at ceic.com>

* htdb/htstat.cc,htdump.cc,htload.cc: set proper progname.

Wed Jan 12 20:02:26 2000 Loic Dachary <loic at ceic.com>

* htcommon/HtWordList.cc (Dump): Use Walk instead of Collect; otherwise it does not work.

Wed Jan 12 19:38:33 2000 Loic Dachary <loic at ceic.com>

* htlib/HtDateTime.h (class HtDateTime): killed void SetDateTime(const int t)
because they caused problems when time_t is an int and were useless anyway.

Wed Jan 12 13:31:45 2000 Loic Dachary <loic at ceic.com>

* htword/WordBitCompress.h: remove inline qualifier on check_tag1: it's not inline.

* htword/WordKey.h: #define WORD_KEY_UNKNOWN_POSITION to -1. Remove default
argument to SetToFollowing so that it's more explicit when used with
WORD_KEY_UNKNOWN_POSITION.

* htword/WordKey.cc: change name of variable info0 to info.

* htword/WordList.cc: use WordKey::Info instead of WordKeyInfo::Get, as done
in WordKey.cc, for consistency.

* htword/WordList.{cc,h},htword/WordDB.h: rename WordCursor to WordDBCursor
for consistency.

* htword/WordList.h: Kill the useless WordSearchDescription::Setup function.

* htword/WordList.h: WordSearchDescription constructors now have straightforward
semantics.

* htword/WordList.h: Rename Search to Collect since the latter already existed, just
with a different prototype.

Wed Jan 12 12:36:46 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.h (class WordSearchDescription): add cursor member.

Tue Jan 11 19:33:44 2000 Marcel Bosc <bosc at ceic.com>

* htlib/HtVectorGeneric,htword: Fixed some warnings found
when compiling under FreeBSD.

Tue Jan 11 18:22:58 2000 Marcel Bosc <bosc at ceic.com>

* htlib/HtVectorGeneric.h: inlined functions Add and Allocate, which
are critical to performance.

Tue Jan 11 12:18:47 2000 Marcel Bosc <bosc at ceic.com>

* htword/WordKey.h: fixed uninitialized memory read.

* htword/WordBitCompress.cc: Fixed big number bug.
Fixed memory leak.

Tue Jan 11 09:37:36 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.h: move operator << and operator >> to the end of the
function declarations instead of the data members.

* htword/WordList.h: added more comments on function behaviour.

* htword/WordList.h: added #if SWIG for Perl interface.

Mon Jan 10 17:55:05 2000 Marcel Bosc <bosc at ceic.com>

* htword/WordDBPage: enhanced compression debugging output.

Mon Jan 10 09:07:19 2000 Loic Dachary <loic at ceic.com>

* WordContext.h,WordKey.h,WordList.h: Added #if SWIG for Perl
interfaces. Remove InSortOrder, useless now that everything
is manipulated in sort order as far as the interface is concerned.

* WordKey.cc,WordList.cc: remove InSortOrder.

* WordKey.h,WordRecord.h,WordReference.h: commented out Set/Get for
ascii Set/Get for SWIG.

* WordKey.h: make CopyFrom public for those who don't want to
use operator =.

* WordKey.h: rename info -> Info and nfields -> NFields.

* WordKey.h: remove int IsFullyDefined() const, redundant with Filled.

Thu Jan 06 14:41:15 2000 Marcel Bosc <bosc at ceic.com>

* htword,all: Changed interface to overloaded Walk function that was
ambiguous on some compilers.

Thu Jan 06 14:00:01 2000 Loic Dachary <loic at ceic.com>

* htword/WordList.h (class WordSearchDescription): rename setup to Setup.

* htword/WordList.h (class WordBenchmarking): rename show to Show.

* htword/WordRecord.{h,cc}, htword/WordReference.h, htword/WordList.h:
add comments, reorganize member functions for clarity.

Thu Jan 06 12:01:47 2000 Marcel Bosc <bosc at ceic.com>

* htword/compression: Split WordDBCompress.* into WordDBCompress.* and
WordDBPage.*.

* htword/WordBitCompress: renamed put/get to put_uint/get_uint. Added get/put_uint_vl.

* htword/compression: slightly modified the compression; this makes old databases
OBSOLETE. Headers compress better. Changed flags compress better and faster.

* htword/WordKey: added operator [] and Get/Set accessors.

* htword: removed the obsolete --with_key configure option (KEYDESC).

* htword/WordMonitor: added monitor input.

Wed Jan 05 14:32:31 2000 Loic Dachary <loic at ceic.com>

* htword/WordKeyInfo.h (class WordKeyInfo): if(encode) was if(sort).

* htword/WordKeyInfo.h: rename show to Show and nprint to Nprint.

* htword/WordKeyInfo.h: move WORD_ISA from WordKey.h to WordKeyInfo.h,
rename WORD_ISA_String to WORD_ISA_STRING.

* htword/WordKey.h: rename FATAL_ABORT to WORD_FATAL_ABORT and errr to word_errr.

* htword/WordKey.h: move private functions to the bottom of the class, above
data members; rename show_packed to ShowPacked.

* htword/WordKey.cc: move WordKeyInfo::SetKeyDescriptionRandom from WordKey.cc
to WordKeyInfo.cc.

* htword/WordKeyInfo.cc: add include htconfig.h.

Wed Jan 05 13:26:16 2000 Loic Dachary <loic at ceic.com>

* htdig/ExternalParser.cc (parse): use nocase_compare instead of mystrcasecmp to
suppress warnings. Use a (char*)String cast for mystrncasecmp, which has no
equivalent in the String class.

* htdig/Retriever.cc (IsValidURL): remove warning with a (char*)url cast.

Wed Jan 05 11:54:19 2000 Loic Dachary <loic at ceic.com>

* htword/WordKey.h: kill obsolete comment and add suffix explanation at
the beginning of the file.

* htword/WordKey.h (class WordKey): rename copy_from and initialize to CopyFrom
and Initialize to fit naming conventions. Reorganize the methods to group them
in logical sets. Fix indenting. Comment each method.

* htword/WordKey.h (Clear): add kword.trunc().

* htword/WordKey.h: protect SetWord(const char *str,int len) because it opens
the door to all kinds of specific derivations. Should be
SetWord(String(foo, foo_length)) if not performance critical.

Wed Dec 29 18:41:14 1999 Marcel Bosc <bosc at ceic.com>

* htlib/HtMaxMin: added max/min of arrays, added comments to
HtMaxMin. Added HtMaxMin.cc. All these are used in htword.

* htlib/HtTime.h: added comments. Included portable time.h.

* htlib/HtVectorGeneric.cc: added HtVector_double, HtVector_String.

* htlib/HtVectorGeneric.h: inlined several methods, deactivated CheckBounds.

* htlib/StringMatch.cc: removed #include "WordType.h"; this made htlib dependent
on htword, which is not acceptable for a library.

* htlib/HtWordType.h: this replaces the macros used in StringMatch.cc.

* htlib/HtRandom.h: added tools for using random numbers
(currently used in tests).

* htword/WordBitCompress.cc: transferred max_v/min_v to htlib.

* htword/WordBitCompress.cc: optimized put/get for better performance.

* htword/WordMonitor: system for detailed monitoring of operation
and performance within htword.

* htword/WordDBCompress: fixed compression for the case of an empty WordRecord.

* htword/WordDBCompress: cleaned up some code, added some comments.

* htword/WordKeyInfo: split WordKey files into WordKey and WordKeyInfo files.

* htword/WordContext: centralized global configuration into one class.

* htword/WordKey: inserted randomized key/keydescription into WordKey classes
(this was previously used in several tests).

* htword/WordKey: optimized Compare, UnpackNumber for speed (these are
really speed critical).

* htword/WordRecord: is now configurable; type can be configured to "DATA" (htdig)
or "NONE" (for other uses).

* htword/WordType: changed macros to global functions to make it compatible
with cleanup in StringMatch. Integrated WordType into WordContext
configuration/Initialization.

* htword/WordKeyInfo: fixed initialization from key description file.

Tue Dec 28 18:58:21 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htlib/String.cc: String::lowercase(), String::uppercase()
	support for national characters added.

	* htfuzzy/Prefix.cc: method "prefix" works now.

Mon Dec 27 22:17:48 1999 Loic Dachary <loic at ceic.com>

	* htdig/htdig.cc (main): change '\r\n' to "\r\n"

	* Makefile.config,db/dist/Makefile.in: rename libdb to libhtdb to
	prevent conflicts with an installed libdb.

	* db/dist/Makefile.in: do not install documentation or binary
	utilities (db_dump et al.) since they are replaced by htdb binaries
	(htdump et al.).

	* db/dist/Makefile.in (prefix): prepend $(DESTDIR) to prefix
	to support make DESTDIR=/staging install for binary distribution
	package generation.

	* configure.in: use AC_FUNC_ALLOCA to check for alloca. Used
	in regex and test/dbbench.cc only, but definitely a useful
	feature to have.

Thu Dec 23 11:10:24 1999 Marcel Bosc <bosc at ceic.com>

	* htcommon/defaults.cc: set wordlist_cache_size default to 10Meg

	* db/mp: removed some debugging messages

	* htword/WordList.cc: added warning if no cache

	* test/word.cc: added cache

	* htlib/HtTime.h: added ifdefs for portable inclusion of time.h and sys/time.h

Tue Dec 21 23:33:06 1999 Loic Dachary <loic at ceic.com>

	* htdoc/attrs.html,cf_by*.html: regenerate to include
	wordlist_wordkey_description attribute

	* htcommon/Makefile.am: Add AM_LFLAGS = -L and AM_YFLAGS = -l to
	prevent #line generation, because it confuses the GCC dependency
	generator if configure is run outside the source tree.

	* configure.in: remove --with-key option. Not needed since the
	word description is now dynamic. Destroyed WordKey.h if
	specified.

	* htword/Makefile.am: remove commented lines for WordKey.h
	generation.

Tue Dec 21 18:18:01 1999 Marcel Bosc <bosc at ceic.com>

	* htword: added code for benchmarking

Mon Dec 20 17:59:15 1999 Marcel Bosc <bosc at ceic.com>

	* WordKey: Made the key structure dynamic: changing the
	key structure used to require recompiling the htword library.
	This should not change anything in htdig.

	* WordKey: numerical key fields are stored in an array of unsigned
	ints instead of compile-time defined pools.

	* WordKey.h: WordKey now needs copy operators. Setbits are stored
	in sort order (used to be in encoding order)

	* htword: word_key_info is now a pointer; had to change all references

	* word.cc: Rewrote wordkey test for the new dynamically
	set key structure. The test randomly creates key structures
	and tests them.

	* test: adapted test files (simplifies things a lot)

1999-12-21 Toivo Pedaste <toivo at ucs.uwa.edu.au>

	* htlib/Dictionary.cc: Fix memory leak when destroying dictionary

	* htlib/StringList.cc, htdig/Retriever.cc: Fix memory leak. Not
	the most elegant way, but I'm not sure about the exact semantics
	of StringList.

Mon Dec 20 21:59:03 1999 Loic Dachary <loic at ceic.com>

	* htdb/{Makefile.am,err.c,getlong.c}: Fix mistake: err.c and
	getlong.c contain C functions (declared in clib_ext) and
	must be compiled as C, otherwise the prototypes won't match. Following
	the db Makefiles, getlong.c and err.c are added to the list of objects
	for each utility program. This guarantees that they won't conflict
	with objects included in libdb.a.

Sun Dec 19 20:04:42 1999 Loic Dachary <loic at ceic.com>

	* htdb/{Makefile.am, err.cc}: add err.cc for portability
	purposes.

Fri Dec 17 18:04:09 1999 Loic Dachary <loic at ceic.com>

	* Makefile.config: add PROFILING variable and document it. Designed
	to make profiling htdig easy.

	* */Makefile.am: add *_LDFLAGS = $(PROFILING) for every binary to
	enable profiling, if specified.

Thu Dec 16 17:16:33 1999 Loic Dachary <loic at ceic.com>

	* htdb/*.cc: add -W option to activate htword-specific compression.
	Keep compatibility with zlib compression (-z only).

Thu Dec 16 11:56:02 1999 Loic Dachary <loic at ceic.com>

	* test/dbbench.cc: replace wrong strcpy with memcpy

Wed Dec 15 15:04:39 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/htdig.cc(main): Handle list of URLs given on stdin, if
	optional "-" argument given. (Uses >> operator below.)

	* htlib/htString.h, htlib/String.cc: Added Alexis Mikhailov's String
	input methods, readLine() and >> operator.

Wed Dec 15 13:59:34 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc: remove include of sys/stat.h, which is no
	longer needed after hack removed from Need2Get(), and could pose
	a problem on systems that need sys/types.h included first.

Wed Dec 15 17:00:04 1999 Loic Dachary <loic at ceic.com>

	* htword/WordDB.h: add inline keyword for portability

	* htword/WordDB.h: add CmprInfo method to get object describing
	compression scheme for Berkeley DB

	* htdb: Add htdump, htload, htstat: equivalents of db_dump,
	db_load and db_stat that know about the htword-specific compression
	strategy.

	* htword/WordDBCompress: add static to locally defined functions and
	variables, remove unnecessary #define and #include from header.

Tue Dec 14 21:56:57 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htcommon/conf_parser.lxx, htcommon/conf_lexer.cxx:
	bcopy on Solaris is in strings.h, not in string.h. Added
	#ifdef HAVE_STRINGS_H check.

Tue Dec 14 19:18:22 1999 Marcel Bosc <bosc at ceic.com>

	* WordBitCompress: code cleaned up and commented

Tue Dec 14 18:32:21 1999 Loic Dachary <loic at ceic.com>

	* htword/Word{Record,Reference,Key}: added a Get method to
	convert the structure into its ASCII string representation.
	operator << now uses Get.

Tue Dec 14 17:46:33 1999 Loic Dachary <loic at ceic.com>

	* db/dist/Makefile.in (install): fix bogus test for libshared

Tue Dec 14 14:10:28 1999 Loic Dachary <loic at ceic.com>

	* htword/{WordKey,WordReference,WordRecord}: rework
	the input methods (operator >>). Each class now has a Set function
	to initialize itself from an ASCII description and a Get function
	to retrieve an ASCII description of the object.

	* htword/WordList: operator >> has a better and cleaner input loop
	using StringList and String instead of char*.

Tue Dec 14 12:06:24 1999 Marcel Bosc <bosc at ceic.com>

	* WordDBCompress.cc : Added compression version checking

Mon Dec 13 21:09:31 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htcommon/conf_parser.lxx, htcommon/conf_lexer.cxx:
	Added #include <string.h>; without it, compilation failed
	on Solaris.

Mon Dec 13 16:31:27 1999 Marcel Bosc <bosc at ceic.com>

	* htword/WordBitCompress.cc : fixed bug that made compression
	fail on large documents or a large number of URLs.

Mon Dec 13 13:49:35 1999 Loic Dachary <loic at ceic.com>

	* htword/WordKey.h.tmpl: Added *_POSITION macro generation

Mon Dec 13 11:51:50 1999 Marcel Bosc <bosc at ceic.com>

	* htcommon/conf_parser.yxx: fixed several delete calls that should be delete []

Sun Dec 12 17:14:00 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htcommon/conf_lexer.lxx, htcommon/conf_lexer.cxx:
	national characters are allowed in the right-hand side of expressions
	(noted by Marcel Bosc).
	Changed the default behavior of flex from printing unknown chars
	on stdout to exiting with an error message.

Sat Dec 11 17:34:03 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htdig/Retriever.cc,htdig/htdig.cc: "exclude_urls", "bad_querystr",
	"bad_extensions", "valid_extensions", "local_default_doc"
	changed for the new config.

	* htdig/Server.cc: "server_max_docs", "server_wait_time" changed for
	the new config.

	* check for "limit_normalized" moved from Retriever::got_href and
	Retriever::got_redirect to the more appropriate Retriever::IsValidUrl

Fri Dec 10 18:05:48 1999 Marcel Bosc <bosc at ceic.com>

	* htword: checked for failed memory allocations in compression code

Fri Dec 10 18:03:42 1999 Marcel Bosc <bosc at ceic.com>

	* htword/WordList,htcommon/HtWordList.cc,htmerge/words.cc: cleaned up WordList::Walk()
	function; changed two occurrences of WordList::Walk in htdig files

Fri Dec 10 17:40:22 1999 Marcel Bosc <bosc at ceic.com>

	* htword/WordKey.cc (Compare): Fixed bug: Compare used to compare chars instead of
	unsigned chars, which failed when non-ASCII characters were used

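As an illustration of the Compare fix above, here is a minimal C++ sketch (not the actual htword code; `compare_bytes` is a hypothetical helper) of a byte-wise lexicographic comparison that casts each byte to unsigned char. With plain char, which is signed on many platforms, a byte such as 0xE9 compares as negative and sorts before 'a'; with the cast it sorts after all ASCII bytes.

```cpp
#include <cstddef>

// Hypothetical helper, not WordKey::Compare itself: compare two byte
// buffers lexicographically, treating each byte as unsigned so that
// non-ASCII bytes (>= 0x80) sort after ASCII bytes.
int compare_bytes(const char* a, std::size_t a_len,
                  const char* b, std::size_t b_len) {
    std::size_t n = a_len < b_len ? a_len : b_len;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned char ca = static_cast<unsigned char>(a[i]);
        unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    // Common prefix is equal: the shorter buffer sorts first.
    if (a_len == b_len) return 0;
    return a_len < b_len ? -1 : 1;
}
```

Whether plain char is signed is implementation-defined, which is why a bug like this can stay hidden until non-ASCII words are indexed.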
Fri Dec 10 11:54:36 1999 Marcel Bosc <bosc at ceic.com>

	* htcommon/defaults.cc : doc for wordlist_cache_size

Thu Dec 09 17:07:47 1999 Marcel Bosc <bosc at ceic.com>

	* htcommon/defaults.cc: added defaults for compression and DB configuration
	parameters

Thu Dec 09 16:47:54 1999 Loic Dachary <loic at ceic.com>

	* db/dist/configure.in,Makefile.in: Added shared lib support
	for Linux only. Not enabled on other systems.

Thu Dec 09 15:07:11 1999 Loic Dachary <loic at ceic.com>

	* acinclude.m4,db/dist/acinclude.mr: CHECK_ZLIB now fails if either
	zlib.h or libz is not found.

	* configure.in: do not test zlib.h

	* db/db/db.c,db/mp/mp_fopen.c: added #ifdef HAVE_ZLIB so that
	compilation works if zlib is not found

	* htlib/.cvsignore: remove wrong *.cxx

	* test/dbbench.cc: added #ifdef HAVE_ZLIB so that
	compilation works if zlib is not found

Thu Dec 09 13:25:45 1999 Marcel Bosc <bosc at ceic.com>

	* test/Word.cc,t_wordlist,Makefile.am: upgraded tests
	* htcommon/HtWordList.h: fixed Configuration/HtConfiguration problem

Thu Dec 09 12:10:32 1999 Marcel Bosc <bosc at ceic.com>

	* htword: Added the compression code:
	* WordDBCompress: Classes for page-specific compression code
	* WordBitCompress: Classes for bitstreams and non-specific compression

Thu Dec 9 12:09:51 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htcommon/HtConfiguration.cc: bug fix: sometimes
	HtConfiguration::Find(url,char*) returned empty values
	even if there was something to return.

Thu Dec 09 11:15:30 1999 Marcel Bosc <bosc at ceic.com>

	* htlib/Configuration.cc (Read): Read is now a virtual function: the old one
	is in Configuration, the new one (Vadim's ... with the parser) is in HtConfiguration

Thu Dec 09 11:01:22 1999 Loic Dachary <loic at ceic.com>

	* acinclude.m4: upgrade AC_PROG_APACHE macro for
	module detection.

	* test/conf/httpd.conf,test/test_functions.in,test/conf/Makefile:
	use @APACHE_MODULES@ to accommodate various Apache module directory
	flavors.

Tue Dec 07 20:32:34 1999 Marcel Bosc <bosc at ceic.com>

	* htdig: Split the Configuration class into Configuration
	and HtConfiguration. All of HtConfiguration and the
	configuration parsing (lex...) was moved to htcommon.
	Configuration was replaced by HtConfiguration as needed.

Tue Dec 07 16:21:13 1999 Loic Dachary <loic at ceic.com>

	* configure.in: added AM_PROG_LEX and AC_PROG_YACC

	* htlib/Makefile.am: simply set conf_lexer.lxx and conf_parser.yxx;
	automake knows how to handle these. The renaming is needed to avoid
	conflicts in automake-generated rules.

Mon Dec 6 16:23:39 CST 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/cf_generate.pl: added a bit of error checking for when it
	can't fetch the config info, and made it more flexible in what it
	allows as a terminator.
	* htcommon/defaults.cc: add default and description for authorization
	attribute, and clean up external_protocols entry for cf_generate.pl.
	* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl
	* htdig/htdig.cc(main): set authorization parameter before Retriever
	constructor is called, as it may initialize a Server. (Should complete
	fix of PR#490.)

Mon Dec 6 21:34:29 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htdig/Document.cc, htdig/htdig.cc: "authorization" parameter
	added to the config; it is compatible with the new config.
	The new code doesn't have the PR#490 bug (robots.txt not authenticated).

Mon Dec 06 11:58:56 1999 Marcel Bosc <bosc at ceic.com>

	* HtVectorGeneric.h: generic vectors, STL-free: this was originally a copy of
	HtVector.h with Object * replaced by GType and some small changes.
	It has been modified and checked to make sure it all works.
	You can build vectors of any type that has an empty constructor.
	* HtVectorGenericCode.h: generic vectors, STL-free: implementation
	(modified "copy" of HtVector.cc)
	* HtVectorGeneric.cc: generic vectors: implementation for common types
	* HtVector_int.h: generic vectors: declaration for the most common type
	(and an example of how to use it)

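The requirement above ("any type that has an empty constructor") follows from allocating the storage with new GType[n], which default-constructs every slot. The sketch below is hypothetical: it uses a C++ template instead of the GType substitution that HtVectorGeneric.h describes, and `SimpleVector`, `Add` and `Count` are illustrative names, not the htlib API.

```cpp
#include <cstddef>

// Hypothetical STL-free growable array. Like the entry describes, it works
// for any element type with a default ("empty") constructor, because
// new GType[n] default-constructs every slot; elements also need copy
// assignment, since Add() and Grow() assign into those slots.
template <typename GType>
class SimpleVector {
public:
    SimpleVector() : data_(new GType[4]), size_(0), capacity_(4) {}
    ~SimpleVector() { delete[] data_; }  // delete[] pairs with new[]
    SimpleVector(const SimpleVector&) = delete;
    SimpleVector& operator=(const SimpleVector&) = delete;

    void Add(const GType& value) {
        if (size_ == capacity_) Grow();
        data_[size_++] = value;
    }
    std::size_t Count() const { return size_; }
    GType& operator[](std::size_t i) { return data_[i]; }  // no bounds check

private:
    void Grow() {
        std::size_t new_capacity = capacity_ * 2;
        GType* bigger = new GType[new_capacity];  // default-constructs slots
        for (std::size_t i = 0; i < size_; ++i) bigger[i] = data_[i];
        delete[] data_;
        data_ = bigger;
        capacity_ = new_capacity;
    }

    GType* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Doubling the capacity keeps the amortized cost of Add() constant; the unchecked operator[] loosely mirrors the deactivated CheckBounds mentioned in the Dec 29 entry.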
Sat Dec 4 23:49:18 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htfuzzy/Synonym.cc (createDB): Change declaration to match
	Fuzzy::createDB(config), allowing the method to be called by
	htfuzzy.

	* htfuzzy/htfuzzy.cc (main): Add an error message if
	fuzzy->createDB() comes back with an error.

Sat Dec 4 15:38:34 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htnet/HtHTTP.cc, htnet/HtHTTP.h, htdig/Document.cc:
	fixed proxy bug. When using a proxy, the GET command in HtHTTP included
	only the path of the URL instead of the full URL.
	HtHTTP::UseProxy(int) added.

	* htdig/Document.cc: make "http_proxy" parameter
	URL-dependent for the new configuration.

Fri Dec 03 14:57:13 1999 Marcel Bosc <bosc at ceic.com>

	* BerkeleyDB: Compression code: added the possibility to use
	user-defined compression routines (the goal is to enable
	the mifluz-specific DB page compression, which obtains
	higher compression ratios than generic zlib compression).
	This involves the following changes in BerkeleyDB:
	* BerkeleyDB/CompressionEnvironment: Adding a structure db_cmpr_info
	in db_env that lets the db user specify the external compression
	routines and other information related to compression
	* BerkeleyDB/CompressionEnvironment: Adding a cmpr_context structure
	to DB_MPOOLFILE that stores information that compression needs
	(the _weacmpr DB and the db_cmpr_info)
	* BerkeleyDB/Compression: Needed to modify the compression
	system (which is implemented in the BerkeleyDB memory pool) to permit
	higher compression ratios and to use the compression environment

Thu Dec 2 16:47:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Retriever.cc(parse_url): Use a static int to avoid
	re-fetching local_urls_only from the config object.
	(Initial, got_href, got_redirect): Try to get the local filename
	for a server's robots.txt file and pass it along to the newly
	generated server.

	* htdig/Server.cc(ctor): Retrieve the robots.txt file from the
	filesystem when possible and respect the local_urls_only option.

	* htdig/Server.h: Change type of local_robots_file to String* to
	better match Retriever::GetLocal().

Thu Dec 02 16:24:27 1999 Loic Dachary <loic at ceic.com>

	* htword/WordReference.cc,WordKey.cc,WordRecord.cc (Print): Add function
	to ease printing from Perl.

Thu Dec 02 16:06:29 1999 Loic Dachary <loic at ceic.com>

	* htword/WordReference.h (WORD_FILLED): remove
	unused WORD_FILLED and WORD_PARTIAL macros

Wed Dec 01 19:18:42 1999 Loic Dachary <loic at ceic.com>

	* htword/WordKey.h.tmpl,WordRecord.h,WordReference.h,
	WordList.h: Added #ifndef SWIG for the sake of
	SWIG (www.swig.org).

Wed Dec 1 19:47:20 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtRegex.cc, htlib/HtRegex.h (set*): Add a case_sensitive
	flag which defaults to insensitive. This better mirrors the
	StringMatch class.

	* htcommon/URL.cc(signature): Make the signature a proper URL to
	the base of the server.

	* htdig/Server.h: Add IsDead() methods to query the status of the
	server, as well as an IsDisallowed() method to query whether a URL
	is forbidden by the robots.txt rules. Change _disallow to HtRegex.

	* htdig/Server.cc(ctor): Only retrieve the robots.txt file if this
	is an http or https server.
	(robotstxt): Use the proper HtRegex method for setting the pattern.
	(push): Remove logic checking the _disallow patterns. This is now
	done by the Retriever object.

	* htcommon/defaults.cc: Add new attribute "local_urls_only", which
	defaults to false and dictates whether retrieval should fall back
	to another method if RetrieveLocal() fails.

	* htdig/Retriever.cc(parse_url): Check to see if the server is
	dead before calling the Retrieve() method. Notify the server
	object if a connection fails. Also respect the new
	local_urls_only attribute described above.
	(IsValidURL): Check the server's IsDisallowed() method to see if
	the robots.txt file forbids this URL.

	* htdoc/THANKS.html: Updated to reflect current contributions, etc.

	* README: Update to mention version 3.2.0b1.

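The case_sensitive flag added to HtRegex above can be pictured with the POSIX regex API, where case-insensitive matching is requested with REG_ICASE at compile time. This is a minimal sketch under the assumption of a POSIX <regex.h>; `regex_matches` is a hypothetical helper, and whether HtRegex is implemented this way internally is not stated in the entry.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
// Like the new HtRegex flag, matching is case-insensitive by default.
// An invalid pattern is treated as "no match".
bool regex_matches(const std::string& pattern, const std::string& text,
                   bool case_sensitive = false) {
    int flags = REG_EXTENDED | REG_NOSUB;  // no capture groups needed
    if (!case_sensitive) flags |= REG_ICASE;
    regex_t re;
    if (regcomp(&re, pattern.c_str(), flags) != 0) return false;
    bool matched = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);  // release the compiled pattern
    return matched;
}
```

For a robots.txt Disallow check, a default of insensitive would make "/CGI-BIN/" and "/cgi-bin/" equivalent, which is convenient but not what every server intends; that trade-off is what the explicit flag exposes.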
Wed Dec 1 17:05:48 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc(GetLocal): Fix error in GetLocalUser() return
	value check, as suggested by Vadim.

Wed Dec 1 15:57:09 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* contrib/conv_doc.pl: Added a sample external converter script.

Mon Nov 29 23:19:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/Retriever.cc, htdig/Retriever.h, htdig/Server.cc,
	htdig/Server.h: forward-ported patches provided by Alexis Mikhailov
	<alexis at medinf.chuvashia.su> and Gilles for cleaning up
	IsLocal/GetLocal. Makes local digging persistent, even when the HTTP
	server is down.

Mon Nov 29 22:35:06 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* test/url.cc: New test for URL class.

	* test/url.parents: Base URLs for parsing.

	* test/url.children: Derived relative URLs for testing.

	* test/Makefile.am, test/Makefile.in: Add the above for building.

	* htcommon/URL.cc: A variety of bug fixes (some hacks), especially
	for file:// and user@host URLs.

Sun Nov 28 00:35:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* .version: Bump to 3.2.0b1-dev.

Sat Nov 27 20:23:14 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htdig/ExternalTransport.h, htdig/ExternalTransport.cc: New class
	to allow external scripts to handle transport methods.

	* contrib/handler.pl: Example handler using the program 'curl' to
	handle HTTP or HTTPS transactions.

	* htcommon/defaults.cc: Add new configuration option
	'external_protocols' as a list of protocols and scripts to handle
	them. Documentation still needs to be written.

	* htdig/Document.h, htdig/Document.cc(Retrieve): Call
	ExternalTransport::canHandle to establish which protocols are
	supported by handler scripts and then create an appropriate
	transport object.

	* Makefile.in, htdig/Makefile.am, htdig/Makefile.in: Add
	dependencies for ExternalTransport class.

	* htnet/HtHTTP.h, htnet/HtHTTP.cc, htnet/Transport.h,
	htnet/Transport.cc: Move _location field from HtHTTP_Response to
	Transport_Response to allow other subclasses to use it. Similarly,
	move NewDate and RecognizeDateFormat to Transport.

Fri Nov 26 17:07:52 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/HTML.cc(HTML & do_tag): add code to turn off indexing between
	<style> and </style> tags.

Fri Nov 26 15:56:47 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc(setVariables): added Alexis Mikhailov's fix
	to check the number of pages against maximum_pages at the right time.
	* htlib/String.cc(write): added Alexis Mikhailov's fix to bump up
	the pointer after writing a block.

Wed Nov 24 15:10:05 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* installdir/htdig.conf: Add bad_extensions to make it more obvious to
	users how to exclude certain document types.

Tue Nov 23 19:29:37 CST 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htnotify/htnotify.cc(send_notification): apply Jason Haar's fix
	to quote the sender name "ht://Dig Notification Service".

Tue Nov 23 19:46:00 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* conf.tab.cc.h, conf.l.cc, conf.tab.cc:
	Added files pre-generated from conf.y and conf.l

Sun Nov 21 18:26:21 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* htdig/Document.cc: "max_doc_size" supports the new
	configuration and is now URL-dependent.

Sun Nov 21 17:06:50 EET 1999 Vadim Chekan <vadim at etc.lviv.ua>

	* New config parser committed: htlib/(Makefile.am,Makefile.in),
	htlib/Configuration.cc, htlib/Configuration.h;
	htlib/(conf.y, conf.l) added.

Fri Nov 12 14:17:37 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/cgi.cc(init): Fix bug in reading long queries via POST
	method (PR#668).

Wed Nov 10 15:34:04 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc(setVariables & createURL),
	htsearch/htsearch.cc(main), htdoc/hts_templates.html: handle keywords
	input parameter like the others, and make it propagate to followups.

Wed Nov 10 15:16:57 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/Retriever.cc: Fix PR#688, where htdig goes into an infinite
	loop if an entry in local_urls (or local_user_urls) is missing a '='
	(or a ',').

	* htcommon/defaults.cc: removed vestigial references to MAX_MATCHES
	template variables in search_results_{header,footer}.
	* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl

	* htdoc/hts_form.html: add disclaimer about keywords parameter not
	being limited to meta keywords.

	* htdoc/meta.html: add description of "keywords" meta tag property.
	Add links to keywords_factor & meta_description_factor attributes.

1999-11-10 Toivo Pedaste <toivo at ucs.uwa.edu.au>

	* htdig/Retriever.cc : Ignore SIGPIPEs with persistent connections

	* htnet/HtHTTP.cc : Fix buffer overrun reading chunks

	* htdig/Document.cc : Make redirects work

	* htdig/Retriever.cc : Make valid URL checks apply to initial URLs,
	particularly those from a previous run

	* htlib/Dictionary.cc : Fix memory deallocation error

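Ignoring SIGPIPE, as in the first Retriever.cc entry above, usually comes down to a single call. This is a minimal POSIX sketch (`ignore_sigpipe` is a hypothetical helper, not the htdig code): once the signal is ignored, writing to a persistent connection that the server has already closed makes write() fail with EPIPE instead of terminating the process.

```cpp
#include <csignal>

// Hypothetical helper: ignore SIGPIPE for the whole process.
// POSIX-only, since SIGPIPE is not defined by standard C++.
// Returns true if the disposition was changed successfully.
bool ignore_sigpipe() {
    return std::signal(SIGPIPE, SIG_IGN) != SIG_ERR;
}
```

After this, the caller is responsible for checking the return value of write() or send() and treating EPIPE as "connection closed, reconnect".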
Tue Nov 02 13:44:57 1999 Marcel Bosc <bosc at ceic.com>

	* htsearch/Display.cc (setVariables): added missing parentheses around the ternary
	operator, which has lower precedence than <<.

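The precedence problem fixed in setVariables can be shown with a small hypothetical sketch (`page_label` is an illustrative function, not htsearch code): because << binds more tightly than ?:, the conditional must be parenthesized for its result to reach the stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical example. Without parentheses,
//     out << many ? "pages" : "page";
// parses as
//     (out << many) ? "pages" : "page";
// because << binds more tightly than ?:, so the stream receives the value
// of `many` and the selected string is silently discarded.
std::string page_label(int n) {
    std::ostringstream out;
    out << n << ' ' << (n == 1 ? "page" : "pages");  // parentheses required
    return out.str();
}
```

Compilers typically accept the unparenthesized form, often with only a warning, which is why this kind of bug can survive until someone reads the output closely.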
Tue Nov 02 13:33:50 1999 Marcel Bosc <bosc at ceic.com>

	* htsearch/Display.cc (hilight): changed static char * (!!) to const string.
	The static char * was evaluated before the configuration was loaded, so the
	config had no effect; it also needed an unnecessary conversion.

Tue Nov 02 11:45:49 1999 Marcel Bosc <bosc at ceic.com>

	* htword/WordKey.cc : Cleaned up obsolete code; now using *InSortOrder functions
	and WordKeyInfo.sort[]
	* htword/WordKey : Added FirstSkipField:
	finds the first field that must be checked for skip
	* htword/WordKey (PrefixOnly): now returns OK/NOTOK; fixed bug which
	made Walk loop over the whole db if the search key only had
	the "word" field defined
	* htword/WordKey.cc (Unpack): had forgotten to call SetDefinedWordSuffix
	* htword/WordKey.cc (operator >>): added check for very, very long words
	(even if this should never happen)
	* htword/WordKey.cc (operators << >>): added <UNDEF> word suffix handling
	* htword/WordKey.h : Filled() did not check for WordSuffix
	* htword/WordKey.h : added WordKey::ExactEqual
	* htword/WordKey.h (IsDefinedWordSuffix): fixed bad flag check
	* htword/WordList : Removed all obsolete HTDIG_WORDLIST flags: only
	two remain, COLLECTOR and WALKER; the rest is now specified by the searchKey.
	Removed action arg from WordList::Collect()
	* htcommon/HtWordList.cc,htmerge/words.cc : changed flags in calls to WordList::Walk
	* htword/WordList.cc : skip now deals with the SuffixUndefined case

Fri Oct 29 17:13:21 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/cf_generate.pl: now updates last modified date in attrs.html
	* htdoc/attrs.html: reran cf_generate.pl

Fri Oct 29 15:28:22 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htsearch/Display.cc(setVariables & hilight): added Sergey's idea
	for start_highlight, end_highlight & page_number_separator attributes.
	* htcommon/defaults.cc: added & documented these.
	* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl

Thu Oct 28 13:06:23 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdig/ExternalParser.cc: added support for external converters
	as an extension to the external_parsers attribute.
	* htcommon/defaults.cc: Updated external_parsers with new description
	and examples of external converters.

Thu Oct 28 12:52:28 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: Updated programs lists for *_factor, so they
	all refer to htsearch and not htdig. Added htsearch to programs lists
	for translate_*. img_alt_factor & url_factor not defined yet because
	they're still not used in htdig/htsearch.

Wed Oct 27 15:53:36 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: added descriptions & examples for
	doc_excerpt, heading_factor, max_descriptions, minimum_speling_length,
	regex_max_words, use_doc_date, valid_extensions. Added references
	to these elsewhere in the document as appropriate. Removed -pairs option
	from pdf_parser default (again). Minor changes to noindex_start & end,
	and changed example for modification_time_is_now. Corrected references
	to heading_factor_[1-6].
	* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl

Wed Oct 27 13:32:50 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htdoc/cf_generate.pl: changed formatting of output to more closely
	match format of old attrs.html (to make diff'ing easier),
	and fixed handling of pdf_parser default to strip quotes.
	* htcommon/defaults.cc: oops, fixed typo in url_part_aliases example.
	* htdoc/attrs.html, cf_by{name,prog}.html: reran cf_generate.pl

Wed Oct 27 18:24:36 1999 Loic Dachary <loic at ceic.com>

	* htdoc/cf_generate.pl: fixed wrong target for cf_byprog, escape
	HTML chars <>&'" for default values.

Wed Oct 27 10:21:18 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: restored 2nd example for url_part_aliases

Tue Oct 26 16:28:29 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

	* htcommon/defaults.cc: corrected descriptions for allow_in_form,
	search_results_header, noindex_start, noindex_end. Also fixed a
	few small typos & formatting errors here & there in descriptions
	and examples.

Tue Oct 26 16:01:22 1999 Loic Dachary <loic at ceic.com>

	* htword/Makefile.am: rm WordKey.h instead of chmod, to cope with a
	nonexistent WordKey.h

Tue Oct 26 10:54:52 1999 Loic Dachary <loic at ceic.com>

	* htcommon/defaults.cc: fixed all inconsistencies reported by Gilles.

Mon Oct 25 11:42:13 1999 Marcel Bosc <bosc at ceic.com>

	* htword/ word.cc,t_wordskip,skip_db.txt: Added test for *Skip Speedup*
	* htword/ WordList: Added tracing of Walk() for debugging purposes

Fri Oct 22 18:22:00 1999 Marcel Bosc <bosc at ceic.com>

	* htword/ WordList.cc,WordKey: Added a defined/undefined flag saying
	whether or not a search key's word is a prefix: WORD_KEY_WORDSUFFIX_DEFINED.
	This reduces code size and makes it much easier to understand.
	* htword/ WordList,WordReference,WordKey: Added input/output streams for
	WordList,WordReference,WordKey

Wed Oct 20 16:47:52 1999 Marcel Bosc <bosc at ceic.com>

	* htword/ WordKey,Makefile.am,WordCaseIsAStatements.h: for readability,
	replaced the switch ... #ifdef ..STATEMENT().... sequence that appeared many times
	with an include file: WordCaseIsAStatements.h

	* htword/ WordKey: WordKeyInfo: duplicated all of the fields structure into the
	sort structure, for fast access without cross-referencing and to simplify code
	(required a change to the perl in template WordKey.h.tmpl)

	* htword/ WordList: *Skip Speedup* added a speedup to avoid wasting time
	by sequentially walking through useless entries. See function
	SkipUselessSequentialWalking() for an example and more info.

	* htword/ WordKey.h,WordKey.cc: Changed the names of the Set,Unset,IsSet WordKey accessors to
	SetDefined,Undefined,IsDefined (easier to read and avoids naming conflicts).

	* htword/ WordKey: added generic numerical accessors for accessing
	numerical fields in WordKey (in sorted order): GetInSortOrder, SetInSortOrder

	* htword/ WordKey,word_builder.pl: added a MAX_NFIELDS constant that specifies
	the maximum number of fields a WordKey can have. Sanity check in word_builder.pl.

	* htword/ word_builder.pl: enforced ascending word sort order

	* htword/ WordList: added a verbose flag using config."wordlist_verbose"

Tue Oct 19 18:36:42 1999 Loic Dachary <loic at ceic.com>

	* htword/WordType.h: const accessors to wtype and config

Tue Oct 19 13:10:47 1999 Loic Dachary <loic at ceic.com>

	* acconfig.h: remove unnecessary VERSION (redundant)

Tue Oct 19 11:32:38 1999 Loic Dachary <loic at ceic.com>

	* db/Makefile.in,db/dist/Makefile.in: install db library so
	that external applications can be linked.

Tue Oct 19 10:57:27 1999 Loic Dachary <loic at ceic.com>

	* configure.in: add --with-key to specify an alternative to htword/word.desc

	* configure.in: htword is processed before htcommon to prevent
	unnecessary recompilation caused by WordKey.h changes.

	* htword/Makefile.am: use @KEYDESC@

Tue Oct 19 10:38:41 1999 Loic Dachary <loic at ceic.com>

	* test/word.cc: use TypeA instead of DocID and the like

Mon Oct 18 17:21:34 1999 Loic Dachary <loic at ceic.com>

	* Makefile.config: AUTOMAKE_OPTIONS = foreign

Mon Oct 18 11:40:17 1999 Marcel Bosc <bosc at ceic.com>

	* htword/ WordList.cc (Walk): fixed bug in Walk: if flag HTDIG_WORDLIST
	was set, data was uninitialized in the loop.

Fri Oct 15 18:52:03 1999 Marcel Bosc <bosc at ceic.com>

	* htdig/Document.h (class Document): added const to:
	Transport::DocStatus RetrieveLocal(HtDateTime date, const String filename);

Fri Oct 15 17:46:23 1999 Loic Dachary <loic at ceic.com>

	* acinclude.m4,configure.in: modified AC_APACHE_PROG to detect
	the version number and check it.

	* test/conf/*.in: patch to work with or without module loading, and
	to accommodate various installation configurations.

	* test/test_functions.in: More portable call to apache.

Fri Oct 15 12:55:47 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* htdig/Document: added handling of the 'persistent_connections',
	'head_before_get' and 'max_retries' configuration attributes.

Fri Oct 15 12:54:11 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* test/testnet.cc: added the option '-m' for setting the max size
	of the document.

Fri Oct 15 12:48:49 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* htdig/Server: added a flag for persistent connections.
	It's set to true if the Server allows persistent connections.
	It should be used when retrieving a document.

Fri Oct 15 12:45:42 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* defaults.cc: added the configuration attributes 'persistent_connections',
	'max_retries' and 'head_before_get'. Their default values are
	true, 3 and false respectively.

Fri Oct 15 12:35:51 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP.cc: handle incomplete stream reads with persistent
	connections (this occurs when max_doc_size is lower than the real
	content length of the document, or when a document is not parsable
	and we requested it with a GET call).

	* Transport: the _host variable is treated as a String, as Loic suggested.

Fri Oct 15 12:11:23 1999 Marcel Bosc <bosc at ceic.com>

	* Added README to htword

Thu Oct 14 11:29:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/mktime.c, htlib/regex.c, htlib/regex.h, htlib/strptime.c:
	Updated with latest glibc versions. Merging from glibc sources may
	have introduced bugs, so this is the last merge before htdig-3.2.0b1.

Thu Oct 14 13:09:32 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* htnet/Transport: added statistics on connections opened and closed
	and on server changes.
	Fixed a bug in the SetConnection method, regarding the host comparison.
	Added a method for showing the statistics on a given channel.

	* htnet/HtHTTP: More debug info available.
	Added a method for showing the statistics on a given channel.

	* test/testnet.cc: updated for the changes above.

Wed Oct 13 13:35:42 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* htdig/Document.h: added an HtHTTP pointer to the class.

	* htdig/Document.cc: Transport and HtHTTP initialization now happens
	inside the Document constructor. The class destructor now calls
	only the HtHTTP destructor (not the Transport destructor).
	Modified the Retrieve method.

	* htdig/Server.h: _last_connection is now an HtDateTime object.

	* htdig/Server.cc: modified the constructor and the delay method.

	* htdig/Retriever.cc: modified the parse_url function in order to handle
	all the Document status messages coming from the Transport class.
	Also modified the method for not-found URLs to handle the no_port
	status.

Tue Oct 12 10:12:10 1999 Loic Dachary <loic at ceic.com>

	* install headers and libraries so that htdig libraries may be used by
	external programs

	* htword/WordList.cc,WordType.cc: add comments about config parameters used.

Fri Oct 8 09:35:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htlib/HtDateTime.cc (SetFTime): Change buffer argument to const
	char* to prevent problems passing in const buffers.

	* htnet/HtHTTP.h: Change SetUserAgent to take a const char* to
	prevent problems passing in const parameters.

	* htdig/Document.h, htdig/Document.cc(): Use Transport class for
	obtaining documents. Remove duplicated declarations
	(e.g. DocStatus).

	* htdig/Retriever.cc: Adapt switch statements from
	Document::DocStatus to Transport::DocStatus.

	* htdig/Server.cc: Use Document::Retrieve instead of RetrieveHTTP.

Fri Oct 08 16:35:16 1999 Loic Dachary <loic at ceic.com>

	* test/t_htnet: succeed if a timeout occurs. It did the opposite before.

	* configure.in: AC_MSG_CHECKING(how to call getpeername?) add missing
	comma at end of header spec block.

Fri Oct 08 14:42:47 1999 Loic Dachary <loic at ceic.com>

	* Fix all warnings reported by gcc-2.95.1 related to string
	casts to char*.

Fri Oct 08 14:04:21 1999 Loic Dachary <loic at yoda.ceic.com>

	* htlib/Configuration,ParsedString,Dictionary: change char* to String
	where possible.

	* Fix a lot of warnings reported by gcc-2.95.1 related to string
	casts to char*.

	* Completely disable exception code from db.

Fri Oct 08 13:44:32 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP.cc: fixed a little bug in setting the modification time
	when it is not returned by the server.

Fri Oct 08 11:30:53 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP.cc: better handling of connection failure return values.
	* Transport.h: added Document_no_connection and
	Document_connection_no_port enum values.
	* testnet.cc: updated for the above changes.

Fri Oct 08 11:27:31 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* configure.in: modified getpeername() test.

Fri Oct 08 10:28:15 1999 Loic Dachary <loic at ceic.com>

	* htdig/Retriever.cc (IsValidURL): test return value of
	ext = strrchr(url, '.');

	* htword/WordRecord.h: initialize info member to 0 in constructor and
	Clear.

	* htlib/Configuration: char* -> String in all functions. Resolve
	warnings.

Thu Oct 07 16:19:46 1999 Loic Dachary <loic at ceic.com>

	* htnet/HtHTTP.cc (ReadChunkedBody): use append instead of
	<< because buffer is *not* null terminated.

	* htnet/Transport.cc (Transport): initialize _port and _max_document_size,
	otherwise they are compared while still undefined.

Thu Oct 07 16:34:21 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP.cc: call FinishRequest every time HTTPRequest() returns a
	value.
	* testnet.cc: improved with more statistics and connection timeout
	control.

Thu Oct 07 12:53:12 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* configure.in: modified getpeername() test function to use
	AC_LANG_CPLUSPLUS instead of AC_LANG_C.

Thu Oct 07 11:56:52 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP.cc: fixed bug of double deleting the _access_time
	and _modification_time objects in ~HtHTTP().

Thu Oct 07 10:17:22 1999 Loic Dachary <loic at ceic.com>

	* htword/WordRecord.h: change (const char*) cast to (char*)

	* htword/WordKey.h.tmp: fix constness of accessors; the const accessor
	returns a const ref. Prevents unnecessary copies.

Wed Oct 6 23:31:50 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* htnet/Connection.h, htnet/Connection.cc: Merge in io
	class. Connection class was the only subclass of io.

	* Makefile.in, htlib/Makefile.am, htlib/Makefile.in: Update for
	removed io class.

	* htdig/ExternalParser.cc: Add more verbose flags for errors.

Wed Oct 06 14:56:34 1999 Loic Dachary <loic at ceic.com>

	* htnet/Connection.cc (assign_server): use free, not delete,
	on strdup-allocated memory.

	* htcommon/URL.cc (URL): set _port to 0 in constructors.

Wed Oct 06 12:08:38 1999 Loic Dachary <loic at ceic.com>

	* Move htlib/HtSGMLCodec.* to htcommon to prevent
	cross-dependencies between htlib and htcommon

Wed Oct 06 12:07:32 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP.cc: patch from Michal Hirohama regarding
	the SetBodyReadingController() method

Wed Oct 06 11:49:15 1999 Loic Dachary <loic at ceic.com>

	* Move htlib/HtZlibCodec.* htlib/cgi.* to htcommon to prevent
	cross-dependencies between htlib and htcommon

Wed Oct 06 11:40:48 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* HtHTTP: stores the server info correctly; removed some debug info
	in chunk handling

Wed Oct 06 11:39:12 1999 Loic Dachary <loic at ceic.com>

	* Move htlib/*URL* to htcommon

Wed Oct 06 10:09:19 1999 Loic Dachary <loic at ceic.com>

	* README: add htword

	* test/t_htnet: fix variable setting problem & return code problem

Wed Oct 06 08:53:52 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* Wrote t_htnet test

Tue Oct 5 12:24:43 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

	* db/*: Import of Sleepycat's Berkeley DB 2.7.7.

	* db/db/db.c, db/include/db.h, db/include/db_cxx.h, db/mp/mp_bh.c:
	Resolve conflicts created in merge.

Tue Oct 05 18:53:13 1999 Loic Dachary <loic at ceic.com>

	* htdig/Display.cc, htword/*.cc: add inclusion of htconfig.h

Tue Oct 05 14:54:17 1999 Loic Dachary <loic at ceic.com>

	* htlib/htString.h (class String): add set(char*)

	* htword/WordKey.cc: define typedefs for key components. Leads to more
	regular code and no dependency on a predefined set of known types.
	All types must still be castable to unsigned int.
	Assume a Word field of type String always exists.
	Generic Get/Set/Unset methods made simpler. Added const and ref
	for Get in both forms.

	* htword/WordList.cc: enable word reference counting only if the
	wordlist_extend configuration parameter is set. This parameter is
	hidden because no code uses per-word statistics at present. It is
	only activated in the test directory.

	* htword/word_list.pl: add mapping to symbolic type names;
	enforce and check that there is exactly one String field named Word.

Mon Oct 04 20:05:35 1999 Loic Dachary <loic at ceic.com>

	* test: add what is needed to make the tests work when running
	./configure outside the source directory.

	* htword/WordList: Add Ref and Unref to update statistics.
	Fix walking so that it starts after the statistics entries. All
	statistics words start with \001, so they are at the beginning of
	the file and all clustered together.

	* htword/WordStat: derived from WordReference to implement
	unique word statistics.

	* test/word.cc: test statistics updating.

	* htword/WordKey.cc: fix buggy compare (it returned the length
	difference if the keys had different lengths).

Mon Oct 04 18:43:56 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* test/testnet.cc: added the option for HEAD before GET control

Mon Oct 04 17:33:24 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* htnet/Transport.h .cc: added the FlushConnection() method

	* htnet/HtHTTP.h .cc: the Request() method can now send a HEAD
	request before a GET request. This is done by default, and
	can be changed with the Enable/DisableHeadBeforeGet() methods.
	A configuration option could be added to control it.

Mon Oct 04 12:43:41 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* htlib/io.h .cc: added a flush() method.

	* htnet/HtHTTP.cc: handle the chunk correctly, by calling the flush()
	method after reading it.

Mon Oct 04 12:02:24 1999 Loic Dachary <loic at ceic.com>

	* htlib/htString.h: move null outside inline operator [] functions.

Fri Oct 01 14:55:56 1999 Loic Dachary <loic at ceic.com>

	* htword/WordRecord: mutable, can also contain unique word statistics.

	* htword/WordReference: remove all dependencies related to the actual
	structure of the key.

	* htcommon/HtWordReference: derived from WordReference, explicit
	accessors.

	* htcommon/HtWordList: derived from WordList, only handles the
	word cache (Flush, MarkGone).

	* htdig/HTML.cc (do_tag): add wordindex so that location is set in
	tags

	* htcommon/DocumentRef.cc (AddDescription): add Location calculation

	* htword/WordList.cc: add dberror to map Berkeley DB error codes

	* htsearch/Display.cc (display): initialize good_sort to get rid
	of a strange warning.

Fri Oct 01 09:02:11 1999 Loic Dachary <loic at ceic.com>

	* Makefile.config: duplicate library lines to resolve
	interdependencies.

Thu Sep 30 17:56:55 1999 Loic Dachary <loic at ceic.com>

	* htmerge/words.cc (delete_word): Upgrade to use WordCursor.

	* htword/WordList: Walk now uses a local WordCursor. Many concurrent
	Walks can happen at the same time.

	* htword/WordList: Walk callback now takes the current WordCursor.
	Added a Delete method that takes the WordCursor. Allows deleting
	the current record while walking.

	* db/include/db_cxx.h (DB_ENV): add int return type to operator =

	* db/dist/configure.in (CXXFLAGS): disable adding obsolete
	g++ option.

	* configure.in: enable C++ support when configuring Berkeley DB

	* htword: create. Move Word* from htcommon. Move HtWordType
	from htlib and rename it WordType.

	* htword/WordList: use db_cxx interface instead of Database.
	Less interface overhead. Get access to the full capabilities of
	Berkeley DB. Much more error checking done.
	Create WordCursor private class to use String instead of Dbt.

Wed Sep 29 20:03:31 1999 Loic Dachary <loic at yoda.ceic.com>

	* htlib/lib.h: AIX xlC is confused by overloaded mystrcasestr
	that only differ in constness. Only keep the const form and use casts
	where appropriate. *sigh*

	* htlib/htString.h: accommodate new form of Object::compare and
	Copy. Explicitly convert compare arg to String&, to prevent hiding
	and therefore missing the underlying compare function.

	* htlib/HtVector.cc (Copy): make it const

	* htlib/HtHeap.cc: accommodate new form of Object::compare

	* htcommon/List.h,cc: Add ListCursor to allow many pointers that
	walk the list to exist in the same program.

	* htlib/Object.h (class Object): kill unused Serialize + Deserialize.
	Change unused Copy to const and bark on stderr if called, because it
	is clearly not what is wanted. If Copy is called and the derived class
	does not implement Copy, we are in trouble. The alternatives are to
	make it pure virtual, which would break things all over the code, or
	to abort, which would be considered too violent. Change compare to
	take a const reference and be const.

Wed Sep 29 16:51:58 1999 Loic Dachary <loic at yoda.ceic.com>

	* acinclude.m4,configure.in,Makefile.config: remove -Wall from
	Makefile.config, add the AC_COMPILE_WARNINGS macro in acinclude.m4
	and use it in configure.in.

	* htdoc/default_check.pl: remove, unused

Wed Sep 29 13:07:58 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* htnet/Transport: fixed some bugs in construction and destruction

	* htnet/HtHTTP: the most important addition is the decoding of chunked
	encoded responses, as described in RFC2616 (HTTP/1.1). It needs
	further work, because it times out at the end of the request.
	Added a function pointer in order to dynamically select the function
	that reads the body of a response (for now, normal and chunked, but
	other encodings exist, so ...). Fixed some bugs in construction
	and added some features like Server and Transfer-encoding headers.

Wed Sep 29 13:54:59 1999 Loic Dachary <loic at yoda.ceic.com>

	* fix all inline method declarations so that they are always declared
	inline in the class declaration if an inline definition follows.

	* acinclude.m4: also search for apache in /usr/local/apache/bin by default.

	* fix various gcc-2.95 warnings; now compiles without warnings,
	even with -Wall.

	* htlib/htString.h: removed commented-out inline get

	* test/testnet.cc: add includes for optarg

Tue Sep 28 18:56:36 1999 Loic Dachary <loic at ceic.com>

	* Makefile.config (HTLIBS): libhtnet at the beginning of the list. It
	matters on Solaris-2.6 for instance.

	* test/testnet.cc: change times to timesvar to avoid conflict with
	the function (was a warning only on Solaris-2.6).

	* htdig,htsearch,htmerge,test/word are purify clean when running
	make check.

Tue Sep 28 18:23:49 1999 Loic Dachary <loic at ceic.com>

	* htmerge/words.cc (mergeWords): use WordList::Walk to avoid loading ALL
	the words into memory.

	* htlib/DB2_db.cc (Open): we don't want duplicates; allowing them was
	a big mistake. If DUP is on, every put for an update inserts a new entry.

	* htcommon/WordList.cc (Delete): split Delete (straight Delete and
	WalkDelete) to avoid accessing dbf from outside WordList.

	* htcommon/WordList.cc (Walk): now promoted to public.

Tue Sep 28 16:34:56 1999 Loic Dachary <loic at ceic.com>

	* test/word.cc (dolist): Add regression tests for Delete.

	* htcommon/WordList.cc (Delete): Reimplement from scratch. Use Walk
	to find records to delete. This allows deleting all occurrences
	of a word, all words in a document (slow), all occurrences of a
	word in a document, etc.

	* htcommon/WordList.cc (Walk): extend so that it handles walks for
	partially specified keys; remains fully backward compatible. It allows
	extracting all the words in a specific document (slow), all occurrences
	of a word in a specific document, etc.

Tue Sep 28 12:56:12 1999 Loic Dachary <loic at ceic.com>

	* htcommon/DocumentDB.cc (Open): report errors on stderr

	* htmerge/docs.cc (convertDocs): rely on error reporting from DocumentDB
	instead of implementing a custom one.

Tue Sep 28 11:36:28 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* htnet/Transport.h: added the status code and the reason phrase

	* htnet/HtHTTP.cc .h: removed the attributes above.
	Read the body of a response if the code is 2xx. Issues the
	GetLocation() method.

Tue Sep 28 10:32:47 1999 Loic Dachary <loic at ceic.com>

	* test/htdocs/set3: create and populate with CGI scripts that have
	bad behaviour (time out, slow connection).

Tue Sep 28 10:20:23 1999 Loic Dachary <loic at ceic.com>

	* test/htdocs: move html files into set1/set2 subdirectories to allow
	tests that use different sets of files. Change htdig.conf accordingly.

Tue Sep 28 09:31:12 1999 Loic Dachary <loic at ceic.com>

	* test/Makefile.am: comment test options, add LONG_TEST='y' for lengthy
	tests; quick tests are run by default.

	* installdir/bad_words: removed "it", "an" and "of": since the minimum
	word length is 3 by default, these words are ignored anyway.

Mon Sep 27 20:37:38 1999 Loic Dachary <loic at ceic.com>

	* htlib/HtWordType.h,cc: concentrate knowledge about word definition in this
	class. Rename the class WordType (think WordReference etc...). Change
	Initialize to use an external default object. A WordType object may be
	allocated on its own. Move functionality from BadWordFile, Replace and
	IsValid of WordList, and concentrate it in the WordType::Normalize
	function.

	* htcommon/WordList: use the new WordList semantics. WordType is now a
	member of WordList, making it possible to have many WordList objects
	with different configurations within the same program, since the
	constructor takes a Configuration object as argument.

	* htsearch/htsearch.cc (setupWords): Use HtNormalize to find out if a word
	should be ignored in the query (formerly used IsValid).

	* htlib/String.cc (operator []): fix big mistake: operator [] was actually last()!

	* htlib/String.cc (uppercase, lowercase): return the number of converted chars.

	* htlib/String.cc (remove): return the number of chars removed.

Mon Sep 27 17:43:23 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* Created testnet.cc in the test dir for trying out the htnet library.
	It's a simple program that retrieves a URL.

	* htnet/HtHTTP.cc, .h: added an 'int (*) (char *)' function pointer.
	This attribute is static and is used by the isParsable method
	to determine whether a document is parsable. It must be set
	outside this class by using the SetParsingController static method.
	The classic use is to set it to 'ExternalParser::canParse'.

Mon Sep 27 10:52:51 1999 Loic Dachary <loic at ceic.com>

	* htmerge/db.cc (mergeDB): delete words instead of words->Destroy()
	because the words object itself was not freed.

Mon Sep 27 10:38:37 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* Created 'htnet' library

Mon Sep 27 12:39:24 1999 Loic Dachary <loic at ceic.com>

	* test/word.cc (dolist): don't deal with upper case at present, and prevent a warning.

Mon Sep 27 10:38:37 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

	* htlib/String.cc: removed compiler warnings

	* htdig/HtHTTP.h: corrected cvs Id property

Mon Sep 27 10:29:58 1999 Loic Dachary <loic at ceic.com>

	* htlib/String.cc (String): make sure *all* constructors set the Data
	member to 0.

	* htsearch/parser.cc (score): add missing dm->id = wr->DocID();
	strange that it did not make search fail horribly.

Mon Sep 27 09:46:34 1999 Loic Dachary <loic at ceic.com>

	* test/conf/htdig.conf.in (common_dir): add common_dir so that
	templates are found in the compile directory.

	* htsearch/parser.cc (phrase): free wordList at the end and only
	allocate it if needed.

Fri Sep 24 16:35:47 1999 Loic Dachary <loic at ceic.com>

	* htcommon/DocumentDB.cc (Open): change mode to 666 instead of 664;
	it's umask's business to remove permission bits.

	* htlib/URL.cc (removeIndex): Memory leak. Do not use l.Release,
	since the standard Destroy called by the destructor is ok.

	* htdig/htdig.cc (main): Memory leak. Use l.Destroy instead of
	l.Release.

	* htlib/StringList.cc (Join): Memory leak (new String str +
	return *str). Also change it to a const function.

	* htlib/List.cc (Nth): add const version to help StringList::Join save
	memory.

	* htdig/HTML.cc (parse): delete [] text (was missing [])

	* htlib/HtVector.cc: Most of the boundary tests with element_count
	(but not all of them) were wrong (> instead of >= for instance).

	* htlib/HtVector.cc (Previous): limit test cut and pasted from Next
	and obviously completely wrong. Fix.

	* htlib/HtVector.cc (Remove): use RemoveFrom, avoid code duplication.

	* htcommon/DocumentRef.cc (Clear): set all numerical fields to 0,
	and truncate strings to 0. Some were missing.

	* htlib/Connection.cc (Connection): free(server_name) because it is
	allocated by strdup, not new.

Fri Sep 24 14:30:21 1999 Loic Dachary <loic at ceic.com>

	* */.cvsignore: update to include .pure, *.la, *.lo, .purify

	* htlib/String.cc (String): add Data = 0

	* htlib/htString.h (class String): add Data = 0

	* htlib/String.cc (String): set init to at least MinimumAllocationSize;
	prevents leaking if init = 0.

	* htlib/String.cc (nocase_compare): use get() instead of a direct
	pointer to Data so that the trailing null will be added.

	* htlib/Dictionary.cc (DictionaryEntry): free(key) instead of
	delete [] key because it was obtained with strdup.

	* htlib/DB2_db.cc (Close): free(dbenv) because db_appexit does not
	free it, although it frees everything else.

Thu Sep 23 18:18:40 1999 Loic Dachary <loic at ceic.com>

	* configure.in: add PERL detection & use in Makefile.am

Thu Sep 23 14:29:29 1999 Loic Dachary <loic at ceic.com>

	* configure.in: removed unused alloca.h

	* htcommon/DocumentDB.cc: test isopen in Close instead of before calling Close.
	Add some const to function arguments.
	(Read): change char* args to const String&, change tests for null pointers to
	empty().
	(Add): Delete the temp class member, use a function-local temp.
	(operator []): change char* args to const String&
	(CreateSearchDB): change char* args to const String&

	* htcommon/DocumentRef.cc (AddDescription): Add some const to function arguments.
	Use a WordReference instead of merely the docid to hold the insertion
	context.
	(AddAnchor): Add some const to function arguments.

	* htcommon/DocumentRef.h: Add some const to inline function arguments.

	* htcommon/Makefile.am: add WordKey + WordKey.h generation

	* htcommon/word_builder.pl, word.desc, WordKey.h.tmpl: generate WordKey.h
	from WordKey.h.tmpl and word.desc

	* htcommon/WordList.cc: In general, remove code that belongs to WordReference
	rather than WordList, and clean up const + String.
	(WordList): the constructor takes a Configuration object as argument.
	(Word -> Replace): the Word method is replaced by the Replace method, which
	is more explicit. It now takes a WordReference as argument instead of the
	list of field values.
	(valid_word deleted, IsValid only): Add some const to function arguments.
	(BadWordFile): change char* args to const String&
	(Open + Read -> Open): Open and Read merged into Open with a mode argument.
	Change char* args to const String&.
	(Add): use WordReference::Pack and simply do Put.
	(operator[], Prefix ...): now take WordReference instead of Word. Automatic
	conversion from Word for compatibility through WordReference(const Word& w).
	(Dump): change char* args to const String&
	(Walk): use WordReference member functions instead of hard-coded packing

	* htcommon/WordRecord.h: move flag definitions to WordReference.h.
	Only keep anchor; the rest moved to the key.

	* htdig/Document.cc: change all config[""] manipulations from char* to String
	or const String
	(setUsernamePassword): Add some const to function arguments.

	* htdig/HTML.cc: change all config[""] manipulations from char* to String
	or const String. Change null pointer tests to empty().
	(transSGML): change char* args to const String&

	* htdig/HtHTTP.cc: Add error messages for default cases in every switch.

	* htdig/PDF.cc (parse): change char* to const String& for config[""]

	* htdig/Plaintext.cc (parse): remove unused variable

	* htdig/Retriever.cc: use WordReference word_context instead of a simple docid
	to hold the insertion context.
	(Retriever): pass config to WordList initializer.
	(setUsernamePassword): Add some const to function arguments.
	(Initial): change char* args to const String&
	(parse_url): use WordReference word_context, add debug information.
	(RetrievedDocument): set anchor in word_context.
	(got_word): use Replace instead of Word
	(got_*): Add some const to function arguments.

	* htdig/htdig.cc: change all config[""] manipulations from char* to String

	* htdoc/cf_generate.pl: compute attrs.html, cf_byprog.html and cf_byname.html from
	../htlib/default.cc and attrs_head.html attrs_tail.html cf_byname_head.html
	cf_byname_tail.html cf_byprog_head.html cf_byprog_tail.html
	Add rules in Makefile.am

	* htfuzzy: In every program I changed the constructor to take a
	Configuration argument. openIndex and writeDB had this argument
	but sometimes used it and sometimes used the global config.
	Having it in the constructor is cleaner and safer: there are no
	more references to the global config. I also changed some char*
	to String and const. Most of the programs look the same, so I
	won't go into details here :-}

	* htlib/Configuration.cc: changed separators from String* to String. Simpler.
	(~Configuration): removed because not needed.
	(Add): change to String, remove new String + delete for local var.
	(Find, operator[]): make them const functions, add some const to function arguments.
	(Value + Double): killed, replaced by as_integer + as_double from String
	(Boolean): use String methods + string objects
	(Defaults): Add some const to function arguments.

	* htlib/Configuration.h: add
	    char *type;         // Type of the value (string, integer, boolean)
	    char *programs;     // White separated list of programs/modules using this attribute
	    char *example;      // Example usage of the attribute (HTML)
	    char *description;  // Long description of the attribute (HTML)
	to the ConfigDefaults type.

	* htlib/Connection.cc (assign_server): change char* args to const String&

	* htlib/DB2_db.cc: Merge with DB2_hash.
	Add compare and prefix function pointers.
	Merge OpenRead & OpenReadWrite into Open, keep them for compatibility.
	skey and data are now strings instead of DBT.
	Remove Get_Next_Seq.
	Get_Next now returns key and value in arguments.
	Remove all other Get_Next interfaces.

	* htlib/Database.h:
	Compatibility functions for Get_Next
	Put, Get, Exists, Delete take String args and are inline
	Add SetPrefix and SetCompare

	* htlib/Dictionary.cc:
|
||
|
Add copy constructor.
|
||
|
Add DictionaryCursor that holds the traversal context.
|
||
|
Use DictionaryCursor object for traversal without explicit
|
||
|
cursor specified.
|
||
|
Add constness where meaningfull.
|
||
|
|
||
|
* htlib/HtPack.cc:
|
||
|
(htPack) format is const, change strtol call
|
||
|
to use temporary variable to cope with constness.
|
||
|
(htUnpack) dataref argument is not a reference anymore. Not used
|
||
|
anywhere and kind of hidden argument nobody wants.
|
||
|
|
||
|
* htlib/HtRegex.cc: set, match, HtRegex have const args.
|
||
|
|
||
|
* htlib/HtWordCodec.cc: (code) orig is const
|
||
|
|
||
|
* htlib/HtWordType.cc,h: statics is made of String instead of char*. Remove
|
||
|
static String punct_and_extra from Initialize.
|
||
|
|
||
|
* htlib/HtZlibCodec.cc: len is unsigned int
|
||
|
|
||
|
* htlib/ParsedString.cc: add constness to function args
|
||
|
(get) use String instead of char
|
||
|
|
||
|
* htlib/QuotedStringList.cc: inline functions argument variations and
|
||
|
add constness.
|
||
|
|
||
|
* htlib/String.cc: add constness whereever possible.
|
||
|
|
||
|
* htlib/htString.h: Add const get, char* cast, operator [].
|
||
|
Add as_double conversion.
|
||
|
|
||
|
* htlib/StringList.cc: inline functions argument variations and
|
||
|
add constness.
|
||
|
|
||
|
* htlib/StringMatch.cc: add constness to function args.
|
||
|
|
||
|
* htlib/URL.cc: add constness to function args.
|
||
|
(URL): fct arg was used as temp. Change, clearer.
|
||
|
|
||
|
* htlib/lib.h: add const declaration of string manipulation functions.
|
||
|
Two forms for mystrcasestsr: const and not const.
|
||
|
|
||
|
* htlib/strcasecmp.cc: add constness to function args.
|
||
|
|
||
|
* htlib/timegm.c: add declaration for __mktime_internal
|
||
|
|
||
|
* htmerge/db.cc: change *doc* vars from char* to const String, use
|
||
|
new WordList + WordReference interface.
|
||
|
|
||
|
* htmerge/docs.cc: change *doc* vars from char* to const String.
|
||
|
|
||
|
* htmerge/words.cc: use new WordList + WordReference interface.
|
||
|
|
||
|
* htsearch/Display.cc: use empty method on String where appropriate.
|
||
|
use String instead of char* where config[""] used.
|
||
|
(includeURL): change char* args to const String&
|
||
|
|
||
|
* htsearch/ResultMatch.cc: (setTitle, setSortType) change char* args to const String&
|
||
|
|
||
|
* htsearch/Template.cc: (createFromFile) change char* args to const String&
|
||
|
|
||
|
* htsearch/Template.h: accessors return const String& or take const char*
|
||
|
|
||
|
* htsearch/TemplateList.cc: (get) use const String for internalNames.
|
||
|
|
||
|
* htsearch/htsearch.cc: use String instead of char* where config[""] used.
|
||
|
|
||
|
* htsearch/parser.cc: Initialize WordList member with config global.
|
||
|
(perform_push): free the result list after calling score.
|
||
|
(score, phrase): use new WordList + WordReference interface.

Thu Sep 23 14:29:29 1999 Loic Dachary <loic at ceic.com>

* htcommon/WordKey.h.tmpl, WordKey.cc: new, describe the key of the word
database.

* htcommon/word.desc: new, abstract description of the key structure of the word
database.

* htcommon/word_builder.pl: new, generate WordKey.h from WordKey.h.tmpl

* htcommon/WordReference.cc: move key manipulation to WordKey.cc
Add Unpack/Pack functions. Add accessors for fields and move fields to private.
Add constness where possible.

Mon Sep 20 14:50:47 1999 Loic Dachary <loic at ceic.com>

* Everywhere config["string"] is used, check that it's *not* converted to
char* for later use. Keep the String object so that there is no chance of
using a char* that has been deallocated. Using a String as the return type of config["string"]
is also *much* safer for the many calls that did not check for a possible
0 pointer return.

* htfuzzy/*.{cc,h}: const Configuration& config member. Constructor sets it.
Remove config argument from openIndex & writeDB. The idea (as it was initially,
I guess) is to be able to have a standalone fuzzy library using a specific
configuration file. It is now possible and consistent.

* htlib/htString.cc: more constness where appropriate. Changed compare
to take a const String& arg instead of const Object*, because the latter was useless and
a potential source of buggy code.

* htfuzzy/Regex.cc (getWords): fix buggy setting of extra_word_chars
configuration value. It was set to change the behaviour of HtStripPunctuation,
but that function gets extra_word_chars from a static array initialized
at program start by static void Initialize(Configuration & config). Use straight
s.remove() instead. Besides, the string was anchored by prepending a ^ that
was removed because it is one of the reserved chars.

Mon Sep 20 11:47:05 1999 Loic Dachary <loic at ceic.com>

* htlib/Configuration.cc (operator []): changed return type to String
to solve a memory leak. When it was a char*, the string was malloced from ParsedString
after substitution and never freed. In fact it was even worse: it was
freed before use in some cases.
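
A minimal C++ sketch of why returning a String by value fixes both problems, using std::string and std::map as stand-ins for htdig's String and Configuration (the Config class below is hypothetical, not htdig's API): each caller receives its own copy, so there is no malloced buffer to leak and no pointer left dangling after the buffer is freed.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for htdig's Configuration; not the real class.
class Config {
public:
    void Add(const std::string& name, const std::string& value) {
        values_[name] = value;
    }

    // Returns by value: the caller owns its copy, so nothing must be
    // freed and the result cannot dangle. A missing attribute yields an
    // empty string instead of a 0 pointer.
    std::string operator[](const std::string& name) const {
        auto it = values_.find(name);
        return it == values_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> values_;
};
```

Callers can then test a missing attribute with empty() rather than checking for a 0 pointer.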

Sun Sep 19 19:12:44 1999 Loic Dachary <loic at ceic.com>

* htdoc/cf_generate.pl, htcommon/defaults.cc, htlib/Configuration.h:
Change the structure of the configuration defaults. Move
description, examples, types, used_by information from attrs.html.
Write cf_generate.pl to build attrs.html, cf_byname, cf_byprog
from defaults.cc. Makes it easier to maintain an up-to-date
description of existing attributes. About 10 attributes existed
in defaults.cc and were not described in the HTML pages.
Add rules in htdoc/Makefile.am to generate the pages if a source
changes.

Fri Sep 17 19:34:48 1999 Loic Dachary <loic at ceic.com>

* Makefile.config: add -Wall to all compilations and fix
all resulting warnings.

* htlib/Connection.cc (assign_server): remove redundant test
and cast literal value to unsigned

* htlib/String.cc: add const qualifier where possible. Helps when
dealing with const objects at an upper level.

Fri Sep 17 18:27:57 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at>

A few changes so that it compiles with xlC on AIX:

* configure.in, include/htconfig.h.in: Add check for sys/select.h.
Add "long unsigned int" to the possible getpeername_length types.

* htdig/htdig.cc: Moved variable declaration out of case block.

* htlib/Connection.cc: Include sys/select.h.

* htcommon/WordList.cc: just a type cast

* htlib/regex.c: define true and false only if they aren't already

* htdig/Transport.{h,cc}: removed inline keywords (inline functions
have to be defined and declared simultaneously)

* htlib/{mktime.c,regex.h,strptime.c,timegm.c}: change // comments
to /* ... */

Tue Sep 14 01:15:48 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htmerge/db.cc: Rewrite to use the WordList functions to merge
the two word databases. Also make sure to load the document
excerpt when adding in DocumentRefs.

* htmerge/docs.cc: Fix bug where ids were not added to the discard
list correctly.

* htmerge/words.cc: Fix bug where ids were not checked for
existence in the discard list correctly.

Sun Sep 12 12:27:16 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/defaults.cc: Remove word_list since that file is no
longer used.

* htdig/htdig.cc: Ensure -a and -i are followed for the word_db
file. Fixes PR #638.

Sat Sep 11 00:11:28 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/StringMatch.h: Add back mistakenly deleted #ifndef/#define.

Fri Sep 10 23:07:43 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htmerge/*, htcommon/*, htdig/*, htlib/*: Add copyright information.

Fri Sep 10 11:33:50 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htnotify/htnotify.cc: Add copyright information.

* htsearch/* htfuzzy/*: Ditto.

Fri Sep 10 15:24:44 1999 Loic Dachary <loic at ceic.com>

* htdig/Retriever.cc: change static WordList words to
object member. words.Close() at end of Start function
to make sure data is flushed by database.

* htcommon/WordList.cc (Close): test isopen to prevent
ugly crash. Remove isopen test in calling functions.

Fri Sep 10 13:45:53 1999 Loic Dachary <loic at ceic.com>

* htcommon/WordList.h htcommon/WordList.cc: methods Collect
and Walk that factorise the behaviour of operator [], Prefix
and WordRefs.

* htcommon/WordList.h htcommon/WordList.cc: method Dump to
dump an ascii version of the word database.

* htcommon/WordReference.h,htcommon/WordReference.cc: method Dump
to write an ascii version of a word.

* htdig/htdig.cc: -t now also dumps the word database in ascii.

* htdoc/attrs.html,cf_byprog.html,cf_byname.html: added doc
for word_dump

Thu Sep 9 20:30:18 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htfuzzy/Fuzzy.h, htfuzzy/Fuzzy.cc, htfuzzy/Prefix.cc,
htfuzzy/Regex.cc, htfuzzy/Speling.cc, htfuzzy/Substring.cc,
htfuzzy/htfuzzy.cc, htfuzzy.h: Change to use WordList code instead
of direct access to the database.

Thu Sep 9 14:55:59 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/parse_doc.pl: fix bug in pdf title extraction.

Tue Sep 7 23:49:41 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/ExternalParser.h, htdig/ExternalParser.cc (parse): Change
parsing of location to allow phrase searching -- location is *not*
just 0-1000.

* htdig/Plaintext.h, htdig/Plaintext.cc, htdig/PDF.cc: Ditto.

* htdig/Retriever.h, htdig/Retriever.cc: Don't call
HtStripPunctuation. This is now done in the WordList::Word method.

* htcommon/WordList.h htcommon/WordList.cc (Prefix): New method to
do prefix retrievals. Essentially the same as [], except the loop
is broken only in the unlikely event that we retrieve something
beyond the range set.
(Exists): New method for checking the existence of a
string--attempt to retrieve it and determine if anything's
actually there.
(Word): Call HtStripPunctuation as part of the cleanup.
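
The Prefix method relies on the word database keeping its keys sorted, so all keys sharing a prefix are contiguous. A minimal sketch of that idea, assuming std::map as a stand-in for the Berkeley DB btree (PrefixLookup is a hypothetical name, not the WordList interface):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: return the values of all keys beginning with prefix.
// std::map keeps keys sorted, so every match is contiguous, starting at
// lower_bound(prefix).
std::vector<int> PrefixLookup(const std::map<std::string, int>& index,
                              const std::string& prefix) {
    std::vector<int> result;
    for (auto it = index.lower_bound(prefix); it != index.end(); ++it) {
        // compare(0, n, prefix) == 0 means the key starts with prefix.
        if (it->first.compare(0, prefix.size(), prefix) != 0)
            break;  // first key past the prefix range: stop
        result.push_back(it->second);
    }
    return result;
}
```

Positioning at the first key not less than the prefix means the loop only ever exits on the first non-matching key, which mirrors the "broken only when we retrieve something beyond the range" behaviour described above.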

Tue Sep 7 21:37:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/defaults.cc: Add new configuration option
remove_unretrieved_urls to remove docs that have not been accessed.

* htmerge/docs.cc (convertDocs): Use it.

* htcommon/defaults.h, htcommon/WordRecord.h,
htcommon/WordReference.h: Add copyright notice to head of file.

Mon Sep 6 10:32:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtZlibCodec.h, htlib/HtZlibCodec.cc(instance): New method
as used in other codecs.
(encode, decode): Fix compilation errors.

* htlib/Makefile.am: Added HtZlibCodec.cc to the compilation list.

* htcommon/DocumentDB.cc (ReadExcerpt): Call HtZlibCodec to decompress
the excerpt.
(Add): Call HtZlibCodec to compress the excerpt before storing.
(Open, Read): If the databases are
already open, close them first in case we're opening under a
different filename.
(CreateSearchDB): Remove call to external
sort program. Database is already sorted by DocID.

* configure.in, configure: Remove check for external sort
program. No longer necessary.

* */Makefile.in: Regenerate using automake.

Sun Sep 5 13:50:34 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htmerge/docs.cc: Ensure a document with empty excerpt has
actually been retrieved. Otherwise document stubs are always
removed.

* htlib/String.cc: Implement the nocase_compare method.

* htcommon/WordReference.cc: Implement a compare method for
WordRefs to use in sorting. Uses the above.

* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Update the
headers.

* htcommon/DocumentDB.h: Ditto.

Sun Sep 5 01:37:27 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/WordList.cc(Flush): Call Add() instead of storing the
data ourselves. Additionally, don't open the database ourselves (and
then close it), instead call Open() if it's not open already.

* htcommon/DocumentRef.h, htcommon/DocumentRef.cc(AddDescription):
Pass in a WordList to use when adding link text words. Ensures
that the word db is never opened twice for writing.

* htdig/Retriever.cc: Call AddDescription as above.

* htdig/Server.cc(ctor): If debugging, write out an entry for the
robots.txt file.

* htlib/HtHeap.cc(percolateUp): Fix a bug where the parent was not
updated when moving up more than once.
(pushDownRoot): Fix a bug where the root was improperly pushed
down when it required looping.

Fri Sep 3 16:23:23 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtHeap.cc(Remove): Correct bug where after a removal, the
structure was not "re-heapified" correctly. The last item should
be moved to the top and pushed down.
(pushDownRoot): Don't move items past the size of the underlying
array.

* htdig/Server.h, htdig/Server.cc: Change _paths to work on a
heap, based on the hopcount. Ensures on a given server that the
indexing will be done in level-order by hopcount.
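
The HtHeap fixes above match the standard array-based binary heap. The following minimal sketch over int keys shows the three operations involved; MinHeap is a hypothetical class, whereas htdig's HtHeap stores Object pointers ordered by their compare() method.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical array-based binary min-heap illustrating the percolateUp,
// pushDownRoot and Remove logic; not htdig's HtHeap.
class MinHeap {
public:
    void Add(int value) {
        data_.push_back(value);
        percolateUp(data_.size() - 1);
    }

    // Remove the smallest item: move the last item to the root, shrink the
    // array, then push the new root down. Precondition: !IsEmpty().
    int Remove() {
        int top = data_.front();
        data_.front() = data_.back();
        data_.pop_back();
        if (!data_.empty())
            pushDownRoot(0);
        return top;
    }

    bool IsEmpty() const { return data_.empty(); }

private:
    void percolateUp(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;   // recomputed on every step
            if (data_[parent] <= data_[i])
                break;
            std::swap(data_[parent], data_[i]);
            i = parent;
        }
    }

    void pushDownRoot(std::size_t i) {
        const std::size_t n = data_.size();
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= n)                     // never look past the array
                break;
            if (child + 1 < n && data_[child + 1] < data_[child])
                ++child;                        // pick the smaller child
            if (data_[i] <= data_[child])
                break;
            std::swap(data_[i], data_[child]);
            i = child;                          // keep looping downward
        }
    }

    std::vector<int> data_;
};
```

Recomputing the parent index inside the loop and bounding the child index by the current size are exactly the two conditions the fixes above restore.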

Wed Sep 01 15:40:37 1999 Loic Dachary <loic at ceic.com>

* test: implement minimal tests for htsearch and htdig

Tue Aug 31 02:17:04 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/WordRecord.h: Change back to struct to ensure integrity
when compressed and stored in the word database.

* htcommon/WordList.cc (Flush): Use HtPack to compress the
WordRecord before storage.
([], WordRefs): Use HtUnpack to decompress the WordRecord after
storage.

Sun Aug 29 00:42:07 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc (convertToBoolean): Remove debugging
strings.

* htsearch/parser.h: Add new method score(List) to merge scoring
for both standard and phrase searching.

* htsearch/parser.cc(phrase): Keep the current list of successfully
matched words around to pass to score and perform_phrase.
(perform_phrase): Naively (and slowly, but correctly) loop through
past words to make sure they match DocID as well as successive locations.
Move scoring to score().
(perform_push): Move scoring to score().
(score): Loop through a list of WordReferences and create a list
of scored DocMatches.
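
The matching rule used by perform_phrase (same DocID, successive locations) can be sketched as follows. Occurrence and PhraseMatch are hypothetical names; instead of the naive nested loop described above, this version keeps a set of candidate (DocID, location) pairs and advances it one phrase word at a time.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical occurrence record: one word at one position in one document.
struct Occurrence {
    int docid;
    int location;   // word sequence number within the document
};

// A document matches if the first word occurs at location L, the second at
// L + 1, and so on. words[i] holds every occurrence of the i-th phrase word.
std::set<int> PhraseMatch(const std::vector<std::vector<Occurrence>>& words) {
    std::set<int> docs;
    if (words.empty())
        return docs;
    // (docid, location) of the latest word in each partial match.
    std::set<std::pair<int, int>> current;
    for (const Occurrence& o : words[0])
        current.insert({o.docid, o.location});
    for (std::size_t i = 1; i < words.size(); ++i) {
        std::set<std::pair<int, int>> next;
        for (const Occurrence& o : words[i])
            // Keep this occurrence only if the previous word sits just before it.
            if (current.count({o.docid, o.location - 1}))
                next.insert({o.docid, o.location});
        current.swap(next);
    }
    for (const auto& p : current)
        docs.insert(p.first);
    return docs;
}
```

This only works because locations are actual word sequence numbers (1, 2, 3...) rather than values scaled to 0-1000.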

Sun Aug 29 00:33:17 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc(createLogicalWords): Hack to produce
correct output with phrase searching (e.g. anything in quotes is
essentially left alone). Ensure the StringMatch pattern includes
the phrase with correct spacing as well.
(setupWords): Add a " token whenever it occurs in the query.
(convertToBoolean): Make sure booleans are not inserted into
phrases.

* htsearch/parser.h: Add new methods phrase and perform_phrase to
take care of parsing phrases and performing the actual matching.

* htsearch/parser.cc(lexan): Return a '"' when present for phrase
searching.
(factor): Call phrase() before parsing a factor--phrases are the
highest priority, so ("RedHat Linux" & Debian) ! Windows makes
sense.
(phrase): New method--slurps up the rest of a phrase and calls
perform_phrase to do the matching.
(perform_phrase): New method--currently just calls perform_and to
give the simulation of a phrase match.

Sat Aug 28 15:57:53 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Server.h, htdig/Server.cc: Undo yesterday's change -- still
very buggy and shouldn't be used yet.

* htdig/Retriever.cc (parse_url): Change default index to 1 to
more closely match DocIDs shown with verbose output.

* htsearch/DocMatch.h: Change score to double and clean up
headers.

* htcommon/WordRecord.h: Change unnecessary long ints (id and
flags) to plain ints.

* htdig/HTML.cc (parse): Call got_word with actual word sequence
(i.e. 1, 2, 3...) rather than scaling to 1-1000 by character
offset.

* htlib/Database.h, htlib/DB2_db.h, htlib/DB2_hash.h: Change
Get_Item to Get_Next(String item) to return the data as a
reference. This makes it easier to use in a loop and cuts the
database calls in half.

* htlib/DB2_db.cc, htlib/DB2_hash.cc: Implement it, making sure we
keep the possibly useful data around, rather than tossing it!

* htsearch/htsearch.cc(htsearch): Don't attempt to open the word db
ourselves. Instead, pass the filename off to the parser, which
will do it through WordList.

* htsearch/parser.h: Use a WordList instead of a generic Database.

* htsearch/parser.cc(perform_push): Use the WordList[] operator to
return a list of all matching WordRefs and loop through, summing
the score.

* htcommon/WordList.cc (Flush): Don't use HtPack on the
data--somehow when unpacking, there's a mismatch of sizes.
(Read): Fix thinko where we attempted to open the database as a
DB_HASH.
([]): Don't use HtUnpack since we get mismatches. Use the new
Get_Next(data) call instead of calling Get_Item separately.
(WordRefs): Same as above.

Fri Aug 27 09:44:09 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc (Need2Get): Remove duplicate detection code for
local_urls. The code is somewhat buggy and should be replaced by
more general code shortly.

* htdig/Server.h, htdig/Server.cc (push, pop): Change _paths to a
HtHeap sorted on hopcount first (and order placed on heap
second). Ensures that on each server, the order indexed is
guaranteed to be level-order by hopcount.

* htdig/URLRef.h, htdig/URLRef.cc (compare): Add comparison method
to enable sorting by hopcount.
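
Ordering the queue on hopcount first and insertion order second can be sketched with std::priority_queue and a comparator. QueuedURL and LaterInCrawlOrder are hypothetical names, not the actual URLRef::compare interface; the sequence field stands in for "order placed on heap".

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical queued URL: hops from the start URL, plus the order in
// which it was pushed (used to break ties).
struct QueuedURL {
    std::string url;
    int hopcount;
    long sequence;
};

// std::priority_queue pops the "largest" element, so this comparator says
// a has lower priority than b when a has more hops, or the same number of
// hops but was queued later. Result: level-order, FIFO within a level.
struct LaterInCrawlOrder {
    bool operator()(const QueuedURL& a, const QueuedURL& b) const {
        if (a.hopcount != b.hopcount)
            return a.hopcount > b.hopcount;
        return a.sequence > b.sequence;
    }
};

using URLQueue =
    std::priority_queue<QueuedURL, std::vector<QueuedURL>, LaterInCrawlOrder>;
```

The sequence tie-breaker matters because a heap is not stable: without it, URLs at the same depth could come out in any order.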

Fri Aug 27 09:36:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/WordList.h, htcommon/WordList.cc (WordList): Change
words to a list instead of a dictionary for minor speed improvement.

Thu Aug 26 11:18:20 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/defaults.cc, htdoc/attrs.html: increase default
maximum_word_length to 32.

Wed Aug 25 16:50:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Retriever.cc(got_word): add code to check for compound words
and add their component parts to the word database.
* htdig/PDF.cc(parseString), htdig/Plaintext.cc(parse): Don't strip
punctuation or lowercase the word before calling got_word. That
should be left up to got_word & Word methods.

* htlib/StringMatch.h, htlib/StringMatch.cc(Pattern, IgnoreCase):
Add an IgnorePunct() method, which allows matches to skip over valid
punctuation, change Pattern() and IgnoreCase() to accommodate this.
* htsearch/htsearch.cc(main, createLogicalWords): use IgnorePunct()
to highlight matching words in excerpts regardless of punctuation,
toss out old origPattern, and don't add short or bad words to
logicalPattern.

* htlib/HtWordType.h, htlib/HtWordType.cc(Initialize): set up and
use a lookup table to speed up HtIsWordChar() and HtIsStrictWordChar().
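
A lookup table trades 256 bytes for a single array index per character test. A minimal sketch, assuming that only ASCII letters and digits plus the configured extra_word_chars count as word characters (WordCharTable is a hypothetical class, not HtWordType's interface):

```cpp
#include <string>

// Hypothetical word-character classifier. The table is built once, so each
// query is one array index instead of a classification call plus a search
// of the extra characters.
class WordCharTable {
public:
    explicit WordCharTable(const std::string& extra_word_chars) {
        for (int c = 0; c < 256; ++c)
            table_[c] = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9');
        for (unsigned char c : extra_word_chars)
            table_[c] = true;
    }

    // Cast through unsigned char so bytes >= 0x80 index the table safely.
    bool IsWordChar(char c) const {
        return table_[static_cast<unsigned char>(c)];
    }

private:
    bool table_[256];
};
```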

Mon Aug 23 10:13:05 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc(parse): fix problems with null pointer when attempting
SGML entity decoding on bare &, as reported by Vadim Chekan.

Thu Aug 19 11:52:06 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/htsearch.cc(main): Fix to allow multiple keywords
input parameter definitions.

* contrib/parse_doc.pl: make spaces optional in LANGUAGE = POSTSCRIPT
PJL test.

Wed Aug 18 11:27:46 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/PDF.cc(parse): Fixed wrong variable name in new code.
Double-Oops! (It was Friday the 13th, after all...)

Tue Aug 17 16:26:46 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/HtHeap.cc(Remove): apply Geoff's patch to fix Remove.

* htlib/HtVector.h, htlib/HtVector.cc(Index): various bounds overrun
bug fixes and checking in Last(), Nth() & Index().

Mon Aug 16 13:55:10 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(expandVariables): fix up test for &

Mon Aug 16 12:08:57 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* Makefile.am, Makefile.in, installdir/Makefile.am,
installdir/Makefile.in: change all remaining INSTALL_ROOT to DESTDIR.

Fri Aug 13 15:44:31 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/PDF.cc(parse): added missing ')' in new code. Oops!

* htlib/strptime.c, htlib/mktime.c: added #include "htconfig.h"
to pick up definitions from configure program. Let's try to
remember that config.h != htconfig.h!

Fri Aug 13 14:49:07 1999 Loic Dachary <loic at ceic.com>

* configure.in: removed unused HTDIG_TOP, replaced AM_WITH_ZLIB
with CHECK_ZLIB

Fri Aug 13 14:00:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/PDF.cc(parse), htcommon/defaults.cc, htdoc/attrs.html
(pdf_parser): Removed -pairs option from default arguments, added
special test for acroread to decide whether to use output file or
directory as last argument (also adds -toPostScript if missing).
Program now tries to test for existence of parser before trying
to call it.

Fri Aug 13 10:10:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdoc/attrs.html(pdf_parser): updated xpdf version number.

Thu Aug 12 17:09:37 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/parse_doc.pl: updated for xpdf 0.90, plus other fixes.

Thu Aug 12 11:12:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdoc/attrs.html(logging): added Geoff's description of log lines.

Thu Aug 12 11:21:12 1999 Loic Dachary <loic at ceic.com>

* strptime fixes: AC_FUNC_STRPTIME defined in acinclude.m4 and used in configure.in,
conditional compilation of strptime.c (only if HAVE_STRPTIME not defined),
removed Htstrptime (strptime.c now defines strptime), changed all calls to Htstrptime
to calls to strptime.

Wed Aug 11 16:59:41 1999 Loic Dachary <loic at ceic.com>

* */Makefile.am: use -release instead of -version-info because nobody
wants to bother with published shared library interface version numbers
at present.

* htlib/Makefile.am: added langinfo.h

Wed Aug 11 15:00:07 1999 Loic Dachary <loic at yoda.ceic.com>

* acconfig.h: removed MAX_WORD_LENGTH

* re-run auto* to make sure chain is consistent

* Makefile.am: improve distclean for tests

Wed Aug 11 13:46:22 1999 Loic Dachary <loic at yoda.ceic.com>

* configure.in: change --enable-test to --enable-tests so
that Berkeley DB tests are not activated. Since they depend
on tcl this can be a pain.

* acinclude.m4: AM_PROG_TIME locates the time command and finds out
if verbose output is -l (freebsd) or -v (linux)

Wed Aug 11 13:13:39 1999 Loic Dachary <loic at yoda.ceic.com>

* acinclude.m4: AM_WITH_ZLIB autoconf macro for zlib detection that
allows --with-zlib=DIR to specify the install root of zlib,
--without-zlib to prevent inclusion of zlib. If nothing is
specified, zlib is searched for in /usr and /usr/local.
--disable-zlib is replaced with --without-zlib.

* configure.in,configure,aclocal.m4,db/dist/acinclude.m4,
db/dist/aclocal.m4,db/dist/configure,db/dist/configure.in:
changed to use AM_WITH_ZLIB

Tue Aug 10 21:14:34 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.cc (outputVariable): Fix compilation error with
assignment between char * and char *.

* htsearch/htsearch.cc (main): Use cleaner trick to sidestep
discarding const char * as suggested by Gilles.

Tue Aug 10 17:24:12 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(expandVariables): clean up, simplify and
label lexical analyzer states.

Tue Aug 10 17:04:54 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(expandVariables, outputVariable): add handling
for $%(var) and $&(var) in templates. Still to be documented.

Tue Aug 10 20:13:52 1999 Loic Dachary <loic at yoda.ceic.com>

* db/mp/mp_bh.c: fixed HAVE_ZLIB -> HAVE_LIBZ

Tue Aug 10 17:58:01 1999 Loic Dachary <loic at yoda.ceic.com>

* configure,configure.in,db/dist/configure.in,db/dist/configure:
added --with-zlib configure flag for htdig to specify zlib
installation path. Motivated to have compatible tests between
htdig and db as far as zlib is concerned. Otherwise configuration
is confused and misses an existing libz.

Tue Aug 10 17:44:49 1999 Loic Dachary <loic at yoda.ceic.com>

* db/mp/mp_fopen.c: fixed cmpr_open being called even if libz is not present

Tue Aug 10 17:40:53 1999 Loic Dachary <loic at yoda.ceic.com>

* htlib/langinfo.h: header missing on FreeBSD-3.2, needed
by strptime.c

Tue Aug 10 11:43:14 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.h, htdig/HTML.cc(parse, do_tag): fix problems with
SGML entity decoding, add decoding of entities within tag attributes.

Mon Aug 9 21:13:50 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HtHTTP.h(SetRequestMethod): Fix declaration to be void.

* htdig/Transport.h(GetRequestMaxDocumentSize): Fix declaration to
return int.

* htdig/Retriever.cc(got_href): Fix mistake in hopcount
calculations. Now returns the correct hopcount even for pages
when a faster path is found. (Still need to change indexing to
sort on hopcount).

* htsearch/htsearch.cc(main): Fix compiler error in gcc-2.95 when
discarding const by using strcpy. It's a hack, hopefully there's a
better way.

Mon Aug 9 17:23:15 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/URL.cc(ServerAlias): fix small memory leak in new default
path code (don't need to allocate new from string each time).

* htlib/cgi.cc(init): Fix PR#572, where htsearch crashed if
CONTENT_LENGTH was not set but REQUEST_METHOD was.

* htfuzzy/Fuzzy.cc(getWords), htfuzzy/Metaphone.cc(vscode):
Fix Geoff's change of May 15 to Fuzzy.cc, add test to vscode macro
to stay in array bounds, so non-ASCII letters don't cause a segfault.
Should fix PR#514.
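
The bounds problem in the vscode macro is the usual one with letter-indexed tables: c - 'A' is only a valid index for 'A'..'Z'. A minimal sketch of the guarded lookup, using a hypothetical vowel table rather than the real Metaphone.cc table and macro:

```cpp
// Hypothetical letter-class table for 'A'..'Z' (bit 1 = vowel).
static const unsigned char kLetterFlags[26] = {
    1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,   // A..M
    0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,   // N..Z
};

// An unchecked kLetterFlags[c - 'A'] reads outside the array for any byte
// not in 'A'..'Z' (e.g. an accented Latin-1 letter, which is negative when
// char is signed). Checking the range first treats such bytes as "no flags".
inline bool IsVowel(char c) {
    return c >= 'A' && c <= 'Z' && (kLetterFlags[c - 'A'] & 1) != 0;
}
```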

Mon Aug 9 17:03:45 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* include/htconfig.h.in, htcommon/WordList.cc(Word,Flush&BadWordFile),
htcommon/DocumentRef.cc(AddDescription), htcommon/defaults.cc,
htsearch/parser.cc(perform_push), htdoc/attrs.html,
htdoc/cf_byname.html, htdoc/cf_byprog.html:
Convert the MAX_WORD_LENGTH compile-time option into the run-time
configuration attribute maximum_word_length. This required reinserting
word truncation code that had been taken out of WordList.cc.

Mon Aug 9 16:34:14 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HtHTTP.cc (isParsable): allow application/pdf as parsable,
to use builtin PDF code.

* htdig/HtHTTP.cc (ParseHeader),
htdig/Document.cc (readHeader): clean up header parsing.

* htdig/Document.cc (getdate): make tm static, so it's initialized
to zeros. Should fix PR#81 & PR#472, where strftime() would crash
on some systems. Idea submitted by benoit.sibaud at cnet.francetelecom.fr

* htlib/URL.cc (parse): fix PR#348, to make sure a missing or invalid
port number will get set correctly.
|
||
|
|
||
|
Mon Aug 9 15:42:41 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Added descriptions for attributes that were missing, added a few
clarifications, and corrected a few defaults and typos.
Covers PR#558, PR#626, and then some.

* configure.in, configure, include/htconfig.h.in, htlib/regex.c:
PR#545 fixed - configure tests for presence of alloca.h for regex.c

Sat Aug 07 13:40:17 1999 Loic Dachary <loic at ceic.com>

* configure.in: remove test for strptime. Run autoconf + autoheader.

* htlib/HtDateTime.cc: always use htdig strptime, do not try to use
existing function in libc.

* htlib/HtDateTime.h: move inclusion of htconfig.h to the top of the file,
change #ifdef HAVE_CONFIG to HAVE_CONFIG_H

Fri Aug 6 16:37:33 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc (UseProxy): fix call to match() and test of
return value to work as documented for http_proxy_exclude (PR#603).

Fri Aug 06 15:06:23 1999 <loic at yoda.ceic.com>

* db/dist/config.hin, db/mp/mp_cmpr.c, db/db/db.c, db/mp/mp_fopen.c:
disable compression if zlib not found by configure.

Thu Aug 05 12:27:15 1999 <loic at yoda.ceic.com>

* test/dbbench.cc: invert -z and -Z for consistency

* test/Makefile.am: add dbbench call examples

Thu Aug 05 11:38:58 1999 Loic Dachary <loic at ceic.com>

* test/Makefile.am: all .html files go in the distribution, compile dbbench
that tests Berkeley DB performance.

* configure.in/Makefile.am: conditional inclusion of the test
directory in the list of subdirs (--enable-test). The list
of subdirs is now @HTDIGDIRS@ in configure.in & Makefile.am

* db/*: Transparent I/O compression implementation. Defines the DB_COMPRESS flag.
For instance DB_CREATE | DB_COMPRESS.

* db/db_dump/load: add -C option to specify cache size to db_dump/db_load

Wed Aug 4 22:57:27 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* db/*: Import of Sleepycat's Berkeley DB 2.7.5.

Wed Aug 4 22:40:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* contrib/htparsedoc/htparsedoc: Add in contributed bug fixes from
Andrew Bishop to work on SunOS 4.x machines.

Wed Aug 4 01:58:52 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* COPYING, htdoc/COPYING, configure.in, Makefile.am, Makefile.in:
Update information to use canonical version of the GPL from the
FSF. In particular, this version has the correct mailing address
of the FSF.

Mon Aug 02 11:28:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

* htlib/htString.h, htlib/String.cc : added the possibility to
insert an unsigned int into a string.
* htdig.cc : in verbose mode, show start and end times.

Thu Jul 22 18:10:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

* htdig/Transport.cc, htdig/HtHTTP.cc : modified the destructors.

Thu Jul 22 13:10:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

* htdig/Transport.cc, htdig/Transport.h, htdig/HtHTTP.cc,
htdig/HtHTTP.h: Re-analyzed the inherited methods and attributes of
the 2 classes. This is a first step, not definitive, because it
still doesn't work as I hoped.

Tue Jul 20 11:21:52 1999 <loic at ceic.com>

* configure.in : added AM_MAINTAINER_MODE to prevent unwanted
dependency checks by default.

* db/Makefile.in : remove Makefile on distclean

Mon Jul 19 13:23:53 1999 <loic at ceic.com>

* Makefile.config (INCLUDES): added -I$(top_srcdir)/include because
the automatic -I../include is not good, added -I$(top_builddir)/db/dist
because some db headers are configure-generated (if building in a
directory that is not the source directory).

* rename db/Makefile to db/Makefile.in: otherwise it does not show
up if building in a directory that is not the source directory.

Mon Jul 19 13:02:22 1999 <loic at ceic.com>

* .cvsignore: do not ignore Makefile.config

Sun Jul 18 22:47:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/parser.cc: Eliminated compiler errors. Currently
returns no matches until bugs in the WordList code are fixed.

Sun Jul 18 22:42:04 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htmerge/htmerge.h: Cleanup, including WordRecord and
WordReference as needed.

* htmerge/htmerge.cc: Update for files necessary for merge
calls.
Call convertDocs before mergeWords so that the discardList gets
the list of documents deleted.

* htmerge/docs.cc: Update for difference in calling order.

* htmerge/words.cc: Update (and significant cleanup) since
WordList writes directly to db.words.db. Iterate over the stored
words, deleting those from deleted documents.

* htmerge/db.cc: Update to eliminate compiler errors. Currently
disabled until bugs in the words code are fixed.

Sun Jul 18 22:33:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/defaults.cc: Collapse the multiple heading_factors into
one. (It's prohibitive to define a flag for each h* tag.)
Add a new url_factor for the text of URLs (presently unused).

* htcommon/DocumentRef.cc(AddDescription): Use FLAG_LINK_TEXT as
defined in htcommon/WordRecord.h.

* htdig/Retriever.h: Change factor to accommodate flags instead of
weighting factors.

* htdig/Retriever.cc: Update to use flags, and define the indexed
flags in factor as appropriate.

* htdig/HTML.cc: Update calls to got_word with appropriate new
offsets into factor[].

Sun Jul 18 22:18:16 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/WordReference.h, htcommon/WordRecord.h: Update to use
flags instead of weight.

* htcommon/WordList.h, htcommon/WordList.cc: Add database access
routines to match DocumentDB.cc.
(Word): Recognize flags instead of weight, simply add the
word. (Duplicates expected!)
(mark*): Simply delete the list of words.
(flush): Rather than dump to a text file, dump directly to the db.

Sun Jul 18 21:50:04 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Database.h, htlib/DB2_db.h, htlib/DB2_hash.h: Add new
method Get_Item to access the data of the current item when using
Get_Next() or Get_Next_Seq().

* htlib/DB2_db.h, htlib/DB2_hash.cc: Implement Get_Item() using
cursor access.

Sat Jul 17 12:59:01 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* test/*.html: Added various HTML files as the beginnings of a
testing suite.

Fri Jul 16 16:06:27 1999 Loic Dachary <loic at ceic.com>

* All libraries (except db) use libtool. Shared libraries are
generated by default; use --disable-shared to get the old behaviour.
Libraries are installed in all cases.

* Change structure of default installation directory (match
standard).
database : var/htdig
programs : bin
libraries : lib

Like default apache:
conf : conf
htdocs : htdocs/htdig
cgi-bin : cgi-bin

* Convert all Makefile.in files into Makefile.am files.

* CONFIG.in, CONFIG : removed. Replaced with --with- arguments in
configure.in

* Makefile.config.in removed, only keep Makefile.config : automake
automatically defines variables for each AC_SUBST variable.
Makefile.config has HTLIBS + DEFINES

* db/Makefile : added to forward (clean all distclean) targets to
db/dist and implement distdir target.

* acconfig.h : created to allow autoheader to work (contains GETPEERNAME_LENGTH_T,
HAVE_BOOL, HAVE_TRUE, HAVE_FALSE, NEED_PROTO_GETHOSTNAME). Extra definitions
added before @TOP@ (TRUE, FALSE, VERSION, MAX_WORD_LENGTH, LOG_LEVEL, LOG_FACILITY).

* installdir/Makefile.am : installation rules moved from Makefile.am to installdir/Makefile.am

* include/Makefile.am : distribute htconfig.h.in and stamp-h.in

* Makefile.am : do not pre-create the directories, creation is done during the installation

* configure.in: CF_MAKE_INCLUDE not needed anymore : automake handles
the include itself.

Fri Jul 16 13:04:27 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc(parse): fix to prevent closing ">" from being passed
to do_tag().

Thu Jul 15 21:25:12 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Document.cc (readHeader, getParsable): Add back
application/pdf to use builtin PDF code.

* htdig/Makefile.in: Remove broken Postscript parser as it never
worked.

* htlib/URL.cc (normalizePath, path): Use config.Boolean as
pointed out by Gilles.

Thu Jul 15 15:54:30 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdoc/attrs.html(pdf_parser & external_parsers): add corrections &
clarifications, links to relevant FAQ entries.

Thu Jul 15 18:00:00 1999 CEST Gabriele Bartolini <g.bartol at comune.prato.it>

* htlib/HtDateTime.cc, htlib/HtDateTime.h : added the possibility
to initialize and compare HtDateTime with integers. Added the
constructor HtDateTime (int) and various operator overloading methods.

Wed Jul 14 22:57:14 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/URL.cc (normalizePath, path): If not case_sensitive,
lowercase the URL. Should ensure that all URLs are appropriately
lowercased, regardless of where they're generated.

Wed Jul 14 22:37:47 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/DB2_db.cc (OpenReadWrite, OpenRead): Add flag DB_DUP to
database to allow storage of duplicate keys (in this case,
words).

Tue Jul 13 15:36:40 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc (do_tag): Fix handling of <link> and <area>,
to use href= instead of src=.

Mon Jul 12 22:31:48 1999 Hanno Mueller <kontakt at hanno.de>

* contrib/scriptname/results.shtml: Remove unintentional $(VERSION).

Mon Jul 12 22:20:40 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HTML.cc (do_tag): Cleanups suggested by Gilles, combining
<link> and <area>; <embed>, <object> and <frame>; and moving <img>
to a separate case.

Sun Jul 11 19:32:38 1999 Hanno Mueller <kontakt at hanno.de>

* contrib/README: Add scriptname directory.

* contrib/scriptname/*: An example of using htsearch within
dynamic SSI pages.

* htcommon/defaults.cc: Add script_name attribute to override
SCRIPT_NAME CGI environment variable.

* htdoc/FAQ.html: Update question 4.7 based on including htsearch
as a CGI in SSI markup.

* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htdoc/hts_templates.html: Update based on behavior of script_name
attribute.

* htsearch/Display.cc: Set the SCRIPT_NAME variable to the script_name
attribute if it is set, otherwise to the CGI environment variable.

Sat Jul 10 00:22:34 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htfuzzy/Regex.cc (getWords): Anchor the match to the beginning
of the string, add regex-interpreted characters to extra_word_chars
temporarily, and strip remaining punctuation before making a match.

Fri Jul 9 22:35:57 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.cc: Back out change of June 24.

* htsearch/htsearch.cc: Ditto.

* htsearch/htsearch.cc (setupWords): Remove HtStripPunctuation in
favor of requiring Fuzzy classes to strip whatever punctuation is
necessary.

* htfuzzy/Fuzzy.h: Add HtWordType.h to #includes and update comments.

* htfuzzy/Synonym.cc, htfuzzy/Substring.cc, htfuzzy/Speling.cc,
htfuzzy/Prefix.cc, htfuzzy/Exact.cc, htfuzzy/Endings.cc,
htfuzzy/Fuzzy.cc (getWords): Call HtStripPunctuation on input before
performing fuzzy matching.

Thu Jul 8 21:28:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HTML.cc (do_tag): Add support for parsing <LINK> tags.

Mon Jul 5 16:53:23 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/htdig.cc (main): Insert '*' instead of username/password
combination to hide credentials in process accounting.

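One common way to hide a credential from process listings is to overwrite the argument string in place. The C and C++ standards make the argv strings modifiable, and on many Unix systems (Linux's /proc/<pid>/cmdline, for instance) ps reads that same memory. The sketch below assumes the username/password combination arrives as a single argv element. `maskArgument` is a hypothetical helper, not the code htdig.cc actually uses.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: overwrite a command-line argument in place with '*'
// characters. The length stays the same, so neighbouring argv strings are
// untouched. Copy the value elsewhere first if the program still needs it.
void maskArgument(char *arg)
{
    if (arg != nullptr)
        std::memset(arg, '*', std::strlen(arg));
}
```

A caller would typically copy argv[i] into its own buffer, then call maskArgument(argv[i]) as early as possible in main().
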
Sat Jul 3 17:35:52 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Transport.h(ConnectionWrite): Return the value from the
Connection::write call.

* htdig/URLRef.h, htdig/URLRef.cc: Cleanup and made the hopcount
default consistent with the 7/3 change to DocumentRef.cc

* htdig/Server.h, htdig/Server.cc, htdig/Retriever.cc: Cleanup and
fixes to match URLRef calling interface.

Sat Jul 3 16:37:29 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HTML.cc (do_tag): Fix <meta> robots parsing to allow
multiple directives to work correctly. Fixes PR#578, as provided
by Chris Liddiard <c.h.liddiard at qmw.ac.uk>.

Sat Jul 3 00:47:51 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Makefile.in: Remove old SGMLEntities code.

Sat Jul 3 00:26:55 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/DocumentRef.cc (Clear): Change default value of
docHopCount to 0 to fix several hopcount bugs.

* htdig/Transport.h, htdig/Transport.cc: Changes to support URL
referers as well as authentication credentials.

* htdig/HtHTTP.h, htdig/HtHTTP.cc(SetCredentials): Implement HTTP
Basic Authentication credentials.
(SetRequestCommand): Use Referer and Authorization headers if
supplied.

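HTTP Basic Authentication sends the credentials as base64("user:password") in an Authorization request header (RFC 7617). The sketch below shows how such a header line can be built. Both function names are hypothetical, and this is not the HtHTTP implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standard base64 (RFC 4648) encoder with '=' padding.
std::string base64Encode(const std::string &in)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Encode each complete 3-byte group as 4 output characters.
    while (i + 2 < in.size())
    {
        unsigned n = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8) |
                     static_cast<unsigned char>(in[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
        i += 3;
    }
    // Encode the 1 or 2 leftover bytes, padding the output with '='.
    std::size_t rest = in.size() - i;
    if (rest > 0)
    {
        unsigned n = static_cast<unsigned char>(in[i]) << 16;
        if (rest == 2)
            n |= static_cast<unsigned char>(in[i + 1]) << 8;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += (rest == 2) ? table[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Hypothetical helper: build the request header line for Basic credentials.
std::string basicAuthHeader(const std::string &user, const std::string &password)
{
    return "Authorization: Basic " + base64Encode(user + ":" + password) + "\r\n";
}
```

For the RFC's classic example, basicAuthHeader("Aladdin", "open sesame") yields the header value "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==". Base64 is an encoding, not encryption, so the credentials remain readable to anyone who sees the request.
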
Sun Jun 30 11:26:00 1999 Gabriele Bartolini <g.bartol at comune.prato.it>

* htdig/Transport.h: Inserted the method declarations for
connection management. The code has been moved out of
HtHTTP.h. Also moved the static variable 'debug' here.

* htdig/Transport.cc: Definition of the connection management code.
The code has been moved out of HtHTTP.cc.

* htdig/HtHTTP.h: Eliminated the connection management code and the
static variable 'debug'. Inserted 'modification_time_is_now' as
a static variable, in order to respect the encapsulation principle.

* htdig/HtHTTP.cc: Eliminated the connection management code and the
static variable 'debug' initialization. Inserted the
'modification_time_is_now' initialization.

Sun Jun 27 16:29:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HTML.h: Cleanup.

* htcommon/defaults.cc: Added default for img_alt_factor for text
weighting on <IMG ALT="..."> tags.

* htdig/Retriever.cc: Add slot for img_alt_factor.

* htdig/HTML.cc (do_tag): Rewrite using Configuration class to
separate tag attributes.
(parse): Ignore final '>' in string passed to do_tag.
(do_tag): Index IMG ALT text.

Fri Jun 25 17:58:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Transport.h: Fix virtual methods for Transport_Response to
have defaults.

* htdig/HtHTTP.h: Fix class declaration of HtHTTP class to prevent
syntax error. Pointed out by Gabriele.

* htdig/Transport.cc: Add (empty) ctor and dtor functions for
Transport_Response.

Thu Jun 24 22:28:44 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc (main): Add support for form inputs
configdir and commondir as contributed by Herbert Martin Dietze
<herbert at fh-wedel.de>.

* htsearch/Display.cc (createURL): If configdir and commondir are
defined, add them to URLs sent for other pages.

Wed Jun 23 23:00:18 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HtHTTP.h, htdig/HtHTTP.cc: Make a subclass of Transport.

Wed Jun 23 22:08:20 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Configuration.cc (Add): Handle single-quoted values for
attributes.

Tue Jun 22 23:35:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Transport.h, htdig/Transport.cc: Virtual classes to handle
transport protocols such as HTTP, FTP, WAIS, gopher, etc.

* htdig/Makefile.in: Make sure they're compiled (not that there's
much!)

* htdig/HtHTTP.h: Add htdig.h to ensure config is defined.

Mon Jun 21 14:33:10 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc(readHeader), htdig/HtHTTP.cc(ParseHeader): fix
handling of modification_time_is_now in readHeader, add similar code
to ParseHeader.

Sun Jun 20 21:25:15 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.h: Add hop parameter to got_href
method. Defaults to 1.

* htdig/Retriever.cc(got_href): Use it instead of constant 1.

* htdig/HTML.cc (do_tag): Use new hop parameter to keep the same
hopcount for frame, embed and object tags.

* htdig/Makefile.in: Make sure HtHTTP.cc is compiled.

* htdig/HtHTTP.cc (ctor): Add default value for _server to
prevent strange segmentation faults.

Fri Jun 18 09:53:30 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/DocumentRef.h, htcommon/DocumentRef.cc(Clear, Deserialize):
add docHeadIsSet field, code for setting and getting it.
* htcommon/DocumentDB.cc(Add): only put out excerpt record if DocHead
is really set.
* htmerge/docs.cc(convertDocs): add missing else after code to delete
documents with no excerpts.
(All these changes fix the disappearing excerpts problem in 3.2.)

Wed Jun 16 23:04:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Document.cc (UseProxy): Change http_proxy_exclude to an
escaped regex string. Allows for much more complicated rules.

Wed Jun 16 16:04:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* Makefile.config.in: fix typo in name IMAGE_URL_PREFIX.

* htdig/Retriever.cc(IsValidURL): change handling of valids to only
reject if list is not empty, give different error message.

Wed Jun 16 14:40:56 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/htsearch.cc(main): pass StringList args to setEscaped()
instead of unprocessed input[] char *'s.

* htsearch/Display.cc(buildMatchList): cast score to (int) in maxScore
calculation, to avoid compiler warnings.

* htdig/htdig.cc(main): change comparison on minimalFile to avoid
compiler warnings.

Wed Jun 16 11:30:23 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/HtRegex.cc(setEscaped): Fix appending of substring to avoid
compiler warnings.

* htlib/HtDateTime.cc(SettoNow): Strip out all the nonsense that
doesn't work, set Ht_t directly instead.

Wed Jun 16 09:58:12 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* configure.in, configure, Makefile.config.in: Correct handling of
SEARCH_FORM variable, as Gabriele recommended.

Wed Jun 16 09:32:06 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/cgi.h, htlib/cgi.cc(cgi & init), htsearch/htsearch.cc
(main & usage): allow a query string to be passed as an argument.

Wed Jun 16 08:43:09 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/Makefile.in, htdig/Makefile.in, htfuzzy/Makefile.in,
htmerge/Makefile.in, htnotify/Makefile.in: Use standard $(bindir)
variable instead of $(BIN_DIR). Allows for standard configure flags
to set this. (Completes Geoff's change on May 15.)

Tue Jun 15 14:31:50 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/PDF.cc(parseNonTextLine): move line that clears _parsedString,
so the title is cleared even if rejected.

* htsearch/Display.cc(buildMatchList & sort): move maxScore calculation
from sort to buildMatchList, so it's done even if there's only 1 match.

Mon Jun 14 15:01:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc(RetrieveHTTP): Show "Unknown host" message if
Connection::assign_server() fails (due to gethostbyname() failure).

Mon Jun 14 13:52:34 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htcommon/defaults.cc, htsearch/Display.h, htsearch/Display.cc,
htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htdoc/hts_templates.html: add template_patterns attribute, to select
result templates based on URL patterns.

Sun Jun 13 16:29:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc (IsValidURL): Add valid_extension list, as
requested numerous times.

* htcommon/defaults.cc: Add config attribute valid_extensions,
with default as empty.

Sat Jun 12 23:10:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/DocumentRef.h: Fix thinkos introduced in change earlier
today. Actually compiles correctly now.

Sat Jun 12 22:37:22 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HtHTTP.cc (ParseHeader): Fix parsing to take empty headers
into account. Fixes PR#557.

* htsearch/Display.h, htsearch/Display.cc (excerpt): Fix
declaration to pass first by reference--ensures ANCHOR is
properly set. Fixes PR#541 as suggested by <pmb1 at york.ac.uk>.

* htfuzzy/Endings.cc (getWords): Fixed PR#560 as suggested by
Steve Arlow <yorick at ClarkHill.com>. Solves problems with fuzzy
matching on words ending in -ness: witness, highness, likeness... Tries
to interpret words as root words before attempting stemming.

* installdir/search.html (Match): Add Boolean to default search
form, as suggested by PR#561.

* htlib/URL.cc (URL): Fix PR#566 by setting the correct length of
the string being matched. 'http://' is 7 characters...

Sat Jun 12 19:06:36 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtZlibCodec.h, htlib/HtZlibCodec.cc: New files. Provide
general access to zlib compression routines when available.

* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Remove
compression access and restore DocHead access through default
methods. Compression of excerpts will occur through the
HtZlibCodec classes and through the DocumentDB excerpt access.

Sat Jun 12 15:25:08 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htmerge/docs.cc (convertDocs): Load excerpt from external
database before considering it empty.

Sat Jun 12 14:41:54 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.cc (displayMatch): Added patch from Torsten
Neuer <tneuer at inwise.de> to fix PR#554.

* htdig/HTML.cc (do_tag): Add parsing for <embed> and <object>,
including suggestions from Gilles as to condensing cases with
<img> parsing.

Sat Jun 12 14:00:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/ExternalParser.cc (parse): Quote the filename before
passing it to the command-line to prevent shell escapes. Fixes PR#542.

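A standard way to make a file name safe for the shell is POSIX single-quoting. Inside '...' no character is special except the single quote itself, which is written as '\''. The log does not say exactly how ExternalParser quotes the name. The following is a minimal sketch of the single-quote technique, and `shellQuote` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Wrap an argument in single quotes for a POSIX shell. Every embedded '
// becomes '\'' (close the quote, add an escaped quote, reopen the quote),
// so characters like ; $ ` and spaces lose their special meaning.
std::string shellQuote(const std::string &arg)
{
    std::string out = "'";
    for (char c : arg)
    {
        if (c == '\'')
            out += "'\\''";
        else
            out += c;
    }
    out += '\'';
    return out;
}
```

A command string built as parser + " " + shellQuote(filename) can then be handed to popen() or system() without the file name being able to inject extra commands.
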
Fri Jun 11 15:59:10 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/URL.cc(removeIndex): use CompareWord instead of FindFirstWord,
to avoid substring matches.

Wed Jun 2 15:51:00 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/URLTrans.cc(encodeURL): Fix to ensure that non-ASCII letters
get URL-encoded.

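Percent-encoding should treat each char as an unsigned byte and test it against an explicit ASCII set. Classifying bytes with locale-dependent functions such as isalnum() can let a byte like 0xE9 (é in Latin-1) through unencoded. The sketch below assumes the RFC 3986 unreserved characters are the ones left as-is; htdig's encodeURL may keep a different set. `percentEncode` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Percent-encode every byte outside the RFC 3986 unreserved set
// (A-Z a-z 0-9 - _ . ~). Bytes >= 0x80 are always encoded.
std::string percentEncode(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in)
    {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved)
        {
            out += static_cast<char>(c);
        }
        else
        {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```
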
Mon May 31 22:40:29 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/DocumentDB.cc(ReadExcerpt): Fix silly typos with methods,
thinko with docID.
(Add): Add the excerpt *before* the URL index is written.

* htdig/Retriever.cc(isValidURL): Remove code restricting URLs to
relative and http://.

* htdig/htdig.cc(main): Unlink the doc_excerpt file when doing an
initial dig.
(main): Fix silly typo with minimumFile.

* htmerge/db.cc(mergeDB): Call DocumentDB::Open() with doc_excerpt for
consistency--doesn't actually do anything with it.

* htmerge/docs.cc(convertDocs): Ditto. Also don't delete a
document simply because it has an empty DocHead. Excerpts are now
stored in a separate database!

* htmerge/htmerge.h: Call mergeDB and convertDocs with
doc_excerpt parameter.

* htmerge/htmerge.cc(main): Ditto.

* htsearch/Display.h: Call ctor with all three doc db filenames.

* htsearch/Display.cc(Display): Call DocumentDB::Open with above.
(excerpt): Retrieve the excerpt from the excerpt database.

* htsearch/htsearch.cc: Call Display::Display with all three doc
db filenames.

Mon May 31 15:08:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/DocumentDB.h: Add new method ReadExcerpt to read the
excerpt from the separate (new) excerpt database. Change Open()
and Read() methods to account for this new database.

* htcommon/DocumentDB.cc (Open): Open the excerpt database too.
(Read): Ditto.
(Close): Close it if it exists.
(ReadExcerpt): Explicitly read the DocHead of this DocumentRef.
(Add): Make sure DocHeads go into the excerpt database.
(Delete): Make sure we delete the associated excerpt too.
(CreateSearchDB): Make sure we grab the excerpt from the database.

* htcommon/DocumentRef.cc(Serialize): Don't serialize the DocHead
field; this is done in the DocumentDB code.

* htcommon/defaults.cc(modification_time_is_now): Set to true to
avoid problems with not setting dates when no Last-Modified:
header appears.
(doc_excerpt): Add new attribute for the filename of the excerpt
database.

* htdig/HtHTTP.h: Remove incorrect virtual declarations from
Request and EstablishConnection methods. Give ResetStatistics
a void return type since it doesn't return a value.

* htdig/htdig.cc (main): Add new "minimal" flag '-m' to only index
the URLs in the supplied file. Sets hopcount to ignore links.

Sun May 30 19:36:15 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at>

* htlib/URL.cc (normalizePath): Fix bug that caused endless loops
and core dumps when normalizing URLs with more than one of
( "/../" | "/./" | "//" | "%7E" )

* htlib/HtDateTime.cc (Httimegm): Call Httimegm in timegm.c unless
HAVE_TIMEGM.

Wed May 26 23:15:46 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htmerge/db.cc (mergeDB): Add patch contributed by Roman Dimov
<roman at twist.mark-itt.ru> to fix problems with confusing docIDs,
which resulted in documents in the main db being removed when the
corresponding DocID was supposed to be removed from the merged db.

Wed May 26 11:30:22 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.h, htsearch/Display.cc, htsearch/htsearch.cc:
Switch restrict and excludes to use HtRegex instead of StringMatch.

* htdig/htdig.cc (main): Fix typo clobbering setting of
excludes. Obviously fixes problems with badquerystr and excludes!

* htdig/HtHTTP.cc (ParseHeader): Change parsing to skip extra
whitespace, as in 5/19 Document.cc(readHeader) change.

Wed May 19 22:17:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/HtHTTP.cc, htdig/HtHTTP.h: Add new files, contributed by
Gabriele. A start at an HTTP/1.1 implementation.

* htdig/Document.cc (readHeader): Fix change of 5/16 to actually
work! :-)

* htsearch/Display.cc (expandVariables): Change end-of-expansion
test to include states 2 and 5 to ensure templates ending in } are
still properly expanded, as suggested by Gilles.

Mon May 17 14:31:31 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtRegex.cc (setEscaped): Use full list of characters to
escape as suggested by Gilles.

Sun May 16 17:27:51 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Document.cc (readHeader): Since multiple whitespace
characters are allowed after headers, don't use strtok.
(readHeader): We no longer pretend to parse Word, PostScript, or
PDF files internally.
(getParsable): Don't generate PostScript or PDF objects since we
no longer recommend using them.

Sun May 16 17:07:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtRegex.cc (setEscaped): Ensure escaping does not loop
beyond the end of a string.

* htdig/Retriever.cc (IsValidURL): Fix badquerystr parsing to use
HtRegex as expected. (Oops!)

* htdig/HTML.cc (parse): Use HtSGMLCodec during parsing, rather
than encoding the whole document at the beginning. More consistent
with previous use of SGMLEntities.

Sat May 15 12:57:40 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/URL.cc (normalizePath): Remove extra (useless) variable
declarations.

* htlib/htString.h, htlib/String.cc: Add new method Nth to solve
problems with (String *)->[].

* htlib/HtRegex.h, htlib/HtRegex.cc: Added new method
setEscaped(StringList) to produce a '|'-separated pattern of
possibly escaped strings. Strings enclosed in [] are not escaped;
the brackets are removed and the contents are used as a regex.

* htdig/htdig.h: Use HtRegex instead of StringMatch for limiting
by default.

* htdig/Retriever.cc: As above.

* htdig/htdig.cc(main): As above. Use setEscaped to set limits
correctly (i.e. in a backwards-compatible way).

Sat May 15 11:24:26 1999 Geoff Hutchison <ghutchis at wso.williams.edu>
|
||
|
|
||
|
* htfuzzy/Speling.h, htfuzzy/Speling.cc: New files for simple
spelling correction. Currently limited to transposition and added
character errors. Missing character errors to be added soon.

* htfuzzy/Makefile.in: Compile it.

* htfuzzy/Fuzzy.cc (getFuzzyByName): Use it.

* htcommon/defaults.cc: Add new option minimum_speling_length for
the shortest query word to receive speling fuzzy
modifications. Should prevent problems with valid words generating
unrelated "corrections" of words. Default is 5 chars.

Sat May 15 11:18:27 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htfuzzy/Fuzzy.cc (getWords): Ensure word is not an empty or null
string.

* htfuzzy/Metaphone.cc (generateKey): Ditto. Should solve PR#514.

* htdig/Document.cc (Reset): Do not use modification_time_is_now
attribute. Simply reset modtime to 0, time is set elsewhere.

* Makefile.config.in: Add options from separate CONFIG files.

* configure.in, configure: Add configure-level switches for
--with-image-url-prefix= and --with-search-form=. Do not generate
CONFIG file (hopefully to be phased out soon).

* */Makefile.in: Make linking CONFIG-dependent files depend on
Makefile.config, not CONFIG.

* Makefile.in: Use standard $(bindir) variable instead of
$(BIN_DIR). Allows for standard configure flags to set this.

Tue May 11 11:15:08 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtDateTime.h, htlib/HtDateTime.cc: Updates from Gabriele,
fixing SetToNow() and adding GetDiff to return the difference in
time_t between two objects.

* htdig/Retriever.cc (Need2Get): Add patch from Warren Jones
<wjones at tc.fluke.com> to keep track of inodes on local files to
eliminate duplicates. Hopefully this will serve for a first-try at
a signature method for HTTP as well.

Tue May 4 20:20:40 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htfuzzy/Regex.h, htfuzzy/Regex.cc: Add new regex fuzzy
algorithm, based on Substring and Prefix.

* htfuzzy/Fuzzy.cc (getFuzzyByName): Add it.

* htfuzzy/Makefile.in: Compile it.

* htcommon/defaults.cc: Add new attribute regex_max_words, same
concept as substring_max_words.

* htfuzzy/Exact.cc, htfuzzy/Substring.cc, htfuzzy/Prefix.cc:
Define names attribute for debugging purposes.

* installdir/htdig.conf: Fix the comments for search_algorithm to
refer to all the current possibilities.

* htlib/HtRegex.cc (match): Slight cleanup of how to return.

Tue May 4 15:28:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc (reportError): Add e-mail of maintainer to
error message. Should help direct people to the correct place.

* htdig/Retriever.cc (IsValidURL): Lowercase all extensions from
bad_extensions as well as all extensions used in
comparisons. Ensures we're using case-insensitive matching.

Mon May 3 23:20:22 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/Retriever.cc (IsValidURL): Fix typo with #else statement
for REGEX.

* htdig/htdig.cc: Add conditionals for REGEX to use HtRegex
instead of StringMatch methods when defined.

* htlib/HtDateTime.h: Update to remove definitions of true and
false, established by May 2 change in
include/htconfig.h.in as contributed by Gabriele.

* htlib/HtDateTime.cc: Replace call to internal mktime function with
Httimegm in timegm.c, contributed by Leo.

* htlib/timegm.c: Declare my_mktime_gmtime_r to prevent compiler
errors with incompatible gmtime structures, contributed by Leo.

* configure.in: Rearrange date/time checks for clarity.

* configure: Regenerate using autoconf.

* include/htconfig.in: Add HAVE_STRFTIME flag.

Sun May 2 18:49:04 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at>

* configure.in, include/htconfig.h.in: Added a configure test for
the availability of the bool type.

Fri Apr 30 20:00:09 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtDateTime.h, htlib/HtDateTime.cc: Update with new
versions sent by Gabriele.

Fri Apr 30 19:30:42 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtRegex.h, htlib/HtRegex.cc: New class, contributed by
Peter D. Gray <pdg at draci.its.uow.edu.au> as a small wrapper for
system regex calls.

* htlib/Makefile.in: Build it.

* htdig/htdig.h: Use it if REGEX is defined.

* htdig/htdig.cc: Ditto.

* htdig/Retriever.cc: Ditto.

* htsearch/Display.cc(generateStars): Remove extra newline after
STARSRIGHT and STARSLEFT variables, noted by Torsten Neuer
<tneuer at inwise.de>.

Fri Apr 30 18:52:56 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at>

* htlib/URL.cc(ServerAlias): The port for server_aliases entries now
defaults to 80 if omitted.

Wed Apr 28 19:57:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtDateTime.h, htlib/HtDateTime.cc: New class, contributed
by Gabriele.

* htlib/Makefile.in: Compile it.

* README: Update message from 3.1.0 (oops!) to 3.2.0, remove rx
directory.

* installdir/htdig.conf: Add example of no_excerpt_show_top
attribute in line with most users' expectations.

* contrib/README: Mention contributed section of the website.

* Makefile.in: Ignore mailarchive directory--now removed from CVS.

Wed Apr 28 10:46:31 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htmerge/db.cc(mergeDB): Fix a few errors in how the merge index
name is obtained.

Tue Apr 27 23:00:39 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* Makefile.config.in: Remove now-useless LIBDIRS variable.

* mailarchive/Split.java, mailarchive/htdig: Remove ancient
mailarchive stuff.

Tue Apr 27 18:01:52 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(setupImages): Remove code setting URLimage to
a bogus pattern (remnant left over after merge).

Tue Apr 27 16:43:08 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc(RetrieveHTTP): Show "Unable to build connection"
message at lower debug level.

Tue Apr 27 11:24:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.h: Remove sort, compare functions re-introduced
in merge. Moved to ResultMatch by Hans-Peter's April 19th changes.

* htsearch/Display.cc: Remove bogus call to ResultMatch::setRef,
removed by Hans-Peter's April 19th changes.

Sat Apr 24 21:08:35 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* Merge in changes from 3.1.2 (see below).

* htcommon/WordList.cc: Change valid_word to use iscntrl().

* htdig/Plaintext.cc: Remove CVS Log.

* htdig/Retriever.cc: Fix ancient bug with empty excludes list.

* htlib/List.cc: Remove CVS Log, use more succinct test for
out-of-bounds.

* htsearch/Display.cc: Fix logic with starPatterns, only show top
of META description.

* htsearch/Display.h: Introduce headers needed for sort functionality.

* installdir/htdig.conf: Add example max_doc_size attribute as
well as example for including start_url from a file.

* htdoc/ChangeLog, htdoc/RELEASE.html, htdoc/FAQ.html,
htdoc/where.html, htdoc/cf_byname.html, htdoc/cf_byprog.html,
htdoc/uses.html, htdoc/contents.html, htdoc/mailarchive.html:
Merge in documentation updates from 3.1.2.

Sat Apr 24 15:18:45 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htsearch/Display.cc (sort): Return immediately if <= 1 items to
sort.

Mon Apr 19 00:53:06 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htsearch/ResultMatch.h (create): New. All (the only) ctor
caller changed to use this.
(setRef, getRef): Removed. Callers changed to use nearby data.
(incomplete): Removed.
(setIncompleteScore): Renamed to...
(setScore): ...this. All callers changed.
(setSortType): New.
(getTitle, getTime, setTitle, setTime, getSortFun): New virtual
functions.
(enum SortType): Moved from Display, private.
(mySortType): New static member.

* htsearch/ResultMatch.cc (mySortType): Define static member
variable.
(getScore): Remove handling of "incomplete". Moved to ResultMatch.h.
(getTitle, getTime, setTitle, setTime): New dummy functions.
(class ScoreMatch, class TimeMatch, class IDMatch, class
TitleMatch): Derived classes with compare functions (from Display)
and extra sort-method-related members, as needed.
(setSortType): New, mostly moved from Display.
(create): New.

* htsearch/Display.h (displayMatch): Changed first argument from
ResultMatch * to DocumentRef *.
(compare, compareTime, compareID, compareTitle, enum SortType,
sortType): Removed.

* htsearch/Display.cc (display): Call ResultMatch::setSortType and
output syntax error page for invalid sort methods.
(displayMatch): Change first argument from ResultMatch * to
DocumentRef *ref. All callers changed.
(buildMatchList): Remove call to sortType and typ variable.
Always call (ResultMatch::)setTime and setTitle. Remove extra
call to setID.
(sort): Call (ResultMatch::)getSortFun for qsort compare function.
(compare, compareTime, compareID, compareTitle, sortType): Removed.

Wed Apr 14 21:21:35 1999 Alexander Bergolth <leo at leo.wu-wien.ac.at>

* htlib/regex.c: Fixed compile problem with AIX xlc compiler.

* htlib/HtHeap.h: Fixed compile problem with AIX xlc compiler (bool).

* htlib/HtVector.h: Ditto.

* htsearch/Display.cc: Fixed typo.

Wed Apr 14 00:17:06 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.h: Add compareID for sorting results by DocID.

* htsearch/Display.cc: As above.

Tue Apr 13 23:50:28 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/defaults.cc: Add new config option use_doc_date to use
document meta information for the DocTime() field.

* htdig/HTML.cc(do_tag): Call Retriever::got_time if use_doc_date
is set and we run across a META date tag.

* htdig/Retriever.h, htdig/Retriever.cc: Add new got_date
function. When called, sets the DocTime field of the DocumentRef
after parsing is completed. Currently assumes ISO 8601 format for
the date tag.

Sun Apr 11 12:51:39 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htsearch/Display.cc (buildMatchList): Delete thisRef if excluded
by URL. Call setRef(NULL), not setRef(thisRef).

Wed Apr 7 19:35:42 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc(usage): Remove bogus -w flag.

Thu Apr 1 12:05:11 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/htsearch.cc(main): Apply Gabriele's patch to avoid using an
invalid matchesperpage CGI input variable.

* htsearch/Display.cc(display) & (setVariables): Correct any invalid
values for matches_per_page attribute to avoid div. by 0 error.

Wed Mar 31 15:19:25 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htfuzzy/Synonym.cc: Fix previous fix of minor memory leak.
(db pointer wasn't properly set)

Mon Mar 29 10:31:09 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/Display.cc(excerpt): Added patch from Gabriele to
improve display of excerpts--show top of description always,
otherwise try to find the excerpt.

Sun Mar 28 19:45:02 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htlib/HtWordType.h (HtIsWordChar): Avoid matching 0 when using
strchr.
(HtIsStrictWordChar): Ditto.

* htdig/ExternalParser.cc (parse): Before got_href call, set
hopcount of URL to that of base plus 1.
Add URL to external parser error output.

* htlib/URL.cc (URL(char *ref, URL &parent)): Move call to
constructURL inside previous else-clause.
(parse): Reset _normal, _signature, _user initially.
Commence parsing, even if no "//" is found. Do not set _normal
here.
(normalizePath): Call removeIndex finally.

* htcommon/WordRecord.h (WORD_RECORD_COMPRESSED_FORMAT)
[!NO_WORD_COUNT]: Change to "cu4".

* htlib/HtPack.cc (htPack): Correct handling at end of code-string
and end of encoding-byte. Add code 'c' for often-1 unsigned ints.
(htUnpack): Add handling of code 'c'.

Thu Mar 25 12:18:05 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* installdir/long.html, installdir/short.html: Remove backslashes
before quotes in HTML versions of the builtin templates.

* Makefile.in: Add long.html & short.html to COMMONHTML list, so
they get installed in common_dir.

Thu Mar 25 11:56:50 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(displayMatch), htcommon/defaults.cc,
htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Add date_format attribute suggested by Marc Pohl.

Thu Mar 25 09:46:07 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(displayMatch): Avoid segfault when DocAnchors
list has too few entries for current anchor number.

Tue Mar 23 15:08:40 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(displayMatch): Fix problem when documents
did not have descriptions.

Tue Mar 23 14:17:14 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/PDF.cc(parseString): Use minimum_word_length instead of
hardcoded constant.

Tue Mar 23 14:02:40 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc: Fix bug where noindex_start was empty, allow
case-insensitive matching of noindex_start & noindex_end.

* htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html:
Fix inconsistencies in documentation for noindex_start & noindex_end.

Tue Mar 23 14:01:16 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc: Add check for <a href=...> tag that is missing a
closing </a> tag, terminating it at next href.

Tue Mar 23 13:57:35 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Document.cc: Fix check of Content-type header in readHeader(),
correcting bug introduced Jan 10 (for PR#91), and check against
allowed external parsers.

Tue Mar 23 13:54:35 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc: More lenient comment parsing, allows extra dashes.

Tue Mar 23 12:22:53 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htlib/Configuration.cc(Add): Fix function to avoid infinite loop
on some systems, which don't allow all the letters in isalnum() that
isalpha() does, e.g. accented ones.

* htdig/HTML.cc: Fix three reported bugs about inconsistent
handling of space and punctuation in title, href description & head.
Now makes a distinction between tags that cause word breaks and those
that don't, and which of the latter add space.

Tue Mar 23 12:15:48 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/Plaintext.cc(parse): Use minimum_word_length instead of
hardcoded constant.

Tue Mar 23 12:11:04 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htmerge/words.cc(mergeWords): Fix to prevent description text
words from clobbering anchor number of merged anchor text words.

Tue Mar 23 12:02:00 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/Display.cc(generateStars): Add in support for use_star_image
which was lost when template support was put in way back when.

Tue Mar 23 11:47:52 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* Makefile.in: Add missing ';' in for loops, between fi & done.

Mon Mar 22 16:06:15 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htdig/HTML.cc: Check for presence of more than one <title> tag.

Mon Mar 22 15:32:15 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/parse_doc.pl: Fix handling of minimum word length.

Sun Mar 21 15:19:00 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htlib/HtPack.cc (htPack): New.
* htlib/HtPack.h: New.
* htsearch/parser.cc (perform_push): Unpack WordRecords using
htUnpack.
* htsearch/htsearch.h: Add "debug" declaration.
* htmerge/words.cc (mergeWords): Pack WordRecords using htPack.
* htlib/Makefile.in (OBJS): Add HtPack.o
* htcommon/WordRecord.h: Add WORD_RECORD_COMPRESSED_FORMAT

* htdig/HTML.cc (parse): Keep contents in String variable
textified_contents while using its "char *".

* htsearch/Display.cc (excerpt): Similar for head_string.

Thu Mar 18 20:01:24 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* installdir/long.html, installdir/short.html: Write out HTML
versions of the builtin templates.

* installdir/htdig.conf: Add commented-out template_map and
template_name attributes to use the on-disk versions.

Tue Mar 16 03:06:06 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htcommon/DocumentDB.cc (Delete): Fix bad parameter to Get: use
key, not DocID.

Tue Mar 16 01:50:16 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htlib/HtWordType.h (class HtWordType): New.
* htlib/HtWordType.cc: New.
* htlib/Makefile.in (OBJS): Add HtWordType.o

* htdoc/attrs.html: Document attribute extra_word_characters.
* htdoc/cf_byprog.html: Ditto.
* htdoc/cf_byname.html: Ditto.

* htcommon/defaults.cc (defaults): Add extra_word_characters.

* htsearch/htsearch.h: Lose spurious extern declaration of unused
variable valid_punctuation.
* htsearch/htsearch.cc (main): Call HtWordType::Initialize.
(setupWords): Use HtIsWordChar, HtIsStrictWordChar and
HtStripPunctuation. Do not read valid_punctuation.

* htsearch/Display.cc (excerpt): Use HtIsStrictWordChar.

* htlib/StringMatch.cc (FindFirstWord): Ditto.
(CompareWord): Ditto.

* htdig/htdig.cc (main): Call HtWordType::Initialize.

* htdig/Retriever.h (class Retriever): Lose member
valid_punctuation.
* htdig/Retriever.cc (Retriever): Lose its initialization.

* htdig/Postscript.h (class Postscript): Lose member
valid_punctuation.
* htdig/Postscript.cc (Postscript): Lose its initialization.
(flush_word): Use HtStripPunctuation.
(parse_string): Use HtIsWordChar,
HtIsStrictWordChar and HtStripPunctuation.

* htdig/Parsable.h (class Parsable): Lose member
valid_punctuation.
* htdig/Parsable.cc (Parsable): Lose its initialization.

* htcommon/WordList.cc (valid_word): Use HtIsStrictWordChar.
(BadWordFile): Use HtStripPunctuation. Do not read
valid_punctuation.

* htcommon/DocumentRef.cc (AddDescription): Use HtIsWordChar,
HtIsStrictWordChar and HtStripPunctuation. Do not read
valid_punctuation.

* htdig/PDF.cc (parseString): Similar.

* htdig/HTML.cc (parse): Similar.

* htdig/Plaintext.cc (parse): Similar.

Sun Mar 14 14:04:31 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Makefile.in: Add HtSGMLEntites.o to OBJS.

Sat Mar 13 21:29:38 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htcommon/DocumentDB.cc(Open, Read): Switch to DB_HASH for faster
access. Most important for very quick URL lookups!

* htcommon/DocumentRef.cc(AddDescription): Check that the
description is neither a null string nor only whitespace before
doing anything.

* htlib/HtSGMLCodec.h, htlib/HtSGMLCodec.cc: Add new class to
convert between SGML entities and high-bit characters.

* htdig/HTML.cc(parse): Use it instead of SGMLEntities.

* htsearch/Display.cc(excerpt): Use HtSGMLCodec to convert *back*
to SGML entities before displaying.

* htlib/HtHeap.cc: Cleaned up comments, use more efficient
procedure to build from a vector.

* htlib/HtWordCodec.cc(HtWordCodec): Fix bug with constructing from
uninitialized variables!

* htlib/URL.h, htlib/URL.cc: Initial support for multiple schemes and
user@host URLs.

* htlib/List.cc(Nth): Check for out-of-bounds requests before
doing anything.

Fri Mar 12 00:31:03 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htlib/mktime.c (__mon_yday): Correct size to number of
initializers (2).

* htsearch/htsearch.cc (main): Remove doc_index handling.

* htsearch/ResultMatch.h (setURL): Change to setID, use int.
All callers changed.
(getURL): Change to getID.
All callers changed.
(String url): Change to "int id".

* htsearch/Display.h (Display): Second parameter removed.
(docIndex): Removed.

* htsearch/Display.cc (Display, ~Display): Do not handle
docIndex.
(display): Use DocumentDB::operator [](int), not
DocumentDB::operator [] (char *).
(buildMatchList): Changed to handle ResultMatch as DocID int,
instead of URL string: use DocumentDB::operator [](int), not
DocumentDB::operator [] (char *). Get DocumentRef directly, then
filter the URL by includeURL().

* htnotify/htnotify.cc (main): Use DocIDs(), not DocURLs().
Handle the change from String * to IntObject *.

* htmerge/htmerge.cc (main): Do not delete doc_index.

* htmerge/docs.cc (convertDocs): Test doc_index access as
read-only. Pass as parameter for docdb, do not handle separately.

* htmerge/docs.cc (convertDocs): Add debug messages about cause
when deleting documents. If verbose > 1, write id/URL for every URL.

* htmerge/db.cc (mergeDB): Handle doc_index, test accessibility.

* htlib/IntObject.h (class IntObject): Add int-constructor.

* htdoc/attrs.html (doc_index): Say that mapping is from document
URLs to numbers.
(doc_db): Say that indexing is on document number.

* htdoc/cf_byprog.html (doc_index): Move from htsearch to htdig
entry.

* htdig/htdig.cc (main): Add .work suffix to doc_index too.
Unlink doc_index if initial.

* htcommon/DocumentDB.h (Open): New second argument.
(Read): New second argument, default to 0.
(operator [](int)): New.
(Exists(char *), Delete(char *)): Change to int parameter.
(DocIDs, i_dbf): New.

* htcommon/DocumentDB.cc (operator [](int)): New.
(Exists(char *), Delete(char *)): Changed to DocID int parameter.
All callers changed.
(URLs): Assume keys are ok without probing for documents
with each key.
(DocIDs): New.
(Open): Take an index database file name as second argument.
All callers changed.
(Read): Similar, accept 0.
(all): Change to index on DocID.

Wed Mar 10 02:25:24 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htdoc/attrs.html (template_name): Typo; used by htsearch, not
htdig.

Mon Mar 8 13:30:44 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htdig/Retriever.cc (got_href): Check if the ref is for the
current document before adding it to the db.

Mon Mar 8 01:36:38 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htlib/DB2_db.cc: Remove errno.
* htlib/DB2_hash.cc: Ditto.

Sun Mar 7 20:50:37 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htfuzzy/EndingsDB.cc(createDB): Use link and unlink to move,
rather than a non-portable system call.

* htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Fix #ifdef
problems with zlib.

Sun Mar 7 09:39:37 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/timegm.c: Fix problems compiling on libc5 systems noted by
Hans-Peter.

* htlib/Makefile.in, Makefile.in, Makefile.config.in: Use regex.c
instead of rx.

* htfuzzy/EndingsDB.cc: Ditto.

* configure.in, configure: Don't bother to config rx directory.

Fri Mar 5 08:09:20 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* contrib/parse_doc.pl: Use pdftotext to handle PDF files,
generate a head record with punctuation intact, add extra checks
for file "wrappers" & check for MS Word signature (no longer
defaults to catdoc), strip extra punct. from start & end of words,
rehyphenate text from PDFs.

Tue Mar 2 23:18:20 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htdig/htdig.cc: Renamed from main.cc for consistency with other
programs.

* htlib/DB2_hash.h, htlib/DB2_hash.cc: Added interface to Berkeley
hash database format.

* htlib/Makefile.in: Use them!

* htlib/Database.h: Define database types, allowing a choice
between different formats.

* htlib/Database.cc(getDatabaseInstance): Use passed type to pick
between subclasses. Currently only uses Hash and B-Tree formats of
Berkeley DB.

* htcommon/DocumentDB.cc, htfuzzy/Endings.cc,
htfuzzy/EndingsDB.cc, htfuzzy/Fuzzy.cc, htfuzzy/Prefix.cc,
htfuzzy/Substring.cc, htfuzzy/Synonym.cc, htfuzzy/htfuzzy.cc,
htmerge/docs.cc, htmerge/words.cc, htsearch/Display.cc,
htsearch/htsearch.cc: Use new form of getDatabaseInstance(),
currently with DB_BTREE option (for compatibility).

Mon Mar 1 22:53:37 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/regex.c, htlib/strptime.c: Import new versions from
glibc.

* htlib/Makefile.in, htlib/mktime.c, htlib/timegm.c, htlib/lib.h:
Changes to use glibc timegm() function instead of buggy mytimegm().

* htdig/Document.cc(getdate): Use it.

Tue Mar 2 02:35:50 1999 Hans-Peter Nilsson <hp at bitrange.com>

* attrs.html: Rephrase and clarify entry for url_part_aliases.

Sun Feb 28 23:25:40 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htlib/HtURLCodec.cc (~HtURLCodec): Add missing deletion of
myWordCodec.

Fri Feb 26 19:03:58 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* configure, configure.in: Fix typo on timegm test.

* htlib/mytimegm.cc: Fix Y2K problems.

Wed Feb 24 21:09:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc(main): Remember to delete the parser!

* htlib/String.cc(String(char *s, int len)): Remove redundant copy.

* htsearch/Display.cc(display): Free DocumentRef memory after
displaying them.
(displayMatch): Fix memory leak when documents did not have anchors.

Wed Feb 24 15:18:26 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Configuration.cc(Add): Fix small leak in locale code.

* htlib/String.cc: Fix up code to be cleaner with memory
allocation, inline next_power_of_2.

Mon Feb 22 22:13:49 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/String.cc, htlib/htString.h: Fix some memory leaks.

Mon Feb 22 08:52:19 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/Dictionary.h, htlib/Dictionary.cc(hashCode): Check if key
can be converted to an integer using strtol. If so, use the
integer as the hash code.

* htlib/HtVector.h, htlib/HtVector.cc: Implement Release() method
and make sure delete calls are done properly.

* htsearch/ResultList.h, htsearch/ResultList.cc(elements): Use HtVector
instead of List.

* htsearch/parser.cc: Ditto.

Sun Feb 21 16:13:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtHeap.h, htlib/HtHeap.cc: Add new class.

* htlib/Makefile.in: Compile it.

* htlib/HtVector.h, htlib/HtVector.cc: Add Assign() to assign to
elements of vectors.

Sun Feb 21 14:45:26 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htsearch/htsearch.cc: Add patch from Jerome Alet <alet at unice.fr>
to allow '.' in config field but NOT './' for security reasons.

* htdig/HTML.cc: Add patch from Gabriele to ensure META
descriptions are parsed, even if 'description' is added to the
keyword list.

Sun Feb 21 14:43:44 1999 Gilles Detillieux <grdetil at scrc.umanitoba.ca>

* htsearch/parser.h, htsearch/parser.cc: Clean up the Feb 16 patch
for error messages.

Thu Feb 18 20:19:30 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* htlib/HtVector.h, htlib/HtVector.cc: Added new Vector class.

* htlib/Makefile.in: Compile it.

* htlib/strptime.c: Add new version from glibc-2.1, replacing
strptime.cc.

* htdig/Document.cc: Use it.

* htlib/regex.h, htlib/regex.c: Add new files from glibc-2.1.

* htlib/mktime.c: Update from glibc-2.1.

Wed Feb 17 23:44:59 1999 Geoff Hutchison <ghutchis at wso.williams.edu>

* configure.in, configure, aclocal.m4: Add autoconf macro to
detect syntax of makefile includes.

* Makefile.in, Makefile.config.in, */Makefile.in: Change include
syntax to use it.

Wed Feb 17 12:36:42 1999 Hans-Peter Nilsson <hp at bitrange.com>

* htcommon/defaults.cc (defaults): locale: change to "C".

Local Variables:
add-log-time-format: current-time-string
End: