Slávek Banko 8c787c3591
DEB htdig: Added to repository.
3 years ago

README

>    Subject: htdig: HTDIG: Searching Word files
>         To: htdig@sdsu.edu
>       From: Richard Jones <rjones@imcl.com>
>       Date: Tue, 15 Jul 1997 12:44:03 +0100
>
> I'm currently trying to hack together a script to search
> Word files. I have a little program called `catdoc' (attached)
> which takes Word files and turns them into passable text files.
> What I did was write a shell script around this called
> `htparsedoc' (also attached) and add it as an external
> parser:
> 
>         --- /usr/local/lib/htdig/conf/htdig.conf ---
> 
>         # External parser for Word documents.
>         external_parsers:       "application/msword" "/usr/local/lib/htdig/bin/htparsedoc"
> 
> This script produces output like this:
> 
>         t Word document http://annexia.imcl.com/test/comm.doc
>         w INmEDIA 1 -
>         w Investment 2 -
>         w Ltd 3 -
>         w Applications 4 -
>         w Subproject 5 -
>         w Terms 6 -
>         w of 7 -
>  [...]
>         w Needed 994 -
>         w Tbd 995 -
>         w Resources 996 -
>         w Needed 997 -
>         w Tbd 998 -
>         w i 1000 -
>
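The record layout in the sample output above (one `t` line carrying a title, then one `w` line per word with a number and a trailing `-`) can be imitated with a short sketch. This is not the attached `htparsedoc` script, which is a shell wrapper around `catdoc`. The function name `parser_records`, the word-splitting rule, and the plain sequential numbering are assumptions made for illustration. The sample appears to stop at 1000, so the real script may scale its positions, which this sketch does not attempt.

```python
import re


def parser_records(title, text):
    """Yield 't' and 'w' records shaped like the sample output.

    Hypothetical helper: the real htparsedoc may split and number
    words differently.
    """
    # One title record for the whole document.
    yield f"t {title}"
    # Assumption: a "word" is a run of ASCII letters or digits.
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Assumption: words are numbered sequentially from 1, and the
    # last field is always "-" as in the sample.
    for position, word in enumerate(words, start=1):
        yield f"w {word} {position} -"


if __name__ == "__main__":
    sample_text = "INmEDIA Investment Ltd. Applications"
    title = "Word document http://annexia.imcl.com/test/comm.doc"
    for record in parser_records(title, sample_text):
        print(record)
```

Feeding it the text that `catdoc` extracts from a Word file would give output in the same general shape as the listing quoted above.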