ftp.nice.ch/pub/next/tools/services/HtmlIndex.0.13.README

This is the README for HtmlIndex.0.13.N.bs.tar.gz

HtmlIndex (HtmlFilter) [V0.1 by Juergen Sell, js@euler.han.de]
Based on NewsIndex by Izumi Ohzawa, izumi@pinoko.berkeley.edu.
His work, my errors.


HTML filtering and description services for DigitalLibrarian (DL)
indexing of HTML articles.  The following two services are implemented.

[1] HtmlDescribe Service:
Currently describes HTML articles based on their TITLE and H1 tags.
With this service, when you search in DigitalLibrarian,
titles are listed in the format:

title - header1
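
As a rough illustration of the "title - header1" format, here is a
minimal Python sketch. It is not the code of the actual service; the
function name describe() and the handling of a missing TITLE or H1
(the missing part is simply left out) are assumptions made for this
example.

```python
import re


def describe(html):
    """Return a 'title - header1' description for an HTML article.

    Hypothetical helper: if TITLE or H1 is missing, only the part
    that was found is returned (an assumption, not documented behavior).
    """
    def first(tag):
        # Find the first <tag>...</tag> pair, ignoring case and newlines.
        pattern = r"<%s\b[^>]*>(.*?)</%s\s*>" % (tag, tag)
        match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
        if not match:
            return ""
        text = re.sub(r"<[^>]+>", "", match.group(1))  # drop nested markup
        return " ".join(text.split())                   # collapse whitespace

    parts = [p for p in (first("title"), first("h1")) if p]
    return " - ".join(parts)
```

For an article with <TITLE>My Page</TITLE> and <H1>Intro</H1>, this
sketch would list the article as "My Page - Intro".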



[2] HtmlFilter Service:
Starting with Version 0.1, another service -HtmlFilter:... has been added.
The purpose of this filter is to remove junk, such as html lines, before the
article text is handed over to the indexing scanner.  This should reduce
the size of .index.store somewhat (up to 20% compared with Version 0.91).
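
The filtering idea can be sketched in Python as follows. This assumes
that "junk" means markup tags and blank lines; the real filter may
remove more or less than that. The names _TextOnly and filter_html()
are hypothetical.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def filter_html(html):
    """Return the article text with markup and empty lines removed."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse runs of whitespace and drop lines that end up empty.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

Only the remaining text would be handed to the indexer, so the index
never stores tag names or attributes.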

The code is a quick hack, but it should serve as a starting point
for writing other description filter daemons for DL.

The advantage of this Listener daemon scheme over the Unix stdio filter
(invoked via the NXUNIXSTDIO port) is that the daemon-based filter is
started only once per DL indexing session, while a stdio filter is
started for every article indexed.
Daemons can keep running indefinitely, but this one quits after a
period of inactivity.
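
The quit-when-idle behavior can be pictured with a small simulation in
Python. This is only a sketch: the function serve_until_idle() is
hypothetical, and the timeout is a parameter because the actual idle
period is not stated in this README.

```python
def serve_until_idle(arrivals, idle_timeout):
    """Simulate one daemon process that quits after being idle too long.

    arrivals: request arrival times in seconds, in ascending order.
    idle_timeout: seconds of inactivity after which the daemon quits
                  (hypothetical value, chosen by the caller).
    Returns the number of requests served by this single process.
    """
    served = 0
    last_activity = 0.0  # the daemon starts at time 0
    for t in arrivals:
        if t - last_activity >= idle_timeout:
            break  # the daemon had already quit before this request
        served += 1
        last_activity = t
    return served
```

With a stdio filter, every one of those requests would instead start a
new process, which is the overhead the daemon scheme avoids.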


No Copyright is claimed.
This program is hereby released into the public domain.

Benoît Grangé [ben@fizz.fdn.org] distributed a similar daemon
free of charge, but no source code was included in the distribution.
This version has been developed based on NewsIndex by myself, and comes with
sources.


BTW, this thing works with html articles.

	js

--- To Build FAT binary: --------------------------------------------------------
Launch ProjectBuilder.app, do Project->Open Makefile, and open the Makefile
in this directory.  Select target <Default>, and build!
If there is an error, first select target "clean", build, and then select
target <Default> and build.

--- Installation Procedure ------------------------------------------------------
[1] Copy "HtmlIndexing.service" folder into
/LocalLibrary/Services or ~/Library/Services.

[2] Copy .index.ftype, and .index.swords into
~/Library/HtmlGrazer/HtmlFolders directory.
(Enable Unix Expert mode in Preferences, if you don't see these files.)
Replace these files with new ones, even if you used HtmlIndex0.9 - 0.91
and already have these in the HtmlFolders directory.



[4] Run "make_services", or log out and back in, or do whatever else is
necessary to make WorkSpace recreate its services cache.  Try doing
Command-u in WorkSpace while ~/Library/Services/HtmlIndexing.service
(or /LocalLibrary...) is selected.

[5] Cd to ~/Library/HtmlGrazer/HtmlFolders, and do:

	rm .index.store
	ixbuild -gsv -LEnglish .
This will create the first usable index for DL.

[6] Start DL and drag ~/Library/HtmlGrazer/HtmlFolders onto the shelf. Save.

[7] From this point on, you should be able to update the index from within
DL via the inspector.

Have fun.

-Izumi

---
Izumi Ohzawa             [ 大澤五住 ]
USMail: University of California, 360 Minor Hall, Berkeley, CA 94720
Telephone: (510) 642-6440     Fax:  (510) 642-3323
Internet: izumi@pinoko.berkeley.edu (NeXTMail OK)


These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.