WAISINDEX(1)        UNIX Programmer's Manual         WAISINDEX(1)



NAME
     waisindex - Indexes files

SYNOPSIS
     waisindex [ -d index_filename ] [ -a ] [ -r ]
     [ -mem mbytes ] [ -register ] [ -export ] [ -e [ filename ] ]
     [ -l log_level ] [ -pos | -nopos ] [ -nopairs | -pairs ]
     [ -nocat ] [ -T type ] [ -t type ]
     [ -contents | -nocontents ] filename filename ...

DESCRIPTION
     waisindex creates an index of the words in files so that
     they can be searched quickly (see waissearch(1)).  The index
     takes about as much disk space as the original text.
     waisindex also creates a new source structure named
     index_filename.src if none exists.
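
     As a rough planning aid, the space estimate above can be
     turned into a small shell sketch.  It assumes that the index
     ends up about as large as the input text and, per BUGS, that
     indexing briefly needs twice that.  The helper name
     estimate_index_bytes is hypothetical and is not part of the
     WAIS distribution.

```shell
#!/bin/sh
# Sketch only: estimate disk space before running waisindex.
# estimate_index_bytes is a hypothetical helper, not a WAIS tool.
estimate_index_bytes() {
    total=0
    for f in "$@"; do
        # wc -c reads the file on stdin and prints its size in bytes
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    # index: roughly the size of the text (DESCRIPTION)
    # peak:  twice the index size while indexing runs (BUGS)
    echo "text=$total index=$total peak=$((total * 2))"
}
```

     For example, estimate_index_bytes *.txt prints the three
     byte counts for the text files in the current directory.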

OPTIONS
     -d index_filename
               This is the base filename for the index files.
               For example, if /usr/local/foo is specified, the
               index files will be called /usr/local/foo.dct,
               etc.  The index should be stored on the local
               file system of the machine running waisindex.
               Indexing works over NFS, but is much slower.

     -a        Append this index to an existing one.  Useful for
               incremental additions or updates.  This will only
               add onto an index, so that if a file has changed,
               it will get reindexed, but the old entries will
               not be purged.  Therefore, to save space, it is a
               good idea to reindex the whole set of files
               periodically.

     -r        Recursively index subdirectories.

     -mem mbytes
               How much main memory, in megabytes, to use during
               indexing.  This setting has a large effect on how
               fast indexing is done.

     -register Register this database with the directory of
               servers.  You are encouraged to register
               databases, but only ones that will be running
               consistently.  The directory of servers is
               available to anyone who is on the Internet or can
               phone in.

     -export   This causes the resulting source description file
               to include the host-name and tcp-port for use by
               the clients.  Otherwise the file contains no con-
               nection information, and is expected to be used
               only for local searches.

     -e [ filename ]
               Redirect error output to filename, if supplied, or
               to /dev/null.  Error output defaults to stderr,
               unless -s is selected, in which case it defaults
               to /dev/null.

     -l log_level
               Set the logging level.  Currently only levels 0,
               1, 5 and 10 are meaningful.  Level 0 logs nothing
               (silent).  Level 1 logs only errors and warnings
               (messages of HIGH priority).  Level 5 logs
               messages of MEDIUM priority (such as indexing
               filename info).  Level 10 logs everything.

     -pos (-nopos)
               Include (do not include; the default) word
               position information in the index.  This will
               increase the index size, but will allow search
               engines to perform proximity searches.

     -nopairs (-pairs)
               Do not build (build; the default) word pairs from
               consecutive capitalized words.

     -nocat    Inhibits the creation of a catalog.  This is use-
               ful for databases with a large number of docu-
               ments, as the catalog contains 3 lines per docu-
               ment.

     -contents (-nocontents)
               Include (exclude) the contents of the file from
               the index.  The filename and header will still be
               indexed.  The default depends on the file type.

     -T type   Sets the TYPE of the document to "type".

     -t type   The format of the files to be indexed.  It is easy
               to parse a different format, but that has to be
               done by changing the source (ircfiles.c).  To list
               the currently known types, run waisindex with no
               arguments.

     filename filename...
               These are the files that will be indexed according
               to the arguments above.  To ensure that the files
               are registered in the filename table correctly,
               these should be full paths (beginning with a /).
               If the database is to be used from a machine other
               than the one on which the index is created, each
               should be a machine-independent path.
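
     The options above can be combined as in the following
     dry-run sketch, which only prints waisindex command lines
     and does not run them.  The helpers abs_path and build_cmd,
     the base filename /usr/local/foo (borrowed from the -d
     description), and the file names are illustrative
     assumptions.  one_line is a type named in BUGS; check the
     types your build accepts by running waisindex with no
     arguments.

```shell
#!/bin/sh
# Dry-run sketch: prints waisindex command lines, never executes them.
# abs_path, build_cmd and all paths below are hypothetical.

# Turn a possibly relative file name into a full path starting with "/".
abs_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
    esac
}

# usage: build_cmd base_filename type extra_options file ...
# extra_options may be "" or a string such as "-a" or "-r".
# File names containing spaces are not quoted in the printed line.
build_cmd() {
    base=$1
    doctype=$2
    extra=$3
    shift 3
    cmd="waisindex -d $base -t $doctype"
    if [ -n "$extra" ]; then
        cmd="$cmd $extra"
    fi
    for f in "$@"; do
        cmd="$cmd $(abs_path "$f")"
    done
    printf '%s\n' "$cmd"
}

# Initial build of a new index from one file given by full path.
build_cmd /usr/local/foo one_line "" /data/list.txt
# prints: waisindex -d /usr/local/foo -t one_line /data/list.txt

# Later, append another file to the existing index (-a).
build_cmd /usr/local/foo one_line "-a" more.txt
```

     Printing the command first makes it easy to confirm that
     every file is given as a full path before any index files
     are written.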


SEE ALSO
     waissearch(1), waisserver(1), waissearch-gmacs(1), xwais(1),
     xwaisq(1)

     Wide Area Information Servers Concepts by Brewster Kahle.
     Brewster@think.com


DIAGNOSTICS
     The diagnostics produced by waisindex are meant to be
     self-explanatory.


BUGS
     Indexing temporarily takes twice the disk space that the
     finished index needs.

     Due to some compile-time constants, the document table is
     limited to 16 megabytes.  This limits the indexer to
     databases with headlines that add up to less than 16
     megabytes (since that is the principal component of the
     table).  This is typically a problem for database types
     where a record is essentially a headline (one_line, archie).

     See the note in ir/README in the wais distribution for more
     detail.
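
     For types where each record is essentially its own headline,
     such as one_line, the total size of the input gives a rough
     indication of how close the headlines come to the limit.
     The sketch below makes that assumption, and also assumes
     that 16 megabytes means 16 * 1024 * 1024 bytes.  The names
     headline_check and LIMIT are hypothetical.  Headlines are
     only the principal component of the table, so treat a result
     near the limit as a warning.

```shell
#!/bin/sh
# Sketch only: compare total input size with the document table limit.
# Assumes one record per line and that a record is its own headline.
LIMIT=$((16 * 1024 * 1024))   # assumed byte value of the 16 megabyte limit

headline_check() {
    # Total size in bytes of all files named on the command line
    bytes=$(cat "$@" | wc -c)
    bytes=$((bytes + 0))      # normalize any padding in wc output
    if [ "$bytes" -lt "$LIMIT" ]; then
        echo "ok: $bytes bytes"
    else
        echo "too large: $bytes bytes"
    fi
}
```

     If the check reports "too large", splitting the input into
     several smaller databases is one possible workaround.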