README

Buildict -- Combine and reduce dictionaries through affixes.

  Copyright (c) 1993 Martin Boyer and Hydro-Quebec.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


Buildict is a Perl script that was written to replace the munchlist
shell script provided with ispell 3.0.9.  I had to do that because
many standard Unix text utilities can't handle 8-bit characters, and
those that can often handle them in incompatible ways (e.g. some
treat the 8th bit as a sign bit, and some don't).
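
The signedness problem can be seen directly from the shell.  The
following is only an illustrative sketch and is not part of buildict.
It takes the byte 0xE9 (octal 351, the code for e-acute in ISO 8859-1,
chosen here purely as an example) and prints it first as a signed and
then as an unsigned 8-bit value, using the POSIX 'od' utility.

```shell
# The same byte read as a signed, then as an unsigned, 8-bit integer.
# A tool that stores characters in a signed type sees a negative value,
# which breaks comparisons and table lookups based on the character code.
signed=$(printf '\351' | od -An -td1 | tr -d ' \n')
unsigned=$(printf '\351' | od -An -tu1 | tr -d ' \n')
echo "signed: $signed  unsigned: $unsigned"
```

The two values differ by 256 (-23 versus 233), so two utilities that
disagree on signedness will also disagree on how such a character
sorts relative to plain ASCII letters.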

I chose to use 8-bit characters in the dictionaries because that made
the task of the dictionary writers much easier.  Computer science is
for the benefit of users, not programmers.

Buildict is about 40% slower than munchlist on French dictionaries,
which have a very high expansion ratio through affixes (English isn't
that bad).  However, the resulting dictionary is a bit smaller and,
more importantly, correct (if the input dictionaries are correct,
that is).

Buildict also prints (optional) messages as it reduces the dictionary,
including hints on how to reduce the dictionary further and timing
information.

When compiling IREQ's French dictionary, buildict peaks at 25 Mbytes
of swap space and 35 Mbytes of temporary file space (not at the same
time).
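
Before choosing a directory for temporary files, it can help to check
that it has enough room.  The sketch below is an assumption-laden
example, not part of buildict: the function name 'has_free_kb' and the
directory /tmp are hypothetical, and the 35 Mbytes figure applies to
IREQ's French dictionary only; other word lists may need more or less.

```shell
# Succeed if directory $1 has at least $2 kilobytes available.
# 'df -P' selects the POSIX output format, in which the available space
# is the fourth field of the second line; '-k' reports 1024-byte units.
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# 35 Mbytes expressed in kilobytes; /tmp is only an example directory.
if has_free_kb /tmp 35840; then
    echo "/tmp has room for the temporary files"
else
    echo "/tmp looks too small; consider another directory"
fi
```

If the check fails, point $TMPDIR in the script's 'Configuration'
section at a file system with more free space.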


Buildict supports all the options of munchlist, and more:
----------------------------------------------------------------------
This is buildict version 1.0.

buildict accepts the following arguments:
   -C                           show the copyright notice
   -c 'output affix table'      affix table to convert to
   -l 'input affix table'       affix table of the input list
   -T 'source format'           format of the input list
   -s 'suppression dictionary'  list of words to exclude
   -w 'word characters'         characters that are part of words
   -o 'output file'             where to put the result if not standard output
   -D                           keep files for debugging
   -h                           print this help message and exit
   -e                           only expand the input list
   -v                           verbose output, to help maintain dictionaries
   -V                           very verbose output, with timing
----------------------------------------------------------------------
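
As an illustration of how these options combine, here is a minimal
shell sketch.  The wrapper name 'reduce_dictionary', the BUILDICT
override variable, and all file names are hypothetical.  The sketch
also assumes that, as with munchlist, the word list to reduce is given
as a trailing file argument; check the script itself if in doubt.

```shell
# Reduce a word list against an affix table, with maintenance hints.
#   $1  affix table of the input list   (passed to -l)
#   $2  word list to reduce             (assumed trailing file argument)
#   $3  output file                     (passed to -o)
# BUILDICT may be set to the full path of the script if it is not in PATH.
reduce_dictionary() {
    affix=$1
    words=$2
    out=$3
    "${BUILDICT:-buildict}" -l "$affix" -v -o "$out" "$words"
}
```

A call would then look like
'reduce_dictionary francais.aff francais.dico francais.reduced',
where the three file names are placeholders.  Replace -v with -V to
get timing information as well, or add -e to only expand the input
list.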


To use buildict, first edit the first line of the script to reflect
the proper path of the perl binary on your system.  Then, read and
edit the 'Configuration' section.  In particular, pay attention to the
$LIBDIR and $BINDIR variables.  You may want to set $TMPDIR to a
directory in a file system with ample space.  Buildict requires
'ispell' and 'buildhash' from the ispell 3.0.9 distribution (i.e.
Geoff Kuenning's version).  Ispell 2 or ispell 4 WILL NOT WORK,
because they don't support affix files.  Besides, if you think you
need munchlist or buildict with anything but ispell 3.0.9, you should
reread the installation instructions.

You will also need the following Unix (or GNU) programs: 'sort', 'ln'
or 'cp', 'mv', and 'date'.
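
A quick way to check for these programs is sketched below.  The
function name 'missing_tools' is hypothetical.  The check also assumes
that 'ispell' and 'buildhash' are reachable through PATH, which need
not be the case, since buildict finds them through its own $BINDIR
setting.

```shell
# Print the name of each listed program that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools perl ispell buildhash sort mv date)

# Either 'ln' or 'cp' is enough, so report them only if both are absent.
if [ -n "$(missing_tools ln)" ] && [ -n "$(missing_tools cp)" ]; then
    missing="$missing ln-or-cp"
fi

if [ -n "$missing" ]; then
    echo "missing:" $missing
else
    echo "all required programs found"
fi
```

The path printed by 'command -v perl' is the one to put after '#!' on
the first line of the script.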


Send comments, bug reports and fixes to:

  Martin Boyer                            mboyer@ireq-robot.hydro.qc.ca
  Institut de recherche d'Hydro-Quebec    mboyer@ireq-robot.uucp
  1800, montee Ste-Julie
  Varennes (Quebec) Canada   J3X 1S1
  +1 514 652-8412
