Buildict -- Combine and reduce dictionaries through affixes.
Copyright (c) 1993 Martin Boyer and Hydro-Quebec.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Buildict is a perl script that was written to replace the munchlist
shell script provided with ispell 3.0.9.  I had to do that because
many standard Unix text utilities can't handle 8-bit characters, and
those that do often handle them in incompatible ways (e.g. some treat
the 8th bit as a sign bit, and some don't).  I chose to use 8 bits in
the dictionaries because that made the task of the dictionary writers
much easier.  Computer science is for the benefit of users, not
programmers.

Buildict is about 40% slower than munchlist (on French dictionaries,
which have a very high expansion ratio through affixes; English isn't
that bad), but the resulting dictionary is a bit smaller and, more
importantly, correct (if the input dictionaries are correct, that is).
Buildict also prints (optional) messages as it reduces the dictionary,
including hints on how to reduce the dictionary further and timing
information.

Compiling IREQ's French dictionary, buildict peaks at 25 Mbytes of
swap space and 35 Mbytes of temporary file space (not at the same
time).

Buildict supports all the options of munchlist, and more:

----------------------------------------------------------------------
This is buildict version 1.0.
buildict accepts the following arguments:
  -C                            show the copyright notice
  -c 'output affix table'       affix table to convert to
  -l 'input affix table'        affix table of the input list
  -T 'source format'            format of the input list
  -s 'suppression dictionary'   list of words to exclude
  -w 'word characters'          characters that are part of words
  -o 'output file'              where to put the result if not
                                standard output
  -D                            keep files for debugging
  -h                            print this help message and exit
  -e                            only expand the input list
  -v                            verbose output, to help maintain
                                dictionaries
  -V                            very verbose output, with timing
----------------------------------------------------------------------

To use buildict, first edit the first line of the script to reflect
the proper path of the perl binary on your system.  Then, read and
edit the 'Configuration' section.  In particular, pay attention to the
$LIBDIR and $BINDIR variables.  You may want to set $TMPDIR to a
directory in a file system with ample space.

Buildict requires 'ispell' and 'buildhash' from the ispell 3.0.9
distribution (i.e. Geoff Kuenning's version).  Ispell 2 or ispell 4
WILL NOT WORK, because they don't support affix files.  Besides, if
you think you need munchlist or buildict with anything but ispell
3.0.9, you should reread the installation instructions.

You will also need the following Unix (or GNU) programs: 'sort',
'ln' or 'cp', 'mv', and 'date'.

Send comments, bug reports and fixes to:

Martin Boyer                            mboyer@ireq-robot.hydro.qc.ca
Institut de recherche d'Hydro-Quebec    mboyer@ireq-robot.uucp
1800, montee Ste-Julie
Varennes (Quebec) Canada   J3X 1S1      +1 514 652-8412
These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.