This is a text to speech system produced by integrating various pieces
of code and tables of data, which are all (I believe) in the public domain.

The bulk of the integration was done by myself, that is Nick Ing-Simmons.
I can be reached via my employer at nik@tiuk.ti.com.

THIS PACKAGE HAS NO CONNECTION WITH TEXAS INSTRUMENTS; IT IS A PRIVATE
PROJECT OF MY OWN.

Despite the E-mail address (which is via TI's US operation) I actually
work in the UK.

Ideally you should have obtained and installed GNU gdbm (I use version 1.7.3).
If you have it but cannot install it, see below.

For best quality it is highly desirable to use one of the dictionaries
suggested below.

The package now uses GNU autoconf-2.0 to build a configure script.
The generic install instructions are in INSTALL, but basically
it works like this:

    configure
    make
    make check
    say --help
    say Something of your choice
    make -n install   # see what it is going to do
    make install      # copy program(s) to /usr/local/bin

configure --help and the INSTALL file explain configure options
which may help.

To allow the package to be built when the installer cannot install the
GNU gdbm package in the "normal" place, you can specify a pathname to
the gdbm source directory as follows:

    configure --with-gdbm=<path-to-gdbm>

e.g.

    configure --with-gdbm=$HOME/gdbm

Currently there are the following drivers:

 1. Sun SPARCStations - written & tested by me (nik@tiuk.ti.com)
    on SunOS4.1.3 and Solaris2.3
 2. Linux - see README.linux
 3. NeXT
 4. SGI - this builds on "mips-sgi-irix4.0.5H";
    see README.sgi for (a bit) more detail.
 5. HPUX
 6. Any machine for which a nas/netaudio port exists, and for which
    configure can find the include files and libraries.
    (Nas, the "net audio server", does for audio what X11 does for
    graphics; it is available from ftp.x.org:/contrib/audio/nas .)

Dictionaries:

THIS VERSION WILL NOT USE THE SAME DICTIONARY AS PREVIOUS VERSIONS.

The change was to allow at least one dictionary with a non-restrictive
copyright to be used.
Dictionaries convert words in "text" to phonemes in "arpabet" symbols.
The arpabet symbols are then "expanded" into an ASCII representation
of the IPA. The IPA representation is inherited from the
"Computer Usable Version of Oxford Advanced Learners Dictionary" (CUVOLAD).
The CUVOLAD was used directly by previous releases of rsynth.
CUVOLAD is available from the Oxford Text Archive.

Dictionary databases can be built from either of two ftp'able sources:

 1. The Carnegie Mellon Pronouncing Dictionary [cmudict.0.1] is
    Copyright 1993 by Carnegie Mellon University. Use of this dictionary,
    for any research or commercial purpose, is completely unrestricted.
    If you make use of or redistribute this material, we would appreciate
    acknowledgement of its origin.

        ftp://ftp.cs.cmu.edu:project/fgdata/dict

    Latest seems to be cmudict.0.3.Z

 2. "beep" from

        ftp://svr-ftp.eng.cam.ac.uk/comp.speech/data

    Latest seems to be beep-0.4.tar.gz

    This is a direct descendant of CUVOLAD (British pronunciation)
    (as used by previous releases of rsynth), and so has a more
    restrictive copyright than the CMU dictionary.

dict.c looks for bDict.db by default ("b" is for British, e.g. beep).
I use aDict.db for the CMU (American) dictionary. You can then:

    say -d a schedule     # sked...
    say -d b schedule     # shed...

It is simplest to obtain the dictionaries prior to configuring the package
and tell it where the sources are at configure time:

    configure --with-aDict=../dict/cmudict.0.3 --with-bDict=../dict/beep-0.4

If you have already built/installed the package you can build the gdbm
database from a dictionary as follows:

    mkdictdb main-dictionary-file bDict.db
    mv bDict.db /usr/local/lib

Expect a few messages from mkdictdb about words it does not like in
either dictionary.

It should not be too hard to port the package to other hardware.
For a discussion of these issues see PORTING.

Use say --help to get a list of command line options.

SPARCStation-10 can play audio at rates other than 8000Hz, so if -r is
used with an acceptable rate it still plays.
If you have a '10 then "man 4 dbri" explains legal rates.

The components (top down):

say.c / say.h

    C main() function.
    Initializes lower layers and then converts words from the command
    line or "stdin" to phonemes.
    Some "normalization" of the text is performed, in particular
    numbers can be represented as sequences of digits.

dict.c / dict.h

    As of this release uses a GNU "gdbm" database which has been
    pre-loaded with a pronunciation dictionary.

text.c / english.c / text.h

    An implementation of US Naval Research Laboratory rules
    for converting English (American?) text to phonemes.

    Based on the version on the comp.speech archives; the main changes
    were in the encoding of the phonemes from the so called "arpabet"
    to a more concise form used in the above dictionary.
    This form (which is mnemonic if you know the International Phonetic
    Alphabet) is described in the dictionary documentation. It is
    also very close to that described in the postings by Evan Kirshenbaum
    (evan@hplerk.hpl.hp.com) to sci.lang and alt.usage.english.
    (The differences are in the vowels and are probably due to the
    differences between British and American English.)

saynum.c

    Code for "saying" numbers, derived from the same source as above.
    It has been modified to call the higher level routines recursively
    rather than producing phonemes directly. This will allow any
    systematic changes (e.g. British vs American switch) to affect
    numbers without having to change this module.

holmes.c / holmes.h / elements.c / elements.def

    My implementation of a phoneme to "vocal tract parameters" system
    described by Holmes et al. [1]

    The original used an Analogue Hardware synthesizer.

nsynth.c / nsynth.h / def_pars.c

    My recoding of the version of the "Klatt" synthesizer, described
    in Klatt [2]. I obtained C source code from Jon Iles, who had
    modified the version originally posted to "comp.speech".
    I have extensively re-coded it in my C style as opposed to Klatt's
    "original", which showed its FORTRAN ancestry.
    In my (non-expert) opinion, the changes are extensive enough to
    avoid any copyright on the original.

    Only a small subset of the functionality of the synthesizer is
    used by the "holmes.c" driver.

hplay.c / hplay.h

    hplay.h describes a common interface.
    hplay.c is a link to play/xxxplay.c

Acknowledgements :

Particular thanks to
    Tony Robinson       ajr@eng.cam.ac.uk
for providing an FTP site for alpha testing, and telnet access to a
variety of machines.

Many thanks to

    Axel Belinfante     Axel.Belinfante@cs.utwente.nl (World Wide Web)
    Jon Iles            J.P.Iles@cs.bham.ac.uk
    Rob Hooft           hooft@EMBL-Heidelberg.de (linux stuff)
    Thierry Excoffier   exco@ligiahp3.univ-lyon1.fr (playpipe for hpux)
    Markus Gyger        mgyger@itr.ch (HPUX port)
    Ben Stuyts          ben@stuyts.nl (NeXT port)
    Stephen Hocking     <sysseh@devetir.qld.gov.au> (Preliminary Netaudio port)
    Greg Renda          <greg@ncd.com> (Netaudio cleanup)
    Tracey Bernath      <bernath@bnr.ca> (Netaudio testing)
    "Tom Benoist"       <ben@ifx.com> (SGI Port)
    Andrew Anselmo      <anselmo@ERXSG.rl.plh.af.mil> (SGI testing)
    Mark Hanning-Lee    <markhl@iris-355.jpl.nasa.gov> (SGI testing)

for assisting me in putting this package together.

References :

 [1] Holmes J. N., Mattingly I, and Shearme J. (1964)
     "Speech Synthesis by Rule", Language and Speech 7, 127-143

 [2] Dennis H. Klatt (1980)
     "Software for a Cascade/Parallel Formant Synthesizer",
     J. Acoust. Soc. Am. 67(3), March 1980.

Sources :

                      OXFORD TEXT ARCHIVE

The Oxford Text Archive has for several years maintained copies of
several machine-readable dictionaries along with its extensive (if
unsystematic) collections of other machine-readable texts. This
document gives some further details of the various dictionaries
available, and summarises the conditions under which copies of them
are currently distributed.

The Oxford Text Archive Shortlist (available on request via electronic
mail and by FTP) gives up to date brief details of all texts held in
the Archive. Send electronic mail to ARCHIVE@VAX.OXFORD.AC.UK.
For anonymous FTP, look in the directory ota on ota.ox.ac.uk (129.67.1.165)

Internet newsgroups :

    comp.speech and its archive on svr-ftp.eng.cam.ac.uk
        (Many of the starting point sources.)
    sci.lang
        (For ASCII IPA)
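
The saynum.c entry under "The components" says numbers are handed back to
the higher level word routines instead of being turned into phonemes
directly. What follows is only a rough sketch of that idea, not the code in
saynum.c: the names spell_number, word_fn, word_buf and collect_word are all
made up for illustration, and the range is limited to 0..999999.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the word-level routine (dictionary lookup or
 * letter-to-sound rules). rsynth's real interface may well differ. */
typedef void (*word_fn)(const char *word, void *ctx);

static const char *const ones[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
};
static const char *const tens[] = {
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"
};

/* Spell n (intended range 0..999999) one word at a time via say_word. */
void spell_number(unsigned long n, word_fn say_word, void *ctx)
{
    if (n >= 1000) {
        /* Recurse for the thousands, then carry on with the remainder. */
        spell_number(n / 1000, say_word, ctx);
        say_word("thousand", ctx);
        n %= 1000;
        if (n == 0)
            return;
    }
    if (n >= 100) {
        say_word(ones[n / 100], ctx);
        say_word("hundred", ctx);
        n %= 100;
        if (n == 0)
            return;
    }
    if (n >= 20) {
        say_word(tens[n / 10], ctx);
        if (n % 10 != 0)
            say_word(ones[n % 10], ctx);
    } else {
        say_word(ones[n], ctx);
    }
}

/* Example callback: collect the words, space separated, in a fixed
 * buffer. Words that do not fit are silently truncated. */
struct word_buf { char text[128]; };

void collect_word(const char *word, void *ctx)
{
    struct word_buf *b = (struct word_buf *)ctx;
    size_t len = strlen(b->text);

    if (len > 0 && len + 1 < sizeof b->text) {
        b->text[len++] = ' ';
        b->text[len] = '\0';
    }
    strncat(b->text, word, sizeof b->text - len - 1);
}
```

Because the number module only emits ordinary words, whatever pronounces
words (a British or American dictionary, or the rules) also decides how the
numbers sound, which is the point made in the saynum.c description.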
{\rtf0\ansi{\fonttbl\f0\fswiss Helvetica;\f1\fmodern Courier;\f2\fmodern Ohlfs;\f3\ftech Symbol;} \paperw8920 \paperh9800 \margl680 \margr780 {\colortbl;\red17\green0\blue153;\red68\green85\blue255;\red0\green0\blue0;} \pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 Contents: Other Documents:\ \gray0\fc0\cf0 \ {{\NeXTHelpLink34 \markername ;\linkFilename README.NeXT;\linkMarkername introduction;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 Introduction {{\NeXTHelpLink54 \markername ;\linkFilename README;\linkMarkername ;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 README\ {{\NeXTHelpLink63 \markername ;\linkFilename README.NeXT;\linkMarkername gettingstarted;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 Getting started {{\NeXTHelpLink86 \markername ;\linkFilename INSTALL;\linkMarkername ;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 INSTALL\ {{\NeXTHelpLink96 \markername ;\linkFilename README.NeXT;\linkMarkername asummary;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 A summary {{\NeXTHelpLink113 \markername ;\linkFilename Changes;\linkMarkername ;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 Changes\ {{\NeXTHelpLink123 \markername ;\linkFilename README.NeXT;\linkMarkername dictionary;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 Dictionary \ {{\NeXTHelpLink143 \markername ;\linkFilename README.NeXT;\linkMarkername services;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 Services\ {{\NeXTHelpLink154 \markername ;\linkFilename 
README.NeXT;\linkMarkername knownbugs;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 Known bugs\ {{\NeXTHelpLink167 \markername ;\linkFilename README.NeXT;\linkMarkername history;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\fc0\cf0 History\ \b0\gray386\fc2\cf2 _____________________________________________________\ \b\gray0\fc0\cf0 \ {\gray85\fc1\cf1{\NeXTHelpMarker232 \markername introduction;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 Introduction\ \b0\gray0\fc0\cf0 \ This is a text to speech system produced by integrating various pieces of code and tables of data, which are all (I believe) in the public domain.\ \ The bulk of the integration was done by Nick Ing-Simmons. See the file README for more info.\ \ The port to the NeXT was done by Ben Stuyts. (ben@stuyts.nl -- NeXT Mail Welcome.) I have only tested this on black hardware with NEXTSTEP 3.2.\ \gray386\fc2\cf2 _____________________________________________________\ \gray0\fc0\cf0 \ {\b\gray85\fc1\cf1{\NeXTHelpMarker688 \markername gettingstarted;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 Getting started\ \b0\gray0\fc0\cf0 \ \pard\tx960\tx1920\tx2880\tx3840\tx4800\tx5760\tx6720\tx7680\tx8640\tx9600\fc0\cf0 Make sure you have the GNU dbm library installed. 
I have tested it with gdbm-v1.7.3.\ \ \fc3\cf3 Then type:\ \ \f1 rm -f hplay.c\ ln -s play/NeXTplay.c hplay.c\ make -f makefile.next\ \f0 \ Test the result by typing:\ \ ./ \f1 say Welcome to the NeXT world\ \pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\gray386\fc2\cf2 _____________________________________________________\ \gray0\fc0\cf0 \ {\b\gray85\fc1\cf1{\NeXTHelpMarker990 \markername asummary;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 A summary\ \b0\gray0\fc0\cf0 \ \li540 Command line options:\ -v verbose\ \pard\tx1060\tx1600\tx2120\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\fi-2660\li3200\fc0\cf0 -r # set the sampling rate in Hz. Default is 8 KHz.\ -q turns off warnings\ -I Impulsive source (default is "NATURAL")\ -c num-cascade Switches to CASCADE_PARALLEL with number of cascaded formants\ -F number f0_flutter value\ -f mSec-per-frame Sets frame length\ -t number voicing spectral tilt in dB, 0 to 24\ -x freq voicing fundamental frequency\ -p file file to save holmes parameters to.\ -S number speed, default =1, larger means slower\ -K number umm...\ \pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\li540\fc0\cf0 \ say "words words and more words".\ say "[phonemes]".\ say < file\ \ say\ type words from stdin. A dot ends a sentence and starts the conversion.\ \li0 \ Don't expect too much speed: on my 25 MHz cube the generation of 8 KHz speech takes as long as the speech itself. The bottleneck seems to be in nsynth.c, where most of the computation is done in floating point.\ \gray386\fc2\cf2 _____________________________________________________\ \gray0\fc0\cf0 \ {\b\gray85\fc1\cf1{\NeXTHelpMarker1939 \markername dictionary;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 Dictionary\ \b0\gray0\fc0\cf0 \ You can get an optional pronunciation dictionary. 
See file README {{\NeXTHelpLink2017 \markername ;\linkFilename README;\linkMarkername ;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b0\i0\ulnone\fs24\fc0\cf0 , section "Dictionaries" for details.\ \gray386\fc2\cf2 _____________________________________________________\ \gray0\fc0\cf0 \ {\b\gray85\fc1\cf1{\NeXTHelpMarker2111 \markername services;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 Services\ \b0\gray0\fc0\cf0 \ Here's a Tickle-service you can use to say any text.:\ \ \f2\fs20\li540 # Begin TickleServices Version 1.01 Data\ "Menu Item" = "Tickle Services/Say";\ "Send Type" = "NXAsciiPboardType";\ "Tcl" = "\\\ # Speak the selection\ \ exec say << [pasteboard read]\ ";\ # End TickleServices Data\ \f0\fs24\li0\gray386\fc2\cf2 _____________________________________________________\ \gray0\fc0\cf0 \ {\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\b\gray85\fc1\cf1{\NeXTHelpMarker2438 \markername knownbugs;} ¬}\pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 Known bugs\ \pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\b0\fc0\cf0 \ \pard\tx480\tx960\tx1440\tx1920\tx2400\tx2880\tx3360\tx3840\tx4320\tx4800\f3\fi-480\li480\fc0\cf0 · \f0 A fixed size buffer is used (in hplay.c) as a sound buffer. It is 1 MB, so you probably won't notice it. It might dump core on you though: This happens if you give it a very long sentence, and/or a high sample rate.\ \f3 · \f0 It doesn't compile straight away on NEXTSTEP 3.0 systems. 
The compiler gives errors like:\ \f2\fs20\fi-1540\li2500\fc3\cf3 cc -O2 -finline-functions -Wall -c holmes.c\ holmes.c: In function `filter':\ holmes.c:47: argument `v' doesn't match function prototype\ holmes.c:47: a formal parameter type that promotes to `double' can match only `double' in the prototype\ \f0\fs24\fi0\li480 To circumvent this, change the affected functions from:\ \f2\fs20\fi480 static float\ filter(p, v)\ filter_ptr p;\ float v;\ \f0\fs24\fi0 To:\ \f2\fs20\fi480 static float\ filter(filter_ptr p, float v)\ \pard\tx520\tx1060\tx1600\tx2120\tx2660\tx3200\tx3720\tx4260\tx4800\tx5320\f0\fs24\gray386\fc2\cf2 _____________________________________________________\ \pard\tx960\tx1920\tx2880\tx3840\tx4800\tx5760\tx6720\tx7680\tx8640\tx9600\fc0\cf0 \ {\b\gray85\fc1\cf1{\NeXTHelpMarker3207 \markername history;} ¬}\pard\tx960\tx1920\tx2880\tx3840\tx4800\tx5760\tx6720\tx7680\tx8640\tx9600\f0\b\i0\ulnone\fs24\gray85\fc1\cf1 History\ \b0\gray0\fc0\cf0 \ 22-feb-94 Ben Stuyts Initial port to NeXT.\ \fi-3840\li3840 05-mar-94 Ben Stuyts Added 3.0 fix to known bugs section.\ \fc3\cf3 06-mar-94 Ben Stuyts Fixed byte-ordering problem for Intel systems. Thanks to ugubser@avalon.unizh.ch for finding this out.\ \fi0\li0\fc0\cf0 20-sep-94 Ben Stuyts Updates for rsynth 1.0 release.\ }
To: Nick Ing-Simmons <nicki@lobby.ti.com>
Subject: rsynth (in WWW)
From: Axel Belinfante <Axel.Belinfante@cs.utwente.nl>
Organisation: University of Twente, Dept of Informatics,
              Tele Informatics Group, PO Box 217,
              NL-7500 AE Enschede, The Netherlands
Phone: +31 53 89 3774
Telefax: +31 53 333815
X-Face: 3YGZY^_!}k]>-k'9$LK?8GXbi?vs=2v*ut,/8z,z!(QNBk_>~:~"MJ_%i`sLLqGN,DGbkT@
 N\jhX/jNLTz2hO_R"*RF(%bRvk+M,iU7SvVJtC*\B6Ud<7~`MGMp7rCI6LVp=%k=HE?-UCV?[p\$R?
 mI\n2/!#3/wZZsa[m7d;PKWiuH6'~<x[UoHs%Ei=QZA
Date: Mon, 14 Feb 94 19:16:22 +0100
Sender: belinfan@cs.utwente.nl

Hi,

as you may have noticed, i integrated your rsynth program (v 0.9)
into the World Wide Web, as it seems the best publicly available
TTS translator for a unix environment.

While doing so, i noticed that if rsynth is installed _without_
a dictionary file, then it can get into an endless loop when it is
trying to spell something.

To integrate it, i made a few modifications, as i wanted rsynth
to write soundbytes to stdout, and diagnostics to stderr:

 - i had to comment out some code that tries to get the
   device output encoding configuration
 - i added a `-' filename flag, to indicate stdout output
   of sound bytes,
 - i made sure that all diagnostics go to stderr
 - the code that writes the audio header now first writes
   an 'unknown size', and then tries to overwrite it with the
   correct size, instead of initially writing a size of zero bytes,
   to avoid 'corrupt' audio headers.

Regards,
Axel.

 <Axel.Belinfante@cs.utwente.nl>   tel. +31 53 893774   fax. +31 53 333815
     University of Twente, Tele-Informatics & Open Systems Group
       P.O. Box 217    NL-7500 AE Enschede      The Netherlands
    "ili ne sciis ke estas neebla do ili simple faris" -- Loesje
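
The last change in the mail above (write an 'unknown size' first, then try
to overwrite it) can be sketched roughly as follows. This is not the code
from rsynth or from Axel's patch: the function names are invented, and it
assumes a Sun/NeXT ".au" header holding one channel of 8-bit mu-law data.

```c
#include <stdio.h>

#define AU_MAGIC        0x2e736e64UL   /* the bytes ".snd" */
#define AU_HEADER_SIZE  24UL           /* six 32-bit words, no annotation */
#define AU_UNKNOWN_SIZE 0xffffffffUL   /* "data size not known" marker */
#define AU_ENC_ULAW_8   1UL            /* assumed encoding: 8-bit mu-law */

/* Write one 32-bit value in big-endian order, as .au headers require. */
static int put_be32(FILE *fp, unsigned long v)
{
    unsigned char b[4];

    b[0] = (unsigned char)((v >> 24) & 0xff);
    b[1] = (unsigned char)((v >> 16) & 0xff);
    b[2] = (unsigned char)((v >> 8) & 0xff);
    b[3] = (unsigned char)(v & 0xff);
    return fwrite(b, 1, 4, fp) == 4 ? 0 : -1;
}

/* Write the header with the size field set to "unknown". */
int au_write_header(FILE *fp, unsigned long rate)
{
    if (put_be32(fp, AU_MAGIC) || put_be32(fp, AU_HEADER_SIZE) ||
        put_be32(fp, AU_UNKNOWN_SIZE) || put_be32(fp, AU_ENC_ULAW_8) ||
        put_be32(fp, rate) || put_be32(fp, 1UL /* channels */))
        return -1;
    return 0;
}

/* After the samples are written, try to fill in the real size.
 * On a pipe or terminal the seek fails and the header simply keeps
 * the "unknown size" marker, which players are expected to accept. */
int au_patch_size(FILE *fp, unsigned long nbytes)
{
    if (fseek(fp, 8L, SEEK_SET) != 0)
        return -1;
    if (put_be32(fp, nbytes) != 0)
        return -1;
    return fseek(fp, 0L, SEEK_END);
}
```

Starting from the "unknown" marker rather than zero means a header that
cannot be patched, for example when the sound goes to stdout through a
pipe, still describes the stream honestly.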
The requirement is basically that a DSP is present which is supported by
the linux sound-kit V2.0. Those include Gravis Ultrasound, Pro Audio
Spectrum, and Soundblaster (Pro). The sound driver provides the programmer
with a relatively device-independent way of addressing these cards.

The software requirement is that the sound-kit package is compiled into
the kernel.

The Linux version has far fewer capabilities than the SPARC version.
A summary:

    say -r #                 : set the sampling rate in Hz.
    say -l filename -L       : output the resulting sound to a file.
    say "words words and more words".
    say "[phonemes]".

Don't expect too much speed: on my 486/33 the generation of 12000 Hz
speech takes as long as the speech itself. Quality goes up when the
speed goes up!

Rob Hooft. (hooft@EMBL-Heidelberg.DE)

PS: the pronunciation of Linuxer is completely wrong.
    Try "hello, [lInjuks3]" instead. That is the best approximation
    I could find.

BTW: I have been told that there still is a problem with the Linux
     version: the /dev/sbdsp device was only used in the old days;
     newer systems only have the newer name /dev/dsp. Maybe you can
     change this in your version.

Bob Blair <@ANLVM.CTD.ANL.GOV:reb@sgi3.hep.anl.gov>
had to make the following change:

The problem I had was actually quite simple and I managed to fix it
after looking at a file that does work fine when cat'ed to /dev/dsp
(one produced by the recording utility "srec"). Maybe my installation
is funny or maybe the more advanced SoundBlaster cards work differently,
but for my system the thing I did was to change the line in hplay.c that
read something like:

    converted[i] = data[i]/256 ;

and change it to:

    converted[i] = ( data[i] - 32768 )/256;

and to change the declaration of converted from "signed char *" to
"unsigned char *". My system expects unsigned data oscillating about 128,
not signed data oscillating about 0. The whole thing works very nicely
now (I wish it was faster but a 16MHz 386sx is a little on the low end
side). I am giving you these details just in case others see the same
problem and email you for a solution (also if Mr. Hooft is listening,
maybe he has some insight into whether my system is a freak in this
regard).
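
For anyone hitting the same problem, here is a small stand-alone sketch of
the conversion Bob describes. It is not the code in hplay.c; the function
name and argument order are made up, and it assumes the synthesizer hands
over signed 16-bit samples. It adds the offset instead of subtracting it,
so the result does not depend on wraparound when the value is stored in an
unsigned char.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert signed 16-bit samples (oscillating about 0) to unsigned
 * 8-bit samples (oscillating about 128), as /dev/dsp expected on the
 * system described above. */
void to_unsigned8(const int16_t *data, unsigned char *converted, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Shift -32768..32767 up to 0..65535, then keep the top 8 bits. */
        converted[i] = (unsigned char)(((int32_t)data[i] + 32768) >> 8);
    }
}
```

Silence (a sample value of 0) comes out as 128, the most negative sample
as 0 and the most positive as 255.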
SGI port has been tested by me (nik@tiuk.ti.com) on a machine
at Cambridge via an account kindly provided by Tony Robinson.
(It compiles, links, can handle say --help and write a .au file which
I can play on my Sun. I can't hear it as it is 25 miles away...)
(Both the "gcc" and the "cc" that were available worked.)

This is with some version of irix4.

A number of people have had trouble with irix5.x.
Attached is the most "helpful" email I have seen.
As I don't have even telnet access to an irix5 machine I can't comment.

>cfe: Error: ...: Cannot open file audio.h for #include

You have an IRIX 5.2 system without the optional Digital Media
Developers Option. As of IRIX 5, the audio libraries are no longer
supplied with the IDO. As of IRIX 5.3 they will be back. In the
meantime, read relnotes IRIX 1 carefully, install irix4_eoe1 from
the base IRIX CD and irix4_dev.* from the IDO, and compile under
IRIX 4 compatibility mode.

The <sys/audio.h> is misleading: it is for a 4D/20 audio system only.

   Walter Roberson      roberson@ibd.nrc.ca