README

Date: 11-8-95

Files: README (this file), cmudict.0.1.Z (compressed), cmulex.0.1.Z,
cmudict.0.2.Z (compressed), cmudict.0.3.Z (compressed), cmudict.0.4.Z,
cmulex.0.3.Z, cmulex.0.4.Z, phoneset.0.1, phoneset.0.3, phoneset.0.4.

This directory contains pronunciation dictionaries containing
approximately 100k words and their transcriptions. cmudict.0.1.Z is the
first one we put out; cmudict.0.4.Z is the latest and most up-to-date.
Lists of the words are in cmulex.0.[134].Z. We use these dictionaries at
Carnegie Mellon in our speech understanding systems.

The phone set for cmudict.0.4 contains 39 phones, a list of which can be
found in phoneset.0.4.

Lexical stress is indicated by means of a numeral [012] attached to a vowel:
  0 = no stress
  1 = primary stress
  2 = secondary stress
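
As an illustration of the stress convention above, here is a minimal Python
sketch that separates the stress numeral from a phone symbol. It assumes
phones are written as uppercase letters with an optional trailing digit
(for example "AH0" or "T"); the function names and the sample phone
sequence are hypothetical and are not part of the distribution.

```python
import re

# Assumption: a phone is uppercase letters plus an optional stress digit 0, 1 or 2.
_PHONE = re.compile(r"^([A-Z]+)([012])?$")

STRESS_NAMES = {0: "no stress", 1: "primary stress", 2: "secondary stress"}


def split_stress(phone):
    """Split 'AH0' into ('AH', 0). Phones without a numeral give (phone, None)."""
    match = _PHONE.match(phone)
    if match is None:
        raise ValueError(f"unrecognised phone: {phone!r}")
    base, digit = match.groups()
    return base, (int(digit) if digit is not None else None)


def stress_pattern(phones):
    """Return the stress numerals of the vowels in a transcription, in order."""
    return [s for _, s in map(split_stress, phones) if s is not None]


if __name__ == "__main__":
    # Illustrative transcription only; check the dictionary for real entries.
    sample = ["AH0", "B", "AW1", "T"]
    for value in stress_pattern(sample):
        print(value, "=", STRESS_NAMES[value])
```

Only vowels carry a numeral, so a phone without one is reported with a
stress of None rather than 0.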

Alternate transcriptions are identified with a numeral in parentheses as
part of the lexical entry.
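
The following Python sketch shows one way to read entries and group
alternate transcriptions under a single word. It assumes each line holds a
word, whitespace, and then the phones separated by spaces, with an alternate
marked by a parenthesised numeral directly after the word (as in "READ(2)").
That layout, the function names and the sample lines are assumptions made
for illustration; check them against the dictionary file you are using.

```python
import re
from collections import defaultdict

# Assumption: the headword may end with "(N)", where N marks an alternate.
_HEAD = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")


def parse_entry(line):
    """Return (word, variant, phones); variant is None for the main entry."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a word and at least one phone: {line!r}")
    head, phones = fields[0], fields[1:]
    match = _HEAD.match(head)
    variant = match.group("variant")
    return match.group("word"), (int(variant) if variant else None), phones


def load_entries(lines):
    """Map each word to the list of its transcriptions, in file order."""
    entries = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, _variant, phones = parse_entry(line)
        entries[word].append(phones)
    return dict(entries)


if __name__ == "__main__":
    # Illustrative lines only, written in the assumed layout.
    sample = ["READ  R EH1 D", "READ(2)  R IY1 D"]
    print(load_entries(sample))
```

Stripping the numeral from the headword keeps all transcriptions of a word
under one key, so a lookup returns every alternate at once.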

We generated this dictionary using the following independent sources:
- a 20k+ general English dictionary, built by hand at Carnegie Mellon
  (extensively proofed and used).
- a 200k+ UCLA-proofed version of the Shoup dictionary.
- a 32k subset of the Dragon dictionary.
- a 53k+ dictionary of proper names, synthesiser-generated, unproofed.
- a 200k dictionary generated with Orator, unproofed.
- a 200k dictionary generated with MITalk, unproofed.

Entries that occur only in copyrighted sources, such as the Dragon
dictionary, are not currently included in this dictionary. If you have
words and transcriptions that you would like included in this unrestricted
resource, please send them to Robert L. Weide (weide@cs.cmu.edu) and we
will consider them for an upcoming version.

All of the above sources were preprocessed and the transcriptions in the
current cmudict.0.1 were selected from the transcriptions in the sources or
a combination thereof. We have removed some potentially unreliable
transcriptions from this dictionary, including those based on only one
source, and will reintroduce them once we have verified the transcriptions.
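
This README does not describe the selection procedure in detail. Purely as
an illustration of the idea of dropping transcriptions that are based on
only one source, the Python sketch below keeps a transcription when at
least two sources agree on it exactly. The data layout, the function name
and the min_sources threshold are all hypothetical, and this is not the
procedure actually used to build the dictionary.

```python
from collections import defaultdict


def select_transcriptions(sources, min_sources=2):
    """Keep transcriptions that appear in at least min_sources sources.

    sources maps a source name to a {word: [phone, ...]} lexicon.
    Returns {word: [transcription, ...]} for the supported transcriptions.
    """
    # Record which sources back each (word, transcription) pair.
    support = defaultdict(set)
    for name, lexicon in sources.items():
        for word, phones in lexicon.items():
            support[(word, tuple(phones))].add(name)

    kept = defaultdict(list)
    for (word, phones), names in support.items():
        if len(names) >= min_sources:
            kept[word].append(list(phones))
    return dict(kept)


if __name__ == "__main__":
    # Made-up source names and entries, for illustration only.
    sources = {
        "hand_built": {"CAT": ["K", "AE1", "T"], "DOG": ["D", "AO1", "G"]},
        "generated": {"CAT": ["K", "AE1", "T"], "DOG": ["D", "AA1", "G"]},
    }
    print(select_transcriptions(sources))
```

In the made-up data, the two sources agree on CAT but give different
transcriptions for DOG, so only CAT is kept.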

CMU does not guarantee the accuracy of this dictionary, nor its suitability
for any specific purpose. In fact, we expect a number of errors, omissions
and inconsistencies to remain in the current result. We intend to
continually update the dictionary as we make progress in correcting them.
We will make subsequent versions available via anonymous ftp, and those
who would like notification when updated versions are available should
send email to weide@cs.cmu.edu.

We welcome input from users: send e-mail to Robert L. Weide
(weide@cs.cmu.edu) if you have comments and suggestions on the content
of the dictionary.

The Carnegie Mellon Pronouncing Dictionary [cmudict.0.4 and all previous
versions] is Copyright 1993, 1994, and 1995 by Carnegie Mellon University.
Use of this dictionary for any research or commercial purpose is completely
unrestricted.  If you make use of or redistribute this material, we would
appreciate acknowledgement of its origin.

If you add words to or correct words in this dictionary, we would like
the additions and corrections sent to us (weide@cs) for consideration
in a subsequent version. All final entries will be approved by Robert L.
Weide, editor of the dictionary.
