This is Info file ../../info/lispref.info, produced by Makeinfo version 1.68 from the input file lispref.texi. Edition History: GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2) v3.2, April, May 1997 Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc. Copyright (C) 1995, 1996 Ben Wing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. 
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.  File: lispref.info, Node: Server Data, Next: Grabs, Prev: Resources, Up: X Server Data about the X Server ----------------------- This section describes functions and a variable that you can use to get information about the capabilities and origin of the X server corresponding to a particular device. The device argument is generally optional and defaults to the selected device. - Function: x-server-version &optional DEVICE This function returns the list of version numbers of the X server DEVICE is on. The returned value is a list of three integers: the major and minor version numbers of the X protocol in use, and the vendor-specific release number. - Function: x-server-vendor &optional DEVICE This function returns the vendor supporting the X server DEVICE is on. - Function: x-display-visual-class &optional DEVICE This function returns the visual class of the display DEVICE is on. The value is one of the symbols `static-gray', `gray-scale', `static-color', `pseudo-color', `true-color', and `direct-color'. (Note that this is different from previous versions of XEmacs, which returned `StaticGray', `GrayScale', etc.)  File: lispref.info, Node: Grabs, Prev: Server Data, Up: X Server Restricting Access to the Server by Other Apps ---------------------------------------------- - Function: x-grab-keyboard &optional DEVICE This function grabs the keyboard on the given device (defaulting to the selected one). So long as the keyboard is grabbed, all keyboard events will be delivered to XEmacs - it is not possible for other X clients to eavesdrop on them. Ungrab the keyboard with `x-ungrab-keyboard' (use an `unwind-protect'). 
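The advice to use `unwind-protect' can be sketched as follows. This is only an illustration, not part of XEmacs: the name `my-call-with-keyboard-grab' is hypothetical, and the code assumes an XEmacs running on an X device, where `x-grab-keyboard' and `x-ungrab-keyboard' are defined.

```elisp
;; Hypothetical helper; only `x-grab-keyboard' and `x-ungrab-keyboard'
;; come from this manual.
(defun my-call-with-keyboard-grab (function)
  "Call FUNCTION with no arguments while the keyboard is grabbed.
Return the value of FUNCTION.  Signal an error if the grab fails."
  (if (x-grab-keyboard)
      ;; The cleanup form runs even if FUNCTION exits non-locally
      ;; (an error or `C-g'), so the grab is never left in place.
      (unwind-protect
          (funcall function)
        (x-ungrab-keyboard))
    (error "Could not grab the keyboard")))
```

A possible use is `(my-call-with-keyboard-grab (lambda () (read-string "Passphrase: ")))'. Checking the return value of `x-grab-keyboard' first avoids releasing a grab that was never made.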
Returns `t' if the grab was successful; `nil' otherwise. - Function: x-ungrab-keyboard &optional DEVICE This function releases a keyboard grab made with `x-grab-keyboard'. - Function: x-grab-pointer &optional DEVICE CURSOR IGNORE-KEYBOARD This function grabs the pointer and restricts it to its current window. If the optional DEVICE argument is `nil', the selected device will be used. If the optional CURSOR argument is non-`nil', change the pointer shape to that until `x-ungrab-pointer' is called (it should be an object returned by the `make-cursor' function). If the third optional argument IGNORE-KEYBOARD is non-`nil', ignore all keyboard events during the grab. Returns `t' if the grab is successful, `nil' otherwise. - Function: x-ungrab-pointer &optional DEVICE This function releases a pointer grab made with `x-grab-pointer'. If the optional first argument DEVICE is `nil', the selected device is used. If it is `t', the pointer will be released on all X devices.  File: lispref.info, Node: X Miscellaneous, Prev: X Server, Up: X-Windows Miscellaneous X Functions and Variables ======================================= - Variable: x-bitmap-file-path This variable holds a list of the directories in which X bitmap files may be found. If `nil', this is initialized from the `"*bitmapFilePath"' resource. This is used by the `make-image-instance' function (however, note that if the environment variable `XBMLANGPATH' is set, it is consulted first). - Variable: x-library-search-path This variable holds the search path used by `read-color' to find `rgb.txt'. - Function: x-valid-keysym-name-p KEYSYM This function returns true if KEYSYM names a keysym that the X library knows about. Valid keysyms are listed in the files `/usr/include/X11/keysymdef.h' and in `/usr/lib/X11/XKeysymDB', or whatever the equivalents are on your system. - Function: x-window-id &optional FRAME This function returns the ID of the X11 window. 
This gives us a chance to manipulate the Emacs window from within a different program. Since the ID is an unsigned long, we return it as a string. - Variable: x-allow-sendevents If non-`nil', synthetic events are allowed. `nil' means they are ignored. Beware: allowing XEmacs to process SendEvents opens a big security hole. - Function: x-debug-mode ARG &optional DEVICE With a true arg, make the connection to the X server synchronous. With false, make it asynchronous. Synchronous connections are much slower, but are useful for debugging. (If you get X errors, make the connection synchronous, and use a debugger to set a breakpoint on `x_error_handler'. Your backtrace of the C stack will now be useful. In asynchronous mode, the stack above `x_error_handler' isn't helpful because of buffering.) If DEVICE is not specified, the selected device is assumed. Calling this function is the same as calling the C function `XSynchronize', or starting the program with the `-sync' command line argument. - Variable: x-debug-events If non-zero, debug information about events that XEmacs sees is displayed. Information is displayed on stderr. Currently defined values are: * 1 == non-verbose output * 2 == verbose output  File: lispref.info, Node: ToolTalk Support, Next: Internationalization, Prev: X-Windows, Up: Top ToolTalk Support **************** * Menu: * XEmacs ToolTalk API Summary:: * Sending Messages:: * Receiving Messages::  File: lispref.info, Node: XEmacs ToolTalk API Summary, Next: Sending Messages, Up: ToolTalk Support XEmacs ToolTalk API Summary =========================== The XEmacs Lisp interface to ToolTalk is similar, at least in spirit, to the standard C ToolTalk API. Only the message and pattern parts of the API are supported at present; more of the API could be added if needed. The Lisp interface departs from the C API in a few ways: * ToolTalk is initialized automatically at XEmacs startup-time. 
Messages can only be sent to other ToolTalk applications connected to the same X11 server that XEmacs is running on. * There are fewer entry points; polymorphic functions with keyword arguments are used instead. * The callback interface is simpler and marginally less functional. A single callback may be associated with a message or a pattern; the callback is specified with a Lisp symbol (the symbol should have a function binding). * The session attribute for messages and patterns is always initialized to the default session. * Anywhere a ToolTalk enum constant, e.g. `TT_SESSION', is valid, one can substitute the corresponding symbol, e.g. `'TT_SESSION'. This simplifies building lists that represent messages and patterns.  File: lispref.info, Node: Sending Messages, Next: Receiving Messages, Prev: XEmacs ToolTalk API Summary, Up: ToolTalk Support Sending Messages ================ * Menu: * Example of Sending Messages:: * Elisp Interface for Sending Messages::  File: lispref.info, Node: Example of Sending Messages, Next: Elisp Interface for Sending Messages, Up: Sending Messages Example of Sending Messages --------------------------- Here's a simple example that sends a query to another application and then displays its reply. Both the query and the reply are stored in the first argument of the message. (defun tooltalk-random-query-handler (msg) (let ((state (get-tooltalk-message-attribute msg 'state))) (cond ((eq state 'TT_HANDLED) (message (get-tooltalk-message-attribute msg 'arg_val 0))) ((memq state '(TT_FAILED TT_REJECTED)) (message "Random query turns up nothing"))))) (defvar random-query-message '( class TT_REQUEST scope TT_SESSION address TT_PROCEDURE op "random-query" args ((TT_INOUT "?" 
"string")) callback tooltalk-random-query-handler)) (let ((m (make-tooltalk-message random-query-message))) (send-tooltalk-message m))  File: lispref.info, Node: Elisp Interface for Sending Messages, Prev: Example of Sending Messages, Up: Sending Messages Elisp Interface for Sending Messages ------------------------------------ - Function: make-tooltalk-message ATTRIBUTES Create a ToolTalk message and initialize its attributes. The value of ATTRIBUTES must be a list of alternating keyword/values, where keywords are symbols that name valid message attributes. For example: (make-tooltalk-message '(class TT_NOTICE scope TT_SESSION address TT_PROCEDURE op "do-something" args ("arg1" 12345 (TT_INOUT "arg3" "string")))) Values must always be strings, integers, or symbols that represent ToolTalk constants. Attribute names are the same as those supported by `set-tooltalk-message-attribute', plus `args'. The value of `args' should be a list of message arguments where each message argument has the following form: `(mode [value [type]])' or just `value' Where MODE is one of `TT_IN', `TT_OUT', or `TT_INOUT' and TYPE is a string. If TYPE isn't specified then `int' is used if VALUE is a number; otherwise `string' is used. If TYPE is `string' then VALUE is converted to a string (if it isn't a string already) with `prin1-to-string'. If only a value is specified then MODE defaults to `TT_IN'. If MODE is `TT_OUT' then VALUE and TYPE don't need to be specified. You can find out more about the semantics and uses of ToolTalk message arguments in chapter 4 of the `ToolTalk Programmer's Guide'. - Function: send-tooltalk-message MSG Send the message on its way. Once the message has been sent it's almost always a good idea to get rid of it with `destroy-tooltalk-message'. - Function: return-tooltalk-message MSG &optional MODE Send a reply to this message. The second argument can be `reply', `reject' or `fail'; the default is `reply'. 
Before sending a reply, all message arguments whose mode is `TT_INOUT' or `TT_OUT' should have been filled in - see `set-tooltalk-message-attribute'. - Function: get-tooltalk-message-attribute MSG ATTRIBUTE &optional ARGN Returns the indicated ToolTalk message attribute. Attributes are identified by symbols with the same name (underscores and all) as the suffix of the ToolTalk `tt_message_' function that extracts the value. String attribute values are copied, and enumerated type values (except disposition) are converted to symbols (e.g. `TT_HANDLER' is `'TT_HANDLER'); `uid' and `gid' are represented by fixnums (small integers), `opnum' is converted to a string, and `disposition' is converted to a fixnum. We convert `opnum' (a C int) to a string (e.g. `123' => `"123"') because there's no guarantee that opnums will fit within the range of XEmacs Lisp integers. [TBD] Use the `plist' attribute instead of the C API `user' attribute for user-defined message data. To retrieve the value of a message property, specify the indicator for ARGN. For example, to get the value of a property called `rflag', use (get-tooltalk-message-attribute msg 'plist 'rflag) To get the value of a message argument use one of the `arg_val' (strings), `arg_ival' (integers), or `arg_bval' (strings with embedded nulls) attributes. For example, to get the integer value of the third argument: (get-tooltalk-message-attribute msg 'arg_ival 2) As you can see, argument numbers are zero-based. The type of each argument can be retrieved with the `arg_type' attribute; however, ToolTalk doesn't define any semantics for the string value of `arg_type'. Conventionally `string' is used for strings and `int' for 32-bit integers. Note that XEmacs Lisp stores the lengths of strings explicitly (unlike C) so treating the value returned by `arg_bval' like a string is fine. - Function: set-tooltalk-message-attribute VALUE MSG ATTRIBUTE &optional ARGN Initialize one ToolTalk message attribute. 
Attribute names and values are the same as for `get-tooltalk-message-attribute'. A property list is provided for user data (instead of the `user' message attribute); see `get-tooltalk-message-attribute'. Callbacks are handled slightly differently than in the C ToolTalk API. The value of CALLBACK should be the name of a function of one argument. It will be called each time the state of the message changes. This is usually used to notice when the message's state has changed to `TT_HANDLED' (or `TT_FAILED'), so that reply argument values can be used. If one of the argument attributes is specified as `arg_val', `arg_ival', or `arg_bval', then ARGN must be the number of an already created argument. Arguments can be added to a message with `add-tooltalk-message-arg'. - Function: add-tooltalk-message-arg MSG MODE TYPE &optional VALUE Append one new argument to the message. MODE must be one of `TT_IN', `TT_INOUT', or `TT_OUT', TYPE must be a string, and VALUE can be a string or an integer. ToolTalk doesn't define any semantics for TYPE, so only the participants in the protocol you're using need to agree what types mean (if anything). Conventionally `string' is used for strings and `int' for 32-bit integers. Arguments can be initialized by providing a value or with `set-tooltalk-message-attribute'; the latter is necessary if you want to initialize the argument with a string that can contain embedded nulls (use `arg_bval'). - Function: create-tooltalk-message Create a new ToolTalk message. The message's session attribute is initialized to the default session. Other attributes can be initialized with `set-tooltalk-message-attribute'. `make-tooltalk-message' is the preferred way to create and initialize a message. - Function: destroy-tooltalk-message MSG Apply `tt_message_destroy' to the message. It's not necessary to destroy messages after they've been processed by a message or pattern callback; the Lisp/ToolTalk callback machinery does this for you.  
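As an alternative to `make-tooltalk-message', a message can be assembled one attribute at a time with the functions described in this node. The sketch below is illustrative only: the function name `my-send-display-string' and the operation name `"emacs-display-string"' are assumptions made for the example, and the code can only work in an XEmacs built with ToolTalk support.

```elisp
;; Hypothetical helper built from the functions documented above.
(defun my-send-display-string (string)
  "Send STRING as the single argument of a ToolTalk notice."
  (let ((msg (create-tooltalk-message)))
    ;; `set-tooltalk-message-attribute' takes VALUE MSG ATTRIBUTE.
    (set-tooltalk-message-attribute 'TT_NOTICE msg 'class)
    (set-tooltalk-message-attribute 'TT_SESSION msg 'scope)
    (set-tooltalk-message-attribute 'TT_PROCEDURE msg 'address)
    (set-tooltalk-message-attribute "emacs-display-string" msg 'op)
    ;; `add-tooltalk-message-arg' takes MSG MODE TYPE VALUE.
    (add-tooltalk-message-arg msg 'TT_IN "string" string)
    (send-tooltalk-message msg)
    ;; No callback is attached, so the callback machinery will not
    ;; destroy the message; do it explicitly once it has been sent.
    (destroy-tooltalk-message msg)))
```

The same message could be built in one call by passing an attribute list to `make-tooltalk-message', which remains the preferred interface; the step-by-step form is mainly useful when attributes are computed separately.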
File: lispref.info, Node: Receiving Messages, Prev: Sending Messages, Up: ToolTalk Support Receiving Messages ================== * Menu: * Example of Receiving Messages:: * Elisp Interface for Receiving Messages::  File: lispref.info, Node: Example of Receiving Messages, Next: Elisp Interface for Receiving Messages, Up: Receiving Messages Example of Receiving Messages ----------------------------- Here's a simple example of a handler for a message that tells XEmacs to display a string in the mini-buffer area. The message operation is called `emacs-display-string'. Its first (0th) argument is the string to display. (defun tooltalk-display-string-handler (msg) (message (get-tooltalk-message-attribute msg 'arg_val 0))) (defvar display-string-pattern '(category TT_HANDLE scope TT_SESSION op "emacs-display-string" callback tooltalk-display-string-handler)) (let ((p (make-tooltalk-pattern display-string-pattern))) (register-tooltalk-pattern p))  File: lispref.info, Node: Elisp Interface for Receiving Messages, Prev: Example of Receiving Messages, Up: Receiving Messages Elisp Interface for Receiving Messages -------------------------------------- - Function: make-tooltalk-pattern ATTRIBUTES Create a ToolTalk pattern and initialize its attributes. The value of attributes must be a list of alternating keyword/values, where keywords are symbols that name valid pattern attributes or lists of valid attributes. For example: (make-tooltalk-pattern '(category TT_OBSERVE scope TT_SESSION op ("operation1" "operation2") args ("arg1" 12345 (TT_INOUT "arg3" "string")))) Attribute names are the same as those supported by `add-tooltalk-pattern-attribute', plus `'args'. Values must always be strings, integers, or symbols that represent ToolTalk constants or lists of same. When a list of values is provided all of the list elements are added to the attribute. In the example above, messages whose `op' attribute is `"operation1"' or `"operation2"' would match the pattern. 
The value of ARGS should be a list of pattern arguments where each pattern argument has the following form: `(mode [value [type]])' or just `value' Where MODE is one of `TT_IN', `TT_OUT', or `TT_INOUT' and TYPE is a string. If TYPE isn't specified then `int' is used if VALUE is a number; otherwise `string' is used. If TYPE is `string' then VALUE is converted to a string (if it isn't a string already) with `prin1-to-string'. If only a value is specified then MODE defaults to `TT_IN'. If MODE is `TT_OUT' then VALUE and TYPE don't need to be specified. You can find out more about the semantics and uses of ToolTalk pattern arguments in chapter 3 of the `ToolTalk Programmer's Guide'. - Function: register-tooltalk-pattern PAT XEmacs will begin receiving messages that match this pattern. - Function: unregister-tooltalk-pattern PAT XEmacs will stop receiving messages that match this pattern. - Function: add-tooltalk-pattern-attribute VALUE PAT INDICATOR Add one value to the indicated pattern attribute. The names of attributes are the same as the ToolTalk accessors used to set them, less the `tt_pattern_' prefix and the `_add' suffix. For example, the name of the attribute set by the `tt_pattern_disposition_add' function is `disposition'. The `category' attribute is handled specially, since a pattern can only be a member of one category (`TT_OBSERVE' or `TT_HANDLE'). Callbacks are handled slightly differently than in the C ToolTalk API. The value of CALLBACK should be the name of a function of one argument. It will be called each time the pattern matches an incoming message. - Function: add-tooltalk-pattern-arg PAT MODE TYPE VALUE Add one fully-specified argument to a ToolTalk pattern. MODE must be one of `TT_IN', `TT_INOUT', or `TT_OUT'. TYPE must be a string. VALUE can be an integer, string or `nil'. If VALUE is an integer then an integer argument (`tt_pattern_iarg_add') is added; otherwise a string argument is added. 
At present there's no way to add a binary data argument. - Function: create-tooltalk-pattern Create a new ToolTalk pattern and initialize its session attribute to be the default session. - Function: destroy-tooltalk-pattern PAT Apply `tt_pattern_destroy' to the pattern. This effectively unregisters the pattern. - Function: describe-tooltalk-message MSG &optional STREAM Print the message's attributes and arguments to STREAM. This is often useful for debugging.  File: lispref.info, Node: Internationalization, Next: MULE, Prev: ToolTalk Support, Up: Top Internationalization ******************** * Menu: * I18N Levels 1 and 2:: Support for different time, date, and currency formats. * I18N Level 3:: Support for localized messages. * I18N Level 4:: Support for Asian languages.  File: lispref.info, Node: I18N Levels 1 and 2, Next: I18N Level 3, Up: Internationalization I18N Levels 1 and 2 =================== XEmacs is now compliant with I18N levels 1 and 2. Specifically, this means that it is 8-bit clean and correctly handles time and date functions. XEmacs will correctly display the entire ISO-Latin 1 character set. The compose key may now be used to create any character in the ISO-Latin 1 character set not directly available via the keyboard. In order for the compose key to work it is necessary to load the file `x-compose.el'. At any time while composing a character, `C-h' will display all valid completions and the character which would be produced.  File: lispref.info, Node: I18N Level 3, Next: I18N Level 4, Prev: I18N Levels 1 and 2, Up: Internationalization I18N Level 3 ============ * Menu: * Level 3 Basics:: * Level 3 Primitives:: * Dynamic Messaging:: * Domain Specification:: * Documentation String Extraction::  File: lispref.info, Node: Level 3 Basics, Next: Level 3 Primitives, Up: I18N Level 3 Level 3 Basics -------------- XEmacs now provides alpha-level functionality for I18N Level 3. 
This means that everything necessary for full messaging is available, but not every file has been converted. The two message files which have been created are `src/emacs.po' and `lisp/packages/mh-e.po'. Both files need to be converted using `msgfmt', and the resulting `.mo' files placed in some locale's `LC_MESSAGES' directory. The test "translations" in these files are the original messages prefixed by `TRNSLT_'. The domain for a variable is stored on the variable's property list under the property name VARIABLE-DOMAIN. The function `documentation-property' uses this information when translating a variable's documentation.  File: lispref.info, Node: Level 3 Primitives, Next: Dynamic Messaging, Prev: Level 3 Basics, Up: I18N Level 3 Level 3 Primitives ------------------ - Function: gettext STRING This function looks up STRING in the default message domain and returns its translation. If `I18N3' was not enabled when XEmacs was compiled, it just returns STRING. - Function: dgettext DOMAIN STRING This function looks up STRING in the specified message domain and returns its translation. If `I18N3' was not enabled when XEmacs was compiled, it just returns STRING. - Function: bind-text-domain DOMAIN PATHNAME This function associates a pathname with a message domain. Here's how the path to the message file is constructed under SunOS 5.x: `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo' If `I18N3' was not enabled when XEmacs was compiled, this function does nothing. - Special Form: domain STRING This special form specifies the text domain used for translating documentation strings and interactive prompts of a function. For example, write: (defun foo (arg) "Doc string" (domain "emacs-foo") ...) to specify `emacs-foo' as the text domain of the function `foo'. The "call" to `domain' is actually a declaration rather than a function; when actually called, `domain' just returns `nil'. 
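The file name pattern given under `bind-text-domain' earlier in this node can be made concrete with a small helper. This is only a sketch: `my-message-file-name' is a hypothetical function, not part of XEmacs, and the directory and locale in the usage note are assumed values chosen for illustration.

```elisp
;; Hypothetical helper; it only spells out the SunOS 5.x pattern
;; `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo' quoted in this node.
(defun my-message-file-name (pathname lang domain)
  "Return the message file name for DOMAIN bound to PATHNAME under LANG."
  (format "%s/%s/LC_MESSAGES/%s.mo" pathname lang domain))

;; Bind the domain only where the primitive exists; when `I18N3' is
;; not enabled the manual says the call does nothing anyway.
(if (fboundp 'bind-text-domain)
    (bind-text-domain "emacs-foo" "/usr/local/lib/locale"))
```

With these assumed values, `(my-message-file-name "/usr/local/lib/locale" "ja" "emacs-foo")' returns `"/usr/local/lib/locale/ja/LC_MESSAGES/emacs-foo.mo"', which is where the `.mo' file produced by `msgfmt' would have to be installed for that locale.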
- Function: domain-of FUNCTION This function returns the text domain of FUNCTION; it returns `nil' if it is the default domain. If `I18N3' was not enabled when XEmacs was compiled, it always returns `nil'.  File: lispref.info, Node: Dynamic Messaging, Next: Domain Specification, Prev: Level 3 Primitives, Up: I18N Level 3 Dynamic Messaging ----------------- The `format' function has been extended to permit you to change the order of parameter insertion. For example, the conversion format `%1$s' inserts parameter one as a string, while `%2$s' inserts parameter two. This is useful when creating translations which require you to change the word order.  File: lispref.info, Node: Domain Specification, Next: Documentation String Extraction, Prev: Dynamic Messaging, Up: I18N Level 3 Domain Specification -------------------- The default message domain of XEmacs is `emacs'. For add-on packages, it is best to use a different domain. For example, let us say we want to convert the "gorilla" package to use the domain `emacs-gorilla'. To translate the message "What gorilla?", use `dgettext' as follows: (dgettext "emacs-gorilla" "What gorilla?") A function (or macro) which has a documentation string or an interactive prompt needs to be associated with the domain in order for the documentation or prompt to be translated. This is done with the `domain' special form as follows: (defun scratch (location) "Scratch the specified location." (domain "emacs-gorilla") (interactive "sScratch: ") ... ) It is most efficient to specify the domain in the first line of the function body, before the `interactive' form. For variables and constants which have documentation strings, specify the domain after the documentation. - Special Form: defvar SYMBOL [VALUE [DOC-STRING [DOMAIN]]] Example: (defvar weight 250 "Weight of gorilla, in pounds." 
"emacs-gorilla") - Special Form: defconst SYMBOL [VALUE [DOC-STRING [DOMAIN]]] Example: (defconst limbs 4 "Number of limbs" "emacs-gorilla") Autoloaded functions which are specified in `loaddefs.el' do not need to have a domain specification, because their documentation strings are extracted into the main message base. However, for autoloaded functions which are specified in a separate package, use following syntax: - Function: autoload SYMBOL FILENAME &optional DOCSTRING INTERACTIVE MACRO DOMAIN Example: (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")  File: lispref.info, Node: Documentation String Extraction, Prev: Domain Specification, Up: I18N Level 3 Documentation String Extraction ------------------------------- The utility `etc/make-po' scans the file `DOC' to extract documentation strings and creates a message file `doc.po'. This file may then be inserted within `emacs.po'. Currently, `make-po' is hard-coded to read from `DOC' and write to `doc.po'. In order to extract documentation strings from an add-on package, first run `make-docfile' on the package to produce the `DOC' file. Then run `make-po -p' with the `-p' argument to indicate that we are extracting documentation for an add-on package. (The `-p' argument is a kludge to make up for a subtle difference between pre-loaded documentation and add-on documentation: For add-on packages, the final carriage returns in the strings produced by `make-docfile' must be ignored.)  File: lispref.info, Node: I18N Level 4, Prev: I18N Level 3, Up: Internationalization I18N Level 4 ============ The Asian-language support in XEmacs is called "MULE". *Note MULE::.  File: lispref.info, Node: MULE, Next: Tips, Prev: Internationalization, Up: Top MULE **** "MULE" is the name originally given to the version of GNU Emacs extended for multi-lingual (and in particular Asian-language) support. "MULE" is short for "MUlti-Lingual Emacs". 
It was originally called Nemacs ("Nihon Emacs" where "Nihon" is the Japanese word for "Japan"), when it only provided support for Japanese. XEmacs refers to its multi-lingual support as "MULE support" since it is based on "MULE". * Menu: * Internationalization Terminology:: Definition of various internationalization terms. * Charsets:: Sets of related characters. * MULE Characters:: Working with characters in XEmacs/MULE. * Composite Characters:: Making new characters by overstriking other ones. * ISO 2022:: An international standard for charsets and encodings. * Coding Systems:: Ways of representing a string of chars using integers. * CCL:: A special language for writing fast converters. * Category Tables:: Subdividing charsets into groups.  File: lispref.info, Node: Internationalization Terminology, Next: Charsets, Up: MULE Internationalization Terminology ================================ In internationalization terminology, a string of text is divided up into "characters", which are the printable units that make up the text. A single character is (for example) a capital `A', the number `2', a Katakana character, a Kanji ideograph (an "ideograph" is a "picture" character, such as is used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there are thousands of such ideographs in each language), etc. The basic property of a character is its shape. Note that the same character may be drawn by two different people (or in two different fonts) in slightly different ways, although the basic shape will be the same. In some cases, the differences will be significant enough that it is actually possible to identify two or more distinct shapes that both represent the same character. For example, the lowercase letters `a' and `g' each have two distinct possible shapes - the `a' can optionally have a curved tail projecting off the top, and the `g' can be formed either of two loops, or of one loop and a tail hanging off the bottom. 
Such distinct possible shapes of a character are called "glyphs". The important characteristic of two glyphs making up the same character is that the choice between one or the other is purely stylistic and has no linguistic effect on a word (this is the reason why a capital `A' and lowercase `a' are different characters rather than different glyphs - e.g. `Aspen' is a city while `aspen' is a kind of tree). Note that "character" and "glyph" are used differently here than elsewhere in XEmacs. A "character set" is simply a set of related characters. ASCII, for example, is a set of 94 characters (or 128, if you count non-printing characters). Other character sets are ISO8859-1 (ASCII plus various accented characters and other international symbols), JISX0201 (ASCII, more or less, plus half-width Katakana), JISX0208 (Japanese Kanji), JISX0212 (a second set of less-used Japanese Kanji), GB2312 (Mainland Chinese Hanzi), etc. Every character set has one or more "orderings", which can be viewed as a way of assigning a number (or set of numbers) to each character in the set. For most character sets, there is a standard ordering, and in fact all of the character sets mentioned above define a particular ordering. ASCII, for example, places letters in their "natural" order, puts uppercase letters before lowercase letters, numbers before letters, etc. Note that for many of the Asian character sets, there is no natural ordering of the characters. The actual orderings are based on one or more salient characteristics, of which there are many to choose from - e.g. number of strokes, common radicals, phonetic ordering, etc. The numbers assigned to any particular character are called the character's "position codes". The number of position codes required to index a particular character in a character set is called the "dimension" of the character set. 
ASCII, being a relatively small character set, is of dimension one, and each character in the set is indexed using a single position code, in the range 0 through 127 (if non-printing characters are included) or 33 through 126 (if only the printing characters are considered). JISX0208, i.e. Japanese Kanji, has thousands of characters, and is of dimension two - every character is indexed by two position codes, each in the range 33 through 126. (Note that the choice of the range here is somewhat arbitrary. Although a character set such as JISX0208 defines an *ordering* of all its characters, it does not define the actual mapping between numbers and characters. You could just as easily index the characters in JISX0208 using numbers in the range 0 through 93, 1 through 94, 2 through 95, etc. The reason for the actual range chosen is so that the position codes match up with the actual values used in the common encodings.) An "encoding" is a way of numerically representing characters from one or more character sets into a stream of like-sized numerical values called "words"; typically these are 8-bit, 16-bit, or 32-bit quantities. If an encoding encompasses only one character set, then the position codes for the characters in that character set could be used directly. (This is the case with ASCII, and as a result, most people do not understand the difference between a character set and an encoding.) This is not possible, however, if more than one character set is to be used in the encoding. For example, printed Japanese text typically requires characters from multiple character sets - ASCII, JISX0208, and JISX0212, to be specific. Each of these is indexed using one or more position codes in the range 33 through 126, so the position codes could not be used directly or there would be no way to tell which character was meant. 
Different Japanese encodings resolve this in different ways - JIS uses
special escape characters to denote different character sets; EUC sets
the high bit of the position codes for JISX0208 and JISX0212, and puts
a special extra byte before each JISX0212 character; etc.  (JIS, EUC,
and most of the other encodings you will encounter are 7-bit or 8-bit
encodings.  There is one common 16-bit encoding, which is Unicode; this
strives to represent all the world's characters in a single large
character set.  32-bit encodings are generally used internally in
programs to simplify the code that manipulates them; however, they are
not much used externally because they are not very space-efficient.)

   Encodings are classified as either "modal" or "non-modal".

   In a "modal encoding", there are multiple states that the encoding
can be in, and the interpretation of the values in the stream depends
on the current global state of the encoding.  Special values in the
encoding, called "escape sequences", are used to change the global
state.  JIS, for example, is a modal encoding.  The bytes `ESC $ B'
indicate that, from then on, bytes are to be interpreted as position
codes for JISX0208, rather than as ASCII.  This effect is cancelled
using the bytes `ESC ( B', which mean "switch from whatever the current
state is to ASCII".  To switch to JISX0212, the escape sequence
`ESC $ ( D' is used.  (Note that here, as is common, the escape
sequences do in fact begin with `ESC'.  This is not necessarily the
case, however.)

   A "non-modal encoding" has no global state that extends past the
character currently being interpreted.  EUC, for example, is a
non-modal encoding.  Characters in JISX0208 are encoded by setting the
high bit of the position codes, and characters in JISX0212 are encoded
by doing the same but also prefixing the character with the byte 0x8F.
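
   As a rough illustration of the two styles, the following sketch
builds the byte sequences described above from a pair of position
codes.  The `example-' function names are hypothetical and are not part
of XEmacs; the code uses only integer arithmetic and returns each byte
sequence as a list of integers.

```elisp
;; Hypothetical helpers, not part of XEmacs.  They mirror the prose above
;; and represent each byte as a plain integer.

(defun example-euc-bytes-from-jisx0208 (pos1 pos2)
  "Return the EUC bytes for a JISX0208 character with codes POS1 and POS2.
EUC is non-modal: the high bit alone marks the bytes as JISX0208."
  (list (logior pos1 128) (logior pos2 128)))

(defun example-euc-bytes-from-jisx0212 (pos1 pos2)
  "Return the EUC bytes for a JISX0212 character with codes POS1 and POS2.
The bytes are as for JISX0208, but prefixed with the byte 0x8F (143)."
  (list 143 (logior pos1 128) (logior pos2 128)))

(defun example-jis-bytes-from-jisx0208 (pos1 pos2)
  "Return the JIS bytes for one JISX0208 character surrounded by ASCII.
JIS is modal: `ESC $ B' (27 36 66) switches to JISX0208 and
`ESC ( B' (27 40 66) switches back to ASCII."
  (list 27 36 66 pos1 pos2 27 40 66))
```

   For the sample position codes 36 and 34,
`(example-euc-bytes-from-jisx0208 36 34)' returns `(164 162)', while
`(example-jis-bytes-from-jisx0208 36 34)' returns
`(27 36 66 36 34 27 40 66)': the position codes appear unchanged, and
only the surrounding escape sequences say how to interpret them.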

   The advantage of a modal encoding is that it is generally more
space-efficient, and is easily extensible because an essentially
arbitrary number of escape sequences can be created.  The disadvantage,
however, is that it is much more difficult to work with if it is not
being processed in a sequential manner.  In the non-modal EUC encoding,
for example, the byte 0x41 always refers to the letter `A'; whereas in
JIS, it could either be the letter `A', or one of the two position
codes in a JISX0208 character, or one of the two position codes in a
JISX0212 character.  Determining exactly which one is meant could be
difficult and time-consuming if the previous bytes in the string have
not already been processed.

   Non-modal encodings are further divided into "fixed-width" and
"variable-width" formats.  A fixed-width encoding always uses the same
number of words per character, whereas a variable-width encoding does
not.  EUC is a good example of a variable-width encoding: one to three
bytes are used per character, depending on the character set.  16-bit
and 32-bit encodings are nearly always fixed-width, and this is in fact
one of the main reasons for using an encoding with a larger word size.
The advantages of fixed-width encodings should be obvious.  The
advantages of variable-width encodings are that they are generally more
space-efficient and allow for compatibility with existing 8-bit
encodings such as ASCII.

   Note that the bytes in an 8-bit encoding are often referred to as
"octets" rather than simply as bytes.  This terminology dates back to
the days before 8-bit bytes were universal, when some computers had
9-bit bytes, others had 10-bit bytes, etc.


File: lispref.info, Node: Charsets, Next: MULE Characters, Prev: Internationalization Terminology, Up: MULE

Charsets
========

   A "charset" in MULE is an object that encapsulates a particular
character set as well as an ordering of those characters.
Charsets are permanent objects and are named using symbols, like faces.

 - Function: charsetp OBJECT
     This function returns non-`nil' if OBJECT is a charset.

* Menu:

* Charset Properties::          Properties of a charset.
* Basic Charset Functions::     Functions for working with charsets.
* Charset Property Functions::  Functions for accessing charset properties.
* Predefined Charsets::         Predefined charset objects.


File: lispref.info, Node: Charset Properties, Next: Basic Charset Functions, Up: Charsets

Charset Properties
------------------

   Charsets have the following properties:

`name'
     A symbol naming the charset.  Every charset must have a different
     name; this allows a charset to be referred to using its name
     rather than the actual charset object.

`doc-string'
     A documentation string describing the charset.

`registry'
     A regular expression matching the font registry field for this
     character set.  For example, both the `ascii' and
     `latin-iso8859-1' charsets use the registry `"ISO8859-1"'.  This
     field is used to choose an appropriate font when the user gives a
     general font specification such as `-*-courier-medium-r-*-140-*',
     i.e. a 14-point upright medium-weight Courier font.

`dimension'
     Number of position codes used to index a character in the
     character set.  XEmacs/MULE can only handle character sets of
     dimension 1 or 2.  This property defaults to 1.

`chars'
     Number of characters in each dimension.  In XEmacs/MULE, the only
     allowed values are 94 or 96.  (There are a couple of pre-defined
     character sets, such as ASCII, that do not follow this, but you
     cannot define new ones like this.)  Defaults to 94.  Note that if
     the dimension is 2, the character set thus described is 94x94 or
     96x96.

`columns'
     Number of columns used to display a character in this charset.
     Only used in TTY mode.  (Under X, the actual width of a character
     can be derived from the font used to display the characters.)  If
     unspecified, defaults to the dimension.
     (This is almost always the correct value, because character sets
     with dimension 2 are usually ideograph character sets, which need
     two columns to display the intricate ideographs.)

`direction'
     A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left).
     Defaults to `l2r'.  This specifies the direction that the text
     should be displayed in, and will be left-to-right for most
     charsets but right-to-left for Hebrew and Arabic.  (Right-to-left
     display is not currently implemented.)

`final'
     Final byte of the standard ISO 2022 escape sequence designating
     this charset.  Must be supplied.  Each combination of (DIMENSION,
     CHARS) defines a separate namespace for final bytes, and each
     charset within a particular namespace must have a different final
     byte.  Note that ISO 2022 restricts the final byte to the range
     0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2.
     Note also that final bytes in the range 0x30 - 0x3F are reserved
     for user-defined (not official) character sets.  For more
     information on ISO 2022, see *Note Coding Systems::.

`graphic'
     0 (use left half of font on output) or 1 (use right half of font
     on output).  Defaults to 0.  This specifies how to convert the
     position codes that index a character in a character set into an
     index into the font used to display the character set.  With
     `graphic' set to 0, position codes 33 through 126 map to font
     indices 33 through 126; with it set to 1, position codes 33
     through 126 map to font indices 161 through 254 (i.e. the same
     number but with the high bit set).  For example, for a font whose
     registry is ISO8859-1, the left half of the font (octets 0x20 -
     0x7F) is the `ascii' charset, while the right half (octets 0xA0 -
     0xFF) is the `latin-iso8859-1' charset.

`ccl-program'
     A compiled CCL program used to convert a character in this charset
     into an index into the font.  This is in addition to the `graphic'
     property.
     If a CCL program is defined, the position codes of a character
     will first be processed according to `graphic' and then passed
     through the CCL program, with the resulting values used to index
     the font.

     This is used, for example, in the Big5 character set (used in
     Taiwan).  This character set is not ISO-2022-compliant, and its
     size (94x157) does not fit within the maximum 96x96 size of
     ISO-2022-compliant character sets.  As a result, XEmacs/MULE
     splits it (in a rather complex fashion, so as to group the most
     commonly used characters together) into two charset objects
     (`big5-1' and `big5-2'), each of size 94x94, and each charset
     object uses a CCL program to convert the modified position codes
     back into standard Big5 indices to retrieve a character from a
     Big5 font.

   Most of the above properties can only be changed when the charset is
created.  *Note Charset Property Functions::.


File: lispref.info, Node: Basic Charset Functions, Next: Charset Property Functions, Prev: Charset Properties, Up: Charsets

Basic Charset Functions
-----------------------

 - Function: find-charset CHARSET-OR-NAME
     This function retrieves the charset of the given name.  If
     CHARSET-OR-NAME is a charset object, it is simply returned.
     Otherwise, CHARSET-OR-NAME should be a symbol.  If there is no
     such charset, `nil' is returned.  Otherwise the associated charset
     object is returned.

 - Function: get-charset NAME
     This function retrieves the charset of the given name.  Same as
     `find-charset' except an error is signalled if there is no such
     charset instead of returning `nil'.

 - Function: charset-list
     This function returns a list of the names of all defined charsets.

 - Function: make-charset NAME DOC-STRING PROPS
     This function defines a new character set.  This function is for
     use with Mule support.  NAME is a symbol, the name by which the
     character set is normally referred.  DOC-STRING is a string
     describing the character set.  PROPS is a property list,
     describing the specific nature of the character set.
     The recognized properties are `registry', `dimension', `columns',
     `chars', `final', `graphic', `direction', and `ccl-program', as
     previously described.

 - Function: make-reverse-direction-charset CHARSET NEW-NAME
     This function makes a charset equivalent to CHARSET but which goes
     in the opposite direction.  NEW-NAME is the name of the new
     charset.  The new charset is returned.

 - Function: charset-from-attributes DIMENSION CHARS FINAL &optional
          DIRECTION
     This function returns a charset with the given DIMENSION, CHARS,
     FINAL, and DIRECTION.  If DIRECTION is omitted, both directions
     will be checked (left-to-right will be returned if character sets
     exist for both directions).

 - Function: charset-reverse-direction-charset CHARSET
     This function returns the charset (if any) with the same dimension,
     number of characters, and final byte as CHARSET, but which is
     displayed in the opposite direction.


File: lispref.info, Node: Charset Property Functions, Next: Predefined Charsets, Prev: Basic Charset Functions, Up: Charsets

Charset Property Functions
--------------------------

   All of these functions accept either a charset name or charset
object.

 - Function: charset-property CHARSET PROP
     This function returns property PROP of CHARSET.  *Note Charset
     Properties::.

   Convenience functions are also provided for retrieving individual
properties of a charset.

 - Function: charset-name CHARSET
     This function returns the name of CHARSET.  This will be a symbol.

 - Function: charset-doc-string CHARSET
     This function returns the doc string of CHARSET.

 - Function: charset-registry CHARSET
     This function returns the registry of CHARSET.

 - Function: charset-dimension CHARSET
     This function returns the dimension of CHARSET.

 - Function: charset-chars CHARSET
     This function returns the number of characters per dimension of
     CHARSET.

 - Function: charset-columns CHARSET
     This function returns the number of display columns per character
     (in TTY mode) of CHARSET.

 - Function: charset-direction CHARSET
     This function returns the display direction of CHARSET - either
     `l2r' or `r2l'.

 - Function: charset-final CHARSET
     This function returns the final byte of the ISO 2022 escape
     sequence designating CHARSET.

 - Function: charset-graphic CHARSET
     This function returns either 0 or 1, depending on whether the
     position codes of characters in CHARSET map to the left or right
     half of their font, respectively.

 - Function: charset-ccl-program CHARSET
     This function returns the CCL program, if any, for converting
     position codes of characters in CHARSET into font indices.

   The only property of a charset that can currently be set after the
charset has been created is the CCL program.

 - Function: set-charset-ccl-program CHARSET CCL-PROGRAM
     This function sets the `ccl-program' property of CHARSET to
     CCL-PROGRAM.


File: lispref.info, Node: Predefined Charsets, Prev: Charset Property Functions, Up: Charsets

Predefined Charsets
-------------------

   The following charsets are predefined in the C code.

     Name                    Type  Fi Gr Dir Registry
     --------------------------------------------------------------
     ascii                    94    B  0  l2r ISO8859-1
     control-1                94       0  l2r ---
     latin-iso8859-1          94    A  1  l2r ISO8859-1
     latin-iso8859-2          96    B  1  l2r ISO8859-2
     latin-iso8859-3          96    C  1  l2r ISO8859-3
     latin-iso8859-4          96    D  1  l2r ISO8859-4
     cyrillic-iso8859-5       96    L  1  l2r ISO8859-5
     arabic-iso8859-6         96    G  1  r2l ISO8859-6
     greek-iso8859-7          96    F  1  l2r ISO8859-7
     hebrew-iso8859-8         96    H  1  r2l ISO8859-8
     latin-iso8859-9          96    M  1  l2r ISO8859-9
     thai-tis620              96    T  1  l2r TIS620
     katakana-jisx0201        94    I  1  l2r JISX0201.1976
     latin-jisx0201           94    J  0  l2r JISX0201.1976
     japanese-jisx0208-1978   94x94 @  0  l2r JISX0208.1978
     japanese-jisx0208        94x94 B  0  l2r JISX0208.19(83|90)
     japanese-jisx0212        94x94 D  0  l2r JISX0212
     chinese-gb2312           94x94 A  0  l2r GB2312
     chinese-cns11643-1       94x94 G  0  l2r CNS11643.1
     chinese-cns11643-2       94x94 H  0  l2r CNS11643.2
     chinese-big5-1           94x94 0  0  l2r Big5
     chinese-big5-2           94x94 1  0  l2r Big5
     korean-ksc5601           94x94 C  0  l2r KSC5601
     composite                96x96    0  l2r ---

   The following charsets are predefined in the Lisp code.

     Name                    Type  Fi Gr Dir Registry
     --------------------------------------------------------------
     arabic-digit             94    2  0  l2r MuleArabic-0
     arabic-1-column          94    3  0  r2l MuleArabic-1
     arabic-2-column          94    4  0  r2l MuleArabic-2
     sisheng                  94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
     chinese-cns11643-3       94x94 I  0  l2r CNS11643.1
     chinese-cns11643-4       94x94 J  0  l2r CNS11643.1
     chinese-cns11643-5       94x94 K  0  l2r CNS11643.1
     chinese-cns11643-6       94x94 L  0  l2r CNS11643.1
     chinese-cns11643-7       94x94 M  0  l2r CNS11643.1
     ethiopic                 94x94 2  0  l2r Ethio
     ascii-r2l                94    B  0  r2l ISO8859-1
     ipa                      96    0  1  l2r MuleIPA
     vietnamese-lower         96    1  1  l2r VISCII1.1
     vietnamese-upper         96    2  1  l2r VISCII1.1

   For all of the above charsets, the dimension and number of columns
are the same.

   Note that ASCII, Control-1, and Composite are handled specially.
This is why some of the fields are blank; and some of the filled-in
fields (e.g. the type) are not really accurate.
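
   The entries in these tables can be cross-checked from Lisp with the
functions described in *Note Charset Property Functions::.  The
following is only a sketch: it assumes an XEmacs built with MULE
support, and `example-describe-charset' is a hypothetical helper name
that is not part of XEmacs.

```elisp
;; Assumes an XEmacs with MULE support.  `example-describe-charset' is a
;; hypothetical helper; every function it calls is documented in this
;; chapter.
(defun example-describe-charset (charset)
  "Return a property list summarizing CHARSET, a charset object or name."
  (let ((cs (get-charset charset)))   ; signals an error for an unknown charset
    (list 'name      (charset-name cs)
          'dimension (charset-dimension cs)
          'chars     (charset-chars cs)
          'final     (charset-final cs)
          'graphic   (charset-graphic cs)
          'direction (charset-direction cs)
          'registry  (charset-registry cs))))

;; Summarize one of the charsets predefined in the C code.
(example-describe-charset 'japanese-jisx0208)
```

   A Type entry of 94x94 in the tables corresponds to a `dimension' of
2 and a `chars' of 94, and the Fi, Gr, and Dir columns correspond to
the `final', `graphic', and `direction' properties.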

File: lispref.info, Node: MULE Characters, Next: Composite Characters, Prev: Charsets, Up: MULE

MULE Characters
===============

 - Function: make-char CHARSET ARG1 &optional ARG2
     This function makes a multi-byte character from CHARSET and octets
     ARG1 and ARG2.

 - Function: char-charset CH
     This function returns the character set of char CH.

 - Function: char-octet CH &optional N
     This function returns the octet (i.e. position code) numbered N
     (should be 0 or 1) of char CH.  N defaults to 0 if omitted.

 - Function: find-charset-region START END &optional BUFFER
     This function returns a list of the charsets in the region between
     START and END.  BUFFER defaults to the current buffer if omitted.

 - Function: find-charset-string STRING
     This function returns a list of the charsets in STRING.
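
   As a hedged sketch of how these functions fit together, the
following builds a character from a charset and two position codes and
then takes it apart again.  It assumes an XEmacs built with MULE
support; `example-char-summary' is a hypothetical helper name, and the
position codes 36 and 34 are arbitrary sample values in the range 33
through 126.

```elisp
;; Assumes an XEmacs with MULE support.  `example-char-summary' is a
;; hypothetical helper, not part of XEmacs.
(defun example-char-summary (ch)
  "Return a list of the charset name and the two octets of char CH."
  (list (charset-name (char-charset ch)) ; name of the charset CH belongs to
        (char-octet ch 0)                ; octet (position code) number 0
        (char-octet ch 1)))              ; octet (position code) number 1

;; Build a character of dimension two and inspect it.
(example-char-summary (make-char 'japanese-jisx0208 36 34))

;; List the charsets used by the characters of a one-character string.
(find-charset-string
 (char-to-string (make-char 'japanese-jisx0208 36 34)))
```

   For a charset of dimension one, such as `latin-iso8859-1', only
ARG1 is needed when calling `make-char'.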