Tru64 UNIX Technical Reference for Using Chinese Features

July 1999
This guide provides Chinese-specific information and describes the Chinese
features supported on the Compaq Tru64 UNIX (formerly DIGITAL UNIX) system.
Software Version:
Tru64 UNIX Version 5.0 or higher
Compaq Computer Corporation
Houston, Texas
Compaq Computer Corporation makes no representation that the use of its products in the manner described in this
publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the
granting of licenses to make, use, or sell equipment or software in accordance with the description.
Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid written license
from Compaq or an authorized sublicensor.
No responsibility is assumed for the use or reliability of software on equipment that is not supported by Compaq Computer
Corporation or its affiliated companies.
© Digital Equipment Corporation 1999.
All rights reserved.
Compaq, the Compaq logo, and the Digital logo are registered in the U.S. Patent and Trademark Office.
The following are trademarks of Compaq Computer Corporation:
ALL-IN-ONE Alpha, AXP, AlphaGeneration, AlphaServer, AlphaStation, AXP, Bookreader, CDA, DDIS, DEC, DEC Ada,
DEC Fortran, DEC FUSE, DECnet, DECstation, DECsystem, DECterm, DECUS, DECwindows, DTIF, MASSBUS,
MicroVAX, OpenVMS, POLYCENTER, Q-bus, StorageWorks, TruCluster, TURBOchannel, ULTRIX, ULTRIX Mail Connection,
ULTRIX Worksystem Software, UNIBUS, VAX, VAXstation, VMS, XUI, and the DIGITAL Logo.
PostScript and Display PostScript are registered trademarks of Adobe Systems, Inc.
Open Software Foundation, OSF, OSF/1, OSF/Motif, and Motif are trademarks of the Open Software Foundation, Inc. UNIX
is a registered trademark in the United States and other countries licensed exclusively through X/Open Company, Ltd.
All other trademarks and registered trademarks are the property of their respective holders.
Table of Contents
Preface
1 Character Sets............................................................................................ 1–1
1.1 CNS 11643 ............................................................................................................................ 1–1
1.2 DTSCS .................................................................................................................................. 1–4
1.3 Big–5..................................................................................................................................... 1–5
1.4 GB2312–80 ........................................................................................................................... 1–6
1.5 Extended GB ......................................................................................................................... 1–7
1.6 Unicode ................................................................................................................................. 1–7
1.7 ISO/IEC 10646 ...................................................................................................................... 1–7
2 Codesets and Codeset Conversion ......................................................... 2–1
2.1 DEC Hanyu ........................................................................................................................... 2–1
2.1.1 ASCII Code ............................................................................................................... 2–2
2.1.2 CNS 11643 Code ......................................................................................................... 2–2
2.1.3 DTSCS Code ............................................................................................................... 2–3
2.1.4 User–Defined Characters ............................................................................................. 2–4
2.2 Taiwanese EUC ..................................................................................................................... 2–5
2.3 Big–5..................................................................................................................................... 2–7
2.4 DEC Hanzi ............................................................................................................................ 2–8
2.5 Shift Big–5 .......................................................................................................................... 2–10
2.6 Telecode .............................................................................................................................. 2–11
2.6.1 Plane 1 Character Encoding ....................................................................................... 2–12
2.6.2 Plane 2 Character Encoding ....................................................................................... 2–12
2.7 UCS–4/UCS–2 .................................................................................................................... 2–12
2.8 UTF–8................................................................................................................................. 2–13
2.9 Codeset Conversion............................................................................................................. 2–13
2.9.1 Default Conversion String.......................................................................................... 2–14
2.9.2 One-to-Many Conversion........................................................................................... 2–15
2.9.3 User–Defined Character Mappings ............................................................................ 2–16
2.10 Codeset for Peripheral Devices .......................................................................................... 2–16
3 Locales ...................................................................................................3–1
4 Local Language Devices ...........................................................................4–1
4.1 Terminals .............................................................................................................................. 4–1
4.2 Printers .................................................................................................................................. 4–1
5 Fonts ...................................................................................................5–1
5.1 DECwindows Fonts ............................................................................................................... 5–1
5.1.1 XLFD Font Names....................................................................................................... 5–2
5.1.2 Bitmap Font Samples................................................................................................... 5–4
5.1.3 Font Encodings ............................................................................................................ 5–7
5.1.4 Specifying Fonts in DECwindows Applications ......................................................... 5–11
5.2 Outline Fonts....................................................................................................................... 5–11
5.2.1 XLFD Font Names of Chinese Outline Fonts ............................................................. 5–12
6 Keyboards...................................................................................................6–1
7 Input Methods.............................................................................................7–1
7.1 Activating and Deactivating Chinese Input Methods.............................................................. 7–2
7.1.1 Character–Cell Terminal Applications ......................................................................... 7–2
7.1.2 DECwindows Motif Applications ................................................................................ 7–2
7.1.3 CDE Applications........................................................................................................ 7–4
7.2 Switching Input Method ........................................................................................................ 7–5
7.3 Motif Interface Input Method ................................................................................................ 7–7
7.3.1 Input Areas.................................................................................................................. 7–7
7.3.2 Interaction Styles ......................................................................................................... 7–7
7.3.2.1 Root Window Interaction............................................................................... 7–7
7.3.2.2 Off-the-Spot Interaction................................................................................. 7–9
7.3.3 Input Server Operations ............................................................................................. 7–10
7.3.4 Options Menu ............................................................................................................ 7–10
7.3.4.1 Vertical Layout............................................................................................ 7–11
7.3.4.2 Horizontal Layout........................................................................................ 7–11
7.3.4.3 Select Phrase Input Class ............................................................................. 7–11
7.3.4.4 User Phrase Database................................................................................... 7–11
7.3.4.5 System Phrase Database............................................................................... 7–12
7.3.4.6 Current Window .......................................................................................... 7–12
7.3.4.7 Input Method Customization........................................................................ 7–14
7.3.4.8 Help............................................................................................................. 7–18
7.3.4.9 Quit ............................................................................................................. 7–18
7.3.5 Saving Your New Settings ......................................................................................... 7–18
7.4 Alphabetic Input Methods.................................................................................................... 7–19
7.5 Tsang–Chi Input Method ..................................................................................................... 7–19
7.5.1 Tsang–Chi Root Radicals........................................................................................... 7–19
7.5.2 Tsang–Chi Code Generation ...................................................................................... 7–24
7.5.2.1 General Rules .............................................................................................. 7–24
7.5.2.2 Connected Characters .................................................................................. 7–25
7.5.2.3 Composite Characters .................................................................................. 7–26
7.5.2.4 Exceptional Characters ................................................................................ 7–28
7.5.3 Invoking Tsang–Chi Input Method............................................................................. 7–30
7.5.4 Multiple Candidates ................................................................................................... 7–30
7.5.5 Repeat Character Input............................................................................................... 7–31
7.5.6 Error Handling ........................................................................................................... 7–32
7.6 Quick Tsang–Chi Input Method........................................................................................... 7–32
7.6.1 Quick Tsang–Chi Code Generation ............................................................................ 7–32
7.6.2 Invoking Quick Tsang–Chi Input Method .................................................................. 7–32
7.6.3 Entering Quick Tsang–Chi Code................................................................................ 7–32
7.6.4 Multiple Candidates ................................................................................................... 7–33
7.6.5 Repeat Character Input............................................................................................... 7–33
7.6.6 Error Handling ........................................................................................................... 7–33
7.7 Phonetic Input Method......................................................................................................... 7–33
7.7.1 Phonetic Symbol Categories....................................................................................... 7–33
7.7.2 Phonetic Code Generation.......................................................................................... 7–35
7.7.3 Invoking Phonetic Input Method ................................................................................ 7–35
7.7.4 Entering Phonetic Code.............................................................................................. 7–36
7.7.5 Multiple Candidates ................................................................................................... 7–36
7.7.6 Repeat Character Input............................................................................................... 7–37
7.7.7 Error Handling ........................................................................................................... 7–37
7.8 Internal Code Input Method ................................................................................................. 7–37
7.8.1 Input Procedure.......................................................................................................... 7–37
7.8.2 Repeat Character Input............................................................................................... 7–38
7.8.3 Error Handling ........................................................................................................... 7–38
7.9 Phrase Input Method............................................................................................................ 7–38
7.9.1 Input Procedure.......................................................................................................... 7–39
7.9.2 Error Handling ........................................................................................................... 7–40
7.9.2.1 Condition 1.................................................................................................. 7–40
7.9.2.2 Condition 2.................................................................................................. 7–40
7.10 Symbol Input in dxhanyuim .............................................................................................. 7–41
7.10.1 Invoking Symbol Input Mode .................................................................................... 7–41
7.10.2 Rules of Symbol Input ............................................................................................... 7–41
7.10.3 Entering Symbol Code ............................................................................................... 7–43
7.10.4 Multiple Candidates................................................................................................... 7–44
7.10.5 Error Handling........................................................................................................... 7–44
7.11 Input of User–Defined Characters in DECwindows Motif.................................................. 7–44
7.12 5–Stroke Input Method ...................................................................................................... 7–46
7.12.1 Input Mechanism ....................................................................................................... 7–46
7.12.1.1 Single Character Input ................................................................................. 7–47
7.12.1.2 Input of Terms............................................................................................. 7–47
7.12.2 Procedure................................................................................................................... 7–49
7.12.3 Multiple Candidates................................................................................................... 7–49
7.12.4 The Association Mode ............................................................................................... 7–50
7.12.5 Wildcard Key ............................................................................................................ 7–50
7.13 5–Shape Input Method....................................................................................................... 7–51
7.13.1 Distribution of Radicals ............................................................................................. 7–51
7.13.2 Decomposition of Chinese Characters........................................................................ 7–54
7.13.3 Distinction Code ........................................................................................................ 7–55
7.13.4 Principles of Character Decomposition ...................................................................... 7–56
7.13.5 Input Mechanism ....................................................................................................... 7–56
7.13.5.1 Single Character Input ................................................................................. 7–57
7.13.5.2 Term Input .................................................................................................. 7–58
7.13.6 Procedure................................................................................................................... 7–59
7.13.7 Multiple Candidates................................................................................................... 7–59
7.13.8 Association Mode ...................................................................................................... 7–59
7.13.9 Simple Code Characters............................................................................................. 7–59
7.13.9.1 Frequently-Used Characters (Level 1 Simple Code) .................................... 7–59
7.13.9.2 Level 2 Simple Code ................................................................................... 7–60
7.13.9.3 Level 3 Simple Code ................................................................................... 7–61
7.13.10 Wildcard Key........................................................................................................... 7–61
7.13.11 Entering 5–Shape Code Through the Numeric Keypad ............................................ 7–61
7.14 Pin–Yin Input Method ....................................................................................................... 7–61
7.14.1 Input Mechanism ..................................................................................................... 7–62
7.14.2 Procedure................................................................................................................... 7–62
7.14.3 Multiple Candidates................................................................................................... 7–63
7.14.4 Association Mode ...................................................................................................... 7–63
7.14.5 Multiple Phonetic Representations ............................................................................. 7–63
7.14.6 Radical Characters..................................................................................................... 7–63
7.15 Qu–Wei Input Method ....................................................................................................... 7–64
7.15.1 DEC GB2312 Characters ........................................................................................... 7–64
7.15.2 Extended GB Characters ............................................................................................ 7–64
7.16 Telex Code Input Method .................................................................................................. 7–64
7.17 Symbol Input in dxhanziim ............................................................................................... 7–65
7.18 Conversion Between Input Servers and DECwindows Motif Applications ......................... 7–65
8 Chinese Printing Support.......................................................................... 8–1
8.1 Supported Printers ................................................................................................................. 8–1
8.1.1 Text Printers ................................................................................................................ 8–1
8.1.2 PostScript Printers........................................................................................................ 8–1
8.2 Print File Formats ................................................................................................................. 8–1
8.3 Printing Features.................................................................................................................... 8–2
8.3.1 Font Embedding........................................................................................................... 8–2
8.3.2 Font Faulting................................................................................................................ 8–2
8.3.3 Software On–Demand Font Loading ............................................................................ 8–3
8.3.4 Codeset Conversion ..................................................................................................... 8–3
8.3.5 Outline Fonts ............................................................................................................... 8–4
8.4 Commands and Daemons....................................................................................................... 8–4
8.4.1 Country-Specific Options to the lpr Command............................................................. 8–4
8.4.2 PostScript Font Management Utility (pfsetup).............................................................. 8–5
8.4.3 Font-Faulting Daemon (ffd) ......................................................................................... 8–7
8.4.4 PrintServer Printing Command wwlpspr ...................................................................... 8–8
8.5 Chinese Printing Setup........................................................................................................... 8–8
8.5.1 Dot Matrix Printers ...................................................................................................... 8–8
8.5.2 DEClaser 1152............................................................................................................. 8–9
8.5.3 DEClaser 5100........................................................................................................... 8–11
8.5.4 PrintServer 17 ............................................................................................................ 8–13
8.5.5 Generic PostScript Printers......................................................................................... 8–14
9 Other Chinese Features ............................................................................ 9–1
9.1 Phrase Support in the VT382–D............................................................................................. 9–1
9.1.1 Creating a Phrase Definition File ................................................................................. 9–1
9.1.2 Syntax of Phrase Definitions ........................................................................................ 9–1
9.1.3 Phrase Downloading .................................................................................................... 9–3
9.2 Sorting Utility........................................................................................................................ 9–4
9.2.1 asort Utility.................................................................................................................. 9–5
9.2.2 Multiple Collating Sequences....................................................................................... 9–5
9.2.3 Depth–First Against Breadth–First ............................................................................... 9–6
9.2.4 User–Defined Characters ............................................................................................. 9–6
9.3 Hanyu and Hanzi DECterm ................................................................................................... 9–6
9.3.1 Creating a Hanyu or Hanzi DECterm ........................................................................... 9–6
9.3.2 Customizing DECterm ................................................................................................. 9–7
9.3.3 Font Sizes .................................................................................................................... 9–7
9.3.4 Terminal ID ................................................................................................................. 9–7
9.3.5 Interaction Style........................................................................................................... 9–7
9.3.6 Input Server ................................................................................................................. 9–7
9.3.7 Copying Information.................................................................................................... 9–8
9.3.8 Default Character Set................................................................................................... 9–8
9.3.9 Chinese Character Input/Output ................................................................................... 9–8
9.3.10 Reconnecting the Input Server ................................................................................... 9–8
9.3.11 VT382–D and VT382–C Terminal Functions............................................................. 9–8
9.4 Phrase Conversion ............................................................................................................... 9–10
9.5 Special Characters in nroff .................................................................................................. 9–10
Figures
Figure 1–1: CNS 11643 Character Planes .................................................................................... 1–2
Figure 1–2: CNS 11643 First Character Plane ............................................................................. 1–3
Figure 1–3: CNS 11643 Second Character Plane ......................................................................... 1–3
Figure 1–4: EDPC Recommended Character Set ......................................................................... 1–5
Figure 1–5: GB2312-80 Character Set ......................................................................................... 1–6
Figure 2–1: DEC Hanyu Encoding of CNS 11643 Planes ............................................................ 2–2
Figure 2–2: Code Space for CNS 11643 in DEC Hanyu .............................................................. 2–3
Figure 2–3: DEC Hanyu Encoding of DTSCS Characters............................................................ 2–4
Figure 2–4: Code Space for DTSCS in DEC Hanyu..................................................................... 2–4
Figure 2–5: Encoding of Taiwanese EUC.................................................................................... 2–6
Figure 2–6: Code Space for Big-5 ............................................................................................... 2–8
Figure 2–7: DEC Hanzi Character Encoding ............................................................................... 2–9
Figure 2–8: GB2312-80 and Extended GB Code Space ............................................................. 2–10
Figure 3–1: Chinese Language Names......................................................................................... 3–3
Figure 5–1: Sung Font Sample..................................................................................................... 5–5
Figure 5–2: Hei Font Sample....................................................................................................... 5–5
Figure 5–3: Songti Font Sample .................................................................................................. 5–6
Figure 5–4: Heiti Font Sample..................................................................................................... 5–6
Figure 5–5: Fangsongti Font Sample ........................................................................................... 5–7
Figure 5–6: Kaiti Font Sample..................................................................................................... 5–7
Figure 5–7: CNS 11643-1986 Font Encoding Scheme ................................................................. 5–8
Figure 5–8: DTSCS Font Encoding Scheme ................................................................................ 5–9
Figure 5–9: GB2312-80 Font Encoding Schemes....................................................................... 5–10
Figure 6–1: LK201-D Keyboard Layout ...................................................................................... 6–2
Figure 6–2: LK401-D Keyboard Layout ...................................................................................... 6–2
Figure 6–3: LK201-C Keyboard Layout ...................................................................................... 6–3
Figure 6–4: LK401-C Keyboard Layout ...................................................................................... 6–3
Figure 6–5: Numeric Keypad for 5-Stroke Input Method ............................................................. 6–4
Figure 7–1: Chinese Root Window Interaction Style ................................................................... 7–8
Figure 7–2: Chinese Input Window Icon...................................................................................... 7–8
Figure 7–3: Off-the-Spot Interaction Style ................................................................................... 7–9
Figure 7–4: Customization of Invocation Key Sequences in dxhanyuim ................................. 7–17
Figure 7–5: Customization of Invocation Key Sequences in dxhanziim ................................. 7–17
Figure 7–6: Full Form Alphabet Input Method........................................................................... 7–19
Figure 7–7: Invocation of the Tsang–Chi Input Method............................................................. 7–30
Figure 7–8: Entering a Tsang–Chi Radical ................................................................................ 7–30
Figure 7–9: Multiple Candidates................................................................................................ 7–31
Figure 7–10: The Quick Tsang–Chi Input Method ..................................................................... 7–32
Figure 7–11: Entering a Quick Tsang–Chi Code ........................................................................ 7–33
Figure 7–12: Entering the Phonetic Symbols for “ ”................................................................ 7–35
Figure 7–13: Input of “ ” Using the Internal Code Input Method............................................. 7–38
Figure 7–14: Entering a Phrase Code ......................................................................................... 7–39
Figure 7–15: Converting a Phrase Code to a Phrase ................................................................... 7–39
Figure 7–16: Distribution of Radicals ........................................................................................ 7–52
Figure 7–17: 5-Shape Radical Keys ........................................................................................... 7–53
Figure 7–18: Distinction Code for the 5-Shape Input Method .................................................... 7–55
Figure 8–1: Two-Channel Communication of the Font-Faulting Mechanism ............................... 8–9
Tables
Table 1–1: Characters Defined in CNS 11643-1986
Table 1–2: Characters Defined in CNS 11643-1992
Table 1–3: Mapping of EDPC Recommended Character Set to CNS 11643-1992
Table 2–1: CNS 11643 Code Range in DEC Hanyu
Table 2–2: UDC Code Range in DEC Hanyu
Table 2–3: Big-5 Code Range
Table 2–4: Big-5 User-Defined Spaces
Table 2–5: Big–5 to Shift Big–5 Mappings
Table 2–6: Chinese Codeset Conversion
Table 2–7: Codeset Names and Associated Strings
Table 2–8: Mapping Between Big–5 and DEC Hanyu User-Defined Characters
Table 2–9: Feasible Chinese Codesets for Applications, Terminals, and Printers
Table 3–1: Chinese Locales
Table 4–1: Chinese Print Filters
Table 5–1: Traditional Chinese Screen Fonts
Table 5–2: Simplified Chinese Screen Fonts
Table 5–3: XLFD of Miscellaneous Chinese Screen Fonts
Table 5–4: Chinese DECwindows Font Encodings
Table 5–5: Font Encoding Conversion
Table 5–6: Chinese DECwindows Font Encodings
Table 5–7: GR to GL Font Encoding Conversion
Table 5–8: Traditional Chinese Default Fonts
Table 5–9: Simplified Chinese Default Fonts
Table 7–1: Key Sequences that Invoke Chinese Input Method
Table 7–2: Key Sequences Used to Select Traditional Chinese Input Method
Table 7–3: Key Sequences Used to Select Simplified Chinese input Method
Table 7–4: Window Input Areas
Table 7–5: Modifier State Customization
Table 7–6: Tsang–Chi root Radicals Classification
Table 7–7: Quick Reference Table of the Tsang–Chi Root Radicals
Table 7–8: Composition Form Characters
Table 7–9: Connected Form Characters
Table 7–10: Examples of Connected Character Decomposition
Table 7–11: Composite Character Decomposition
Table 7–12: Compound Characters
Table 7–13: Difficult Characters
Table 7–14: Special Characters
Table 7–15: Meaning of Arrow Characters
Table 7–16: Phonetic Symbols
Table 7–17: Examples of Phonetic Input
Table 7–18: Phonetic Symbols with Different Termination Keys
Table 7–19: Stroke Categories
Table 7–20: 5-Stroke Code
Table 7–21: Input of Single Characters with the 5-Stroke Input Method
Table 7–22: Input Terms with the 5-Stroke Input Method
Table 7–23: Shape Code
Table 7–24: Entering Basic Strokes
Table 7–25: Input of Terms with the 5-Shape Input Method
Table 7–26: Pin-Yin Tone Marks
Table 9–1: Phrase definitions
Table 9–2: Traditional Chinese Sorting Methods
Table 9–3: Simplified Chinese Sorting Methods
Preface
This guide provides Chinese-specific information, such as character sets and locales, for
end users and programmers who want to use and develop internationalized applications in
Chinese locales on the Compaq Tru64 UNIX operating system. Details of the Chinese
features are also documented in this guide.
Intended Audience
This guide is for new and experienced end users and programmers who are interested in
the Chinese variant of the Compaq Tru64 UNIX operating system.
Structure of this Guide
This guide consists of nine chapters:
Chapter 1    Describes the Chinese character sets supported in the Compaq Tru64 UNIX operating system software.
Chapter 2    Describes the Chinese codesets and the conversion among different codesets.
Chapter 3    Describes the Chinese locales.
Chapter 4    Describes the local hardware devices which support the Chinese locales.
Chapter 5    Provides information on Chinese fonts.
Chapter 6    Provides information on Chinese keyboards.
Chapter 7    Describes how to input Chinese characters.
Chapter 8    Introduces the Chinese printing support.
Chapter 9    Provides descriptions of other Chinese features.
Related Documents
Writing Software for the International Market
Programming for the World: A Guide to Internationalization, Sandra Martin O’Donnell,
Prentice Hall, 1994
OSF/Motif User’s Guide Revision 1.2, Open Software Foundation, Prentice Hall,
Englewood Cliffs, New Jersey 07632
OSF/Motif Style Guide Revision 1.2, Open Software Foundation, Prentice Hall, Englewood
Cliffs, New Jersey 07632
X Window System, Third Edition, Robert W. Scheifler and James Gettys, Digital Press
Programmer's Supplement for Release 5 of the X Window System, Version 11, David
Flanagan, O’Reilly & Associates, Inc.
The Unicode Standard, Version 2.0, The Unicode Consortium, Addison Wesley, Reading,
MA, 1996
Information Technology-Universal Multiple-Octet Coded Character Set, ISO/IEC 10646:
1993
Chinese Code for Data Communication
Conventions
The following typographical conventions are used in this manual:
% $
    A percent sign represents the C shell system prompt. A dollar sign represents the
    system prompt for the Bourne and Korn shells.
#
    A number sign represents the superuser prompt.
% cat
    Boldface type in interactive examples indicates typed user input.
File
    Italic (slanted) type indicates variable values, placeholders, and function argument
    names.
[|] {|}
    In syntax definitions, brackets indicate items that are optional and braces indicate
    items that are required. Vertical bars separating items inside brackets or braces
    indicate that you choose one item from among those listed.
...
    In syntax definitions, a horizontal ellipsis indicates that the preceding item can be
    repeated one or more times.
cat(1)
    A cross-reference to a reference page includes the appropriate section number in
    parentheses. For example, cat(1) indicates that you can find information on the cat
    command in Section 1 of the reference pages.
[RETURN]
    In an example, a key name enclosed in a box indicates that you press that key.
Ctrl/x
    This symbol indicates that you hold down the first named key while pressing the key
    or mouse button that follows the slash. In examples, this key combination is enclosed
    in a box (for example [Ctrl/C]).
1
Character Sets
The Compaq Tru64 UNIX (formerly DIGITAL UNIX) operating system software supports
the following Chinese character sets:
• CNS 11643
• DTSCS
• Big-5
• GB2312-80
• Extended GB
• Unicode
• ISO/IEC 10646
For traditional Chinese characters, the CNS 11643 and Big-5 character sets are commonly
used. The GB2312-80 character set is commonly used for Simplified Chinese characters.
The Unicode and ISO/IEC 10646 character sets are common to both traditional and
Simplified Chinese.
1.1 CNS 11643
The CNS (Chinese National Standard) 11643 character set standard was published by the
National Bureau of Standards of Taiwan in 1986 and was updated in 1992. It was also
called "Standard Interchange Code for Generally-used Chinese Character" (SICGCC).
CNS 11643 provides 16 character planes for defining Chinese characters. Each character
plane is divided into 94 rows, and each row has 94 columns, so each plane can accommodate
8,836 (94 x 94) characters. Character planes 1-11 are reserved for defining standard
Chinese characters, while character planes 12-16 are user-defined areas.
Figure 1–1: CNS 11643 Character Planes
The original CNS 11643 standard, published in 1986, defines certain groups of characters
only on the first and second character planes. Table 1–1 shows these groups of characters.
Table 1–1: Characters Defined in CNS 11643-1986

Character Plane   Character Type                    Number of Characters
Plane 1           Special characters                651
                  Control characters                33
                  Frequently-used characters        5,401
Plane 2           Less frequently-used characters   7,650

Figure 1–2 and Figure 1–3 illustrate the positions of these characters in the first and second
character planes.
Figure 1–2: CNS 11643 First Character Plane
Figure 1–3: CNS 11643 Second Character Plane
Because the CNS 11643-1986 character set was not rich enough to meet the requirements of
most applications, such as those that handle names and addresses, the information industry
in Taiwan requested that the character set be expanded. In 1991, the Bureau of National
Standard formed a team to study how to expand CNS 11643. On August 4, 1992, the Bureau of
National Standard published the revised CNS 11643, the Chinese Standard Interchange Code
(CSIC).
The revised CNS 11643, called CNS 11643-1992, defined 651 special characters, 33
control characters and 48,027 Chinese characters, as shown in Table 1–2.
Table 1–2: Characters Defined in CNS 11643-1992

Character Plane   Character Type                                           Number of Characters
Plane 1           Special characters                                       651
                  Control characters                                       33
                  Frequently-used characters                               5,401
Plane 2           Less frequently-used characters                          7,650
Plane 3           Rarely-used characters (EDPC Part I)                     6,148
Plane 4           Used for residency system, ISO 2nd edition DIS 10646     7,298
                  Han characters, 171 EDPC Part II characters
Plane 5           Rarely-used characters (based on the Ministry of         8,603
                  Education publications)
Plane 6           Variants based on the Ministry of Education              6,388
                  publications (<=14 strokes)
Plane 7           Variants based on the Ministry of Education              6,539
                  publications (>14 strokes)

Since the number of characters defined in CNS 11643-1992 is far greater than the number
required for general use, the revised CNS 11643 is called "Chinese Standard Interchange
Code (CSIC)".
______________________________ Note ___________________________
In this release, the new characters added to CNS 11643-1992 are not supported. Only
the characters defined in CNS 11643-1986 and DTSCS (which will be described in the
next section) are supported.
______________________________________________________________
1.2 DTSCS
In addition to CNS 11643, the Compaq Tru64 UNIX operating system supports the
DIGITAL Taiwan Supplemental Character Set (DTSCS). Currently, only the EDPC
Recommended Character Set, which defines a total of 6,319 characters, is included in
DTSCS. The EDPC Recommended Character Set was first published by the Electronic Data
Processing Center of the Executive Yuan in June 1988.
Figure 1–4: EDPC Recommended Character Set
Computer vendors support the EDPC Recommended Character Set as a de facto standard
and assign it to CNS 11643 character plane 14.
In the revised CNS 11643-1992, the 6,319 characters in the EDPC Recommended
Character Set are assigned to the third and fourth character planes of CNS 11643, as shown
in Table 1–3.
Table 1–3: Mapping of EDPC Recommended Character Set to CNS 11643-1992

EDPC Characters   Character Plane   Number of Characters
Part I            Plane 3           6,148
Part II           Plane 4           171

1.3 Big-5
The Big-5 character set, though not a national standard, is commonly used by the Taiwan
information industry, particularly in the PC and workstation market. The Big-5 character
set was designed to meet the requirements of five major software vendors in Taiwan. Since
its publication, much software and hardware, and many peripheral devices, have been
developed to support Big-5.
Big-5 is very similar to the first two planes of CNS 11643-1992. The frequently-used
Chinese characters (5,401) defined in the two character sets are exactly the same except
that their positions in the code table are different. For the less frequently-used Chinese
characters, Big-5 defines two more characters in addition to the 7,650 characters defined in
the second character plane of CNS 11643, and their positions in the code table are
different.
1.4 GB2312-80
The GB2312-80 character set is a standard published by the State Bureau of
Standardization of the People’s Republic of China (PRC) in 1980 and put in force in May,
1981.
GB2312-80 defines 7,445 characters, including 6,763 Chinese characters:
• Graphic symbols
  682 graphic symbols are defined and placed in rows 1-9.
• Level 1 characters
  These are 3,755 frequently-used characters placed in rows 16-55.
• Level 2 characters
  These are 3,008 less frequently-used characters placed in rows 56-87. See Figure 1–5.
The GB2312-80 code table is divided into 94 rows (Qu), numbered from 1 to 94. Each
row has 94 columns (Wei), also numbered from 1 to 94.
Figure 1–5: GB2312-80 Character Set
1.5 Extended GB
The extended GB character set provides 8,836 (94 x 94) code points for defining
user-defined characters. The 8,836 code points are divided into two regions:
• User-Defined Area: Spans rows 1-87 and provides 8,178 code positions.
• User-Defined (reserved) Area: Spans rows 88-94 and provides 658 code positions.
  This area is where users define special and frequently-used user-defined characters.
The extended GB code table is similar to the GB2312 code table. It is divided into 94
rows and each row has 94 columns.
1.6 Unicode
The Unicode Standard, Version 2.0 specifies a universal character set (UCS) that contains
definitions for 38,885 characters and also includes a Private Use Area for vendor-defined
or user-defined characters. The main features of this character set are:
• All characters are treated as 16-bit units.
• Each 16-bit unit has an abstract character identity.
• Certain sequences of 16-bit characters in a text stream are transformed into other
  characters, called composed characters.
• All characters have properties, such as base, numeric, spacing, combination, and
  directionality. The Unicode standard provides rules for ordering characters with
  different properties so that parsing of character sequences is unambiguous.
• The relationship between Unicode characters and the glyphs in the native language
  script that users see, type, or print is not necessarily one-to-one. A glyph may be
  mapped to a single abstract character or a composed character. Conversely, more than
  one glyph can be mapped to a character.
• The ISO 8859-1 character set occupies the first 256 code positions (and the ASCII
  character set the first 128 positions) of the UCS.
1.7 ISO/IEC 10646
The ISO/IEC 10646 standard, which is specified in Information Technology-Universal
Multiple-Octet Coded Character Set, ISO/IEC 10646, allows characters to be specified
either as 32-bit units or, like Unicode, as 16-bit units. In the 32-bit form, each 16-bit
Unicode character value is zero-extended with a second 16-bit unit to conform to ISO/IEC
10646.
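The zero-extension described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and big-endian byte order is an assumption, since this section does not state a byte order.

```python
def ucs2_to_ucs4_bytes(code_unit: int) -> bytes:
    """Zero-extend one 16-bit value to a 4-octet value (big-endian assumed)."""
    if not 0 <= code_unit <= 0xFFFF:
        raise ValueError("a 16-bit unit must be in the range 0000-FFFF")
    # The upper 16 bits are zero; the lower 16 bits carry the original value.
    return code_unit.to_bytes(4, "big")


print(ucs2_to_ucs4_bytes(0x4E2D).hex())  # 00004e2d
```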
2
Codesets and Codeset Conversion
The Compaq Tru64 UNIX operating system fully supports the following Chinese codesets
by including locales and codeset conversion support:
• DEC Hanyu
• Taiwanese EUC (Extended UNIX Code)
• Big-5
• DEC Hanzi
It also provides codeset conversion support for the following codesets:
• Telecode
• Shift Big-5
• UCS-2
• UCS-4
• UTF-8
2.1 DEC Hanyu
The DEC Hanyu codeset, denoted by dechanyu, consists of the following character sets:
• ASCII
• CNS 11643, the first and second character planes
• DTSCS
• User-Defined Characters
DEC Hanyu uses a combination of single-byte, two-byte, and four-byte data to represent
ASCII characters, symbols, or ideographic characters.
2.1.1 ASCII Code
All ASCII characters are represented as single-byte 7-bit data in DEC Hanyu. That is,
the most significant bit (MSB) of an ASCII character is always off.
2.1.2 CNS 11643 Code
Each CNS 11643 character is represented by a two-byte code in DEC Hanyu, which
complies with the CNS 11643 standard. The MSB of the first byte is always set on while
that of the second byte can be on for the first character plane or off for the second
character plane. See Figure 2–1.
Figure 2–1: DEC Hanyu Encoding of CNS 11643 Planes
The first byte of a CNS 11643 code determines the row number of the character, while the
second byte determines its column number. Table 2–1 illustrates the code range of a CNS
11643 code.
Table 2–1: CNS 11643 Code Range in DEC Hanyu

Character Plane   1st Byte (hex)   2nd Byte (hex)
Plane 1           A1 to FE         A1 to FE
Plane 2           A1 to FE         21 to 7E

The following formulas illustrate the code of a CNS 11643 character in relation to its row
and column numbers.
CNS 11643 plane 1 character:
First byte = A0 + row number
Second byte = A0 + column number
CNS 11643 plane 2 character:
First byte = A0 + row number
Second byte = 20 + column number
For example, if a character is positioned at the first column of the 36th row on CNS 11643
plane 1, its encoding value is calculated as follows:
First byte = A0 (hex) + 36 = C4 (hex)
Second byte = A0 (hex) + 01 = A1 (hex)
Its encoded value is C4A1.
Similarly, if a character is positioned at the first column of the 36th row on CNS 11643
plane 2, its encoding value is calculated as follows:
First byte = A0 (hex) + 36 = C4 (hex)
Second byte = 20 (hex) + 01 = 21 (hex)
Its encoded value is C421.
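As an illustration only, the two formulas can be written as one small Python function. The function name and the argument checks are hypothetical; the offsets A0 and 20 (hex) are the ones given above.

```python
def dechanyu_cns(plane: int, row: int, col: int) -> bytes:
    """Encode a CNS 11643 plane 1 or plane 2 position as a DEC Hanyu code."""
    if plane not in (1, 2):
        raise ValueError("only planes 1 and 2 use the two-byte form")
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1-94")
    first = 0xA0 + row                                # MSB always on
    second = (0xA0 if plane == 1 else 0x20) + col     # MSB on: plane 1, off: plane 2
    return bytes([first, second])


print(dechanyu_cns(1, 36, 1).hex().upper())  # C4A1
print(dechanyu_cns(2, 36, 1).hex().upper())  # C421
```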
Figure 2–2 illustrates the division of a two-byte code space and the position of CNS 11643
characters.
Figure 2–2: Code Space for CNS 11643 in DEC Hanyu
2.1.3 DTSCS Code
Each DTSCS character is represented by a four-byte code in DEC Hanyu. The first two
bytes are the leading codes, namely 0xC2 0xCB, which are used as a designator sequence
for the DTSCS character set. The MSB of the third and fourth bytes is set on for the
EDPC Recommended Character Set.
Figure 2–3: DEC Hanyu Encoding of DTSCS Characters
Figure 2–4 illustrates the 4-byte code space and the position of DTSCS characters.
Figure 2–4: Code Space for DTSCS in DEC Hanyu
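Because DEC Hanyu mixes single-byte, two-byte, and four-byte codes, a program that scans a byte stream needs to decide the width of each character from its leading bytes. The following Python sketch shows one way to do that. It assumes well-formed input and assumes that the pair 0xC2 0xCB never begins an ordinary two-byte character; the function name is hypothetical.

```python
DTSCS_LEAD = b"\xc2\xcb"  # designator sequence for DTSCS characters


def split_dechanyu(data: bytes) -> list:
    """Split a DEC Hanyu byte string into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:                    # MSB off: ASCII, one byte
            width = 1
        elif data[i:i + 2] == DTSCS_LEAD:     # leading codes: DTSCS, four bytes
            width = 4
        else:                                 # CNS 11643 or UDC, two bytes
            width = 2
        if i + width > len(data):
            raise ValueError("truncated character at offset %d" % i)
        chars.append(data[i:i + width])
        i += width
    return chars


sample = b"A\xc4\xa1\xc2\xcb\xa1\xa1\xc4\x21"
print([c.hex() for c in split_dechanyu(sample)])
```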
2.1.4 User-Defined Characters
In addition to the CNS 11643 and the DTSCS character sets described above, DEC Hanyu
provides 3,587 positions for user-defined characters (UDC). The positions for UDCs are
those unused (but not reserved) code points on the CNS 11643 first and second character
planes. Therefore, the encoding of UDC is exactly the same as that of CNS 11643 except
that they occupy different regions, as shown in Table 2–2.
Table 2–2: UDC Code Range in DEC Hanyu

Character Plane   Number of UDC   Code Range
Plane 1           145             FDCC - FEFE
Plane 1           2,256           AAA1 - C1FE
Plane 2           1,186           F245 - FE7E

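The counts in Table 2–2 follow from the code ranges, given that each row holds 94 valid second-byte values (A1-FE on plane 1, 21-7E on plane 2). The Python sketch below, with a hypothetical helper name, recomputes them and confirms the total of 3,587 positions.

```python
def count_codes(start: int, end: int, lo: int, hi: int) -> int:
    """Count two-byte codes from start to end, inclusive.

    Only second bytes between lo and hi are valid, so each complete row
    (one first-byte value) contributes hi - lo + 1 codes.
    """
    first_s, second_s = divmod(start, 0x100)
    first_e, second_e = divmod(end, 0x100)
    if first_s == first_e:
        return second_e - second_s + 1
    full_rows = first_e - first_s - 1
    return (hi - second_s + 1) + full_rows * (hi - lo + 1) + (second_e - lo + 1)


# Ranges from Table 2-2, with the valid second-byte range for each plane.
udc_ranges = [
    (0xFDCC, 0xFEFE, 0xA1, 0xFE),   # plane 1
    (0xAAA1, 0xC1FE, 0xA1, 0xFE),   # plane 1
    (0xF245, 0xFE7E, 0x21, 0x7E),   # plane 2
]
counts = [count_codes(*r) for r in udc_ranges]
print(counts, sum(counts))  # [145, 2256, 1186] 3587
```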
2.2 Taiwanese EUC
Taiwanese EUC (Extended UNIX Code), denoted as eucTW, is another codeset that supports
CNS 11643. The design of Taiwanese EUC allows the 16 character planes of CNS 11643
to be encoded in a unified way. A stream of data encoded in Taiwanese EUC can contain
characters defined in ASCII and the 16 character planes. Figure 2–5 illustrates the
encoding of Taiwanese EUC.
Figure 2–5: Encoding of Taiwanese EUC
Taiwanese EUC uses the Single-Shift 2 control character (SS2) and an additional byte to
specify a character plane. The only exception is the first plane, which does not require
leading codes. Instead, two bytes specify a character’s position on the first plane. The first
byte determines its row number, while the second determines its column number. The
MSBs of the two bytes are set on.
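The scheme described above can be sketched as follows. Several details are assumptions taken from the commonly documented EUC-TW layout rather than from this guide, because Figure 2–5 is not reproduced here: SS2 has the value 8E (hex), the plane byte is A0 (hex) plus the plane number, and the row and column bytes of every plane have their MSBs set. The function name is hypothetical.

```python
SS2 = 0x8E  # Single-Shift 2 control character (assumed value)


def euctw_encode(plane: int, row: int, col: int) -> bytes:
    """Encode a CNS 11643 position in Taiwanese EUC (sketch, see assumptions)."""
    if not 1 <= plane <= 16:
        raise ValueError("plane must be in the range 1-16")
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in the range 1-94")
    cell = bytes([0xA0 + row, 0xA0 + col])
    if plane == 1:
        return cell                              # no leading codes for plane 1
    return bytes([SS2, 0xA0 + plane]) + cell     # SS2 + plane byte + row/column


print(euctw_encode(1, 36, 1).hex().upper())  # C4A1
print(euctw_encode(2, 36, 1).hex().upper())  # 8EA2C4A1
```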
In this release, only the characters defined in the first and second planes of CNS 11643 and
those in the EDPC Recommended Character Set that have been remapped into the third
and fourth character planes of the revised CNS 11643-1992 are supported in Taiwanese
EUC. Other characters that were added to the CNS 11643-1992 standard are not
supported.
2.3 Big-5
The Big-5 codeset, denoted as big5, is the only codeset that supports the Big-5 character
set. The encoding of the Big-5 codeset is similar to that of CNS 11643 in DEC Hanyu.
Each Big-5 character is represented by a two-byte code which complies with the Big-5
standard. The MSB of the first byte is always set on while that of the second byte can be
set on or off.
The Big-5 code range is defined as shown in Table 2–3.
Table 2–3: Big-5 Code Range

Character            Number of Characters   Code Range
Special symbols      408                    A140-A3BF
Level 1 characters   5,401                  A440-C67E
Level 2 characters   7,652                  C940-F9D5

In addition to the code points for special symbols and Chinese characters shown in Table
2–3, three areas are set aside for user-defined spaces. Some vendors in Taiwan support
user-defined characters in the code ranges shown in Table 2–4.
Table 2–4: Big-5 User-Defined Spaces

Character                    Number of Characters   Code Range
Level 1 user-defined space   785                    FA40-FEFE
Level 2 user-defined space   2,983                  8E40-A0FE
Level 3 user-defined space   2,041                  8140-8DFE

The valid ranges of the two bytes are:

Byte          Valid Ranges
First byte    81-FE
Second byte   40-7E and A1-FE

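The valid byte ranges can be expressed as a short predicate, which also makes it possible to check the character counts in Table 2–3 and Table 2–4 against their code ranges. The Python sketch below uses hypothetical function names and counts every valid two-byte value between the two end points of a range.

```python
def is_big5_pair(first: int, second: int) -> bool:
    """Return True if (first, second) falls in the valid Big-5 byte ranges."""
    return 0x81 <= first <= 0xFE and (
        0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE
    )


def count_big5(start: int, end: int) -> int:
    """Count valid Big-5 codes from start to end, inclusive."""
    return sum(
        1 for code in range(start, end + 1)
        if is_big5_pair(code >> 8, code & 0xFF)
    )


print(count_big5(0xA140, 0xA3BF))  # 408
print(count_big5(0xA440, 0xC67E))  # 5401
print(count_big5(0xC940, 0xF9D5))  # 7652
```

Each first-byte value covers 157 valid second bytes (63 in 40-7E plus 94 in A1-FE), which is why the three user-defined spaces hold 5 x 157, 19 x 157, and 13 x 157 code points.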
Figure 2–6 illustrates the encoding of the Big-5 codeset in a two-byte code space.
Figure 2–6: Code Space for Big-5
2.4 DEC Hanzi
The ASCII, GB2312-80 and extended GB character sets are combined to form the DEC
Hanzi codeset.
DEC Hanzi, denoted as dechanzi, uses a two-byte data representation for symbols and
ideographic characters defined in the GB2312-80 character set. To differentiate GB2312-80
codes from ASCII codes, the MSB of the first byte is always set on, while that of the
second byte is on for GB2312-80 and off for extended GB, as shown in Figure 2–7.
Figure 2–7: DEC Hanzi Character Encoding
The first byte of a two-byte code determines its row number, while the second byte
determines its column number.
The following formulas illustrate the code of a GB2312-80 character or an extended GB
character in relation to its row and column numbers.
GB2312-80 character:
First byte = A0 + row number
Second byte = A0 + column number
Extended GB character:
First byte = A0 + row number
Second byte = 20 + column number
For example, if a character is positioned at the first column of the 16th row on the
GB2312-80 code plane, its encoding value is calculated as follows:
First byte = A0 (hex) + 16 = B0 (hex)
Second byte = A0 (hex) + 01 = A1 (hex)
Its encoded value is B0A1.
Similarly, if a character is positioned at the first column of the 16th row on the extended
GB code plane, its encoding value is calculated as follows:
First byte = A0 (hex) + 16 = B0 (hex)
Second byte = 20 (hex) + 01 = 21 (hex)
Its encoded value is B021.
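Read in the opposite direction, the same formulas recover the character set, row, and column from a two-byte DEC Hanzi code. The following Python function is a minimal sketch of that decoding; its name and error handling are hypothetical.

```python
def decode_dechanzi(code: bytes):
    """Return (character set, row, column) for one two-byte DEC Hanzi code."""
    if len(code) != 2:
        raise ValueError("expected exactly two bytes")
    first, second = code
    row = first - 0xA0
    if not 1 <= row <= 94:
        raise ValueError("first byte must be in the range A1-FE")
    if 0xA1 <= second <= 0xFE:        # MSB on: GB2312-80
        return "GB2312-80", row, second - 0xA0
    if 0x21 <= second <= 0x7E:        # MSB off: extended GB
        return "Extended GB", row, second - 0x20
    raise ValueError("second byte is outside both valid ranges")


print(decode_dechanzi(b"\xb0\xa1"))  # ('GB2312-80', 16, 1)
print(decode_dechanzi(b"\xb0\x21"))  # ('Extended GB', 16, 1)
```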
Figure 2–8 illustrates the division of a two-byte code space and the position of the Chinese
character sets.
Figure 2–8: GB2312-80 and Extended GB Code Space
2.5 Shift Big-5
The Shift Big-5 codeset, denoted as sbig5, is a variant of the Big-5 codeset. The
difference between the two is that the second byte of some Big-5 characters is mapped to
other values to form Shift Big-5 characters. Table 2–5 illustrates the mappings of Big-5
characters to Shift Big-5 characters.
Table 2–5: Big–5 to Shift Big–5 Mappings

Big-5 (Second Byte)   Shift Big-5 (Second Byte)
40                    30
5B                    31
5C                    32
5D                    33
5E                    34
5F                    35
60                    36
7B                    37
7C                    38
7D                    39
7E                    9F

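The mapping in Table 2–5 lends itself to a simple lookup table. The Python sketch below converts one Big-5 character to Shift Big-5; it assumes that the first byte is unchanged and that any second byte not listed in the table is passed through as is. The names are hypothetical.

```python
# Second-byte mapping from Table 2-5 (Big-5 value -> Shift Big-5 value).
BIG5_TO_SBIG5 = {
    0x40: 0x30, 0x5B: 0x31, 0x5C: 0x32, 0x5D: 0x33, 0x5E: 0x34, 0x5F: 0x35,
    0x60: 0x36, 0x7B: 0x37, 0x7C: 0x38, 0x7D: 0x39, 0x7E: 0x9F,
}


def big5_to_sbig5(code: bytes) -> bytes:
    """Map one two-byte Big-5 code to its Shift Big-5 form."""
    if len(code) != 2:
        raise ValueError("expected exactly two bytes")
    first, second = code
    return bytes([first, BIG5_TO_SBIG5.get(second, second)])


print(big5_to_sbig5(b"\xa4\x40").hex().upper())  # A430
print(big5_to_sbig5(b"\xa4\x41").hex().upper())  # A441
```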
The Shift Big-5 codeset can be used in codeset conversion and terminal display. Refer to
Section 2.9 for details.
2.6 Telecode
The Telecode codeset (called Mitac Telex in early versions of the operating system),
denoted as telecode, consists of two character planes. Each character plane has 8,836
character positions. In plane 1, standard characters occupy positions 0001 to 8045; the
remaining 791 positions are for user-defined characters. In plane 2, standard characters
occupy positions 0001 to 8489; the remaining 346 positions are for user-defined
characters. Telecode uses two-byte values to represent characters on both planes.
______________________________ Note ____________________________
For information about the character sets encoded by Telecode, refer to Chinese Code
For Data Communication.
_______________________________________________________________
Telecode can be used in codeset conversion and terminal display. Refer to Section 2.9 for
further details.
2.6.1 Plane 1 Character Encoding
To differentiate plane 1 code from plane 2 code, MSB is set on in both bytes of a plane 1
character code. You can use the following formula to calculate the value of a plane 1
character from its position on the plane:
First byte = M + 161
Second byte = N + 161 - M x 94
In this formula, N is the position of the character and M is the integer quotient of
N / 94 (the remainder is discarded).
For example, if a character is at position 2502 on plane 1, its encoded value is BBDB,
which is calculated as follows:
N = 2502, M = 2502/94 = 26
First byte = 26 + 161 = 187 (or, BB (hex))
Second byte = 2502 + 161 - 26 x 94 = 219 (or, DB (hex))
2.6.2 Plane 2 Character Encoding
To differentiate plane 2 code from plane 1 code, the MSB of the first byte is set on and
that of the second byte is set off for each plane 2 character code. You can use the
following formula to calculate the value of a plane 2 character from its position:
First byte = M + 161
Second byte = N + 33 - M x 94
In this formula, N is the position of the character on the plane and M is the integer
quotient of N / 94 (the remainder is discarded).
For example, if a character is at position 2502 on plane 2, its encoded value is BB5B,
which is calculated as follows:
N = 2502, M = 2502/94 = 26
First byte = 26 + 161 = 187 (or, BB (hex))
Second byte = 2502 + 33 - 26 x 94 = 91 (or, 5B (hex))
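Both Telecode formulas differ only in the offset added to the second byte (161 for plane 1, 33 for plane 2), so they can be combined in a single function. The Python sketch below is illustrative; the function name is hypothetical, and the accepted position range (0 to 8835) is an assumption chosen so that the first byte never exceeds FE (hex).

```python
def telecode_encode(plane: int, position: int) -> bytes:
    """Encode a Telecode plane position as a two-byte value."""
    if plane not in (1, 2):
        raise ValueError("Telecode has two planes")
    if not 0 <= position < 94 * 94:   # assumed range, keeps first byte <= FE
        raise ValueError("position is outside the plane")
    m = position // 94                        # integer quotient, M in the text
    offset = 161 if plane == 1 else 33        # MSB of second byte on or off
    return bytes([m + 161, position + offset - m * 94])


print(telecode_encode(1, 2502).hex().upper())  # BBDB
print(telecode_encode(2, 2502).hex().upper())  # BB5B
```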
2.7 UCS-4/UCS-2
The UCS codeset is a standard character encoding for the universal character set (UCS)
specified in Unicode and ISO/IEC 10646. There are two encoding schemes for UCS. An
implementation that parses in 16-bit units (2 octets) is known as UCS-2. This is the
canonical Unicode encoding in wide use on personal computers. An implementation that
parses in 32-bit units (4 octets) is known as UCS-4. This is the canonical ISO/IEC 10646
encoding that is in use on systems that can support a larger data unit size.
On Compaq Tru64 UNIX, UCS-2 and UCS-4 can be used in codeset conversion. In
addition, UCS-4 is used as an internal process code for some locales. For codeset
conversion, see Section 2.9. For locale variants, see Chapter 3.
2.8 UTF-8
The Unicode and ISO/IEC 10646 standards define transformation formats for the UCS.
The following UCS transformation formats (UTFs) exist mainly to transform UCS values
into sequences of bytes for handling by various byte-oriented protocols:
• UTF-8 is the standard method for transforming UCS-encoded data into a sequence of
  8-bit bytes and ensuring interchange transparency for characters in C0 code positions
  (0 to 31), the SPACE (32) character, and the DEL (127) character.
• UTF-7 is the standard interchange format for environments that strip the eighth bit
  from each byte.
• UTF-16 is a transformation format that allows systems that are limited to processing
  of 16-bit units (specified by UCS-2 encoding) to support the extended character
  definition space that is included in UCS-4.
The current version of Compaq Tru64 UNIX supports UTF-8. UTF-8 can be used in
codeset conversion and in the universal.UTF-8 locale. For codeset conversion, see Section
2.9. For locale variants, see Chapter 3.
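To make the transformation concrete, the following Python sketch converts UCS-2 data to UTF-8 with the standard Python codecs. It is independent of the Tru64 UNIX converters. The function name is hypothetical, big-endian byte order is an assumption, and the "utf-16-be" codec is used as a stand-in for UCS-2, which it matches for values that are not surrogates.

```python
def ucs2_to_utf8(data: bytes) -> bytes:
    """Transform big-endian UCS-2 data into a UTF-8 byte sequence."""
    return data.decode("utf-16-be").encode("utf-8")


# An ASCII letter and DEL pass through as the same single-byte values.
print(ucs2_to_utf8(b"\x00\x41").hex())  # 41
print(ucs2_to_utf8(b"\x00\x7f").hex())  # 7f
# A Chinese character (U+4E2D) becomes a three-byte sequence.
print(ucs2_to_utf8(b"\x4e\x2d").hex())  # e4b8ad
```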
2.9 Codeset Conversion
Users may sometimes need to convert files from one codeset to another. The iconv
utility provided by Compaq Tru64 UNIX is used to convert the encoding of characters in
one codeset to another and write the results to standard output. Table 2–6 shows the pairs
of Chinese codeset converters that are provided.
Table 2–6: Chinese Codeset Conversion

               DEC    Taiwanese         Shift            DEC
               Hanyu  EUC        Big-5  Big-5  Telecode  Hanzi  UCS-4  UCS-2  UTF-8
DEC Hanyu      –      Y          Y      N      Y         Y      Y      Y      Y
Taiwanese EUC  Y      –          Y      Y      Y         Y      Y      Y      Y
Big-5          Y      Y          –      Y      Y         Y      Y      Y      Y
Shift Big-5    N      Y          Y      –      N         N      N      N      N
Telecode       Y      Y          Y      N      –         N      N      N      N
DEC Hanzi      Y      Y          Y      N      N         –      Y      Y      Y
UCS-4          Y      Y          Y      N      N         Y      –      Y      Y
UCS-2          Y      Y          Y      N      N         N      Y      –      Y
UTF-8          Y      Y          Y      N      N         Y      Y      Y      –

For example, the following command converts a DEC Hanyu file to Big-5:
% iconv -f dechanyu -t big5 <file>
Table 2–7 shows the string names you can use as the parameters of the iconv utility.
Table 2–7: Codeset Names and Associated Strings
Codeset                             String
DEC Hanyu                           dechanyu
Taiwanese EUC                       eucTW
Big-5                               big5
Shift Big-5                         sbig5
Telecode                            Telecode
DEC Hanzi                           dechanzi
Universal Codeset (4-octet form)    UCS-4
Universal Codeset (2-octet form)    UCS-2
Universal Transfer Format           UTF-8
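If you generate iconv command lines from a script, the strings in Table 2–7 can be kept in a lookup table. The following Python sketch is one way to do that. The function name iconv_command and the shortened dictionary keys for the three universal codesets are illustrative choices, not part of the system software.

```python
# Codeset label -> string accepted by iconv, copied from Table 2-7.
ICONV_NAMES = {
    "DEC Hanyu": "dechanyu",
    "Taiwanese EUC": "eucTW",
    "Big-5": "big5",
    "Shift Big-5": "sbig5",
    "Telecode": "Telecode",
    "DEC Hanzi": "dechanzi",
    "UCS-4": "UCS-4",
    "UCS-2": "UCS-2",
    "UTF-8": "UTF-8",
}


def iconv_command(source, target, path):
    """Build the argument list for 'iconv -f <source> -t <target> <path>'."""
    return ["iconv", "-f", ICONV_NAMES[source],
            "-t", ICONV_NAMES[target], path]


print(" ".join(iconv_command("DEC Hanyu", "Big-5", "file")))
# iconv -f dechanyu -t big5 file
```

The sketch only builds the command; it does not check whether a converter exists for the requested pair, so consult Table 2–6 before running it.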
2.9.1 Default Conversion String
When converting from one codeset to another, characters in the source codeset that have
no corresponding code point in the destination codeset will not be converted. By default,
the characters that cannot be converted are skipped and have no representation in the
converted output.
You can control this behavior by using the ICONV_DEFSTR environment variable to
define a default string to replace those unconvertible characters. If you specify a numeric
value for this environment variable, the corresponding character value will be used.
The ICONV_DEFSTR environment variable affects all Chinese iconv converters. You can
also use the ICONV_DEFSTR_<from_code>_<to_code> environment variable to
control a specific codeset conversion. For example, to convert a DEC Hanyu input file to
DEC Hanzi with unconvertible characters converted to "?", you would enter the following
commands:
% setenv ICONV_DEFSTR_dechanyu_dechanzi "?"
% iconv -f dechanyu -t dechanzi hanyu_input > hanzi_output
For codeset converters that end in UCS-2, UCS-4, or UTF-8, you can use the "U+XXXX"
notation to specify the default character for conversion failure fallback.
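The skip-or-substitute behavior can be modeled with a small Python sketch. The convert function and its mapping argument are a toy stand-in for a real converter, and defstr_variable is a hypothetical helper that only assembles the variable name in the form given above. Neither is part of the iconv utility.

```python
def convert(text, mapping, default=None):
    """Toy model of a converter with an optional default string."""
    out = []
    for ch in text:
        if ch in mapping:
            out.append(mapping[ch])
        elif default is not None:
            # A default string is defined: substitute it.
            out.append(default)
        # Otherwise the character is skipped and leaves no trace.
    return "".join(out)


def defstr_variable(from_code=None, to_code=None):
    """Name of the variable that applies to all or to one conversion."""
    if from_code and to_code:
        return "ICONV_DEFSTR_%s_%s" % (from_code, to_code)
    return "ICONV_DEFSTR"


table = {"a": "A", "c": "C"}           # "b" has no counterpart
print(convert("abc", table))           # AC
print(convert("abc", table, "?"))      # A?C
print(defstr_variable("dechanyu", "dechanzi"))
# ICONV_DEFSTR_dechanyu_dechanzi
```

With no default string, the output gives no indication that data was lost, which is why setting a visible replacement such as "?" is useful when checking a conversion.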
______________________________ Note ____________________________
During cut-and-paste operations, those traditional Chinese characters that cannot be
converted to Simplified Chinese characters are shown as default characters in the
applications.
_______________________________________________________________
2.9.2 One-to-Many Conversion
When converting from the DEC Hanzi codeset to other Chinese codesets, one Simplified
Chinese character may be mapped to multiple traditional Chinese characters. By default,
the iconv utility selects only the most likely candidate from the list of possible choices.
You can control the behavior of the iconv utility with the ICONV_ACTION environment
variable.
The ICONV_ACTION environment variable determines how the iconv utility behaves
when there are one-to-many mappings. The possible values are:
•
batch – The most likely or preferred candidate is selected. This is the default.
•
conv_all – All possible choices are generated within the brackets "{" and "}" so
that you can edit the converted file manually and determine which one should be used.
•
conv_all_nosym – All characters except symbols (for example, punctuation
marks) are handled in the same manner as conv_all.
______________________________ Note ___________________________
During cut-and-paste operations, the batch mode is always used for those nonunique
characters.
______________________________________________________________
______________________________ Note ___________________________
The ICONV_ACTION environment variable applies to Simplified Chinese to
traditional Chinese conversion only and has no effect on UCS-4 and UTF-8
converters.
______________________________________________________________
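The three ICONV_ACTION values can be compared with a toy model in Python. Everything in this sketch is an assumption made for illustration: the candidate table, the ordering that puts the preferred candidate first, the symbol test based on Unicode categories, and the exact layout of the text inside the braces. The real converters may format their output differently.

```python
import unicodedata

# Hypothetical one-to-many table: Simplified character -> candidates,
# preferred candidate first.
CANDIDATES = {
    "\u53d1": ["\u767c", "\u9aee"],    # one Simplified, two traditional
    "\u201c": ["\u300c", "\u201c"],    # a punctuation mark (illustrative)
}


def is_symbol(ch):
    """Treat Unicode punctuation (P*) and symbol (S*) classes as symbols."""
    return unicodedata.category(ch)[0] in "PS"


def convert(text, action="batch"):
    out = []
    for ch in text:
        choices = CANDIDATES.get(ch, [ch])
        show_all = (action == "conv_all" or
                    (action == "conv_all_nosym" and not is_symbol(ch)))
        if show_all and len(choices) > 1:
            out.append("{" + "".join(choices) + "}")
        else:
            out.append(choices[0])
    return "".join(out)


sample = "\u201c\u53d1"
print(convert(sample))                     # preferred candidates only
print(convert(sample, "conv_all"))         # every ambiguous character braced
print(convert(sample, "conv_all_nosym"))   # symbols resolved, others braced
```

In this model, conv_all_nosym keeps the manual review step for ordinary characters while sparing you from resolving each punctuation mark by hand.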
2.9.3 User–Defined Character Mappings
Some user-defined characters in the Big-5 codeset have predefined mappings to
user-defined spaces in DEC Hanyu. These mappings are the same as those supported by
Pathworks/Hanyu. Table 2–8 shows this mapping.
Table 2–8: Mapping Between Big–5 and DEC Hanyu User-Defined Characters
DEC Hanyu      Big-5          Code Size
F321 - FB41    FA40 - FEFE    785
FB42 - FEFE    8E40 - 905C    343
AAA1 - C1FE    905D - 9EB8    2256
These predefined user-defined character mappings are supported by both the iconv
methods and the terminal driver.
Because some user-defined characters do not have predefined mappings, Compaq
recommends that you use only those user-defined characters that have predefined
mappings.
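The Code Size column of Table 2–8 can be checked against the Big-5 ranges with a short Python sketch. It relies on the general Big-5 layout, in which each lead byte is followed by one of 157 valid trail bytes (0x40 to 0x7E and 0xA1 to 0xFE). The function names are illustrative, and the sketch covers only the Big-5 side of the mapping.

```python
def big5_index(code):
    """Position of a two-byte Big-5 code when valid codes are counted
    in ascending order, 157 codes per lead byte."""
    lead, trail = code >> 8, code & 0xFF
    if 0x40 <= trail <= 0x7E:
        column = trail - 0x40             # first 63 trail values
    elif 0xA1 <= trail <= 0xFE:
        column = trail - 0xA1 + 63        # remaining 94 trail values
    else:
        raise ValueError("invalid Big-5 trail byte")
    return lead * 157 + column


def big5_range_size(first, last):
    """Number of valid codes from first to last, inclusive."""
    return big5_index(last) - big5_index(first) + 1


print(big5_range_size(0xFA40, 0xFEFE))    # 785
print(big5_range_size(0x8E40, 0x905C))    # 343
print(big5_range_size(0x905D, 0x9EB8))    # 2256
```

The three results match the sizes listed in the table, which confirms that the Big-5 ranges are contiguous runs of valid codes.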
2.10 Codeset for Peripheral Devices
The Compaq Tru64 UNIX software lets you configure your
system to run applications with peripherals, such as terminals and printers, that support
different codesets. You can specify the codesets for the applications, terminals, and
printers independently, as shown in Table 2–9. The Compaq Tru64 UNIX software
automatically does the necessary codeset conversion.
Table 2–9: Feasible Chinese Codesets for Applications, Terminals,
and Printers
Application Code
Terminal Code
Printer Code
DEC Hanyu
DEC Hanyu
DEC Hanyu
Taiwanese EUC
Taiwanese EUC
Taiwanese EUC
Big-5
Big-5
Big-5
DEC Hanzi
DEC Hanzi
DEC Hanzi
Shift Big-5
Telecode
______________________________ Note ____________________________
Chinese DECterm software supports only DEC Hanyu, Big5, or DEC Hanzi as its
terminal code. You must activate the stty drive and set tcode to dechanyu when
running in a Taiwanese EUC locale. For example:
% stty adec tcode dechanyu
_______________________________________________________________
For details about setting up codesets for terminals and printers, see Writing Software for
the International Market.
3 Locales
The Compaq Tru64 UNIX operating system supports different Chinese locales for
different countries and areas. These include Taiwan, People’s Republic of China (PRC),
and Hong Kong. Table 3–1 shows the valid Chinese locales with different countries,
codesets, and collating sequences.
______________________________ Note ____________________________
zh_TW is an alias of zh_TW.eucTW, and zh_CN is an alias of zh_CN.dechanzi.
_______________________________________________________________
Table 3–1: Chinese Locales
Codeset          Locale                    Collation Sequence
DEC Hanyu        zh_TW                     Internal code
                 zh_TW.dechanyu            Internal code
                 [email protected]         Internal code
                 [email protected]         Internal code
                 [email protected]         Radical
                 [email protected]@ucs4    Radical
                 [email protected]         Stroke
                 [email protected]@ucs4    Stroke
                 [email protected]         Chuyin (Phonetic)
                 [email protected]@ucs4    Chuyin (Phonetic)
                 zh_HK.dechanyu            Internal code
                 [email protected]         Internal code
Taiwanese EUC    zh_TW.eucTW               Internal code
                 [email protected]         Internal code
                 [email protected]         Radical
                 [email protected]@ucs4    Radical
                 [email protected]         Stroke
                 [email protected]@ucs4    Stroke
                 [email protected]         Chuyin (Phonetic)
                 [email protected]@ucs4    Chuyin (Phonetic)
                 zh_HK.eucTW               Internal code
                 [email protected]         Internal code
Big-5            zh_TW.big5                Internal code
                 [email protected]         Radical
                 [email protected]         Stroke
                 [email protected]         Chuyin (Phonetic)
                 zh_HK.big5                Internal code
DEC Hanzi        zh_CN                     Internal code
                 zh_CN.dechanzi            Internal code
                 [email protected]         Internal code
                 [email protected]         Internal code
                 [email protected]         Radical
                 [email protected]@ucs4    Radical
                 [email protected]         Stroke
                 [email protected]@ucs4    Stroke
                 [email protected]         Pinyin (Phonetic)
                 [email protected]@ucs4    Pinyin (Phonetic)
                 zh_HK.dechanzi            Internal code
                 [email protected]         Internal code
Locales that support the same country and codeset are basically the same. The radical,
stroke, pinyin, and chuyin modifiers after the at (@) sign specify different criteria
for collation and sorting. Moreover, the characters defined in character set standards have
collating precedence over user-defined characters, which, in turn, have precedence over
undefined or reserved characters.
The ucs4 modifier indicates that UCS-4 is used as the internal processing code. The
classification information, however, is not provided for the full set of UCS-4 characters,
but only for the corresponding language.
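A locale name can be taken apart into the components discussed above with a few lines of Python. The parse_locale function is a hypothetical helper, and the sample name zh_TW.dechanyu@radical@ucs4 is assembled from the codeset string and modifiers described in this chapter as an illustration. Check Table 3–1 or your installed locales for the names that actually exist.

```python
def parse_locale(name):
    """Split a locale name into language, territory, codeset, modifiers."""
    base, *modifiers = name.split("@")
    lang_territory, _, codeset = base.partition(".")
    language, _, territory = lang_territory.partition("_")
    return {
        "language": language,
        "territory": territory,
        "codeset": codeset or None,        # None when no codeset is given
        "modifiers": modifiers,            # collation modifier, ucs4, or both
        "ucs4": "ucs4" in modifiers,       # UCS-4 internal process code
    }


info = parse_locale("zh_TW.dechanyu@radical@ucs4")
print(info["codeset"], info["modifiers"], info["ucs4"])
# dechanyu ['radical', 'ucs4'] True
```

Separating the modifiers this way makes it easy to see which part of a name selects the collation order and which part selects the internal process code.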
If you are using DECwindows Motif, you can select the locale through the Language
Menu of Session Manager. If you are using CDE, you can select the locale using the
language menu on the CDE login screen. The applicable locales are shown in Table 3–1.
Figure 3–1: Chinese Language Names
Locale            Language Name
zh_TW             Chinese Taiwan
zh_TW.dechanyu    Chinese Taiwan (DEC Hanyu)
zh_TW.eucTW       Chinese Taiwan (EUC)
zh_TW.big5        Chinese Taiwan (Big5)
zh_HK.dechanyu    Chinese Hong Kong (DEC Hanyu)
zh_HK.eucTW       Chinese Hong Kong (EUC)
zh_HK.big5        Chinese Hong Kong (Big5)
zh_CN             Chinese China
zh_CN.dechanzi    Chinese China (DEC Hanzi)
zh_HK.dechanzi    Chinese Hong Kong (DEC Hanzi)
______________________________ Note ____________________________
For DECwindows Motif and CDE, the locale modifier is ignored.
_______________________________________________________________
4 Local Language Devices
4.1 Terminals
The Compaq Tru64 UNIX operating system supports the VT382-D and VT382-C Chinese
terminals. The VT382-D terminal is for traditional Chinese and the VT382-C terminal is
for Simplified Chinese. Hanyu DECterm and Hanzi DECterm emulate the VT382-D
and VT382-C terminals respectively, and provide compatible functionality for
running Chinese character-cell terminal applications. For
details of Hanyu DECterm and Hanzi DECterm, see Chapter 9, Other Chinese Features.
You can also use dtterm running in a Chinese locale to display Chinese in a character-cell
application.
4.2 Printers
The Compaq Tru64 UNIX operating system supports the following dot matrix Chinese
printers:
•
CP382-D (traditional Chinese)
•
LA88-C (Simplified Chinese)
•
LA380-CB (Simplified Chinese)
The following PostScript printers can be configured for traditional or Simplified Chinese:
•
DEClaser 1152
•
DEClaser 5100 with font disk (LN09X-HD)
•
PrintServer 17
•
All PostScript level 2 printers
The print filters listed in Table 4–1 are provided to support these Chinese printers.
Table 4–1: Chinese Print Filters
Filter Name    Printer Name
cp382dof       CP382-D printer
la88cof        LA88-C printer
la380cbof      LA380-CB printer
dl1152wrof     DEClaser 1152
dl5100wrof     DEClaser 5100
lpsof          PrintServer 17
wwpsof         Level 2 PostScript printers
______________________________ Note ___________________________
To use PrintServer 17, the PrintServer Software Version 5.0 or later for Compaq
Tru64 UNIX is also required.
______________________________________________________________
For details about setting up Chinese printer queues, see Chapter 8, Chinese Printing
Support.
5 Fonts
5.1 DECwindows Fonts
The Compaq Tru64 UNIX operating system provides Chinese DECwindows fonts in
various sizes and typefaces for 75 dpi (dots-per-inch) display devices. Table 5–1 lists the
screen fonts for traditional Chinese.
Table 5–1: Traditional Chinese Screen Fonts
Typefaces    Glyph Size    Bounding Box    Remarks
Screen       15 x 16       16 x 18         Mandatory font
             22 x 22       24 x 24         Mandatory font
Sung         22 x 22       24 x 24         Optional font
             30 x 30       32 x 32         Optional font
Hei          15 x 16       16 x 16         Optional font
             22 x 22       24 x 24         Optional font
There are two sets of DECwindows fonts, one for CNS 11643-1986 and one for DTSCS.
Table 5–2 lists the screen fonts for Simplified Chinese.
Table 5–2: Simplified Chinese Screen Fonts
Typefaces     Glyph Size    Bounding Box    Remarks
Screen        15 x 16*      16 x 18         Mandatory font, defined in GB5199.1-85
              22 x 22*      24 x 24         Mandatory font
Songti        15 x 16*      16 x 16         Optional font, defined in GB5199.1-85
              22 x 22*      24 x 24         Optional font
              32 x 32*      34 x 34         Optional font, defined in GB6345.1-86
Heiti         15 x 16       16 x 16         Optional font
              22 x 22*      24 x 24         Optional font
              32 x 32*      34 x 34         Optional font, defined in GB12036-89
Fangsongti    22 x 22*      24 x 24         Optional font
              32 x 32*      34 x 34         Optional font, defined in GB12034-89
Kaiti         22 x 22*      24 x 24         Optional font
              32 x 32*      34 x 34         Optional font, defined in GB12035-89
*The fonts marked with an asterisk are supplied by China Standard Technology Development Corporation
(CSTDC) of People’s Republic of China.
In addition to these Chinese fonts, several miscellaneous screen fonts are provided for use
in Hanyu DECterm, Hanzi DECterm, and the Motif toolkit.
The mandatory fonts are available after you install the Chinese language support from the
worldwide language support software. Other optional fonts are available only if you
install the optional Chinese font subsets. If you do not find the optional fonts on your
system, please contact your system administrator.
No 100 dpi Chinese fonts are provided in the kit. To allow you to use the Chinese fonts on
100 dpi display devices, a font alias file is provided to map the 75 dpi font names to 100
dpi font names.
5.1.1 XLFD Font Names
You must specify the DECwindows font names in X Logical Font Description (XLFD)
format in your application programs or in the application resource files. You can specify
wildcards for any fields in the font names.
You can use the following font names for either 75 dpi or 100 dpi display devices. If you
want to state the display resolution explicitly, you can specify 75 or 100 in the X- and
Y-resolution fields, that is, the second and third asterisks in the following XLFD names.
•
Traditional Chinese Screen family font names in XLFD format:
CNS 11643-1986 Fonts
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-DEC.CNS11643.1986-2
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-240-DEC.CNS11643.1986-2
DTSCS Fonts
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-DEC.DTSCS.1990-2
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-240-DEC.DTSCS.1990-2
•
Traditional Chinese Sung family font names in XLFD format:
CNS 11643-1986 Fonts
-ADECW-Sung-Medium-R-Normal--*-240-*-*-M-240-DEC.CNS11643.1986-2
-ADECW-Sung-Medium-R-Normal--*-320-*-*-M-320-DEC.CNS11643.1986-2
DTSCS Fonts
-ADECW-Sung-Medium-R-Normal--*-240-*-*-M-240-DEC.DTSCS.1990-2
-ADECW-Sung-Medium-R-Normal--*-320-*-*-M-320-DEC.DTSCS.1990-2
•
Traditional Chinese Hei family font names in XLFD format:
CNS 11643-1986 Fonts
-ADECW-Hei-Medium-R-Normal--*-160-*-*-M-160-DEC.CNS11643.1986-2
-ADECW-Hei-Medium-R-Normal--*-240-*-*-M-240-DEC.CNS11643.1986-2
DTSCS Fonts
-ADECW-Hei-Medium-R-Normal--*-160-*-*-M-160-DEC.DTSCS.1990-2
-ADECW-Hei-Medium-R-Normal--*-240-*-*-M-240-DEC.DTSCS.1990-2
•
Simplified Chinese Screen family font names in XLFD format:
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-GB2312.1980-1
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-240-GB2312.1980-1
•
Simplified Chinese Songti family font names in XLFD format:
-ADECW-Songti-Medium-R-Normal--*-160-*-*-M-160-GB2312.1980-1
-ADECW-Songti-Medium-R-Normal--*-240-*-*-M-240-GB2312.1980-1
-ADECW-Songti-Medium-R-Normal--*-340-*-*-M-340-GB2312.1980-1
•
Simplified Chinese Heiti family font names in XLFD format:
-ADECW-Heiti-Medium-R-Normal--*-160-*-*-M-160-GB2312.1980-1
-ADECW-Heiti-Medium-R-Normal--*-240-*-*-M-240-GB2312.1980-1
-ADECW-Heiti-Medium-R-Normal--*-340-*-*-M-340-GB2312.1980-1
•
Simplified Chinese Fangsongti family font names in XLFD format:
-ADECW-Fangsongti-Medium-R-Normal--*-240-*-*-M-240-GB2312.1980-1
-ADECW-Fangsongti-Medium-R-Normal--*-340-*-*-M-340-GB2312.1980-1
•
Simplified Chinese Kaiti family font names in XLFD format:
-ADECW-Kaiti-Medium-R-Normal--*-240-*-*-M-240-GB2312.1980-1
-ADECW-Kaiti-Medium-R-Normal--*-340-*-*-M-340-GB2312.1980-1
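Stating the resolution explicitly, as described at the start of this section, amounts to filling in two of the 14 XLFD fields. The following Python sketch shows one way to do that for any of the names listed above. The set_resolution function is an illustrative helper, not part of the X Window System.

```python
def set_resolution(xlfd, dpi):
    """Return the XLFD name with both resolution fields set to dpi."""
    fields = xlfd.split("-")
    # A full XLFD name starts with a hyphen, so fields[0] is empty and
    # the 14 fields follow. The X and Y resolution are fields 9 and 10.
    if len(fields) != 15:
        raise ValueError("expected a 14-field XLFD name")
    fields[9] = fields[10] = str(dpi)
    return "-".join(fields)


name = "-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-GB2312.1980-1"
print(set_resolution(name, 75))
# -ADECW-Screen-Medium-R-Normal--*-180-75-75-M-160-GB2312.1980-1
```

The pixel size field (the first asterisk) is left as a wildcard, so the server can still choose the matching bitmap.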
Table 5–3 shows the font names, in XLFD format, of several miscellaneous Chinese
screen fonts.
Table 5–3: XLFD of Miscellaneous Chinese Screen Fonts
XLFD Font Name                                                  Character Set
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-iso8859-1         ISO Latin-1
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-DEC-DECctrl       DEC Display Control
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-DEC-DECsuppl      DEC Supplemental
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-DEC-DECtech       DEC Technical
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-DEC-DRCS          DEC DRCS
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-120-iso8859-1        ISO Latin-1
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-120-DEC-DECctrl      DEC Display Control
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-120-DEC-DECsuppl     DEC Supplemental
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-120-DEC-DECtech      DEC Technical
-ADECW-Screen-Medium-R-Normal--*-240-*-*-M-120-DEC-DRCS         DEC DRCS
5.1.2 Bitmap Font Samples
Figure 5–1 through Figure 5–6 illustrate samples of Chinese fonts.
Figure 5–1: Sung Font Sample
Figure 5–2: Hei Font Sample
Figure 5–3: Songti Font Sample
Figure 5–4: Heiti Font Sample
Figure 5–5: Fangsongti Font Sample
Figure 5–6: Kaiti Font Sample
5.1.3 Font Encodings
The X Consortium registers names for font encodings that are used in XLFDs. However,
no names currently are registered for CNS 11643 and DTSCS. Therefore, they are
currently supported as Compaq private encodings as shown in Table 5–4.
Table 5–4: Chinese DECwindows Font Encodings
Character Set     Character Set Registry
CNS 11643-1986    DEC.CNS11643.1986-2
DTSCS             DEC.DTSCS.1990-2
Because the X Window System provides only basic Xlib functions for handling 8-bit and
16-bit characters, the four-byte data representation of DTSCS is trimmed to remove the two
leading bytes, C2 CB, to form a two-byte encoding. DECwindows applications should
either preprocess the four-byte data and then handle it with the low-level Xlib functions
or handle Chinese strings with the internationalized text drawing functions provided by
X11R6 Xlib or Motif Toolkit.
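The preprocessing step for DTSCS data can be sketched in Python as follows. The trim_dtscs function is a hypothetical helper, and the sample trailing bytes A1 A1 are chosen only for illustration. Only the C2 CB prefix comes from the description above.

```python
DTSCS_PREFIX = b"\xC2\xCB"      # the two leading bytes named above


def trim_dtscs(code):
    """Reduce one four-byte DTSCS code to its two-byte font encoding."""
    if len(code) != 4 or not code.startswith(DTSCS_PREFIX):
        raise ValueError("not a four-byte DTSCS code")
    return code[2:]


print(trim_dtscs(b"\xC2\xCB\xA1\xA1").hex())    # a1a1
```

An application that uses the low-level Xlib functions would apply a step like this to each DTSCS character before passing the two-byte values to a 16-bit drawing call. The internationalized text functions make this step unnecessary.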
Figure 5–7 and Figure 5–8 illustrate these two encoding schemes.
Figure 5–7: CNS 11643-1986 Font Encoding Scheme
Figure 5–8: DTSCS Font Encoding Scheme
Vendors may adopt different encoding schemes or even different character sets to produce
their fonts. The fonts supplied by Compaq are all in the encoding schemes defined in this
section. To allow you to run applications on third-party workstations on which different
font encodings are installed, the Compaq Tru64 UNIX implementation of X11R6 Xlib
supports the conversion of encodings during text display.
Table 5–5 shows these encoding conversions.
Table 5–5: Font Encoding Conversion
Character Set
Taiwanese EUC
Big-5
Convert From
euctw-1 (plane 1)
euctw-2 (plane 2)
euctw-3 (plane 3)
euctw-4 (plane 4)
Convert To
big5-0
dec.cns11643.1986-2
dec.cns11643.1986-2
dec.cns11643.1986-2
dec.dtscs.1990-2
dec.dtscs.1990-2
For Simplified Chinese, the X Window System defines two encodings for the GB2312-80
character set as shown in Table 5–6.
Table 5–6: Chinese DECwindows Font Encodings
Encoding    Character Set Registry
GL          GB2312.1980-0
GR          GB2312.1980-1
Figure 5–9: GB2312-80 Font Encoding Schemes
The Chinese DECwindows fonts supplied by Compaq are all in GR encoding. To allow
you to run applications on third-party workstations on which only GL-encoded fonts are
installed, the Compaq Tru64 UNIX implementation of X11R6 Xlib supports the
conversion of GR encoding to GL encoding for text drawing and measurement, as shown
in Table 5–7.
Table 5–7: GR to GL Font Encoding Conversion
Convert From     Convert To
gb2312.1980-1    gb2312.1980-0
For details, see Writing Software for the International Market.
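The relationship between the two encodings is simple: a GR code has the eighth bit set in both bytes (0xA1 to 0xFE), and the corresponding GL code has it clear (0x21 to 0x7E). The following Python sketch illustrates that relationship. It is a sketch under that general convention and is not a description of how the Xlib implementation performs the conversion internally.

```python
def gr_to_gl(data):
    """Convert GR-encoded GB2312-80 bytes to the GL form."""
    if len(data) % 2 or any(b < 0xA1 or b > 0xFE for b in data):
        raise ValueError("not GR-encoded two-byte data")
    # Clearing the eighth bit moves each byte from GR to GL.
    return bytes(b & 0x7F for b in data)


print(gr_to_gl(b"\xD6\xD0").hex())    # 5650
```

Because the mapping only clears one bit per byte, it preserves the row and column of every character, so a GL-encoded font indexes the same glyphs.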
5.1.4 Specifying Fonts in DECwindows Applications
Table 5–8 and Table 5–9 show the default fonts used in the Motif Toolkit.
Table 5–8: Traditional Chinese Default Fonts
XLFD Font Name                                                            Character Set
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-iso8859-1                   ISO8859-1
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-DEC.CNS11643.1986-2        DEC.CNS11643.1986-2
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-DEC.CNS11643.1986-2-UDC    DEC.CNS11643.1986-2-UDC
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-DEC.DTSCS.1990-2           DEC.DTSCS.1990-2
-ADECW-Screen-Medium-R-Normal--*-180-*-*-*-*-*                            Fontset
Table 5–9: Simplified Chinese Default Fonts
XLFD Font Name                                                      Character Set
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-80-iso8859-1             ISO8859-1
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-GB2312.1980-1        GB2312.1980-1
-ADECW-Screen-Medium-R-Normal--*-180-*-*-M-160-GB2312.1980-UDC      GB2312.1980-UDC
-ADECW-Screen-Medium-R-Normal--*-180-*-*-*-*-*                      Fontset
To override the default fonts of a traditional Chinese DECwindows application, you should
specify the ISO Latin-1, DTSCS, and CNS11643 (UDC) fonts as well as the Chinese
fontset when creating widget instances. For a Simplified Chinese DECwindows
application, you should specify the ISO Latin-1, GB2312-80, and extended GB (UDC)
fonts as well as the Chinese fontset when creating widget instances. For details, see
Writing Software for the International Market.
5.2 Outline Fonts
The Compaq Tru64 UNIX software provides the following traditional and Simplified
Chinese outline fonts for printing on PostScript printers and for display through Level II
Display PostScript extension.
For traditional Chinese:
•
Sung-Light-CNS11643
•
Hei-Light-CNS11643
For Simplified Chinese:
•
XiSong-GB2312-80
•
Hei-GB2312-80
The encodings of these fonts are the same as those illustrated in Figure 5–7 and Figure 5–9.
These Chinese outline fonts have the following uses:
•
Printing on PostScript printers. For details see Chapter 8, Chinese Printing Support.
•
Displaying through the R6 X Windows System Type 1 rasterizer. To use these outline
fonts, add the $I18NPATH/usr/lib/X11/fonts/TChinesePS and
$I18NPATH/usr/lib/X11/fonts/SChinesePS directories to your font path
with the following command:
% xset +fp $I18NPATH/usr/lib/X11/fonts/TChinesePS,$I18NPATH/usr/lib/X11/fonts/SChinesePS
This is done automatically when the outline fonts are installed.
•
Displaying through Display PostScript. You can view PostScript files with Chinese
characters using the CDA Viewer or through the Display PostScript extension.
5.2.1 XLFD Font Names of Chinese Outline Fonts
To use the Chinese outline fonts through the Type 1 rasterizer, you can specify the font
names in XLFD (X Logical Font Description) format in your application programs or in
the application resource files, just like ordinary DECwindows bitmap fonts.
To specify the XLFD font name of an outline font, you should replace the fields currently
marked with 0 (zero) with the following information:
•
Field 1 — The font height in number of dots. An asterisk (*) usually is entered in this
field.
•
Field 2 — The font height in point size. For example, you can enter 240 to specify a
24 point font.
•
Fields 3 and 4 — The X- and Y-resolution. They usually have the value of 75 or 100.
•
Field 5 — The average font width in point size. An asterisk (*) usually is put in this
field.
For example, if you want to use a 48 point font of the Sung-Light-CNS11643 family for a
100 dpi display device, you would specify:
-dyna-sung-medium-r-normal--*-480-100-100-m-*-CNS11643.1986
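The field substitution can also be expressed as a short Python sketch. The template string below is an assumption: it is the example name above with the scalable fields set back to 0, and the outline_xlfd function is an illustrative helper. The sketch reproduces the example exactly as printed in this section.

```python
# Assumed template: the example name with its scalable fields set to 0.
TEMPLATE = "-dyna-sung-medium-r-normal--0-0-0-0-m-0-CNS11643.1986"


def outline_xlfd(template, points, dpi):
    """Fill in the scalable fields of an outline font name."""
    fields = template.split("-")
    fields[7] = "*"                     # Field 1: height in dots
    fields[8] = str(points * 10)        # Field 2: point size in tenths
    fields[9] = fields[10] = str(dpi)   # Fields 3 and 4: X/Y resolution
    fields[12] = "*"                    # Field 5: average width
    return "-".join(fields)


print(outline_xlfd(TEMPLATE, 48, 100))
# -dyna-sung-medium-r-normal--*-480-100-100-m-*-CNS11643.1986
```

Note that the point size field is expressed in tenths of a point, which is why a 48 point font is entered as 480.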
6 Keyboards
The Compaq Tru64 UNIX operating system supports the following Chinese keyboard
types:
•
LK201
•
LK401
•
PCXAL
There are some variants for these keyboards in different countries. For example, in
Taiwan, the LK201-D variant has additional symbols printed on the keycaps for different
Chinese input methods. The figures in this chapter show some of these Chinese keyboard
layouts. You can find online copies of these figures at the location specified. These
figures are in .ddif format.
Figure 6–1: LK201-D Keyboard Layout
Required Keymap: us_lk201re
Location of file: /usr/lib/cda/hanyu-lk201d-100.ddif
Figure 6–2: LK401-D Keyboard Layout
Required Keymap: us_lk401aa
Location of file: /usr/lib/cda/hanyu-lk401d-100.ddif
Figure 6–3: LK201-C Keyboard Layout
Required Keymap: us_lk201re
Location of file: /usr/lib/cda/hanzi-lk201c-100.ddif
Figure 6–4: LK401-C Keyboard Layout
Required Keymap: us_lk401aa
Location of file: /usr/lib/cda/hanzi-lk401c-100.ddif
Figure 6–5: Numeric Keypad for 5-Stroke Input Method
Location of file: /usr/lib/cda/hanzi-keypad-100.ddif
7 Input Methods
This chapter describes the input methods for entering Chinese characters. There are two
groups of input methods, one for traditional Chinese and the other for Simplified Chinese.
Traditional Chinese has the following input methods:
•
Full-form alphabets
•
Tsang-Chi
•
Quick Tsang-Chi, also known as Easy
•
Phonetic
•
Internal Code
•
Phrase input method
•
Symbol Input
Simplified Chinese has the following input methods:
•
5-Stroke
•
5-Shape
•
Pin-Yin, or Phonetic
•
Qu-Wei or Row-Column in GB2312-80
•
Telex Code
•
Phrase Input
This chapter also describes:
•
The mechanism for activating and deactivating Chinese input methods
•
The mechanism for switching input methods
•
The DECwindows Motif interface and its customization
7.1 Activating and Deactivating Chinese Input Methods
This section describes how to activate and deactivate the Chinese input methods for
character-cell terminal applications, DECwindows Motif applications, and CDE
applications.
7.1.1 Character–Cell Terminal Applications
For character-cell terminal applications, traditional and Simplified Chinese input methods
are implemented by the firmware of the VT382-D traditional Chinese terminal and
VT382-C Simplified Chinese terminal respectively, or incorporated in the terminal
emulation software, such as Hanyu DECterm for the former and Hanzi DECterm for the
latter. Applications do not need to provide their own support for Chinese input. They can
rely on the terminal or emulation software to provide the input method services.
Hanyu DECterm and Hanzi DECterm are considered DECwindows Motif
applications. Therefore, you activate and deactivate their input methods in the same way as
for other DECwindows Motif applications, as discussed in Section 7.1.2.
On VT382-D or VT382-C terminals, you select the input mode using the Compose key,
which is located on the lower-left side of the main keyboard. On the Chinese version of
LK201 or LK401 keyboard (that is, LK201-C, LK201-D, LK401-C or LK401-D), the
Compose key is labeled as 0 . For details, see Chapter 6.
Once the Chinese input mode is activated, the firmware of the terminal or the input
methods incorporated in DECterm automatically compose Chinese characters and return
the input data as appropriate.
7.1.2 DECwindows Motif Applications
For DECwindows Motif applications, Chinese input methods are implemented in the form
of independent processes called input servers. These Chinese input servers are X client
processes that can work on a standard X server provided the X server has the required
Chinese fonts installed. This means that the Chinese input server can run on any system
which can access your X display device, including the device itself.
Although the codesets returned by traditional and Simplified Chinese input servers are
fixed, the Compaq Tru64 UNIX software allows you to connect applications to the input
servers using any valid Chinese locale. The Compaq Tru64 UNIX software provides the
required codeset conversion.
The traditional Chinese input server provided with the Compaq Tru64 UNIX software is
interoperable with all existing DECwindows Motif/Hanyu platforms, including OpenVMS
DECwindows Motif/Hanyu and UWS/Hanyu. The Simplified Chinese input server
provided with the Compaq Tru64 UNIX software is interoperable with all existing
DECwindows Motif/Hanzi platforms, including OpenVMS DECwindows Motif/Hanzi
and UWS/Hanzi. Both input servers also provide input method services to the R6 X
library (Xlib) supported by the Compaq Tru64 UNIX software. You can write
internationalized applications using the standard R6 application programming interface
and communicate with this input server. For details about developing internationalized
software with X11R6, see Writing Software for the International Market.
Before you can input Chinese data, you must start the Chinese input server on your
workstation or any system on your network that can be accessed by your workstation.
English and Chinese user interfaces are provided, so be sure to set the correct session
language before starting the input server. There are several ways to start the Chinese input
server:
•
Using the Session Manager
You can start the Chinese input server after logging in to a session by selecting Hanyu
IM or Hanzi IM from the Applications menu of the Session Manager, just like starting
an ordinary DECwindows application.
•
Automatic Startup of the Input Server
If you start up your session in one of the traditional Chinese locales, by default, the
Hanyu IM menu item is added to the Session Manager’s Automatic Startup list. Hanzi
IM is the default for Simplified Chinese. When you log in, the input server starts
automatically. If you do not want to auto-start the input server, you can remove this
item from the Automatic Startup list by using the Session Manager’s Customize menu.
______________________________ Note ____________________________
Applications which are started before Hanyu IM or Hanzi IM cannot connect to the
input server. Therefore, Hanyu IM or Hanzi IM should be the first item on the
Automatic Startup list.
_______________________________________________________________
•
Using a Command
You can start the input server on a workstation that you are using by entering one of
the following commands:
% /usr/bin/X11/dxhanyuim &
% /usr/bin/X11/dxhanziim &
You can start the input server on a remote system for Hanyu IM by entering the
following commands on that system:
% setenv DISPLAY <display-name>
% /usr/bin/X11/dxhanyuim &
For Hanzi IM, enter the following commands:
% setenv DISPLAY <display-name>
% /usr/bin/X11/dxhanziim &
In these examples, <display-name> is the display name of your workstation.
After you invoke the Chinese input server, the DECwindows Motif applications which
have been internationalized to support Chinese can communicate with it to provide input
method services.
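A startup script that must pick the input server for the session locale could follow the default pairing described under Automatic Startup: Hanyu IM for traditional Chinese locales and Hanzi IM for Simplified Chinese locales. The following Python sketch is one possible reading of that pairing. The input_server function and the grouping of codesets into traditional and Simplified are assumptions made for illustration. The alias entries come from the Note in Chapter 3.

```python
HANYU_IM = "/usr/bin/X11/dxhanyuim"     # traditional Chinese input server
HANZI_IM = "/usr/bin/X11/dxhanziim"     # Simplified Chinese input server

# Assumed grouping of locale codesets by script.
TRADITIONAL = {"dechanyu", "eucTW", "big5"}
SIMPLIFIED = {"dechanzi"}

# Aliases as stated in the Note in Chapter 3.
ALIASES = {"zh_TW": "zh_TW.eucTW", "zh_CN": "zh_CN.dechanzi"}


def input_server(locale):
    """Return the path of the default input server for a locale name."""
    name = ALIASES.get(locale, locale).split("@")[0]   # drop modifiers
    codeset = name.partition(".")[2]
    if codeset in TRADITIONAL:
        return HANYU_IM
    if codeset in SIMPLIFIED:
        return HANZI_IM
    raise ValueError("not a Chinese locale: " + locale)


print(input_server("zh_TW.big5"))    # /usr/bin/X11/dxhanyuim
print(input_server("zh_CN"))         # /usr/bin/X11/dxhanziim
```

Because applications can connect to either server from any valid Chinese locale, this selection is only a default and not a restriction.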
7.1.3 CDE Applications
For CDE applications, Chinese input methods are implemented by input servers. Before
you can input Chinese data, you must start the Chinese input server. There are two ways
to start the Chinese input servers in CDE:
•
Automatic Startup of the Input Server
If Chinese is selected on the CDE login menu, the Chinese input server starts
automatically. When you log in, the following script runs:
/usr/dt/config/Xsession.d/0020.dtims
The value of the DTSTARTIMS environment variable determines whether the script
will automatically start the specified Chinese input server.
•
Using a Command
You can enter one of the following commands to start the input server on a
workstation you are using:
% /usr/bin/X11/dxhanyuim &
% /usr/bin/X11/dxhanziim &
To start the input server on a remote system for Hanyu IM, enter the following
commands on that system:
% setenv DISPLAY <display-name>
% /usr/bin/X11/dxhanyuim &
For Hanzi IM, enter the following commands:
% setenv DISPLAY <display-name>
% /usr/bin/X11/dxhanziim &
In these examples, <display-name> is the display name of your workstation.
After you invoke the Chinese input server, the CDE applications which are
internationalized to support Chinese can communicate with it to provide input method
services.
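The commands shown in this section can also be scripted. The following Python sketch is illustrative only: the function name build_launch and the display name myws:0 are hypothetical, and only the two server paths come from this section. It builds the argument list and an environment with DISPLAY set, without starting any process.

```python
import os

# Server paths as given in the commands in this section.
INPUT_SERVERS = {
    "hanyu": "/usr/bin/X11/dxhanyuim",  # Hanyu IM
    "hanzi": "/usr/bin/X11/dxhanziim",  # Hanzi IM
}


def build_launch(server, display_name=None, base_env=None):
    """Return (argv, env) for starting an input server.

    display_name plays the role of <display-name> in the setenv DISPLAY
    command; leave it as None when starting the server on the workstation
    you are using. base_env defaults to the current environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    if display_name is not None:
        env["DISPLAY"] = display_name
    return [INPUT_SERVERS[server]], env


# Hypothetical display name, for illustration only.
argv, env = build_launch("hanyu", "myws:0", base_env={})
print(argv, env["DISPLAY"])
```

To start the server from such a script, the pair could be passed to subprocess.Popen(argv, env=env), which corresponds to running the command in the background with the trailing ampersand.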
7.2 Switching Input Method
Table 7–1 shows the key sequences for toggling between the English and Chinese input
modes.
Table 7–1: Key Sequences that Invoke Chinese Input Method
Terminal or Keyboard Type    Default Key Sequence
VT382-D                      [Compose]
VT382-C
DECwindows Motif:
  LK201                      [Compose/Space]
  LK401                      [Compose]
  PCXAL                      [Alt/Space]
______________________________ Note ____________________________
You can use the input server options menu to customize the key sequences used to
invoke the Chinese input method.
_______________________________________________________________
Table 7–2 and Table 7–3 show the key sequences for selecting a specific traditional
Chinese or Simplified Chinese input method once you are in the Chinese input mode.
Table 7–2: Key Sequences Used to Select Traditional Chinese Input Method
Input Method           Default Key Sequence
Full-Form Alphabets    [Shift/Space]
Tsang–Chi              [F6]
Quick Tsang–Chi        [F7]
Internal Code          [F8]
Phrase                 [F9]
Phonetic               [F10]
Symbol Input           [Z] (in Tsang–Chi or Quick Tsang–Chi mode)
Table 7–3: Key Sequences Used to Select Simplified Chinese Input Method
Input Method           Default Key Sequence
Phrase Input           [F5]
5-Stroke               [F6]
Qu-Wei                 [F7]
Pin-Yin                [F8]
Telex Code             [F9]
5-Shape                [F10]
______________________________ Note ___________________________
You can use the Options menu to customize the key sequences used to select the input
method.
______________________________________________________________
In standard Motif, the function key [F10] is defined as the accelerator of the pull-down
menu bar. In DECwindows Motif, the menu accelerator by default is [Ctrl/F10] so that
[F10] can be used to invoke the Phonetic Input Method. To make your DECwindows
Motif applications Motif compliant, insert the following line in your
$HOME/.Xdefaults file:
*menuAccelerator: Ctrl <Key> F10:
______________________________ Note ____________________________
If you change [F10] to the menu accelerator, you cannot invoke the Phonetic Input
Method unless you change the invocation key to another key sequence.
_______________________________________________________________
7.3 Motif Interface Input Method
You can interact with the Chinese input server through a Motif-style user-interface. This
interface allows an input method to provide feedback about the data being edited, to help
you compose a character, list choices for selection, provide options for customizing the
input server, and so on.
7.3.1 Input Areas
The X Input Method specification defines the three input areas shown in Table 7–4.
Table 7–4: Window Input Areas
Region            Description
Auxiliary area    An option menu helps you customize the Chinese input methods
                  and the input method window.
Status area       Critical information about the internal state of the Chinese
                  input methods is displayed in this area.
Pre-edit area     The intermediate text that is being composed is displayed in
                  this area, which also displays a list of valid candidates for
                  the input key sequences.
7.3.2 Interaction Styles
The use of the input areas depends on the interaction style (or pre-edit style) selected for
the application. The Chinese input server supports two interaction styles:
•
Root Window Interaction
•
Off-the-Spot Interaction
7.3.2.1 Root Window Interaction
You can choose the root window interaction style if you want to display the pre-edit data
in an input window which is separate from the application window. You can scale and
move the input window to meet your preferences. If you want to free up more screen
space, you can iconize the input method window. You can also choose to display pre-edit
data in vertical or horizontal layout.
Figure 7–1: Chinese Root Window Interaction Style
Figure 7–2: Chinese Input Window Icon
You can continue to input Chinese characters through a Chinese application window when
the input window is iconized. The input state is displayed on the icon title, which is
updated according to the input mode and the input focus. If you want to see the pre-edit
data, you can double click the icon to redisplay the input window.
7.3.2.2 Off-the-Spot Interaction
If you want to display the pre-edit data in a fixed location of the application window, you
can choose the off-the-spot interaction style. With this interaction style, the Chinese input
server creates the input window at the bottom of the application window. You need not
refer to the root window and you can iconize it to save screen area.
Figure 7–3: Off-the-Spot Interaction Style
You can specify the priority of the interaction styles of DECwindows Motif or CDE
applications by specifying the VendorShell resource, XmNpreeditType. By default, the
resource value is overthespot,offthespot,root,onthespot. This list is in
priority order. The first style is used if the input method supports it, otherwise the second, and
so on.
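The priority rule can be sketched in a few lines of Python. This is an illustration of the selection logic only, not the Motif implementation; the function name and the SERVER_STYLES set are assumptions, with the set reflecting the two interaction styles that this chapter says the Chinese input server supports.

```python
# Default XmNpreeditType value quoted in the text.
DEFAULT_PREEDIT_TYPE = "overthespot,offthespot,root,onthespot"

# Assumption: the styles offered by the Chinese input server
# (root window and off-the-spot, per Section 7.3.2).
SERVER_STYLES = {"root", "offthespot"}


def choose_preedit_style(preedit_type, supported=SERVER_STYLES):
    """Return the first style in the priority list that is supported."""
    for style in (s.strip() for s in preedit_type.split(",")):
        if style in supported:
            return style
    return None  # no style in the list is available


print(choose_preedit_style(DEFAULT_PREEDIT_TYPE))
print(choose_preedit_style("root,offthespot"))
```

Under these assumptions, the default list resolves to off-the-spot because over-the-spot is skipped, while a list that begins with root resolves to the root window style.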
There are two ways to choose your preferred interaction style:
•
Use the -xrm command line option to specify the resource value when you start an
application. For example, the following command starts CDE Calendar Manager with
the root window interaction style:
% dtcm -xrm '*preeditType: root,offthespot' &
To start CDE Calendar Manager with the off-the-spot interaction style, you can enter:
% dtcm -xrm '*preeditType: offthespot,root' &
•
For DECwindows Motif, use the Session Manager’s Options menu in XDM:
— From the Session Manager's Options menu, select Input Method...
— In the popup Input Method Options window, click on the appropriate pre-edit
style button
The XmNpreeditType resource is set to a priority list beginning with the pre-edit style
that you have chosen.
After you select your preferred interaction style, the applications you invoke start up
with the new setting.
•
For CDE, you can invoke the dtimsstart application to change the input method
and the input style of the current input method.
______________________________ Note ___________________________
Some applications, such as DECterm, may provide their own user interface to handle
interaction styles. Those mechanisms may override the methods described here.
______________________________________________________________
7.3.3 Input Server Operations
When you start a Chinese input server, no application is connected to it and the title bar
indicates that there is no connection. When an internationalized application starts in a
Chinese locale, a status string is displayed in the status area for dxhanyuim, and a
corresponding string is displayed for dxhanziim, indicating that the application is connected to the
Chinese input server and the mode is English. If you invoke a Chinese input method, the
input state displayed in the status area and the title bar is updated accordingly. If you
change the input focus to a noninternationalized application window, the title bar of the
input window changes to indicate there is no connection.
The input server can maintain an individual state of composition for different input
contexts or application windows. In addition, under the root window interaction style,
each application window can be associated with its own attributes, such as font size, font
style, layout, input window size, and position. You can set the input focus to an
application window and then compose a Chinese character or customize the input window.
The input server stores information about the composing state and input window attributes
so that next time this application window gets the input focus, the original composing state
and attributes are restored.
7.3.4 Options Menu
The auxiliary area of the input window provides an Options menu that you use to
customize the input server. You can click on the Options button to access the
customization pull-down menu. The menu provides these options for traditional Chinese:
•
Vertical Layout
•
Horizontal Layout
•
Select Phrase Input Class
•
User Phrase Database
•
System Phrase Database
•
Current Window
•
Input Method Customization
•
Help
•
Quit
7.3.4.1 Vertical Layout
You can choose the Vertical Layout option only if the current layout is horizontal. When
you choose this option, the input window and the layout of its contents immediately
display in a vertical orientation. The vertical input window remains at the same origin.
7.3.4.2 Horizontal Layout
You can choose this option only if the current layout is vertical. When you choose this
option, the input window and the layout of its contents immediately display in a horizontal
orientation. The horizontal input window remains at the same origin.
7.3.4.3 Select Phrase Input Class
You can use the Select Phrase Input Class option to customize the phrase input mode. DECwindows
Motif shares the phrase databases that are created and managed by the Compaq Tru64
UNIX Phrase Utility. After you create a phrase database and define your phrases, both
character-cell terminal applications and DECwindows Motif applications can use the data
for phrase input. In order to use the phrase databases, the LANG environment variable
must be set to reflect the required codeset, that is, zh_TW.dechanyu. For details about the
Compaq Tru64 UNIX Phrase Utility and phrase definition file, see Writing Software for
the International Market.
The Select Phrase Input Class option allows you to focus on a particular class of phrases
during phrase input. When you choose this option, a dialog box pops up and you can
select the phrase class that you want to use. To select all classes, you choose the * option.
If you do this, the phrase input method searches all classes of phrase definitions for the
phrase code that you entered.
7.3.4.4 User Phrase Database
The Phrase Input method allows you to access two phrase definition databases: the system
phrase database and the user phrase database. The former is for public access by all users
using your system. It should be created and modified by your system administrator. You
can also create and maintain your own private phrase database for storing your frequently
used phrases. This is called the user phrase database.
______________________________ Note ___________________________
The databases that you can access are the ones available on the system on which you
start your Chinese input server.
______________________________________________________________
For details about creating a phrase database, see Writing Software for the International
Market.
If you choose this option, you will access your private user phrase database.
7.3.4.5 System Phrase Database
If you choose the System Phrase Database option, you will access the system phrase
database.
7.3.4.6 Current Window
The Current Window option allows you to customize the attributes of a specific
application window.
______________________________ Note ___________________________
The Current Window option is available only if you chose root window as your
interaction style and the input focus is on an internationalized application input area.
Otherwise, this option is greyed out. If you choose the off-the-spot interaction style,
the application determines the attributes.
______________________________________________________________
When you choose this option, a dialog box pops up and the following options are
displayed:
•
Font Size
You can choose the font size for displaying pre-edit data. Click on either the Big Font
or Small Font toggle button.
•
Font Typeface
You can choose the font typeface to be used in the input window. To choose the font
typeface in a traditional Chinese input server, click on one of the following toggle
buttons:
— Hei
— Sung
— Screen
To choose the font typeface in a Simplified Chinese input server, click on one of the
following toggle buttons:
— Heiti
— Songti
— Kaiti
— FangSongti
— Screen
In the Chinese input server resource file, you can define a typeface that does not
exist in the options list. This typeface will be displayed beside the Other: label in the
customization window.
•
Line Spacing
The Chinese input server can display pre-edit data on more than one text line.
Usually, this happens when a list of items is displayed for your selection. You can use
the Line Spacing option to specify the spacing between text lines in pixels. To adjust
the line spacing, drag the Line Spacing slider or move the pointer to the desired
position on the slider and click MB1.
•
Foreground and Background Colors
You can customize the foreground and background colors of the input window.
For monochrome display, the following options are provided:
— Dark Text, Light Background
— Light Text, Dark Background
For color display, you can choose from a palette of colors to design a visually pleasing
input window. To customize the foreground or background color, you should first
select the color that you want to change by clicking one of the following toggle
buttons:
— Input Window Foreground Color
— Input Window Background Color
A color mixing window pops up in which you can mix the color using the three
sliders, which represent the intensities of the primary colors. The modified color is
displayed in the right half of the color box while the left half displays the original
color.
7.3.4.7 Input Method Customization
There are several customizable attributes which apply to all input windows. In the
traditional Chinese input server, they are:
•
Default Input Method
•
Bell Volume
•
EDPC support
•
Invocate Key
In the Simplified Chinese input server, they are:
•
Default Input Method
•
Bell Volume
•
Display 5-Shape Radicals
•
Invocate Key
When you choose the Input Method Customization option, a dialog box pops up in which you
can customize the attributes.
•
Default Input Method
When the Chinese input server is activated, the default Chinese input method is set to
the one you choose via this option. The input methods you can choose in the
traditional Chinese input server are:
— Tsang–chi
— Quick Tsang–chi
— Phonetic
— Internal code
The input methods you can choose in the Simplified Chinese input server are:
— 5-Stroke
— Row-Column
— Pin-Yin
— Telex Code
— 5-Shape
•
Bell Volume
When you make an error while composing a Chinese character, the bell rings to alert
you. To adjust the bell volume, drag the Bell Volume slider or move the pointer to
the desired position on the slider and click MB1.
•
EDPC support (in dxhanyuim only)
The traditional Chinese input server supports the input of both CNS 11643 and
DTSCS (that is, EDPC) characters. However, you can choose to disable the input of
EDPC characters so that the data that you can enter contains only CNS 11643
characters. This option is useful if you need to prepare Chinese data and interchange
it with systems supporting only CNS 11643.
To enable or disable the input of EDPC characters, click on the EDPC Characters
Input button.
•
Display 5-shape Radicals (in dxhanziim only)
The Simplified Chinese input server supports the display of 5-shape radicals when you
choose the 5-stroke input method. The radicals are displayed after each candidate in
the candidate list during the pre-edit.
•
Invocate Key
The key sequences for invoking and switching Chinese input methods are set by
default. You can change these default key sequences to meet your personal preference
or working style. This option allows you to customize the following key sequences
for both dxhanyuim and dxhanziim:
— Start Input Method
— End Input Method
— Phrase Input
— Invoke Next Input Method
The following choices are for dxhanyuim only:
— Start Full Form
— End Full Form
— Tsang–chi
— Quick Tsang–chi
— Internal Code
— Phonetic
The following choices are for dxhanziim only:
— 5-Stroke
— Row-Column
— Pin-Yin
— Telex Code
— 5-Shape
In the bottom part of the dialog box is an easy-to-use interface where you can
customize a key sequence. You can select a trigger key and toggle the on/off state of
the Ctrl, Alt, and Shift modifiers. The trigger keys that you can choose include
NoSymbol, [F1] - [F20], [Space], [Return], [Compose] and [A]- [Z]. If you choose
NoSymbol, no invocation sequence will be provided for the selected action.
For each modifier key, you can select the on/off state with the toggle buttons shown in
Table 7–5.
Table 7–5: Modifier State Customization
Modifier    On State    Off State
Ctrl        Ctrl        ~Ctrl
Shift       Shift       ~Shift
Alt         Alt         ~Alt
The tilde (~) sign means that you should not press that modifier key when invoking
the action. In addition to the on/off state, you can also deselect both of the states for a
modifier key so that neither state is selected. To do this, click the toggle button which
is currently set on. If you deselect a modifier, the input server will accept the
invocation key with or without holding the modifier key.
When an invocation key sequence is selected, the state of the toggle switches and the
trigger key displayed at the bottom of the dialog box is updated to reflect the current
value and the label at the bottom left-hand side changes.
Figure 7–4: Customization of Invocation Key Sequences in dxhanyuim
Figure 7–5: Customization of Invocation Key Sequences in dxhanziim
For example, if you want to change the End Input Method key sequence to
[Ctrl/Space], select the Ctrl, ~Alt and ~Shift buttons.
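The three modifier states (on, off, and deselected) can be modeled as a small matching function. The Python sketch below is an assumption about how such matching could work, not the input server's actual code; the function and parameter names are hypothetical. True stands for the on state, False for the off state (~), and None for a deselected modifier.

```python
MODIFIERS = ("Ctrl", "Shift", "Alt")


def sequence_matches(trigger, modifier_states, key, held):
    """Check a key press against a customized invocation key sequence.

    trigger         -- trigger key name, or "NoSymbol" for no sequence
    modifier_states -- dict: True = on, False = off (~), None = deselected
    key             -- the key that was pressed
    held            -- set of modifier names held down during the press
    """
    if trigger == "NoSymbol" or key != trigger:
        return False
    for mod in MODIFIERS:
        required = modifier_states.get(mod)
        if required is None:
            continue  # deselected: accepted with or without the modifier
        if (mod in held) != required:
            return False
    return True


# End Input Method set to [Ctrl/Space]: Ctrl on, ~Alt, ~Shift.
end_input = {"Ctrl": True, "Alt": False, "Shift": False}
print(sequence_matches("Space", end_input, "Space", {"Ctrl"}))
print(sequence_matches("Space", end_input, "Space", {"Ctrl", "Shift"}))
```

With the settings from the example, Ctrl/Space matches, but Ctrl/Shift/Space does not, because ~Shift requires that Shift is not pressed. Changing Shift to the deselected state (None) would accept both.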
To reduce the number of keys required for selecting input methods, set the hot keys
for all the input methods to "NoSymbol", and define a key sequence for the Invoke
Next Input Method option. This method releases the [F6] to [F10] function keys for
use by other DECwindows applications. To switch the input method, press the key
sequence for Invoke Next Input Method and the Chinese input server will cycle
through all supported input methods. For example, if you use the Tsang–Chi input
method and you want to switch to the internal code input method, press the hot key
twice.
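The cycling behavior can be pictured as stepping through a circular list. In the Python sketch below, the list contents and their order are assumptions taken from the function key order in Table 7–2; the text itself confirms only that two presses lead from Tsang–Chi to Internal Code.

```python
# Assumed cycle order for the traditional Chinese input server,
# following the [F6] to [F10] order of Table 7-2.
CYCLE = ["Tsang-Chi", "Quick Tsang-Chi", "Internal Code", "Phrase", "Phonetic"]


def next_method(current, presses=1, methods=CYCLE):
    """Return the input method reached after pressing the hot key."""
    index = methods.index(current)
    return methods[(index + presses) % len(methods)]


print(next_method("Tsang-Chi", presses=2))
```

The modulo step makes the list wrap around, so pressing the hot key from the last method returns to the first one.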
7.3.4.8 Help
The Help option provides the following menu items that you use to display help for the
Chinese input server:
•
Context-Sensitive Help
•
Overview
•
Using Help
•
Product Information
7.3.4.9 Quit
Use the Quit option to terminate the input server. If you select this option, a dialog box
pops up and asks if you really want to exit.
7.3.5 Saving Your New Settings
All attributes that you customize with the Current Window and Input Method
Customization menus can be saved into a resource file in your login directory. Each
customization window provides the following options:
•
Save Settings as Defaults
Saves all current attributes as default values. These attributes are saved to a private
resource file in your login directory: .DXhanziim if you are using dxhanziim, and
.DXhanyuim if you are using dxhanyuim.
•
Restore system setting
Restores all system default attributes.
7.4 Alphabetic Input Methods
There are two alphabetic input methods available under the English mode:
•
Half Form Alphabet
•
Full Form Alphabet
The Half Form Alphabet method allows you to enter uppercase and lowercase English
characters, numerals, and symbols marked on the keyboard. Full Form Alphabet allows
you to enter 2-byte alphabets, numerals, and symbols defined in the Chinese character sets.
To invoke the Full Form Alphabet input method, press [Shift/Space]. The string
(full form) is displayed in the status area, as shown in Figure 7–6. Once the prompt
appears, all characters that you enter at the keyboard are sent as 2-byte characters.
To exit the Full Form Alphabet input method, press [Shift/Space] again.
Figure 7–6: Full Form Alphabet Input Method
7.5 Tsang–Chi Input Method
To understand the Tsang–Chi input method, you must understand the concepts of Tsang–
Chi root radicals, auxiliary forms, and character-splitting methods.
7.5.1 Tsang–Chi Root Radicals
The Tsang–Chi input method is based on the concept of root radicals. The input method
requires a Chinese character to be broken down into various root radicals according to its
shape. Altogether 24 Tsang–Chi root radicals have been defined with which almost all
existing Chinese characters can be composed. The root radicals are divided into four
groups and assigned to the alphabet keys [A] - [Y] (with the exception of the [X] key) on
the main keyboard. Table 7–6 a-d illustrate the classification of the root radicals, their
corresponding English keys, their auxiliary forms, and the way that they are derived.
Table 7–7 is a shorter table for quick reference.
Table 7–6: Tsang–Chi Root Radicals Classification
a. Philosophical:
b. Stroke:
c. Human Body:
d. Form:
Table 7–7: Quick Reference Table of the Tsang–Chi Root Radicals
7.5.2 Tsang–Chi Code Generation
To input a Chinese character using the Tsang–Chi input method, its Tsang–Chi code
should be generated based on character decomposition. Most Chinese characters can be
divided into two categories: the composite form and the connected form. The composite
form can be split into a character head and a character body while the connected form
cannot.
Table 7–8: Composite Form Characters
Composite Form      Examples
Left-right form
Top-bottom form
Inclusion form
Table 7–9: Connected Form Characters
Connected Form      Examples
Connected form
7.5.2.1 General Rules
The general rules for generating Tsang–Chi codes are:
•
The character category must be composite or connected.
•
The code according to the writing order is usually one of the following:
•
From outside to inside
•
From top to bottom
•
From left to right
•
The number of radicals is up to 5 for composite characters, where the character head
can be decomposed into at most 2 radicals and the character body can be decomposed
into at most 3 radicals. Connected characters can be decomposed into no more than 4
radicals.
•
If more than one Tsang–Chi code exists for a character, use the one with fewer
radicals.
•
If several Tsang–Chi codes have the same number of radicals, use the one which
better represents the character.
7.5.2.2 Connected Characters
Connected characters are those which cannot be split due to the existence of crossed or
connected strokes. Each character can be input by entering at most
4 radicals. If more than 4 radicals can be derived, the first 3 radicals and the last can be
taken to generate the Tsang–Chi code. Table 7–10 illustrates some examples of
decomposing connected characters.
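The truncation rule for connected characters can be written as a short function. In this Python sketch, radicals are represented by the letters of their keys, and the five-letter sequence in the example is hypothetical rather than the code of a real character.

```python
def connected_code(radicals):
    """Apply the connected-character rule to a sequence of radical keys.

    Up to 4 radicals are entered as they are; if more can be derived,
    only the first 3 and the last are kept.
    """
    radicals = list(radicals)
    if len(radicals) <= 4:
        return "".join(radicals)
    return "".join(radicals[:3] + radicals[-1:])


# Hypothetical decomposition into five radicals.
print(connected_code("ABCDE"))
```

For the hypothetical sequence, the fourth radical is dropped and the result contains the first three radicals followed by the last one.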
Table 7–10: Examples of Connected Character Decomposition
7.5.2.3 Composite Characters
Composite characters are those that can be split from top to bottom, left to right, and
outside to inside. You can decompose the character head into 1 to 2
radicals. If more than 2 radicals are generated, take the first and the last radicals. The
character body can be decomposed into 1 to 3 radicals. If it is made up of 3 or fewer
radicals, you should enter all the radicals. If it is made up of more than 3 radicals which
are connected, enter the first two radicals and the last. If the character body is itself a
composite character, you can further decompose the character body into subhead and
subbody. The radicals you should enter are the first and last radicals of the subhead, and
the last radical of the subbody.
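The head and body rules above can be combined into one sketch. The Python code below is a reading of those rules under two stated assumptions: radicals are represented by hypothetical key letters, and the subhead/subbody rule is applied only when the body has more than 3 radicals in total (a shorter body is entered in full). It is not the input server's own algorithm.

```python
def head_code(head):
    """Character head: 1 to 2 radicals; if more, the first and the last."""
    head = list(head)
    return head if len(head) <= 2 else [head[0], head[-1]]


def body_code(body=None, subhead=None, subbody=None):
    """Character body: at most 3 radicals.

    Pass body for a connected body, or subhead and subbody when the
    body is itself a composite character.
    """
    if subhead is not None and subbody is not None:
        subhead, subbody = list(subhead), list(subbody)
        if len(subhead) + len(subbody) <= 3:
            return subhead + subbody  # assumption: short bodies are entered in full
        ends = subhead if len(subhead) == 1 else [subhead[0], subhead[-1]]
        return ends + [subbody[-1]]
    body = list(body)
    return body if len(body) <= 3 else body[:2] + [body[-1]]


def composite_code(head, body=None, subhead=None, subbody=None):
    """Join the head and body radicals into a code of at most 5 keys."""
    return "".join(head_code(head) + body_code(body, subhead, subbody))


# Hypothetical radical sequences, for illustration only.
print(composite_code("ABC", body="DEFGH"))
print(composite_code("A", subhead="DEF", subbody="GH"))
```

The first call keeps the first and last radicals of the head and the first two and last radicals of the body. The second call keeps the first and last radicals of the subhead and the last radical of the subbody.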
Table 7–11 illustrates examples of decomposing composite characters. In the "Shape"
column, a solid square represents a character head while a square represents a radical of
the character body.
Table 7–11: Composite Character Decomposition
7.5.2.4 Exceptional Characters
Approximately 95% of Chinese characters can be decomposed according to the rules
described in Section 7.5.2.3. The remaining 5% are exceptional characters that need to be
entered in different ways. The exceptional characters can be divided into the following
groups:
•
Compound Characters
The Tsang–Chi input method has defined 9 compound characters. A compound
character can be a connected character, or the character head or body of a composite
character. In any case, compound characters must be represented by their first and
last radicals.
Table 7–12: Compound Characters
•
Difficult Characters
Difficult characters are those which are difficult to decompose in the Tsang–Chi
method. Usually, these characters are composed of some special root radicals which
are neither the Tsang–Chi root radicals nor their auxiliary forms. Using the Tsang–Chi
input method, you can press the [X] key (which is labeled with 難 and therefore
will be referred to as the [難] key in this document) to access the special root radicals and
use them to compose difficult characters.
The rules of decomposing difficult characters are:
— If it is easy to identify the first and the last radicals, you can enter the first radical,
the [難] key, and the last radical.
For example, a character of this kind can be decomposed into a code of the form HXH.
— If the first radical is easy to identify while the others are difficult, enter the first
radical and then press the [難] key for the rest.
For example, characters of this kind can be decomposed into the codes YX and HX.
— Never use the [難] key for the first radical.
Table 7–13: Difficult Characters
•
Special Characters
Some special characters are composed by superimposing certain root radicals
on other strokes or radicals. To keep the decomposed radicals as
simple as possible, take the root radicals first, before the rest of the character is
entered.
Table 7–14: Special Characters
7.5.3 Invoking Tsang–Chi Input Method
When you invoke the Tsang–Chi input method, the corresponding Chinese string is displayed
in the status area, as shown in Figure 7–7.
Figure 7–7: Invocation of the Tsang–Chi Input Method
The radicals that you enter with the Tsang–Chi input method are displayed in the pre-edit
area, as shown in Figure 7–8. To correct the data, press the Delete key and reenter the
correct radical. Alternatively, you can press the Tsang–Chi key (that is, F6 on a standard LK201
or LK401 keyboard) to erase all radicals in the pre-edit buffer. If fewer than the maximum
number of radicals are required to input a character, you can press the Return key or
Space bar to signal the end of input.
Figure 7–8: Entering a Tsang–Chi Radical
7.5.4 Multiple Candidates
If there is exactly one character represented by a Tsang–Chi code, the character is sent
directly to the application. Sometimes, multiple candidates for a Tsang–Chi code are
available for selection when the code represents more than one Chinese character. In this
case, the candidates are displayed in the pre-edit area in the following order:
•
Most frequently-used
•
Less frequently-used
•
Seldom used
The pre-edit area can display up to 9 candidates at a time.
Figure 7–9: Multiple Candidates
The numbers 1, 4, and 7 divide the 9 characters into 3 groups so that you can easily select
the desired candidate. To select a character that is displayed in the pre-edit area, press the
corresponding numeric key on the main keyboard.
When there are more than 9 candidates for selection, arrow indicators
are displayed in the pre-edit area. Table 7–15 lists the indicators and their
definitions.
Table 7–15: Meaning of Arrow Characters
Indicator
Definition
The current row is the first row and you can press [Space] or [⇒] to move to the
next row.
The current row is somewhere between the first and the last row. You can press:
•
[Space] or [⇒] — move to the next row
•
[⇐] — move to the previous row
•
[⇑] — move to the first row
The current row is the last row and you can press [⇐] to move to the previous
row or [⇑] to the first row.
If you enter another Tsang–Chi code without selecting a candidate, the first candidate in
the list is sent to the application.
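The candidate handling described in this section can be sketched as simple list paging. The Python code below is illustrative; the function names and the placeholder candidate strings are hypothetical, and the mapping of numeric key n to the nth candidate in the current row is an assumption based on the description above.

```python
ROW_SIZE = 9  # the pre-edit area displays up to 9 candidates at a time


def candidate_rows(candidates):
    """Split the candidate list into rows of up to 9 entries."""
    return [candidates[i:i + ROW_SIZE]
            for i in range(0, len(candidates), ROW_SIZE)]


def select(candidates, row, digit):
    """Return the candidate chosen with a numeric key on the given row."""
    current = candidate_rows(candidates)[row]
    if 1 <= digit <= len(current):
        return current[digit - 1]
    return None  # no candidate at that position


def default_candidate(candidates):
    """Candidate sent when another code is entered without a selection."""
    return candidates[0]


# Placeholder candidates standing in for Chinese characters.
cands = ["c%d" % n for n in range(20)]
print([len(row) for row in candidate_rows(cands)])
print(select(cands, row=1, digit=3))
```

Twenty candidates are split into rows of 9, 9, and 2, and pressing 3 on the second row selects the twelfth candidate in the list.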
If you do not want to select any candidate, but want to clear the Tsang–Chi code, press the
Return key or the Tsang–Chi key (that is, F6).
7.5.5 Repeat Character Input
If you want to repeat input of the same character, press the equals (=) key.
7.5.6 Error Handling
If you input incorrect data, the bell will ring. If no character is generated after you enter a
Tsang–Chi code, this indicates that there is no character for the code. The radicals entered
remain in the pre-edit buffer. To handle the error, you can do one of the following:
1.
Press the Delete key to erase the radicals, one at a time, and then reenter the correct
radicals.
2.
Press the Return key or the Tsang–Chi key (that is, F6) to erase all radicals in the
pre-edit buffer, and then reenter the correct radicals.
3.
Enter new radicals without pressing the Return key. The radicals in the pre-edit buffer
are replaced by the newly-entered radicals.
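The three recovery options can be modeled as operations on a small buffer object. The Python class below is a hypothetical model written for illustration; its names and the no_match flag are assumptions and do not describe the input server's internals.

```python
class PreEditBuffer:
    """Toy model of the pre-edit buffer after a code matches no character."""

    def __init__(self):
        self.radicals = []
        self.no_match = False  # set when the entered code has no character

    def report_no_match(self):
        """Record that the current code produced no character."""
        self.no_match = True

    def delete(self):
        """Option 1: the Delete key erases one radical at a time."""
        if self.radicals:
            self.radicals.pop()
        self.no_match = False

    def clear(self):
        """Option 2: the Return key erases all radicals."""
        self.radicals = []
        self.no_match = False

    def enter(self, radical):
        """Option 3: new radicals replace a code that matched nothing."""
        if self.no_match:
            self.radicals = []
            self.no_match = False
        self.radicals.append(radical)


buf = PreEditBuffer()
for key in "AB":          # hypothetical radical keys
    buf.enter(key)
buf.report_no_match()
buf.enter("C")            # replaces the rejected radicals
print(buf.radicals)
```

In this model, deleting a radical clears the no_match flag so that the next radical is appended to what remains, whereas typing directly after a failed code starts a fresh buffer.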
7.6 Quick Tsang–Chi Input Method
The Quick Tsang–Chi input method, also known as the Quick input method or the so-called
Easy input method, is a variant of the Tsang–Chi input method and follows the
same principles and rules for decomposing characters into radicals. However, the process
for entering radicals is simplified and requires only the first and the last radicals. For
example, if the Tsang–Chi input method decomposes a character into several radicals, the
Quick Tsang–Chi input method requires input of only the first and the last of them.
This section discusses the mechanism of the Quick Tsang–Chi input method. For details
about character decomposition, see Section 7.5, Tsang–Chi Input Method.
7.6.1 Quick Tsang–Chi Code Generation
As in the Tsang–Chi input method, the character decomposition in Quick Tsang–Chi is
based on whether a character is of the composite form or the connected form. However,
the Quick Tsang–Chi input method requires only the first and the last radicals regardless of
the number of radicals obtained.
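The first-and-last rule is easy to express in code. In this Python sketch, radicals are represented by the letters of their keys; the function name is hypothetical, and the input sequences are used only as sample key strings.

```python
def quick_code(radicals):
    """Reduce a Tsang-Chi radical sequence to its Quick Tsang-Chi code.

    Only the first and the last radicals are kept, regardless of how
    many radicals the decomposition produced. A single radical is
    returned unchanged.
    """
    radicals = list(radicals)
    if len(radicals) <= 1:
        return "".join(radicals)
    return radicals[0] + radicals[-1]


print(quick_code("ABCD"))
print(quick_code("HXH"))
```

Because many full codes share the same first and last radicals, a Quick Tsang–Chi code tends to match more characters than the corresponding Tsang–Chi code, which is why candidate selection is needed more often.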
7.6.2 Invoking Quick Tsang–Chi Input Method
When you invoke the Quick Tsang–Chi input method, the corresponding Chinese string is
displayed in the status area, as shown in Figure 7–10.
Figure 7–10: The Quick Tsang–Chi Input Method
7.6.3 Entering Quick Tsang–Chi Code
The radical that you enter is displayed in the pre-edit area, as shown in Figure 7–11. To
correct the data, press the Delete key and reenter the correct radical. Alternatively, you
can press the Quick Tsang–Chi key (that is, F7 on a standard LK201 or LK401 keyboard) to erase all
radicals in the pre-edit buffer. If only one radical is required to input a character, press the
Return key or Space bar to signal the end of input.
Figure 7–11: Entering a Quick Tsang–Chi Code
7.6.4 Multiple Candidates
If exactly one character is represented by a Quick Tsang–Chi code, the character is sent
directly to the application. Frequently, however, a Quick Tsang–Chi code represents more than one
Chinese character. In this case, the candidates are displayed in the pre-edit area for selection.
If you do not want to select any candidate, but want to clear the Quick Tsang–Chi code, press the
Return key or the F7 key.
7.6.5 Repeat Character Input
If you want to repeat the input of the same character, press the equals (=) key.
7.6.6 Error Handling
If you input incorrect data, the bell rings. If no character is generated after you enter a
Quick Tsang–Chi code, no character exists for that code. The radicals
entered remain in the pre-edit buffer. To handle the error, you can do one of the
following:
• Press the Delete key to erase the radicals, one at a time, and then reenter the correct radicals.
• Press the Return key or the F7 key to erase all radicals in the pre-edit buffer, and then reenter the correct radicals.
• Enter new radicals without pressing the Return key. The radicals in the pre-edit buffer are replaced by the newly entered radicals.
7.7 Phonetic Input Method
The Phonetic input method is based on Chinese phonetic symbols (bopomofo) that
represent the pronunciation of Chinese characters.
7.7.1 Phonetic Symbol Categories
Phonetic symbols can be divided into three categories: consonants, vowels, and tone marks.
There are 21 consonants, 16 vowels, and 5 tone marks. The 5 tone marks for Chinese
pronunciation are the first, the second, the third, the fourth, and the light tones. Chinese
phonetic symbols are assigned to the alphanumeric keys on the main keyboard. Table 7–16
summarizes all consonants, vowels, and tone marks.
Table 7–16: Phonetic Symbols
______________________________ Note ____________________________
The vowels ㄧ, ㄨ, and ㄩ are also called semi-vowels.
_______________________________________________________________
7.7.2 Phonetic Code Generation
The pronunciation of a Chinese character is composed of consonants, vowels, and tone
marks. Therefore, a phonetic code can be generated according to the following rules:
• The phonetic symbols are entered in the following order:
- Consonant
- Vowel
- Tone mark
• A phonetic representation must have at least one consonant or one vowel.
• The end of input must be indicated with a tone mark or a Return.
Table 7–17: Examples of Phonetic Input
Code Format
Consonant + vowel + tone mark
Consonant + [vowel] + tone mark
Vowel + tone mark
Consonant + semi-vowel + vowel + tone mark
Semi-vowel + vowel + tone mark
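The rules in Section 7.7.2 can be illustrated with a small validity check. The following Python sketch is an assumption-laden illustration, not the input server's logic: the function name is hypothetical, the symbol sets are the standard 21 bopomofo consonants and 16 vowels, and the terminator names are the termination keys listed in Table 7–18.

```python
# Standard bopomofo symbol sets (21 consonants, 16 vowels including
# the three semi-vowels).
CONSONANTS = set("ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙ")
SEMI_VOWELS = set("ㄧㄨㄩ")
FULL_VOWELS = set("ㄚㄛㄜㄝㄞㄟㄠㄡㄢㄣㄤㄥㄦ")

# Termination keys from Table 7-18: Space (first tone), 6, 3, 4, 7, Return.
TERMINATORS = {"Space", "6", "3", "4", "7", "Return"}


def is_valid_phonetic_code(symbols, terminator):
    """Check the order: [consonant] [semi-vowel] [vowel] + terminator."""
    if terminator not in TERMINATORS or not symbols:
        return False  # at least one consonant or vowel is required
    rest = list(symbols)
    if rest and rest[0] in CONSONANTS:
        rest.pop(0)
    if rest and rest[0] in SEMI_VOWELS:
        rest.pop(0)
    if rest and rest[0] in FULL_VOWELS:
        rest.pop(0)
    return not rest  # anything left over is out of order


print(is_valid_phonetic_code("ㄐㄧ", "6"))   # consonant + vowel + tone mark
print(is_valid_phonetic_code("ㄚㄅ", "6"))   # vowel before consonant
```

The check accepts each of the code formats listed in Table 7–17 and rejects sequences in which a vowel precedes a consonant.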
7.7.3 Invoking Phonetic Input Method
When you invoke the Phonetic input method, the Chinese string for this input method is displayed in
the status area, as shown in Figure 7–12.
The example in Figure 7–12 shows how to input a character by entering its
phonetic symbols (keys R, U, and 6) at the main keyboard.
Figure 7–12: Entering the Phonetic Symbols for a Character
7.7.4 Entering Phonetic Code
The phonetic symbols that you enter are displayed in the pre-edit area. To correct the
data, press the Delete key and reenter the correct symbol. Alternatively, you can press the
F10 key (on a standard LK201 or LK401 keyboard) to erase all phonetic
symbols in the pre-edit buffer.
You can press various termination keys to signal that you are done entering the phonetic
symbols. Table 7–18 shows the results of entering phonetic symbols with different
termination keys.
Table 7–18: Phonetic Symbols with Different Termination Keys
Tone    Key      Description
1st     Space    Lists characters with the same consonant or vowel in the order of the first, second, third, fourth, and light tones.
2nd     6        Lists characters of the second tone.
3rd     3        Lists characters of the third tone.
4th     4        Lists characters of the fourth tone.
Light   7        Lists characters of the light tone.
None    Return   Displays the characters corresponding to the phonetic symbols in the order of tone marks. If only a consonant is entered before pressing Return, characters corresponding to any valid combinations of this consonant and other vowels are also displayed.
7.7.5 Multiple Candidates
If a phonetic string matches exactly one character, the character is sent directly to the
application. Frequently, a phonetic string matches multiple Chinese characters. In this
case, the candidates are displayed in the pre-edit area.
If you do not want to select any candidate, but want to clear the phonetic code, press the
Return key or the F10 key.
7.7.6 Repeat Character Input
If you want to repeat the input of the same character, press the equals (=) key.
7.7.7 Error Handling
If you input incorrect data, the bell will ring. If no character is generated after you enter a
phonetic code, this indicates that there is no character for the code. The phonetic symbols
entered remain in the pre-edit buffer. To handle the error, you can do one of the
following:
• Press the Delete key to erase the phonetic symbols, one at a time, and then reenter the correct phonetic symbols.
• Press the Return key or the F10 key to erase all phonetic symbols in the pre-edit buffer, and then reenter the correct phonetic symbols.
• Enter new phonetic symbols without pressing the Return key. The phonetic symbols in the pre-edit buffer are replaced by the newly entered symbols.
7.8 Internal Code Input Method
Each character in DEC Hanyu has been assigned a unique internal code, just like the ID
number of a company employee. For a complete list of the characters and their internal
codes, see The DEC Chinese Code Book (Part Number EK-VT38D-CB-001).
______________________________ Note ____________________________
In this Compaq Tru64 UNIX release, the Internal Code input method supports only the
DEC Hanyu internal code. The internal codes for Taiwanese EUC and Big-5 are not
supported. Even if you set the locale to one of the Taiwanese EUC or Big-5 locales,
this input method still requires you to specify the DEC Hanyu internal code.
_______________________________________________________________
7.8.1 Input Procedure
When you invoke the Internal Code input method, the Chinese string for this input method is
displayed in the status area.
To enter an internal code, enter an optional character set number followed
by a four-digit hexadecimal number that specifies the position of the character within
the character set. The character set number can be 1 for the CNS 11643-1986
character set, or 2 for the DTSCS character set.
If you omit the character set number and enter only the four-digit hexadecimal code, you
must press the Return key or the Space bar to signal the end of input. If you enter the
character set number and the four-digit hexadecimal code, the respective character is sent
automatically without pressing the Return key. Figure 7–13 shows the input of a character
using the Internal Code input method.
Figure 7–13: Input of a Character Using the Internal Code Input Method
The internal code that you enter is displayed in the pre-edit area. To correct the data, press
the Delete key and reenter the correct code. Alternatively, you can press the F8 key
(on a standard LK201 or LK401 keyboard) to erase all characters in the pre-edit
buffer.
Because each symbol or Chinese character in DEC Hanyu has a unique internal code, an
internal code never has multiple candidates.
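The two accepted input forms can be sketched as a small parser. This Python example is illustrative only: the function name and the sample codes are hypothetical, and the sketch looks at a completed key string rather than reproducing how the input method tells the two forms apart while keys are being typed.

```python
import string

HEX_DIGITS = set(string.hexdigits)
CHARACTER_SETS = {"1": "CNS 11643-1986", "2": "DTSCS"}


def parse_internal_code(keys):
    """Split typed keys into (character set number, hex code, needs_terminator).

    - Five keys: a character set number (1 or 2) plus four hex digits.
      The character is sent automatically, so no terminator is needed.
    - Four keys: four hex digits only. Return or the Space bar is
      required to signal the end of input.
    """
    if (len(keys) == 5 and keys[0] in CHARACTER_SETS
            and all(k in HEX_DIGITS for k in keys[1:])):
        return keys[0], keys[1:].upper(), False
    if len(keys) == 4 and all(k in HEX_DIGITS for k in keys):
        return None, keys.upper(), True
    raise ValueError("expected [1|2] followed by four hexadecimal digits")


# Hypothetical codes, used only to show the two forms.
print(parse_internal_code("1c4a1"))
print(parse_internal_code("c4a1"))
```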
7.8.2 Repeat Character Input
If you want to repeat the input of the same character, press the equals (=) key.
7.8.3 Error Handling
If you input incorrect data, the bell will ring. If no character is generated after you enter
an internal code, this indicates that there is no character for the code. The internal code
entered remains in the pre-edit buffer. To handle the error, you can do one of the
following:
• Press the Delete key to erase the characters representing the internal code, one at a time, and then reenter the correct internal code.
• Press the Return key or the F8 key to erase all characters in the pre-edit buffer, and then reenter the correct internal code.
• Enter a new internal code without pressing the Return key. The characters in the pre-edit buffer are replaced by the newly entered internal code.
7.9 Phrase Input Method
The Phrase input method is a mechanism designed to facilitate the input of frequently used
phrases. You can define your own frequently used phrases by preparing your own phrase
database. Each phrase is identified by a phrase code. To input a phrase, you enter its
phrase code and then convert it to the respective phrase.
The Compaq Tru64 UNIX operating system provides a Phrase Utility that you use to
create phrase databases. For details, see Writing Software for the International Market.
In addition, the firmware of the VT382-D traditional Chinese terminal is designed to allow
you to download phrases into its built-in memory. For DECwindows Motif applications,
you do not need to download phrase definitions because the Chinese input servers can
directly access the phrase databases. You can also select the phrase database to be
used.
7.9.1 Input Procedure
When you invoke the Phrase input method, the Chinese string for this input method is displayed in the
status area of dxhanyuim, and the corresponding string is displayed in the status area of dxhanziim. Figure 7–14
shows an example of entering a phrase in dxhanyuim by specifying its phrase code.
Figure 7–15 shows an example of converting the phrase code to the phrase.
Figure 7–14: Entering a Phrase Code
Figure 7–15: Converting a Phrase Code to a Phrase
The phrase code that you enter is displayed in the pre-edit area, as shown in Figure 7–14.
To correct the data, press the Delete key and reenter the correct code. Alternatively, you
can press the F9 key (on a standard LK201 or LK401 keyboard) to erase all
characters in the pre-edit buffer.
The phrase code can consist of at most 8 characters. If it has fewer than 8 characters, press
the Return key or Space bar to signal the end of input. If it has exactly 8 characters, the
respective phrase is sent automatically without having to press the Return key after you
enter the last character.
The Phrase input method is different from other input methods in the sense that once a
phrase is entered, the input mode switches back to the original input mode from which the
Phrase input method was invoked. To input another phrase, invoke the Phrase input method again and
then enter another phrase code.
7.9.2 Error Handling
If the data that you input is incorrect, the bell will ring.
7.9.2.1 Condition 1
If the Chinese message meaning "Requested phrase does not exist" is displayed when
you enter a phrase code, check the following:
• Be sure that the phrase is in the phrase definition file.
• If you are using a VT382-D terminal:
- Be sure that the phrase definition file containing the phrase definition has been downloaded to the VT382-D Chinese terminal.
- Check whether the terminal power supply has been interrupted since you last loaded the phrase definition file.
- Check whether you have reset the VT382-D terminal. The Recall and Default operations provided by the VT382-D setup menu clear the phrase definitions that have been downloaded.
• If you are using a Chinese input server:
- Be sure that the phrase definition file containing the phrase definition has been selected by the Chinese input server.
- Check whether the LANG environment variable was set correctly before the Chinese input server was started.
If the phrase has been defined but it does not exist in the in-memory phrase database,
reload or reselect the phrase definition file.
7.9.2.2 Condition 2
Errors can occur in the phrase definition file. If you cannot solve the problem using the
procedures in Section 7.9.2.1, check the following:
• Check the syntax of the definition statements in the phrase definition file. A phrase cannot be downloaded if the syntax of its definition is incorrect.
• If you are using a VT382-D terminal, a maximum of 100 phrases can be downloaded. If the number of phrases in the phrase definition file exceeds this limit, the definitions beyond the limit are not downloaded.
7.10 Symbol Input in Dxhanyuim
The Symbol input method is a more intuitive way to input two-byte symbols, such as
punctuation marks, table and mathematical symbols, foreign characters, phonetic symbols,
and traditional Chinese control characters.
You can invoke the Symbol input method only from the Tsang–Chi or the Quick Tsang–Chi
input method. You can use it to input more than 600 symbols, including all full-form
letters (A–Z, a–z), full-form numerals (0–9), full-form symbols, phonetic elements,
Chinese radicals, and special symbols.
7.10.1 Invoking Symbol Input Mode
To enter the Symbol input mode, press the Z key while the Tsang–Chi or the Quick
Tsang–Chi input method is active. When you invoke the Symbol input mode, the Chinese string
for this mode is displayed in the status area.
7.10.2 Rules of Symbol Input
Use the following rules when entering symbols:
• If a symbol exists on the keyboard, input it the same way you enter single-byte characters.
Input Code    Key Sequence
!             <Shift/1>
#             <Shift/3>
$             <Shift/4>
A             [A]
A             [a]
• Symbols of similar form or meaning have the same key sequence.
Input Code    Description
(             vertical parenthesis
)             vertical parenthesis
(             horizontal parenthesis
)             horizontal parenthesis
• If a symbol does not exist on the keyboard, decompose it according to its shape.
Key Sequence
[=][/]
[O][+]
[S][S]
[K][G]
• For graphical symbols, press the hyphen (-) key and a number within the range 1-8 to specify horizontal symbols or press the vertical bar key ( | ) and a number within the range 1-7 to specify vertical symbols.
Key Sequence (horizontal)    Key Sequence (vertical)
[-][1]                       [|][1]
[-][2]                       [|][2]
[-][3]                       [|][3]
[-][8]                       [|][7]
• To enter symbols for constructing a table or diagram, press the T key and a letter key specifying the direction.
Key Sequence
[T][Z]
[T][X]
[T][C]
[T][A]
[T][S]
The direction keys are arranged as follows:
Q W E
A S D
Z X C
• To enter arrows of various directions, press the A key and a letter key specifying the direction.
Key Sequence
[A][Z]
[A][X]
[A][C]
[A][A]
The direction keys are arranged as follows:
Q W E
A   D
Z X C
• Press the P key to indicate the input of phonetic symbols. To specify the required symbol, press the key with the respective phonetic symbol marked on the keyboard. See Chapter 6 for the keymap of phonetic symbols.
Key Sequence
[P][1]
[P][Q]
[P][A]
[P][Z]
[P][2]
• Press the G key (or g for lowercase) to input Greek letters and then enter the first two letters of the Greek character's name. Use uppercase or lowercase letters as appropriate.
Symbol               Key Sequence
Uppercase alpha      [G][A][L]
Uppercase beta       [G][B][E]
Lowercase alpha      [g][a][l]
Lowercase beta       [g][b][e]
Uppercase epsilon    [G][E][P]
7.10.3 Entering Symbol Code
You can input a symbol by entering its symbol code and then pressing the Return key or
Space bar to signal the end of input.
The following example describes how to enter the full-form letter A:
1. Press the Z key to invoke the Symbol input mode.
2. Press the A key to enter the first letter of the symbol code.
7.10.4 Multiple Candidates
If a symbol code matches exactly one character, the character is sent directly to the
application. However, sometimes a symbol code matches multiple candidates. In this
case, the candidates are displayed in the pre-edit area. To continue the preceding example:
1. Press the Return key to signal the end of input.
2. Press the 1 key to select the first candidate.
______________________________ Note ___________________________
After the symbol is input, the input mode switches back to the original input mode.
______________________________________________________________
7.10.5 Error Handling
If no symbol is input after you enter a symbol code, the symbol code is invalid and the
input mode switches back to the original input mode. You can press the Z key to invoke
the Symbol input mode again and reenter the correct symbol code.
7.11 Input of User–Defined Characters in DECwindows Motif
The Compaq Tru64 UNIX software supports the input of user-defined characters (UDC)
through the Tsang–Chi and Quick Tsang–Chi input methods. When you invoke the
Chinese input server, the following Tsang–Chi dictionaries are loaded:
• The private UDC Tsang–Chi dictionary that you create
• The system-wide UDC Tsang–Chi dictionary
• The system default Tsang–Chi dictionary
The Tsang–Chi and Quick Tsang–Chi input methods display the candidates that match the
key sequence you enter in the order in which the dictionaries are listed.
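The lookup order can be pictured as follows. This Python sketch only illustrates the ordering rule; it assumes each dictionary is a simple mapping from key sequence to a list of characters, which is not the actual on-disk dictionary format, and all names and sample entries are hypothetical.

```python
def collect_candidates(key_sequence, private_udc, system_udc, system_default):
    """Return candidates in display order: private UDC dictionary first,
    then the system-wide UDC dictionary, then the system default dictionary."""
    candidates = []
    for dictionary in (private_udc, system_udc, system_default):
        candidates.extend(dictionary.get(key_sequence, []))
    return candidates


# Hypothetical dictionaries with placeholder character names.
private_udc = {"AB": ["my-udc-char"]}
system_udc = {"AB": ["site-udc-char"]}
system_default = {"AB": ["standard-char-1", "standard-char-2"]}

print(collect_candidates("AB", private_udc, system_udc, system_default))
```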
You can use the cedit utility to define the input sequences of user-defined characters and
then use the cgen command with the -iks option to produce a UDC Tsang–Chi
dictionary for the Chinese input server. You should also define the corresponding font
glyphs (both 18X16 and 24X24 size fonts). After you define the user-defined characters
and their corresponding information in the system or user database, restart the input server
to reread the database. For details, see Writing Software for the International Market or
the cgen reference pages.
By default, the Chinese input server puts your private UDC Tsang–Chi dictionary in the
~/.iks/dwimdb-dechanyu directory. You can use the USER_UCD_DICT
environment variable to override this default.
The system-wide UDC Tsang–Chi dictionary is, by default, placed in the
/var/i18n/iks/dwimdb-dechanyu directory. If the dictionary is located elsewhere
on your system, use the SYSTEM_UDC_DICT environment variable to override the default
location. If you are a system administrator, you can install a UDC Tsang–Chi dictionary
by placing it at the default location and setting the proper access mode for system-wide use.
The system default Tsang–Chi dictionary, which is shipped with the Compaq Tru64 UNIX
system, is placed in the /usr/i18n/hanyu/hanyu_tsangchi.dic directory. If it
is installed at a different location, you can use the HANYU_SYSTEM_DICT environment
variable to specify the location.
______________________________ Note ____________________________
In this Compaq Tru64 UNIX release, the Chinese input server supports UDC input key
sequences only in the DEC Hanyu codeset.
To display UDC in DECwindows Motif applications or the Chinese input server, UDC
fonts should be available on your system. For instance, the Chinese input server will
display UDC using the fonts -adecw-screen-*--18-*-cns11643.1986-udc or
-adecw-screen-*--24-*-cns11643.1986-udc.
If you define your own UDC fonts, you can override the system UDC fonts by adding
the directory in which the UDC fonts are located to the font path. For example, if
your private UDC fonts are located in the ~/fonts directory, you can enter the
following command to update the font path:
% xset +fp ~/fonts
Use the cedit utility to define UDC font glyphs (both 18X16 and 24X24 size fonts)
and the cgen utility to generate fonts for use in DECwindows Motif. For details, see
Writing Software for the International Market.
______________________________________________________________
7.12 5–Stroke Input Method
The 5-Stroke input method uses basic strokes to construct Chinese
characters. A stroke is a continuous line or curve segment that forms part of a Chinese
character. Table 7–19 describes the five categories of strokes.
Table 7–19: Stroke Categories
Category              Description
Horizontal strokes    Horizontal lines and left-to-right ticks
Vertical strokes      Vertical lines and right-to-left hooks
Slash                 Slanting lines and curves drawn towards lower-left
Backslash             Dots, slanting lines, and curves drawn towards lower-right
Zigzag curves         Different types of joints or corners that can be drawn in single continuous strokes
Using the 5-Stroke input method, you can input single Chinese characters and Chinese
terms. Approximately 5,000 terms can be input using this method.
7.12.1 Input Mechanism
To input a Chinese character using the 5-Stroke input method, enter its 5-stroke code
through the numeric keypad according to the writing order. Table 7–20 shows the codes
representing the five categories of strokes.
Table 7–20: 5-Stroke Code
Stroke        Code    Key
Horizontal    1       [KP1]
Vertical      2       [KP2]
Slash         3       [KP3]
Backslash     4       [KP4]
Zigzag        5       [KP5]
Figure 6–5 shows the numeric keypad layout for entering 5-Stroke codes.
The following are the general rules of writing order for Chinese characters:
• From top to bottom
• From left to right
• From outside to inside
• Inside radical before the last stroke of the outside radical
7.12.1.1 Single Character Input
If a Chinese character is composed of exactly five strokes, enter the strokes according to
the writing order. If it is composed of fewer than five strokes, press the KP0 key to signal
the end of input. If it is composed of more than five strokes, enter the first four strokes
and the last stroke. Table 7–21 shows some examples of using the 5-Stroke input method.
Table 7–21: Input of Single Characters with the 5-Stroke Input Method
No. of Strokes    5-Stroke Code    Key Sequence
5                 35112            [KP3][KP5][KP1][KP1][KP2]
4                 12510            [KP1][KP2][KP5][KP1][KP0]
9                 43254            [KP4][KP3][KP2][KP5][KP4]
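The single-character rule can be expressed as a short function. The following Python sketch is illustrative only: the function names are hypothetical, and the middle strokes of the nine-stroke example are made up, because only its first four strokes and last stroke affect the code.

```python
def five_stroke_code(strokes):
    """Build the 5-stroke code from stroke categories (1-5) in writing order.

    - Fewer than five strokes: all strokes, then 0 (the KP0 key) to end input.
    - Exactly five strokes: all five strokes.
    - More than five strokes: the first four strokes and the last stroke.
    """
    strokes = list(strokes)
    if not strokes or any(s not in (1, 2, 3, 4, 5) for s in strokes):
        raise ValueError("strokes must be a non-empty sequence of 1-5")
    if len(strokes) < 5:
        digits = strokes + [0]
    elif len(strokes) == 5:
        digits = strokes
    else:
        digits = strokes[:4] + strokes[-1:]
    return "".join(str(d) for d in digits)


def key_sequence(code):
    """Translate a 5-stroke code into numeric keypad keys."""
    return "".join(f"[KP{digit}]" for digit in code)


print(five_stroke_code([1, 2, 5, 1]), key_sequence("12510"))
```

The four-stroke case reproduces the second row of Table 7–21: the strokes 1, 2, 5, 1 followed by KP0.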
If you are uncertain about the type of strokes or the writing order of strokes, press the
wildcard key (KP6) in place of the strokes. For details, see Section 7.12.5, Wildcard Key.
7.12.1.2 Input of Terms
The 5-Stroke input method can be used to input terms. A maximum of eight strokes is
required for entering terms. Press the [KP7] key before inputting strokes. The [KP7] key
signals the system that subsequent key strokes are for composing a term instead of a
character.
The number of strokes input for each character in a term depends on the number of
characters composing the term as shown in Table 7–22.
8
Chinese Printing Support
This chapter introduces the Chinese printing support provided by the Compaq Tru64 UNIX
operating system. It describes the supported printers, print file formats, features, and
maintenance procedures for supporting Chinese printing.
8.1 Supported Printers
The Compaq Tru64 UNIX software supports text and PostScript printers.
8.1.1 Text Printers
The Compaq Tru64 UNIX software supports text printers with built-in Chinese fonts.
8.1.2 PostScript Printers
The Compaq Tru64 UNIX software supports Chinese printing on PostScript printers in the
following ways:
• Based on the built-in or downloaded fonts installed in printers
• Based on the font-faulting mechanism, which is explained in Section 8.3, Printing Features
• Based on the font-embedding mechanism
For details about the supported printer types and print filters, see Chapter 4, Local
Language Devices.
8.2 Print File Formats
The Compaq Tru64 UNIX software supports printing mixed ASCII and Chinese characters
in the following print file formats:
• Plain text files on text printers and PostScript printers
• Files with the nroff control sequences (for printing with underline, superscript, subscript, and bold attributes) on text printers and PostScript printers
• PostScript files on PostScript printers
The print filters for PostScript printers can automatically detect the format of a print file
and convert it to the proper format for printing.
8.3 Printing Features
The Compaq Tru64 UNIX software supports the following printing features:
• Font embedding
• Font faulting
• Software on-demand font loading (SoftODL)
• Codeset conversion
• Outline fonts
8.3.1 Font Embedding
Font embedding refers to a mechanism where all the necessary font data is embedded
within the printer input file. It is not necessary for printers to have resident fonts. This
mechanism is supported only on PostScript printers that support PostScript level 2 or level
1 with multibyte font extension. Outline fonts are used for the embedding, if available.
Otherwise, lower quality bitmap fonts are used.
The font-embedding mechanism is useful since few printers have Chinese resident fonts.
However, there must be enough memory inside the printer to hold the font data or the print
job might fail. The amount of memory required depends on the nature of the print job. At
least 4 MB of printer memory is recommended for the embedding
mechanism.
Because of the large amount of data transferred to the printer, it is also necessary to use a
high speed link between the host computer and the printer. Using the serial port for
printing is too slow for the font-embedding mechanism.
See Writing Software for the International Market and the wwpsof(8) reference page for
more information about how to use the font embedding mechanism.
8.3.2 Font Faulting
Font faulting is a mechanism used to handle the large memory required by fonts for some
codesets, particularly multibyte codesets for Asian languages. Using font faulting, font
information is stored on either:
• The secondary storage of a supporting host machine, called a font-faulting server, or
• An internal font disk
The font information is loaded into the printer on demand, thus conserving printer
memory.
Font faulting is often essential for multibyte ideographic fonts because the memory
required to store a single font can exceed the memory capacity of many printers.
Specialized local language printers, such as Japanese printers, do not require font faulting
because the local language fonts reside on the printer. However, other printers require a
mechanism to load these fonts as needed for different parts of the same print job.
The font-faulting mechanism is very useful for a desktop printing environment, in which a
large number of different single-byte fonts may be required. In this case, simultaneously
storing all the fonts in memory reduces the available memory, and therefore speed, of the
printer. Also, it is possible that the number of required fonts is so large that they cannot all
be in memory at the same time.
Font faulting for multibyte fonts is done on a per character (or per glyph) basis because
these fonts support extremely large numbers of characters. Font faulting for single-byte
fonts is done on a per font basis. Single-byte fonts are small and relatively simple, so
loading the whole font is more efficient.
The font-faulting mechanism can be used with the following printers:
• DEClaser 1152
• DEClaser 5100
• PrintServer 17
See Section 8.5, Chinese Printing Setup, for information about configuring these printers.
8.3.3 Software On–Demand Font Loading
Software On-Demand Font Loading (SoftODL) is a mechanism through which a terminal
or a bitmap printer downloads the relevant bitmap font information for a user-defined
character (UDC) at the time the character needs to be displayed or printed. The Chinese
bitmap printers that support this feature include:
• CP382D controller (for traditional Chinese)
• LA88-C (for Simplified Chinese)
• LA380-CB (for Simplified Chinese)
8.3.4 Codeset Conversion
The Compaq Tru64 UNIX software includes a codeset conversion mechanism used to print
text files that have a codeset different from the one used by the printer. For printers with
built-in or downloaded Chinese fonts, the codeset of the printer should be defined to match
the codeset of the built-in fonts. For printers using the font-faulting mechanism, the
codeset of the printer should be defined to match the codeset of the font to be loaded. For
printers using the font-embedding mechanism, the codeset of the printer should not be
defined.
8.3.5 Outline Fonts
The Compaq Tru64 UNIX software provides a large set of outline fonts for printing files in
various languages. Depending on how many local language support subsets are installed
on your system, more than 150 outline fonts may be available.
There are four sets of Chinese outline fonts, two for traditional Chinese and two for
Simplified Chinese. These fonts are:
• Sung-Light-CNS11643
• Hei-Light-CNS11643
• Hei-GB2312-80
• XiSong-GB2312-80
Those fonts with the CNS11643 extension are traditional Chinese fonts encoded in the
DEC Hanyu codeset, with glyphs for plane 1 and plane 2 characters. Those fonts with the
GB2312-80 extension are Simplified Chinese fonts encoded in DEC Hanzi.
8.4 Commands and Daemons
Before you can utilize the printing features supported by the Compaq Tru64 UNIX
software, there are some commands and daemons that you should understand. This section
discusses these commands and daemons in detail and the following section illustrates how
they are used for configuration.
8.4.1 Country-Specific Options to the lpr Command
In addition to the usual options to the lpr command, the -A option is used to pass
country-specific parameters. You can use ya in the /etc/printcap file to set default
parameters for the -A option. For example, you can specify the parameters using the -A option
to the lpr command as:
% lpr -A "flocale=zh_TW.big5 font=Sung-Light-CNS11643 plocale=zh_TW.dechanyu" <file>
You can define the same set of parameters in the /etc/printcap file as:
:ya="flocale=zh_TW.big5 font=Sung-Light-CNS11643 plocale=zh_TW.dechanyu":\
The parameters supplied with the -A option to the lpr command override the
corresponding default values in the /etc/printcap file.
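The override rule can be illustrated with a small model of how the two parameter strings combine. This Python sketch is not how lpr or the print filters are implemented; the function names are hypothetical, and it simply treats each string as a space-separated list of name=value pairs.

```python
def parse_params(text):
    """Turn 'name=value name=value ...' into a dictionary."""
    return dict(item.split("=", 1) for item in text.split())


def effective_params(printcap_ya, lpr_option_a):
    """Start from the ya defaults in /etc/printcap, then let the values
    given with lpr -A override the corresponding defaults."""
    merged = parse_params(printcap_ya)
    merged.update(parse_params(lpr_option_a))
    return merged


defaults = "flocale=zh_TW.big5 font=Sung-Light-CNS11643 plocale=zh_TW.dechanyu"
# The file to be printed is already in DEC Hanyu, so only flocale is overridden.
print(effective_params(defaults, "flocale=zh_TW.dechanyu"))
```

Parameters that are not repeated on the command line, such as font and plocale here, keep their printcap defaults.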
The following parameters are applicable to Chinese printing:
•
flocale=<any valid locale>
Specifies the locale for the source text file. The printer filters use this locale to
validate the characters inside the source text file. If this value is not set properly, the
text is interpreted using the current locale. In Chinese printing, this value is
particularly important in order for the lpr command to correctly interpret the
characters. Moreover, if the plocale option is also set, the lpr command
performs codeset conversion for the source text file.
•
plocale=<any valid locale>
Specifies the locale for the printer. If the printer has built-in fonts, the plocale
value should match the codeset of the built-in fonts. If the printer employs the font
faulting mechanism, the plocale value should match the font used to print the text
file.
•
font=<supported outline font>
Specifies the font name for printing the source text files on a PostScript printer. This
parameter is used for printing text files only, as PostScript files are already tagged
with the required font name.
•
odldb=<odl database path>
Specifies the path of the SoftODL database files. This parameter is used to override
the system default SoftODL database path, hence allowing users to access their own
SoftODL database.
•
odlstyle=<odl style and size>
Specifies which SoftODL font style and size to use. The value is of the form <style>-<NxN> (for example, normal-24x24). If not specified, the system default SoftODL style
and size are used.
•
line=<number of lines>
Specifies the number of lines per page. You can use this parameter with the -w
option to control the font size and orientation of the output.
8.4.2 PostScript Font Management Utility (pfsetup)
A PostScript font management utility, pfsetup, is provided to help you set up print
queues to use the font-faulting mechanism. This utility has the following syntax:
pfsetup [-s] [-d] [queue_name...]
The following options can be used:
-s
setup mode — Allows you to set up lists of fonts to be downloaded
-d
download mode — Downloads fonts to printers according to the lists prepared
with the -s option
If you do not specify an option, pfsetup displays the information about the print queues
that have been set up with this utility. If you do not specify a particular queue name, the
pfsetup command processes every applicable queue.
In setup mode, the pfsetup command displays all PostScript printer fonts available on
the system and prompts you to select the fonts or font headers to be downloaded onto
individual print queues:
% /usr/sbin/pfsetup -s
========================================================
Printer queue: lp1 | 1 | dl1152w
No font has been setup for downloading in queue lp1
These are fonts available in your system for downloading. Fonts
chosen for downloading are marked with *
 1 Hei-Light-CNS11643         2 Sung-Light-CNS11643
 3 Hei-GB2312-80              4 XiSong-GB2312-80
 5 XiSong-GB2312-80           6 [email protected]
 7 [email protected]           8 AngsanaUPC-Italic
 9 AngsanaUPC-Light          10 CordiaUPC-Bold
11 CordiaUPC-BoldItalic      12 CordiaUPC-Italic
13 CordiaUPC-Light           14 EucrosiaUPC-Bold
15 EucrosiaUPC-BoldItalic    16 EucrosiaUPC-Italic
17 EucrosiaUPC-Light         18 FreesiaUPC-Bold
19 FreesiaUPC-BoldItalic     20 FreesiaUPC-Italic
21 FreesiaUPC-Light          22 IrisUPC-Bold
23 IrisUPC-BoldItalic        24 IrisUPC-Italic
25 IrisUPC-Light             26 JasmineUPC-Bold
[C]ontinue | [S]etup | [L]ist fonts | [Q]uit | [N]ext queue <C>
The action keys provided by pfsetup have the following meaning:
Key             Action
[C]ontinue      Displays a further font listing
[S]etup         Proceeds to setup
[L]ist fonts    Lists the fonts again
[Q]uit          Quits pfsetup
[N]ext queue    Proceeds to the next queue
If you choose the s option, another prompt is displayed:
[A]dd fonts | [R]emove fonts | [L]ist fonts | [Q]uit |
[N]ext queue <N>
In response to the prompt, you can enter a to add fonts to or r to delete fonts from the list
for a print queue. The fonts you select are highlighted with an asterisk (*).
______________________________ Note ____________________________
The fonts on this list vary according to the language variants that have been installed
on your system.
_______________________________________________________________
To download the fonts or font headers selected in setup mode, use the pfsetup command
with the -d option.
For fonts of ideographic character sets, the pfsetup utility downloads only their font
headers. Data defining their font glyphs is downloaded only on an as-needed basis through
the font faulting mechanism. This saves printer memory. For single-byte fonts, the utility
downloads the entire font onto printers for efficiency.
8.4.3 Font-Faulting Daemon (ffd)
To handle font-faulting requests from a PostScript printer that uses the two-channel approach
(such as the DEClaser 1152; see Section 8.5.2), the font-faulting daemon, ffd, must be running on
your system. When the daemon receives a font data request, it extracts the required font
glyph data from the specified font and sends it to the printer through the secondary
channel.
If you configure a print queue that uses the two-channel approach, or modify the secondary
channel of a print queue, you must restart the font-faulting daemon.
To restart the font-faulting daemon, log in as superuser and stop the existing font-faulting daemon with the following command:
% /sbin/init.d/ffserver stop
To start the font-faulting daemon, enter:
% /sbin/init.d/ffserver start
8.4.4 PrintServer Printing Command wwlpspr
To make full use of PrintServer 17 features such as two-sided printing and multiple pages
per side, you must use the lpspr command provided by PrintServer
Software Version 5.0 or later for Compaq Tru64 UNIX. This command, however, does
not provide features such as locale and font selection for printing text files. To support printing
text files on the PrintServer 17, the Compaq Tru64 UNIX software provides a unified wwlpspr
command.
The wwlpspr command is a front-end program that parses the parameters passed by
users and calls other commands, such as a print filter, lpr, or lpspr, with the
appropriate parameters. This gives users a single interface.
For details on the wwlpspr command, refer to the wwlpspr(1) reference page.
8.5 Chinese Printing Setup
This section describes how to set up the following printers to print Chinese characters:
• CP382-D, LA88-C, and LA380-CB Dot Matrix Printers
• DEClaser 1152
• DEClaser 5100
• PrintServer 17
• Generic PostScript printers
8.5.1 Dot Matrix Printers
The only consideration in configuring the CP382-D, LA88-C, and LA380-CB printers is
whether ODL printing should be enabled or not, and, if enabled, the default ODL database
path and style.
To configure any of the Chinese dot matrix printers:
1.
Add a printer using the lprsetup command and specify the cp382d, la88c, or
la380cb printer.
The following prompt is displayed:
Do you want to enable ODL? [n]
2.
Answer y if you want ODL printing, and n if you do not.
If you answer y, the default ODL database and style will be used and the following
prompt is displayed:
Enter symbol name:
3.
To change the default, enter ya at the prompt. The following prompt is displayed:
Enter a new value for symbol ’ya’?
["plocale=zh_TW.dechanyu"]
4.
Enter the following string at the prompt, using the pathname and style of your ODL:
"plocale=zh_TW.dechanyu odldb=<path to the default ODL db>
odlstyle=<default ODL style>"
Remember the double quotation marks around the parameters, and replace the
plocale with the value found in the previous prompt.
You can now use the lpr command to send Chinese text files to print queues connected to
these dot matrix printers. You can either set the LANG environment variable or use the
-A option to denote the codeset of the text files. For example, the following command
prints the file encoded in the Taiwanese EUC codeset:
% lpr -A "flocale=zh_TW.eucTW" my.file1
You can override the default ODL path by including the odldb and odlstyle options
to the -A option of the lpr command. For example, the following command uses the
ODL database in the /usr/priv/odl directory:
% lpr -A "odldb=/usr/priv odlstyle=normal-24x24" my.file2
8.5.2 DEClaser 1152
The DEClaser 1152 printer can be used to print Chinese characters by using the font
faulting mechanism with two communications channels; one channel is for normal data
and the second channel is for font-faulting data, as shown in Figure 8–1.
Figure 8–1: Two-Channel Communication of the Font-Faulting
Mechanism
Font faulting requires 4 MB of printer memory. If your DEClaser 1152 printer has only
2 MB of memory, you must install the LN07X-UF memory board to provide the additional
2 MB of memory. Refer to the printer manual for information about installing the LN07X-UF memory board.
You also must establish one and only one system to be the font-faulting server for the
printer. This server sends font information to the printer through a secondary
communication interface, or channel. The printer’s secondary channel connection to the
font-faulting server can be through either a local port or a Local Area Transport (LAT)
port. If the connection is through a LAT port, make sure that no other applications or
hosts are using that port.
An 8-pin DIN to 6-position MMJ adapter is needed to convert the AppleTalk interface on the
printer into the secondary channel that the font-faulting mechanism uses. The baud rate
of the secondary interface should match the value of the $BAUD variable in the
/sbin/init.d/ffserver file. By default, this value is 9600 baud.
To configure the DEClaser 1152 to print Chinese files:
1.
Add a printer using the lprsetup command and select the dl1152w printer type.
The following prompt is displayed:
Do you want to configure this machine as font faulting
server for the DEClaser 1152 printer? One and only one font
faulting server should be configured for every DEClaser
1152 employing font faulting mechanism. [n]
2.
If this machine is the font-faulting server for the printer, answer y to the prompt, and
then enter the full pathname of the port used to connect the secondary channel for the
ya capability.
The following prompt is displayed:
Do you want to set up the printer codeset or default font
for printing non-Ascii text? If your answer is ’y’, please
consult printcap.4 for the parameters. *** Remember to
enclose the parameters with a pair of double quotes. [n]
3.
Answer y to the prompt, and then define plocale and font in one of the following
ways:
"plocale=<Chinese locale> font=<font>"
- Define plocale as zh_TW.dechanyu if you are printing traditional Chinese
characters, and define font as either Sung-Light-CNS11643 or Hei-Light-CNS11643;
- Define plocale as zh_CN.dechanzi if you are printing Simplified Chinese
characters, and define font as either XiSong-GB2312-80 or Hei-GB2312-80.
4.
Use the pfsetup command to download the fonts. For greatest efficiency,
download all the Chinese fonts unless there are some you will never use, plus the most
frequently used single-byte fonts. The printer can access the fonts that are not
manually downloaded, but there is overhead for dynamically downloading fonts.
5.
If you configured this machine as the font-faulting host, enter the following
commands to notify the font-faulting daemon about the new printer:
% /sbin/init.d/ffserver stop
% /sbin/init.d/ffserver start
You need to enter these commands only after you add or modify the printer queue.
The next time you reboot the server, the font-faulting daemon will locate the printer.
6.
Each time the font-faulting server or the DEClaser 1152 is restarted, you should use
the pfsetup command to download the fonts again.
For details about the font-faulting daemon and the pfsetup utility, see Section 8.4,
Commands and Daemons.
To send Chinese text files to print queues connected to a DEClaser 1152, use the lpr
command. You can either set the LANG environment variable or use the -A option to
denote the codeset of the text files. For example, the following command prints the file
encoded in the Taiwanese EUC codeset:
% lpr -A "flocale=zh_TW.eucTW" my.file1
You can override the default plocale and font setting in the /etc/printcap file
using the -A option of the lpr command, but be sure the font you use matches the codeset
of plocale. For example, the following command gets the file encoded in Taiwanese
EUC, converts it to DEC Hanzi, and prints it using the XiSong-GB2312-80 font:
% lpr -A "flocale=zh_TW.eucTW plocale=zh_CN.dechanzi font=XiSong-GB2312-80" my.file2
8.5.3 DEClaser 5100
The DEClaser 5100 printer can be used to print Chinese characters by using the font-faulting mechanism with a built-in hard disk. The LN90X-HD model that supports the font-faulting mechanism also includes the 128 MB hard disk option. The printer also must
have at least 6 MB of memory.
To configure the DEClaser 5100 to print Chinese files:
1.
Add a printer using the lprsetup command and select the dl5100w printer type.
The following prompt is displayed:
Do you want to set up the printer codeset or default font
for printing non-Ascii text? If your answer is ’y’, please
consult printcap.4 for the parameters. *** Remember to
enclose the parameters with a pair of double quotes. [n]
2.
Answer y to the prompt and define the plocale and font in one of the following
ways:
"plocale=<Chinese locale> font=<font>"
- Define plocale as zh_TW.dechanyu if you are printing traditional Chinese
characters, and define font as either Sung-Light-CNS11643 or Hei-Light-CNS11643;
- Define plocale as zh_CN.dechanzi if you are printing Simplified Chinese
characters, and define font as either XiSong-GB2312-80 or Hei-GB2312-80.
3.
Use the pfsetup command to download the fonts. For greatest efficiency,
download all fonts that you expect the printer to need; any fonts not manually
downloaded are not accessible to the printer.
You need to download fonts only once. The fonts remain on the printer until they are
manually removed or the hard disk is reformatted.
For details about the pfsetup utility, see Section 8.4, Commands and Daemons.
To send Chinese text files to print queues connected to a DEClaser 5100, use the lpr
command. You can either set the LANG environment variable or use the -A option to
denote the codeset of the text files. For example, the following command prints the file
encoded in the Taiwanese EUC codeset:
% lpr -A "flocale=zh_TW.eucTW" my.file1
You can override the default plocale and font setting in the /etc/printcap file
by using the -A option to the lpr command, but be sure the font you use matches the
codeset of the plocale. For example, the following command gets the file encoded in
Taiwanese EUC, converts it to DEC Hanzi, and prints it using the XiSong-GB2312-80
font:
% lpr -A "flocale=zh_TW.eucTW plocale=zh_CN.dechanzi font=XiSong-GB2312-80" my.file2
8.5.4 PrintServer 17
The PrintServer 17 printer can be used to print Chinese characters by using the font-faulting mechanism through the network. There are no special hardware requirements.
To configure the PrintServer 17 to print Chinese files:
1.
Install the PrintServer Software Version 5.0 or later for Compaq Tru64 UNIX. This is
a layered product, purchased separately. Refer to the installation guide for installing
and configuring the PrintServer software.
2.
If you want to define the default printer codeset or default font for printing Chinese text,
use the lprsetup command or manually update the /etc/printcap file to
include the ya option. Add the following string to the file:
:ya="plocale=<Chinese locale> font=<font>":\
- Define plocale as zh_TW.dechanyu if you are printing traditional Chinese
characters, and define font as either Sung-Light-CNS11643 or Hei-Light-CNS11643;
- Define plocale as zh_CN.dechanzi if you are printing Simplified Chinese
characters, and define font as either XiSong-GB2312-80 or Hei-GB2312-80.
3.
Use the pfsetup command to define the fonts to be downloaded. For greatest
efficiency, download all Chinese fonts unless there are some you will never use, plus
the most frequently used single-byte fonts. The printer can access the fonts that are
not manually downloaded, but there is overhead for dynamically downloading fonts.
Note that you should execute the pfsetup command on the Configuration Host of
the PrintServer printer. After executing pfsetup, restart the Management Client to
initialize the new configuration. Refer to the documentation for PrintServer Software
Version 5.0 or later for further details.
4.
Turn off the PrintServer 17 and then turn it on. The fonts defined in step 3 will be
downloaded automatically.
Hereafter, each time you restart the PrintServer 17, the fonts download automatically.
Hence the pfsetup command is executed only once as long as you do not modify
the configuration.
For details about the pfsetup utility, see Section 8.4, Commands and Daemons.
To send Chinese text files to print queues connected to a PrintServer 17, use the wwlpspr
command. You can either set the LANG environment variable or use the -A option to
denote the codeset of the text files. For example, the following command prints the file
encoded in the Taiwanese EUC codeset:
% wwlpspr -A "flocale=zh_TW.eucTW" my.file1
You can override the default plocale and font setting in the /etc/printcap file
by using the -A option of the wwlpspr command, but make sure the font you use
matches the codeset of the plocale. For example, the following command gets the file
encoded in Taiwanese EUC, converts it to DEC Hanzi, and prints it using the XiSong-GB2312-80 font:
% wwlpspr -A "flocale=zh_TW.eucTW plocale=zh_CN.dechanzi font=XiSong-GB2312-80" my.file2
8.5.5 Generic PostScript Printers
Any PostScript printer that supports PostScript Level 2, or Level 1 with the multibyte font
extension, and that has sufficient memory can be used to print Chinese using the font-embedding
mechanism.
To configure a printer to use font embedding, use the lprsetup or printconfig
utility to set up a print queue using the wwpsof filter. You do not need to specify the
printer locale as this mechanism can print any language as long as the fonts are available
in the system. See the wwpsof(8) reference page for information on how to use the
features available for this print filter.
9 Other Chinese Features
This chapter describes features specific to the Chinese locale in Compaq Tru64 UNIX that
are not described elsewhere.
9.1 Phrase Support in the VT382–D
You can download up to 100 phrase definitions into the built-in memory of the VT382-D
traditional Chinese terminal. You can create a phrase definition file containing the
definitions and then download the file to the terminal through the serial port.
______________________________ Note ____________________________
The information presented in this section is not applicable to DECwindows Motif or
CDE.
_______________________________________________________________
9.1.1 Creating a Phrase Definition File
Each phrase definition file can contain up to 100 phrase definitions. You can create the
file using any editor which allows you to edit Chinese data, such as vi.
9.1.2 Syntax of Phrase Definitions
Phrase definitions have the following syntax:
DCS Pc SP v phrase-code / phrase-data ST
The parameters are:
Table 9–1: Phrase definitions
Parameter      Description
DCS            A phrase identifier defined by the Compaq Tru64 UNIX software. Its
               hexadecimal code is 90. For the 7-bit environment, you can use
               ESC P (hexadecimal code 1B 50) instead.
Pc             A parameter that controls whether the old phrase definitions in
               memory are cleared before a new one is downloaded. If Pc is 0 or
               is omitted, the old phrase definitions are kept. If Pc is 1, they
               are cleared.
SP             A space character.
v              A lowercase v.
phrase-code    A string of up to 8 alphanumeric characters. Uppercase and
               lowercase letters are treated as the same characters.
/              A slash character that separates a phrase code from its phrase
               data.
phrase-data    A phrase containing up to 80 characters. Characters can be
               Chinese characters, English letters, numerals, or printable
               symbols.
ST             An identifier that signals the end of the DCS statement. Its
               hexadecimal code is 9C. For the 7-bit environment, you can use
               ESC \ (hexadecimal code 1B 5C) instead.
The following examples show two phrase definition files for the 8-bit and 7-bit
environments respectively.
•
For the 8-bit environment
<ST>
<DCS>1 vBL/
<ST>
<DCS>0 vBW/
<ST>
<DCS>0 vBTT/
<ST>
<DCS>0 vCBC/
<ST>
<DCS>0 vCH/
<ST>
<DCS>0 vCL/
<ST>
<DCS>0 vCM/
<ST>
<DCS>0 vCPD/
<ST>
<DCS>0 vCWO/
<ST>
<DCS>0 vFAS/
<ST>
<DCS>0 vLC/
•
For the 7-bit environment:
In this example, the second slash in each phrase definition is regarded as part of the
phrase definition:
<ESC>P1 vAMBASSAD//AMBASSADOR<ESC>\
<ESC>P0 vASIA//ASIA WORLD PLAZA<ESC>\
<ESC>P0 vBROTHER//BROTHER<ESC>\
<ESC>P0 vCENT//CENTURY PLAZA<ESC>\
<ESC>P0 vFORTUNA//FORTUNA<ESC>\
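The syntax in Section 9.1.2 can be sketched as a small Python helper that assembles one phrase definition as a byte string. This is an illustration only and is not part of the Compaq Tru64 UNIX software. The function name phrase_definition and its arguments are hypothetical, and the sketch assumes that the phrase data can be encoded with the codec named in the encoding argument (ASCII by default; Chinese data would need the codeset used by the terminal).

```python
# Introducer and terminator codes taken from Table 9-1.
DCS_8BIT, ST_8BIT = b"\x90", b"\x9c"
DCS_7BIT, ST_7BIT = b"\x1bP", b"\x1b\\"   # ESC P and ESC \


def phrase_definition(code, data, clear=False, seven_bit=False,
                      encoding="ascii"):
    """Build 'DCS Pc SP v phrase-code / phrase-data ST' as bytes."""
    if not (code.isascii() and code.isalnum() and len(code) <= 8):
        raise ValueError("phrase code must be 1 to 8 alphanumeric characters")
    if len(data) > 80:
        raise ValueError("phrase data is limited to 80 characters")
    dcs, st = (DCS_7BIT, ST_7BIT) if seven_bit else (DCS_8BIT, ST_8BIT)
    pc = b"1" if clear else b"0"          # 1 clears the old definitions
    return (dcs + pc + b" " + b"v" + code.encode("ascii") + b"/"
            + data.encode(encoding) + st)


# First entry of the 7-bit example: the phrase data itself starts with "/".
first = phrase_definition("AMBASSAD", "/AMBASSADOR", clear=True,
                          seven_bit=True)
print(first)
```

Writing the returned bytes for each phrase to a file, one definition after another, would give a phrase definition file with the layout shown in the 7-bit example.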
9.1.3 Phrase Downloading
The procedure for transferring phrase definitions from a disk file to the terminal is called
downloading. The downloaded phrases are kept in the terminal memory as long as the
terminal is powered on. Consequently, a phrase definition file needs to be downloaded
only once in a terminal session.
To download a phrase definition file to a terminal, display the file on the terminal using
one of the Compaq Tru64 UNIX commands, such as cat. You can also download phrase
definitions using the Phrase Utility. For details, see Writing Software for the International
Market.
The following situations may occur during downloading:
•
If a phrase code is duplicated, the new phrase definition replaces the old one.
•
If a phrase code contains more than 8 characters, the phrase definition is not accepted.
•
If the ST code is typed incorrectly, all the characters from the first slash to the correct
ST code (with the exception of DCS) are treated as part of the phrase. For example:
<DCS>1 vPHONETIC/AT
<DCS>0 vINTERNAL/<ST>
The AT at the end of the first line is incorrect. If you enter the phrase code
"PHONETIC", the following string is input:
AT0 vINTERNAL/
•
If there are more than 100 phrase definitions, the phrase definitions beyond the limit
are ignored.
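Three of the rules above (duplicate codes, codes that are too long, and the 100-phrase limit) can be modelled with a short Python sketch. It is a simplified model of the behaviour described in this section, not the terminal firmware. The function name load_phrases is hypothetical, and the sketch does not model the Pc parameter or a mistyped ST code.

```python
MAX_PHRASES = 100    # capacity of the terminal phrase memory
MAX_CODE_LEN = 8     # longest phrase code that is accepted


def load_phrases(definitions, table=None):
    """Apply (code, data) pairs in order and return the resulting table."""
    table = dict(table or {})
    for code, data in definitions:
        if len(code) > MAX_CODE_LEN:
            continue                      # definition is not accepted
        key = code.upper()                # codes are case-insensitive
        if key not in table and len(table) >= MAX_PHRASES:
            continue                      # beyond the limit: ignored
        table[key] = data                 # a duplicate replaces the old one
    return table


phrases = load_phrases([("asia", "first"), ("ASIA", "second"),
                        ("TOOLONGCODE", "never stored")])
print(phrases)
```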
9.2 Sorting Utility
The methods for sorting Chinese characters are shown in Table 9–2 and Table 9–3:
Table 9–2: Traditional Chinese Sorting Methods
Sorting Method    Full Option Name    Short Option Name
Internal code     Code                c
Phonetic          Phonetic            p
Radical           Radical             r
Stroke            Stroke              s
Table 9–3: Simplified Chinese Sorting Methods
Sorting Method    Full Option Name    Short Option Name
Qu-Wei            Quwei               q
Pinyin            Pinyin              p
Radical           Radical             r
Stroke            Stroke              s
You can sort Chinese data using the internationalized sort utility. This utility allows
you to use one sorting method by selecting the respective locale as described in Chapter 3.
In some cases, you may find that using one sorting method is insufficient to meet your
needs. You might need to sort your data with multiple collating sequences. For instance,
many characters can have the same number of strokes and you might want to sort these
characters further according to their radicals.
9.2.1 asort Utility
To sort characters according to their radicals, the Compaq Tru64 UNIX software provides
an extended sort utility, called asort, that you can use to sort or merge files containing
Chinese characters according to specified collating sequences. The asort utility has the
same syntax as that of the sort utility, but it provides two additional options:
-C"collate_sequence"
Defines the collating sequences where collate_sequence is a list of
identifiers or abbreviations of the collating sequences for sorting or
merging a file.
-v
Specifies to sort Chinese data in breadth-first comparison, just like
the behavior of the VMS/Hanyu or VMS/Hanzi sorting mechanism.
By default, depth-first comparison is used.
9.2.2 Multiple Collating Sequences
The asort utility allows you to specify multiple collating sequences. By default,
Chinese data is sorted by internal code. You can specify collating sequences with the -C
option so that Chinese data can be sorted using other collation methods. For example, the
following command sorts DEC Hanyu data files in the order of stroke, radical, and then
phonetic:
% setenv LANG zh_TW.dechanyu
% asort -C"Stroke Radical Phonetic" input.dat > output.dat
Alternatively you can enter:
% setenv LANG zh_TW.dechanyu
% asort -C"srp" input.dat > output.dat
These commands first sort the input data file according to the number of strokes. If
multiple characters have the same number of strokes, they are then sorted by radical. If
multiple characters within this group start with the same radical, they will then be sorted
by phonetic order.
______________________________ Note ____________________________
The asort utility is locale sensitive. You should first set the LANG environment
variable to the required Chinese locale before using the asort utility.
_______________________________________________________________
9.2.3 Depth–First Against Breadth–First
By default, the asort utility compares Chinese data according to the specified collating
sequences using depth-first comparison. That is, each character in a sort field is compared
using all the specified collating sequences until the collating order is resolved. When two
characters have the same collating order, the next pair of characters is compared.
OpenVMS/Hanyu and OpenVMS/Hanzi use a slightly different sorting mechanism. The
HSORT utility provided with OpenVMS/Hanyu sorts characters in the whole sort field
using the first collating method. The second collating method applies only if the collating
order of the two sorting fields is identical. This is called breadth-first comparison. If you
want your sorting results to be compatible with those generated by OpenVMS/Hanyu or
OpenVMS/Hanzi, you can specify the -v option:
% asort -C"srp" -v input.dat > output.dat
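The difference between the two comparison orders can be shown with a small Python sketch. It is a toy model and not the asort implementation. The letters A, B, P, and Q stand in for Chinese characters, and the stroke and radical values in the two tables are invented for the illustration.

```python
# Hypothetical collating values for four stand-in characters.
STROKES = {"A": 3, "B": 3, "P": 1, "Q": 2}
RADICALS = {"A": 2, "B": 1, "P": 1, "Q": 1}


def depth_first_key(field):
    """Compare character by character, using every sequence per character."""
    return [(STROKES[ch], RADICALS[ch]) for ch in field]


def breadth_first_key(field):
    """Compare the whole field by strokes first, then by radicals."""
    return ([STROKES[ch] for ch in field], [RADICALS[ch] for ch in field])


fields = ["AP", "BQ"]
depth_order = sorted(fields, key=depth_first_key)
breadth_order = sorted(fields, key=breadth_first_key)
print("depth-first:  ", depth_order)
print("breadth-first:", breadth_order)
```

With these values A and B tie on strokes. Depth-first comparison settles the tie at once by radical, so BQ sorts before AP. Breadth-first comparison moves on to the stroke counts of the second characters instead, so AP sorts before BQ.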
9.2.4 User–Defined Characters
The asort utility supports the sorting of user-defined characters with the collating values
defined through the cedit utility. If required, the asort utility looks up the collating
values from the UDC database and sorts the data accordingly. The mechanism for sorting
UDCs is totally transparent to you.
9.3 Hanyu and Hanzi DECterm
Hanyu DECterm is a VT382-D terminal emulator; Hanzi DECterm is a VT382-C terminal
emulator. This section describes the Chinese features which are specific to the Hanyu and
Hanzi DECterm. For the details of the common internationalization features provided by
DECterm, see Writing Software for the International Market.
This section describes:
• How to create a Hanyu or Hanzi DECterm
• How to use terminal emulator features
• How to input and output Chinese characters
• How to use other VT382-D and VT382-C functions
9.3.1 Creating a Hanyu or Hanzi DECterm
The terminal type that DECterm emulates is sensitive to the session language.
To create a Hanyu DECterm through the Session Manager, set the session language to one
of the traditional Chinese locales, for example, Chinese Taiwan, and then select DECterm
from the Applications menu of Session Manager. To create a Hanzi DECterm, select one
of the Simplified Chinese locales, for example, Chinese China.
Alternatively, you can use the -xnllanguage qualifier to specify the terminal type of
the DECterm to create. For example, you can use zh_CN.dechanzi as the value for
-xnllanguage to create a Hanzi DECterm:
% /usr/bin/X11/dxterm -xnllanguage zh_CN.dechanzi
If you specify an unknown value for -xnllanguage, then ISO Latin-1 DECterm is
assumed. If no Chinese font exists, it defaults to ISO Latin-1 DECterm.
The user interface language of Hanyu and Hanzi DECterm always follows the terminal
type. The language is independent of the language selection.
9.3.2 Customizing DECterm
Except for customization of NRCS character sets, all customization features applicable to
the ISO Latin-1 DECterm window can also be applied to any Hanyu or Hanzi DECterm
window.
Customized features can be saved and restored in the same way as in ISO Latin-1
DECterm.
9.3.3 Font Sizes
By choosing the Big Font or Little Font option from the Window... item of the Options
menu you can choose either the 24x24 or 16x18 Chinese fonts.
9.3.4 Terminal ID
By choosing the General... item from the Options menu you can change the general
features, such as the terminal type, for the Hanyu and Hanzi DECterm from a dialog box.
You can also choose the VT382 ID from the dialog box.
9.3.5 Interaction Style
By choosing the Input Method... item from the Options menu you can select the interaction
style for Hanyu and Hanzi DECterm. For example, if you want to select the root window
interaction style, you can click on the Root Window button and then apply the change. If
you click on the ISO Latin 1 Input button, Hanyu and Hanzi DECterm disable the input of
Chinese data until another style is selected.
9.3.6 Input Server
By choosing the Input Method... item from the Options menu you can switch to use
another input server for Hanyu and Hanzi DECterm. By default, the traditional Chinese
input server is used for Hanyu DECterm, and Simplified Chinese input server is used for
Hanzi DECterm. To select another input server, you can click on the Other button and
then enter the input server name on the input field.
For Hanyu DECterm, you can enter DECCN to switch to the Simplified Chinese input
server. For Hanzi DECterm, you can enter DECTW to switch to the traditional Chinese
input server. For details about these input servers, see Chapter 7.
9.3.7 Copying Information
You can use the Edit menu to copy information within or between DECterm windows.
The Cut-and-Paste operation is enhanced to handle mixed ASCII and Chinese characters.
Beyond this, conversion between traditional and Simplified Chinese data is performed
when data is copied between Hanyu DECterm and Simplified Chinese applications, and
between Hanzi DECterm and traditional Chinese applications, through the cut-and-paste or
quick copying operation.
9.3.8 Default Character Set
Hanyu DECterm supports CNS 11643 (first and second planes), DTSCS and all character
sets supported by the ISO Latin-1 DECterm. Hanzi DECterm supports GB2312 and all
character sets supported by the ISO Latin-1 DECterm.
ISO Latin-1 DECterm uses the ISO 8859-1 (Latin-1) as the default character set. You can
override this setting by choosing another option from the General... item on the Options
menu. For Hanyu DECterm, the default character set for 8-bit data is the Hanyu character
set (CNS 11643 and DTSCS). For Hanzi DECterm, the default character set for 8-bit data
is the Hanzi character set GB2312.
In general, Hanyu and Hanzi DECterm cannot display mixed accented Latin-1 characters
and Chinese characters. If you want to achieve this, you can output the data together with
the designated character set escape sequences.
9.3.9 Chinese Character Input/Output
You can enter Chinese characters in Hanyu and Hanzi DECterm by invoking any of the
Chinese input modes as described in Chapter 7. Mixed ASCII and Chinese characters can
be displayed properly in Hanyu and Hanzi DECterm without any special settings.
9.3.10 Reconnecting the Input Server
The Chinese input server provides you the ability to input Chinese characters. If this
process does not exist or terminates for some reason, the message Hanyu input
method does not exist or Hanzi input method does not exist is
displayed. You can restart the input server and then use the Reset Terminal option from
the Commands menu to reconnect the Hanyu and Hanzi DECterm to the input server.
9.3.11 VT382–D and VT382–C Terminal Functions
The following functions of the VT382-D and VT382-C terminal are implemented in the
Hanyu and Hanzi DECterm terminal emulator respectively:
• Display characteristics and capabilities
• Text capabilities
  - Level 3 terminal compatibility
    * VT300 mode
    * VT100 mode
    * VT52 mode
  - ANSI compatible control functions
• Support for Terminal State Interrogation (TSI)
• Support for all of the Chinese input methods
• Support for the following character sets:
  - DEC Special Graphics Character Set (line drawing)
  - DEC Supplemental Character Set
  - DEC Technical Character Set
  - ISO Latin-1 Character Set
  - CNS11643-1986 and DTSCS-1990 Character Sets for VT382-D, and GB2312-80
    Character Set for VT382-C
• Control Representation mode
• Support for sixel graphics
• Support for UDK editing function
• Support for Chinese character display attributes: reverse, underline, bold, blink,
  double height/width
The following functions of the VT382-D terminal are implemented in the Hanyu DECterm
terminal emulator:
• Display/Suppress leading code
  A selection button is added to the Display... item under the Options menu so that users
  can enable or disable the display of a symbol for the leading code in a four-byte EDPC
  character.
• The escape sequence DECLCSM (Leading Code Suppression Mode) is also supported in
  Hanyu DECterm.
For details about the VT382-D terminal functions, refer to the VT382-D Programming
Reference Manual and the VT382-D User’s Manual. For details about the VT382-C terminal
functions, refer to the VT382-C Programming Reference Manual and the VT382-C User’s Manual.
9.4 Phrase Conversion
As described in Chapter 2, the Compaq Tru64 UNIX software supports conversion between
different codesets using the iconv utility. This utility can also be used for phrase
conversion. When phrase conversion is activated, a phrase in Traditional Chinese can be
converted to the equivalent phrase in Simplified Chinese, or the reverse.
To activate the phrase conversion option, you can define the ICONV_PHRCONV
environment variable. If this environment variable is set to mark, the converted phrases
are enclosed in brackets ([]) to highlight the conversion result for visual checking.
The phrase conversion databases in the /usr/share/phrdb directory are normal text
files with the same file names as those of the algorithmic converters in
/usr/lib/nls/loc/iconv/*. These phrase conversion databases contain entries for
phrase conversion pairs.
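The effect of phrase conversion and of the mark setting can be illustrated with a minimal
sketch. This is not the iconv implementation: the function name convert_phrases, the
in-memory dictionary of phrase pairs, and the longest-match rule are assumptions made for
illustration only, and the sketch does not read the database files in /usr/share/phrdb or
perform the character-level codeset conversion.

```python
# Hypothetical illustration of phrase conversion; not the Tru64 iconv code.
# The phrase pairs below stand in for entries of a phrase conversion database.
PHRASE_PAIRS = {
    "軟體": "软件",      # "software"
    "資料": "数据",      # "data"
    "資料庫": "数据库",  # "database"
}

def convert_phrases(text, phrase_pairs, mark=False):
    """Replace each known source phrase with its target phrase.

    When mark is True, each converted phrase is enclosed in brackets,
    mimicking the documented effect of setting ICONV_PHRCONV to mark.
    """
    # Try longer phrases first so a short phrase never pre-empts a longer one.
    sources = sorted(phrase_pairs, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for src in sources:
            if text.startswith(src, i):
                target = phrase_pairs[src]
                out.append("[" + target + "]" if mark else target)
                i += len(src)
                break
        else:
            # No phrase starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

plain = convert_phrases("安裝軟體和資料庫", PHRASE_PAIRS)
marked = convert_phrases("安裝軟體和資料庫", PHRASE_PAIRS, mark=True)
print(plain)
print(marked)
```

In the marked output, the bracketed phrases are the ones that came from the phrase pairs,
which makes them easy to check visually. Characters outside any phrase pass through
unchanged in this sketch.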
9.5 Special Characters in nroff
The nroff utility has been internationalized to format text in various languages. When a
Chinese document is formatted using nroff, its contents are handled according to the
following Chinese formatting rules:
• Text line wrapping
- Chinese text, unlike English text, does not use space characters as separators
between words or as break points for text line wrapping. Instead, a Chinese text line
can be broken between any two consecutive Chinese characters, with the following
exceptions:
- Some Chinese characters cannot be placed at the beginning of a text line. They are
called no-first characters. For traditional Chinese, the no-first characters include the
following:
For Simplified Chinese, the no-first characters include the following:
- Some Chinese characters cannot be placed at the end of a text line. They are called
no-last characters. For traditional Chinese, the no-last characters include the
following:
For Simplified Chinese, the no-last characters include the following:
- Some English characters are subject to the same no-first and no-last rules.
No-first English characters:
!),.:;>?])
No-last English characters:
(<[(
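The wrapping rules above can be sketched as follows. This is a simplified illustration, not
the nroff algorithm: the function name wrap is hypothetical, every character is assumed to
occupy one column, the text is assumed to contain no spaces, and only the English no-first
and no-last characters listed above are used.

```python
# Hypothetical sketch of no-first / no-last line breaking; not the nroff code.
NO_FIRST = set("!),.:;>?]")   # may not begin a line
NO_LAST = set("(<[")          # may not end a line

def wrap(text, width):
    """Break text into lines of at most width characters.

    A break is allowed between any two characters unless the character
    after the break is a no-first character or the character before the
    break is a no-last character.
    """
    lines = []
    i = 0
    n = len(text)
    while n - i > width:
        p = i + width  # tentative break: the line would be text[i:p]
        # Move the break left while it would violate either rule.
        while p > i and (text[p] in NO_FIRST or text[p - 1] in NO_LAST):
            p -= 1
        if p == i:
            # No legal break exists in this stretch; fall back to a hard break.
            p = i + width
        lines.append(text[i:p])
        i = p
    lines.append(text[i:])
    return lines

wrapped = wrap("甲乙丙(丁戊),己庚.", 4)
for line in wrapped:
    print(line)
```

With a width of 4, the first line is shortened to three characters so that the opening
parenthesis does not end the line, and the second line is shortened so that neither the
closing parenthesis nor the comma begins the next line.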
• Right justification
- To justify the right margin of a paragraph, nroff inserts space characters at proper
break points. For English, space characters are usually inserted at word breaks. For
Chinese, space characters are inserted only at the following places:
- Before a can-space-before character if it is not placed at the beginning of a text line.
For traditional Chinese, the can-space-before characters include the following:
For Simplified Chinese, the can-space-before characters include the following:
- After a can-space-after character if it is not placed at the end of a text line. For
traditional Chinese, the can-space-after characters include the following:
For Simplified Chinese, the can-space-after characters include the following:
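The right-justification rule described in this section can be sketched as follows. This is
an illustration only, not the nroff algorithm: the function name justify is hypothetical,
every character is assumed to occupy one column, and the two character sets in the code are
placeholders chosen for the example, not the can-space-before and can-space-after lists
used by nroff.

```python
# Hypothetical sketch of Chinese right justification; not the nroff code.
# Placeholder sets for illustration only.
CAN_SPACE_BEFORE = set("（")
CAN_SPACE_AFTER = set("，。）")

def justify(line, width, before=CAN_SPACE_BEFORE, after=CAN_SPACE_AFTER):
    """Pad line to width by inserting spaces only at permitted places.

    A gap at index g lies between line[g - 1] and line[g]. Starting at
    g = 1 and stopping before len(line) means no space is ever added at
    the beginning or at the end of the line.
    """
    gaps = [g for g in range(1, len(line))
            if line[g] in before or line[g - 1] in after]
    missing = width - len(line)
    if missing <= 0 or not gaps:
        return line  # nothing to pad, or nowhere to pad
    # Spread the missing columns as evenly as possible, leftmost gaps first.
    base, extra = divmod(missing, len(gaps))
    pad = {g: base + (1 if k < extra else 0) for k, g in enumerate(gaps)}
    out = []
    for idx, ch in enumerate(line):
        out.append(" " * pad.get(idx, 0))
        out.append(ch)
    return "".join(out)

justified = justify("甲乙，丙丁（戊）己。", 14)
print(justified)
```

In this example the ten-character line has three permitted gaps (after the comma, before
the opening parenthesis, and after the closing parenthesis), so the four missing columns
are distributed over those gaps and no space is inserted between ordinary characters.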