HP Neoview Character Sets Administrator's Guide
HP Part Number: 546188-001
Published: April 2009
Edition: HP Neoview Release 2.4
© Copyright 2009 Hewlett-Packard Development Company, L.P.
Legal Notice
Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial
Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under
vendor’s standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express
warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP
shall not be liable for technical or editorial errors or omissions contained herein.
Export of the information contained in this publication may require authorization from the U.S. Department of Commerce.
Microsoft, Windows, and Windows NT are U.S. registered trademarks of Microsoft Corporation.
Intel, Pentium, and Celeron are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other
countries.
Java is a U.S. trademark of Sun Microsystems, Inc.
Motif, OSF/1, UNIX, X/Open, and the "X" device are registered trademarks, and IT DialTone and The Open Group are trademarks of The Open
Group in the U.S. and other countries.
Open Software Foundation, OSF, the OSF logo, OSF/1, OSF/Motif, and Motif are trademarks of the Open Software Foundation, Inc. OSF MAKES
NO WARRANTY OF ANY KIND WITH REGARD TO THE OSF MATERIAL PROVIDED HEREIN, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. OSF shall not be liable for errors contained
herein or for incidental consequential damages in connection with the furnishing, performance, or use of this material.
© 1990, 1991, 1992, 1993 Open Software Foundation, Inc. The OSF documentation and the OSF software to which it relates are derived in part
from materials supplied by the following: © 1987, 1988, 1989 Carnegie-Mellon University. © 1989, 1990, 1991 Digital Equipment Corporation. ©
1985, 1988, 1989, 1990 Encore Computer Corporation. © 1988 Free Software Foundation, Inc. © 1987, 1988, 1989, 1990, 1991 Hewlett-Packard
Company. © 1985, 1987, 1988, 1989, 1990, 1991, 1992 International Business Machines Corporation. © 1988, 1989 Massachusetts Institute of
Technology. © 1988, 1989, 1990 Mentat Inc. © 1988 Microsoft Corporation. © 1987, 1988, 1989, 1990, 1991, 1992 SecureWare, Inc. © 1990, 1991
Siemens Nixdorf Informationssysteme AG. © 1986, 1989, 1996, 1997 Sun Microsystems, Inc. © 1989, 1990, 1991 Transarc Corporation. OSF software
and documentation are based in part on the Fourth Berkeley Software Distribution under license from The Regents of the University of California.
OSF acknowledges the following individuals and institutions for their role in its development: Kenneth C.R.C. Arnold, Gregory S. Couch, Conrad
C. Huang, Ed James, Symmetric Computer Systems, Robert Elz. © 1980, 1981, 1982, 1983, 1985, 1986, 1987, 1988, 1989 Regents of the University
of California.
Table of Contents
About This Document
Intended Audience
New and Changed Information in This Edition
Document Organization
Notation Conventions
General Syntax Notation
Related Documentation
Neoview Customer Library
Publishing History
HP Encourages Your Comments
1 Introduction to Neoview Character Sets
Neoview Character Set Configurations, Column Character Definitions, and Character Sets
Client Locale Character Encoding and Neoview Database Character Encoding
Compatible Client Locale Characters and Multiple Client Locale Characters
Neoview Character Set Configurations
Compatibility Between Neoview ODBC and JDBC Drivers and Neoview Platforms
2 Selecting a Neoview Character Set Configuration
Criteria for Selecting a Neoview Character Set Configuration
Process for Implementing a Neoview Character Set Configuration
Rules for Migrating to Neoview Release 2.4
3 Using SQL Language Elements to Define and Manage Database Encoding
Rules for Encoding SQL Language Elements
Behavior of SQL Functions
Behavior of SQL String Functions
Guidelines for the LIKE Predicate in the SJIS and Unicode Configurations
Locating Invalid Characters in Syntax Error Messages
Example of a Syntax Error in a Shorter SQL Statement
Example of a Syntax Error in a Multi-Line SQL Statement
4 Capabilities and Limitations of Neoview Client Applications
Neoview Command Interface (NCI)
Neoview DB Admin
Neoview Loader
Neoview Management Dashboard
Neoview Manageability Repository
Neoview Transporter Client
Neoview Workload Management Services (WMS)
5 Troubleshooting Guidelines for Neoview Character Sets Users
SJIS Character Mismatches
SQL-Side SJIS Character Mismatch Examples
SQL-Side SJIS Character Mismatch Scenarios
How to Prevent SQL-Side SJIS Character Mismatches
A Character Set Mapping Tables
B Capabilities and Limitations of Multiple Client Locales in the Unicode Configuration
C Configuring Neoview Client Applications
How Character Encoding Is Implemented in the Neoview Transporter Client
How Pass-Through Mode and UTF16 Conversion Are Implemented From the Transporter Client
Encoding Data Sources
Control File Option Syntax
Control File Example
Encoding Control Files
Encoding Transporter Client Event and Log File Messages
How Character Encoding Is Implemented in the Neoview Loader
How Character Encoding Is Implemented in the Neoview ODBC Driver for Windows
How Character Encoding Is Implemented in the Neoview ODBC Drivers for UNIX
How Character Encoding Is Implemented in the Neoview JDBC Driver
D Neoview ODBC and JDBC Driver Mappings of Character Sets and Language IDs
Mapping Information for the Neoview ODBC Driver for Windows
Mapping Information for the Neoview ODBC Drivers for UNIX
Mapping Information for the Neoview JDBC Driver
Glossary
Index
List of Tables
1-1 Character Sets Stored in ISO88591 and UCS2 Columns for the Neoview Character Set Configurations
1-2 Default Prefixes for Character String Literals
1-3 Features, Behaviors, and Limitations of the Neoview Character Set Configurations
1-4 Driver and Neoview Platform Compatibility
2-1 Criteria for Selecting the Correct Neoview Character Set Configuration
3-1 Summary of SQL Language Rules by Neoview Character Set Configuration
3-2 Behavior of SQL Functions
3-3 String Function Behaviors for Neoview Release 2.3 and Neoview Release 2.4
3-4 Behaviors of SQL String Functions in the Three Configurations
4-1 Neoview Loader Capabilities and Limitations
5-1 Troubleshooting Symptoms, Causes, and Recommended Corrective Actions for Users
B-1 Capabilities and Limitations for Multiple Client Locales in the Unicode Configuration
C-1 How Pass-Through Mode and UTF16 Conversion Are Implemented From the Neoview Transporter Client
C-2 Setting the Conversion of Input Data for the Neoview Loader
C-3 Character Set Translation Behavior of the Neoview ODBC Driver for Windows
C-4 Character Set Translation Behavior of the Neoview ODBC Drivers for UNIX
C-5 Attribute Values Used by the Neoview ODBC Drivers for UNIX for a Sample DSN Configuration
C-6 Character Set Translation Behavior of the Neoview JDBC Driver
D-1 Character Set and Language ID Mappings for the Neoview ODBC Driver for Windows
D-2 Character Set and Language Name Mappings for the Neoview ODBC Drivers for UNIX
D-3 Mapping Information for the Neoview JDBC Driver
About This Document
This manual contains the information needed by database administrators and end users to use,
configure, and troubleshoot the Neoview Character Sets feature for Release 2.4.
Intended Audience
This manual is intended for database administrators and other users of the Neoview Character
Sets feature on the Neoview platform.
New and Changed Information in This Edition
This version of the Neoview Character Sets Administrator's Guide contains this new and changed
information:
• Table 1-3 (page 17) describes changes to features of the Neoview character set configuration.
• For updated driver and Neoview platform compatibility, see “Compatibility Between
Neoview ODBC and JDBC Drivers and Neoview Platforms” (page 19).
• To move to Release 2.4, see Chapter 2 (page 21).
• Character string functions now treat each multibyte character in an input string as one
character, regardless of the byte length of the character. See “Behavior of SQL String
Functions” (page 27).
• You can now translate a string from UCS2 to UTF8 (and from UTF8 to UCS2) in the Unicode
configuration, and you can translate a string from SJIS to UCS2 (and from UCS2 to SJIS) in
the SJIS configuration. See the TRANSLATE function in “Behavior of SQL String Functions”
(page 27).
• In SJIS and Unicode configurations, the LIKE predicate now handles an underscore as one
character, regardless of how a character is stored in the database, and now compares character
strings at the character level, not the byte level. See “Guidelines for the LIKE Predicate in
the SJIS and Unicode Configurations” (page 31).
• Syntax error messages now identify the number of characters from the start of the SQL
statement where the error occurred, allowing you to find an invalid character more quickly
with your editor. See “Locating Invalid Characters in Syntax Error Messages” (page 33).
• Table 5-1 (page 39) describes new and revised troubleshooting procedures.
Document Organization
Chapter 1 (page 13)
Provides an introduction to the Neoview character set configurations and other
features of the Neoview Character Sets feature for Release 2.4.
Chapter 2 (page 21)
Identifies the criteria for determining which of the three Neoview character set
configurations (ISO88591, SJIS, or Unicode) is correct for a given customer.
Chapter 3 (page 23)
Describes the rules for using SQL language elements such as DDL and DML
statements, character string functions, and SQL identifiers to define and manage
character data in Neoview SQL tables.
Chapter 4 (page 35)
Describes the capabilities and limitations of the Neoview clients with respect to the
three Neoview character set configurations.
Chapter 5 (page 39)
Identifies known troubleshooting issues for end users and their recommended fixes
and solutions.
Appendix A (page 45)
Identifies the mapping tables used with the Neoview Character Sets feature and
provides a cross-reference to the document that contains links to these mapping
tables.
Appendix B (page 47)
Describes the capabilities and limitations imposed on multiple client locales in the
Unicode configuration for Release 2.4.
Appendix C (page 49)
Describes how to configure and enable the translation functions of Neoview client
applications.
Appendix D (page 55)
Provides information about mapping character sets and language ID values for the
Neoview ODBC and JDBC drivers.
Notation Conventions
General Syntax Notation
This list summarizes the notation conventions for syntax presentation in this manual.
UPPERCASE LETTERS
Uppercase letters indicate keywords and reserved words. Type these items exactly as shown.
Items not enclosed in brackets are required. For example:
SELECT
Italic Letters
Italic letters, regardless of font, indicate variable items that you supply. Items not enclosed
in brackets are required. For example:
file-name
Computer Type
Computer type letters within text indicate case-sensitive keywords and reserved words. Type
these items exactly as shown. Items not enclosed in brackets are required. For example:
myfile.sh
Bold Text
Bold text in an example indicates user input typed at the terminal. For example:
ENTER RUN CODE
?123
CODE RECEIVED:
123.00
The user must press the Return key after typing the input.
[ ] Brackets
Brackets enclose optional syntax items. For example:
DATETIME [start-field TO] end-field
A group of items enclosed in brackets is a list from which you can choose one item or none.
The items in the list can be arranged either vertically, with aligned brackets on each side of
the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines. For
example:
DROP SCHEMA schema [CASCADE]
                   [RESTRICT]
DROP SCHEMA schema [ CASCADE | RESTRICT ]
{ } Braces
Braces enclose required syntax items. For example:
FROM { grantee[, grantee]...}
A group of items enclosed in braces is a list from which you are required to choose one item.
The items in the list can be arranged either vertically, with aligned braces on each side of the
list, or horizontally, enclosed in a pair of braces and separated by vertical lines. For example:
INTERVAL { start-field TO end-field }
         { single-field }
INTERVAL { start-field TO end-field
         | single-field }
| Vertical Line
A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces.
For example:
{expression | NULL}
… Ellipsis
An ellipsis immediately following a pair of brackets or braces indicates that you can repeat
the enclosed sequence of syntax items any number of times. For example:
ATTRIBUTE[S] attribute [, attribute]...
{, sql-expression}...
An ellipsis immediately following a single syntax item indicates that you can repeat that
syntax item any number of times. For example:
expression-n…
Punctuation
Parentheses, commas, semicolons, and other symbols not previously described must be typed
as shown. For example:
DAY (datetime-expression)
@script-file
Quotation marks around a symbol such as a bracket or brace indicate the symbol is a required
character that you must type as shown. For example:
"{" module-name [, module-name]... "}"
Item Spacing
Spaces shown between items are required unless one of the items is a punctuation symbol
such as a parenthesis or a comma. For example:
DAY (datetime-expression)
DAY(datetime-expression)
If there is no space between two items, spaces are not permitted. In this example, no spaces
are permitted between the period and any other items:
myfile.sh
Line Spacing
If the syntax of a command is too long to fit on a single line, each continuation line is indented
three spaces and is separated from the preceding line by a blank line. This spacing
distinguishes items in a continuation line from items in a vertical list of selections. For example:
match-value [NOT] LIKE pattern

   [ESCAPE esc-char-expression]
Related Documentation
This manual is part of the HP Neoview customer library.
Neoview Customer Library
The manuals in the Neoview customer library are listed here for your convenience.
• Administration
Neoview Character Sets Administrator's Guide
Information for database administrators and end users of the Neoview
Character Sets product, including rules for defining and managing character
data using SQL language elements, capabilities and limitations of Neoview
client applications, troubleshooting character set-related problems, and enabling
Pass-Through mode in the ISO88591 configuration.
Neoview Database Administrator’s Guide
Information about how to load and manage the Neoview database by using
the Neoview DB Admin and other tools.
Neoview Guide to Stored Procedures in Java
Information about how to use stored procedures that are written in Java within
a Neoview database.
Neoview Query Guide
Information about reviewing query execution plans and investigating query
performance of Neoview databases.
Neoview Transporter User Guide
Information about processes and commands for loading data into your
Neoview platform or extracting data from it.
README files for installing Administration products
— README for the HP Neoview Transporter Java Client
• Management
HP Database Manager (HPDM) Online Help
Help topics that describe how to use the HP Database Manager Client to
connect and manage a Neoview data warehousing platform.
HP Database Manager (HPDM) User Guide
Information about how to connect and manage the HP Database Manager for
database administrators.
Neoview Command Interface (NCI) Guide
Information about using the HP Neoview Command Interface to run SQL
statements interactively or from script files.
Neoview Command Interface (NCI) Online Help
Command-line help that describes the commands supported in the current
operating mode of Neoview Command Interface.
Neoview DB Admin Online Help
Context-sensitive help topics that describe how to use the HP Neoview DB
Admin management interface.
Neoview Management Dashboard Client Guide for Database Administrators
Information on using the Dashboard Client, including how to install the Client,
start and configure the Client Server Gateway (CSG), use the Client windows
and property sheets, interpret entity screen information, and use Command
and Control to manage queries from the Client.
Neoview Management Dashboard Online Help
Context-sensitive help topics that describe how to use the Neoview
Management Dashboard Client.
Neoview Performance Analyzer Online Help
Context-sensitive help topics that describe how to use the Neoview
Performance Analyzer to analyze and troubleshoot query-related issues on
the Neoview data warehousing platform.
Neoview Reports Online Help
Help topics that describe how to use the HP Neoview Reports Tool.
Neoview Repository User Guide
Information about using the Repository, including descriptions of Repository
views and guidelines for writing Neoview SQL queries against the views.
Neoview System Monitor Quick Start
Instructions for starting, using, customizing, and troubleshooting the Neoview
System Monitor.
Neoview Workload Management Services Guide
Information about using Neoview Workload Management Services (WMS) to
manage workload and resources on a Neoview data warehousing platform.
README files for installing Management products
— README for the HP Database Manager (HPDM)
— README for the HP Neoview Management Dashboard Client
— README for HP Neoview Command Interface (NCI)
— README for HP Neoview Reports Client
— README for the HP Neoview Performance Analysis Tools
— README for the HP Neoview System Monitor
• Connectivity
Neoview JDBC Type 4 Driver API Reference
Reference information about the HP Neoview JDBC Type 4 Driver API.
Neoview JDBC Type 4 Driver Programmer’s Reference
Information about using the HP Neoview JDBC Type 4 driver, which provides
Java applications on client workstations access to a Neoview database.
Neoview ODBC Drivers Manual
Information about using HP Neoview ODBC drivers on a client workstation
to access a Neoview database.
ODBC Client Administrator Online Help
Context-sensitive help topics that describe how to use the ODBC Data Source
Administrator.
README files for installing Connectivity products
— README for the HP Neoview JDBC Type 4 Driver
— README for the HP Neoview ODBC Driver for Windows
— README for the HP Neoview ODBC Drivers for UNIX
• Reference
Mapping Tables for Neoview Character Sets
A hyperlinked collection of East Asian characters supported by Neoview
character set functionality.
Neoview SQL Reference Manual
Reference information about the syntax of SQL statements, functions, and
other SQL language elements supported by the Neoview database software.
Neoview Messages Manual
Cause, effect, and recovery information for error messages.
Publishing History
Part Number    Product Version           Publication Date
544818-001     HP Neoview Release 2.3    April 2008
546188-001     HP Neoview Release 2.4    April 2009
HP Encourages Your Comments
HP encourages your comments concerning this document. We are committed to providing
documentation that meets your needs. Send any errors found, suggestions for improvement, or
compliments to docsfeedback@hp.com.
Include the document title, part number, and any comment, error found, or suggestion for
improvement you have concerning this document.
1 Introduction to Neoview Character Sets
The Neoview Character Sets feature allows clients to store data encoded in any supported
character set, including multibyte data, into SQL database objects on the Neoview platform.
Clients include customer applications running on other systems and users accessing Neoview
client applications from client workstations. When configured to do so, translation functions in
the Neoview ODBC and JDBC drivers or in the Neoview Transporter or Neoview Loader convert
the client locale character data into the character set encoding that can be stored in and retrieved
from the Neoview database.
The Neoview platform enforces the use of compatible and mappable character sets between
client locales and the Neoview database by ensuring that:
• Character data sent to the Neoview database from customer applications is successfully
converted to the character encoding required for the Neoview database.
• As needed, character data retrieved from the Neoview database is converted to a character
set that is compatible with the retrieving customer application or other client locale.
Incompatible characters from Neoview client and server components are managed as follows:
• If a user attempts to store incompatible character data in the Neoview database, that data
is rejected and an error is returned to the user.
• If character data retrieved from the Neoview database is returned to a client locale that is
configured with an incompatible character set, the incompatible characters are replaced with
replacement characters (by default, question marks) and a warning message is returned
whenever possible.
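The two rules above can be modeled in a few lines of Python. This is a minimal sketch, not Neoview code: the Python codecs stand in for the translation functions in the drivers, and the function names `store` and `retrieve` are hypothetical. It only illustrates the difference between rejecting incompatible data on the way in and substituting question marks on the way out.

```python
# Illustrative model only. Python codecs stand in for the driver translation
# functions; no Neoview API is called, and all names here are hypothetical.

def store(text, database_encoding):
    """Convert client text for storage, rejecting incompatible characters."""
    # errors="strict" raises UnicodeEncodeError, so nothing is stored.
    return text.encode(database_encoding, errors="strict")


def retrieve(stored, database_encoding, client_encoding):
    """Convert stored bytes for a client locale.

    Returns (client_bytes, warning), where warning is True if any
    character had to be replaced with a question mark.
    """
    text = stored.decode(database_encoding)
    try:
        return text.encode(client_encoding, errors="strict"), False
    except UnicodeEncodeError:
        # Python's "replace" handler substitutes "?" when encoding.
        return text.encode(client_encoding, errors="replace"), True


if __name__ == "__main__":
    try:
        store("日本", "latin-1")
    except UnicodeEncodeError as err:
        print("rejected:", err.reason)
    print(retrieve("日本 OK".encode("utf-8"), "utf-8", "latin-1"))
```

In this model, a store attempt either succeeds completely or fails with an error, while a retrieval always returns data and reports substitution through the warning flag.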
The objective of the Neoview Character Sets feature is to perform transparent and successful
mapping of character data sent back and forth between customer applications and the Neoview
database.
Neoview Character Set Configurations, Column Character Definitions, and Character Sets
Because the terminology overlaps, it is important to clearly understand the differences between
Neoview character set configurations, column character set definitions, and the character sets
that are actually stored in Neoview database character columns.
For this release, customers can choose one of these Neoview character set configurations:
• ISO88591 configuration
• SJIS configuration
• Unicode configuration
For detailed descriptions of these configurations, see “Neoview Character Set Configurations”
(page 16).
In any of these three configurations, character data must be stored in columns defined with an
ISO88591 column character set definition or a UCS2 column character set definition using a
character set that is supported by the Neoview database. ISO88591 columns store character data
in single-byte containers. UCS2 columns store character data in double-byte containers.
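To make the container sizes concrete, the following sketch counts how many bytes each character occupies in several encodings. It is an illustration under assumptions: Python's `latin-1`, `cp932`, `utf-8`, and `utf-16-be` codecs are used as stand-ins for ISO8859-1, SJIS, UTF8, and UCS2, and the helper name `bytes_per_character` is hypothetical.

```python
# Illustrative only: Python codec names are stand-ins for the encodings
# named in this manual (an assumption, not a statement about Neoview).

def bytes_per_character(text, codec):
    """Return the encoded byte length of each character in text."""
    return [len(ch.encode(codec)) for ch in text]


if __name__ == "__main__":
    sample = "A日"
    for codec in ("cp932", "utf-8", "utf-16-be"):
        print(codec, bytes_per_character(sample, codec))
    # ISO8859-1 cannot represent the kanji, so only the letter is shown.
    print("latin-1", bytes_per_character("A", "latin-1"))
```

A multibyte character therefore occupies more than one single-byte container in an ISO88591 column, whereas every character in the sample occupies exactly one double-byte container in the UCS2 stand-in.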
You specify these values when you use an SQL statement to create a table and define its character
columns. To identify the column character set definition for a column, you specify the value
CHARACTER SET ISO88591 or CHARACTER SET UCS2. To define the character set encoding
for a character string literal stored in that column, you specify the literal prefix _ISO88591,
_UCS2, or N.
It is important to distinguish between the column character set definition and the actual character
set encoding of the data stored in the column, because they are not always the same. UCS2
columns do not always store UCS2-encoded data, and ISO88591 columns can store any supported
client locale character set, SJIS characters, or UTF8 characters, depending on the selected Neoview
character set configuration.
Table 1-1 (page 14) identifies the character set encodings that the Neoview database uses to store
characters in ISO88591 and UCS2 columns for the three Neoview character set configurations.
Table 1-1 Character Sets Stored in ISO88591 and UCS2 Columns for the Neoview Character Set Configurations
Character set stored in ISO88591 columns in this configuration:
   ISO88591 configuration: Client locale characters
   SJIS configuration: SJIS characters
   Unicode configuration: UTF8 characters
Character set stored in UCS2 columns in this configuration:
   ISO88591 configuration:
   • All character data sent to UCS2 columns in SQL statement character string literals must be
   encoded and stored as ISO8859-1 characters.
   • All character data for UCS2 columns that is bound as parameters is encoded and stored as
   UCS2 characters.
   SJIS configuration:
   • All character data sent to UCS2 columns in SQL statement character string literals must be
   encoded and stored as SJIS characters.
   • All character data for UCS2 columns that is bound as parameters is encoded and stored as
   UCS2 characters.
   Unicode configuration: UCS2 characters
The first row in Table 1-1 (page 14) identifies the character set encoding stored in ISO88591
columns in the three configurations. In the ISO88591 configuration, any supported client locale
character set can be stored in ISO88591 columns. In the SJIS configuration, SJIS characters are
stored. In the Unicode configuration, UTF8 characters are stored in ISO88591 columns. UTF8
characters are also stored in metadata columns that contain table names, view names, column
names, and the SQL text for constraints.
The second row in Table 1-1 (page 14) shows that, for the ISO88591 and SJIS configurations, the
Neoview database encodes and stores character data for UCS2 columns from SQL statement
character string literals differently than it does character data for UCS2 columns sent as
parameters. The Neoview database does this to ensure the correct representation of the stored
data. For the Unicode configuration, all character data for UCS2 columns is encoded and stored
in UCS2 format.
To further illustrate these restrictions on character set encoding for UCS2 columns in the ISO88591
and SJIS configurations, assume an ISO88591 configuration with a client locale that uses
MS932-encoded SJIS characters. Users can store these client locale SJIS characters in UCS2 columns
only if they send them as parameters. However, if they send the SJIS characters in SQL statement
character string literals, the characters are automatically stored in their ISO8859-1 representation,
not as proper SJIS characters.
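
The difference between the two paths can be reproduced outside the database. The following Python
sketch is an illustration only, not Neoview code. It assumes that Python's cp932 codec is an
acceptable stand-in for the MS932 client locale and that latin-1 stands in for ISO8859-1; the
function names are hypothetical.

```python
def store_literal_as_iso88591(client_bytes: bytes) -> str:
    """Model a string literal: every byte is assumed to be one ISO8859-1 character."""
    return client_bytes.decode("latin-1")


def store_parameter(client_bytes: bytes, client_encoding: str) -> str:
    """Model a bound parameter: the bytes are decoded with the client encoding."""
    return client_bytes.decode(client_encoding)


text = "\u65e5\u672c"  # two Japanese characters (U+65E5, U+672C)
sjis_bytes = text.encode("cp932")  # bytes as an MS932 client would send them

as_literal = store_literal_as_iso88591(sjis_bytes)
as_parameter = store_parameter(sjis_bytes, "cp932")

# Parameter path: the two original characters survive.
print(len(as_parameter), as_parameter == text)
# Literal path: one ISO8859-1 character per byte, so the text is no longer SJIS.
print(len(as_literal), as_literal == text)
```

In this model the parameter path keeps two characters, while the literal path produces four
unrelated single-byte characters from the same four bytes.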
Table 1-2 (page 15) identifies:
• The default column character set definition for each configuration
• The non-default column character set definition for each configuration
• The default character string literal prefix for each configuration
• The non-default character string literal prefixes for each configuration
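Table 1-2 Default Prefixes for Character String Literals

ISO88591 configuration
• Default column character set definition (does not need to be specified): CHARACTER SET ISO88591
• Non-default column character set definition (must be specified): CHARACTER SET UCS2
• Default prefix for character string literals (does not need to be specified): _ISO88591
• Non-default prefixes for character string literals (must be specified): _UCS2, N

SJIS configuration
• Default column character set definition (does not need to be specified): CHARACTER SET ISO88591
• Non-default column character set definition (must be specified): CHARACTER SET UCS2
• Default prefix for character string literals (does not need to be specified): _ISO88591
• Non-default prefixes for character string literals (must be specified): _UCS2, N

Unicode configuration
• Default column character set definition (does not need to be specified): CHARACTER SET UCS2
• Non-default column character set definition (must be specified): CHARACTER SET ISO88591
• Default prefixes for character string literals (do not need to be specified): _UCS2, N
• Non-default prefix for character string literals (must be specified): _ISO88591
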
If you do not explicitly specify a column character set definition for a character column in an
SQL statement, the Neoview database assumes it is the default column character set definition
for that configuration. If you do not intend to use the default value, you must explicitly specify
the column character set definition in the SQL statement.
If you do not explicitly specify the character set prefix for a character string literal, the Neoview
database assumes the literal uses the default encoding for that configuration. If the literal does
not use that default encoding, you must explicitly define the encoding in the prefix. The N prefix
represents NCHAR, which maps by default to UCS2 characters.
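
As a rough illustration of the prefix rule, the following Python sketch builds a literal and adds
a prefix only when the literal is not in the default encoding for the configuration. The defaults
are transcribed from Table 1-2. The helper name sql_literal, the prefix-then-quoted-string layout,
and the doubling of embedded quotes are assumptions made for this example and are not taken from
this guide.

```python
# Default literal encoding per configuration, transcribed from Table 1-2.
DEFAULT_LITERAL_CHARSET = {
    "ISO88591": "ISO88591",
    "SJIS": "ISO88591",
    "Unicode": "UCS2",
}


def sql_literal(configuration: str, charset: str, value: str) -> str:
    """Return a quoted literal, prefixed only when the prefix is mandatory."""
    if charset not in ("ISO88591", "UCS2"):
        raise ValueError("charset must be ISO88591 or UCS2")
    # Assumption: embedded single quotes are escaped by doubling them.
    quoted = "'" + value.replace("'", "''") + "'"
    if charset == DEFAULT_LITERAL_CHARSET[configuration]:
        return quoted  # default encoding: the prefix may be omitted
    return "_" + charset + quoted  # non-default encoding: the prefix is required


print(sql_literal("ISO88591", "UCS2", "abc"))
print(sql_literal("Unicode", "UCS2", "abc"))
print(sql_literal("Unicode", "ISO88591", "abc"))
```

The sketch always emits the _UCS2 form for UCS2 literals; the N prefix described above is the
alternative.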
Client Locale Character Encoding and Neoview Database Character
Encoding
To understand how the Neoview Character Sets feature works, it is important to distinguish
between client locale character encoding and Neoview database character encoding. Client locale
character encoding refers to the character set that is currently active on a client, such as a
customer application or client workstation. Table 1-3 (page 17) identifies
the client locale character sets that are supported by the ISO88591, SJIS, and Unicode
configurations. The SQL identifiers and string literals in SQL statements are encoded in the client
locale character set of the client.
Neoview database character encoding refers to the character sets that can be used to store
character data in table columns for a given Neoview character set configuration. Database
character encoding may be different from the client locale character encodings used by customer
applications and client workstations. The rules that govern the use of character data in SQL
language components are described in Chapter 3 (page 23).
For example, assume a Neoview platform using the Unicode configuration has an ODBC
connection to a client workstation that is currently configured for the GBK character set. The
SQL statements issued from this client workstation are encoded in GBK, then converted to the
Neoview database encoding by the Neoview ODBC driver and/or the SQL engine and stored in
user tables. When the stored user data is retrieved by the GBK-configured client workstation or
by a customer application using a different character set, the user data is converted to the client
locale character encoding of the target client.
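
The conversion chain in this example can be imitated with ordinary codec calls. The Python sketch
below is a model only: it assumes Python's gbk, utf-16-be, and utf-8 codecs as stand-ins for the
client encoding and the two database encodings, and the big-endian byte order is chosen arbitrarily
for the illustration. It does not show how the ODBC driver or SQL engine performs the conversion.

```python
client_encoding = "gbk"
text = "\u4e2d\u6587"  # two Chinese characters (U+4E2D, U+6587)
wire_bytes = text.encode(client_encoding)  # what a GBK-configured client sends

# Storing: convert from the client encoding to the database encodings.
decoded = wire_bytes.decode(client_encoding)
ucs2_storage = decoded.encode("utf-16-be")  # model of a UCS2 column
utf8_storage = decoded.encode("utf-8")  # model of an ISO88591 column holding UTF8 data

# Retrieving: convert the stored form to the encoding of the requesting client.
back_to_gbk = ucs2_storage.decode("utf-16-be").encode("gbk")
to_utf8_client = ucs2_storage.decode("utf-16-be").encode("utf-8")

print(len(wire_bytes), len(ucs2_storage), len(utf8_storage))
print(back_to_gbk == wire_bytes)
```

The same two characters occupy a different number of bytes in each encoding, yet the round trip
back to the GBK client returns the original bytes.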
Compatible Client Locale Characters and Multiple Client Locale Characters
Two other important terms are compatible client locale characters and multiple client locale
characters. Compatible client locale characters refers to use of compatible characters among all
the client locales. Every character used by any one client locale must map to the same code point
value on the Neoview database that it does for every other client locale.
Multiple client locale characters refers to the ability to store and retrieve characters from different
languages on the same Neoview platform. For example, when the Unicode configuration is used,
the Neoview database can store data from different client locale character sets. Every client locale
character is mapped to a plane within the Unicode encoding and equivalent characters are
mapped to the same Unicode code point. Java-based Neoview client applications such as Neoview
DB Admin, Neoview Command Interface, and JDBC applications can display data from multiple
client locale characters. Neoview ODBC drivers can display only data that matches the client
locale characters; they replace all other characters (by default, with question marks).
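
The question-mark substitution can be pictured with a codec that replaces characters it cannot
represent. This Python sketch is a model under stated assumptions, not the driver's actual code
path: render_for_client is a hypothetical helper, and latin-1 stands in for a single-byte client
locale.

```python
def render_for_client(stored_text: str, client_encoding: str) -> str:
    """Return the text as a client limited to one encoding would display it."""
    # errors="replace" substitutes "?" for every character the codec cannot encode.
    shown = stored_text.encode(client_encoding, errors="replace")
    return shown.decode(client_encoding)


mixed = "abc \u65e5\u672c \u4e2d\u6587"  # ASCII plus Japanese and Chinese characters
print(render_for_client(mixed, "latin-1"))  # non-Latin characters become "?"
print(render_for_client(mixed, "utf-8") == mixed)  # a Unicode-capable client shows everything
```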
An example of a multiple client locale environment is one where every client workstation, no
matter its local character set encoding, can access Neoview DB Admin or a customer query
application and display SQL table and column names in every supported client locale character
encoding. On the other hand, if every client workstation only displays table and column names
that are encoded in its own current encoding and shows question marks for all other character
set data, multiple client locales are not supported.
To display different languages from multiple client locales, all those languages must be installed
on every client locale.
Neoview Character Set Configurations
For this Neoview release, one of these three Neoview character set configurations must be selected
and configured on each Neoview platform:
• ISO88591 configuration
• SJIS configuration
• Unicode configuration
Your HP support provider can work with you to identify the correct configuration for your
database environment. For information about how to select the correct configuration for your
Neoview platform, see Table 2-1 (page 21).
Table 1-3 (page 17) describes the key features, behaviors, and limitations of the three Neoview
character set configurations for this Neoview release.
Table 1-3 Features, Behaviors, and Limitations of the Neoview Character Set Configurations

ISO88591 configuration

Features and behaviors:
The ISO88591 configuration replicates the Neoview character set environment for Release 2.2. It
allows users to store data encoded in any character set (including ISO8859-1 through ISO8859-15
and East Asian multibyte character sets) in ISO88591 columns.
• The Neoview database stores and retrieves all client locale character-encoded table names,
  column names, and character literals as if they were encoded in 8-bit ISO8859-1 characters.
• The default column character set definition is ISO88591.
• Uses binary collation.
• All error messages are sent in the client locale character encoding.
• The Neoview database assumes that all EMS event messages are in UTF8 format.
• Neoview platforms with the ISO88591 configuration support the use of Release 2.4, Release 2.3,
  and Release 2.2 Neoview ODBC and JDBC drivers. For more information, see “Compatibility Between
  Neoview ODBC and JDBC Drivers and Neoview Platforms” (page 19).

Limitations:
For this release, the ISO88591 configuration has these limitations and restrictions:
• Character string literals in an SQL statement are assumed to be in the ISO8859-1 encoding. An
  invalid translation might occur when users attempt to store a character encoding other than
  ISO8859-1, such as the client locale, in a UCS2 column.
• To ensure the correct representation of all character data sent to UCS2 columns:
  • All character data sent to UCS2 columns from character string literals must be encoded as
    ISO8859-1 characters.
  • All character data for UCS2 columns that are bound as parameters are encoded and stored as
    UCS2 characters.
• If they want to ensure compatible client locales, users are required to use only 7-bit ASCII or
  ISO8859-1 characters for all SQL identifiers and user character data. For more information about
  compatible client locales, see the troubleshooting information in Table 5-1 (page 39).
• SQL EMS messages are displayed correctly only if they are encoded in ISO8859-1. NDCS EMS
  messages are displayed correctly only if they are encoded in 7-bit ASCII.
• SQL string functions assume all characters in columns defined as ISO88591 columns are
  single-byte characters.
• Neoview DB Admin displays only table and column names that are encoded in ISO8859-1.
• Neoview Management Dashboard displays only ASCII-formatted characters.

SJIS configuration

Features and behaviors:
• Table names, column names, and character literals are stored in ISO88591 columns as SJIS
  characters using the Microsoft codepage 932 (MS932).
  NOTE: Client locales must use MS932-compliant encoding of SJIS characters to prevent possible
  character translation errors. Client locales that use other encoding (for example, EUC-JP or
  standard SJIS) might generate unrecognizable characters (question marks) for characters that do
  not map to MS932.
• Neoview platforms with the SJIS configuration require Release 2.4 or Release 2.3 Neoview ODBC
  and JDBC drivers. If you connect a Release 2.2 driver to a Release 2.4 Neoview platform with the
  SJIS configuration, the connection fails and a connection error is generated. For more
  information, see “Compatibility Between Neoview ODBC and JDBC Drivers and Neoview Platforms”
  (page 19).
• Compatible data from EUC-JP or UTF8 character sets is translated to SJIS.
• The default column character set definition is ISO88591.
• The size of an ISO88591 column indicates the number of bytes in the column, whether or not the
  column contains SJIS characters.
• Uses binary collation, which for SJIS characters is also JIS collation.
• All EMS messages are sent in UTF8 format.

Limitations:
For this release, the SJIS configuration has these limitations and restrictions:
• Character string literals in an SQL statement or character data bound as parameters are assumed
  to be in the MS932 (SJIS) encoding. Attempts to store any other character encoding in ISO88591
  or UCS2 columns are rejected.

Unicode configuration

Features and behaviors:
• Supports the character sets EUC-JP, KS-Code, BIG5, GB2312, GB18030, GBK, UTF8, and UTF16 from
  client locales. Multibyte client locale character encoding is converted to UTF16 encoding when
  it is stored in UCS2 columns in the Neoview database.
• Neoview platforms with the Unicode configuration require Release 2.4 or Release 2.3 Neoview ODBC
  and JDBC drivers. If you connect a Release 2.2 driver to a Release 2.4 Neoview platform with the
  Unicode configuration, the connection fails and a connection error is generated. For more
  information, see “Compatibility Between Neoview ODBC and JDBC Drivers and Neoview Platforms”
  (page 19).
• The default column character set definition is UCS2.
• User data is encoded in UTF16 for UCS2 columns and UTF8 for ISO88591 columns.
• SQL string functions operate on Unicode characters in UCS2 columns. Supplementary-plane
  characters from GB18030, which use surrogate pairs, are not supported by the SQL string
  functions. SQL string functions that operate on surrogate pair characters assume they consist of
  two UCS2-encoded characters that comprise four bytes.
• The size of the character column indicates the number of 16-bit Unicode characters in a UCS2
  column.
• Table and column names provided in SQL statements are stored in ISO88591 columns as UTF8
  characters.
• Uses binary collation.
• All EMS messages are sent in UTF8 format.

Limitations:
For this release, the Unicode configuration has these limitations and restrictions:
• To ensure that ODBC client applications can query tables across multiple client locales, use
  7-bit ASCII characters in table and column names.
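
The surrogate-pair behavior noted for the Unicode configuration follows from how UTF16 encodes
supplementary-plane characters. The Python sketch below counts 16-bit code units; the helper name
utf16_code_units is hypothetical, and the utf-16-be codec is used only as a convenient model of a
UCS2 column.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to hold the text in UTF16."""
    return len(text.encode("utf-16-be")) // 2


bmp_char = "\u4e2d"  # a character in the Basic Multilingual Plane
supplementary_char = "\U00020000"  # a supplementary-plane character (CJK Extension B)

print(utf16_code_units(bmp_char))  # one code unit
print(utf16_code_units(supplementary_char))  # two code units (a surrogate pair, four bytes)
```

Because a column size in a UCS2 column counts 16-bit units, a single supplementary-plane character
consumes two of them.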
Compatibility Between Neoview ODBC and JDBC Drivers and Neoview
Platforms
Table 1-4 summarizes the compatibility between Release 2.4, 2.3, and 2.2 ODBC and JDBC drivers
and Release 2.4, 2.3, and 2.2 Neoview platforms.
Table 1-4 Driver and Neoview Platform Compatibility

Driver Release   Release 2.4                   Release 2.3                   Release 2.2
Version          Neoview Platform              Neoview Platform              Neoview Platform
2.4              Yes                           Yes                           Yes
2.3              Yes                           Yes                           Yes
2.2              ISO88591 configuration only   ISO88591 configuration only   Yes

As Table 1-4 shows, Neoview ODBC and JDBC drivers that are connected to a network containing
Release 2.4, 2.3, and 2.2 Neoview platforms operate according to these rules and restrictions:
• Release 2.4 and 2.3 Neoview ODBC and JDBC drivers are backward compatible with Release
  2.2 Neoview platforms.
• Release 2.2 Neoview ODBC and JDBC drivers can be connected only to a Release 2.4 or 2.3
  Neoview platform that uses the ISO88591 configuration. If you connect Release 2.2 drivers
  to a Release 2.4 or 2.3 Neoview platform that uses the SJIS or Unicode configuration, the
  connection fails and a connection error is generated. For information about troubleshooting
  this connection error, see Chapter 5 (page 39).
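
The compatibility rules above reduce to a single exception, which the following Python sketch
encodes. The function can_connect is a hypothetical planning aid written for this illustration; it
is not part of any Neoview driver or tool.

```python
RELEASES = ("2.2", "2.3", "2.4")
CONFIGURATIONS = ("ISO88591", "SJIS", "Unicode")


def can_connect(driver_release: str, platform_release: str,
                configuration: str = "ISO88591") -> bool:
    """Return True if Table 1-4 lists the driver/platform combination as supported."""
    if driver_release not in RELEASES or platform_release not in RELEASES:
        raise ValueError("release must be one of " + ", ".join(RELEASES))
    if configuration not in CONFIGURATIONS:
        raise ValueError("unknown character set configuration")
    # Only restriction: a Release 2.2 driver on a Release 2.3 or 2.4 platform
    # works only when the platform uses the ISO88591 configuration.
    if driver_release == "2.2" and platform_release in ("2.3", "2.4"):
        return configuration == "ISO88591"
    return True


print(can_connect("2.2", "2.4", "SJIS"))
print(can_connect("2.2", "2.4", "ISO88591"))
print(can_connect("2.4", "2.2"))
```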
NOTE: For information about how to check the version compatibility of Neoview ODBC and
JDBC drivers and install them, see these Readme files:
• README for the HP Neoview ODBC Driver for Windows
• README for the HP Neoview ODBC Drivers for UNIX
• README for the HP Neoview JDBC Type 4 Driver
2 Selecting a Neoview Character Set Configuration
This chapter provides this information:
• “Criteria for Selecting a Neoview Character Set Configuration” (page 21)
• “Process for Implementing a Neoview Character Set Configuration” (page 22)
• “Rules for Migrating to Neoview Release 2.4” (page 22)
Criteria for Selecting a Neoview Character Set Configuration
Table 2-1 identifies the criteria you should use to identify the correct Neoview Character Set
configuration for your Neoview platform. For professional assistance, contact your HP support
provider.
Table 2-1 Criteria for Selecting the Correct Neoview Character Set Configuration

Select the ISO88591 configuration if your Neoview platform environment meets any one of these
conditions or sets of conditions:
• Any Neoview Release 2.3 customer with the ISO88591 configuration who is migrating to Release
  2.4.
• New customer wants to be able to use any client locale encoding, including multibyte character
  sets, and store the data in that encoding in ISO88591 columns without any character mapping or
  translation.
Additional considerations: For a summary of the features and limitations of the ISO88591
configuration for this release, see Table 1-3 (page 17).

Select the SJIS configuration if your Neoview platform environment meets any one of these
conditions or sets of conditions:
• Any Neoview Release 2.3 customer with the SJIS configuration who is migrating to Release 2.4.
• Customer wants to provide SJIS characters in SQL identifiers.
• A new Neoview customer who already uses the SJIS character set on Windows PCs with Microsoft
  codepage 932 (MS932), requires JIS collations, and uses SJIS characters in table and column
  names.
NOTE: If character data that is not fully compatible with SJIS MS932 encoding is present in a
Release 2.3 database, that data might not translate successfully in a migration to the SJIS
configuration in the Release 2.4 environment. For more information, contact your HP support
provider.
Additional considerations: For a summary of the features and limitations of the SJIS
configuration for this release, see Table 1-3 (page 17).

Select the Unicode configuration if your Neoview platform environment meets any one of these
conditions or sets of conditions:
• Any Neoview Release 2.3 customer with the Unicode configuration who is migrating to Release
  2.4.
• Wants to use multiple client locales.
• Uses the SJIS character set but does not require JIS collation on their data and will store the
  SJIS data in UCS2 columns.
Additional considerations: Other factors for choosing the Unicode configuration are:
• Can use any of the character sets identified in Table 1-3 (page 17).
For a summary of the features and limitations of the Unicode configuration for this release, see
Table 1-3 (page 17).
Process for Implementing a Neoview Character Set Configuration
If you are a new customer, follow this process to select and implement the correct Neoview
character set configuration:
1. When you order your new Neoview platform, you receive a Neoview Order processing
   form in the Customer Process Systems Architecture (CPSA) that describes the selection
   criteria for the supported Neoview character set configurations.
2. Complete the form, including your configuration choice, and return it in the CPSA.
3. Your configuration choice is communicated, along with other Neoview platform configuration
   information, to HP manufacturing.
4. HP manufacturing builds and configures your Neoview platform as ordered, including
   setting the Neoview character set configuration during N02.04 RVU software installation.
Rules for Migrating to Neoview Release 2.4
If you are an existing Neoview customer who is migrating to Release 2.4, follow these rules:
• If you want to migrate to Neoview Release 2.4 from a release earlier than Neoview Release
  2.3, you must first migrate to Release 2.3. For more information, contact your HP support
  provider.
• If you want to migrate to Neoview Release 2.4 from Neoview Release 2.3, you must retain
  the same Neoview character set configuration (ISO88591, SJIS, or Unicode). You cannot
  change to a different configuration for Release 2.4.
NOTE: HP support personnel are fully responsible for migrating a Neoview platform to the
Release 2.4 environment.
3 Using SQL Language Elements to Define and Manage
Database Encoding
This chapter includes:
• “Rules for Encoding SQL Language Elements” (page 23)
• “Behavior of SQL Functions” (page 26)
• “Behavior of SQL String Functions” (page 27)
• “Guidelines for the LIKE Predicate in the SJIS and Unicode Configurations” (page 31)
• “Locating Invalid Characters in Syntax Error Messages” (page 33)
Rules for Encoding SQL Language Elements
Table 3-1 describes the rules that govern the use of character set data in SQL language elements
for each of the three Neoview character set configurations.
NOTE: Failure to observe the rules described in Table 3-1 can cause SQL queries to fail and
return error messages.
Table 3-1 Summary of SQL Language Rules by Neoview Character Set Configuration

DDL and DML Statements

Rule: Explicitly define the column whenever a column is not the default column for the
configuration.
• ISO88591 configuration: The default column is ISO88591, so you must explicitly specify
  CHARACTER SET UCS2 in the character set value for any UCS2 column you create.
• SJIS configuration: The default column is ISO88591, so you must explicitly specify
  CHARACTER SET UCS2 in the character set value for any UCS2 column you create.
• Unicode configuration: The default column is UCS2, so you must explicitly specify
  CHARACTER SET ISO88591 in the character set value for any ISO88591 column you create.

Rule: Specify the correct size for the character data stored in a column.
• ISO88591 configuration: Character data size is in bytes.
• SJIS configuration: When you use SJIS characters in an ISO88591 column, you should always
  double the number of bytes you would normally assign to the character data type. For example,
  use CHAR(2) instead of CHAR(1) and use CHAR(10) instead of CHAR(5).
• Unicode configuration:
  • Whenever you explicitly provide ISO88591 columns (which contain UTF8 data) in the Unicode
    configuration, make sure to specify a size (in bytes) for the column that will be sufficient
    to hold your data. For UTF8 encoding, ASCII characters require only one byte, but other
    character sets might require two, three, or even four bytes.
  • Whenever you use surrogate Unicode characters in a UCS2 column, you must specify a minimum
    of two characters in the data type (for example, CHAR(2)). You cannot use CHAR(1) for Unicode
    surrogate characters.

Rule: Explicitly specify a valid character set prefix value (_character-set) for every string
literal in a column that is not in the default character set for the configuration.
• ISO88591 configuration: Use these prefixes:
  • In ISO8859-1 string literals, you can but are not required to specify _ISO88591 for
    ISO8859-1 characters.
  • In UCS2 string literals, you must specify _UCS2 for the UCS2 character set or N for NCHAR,
    which maps by default to UCS2 characters.
• SJIS configuration: Use these prefixes:
  • In ISO8859-1 string literals, you can but are not required to specify _ISO88591 for SJIS
    characters.
  • In UCS2 string literals, you must specify _UCS2 for UCS2 characters or N for NCHAR, which
    maps by default to UCS2 characters.
• Unicode configuration: Use these prefixes:
  • In UCS2 string literals, you can but are not required to specify _UCS2 for UCS2 characters
    or N for NCHAR, which maps by default to UCS2 characters.
  • In ISO8859-1 string literals, you must specify _ISO88591, which maps to UTF8 characters.

SQL Functions and SQL String Functions

Rule: The behavior of SQL functions and SQL string functions is sometimes determined by the
selected Neoview character set configuration. For information about SQL functions, see Table 3-2
(page 26). For more information about SQL string functions, see Table 3-3 (page 27) and Table 3-4
(page 28).
• ISO88591 configuration:
  • For ISO8859-1 character data, string functions work on single-byte character boundaries. For
    UCS2 data, string functions work on 2-byte character boundaries.
  • For ISO8859-1 character data, substring functions return single-byte characters. For UCS2
    data, substring functions return 2-byte characters.
• SJIS configuration:
  • String functions that contain multibyte SJIS characters can be stored in SQL table columns.
  • Multibyte SJIS characters are treated as one character in string functions.
• Unicode configuration:
  • String functions that contain multibyte UTF8 characters can be stored in SQL table columns.
  • Multibyte UTF8 characters are treated as one character in string functions.

Binary Collations

Rule: The type of encoding used in binary collations is determined by the selected configuration.
• ISO88591 configuration: For ISO88591 columns, binary collations are based on ISO8859-1
  encoding. For UCS2 columns, binary collations are based on UCS2 encoding.
• SJIS configuration: For ISO88591 columns, binary collations are based on SJIS encoding. For
  UCS2 columns, binary collations are based on UCS2 encoding.
• Unicode configuration: For UCS2 columns, binary collations are based on UCS2 encoding. For
  ISO88591 columns, binary collations are based on UTF8 encoding.

User-Defined Characters

Rule: Provide user-defined characters only as permitted by the selected configuration.
• ISO88591 configuration: Binary data can be stored in ISO88591 columns. Such data can include
  user-defined characters.
• SJIS configuration: User-defined SJIS characters can be stored in ISO88591 columns.
• Unicode configuration: User-defined GB2312 and GBK characters can be stored either in ISO88591
  columns (encoded in UTF8) or in UCS2 columns (encoded in UCS2).

SQL Identifiers

Rule: Size SQL identifiers as dictated by the selected configuration. SQL identifiers are limited
to 128 byte lengths in all three Neoview character set configurations, but the number of
characters enclosed in the SQL identifier can range from 32 to 128.
• ISO88591 configuration:
  • SQL identifiers can be up to 128 characters (bytes) in length.
  • Regular identifiers are used.
• SJIS configuration:
  • SQL identifiers can be a maximum of 64 to 128 characters in length, depending on the SJIS
    characters stored.
  • Delimited identifiers are required.
• Unicode configuration:
  • SQL identifiers are stored in UTF8 format, where Unicode characters are stored in one, two,
    three, or four bytes. SQL identifiers can be a maximum of 32 to 128 characters in length,
    depending on the size of the UTF8 characters stored. For example, an SQL identifier that uses
    a series of 4-byte characters would be limited to 32 characters.
  • Delimited identifiers are required.

Rule: Comply with the 110-byte length limit on table names that are used by materialized views
and triggers.
• All configurations: Table names that are used by materialized views and triggers are limited to
  110 bytes in all three configurations. If you create a materialized view or trigger on a table,
  you must make sure the table name does not exceed this limit. If it does, you must shorten the
  table name to an acceptable size.

Rule: When necessary, use 7-bit ASCII encoding for SQL identifiers and SQL data to enforce
support for multiple client locales.
• ISO88591 configuration: Not applicable
• SJIS configuration: Not applicable
• Unicode configuration: If you are using a Neoview ODBC driver, each client workstation can only
  display SQL identifiers and SQL data encoded in the workstation's currently-configured character
  set. So use only 7-bit ASCII characters in SQL identifiers and SQL data if they will be
  displayed on workstations with different client locales. For more information, see Appendix B
  (page 47).

Querying SQL Metadata

Rule: Applications that access SQL metadata must explicitly specify prefix literals for metadata.
• ISO88591 configuration: Not applicable because ISO88591 is the default column.
• SJIS configuration: Not applicable because ISO88591 is the default column.
• Unicode configuration: Because UCS2 is the default column and metadata must be stored in
  ISO88591 columns, you must explicitly specify the _ISO88591 prefix for every literal value in
  the SQL statement.

Stored Procedures in Java

Rule: If you use a character string literal in a CALL statement or as the translated value for an
SQL statement inside a stored procedure, you must explicitly define the prefix for every character
string in a table column that is not in the default character set for the selected Neoview
character set configuration.
• ISO88591 configuration: All literals for UCS2 columns must be explicitly defined with the
  prefix _UCS2.
• SJIS configuration: All literals for UCS2 columns must be explicitly defined with the prefix
  _UCS2.
• Unicode configuration: All literals for ISO88591 columns must be explicitly defined with the
  prefix _ISO88591.

EMS Event Messages

Rule: EMS event messages from the Neoview platform are normally sent in UTF8 encoding.
• ISO88591 configuration: EMS event messages use client locale character encoding and might not
  be readable from Neoview DB Admin.
• SJIS configuration: Uses UTF8 encoding.
• Unicode configuration: Uses UTF8 encoding.
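
The sizing rules in Table 3-1 for ISO88591 columns and SQL identifiers come down to counting bytes
in the storage encoding. The Python sketch below is a planning aid under stated assumptions: the
cp932 and utf-8 codecs stand in for the SJIS and Unicode configuration storage encodings, and the
function names are hypothetical. The ISO88591 configuration is omitted because it stores client
bytes unchanged.

```python
# Assumed stand-ins for the encoding used in ISO88591 columns and identifiers.
STORAGE_CODEC = {"SJIS": "cp932", "Unicode": "utf-8"}
IDENTIFIER_LIMIT_BYTES = 128  # identifier limit stated in Table 3-1


def iso88591_column_bytes(text: str, configuration: str) -> int:
    """Bytes needed to hold the text in an ISO88591 column."""
    return len(text.encode(STORAGE_CODEC[configuration]))


def fits_identifier_limit(name: str, configuration: str) -> bool:
    """True if the identifier stays within the 128-byte limit."""
    return iso88591_column_bytes(name, configuration) <= IDENTIFIER_LIMIT_BYTES


sample = "\u65e5\u672c"  # two Japanese characters
print(iso88591_column_bytes(sample, "SJIS"))  # two bytes per character
print(iso88591_column_bytes(sample, "Unicode"))  # three bytes per character in UTF8
print(fits_identifier_limit("\U00020000" * 32, "Unicode"))  # 32 four-byte characters fit
print(fits_identifier_limit("\U00020000" * 33, "Unicode"))  # the 33rd does not
```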
Behavior of SQL Functions
Table 3-2 (page 26) describes the behavior of the SQL functions that operate on character data
for each of the three Neoview character set configurations.
Table 3-2 Behavior of SQL Functions

CONVERTTOHEX
• ISO88591 configuration: Returns ASCII characters in ISO8859-1 encoding
• SJIS configuration: Returns ASCII characters in ISO8859-1 encoding
• Unicode configuration: Returns ASCII characters in UCS2 encoding

CURRENT_ROLE
• ISO88591 configuration: Returns user role name in ISO8859-1 encoding
• SJIS configuration: Returns user role name in ISO8859-1 encoding
• Unicode configuration: Returns user role name in UCS2 encoding

CURRENT_USER
• ISO88591 configuration: Returns user role name in ISO8859-1 encoding
• SJIS configuration: Returns user role name in ISO8859-1 encoding
• Unicode configuration: Returns user role name in UCS2 encoding

DATEFORMAT
• ISO88591 configuration: Returns the datetime value in ISO8859-1 encoding
• SJIS configuration: Returns the datetime value in ISO8859-1 encoding
• Unicode configuration: Returns the datetime value in UCS2 encoding

DAYNAME
• ISO88591 configuration: Returns the name of the day from a date or timestamp expression in
  ISO8859-1 encoding
• SJIS configuration: Returns the name of the day from a date or timestamp expression in
  ISO8859-1 encoding
• Unicode configuration: Returns the name of the day from a date or timestamp expression in UCS2
  encoding

MONTHNAME
• ISO88591 configuration: Returns the name of the month from a date or timestamp expression in
  ISO8859-1 encoding
• SJIS configuration: Returns the name of the month from a date or timestamp expression in
  ISO8859-1 encoding
• Unicode configuration: Returns the name of the month from a date or timestamp expression in
  UCS2 encoding

USER
• ISO88591 configuration: Returns the current user role name or the role assigned to a number in
  ISO8859-1 encoding
• SJIS configuration: Returns the current user role name or the role assigned to a number in
  ISO8859-1 encoding
• Unicode configuration: Returns the current user role name or the role assigned to a number in
  UCS2 encoding
Behavior of SQL String Functions
String functions behave differently in the Neoview Release 2.3 and Neoview Release 2.4
environments. Table 3-3 describes these differences.
Table 3-3 String Function Behaviors for Neoview Release 2.3 and Neoview Release 2.4

Release 2.3
• Limitations: String functions might not work properly when input strings containing multibyte
  characters are stored in ISO88591 columns on a Neoview platform using the SJIS or Unicode
  configuration.
• Storage length vs. character boundaries: Position and length used by string functions determine
  the storage length, not character boundaries. For example, single-byte characters use one
  character for string functions, while 2-byte characters require two characters.

Release 2.4
• Limitations: String functions that contain valid multibyte characters can be stored in SQL
  table columns when using SJIS characters in the SJIS configuration or UTF8 characters in the
  Unicode configuration without causing data corruption. If an invalid character is used, an error
  is returned.
• Storage length vs. character boundaries: Character boundaries determine character units. Each
  single-byte and multibyte character is treated as one character for string functions, no matter
  the byte length of the character.
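
The difference between the two releases can be modeled as byte-based versus character-based
positions. The Python sketch below is an analogy only and does not reproduce the Neoview SQL
implementation: both helper functions are hypothetical, utf-8 stands in for the data in an
ISO88591 column under the Unicode configuration, and Python counts code points, which matches
characters for the sample data used here.

```python
def substring_by_bytes(stored: bytes, start: int, length: int) -> bytes:
    """Release 2.3 style: start and length count storage bytes (start is 1-based)."""
    return stored[start - 1:start - 1 + length]


def substring_by_chars(stored: bytes, encoding: str, start: int, length: int) -> str:
    """Release 2.4 style: start and length count whole characters (start is 1-based)."""
    text = stored.decode(encoding)
    return text[start - 1:start - 1 + length]


stored = "a\u65e5\u672c".encode("utf-8")  # 1 + 3 + 3 bytes for three characters

print(len(stored), len(stored.decode("utf-8")))  # storage length vs. character count
print(substring_by_bytes(stored, 2, 1))  # a single byte: only part of a character
print(substring_by_chars(stored, "utf-8", 2, 1) == "\u65e5")  # the whole second character
```

With byte-based positions, a caller must ask for three bytes to obtain the second character
intact; with character-based positions, a length of one is enough.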
Table 3-4 describes the SQL string functions and, where applicable, unique considerations for
the ISO88591, SJIS, and Unicode configurations. SQL string function values can be provided to
table columns in the form of a literal or a parameter.
Table 3-4 Behaviors of SQL String Functions in the Three Configurations
ISO88591
Configuration
Considerations
SJIS Configuration
Considerations
Unicode Configuration
Considerations
If the value of the first
byte in the string is
greater than 127,
Neoview SQL returns
error 8428 (“The
argument to function
ASCII is not valid”).
If the value of the first
byte in the string is
greater than 127,
Neoview SQL returns
error 8428 (“The
argument to function
ASCII is not valid”).
CHAR: Returns the character that has the • The
• Valid
specified code-value. The value of
char-set-name
char-set-name
value can be
char-set-name can be ISO88591, UCS2,
values are SJIS,
ISO88591 or UCS2.
SJIS (for the SJIS configuration only), and
ISO88591, and
UTF8 (for the Unicode configuration only). • If no value is
UCS2. Users who
want
specified for
char-set-name
char-set-name,
to be SJIS or UCS2
the CHAR function
must explicitly
assumes it is
specify it.
ISO88591. Users
who want
• The default
char-set-name to
char-set-name
be UCS2 must
value is
explicitly specify it.
ISO88591. If no
value is specified,
the CHAR
function assumes
it is an ISO88591
column containing
SJIS data.
• If the
char-set-name
value is SJIS or
ISO88591 and
code-value can
be mapped to a
valid SJIS
character, the
CHAR function
returns a SJIS
character.
• For a SJIS
character, the
return type is
VARCHAR(2).
• Valid
char-set-name
values are UTF8,
ISO88591, and
UCS2. Users who
want
char-set-name to
be ISO88591 or
UTF8 must
explicitly specify it.
• The default
char-set-name
value is UCS2. If no
value is specified,
the CHAR function
assumes it is UCS2.
• If char-set-name
is UTF8 or
ISO88591 and
code-value can be
mapped to a valid
UTF8 character, the
CHAR function
returns a UTF8
character.
• For a UTF8
character, the return
type is
VARCHAR(4).
CHAR_LENGTH: Returns the number of
characters in a string.
Every character,
including multibyte
characters, counts as
one character.
Every character,
including multibyte
characters, counts as
one character.
The character string
argument in the
CODE_VALUE
function can be in one
of the character sets
supported by this
configuration.
The character string
argument in the
CODE_VALUE
function can be in one
of the character sets
supported by this
configuration.
SQL String Function and Description
ASCII: Returns the code point value for the No special
first byte in a character string.
considerations
Every character,
including multibyte
characters, counts as
one character.
CODE_VALUE: Returns an unsigned
The character string
integer for the code point value of the first argument in the
character in a character string.
CODE_VALUE
function can be in one
of the character sets
supported by this
configuration.

CONCAT: Returns the concatenation of two character value expressions as a character string value.
ISO88591 configuration: Both character value expressions must be either ISO8859-1 character expressions or UCS2 character expressions.
SJIS configuration: Both character value expressions must be either ISO8859-1 character expressions or UCS2 character expressions.
Unicode configuration: Both character value expressions must be either ISO8859-1 character expressions or UCS2 character expressions.

INSERT: Returns a character string where a specified number of characters within the character string has been deleted, beginning at the specified start position, and where another character string has been inserted at the start position.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

LCASE: Downshifts alphabetic characters. For non-alphabetic characters, LCASE returns the same character.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

LEFT: Returns the leftmost specified number of characters from a character expression.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

LOCATE: Searches for given substring in a character string. If the substring is found, Neoview SQL returns the character position of the substring within the string.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

LOWER: Same as LCASE.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

LPAD: Pads the left side of a string with the specified string.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

LTRIM: Removes leading spaces from a character string.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

OCTET_LENGTH: Returns the length of a character string in bytes.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

POSITION: Same as LOCATE.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

REPEAT: Returns a character string composed of the evaluation of a character expression repeated a specified number of times.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

REPLACE: Returns a character string where all occurrences of a specified character string in the original string are replaced with another character string.
ISO88591 configuration: All three character value expressions must be comparable types and must use the same character set.
SJIS configuration: All three character value expressions must be comparable types and must use the same character set.
Unicode configuration: All three character value expressions must be comparable types and must use the same character set.

RIGHT: Returns the rightmost specified number of characters from a character expression.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

RPAD: Pads the right side of a string with the specified string.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

RTRIM: Removes trailing spaces from a character string.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

SPACE: Returns a character string consisting of a specified number of spaces.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character. The second argument in the function, the char-set-name value, can be ISO88591 or UCS2; SJIS and UTF8 are not supported. If you do not specify the char-set-name value, the default is ISO88591.
SJIS configuration: Every character, including multibyte characters, is treated as one character. The second argument in the function, the char-set-name value, can be ISO88591 or UCS2; SJIS and UTF8 are not supported. If you do not specify the char-set-name value, the default is ISO88591 (containing SJIS data).
Unicode configuration: Every character, including multibyte characters, is treated as one character. The second argument in the function, the char-set-name value, can be ISO88591 or UCS2; SJIS and UTF8 are not supported. If you do not specify the char-set-name value, the default is UCS2.

SUBSTRING: Extracts a substring out of a given character expression.
ISO88591 configuration: When specifying the length of the substring, keep in mind that every character, including multibyte characters, is treated as one character.
SJIS configuration: When specifying the length of the substring, keep in mind that every character, including multibyte characters, is treated as one character.
Unicode configuration: When specifying the length of the substring, keep in mind that every character, including multibyte characters, is treated as one character.

SUBSTR: Same as SUBSTRING.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

TRANSLATE: Translates a character string from a source character set to a target character set. These six translation-name options can be used:
• ISO88591TOUCS2
• SJISTOUCS2
• UTF8TOUCS2
• UCS2TOISO88591
• UCS2TOSJIS
• UCS2TOUTF8
The TRANSLATE function changes both the character string data type and the character set encoding of the string.
ISO88591 configuration:
• The ISO88591TOUCS2 option can be used to translate ISO8859-1 characters to UCS2 characters.
• The UCS2TOISO88591 option can be used to translate UCS2 characters to ISO8859-1 characters.
SJIS configuration:
• The SJISTOUCS2 option can be used to translate SJIS characters in an ISO88591 column to UCS2 characters.
• The UCS2TOSJIS option can be used to translate UCS2 characters in a UCS2 column to SJIS characters.
• The ISO88591TOUCS2 option can be used to translate ISO8859-1 characters to UCS2 characters.
• The UCS2TOISO88591 option can be used to translate UCS2 characters to ISO8859-1 characters.
Unicode configuration:
• The UTF8TOUCS2 option can be used to translate UTF8 characters in an ISO88591 column to UCS2 characters.
• The UCS2TOUTF8 option can be used to translate UCS2 characters to UTF8 characters.
• The ISO88591TOUCS2 option can be used to translate ISO8859-1 characters to UCS2 characters.
• The UCS2TOISO88591 option can be used to translate UCS2 characters to ISO8859-1 characters.

TRIM: Removes leading and trailing characters from a character string.
ISO88591 configuration: Every character, including multibyte characters, is treated as one character.
SJIS configuration: Every character, including multibyte characters, is treated as one character.
Unicode configuration: Every character, including multibyte characters, is treated as one character.

UCASE: Upshifts alphabetic characters. For non-alphabetic characters, UCASE returns the same character.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

UPPER: Same as UCASE.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.

UPSHIFT: Same as UCASE.
ISO88591 configuration: No special considerations.
SJIS configuration: No special considerations.
Unicode configuration: No special considerations.
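
The character-versus-byte distinction that runs through Table 3-4 can be illustrated outside Neoview SQL. The following Python sketch is only an analogy, not Neoview code: it assumes Python's cp932 and utf-8 codecs as stand-ins for the SJIS and Unicode configurations, and the helper names are hypothetical. It contrasts a character count (what CHAR_LENGTH reports) with a byte count (what OCTET_LENGTH reports), and shows why the first byte of a multibyte string has a value greater than 127.

```python
def char_length(text: str) -> int:
    """Count characters; a multibyte character counts once."""
    return len(text)


def octet_length(text: str, encoding: str) -> int:
    """Count the bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


def first_byte(text: str, encoding: str) -> int:
    """Return the numeric value of the first encoded byte."""
    return text.encode(encoding)[0]


sample = "日本語"  # three characters, none of them 7-bit ASCII

print(char_length(sample))                # 3
print(octet_length(sample, "cp932"))      # 6 (two bytes per character)
print(octet_length(sample, "utf-8"))      # 9 (three bytes per character)
print(first_byte(sample, "cp932") > 127)  # True
print(first_byte("abc", "cp932") > 127)   # False
```

The same string therefore has one character length but a different byte length in each encoding, which is why the table distinguishes functions that count characters from functions that inspect bytes.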
Guidelines for the LIKE Predicate in the SJIS and Unicode Configurations
For customers with the SJIS or Unicode configuration, Neoview SQL Release 2.4 behaves
differently than Neoview Release 2.3 with respect to the LIKE predicate.
In Neoview Release 2.3, the underscore (_) matches only one byte. To use the underscore effectively
with multibyte characters, you must know the byte length of the multibyte character. Otherwise,
you might not get the results that you want using the LIKE predicate. If the Neoview platform
uses the SJIS configuration and you want to match one SJIS character, you must specify one or
two underscore characters in the LIKE pattern, depending on the number of bytes that make up
a single SJIS character. If the Neoview platform uses the Unicode configuration and you want
to match one UTF8 character, you must specify one to four underscore characters in the LIKE
pattern, depending on the number of bytes that make up a single UTF8 character.
In Neoview Release 2.4, one underscore always matches one character, regardless of the byte
length of the character. When matching a single multibyte character in the database, specify one
underscore in the LIKE pattern.
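
The underscore counts described above follow directly from the encoded length of each character. As a rough illustration (a Python sketch, not Neoview code), the hypothetical helper below assumes Python's cp932 and utf-8 codecs as stand-ins for the SJIS and Unicode configurations and returns how many underscores are needed to match one character under byte-level matching (Release 2.3) and character-level matching (Release 2.4).

```python
def underscores_needed(ch: str, encoding: str, per_byte: bool) -> int:
    """Return the number of '_' wildcards that match one character.

    per_byte=True models byte-level matching (Release 2.3 as described above).
    per_byte=False models character-level matching (Release 2.4).
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return len(ch.encode(encoding)) if per_byte else 1


# Byte-level matching: the count depends on the encoded length.
print(underscores_needed("A", "cp932", per_byte=True))           # 1
print(underscores_needed("日", "cp932", per_byte=True))           # 2
print(underscores_needed("日", "utf-8", per_byte=True))           # 3
print(underscores_needed("\U00020BB7", "utf-8", per_byte=True))  # 4

# Character-level matching: always one underscore.
print(underscores_needed("日", "utf-8", per_byte=False))          # 1
```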
In Neoview Release 2.3, the query in the following example requires you to put two underscores
in the LIKE pattern on a Neoview platform with the SJIS configuration, and three underscores
in the LIKE pattern on a system with the Unicode configuration. Otherwise, Neoview SQL does
not return the row, as this example shows:
>>select * from (values(_ISO88591'
')) as t(c1)
+> where c1 like _ISO88591' _
%';
C1
--------------------------------------------------

--- 0 row(s) selected.
>>select * from (values(_ISO88591'
')) as t(c1)
+>where c1 like _ISO88591' __
%';
C1
--------------------------------------------------
--- 1 row(s) selected.
In double-byte SJIS characters, the second byte might be 0x5C (\) or 0x5F (_). The underscore is
one of the wild-card characters in SQL, and the backslash character is used as an escape character
in ODBC. In Neoview Release 2.3, if a LIKE pattern includes a double-byte SJIS character that
uses 0x5C or 0x5F as the second byte, the SELECT statement might return the wrong results
because Neoview SQL interprets the second byte as an escape sequence or a wild-card character,
as this example shows:
>>insert into t1 values(x'41 42 43 83 5C 84 5C 31 32 33');
--- 1 row(s) inserted.
>>select * from t1;
C1
-------------------
ABC
123
--- 1 row(s) selected.
>>-- The next query should not return 'ABC
123' because
>>-- it does not match '% %' (x'25 83 5F 84 5F 25')
>>select * from t1 where c1 like x'25 83 5F 84 5F 25';
C1
-------------------
ABC
123
--- 1 row(s) selected.
>>-- The next query should return 'ABC
123' because
>>-- it matches '% %'
>>select * from t1 where c1 like x'25 84 5C 25' escape '\';
--- 0 row(s) selected.
In Neoview Release 2.4, character strings are compared at the character level, not the byte level.
Therefore, the second byte of a double-byte SJIS character in a LIKE pattern is treated as part of
the SJIS character and not as an escape sequence or a wild-card character, as this example shows:
>>select * from t1 where c1 like x'25 83 5F 84 5F 25';
--- 0 row(s) selected.
>>select * from t1 where c1 like '%
%'; -- (
is 835F, and
is 845F)
--- 0 row(s) selected.
>>select * from t1 where c1 like x'25 84 5C 25' escape '\';
C1
-------------------
ABC
123
--- 1 row(s) selected.
>>select * from t1 where c1 like '% %' escape '\'; -- (
is 845C)
C1
-------------------
ABC
123
--- 1 row(s) selected.
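
The difference between byte-level and character-level pattern matching can be sketched in a few lines of Python. This is a simplified model, not the Neoview SQL implementation: the `like` function is hypothetical, it supports only the % and _ wildcards plus an optional escape unit, and it assumes Python's shift_jis codec decodes the hexadecimal values used in the examples above. Running the same matcher once over raw bytes and once over decoded characters reproduces the two sets of results.

```python
def like(units, pattern, wild_one, wild_any, escape=None):
    """Match a sequence against a LIKE-style pattern, one unit at a time.

    Units are byte values when bytes objects are passed in, and characters
    when str objects are passed in.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        unit = pattern[i]
        if escape is not None and unit == escape and i + 1 < len(pattern):
            tokens.append(("lit", pattern[i + 1]))  # escaped unit is literal
            i += 2
            continue
        if unit == wild_any:
            tokens.append(("any", None))
        elif unit == wild_one:
            tokens.append(("one", None))
        else:
            tokens.append(("lit", unit))
        i += 1

    def match(ti, ui):
        if ti == len(tokens):
            return ui == len(units)
        kind, value = tokens[ti]
        if kind == "any":
            return any(match(ti + 1, k) for k in range(ui, len(units) + 1))
        if ui == len(units):
            return False
        if kind == "one" or units[ui] == value:
            return match(ti + 1, ui + 1)
        return False

    return match(0, 0)


row = bytes.fromhex("414243835C845C313233")  # value inserted into t1
pat_5f = bytes.fromhex("25835F845F25")       # second bytes are 0x5F
pat_5c = bytes.fromhex("25845C25")           # second byte is 0x5C

# Byte-level matching: 0x5F acts as a wildcard and 0x5C as an escape.
print(like(row, pat_5f, ord("_"), ord("%")))              # True (wrong)
print(like(row, pat_5c, ord("_"), ord("%"), ord("\\")))   # False (wrong)

# Character-level matching: each double-byte character is one unit.
row_s = row.decode("shift_jis")
print(like(row_s, pat_5f.decode("shift_jis"), "_", "%"))        # False
print(like(row_s, pat_5c.decode("shift_jis"), "_", "%", "\\"))  # True
```

Decoding before matching is what keeps the trailing byte of a double-byte character from being mistaken for a wildcard or an escape character.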
Locating Invalid Characters in Syntax Error Messages
It is impossible to guarantee that a Neoview SQL syntax error message always points to the
exact character that caused the syntax error. That is why the message states that
the error occurred “at or before” the spot where the caret is printed. Locating the error is further
complicated when non-ISO8859-1 source languages such as SJIS or GBK are used, where characters
in strings or delimited identifiers can consist of one to three bytes.
There is the additional problem of SQL commands that can consist of hundreds or even thousands
of lines. The SQL compiler currently prints up to 945 characters that preceded the error, but this
is of limited assistance if the command contains hundreds of lines of very similar sequences of
clauses. SQL commands with fewer than 945 characters are reprinted in full.
For this Neoview Release, parser syntax error messages are expanded to include the actual
number of the character where the error was detected. Syntax error messages now identify the
number of characters from the start of the SQL statement where the error occurred, allowing
you to find the character more quickly with your editor. Unfortunately, the line number where
the error occurred cannot be displayed because the newline characters have been stripped from
the source before Neoview SQL sees it.
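
If you keep the original multi-line text of a statement in an editor or script, the reported character count can be mapped back to a line and column on the client side. The Python sketch below is one way to do that under stated assumptions: the `locate` helper is hypothetical, it assumes the count is the number of characters that precede the reported position, and it assumes each newline in your copy counts as exactly one character. Neither assumption is confirmed by this guide, so treat the result as a starting point for the search, consistent with the “at or before” wording of the message.

```python
def locate(sql_text: str, chars_from_start: int) -> tuple:
    """Map a character count to a 1-based (line, column) in sql_text.

    Assumes chars_from_start characters precede the reported position and
    that each newline in sql_text counts as one character.
    """
    if not 0 <= chars_from_start <= len(sql_text):
        raise ValueError("count is outside the statement text")
    prefix = sql_text[:chars_from_start]
    line = prefix.count("\n") + 1
    column = chars_from_start - (prefix.rfind("\n") + 1) + 1
    return line, column


statement = "select a,\n  b,\nfrom t"
print(locate(statement, 15))                      # (3, 1)
print(locate("create table simpl(a int,);", 25))  # (1, 26)
```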
Example of a Syntax Error in a Shorter SQL Statement
This is an example of a syntax error in a command that uses fewer than 945 characters. The
character-identifying text with a caret is provided at the end of the syntax error.
>>create table simpl(a int,);
*** ERROR[15001] A syntax error occurred at or before:
create table simpl(a int,);
^ (25 characters from start of SQL statement)
*** ERROR[8822] The statement was not prepared.
>>
Example of a Syntax Error in a Multi-Line SQL Statement
This is an example of a syntax error in a command that uses more than 945 characters. The
character-identifying text with a caret is provided at the end of the syntax error.
SELECT T.WK_END_DT
, TRIM(TRAILING
FROM T.FCL_YR_ID)
, TRIM(TRAILING
FROM T.FCL_PER_ID)
, P.PKY_ID AS PRIM_KEY
, TRIM(TRAILING
FROM P.PKY_DSC_TX) AS KEY_DSCR
, SUM(A.TG_SL_QT) AS SALES_QT
, SUM(A.TG_SCN_QT) AS RNDM_WGT
, SUM(A.TG_SL_AM) AS SALES_AM
, ZEROIFNULL(SUM(A.TG_SL_AM ) - SUM(A.TG_DMGN_AM))AS MPC_COST
, SUM(A.TG_DMGN_AM) AS DIRECT_MARGIN
, ZEROIFNULL((SUM(A.TG_DMGN_AM ) / NULLIFZERO(SUM(A.TG_SL_AM))) )AS DMGN_PERCENT
, ZEROIFNULL((SUM(A.TG_T_MKDN_AM) ) / 1)AS MARKDOWN
, ZEROIFNULL((SUM(A.TG_T_MKDN_AM ) / NULLIFZERO(SUM(A.TG_SL_AM ) + SUM(A.TG_T_MKDN_AM ))) )
AS MKDN_PERCENT
, ZEROIFNULL((SUM(A.TG_INV_QT) ) / 1)AS INV_QUANTITY
, ZEROIFNULL((SUM(A.TG_INV_CST_AM) ) / 1)AS INV_COST
, SUM(A.TG_PTS_AM) AS PTS_SALES
, SUM(A.TG_PTS_DMGN_AM) AS PTS_DIRECT_MARGIN
, ZEROIFNULL((SUM(A.TG_PTS_DMGN_AM ) / NULLIFZERO(SUM(A.TG_PTS_AM))) )AS PTS_DMGN_PERCENT
, SUM(A.TG_PTS_MKDN_AM) AS PTS_MARKDOWN
, SUM(A.TG_CLRN_AM) AS CLRN_AMOUNT
, SUM(A.TG_CLRN_DMGN_AM) AS CLRN_DIRECT_MARGIN
, ZEROIFNULL((SUM(A.TG_CLRN_DMGN_AM
/ NULLIFZERO(SUM(A.TG_CLRN_AM))) )AS CLRN_DMGN_PERCENT
, SUM(A.TG_CLRN_MKDN_AM) AS CLRN_MARKDOWN
, ZEROIFNULL((SUM(A.TG_CLRN_INV_CST_AM) ) / 1) AS CLRN_INV_COST
, ZEROIFNULL((INV_QUANTITY / NULLIFZERO(SALES_QT)))AS STR_WK_SUPPLY
FROM WK_ZN_PRMKY_HST A
, WK_END_DT_INF T
, PRMKY_INF P
WHERE T.WK_END_DT = A.WK_END_DT
AND P.PKY_ID = A.PKY_ID
AND (T.WK_END_DT IN ('20060610'))
GROUP BY 1, 2, 3, 4, 5;
*** ERROR[15001] A syntax error occurred at or before:
...TG_DMGN_AM) AS DIRECT_MARGIN , ZEROIFNULL((SUM(A.TG_DMGN_AM ) / NULLIFZERO(S
UM(A.TG_SL_AM))) )AS DMGN_PERCENT , ZEROIFNULL((SUM(A.TG_T_MKDN_AM) ) / 1)AS M
ARKDOWN , ZEROIFNULL((SUM(A.TG_T_MKDN_AM ) / NULLIFZERO(SUM(A.TG_SL_AM ) + SUM(
A.TG_T_MKDN_AM ))) )AS MKDN_PERCENT , ZEROIFNULL((SUM(A.TG_INV_QT) ) / 1)AS INV
_QUANTITY , ZEROIFNULL((SUM(A.TG_INV_CST_AM) ) / 1)AS INV_COST , SUM(A.TG_PTS_A
M) AS PTS_SALES , SUM(A.TG_PTS_DMGN_AM) AS PTS_DIRECT_MARGIN , ZEROIFNULL((SUM(
A.TG_PTS_DMGN_AM ) / NULLIFZERO(SUM(A.TG_PTS_AM))) )AS PTS_DMGN_PERCENT , SUM(A
.TG_PTS_MKDN_AM) AS PTS_MARKDOWN , SUM(A.TG_CLRN_AM) AS CLRN_AMOUNT , SUM(A.TG_
CLRN_DMGN_AM) AS CLRN_DIRECT_MARGIN , ZEROIFNULL((SUM(A.TG_CLRN_DMGN_AM
/ NUL
LIFZERO(SUM(A.TG_CLRN_AM))) )AS CLRN_DMGN_PERCENT , SUM(A.TG_CLRN_MKDN_AM) AS C
^ (1057 characters from start of SQL statement)
*** ERROR[8822] The statement was not prepared.
>>
4 Capabilities and Limitations of Neoview Client Applications
This chapter describes the capabilities and limitations of these Neoview client applications with
respect to the Neoview Character Sets feature for this Neoview release:
• “Neoview Command Interface (NCI)” (page 35)
• “Neoview DB Admin” (page 35)
• “Neoview Loader” (page 36)
• “Neoview Management Dashboard” (page 36)
• “Neoview Manageability Repository” (page 37)
• “Neoview Transporter Client” (page 37)
This chapter also describes the capabilities and limitations of the Neoview Character Sets feature
in Neoview Workload Management Services (WMS). See “Neoview Workload Management
Services (WMS)” (page 38).
Neoview Command Interface (NCI)
NCI communicates with the Neoview database through JDBC. The Neoview JDBC driver handles
all required character translations and manages all SQL statements and character data in UTF16.
Obey files and log files are assumed to be encoded in the client locale character set and are
translated by the Java runtime to UTF16.
Any obey file that you run from the Neoview Command Interface must be opened in the character
set that is configured on the client workstation from which the operation is issued. An obey file
encoded with a different character set than the client locale might not open. For example, if your
client locale is a Windows workstation configured with a code page for Japanese characters, the
obey file must also be in Japanese characters.
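
Before running an obey file, it can be useful to confirm that its bytes decode in the encoding you expect. The Python sketch below is a client-side illustration only and is not part of NCI: the `decodes_cleanly` helper is hypothetical, and cp932 is assumed here as an example of a Japanese Windows code page.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if every byte sequence in data is valid in encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Contents of a hypothetical obey file saved in a Japanese code page.
obey_bytes = "select * from t where c1 = '日本語';".encode("cp932")

print(decodes_cleanly(obey_bytes, "cp932"))  # True
print(decodes_cleanly(obey_bytes, "utf-8"))  # False
```

A clean decode is a necessary check, not a sufficient one: some single-byte encodings accept any byte sequence, so a file can decode without error and still display the wrong characters.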
For more information about the Neoview Command Interface, see the Neoview Command Interface
(NCI) Guide.
Neoview DB Admin
The Neoview DB Admin GUI interface is Java-based and accepts characters in any language that
is installed on a connected client workstation. Character translation operates as follows:
• Client locale character encoding is translated to UTF16 for internal use, then converted by
the JORB to UTF8 and sent to the SQL server.
• The SQL server accepts the UTF8 characters and performs the necessary internal translations,
including converting all characters to the ISO_MAPPING encoding and translating data
being returned to the Neoview DB Admin servers from the ISO_MAPPING encoding to
UTF8. Any character that is not properly translated to UTF8 on the server side is reported
as a CORBA exception.
• Character data sent in its client locale character encoding is translated to UTF16 for internal
use by Neoview DB Admin.
• The UTF16 data is converted to UTF8 and sent to the CORBA interfaces and the SQL server.
• Character data retrieved by Neoview DB Admin from the Neoview database is first converted
to UTF8, then internally to UTF16, and finally encoded in its client locale character set.
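
The translation chain in the list above (client locale to UTF16, UTF16 to UTF8, and back) can be modeled with standard codecs. The Python sketch below is an analogy rather than the Neoview DB Admin implementation: the function names are hypothetical, cp932 and latin-1 are assumed as example client locale encodings, and a Python `UnicodeEncodeError` stands in for the error that is reported when a character cannot be represented on the retrieving side.

```python
def client_to_server(raw: bytes, client_encoding: str) -> bytes:
    """Client locale bytes -> UTF16 (internal) -> UTF8 (sent to the server)."""
    utf16 = raw.decode(client_encoding).encode("utf-16-le")
    return utf16.decode("utf-16-le").encode("utf-8")


def server_to_client(wire: bytes, client_encoding: str) -> bytes:
    """UTF8 (from the server) -> UTF16 (internal) -> client locale bytes."""
    utf16 = wire.decode("utf-8").encode("utf-16-le")
    # Raises UnicodeEncodeError if the client locale cannot represent a character.
    return utf16.decode("utf-16-le").encode(client_encoding)


raw = "日本語".encode("cp932")
wire = client_to_server(raw, "cp932")

print(server_to_client(wire, "cp932") == raw)  # True: same locale round-trips

try:
    server_to_client(wire, "latin-1")  # a workstation with a different locale
except UnicodeEncodeError:
    print("cannot be represented in the retrieving client locale")
```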
For this Neoview release, Neoview DB Admin references the character set values defined in the
SYSTEM_DEFAULTS table. It displays and allows users to enter characters in all the supported
client locale character encodings.
For this Neoview release, Neoview DB Admin imposes these restrictions:
• If the characters you enter from Neoview DB Admin are not recognized by or compatible
with the SQL database, the Neoview DB Admin operation will be rejected by SQL with an
error.
• When the ISO88591 configuration is used, Neoview DB Admin reverts to using only 7-bit
ASCII characters and will not support the use of 8-bit ASCII or multibyte characters from
the client locales.
• It returns CORBA errors when you attempt to retrieve character data inserted from a different
client workstation using a character set that is incompatible with the current workstation.
For more information, see Table 5-1 (page 39).
For more information about Neoview DB Admin, see the Neoview DB Admin Online Help.
Neoview Loader
Table 4-1 (page 36) describes the input file encoding guidelines for the Neoview Loader in the
three Neoview character set configurations.
NOTE: For this Neoview release, the Neoview Loader is still supported for those Neoview
customers who acquired it in a previous Neoview release.
Table 4-1 Neoview Loader Capabilities and Limitations
ISO88591 configuration: Input files for ISO88591 columns must be encoded in ISO8859-1.
SJIS configuration:
• Client locale input files for ISO88591 columns must be encoded in MS932.
• Always inserts SJIS-encoded data into ISO88591 columns.
Unicode configuration: Input files for ISO88591 columns must be encoded in UTF8 with pass-through mode set to OFF.
All three configurations:
• Input files for UCS2 columns must be encoded in UTF8.
• Delimited 7-bit ASCII or ISO8859-1 characters must be used in table and column names.
For information about using gcmd to set the value of the cSetConversion (-cc) argument to
specify whether or not the Neoview Loader should perform character set conversion, see “How
Character Encoding Is Implemented in the Neoview Loader” (page 52).
Neoview Management Dashboard
For this Neoview release, the Dashboard Client supports the multibyte character sets, including
ISO88591, SJIS, and Unicode, implemented by the Neoview Character Sets feature, with these
restrictions:
• Non-ASCII character set support is provided for only the following entities, fields, window,
and a character:
Entity: Query. Fields: ClientID, ApplID, SQLText, DSN
Entity: QueryRTS. Fields: ClientID, ApplID, ServiceName, QueryName
Entity: Table. Fields: Domain Name, Catalog, Schema, Table Name, Columns (Missing Stats), Repair Action
Window: Command Wizard
Character: Backslash in all domain names only
Non-ASCII characters do not occur in the other fields defined for these entities, except show
related displays.
• Non-ASCII character set support has been provided for the East Asian locales supported
by the latest Neoview release.
• In Show Related displays, characters outside the ASCII numeric code set are not displayed.
• Capture of EDL with non-ASCII data is not supported.
• Commands issued in the Command and Control facility might fail or have unexpected
results if the queries on which they operate have domain names that contain non-ASCII
characters.
• The Client Server Gateway (CSG) cannot establish a connection to the Neoview platform
on behalf of a workstation that has non-ASCII characters in its name. Thus, the Dashboard
Client will obtain no data from the Neoview platform. A workaround for this limitation is
to use dynamic workstation names by clearing the Term Name option on the SSG property
sheet, as described in the Neoview Management Dashboard Client Guide for Database
Administrators.
• A table whose name contains characters other than 7-bit ASCII cannot be explicitly configured
(by HP Support) for monitoring by Dashboard. Such a table will be monitored only if the
SQL compiler reports that the table has missing or obsolete statistics.
For detailed information about the Dashboard Client, see the Neoview Management Dashboard
Client Guide for Database Administrators.
Neoview Manageability Repository
The Repository supports these character set encoding features:
• The Repository is aware of whether a Neoview platform uses the ISO88591, SJIS, or Unicode
configuration.
• If the Neoview platform uses the ISO88591 configuration, the character fields in all Repository
tables and views will contain ISO8859-1 character data.
• If the Neoview platform uses the SJIS or Unicode configuration, character fields in some,
but not all, Repository views will contain double-wide UCS2–encoded data.
For more information, see the Neoview Repository User Guide.
Neoview Transporter Client
For this Neoview release, the Neoview Transporter client supports:
• Using delimited identifiers in table and column names that are encoded in the ISO8859-1,
SJIS, EUC-JP, BIG 5, GB2312, GB18030, KSC, and UTF8 character sets
• In the ISO88591 configuration, loading input data sources “as is,” without translation, into
ISO88591 columns
• In the ISO88591 configuration, using client locale character encoding for table and column
names
• In the SJIS configuration, loading SJIS and EUC-JP encoded input data sources into the
Neoview database
• In the Unicode configuration, loading GB2312, GB18030, KSC, BIG5, EUC-JP, and
SJIS-encoded data sources
• Reporting all character translation errors as bad records
• Logging all event messages in UTF8 format
For information about managing character translation and encoding for the Neoview Transporter
client, see “How Character Encoding Is Implemented in the Neoview Transporter Client”
(page 49). For general information about the Neoview Transporter, see the Neoview Transporter
User Guide.
Neoview Workload Management Services (WMS)
For this Neoview release, WMS has these character encoding behaviors:
• WMS service names can be defined and created through ODBC, JDBC, or NCI. The service
names can be provided in any character set that is supported by the Neoview platform for
this Neoview release.
• Service names are sent to EMS logs in UTF8. Internally, the names are transmitted and stored
in UTF8 in WMS.
• Service names must be enclosed in double quotes when the service name is case sensitive
or when it contains multibyte characters.
• A WMS service name can be up to 24 characters, and each character can be from one to four
bytes. For the QUERY_NAME column returned in the STATUS QUERY/QUERIES commands,
the service name is truncated to 16 characters and followed by two vertical lines (||) and
the application ID.
• Within WMS, SQL statements are always assumed to be in UTF8 encoding for the SJIS and
Unicode configurations. In the ISO88591 configuration, SQL statements are in the client
locale character encoding and will therefore display properly through a Java client but not
through an ODBC client.
• Within WMS, query plans are assumed to be in the encoding specified by the ISO_MAPPING
value for the SJIS and Unicode configurations. In the ISO88591 configuration, characters are
provided in the client locale character encoding.
• In the ISO88591 configuration, WMS cannot ascertain the encoding of any extended characters
that are defined in character strings.
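
The service name limits in the list above are expressed in characters, not bytes. The Python sketch below works through the arithmetic; it is an illustration only, the function names and the sample application ID are hypothetical, and the exact padding or formatting that WMS applies to QUERY_NAME is not specified here.

```python
MAX_SERVICE_NAME_CHARS = 24  # limit stated above, counted in characters
MAX_BYTES_PER_CHAR = 4       # each character can be one to four bytes
DISPLAYED_NAME_CHARS = 16    # portion of the name kept in QUERY_NAME


def max_service_name_bytes() -> int:
    """Worst-case byte size of a service name."""
    return MAX_SERVICE_NAME_CHARS * MAX_BYTES_PER_CHAR


def query_name(service_name: str, application_id: str) -> str:
    """Truncate the service name by characters, then append '||' and the ID."""
    if len(service_name) > MAX_SERVICE_NAME_CHARS:
        raise ValueError("service name exceeds 24 characters")
    return service_name[:DISPLAYED_NAME_CHARS] + "||" + application_id


print(max_service_name_bytes())                     # 96
print(query_name("NIGHTLY_REPORTING_JOBS", "APP1"))  # NIGHTLY_REPORTIN||APP1
print(query_name("日本語サービス", "APP1"))            # 日本語サービス||APP1
```

Because the truncation counts characters, a seven-character multibyte name is kept whole even though it occupies more than 16 bytes in UTF8.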
For more information, see the Neoview Workload Management Services Guide.
5 Troubleshooting Guidelines for Neoview Character Sets Users
Table 5-1 identifies the Neoview Character Sets-related problems that you might need to
troubleshoot. For each problem type, the symptoms, probable causes, and recommended corrective
actions are provided. If you encounter problems that are not described here, contact your HP
support provider.
Table 5-1 Troubleshooting Symptoms, Causes, and Recommended Corrective Actions for Users

Problem type: Connection error generated when using a downrev Neoview ODBC driver or Neoview JDBC driver
Symptoms: The connection is rejected and a connection error is generated.
Probable causes: Release 2.2 Neoview ODBC and JDBC drivers can be connected only to a Release 2.4 or Release 2.3 Neoview platform that uses the ISO88591 configuration. If you connect Release 2.2 Neoview ODBC or JDBC drivers to a Release 2.4 or Release 2.3 Neoview platform that uses the SJIS or Unicode configuration, a connection error is generated. For more information, see “Compatibility Between Neoview ODBC and JDBC Drivers and Neoview Platforms” (page 19).
Recommended corrective actions: Use Neoview ODBC and JDBC drivers that are compatible with all the Neoview platforms on your network.

Problem type: SQL-side SJIS character mismatch error
Symptoms: The Neoview database cannot successfully query or compare SJIS characters.
Probable causes: This type of character mismatch can occur when SJIS character data is inserted and queried using either one of these two methods:
• The data was inserted as a literal in an SQL statement but queried using a parameter.
• The character value was inserted as a parameter but queried using a literal in an SQL statement.
For more information, see “SJIS Character Mismatches” (page 42).
Recommended corrective actions: Correct the character mismatch. For example:
• Locate the string literal that contains the mismatched character.
• Issue an UPDATE statement with the updated literal value.
• Pass the corrected value as a parameter.
To prevent SQL-side SJIS character mismatches from occurring, always send the SJIS character data in parameters when using an ODBC connection. Do not enclose them in SQL statement character string literals.

Problem type: Using 4-byte GB18030 characters in SQL statements
Symptoms: A syntax or translation error is displayed at a client workstation.
Probable causes:
• You are using 4-byte GB18030 characters in SQL identifiers or in character string literals within SQL statements. The SQL compiler does not support the use of 4-byte GB18030 characters for these purposes.
• You are using one or both of the two 4-byte GB18030 characters that map to 0xFFFE and 0xFFFF for Unicode or UCS2 characters. The SQL compiler might not support these two characters, even when they are passed as parameters.
Recommended corrective actions:
• Do not use 4-byte GB18030 characters in SQL identifiers or in string literals inside SQL statements. If you must use 4-byte GB18030 characters in the data, always use parameters to pass them to the Neoview database.
• Do not use the two 4-byte GB18030 characters that map to the UCS2 values, 0xFFFE and 0xFFFF.

Problem type: Correct character string literal prefixes not provided in an SQL statement
Symptoms:
• An incompatible character set error (4039) is displayed at a client workstation.
• The DDL or DML statement fails.
Probable causes: This error is displayed when a user fails to explicitly specify the correct prefix for a character string literal (for example, _ISO88591 or _UCS2) for a character column that is not the default character set. For example, in the ISO88591 configuration, all newly-created columns default to ISO88591 columns unless the column's character set definition explicitly specifies CHARACTER SET UCS2. If it does, the user must also specify the UCS2 prefix (_UCS2) for every character string literal in the SQL statement that is associated with a UCS2 column. If literal prefixes are not specified for columns that are not the default column for the configuration, the SQL statement will fail and return an error.
Recommended corrective actions: Make sure you always provide prefixes to character string literals as instructed in “Rules for Encoding SQL Language Elements” (page 23). If you cannot generate prefixes with your character string literals, contact your HP support provider and ask them to resolve the problem.

Problem type: Using UCS2 columns to query SQL metadata
Symptoms: An error reporting an incorrectly-prefixed literal is received.
Probable causes: An attempt was made to store character data in a metadata table column using a UCS2 column.
Recommended corrective actions: All SQL metadata must be stored in ISO88591 columns, regardless of the Neoview character set configuration.

Problem type: SQL identifier too long
Symptoms: An error reporting that an SQL identifier is too long is received.
Probable causes: An SQL identifier exceeded the 128-byte limit.
Recommended corrective actions: In the Unicode configuration, SQL identifiers are stored in UTF8 characters, where multibyte characters can quickly consume limited space. Reduce the number of characters in the SQL identifier or use 7-bit ASCII characters instead of multibyte characters.

Problem Type: Incompatible client locale errors in the ISO88591 configuration
Symptoms: These symptoms occur when a client workstation attempts to query the Neoview database:
• An error is generated stating that you inserted characters from another client workstation that are not supported on the retrieving workstation.
• Replacement characters or garbage data are displayed from the retrieving client workstation.
• Neoview DB Admin displays no data but returns a CORBA error.
• WMS cannot display the character data.
• Generated EMS event messages contain garbage data.
Probable Causes: Causes can include:
• You inserted characters from one client workstation into an SQL table column and attempted to retrieve incompatible characters from another workstation.
• You inserted characters from two client workstations into the same SQL table column and attempted to retrieve incompatible characters from the second workstation.
Recommended Corrective Actions: When the ISO88591 configuration is selected, use only characters that are supported (compatible) across all the client workstations. To ensure compatibility, use only 7-bit ASCII characters or compatible 8-bit ISO8859-n characters in SQL identifiers and user data.

Problem Type: In the SJIS configuration, a client locale with encoding other than MS932 generates replacement SJIS characters
Symptoms: When the client locale attempts to perform a query in the SJIS configuration, these symptoms occur:
• A translation error is generated stating you inserted invalid characters.
• Replacement characters are displayed from the query application.
Probable Causes: A client locale that does not use MS932-compatible encoding attempted to retrieve incompatible characters.
Recommended Corrective Actions: To avoid these replacement characters, use only MS932-compatible SJIS characters across all the client locales. Do not use EUC-JP or other characters that are incompatible with MS932.

Problem Type: Incompatible client locale errors when an ODBC driver-connected query application is used in the Unicode configuration
Symptoms: When a client workstation attempts to view character data from a Neoview ODBC driver-connected query application, these symptoms occur:
• A translation error is generated stating that you inserted characters from another client workstation that are not supported on the retrieving workstation.
• Replacement characters (question marks by default) are displayed from the query application.
Probable Causes: A client workstation attempted to query character data stored in the Neoview database that is not compatible with its configured character set. In the Unicode configuration, workstations can only display character fonts that have been installed on them. When you use a Neoview ODBC driver-connected application to query the database in the Unicode configuration, each client workstation can only view character data that is encoded in the character set that is currently configured on that workstation.
Recommended Corrective Actions: Use only characters that are supported (compatible) across all the client workstations. Install all the supported fonts on every workstation. If a Neoview ODBC driver is used, it might also be necessary to configure the same character set on every workstation. For more information, see Appendix B (page 47).

Problem Type: Key size length limit exceeded when NO PARTITION is specified for a table
Symptoms: When you create a table with the NO PARTITION option specified and define a key for a character column, you get an error stating that you have exceeded the 255-byte size limit.
Probable Causes: A table with the NO PARTITION option specified imposes lower size limits on keys, data blocks, and table row sizes than does a hash-partitioned table.
Recommended Corrective Actions:
• If possible, reduce the size of the key to conform to the 255-byte limit.
• Otherwise, do not use the NO PARTITION option, particularly in tables that store multibyte characters.

Problem Type: SQLFetch error when an ODBC parameter for the catalog API contains a SJIS character that has 0x5C as the second byte
Symptoms: Neoview SQL returns an SQLFetch error to the client application and issues an event message that contains this error message:
ERROR[8410] An escape character in a LIKE pattern must be followed by another escape character, an underscore, or a percent character.
Probable Causes: Internally, Neoview SQL uses a LIKE predicate when handling catalog API parameters. Therefore, Neoview SQL interprets the second byte, 0x5C, of a SJIS character in the parameter as an escape sequence (\) and expects a pattern matching character, such as % or _, to follow. Hence, Neoview SQL returns an SQLFetch error.
Recommended Corrective Actions: Avoid using the catalog API for SJIS data that is likely to contain 0x5C as the second byte.
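
Several of the problem types in Table 5-1 come down to byte counts and byte values that can be checked on a client before a statement is sent. The following Python sketch is illustrative only and is not part of any Neoview tool. The function names are hypothetical, and the sketch assumes that Python's cp932 and gb18030 codecs are acceptable stand-ins for the MS932 and GB18030 encodings discussed in the table.

```python
def utf8_identifier_too_long(identifier, limit=128):
    """Return True if the identifier needs more than `limit` bytes in UTF-8."""
    return len(identifier.encode("utf-8")) > limit


def sjis_chars_with_backslash_trail(text):
    """Return the characters whose two-byte cp932 encoding ends in 0x5C."""
    hits = []
    for ch in text:
        try:
            encoded = ch.encode("cp932")
        except UnicodeEncodeError:
            continue  # character has no cp932 encoding at all
        if len(encoded) == 2 and encoded[1] == 0x5C:
            hits.append(ch)
    return hits


def gb18030_four_byte_chars(text):
    """Return the characters that occupy four bytes in GB18030."""
    return [ch for ch in text if len(ch.encode("gb18030")) == 4]
```

A check such as `utf8_identifier_too_long(name)` could be run against a proposed table or column name in the Unicode configuration, and the other two helpers could be used to flag data that is better passed as parameters or kept away from the catalog API.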
SJIS Character Mismatches
Over time, multiple character hexadecimal values have been assigned to the same glyph in the
SJIS character set. There are now approximately 400 SJIS character hexadecimal values for which
one or more other SJIS character hexadecimal values represent the same character glyph and
map to the same Unicode code point value.
It is possible, then, for customer applications to store and retrieve different SJIS character
hexadecimal values for the same character glyph. In a SJIS configuration with an ODBC connection,
when these SJIS character hexadecimal values are provided in SQL statement character string
literals and sent to the Neoview database and then retrieved, they can undergo one or more
Unicode format conversions (UTF16, UTF8, or UCS2) along the way. Once these conversions
map the SJIS character hexadecimal value to a Unicode code point value that is shared by other
SJIS character hexadecimal values, it can be difficult to determine the original SJIS character
hexadecimal value. When the hexadecimal values of SJIS characters with the same Unicode code
point value are compared and found not to be the same, a SJIS character mismatch has occurred.
SJIS character mismatches can occur on the client and SQL sides of the Neoview platform. This
discussion focuses on the more critical SQL-side SJIS character mismatches that occur from the
Neoview database.
SQL-Side SJIS Character Mismatch Examples
SJIS characters with the hexadecimal values 0x81CA, 0xEEF9, and 0xFA54 all represent the same
glyph and map to the Unicode code point value 0xFFE2.
Assume in these examples that you are operating in a Windows environment and using the
Neoview ODBC driver for Windows. If you execute an SQL statement that contains any of these
three SJIS characters and the SQL compiler is invoked to compile the statement, the compiler
translates the character to the UCS2 code point value of 0xFFE2 before parsing the statement.
When the SQL compiler converts the character string literal back to SJIS encoding, it automatically
converts 0xFFE2 to the lowest value (0x81CA) and returns it to the client. This is fine if the
hexadecimal value of the SJIS character originally provided by the client was 0x81CA, but not if
it was 0xEEF9 or 0xFA54.
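
The round trip described above can be observed outside the database. The sketch below is a minimal illustration that assumes Python's cp932 codec behaves like the MS932 conversion the text describes; it does not call the Neoview SQL compiler, and the variable and function names are hypothetical.

```python
# The three MS932 byte sequences named in the text.
variants = [bytes.fromhex("81CA"), bytes.fromhex("EEF9"), bytes.fromhex("FA54")]


def round_trip(sjis_bytes):
    """Decode MS932 bytes to Unicode, then encode the result back to MS932."""
    return sjis_bytes.decode("cp932").encode("cp932")


# Every variant should decode to the same Unicode character.
code_points = {v.decode("cp932") for v in variants}

# Only the variants that come back unchanged are safe to send as literals.
survivors = [v for v in variants if round_trip(v) == v]

for v in variants:
    print(v.hex().upper(), "->", "U+%04X" % ord(v.decode("cp932")),
          "->", round_trip(v).hex().upper())
```

Any variant that does not appear in `survivors` is one whose original hexadecimal value is lost once it passes through Unicode.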
First, assume you perform either or both of these two SQL operations:
CREATE TABLE T1 (col1 char(2) character set ISO88591) no partition;
and
INSERT INTO T1 values ( 'character-glyph' ) ;
where character-glyph represents the common SJIS glyph and has a SJIS hexadecimal value
of 0x81CA.
In these examples, assume that the lower hexadecimal value is chosen, so both operations produce
a row where the value in col1 is 0x81CA.
The SQL operation:
SELECT * from T1 where col1 = 'character-glyph';
displays the row because the compiler translates the SJIS character glyph in the WHERE clause
to 0x81CA, allowing the comparison routine used by the SELECT statement to find the row.
Next, assume ODBC is instructed to insert a row where the col1 value is 0xEEF9 and is bound
to the statement by a parameter:
INSERT INTO T1 values ( ? ) ;
The new row assumes the value 0xEEF9 in col1. The previous SELECT statement, which used a
character string literal, would not have selected this row because the WHERE clause was searching
for 0x81CA.
If a new row is inserted using JDBC, a SJIS character mismatch does not occur because 0xEEF9
(or 0xFA54) would be converted to Unicode and back to the SJIS character value 0x81CA before
the data is put in the new row.
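
The effect on a query can be modeled with an in-memory list standing in for column col1 of T1. This is a simplified sketch under stated assumptions, not the behavior of any driver: `via_literal` assumes the literal path normalizes through Unicode as described above (using Python's cp932 codec), `via_parameter` assumes ODBC parameters arrive untranslated, and all names are hypothetical.

```python
def via_literal(sjis_bytes):
    """Model the literal path: MS932 -> Unicode -> MS932."""
    return sjis_bytes.decode("cp932").encode("cp932")


def via_parameter(sjis_bytes):
    """Model the ODBC parameter path: bytes are stored as sent."""
    return sjis_bytes


client_value = bytes.fromhex("EEF9")   # what the client application holds

col1 = []                                   # stands in for T1.col1
col1.append(via_literal(client_value))      # INSERT ... VALUES ('glyph')
col1.append(via_parameter(client_value))    # INSERT ... VALUES (?)


def select_where_col1_equals(value):
    """Byte-for-byte comparison, as in a WHERE col1 = ... predicate."""
    return [row for row in col1 if row == value]


literal_hits = select_where_col1_equals(via_literal(client_value))
parameter_hits = select_where_col1_equals(via_parameter(client_value))
print(literal_hits, parameter_hits)
```

In this model each search finds only the row that was inserted through the same path, which is why mixing literals and parameters for the same SJIS data produces "no data found" results.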
SQL-Side SJIS Character Mismatch Scenarios
SJIS character mismatches can occur from the Neoview database in the SJIS configuration using
a Neoview ODBC connection in either of these two scenarios:
• The Neoview ODBC driver converts an SQL statement's SQL identifiers and SJIS MS932-encoded character string literals to UTF8 and sends the UTF8 data to the Neoview database, where it is converted back to SJIS and stored in a SJIS column. At the same time, the Neoview ODBC driver also sends SJIS MS932-encoded characters with the same original hexadecimal values as parameters untranslated to the Neoview database. If the Neoview database attempts to match the hexadecimal values of SJIS characters with the same Unicode code point value from both sources, they will not match.
• The ODBC driver manager sends the Neoview ODBC driver an SQL statement that contains both SJIS MS932-encoded character string literals and character values as parameters that have the same original hexadecimal values. The Neoview ODBC driver converts the regular character string literals to UTF8 but leaves the character values sent as parameters untranslated. Again, when the Neoview database compares SJIS MS932-encoded literals that the driver converted to UTF8 to the untranslated character values, a SJIS character mismatch occurs.
SJIS character mismatches are confined to SELECT, INSERT, UPDATE, and DELETE
statements with a WHERE clause (predicate) that searches for SJIS character value matches. The
SQL statement is usually reported to the client as having succeeded with no data found.
How to Prevent SQL-Side SJIS Character Mismatches
When using the SJIS configuration with an ODBC connection, SQL-side SJIS character mismatches
can occur in either of the two scenarios described in “SQL-Side SJIS Character Mismatch Scenarios”
(page 43).
To prevent SQL-side SJIS character mismatches from occurring, always send the SJIS character
data in parameters when using an ODBC connection. Do not enclose them in SQL statement
character string literals.
A Character Set Mapping Tables
The Neoview platform and its clients use mapping tables for these character sets to support the
Neoview Character Sets feature for this Neoview release:
• Big5
• EUC-JP
• GB2312
• GB18030
• GBK
• KSC5601-1987
• SJIS
To access these mapping tables, see Mapping Tables for Neoview Character Sets.
B Capabilities and Limitations of Multiple Client Locales in
the Unicode Configuration
This appendix describes the capabilities and limitations imposed on multiple client locales in
the Unicode configuration for this Neoview release.
Table B-1 Capabilities and Limitations for Multiple Client Locales in the Unicode Configuration

Capability or Limitation: All the character sets used by the client workstations must be compatible.
Recommended User Action: Use only character sets that are compatible across all the client workstations.

Capability or Limitation: From Neoview client applications such as Neoview DB Admin, each client workstation can display the characters of every client locale language for which the code pages are present and all the associated fonts have been installed on the workstation.
Recommended User Action: To be able to view all the characters displayed from Neoview client applications, make sure the character fonts associated with all the supported client locale character encodings have been installed and their code pages are present on every client workstation.

Capability or Limitation: From each client workstation, Unicode-aware Neoview client applications such as Neoview DB Admin, Neoview Command Interface, and the Neoview JDBC driver can accept, display, and process those character sets that are installed and defined on that workstation.
Recommended User Action: Make sure the fonts associated with all the supported character sets have been installed and defined on every client workstation.

Capability or Limitation: The Unicode configuration supports the use of SQL identifiers of any character set type in table and column names when they are entered and retrieved from Unicode-aware Neoview client applications such as Neoview DB Admin, Neoview Command Interface, and the Neoview JDBC driver. Neoview ODBC drivers return SQL identifiers and character data in the client locale character encoding. Data that is not compatible with the client locale character encoding is replaced with and displayed as replacement characters (question marks by default).

Capability or Limitation: Neoview ODBC drivers, which are not Unicode aware, automatically translate all client locale character encoding sent to the Neoview database to UCS2 encoding and convert retrieved UCS2 database encoding to the client locale character encoding.

Capability or Limitation: From a client workstation, Neoview ODBC driver-connected query applications display and return only columns and user data that are encoded in the current character set of the workstation.
Recommended User Action: Use a Neoview JDBC driver-connected query application to display data in all the encodings that are compatible with the current character set on the client workstation.
C Configuring Neoview Client Applications
The Neoview Transporter, Neoview Loader, Neoview ODBC drivers, and Neoview JDBC driver
each provide certain translation functions on client locale character encoding inserted into the
Neoview database and database encoding retrieved by the client workstations. These translations
ensure that client locale character data can be converted, stored in the Neoview database, retrieved,
and converted to the appropriate client locale character encoding without causing mismatch
errors.
This appendix describes how the translation functions behave and must be configured for these
Neoview client applications:
• “How Character Encoding Is Implemented in the Neoview Transporter Client” (page 49)
• “How Character Encoding Is Implemented in the Neoview Loader” (page 52)
• “How Character Encoding Is Implemented in the Neoview ODBC Driver for Windows”
(page 53)
• “How Character Encoding Is Implemented in the Neoview ODBC Drivers for UNIX”
(page 53)
• “How Character Encoding Is Implemented in the Neoview JDBC Driver” (page 54)
Neoview client applications such as Neoview DB Admin or Neoview Command Interface do
not need to be configured.
How Character Encoding Is Implemented in the Neoview Transporter Client
The Neoview Transporter uses the Transporter client, a high-speed Java-based data loader and
extractor, to load and extract data from the Neoview platform. The Transporter client application
manages load and extract operations from Windows, Linux, and HP-UX for Itanium platforms.
This section describes:
• “How Pass-Through Mode and UTF16 Conversion Are Implemented From the Transporter
Client” (page 49)
• “Encoding Data Sources” (page 50)
• “Encoding Control Files” (page 51)
• “Encoding Transporter Client Event and Log File Messages” (page 52)
For more information about the features and capabilities of the Neoview Transporter client for
this Neoview release, see “Neoview Transporter Client” (page 37).
How Pass-Through Mode and UTF16 Conversion Are Implemented From the
Transporter Client
Transporter client encoding of character data from client locales and the Neoview database
consists of these options:
• Whether or not to enforce pass-through mode for incoming character data
• Whether or not to enforce conversion of incoming character data to UTF16 for internal use
in the native Java String format.
Table C-1 (page 50) describes how pass-through mode and UTF16 conversion for the Transporter
client are implemented and interact. The Transporter client does not support ODBC data sources.
Table C-1 How Pass-Through Mode and UTF16 Conversion Are Implemented From the Neoview Transporter Client

How Pass-Through Mode Is Enabled and Disabled for the Transporter Client
The JDBC connectivity server communicates the current ISO_MAPPING value to the Transporter client so that it knows what character set to store in ISO88591 columns in each of the three configurations:
• If the value of ISO_MAPPING is ISO88591 (ISO88591 configuration), the Transporter client automatically enables pass-through mode. All client locale character data is stored in ISO88591 columns in its original client locale character encoding but treated as if it were encoded in ISO8859-1.
• If the value of ISO_MAPPING is SJIS (SJIS configuration), the Transporter client automatically disables pass-through mode. Client locale SJIS encoding is stored in ISO88591 columns in SJIS format.
• If the value of ISO_MAPPING is UTF8 (Unicode configuration), the Transporter client automatically disables pass-through mode. Client locale-encoded data is stored in ISO88591 columns in UTF8 format.

How to Enable and Disable UTF16 Conversion for Java Strings
Before they run the Transporter client, users can edit the properties stored in the NVTHOME/conf/nvt.properties file in the installation directory. Users can add the NVT.pass-through-mode system property value to enable or disable UTF16 conversion within the Transporter client:
• Set the NVT.pass-through-mode system property to TRUE to disable conversion of incoming character data to UTF16 for native Java Strings.
• Set the NVT.pass-through-mode system property to FALSE to enable Transporter client conversion of incoming character data to UTF16. This is the default.

Additional Guidelines
• In the ISO88591 configuration, the NVT.pass-through-mode system property must be set to TRUE to disable UTF16 conversion.
• In the SJIS or Unicode configuration, the NVT.pass-through-mode system property must be set to FALSE to enable UTF16 conversion. If you set it to TRUE in the SJIS or Unicode configuration, the Transporter client generates a warning to the log file and resets the system property to FALSE.
• If no value is specified for the NVT.pass-through-mode system property, it defaults to FALSE.

Summary of Character Translation Behaviors
When the Transporter client is enabled for pass-through mode, the Transporter client loads and extracts character data in its client locale character encoding. When pass-through mode is enabled, the Transporter client does not perform UTF16 translation for internal use.
When pass-through mode is disabled, the Transporter client first converts all incoming character data to UTF16, the native Java String format used internally.
• The Transporter client then converts character data bound for client locales from UTF16 to the specified client locale character encoding. If the encoding is specified in the Transporter client's control file, that encoding is used. If not, the encoding defaults to the character set that is currently configured for the target client locale.
• The Transporter client converts character data bound for ISO88591 columns in the Neoview database from UTF16 to the encoding required for the database. For example, character data for ISO88591 columns is encoded in ISO8859-1 for the ISO88591 configuration, SJIS for the SJIS configuration, and UTF8 for the Unicode configuration. The Transporter client always preserves UTF16-encoded character data bound for UCS2 columns in the UTF16 format.
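
The interaction between the ISO_MAPPING value and the NVT.pass-through-mode system property can be summarized as a small decision function. The Python sketch below only models the rules stated in Table C-1; it is not Transporter client code, and the function name and logger name are hypothetical.

```python
import logging

logger = logging.getLogger("transporter_sketch")  # hypothetical logger name


def utf16_conversion_enabled(iso_mapping, pass_through_property=None):
    """Model whether UTF16 conversion is active for a given configuration.

    iso_mapping:            "ISO88591", "SJIS", or "UTF8", as in Table C-1.
    pass_through_property:  the NVT.pass-through-mode value, or None if unset.
    """
    # The property defaults to FALSE when no value is specified.
    prop = False if pass_through_property is None else bool(pass_through_property)

    # TRUE is only honored in the ISO88591 configuration; otherwise the
    # table says a warning is logged and the property is reset to FALSE.
    if prop and iso_mapping in ("SJIS", "UTF8"):
        logger.warning(
            "pass-through mode TRUE is not valid for %s; resetting to FALSE",
            iso_mapping,
        )
        prop = False

    # UTF16 conversion is active exactly when the property ends up FALSE.
    return not prop
```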
Encoding Data Sources
Encoding for a data source or pipe source must be Java-compliant and can be specified in a global
option or a sources option of the control file. File and pipe sources that do not specify an encoding
at the source level inherit the encoding specified in the global options section. If an encoding is
not specified in the global options section or for the individual source, the default Java encoding
is used. Java determines the default encoding by the locale of the client machine.
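
The precedence among the source-level option, the global option, and the platform default can be sketched as follows. This is an illustration in Python rather than Java, so `locale.getpreferredencoding` is only an analogue of the default Java encoding; the function name and the dictionary layout of the options are assumptions made for the example.

```python
import locale


def resolve_source_encoding(source_options, global_options):
    """Pick the encoding for a file or pipe source.

    Precedence, as described above:
      1. the encoding given in the source's own options,
      2. the encoding given in the global options section,
      3. the platform default (here: Python's locale-based default).
    """
    if source_options.get("encoding"):
        return source_options["encoding"]
    if global_options.get("encoding"):
        return global_options["encoding"]
    return locale.getpreferredencoding(False)
```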
NOTE: A syntax error occurs if the encoding option is specified for a JDBC or JMS source.
On a load operation, the data file is read using the specified or default encoding and converted
to UTF16 Java strings, then encoded in the character set specified by ISO_MAPPING for ISO88591
columns or retained in UTF16 encoding for UCS2 columns. On an extract job, the reverse actions
occur. Data is extracted from the Neoview database and converted from its database encoding
into UTF16 Java strings. Those strings are then encoded using the encoding specified in the
control file or, if not specified there, by the default encoding and written to the target source.
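
The load and extract paths described above amount to a decode step followed by an encode step. The following Python sketch models them under several assumptions: Python's iso-8859-1, cp932, and utf-8 codecs stand in for the three ISO_MAPPING values, UCS2 columns are modeled as big-endian UTF16 (the byte order is an assumption), and the function names are hypothetical.

```python
# Assumed codec for ISO88591 columns in each configuration.
ISO_MAPPING_CODECS = {
    "ISO88591": "iso-8859-1",
    "SJIS": "cp932",
    "UTF8": "utf-8",
}


def encode_for_load(raw, source_encoding, iso_mapping, column_charset):
    """Data file bytes -> string -> bytes for the target column."""
    text = raw.decode(source_encoding)          # read with the source encoding
    if column_charset == "UCS2":
        return text.encode("utf-16-be")         # UCS2 columns keep UTF16
    return text.encode(ISO_MAPPING_CODECS[iso_mapping])


def decode_for_extract(stored, iso_mapping, column_charset, target_encoding):
    """Column bytes -> string -> bytes in the target file's encoding."""
    if column_charset == "UCS2":
        text = stored.decode("utf-16-be")
    else:
        text = stored.decode(ISO_MAPPING_CODECS[iso_mapping])
    return text.encode(target_encoding)
```

For example, a UTF-8 data file loaded into an ISO88591 column in the SJIS configuration would be re-encoded as SJIS, and extracting it again with a UTF-8 target encoding would reverse the conversion.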
You can control how encoding and decoding errors are handled when user data is loaded. The
NVT.encoding-error-disposition system property controls how unmappable or malformed
characters are handled. Allowed property values are REPLACE, REPORT and IGNORE, all of
which are case-insensitive. The default is REPORT, which means the record containing the
characters that cannot be encoded is rejected as a bad record. REPLACE replaces the offending
character with a replacement character that defaults to a question mark or is specified by the
NVT.encoding-error-replacementString system property. IGNORE causes the offending
character to be skipped over and the process continues with the next character.
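
The three dispositions have close analogues in the error handlers of Python's codecs, which makes their effect easy to try out. The sketch below is an analogy only: it does not read NVT.encoding-error-disposition, the function name is hypothetical, and returning None is merely a stand-in for rejecting a record as bad.

```python
# Assumed correspondence between the property values and Python error handlers.
DISPOSITION_TO_ERRORS = {
    "REPORT": "strict",    # fail on the first unmappable character
    "REPLACE": "replace",  # substitute a replacement character ("?")
    "IGNORE": "ignore",    # skip the character and continue
}


def encode_record(text, encoding, disposition="REPORT"):
    """Encode one record, or return None if the record would be rejected."""
    errors = DISPOSITION_TO_ERRORS[disposition.upper()]  # case-insensitive
    try:
        return text.encode(encoding, errors=errors)
    except UnicodeEncodeError:
        return None
```

With a record such as "café €" and a target encoding of ISO8859-1, the euro sign is the unmappable character: REPORT rejects the record, REPLACE substitutes a question mark, and IGNORE drops the character.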
Control File Option Syntax
[ encoding = "encoding" ]
Where encoding specifies any valid Java character set encoding.
Control File Example
options {
# encoding to use if not specified by data source
encoding = "UTF-8",
truncate = "true"
.
.
.
}
sources {
# encoding overrides the UTF-8 specified in the
# global options section
ex_file_1 file "/data/ex_file_1" options (encoding =
"SJIS"),
# encoding is UTF-8 as specified in global options
ex_file_2 pipe "./data-files/test_data_FSR030-pipe"
.
.
.
}
Encoding Control Files
A control file is a text file that instructs the Transporter client how you want your data moved
from source to target for loading or extracting purposes.
Control file characters are encoded in UTF8. UTF8 supports existing control files and allows
non-ASCII characters to be used in newly-created control files.
These areas of the control file can contain non-ASCII values:
• Filenames (source and include)
• SQL identifiers (schema, table, column/field)
• String literals in SQL statements
• Datasource name
• Field delimiter
• Nullstring
• JMS URL
• Startseq
• endseq
• Comments
• Named control file elements including type format, data format, map, source, and job
When the Transporter client operates in pass-through mode, incoming data is recognized and
managed as single-byte containers, not as distinct and separate characters. Consequently, field
delimiters, nullstring, startseq, and endseq values should always be limited to single-byte
characters that will not be mistaken for the second byte of multibyte character data. Valid values
are the first 64 characters of the ASCII character set, including the 31 control characters, the
numbers 0 through 9, and the 21 punctuation mark characters that end with the question mark.
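
For MS932 data, the reason this range is safe can be checked directly: no second byte of a double-byte character falls below 0x40. The Python sketch below assumes the cp932 codec is a faithful stand-in for MS932 and covers only that encoding; other multibyte encodings would need their own check. The function names are hypothetical.

```python
def ms932_trail_bytes():
    """Collect every byte value that occurs as the second byte of a
    double-byte character accepted by Python's cp932 codec."""
    trail = set()
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    for lead in lead_bytes:
        for second in range(0x100):
            try:
                decoded = bytes([lead, second]).decode("cp932")
            except UnicodeDecodeError:
                continue  # not a valid double-byte sequence
            if len(decoded) == 1:
                trail.add(second)
    return trail


def is_safe_delimiter(value):
    """Apply the rule from the text: only bytes from the first 64 ASCII
    characters (0x00 through 0x3F) are allowed."""
    try:
        raw = value.encode("ascii")
    except UnicodeEncodeError:
        return False
    return len(raw) > 0 and all(b < 0x40 for b in raw)
```

A delimiter such as a comma passes the check, while a vertical bar (0x7C) does not, because 0x7C can also appear as the second byte of an MS932 character.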
For more information about control file organization and syntax, see the Neoview Transporter
User Guide.
Encoding Transporter Client Event and Log File Messages
Event messages that are generated by the Transporter client and sent to the Neoview platform
are always encoded in UTF8.
Messages written to the log file are logged, by default, in the user's default encoding. Any character
that cannot be mapped is replaced, by default, with a question mark.
You can configure the encoding of the log file by editing the log4j.properties file located
in the $NVTHOME/conf directory. For example, adding this line to the file makes the log file
encoding UTF8:
log4j.appender.A1.encoding=UTF-8
It is useful to set the log file encoding to UTF8 when you are working with several different data
file encodings and the default client encoding is not UTF8. Using UTF8 prevents any characters
from being replaced by question marks.
If you change the encoding of the log file and the log file already exists, you should remove or
rename the file because log files that contain more than one encoding might not display correctly.
Messages that are written to the console that runs the Transporter client are always encoded in
the default character set of that machine. Characters that cannot be mapped are replaced, by
default, with question marks. Every message displayed on the Transporter client console is also
present in the log file.
How Character Encoding Is Implemented in the Neoview Loader
• Because the Neoview Loader does not perform character set translation (its pass-through flag is always set to ON), the character data in any input file for the Neoview Loader must use the same encoding as is required for database encoding.
• You must enable the loader's pass-through mode flag to load UTF8 files into ISO88591 columns.
Table C-2 (page 52) provides guidelines for setting the value for the cSetConversion (-cc)
argument from gcmd for each of the Neoview character set configurations. A value of Y specifies
character set conversion. A value of N specifies no character set conversion.
Table C-2 Setting the Conversion of Input Data for the Neoview Loader
ISO88591 Configuration: Set the value of cSetConversion to N for input data received when the ISO88591 configuration is used.
SJIS Configuration: Set the value of cSetConversion to N for SJIS-encoded input data.
Unicode Configuration:
• Set the value of cSetConversion to N for UTF8-encoded input data sent to UCS2 columns.
• Set the value of cSetConversion to Y for UTF8-encoded input data sent to ISO88591 columns.
For information about input file encoding features of the Neoview Loader for this Neoview
release, see “Neoview Loader” (page 36).
How Character Encoding Is Implemented in the Neoview ODBC Driver
for Windows
The Neoview ODBC driver for Windows loads a separate translation DLL to perform translations
on character data sent through the driver.
Table C-3 (page 53) describes the translation behavior of the Neoview ODBC driver for Windows
for the three Neoview character set configurations.
Table C-3 Character Set Translation Behavior of the Neoview ODBC Driver for Windows
ISO88591 Configuration: Pass-through mode is enabled and translation is not active.
SJIS Configuration: Pass-through mode is always disabled and translation is active.
Unicode Configuration: Pass-through mode is always disabled and translation is active.
How Character Encoding Is Implemented in the Neoview ODBC Drivers
for UNIX
The Neoview ODBC drivers for UNIX use the Neoview ODBC driver translation, which is bound
to the driver, to perform translations on character data sent through the driver.
Table C-4 (page 53) describes the translation behavior of the Neoview ODBC drivers for UNIX
for the three Neoview character set configurations.
Table C-4 Character Set Translation Behavior of the Neoview ODBC Drivers for UNIX
ISO88591 Configuration: Pass-through mode is enabled and translation is not active.
SJIS Configuration: Pass-through mode is always disabled and translation is active.
Unicode Configuration: Pass-through mode is always disabled and translation is active.
Table C-5 (page 53) identifies and describes the attributes used to configure character set
translation for the Neoview ODBC drivers for UNIX.
Table C-5 Attribute Values Used by the Neoview ODBC Drivers for UNIX for a Sample DSN Configuration
Attributes                     Values
Description                    = Data Source for Charset Support
Catalog                        = NEO
Schema                         = example-schema-name
DataLang                       = 0
ReplacementCharacter           = ?
FetchBufferSize                = SYSTEM_DEFAULT
Server                         = TCP: example-IP-address
SQL_ATTR_CONNECTION_TIMEOUT    = SYSTEM_DEFAULT
SQL_LOGIN_TIMEOUT              = SYSTEM_DEFAULT
SQL_QUERY_TIMEOUT              = NO_TIMEOUT
The DataLang attribute is used for translation. Always set DataLang to 0 to use the default
locale setting from the UNIX environment variables.
How Character Encoding Is Implemented in the Neoview JDBC Driver
The Neoview JDBC driver, which uses the Java runtime environment to perform character set
translation, automatically enables or disables translation between client locale and database
encoding based on the Neoview character set configuration of the Neoview platform.
Table C-6 (page 54) describes the translation behavior of the Neoview JDBC driver for the three
Neoview character set configurations.
Table C-6 Character Set Translation Behavior of the Neoview JDBC Driver
ISO88591 Configuration: Pass-through mode is enabled and translation is not active.
SJIS Configuration: Pass-through mode is always disabled and translation is active.
Unicode Configuration: Pass-through mode is always disabled and translation is active.
NOTE: Even when pass-through mode is enabled in the ISO88591 configuration, all character
data received from client locales and the Neoview database is automatically converted to UTF16
for use with Java String objects by the Neoview JDBC driver, then converted back to its original
encoding on its way to the Neoview database or a client locale.
You can use the replacementString property to specify the string used for character
replacement during character set decoding errors. The replacementString property should
be used only when reading data from the server. Malformed data is rejected when an attempt
is made to insert it. The default value is a question mark.
D Neoview ODBC and JDBC Driver Mappings of Character
Sets and Language IDs
This appendix provides information about the language ID values that map to the client locale
character sets supported by the Neoview ODBC drivers and Neoview JDBC driver.
The Language attribute used on the client side of the Neoview platform can take one of several
values. The default value for the character set is SYSTEM_DEFAULT. For this Neoview release,
users cannot specify other values for these character sets.
Mapping Information for the Neoview ODBC Driver for Windows
Table D-1 (page 55) shows the character sets, their associated language IDs, and the language
names that can be used by the Neoview ODBC driver for Windows. If the Language ID values
shown are not present on your Neoview ODBC driver for Windows, the driver uses a different
mapping.
Table D-1 Character Set and Language ID Mappings for the Neoview ODBC Driver for Windows
Character Set   Language ID (hexadecimal)   Language ID (decimal)   Language Name
BIG5            0x0404                      1028                    Chinese traditional
GBK             0x0804                      2052                    Chinese simplified (PRC)
ISO8859-1       0x0409                      1033                    English
SJIS            0x0411                      1041                    Japanese
KSC             0x0412                      1042                    Korean
If there is no match, the Neoview ODBC driver for Windows assumes that the character set is
ISO8859-1.
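
The lookup in Table D-1, including the fallback to ISO8859-1, can be expressed as a dictionary with a default. This Python sketch only restates the table; it is not driver code, and the names are hypothetical.

```python
# Language ID -> character set, copied from Table D-1.
LANGUAGE_ID_TO_CHARSET = {
    0x0404: "BIG5",       # 1028, Chinese traditional
    0x0804: "GBK",        # 2052, Chinese simplified (PRC)
    0x0409: "ISO8859-1",  # 1033, English
    0x0411: "SJIS",       # 1041, Japanese
    0x0412: "KSC",        # 1042, Korean
}


def charset_for_language_id(language_id):
    """Return the character set for a Windows language ID.

    Falls back to ISO8859-1 when the ID is not in the table."""
    return LANGUAGE_ID_TO_CHARSET.get(language_id, "ISO8859-1")
```

Because the keys are integers, the hexadecimal and decimal forms in the table are interchangeable: `charset_for_language_id(0x0411)` and `charset_for_language_id(1041)` refer to the same entry.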
Mapping Information for the Neoview ODBC Drivers for UNIX
If the character set data source attribute for the Neoview ODBC drivers for UNIX is
SYSTEM_DEFAULT, Linux checks the following environment variables, in this order, to identify
the client locale:
1. LC_ALL
2. LC_CTYPE
3. LANG
If none of these variables is defined, ISO8859-1 is used as the default character set.
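The lookup order above can be sketched as a small function. The names are hypothetical, and treating an empty value as undefined is an assumption (it follows the usual POSIX locale convention, not a statement in this guide).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the LC_ALL, LC_CTYPE, LANG lookup order.
final class ClientLocaleLookupSketch {

    private static final String[] LOOKUP_ORDER = {"LC_ALL", "LC_CTYPE", "LANG"};

    // Returns the first defined variable's value, or empty if none is defined.
    // Assumption: an empty string counts as "not defined".
    static Optional<String> resolveLocale(Map<String, String> env) {
        for (String name : LOOKUP_ORDER) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // With the real process environment:
        Optional<String> locale = resolveLocale(System.getenv());
        System.out.println(locale.orElse("(none defined: default character set is ISO8859-1)"));
    }
}
```

Passing the environment in as a map keeps the function easy to exercise with sample values, for example `Map.of("LANG", "ja_JP.SJIS")`.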
Table D-2 (page 55) shows the mappings between character set values and language name values.
Table D-2 Character Set and Language Name Mappings for the Neoview ODBC Drivers for UNIX
Character Set   Language Name
ISO8859-1       C
ISO8859-1       iso88591
SJIS            SJIS
EUC_JP          eucJP
UTF8            utf8 or UTF-8
GBK             gbk, gb18030, gb2312, or hp15CN
BIG5            big5
Because there is no Microsoft driver manager on UNIX systems, the Neoview ODBC drivers for
UNIX take as input any character that is sent “as is” by the client application.
If your language settings match any of those listed in Table D-2 (page 55), the Neoview ODBC
drivers for UNIX perform the required translations. If the language settings do not match, the
driver uses pass-through mode, meaning that all character data is sent to the server “as is.”
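The match-or-pass-through decision can be sketched as follows. The class and method names are hypothetical. How the driver derives the language name from a locale string is not stated in this guide; the sketch assumes that the name is the codeset part after the period (for example, SJIS in ja_JP.SJIS) and that matching is exact and case-sensitive.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the Table D-2 lookup with pass-through fallback.
final class UnixLanguageNameSketch {

    // Language name to character set, as listed in Table D-2.
    private static final Map<String, String> CHARSET_BY_LANGUAGE_NAME = Map.ofEntries(
            Map.entry("C", "ISO8859-1"),
            Map.entry("iso88591", "ISO8859-1"),
            Map.entry("SJIS", "SJIS"),
            Map.entry("eucJP", "EUC_JP"),
            Map.entry("utf8", "UTF8"),
            Map.entry("UTF-8", "UTF8"),
            Map.entry("gbk", "GBK"),
            Map.entry("gb18030", "GBK"),
            Map.entry("gb2312", "GBK"),
            Map.entry("hp15CN", "GBK"),
            Map.entry("big5", "BIG5"));

    // Returns the character set to translate from, or empty for pass-through mode.
    static Optional<String> charsetFor(String locale) {
        String name = locale;
        int dot = name.indexOf('.');
        if (dot >= 0) {
            name = name.substring(dot + 1);   // assumption: keep the codeset part only
        }
        int at = name.indexOf('@');
        if (at >= 0) {
            name = name.substring(0, at);     // assumption: drop any @modifier
        }
        return Optional.ofNullable(CHARSET_BY_LANGUAGE_NAME.get(name));
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("ja_JP.SJIS").orElse("pass-through"));        // prints SJIS
        System.out.println(charsetFor("fr_FR.ISO8859-15").orElse("pass-through"));  // prints pass-through
    }
}
```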
Mapping Information for the Neoview JDBC Driver
Table D-3 (page 56) identifies the string values that the Neoview JDBC driver uses to call Java
conversion routines for the supported client locale character sets.
Table D-3 Mapping Information for the Neoview JDBC Driver
Character Set   String Value
ISO8859-1       IS8859_1
SJIS            MS932
UCS2            UTF16-BE
EUC-JP          EUCJP
Big5            MS950
GB18030         GB18030
UTF8            UTF-8
MB_KSC5601      MS949
GB2312          GB2312
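Calling a Java conversion routine by string value can be sketched with the standard java.nio.charset.Charset class. This is an illustration under assumptions, not the driver's code: the class and method names are hypothetical, and the example uses UTF-8 because every Java runtime supports it. Whether names such as MS932, MS950, or MS949 are available depends on the Java runtime; Charset.isSupported can be used to check.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical illustration: client bytes -> Java String (UTF-16 internally) -> client bytes.
final class NamedCharsetRoundTripSketch {

    // Charset.forName throws UnsupportedCharsetException if the runtime
    // does not provide a charset with the given name.
    static String toJavaString(byte[] clientBytes, String charsetName) {
        return new String(clientBytes, Charset.forName(charsetName));
    }

    static byte[] toClientBytes(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xC3, (byte) 0xA9}; // U+00E9 encoded in UTF-8
        String text = toJavaString(original, "UTF-8");
        byte[] back = toClientBytes(text, "UTF-8");
        // Two bytes decode to one Java char and encode back to the same two bytes.
        System.out.println(text.length() + " " + Arrays.equals(original, back)); // prints 1 true
    }
}
```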
Glossary
character set
A mapping of characters to code point values.
client locale
In the context of the Neoview Character Sets feature, the character set used by a client.
compatible character sets
Two or more character sets are compatible when every character in one character set can be
successfully mapped to a character in the other character set, although not necessarily with the
same code point values. If any of the characters do not map to the other character sets, those
character sets are incompatible.
MS932
The Microsoft codepage 932 version of SJIS that is required for the SJIS configuration.
Index
C
Capabilities and limitations
    multiple client locales in Unicode configuration, 47
    Neoview Command Interface, 35
    Neoview DB Admin, 35
    Neoview Loader, 36
    Neoview Manageability Repository, 37
    Neoview Management Dashboard Client, 36
    Neoview Transporter, 37
    Workload Management Services, 38
Character set column definitions, 13
Client locale character encoding
    overview, 15
Compatibility between drivers and Neoview database, 19
Compatible client locales, 15
Configuring
    JDBC driver, 54
    Neoview Loader, 52
    Neoview Transporter, 49
    ODBC driver for Windows, 53
    ODBC drivers for UNIX, 53
D
Database character encoding, 15
Documents, related information, 10
I
Introduction to the Neoview Character Sets feature, 13
L
LIKE predicate in SJIS and Unicode configurations, 31
Locating invalid characters in syntax error messages
    example for multi-line SQL statement, 34
    example for smaller SQL statement, 33
    overview, 33
M
Mapping information
    for the JDBC driver, 56
    for the ODBC driver for Windows, 55
Mapping tables, 45
Multiple client locales, 15
N
Neoview character set configurations
    criteria for selecting the correct configuration, 21
    features and behaviors, 16
    implementing for existing customers, 22
    introduction, 16
Neoview Character Sets
    introduction, 13
    mapping tables, 45
    rules for using SQL language elements, 23
    troubleshooting guidelines, 39
Neoview Command Interface, 35
Neoview DB Admin, 35
Neoview Loader
    capabilities and limitations, 36
    configuring for character set conversion, 52
Neoview Manageability Repository, 37
Neoview Management Dashboard Client, 36
Neoview Transporter
    capabilities and limitations, 37
    specifying data source encoding, 49
R
Rules for using SQL language elements, 23
S
SJIS character mismatches
    how to prevent SQL-side mismatches, 44
    overview, 42
    SQL-side mismatch examples, 43
    SQL-side mismatch scenarios, 43
SQL functions, 26
SQL identifiers
    capabilities and limitations, 23
SQL string functions, 27
Syntax error messages, locating invalid characters, 33
T
Troubleshooting guidelines, 39
W
Workload Management Services, 38