Sybase Technical Library - Product Manuals Home

Open ServerConnect Programmer's Reference for COBOL


Processing Japanese Client Requests

Note: The Japanese Conversion Module (JCM) is available for CICS only. If you are not using the JCM, you can skip this section.

The Japanese Conversion Module

Open ServerConnect can accept and process client requests written in Japanese if you have the JCM installed. The JCM is provided on a separate tape. It does the workstation-to-mainframe-to-workstation translations necessary to process requests containing Japanese characters.

Customization

The Open ServerConnect environment must be customized to process Japanese requests. A system programmer customizes your environment when Open ServerConnect is installed. Open ServerConnect loads the customization module when TDINIT is called.

Customization information includes client login information from the client login packet that TRS forwards to the mainframe along with the client request. Among the client information contained in the login packet is the name of the client character set. See "The Login Packet" for details.

The following options are set during customization:

If the native language is Japanese, TDINIT loads the JCM.

An Open ServerConnect program can retrieve customization information with the function TDGETUSR.

How the JCM Works

Once the JCM is loaded, it gets control whenever an Open ServerConnect program receives a client request containing TDSCHAR or TDSVARYCHAR data. TDSCHAR and TDSVARYCHAR are the datatypes used to represent Japanese characters in workstation character sets. The JCM converts the workstation Japanese characters to the character set used on the mainframe. Once mainframe processing is completed, the JCM converts results back to the original workstation character set before returning them to the client.

The Translate Tables

The JCM uses translate tables to convert workstation characters to mainframe characters.

When an Open ServerConnect program receives a client request in Japanese that contains character datatypes, it gives control to the JCM. The JCM looks up the client character set in the translate tables.

Japanese Character Sets

Different brands of workstations use different character sets to represent double-byte characters. See "Character Sets" to learn what single-byte and double-byte character sets are supported on the workstation and at the mainframe.

Differences Among Japanese Character Sets

Each character set used to handle Japanese characters has its own way of representing kanji or hankaku katakana characters and specifying lengths for Japanese character strings. While most of the differences are handled by the JCM, you need to understand a few of these differences in order to specify field lengths correctly. These differences are discussed in this section.

See Table 2-14: Length requirements in Japanese character sets and Table 2-15: Length-settings in Japanese character set conversions for information on character set differences in tabular form.

Datatypes Used with Japanese Characters

At the workstation, Japanese characters can be used with the TDSCHAR and TDSVARYCHAR datatypes.

At the mainframe, Japanese characters can be used with the TDSCHAR, TDSVARYCHAR, TDSGRAPHIC, and TDSVARYGRAPHIC datatypes.

Kanji Datatypes

Kanji characters always occupy 2 bytes.

Hankaku Katakana Datatypes

Hankaku katakana characters are always represented as single-byte character-type data with datatypes of TDSCHAR or TDSVARYCHAR.

Kanji String Lengths

Kanji characters are represented as character-type data at the workstation, and as either character-type or graphic-type data at the mainframe. The length of a Japanese character string depends on which workstation is being used and whether the datatype is graphic or character.

Some character sets use a special indicator or code in character-type strings to announce that the following series of characters are double-byte characters. With kanji, this indicator is called a Shift Out (SO) code. An SO code marks the beginning of a double-byte kanji string. The end of the kanji string is marked by a Shift In (SI) code.

When setting field lengths for Japanese character strings, you must include room for these SO/SI codes.

When sending data from a mainframe to a workstation, you can replace SO/SI codes with blanks by calling the Gateway-Library function TDSETSOI before receiving or sending data.

Graphic datatypes do not use SO/SI codes.

WARNING! When receiving data from a workstation character set that does not use SO/SI codes, IBM_Kanji always inserts the SO/SI codes at the beginning and end of double-byte character strings. If the field length specification does not take this into account, and the length is just long enough for the data itself, some of the data is lost.

If a field contains mixed single-byte and double-byte data in more than one kanji string, an SO/SI pair exists for each kanji string.
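The SO/SI length rule described above can be sketched as a small calculation. This is only an illustration in Python, not part of Gateway-Library: the function name `ibm_kanji_char_length` and the `runs` description of a field's contents are hypothetical, introduced here to show how each kanji string adds two bytes (one SO, one SI) to an IBM Kanji character-type field.

```python
def ibm_kanji_char_length(runs):
    """Return the bytes needed for an IBM Kanji character-type field.

    `runs` is a hypothetical description of the field contents: a list of
    ("sbcs", n) or ("dbcs", n) tuples, where n is a character count and
    each "dbcs" entry is one contiguous kanji string.
    """
    total = 0
    for kind, count in runs:
        if kind == "dbcs":
            # 2 bytes per kanji, plus one SO and one SI byte per kanji string
            total += 2 * count + 2
        elif kind == "sbcs":
            # single-byte characters occupy one byte each
            total += count
        else:
            raise ValueError(f"unknown run kind: {kind!r}")
    return total


# One string of 4 kanji: 8 data bytes + SO + SI
single = ibm_kanji_char_length([("dbcs", 4)])

# Mixed field with two kanji strings: each string carries its own SO/SI pair
mixed = ibm_kanji_char_length(
    [("sbcs", 3), ("dbcs", 2), ("sbcs", 1), ("dbcs", 5)]
)
print(single, mixed)
```

A field sized for the data alone (8 bytes in the first case) would be two bytes too short once the SO/SI codes are inserted.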

At the mainframe, the length of graphic-type strings is counted in double-byte (16-bit) characters. Thus, a string of 10 kanji characters has a length of 10.

At the workstation, the length of kanji character strings is counted in bytes. Thus, a string of 10 kanji characters has a length of 20.

Hankaku Katakana String Lengths

The length of a hankaku katakana string is always represented in bytes, at both the workstation and the mainframe. A hankaku katakana character occupies one byte, except in eucjis.

The eucjis hankaku katakana character set uses an indicator (SS2) in character-type strings to announce that the next byte is occupied by a hankaku katakana. The SS2 indicator occupies one byte, and the hankaku katakana itself occupies one byte. As a result, the total length of each eucjis hankaku katakana character is two bytes.
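The byte counts above can be observed with Python's standard `euc_jp` and `shift_jis` codecs. This is a sketch only: it assumes those codecs are reasonable stand-ins for the workstation character sets this manual calls eucjis and Shift-JIS, which the manual itself does not confirm. The sample strings and the `byte_length` helper are illustrative.

```python
SS2 = 0x8E  # single-shift-2 byte that precedes each half-width katakana in EUC-JP

# Four hankaku (half-width) katakana, written as escapes: U+FF76 U+FF80 U+FF76 U+FF85
HANKAKU = "\uff76\uff80\uff76\uff85"
# Two kanji: U+6F22 U+5B57
KANJI = "\u6f22\u5b57"


def byte_length(text, codec):
    """Length in bytes of `text` when encoded with the named Python codec."""
    return len(text.encode(codec))


euc_hankaku = HANKAKU.encode("euc_jp")

# Every hankaku katakana is SS2 + 1 data byte in EUC-JP, 1 byte in Shift-JIS
print(byte_length(HANKAKU, "euc_jp"), byte_length(HANKAKU, "shift_jis"))
# Kanji occupy 2 bytes each in both workstation character sets
print(byte_length(KANJI, "euc_jp"), byte_length(KANJI, "shift_jis"))
# The even-numbered byte positions of the EUC-JP string hold the SS2 indicators
print(all(b == SS2 for b in euc_hankaku[0::2]))
```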

Summary of Datatypes Used with Japanese Characters

The following datatypes are used with Japanese characters:

Table 2-13: Datatypes used with Japanese characters

| Datatype | Used With | Uses SO/SI or SS2 | Length Measures |
|---|---|---|---|
| TDSCHAR, TDSVARYCHAR | DBCS and SBCS. At the workstation and at the mainframe. | IBM Kanji: Uses SO/SI with double-byte characters. EUC-JIS: Uses SS2 with hankaku katakana. | For all character sets: Number of bytes. Maximum length for TDSCHAR and TDSVARYCHAR is 255. |
| TDSGRAPHIC, TDSVARYGRAPHIC | DBCS only. At mainframe only. | No. | Number of characters. Maximum length is 127. |

Length Considerations

When converting from a workstation Japanese character set to a mainframe Japanese character set, you frequently need to adjust the length. The adjustment depends on which character sets, datatypes, and language are being used.

Character Set Length Requirements

The following table describes how Japanese characters are represented in supported character sets, and how their lengths are affected.

Table 2-14: Length requirements in Japanese character sets

| Character Set | SBCS or DBCS | Datatype | Length Considerations | Example |
|---|---|---|---|---|
| EUC-JIS | DBCS (hankaku katakana) | character | Each 1-byte hankaku katakana character is preceded by a 1-byte SS2 indicator. As a result, each eucjis hankaku katakana character has a length of 2: the SS2 indicator and the hankaku katakana itself. | A string of 4 hankaku katakana occupies 8 bytes and has a length of 8. |
| EUC-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| Shift-JIS | SBCS (hankaku katakana) | character | Each hankaku katakana character is 1 byte long and has a length of 1. Shift-JIS hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |
| Shift-JIS | DBCS (kanji) | character | Each kanji character is 2 bytes long and has a length of 2. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 8 bytes and has a length of 8. |
| IBM Kanji kanji | DBCS | character | Each kanji character is 2 bytes long and has a length of 2. Each kanji string is preceded by a Shift Out indicator and followed by a Shift In indicator, adding two to the length of each kanji string. Kanji and single-byte alphabetic characters can be mixed. When converting mixed strings from IBM Kanji to workstation kanji, double the length to be safe. | A string of 4 kanji occupies 10 bytes and has a length of 10 (8 bytes for the data and 2 bytes for the SO/SI codes). |
| IBM Kanji kanji | DBCS | graphic | Each kanji character is a double-byte character and has a length of 1. There are no SO/SI indicators with graphic data. | A string of 4 kanji occupies 8 bytes and has a length of 4. |
| IBM Kanji hankaku katakana | SBCS | character | Each hankaku katakana character is 1 byte long and has a length of 1. IBM Kanji hankaku katakana does not use SS2 indicators. | A string of 4 hankaku katakana occupies 4 bytes and has a length of 4. |

Examples of Length Settings in Conversions

Table 2-15 illustrates the length adjustments required for some Japanese character set conversions, both from workstation to mainframe and from mainframe to workstation. Each example uses a string of four characters.

Table 2-15: Length-settings in Japanese character set conversions

| Source Character Set | Source Datatypes | Source Length | Target Character Set | Target Datatypes | Target Length |
|---|---|---|---|---|---|
| EUC-JIS hankaku katakana | character | 8 | IBM Kanji hankaku katakana | character | 4 |
| EUC-JIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| EUC-JIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| Shift-JIS hankaku katakana | character | 4 | IBM Kanji hankaku katakana | character | 4 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | character | 10 |
| Shift-JIS kanji | character | 8 | IBM Kanji kanji | graphic | 4 |
| IBM Kanji hankaku katakana | character | 4 | EUC-JIS hankaku katakana | character | 8 |
| IBM Kanji hankaku katakana | character | 4 | Shift-JIS hankaku katakana | character | 4 |
| IBM Kanji kanji | character | 10 | EUC-JIS kanji | character | 8 |
| IBM Kanji kanji | character | 10 | Shift-JIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | EUC-JIS kanji | character | 8 |
| IBM Kanji kanji | graphic | 4 | Shift-JIS kanji | character | 8 |
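The length adjustments in Table 2-15 follow from the per-character rules in Table 2-14, and can be sketched as a lookup plus a little arithmetic. The Python below is an illustration only: the `RULES` dictionary, the `field_length` and `target_length` functions, and the tuple keys are hypothetical names, and the sketch assumes each field holds a single contiguous string with no mixed single-byte data.

```python
# (character set, kind, datatype) -> (length units per character,
#                                     overhead per string for SO/SI)
RULES = {
    ("EUC-JIS", "hankaku", "character"): (2, 0),    # SS2 + 1 data byte
    ("EUC-JIS", "kanji", "character"): (2, 0),
    ("Shift-JIS", "hankaku", "character"): (1, 0),
    ("Shift-JIS", "kanji", "character"): (2, 0),
    ("IBM Kanji", "hankaku", "character"): (1, 0),
    ("IBM Kanji", "kanji", "character"): (2, 2),    # SO + SI around the string
    ("IBM Kanji", "kanji", "graphic"): (1, 0),      # counted in characters
}


def field_length(charset, kind, datatype, chars):
    """Length to declare for `chars` characters held as one contiguous string."""
    per_char, overhead = RULES[(charset, kind, datatype)]
    return per_char * chars + overhead


def target_length(source, source_length, target):
    """Convert a source length into the matching target length.

    `source` and `target` are keys of RULES.
    """
    per_char, overhead = RULES[source]
    chars = (source_length - overhead) // per_char
    return field_length(*target, chars)


# EUC-JIS kanji, length 8, converted to IBM Kanji kanji character data
print(target_length(("EUC-JIS", "kanji", "character"), 8,
                    ("IBM Kanji", "kanji", "character")))
```

For fields that mix single-byte and double-byte data, or that contain more than one kanji string, the overhead must be counted once per kanji string, so this sketch would underestimate the length.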

Lengths in Conversions

Because converting between Japanese character sets can lengthen or shorten a string, Gateway-Library provides the TDSETSOI function, which specifies whether SO/SI indicators are stripped or replaced with blanks.

When converting from a character set that uses SO/SI indicators to one that does not (for example, converting CHAR data from IBM Kanji to Shift-JIS kanji), you can use TDSETSOI to specify whether the SO/SI indicators are stripped or whether they are replaced with embedded blanks. When replaced with embedded blanks, the length does not change. When stripped, the length is reduced by two bytes for each kanji string.

If no strip option is set, the JCM automatically strips SO/SI indicators.

When TDSETSOI replaces SO/SI indicators with blanks, the blanks are positioned at the end of the field. For example, in an IBM Kanji CHAR field that contains four kanji, the first byte contains the SO indicator, and the tenth byte contains the SI indicator. After conversion to Shift-JIS kanji, the first eight bytes are occupied by kanji, and the blanks occupy bytes nine and ten.
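The two outcomes described above can be simulated on a byte string. The Python below is a sketch of the resulting field layout only. It does not call TDSETSOI, whose arguments are not shown in this section, and it does not translate any characters. The function name `remove_so_si`, the placeholder data bytes, and the use of an ASCII blank (X'20') are assumptions made for illustration. SO and SI are the standard EBCDIC control codes X'0E' and X'0F'.

```python
SO = 0x0E     # Shift Out: start of a double-byte string
SI = 0x0F     # Shift In: end of a double-byte string
BLANK = 0x20  # assumed workstation blank (ASCII space)


def remove_so_si(field: bytes, pad: bool) -> bytes:
    """Drop SO/SI bytes; optionally pad with trailing blanks to keep the length.

    Simplification: filters by byte value, which assumes no data byte
    equals X'0E' or X'0F'.
    """
    data = bytes(b for b in field if b not in (SO, SI))
    if pad:
        # Blanks go at the end of the field, so the overall length is unchanged
        data += bytes([BLANK]) * (len(field) - len(data))
    return data


# Four placeholder double-byte characters wrapped in SO ... SI (10 bytes)
ibm_field = bytes([SO]) + b"\x45\x41" * 4 + bytes([SI])

stripped = remove_so_si(ibm_field, pad=False)  # length shrinks by 2
padded = remove_so_si(ibm_field, pad=True)     # length stays at 10
print(len(ibm_field), len(stripped), len(padded))
```

With padding, the eight data bytes occupy positions one through eight and the blanks occupy positions nine and ten, matching the example above.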

By using TDSETSOI judiciously, you can minimize the length changes and calculations needed in Open ServerConnect programs. See "TDSETSOI" for details.

See "TDGETSOI" for information about how to query the SO/SI processing settings for a column or parameter.

