The Online Encyclopedia and Dictionary







UTF-EBCDIC is an encoding of Unicode that is meant to be EBCDIC-friendly, so that some older EBCDIC applications can handle some Unicode data. It offers existing EBCDIC-based systems advantages similar to those UTF-8 offers existing ASCII-based systems. UTF-EBCDIC is defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoding of a series of Unicode code points, a modified UTF-8 encoding (known in the specification as UTF-8-Mod) is applied first. The main difference between this encoding and UTF-8 is that it allows code points U+0080 through U+009F (which map to EBCDIC control codes) to be represented as a single byte. To achieve this, the later bytes in a multi-byte sequence use the format 101XXXXX instead of 10XXXXXX. Because each such byte holds only 5 bits of payload rather than 6, UTF-EBCDIC generally produces larger output than UTF-8 for the same input data.
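The bit layout described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name utf8_mod_encode is hypothetical, and the lead-byte patterns and range boundaries are assumptions that follow from pairing UTF-8-style lead bytes with 5-bit trailing bytes; they should be checked against Unicode Technical Report #16.

```python
def utf8_mod_encode(code_point: int) -> bytes:
    """Encode one code point in the intermediate UTF-8-Mod form (sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    # U+0000..U+009F (ASCII plus the C1 controls) stay a single byte.
    if code_point <= 0x9F:
        return bytes([code_point])
    # Assumed layout: (upper limit, lead-byte marker, trailing byte count).
    layouts = (
        (0x3FF, 0xC0, 1),     # 110yyyyy 101xxxxx
        (0x3FFF, 0xE0, 2),    # 1110zzzz + 2 trailing bytes
        (0x3FFFF, 0xF0, 3),   # 11110www + 3 trailing bytes
        (0x10FFFF, 0xF8, 4),  # 1111100v + 4 trailing bytes
    )
    for limit, lead, trailing in layouts:
        if code_point <= limit:
            # The lead byte carries the bits above the trailing payload.
            out = [lead | (code_point >> (5 * trailing))]
            # Each trailing byte is 101xxxxx: marker 0xA0 plus 5 payload bits.
            for shift in range(5 * (trailing - 1), -1, -5):
                out.append(0xA0 | ((code_point >> shift) & 0x1F))
            return bytes(out)
    raise AssertionError("unreachable")
```

Under these assumptions, U+0400 takes three bytes in UTF-8-Mod but only two in UTF-8, which illustrates why the output tends to be larger.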

Finally, a reversible byte-to-byte transform is applied to this data, using a lookup table, to make the result as close to normal EBCDIC code pages as feasible.
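A byte-to-byte lookup step of this kind, and its inverse, can be sketched as follows. The actual 256-entry table is given in Unicode Technical Report #16 and is not reproduced here. As a stand-in only, this sketch derives a permutation from Python's cp037 codec, which is an assumption made for illustration and is not the UTF-EBCDIC table; the names FORWARD, INVERSE, to_ebcdic_like and from_ebcdic_like are hypothetical.

```python
# Stand-in permutation (NOT the UTR #16 table): FORWARD[b] is the cp037 byte
# for the character whose Latin-1 value is b.
FORWARD = bytes(range(256)).decode("latin-1").encode("cp037")

# Invert the table: if FORWARD maps b to e, INVERSE maps e back to b.
_inverse = bytearray(256)
for source_byte, target_byte in enumerate(FORWARD):
    _inverse[target_byte] = source_byte
INVERSE = bytes(_inverse)


def to_ebcdic_like(data: bytes) -> bytes:
    """Apply the byte-to-byte lookup to every byte."""
    return data.translate(FORWARD)


def from_ebcdic_like(data: bytes) -> bytes:
    """Undo the lookup, recovering the original bytes."""
    return data.translate(INVERSE)
```

Because the table is a permutation of the 256 byte values, applying from_ebcdic_like after to_ebcdic_like returns the input unchanged; the same property is what makes the real transform reversible.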

Since all the stages in the encoding process are reversible, the original Unicode code points can be transparently recovered from the UTF-EBCDIC encoding.
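To illustrate the recovery of code points, the following sketch reverses the UTF-8-Mod stage. It assumes the byte-to-byte transform has already been undone, and it relies on the same assumed layout as the description above (UTF-8-style lead bytes, 101XXXXX trailing bytes); the function name utf8_mod_decode is hypothetical and validation is deliberately minimal.

```python
def utf8_mod_decode(data: bytes) -> list[int]:
    """Decode UTF-8-Mod bytes back to code points (sketch, minimal checks)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x9F:              # single byte: 0xxxxxxx or 100xxxxx
            value, trailing = lead, 0
        elif 0xC0 <= lead <= 0xDF:    # 110yyyyy
            value, trailing = lead & 0x1F, 1
        elif 0xE0 <= lead <= 0xEF:    # 1110zzzz
            value, trailing = lead & 0x0F, 2
        elif 0xF0 <= lead <= 0xF7:    # 11110www
            value, trailing = lead & 0x07, 3
        elif 0xF8 <= lead <= 0xF9:    # 1111100v (assumed 5-byte form)
            value, trailing = lead & 0x01, 4
        else:
            raise ValueError(f"unexpected lead byte {lead:#04x}")
        tail = data[i + 1 : i + 1 + trailing]
        # Every trailing byte must match the 101xxxxx pattern.
        if len(tail) != trailing or any((b & 0xE0) != 0xA0 for b in tail):
            raise ValueError("malformed trailing bytes")
        for b in tail:
            value = (value << 5) | (b & 0x1F)  # append 5 payload bits
        code_points.append(value)
        i += 1 + trailing
    return code_points
```

For example, the bytes C5 A0 decode to the single code point U+00A0 under this layout.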

This encoding form is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, DB2 UDB, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.

Last updated: 05-13-2005 07:56:04