UTF-32

UTF-32 is a method of encoding Unicode characters that uses a fixed width of 32 bits for each character. It can be regarded as the simplest Unicode Transformation Format, since all the others use variable-length encodings. However, a notable drawback of UTF-32 is that it typically requires two to four times the storage space of traditional encodings, which use one or two bytes per character. UTF-32 is also generally less efficient in memory usage and memory bandwidth than UTF-16 or UTF-8. For these reasons it is rarely used for external storage and is mostly used internally, where character handling needs to be as simple as possible.
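As a rough illustration of this trade-off, the following Python sketch compares the encoded size of a string in the three formats and shows why fixed-width lookup is simple. The helper names encoded_sizes and code_point_at are made up for this example, and the little-endian codecs are chosen only so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def code_point_at(utf32_le_bytes, index):
    """Read the code point at position index from little-endian UTF-32 bytes.

    Because every code point occupies exactly 4 bytes, the byte offset is
    simply 4 * index; no scanning of earlier characters is needed.
    """
    start = 4 * index
    return int.from_bytes(utf32_le_bytes[start:start + 4], "little")


if __name__ == "__main__":
    print(encoded_sizes("hello"))
    data = "a\U0001F600b".encode("utf-32-le")
    print(hex(code_point_at(data, 1)))
```

For ASCII-only text such as "hello" the UTF-32 form is four times the size of the UTF-8 form, while for a character outside the BMP all three encodings need four bytes.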

UCS-4

ISO 10646 defines a 32-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit code value in the code space of integers from 0 to hexadecimal 7FFFFFFF.

UCS-4 is more than sufficient to represent the whole Unicode code space, which has 1114112 (= 2²⁰+2¹⁶) code points and therefore extends only up to hexadecimal 10FFFF. Because some people considered it wasteful to reserve such a large code space for a relatively small set of code points, a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the code space from 0 to 10FFFF.
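The arithmetic behind these figures can be checked with a short Python sketch. The constant and function names below are illustrative only and do not come from either standard.

```python
# Upper bounds of the two code spaces, as given in the text.
UCS4_MAX = 0x7FFFFFFF      # largest UCS-4 code value
UNICODE_MAX = 0x10FFFF     # largest Unicode code point

# Size of the Unicode code space: 0 through 10FFFF inclusive.
UNICODE_CODE_POINTS = UNICODE_MAX + 1


def is_in_ucs4_code_space(value):
    """True if value lies in the UCS-4 code space 0..7FFFFFFF."""
    return 0 <= value <= UCS4_MAX


def is_in_unicode_code_space(value):
    """True if value lies in the UTF-32 code space 0..10FFFF."""
    return 0 <= value <= UNICODE_MAX


if __name__ == "__main__":
    # 2**20 + 2**16 code points, which is also 17 planes of 2**16 each.
    assert UNICODE_CODE_POINTS == 2**20 + 2**16 == 17 * 2**16 == 1114112
    print(is_in_unicode_code_space(0x10FFFF))   # inside both code spaces
    print(is_in_unicode_code_space(0x110000))   # inside UCS-4 only
```

Every value accepted by is_in_unicode_code_space is also accepted by is_in_ucs4_code_space, which is the sense in which UTF-32 is a subset of UCS-4.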

UTF-32 and UCS-4

UTF-32 was originally a subset of the UCS-4 standard. However, the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, and it has removed the former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF.

Accordingly, UCS-4 and UTF-32 can now be taken to be identical, except that the UTF-32 standard has additional Unicode semantics that must be observed.
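One example of such a constraint is that a UTF-32 code unit must be a Unicode scalar value: it may not exceed 10FFFF and may not fall in the surrogate range D800 to DFFF. The Python sketch below is a minimal hand-written decoder that enforces just these two rules. The function name decode_utf32_le is hypothetical, little-endian byte order is assumed, and byte order marks are not handled.

```python
import struct


def decode_utf32_le(data):
    """Decode little-endian UTF-32 bytes into a str, rejecting invalid units."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 bytes long")

    chars = []
    # Each 4-byte group is one unsigned 32-bit little-endian code unit.
    for (value,) in struct.iter_unpack("<I", data):
        if value > 0x10FFFF:
            raise ValueError(f"code unit {value:#x} is above 10FFFF")
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"code unit {value:#x} is a surrogate")
        chars.append(chr(value))
    return "".join(chars)


if __name__ == "__main__":
    encoded = "h\u00e9llo".encode("utf-32-le")
    print(decode_utf32_le(encoded))
```

In practice Python's built-in "utf-32-le" codec would normally be used instead; the explicit loop is shown only to make the range checks visible.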

External Links

The Unicode Standard 4.1, chapter 3 - formally defines UTF-32 in §3.10, D43-D45
Unicode Standard Annex #19 - formally defined UTF-32 for Unicode 3.x (March 2001; last updated March 2002)
Registration of new charsets: UTF-32, UTF-32BE, UTF-32LE - announcement of UTF-32 being added to the IANA charset registry (April 2002)

Categories: Unicode

Last updated: 05-07-2005 04:05:34

Last updated: 05-13-2005 07:56:04
