UTF-32

UTF-32 is a method of encoding Unicode characters that uses exactly 32 bits for each code point. It can be regarded as the simplest of the Unicode Transformation Formats, since all the others use variable-length encodings. Its notable drawback is size: UTF-32 text takes four times the space of ASCII text stored in UTF-8 and twice the space of Basic Multilingual Plane (BMP) text stored in UTF-16, so it is less efficient in memory usage and memory bandwidth than either. For this reason it is rarely used for external storage and is used mainly internally, where character handling needs to be as simple as possible.
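The trade-off between size and simplicity can be illustrated with a short Python sketch. The helper names `encoded_sizes` and `code_point_at` are hypothetical, chosen for this example, and the little-endian, BOM-less codec variants are an assumption made to keep the byte counts easy to read.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def code_point_at(data: bytes, index: int) -> int:
    """Read the index-th code point from little-endian UTF-32 data.

    Because every code point occupies exactly 4 bytes, the byte offset
    is simply 4 * index; no scanning of earlier characters is needed.
    """
    start = 4 * index
    return int.from_bytes(data[start:start + 4], "little")


# ASCII text: UTF-32 needs four times the space of UTF-8.
ascii_sizes = encoded_sizes("hello")

# Mixed text: one ASCII letter, one BMP character, one supplementary character.
mixed = "a\u20ac\U0001F600"
mixed_utf32 = mixed.encode("utf-32-le")
third = code_point_at(mixed_utf32, 2)  # direct access to the third code point
```

In a variable-length encoding such as UTF-8 or UTF-16, finding the third code point would require examining the code units before it, which is the simplicity that UTF-32 buys at the cost of space.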

UCS-4

ISO 10646 defines a 32-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a single 32-bit code value in the code space of integers between 0 and hexadecimal 7FFFFFFF.

UCS-4 is sufficient to represent all of the Unicode code space, which has 1,114,112 (= 2^20 + 2^16) code points and therefore extends only up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the 0 to 10FFFF code space.
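The arithmetic behind these limits can be checked with a few lines of Python. The constant and function names below are hypothetical and exist only for this illustration.

```python
UCS4_MAX = 0x7FFFFFFF    # top of the UCS-4 code space
UNICODE_MAX = 0x10FFFF   # top of the Unicode (and UTF-32) code space

# 2^20 + 2^16 code points, numbered from 0 up to 10FFFF hexadecimal.
CODE_POINT_COUNT = 2**20 + 2**16


def in_ucs4_range(value: int) -> bool:
    """True if value lies in the UCS-4 code space (0 to 7FFFFFFF)."""
    return 0 <= value <= UCS4_MAX


def in_utf32_range(value: int) -> bool:
    """True if value lies in the UTF-32 code space (0 to 10FFFF)."""
    return 0 <= value <= UNICODE_MAX
```

A value such as 110000 hexadecimal passes `in_ucs4_range` but fails `in_utf32_range`, which is the sense in which UTF-32 is a subset of UCS-4.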

UTF-32 and UCS-4

UTF-32 was originally a subset of the UCS-4 standard. However, the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the BMP or the first 14 supplementary planes, and it has removed the former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF.

Accordingly, UCS-4 and UTF-32 can now be regarded as identical, except that UTF-32 carries additional Unicode semantics that must be observed.
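The text does not list these additional semantics. One well-known example is that the surrogate code points D800 to DFFF, which exist only to support UTF-16, are not valid in UTF-32. The following is a minimal sketch of a strict decoder under that assumption; the function name `decode_utf32_le` is hypothetical, and little-endian input without a byte order mark is assumed.

```python
import struct


def decode_utf32_le(data: bytes) -> str:
    """Decode little-endian UTF-32 bytes, rejecting ill-formed values."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data length must be a multiple of 4 bytes")
    chars = []
    # Each code point is one unsigned 32-bit little-endian integer.
    for (value,) in struct.iter_unpack("<I", data):
        if value > 0x10FFFF:
            raise ValueError(f"value {value:#x} is outside the Unicode code space")
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"value {value:#x} is a surrogate code point")
        chars.append(chr(value))
    return "".join(chars)
```

A production decoder would also need to handle byte order marks and big-endian input, which this sketch leaves out.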

Last updated: 05-07-2005 04:05:34