The Online Encyclopedia and Dictionary

Byte Order Mark


A Byte Order Mark (BOM) is the character at code point U+FEFF (ZERO WIDTH NO-BREAK SPACE) when that character is used to denote the endianness (byte order) of a string of UCS/Unicode characters encoded in UTF-16 or UTF-32.

A BOM can also be used to indicate the encoding of unlabeled text in many Unicode encodings. In most encodings the BOM is a byte sequence that is unlikely to appear at the start of text in more conventional encodings or in other Unicode encodings (there it usually looks like a sequence of obscure control codes). If a BOM is misinterpreted as an actual character within the text, it will generally be invisible because it is a ZERO WIDTH NO-BREAK SPACE. Some consider this use of the BOM to be misuse, although it is specifically mentioned in official Unicode documents and is never called misuse there.

In UTF-16, a BOM is expressed as the two-byte sequence FE FF at the beginning of the encoded string to indicate that the encoded characters that follow it use big-endian byte order, or as the two-byte sequence FF FE to indicate little-endian order.
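The two UTF-16 forms can be illustrated with a minimal Python sketch. The helper names utf16_bom_bytes and decode_utf16_with_bom are hypothetical and chosen only for this example; the codec names "utf-16-be" and "utf-16-le" are Python's standard codecs, which neither write nor strip a BOM by themselves.

```python
BOM = "\ufeff"  # U+FEFF, ZERO WIDTH NO-BREAK SPACE


def utf16_bom_bytes(byte_order: str) -> bytes:
    """Return the bytes U+FEFF takes in UTF-16 for 'big' or 'little' order."""
    codec = {"big": "utf-16-be", "little": "utf-16-le"}[byte_order]
    return BOM.encode(codec)


def decode_utf16_with_bom(data: bytes) -> str:
    """Pick the byte order from the leading BOM, then decode the rest."""
    if data[:2] == b"\xfe\xff":  # FE FF: big-endian follows
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":  # FF FE: little-endian follows
        return data[2:].decode("utf-16-le")
    raise ValueError("no UTF-16 BOM found")
```

For instance, the bytes FE FF 00 41 and FF FE 41 00 both decode to the single letter "A" under this sketch.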

UTF-8 text can also use a BOM, and quite a lot of Windows software adds one to UTF-8 files. On Unix-like systems (which make heavy use of text files for configuration), however, this practice is not recommended, because the extra bytes interfere with the correct processing of significant leading sequences such as the hash-bang (#!) at the start of a script. The UTF-8 representation of the BOM is the byte sequence EF BB BF.
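As a rough illustration of the hash-bang problem, the following Python sketch checks whether a file's first bytes are #! before and after a UTF-8 BOM is removed. The function names strip_utf8_bom and starts_with_hash_bang are hypothetical; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading EF BB BF if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def starts_with_hash_bang(data: bytes) -> bool:
    """A hash-bang line is only recognised if '#!' are the very first bytes."""
    return data.startswith(b"#!")


# A script saved with a UTF-8 BOM in front of its hash-bang line.
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"
bom_hides_hash_bang = not starts_with_hash_bang(script)
fixed_after_strip = starts_with_hash_bang(strip_utf8_bom(script))
```

When decoding to text rather than handling raw bytes, the "utf-8-sig" codec removes a leading BOM if one is present, whereas the plain "utf-8" codec keeps it as a U+FEFF character at the start of the string.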

Although a BOM can be used with UTF-32 (00 00 FE FF for big-endian, FF FE 00 00 for little-endian), this encoding is almost never used for transmission.

Representations of Byte Order Marks by Encoding

  • UTF-8: EF BB BF
  • UTF-16 Big Endian: FE FF
  • UTF-16 Little Endian: FF FE
  • UTF-32 Big Endian: 00 00 FE FF
  • UTF-32 Little Endian: FF FE 00 00
  • SCSU: 0E FE FF
  • UTF-7: 2B 2F 76, followed by one of the byte sequences 38, 39, 2B, 2F, or 38 2D
  • UTF-EBCDIC: DD 73 66 73
  • BOCU-1: FB EE 28
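
The list above can be turned into a simple encoding sniffer. The following Python sketch is a heuristic under stated assumptions, not a definitive detector: the names BOM_TABLE and sniff_bom are hypothetical, and the table is ordered so that longer signatures are tried first, because the UTF-32 Little Endian signature (FF FE 00 00) begins with the UTF-16 Little Endian one (FF FE).

```python
from typing import Optional

# Signatures from the list above, longest first so that FF FE 00 00
# is not mistaken for the shorter UTF-16 Little Endian signature FF FE.
BOM_TABLE = [
    (b"\x00\x00\xfe\xff", "UTF-32 Big Endian"),
    (b"\xff\xfe\x00\x00", "UTF-32 Little Endian"),
    (b"\xdd\x73\x66\x73", "UTF-EBCDIC"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\x0e\xfe\xff", "SCSU"),
    (b"\xfb\xee\x28", "BOCU-1"),
    (b"\xfe\xff", "UTF-16 Big Endian"),
    (b"\xff\xfe", "UTF-16 Little Endian"),
]

# UTF-7: 2B 2F 76 followed by 38, 39, 2B or 2F (the form 38 2D also starts with 38).
UTF7_PREFIX = b"\x2b\x2f\x76"
UTF7_NEXT = (b"\x38", b"\x39", b"\x2b", b"\x2f")


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding name suggested by a leading BOM, or None."""
    for signature, name in BOM_TABLE:
        if data.startswith(signature):
            return name
    if data.startswith(UTF7_PREFIX) and data[3:4] in UTF7_NEXT:
        return "UTF-7"
    return None
```

One known limitation of this ordering: UTF-16 Little Endian text whose first character after the BOM is U+0000 would be reported as UTF-32 Little Endian, so the result is best treated as a hint rather than proof.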

Last updated: 05-07-2005 04:16:20