When integers or any other data are represented with multiple bytes, there is no unique way of ordering those bytes in memory or in transmission over some medium, so the order is subject to arbitrary convention, called endianness. This is actually somewhat similar to the situation in different written languages, where some are written left-to-right, while others are written right-to-left.
The two main types of endianness are termed big-endian and little-endian. Endianness is also referred to as byte order or byte sex. Neither convention seems to have a significant advantage over the other. Endianness does not matter when dealing with a sequence of single bytes, as is the case with strings encoded in ASCII and similar codes, where each byte corresponds to a single character. Strings encoded in Unicode UTF-16 or UTF-32 are affected by endianness, because each code unit occupies two (UTF-16) or four (UTF-32) bytes.
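As a minimal sketch of this difference, the following Python snippet encodes the same one-character string in ASCII and in both byte orders of UTF-16 and UTF-32. It relies only on Python's built-in codecs; the variable names are illustrative.

```python
# Encode the single character "A" (code point U+0041) in several encodings.
text = "A"

ascii_bytes = text.encode("ascii")    # one byte per character: no byte order involved
utf16_be = text.encode("utf-16-be")   # 16-bit code unit, most significant byte first
utf16_le = text.encode("utf-16-le")   # same code unit, least significant byte first
utf32_be = text.encode("utf-32-be")   # 32-bit code unit, most significant byte first
utf32_le = text.encode("utf-32-le")   # same code unit, least significant byte first

for name, data in [
    ("ASCII", ascii_bytes),
    ("UTF-16BE", utf16_be),
    ("UTF-16LE", utf16_le),
    ("UTF-32BE", utf32_be),
    ("UTF-32LE", utf32_le),
]:
    print(f"{name:9}{data.hex(' ')}")
```

The ASCII result is a single byte, so there is nothing to reorder; the UTF-16 and UTF-32 results contain the same bytes in opposite orders.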
There seems to be some confusion about how endianness should be spelled. The two major variants are endianness and endianess. There are even some documents containing both variants. While neither of the two forms appears in current (non-computing) dictionaries, the former follows the pattern of similar words such as "barren" and "barrenness". Thus, endianness is generally more accepted and is used in this article.
Endianness in computers
When some computers store a 32-bit integer value in memory, for example 0x4A3B2C1D (in hexadecimal notation), they store it as bytes in the following order: 4A 3B 2C 1D. That is, the most significant byte (also known as the msb, which is 4A in our example) is stored at the memory location with the lowest address, the next byte in significance, 3B, is stored at the next memory location and so on.
Architectures that follow this rule are called big-endian and include Motorola 68000, SPARC and System/370.
Other computers store the value 0x4A3B2C1D as 1D 2C 3B 4A, that is, the least significant ("littlest") byte (also known as the lsb) first. Architectures that follow this rule are called little-endian and include the MOS Technology 6502, Intel x86 and DEC VAX.
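The two layouts can be reproduced with a short Python sketch. It uses the built-in `int.to_bytes` and `int.from_bytes` methods; the variable names are illustrative.

```python
value = 0x4A3B2C1D

# Serialize the same 32-bit value in both byte orders.
big_endian = value.to_bytes(4, byteorder="big")
little_endian = value.to_bytes(4, byteorder="little")

print(big_endian.hex(" "))     # 4a 3b 2c 1d
print(little_endian.hex(" "))  # 1d 2c 3b 4a

# Reading the bytes back with the matching byte order recovers the value;
# reading them with the wrong byte order yields a different number.
round_trip = int.from_bytes(little_endian, byteorder="little")
misread = int.from_bytes(little_endian, byteorder="big")
print(hex(round_trip))  # 0x4a3b2c1d
print(hex(misread))     # 0x1d2c3b4a
```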
Some architectures can be configured either way; these include ARM, PowerPC (but not the PPC970/G5), DEC Alpha, MIPS, PA-RISC and IA64. The term bytesexual or bi-endian, said of hardware, denotes willingness to compute or pass data in either big-endian or little-endian format (depending, presumably, on a mode bit somewhere). Many of these architectures can be switched via software to default to a specific endian format (usually when the computer starts up). On some architectures, however, the default endianness is selected by hardware on the motherboard and in some cases cannot be changed by software at all (e.g., the DEC Alpha, which runs only in big-endian mode on the Cray T3E).
Still other (generally older) architectures, called middle-endian, may have a more complicated ordering in which the bytes within a 16-bit unit are ordered differently from the 16-bit units within a 32-bit word. For instance, 0x4A3B2C1D is stored as 3B 4A 1D 2C. Middle-endian architectures include the PDP family of processors. In general, these complex orderings are more confusing to work with than consistent big- or little-endianness.
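The middle-endian layout in this example can be sketched in Python as follows. The helper name `to_middle_endian` is hypothetical; it assumes the ordering described above, with the more significant 16-bit unit stored first and the bytes inside each unit stored least significant first.

```python
def to_middle_endian(value: int) -> bytes:
    """Lay out a 32-bit value as two 16-bit units, high unit first,
    with the two bytes inside each unit in little-endian order."""
    high_unit = (value >> 16) & 0xFFFF
    low_unit = value & 0xFFFF
    return high_unit.to_bytes(2, "little") + low_unit.to_bytes(2, "little")


layout = to_middle_endian(0x4A3B2C1D)
print(layout.hex(" "))  # 3b 4a 1d 2c
```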
Endianness also applies in the numbering of the bits within a byte or word. In a consistently big-endian architecture the bits in the word are numbered from the left, bit zero being the most significant bit and bit 7 being the least significant bit in a byte. The favored bit endianness depends somewhat on where the computer users expect the binary point to be located in a number. It seems most intuitive to number the bits in the little-endian order if the byte is taken to represent an integer. In this case the bit number corresponds to the exponent of the numeric weight of the bit. However, if the byte is taken to represent a binary fraction, with the binary point to the left of the most significant bit, then the big-endian numbering convention is more convenient.
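The two bit-numbering conventions can be compared with a small Python sketch. The function names `bit_little_endian` and `bit_big_endian` are illustrative, and a byte width of 8 bits is assumed.

```python
def bit_little_endian(byte: int, n: int) -> int:
    """Bit n when bit 0 is the least significant bit: its weight is 2**n."""
    return (byte >> n) & 1


def bit_big_endian(byte: int, n: int, width: int = 8) -> int:
    """Bit n when bit 0 is the most significant bit of a `width`-bit unit."""
    return (byte >> (width - 1 - n)) & 1


byte = 0b10000000  # only the most significant bit is set; integer value 128

# Little-endian numbering: the set bit is bit 7, and 2**7 is the byte's value.
print(bit_little_endian(byte, 7), 2 ** 7 == byte)  # 1 True

# Big-endian numbering: the same bit is called bit 0. Read as a binary
# fraction with the point to the left of the byte, bit n has weight 2**-(n + 1).
print(bit_big_endian(byte, 0), 2.0 ** -(0 + 1))    # 1 0.5
```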
To summarize, here are the default endian formats of some common computer architectures:
- Pure big-endian: Sun SPARC, Motorola 68000, PowerPC 970, IBM System/360
- Bi-endian, running in big-endian mode by default: MIPS running IRIX, PA-RISC, most POWER and PowerPC systems
- Bi-endian, running in little-endian mode by default: MIPS running Ultrix, most DEC Alpha, IA-64 running Linux
- Pure little-endian: Intel x86, AMD64, DEC VAX
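A program can also find out at run time which format its host uses. The sketch below shows two ways of doing so in Python: the standard `sys.byteorder` attribute, and a manual probe that packs the integer 1 in native byte order and inspects the first byte. The function name `host_byte_order` is illustrative.

```python
import struct
import sys


def host_byte_order() -> str:
    """Infer the host byte order by packing the 16-bit integer 1 natively.

    On a little-endian host the least significant byte (0x01) comes first;
    on a big-endian host the first byte is 0x00.
    """
    first_byte = struct.pack("=H", 1)[0]
    return "little" if first_byte == 1 else "big"


print(sys.byteorder)      # "little" or "big", depending on the host
print(host_byte_order())  # agrees with sys.byteorder
```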
Endianness has grave implications for software portability. For example, when a program reads data stored in binary format and applies a bitmask, the bytes must be interpreted with the endianness in which they were written; otherwise the mask selects different bits and yields a different result.
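The bitmask problem can be made concrete with a brief Python sketch. The four bytes and the mask are example values chosen for illustration.

```python
# Four bytes as they might appear in a binary file.
raw = bytes([0x4A, 0x3B, 0x2C, 0x1D])

LOW_BYTE_MASK = 0x000000FF  # selects the least significant byte of a 32-bit value

as_big = int.from_bytes(raw, "big")        # 0x4A3B2C1D
as_little = int.from_bytes(raw, "little")  # 0x1D2C3B4A

# The same mask applied to the same stored bytes gives different results.
print(hex(as_big & LOW_BYTE_MASK))     # 0x1d
print(hex(as_little & LOW_BYTE_MASK))  # 0x4a
```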
Writing binary data from software to a common format raises the question of the proper endianness. For example, saving data in the BMP bitmap format requires little-endian integers; if the data are stored using big-endian integers, the file will be corrupted because it does not match the format.
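A portable program states the byte order explicitly when it writes such a file, rather than relying on the host's native order. The sketch below packs a BMP file header with Python's `struct` module, where the `<` prefix forces little-endian layout on any host. It assumes the usual 14-byte header layout (a two-byte "BM" signature, a 32-bit file size, two 16-bit reserved fields and a 32-bit pixel-data offset), which is not described in this article; the function name and the sizes passed to it are illustrative.

```python
import struct


def bmp_file_header(file_size: int, pixel_offset: int) -> bytes:
    """Pack a 14-byte BMP file header with little-endian integer fields."""
    return struct.pack("<2sIHHI", b"BM", file_size, 0, 0, pixel_offset)


header = bmp_file_header(file_size=70, pixel_offset=54)
print(header.hex(" "))
# 42 4d 46 00 00 00 00 00 00 00 36 00 00 00

# Packing the same fields big-endian (">") produces bytes that do not
# match the format: the size field would read 46 00 00 00 as 00 00 00 46.
wrong = struct.pack(">2sIHHI", b"BM", 70, 0, 0, 54)
```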
The OPENSTEP operating system has software that swaps the bytes of integers and other C datatypes in order to preserve the correct endianness, since software running on OPENSTEP for PA-RISC is intended to be portable to OPENSTEP running on Mach/i386.
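Byte swapping of this kind can be sketched generically in Python. This is not the OPENSTEP interface, which is not shown in this article; the function name `swap32` is hypothetical and handles only unsigned 32-bit values.

```python
def swap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return (
        ((value & 0x000000FF) << 24)
        | ((value & 0x0000FF00) << 8)
        | ((value & 0x00FF0000) >> 8)
        | ((value & 0xFF000000) >> 24)
    )


print(hex(swap32(0x4A3B2C1D)))  # 0x1d2c3b4a

# Swapping twice restores the original value.
print(swap32(swap32(0x4A3B2C1D)) == 0x4A3B2C1D)  # True
```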
In Unicode, a byte order mark (BOM) may be placed at the beginning of a string to denote its endianness. The BOM is the code point U+FEFF, which occupies 2 bytes in UTF-16 and 4 bytes in UTF-32.
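A reader can inspect the first bytes of a string to find this mark. The following Python sketch uses the BOM constants from the standard `codecs` module; the function name `detect_byte_order` and the table `_BOMS` are illustrative. The 4-byte marks are tested first because the little-endian UTF-32 mark begins with the same two bytes as the little-endian UTF-16 mark.

```python
import codecs

# Longer marks first: BOM_UTF32_LE (FF FE 00 00) starts with BOM_UTF16_LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_byte_order(data: bytes):
    """Return (encoding, payload without BOM), or (None, data) if no BOM is found."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
encoding, payload = detect_byte_order(sample)
print(encoding, payload.decode(encoding))  # utf-16-be hi
```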
Endianness in communications
In general, the NUXI problem is the problem of transferring data between computers with differing byte order. For example, the string "UNIX", packed two characters per 16-bit word, might look like "NUXI" on a machine with a different "byte sex", because the two bytes of each word are read in the opposite order. The problem was first discovered when porting an early version of Unix from the PDP-11 (a little-endian architecture) to a big-endian IBM architecture.
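The effect is easy to reproduce. The Python sketch below swaps the two bytes of every 16-bit word in a byte string; the function name `swap_bytes_in_words` is illustrative, and the input is assumed to have an even length.

```python
def swap_bytes_in_words(data: bytes) -> bytes:
    """Swap the two bytes of each 16-bit word (input length must be even)."""
    if len(data) % 2 != 0:
        raise ValueError("data length must be a multiple of 2")
    swapped = bytearray()
    for i in range(0, len(data), 2):
        swapped += data[i:i + 2][::-1]  # reverse one 2-byte word
    return bytes(swapped)


print(swap_bytes_in_words(b"UNIX"))  # b'NUXI'
```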
The Internet Protocol defines a standard "big-endian" network byte order, where binary values are in general encoded into packets, and sent out over the network, most significant byte first. This occurs regardless of the native endianness of the host CPU.
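In Python, the `struct` module's `!` prefix selects network byte order, so a program can encode and decode values the same way on any host. The sketch below packs a 16-bit and a 32-bit field; the field names and values are made up for illustration and do not correspond to a real packet format.

```python
import struct

port = 80            # example 16-bit field
value = 0x4A3B2C1D   # example 32-bit field

# "!" means network byte order (big-endian), regardless of the host CPU.
packet = struct.pack("!HI", port, value)
print(packet.hex(" "))  # 00 50 4a 3b 2c 1d

# The receiver unpacks with the same format string to recover the values.
decoded_port, decoded_value = struct.unpack("!HI", packet)
print(decoded_port, hex(decoded_value))  # 80 0x4a3b2c1d
```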
Serial devices also have bit-endianness: the bits in a byte can be sent little-endian (least significant bit first) or big-endian (most significant bit first). This decision is made at the very bottom of the data link layer of the OSI model.
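The two transmission orders can be sketched in Python by listing the bits of a byte in the sequence in which they would be sent. The function names are illustrative and an 8-bit byte is assumed.

```python
def bits_lsb_first(byte: int) -> list:
    """Bits of a byte in little-endian transmission order (least significant first)."""
    return [(byte >> i) & 1 for i in range(8)]


def bits_msb_first(byte: int) -> list:
    """Bits of a byte in big-endian transmission order (most significant first)."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


byte = 0x41  # binary 01000001
print(bits_msb_first(byte))  # [0, 1, 0, 0, 0, 0, 0, 1]
print(bits_lsb_first(byte))  # [1, 0, 0, 0, 0, 0, 1, 0]
```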
Endianness in date formats
Endianness is simply illustrated by the different manners in which countries format calendar dates. For example, in the United States and a few other countries, dates are commonly formatted as Month; Day; Year (e.g. "May 24th, 2006" or "5/24/2006"). This is a middle-endian order.
In most of the world's countries, including most of Europe, dates are formatted as Day; Month; Year (e.g. "24th May, 2006" or "24/5/2006" or "24/5-2006"). This is little-endian.
The international standard ISO 8601 orders dates as Year; Month; Day (e.g. "2006 May 24th", or, more properly, "2006-05-24"). This is big-endian.
The big-endian ordering lends itself to straightforward computerized sorting of dates, because plain text-string (ASCIIbetical) comparison then yields chronological order. Note, however, that this simplicity requires fixed-width fields: the first nine days of each month must be numbered '01', '02', ... , '09', and likewise the first nine months of the year.
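The zero-padding requirement can be checked with a short Python sketch that sorts year-month-day strings as plain text. The dates are arbitrary examples.

```python
# Zero-padded year-month-day strings: text order equals chronological order.
padded = ["2006-05-24", "2005-12-01", "2006-10-09"]
print(sorted(padded))
# ['2005-12-01', '2006-05-24', '2006-10-09']

# Without zero padding, the character '1' of "10" sorts before '5',
# so October 2006 wrongly lands ahead of May 2006.
unpadded = ["2006-5-24", "2005-12-1", "2006-10-9"]
print(sorted(unpadded))
# ['2005-12-1', '2006-10-9', '2006-5-24']
```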
Big-endian numbers are easier to read when debugging a program. Some think they are less intuitive because the most significant byte is at the smaller address. Some think they are less confusing because the significance order is the same as the order of normal textual character strings in the computer, just as in non-computer text (see below). A person's preference usually is based on which convention was studied first and on which one the person's mental models were built.
The choice of big-endian vs. little-endian has been the subject of many flame wars. Emphasizing the futility of this argument, the very terms big-endian and little-endian were taken from the Big-Endians and Little-Endians of Jonathan Swift's novel Gulliver's Travels, two peoples in conflict over which end of an egg to crack, encountered in the voyage to Lilliput and Blefuscu.
See the Endian FAQ, including the significant essay "On Holy Wars and a Plea for Peace" by Danny Cohen (1980).
The written system of Arabic numerals is used worldwide, and in it the most significant digits are always written to the left of the less significant ones. In languages that write text left-to-right, this system is therefore big-endian; in languages that write right-to-left, it is little-endian. The spoken numeral system in English is big-endian (with minor exceptions: we say "seventeen" instead of "ten-seven"). German and Dutch are also mainly big-endian, except that the units are spoken before the tens, e.g. 376 is pronounced in German as "dreihundertsechsundsiebzig", i.e. "three hundred six-and-seventy".
Last updated: 10-29-2005 02:13:46