Search

The Online Encyclopedia and Dictionary

 
     
 

Encyclopedia

Dictionary

Quotes

   
 

Base64

In computing, base64 is a data encoding scheme whereby binary-encoded data is converted to printable ASCII characters. It is defined as a MIME content transfer encoding for use in internet e-mail. The only characters used are the upper- and lower-case Roman alphabet characters (A-Z, a-z), the numerals (0-9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.

Full specifications for base64 are contained in RFC 1421 and RFC 2045. The scheme is defined only for data whose original length is a multiple of 8 bits, a requirement met by most computer file formats. The resultant base64-encoded data has a length that is approximately 33% greater than the original data, and typically appears as seemingly random characters.

To convert data to base 64, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes to encode, the corresponding buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" and the indicated character output. If there were only one or two input bytes, the output is padded with two or one "=" characters respectively. This prevents extra bits being added to the reconstructed data. The process then repeats on the remaining input data.

For example, the (historic) Wikipedia slogan,

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

encoded in base64 is as follows:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0
aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1
c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0
aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdl
LCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Basic spam scanners which do not decode Base64 messages will often pass messages in Base64 since they appear random enough, or do not contain keywords in the Base64 text to be spam.


Modified Base64 is a data encoding scheme whereby characters above 0x80 (hexadecimal notation) are encoded using printable ASCII characters. It is a variant of Base64, and is primarily used for encoding Unicode text into UTF-7 format for use in MIME messages. See UTF-7 for examples.

Modified Base64 is standardized as RFC 2152, A Mail-Safe Transformation Format of Unicode.

The main difference it has versus Base64 is that it does not use the "=" symbol for padding, as that character tends to require a fair amount of escaping. Instead, it pads the octet bits with 0s.

See also

External links

Resources

Last updated: 05-13-2005 07:56:04