Binary and text files
Computer files can be divided into two broad categories: binary and text. The distinction is vague because in many contexts, any file is a sequence of digital bits. For instance, to the circuits which handle information read from or written to a disk, there is no distinction between text data and any other sort. The software concerned with those circuits likewise makes no such distinction. Humans, on the other hand, are concerned with this distinction.
Text files (plain text files) are files in which the bytes generally correspond directly to ordinary readable characters such as letters and digits, so any simple file viewer renders them human-readable. They typically contain ASCII characters plus a few control characters such as tabs, line feeds and carriage returns, with no embedded information such as fonts, hyperlinks or inline images. Text files can hold more than ASCII when they use a wider encoding, such as the East Asian encoding Shift JIS (SJIS) or a Unicode encoding. For Unicode text, a UTF scheme such as UTF-8 defines how characters map to bytes, and a single character may then occupy several bytes. Although text files are generally human-readable, they can also be used for data storage by computer programs. This is often done because text files avoid problems that can arise with binary files, such as differences in endianness or in the byte length of integers.
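As a rough illustration of the two points above, the following Python sketch encodes the same short string in UTF-8 and in Shift JIS, then stores one integer both as text and as a fixed-width binary value. The sample string, the integer 1025 and the 4-byte unsigned layout are assumptions chosen for the example, not anything prescribed by the formats discussed here.

```python
import struct

# Sample values chosen for illustration only.
text = "日本"
utf8_bytes = text.encode("utf-8")      # Unicode text serialized as UTF-8
sjis_bytes = text.encode("shift_jis")  # the same characters in Shift JIS

# The characters are identical, but the bytes on disk are not,
# so a reader must know which encoding was used.
print(len(utf8_bytes), utf8_bytes.hex())
print(len(sjis_bytes), sjis_bytes.hex())

value = 1025
as_text = str(value).encode("ascii")   # b"1025": no byte order or width to agree on
little = struct.pack("<I", value)      # 4-byte unsigned integer, little-endian
big = struct.pack(">I", value)         # same value, big-endian (bytes reversed)

print(as_text, little.hex(), big.hex())
```

The text form can be read back with `int(as_text)` on any machine, whereas the binary form is only meaningful if reader and writer agree on both the width and the byte order.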
Plain text is textual material, usually in a disk file, that is (largely) unformatted. A web page displayed with formatted text is not plain text in this sense, but its HTML source is. The distinction is usually not clear-cut.
The source code of computer programs is usually written as text files, but once compiled it is turned into a binary file, as described below.
Transferring text files between Unix, Macintosh, and Microsoft Windows or DOS computers can be problematic, because the platforms use different characters to signify a line break: Unix uses a line feed (LF), Macintosh systems before Mac OS X used a carriage return (CR), and DOS and Windows use a carriage return followed by a line feed (CR LF). See newline for a discussion of this confusion. Further cross-platform confusion occurs because many non-Unix systems have traditionally used an Extended ASCII character encoding, where the first 128 byte values conform to ASCII and the upper 128 byte values are mapped to textual or punctuation characters, such as curly quotes or characters bearing a diacritical mark. Prior to the advent of Mac OS X, Macintosh users would call a document a text file so long as all of its non-whitespace bytes were printable in the Macintosh environment.
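Both sources of confusion can be sketched in a few lines of Python. The helper name `normalize_newlines` is hypothetical, and the choice of Windows code page 1252 and Mac OS Roman as the two Extended ASCII encodings is an assumption made for the example. The sketch assumes the usual conventions of LF on Unix, CR on Macintosh systems before Mac OS X, and CR LF on DOS and Windows.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF and lone CR line breaks to a single LF."""
    # Handle CR LF first, otherwise each pair would become two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"dos line\r\nold mac line\runix line\n"
print(normalize_newlines(mixed))

# A byte above 127 has no fixed meaning: it depends on the code page.
high_byte = bytes([0x93])
as_windows = high_byte.decode("cp1252")      # Windows code page 1252
as_mac = high_byte.decode("mac_roman")       # classic Macintosh encoding
print(repr(as_windows), repr(as_mac))
```

A converter like this is only safe on data known to be text; applied to a binary file it would corrupt any bytes that happen to equal CR.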
The related term, plaintext, is most commonly used in a cryptographic context, while cleartext usually refers to a lack of protection from eavesdropping. The terms are often confused with one another, especially by those new to computers, cryptography, or data communications.
Binary files, in contrast, usually contain bytes that do not represent readable characters, and may contain any byte value at all. They are generally used to store data rather than textual material in plain text form. Computer programs are typical examples, as the data and CPU instructions they contain can, in principle, be any binary value. As a result, compiled applications are often simply referred to as binaries, as opposed to source code, which is contained in plain text files. Binary files can also be image files, sound files, compressed files and so on; in short, any file content whatsoever, including plain text. The specification of a binary file's format usually indicates how to handle the file.
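Because any byte value may appear in a binary file, software that needs to tell the two kinds apart can only guess. The following Python sketch shows one possible heuristic, under the assumption that text is UTF-8 (of which ASCII is a subset) and never contains a NUL byte. The function name `looks_binary` is hypothetical, and text in other encodings would be misclassified as binary.

```python
def looks_binary(data: bytes) -> bool:
    """Guess whether a byte string is binary rather than text.

    Heuristic only: assumes text is valid UTF-8 with no NUL bytes.
    """
    if b"\x00" in data:
        return True          # NUL bytes are rare in text, common in binaries
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True          # not valid UTF-8, so treated as binary here
    return False


print(looks_binary(b"plain ASCII text\n"))
print(looks_binary(bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])))
```

No such test can be definitive, which reflects the point made at the outset that the distinction between the two categories is a matter of interpretation rather than of storage.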
Binary files are often encoded into a plain text representation to improve survivability during transit, using encoding schemes such as Base64.
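A minimal Python sketch of such an encoding, using the standard library's `base64` module; the four sample bytes are arbitrary values chosen for illustration.

```python
import base64

# Arbitrary bytes, including values that are not printable ASCII.
raw = bytes([0x00, 0xFF, 0x10, 0x80])

encoded = base64.b64encode(raw)      # bytes drawn only from the Base64 alphabet
decoded = base64.b64decode(encoded)  # recovers the original bytes exactly

print(encoded.decode("ascii"))
print(decoded == raw)
```

The encoded form is larger than the original, since every three input bytes become four output characters, but it contains only printable ASCII characters and so passes through channels that expect text.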