TeX

The TeX mascot, by Duane Bibby

TEX, written as TeX in plain text, is a typesetting system created by Donald Knuth. It is popular in academia, especially in the mathematics, physics and computer science communities. It has largely displaced Unix troff, the other favored formatter, in many Unix installations.

TeX is generally considered to be the best way to typeset complex mathematical formulas, but, especially in the form of LaTeX and other template packages, is now also being used for many other typesetting tasks.

The name

An homage to Caltech, where Knuth received his doctorate, the name TeX is intended to be pronounced "tekh", where "kh" represents the sound at the end of Scottish loch or the name of the German composer Bach (IPA /x/). The X is meant to represent the Greek letter χ (chi). TeX is the abbreviation of τέχνη (technē), Greek for "art" and "craft", which is also the source word of technical.

The name is properly typeset with the "E" below the baseline; systems that do not support subscript layout use the approximation "TeX". Fans like to proliferate names from the word "TeX" — such as TeXnician (user of TeX software), TeXhacker (TeX programmer), TeXmaster (competent TeX programmer), TeXhax, and TeXnique.

History

Knuth began to write TeX because he had become annoyed at the declining quality of the typesetting in volumes I-III of his monumental The Art of Computer Programming. In a manifestation of the typical hackish urge to solve the problem at hand once and for all, he began to design his own typesetting language. He thought he would finish it on his sabbatical in 1978; however, the language was not frozen until 1989, more than ten years later.

Guy Steele happened to be at Stanford during the summer of 1978, when Knuth was developing his first version of TeX. When Steele returned to MIT that fall, he rewrote TeX's I/O to run under ITS.

The first version of TeX was written in the SAIL programming language to run on a PDP-10 under Stanford's WAITS operating system. For later versions of TeX, Knuth invented the concept of literate programming, a way of producing compilable source code and high quality cross-linked documentation (typeset in TeX of course) from the same original file. The language used is called WEB and produces programs in Pascal.

TeX has an idiosyncratic version numbering system. Since version 3, updates have been indicated by adding an extra digit at the end of the decimal, so that the version number asymptotically approaches π. The current version is 3.141592. This is a reflection of the fact that TeX is now very stable, and only minor updates are anticipated. Knuth has stated that the "absolutely final change (to be made after my death)" will be to change the version number to π, at which point all remaining bugs will become features.

The typesetting system

TeX commands commonly start with a backslash and are grouped with curly braces. However, almost all of TeX's syntactic properties can be changed on the fly, which makes TeX input hard to parse by anything but TeX itself. TeX is a macro- and token-based language: many commands, including most user-defined ones, are expanded on the fly until only unexpandable tokens remain, which are then executed. Expansion itself is practically free of side effects. Tail recursion of macros takes no memory, and if-then-else constructs are available. This makes TeX a Turing-complete language even at the expansion level.
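The expand-until-unexpandable loop can be sketched in Python. This is only a toy model under strong assumptions, not TeX's actual expansion processor: tokens are plain strings, macros take no parameters, conditionals are not modeled, and the names expand and macros are invented for the illustration.

```python
def expand(tokens, macros):
    """Expand parameterless macros left to right until none remain.

    `macros` maps a control sequence name to its replacement token list
    (a hypothetical stand-in for TeX's macro definitions).
    """
    pending = list(tokens)   # input stream, front at index 0
    output = []              # unexpandable tokens, in order
    while pending:
        tok = pending.pop(0)
        if tok in macros:
            # The replacement text is pushed back onto the front of the
            # input, so it is itself subject to further expansion.
            pending[:0] = macros[tok]
        else:
            output.append(tok)
    return output


macros = {r"\greet": ["Hello,", r"\name"], r"\name": ["world"]}
print(expand([r"\greet", r"\bye"], macros))  # ['Hello,', 'world', '\\bye']
```

Note that a macro whose replacement text contains itself would keep this loop running forever, just as an unguarded recursive macro does.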

The system can roughly be divided into four levels. In the first, characters are read from the input file and assigned a category code. Combinations of a backslash (really: any character of category zero) followed by letters (characters of category 11) or by a single other character are replaced by a control sequence token. In this sense the stage resembles lexical analysis, although it does not form numbers from digits. In the second stage, expandable control sequences (such as conditionals or defined macros) are replaced by their replacement text. The input for the third stage is then a stream of characters, including ones with special meaning, and unexpandable control sequences, typically assignments and visual commands. Here characters are assembled into a paragraph; TeX's paragraph-breaking algorithm works by optimizing breakpoints over the whole paragraph. Finally, once the paragraph has been broken into lines, the vertical list of lines and other material is broken into pages.
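The first level can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only a handful of plain TeX's default category codes are included, the category table is fixed rather than changeable on the fly, and details such as skipping spaces after a control word, comments, and end-of-line handling are left out. The function names are made up for this example.

```python
# A few of plain TeX's default category codes (the real table has 16 classes).
ESCAPE, BEGIN_GROUP, END_GROUP, MATH_SHIFT = 0, 1, 2, 3
SUPERSCRIPT, SUBSCRIPT, SPACE, LETTER, OTHER = 7, 8, 10, 11, 12

SPECIAL = {"\\": ESCAPE, "{": BEGIN_GROUP, "}": END_GROUP, "$": MATH_SHIFT,
           "^": SUPERSCRIPT, "_": SUBSCRIPT, " ": SPACE}


def default_catcode(ch):
    """Category code of one character under the simplified table above."""
    if ch in SPECIAL:
        return SPECIAL[ch]
    return LETTER if ch.isascii() and ch.isalpha() else OTHER


def tokenize(text, catcode=default_catcode):
    """Turn characters into ('cs', name) or ('char', ch, catcode) tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if catcode(ch) == ESCAPE:
            j = i + 1
            if j < len(text) and catcode(text[j]) == LETTER:
                # Control word: escape character plus a run of letters.
                while j < len(text) and catcode(text[j]) == LETTER:
                    j += 1
            elif j < len(text):
                # Control symbol: escape character plus one non-letter.
                j += 1
            tokens.append(("cs", text[i + 1:j]))
            i = j
        else:
            tokens.append(("char", ch, catcode(ch)))
            i += 1
    return tokens


for token in tokenize(r"\sqrt{b^2}"):
    print(token)
```

Digits come out as separate category-12 character tokens, which matches the remark that this stage does not form numbers from digits.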

The TeX system has precise knowledge of the sizes of all characters and symbols, and using this information, it computes the optimal arrangement of letters per line and lines per page. It then produces a DVI file (for "device independent") containing the final locations of all characters. This DVI file can be printed directly given an appropriate printer driver, or it can be converted to other formats. Nowadays, pdfTeX is often used, which bypasses DVI generation altogether.

Most functionality is provided by format files (predumped memory images of TeX after large macro collections have been loaded). Common formats are Knuth's original basic plain TeX, LaTeX (ubiquitous in the technical sciences), and ConTeXt (which is used primarily for Desktop Publishing).

The ultimate reference works for TeX are the first two volumes of Knuth's Computers and Typesetting, The TeXbook and TeX: The Program (which includes the complete documented source code for TeX).

TeX is usually distributed together with Metafont, a companion program also developed by Knuth which allows algorithmic description of fonts. The organisation of the directories in a TeX / Metafont installation is standardized in a tree called texmf.

The license allows free distribution and modification but demands that any changed version not be called TEX, TeX, or anything confusingly similar, providing rights similar to those of a trademark. A test suite, the TRIP test (with a counterpart called TRAP for Metafont), helps check whether an implementation really is TeX.

Quality

Though well-written, TeX is so large (and so full of cutting edge technique) that it is said to have unearthed at least one bug in every Pascal system it has been compiled with. TeX runs on almost all operating systems.

Knuth offers monetary awards to people who find and report a bug in it. The award per bug started at $2.56 and doubled every year until it was frozen at its current value of $327.68. This has not made Knuth poor, however, as there have been very few bugs, and in any case a cheque proving that the owner found a bug in TeX is usually framed instead of cashed.
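The two figures are consistent with repeated doubling: starting from $2.56, seven doublings give $327.68. A quick check in Python, working in integer cents to avoid floating-point rounding (the function name and its parameters are illustrative only):

```python
def award_schedule(start_cents=256, final_cents=32768):
    """List the award amounts, in cents, from the first value to the frozen one."""
    amounts = [start_cents]
    while amounts[-1] < final_cents:
        amounts.append(amounts[-1] * 2)
    return amounts


schedule = award_schedule()
print(len(schedule) - 1)                        # 7 doublings
print([f"${c / 100:.2f}" for c in schedule])    # from $2.56 up to $327.68
```

In cents the endpoints are 2^8 = 256 and 2^15 = 32768, both powers of two.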

Computer-scientific aspects of TeX

The TeX software incorporates several interesting algorithms, and has led to a number of theses by Knuth's students. For instance, a hyphenation algorithm (work by Frank Liang) is used that assigns priorities to breakpoints in letter groups. A list of hyphenation patterns can be generated automatically from a corpus of hyphenated words.
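The idea of pattern priorities can be sketched as follows. This is a minimal Python illustration under stated assumptions, not TeX's implementation: in a pattern such as hy3ph, a digit gives the priority of the position it occupies between letters, the highest priority from any matching pattern wins at each position, and an odd result permits a break. The short pattern list is chosen just for this example and is not a real pattern file. The left_min and right_min parameters stand in for minimum fragment lengths.

```python
import re


def parse_pattern(pattern):
    """Split 'hy3ph' into the letters 'hyph' and the values [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)   # priority of the gap before letter `pos`
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the fragments of `word` allowed by the given patterns."""
    dotted = "." + word.lower() + "."      # dots mark the word boundaries
    points = [0] * (len(dotted) + 1)       # points[j]: gap before dotted[j]
    for letters, values in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for k, value in enumerate(values):
                points[start + k] = max(points[start + k], value)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if points[i + 1] % 2 == 1:         # odd priority: break before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


# Illustrative patterns only (an assumption for this sketch).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Even priorities suppress a break and odd ones allow it, so a more specific pattern with a higher number can override a more general one.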

The line breaking algorithm is an example of dynamic programming. The problem of breaking a paragraph of n words into lines has a naive complexity of $2^n$, but with dynamic programming a globally optimal layout can be derived in time proportional to the number of words and the number of words per line. A thesis by Michael Plass shows how the page breaking problem can be NP-complete because of the added complication of placing figures.
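A minimal Python sketch of the dynamic-programming idea follows. It deliberately swaps TeX's model of stretchable glue, badness and demerits for a much simpler cost, the squared number of unused character cells on every line but the last, and it assumes fixed-width characters. The function name and the cost function are assumptions of this example, not TeX's algorithm.

```python
def break_lines(words, width):
    """Break `words` into lines of at most `width` characters,
    minimizing the sum of squared trailing space (the last line is free)."""
    if any(len(w) > width for w in words):
        raise ValueError("a word is longer than the line width")
    n = len(words)
    best = [float("inf")] * (n + 1)   # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)               # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                   # cancels the space counted before word i
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break                 # inner loop is bounded by words per line
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


print(break_lines("aaa bb cc ddddd".split(), 6))  # ['aaa', 'bb cc', 'ddddd']
```

A greedy first-fit breaker would produce "aaa bb", "cc", "ddddd" here, leaving a very loose second line. Looking at the whole paragraph accepts a slightly short first line to get a better overall result.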

The companion program Metafont for character generation uses Bézier curves in a fairly standard way, but Knuth devotes lots of attention to the rasterizing problem on bitmapped displays. Another thesis, by John Hobby, further explores this problem of digitizing "brush trajectories". This term derives from the fact that Metafont describes characters as having been drawn by abstract brushes.
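For readers unfamiliar with Bézier curves, here is a short Python sketch that evaluates a cubic Bézier curve by de Casteljau's repeated linear interpolation and then digitizes it naively by rounding sampled points to a pixel grid. The sampling-and-rounding step is a crude stand-in chosen for this illustration. It is not how Metafont rasterizes, and the function names are hypothetical.

```python
def bezier_point(p0, p1, p2, p3, t):
    """Point at parameter t (0..1) on the cubic Bezier curve p0, p1, p2, p3."""
    def lerp(a, b):
        # Linear interpolation between two 2-D points.
        return tuple((1 - t) * x + t * y for x, y in zip(a, b))
    a, b, c = lerp(p0, p1), lerp(p1, p2), lerp(p2, p3)
    d, e = lerp(a, b), lerp(b, c)
    return lerp(d, e)


def rasterize(p0, p1, p2, p3, steps=16):
    """Naive digitization: sample the curve and round to integer pixels."""
    pixels = set()
    for k in range(steps + 1):
        x, y = bezier_point(p0, p1, p2, p3, k / steps)
        pixels.add((round(x), round(y)))
    return pixels


print(bezier_point((0, 0), (0, 1), (1, 1), (1, 0), 0.5))  # (0.5, 0.75)
print(sorted(rasterize((0, 0), (0, 4), (4, 4), (4, 0))))
```

Naive rounding like this can leave gaps or uneven stroke widths, which hints at why rasterization needs careful treatment.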

Derived works

Several document processing systems are based on TeX, notably:

• LaTeX (Lamport TeX), which incorporates document styles for books, letters, slides, etc., and adds support for referencing and automatic numbering of sections and equations.
• ConTeXt, written mostly by Hans Hagen at Pragma, a professional document design tool based on TeX. It is much younger than LaTeX and perhaps for that reason less popular, but much more powerful.
• AMS-TeX, produced by the American Mathematical Society, which has many more user-friendly commands that journals can alter to fit their house style. Most of the features of AMS-TeX can be used in LaTeX through the AMS "packages"; this combination is referred to as AMS-LaTeX. The main AMS-TeX manual is entitled The Joy of TeX.
• jadeTeX, which uses TeX as a backend for printing from James Clark's DSSSL engine.
• Texinfo, the GNU documentation processing system.
• XeTeX, a new TeX engine that supports Unicode and the advanced Mac OS X font technologies.

Numerous extensions and companion programs for TeX exist, among them BibTeX for bibliographies (distributed with LaTeX); pdfTeX, which bypasses DVI and produces output in Adobe Systems' Portable Document Format; and Omega, which allows TeX to use the Unicode character set. All TeX extensions are available for free from CTAN, the Comprehensive TeX Archive Network.

Compatible tools

On UNIX-compatible systems (including Mac OS X), TeX is distributed in the form of teTeX. On Windows, there is the MiKTeX distribution and the fpTeX distribution.

The TeXmacs text editor is a WYSIWYG scientific text editor that is intended to be compatible with TeX. It uses Knuth's fonts, and can generate TeX output. LyX is a similar tool.

TeX and MediaWiki

As of 2003, the MediaWiki wiki software (as used on Wikipedia) implements TeX markup, using <math>...</math> tags enclosing blocks of TeX. This capability is implemented via Texvc, which is basically a script that pipes the markup through TeX and then dvips to produce a PostScript file, which Ghostscript renders into a PNG image. Due to the nature of the web environment, this is done in an efficient (cached) and security-conscious way: allowing third parties to pass unsanitised text through the standard TeX engine is a bad idea if you value your files.

The example fragments of TeX below are rendered using Texvc, and simple ones such as <math>a \over b</math> can be used to generate $a \over b$, although it is recommended that one write the HTML-rendered a/b instead.

Examples

A simple plain TeX example: create a text file myfile.tex with the following content:

hello
\bye


Then open a command line interpreter and type

tex myfile.tex


TeX then creates a file myfile.dvi. Use a viewer to look at the file. MiKTeX, for example, contains a viewer called yap:

yap myfile.dvi


The viewer shows hello on a page. \bye is a TeX command which marks the end of the file and is not shown in the final output.

The dvi file can either be printed directly from the viewer or converted to a more common format such as PostScript using the dvips program.

Alternatively, PDF files may be created directly using pdfTeX:

pdftex myfile.tex


pdfTeX was originally created because converting generated PostScript into PDF resulted in poor font display, though printing performance was fine. This was because TeX natively uses bitmap fonts, which are only designed to display well at one particular size, whereas PostScript typically uses scalable Type 1 fonts.

It is now possible to make dvips output scalable fonts with a bit of tweaking (newer versions of Ghostscript support it), but direct conversion to PDF has other benefits: it is a one-step, not two-step process, and pdfTeX provides facilities such as bookmarks and hyperlinks not found in PostScript.

Mathematical examples

To see TeX further in action, look at its formatting of mathematical formulas. For example, to write the well-known quadratic formula, try entering

The quadratic formula is ${-b\pm\sqrt{b^2-4ac} \over {2a}}$
\bye


Use TeX as above, and you should get something that looks like

The quadratic formula is ${-b\pm\sqrt{b^2-4ac} \over {2a}}$

Notice how the formula is printed the way a person would write it by hand or typeset the equation. In a document, mathematics mode is entered by starting with a $, then entering a formula in TeX syntax, and closing again with another $. Display mathematics, that is, mathematics presented centered on a new line, is done by using $$. For example, the above with the quadratic formula in display math:

The quadratic formula is $${-b\pm\sqrt{b^2-4ac} \over {2a}}$$
\bye


renders as

$${-b\pm\sqrt{b^2-4ac} \over {2a}}$$

LaTeX examples

LaTeX is a collection of macros written in TeX. There are many predefined templates (with predefined styles) one can use. It is much more structured than TeX, providing a set of macros and utilities for indexing, tables, lists and so forth. For example:

\documentclass[a4paper]{book}
\begin{document}
\section{ ... a title }
\subsection{ ... a subtitle}
%% The text goes here
\end{document}


To render the book as a PostScript file, use

latex myfile.tex
dvips myfile.dvi


Alternatively, one way to render the book as a PDF file is

pdflatex myfile.tex