A Document Type Definition (DTD for short) is a set of declarations that conform to a particular markup syntax and that describe a class, or "type", of SGML or XML documents, in terms of constraints on the structure of those documents.
A DTD effectively specifies the syntax of an application of SGML or XML, which may be a widely used standard such as XHTML or a local application. That syntax is generally a more restricted form of the syntax of SGML or XML itself.
DTDs are usually used to describe the structure of a class of documents. A typical DTD used for this purpose is mainly composed of element and attribute-list declarations. Element declarations name the allowable set of elements within the document, and also specify a permissible arrangement of "child" (contained) elements and runs of character data. Attribute-list declarations name the allowable set of attributes for each element and give, for each attribute, either its type or an explicit set of valid values.
A DTD may also declare default attribute values, named entities and their replacement text, and other constructs.
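As a minimal sketch of these declaration kinds, the fragment below describes a hypothetical "memo" document type. The element, attribute, and entity names are invented for illustration and do not come from any published DTD.

```xml
<!-- Element declarations: a memo holds one title followed by one or more paragraphs -->
<!ELEMENT memo (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>

<!-- Attribute-list declaration: "status" takes one of an explicit set of values
     and defaults to "draft"; "author" is free text and may be omitted -->
<!ATTLIST memo
  status (draft | final) "draft"
  author CDATA #IMPLIED>

<!-- Named entity and its replacement text -->
<!ENTITY org "Example Organization">
```

With these declarations, a memo element written without a status attribute is treated as if status="draft" had been specified, and a reference to &org; in content expands to the replacement text.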
Associating DTDs with documents
A DTD is associated with a particular document via a Document Type Declaration, which is a bit of markup that appears near the start of the associated document. The declaration establishes that the document is an instance of the type defined by the referenced DTD.
The declarations in a DTD are divided into an internal subset and an external subset. The declarations in the internal subset are embedded in the Document Type Declaration in the document itself. The declarations in the external subset are located in a separate text file. The external subset may be referenced via a public identifier, a system identifier, or both. Programs that read documents are not necessarily required to read the external subset.
Here is an example of a Document Type Declaration containing both public and system identifiers:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Here is an example of a Document Type Declaration that encapsulates an internal subset consisting of a single entity declaration:
<!DOCTYPE foo [ <!ENTITY greeting "hello"> ]>
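The two forms can be combined in one declaration. The sketch below assumes a hypothetical external subset in a file named "foo.dtd" and adds an internal subset to it. Because the internal subset is read first and the first declaration of an entity is the binding one, a declaration made here takes precedence over one of the same name in the external file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "foo.dtd" is a hypothetical file name used only for illustration -->
<!DOCTYPE foo SYSTEM "foo.dtd" [
  <!ENTITY greeting "hello">
]>
<foo>&greeting;, world</foo>
```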
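All HTML 4.01 documents are expected to conform to one of three SGML DTDs. The public identifiers of these DTDs are constant and are as follows:
- "-//W3C//DTD HTML 4.01//EN" (Strict)
- "-//W3C//DTD HTML 4.01 Transitional//EN" (Transitional)
- "-//W3C//DTD HTML 4.01 Frameset//EN" (Frameset)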
The system identifiers of these DTDs, if present in the Document Type Declaration, will be URI references. System identifiers can vary, but are expected to point to a specific set of declarations in a resolvable location. SGML allows for public identifiers to be mapped to system identifiers in catalogs that are optionally made available to the URI resolvers used by document parsing software.
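SGML catalogs have their own plain-text format. For XML tooling, the analogous mechanism is the OASIS XML Catalogs format, which is used here as a swapped-in illustration rather than something the text above describes. A minimal sketch, assuming a local copy of the DTD at the made-up relative path shown:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map a public identifier to a local copy; the uri value is a hypothetical path -->
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="dtd/xhtml1-transitional.dtd"/>
</catalog>
```

A resolver configured with such a catalog can satisfy a request for the public identifier without fetching the system identifier over the network.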
XML DTDs and schema validation
The XML DTD syntax is one of several XML schema languages. A common misunderstanding is that non-validating XML parsers are not required to read DTDs. In fact, even a non-validating parser must scan the internal subset for correct syntax and must use the entity declarations and default attribute values it finds there; only reading the external subset is optional.
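The following sketch illustrates the point with a hypothetical "note" document. It contains no element declaration for "note", so it cannot be valid, yet a non-validating parser is still expected to expand the entity and supply the default attribute declared in the internal subset.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!-- Default value: applied when the attribute is absent from the start tag -->
  <!ATTLIST note lang CDATA "en">
  <!-- Entity whose replacement text must be substituted in content -->
  <!ENTITY sig "The Editors">
]>
<note>Regards, &sig;</note>
```

An application reading this through a conforming parser should see a note element with lang="en" and the text "Regards, The Editors".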
The syntax of SGML and XML DTDs is very similar, but not identical.
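Two of the differences can be sketched with a hypothetical "para" element. An SGML DTD may carry tag-omission indicators and a comment inside an element declaration. In the SGML form below, "- O" states that the start tag is required and the end tag may be omitted:

```sgml
<!ELEMENT para - O (#PCDATA)  -- a paragraph -->
```

XML permits neither feature, so the XML form drops the indicators and moves the comment outside the declaration:

```xml
<!-- a paragraph -->
<!ELEMENT para (#PCDATA)>
```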
XML DTD Example
An example of a very simple XML DTD to describe a list of persons is given below:
<!ELEMENT people_list (person*)>
<!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT name (#PCDATA) >
<!ELEMENT birthdate (#PCDATA) >
<!ELEMENT gender (#PCDATA) >
<!ELEMENT socialsecuritynumber (#PCDATA) >
Taking this line by line, it says:
- A "people_list" element contains any number of "person" elements. The "*" denotes there can be 0, 1 or many "person" elements within the "people_list" element.
- A "person" element contains the elements "name", "birthdate", "gender" and "socialsecuritynumber", in that order; the commas denote a sequence. The "?" indicates that an element is optional. The "name" element does not have a "?", so a "person" element must contain a "name" element.
- A "name" element contains character data ("#PCDATA" stands for parsed character data).
- A "birthdate" element contains character data.
- A "gender" element contains character data.
- A "socialsecuritynumber" element contains character data.
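Two further content-model operators are not used in the DTD above: "+" (one or more) and "|" (a choice between alternatives). A hedged sketch of how they might be applied to the same vocabulary follows; the "nickname" element is invented for illustration.

```xml
<!-- At least one person is now required -->
<!ELEMENT people_list (person+)>
<!-- Each person starts with either a name or a nickname, then the optional fields -->
<!ELEMENT person ((name | nickname), birthdate?, gender?, socialsecuritynumber?)>
<!ELEMENT nickname (#PCDATA)>
```

These declarations would replace the corresponding "people_list" and "person" declarations; the remaining declarations stay as they are.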
An example of an XML file which makes use of this DTD follows. It assumes the DTD is identifiable by the relative URI reference "example.dtd":
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
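Only the prolog of the file appears above. A complete file consistent with the DTD might look like the following sketch; the person and all values are invented placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list SYSTEM "example.dtd">
<people_list>
  <person>
    <name>Jane Example</name>
    <birthdate>1970-01-01</birthdate>
    <gender>female</gender>
  </person>
</people_list>
```

Here the optional "socialsecuritynumber" element is omitted, which the DTD allows.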
The DTD given above requires a name element within every person element; the people_list element is also mandatory, but the rest are optional.
To render this in an XML-enabled browser (such as IE5 or Mozilla), save the DTD above to a text file named example.dtd and the XML file to a differently named text file in the same directory, then open the XML file with the browser. However, many browsers do not check that an XML document conforms to the rules in the DTD; they are only required to check that the DTD is syntactically correct. For security reasons, they may also choose not to read the external DTD.
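Where a browser or parser declines to fetch the external file, one workaround is to move the declarations into the internal subset so that everything sits in a single file. A sketch, reusing the DTD above with invented sample data:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people_list [
  <!ELEMENT people_list (person*)>
  <!ELEMENT person (name, birthdate?, gender?, socialsecuritynumber?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT birthdate (#PCDATA)>
  <!ELEMENT gender (#PCDATA)>
  <!ELEMENT socialsecuritynumber (#PCDATA)>
]>
<people_list>
  <person>
    <name>Jane Example</name>
  </person>
</people_list>
```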
DTD criticisms and alternatives
While DTD support in XML tools is widespread due to its inclusion in the XML 1.0 standard, it is seen as limited for the following reasons:
- No support for newer features of XML — most importantly, namespaces.
- Lack of expressivity. Certain formal aspects of an XML document cannot be captured in a DTD; for example, a DTD cannot require that the content of an element be a date or a number.
- Custom non-XML syntax to describe the schema, inherited from SGML.
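The namespace limitation can be sketched as follows. A DTD matches element names as literal strings, so declarations written for one prefix do not cover the same namespace bound to another prefix. The "p" prefix and the namespace URI are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE p:people_list [
  <!-- The DTD knows only the literal names "p:people_list" and "p:person" -->
  <!ELEMENT p:people_list (p:person*)>
  <!ELEMENT p:person (#PCDATA)>
  <!-- The namespace declaration is supplied as a fixed default attribute -->
  <!ATTLIST p:people_list xmlns:p CDATA #FIXED "http://example.org/people">
]>
<p:people_list>
  <p:person>Jane Example</p:person>
</p:people_list>
```

Rewriting the instance with a different prefix bound to the same URI yields a document that a namespace-aware processor treats as equivalent, but that is invalid against this DTD.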
Two newer XML schema languages that are much more powerful are increasingly favored over DTDs: