In computing, HyperText Markup Language (HTML) is a markup language designed for the creation of web pages and other information viewable in a browser. The focus of HTML is on the presentation of information—paragraphs, fonts, italics, tables, and so forth—rather than the semantics—what the words mean.
Originally defined as a simple application of SGML, which is used by organizations with complex publishing requirements, HTML is now an international standard (ISO/IEC 15445:2000). The HTML specification is maintained mainly by the World Wide Web Consortium (W3C).
The initial versions of HTML were very tolerant of simple kinds of coding mistakes. The browser commonly made assumptions about intent, and proceeded with the rendering. Over time, the trend has been to create an increasingly strict language syntax. HTML 4.01 is the current version, although the W3C is moving toward replacing it with XHTML, which applies the exacting strictness of XML to the HTML world.
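The contrast between lenient HTML handling and XML strictness can be sketched with Python's standard library. This is only an illustration: the sample markup is invented, and `html.parser` stands in for a tolerant parser rather than reproducing the error recovery of any particular browser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Invented sample: two paragraphs whose <p> tags are never closed.
SLOPPY = "<p>First paragraph<p>Second paragraph"


class TagCollector(HTMLParser):
    """Records every start tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_leniently(markup):
    """Return the start tags found; missing end tags are not treated as errors."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def is_well_formed_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


lenient_tags = parse_leniently(SLOPPY)   # the lenient parser reports both <p> tags
strict_ok = is_well_formed_xml(SLOPPY)   # the XML parser rejects the same input
```

The same input that the lenient parser accepts is rejected outright by the XML parser, which is the kind of strictness XHTML inherits from XML.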
HTML is a form of markup that is oriented toward the presentation of single-page text documents by specialized rendering software called an HTML user agent, the most common example of which is a web browser. HTML provides a means by which the document's main content can be annotated with various kinds of metadata and rendering hints. The rendering cues may range from minor text decorations, such as specifying that a certain word be underlined or that an image be inserted, to sophisticated scripts, imagemaps, and form definitions that control web browsers. The metadata may include information about the document's title and author, structural information such as an expression of how the content is segmented into arbitrary divisions, headings, paragraphs, lists, etc., and, crucially, information that allows the document to be linked to other documents to form a hypertext web. Although most HTML documents contain a main body of text, it is not uncommon to encounter minimal HTML documents that exist only to present visual media in a web browser.
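As a rough illustration of how a user agent might read this metadata, the following sketch uses Python's standard `html.parser` to pull the title, headings, and link targets out of a small document. The sample page, its URL, and the `OutlineParser` class are invented for the example.

```python
from html.parser import HTMLParser

# Invented sample document; the title, heading text, and URL are placeholders.
PAGE = """<html>
<head><title>Example page</title></head>
<body>
<h1>Main heading</h1>
<p>Some text with a <a href="http://example.org/other.html">link</a>.</p>
</body>
</html>"""


class OutlineParser(HTMLParser):
    """Collects the title, the headings, and the link targets of a document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []
        self._current = None  # name of the element whose text is being captured

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            if tag != "title":
                self.headings.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.HEADINGS:
            self.headings[-1] += data


outline = OutlineParser()
outline.feed(PAGE)
outline.close()
```

After parsing, `outline.title`, `outline.headings`, and `outline.links` hold the document's title, its heading text, and the URLs it links to.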
HTML is a text-based format designed to be both readable and editable by humans using a text editor. However, writing and updating every page by hand in this way is time-consuming, requires a good knowledge of HTML, and can make consistency difficult to maintain.
GUI HTML editors such as Macromedia Dreamweaver or Microsoft FrontPage allow web pages to be treated much like word processor documents, but a site-wide change still means altering many files (though some tools automate this to a degree), and the HTML they generate is often hard to read if it ever needs to be edited manually.
A final possibility is to combine content and presentation on the fly, using a server-side scripting system (PHP, ASP, etc.) to produce the final HTML. The complexity may range from simply pulling a text file of content into an HTML framework that carries all the page styling, to the complex processing done by content management systems, wikis, and web forums.
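The simplest case, dropping plain content into a shared HTML framework, might look like the following sketch. It is written in Python rather than PHP or ASP purely for illustration; the template, the `render_page` function, and the sample content are all hypothetical.

```python
from html import escape
from string import Template

# Hypothetical page framework shared by every page on a site.
PAGE_TEMPLATE = Template("""<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
$body
</body>
</html>""")


def render_page(title, paragraphs):
    """Wrap plain-text content in the shared HTML framework."""
    # Escape the content so characters such as < and & are not read as markup.
    body = "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)
    return PAGE_TEMPLATE.substitute(title=escape(title), body=body)


html_page = render_page("News", ["Prices rose by <5%.", "Fish & chips is back."])
```

Because every page passes through the same template, a site-wide change to the layout requires editing only the template, not each page.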
HTML is also used in email. In this case the message is composed with a tool built into the mail client that resembles a GUI HTML editor; it is then wrapped with MIME headers and sent to the receiving mail client, which generally renders the HTML using some form of browser control. The use of HTML in email is controversial, and many mailing lists deliberately block it.
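How an HTML body is wrapped with MIME headers can be sketched with Python's standard `email` package. The addresses and message text below are placeholders, and `build_html_email` is a hypothetical helper; real mail clients may structure their messages differently.

```python
from email.message import EmailMessage


def build_html_email(sender, recipient, subject, text, html):
    """Build a MIME message carrying a plain-text part and an HTML alternative."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(text)                      # text/plain part
    message.add_alternative(html, subtype="html")  # text/html part
    return message


# Placeholder addresses and content, for illustration only.
msg = build_html_email(
    "alice@example.org",
    "bob@example.org",
    "Hello",
    "Hello Bob",
    "<html><body><p>Hello <b>Bob</b></p></body></html>",
)
part_types = [part.get_content_type() for part in msg.iter_parts()]
```

Including a plain-text part alongside the HTML lets a receiving client that cannot or will not render HTML fall back to the text version.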
Version history of the standard
HTML 2.0 — (RFC 1866) approved as a proposed standard September 22, 1995,
HTML 3.2 — January 14, 1997,
HTML 4.0 — December 18, 1997,
HTML 4.01 (minor fixes) — December 24, 1999,
ISO/IEC 15445:2000 ("ISO HTML", based on HTML 4.01 Strict) — May 15, 2000.
There is no official HTML 1.0 specification because there were multiple informal HTML standards at the time. Work on a successor for HTML, then called 'HTML+', began in late 1993, designed originally to be "A superset of HTML … which will allow a gradual rollover from the previous format of HTML". The first formal specification was therefore given the version number 2.0 in order to distinguish it from these unofficial "standards". Work on HTML+ continued, but this never became a standard.
The HTML 3.0 standard was proposed by the newly formed W3C in March 1995, and provided many new capabilities such as support for tables, text flow around figures, and the display of complex math elements. Even though it was designed to be compatible with HTML 2.0, it was too complex at the time to be implemented, and when the draft expired in September 1995 it was not continued due to lack of browser support. HTML 3.1 was never officially proposed, and the next standard proposal was HTML 3.2, which dropped the majority of the new features in HTML 3.0 and instead adopted many browser-specific elements and attributes that had been created for the Netscape and Mosaic web browsers. Support for math as proposed by HTML 3.0 came later in a separate standard, MathML.
HTML 4.0 likewise adopted many browser-specific elements and attributes, but at the same time began to try to 'clean up' the standard by marking some of them as 'deprecated'.
There will no longer be any new versions of HTML. However, HTML lives on in XHTML, which is based on XML.
Markup elements
There are four kinds of markup elements in HTML.
- Structural markup. Describes the purpose of text. For example, <h1>Golf</h1> directs the browser to render "Golf" as a first-level heading, similar to "Markup elements" at the start of this section.
- Presentational markup. Describes the appearance of the text, regardless of its function. For example, <b>boldface</b> will render "boldface" in bold text. Presentational markup has been superseded by Cascading Style Sheets and is no longer recommended for general use.
- Hypertext markup. Links parts of the document to other documents. For example, an <a> element that encloses the word Wikipedia and gives the target in its href attribute will render the word Wikipedia as a hyperlink to the specified URL.
- Widget markup. Creates objects, such as buttons and lists.
Only presentational markup (along with Cascading Style Sheets) determines how the content within that markup should be presented. The other kinds of markup tell the browser what objects to render or what functions to perform.
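The four kinds can be illustrated with a small sketch that scans a fragment and labels each element it recognizes. The mapping from tag names to kinds is an assumption made for this example, not an official classification, and the sample fragment and its URL are invented.

```python
from html.parser import HTMLParser

# Assumed mapping from tag names to the four kinds described above; it is
# illustrative, not an exhaustive or official classification.
KINDS = {
    "h1": "structural",
    "p": "structural",
    "b": "presentational",
    "i": "presentational",
    "a": "hypertext",
    "input": "widget",
    "select": "widget",
}

# Invented sample; the URL is a placeholder.
SAMPLE = (
    "<h1>Golf</h1>"
    "<p>Some <b>boldface</b> text and a link to "
    '<a href="http://example.org/">Wikipedia</a>.</p>'
    '<input type="button" value="OK">'
)


class KindParser(HTMLParser):
    """Records (tag, kind) for every start tag with a known kind."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in KINDS:
            self.found.append((tag, KINDS[tag]))


def classify(markup):
    """Return the (tag, kind) pairs found in the markup, in document order."""
    parser = KindParser()
    parser.feed(markup)
    parser.close()
    return parser.found


kinds = classify(SAMPLE)
```

Here `kinds` lists each recognized element in document order together with the kind assigned to it by the assumed mapping.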
Separation of style and content
Efforts of the web development community have led to new thinking about the way a web document should be written; XHTML epitomizes this effort. Standards stress using markup that conveys the structure of the document, like headings, paragraphs, block quoted text, and tables, instead of markup that is written for visual purposes only, like <font>, <b> (bold), and <i> (italics). Some presentational elements and attributes, such as <font>, have been removed from the HTML 4.01 Strict and XHTML 1.0 Strict specifications in favor of CSS solutions. CSS provides a way to separate the HTML structure from the content's presentation. See separation of style and content.
The document type definition (DTD)
All HTML documents should start with a document type declaration, which names the Document Type Definition (DTD) that the document follows. For example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
This declares a document that conforms to the Strict DTD of HTML 4.01, which is purely structural, leaving formatting to Cascading Style Sheets. The other HTML 4.01 DTDs, Transitional (also known as Loose) and Frameset, define different rules for the use of the language.
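A program that processes HTML can inspect this declaration to learn which DTD a document claims to follow. The sketch below uses Python's standard `html.parser`; the `DoctypeParser` class and `read_doctype` function are hypothetical helpers, and the approach assumes a declaration with exactly two double-quoted identifiers, as in the example above.

```python
import re
from html.parser import HTMLParser

# The declaration from the example above, followed by an invented document.
DOCUMENT = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Example</title></head><body><p>Text</p></body></html>"
)


class DoctypeParser(HTMLParser):
    """Captures the first DOCTYPE declaration in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl holds the text between "<!" and ">".
        if self.doctype is None and decl.upper().startswith("DOCTYPE"):
            self.doctype = decl


def read_doctype(markup):
    """Return (public identifier, system identifier), or None if not found."""
    parser = DoctypeParser()
    parser.feed(markup)
    parser.close()
    if parser.doctype is None:
        return None
    quoted = re.findall(r'"([^"]*)"', parser.doctype)
    if len(quoted) != 2:
        return None
    return quoted[0], quoted[1]


identifiers = read_doctype(DOCUMENT)
```

The first identifier names the DTD publicly, and the second gives the URL from which the DTD itself can be retrieved.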