The Online Encyclopedia and Dictionary






Project Gutenberg

Project Gutenberg (PG) was launched by Michael Hart in 1971 to provide a library of free electronic versions (sometimes called e-texts) of physically existing books, hosted on what would later become the Internet. Most of the texts are in the public domain, either because they were never under copyright or because their copyrights have expired; a few copyrighted texts are also available with the authors' permission. The project was named after the 15th-century German printer Johannes Gutenberg, whose movable-type printing press set off a revolution in printing. As of March 2005, Project Gutenberg claimed over 15,000 books in its collection.


General information

Project Gutenberg's database of e-texts was placed on CD-ROM.

For the most part, Project Gutenberg concentrates on historically significant literature and reference works. The project's slogan is "break down the bars of ignorance and illiteracy", chosen because the project hopes to continue the work, begun by public libraries in the early 20th century, of spreading public literacy and appreciation of the literary heritage. Whenever possible, Gutenberg releases are available in plain ASCII text; other formats may also be released when volunteers submit them. There has been discussion for years of adopting some form of XML, although progress has been slow. Formats that are not easily editable, such as PDF, are generally not considered to fit the goals of Project Gutenberg, although a few have been added to the collection.

While most Project Gutenberg releases are in English, there are now significant numbers in German, French, Italian, Spanish, Dutch, Finnish, and Chinese, as well as a few in other languages. All Project Gutenberg texts may be obtained and redistributed by readers for no fee: the only restriction placed on redistribution is that the unaltered text must contain the Project Gutenberg header. If the redistributed text has been modified, the file must not be labelled as a Gutenberg text.

The project has released over 15,000 electronic books, almost all produced by volunteers, and remains active. Anyone can become a proofreader by signing up at the Distributed Proofreaders site [1] and proofreading one page at a time.

Some Project Gutenberg e-texts have been criticised for a lack of scholarly rigour, for example for failing to document adequately which editions were used and for omitting the prefaces and other critical apparatus of the original published editions.

History

In 1971, Michael Hart was attending the University of Illinois. Because he was friends with some of the operators, Hart obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. He was given an operator's account with a virtually unlimited amount of computer time, whose value has since been variously estimated at $100,000 or $100,000,000. Hart could never have used up that much computer time, but he wanted to 'give back' this gift by creating something that would be worth as much.

This particular computer happened to be one of the 15 nodes on the computer network that would become the Internet. Hart believed that computers would one day be accessible to the general public and decided to make works of literature available for free in electronic form. He happened to have a copy of the United States Declaration of Independence in his backpack, and this became the first Project Gutenberg e-text.

By the time the University of Illinois stopped hosting Project Gutenberg in the mid-1990s, Hart was running it from Illinois Benedictine College. He later made a similar arrangement with Carnegie Mellon University, which agreed to administer Project Gutenberg's finances. Project Gutenberg was not formally organized as an independent legal entity until 2000; it is now a non-profit corporation chartered in Mississippi, with an IRS ruling that donations to it are tax-deductible.

Since the Project's early days, the time required to digitize a book has decreased dramatically. Books are generally not typed in, but are instead converted into text with the aid of optical character recognition (OCR) software. Despite these advances, books still need to be heavily proofread and edited before they can be added to the collection.

Other projects inspired by Project Gutenberg


Project Gutenberg of Australia is an official sister project of PG. The primary Gutenberg site is bound by U.S. copyright law, in particular the Sonny Bono Copyright Term Extension Act, which in some cases has retroactively extended the duration of copyright to ninety-five years. PG Australia instead produces e-texts in accordance with Australian copyright law, which defines differently when works enter the public domain. Specifically, under Australian law works generally enter the public domain 50 years after the author's death, so many more 20th-century works may be freely distributed. PG Australia can therefore produce and host e-texts that it would be illegal for Project Gutenberg to host in the United States, while, conversely, some texts from the US project cannot be hosted in Australia. PG Australia also focuses on digitizing Australian material. However, because of copyright changes included in the free trade agreement negotiated between Australia and the United States, these texts may not remain available.

PG-EU is a new sister project that operates under the copyright law of the European Union. One of its aims is to include as many languages as possible in Project Gutenberg. It uses Unicode so that all alphabets can be represented easily and correctly.

Aozora Bunko is a similar project in Japan, which digitizes texts that are out of copyright under Japanese law and distributes them for free. Most of its texts are Japanese literature and translations of English literature.

Project Runeberg, begun in 1992, is a similar project for texts in the Nordic languages.

Project Ben-Yehuda, begun in 1999 and inspired by Project Gutenberg, brings public domain Hebrew texts to the Internet. A project by the National Yiddish Book Center in Amherst, Massachusetts, is attempting to produce digital versions of its entire collection of Yiddish books.

In 2000, Charles Franks founded Distributed Proofreaders, which distributes the proofreading of scanned texts among many volunteers over the Internet. Volunteers scan books and run optical character recognition software on them, then place the results on a website for volunteer "proofers" to check. Each book passes through two proofreading rounds. With thousands of volunteers each working on one or more pages, a reasonably sized book can be proofread in several hours. Other volunteers then post-process the books and post them to Project Gutenberg.

The Million Book Project aims to digitize a million public domain books by 2005. To process so many books in such a short time, the project generally skips the time-consuming transcription step and stores its books as compressed image files.


There is a sub-project within Project Gutenberg working on digitizing sheet music.

The Mutopia project attempts to do for music what Project Gutenberg does for literary works.


The contents of this article are licensed under the GNU Free Documentation License.