Historical linguistics (also diachronic linguistics or comparative linguistics) is primarily the study of how languages change over time. It proceeds by examining languages that are recognizably related through similarities in vocabulary, word formation, and syntax, together with the surviving records of ancient languages. Historical linguistics aims to classify the world's languages by their genetic affiliations and to trace their historical development. Modern historical linguistics grew out of the earlier discipline of philology, the study of ancient texts and documents. In its early years it focused on the well-known Indo-European languages, but significant historical-linguistic work has since been done on the Austronesian languages and various families of Native American languages, among many others.
Language Evolution and the Comparative Method
Languages change over time. To represent lines of descent, historical linguists construct family trees, an idea pioneered by the 19th-century historical linguist August Schleicher. The basis for the trees is the comparative method: languages presumed to be related are compared with one another, and linguists look for regular sound correspondences, guided by what is generally known about how languages can change. These correspondences are then used to reconstruct the best hypothesis about the common ancestor language from which the attested languages descend.
The comparative method is validated by applying it to languages whose common ancestor is known. When the method is applied to the Romance languages (which include French, Spanish, Portuguese, Italian, and Romanian), the reconstructed common ancestor comes out rather similar to Latin: not the Classical Latin of Horace and Cicero, but Vulgar Latin, the colloquial Latin spoken in various dialects in the late Roman Empire.
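The search for regular sound correspondences can be illustrated with a much-simplified sketch. It tallies how the initial sounds of English words line up with those of their Latin counterparts and keeps only the pairings that recur. The word list, the hand-segmented initial sounds, the function names, and the threshold of three are all assumptions made for illustration; real comparative work examines every position in the word, many languages at once, and the conditioning environment of each change.

```python
from collections import Counter

# Each entry: (English word, its initial sound, Latin word, its initial sound).
# Initial sounds are segmented by hand; "th" is treated as a single sound.
WORD_PAIRS = [
    ("father", "f", "pater", "p"),
    ("foot", "f", "pes", "p"),
    ("fish", "f", "piscis", "p"),
    ("three", "th", "tres", "t"),
    ("thou", "th", "tu", "t"),
    ("thin", "th", "tenuis", "t"),
    ("ten", "t", "decem", "d"),
    ("two", "t", "duo", "d"),
    ("tooth", "t", "dens", "d"),
    # Look-alikes that are not true cognates; they are included to show
    # that an isolated resemblance does not form a recurring pattern.
    ("day", "d", "dies", "d"),
    ("have", "h", "habere", "h"),
]


def tally_correspondences(pairs):
    """Count how often each (English sound, Latin sound) pairing occurs."""
    return Counter((eng, lat) for _, eng, _, lat in pairs)


def recurring(tally, min_count=3):
    """Keep only pairings seen at least min_count times (hypothetical cutoff)."""
    return {pair: n for pair, n in tally.items() if n >= min_count}


tally = tally_correspondences(WORD_PAIRS)
regular = recurring(tally)

for (eng, lat), n in sorted(regular.items()):
    print(f"English {eng} : Latin {lat}  ({n} word pairs)")
```

In this toy data the pairings f : p, th : t, and t : d each recur, while the superficially identical d : d and h : h pairings occur only once each. Recurrence of this kind, not surface similarity, is what the comparative method treats as evidence.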
The comparative method can be used to reconstruct languages for which no written records exist, either because none were preserved or because the speakers were illiterate. Thus, the Germanic languages (which include German, Dutch, English, Norwegian, Swedish, Danish, Faroese, Icelandic, and the extinct Gothic) can be compared to reconstruct Proto-Germanic, a language that was probably contemporaneous with Latin and for which no records are preserved.
Germanic and Latin (more precisely, Proto-Italic, the ancestor of Latin and a few of its neighbors) are themselves related, being co-descended from Proto-Indo-European, spoken perhaps 5000 years ago. Scholars have reconstructed Proto-Indo-European on the basis of data from its daughter branches: Germanic, Italic, Celtic, Greek, Baltic, Slavic, Albanian, Armenian, Indo-Iranian, and the two extinct branches Tocharian and Anatolian.
The comparative method is used to distinguish true linguistic descent (that is, the passing of a language from parents to children, down through the generations) from resemblance due to cultural contact. For example, about 30% of the vocabulary of Persian is taken from Arabic, as a result of the Arab conquest of Iran in the 7th century and much subsequent cultural contact. Yet Persian is Indo-European, being a member of the Indo-Iranian branch that also includes Sanskrit and many of the languages of modern India. The clue that Persian is Indo-European is that its core vocabulary generally has Indo-European cognates (as in mādar 'mother'), and its essential grammatical elements are likewise Indo-European (as in būd 'was', which includes elements related to English "be" and the English past tense ending "-ed").
The comparative method has been successfully used to reconstruct some very large language families, notably Austronesian (which includes Hawaiian, Tagalog, Indonesian, and Malagasy) and Niger-Congo (the majority of the languages of modern Africa). Once the various changes in the daughter branches have been worked out, and a fair amount of the core vocabulary and grammar of the protolanguage is understood, scholars will quite generally agree that a genetic relationship has been proven.
Non-Comparative Method Theories
Vastly more controversial are hypotheses about relatedness which are not supported by application of the comparative method. Scholars who attempt to probe deeper than the comparative method supports (for example, by tabulating similarities found by mass comparison without setting up sound correspondences) are often accused of scholarly wishful thinking. The problem is that any two languages have a huge number of opportunities to resemble one another just by accident, so merely pointing out isolated resemblances has little evidentiary value. A famous example is the Persian word for "bad", which is pronounced (more or less) just like English "bad". It can be shown that the resemblance between these two words is completely accidental, and has nothing to do with the (rather remote) genetic connection between English and Persian. For further examples, see False cognate. On the other hand, this linguistic "noise" may be reduced by comparing large numbers of words, which is exactly the point of mass lexical comparison.
Since supporting distant genetic relationships is so difficult, and the methodology for finding and proving such relationships is not well established (in the way that the comparative method is), the field of locating remote relationships is riven with scholarly controversy. Nevertheless, the pursuit of remote relationships remains a powerful lure to many scholars; after all, Proto-Indo-European must have seemed a rather wild hypothesis to many when it was first proposed.
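The weight of accidental resemblance can be made concrete with a rough binomial sketch. It assumes that each compared word pair has the same independent probability of matching by chance, and it borrows the 5% chance similarity rate quoted in the discussion of glottochronology. The 200-item word list and the function names are hypothetical, chosen only for illustration.

```python
from math import comb


def expected_chance_matches(n_items, p_chance):
    """Expected number of accidental look-alikes among n_items word pairs."""
    return n_items * p_chance


def prob_at_least(k, n_items, p_chance):
    """Binomial probability of k or more accidental matches,
    assuming every word pair matches independently with p_chance."""
    return sum(
        comb(n_items, i) * p_chance**i * (1 - p_chance) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


N_ITEMS = 200    # hypothetical length of the compared word list
P_CHANCE = 0.05  # chance similarity rate quoted for glottochronology

print(expected_chance_matches(N_ITEMS, P_CHANCE))
for k in (1, 10, 20, 30):
    print(k, prob_at_least(k, N_ITEMS, P_CHANCE))
```

Under these assumptions about ten look-alikes are expected from chance alone, so a handful of isolated resemblances shows very little. Only a count well above that baseline would be hard to attribute to accident, which is the statistical intuition behind comparing large numbers of words.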
This uncertainty also relates to estimates of how long it would take for languages to diverge completely. One commonly cited opinion is that if a group of people were sent to a distant galaxy, after 10,000 years they would be speaking a language that would be no more similar to their native language than any other language selected at random. This figure is based on glottochronology, using a simplified assumption of a constant 14% loss rate each millennium and a chance similarity rate of 5%. However, other work by Isidore Dyen and Sergei Starostin indicates that in fact words have wildly differing expected life spans; thus, for instance, a specialized word like "goshawk" might on average last a mere millennium or two, whereas extremely common words like "I" and "you" often last so long that it is not possible to even estimate their life span without reconstructions going further back in time than those that are universally accepted.
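The arithmetic behind the 10,000-year figure can be sketched as follows. The sketch assumes the usual glottochronological simplification that two separated lineages each lose vocabulary independently at the constant 14% rate, so that the shared fraction after t millennia is 0.86 raised to the power 2t. The function and constant names are illustrative and do not come from any particular source.

```python
import math

RETENTION_PER_MILLENNIUM = 0.86  # 1 minus the 14% loss rate quoted above
CHANCE_SIMILARITY = 0.05         # chance similarity rate quoted above


def shared_fraction(millennia, retention=RETENTION_PER_MILLENNIUM):
    """Fraction of vocabulary still shared by two lineages that split
    `millennia` ago, assuming both lose words independently."""
    return retention ** (2 * millennia)


def millennia_until(threshold=CHANCE_SIMILARITY,
                    retention=RETENTION_PER_MILLENNIUM):
    """Time at which the shared fraction falls to `threshold`."""
    return math.log(threshold) / (2 * math.log(retention))


print(f"shared after 10 millennia: {shared_fraction(10):.3f}")
print(f"millennia until chance level: {millennia_until():.2f}")
```

On these assumptions the shared fraction after ten millennia is roughly 4.9%, just below the 5% chance level, and the threshold is crossed a little before the 10,000-year mark. The result depends entirely on the constant-rate assumption, which is exactly what the work of Dyen and Starostin calls into question.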
The ultimate in remote reconstruction is the recovery of a Proto-World language. Not all scholars believe that such a language even necessarily existed. Moreover, it is difficult to reconcile Proto-World with what we know about prehistory. Joseph Greenberg has suggested that people coming out of northeast Africa around 50,000 BC spoke Proto-World. But that would violate the claim that no relationships would be recognizable after 10,000 years; if that figure is accurate, then if all languages are observably related, such a relationship must have somehow formed more recently.
Dené-Caucasian is another proposed grouping, postulated to include Na-Dené (North America), Sino-Tibetan, Ket (Siberia), Burushaski (Pakistan), the Caucasian languages (Chechen and the languages of Dagestan), and Basque. This language family is extremely hypothetical.
The Nostratic hypothesis was proposed by the Danish linguist Holger Pedersen in 1903. The hypothesis claims that the Nostratic grouping includes such widely ranging language families as Indo-European, Afro-Asiatic, Uralic, Altaic, Sumerian, Elamo-Dravidian, and Kartvelian. Other proponents include different sets of languages. Some have speculated that the speakers of Nostratic were refugees from a Black Sea flood of around 5600 BC, and some think this flood is the origin of the biblical story of Noah's Flood. However, linguists have reached no firm conclusion about the validity of the Nostratic hypothesis. Its proponents, unlike Greenberg, use the traditional comparative method; however, their comparisons are often criticized as far-fetched or as involving too many semantic shifts. Some critics also accuse them of simply grouping together the language families most familiar to them and neglecting to compare each of these with language families further afield.
Last updated: 05-13-2005 07:56:04