The comparative method (in linguistics) is a technique for detecting genetic relationships between languages and for establishing a consistent relationship hypothesis by reconstructing:
- the common ancestor of the languages in question,
- a plausible sequence of regular changes by which the historically known languages can be derived from that common ancestor.
The comparative method is the "gold standard" by which mainstream linguists judge whether two languages are related; a relationship is deemed certain only if at least a partial reconstruction of the common ancestor is feasible. Other proposed approaches to the problem, such as Joseph H. Greenberg's "mass lexical comparison" method, are still considered too unreliable by most linguists.
In the present context, "related" has a specific meaning: two languages are said to be related if they are descended from the same ancestor language. Thus, for example, Spanish and French are both descended from Latin. "Descent", in turn, is defined in terms of transmission across the generations: children learn a language from the parents' generation, transmit it in slightly changed form to the next generation, and so on. A continuous chain of speakers across the centuries links Vulgar Latin to all of its modern descendants.
This definition of relatedness implies that even if two languages are quite similar in their vocabularies, they are not necessarily closely related. Modern Persian, for example, takes more of its words from Arabic than from its direct ancestor, Proto-Indo-Iranian, because of heavy borrowing from Arabic over the years. Under the definition just given, however, Persian is considered to be descended from Proto-Indo-Iranian, and not from Arabic.
The comparative method is a method for proving relatedness in the sense just given.
How the comparative method works
The essential steps are as follows:
- A relationship between two (or more) languages can be suspected if they show a number of regular correspondences in the lexicon, that is, a regularly recurring match between the phonetic structure of words with similar meanings (one usually begins with characteristic sets such as family terms, numerals and body parts). The notion of regular correspondence is crucial here: mere phonetic similarity, as between English day and Latin dies (same meaning), has no probative value. English initial [d-] does not regularly match Latin [d-], and whatever sporadic matches can be observed are due either to chance (as in the above example) or to borrowing (e.g. Latin diabolus, English devil, both ultimately of Greek origin).
- There is, however, a regular correspondence between Latin [d-] and English [t-]:
  - decem | ten
  - duo | two
  - duco | tow
  - Old Latin dingua | tongue
  Closer analysis reveals that the correspondence is both regular and pervasive, and that it is part of a more general regular pattern (Grimm's law).
- More trivial equations also hold between Latin and English:
  - mater | mother
  - ment- | mind
  - mus | mouse
  These equations demonstrate that Latin word-initial [m] corresponds to English [m]. However, it is the regularity of the matches, not the identity of sound, that counts here.
- A really systematic correspondence can hardly be accidental. If we can rule out alternative possibilities like massive borrowing, the correspondence can be attributed to common descent. If there are many regular correspondence sets of this kind (the more the better), and if they add up to a sensible pattern (one that could have been produced by known types of sound change), common origin becomes a virtual certainty.
- On the basis of regular correspondence sets, we formulate a relationship hypothesis, involving an attempt to reconstruct the hypothetical ancestor of the languages being compared. Without going into detail, Latin [d] and English [t] are both derived from primitive *d (the asterisk means that the sound is inferred rather than historically documented) in the reconstructible common ancestor of both languages (called Proto-Indo-European or PIE for short). We also attempt to recover the past sound changes responsible for the historically known reflexes of the reconstructed protoform. For example (the symbol > should be read as "became"):
  - PIE *dek^m > Proto-Germanic *texun > Old English teon (attested, yielding Modern English ten)
  - PIE *dek^m > Proto-Italic *dekem > Latin decem (c = /k/ in Classical Latin)
  - PIE *dek^m > Proto-Indo-Iranian *daCa > Sanskrit daśa
  - PIE *dek^m > Greek deka
  Each step must be justified, e.g. *k^ > *x (the sound of German ch) is part of a regular pattern seen also in Latin cord- | Germanic *xert- 'heart' (> English heart, German Herz) and many similar equations. The weakening and loss of this *x between vowels in the history of English (*-x- > *-h- > zero) is also regular. So are other changes visible in these word histories, e.g. the development of the syllabic nasal at the end of the word into Greek and Indo-Iranian [a], the change *e > *a (or rather the falling together of *e, *o and *a) in Indo-Iranian, or the so-called Satem development of *k^ in the same group (giving a Sanskrit palatal fricative via an Indo-Iranian palatal affricate).
- Regular sound changes form historical sequences and often "feed" one another (an older change creates an environment in which more recent changes apply).
- The reconstruction of proto-sounds and their historical transformations enables us to proceed further: we can compare grammatical morphemes (word-forming affixes and inflectional endings), patterns of declension and conjugation, and so on. The reconstruction of an unrecorded protolanguage can never be complete (for example, proto-syntax is far more elusive than phonology or morphology, and all elements of linguistic structure undergo inevitable erosion and gradual loss or replacement over time), but a consistent partial reconstruction can and must be attempted as proof of genetic relationship.
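The first steps above, collecting word pairs and checking whether a sound match recurs, can be sketched in Python. This is only a minimal illustration under strong assumptions: spelling stands in for phonetic transcription, only the first letter of each word is compared, and the names `PAIRS`, `initial_correspondences`, `recurring` and the threshold `min_count` are hypothetical, not part of any established tool. The word pairs are the ones quoted in the text.

```python
from collections import Counter

# (Latin, English) pairs quoted in the text; spelling is used as a rough
# stand-in for sounds, which is an assumption made for this sketch.
PAIRS = [
    ("decem", "ten"),
    ("duo", "two"),
    ("duco", "tow"),
    ("dingua", "tongue"),
    ("mater", "mother"),
    ("ment", "mind"),
    ("mus", "mouse"),
    ("dies", "day"),  # similar in form and meaning, but not a regular match
]


def initial_correspondences(pairs):
    """Count how often each (Latin initial, English initial) pairing occurs."""
    return Counter((latin[0], english[0]) for latin, english in pairs)


def recurring(counts, min_count=2):
    """Keep only the pairings seen at least min_count times (toy threshold)."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


counts = initial_correspondences(PAIRS)
print(recurring(counts))
```

On these eight pairs the d : t and m : m pairings recur, while d : d (dies | day) occurs only once, which mirrors the point that an isolated resemblance carries no weight. A real study would need far more data, proper transcriptions, and a way to rule out chance and borrowing.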
More sophisticated comparisons
During the time the comparative method was being developed (late 18th to late 19th century), two major developments occurred which improved the method's effectiveness.
First, it was found that many sound changes are conditioned by a particular context. Thus for example, in both Greek and Sanskrit, an aspirated stop evolved into an unaspirated one, but only if a second aspirate occurred later on in the same word; this is the so-called "Grassmann's law", known to the ancient Indian grammarians and promulgated as a historical discovery by Hermann Grassmann. A number of the sound changes mentioned above are also contextual.
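A conditioned change of this kind can be written as a rewrite rule whose context is part of the pattern. The sketch below is a simplified model, assuming an ASCII-style transliteration in which an aspirated stop is written as a stop letter followed by h (dh, th, bh and so on); the function name `grassmann` and the unstarred input strings, which stand for pre-dissimilation forms, are illustrative assumptions.

```python
import re

# An aspirated stop (stop letter + "h") that is followed, anywhere later in
# the word, by another aspirated stop. The lookahead encodes the condition.
ASPIRATE_BEFORE_ASPIRATE = re.compile(r"([ptkbdg])h(?=.*[ptkbdg]h)")


def grassmann(word):
    """Remove aspiration from a stop when a later aspirate follows it."""
    return ASPIRATE_BEFORE_ASPIRATE.sub(r"\1", word)


print(grassmann("dhadhāmi"))  # pre-Sanskrit form of dadhāmi 'I put'
print(grassmann("thithēmi"))  # pre-Greek form of tithēmi 'I put'
print(grassmann("dhūma"))     # a single aspirate: the rule does not apply
```

Because the condition is checked with a lookahead, a word containing only one aspirate passes through unchanged, which is what makes the change contextual rather than across the board.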
Second, it was found that sometimes sound changes occurred in contexts that were later lost. For instance, in Sanskrit velar (k-like) sounds were replaced by palatals (ch-like sounds) whenever the following vowel was i or e. Subsequent to this change, all instances of e were replaced by a. The situation would probably have been unreconstructable, had not the original distribution of e and a been recoverable from the evidence of other Indo-European languages. Thus, for instance, Latin que 'and' preserves the original e vowel that caused the consonant shift in Sanskrit:
- ke: pre-Sanskrit 'and'
- ce: velars replaced by palatals before i and e
- ca: e replaced by a
ca is the attested Sanskrit form for 'and'. This finding was made independently by several scholars during the 1870s.
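The importance of the order of these two changes can be checked with a small Python sketch. It is a toy model under stated assumptions: each change is reduced to a string rewrite on the simplified spellings used above, and the names `palatalize`, `merge_e_into_a` and `derive` are hypothetical.

```python
import re


def palatalize(word):
    """Velar k becomes palatal c before i or e."""
    return re.sub(r"k(?=[ie])", "c", word)


def merge_e_into_a(word):
    """Every e becomes a, which erases the context that triggered palatalization."""
    return word.replace("e", "a")


def derive(word, rules):
    """Apply the rules in order and return every intermediate stage."""
    stages = [word]
    for rule in rules:
        stages.append(rule(stages[-1]))
    return stages


# Historical order: palatalization first, then the vowel merger.
print(derive("ke", [palatalize, merge_e_into_a]))  # ['ke', 'ce', 'ca']

# Reversed order: the vowel merger removes the trigger before it can act.
print(derive("ke", [merge_e_into_a, palatalize]))  # ['ke', 'ka', 'ka']
```

Only the first ordering yields the attested ca. Starting from ca alone, nothing in the form shows why the consonant is palatal, which is why evidence such as Latin que was needed to recover the lost e.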
Verner's Law, discovered by Karl Verner in about 1875, is a similar case: the voicing of consonants in Germanic languages underwent a change that was determined by the position of the old Indo-European accent. Following the change, the accent shifted across the board to initial position. Verner solved the puzzle by comparing the Germanic voicing pattern with data from Greek and Sanskrit accent. For full discussion, see Verner's Law.