Historical linguistics is a subfield of linguistics that studies language in its historical aspects. It investigates a language or languages at various points in time. The term “diachronic linguistics” is often used in place of historical linguistics and sets it apart from “synchronic linguistics,” which studies language at a single point in time. The investigation of historical linguistics involves language change, reconstruction, and classification.
Linguistic Change
No language stands still so long as it is spoken, and every language is the product of change. Linguistic change is cumulative and, for the most part, gradual enough to escape our attention as it occurs. Typically, the cumulative effect of linguistic changes makes itself felt after a span of centuries. The longer the span, the greater the changes that are accumulated.
External Change
One way in which languages change is through the influence of other languages or external change. Such influence is most obvious in the borrowing of words that may arise from contact brought about by navigation, trade, cultural exchange, political dominance, conquest, and so on. The study of linguistic borrowing is often called “areal linguistics.”
The two principal strategies of lexical borrowing are loanwords and loan translations. A loanword is a word of foreign origin that is adapted at least partly in sound or grammar to the native ways of speaking. The English word macaroni, for instance, was borrowed from Italian (maccheroni). In loan translation, the parts of a foreign expression are translated, producing a new idiom in the native language, as in the French gratte-ciel and the Spanish rascacielos. Both are derived from the English word skyscraper and formed after the metaphor of “scraping the sky” to convey the idea of a very tall building. Such forms are a kind of calque in which the internal structure of a foreign expression is maintained but the morphemes are nativized.
Loanwords often have a life that cuts across the boundaries between languages. A case in point is the English word chess, which was borrowed from Old French during the 13th century. But the word originated in Persian and was adopted as a loanword by Arabic and then by Latin from Arabic. The Medieval Latin scaccus, in turn, gave the Old French esches (singular eschec). Thus, the etymology of chess reaches from Persian through Arabic, Latin, and Old French to English.
The distribution of loanwords is a subject of serious study because it bears on the nature of contact between two languages during a particular historical stage. Consider what happened after the Norman Conquest of 1066, which made French the language of the official class in England. It is small wonder that many English words having to do with government, administration, nobility, law, military and religious affairs, and so on are of French origin. But when the English cow, sheep, pig, and calf were served at the table, they became the French beef, mutton, pork, and veal, respectively. French names were also given to the culinary processes whereby the meats were prepared for Normans’ consumption.
Phonological and syntactic borrowing occurs less freely than does lexical borrowing. This is because a phonological or syntactic system consists of an integrated sum of rules, and the modification of one rule may have serious consequences elsewhere in the system. But there are known examples of phonological and syntactic borrowing. Old English (450-1100), for instance, did not have phonemes such as /v/ (vase), /z/ (zeal), and /ð/ (they). It was with the introduction of loanwords from French as well as Latin that these three sounds eventually achieved phonemic status during the Middle English period (1100-1450). A well-known case of syntactic borrowing is the restricted use of the infinitive by the languages of the Balkan Peninsula (Albanian, Bulgarian, Greek, and Romanian). This shared trait is attributable to mutual borrowing rather than to their genealogical relationship, which is quite indirect.
When two languages come into contact, linguistic acculturation ensues, resulting in the adoption of linguistic traits from each other. If the speakers are equally powerful or prestigious, we have two adstratum languages. In an adstratal relationship, linguistic borrowing goes in both directions, with each language serving as donor and recipient. If the speakers are unequal in terms of power or prestige, the language of the dominant group is called the superstratum language and the language of the less dominant group is called the substratum language. In this case, borrowing is primarily unidirectional, with the superstratum language serving as the donor and accepting only a few loanwords from the substratum language.
Linguistic borrowing is subject to sociolinguistic and structural factors. Traditional wisdom is that the degree of typological and structural similarity is more significant in determining whether an indigenous language accepts loanwords. Recent studies, however, have shown that the opposite is more likely to be true. In the case of Native American languages, for instance, past bilingualism appears to have been a more prominent factor in influencing the borrowing of lexical items and grammatical features.
Internal Change
Internal change can occur on all levels of linguistic structure. The difficulties that Middle English (1100-1450) poses for the modern reader, for instance, are due partly to the strangeness of outdated spelling conventions. This was about to change when Geoffrey Chaucer (ca. 1340-1400) condemned what he considered to be haphazard variations in English vowel pronunciation. In actuality, these seemingly random variations were part of an internal sound change (the Great Vowel Shift) that resulted in the transition of Middle English to Early Modern English (1450-1700). Consequently, the spelling conventions of Shakespeare’s time were brought more in line with what we have today, but in an actual performance of a Shakespearean play the pronunciation of the actors is still not easy to follow.
Interestingly, internal sound change may be related to culture change. The distribution of f sounds in time and space, for instance, led Charles F. Hockett to propose in 1985 that as a relatively recent innovation attested by diachronic and synchronic evidence, the f sounds arose in association with agriculture. Not only is there a correlation between the development of f sounds in language and the practice of crop raising, but the use of cereal grains in the daily diet may have been responsible for inducing the changes in the dental configuration and jaw motion that made the articulation of labiodentals possible and easy.
Another form of internal change involves the loss and addition of lexical items. There used to be, for instance, an English verb cennan, meaning “give birth to,” but it has completely disappeared from the language. The addition of new terms can be through coinage (for example, Kodak, xerox), derivation (use of prefixation, use of suffixation), compounding (takeoff, headstrong), abbreviation (radar, ASAP), clipping (exam, dorm), blending (brunch, smog), functional shift (to bag, to milk), semantic change, and so on.
On the syntactic level, internal change may occur in word order or the use of morphemes that indicate syntactic relations among words. During the early stages of English, for instance, a possessive pronoun tended to follow rather than precede the noun it modified. Therefore, the Lord’s Prayer used to begin with Fæder ure (“Father our”) in the opposite order from what is said today. Even more dramatic is the amount of inflection shed by Modern English. Old English was a highly inflected language, employing an abundance of grammatical endings to distinguish case, number, gender, person, tense, and so on. In Modern English, however, there is relatively little left in the way of inflection. This revolutionary change was initiated by the lowly English peasants and shepherds, who adhered to their native tongue during the Norman Conquest and, in the process, introduced an overhaul of the structure of English in their speech.
Reconstruction
The emergence of modern linguistics is often traced back to 1786, when Sir William Jones (1746-1794) published an article describing the structural similarities that crystallized from a comparison of Sanskrit, Greek, and Latin and called for the reconstruction of their ancestor language. At its onset, modern linguistics exhibited a predominantly comparative historical orientation.
The Growth of Comparative Historical Linguistics
Following the lead of Jones, the 19th-century historical linguists made a concerted effort to study the relationship among languages, especially those that constituted the Indo-European family of languages. Among these linguists were Rasmus Rask (1787-1832), Jacob Grimm (1785-1863), Karl Verner (1846-1896), and August Schleicher (1821-1868).
In 1818, Rask, a Danish linguist, wrote the first comparative grammar of the Scandinavian languages. Through rigorous comparisons, he mapped out the sound correspondences among these languages, brought order into their historical relationship, and defined their place in the Germanic branch of the Indo-European language family. Grimm, a German linguist, had focused his attention on the Germanic languages in general and Gothic in particular. His work resulted in two important achievements in 1822. One was the regular correspondences of consonants that he found between the Germanic languages, on the one hand, and Classical Greek, Latin, and Sanskrit, on the other. The other achievement was the revelation of the systematic nature of sound change. His conclusions went down as Grimm’s Law, which in simplified form states that the Indo-European voiceless stops (p, t, k) became Germanic voiceless fricatives (f, θ, h), the voiced stops (b, d, g) became voiceless stops (p, t, k), and the voiced aspirates (bh, dh, gh) became voiced stops (b, d, g).
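Because the correspondences in Grimm’s Law are regular, they can be written down as a lookup table. The following is a minimal Python sketch, assuming the standard simplified statement of the law (voiceless stops become fricatives, voiced stops become voiceless stops, and voiced aspirates become plain voiced stops). The function name `apply_grimm` and the plain-letter input strings are illustrative assumptions, not actual reconstructions, and the sketch ignores the exceptions later explained by Verner.

```python
# Simplified Grimm's Law: Indo-European stops -> Germanic reflexes.
# The mapping is an illustrative simplification, not a full account.
GRIMM = {
    "bh": "b", "dh": "d", "gh": "g",  # voiced aspirates -> voiced stops
    "p": "f", "t": "θ", "k": "h",     # voiceless stops -> voiceless fricatives
    "b": "p", "d": "t", "g": "k",     # voiced stops -> voiceless stops
}


def apply_grimm(form: str) -> str:
    """Apply the shift once, left to right, to a plain-letter string."""
    out = []
    i = 0
    while i < len(form):
        pair = form[i:i + 2]
        # Digraphs (bh, dh, gh) must be matched before single letters.
        if len(pair) == 2 and pair in GRIMM:
            out.append(GRIMM[pair])
            i += 2
        elif form[i] in GRIMM:
            out.append(GRIMM[form[i]])
            i += 1
        else:
            out.append(form[i])
            i += 1
    return "".join(out)


print(apply_grimm("pater"))    # faθer
print(apply_grimm("bhrater"))  # braθer
```

Each input segment is rewritten exactly once, so the output of one shift (for example, b from bh) is not fed into another (b to p). This mirrors the idea that the three shifts describe correspondences between two stages rather than a chain applied repeatedly.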
However, there are exceptions to Grimm’s Law. Verner found that the regularity of the correspondences predicted by Grimm was undercut by the place of the Indo-European stress. To account for the exceptions resulting from it, he proposed an additional phonological rule in 1875 that is to be applied to the outcome of Grimm’s Law. The rule is named Verner’s Law in honor of this Danish linguist.
During the mid-19th century, the German linguist Schleicher applied himself to the task of reconstructing the protolanguage of the Indo-European family. A protolanguage is a hypothetical parent language reconstructed from a group of languages that appear to be related. Schleicher set the reconstruction process on a solid footing by synthesizing the work of his precursors and formulating the “family tree theory” in 1871. According to this theory, language changes in regular ways (the regularity hypothesis) and similarities among languages are the result of a genetic relationship among them (the relatedness hypothesis). To reconstruct the protolanguage, the comparative method is to be used.
The Comparative Method
The technique of comparing related languages to arrive at a reconstruction of the protolanguage is known as the comparative method. It is a technique for establishing the relatedness of several languages and for reconstructing the linguistic forms (such as sounds, morphemes, words) of their protolanguage. In both cases, the comparative method works primarily with cognates.
Cognates are words that belong in different languages but are descended from the same ancestral root. A cognate set is a group of words similar in form and meaning. The English word three, for instance, is a cognate of Sanskrit tri, Persian thri, Greek treis, Latin trēs, German drei, Dutch drie, Icelandic thrir, Welsh tri, and so on. The application of the comparative method to this cognate set yields *treyes as the proto-Indo-European form, where the asterisk (*) denotes its hypothetical nature. The sound correspondences revealed in this case also operate with regularity in other sets of cognates from these languages.
There is no intrinsic tie between form and meaning. This led Shakespeare to remark, in Romeo and Juliet, “That which we call a rose, by any other name would smell as sweet.” If names are arbitrary, then the probabilities that different languages randomly choose to call the same thing by similar names are statistically insignificant. When we do find similarities (cognates), they are taken to point to relatedness. However, this is on the condition that their resemblance is not due to chance or borrowing. For example, Modern Persian has a word that virtually sounds and means the same as the English word bad, but the Modern Persian sound correspondences (b-b, æ-æ, d-d) do not recur in a large set of linguistic items. It is relatively easy to eliminate chance similarities of this sort. Eliminating similarities due to borrowing, however, is more difficult. A common strategy is to focus on basic vocabulary, an area where borrowing has the least effect.
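The test of recurrence described above can be sketched as a simple count. The Python example below is only an illustration, assuming that each word pair has already been segmented and aligned position by position; the rough English and German transcriptions and the names `count_correspondences` and `recurring` are assumptions made for the example.

```python
from collections import Counter


def count_correspondences(aligned_pairs):
    """Count how often each (sound_a, sound_b) pairing occurs."""
    counts = Counter()
    for word_a, word_b in aligned_pairs:
        if len(word_a) != len(word_b):
            raise ValueError("words must be pre-aligned to equal length")
        counts.update(zip(word_a, word_b))
    return counts


def recurring(counts, min_count=2):
    """Keep only the pairings attested at least min_count times."""
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Rough, simplified transcriptions (illustrative only).
pairs = [
    (["t", "ɛ", "n"], ["ts", "eː", "n"]),  # English ten, German zehn
    (["t", "ɪ", "n"], ["ts", "ɪ", "n"]),   # English tin, German Zinn
    (["t", "uː"], ["ts", "uː"]),           # English to,  German zu
]

counts = count_correspondences(pairs)
print(recurring(counts))
```

A pairing such as t-ts that recurs across many items supports relatedness, whereas the pairings in a chance look-alike, like English bad and its Persian counterpart, would appear only once in such a count.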
Given a cognate set, the historical linguist is able to arrive at its protoform by means of comparative reconstruction. The process involves examining each sound correspondence and applying two principal strategies. One is to reconstruct the sound occurring in the greatest number of languages being compared (majority rules), and the other is to reconstruct the sound that would have undergone the most common sound change (the rule of realism). In case there are two alternative analyses, preference is given to the one that is more natural or more explanatory in the sense that it sits better with the overall linguistic structure. No reconstruction, however, should violate the maxim of “Occam’s Razor,” which favors the simplest possible analysis. Following the reconstruction of the protoform, the linguist goes on to specify the sound changes that applied to the daughter languages and to check whether these proposed changes operated regularly across the whole collection of cognate sets.
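The “majority rules” strategy lends itself to a short illustration. The following Python sketch assumes the cognates are already aligned segment by segment and picks, for each position, the sound attested in the most languages. The data for languages A, B, and C are hypothetical, and `majority_reconstruct` is an illustrative name. The sketch deliberately leaves out the rule of realism and the later regularity check, both of which a linguist would apply before accepting the result.

```python
from collections import Counter


def majority_reconstruct(aligned_cognates):
    """Return a protoform built from the most frequent sound per position.

    aligned_cognates: list of equal-length lists of segments,
    one list per daughter language.
    """
    lengths = {len(word) for word in aligned_cognates}
    if len(lengths) != 1:
        raise ValueError("cognates must be pre-aligned to equal length")
    proto = []
    for column in zip(*aligned_cognates):
        winner, _ = Counter(column).most_common(1)[0]
        proto.append(winner)
    # The asterisk marks the form as hypothetical.
    return "*" + "".join(proto)


# Hypothetical cognate set from three daughter languages.
cognates = [
    ["p", "a", "t"],  # language A
    ["p", "a", "d"],  # language B
    ["f", "a", "t"],  # language C
]

print(majority_reconstruct(cognates))  # *pat
```

Here two of three languages show p in the first position and t in the last, so the sketch proposes *pat and implies the changes p > f in language C and t > d in language B.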
Linguistic paleontology or archaeology uses reconstructed lexical items to recover information pertaining to prehistoric culture, society, and geographical facts. In the case of proto-Indo-European, linguistic evidence suggests that its speakers were small farmers who worked their fields with plows, had a clear sense of family relationship, kept domesticated animals and fowls, used wheeled vehicles, could count, lived in small villages, and believed in multiple gods. The original home of the Indo-Europeans must have been in an area of Central and Western Europe, defined by the coexistence of beech and birch trees. Similarly, the proto-Athabascan kinship system seems to contain evidence that its culture featured bilateral cross-cousin marriage, de facto sister exchange, and sororal polygyny, among others.
Internal Reconstruction
The internal reconstruction technique is designed to reduce synchronic, language-internal variation to an earlier stage of invariance. Except for majority rules, the strategies discussed in connection with comparative reconstruction are also applicable to internal reconstruction. In addition, the linguist is watchful for the effects of “conditioned” sound change in that they might cause morphological irregularities in inflectional and derivational paradigms. Consider the following data from Latin, where only item 1 contains irregularities:
1. urps (nominative), urbis (genitive), “city”; reks (nominative), regis (genitive), “king”
2. woks (nominative), wokis (genitive), “voice”; stirps (nominative), stirpis (genitive), “stock”
Internal reconstruction will yield the protoforms *urb-s/*urb-is and *reg-s/*reg-is for item 1. A conditioned sound change caused the devoicing of *b and *g before voiceless s in the protoforms *urb-s > urp-s and *reg-s > rek-s, hence the synchronic irregularities. This process is known as “voicing assimilation,” but it had no effect on voiceless *k and *p in the protoforms of item 2: *wok-s > wok-s and *stirp-s > stirp-s. By recovering the protoforms and the conditioned sound change, internal reconstruction completes its reduction of synchronic variation to historical invariance. However, the application of internal reconstruction has limitations. After all, an earlier structural feature can be reconstructed only if evidence for positing it happens to have been retained.
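The conditioned change can be checked by applying it mechanically to the reconstructed protoforms. The Python sketch below is a minimal illustration, assuming a single rule that devoices a voiced stop immediately before a voiceless consonant; the function name `voicing_assimilation` and the small sound inventories are assumptions made for the example.

```python
# Voiced stops and their voiceless counterparts.
DEVOICE = {"b": "p", "d": "t", "g": "k"}
# Voiceless consonants that trigger the change (illustrative inventory).
VOICELESS = set("ptks")


def voicing_assimilation(protoform: str) -> str:
    """Devoice a voiced stop when a voiceless consonant follows it."""
    # Drop the asterisk and the morpheme-boundary hyphen.
    segments = list(protoform.replace("*", "").replace("-", ""))
    for i in range(len(segments) - 1):
        if segments[i] in DEVOICE and segments[i + 1] in VOICELESS:
            segments[i] = DEVOICE[segments[i]]
    return "".join(segments)


for proto in ["*urb-s", "*urb-is", "*reg-s", "*reg-is", "*wok-s", "*stirp-s"]:
    print(proto, ">", voicing_assimilation(proto))
```

The rule changes only *urb-s and *reg-s, because only there does a voiced stop stand directly before s. The forms with the ending -is and the stems that already end in a voiceless stop pass through unchanged, which reproduces the attested pattern.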
Classification
The historical linguist seeks to classify the world’s languages in terms of genetic relationship. Such relationships are couched in the idiom of kinship so that a protolanguage is a parent language, with the divergent languages as its daughters and as sister languages among themselves. A parent language and its daughters constitute a genetic group or language family.
The most popular mechanism for diagramming genetic relationships is the family tree, a device created by August Schleicher. As can be seen in the figure, in its basic form, a family tree consists of a parent language (A) as a starting point, with branches showing the daughter languages (B-F).
This is reminiscent of a speech community splitting into dialectal segments that do not communicate with one another until they are mutually unintelligible and become separate languages. Once a dialect achieves the status of a language, it is itself subject to the process of dialectal split and the ultimate generation of new languages. Each branching node in a family tree represents a formerly existing language that is ancestral to the languages branched from it.
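A family tree of this kind maps naturally onto a recursive data structure. The following Python sketch is an illustration only: the branching chosen for languages A through F is hypothetical, and the names `Language`, `path_to`, and `common_ancestor` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Language:
    name: str
    daughters: List["Language"] = field(default_factory=list)


def path_to(root: Language, name: str) -> List[str]:
    """Return the line of descent from root to the named language."""
    if root.name == name:
        return [root.name]
    for daughter in root.daughters:
        sub = path_to(daughter, name)
        if sub:
            return [root.name] + sub
    return []


def common_ancestor(root: Language, a: str, b: str) -> Optional[str]:
    """Return the most recent node shared by the descent lines of a and b."""
    shared = None
    for x, y in zip(path_to(root, a), path_to(root, b)):
        if x != y:
            break
        shared = x
    return shared


# Hypothetical branching: A splits into B and C; B into D and E; C into F.
tree = Language("A", [
    Language("B", [Language("D"), Language("E")]),
    Language("C", [Language("F")]),
])

print(path_to(tree, "E"))               # ['A', 'B', 'E']
print(common_ancestor(tree, "D", "E"))  # B
print(common_ancestor(tree, "D", "F"))  # A
```

In this representation each node with daughters plays the role of a formerly existing ancestral language, and two languages are sisters when their most recent common ancestor is their immediate parent.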
Eurasia
An overwhelming majority of the European languages belong to the Indo-European family, which is divided into 10 branches. English is a member of the Germanic branch, as are Frisian, German, Dutch, Yiddish, Gothic, and the Scandinavian languages. The other branches are Albanian, Anatolian (Hittite, Palaic, Lydian, etc.), Armenian, Balto-Slavic (Latvian, Lithuanian, Russian, Serbo-Croatian, Polish), Celtic (Welsh, Irish, Breton, Scots Gaelic), Greek, Indo-Iranian (Sanskrit, Persian, Hindi, Bengali, Romany), Italic (French, Spanish, Italian, Portuguese), and Tocharian.
Eurasia is home to several other language families as well. Among them are Caucasian (Georgian, Avar, Kabardian), Uralic (Hungarian, Finnish, Lapp), Altaic (Turkish, Uzbek, Manchu, Tungus), Sino-Tibetan (Chinese, Burmese, Tibetan, Miao), Dravidian (Tamil, Telugu, Kannada, Malayalam), Tai-Kadai (Thai, Laotian, Kam-Sui), Austro-Asiatic (Santali, Mon, Khmer, Nicobarese), and Palaeo-Siberian (Chukchi, Koryak). There is a lack of consensus as to whether Korean and Japanese are members of the Altaic family or actually form a language family of their own.
Oceania
The languages of Oceania are grouped into four phyla (superfamilies). The Austronesian phylum, also known as Malayo-Polynesian, consists of some 900 languages that are spoken on the islands of the Pacific and Indian oceans from Madagascar off the coast of Africa to Malaysia of Asia. Some of the Austronesian languages are Malagasy, Hawaiian, Fijian, Samoan, Maori, Tagalog, Indonesian, and Malay. The indigenous languages spoken on the continent of Australia are grouped as the Australian phylum. Among the indigenous languages that have the most speakers are Tiwi, Walmatjari, Warlpiri, and Aranda. The much smaller island of New Guinea shows a tremendous variety of languages. Collectively, they are known as the Papuan phylum, which may contain a large number of language families. Oceania used to have one more language family, Tasmanian, which is now extinct.
Africa
The languages of Africa can be classified into one of four phyla. The Afro-Asiatic phylum, also known as Hamito-Semitic, contains languages spoken in north and northeast Africa and the neighboring parts of western Asia. This phylum consists of five families. Semitic is the most widespread and includes Hebrew, Aramaic, Arabic, Amharic, and extinct languages such as Akkadian and Phoenician. The other four families all are restricted to Africa: Berber (Tuareg, Riff, Kabyle), Cushitic (Somali, Gala, Walamo), Chadic (Hausa, Gabin, Mandara), and Egyptian (Coptic).
Congo-Kordofanian is the largest language phylum in Africa, extending from the equator to the extreme south. Kordofanian, however, is a small group of languages spoken in Sudan. The division of Niger-Congo is vast and consists of six families. Benue-Congo is the most numerous, with some 700 languages, including the Bantu languages (Swahili, Zulu, Kongo, Rwanda) that are spoken throughout central and southern Africa. The other families are West Atlantic (Fulani, Wolof), Mande (Malinke, Bambara, Mende), Gur or Voltaic (Mossi, Tallensi, Nupe), Kwa (Yoruba, Igbo, Akan, Ewe), and Adamawa-Eastern (Sango, Gbaya).
The Nilo-Saharan phylum is a highly diversified group of languages that are spoken in three areas. One is around the upper parts of the Nile River, with languages that belong to the Sudanic branch. Another area lies to the east of Lake Chad and is home to the Saharan branch. The third area is spread along the Niger River in Mali and is home to the Songhai branch. The best-known members of this family are Masai, Nubian, Kanuri, and Songhai.
Khoisan is the smallest language family, containing a group of languages spoken in areas scattered around the Kalahari Desert in southern Africa. Until fairly recently, speakers of Khoisan were foragers, known to the outside world by labels such as Khoikhoi, Hottentot, Bushman, and San. Their languages feature the use of click consonants. Through borrowing, this distinctive feature has found its way into certain other African languages such as Zulu and Xhosa.
North America
In North America, the Eskimo-Aleut family is found in the extreme north. Eskimo languages are spoken on the Arctic shores of Alaska, Canada, and Greenland as well as along the tip of Siberia. These languages are linked to the native language of the Aleutian Islands, which is virtually extinct. A relationship between Eskimo-Aleut and the Palaeo-Siberian family of Asia has been suggested.
Further south is the Na-Dene phylum, a grouping of languages spoken in the Pacific Northwest (western Canada and northern California) and the southwestern United States (Arizona and New Mexico). With a few exceptions, such as Haida and Tlingit, these languages belong to the Athabascan family. Some of the main Athabascan languages are Navajo, Beaver, Chipewyan, Dogrib, Hare, and Hupa.
The Algonquian family was found in central Canada and the Great Lakes region in association with a number of tribal languages (Ojibwa, Cree, Arapaho, Cheyenne, Micmac, Shawnee). Other major North American families include Wakashan in the Pacific Northwest (Nootka, Kwakiutl), Salish in western Canada and the western United States (Flathead, Squamish), Iroquois in eastern Canada and the eastern United States (Huron, Mohawk, Seneca, Cherokee, Tuscarora), Siouan in the Great Plains (Dakota, Omaha, Osage, Crow, Hidatsa), Caddoan in the southern Plains (Pawnee, Wichita, Caddo), and Muskogean in the southeastern United States (Creek, Koasati, Choctaw, Muskogee).
Moreover, there are three language phyla that have an extensive north-south distribution. One is Hokan, which links the Yuman family in southern California with languages such as Tlapanecan and Jicaque in Central America. The Penutian phylum is a grouping of languages spoken in North and Central America and probably in South America as well. Mayan is its most important family, with Yucatec, Mam, Quiche, and Kekchi continuing to be widely spoken today. The third phylum is Aztec-Tanoan, which had an impressive presence in the western United States and Mexico. Some of its languages are Ute, Paiute, Shoshone, Comanche, and Hopi as well as Nahuatl, Pima-Papago, and Tarahumara, which are still spoken in Mexico. Also included in this phylum are a number of Pueblo languages.
Oto-Manguean is the only phylum restricted to Central America. It consists of several diverse families such as Zapotecan and Mixtecan. Both the vicissitudes of pre-Columbian civilizations and a mountainous topography may have contributed to the linguistic diversity of Middle America. Some of the languages in this area are grouped with South American families.
South America
The South American continent is linguistically diversified, with a total number of languages estimated to be between 1,000 and 2,000. Only some 600 South American languages are documented, and some 120 of them are extinct. These known languages are grouped into three major phyla.
The Macro-Chibchan phylum covers the northwest of the continent in addition to parts of Central America and certain areas in northern and western South America. Languages of this phylum include Chibcha in Colombia, Esmeralda in Ecuador, Rama in Nicaragua, Cuna in Panama, and Waica along the Venezuelan-Brazilian border. The second phylum is Ge-Pano-Carib, found in northern and central South America. Ge is a family of languages spoken in central Brazil; the Panoan family is concentrated in Peru, Bolivia, and Paraguay; and the Carib family is widespread in northeastern Brazil, Venezuela, and the Guianas. The third phylum, Andean-Equatorial, extends from central South America to southern Argentina. Its most important language families are Arawakan, once spoken from the Antilles down to Paraguay; Tupi-Guarani, whose speakers occupy a large portion of Brazil and most of Paraguay; and Quechumaran, which contains Quechua and several languages spoken in Argentina.
New Classification
The Amerindian languages are conventionally grouped into some 200 families. Various links of relationship between and within these families have been proposed. In a new worldwide grouping of languages proposed during the mid-1980s, Joseph Greenberg (1915-2001) held that the native languages of the Americas belong to just three phyla: Eskimo-Aleut, Na-Dene, and Amerind. Indeed, there is anthropological evidence that links the first two groupings to two late migrations from the Old World into the New World. The arrival of Eskimo-Aleut speakers was more recent and was predated by the arrival of Na-Dene speakers.
The idea that all other New World languages are derived from proto-Amerind has remained controversial. On the one hand, physical anthropology reveals substantial differences among the modern speakers of Eskimo-Aleut, Na-Dene, and all other American Indian languages. On the other hand, many scholars wonder how some 2,000 languages could have diverged from a single tongue within the span of 12,000 to 20,000 years since the earliest migration into the New World. The procedure generating the new classification of Amerindian languages is called “mass inspection.” It lists words by language for comparison, and when look-alikes cluster, an earlier genetic relationship is hypothesized. This process is based on the assumption that at remote time depths, regular sound correspondences will be lost and only residual cognates will remain.
Language Isolates
If a language has no known relationship to any other language and does not lend itself to assignment to a family, it is called an isolate. Basque is one isolate that is spoken in the region of the Pyrenees (southern France and northeastern Spain). Other well-known language isolates include Andamanese (islands in the Bay of Bengal), Burushaski (Kashmir and India), Ainu (northern Japan), Fur (Sudan and Chad), Keresan (New Mexico), and Kutenai (western North America). Some extinct languages are also considered isolates, including Sumerian (Mesopotamia), Iberian (Spain), Etruscan (Italy), and Mohenjo-Daro (Pakistan).
References:
- Arlotto, A. (1972). Introduction to historical linguistics. Boston: Houghton Mifflin.
- Brown, C. H. (1999). Lexical acculturation in Native American languages. Oxford, UK: Oxford University Press.
- Crystal, D. (1987). The Cambridge encyclopedia of language. New York: Cambridge University Press.
- Dyen, I., & Aberle, D. F. (1974). Lexical reconstruction: The case of the proto-Athapaskan kinship system. Cambridge, UK: Cambridge University Press.
- Greenberg, J. H. (1987). Language in the Americas. Stanford, CA: Stanford University Press.
- Hickerson, N. P. (2000). Linguistic anthropology (2nd ed.). Fort Worth, TX: Harcourt College Publishers.
- Hock, H. H. (1991). Principles of historical linguistics (2nd ed.). New York: Mouton de Gruyter.
- Hockett, C. F. (1985). Distinguished lecture: F. American Anthropologist, 87(2), 263-281.
- Langacker, R. W. (1973). Language and its structure (2nd ed.). New York: Harcourt Brace Jovanovich.
- Shaul, D. L., & Furbee, N. L. (1998). Language and culture. Prospect Heights, IL: Waveland.