Story by Brandon T. Bishop/Scicats
An international group of researchers used a new statistical approach to try to solve a long-disputed mystery—the birthplace of the Indo-European language family.
For many years, most linguists and archaeologists favored a homeland in the Eurasian steppe north of the Black Sea. A minority of scholars, however, argued for an origin in central Anatolia, in what is now Turkey.
English, Spanish, Russian, Hindi and many other languages in Europe and South Asia all share a common ancestor once spoken near the Black Sea. These languages, which belong to the Indo-European family, are now spoken on every continent. In fact, almost one of every two people on Earth speaks one of these languages.
The research group, headed by Dr. Quentin Atkinson of the University of Auckland, used a statistical method originally designed to track the development and spread of disease. By analyzing linguistic data from 103 living and extinct Indo-European languages, the researchers located the language family’s origin. They suggested that the Indo-European languages might have expanded from central Anatolia as farming began to spread north and west between 8,000 and 9,500 years ago. They announced their findings recently in the journal Science.
Many supporters of a steppe origin doubt these findings. Dr. Douglas Adams, an Indo-European linguist at the University of Idaho, expressed a number of concerns about Atkinson’s study. Adams was surprised by relationships the researchers found between the languages. “Their study pairs Tocharian with Armenian, which comes as a surprise to both Tocharianists and Armenianists,” Adams said.
Unique grammar traits that link Armenian, Greek and Indo-Iranian languages don’t appear in Tocharian, a language once spoken by the inhabitants of the Tarim Basin in what is now western China. A combination of Tocharian’s few preserved records and Armenian’s extensive word replacement may be more than Atkinson’s statistical approach can handle.
Grammar might also play a role in the models’ odd reconstruction of other language relationships, Adams said. Greek and Indo-Iranian have similar ways of modifying verbs, which suggests they diverged relatively recently, but Atkinson’s vocabulary-based models indicate an early separation for these two groups.
Adams saw another problem in the ability of a single word to have multiple meanings, known as polysemy. “First, all languages indulge in a massive way in polysemy, but each language indulges differently,” Adams said.
An example is the word “cow,” which can mean either “any bovine” or “an adult female bovine.” In other Indo-European languages, words with a similar pronunciation may have both meanings or only one, making it difficult to determine what counts as a match between languages, Adams said.
Dr. Hans Holm, an independent Indo-European researcher specializing in statistical methods, also saw a number of problems in Atkinson’s models. The approach used in the new models, called Bayesian statistics, relies on prior information to compute probabilities. This makes the approach highly dependent on the accuracy of the information put into equations when preparing the models.
For Atkinson’s models, an important piece of prior information is the branching, tree-like diagram that describes how Indo-European languages are related to each other and when different languages developed from older languages. According to Holm, Atkinson’s group has not supplied enough details to decide how its language tree might be affecting the models. “The explanations of the team up to now aren’t sufficient for such a task,” Holm said. “Even in the supplementary material, only general information is given, hardly exceeding the basic information of a Wikipedia article.”
Language trees are based mainly on the location of their roots and how fast their branches move apart, Holm said. This means that Atkinson’s models can be seen as three-dimensional trees. These trees have roots and trunks that stretch far back into prehistory, and branches that spread farther across Europe and Asia as they reach the present day.
Because the models have their earliest branches in Anatolia or the Balkans, it’s inevitable that they’ll find an origin in Anatolia, Holm said. The models assume that languages spread the way virus epidemics do, which isn’t how people have migrated historically, Holm said.
The ages of the branches are another problem with Atkinson’s models, according to Holm. The models assume a steady rate for the replacement of old words in a language with new words. For languages with uncertain or incomplete vocabularies, assuming a steady rate makes the languages appear older than they really are, Holm said. This can change the entire arrangement of the language tree and lead to a wrong place and time of origin.
Atkinson has mixed feelings about the scholarly response to the new paper. “Much of the criticism has been disappointing in its lack of understanding of the methods,” Atkinson said, “but there are some excellent suggestions about how to incorporate more data and more complex geographic expansion models, which we will be using to improve the approach.”
Many critics have said that Atkinson’s models locate Indo-European languages’ origin in Anatolia simply because that’s the geographic center of their distribution. This isn’t correct, Atkinson said, because the actual geographic center is in the Eurasian steppe.
Other critics say Atkinson’s models are incorrect because they don’t include archaeological finds linked to Indo-European speakers. “This is an odd criticism,” Atkinson said. “Linguists and archaeologists are usually at pains to emphasize the difficulty of matching patterns in the archaeological record with certain modern or historically attested cultural groups. Indeed, the entire question about the origin of Indo-European is about which archaeological patterns correspond to the expansion of the languages.... Our paper focused on the linguistic evidence: Where are the languages telling us they came from? How does this match with the expansion’s evidence in the archaeological record?”
Where will Atkinson’s group turn their attention next? Atkinson said the researchers are hoping to study the development of language groups in Australasia and the Americas.
(Updated Dec. 18, 10:21 a.m.) Correction: The story was changed to reflect the proper leadership structure of the research team. Dr. Quentin Atkinson is the leader of the team, but the story originally and incorrectly credited Dr. Russell Gray as the leader.
Brandon Bishop is a graduate student in the University of Arizona geosciences department. Bishop is currently a student in the science journalism class taught by Prof. Carol Schwalbe, a former senior text editor at National Geographic magazine.