Wednesday, April 27, 2016

Counting time!

Let's start with some shameless self-promotion: this week, Andreea Calude and I published a paper on Indo-European numerals. We investigate how these languages form numerals beyond 1-10: 11-99, the hundreds, and the thousands. There are many famous languages around with crazy numeral systems, but we were interested in both the regular and the crazy. For that purpose, we use data from Eugene Chan's amazing database 'Numeral systems of the world's languages' to get an overview of how regular and how crazy the Indo-European languages are.

Turns out there are indeed some well-known crazies to be found in Indo-European: Welsh, which forms 18 as 2 * 9; Breton, which forms 50 as 1/2 * 100; and Danish, which uses base 10 for 20, 30, and 40, but base 20 for 50, 60, 70, 80, and 90.
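Each of these constructions boils down to a multiplier times a base. The minimal sketch below only checks that arithmetic; it does not model the actual Welsh, Breton, or Danish words. The table name COMPOSITIONS and the helper composition_holds are illustrative, and the Danish entry for 50 (formed as 2 1/2 * 20) is added here to show why a fractional multiplier is needed.

```python
from fractions import Fraction

# (language, value, multiplier, base): the numeral's value is formed as multiplier * base.
# Only the arithmetic is encoded here, not the actual numeral words.
COMPOSITIONS = [
    ("Welsh", 18, Fraction(2), 9),        # 18 formed as 2 * 9
    ("Breton", 50, Fraction(1, 2), 100),  # 50 formed as 1/2 * 100
    ("Danish", 50, Fraction(5, 2), 20),   # vigesimal: 50 as 2 1/2 * 20
    ("Danish", 60, Fraction(3), 20),      # vigesimal: 60 as 3 * 20
    ("Danish", 80, Fraction(4), 20),      # vigesimal: 80 as 4 * 20
]


def composition_holds(value: int, multiplier: Fraction, base: int) -> bool:
    """Return True if multiplier * base gives exactly the stated value."""
    return multiplier * base == value


for language, value, multiplier, base in COMPOSITIONS:
    print(f"{language} {value} = {multiplier} * {base}: {composition_holds(value, multiplier, base)}")
```

Using Fraction rather than floats keeps halves such as Breton 1/2 * 100 exact.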

But aside from these rara in the expression of particular numbers, most languages form higher numerals in very regular ways. One thing we were particularly interested in is where languages start using syntagms rather than atoms. Atoms are lower numerals with a unique, non-compositional lexical expression, such as English four, and syntagms are composites of atoms. So English eighty-five is a syntagm composed of the atoms eight and five (plus the base 10, expressed as -ty). Turns out all of the languages in our sample make the switch from atoms somewhere between 11 and 13: most languages have a syntagm for 11, a few start at 12 (Catalan and Marwari), and famously the Germanic languages, including English, start at 13. So there is a little variation, but none of the languages in our sample used atoms all the way up to 19, for instance. Welsh might be weird in that it uses 2 * 9 to form 18, but at least it doesn't have an atom for 18...

Once languages have syntagms, we looked at the order of atom and base for what we call teens (11-19), crowns (20, 30, 40, 50, 60, 70, 80, 90), and running numbers (21-29, 31-39, 41-49, 51-59, 61-69, 71-79, 81-89, 91-99). I had a personal interest in studying the last category, as my two languages (English and Dutch) have opposing orders: English eighty-five '80-5' is base-then-atom, whereas Dutch vijf-en-tachtig '5-and-80' is atom-then-base, and I always struggle to get it right. It just so happens that the famous typologist Joseph Greenberg published a cool article on numeral systems that also discusses the order of atom and base.
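To make the two orders concrete, here is a minimal sketch that glosses a running number either way, following the glossing convention used in this post ('80-5' versus '5-and-80'). The function name gloss_running and its linker parameter are hypothetical; the function produces only the numeric glosses, not actual English or Dutch words.

```python
def gloss_running(n: int, order: str, linker: str = "") -> str:
    """Gloss a running number (21-99, excluding the crowns) as base and atom.

    order is "base-then-atom" (like English eighty-five, '80-5') or
    "atom-then-base" (like Dutch vijf-en-tachtig, '5-and-80').
    linker is an optional connecting word, e.g. "and" for the Dutch gloss.
    """
    if not 21 <= n <= 99 or n % 10 == 0:
        raise ValueError("running numbers are 21-99, excluding the crowns")
    base, atom = n - n % 10, n % 10  # e.g. 85 -> base 80, atom 5
    if order == "base-then-atom":
        parts = [str(base), str(atom)]
    elif order == "atom-then-base":
        parts = [str(atom), str(base)]
    else:
        raise ValueError(f"unknown order: {order!r}")
    if linker:
        parts.insert(1, linker)  # place the linker between the two numerals
    return "-".join(parts)


print(gloss_running(85, "base-then-atom"))                # 80-5
print(gloss_running(85, "atom-then-base", linker="and"))  # 5-and-80
```

The numeric value is the same in both cases; only the linear order of base and atom (and the optional linker) differs.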

He finds that if languages have both atom-then-base AND base-then-atom order, it's always the case that they have atom-then-base for the lower numerals, and base-then-atom for the higher numerals, never the other way around. Many Indo-European languages, like English, switch from having atom-then-base order in the teens (eighteen '8-10'), to base-then-atom order in the running numbers (eighty-one '80-1'). Others have atom-then-base order for both the teens and the running numbers (most of the Indian languages in our sample) and a few have base-then-atom order (Wakhi, Modern Armenian, Tocharian). But, in line with Greenberg's universal, no language in our sample changes from base-then-atom order to atom-then-base order.
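Greenberg's universal, as summarized above, can be phrased as a simple check over each language's pair of orders. This is a minimal sketch, assuming each language is coded only for its teens order and its running-number order, and reading "base-then-atom order" for Wakhi, Modern Armenian, and Tocharian as applying to both categories. The codes "AB"/"BA", the SAMPLE table, and the function name are illustrative; the Dutch teens order (achttien '8-10') is added as a further example.

```python
# Order codes: "AB" = atom-then-base, "BA" = base-then-atom.
# Each entry: (order in the teens, order in the running numbers).
SAMPLE = {
    "English": ("AB", "BA"),          # eighteen '8-10', eighty-one '80-1'
    "Dutch": ("AB", "AB"),            # achttien '8-10', vijf-en-tachtig '5-and-80'
    "Wakhi": ("BA", "BA"),
    "Modern Armenian": ("BA", "BA"),
    "Tocharian": ("BA", "BA"),
}


def violates_greenberg(teens: str, running: str) -> bool:
    """True if the lower numerals are base-then-atom but the higher ones are atom-then-base."""
    return teens == "BA" and running == "AB"


violators = [lang for lang, (teens, running) in SAMPLE.items() if violates_greenberg(teens, running)]
print(violators)                       # []
print(violates_greenberg("BA", "AB"))  # a hypothetical violating language: True
```

The universal rules out exactly one of the four possible combinations, which is why the check is a single conjunction.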

As for English having base-then-atom order for running numbers, I suspect this is due to pressure from the conquering Scandinavians and/or Normans during the formation of Middle English, as Old English still had the ancestral West Germanic atom-then-base order. Damn those Vikings and Normans for making my bilingual life difficult!

In the paper, we then go on to reconstruct the ancestral order of atom and base, and we look at correlations between the order of atom and base in numerals and other word orders. We do this using phylogenetic comparative methods which are great for studying historical change in typological features such as these. You can read all about that in the paper.
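To give a feel for what "reconstructing the ancestral order" means, here is an illustration with a much simpler technique than the phylogenetic comparative methods used in the paper: Fitch parsimony on a made-up five-language tree. The tree, its leaf states, and the function name are hypothetical. Model-based comparative methods typically also take branch lengths and uncertainty into account, which this sketch ignores.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf (its observed order, as a string) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Fitch's bottom-up pass: return (state set for this subtree, minimum number of changes in it).

    At the root, the state set is the set of most-parsimonious ancestral states.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # No state in common: take the union and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical tree of five languages coded for running-number order.
toy_tree = ((("atom-then-base", "atom-then-base"), "base-then-atom"),
            ("base-then-atom", "base-then-atom"))
root_states, changes = fitch(toy_tree)
print(root_states, changes)
```

On this toy tree, the most parsimonious reconstruction puts base-then-atom at the root, with a single change to atom-then-base in one subclade.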

But here I'd like to expand a little on one of the other questions that arose when we were writing this up. Not all languages are like the Indo-European languages: some have no numerals at all, or only a restricted set that stops at some point and cannot be used to derive ever higher numerals. Bernard Comrie's WALS chapter on numeral bases lists 20 languages with such a 'restricted' system, out of 196 languages. We started wondering about the dynamics of change between 'restricted' and 'productive' numeral systems: given that productive numeral systems are so useful for counting, once a language has one, can it lose it? Are they faithfully inherited as language families diverge, or are they frequently borrowed? We know that languages with restricted numeral systems can lose numerals, as investigated by Kevin Zhou and Claire Bowern for Pama-Nyungan languages of Australia. But hardly anything is known about the dynamics of change between restricted and productive systems.

In the paper, we briefly mention the Arawakan language family, which includes languages with both restricted and productive systems. Comrie (2005) samples three Arawakan languages, two of which (Baré and Achagua) have restricted numeral systems, while the third, Arawak (Lokono), has a vigesimal numeral system. In the 'Numeral systems of the world's languages' database, information is available on 36 Arawakan languages, of which more than half, 20, have restricted numeral systems (they have numerals for 1, 2, 3, sometimes up to 5). Another 8 have traditional numerals up to 20. Only 8 have truly productive systems. What is interesting about the Arawakan languages in the database are the comments on where these systems come from. For several of the languages with productive numeral systems, it is remarked that the language has "developed" this system, suggesting that ancestrally, all Arawakan languages had restricted numeral systems. However, for many of the languages with restricted systems, we find comments that they have "lost" their numerals, which would suggest that Arawakan languages had productive systems to start with! The fact that many speakers of Arawakan languages have now adopted the numeral systems of the colonial languages (French, Spanish, or Portuguese) makes it even harder to uncover changes between restricted and productive systems in the Arawakan language family.
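The Arawakan counts above add up to 36 and can be turned into proportions. This quick sketch uses only the figures quoted in this post; the helper name shares is illustrative, and since the WALS figure (20 restricted systems out of 196) comes from a different sample, the comparison is only rough.

```python
def shares(counts: dict) -> dict:
    """Return each category's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Arawakan languages in the 'Numeral systems of the world's languages' database.
arawakan = {"restricted": 20, "traditional up to 20": 8, "productive": 8}
# Comrie's WALS chapter: restricted systems among all sampled languages.
wals = {"restricted": 20, "other": 196 - 20}

for category, share in shares(arawakan).items():
    print(f"Arawakan, {category}: {share:.1%}")
print(f"WALS sample, restricted: {shares(wals)['restricted']:.1%}")
```

This prints about 55.6% restricted for the Arawakan languages, compared with about 10.2% in the WALS sample.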

As a last note, the Arawakan numeral systems seem to be based on body counting (see here for the Mehináku system), using the fingers (and toes) to count. Some nice pics on different methods of counting can be found here.

Thursday, April 7, 2016

Hedvig's Evolutionary/Diversity Linguistics Mixtape vol. 1

Lately I've been reading papers in diversity and evolutionary linguistics, and some in biology. I've also been making mixtapes for friends. So it occurred to me to make a mixtape of academic papers that go well together and that I think should be shared, and well... here we are. This is Hedvig's Evolutionary/Diversity Linguistics Mixtape Vol. 1! It features some well-known classics and some perhaps less well-known ones. They're ordered alphabetically, but can be read every which way. There's a link to a free PDF for almost all of the publications.

The intention is not to create an exhaustive list of everything important in evolutionary/diversity linguistics (just as a mixtape of pop songs does not contain every excellent pop song); these publications have been selected especially for how they go together. They are also very suitable for newcomers to these kinds of research questions: they're accessibly written and readable for non-linguists. I hope you'll see what I mean after having read a few.

Many thanks to Simon Greenhill who pointed out several of these to me (and who also wrote three).

Track list

  • Evans, N. and Levinson, S. C. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32.
  • Gray, R., Drummond, A., and Greenhill, S. (2009). Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science, 323.
  • Greenhill, S. (2015). Demographic Correlates of Language Diversity. In Bowern, C. and Evans, B., editors, The Routledge Handbook of Historical Linguistics. Routledge Taylor & Francis Group, Abingdon, UK and New York, USA.
  • Greenhill, S. J. (2015). Evolution and Language: Phylogenetic Analyses. In Wright, J. D., editor, The International Encyclopedia of the Social and Behavioral Sciences, 2nd Edition. Elsevier, Oxford.
  • Ives, A. R., Midford, P. E., and Garland, T., Jr. (2007). Within-species variation and measurement error in phylogenetic comparative methods. Systematic Biology, 56(2):252–270.
  • Levinson, S. C. and Evans, N. (2010). Time for a sea-change in linguistics: Response to comments on 'The myth of language universals'. Lingua, 120.
  • Levinson, S. C. and Gray, R. D. (2012). Tools from evolutionary biology shed new light on the diversification of languages. Trends in Cognitive Sciences, 16(3):167–173.
  • Penny, D. and Phillips, M. J. (2004). The rise of birds and mammals: are microevolutionary processes sufficient for macroevolution? Trends in Ecology and Evolution, 19(10).
  • Reznick, D. and Ricklefs, R. (2009). Darwin's bridge between microevolution and macroevolution.
  • Szmrecsanyi, B., Wälchli, B., and Auer, P., editors (2014). Aggregating Dialectology, Typology, and Register Analysis: Linguistic Variation in Text and Speech. Linguae & Litterae 28. Walter de Gruyter, Tubingen.
  • Verkerk, A. (2014). The evolutionary dynamics of motion event encoding. PhD thesis, Radboud University Nijmegen.
  • Honkola, T. (2016). Macro- and microevolution of languages: exploring linguistic divergence with approaches from evolutionary biology. PhD thesis, University of Turku.
    • Free PDF available (PhD thesis by compilation; some of the included publications must be sought elsewhere).

At the top is a picture of the front of the mixtape (an adaptation of the mixtape from Guardians of the Galaxy), and here is the back. For more on this back image, go here.

From Strickberger (1990). Read more here.