Time-saving magic for linguist fieldworkers: automatic segmenting with PRAAT and ELAN
|A typical recording situation from my own fieldwork. Melenesa (my research assistant) and Ta'alolo (one of my informants) in Neiafu-Tai village on Savai'i, in Samoa.|
Linguistic fieldwork is so much more than just the time spent at the field site. When you come home, there's a lot of work to be done: annotating your files, segmenting them, and otherwise preparing them for the analysis you want to do.
|Yours truly, handling transcription files in ELAN in her office in Coombs, Canberra.|
|ELAN, a free transcription program from The Language Archive.|
Instead, Eri and Mark found a nifty way of using another program, PRAAT, to do the segmentation. Eri explains it all in this blog post.
This method is not foolproof; after all, the machine is not as good as the human ear at recognizing when an utterance starts and ends. But it does a pretty good job! Errors are mostly false positives, i.e. PRAAT thinks there's an utterance there but it's just a dog barking, clothes rustling etc. That's easy to fix: just delete the annotation instead of transcribing it. The other kind of error, the false negative (i.e. no annotation is created where there actually is speech), is more problematic, but not hard to deal with. Just add an annotation there later in ELAN. This still saves you lots of time!
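For the curious, here is a minimal Python sketch of the general idea behind silence-based segmentation. It is not PRAAT's own algorithm and not necessarily the procedure in Eri's post. The function name, the parameter names and all the default values are my own assumptions for illustration. It marks stretches louder than a threshold as "sounding", bridges short pauses, and throws away very short blips.

```python
def segment_by_silence(samples, rate, threshold=0.1,
                       min_silence=0.3, min_sound=0.1, frame=0.01):
    """Return (start, end) times in seconds of stretches judged to be sound.

    samples     -- list of amplitudes between -1.0 and 1.0 (assumed mono)
    rate        -- sampling rate in Hz
    threshold   -- peak amplitude a frame needs to count as sound (assumed value)
    min_silence -- pauses shorter than this (seconds) do not split an interval
    min_sound   -- intervals shorter than this (seconds) are discarded
    frame       -- analysis frame length in seconds
    """
    frame_len = max(1, round(rate * frame))

    # Step 1: decide for every frame whether it is loud enough.
    loud = []
    for i in range(0, len(samples), frame_len):
        chunk = samples[i:i + frame_len]
        loud.append(max(abs(s) for s in chunk) >= threshold)

    # Step 2: turn runs of loud frames into [start, end] intervals.
    intervals = []
    start = None
    for idx, is_loud in enumerate(loud):
        if is_loud and start is None:
            start = idx
        elif not is_loud and start is not None:
            intervals.append([start * frame_len / rate, idx * frame_len / rate])
            start = None
    if start is not None:
        intervals.append([start * frame_len / rate, len(samples) / rate])

    # Step 3: merge intervals separated by a pause shorter than min_silence.
    merged = []
    for interval in intervals:
        if merged and interval[0] - merged[-1][1] < min_silence:
            merged[-1][1] = interval[1]
        else:
            merged.append(interval)

    # Step 4: drop intervals too short to be an utterance (clicks, bumps).
    return [(s, e) for s, e in merged if e - s >= min_sound]
```

This also shows where the two kinds of error come from. A threshold that is too low lets the barking dog through (a false positive), while one that is too high skips quiet speech (a false negative). The thresholds are a trade-off you tune per recording.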
|PRAAT, a program for phonetic analysis by Paul Boersma and David Weenink from the University of Amsterdam.|
Maybe you all had this figured out already, and we were just catching up. Regardless, we're very pleased with this discovery and are now going to tell every fieldworker we know about it.
While we're on the topic of ELAN and corpora, you've all got to read this paper by Mosel on using regular expressions in ELAN to do cleverer searches in your corpora. I've got some exercises she's written up too; just email me and I'll send them.
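To give a taste of what a regex search buys you, here is a small Python sketch. It uses Python's `re` module rather than ELAN's own search, whose regex dialect may differ in details. The annotations are invented toy examples, not real corpus data, and `search_annotations` is a hypothetical helper name.

```python
import re

# Invented toy annotations standing in for a transcription tier.
annotations = [
    "the dog barked",
    "dogs were barking all night",
    "she walked the dog",
    "no animals here",
]


def search_annotations(annotations, pattern):
    """Return (index, text) for every annotation the pattern matches."""
    regex = re.compile(pattern)
    return [(i, text) for i, text in enumerate(annotations)
            if regex.search(text)]


# \b is a word boundary and s? makes the plural ending optional,
# so this finds "dog" and "dogs" but not, say, "dogma".
dog_hits = search_annotations(annotations, r"\bdogs?\b")

# \w+ing\b finds any word ending in -ing.
ing_hits = search_annotations(annotations, r"\b\w+ing\b")
```

A plain text search for "dog" would need a second pass for "dogs" and would also catch unrelated words containing those letters. One pattern handles both cases.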
There is no need to waste time on things that machines can do quite cleverly. Regexes and automatic silence recognizers are your friends, and they're easy to learn.
Over and out from the Canberra team!