An exploration of language identification techniques for the Dutch folktale database

Trieschnigg, Dolf and Hiemstra, Djoerd and Theune, Mariët and de Jong, Franciska and Meder, Theo (2012) An exploration of language identification techniques for the Dutch folktale database. In: Workshop on Adaptation of Language Resources and Tools for Processing Cultural Heritage, LREC 2012, 26 May 2012, Istanbul, Turkey.

Abstract: The Dutch Folktale Database contains fairy tales, traditional legends, urban legends, and jokes written in a wide variety and combination of languages, including Dutch (also Middle and 17th-century Dutch), Frisian, and a number of Dutch dialects. In this work we compare a number of approaches to automatic language identification for this collection. We show that, in comparison to typical language identification tasks, classification performance for highly similar languages with little training data is low. The studied dataset, consisting of over 39,000 documents in 16 languages and dialects, is available on request for follow-up research.
Item Type: Conference or Workshop Item
Faculty: Electrical Engineering, Mathematics and Computer Science (EEMCS)
Link to this item: http://purl.utwente.nl/publications/82013
Proceedings URL: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf