Automated metadata extraction for semantic access to spoken word archives


Jong, Franciska de and Heeren, Willemijn and Hessen, Arjan van and Ordelman, Roeland and Nijholt, Anton (2011) Automated metadata extraction for semantic access to spoken word archives. In: 12th International Symposium on Social Communication, 17-21 January 2011, Santiago de Cuba, Cuba (pp. 896-905).

open access
Abstract: Archival practice is shifting from the analogue to the digital world. Spoken word archives are a specific subset of heritage collections that poses interesting challenges for the field of language and speech technology. Given the enormous backlog of unannotated materials at audiovisual archives and the generally global level of item description, collection disclosure and item access are both at risk, and (semi-)automated methods for analysis and annotation may help to increase the use and reuse of these rich content collections. Several HMI projects have investigated the interplay between evolving user scenarios and user requirements for spoken audio collections on the one hand, and the potential of automatic annotation and search technology for improved accessibility and search paradigms on the other. In this paper we present an overview of the state of the art in metadata generation for audio content and explain the crucial importance of involving user groups in the design of research agendas and road maps for novel applications in this domain.
Item Type: Conference or Workshop Item
Additional information: Cultural heritage, spoken audio collection, automatic annotation, speech technology, information retrieval
Copyright: © 2011 Centro de Lingüística Aplicada
Electrical Engineering, Mathematics and Computer Science (EEMCS)