Evaluation of noisy transcripts for spoken document retrieval


Share/Save/Bookmark

Werff, Laurens van der (2012) Evaluation of noisy transcripts for spoken document retrieval. thesis.

open access
[img]
Preview
PDF
3MB
Abstract:Spoken Document Retrieval (SDR) is usually implemented by using an Information Retrieval (IR) engine on speech transcripts that are produced by an Automatic Speech Recognition (ASR) system. These transcripts generally contain a substantial amount of transcription errors (noise) and are mostly unstructured. This thesis addresses two challenges that arise when doing IR on this type of source material: i. segmentation of speech transcripts into suitable retrieval units, and ii. evaluation of the impact of transcript noise on the results of an IR task.
It is shown that intrinsic evaluation results in different conclusions with regard to the quality of automatic story boundaries than when (extrinsic) Mean Average Precision (MAP) is used. This indicates that for automatic story segmentation for search applications, the traditionally used (intrinsic) segmentation cost may not be a good performance target. The best performance in an SDR context was achieved using lexical cohesion-based approaches, rather than the statistical approaches that were most popular in story segmentation benchmarks.
For the evaluation of speech transcript noise in an SDR context a novel framework is introduced, in which evaluation is done in an extrinsic, and query-dependent manner but without depending on relevance judgments. This is achieved by making a direct comparison between the ranked results lists of IR tasks on a reference and an ASR-derived transcription. The resulting measures are highly correlated with MAP, making it possible to do extrinsic evaluation of ASR transcripts for ad-hoc collections, while using a similar amount of reference material as the popular intrinsic metric Word Error Rate.
The proposed evaluation methods are expected to be helpful for the task of optimizing the configuration of ASR systems for the transcription of (large) speech collections for use in Spoken Document Retrieval, rather than the more traditional dictation tasks.
Item Type:Thesis
Additional information:SIKS Dissertation Series; no. 2012-24
Faculty:
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Research Group:
Link to this item:http://purl.utwente.nl/publications/80673
Export this item as:BibTeX
EndNote
HTML Citation
Reference Manager

 

Repository Staff Only: item control page

Metis ID: 287935