Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus


Share/Save/Bookmark

Hiemstra, Djoerd (1998) Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus. In: Computational Linguistics in the Netherlands, Eighth CLIN Meeting 1997, December 1997, Nijmegen, the Netherlands.

[img]
Preview
PDF
79Kb
Abstract:Within the project Twenty-One, which aims at the effective dissemination of information on ecology and sustainable development, a sytem is developed that supports cross-language information retrieval in any of the four languages Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automated acquisition of a translation lexicon from a parallel corpus. New about the presented algorithm is the statistical language model used. Because the algorithm is based on a symmetric translation model it becomes possible to identify one-to-many and many-to-one relations between words of a language pair. We claim that the presented method has two advantages over algorithms that have been published before. Firstly, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Secondly, the resulting bilingual lexicon can be used to translate in both directions between a language pair. Different versions of the algorithm were evaluated on the Dutch and English version of the Agenda 21 corpus, which is a UN document on the application domain of sustainable development.
Item Type:Conference or Workshop Item
Faculty:
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Research Group:
Link to this item:http://purl.utwente.nl/publications/66995
Export this item as:BibTeX
EndNote
HTML Citation
Reference Manager

 

Repository Staff Only: item control page