A Neural Network Based Dutch Part of Speech Tagger

Share/Save/Bookmark

Poel, M. and Boschman, E. and op den Akker, H.J.A. (2008) A Neural Network Based Dutch Part of Speech Tagger. In: Proceedings of the twentieth Belgian-Dutch Artificial Intelligence Conference (BNAIC 2008)., 30-31 Oct 2008, Enschede/Bad Boekelo, The Netherlands.

[img]
Preview
PDF
101Kb
Abstract:In this paper a Neural Network is designed for Part-of-Speech Tagging of Dutch text. Our approach uses the Corpus Gesproken Nederlands (CGN) consisting of almost 9 million transcribed words of spoken Dutch, divided into 15 different categories. The outcome of the design is a Neural Network with an input window of size 8 (4 words back and 3 words ahead) and a hidden layer of 370 neurons. The words ahead are coded based on the relative frequency of the tags in the training set for the word. Special attention is paid to unknown words (words not in the training set) for which such a relative frequency cannot be determined. Based on a 10-fold cross validation an approximation of the relative frequency of tags for unknown words is determined. The performance of the Neural Network is 97.35%, 97.88% on known
words and 41.67% on unknown words. This is comparable to state of the art performances found in the literature. The special coding of unknown words resulted of an increase of almost 13% for the tagging of unknown words.
Item Type:Conference or Workshop Item
Faculty:
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Research Group:
Link to this item:http://purl.utwente.nl/publications/65237
Export this item as:BibTeX
EndNote
HTML Citation
Reference Manager

 

Repository Staff Only: item control page

Metis ID: 255028