The Influence of Basic Tokenization on Biomedical Document Retrieval
Trieschnigg, Dolf and Kraaij, Wessel and de Jong, Franciska (2007) The Influence of Basic Tokenization on Biomedical Document Retrieval. In: 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 23-27 July 2007, Amsterdam, The Netherlands.
| PDF (143 KB): | Restricted to UT campus only |
| Abstract: | Tokenization is a fundamental preprocessing step in Information Retrieval systems in which text is turned into index terms. This paper quantifies and compares the influence of various simple tokenization techniques on document retrieval effectiveness in two domains: biomedicine and news. As expected, biomedical retrieval is more sensitive to small changes in the tokenization method. The tokenization strategy can make the difference between a mediocre and a well-performing IR system, especially in the biomedical domain. |
| Item Type: | Conference or Workshop Item |
| Copyright: | © 2007 ACM |
| Faculty: | Electrical Engineering, Mathematics and Computer Science (EEMCS) |
| Link to this item: | http://purl.utwente.nl/publications/61906 |
| Official URL: | http://doi.acm.org/10.1145/1277741.1277917 |
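This record does not list which tokenization techniques the paper compares. The Python sketch below is purely illustrative: the three tokenizers, their names, and the example query and document are hypothetical and are not taken from the paper. It shows how small tokenization choices can decide whether a biomedical query and document share any index terms. A query for "IL-2 receptor" shares terms with a document that mentions "IL2 receptors" only once tokens are also split at letter/digit boundaries.

```python
import re
from collections.abc import Callable

# Hypothetical tokenizers for illustration only; not the paper's exact methods.

def split_whitespace(text: str) -> list[str]:
    """Lowercase and split on whitespace only; punctuation stays attached."""
    return text.lower().split()

def split_non_alnum(text: str) -> list[str]:
    """Lowercase and break on every run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def split_alpha_digit(text: str) -> list[str]:
    """Like split_non_alnum, but also break at letter/digit transitions."""
    tokens: list[str] = []
    for tok in split_non_alnum(text):
        tokens.extend(re.findall(r"[a-z]+|[0-9]+", tok))
    return tokens

def matched_terms(tokenize: Callable[[str], list[str]], query: str, doc: str) -> set[str]:
    """Index terms that the query and the document have in common."""
    return set(tokenize(query)) & set(tokenize(doc))

# Hypothetical example strings.
query = "IL-2 receptor"
document = "Expression of IL2 receptors in T-cells."

for name, fn in [("whitespace", split_whitespace),
                 ("non_alnum", split_non_alnum),
                 ("alpha_digit", split_alpha_digit)]:
    print(name, sorted(matched_terms(fn, query, document)))
# whitespace []
# non_alnum []
# alpha_digit ['2', 'il']
```

Even the third tokenizer does not match "receptor" to "receptors". Conflating such variants would require a further normalization step such as stemming, which this sketch leaves out.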
Metis ID: 241899