A Linguistically Motivated Probabilistic Model of Information Retrieval


Hiemstra, Djoerd (1998) A Linguistically Motivated Probabilistic Model of Information Retrieval. In: Second European Conference on Research and Advanced Technology for Digital Libraries, ECDL 1998, September 21-23, 1998, Heraklion, Crete, Greece (pp. 569-584).

open access
Abstract: This paper presents a new probabilistic model of information retrieval. The most important modeling assumption made is that documents and queries are defined by an ordered sequence of single terms. This assumption is not made in well-known existing models of information retrieval, but is essential in the field of statistical natural language processing. Advances already made in statistical natural language processing are used in this paper to formulate a probabilistic justification for using tf × idf term weighting. The paper shows that the new probabilistic interpretation of tf × idf term weighting might lead to a better understanding of statistical ranking mechanisms, for example by explaining how they relate to coordination level ranking. A pilot experiment on the Cranfield test collection indicates that the presented model outperforms the vector space model with classical tf × idf and cosine length normalisation.
Item Type: Conference or Workshop Item
Copyright: © 1998 Springer
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Link to this item: http://purl.utwente.nl/publications/66993
Official URL: https://doi.org/10.1007/3-540-49653-X_34

Metis ID: 119183