Shard Ranking and Cutoff Estimation for Topically Partitioned Collections

Share/Save/Bookmark

Kulkarni, Anagha and Tigelaar, Almer S. and Hiemstra, Djoerd and Callan, Jamie (2012) Shard Ranking and Cutoff Estimation for Topically Partitioned Collections. In: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, 29 October - 02 November 2012, Maui, Hawaii.

[img] PDF
Restricted to UT campus only
: Request a copy
182kB
Abstract:Large document collections can be partitioned into topical shards to facilitate distributed search. In a low-resource search environment only a few of the shards can be searched in parallel. Such a search environment faces two intertwined challenges. First, determining which shards to consult for a given query: shard ranking. Second, how many shards to consult from the ranking: cutoff estimation. In this paper we present a family of three algorithms that address both of these problems. As a basis we employ a commonly used data structure, the central sample index (CSI), to represent the shard contents. Running a query against the CSI yields a flat document ranking that each of our algorithms transforms into a tree structure. A bottom up traversal of the tree is used to infer a ranking of shards and also to estimate a stopping point in this ranking that yields cost-effective selective distributed search. As compared to a state-of-the-art shard ranking approach the proposed algorithms provide substantially higher search efficiency while providing comparable search effectiveness.
Item Type:Conference or Workshop Item
Copyright:© 2012 ACM
Faculty:
Electrical Engineering, Mathematics and Computer Science (EEMCS)
Research Group:
Link to this item:http://purl.utwente.nl/publications/81533
Export this item as:BibTeX
EndNote
HTML Citation
Reference Manager

 

Repository Staff Only: item control page