Retrieving Web Pages using Content, Links, URLs and Anchors


Westerveld, Thijs and Kraaij, Wessel and Hiemstra, Djoerd (2002) Retrieving Web Pages using Content, Links, URLs and Anchors. In: Tenth Text REtrieval Conference, TREC 2001, November 13-16 2001, Gaithersburg, Maryland, USA (pp. 663-672).

open access
Abstract: For this year’s web track, we concentrated on the entry page finding task. For the content-only runs, in both the ad hoc task and the entry page finding task, we used an information retrieval system based on a simple unigram language model. In the ad hoc task, we experimented with alternative approaches to smoothing. For the entry page task, we incorporated additional information into the model. The sources of information we used in addition to the document’s content are links, URLs and anchors. We found that almost every approach can improve the results of a content-only run. In the end, a very basic approach, using the depth of the path of the URL as a prior, yielded by far the largest improvement over the content-only results.
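As a rough illustration of the idea in the abstract, the sketch below combines a unigram language model score with a prior based on URL path depth. It is not the authors' implementation: the abstract does not state the smoothing method or the form of the prior, so the linear interpolation smoothing, the weight `lam`, the geometric `decay` prior, and all function names here are assumptions made for this example.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_depth(url):
    """Count non-empty path segments, e.g. http://a.org/x/y.html has depth 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def log_query_likelihood(query, doc_tokens, coll_counts, coll_size, lam=0.5):
    """log P(query | doc) under a unigram model.

    Assumption: document and collection estimates are mixed by linear
    interpolation with weight lam; the abstract does not name the method.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_size if coll_size else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


def log_depth_prior(url, decay=0.5):
    """Hypothetical prior proportional to decay ** depth (shallow URLs favoured)."""
    return url_depth(url) * math.log(decay)


def rank(query, docs):
    """Rank URLs by log P(query | doc) + log prior(doc).

    docs maps each URL to its list of content tokens.
    """
    coll_counts = Counter(tok for tokens in docs.values() for tok in tokens)
    coll_size = sum(coll_counts.values())
    scored = [
        (log_query_likelihood(query, tokens, coll_counts, coll_size)
         + log_depth_prior(url), url)
        for url, tokens in docs.items()
    ]
    return [url for _, url in sorted(scored, reverse=True)]
```

With two pages of identical content, the content score ties and the prior alone decides the order, so the shallower URL ranks first. In practice a prior of this kind would be estimated from training data rather than fixed by hand.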
Item Type: Conference or Workshop Item
Additional information: Imported from EWI/DB PMS [db-utwente:inpr:0000003205]
Faculty: Electrical Engineering, Mathematics and Computer Science (EEMCS)
Research Group:
Link to this item: http://purl.utwente.nl/publications/66475
Proceedings URL: http://trec.nist.gov/pubs/trec10/t10_proceedings.html

 


Metis ID: 204321