Abstract
Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability have been examined, showing how the retrieval model may influence bias, no prior work has examined the impact of the index (and how it is optimized) on retrieval bias. Intuitively, how the documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the retrieval bias of a system changes as the inverted index is optimized for efficiency through static index pruning. In our analysis, we consider four pruning methods and examine how they affect performance and bias on the TREC GOV2 Collection. Our results show that the relationship between these factors is varied and complex, and very much dependent on the pruning algorithm. We find that more pruning results in relatively little change or a slight decrease in bias up to a point, and then a dramatic increase. The increase in bias corresponds to a sharp decrease in early-precision measures such as NDCG@10 and is also indicative of a large decrease in MAP. The findings suggest that the impact of pruning algorithms can be quite varied, but that retrieval bias could be used to guide the pruning process. Further work is required to determine precisely which documents are most affected and how this impacts upon performance.
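The abstract does not say how retrieval bias was quantified. In the retrievability literature it is commonly summarized as the Gini coefficient over per-document retrievability scores, so the sketch below assumes that measure together with a simple cumulative scoring rule (a document earns one point for each query that returns it within a rank cutoff). The function names, the cutoff, and the toy rankings are all hypothetical and only illustrate how bias could be compared before and after pruning an index.

```python
from collections import defaultdict


def cumulative_retrievability(rankings, cutoff=10):
    """Count, per document, the queries that return it in the top `cutoff`."""
    scores = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient: 0.0 means equal scores, values near 1.0 mean high inequality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def retrieval_bias(rankings, all_doc_ids, cutoff=10):
    """Gini over every document in the collection, including never-retrieved ones."""
    scores = cumulative_retrievability(rankings, cutoff)
    return gini([scores.get(doc_id, 0) for doc_id in all_doc_ids])


if __name__ == "__main__":
    # Toy data (assumed, not from the paper): rankings from a full index
    # and from a hypothetical heavily pruned index.
    docs = ["d1", "d2", "d3", "d4"]
    full_index = {"q1": ["d1", "d2"], "q2": ["d3", "d4"]}
    pruned_index = {"q1": ["d1", "d2"], "q2": ["d1", "d2"]}
    print(retrieval_bias(full_index, docs, cutoff=2))
    print(retrieval_bias(pruned_index, docs, cutoff=2))
```

In this toy setting the pruned index concentrates all retrievals on two documents, so its Gini value is higher than that of the full index. Documents that are never retrieved are given a score of zero so that they still count toward the inequality measure.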
Original language | English |
---|---|
Title of host publication | CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management |
Place of Publication | New York |
Pages | 2023-2026 |
Number of pages | 4 |
Publication status | Published - 6 Nov 2017 |
Event | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore; Duration: 6 Nov 2017 → 10 Nov 2017 |
Conference
Conference | 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 6/11/17 → 10/11/17 |
Keywords
- indexing
- pruning
- retrievability
- information retrieval