An empirical analysis of pruning techniques performance, retrievability and bias

Ruey-Cheng Chen, Leif Azzopardi, Falk Scholer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid out the foundations for investigating the relation between retrieval performance and retrieval bias. While various factors influencing retrievability have been examined, showing how the retrieval model may influence bias, no prior work has examined the impact of the index (and how it is optimized) on retrieval bias. Intuitively, how the documents are represented, and what terms they contain, will influence whether they are retrievable or not. In this paper, we investigate how the retrieval bias of a system changes as the inverted index is optimized for efficiency through static index pruning. In our analysis, we consider four pruning methods and examine how they affect performance and bias on the TREC GOV2 collection. Our results show that the relationship between these factors is varied and complex, and very much dependent on the pruning algorithm. We find that more pruning results in relatively little change or a slight decrease in bias up to a point, and then a dramatic increase. The increase in bias corresponds to a sharp decrease in early precision measures such as NDCG@10, and is also indicative of a large decrease in MAP. The findings suggest that the impact of pruning algorithms can be quite varied, but that retrieval bias could be used to guide the pruning process. Further work is required to determine precisely which documents are most affected and how this impacts performance.
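The abstract does not spell out how bias is measured. In the retrievability literature, a document's retrievability is commonly defined as r(d) = sum over queries q of o_q * f(k_dq, c), where k_dq is the rank of d for query q and the cumulative measure f is 1 if k_dq <= c and 0 otherwise; retrieval bias is then summarized by the Gini coefficient of r(d) across the collection. The minimal Python sketch below is not the authors' code; it only illustrates how a static pruning step can shift that Gini value. Everything in it is hypothetical: the toy index and queries, the summed term-frequency scoring used in place of a real retrieval model, the uniform tf threshold (prune_uniform, min_tf) used as a stand-in pruning rule rather than any of the four methods studied on GOV2, the query weights o_q = 1, and the cutoff c = 2.

```python
from collections import defaultdict

# Hypothetical toy inverted index: term -> {doc_id: term frequency}.
Index = dict[str, dict[str, int]]


def prune_uniform(index: Index, min_tf: int) -> Index:
    """Illustrative uniform static pruning: drop every posting with tf < min_tf."""
    return {term: {doc: tf for doc, tf in postings.items() if tf >= min_tf}
            for term, postings in index.items()}


def rank(index: Index, query: list[str]) -> list[str]:
    """Rank matching documents by summed tf (a stand-in for a real model such as BM25)."""
    scores: dict[str, int] = defaultdict(int)
    for term in query:
        for doc, tf in index.get(term, {}).items():
            scores[doc] += tf
    # Descending score; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def retrievability(index: Index, queries: list[list[str]],
                   docs: list[str], c: int) -> dict[str, float]:
    """r(d) = sum_q o_q * f(k_dq, c) with o_q = 1 and cumulative f (1 if rank <= c)."""
    r = {doc: 0.0 for doc in docs}  # documents never retrieved keep r(d) = 0
    for query in queries:
        for doc in rank(index, query)[:c]:
            r[doc] += 1.0
    return r


def gini(values) -> float:
    """Gini coefficient of retrievability scores: 0 = equal access, towards 1 = skewed."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


# Toy data, invented purely for illustration.
index: Index = {
    "pruning": {"d1": 3, "d2": 1, "d3": 1},
    "index": {"d1": 2, "d3": 2, "d4": 1},
    "bias": {"d2": 2, "d4": 1},
}
docs = ["d1", "d2", "d3", "d4"]
queries = [["pruning"], ["index"], ["bias"], ["pruning", "bias"]]

r_full = retrievability(index, queries, docs, c=2)
r_pruned = retrievability(prune_uniform(index, min_tf=2), queries, docs, c=2)
print(f"G(full)   = {gini(r_full.values()):.3f}")    # 0.250
print(f"G(pruned) = {gini(r_pruned.values()):.3f}")  # 0.417
```

On this toy data the pruned index no longer returns d4 in any top-2 list, so its retrievability drops to zero and the Gini coefficient rises from 0.250 to 0.417. This only illustrates the mechanism; it says nothing about the magnitudes or the pruning-level thresholds reported in the paper.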

Language: English
Title of host publication: CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
Place of Publication: New York
Pages: 2023-2026
Number of pages: 4
DOI: 10.1145/3132847.3133151
State: Published - 6 Nov 2017
Event: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017 - Singapore, Singapore
Duration: 6 Nov 2017 to 10 Nov 2017

Conference

Conference: 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
Country: Singapore
City: Singapore
Period: 6/11/17 to 10/11/17

Keywords

  • indexing
  • pruning
  • retrievability
  • information retrieval

Cite this

Chen, R-C., Azzopardi, L., & Scholer, F. (2017). An empirical analysis of pruning techniques performance, retrievability and bias. In CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management (pp. 2023-2026). New York. DOI: 10.1145/3132847.3133151
Chen, Ruey-Cheng; Azzopardi, Leif; Scholer, Falk. / An empirical analysis of pruning techniques performance, retrievability and bias. CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York, 2017. pp. 2023-2026
@inproceedings{2579c8de81bb4c9b81d679c7985c7e3f,
title = "An empirical analysis of pruning techniques performance, retrievability and bias",
keywords = "indexing, pruning, retrievability, information retrieval",
author = "Ruey-Cheng Chen and Leif Azzopardi and Falk Scholer",
year = "2017",
month = "11",
day = "6",
doi = "10.1145/3132847.3133151",
language = "English",
isbn = "9781450349185",
pages = "2023--2026",
booktitle = "CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management",

}

Chen, R-C, Azzopardi, L & Scholer, F 2017, An empirical analysis of pruning techniques performance, retrievability and bias. in CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York, pp. 2023-2026, 26th ACM International Conference on Information and Knowledge Management, CIKM 2017, Singapore, Singapore, 6/11/17. DOI: 10.1145/3132847.3133151

An empirical analysis of pruning techniques performance, retrievability and bias. / Chen, Ruey-Cheng; Azzopardi, Leif; Scholer, Falk.

CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York, 2017. p. 2023-2026.

TY - GEN

T1 - An empirical analysis of pruning techniques performance, retrievability and bias

AU - Chen,Ruey-Cheng

AU - Azzopardi,Leif

AU - Scholer,Falk

PY - 2017/11/6

Y1 - 2017/11/6

KW - indexing

KW - pruning

KW - retrievability

KW - information retrieval

UR - http://cikm2017.org/

UR - http://www.scopus.com/inward/record.url?scp=85037355739&partnerID=8YFLogxK

U2 - 10.1145/3132847.3133151

DO - 10.1145/3132847.3133151

M3 - Conference contribution

SN - 9781450349185

SP - 2023

EP - 2026

BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management

CY - New York

ER -

Chen R-C, Azzopardi L, Scholer F. An empirical analysis of pruning techniques performance, retrievability and bias. In CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York. 2017. p. 2023-2026. Available from: DOI: 10.1145/3132847.3133151