TY - GEN
T1 - Towards better measures
T2 - evaluation of estimated resource description quality for distributed IR
AU - Baillie, Mark
AU - Azzopardi, Leif
AU - Crestani, Fabio
PY - 2006/5/30
Y1 - 2006/5/30
N2 - An open problem for Distributed Information Retrieval systems (DIR) is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of these currently applied measures of resource description quality, before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different QBS algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.
AB - An open problem for Distributed Information Retrieval systems (DIR) is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of these currently applied measures of resource description quality, before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different QBS algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.
KW - information retrieval
KW - search engine optimization
KW - retrieval evaluation
U2 - 10.1145/1146847.1146888
DO - 10.1145/1146847.1146888
M3 - Conference contribution book
SN - 1-59593-428-6
T3 - InfoScale '06
BT - InfoScale '06 Proceedings of the 1st International Conference on Scalable Information Systems
CY - New York, NY, USA
ER -