Towards better measures: evaluation of estimated resource description quality for distributed IR

M. Baillie, L. Azzopardi, F. Crestani

Research output: Contribution to conference › Paper

5 Citations (Scopus)

Abstract

An open problem for Distributed Information Retrieval (DIR) systems is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue, as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of the currently applied measures of resource description quality, before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different Query-Based Sampling (QBS) algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.
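The measure the abstract proposes compares the true term distribution of a resource against the distribution estimated by sampling. As an illustrative sketch only (the paper's exact smoothing and base of logarithm may differ), KL divergence over two term-probability dictionaries can be computed as follows; the `epsilon` floor for unseen terms is an assumption introduced here, not taken from the paper:

```python
import math

def kl_divergence(actual, estimated, epsilon=1e-10):
    """D(actual || estimated) between two term distributions,
    each a dict mapping term -> probability.

    Terms missing from the estimate receive a small epsilon mass,
    a simple smoothing choice made for this sketch."""
    total = 0.0
    for term, p in actual.items():
        if p <= 0.0:
            continue  # terms with zero mass contribute nothing
        q = estimated.get(term, epsilon)
        total += p * math.log(p / q)
    return total

# Toy example: a closer estimate yields a lower divergence,
# which is what makes KL usable as a quality measure.
actual = {"retrieval": 0.5, "index": 0.3, "query": 0.2}
good_estimate = {"retrieval": 0.45, "index": 0.35, "query": 0.2}
poor_estimate = {"retrieval": 0.1, "index": 0.1, "query": 0.8}
assert kl_divergence(actual, good_estimate) < kl_divergence(actual, poor_estimate)
```

Lower divergence means the sampled description better approximates the true resource, so competing QBS algorithms can be ranked by the divergence of the descriptions they produce.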

Conference

Conference: First International Conference on Scalable Information Systems
Abbreviated title: INFOSCALE 2006
City: Hong Kong
Period: 30/05/06 - 1/06/06


Keywords

  • cataloguing
  • resource description
  • metadata
  • information retrieval
  • searching
  • search algorithm

Cite this

Baillie, M., Azzopardi, L., & Crestani, F. (2006). Towards better measures: evaluation of estimated resource description quality for distributed IR. Paper presented at the First International Conference on Scalable Information Systems (INFOSCALE 2006), Hong Kong, 30 May - 1 June 2006.

Links

http://www.infoscale.org/
http://www.peng-project.org/downloads/10.pdf