The accessibility dimension for structured document retrieval

Thomas Roelleke, Mounia Lalmas, Gabriella Kazai, Ian Ruthven, Stefan Quicker, F. Crestani (Editor), M. Dunlop (Editor), S. Mizzaro (Editor)

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

12 Citations (Scopus)
35 Downloads (Pure)

Abstract

Structured document retrieval aims at retrieving the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, with which experiments were carried out. The analysis of the experiments provides estimates of the acc values.
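As a rough illustration of how a structural weight could sit alongside tf and idf, the sketch below scores a document component by its own tf-idf mass plus the scores of its child components, discounted by a single acc factor. This is an assumption made for illustration only: the abstract does not give the chapter's formula, which is defined in a probabilistic relational algebra, and the names `Node`, `score`, and `idf_weights` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A document component (e.g. document, section, paragraph)."""
    name: str
    terms: List[str]
    children: List["Node"] = field(default_factory=list)


def score(node: Node, query: List[str], idf_weights: Dict[str, float], acc: float) -> float:
    """Illustrative tf-idf-acc score (assumed form, not the chapter's definition).

    own:       tf * idf summed over query terms in the node's own text
    inherited: child scores, each discounted by the accessibility factor acc
    """
    own = sum(node.terms.count(t) * idf_weights.get(t, 0.0) for t in query)
    inherited = sum(score(c, query, idf_weights, acc) for c in node.children)
    return own + acc * inherited


# Hypothetical two-level document: the query term occurs only in the section.
section = Node("section", ["retrieval", "retrieval"])
document = Node("document", ["structure"], [section])
weights = {"retrieval": 1.0}  # assumed idf value for the example

section_score = score(section, ["retrieval"], weights, acc=0.5)
document_score = score(document, ["retrieval"], weights, acc=0.5)
```

With acc below 1, the section that actually contains the query term scores higher than its enclosing document, which is the kind of component-level ranking that structured document retrieval aims for.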
Original language: English
Title of host publication: Advances in Information Retrieval
Place of Publication: Germany
Publisher: Springer
Pages: 284-302
Number of pages: 18
Volume: 2291
ISBN (Print): 978-3-540-43343-9
Publication status: Published - 25 Mar 2002

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer

Keywords

  • structured document retrieval
  • probabilistic relational algebra
  • accessibility dimension
