Improving access to large patent corpora

Richard Bache, Leif Azzopardi

Research output: Chapter in Book/Report/Conference proceeding › Other chapter contribution

Abstract

Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the patent domain, because if a retrieval system makes some patents hard to find, then patent searchers will have a difficult time retrieving these patents. This may mean that a patent searcher could miss important and relevant patents because of the retrieval system. In this paper, we describe measures of retrievability and how they can be applied to measure the overall access to a collection given a retrieval system. We then identify three features of best-match retrieval models that are hypothesized to lead to an improvement in access to all documents in the collection: sensitivity to term frequency, length normalization and convexity. Since patent searchers tend to favour Boolean models over best-match models, hybrid retrieval models are proposed that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.
Language: English
Title of host publication: Transactions on Large-Scale Data- and Knowledge-Centered Systems II
Editors: Abdelkader Hameurlain, Josef Küng, Roland Wagner
Place of Publication: Berlin, Heidelberg
Publisher: Springer-Verlag
Pages: 103-121
Number of pages: 19
ISBN (Print): 978-3-642-16174-2
Publication status: Published - 2010
Externally published: Yes

Publication series

Name: Transactions on Large-Scale Data- and Knowledge-Centered Systems
Publisher: Springer Verlag
Volume: 6380

Fingerprint

patent
normalization

Keywords

  • patent searching
  • information retrieval
  • Boolean searches

Cite this

Bache, R., & Azzopardi, L. (2010). Improving access to large patent corpora. In A. Hameurlain, J. Küng, & R. Wagner (Eds.), Transactions on Large-Scale Data- and Knowledge-Centered Systems II (pp. 103-121). (Transactions on Large-Scale Data- and Knowledge-Centered Systems; Vol. 6380). Berlin, Heidelberg: Springer-Verlag.
Bache, Richard ; Azzopardi, Leif. / Improving access to large patent corpora. Transactions on Large-Scale Data- and Knowledge-Centered Systems II. editor / Abdelkader Hameurlain ; Josef Küng ; Roland Wagner. Berlin, Heidelberg : Springer-Verlag, 2010. pp. 103-121 (Transactions on Large-Scale Data- and Knowledge-Centered Systems).
@inbook{f4b6af0b8880410e93aa5a10dc5cb7e1,
title = "Improving access to large patent corpora",
abstract = "Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the patent domain, because if a retrieval system makes some patents hard to find, then patent searchers will have a difficult time retrieving these patents. This may mean that a patent searcher could miss important and relevant patents because of the retrieval system. In this paper, we describe measures of retrievability and how they can be applied to measure the overall access to a collection given a retrieval system. We then identify three features of best-match retrieval models that are hypothesized to lead to an improvement in access to all documents in the collection: sensitivity to term frequency, length normalization and convexity. Since patent searchers tend to favour Boolean models over best-match models, hybrid retrieval models are proposed that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.",
keywords = "patent searching, information retrieval, Boolean searches",
author = "Richard Bache and Leif Azzopardi",
year = "2010",
language = "English",
isbn = "978-3-642-16174-2",
series = "Transactions on Large-Scale Data- and Knowledge-Centered Systems",
publisher = "Springer-Verlag",
pages = "103--121",
editor = "Abdelkader Hameurlain and Josef K{\"u}ng and Roland Wagner",
booktitle = "Transactions on Large-Scale Data- and Knowledge-Centered Systems II",

}

Bache, R & Azzopardi, L 2010, Improving access to large patent corpora. in A Hameurlain, J Küng & R Wagner (eds), Transactions on Large-Scale Data- and Knowledge-Centered Systems II. Transactions on Large-Scale Data- and Knowledge-Centered Systems, vol. 6380, Springer-Verlag, Berlin, Heidelberg, pp. 103-121.

Improving access to large patent corpora. / Bache, Richard; Azzopardi, Leif.

Transactions on Large-Scale Data- and Knowledge-Centered Systems II. ed. / Abdelkader Hameurlain; Josef Küng; Roland Wagner. Berlin, Heidelberg : Springer-Verlag, 2010. p. 103-121 (Transactions on Large-Scale Data- and Knowledge-Centered Systems; Vol. 6380).

TY - CHAP

T1 - Improving access to large patent corpora

AU - Bache, Richard

AU - Azzopardi, Leif

PY - 2010

Y1 - 2010

AB - Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the patent domain, because if a retrieval system makes some patents hard to find, then patent searchers will have a difficult time retrieving these patents. This may mean that a patent searcher could miss important and relevant patents because of the retrieval system. In this paper, we describe measures of retrievability and how they can be applied to measure the overall access to a collection given a retrieval system. We then identify three features of best-match retrieval models that are hypothesized to lead to an improvement in access to all documents in the collection: sensitivity to term frequency, length normalization and convexity. Since patent searchers tend to favour Boolean models over best-match models, hybrid retrieval models are proposed that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.

KW - patent searching

KW - information retrieval

KW - Boolean searches

UR - http://dl.acm.org/citation.cfm?id=1980651.1980657

UR - http://www.springer.com/us/book/9783642161742

M3 - Other chapter contribution

SN - 978-3-642-16174-2

T3 - Transactions on Large-Scale Data- and Knowledge-Centered Systems

SP - 103

EP - 121

BT - Transactions on Large-Scale Data- and Knowledge-Centered Systems II

A2 - Hameurlain, Abdelkader

A2 - Küng, Josef

A2 - Wagner, Roland

PB - Springer-Verlag

CY - Berlin, Heidelberg

ER -

Bache R, Azzopardi L. Improving access to large patent corpora. In Hameurlain A, Küng J, Wagner R, editors, Transactions on Large-Scale Data- and Knowledge-Centered Systems II. Berlin, Heidelberg: Springer-Verlag. 2010. p. 103-121. (Transactions on Large-Scale Data- and Knowledge-Centered Systems).