Improving access to large patent corpora

Richard Bache, Leif Azzopardi

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution book

10 Citations (Scopus)


Retrievability is a measure of access that quantifies how easily documents can be found using a retrieval system. Such a measure is of particular interest within the patent domain: if a retrieval system makes some patents hard to find, patent searchers will have difficulty retrieving them and could miss important and relevant patents because of the retrieval system. In this paper, we describe measures of retrievability and how they can be applied to measure the overall access to a collection given a retrieval system. We then identify three features of best-match retrieval models that are hypothesized to lead to an improvement in access to all documents in the collection: sensitivity to term frequency, length normalization and convexity. Since patent searchers tend to favour Boolean models over best-match models, hybrid retrieval models are proposed that incorporate these features while preserving the desirable aspects of the traditional Boolean model. An empirical study conducted on four large patent corpora demonstrates that these hybrid models provide better access to the corpus of patents than the traditional Boolean model.
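
As an illustration of the kind of measure the abstract describes, the sketch below counts, for each document, how many queries return it within a rank cutoff, and then summarizes how unevenly those counts are spread over the collection. It is a minimal sketch under stated assumptions, not the authors' implementation: the cumulative cutoff-based score, the use of a Gini coefficient as the summary of overall access, and the names `retrievability`, `gini`, `rankings` and `cutoff` are all illustrative choices that the abstract does not specify.

```python
from collections import Counter


def retrievability(rankings, doc_ids, cutoff):
    """Count, per document, the queries that rank it within the top `cutoff`.

    `rankings` maps a query id to its ranked list of document ids
    (hypothetical input format, assumed for this sketch).
    """
    counts = Counter()
    for ranked in rankings.values():
        counts.update(ranked[:cutoff])
    # Documents that are never retrieved get an explicit score of 0.
    return {doc_id: counts.get(doc_id, 0) for doc_id in doc_ids}


def gini(values):
    """Gini coefficient of non-negative scores: 0 means perfectly even access.

    Assumed here as one possible summary of inequality in access.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data: three queries over a four-patent collection.
rankings = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p1", "p3", "p2"],
    "q3": ["p2", "p1", "p4"],
}
doc_ids = ["p1", "p2", "p3", "p4"]

scores = retrievability(rankings, doc_ids, cutoff=2)
inequality = gini(scores.values())
```

With a cutoff of 2, patent `p4` is never returned in the toy data, so its score is zero. A retrieval model that spreads access more evenly across documents would give a lower inequality value on the same query set.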
Original language: English
Title of host publication: Transactions on Large-Scale Data- and Knowledge-Centered Systems II
Editors: Abdelkader Hameurlain, Josef Küng, Roland Wagner
Place of Publication: Berlin, Heidelberg
Number of pages: 19
ISBN (Electronic): 9783642161759
ISBN (Print): 9783642161742
Publication status: Published - 16 Sept 2010
Event: 11th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2009 - Linz, Austria
Duration: 31 Aug 2009 to 2 Sept 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6380 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 11th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2009


  • patent searching
  • information retrieval
  • Boolean searches

