Retrievability bias estimation using synthetically generated queries

Amin Abolghasemi, Suzan Verberne, Arian Askari, Leif Azzopardi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution book

4 Citations (Scopus)

Abstract

Ranking with pre-trained language models (PLMs) has been shown to be highly effective for various Information Retrieval tasks. However, there is no prior work on evaluating PLM-based rankers in terms of their retrievability bias. In this work, we compare the retrievability bias of two of the most common PLM-based rankers, a Bi-Encoder BERT ranker and a Cross-Encoder BERT re-ranker, against BM25, which was found to be one of the least biased models in prior work. Furthermore, we conduct a series of experiments in which we explore the plausibility of using synthetic queries generated with a generative model, docT5query, in the evaluation of retrievability bias. Our experiments show promising results on the use of synthetically generated queries for the purpose of retrievability bias estimation. Moreover, we find that the estimated bias values resulting from synthetically generated queries are lower than the ones estimated with user-generated queries on the MS MARCO evaluation benchmark. This indicates that synthetically generated queries might cause less bias than user-generated queries and, therefore, that by using such queries in training PLM-based rankers, we might be able to reduce the retrievability bias in these models.
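The abstract does not say how retrievability bias is quantified. As a rough illustration only, the sketch below assumes the cumulative retrievability measure (a document scores one point each time it appears within a rank cutoff) and summarizes inequality across the collection with the Gini coefficient, a common choice in the retrievability literature. The function names, the `rankings` structure, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count how often each document appears in the top-`cutoff` results.

    `rankings` maps a query id to a ranked list of document ids
    (hypothetical input format, assumed for this sketch).
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def retrievability_gini(rankings, collection, cutoff):
    """Bias estimate over the whole collection, including never-retrieved docs."""
    scores = retrievability(rankings, cutoff)
    return gini([scores[doc_id] for doc_id in collection])


# Toy data: three queries over a four-document collection (made up).
toy_rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d1", "d2", "d4"],
}
toy_collection = ["d1", "d2", "d3", "d4"]
toy_bias = retrievability_gini(toy_rankings, toy_collection, cutoff=2)
```

Under these assumptions, the same function could be run once with rankings produced for user-generated queries and once with rankings for synthetically generated queries, and the two Gini values compared; a lower value would indicate retrieval that is spread more evenly across documents.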

Original language: English
Title of host publication: CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
Place of Publication: New York
Pages: 3712-3716
Number of pages: 5
ISBN (Electronic): 9798400701245
Publication status: Published - 21 Oct 2023
Event: 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023 - Birmingham, United Kingdom
Duration: 21 Oct 2023 to 25 Oct 2023

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023
Country/Territory: United Kingdom
City: Birmingham
Period: 21/10/23 to 25/10/23

Keywords

  • bias
  • evaluation
  • query generation
  • retrievability
