The impact of fielding on retrieval performance and bias

Colin Wilkie, Leif Azzopardi

Research output: Contribution to conference › Paper

1 Citation (Scopus)
12 Downloads (Pure)

Abstract

Within many domains, such as news, medicine and patents, documents contain a variety of fields such as title, author, body and source. As such, fielded retrieval models that query across fields are often employed. It is largely presumed that fielding provides a better representation of the document and offers more control when querying, and that this will lead to improved retrieval performance. However, depending on how the fields are weighted and whether the fields are populated, the retrieval algorithm may unduly favour certain documents over others. This is known as algorithmic bias, and it may be detrimental to a retrieval system's performance. In this paper, we explore the impact of fielding on retrieval bias and performance across a variety of TREC News Test Collections. We perform an extensive large-scale analysis of two types of fielded retrieval model variations based on the popular BM25 retrieval algorithm, where either fields are scored independently and then combined (Model 1), or fields are first combined and then scored (Model 2). Our findings show that for Model 1 fielding, a strong correlation exists between retrieval bias and performance: as title fields are weighted more heavily, bias increases while retrieval performance decreases. When weighting is applied to content-based fields, performance increases as bias decreases, showing that relying more on content may be favourable in terms of both fairness and performance. For Model 2 fielding, on the other hand, the relationship between retrieval bias and performance is more complex. Crucially, however, we show that Model 2 fielding results in lower retrieval bias and greater performance than Model 1 fielding. We also observed that, under Model 1, news articles without titles are substantially less retrievable (i.e. more susceptible to algorithmic bias).
These findings have serious ramifications, as many popular Open Source Information Retrieval frameworks, commonly used by professional searchers, use Model 1 as the default implementation of their fielded search capability. This research shows the importance of analysing retrieval algorithms with respect to both bias and performance, to ensure that they minimise any unwanted or unintended biases when maximising performance. Further work is required to examine this phenomenon in more detail and to design fielded retrieval models that have the advantages of control and performance without detrimental biases.
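
The two fielding strategies described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy documents, the parameter values (k1 = 1.2, b = 0.75) and the document-level IDF are all assumptions made for illustration, and Model 2 is interpreted here as summing weighted term frequencies across fields before a single BM25 saturation step.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # common BM25 defaults, assumed here (not taken from the paper)

def saturate(tf, length, avg_length):
    """BM25 term-frequency saturation with length normalisation."""
    if tf == 0:
        return 0.0
    norm = 1 - B + B * (length / avg_length) if avg_length > 0 else 1.0
    return tf * (K1 + 1) / (tf + K1 * norm)

def idf(term, docs):
    """Document-level IDF: a document counts if any of its fields holds the term."""
    df = sum(1 for d in docs if any(term in toks for toks in d.values()))
    return math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))

def model1_score(query, doc, docs, weights):
    """Model 1 (sketch): score each field independently, then take a weighted sum."""
    score = 0.0
    for field, w in weights.items():
        toks = doc.get(field, [])
        avg_len = sum(len(d.get(field, [])) for d in docs) / len(docs)
        tf = Counter(toks)
        for term in query:
            score += w * idf(term, docs) * saturate(tf[term], len(toks), avg_len)
    return score

def model2_score(query, doc, docs, weights):
    """Model 2 (sketch): combine weighted term frequencies across fields, then score once."""
    def weighted_len(d):
        return sum(w * len(d.get(f, [])) for f, w in weights.items())
    avg_len = sum(weighted_len(d) for d in docs) / len(docs)
    score = 0.0
    for term in query:
        tf = sum(w * doc.get(f, []).count(term) for f, w in weights.items())
        score += idf(term, docs) * saturate(tf, weighted_len(doc), avg_len)
    return score

# Hypothetical toy collection; the second article has no title field.
docs = [
    {"title": ["election", "results"],
     "body": ["the", "election", "results", "were", "announced"]},
    {"body": ["election", "coverage", "continued", "through", "the", "night"]},
    {"title": ["weather", "report"], "body": ["rain", "expected", "tomorrow"]},
]
query = ["election"]

for weights in ({"title": 1.0, "body": 0.0}, {"title": 1.0, "body": 1.0}):
    for i, doc in enumerate(docs):
        print(weights, i,
              round(model1_score(query, doc, docs, weights), 3),
              round(model2_score(query, doc, docs, weights), 3))
```

In this toy collection, placing all of the weight on the title field gives the untitled article a score of zero for the query, a small illustration of how unpopulated fields can affect which documents are retrievable. Because the second function saturates the combined term frequency only once, each query term's contribution stays below idf × (k1 + 1) no matter how large the field weights are.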
Original language: English
Number of pages: 10
Publication status: Published - 10 Nov 2018
Event: ASIS&T Annual Meeting 2018: Building an Ethical and Sustainable Information Future with Emerging Technology - Vancouver, Canada
Duration: 10 Nov 2018 to 14 Nov 2018
https://www.asist.org/am18/

Conference

Conference: ASIS&T Annual Meeting 2018
Abbreviated title: ASIST 2018
Country: Canada
City: Vancouver
Period: 10/11/18 to 14/11/18

Keywords

  • algorithmic bias
  • information retrieval
  • search engine bias

Research Output

Algorithmic bias: do good systems make relevant documents more retrievable?

Wilkie, C. & Azzopardi, L., 6 Nov 2017, CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management. New York, p. 2375-2378, 4 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

Open Access
3 Citations (Scopus)
56 Downloads (Pure)

A retrievability analysis: exploring the relationship between retrieval bias and retrieval performance

Wilkie, C. & Azzopardi, L., 3 Nov 2014, CIKM '14 Proceedings of the 23rd ACM International Conference on Information and Knowledge Management. New York, NY, USA, p. 81-90, 10 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

8 Citations (Scopus)

Relating retrievability, performance and length

Wilkie, C. & Azzopardi, L., 28 Jul 2013, SIGIR '13 Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA, p. 937-940, 4 p.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

16 Citations (Scopus)

Cite this

Wilkie, C., & Azzopardi, L. (2018). The impact of fielding on retrieval performance and bias. Paper presented at ASIS&T Annual Meeting 2018, Vancouver, Canada.