The impact of fielding on retrieval performance and bias

Colin Wilkie, Leif Azzopardi

Research output: Contribution to conference › Paper

Abstract

Within many domains, such as news, medicine and patents, documents contain a variety of fields such as title, author, body and source. As such, fielded retrieval models that query across fields are often employed. It is largely presumed that fielding provides a better representation of the document and offers more control when querying, and that this will lead to improved retrieval performance. However, depending on how the fields are weighted and whether the fields are populated, the retrieval algorithm may unduly favour certain documents over others. This is known as algorithmic bias, and it may be detrimental to a retrieval system's performance. In this paper, we explore the impact of fielding on retrieval bias and performance across a variety of TREC News Test Collections. We perform an extensive large-scale analysis of two types of fielded retrieval model, both based on the popular BM25 retrieval algorithm: either fields are scored independently and then combined (Model 1), or fields are first combined and then scored (Model 2). Our findings show that for Model 1 fielding, a strong correlation exists between retrieval bias and performance: as title fields are weighted more heavily, bias increases while retrieval performance decreases. When weighting is applied to content-based fields, performance increases as bias decreases, showing that relying more on content may be favourable in terms of both fairness and performance. For Model 2 fielding, on the other hand, the relationship between retrieval bias and performance is more complex. Crucially, however, we show that Model 2 fielding results in lower retrieval bias and greater performance than Model 1 fielding. We also observed that under Model 1, news articles without titles are substantially less retrievable (i.e. more susceptible to algorithmic bias).
These findings have serious ramifications, as many popular Open Source Information Retrieval frameworks, commonly used by professional searchers, use Model 1 as the default implementation of their fielded search capability. This research shows the importance of analysing retrieval algorithms with respect to both bias and performance to ensure they minimise any unwanted or unintended biases when maximising performance. Further work is required to examine this phenomenon in more detail and to design fielded retrieval models that have the advantages of control and performance without detrimental biases.
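
The difference between the two fielding strategies described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy documents, the field weights, the BM25 parameter values, and the exact way Model 2 combines fields (here, weighted term frequencies and weighted lengths are summed across fields before a single BM25 score is computed).

```python
import math

# Hypothetical toy collection; d2 has an empty title field.
DOCS = {
    "d1": {"title": "flood warning", "body": "heavy rain caused a flood in the city"},
    "d2": {"title": "", "body": "flood water and heavy rain closed roads in the city"},
    "d3": {"title": "election result", "body": "voters went to the polls in the city"},
}
FIELDS = ("title", "body")
K1, B = 1.2, 0.75  # commonly used BM25 defaults, assumed here


def tokens(text):
    return text.lower().split()


def idf(term, texts):
    """BM25-style idf of a term over a list of texts (one text per document)."""
    n = sum(1 for t in texts if term in tokens(t))
    return math.log(1 + (len(texts) - n + 0.5) / (n + 0.5))


def bm25_term(tf, length, avg_length, idf_value):
    """BM25 contribution of one term, given its (possibly weighted) frequency."""
    if tf == 0 or avg_length == 0:
        return 0.0
    norm = 1 - B + B * length / avg_length
    return idf_value * tf * (K1 + 1) / (tf + K1 * norm)


def model1(query, doc_id, weights):
    """Model 1: score each field independently, then sum the weighted scores."""
    score = 0.0
    for f in FIELDS:
        texts = [d[f] for d in DOCS.values()]
        avg_len = sum(len(tokens(t)) for t in texts) / len(texts)
        toks = tokens(DOCS[doc_id][f])
        for term in tokens(query):
            field_score = bm25_term(toks.count(term), len(toks), avg_len, idf(term, texts))
            score += weights[f] * field_score
    return score


def model2(query, doc_id, weights):
    """Model 2: combine weighted term frequencies across fields, then score once."""
    def weighted_len(d):
        return sum(weights[f] * len(tokens(d[f])) for f in FIELDS)

    texts = [" ".join(d[f] for f in FIELDS) for d in DOCS.values()]
    avg_len = sum(weighted_len(d) for d in DOCS.values()) / len(DOCS)
    doc = DOCS[doc_id]
    score = 0.0
    for term in tokens(query):
        tf = sum(weights[f] * tokens(doc[f]).count(term) for f in FIELDS)
        score += bm25_term(tf, weighted_len(doc), avg_len, idf(term, texts))
    return score


if __name__ == "__main__":
    for title_weight in (1.0, 5.0):
        w = {"title": title_weight, "body": 1.0}
        for doc_id in DOCS:
            print(title_weight, doc_id,
                  round(model1("flood", doc_id, w), 3),
                  round(model2("flood", doc_id, w), 3))
```

In this sketch, the title weight in Model 1 multiplies a separate title score, so the score gap between a titled document (d1) and an untitled one (d2) grows linearly with that weight. In Model 2 the weight only enters through the combined term frequency, which BM25 saturates. This is one plausible reading of the mechanism behind the abstract's observation about untitled articles, not a reproduction of the paper's experiments.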

Conference

Conference: ASIS&T Annual Meeting 2018
Abbreviated title: ASIST 2018
Country: Canada
City: Vancouver
Period: 10/11/18 to 14/11/18
Internet address: https://www.asist.org/am18/

Keywords

  • algorithmic bias
  • information retrieval
  • search engine bias

Cite this

Wilkie, C., & Azzopardi, L. (2018). The impact of fielding on retrieval performance and bias. Paper presented at ASIS&T Annual Meeting 2018, Vancouver, Canada.
Wilkie, Colin; Azzopardi, Leif. / The impact of fielding on retrieval performance and bias. Paper presented at ASIS&T Annual Meeting 2018, Vancouver, Canada. 10 p.
@conference{69858d9c5acc4d4a8482c9ad1f572011,
title = "The impact of fielding on retrieval performance and bias",
keywords = "algorithmic bias, information retrieval, search engine bias",
author = "Colin Wilkie and Leif Azzopardi",
year = "2018",
month = "11",
day = "10",
language = "English",
note = "ASIS&T Annual Meeting 2018 : Building an Ethical and Sustainable Information Future with Emerging Technology, ASIST 2018 ; Conference date: 10-11-2018 Through 14-11-2018",
url = "https://www.asist.org/am18/",

}

Wilkie, C & Azzopardi, L 2018, 'The impact of fielding on retrieval performance and bias', paper presented at ASIS&T Annual Meeting 2018, Vancouver, Canada, 10/11/18 - 14/11/18.

TY - CONF

T1 - The impact of fielding on retrieval performance and bias

AU - Wilkie, Colin

AU - Azzopardi, Leif

PY - 2018/11/10

Y1 - 2018/11/10

KW - algorithmic bias

KW - information retrieval

KW - search engine bias

UR - https://www.asist.org/am18/

M3 - Paper

ER -

Wilkie C, Azzopardi L. The impact of fielding on retrieval performance and bias. 2018. Paper presented at ASIS&T Annual Meeting 2018, Vancouver, Canada.