Validating simulated interaction for retrieval evaluation

Teemu Pääkkönen, Jaana Kekäläinen, Heikki Keskustalo, Leif Azzopardi, David Maxwell, Kalervo Järvelin

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

A searcher’s interaction with a retrieval system consists of actions such as query formulation, search result list interaction and document interaction. The simulation of searcher interaction has recently gained momentum in the analysis and evaluation of interactive information retrieval (IIR). However, a key issue that has not yet been adequately addressed is the validity of such IIR simulations and whether they reliably predict the performance obtained by a searcher across the session. The aim of this paper is to determine the validity of the common interaction model (CIM) typically used for simulating multi-query sessions. We focus on search result interactions, i.e., inspecting snippets, examining documents and deciding when to stop examining the results of a single query, or when to stop the whole session. To this end, we run a series of simulations grounded by real world behavioral data to show how accurate and responsive the model is to various experimental conditions under which the data were produced. We then validate on a second real world data set derived under similar experimental conditions. We seek to predict cumulated gain across the session. We find that the interaction model with a query-level stopping strategy based on consecutive non-relevant snippets leads to the highest prediction accuracy, and lowest deviation from ground truth, around 9 to 15% depending on the experimental conditions. To our knowledge, the present study is the first validation effort of the CIM that shows that the model’s acceptance and use is justified within IIR evaluations. We also identify and discuss ways to further improve the CIM and its behavioral parameters for more accurate simulations.
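The stopping strategy the abstract singles out can be sketched in a few lines. The following is a minimal illustration (not the authors' implementation) of a CIM-style multi-query session: the simulated searcher scans each result list top-down, abandons the list after a fixed number of consecutive non-relevant snippets, and cumulates gain from the relevant documents examined. The function name, the `max_nonrel` parameter, and the toy relevance grades are all hypothetical.

```python
def simulate_session(result_lists, max_nonrel=3):
    """Simulate a multi-query session with a query-level stopping rule.

    result_lists: one list per query, each holding per-rank gains
    (0 = non-relevant snippet, >0 = relevance grade of the document).
    Returns the gain cumulated across the whole session.
    """
    cumulated_gain = 0
    for gains in result_lists:          # one pass per query in the session
        consecutive_nonrel = 0
        for gain in gains:              # inspect snippets from the top down
            if gain > 0:
                cumulated_gain += gain  # examine the document, accrue its gain
                consecutive_nonrel = 0  # a relevant snippet resets the counter
            else:
                consecutive_nonrel += 1
                if consecutive_nonrel >= max_nonrel:
                    break               # query-level stop: abandon this list
    return cumulated_gain

# A two-query session: the first list is abandoned after three
# consecutive non-relevant snippets, so the final gain of 1 is never seen.
session = [[0, 2, 0, 0, 0, 1], [1, 0, 0, 0, 3]]
print(simulate_session(session))
```

Validating the model then amounts to comparing the cumulated gain such simulated sessions predict against the gain real searchers obtained under the same conditions.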
Language: English
Number of pages: 25
Journal: Information Retrieval Journal
DOI: 10.1007/s10791-017-9301-2
Publication status: Published - 6 May 2017


Keywords

  • session-based evaluation
  • IR interaction
  • simulations
  • retrieval systems
  • interactive information retrieval
  • common interaction model

Cite this

Pääkkönen, Teemu ; Kekäläinen, Jaana ; Keskustalo, Heikki ; Azzopardi, Leif ; Maxwell, David ; Järvelin, Kalervo. / Validating simulated interaction for retrieval evaluation. In: Information Retrieval Journal. 2017.
@article{c79e155f371a4f43a3c7e959df9c7c09,
title = "Validating simulated interaction for retrieval evaluation",
abstract = "A searcher’s interaction with a retrieval system consists of actions such as query formulation, search result list interaction and document interaction. The simulation of searcher interaction has recently gained momentum in the analysis and evaluation of interactive information retrieval (IIR). However, a key issue that has not yet been adequately addressed is the validity of such IIR simulations and whether they reliably predict the performance obtained by a searcher across the session. The aim of this paper is to determine the validity of the common interaction model (CIM) typically used for simulating multi-query sessions. We focus on search result interactions, i.e., inspecting snippets, examining documents and deciding when to stop examining the results of a single query, or when to stop the whole session. To this end, we run a series of simulations grounded by real world behavioral data to show how accurate and responsive the model is to various experimental conditions under which the data were produced. We then validate on a second real world data set derived under similar experimental conditions. We seek to predict cumulated gain across the session. We find that the interaction model with a query-level stopping strategy based on consecutive non-relevant snippets leads to the highest prediction accuracy, and lowest deviation from ground truth, around 9 to 15{\%} depending on the experimental conditions. To our knowledge, the present study is the first validation effort of the CIM that shows that the model’s acceptance and use is justified within IIR evaluations. We also identify and discuss ways to further improve the CIM and its behavioral parameters for more accurate simulations.",
keywords = "session-based evaluation, IR interaction, simulations, retrieval systems, interactive information retrieval, common interaction model",
author = "Teemu P{\"a}{\"a}kk{\"o}nen and Jaana Kek{\"a}l{\"a}inen and Heikki Keskustalo and Leif Azzopardi and David Maxwell and Kalervo J{\"a}rvelin",
year = "2017",
month = may,
day = "6",
doi = "10.1007/s10791-017-9301-2",
language = "English",
journal = "Information Retrieval Journal",
issn = "1386-4564",
publisher = "Springer Netherlands",

}

Validating simulated interaction for retrieval evaluation. / Pääkkönen, Teemu; Kekäläinen, Jaana; Keskustalo, Heikki; Azzopardi, Leif; Maxwell, David; Järvelin, Kalervo.

In: Information Retrieval Journal, 06.05.2017.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Validating simulated interaction for retrieval evaluation

AU - Pääkkönen, Teemu

AU - Kekäläinen, Jaana

AU - Keskustalo, Heikki

AU - Azzopardi, Leif

AU - Maxwell, David

AU - Järvelin, Kalervo

PY - 2017/5/6

Y1 - 2017/5/6

N2 - A searcher’s interaction with a retrieval system consists of actions such as query formulation, search result list interaction and document interaction. The simulation of searcher interaction has recently gained momentum in the analysis and evaluation of interactive information retrieval (IIR). However, a key issue that has not yet been adequately addressed is the validity of such IIR simulations and whether they reliably predict the performance obtained by a searcher across the session. The aim of this paper is to determine the validity of the common interaction model (CIM) typically used for simulating multi-query sessions. We focus on search result interactions, i.e., inspecting snippets, examining documents and deciding when to stop examining the results of a single query, or when to stop the whole session. To this end, we run a series of simulations grounded by real world behavioral data to show how accurate and responsive the model is to various experimental conditions under which the data were produced. We then validate on a second real world data set derived under similar experimental conditions. We seek to predict cumulated gain across the session. We find that the interaction model with a query-level stopping strategy based on consecutive non-relevant snippets leads to the highest prediction accuracy, and lowest deviation from ground truth, around 9 to 15% depending on the experimental conditions. To our knowledge, the present study is the first validation effort of the CIM that shows that the model’s acceptance and use is justified within IIR evaluations. We also identify and discuss ways to further improve the CIM and its behavioral parameters for more accurate simulations.

AB - A searcher’s interaction with a retrieval system consists of actions such as query formulation, search result list interaction and document interaction. The simulation of searcher interaction has recently gained momentum in the analysis and evaluation of interactive information retrieval (IIR). However, a key issue that has not yet been adequately addressed is the validity of such IIR simulations and whether they reliably predict the performance obtained by a searcher across the session. The aim of this paper is to determine the validity of the common interaction model (CIM) typically used for simulating multi-query sessions. We focus on search result interactions, i.e., inspecting snippets, examining documents and deciding when to stop examining the results of a single query, or when to stop the whole session. To this end, we run a series of simulations grounded by real world behavioral data to show how accurate and responsive the model is to various experimental conditions under which the data were produced. We then validate on a second real world data set derived under similar experimental conditions. We seek to predict cumulated gain across the session. We find that the interaction model with a query-level stopping strategy based on consecutive non-relevant snippets leads to the highest prediction accuracy, and lowest deviation from ground truth, around 9 to 15% depending on the experimental conditions. To our knowledge, the present study is the first validation effort of the CIM that shows that the model’s acceptance and use is justified within IIR evaluations. We also identify and discuss ways to further improve the CIM and its behavioral parameters for more accurate simulations.

KW - session-based evaluation

KW - IR interaction

KW - simulations

KW - retrieval systems

KW - interactive information retrieval

KW - common interaction model

U2 - 10.1007/s10791-017-9301-2

DO - 10.1007/s10791-017-9301-2

M3 - Article

JO - Information Retrieval Journal

T2 - Information Retrieval Journal

JF - Information Retrieval Journal

SN - 1386-4564

ER -