Combination of Similarity Measures for Effective Spoken Document Retrieval

F. Crestani

Research output: Contribution to journal, Article

10 Citations (Scopus)

Abstract

Often users of information retrieval systems and document authors use different terms to refer to the same concept. For this simple reason, information retrieval is affected by the 'term mismatch' problem. The term mismatch problem does not only have the effect of hindering the retrieval of relevant documents, it also produces bad rankings of relevant documents. A similar problem can be found in spoken document retrieval, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant spoken documents. We will call this problem 'term misrecognition', by analogy to the term mismatch problem. This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models use either complete or partial knowledge of semantic and phonetic term similarity, evaluated using statistical methods from the corpus.
Language: English
Pages: 87-96
Number of pages: 9
Journal: Journal of Information Science
Volume: 29
Issue number: 2
DOI: 10.1177/016555150302900201
Publication status: Published - 2003

Fingerprint

mismatch
Information retrieval systems
Speech analysis
Information retrieval
Speech recognition
Statistical methods
Semantics
phonetics
ranking

Keywords

  • similarity measures
  • information retrieval
  • spoken document retrieval

Cite this

@article{9b90a672225841c58d925ab39b8f2d01,
title = "Combination of Similarity Measures for Effective Spoken Document Retrieval",
abstract = "Often users of information retrieval systems and document authors use different terms to refer to the same concept. For this simple reason, information retrieval is affected by the 'term mismatch' problem. The term mismatch problem does not only have the effect of hindering the retrieval of relevant documents, it also produces bad rankings of relevant documents. A similar problem can be found in spoken document retrieval, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant spoken documents. We will call this problem 'term misrecognition', by analogy to the term mismatch problem. This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models use either complete or partial knowledge of semantic and phonetic term similarity, evaluated using statistical methods from the corpus.",
keywords = "similarity measures, information retrieval, spoken document retrieval",
author = "F. Crestani",
year = "2003",
doi = "10.1177/016555150302900201",
language = "English",
volume = "29",
pages = "87--96",
journal = "Journal of Information Science",
issn = "0165-5515",
number = "2",
}

Combination of Similarity Measures for Effective Spoken Document Retrieval. / Crestani, F.

In: Journal of Information Science, Vol. 29, No. 2, 2003, p. 87-96.

TY - JOUR

T1 - Combination of Similarity Measures for Effective Spoken Document Retrieval

AU - Crestani, F.

PY - 2003

Y1 - 2003

N2 - Often users of information retrieval systems and document authors use different terms to refer to the same concept. For this simple reason, information retrieval is affected by the 'term mismatch' problem. The term mismatch problem does not only have the effect of hindering the retrieval of relevant documents, it also produces bad rankings of relevant documents. A similar problem can be found in spoken document retrieval, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant spoken documents. We will call this problem 'term misrecognition', by analogy to the term mismatch problem. This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models use either complete or partial knowledge of semantic and phonetic term similarity, evaluated using statistical methods from the corpus.

AB - Often users of information retrieval systems and document authors use different terms to refer to the same concept. For this simple reason, information retrieval is affected by the 'term mismatch' problem. The term mismatch problem does not only have the effect of hindering the retrieval of relevant documents, it also produces bad rankings of relevant documents. A similar problem can be found in spoken document retrieval, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant spoken documents. We will call this problem 'term misrecognition', by analogy to the term mismatch problem. This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models use either complete or partial knowledge of semantic and phonetic term similarity, evaluated using statistical methods from the corpus.

KW - similarity measures

KW - information retrieval

KW - spoken document retrieval

UR - http://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_187.pdf

UR - http://dx.doi.org/10.1177/016555150302900201

U2 - 10.1177/016555150302900201

DO - 10.1177/016555150302900201

M3 - Article

VL - 29

SP - 87

EP - 96

JO - Journal of Information Science

T2 - Journal of Information Science

JF - Journal of Information Science

SN - 0165-5515

IS - 2

ER -