TY - JOUR
T1 - Combination of Similarity Measures for Effective Spoken Document Retrieval
AU - Crestani, F.
PY - 2003
Y1 - 2003
N2 - Often users of information retrieval systems and document authors use different terms to refer to the same concept. For this simple reason, information retrieval is affected by the 'term mismatch' problem. The term mismatch problem does not only have the effect of hindering the retrieval of relevant documents, it also produces bad rankings of relevant documents. A similar problem can be found in spoken document retrieval, where terms misrecognized by the speech recognition process can hinder the retrieval of potentially relevant spoken documents. We will call this problem 'term misrecognition', by analogy to the term mismatch problem. This paper presents two classes of retrieval models that attempt to tackle both the term mismatch and the term misrecognition problems at retrieval time using term similarity information. The models use either complete or partial knowledge of semantic and phonetic term similarity, evaluated using statistical methods from the corpus.
KW - similarity measures
KW - information retrieval
KW - spoken document retrieval
UR - http://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_187.pdf
UR - http://dx.doi.org/10.1177/016555150302900201
U2 - 10.1177/016555150302900201
DO - 10.1177/016555150302900201
M3 - Article
SN - 0165-5515
VL - 29
SP - 87
EP - 96
JO - Journal of Information Science
JF - Journal of Information Science
IS - 2
ER -