Building bilingual dictionaries from parallel web documents

C.J.A. McEwan, I. Ounis, I. Ruthven, F. Crestani (Editor), M. Lalmas (Editor)

Research output: Chapter in Book/Report/Conference proceeding › Chapter

12 Citations (Scopus)

Abstract

In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the expected mutual information measure (EMIM) technique, and a dictionary of translation terms is created. We evaluate our dictionary using human experts. The evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
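
The final step described in the abstract, scoring term pairs across parallel sentences and keeping the best translation candidate, might be sketched as follows. This is a minimal illustration only: it uses the standard form of EMIM over binary term occurrence, not the authors' adaptation, which this record does not specify. The function names, the whitespace tokenisation, the positive-association filter, and the toy English-French sentence pairs are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import product


def emim(n11, n_s, n_t, n):
    """Expected mutual information between two binary occurrence events.

    n11: aligned pairs containing both terms; n_s / n_t: pairs containing
    the source / target term; n: total number of aligned pairs.
    """
    cells = {
        (1, 1): n11,
        (1, 0): n_s - n11,
        (0, 1): n_t - n11,
        (0, 0): n - n_s - n_t + n11,
    }
    marg_s = {1: n_s, 0: n - n_s}
    marg_t = {1: n_t, 0: n - n_t}
    score = 0.0
    for (i, j), count in cells.items():
        if count > 0:  # a cell with zero probability contributes nothing
            score += (count / n) * math.log(count * n / (marg_s[i] * marg_t[j]))
    return score


def build_dictionary(pairs):
    """Map each source term to the target term with the highest EMIM score."""
    n = len(pairs)
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Assumption: lower-cased whitespace tokens stand in for real tokenisation.
        s_terms, t_terms = set(src.lower().split()), set(tgt.lower().split())
        src_count.update(s_terms)
        tgt_count.update(t_terms)
        joint.update(product(s_terms, t_terms))

    best = {}
    for (s, t), n11 in joint.items():
        # Assumption: skip pairs that co-occur no more often than chance,
        # because EMIM also rewards negative association.
        if n11 * n <= src_count[s] * tgt_count[t]:
            continue
        score = emim(n11, src_count[s], tgt_count[t], n)
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


# Hypothetical toy corpus of aligned sentence pairs.
pairs = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
    ("a car", "une voiture"),
]
dictionary = build_dictionary(pairs)
```

On this toy corpus each source term co-occurs above chance with exactly one target term, so the mapping is unambiguous. A real corpus would also need tie-breaking, frequency thresholds, and stop-word handling, none of which are shown here.
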
Language: English
Title of host publication: Advances in Information Retrieval
Place of Publication: Germany
Publisher: Springer
Pages: 303-323
Number of pages: 20
Volume: 2291
ISBN (Print): 978-3-540-43343-9
DOIs: https://doi.org/10.1007/3-540-45886-7_20
Publication status: Published - 27 Mar 2002

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer

Fingerprint

Glossaries
Query languages

Keywords

  • bilingual dictionaries
  • web documents
  • information retrieval
  • searching

Cite this

McEwan, C. J. A., Ounis, I., Ruthven, I., Crestani, F. (Ed.), & Lalmas, M. (Ed.) (2002). Building bilingual dictionaries from parallel web documents. In Advances in Information Retrieval (Vol. 2291, pp. 303-323). (Lecture Notes in Computer Science). Germany: Springer. https://doi.org/10.1007/3-540-45886-7_20
McEwan, C.J.A. ; Ounis, I. ; Ruthven, I. ; Crestani, F. (Editor) ; Lalmas, M. (Editor). / Building bilingual dictionaries from parallel web documents. Advances in Information Retrieval. Vol. 2291 Germany : Springer, 2002. pp. 303-323 (Lecture Notes in Computer Science).
@inbook{019de35e31de41fb8654c492170cfdd9,
title = "Building bilingual dictionaries from parallel web documents",
abstract = "In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the expected mutual information measure (EMIM) technique, and a dictionary of translation terms is created. We evaluate our dictionary using human experts. The evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.",
keywords = "bilingual dictionaries, web documents, information retrieval, searching",
author = "C.J.A. McEwan and I. Ounis and I. Ruthven",
editor = "F. Crestani and M. Lalmas",
year = "2002",
month = "3",
day = "27",
doi = "10.1007/3-540-45886-7_20",
language = "English",
isbn = "978-3-540-43343-9",
volume = "2291",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
address = "Germany",
pages = "303--323",
booktitle = "Advances in Information Retrieval"
}

McEwan, CJA, Ounis, I, Ruthven, I, Crestani, F (ed.) & Lalmas, M (ed.) 2002, Building bilingual dictionaries from parallel web documents. in Advances in Information Retrieval. vol. 2291, Lecture Notes in Computer Science, Springer, Germany, pp. 303-323. https://doi.org/10.1007/3-540-45886-7_20

TY  - CHAP
T1  - Building bilingual dictionaries from parallel web documents
AU  - McEwan, C.J.A.
AU  - Ounis, I.
AU  - Ruthven, I.
A2  - Crestani, F.
A2  - Lalmas, M.
PY  - 2002/3/27
Y1  - 2002/3/27
AB  - In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the expected mutual information measure (EMIM) technique, and a dictionary of translation terms is created. We evaluate our dictionary using human experts. The evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
KW  - bilingual dictionaries
KW  - web documents
KW  - information retrieval
KW  - searching
UR  - http://www.cis.strath.ac.uk/research/publications/papers/strath_cis_publication_143.pdf
U2  - 10.1007/3-540-45886-7_20
DO  - 10.1007/3-540-45886-7_20
M3  - Chapter
SN  - 978-3-540-43343-9
VL  - 2291
T3  - Lecture Notes in Computer Science
SP  - 303
EP  - 323
BT  - Advances in Information Retrieval
PB  - Springer
CY  - Germany
ER  -

McEwan CJA, Ounis I, Ruthven I, Crestani F, (ed.), Lalmas M, (ed.). Building bilingual dictionaries from parallel web documents. In Advances in Information Retrieval. Vol. 2291. Germany: Springer. 2002. p. 303-323. (Lecture Notes in Computer Science). https://doi.org/10.1007/3-540-45886-7_20