Building bilingual dictionaries from parallel web documents

C.J.A. McEwan, I. Ounis, I. Ruthven, F. Crestani (Editor), M. Lalmas (Editor)

Research output: Chapter in Book/Report/Conference proceeding › Chapter

16 Citations (Scopus)
24 Downloads (Pure)


In this paper we describe a system for automatically constructing a bilingual dictionary for cross-language information retrieval applications. We describe how we automatically target candidate parallel documents, filter the candidate documents, and process them to create parallel sentences. The parallel sentences are then automatically translated using an adaptation of the EMIM technique, and a dictionary of translation terms is created. We evaluate our dictionary using human experts. The evaluation showed that the system performs well. In addition, the results obtained from automatically created corpora are comparable to those obtained from manually created corpora of parallel documents. Compared to other available techniques, our approach has the advantage of being simple, uniform, and easy to implement while providing encouraging results.
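As an illustration of the term-translation step, the following is a minimal sketch of scoring candidate translation pairs with the textbook expected mutual information measure (EMIM) over aligned sentence pairs. It is not the adaptation used in the paper, whose details the abstract does not give. The function names `emim_scores` and `best_translations`, the whitespace-tokenised input format, and the toy English-French corpus are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product


def emim_scores(pairs):
    """Score every co-occurring (source term, target term) pair.

    `pairs` is a list of (source_tokens, target_tokens) tuples, one per
    aligned sentence pair. Term presence is treated as a binary event per
    sentence pair, and the standard EMIM is summed over the 2x2
    presence/absence table. This is an assumed textbook form, not the
    paper's adapted variant.
    """
    n = len(pairs)
    src_df, tgt_df, joint_df = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_set, t_set = set(src), set(tgt)
        src_df.update(s_set)
        tgt_df.update(t_set)
        joint_df.update(product(s_set, t_set))

    scores = {}
    for (s, t), n11 in joint_df.items():
        n10 = src_df[s] - n11          # s present, t absent
        n01 = tgt_df[t] - n11          # s absent, t present
        n00 = n - n11 - n10 - n01      # both absent
        cells = (
            (n11, src_df[s], tgt_df[t]),
            (n10, src_df[s], n - tgt_df[t]),
            (n01, n - src_df[s], tgt_df[t]),
            (n00, n - src_df[s], n - tgt_df[t]),
        )
        total = 0.0
        for n_xy, n_x, n_y in cells:
            if n_xy > 0:  # convention: 0 * log(0) contributes nothing
                total += (n_xy / n) * math.log(n_xy * n / (n_x * n_y))
        scores[(s, t)] = total
    return scores


def best_translations(pairs):
    """Keep, for each source term, the target term with the highest score."""
    best = {}
    for (s, t), score in emim_scores(pairs).items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return {s: t for s, (t, _) in best.items()}


# Hypothetical toy corpus of aligned sentences (already tokenised).
toy_pairs = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "dog"], ["le", "chien"]),
    (["a", "cat"], ["un", "chat"]),
    (["a", "dog"], ["un", "chien"]),
]
print(best_translations(toy_pairs))
```

Because mutual information is symmetric with respect to positive and negative association, a practical system would probably also filter out pairs that co-occur less often than chance would predict. The sketch only scores pairs that co-occur at least once and does no such filtering.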
Original language: English
Title of host publication: Advances in Information Retrieval
Place of Publication: Germany
Number of pages: 20
ISBN (Print): 978-3-540-43343-9
Publication status: Published - 27 Mar 2002

Publication series

Name: Lecture Notes in Computer Science


  • bilingual dictionaries
  • web documents
  • information retrieval
  • searching

