Resolving person names in web people search

Krisztian Balog, Leif Azzopardi, Maarten De Rijke

Research output: Chapter in Book/Report/Conference proceeding › Chapter

10 Citations (Scopus)

Abstract

Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people sharing the same name, the problem presents many difficulties and challenges. This chapter treats person name disambiguation as a document clustering problem, where it is assumed that the documents represent particular people. This leads to the person cluster hypothesis, which states that similar documents tend to represent the same person. Single Pass Clustering, k-Means Clustering, Agglomerative Clustering and Probabilistic Latent Semantic Analysis are employed and empirically evaluated in this context. On the SemEval 2007 Web People Search task, it is shown that the person cluster hypothesis holds reasonably well and that the Single Pass Clustering and Agglomerative Clustering methods provide the best performance.
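
As an illustration of the single pass clustering idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes raw term counts as the document representation, cosine similarity against a running cluster centroid, and a hypothetical `threshold` parameter; the function names and the default threshold value are also assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(docs, threshold=0.3):
    """Process documents once, in order. Each document joins the most
    similar existing cluster if that similarity reaches the threshold;
    otherwise it starts a new cluster. Returns one cluster id per document."""
    centroids = []  # summed term counts per cluster (assumed representation)
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))  # open a new cluster
            labels.append(len(centroids) - 1)
    return labels
```

Under the person cluster hypothesis, each returned cluster id would stand for one distinct person sharing the queried name. Because every page for a given query contains the name itself, the threshold has to sit above the similarity contributed by the name terms alone; how the chapter actually sets it is not stated in this abstract.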

Original language: English
Title of host publication: Weaving Services and People on the World Wide Web
Place of Publication: Berlin
Pages: 301-323
Number of pages: 23
DOIs: https://doi.org/10.1007/978-3-642-00570-1_15
Publication status: Published - 1 Dec 2009

Keywords

  • similarity threshold
  • agglomerative cluster
  • latent topic
  • test collection
  • computational linguistics
  • person names
  • author disambiguation

Cite this

Balog, K., Azzopardi, L., & De Rijke, M. (2009). Resolving person names in web people search. In Weaving Services and People on the World Wide Web (pp. 301-323). https://doi.org/10.1007/978-3-642-00570-1_15