Improving social bookmark search using personalised latent variable language models

Morgan Harvey, Ian Ruthven, Mark J. Carman

Research output: Contribution to conference (Paper)

13 Citations (Scopus)

Abstract

Social tagging systems have recently become very popular as a method of categorising information online and have been used to annotate a wide range of different resources. In such systems users are free to choose whatever keywords or 'tags' they wish to annotate each resource, resulting in a highly personalised, unrestricted vocabulary. While this freedom of choice has several notable advantages, it does come at the cost of making searching of these systems more difficult as the vocabulary problem introduced is more pronounced than in a normal information retrieval setting. In this paper we propose to use hidden topic models as a principled way of reducing the dimensionality of this data to provide more accurate resource rankings with higher recall. We first describe Latent Dirichlet Allocation (LDA), a simple topic model and then introduce 2 extended models which can be used to personalise the results by including information about the user who made each annotation. We test these 3 models and compare them with 3 non-topic model baselines on a large data sample obtained from the Delicious social bookmarking site. Our evaluations show that our methods significantly outperform all of the baselines with the personalised models also improving significantly upon unpersonalised LDA.
Language: English
Pages: 485-494
Number of pages: 10
DOI: https://doi.org/10.1145/1935826.1935898
Publication status: Published - 12 Feb 2011
Event: 4th ACM International Conference on Web Search and Data Mining - Hong Kong, China
Duration: 9 Feb 2011 to 12 Feb 2011

Conference

Conference: 4th ACM International Conference on Web Search and Data Mining
Abbreviated title: WSDM 2011
Country: China
City: Hong Kong
Period: 9/02/11 to 12/02/11

Keywords

  • collaborative tagging
  • personalised search
  • social bookmarks
  • topic models
  • information search and retrieval
  • computational linguistics

Cite this

Harvey, M., Ruthven, I., & Carman, M. J. (2011). Improving social bookmark search using personalised latent variable language models. 485-494. Paper presented at 4th ACM International Conference on Web Search and Data Mining, Hong Kong, China. https://doi.org/10.1145/1935826.1935898
@conference{20fdb1bb52844a878bbb1b66ee97b950,
title = "Improving social bookmark search using personalised latent variable language models",
abstract = "Social tagging systems have recently become very popular as a method of categorising information online and have been used to annotate a wide range of different resources. In such systems users are free to choose whatever keywords or 'tags' they wish to annotate each resource, resulting in a highly personalised, unrestricted vocabulary. While this freedom of choice has several notable advantages, it does come at the cost of making searching of these systems more difficult as the vocabulary problem introduced is more pronounced than in a normal information retrieval setting. In this paper we propose to use hidden topic models as a principled way of reducing the dimensionality of this data to provide more accurate resource rankings with higher recall. We first describe Latent Dirichlet Allocation (LDA), a simple topic model and then introduce 2 extended models which can be used to personalise the results by including information about the user who made each annotation. We test these 3 models and compare them with 3 non-topic model baselines on a large data sample obtained from the Delicious social bookmarking site. Our evaluations show that our methods significantly outperform all of the baselines with the personalised models also improving significantly upon unpersonalised LDA.",
keywords = "collaborative tagging, personalised search, social bookmarks, topic models, information search and retrieval, computational linguistics",
author = "Morgan Harvey and Ian Ruthven and Carman, {Mark J.}",
year = "2011",
month = "2",
day = "12",
doi = "10.1145/1935826.1935898",
language = "English",
pages = "485--494",
note = "4th ACM International Conference on Web Search and Data Mining, WSDM 2011; Conference date: 09-02-2011 through 12-02-2011",
}
