A bounded distance metric for comparing tree structure

R. Connor, F. Simeoni, M. Iakovos, R. Moss

Research output: Contribution to journal › Article

17 Citations (Scopus)

Abstract

Comparing tree-structured data for structural similarity is a recurring theme and one on which much effort has been spent. Most approaches so far are grounded, implicitly or explicitly, in algorithmic information theory, being approximations to an information distance derived from Kolmogorov complexity. In this paper we propose a novel complexity metric, also grounded in information theory, but calculated via Shannon's entropy equations. This is used to formulate a directly and efficiently computable metric for the structural difference between unordered trees. The paper explains the derivation of the metric in terms of information theory, and proves the essential property that it is a distance metric. The property of boundedness means that the metric can be used in contexts such as clustering, where second-order comparisons are required. The distance metric property means that the metric can be used in the context of similarity search and metric spaces in general, allowing trees to be indexed and stored within this domain. We are not aware of any other tree similarity metric with these properties.
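The abstract does not give the formula, so the following is only an illustrative sketch of the general idea (a bounded, entropy-based distance between unordered trees) and not the authors' metric. It assumes, hypothetically, that a tree is written as a `(label, [children])` tuple and is summarised by the multiset of its root-to-node label paths, which ignores sibling order. It then uses the square root of the base-2 Jensen-Shannon divergence, a known metric on probability distributions that lies in [0, 1]. All function names here are made up for the example.

```python
import math
from collections import Counter


def path_counts(tree, prefix=()):
    """Count root-to-node label paths of a tree given as (label, [children]).

    Sibling order never enters the result, so the tree is treated as unordered.
    """
    label, children = tree
    path = prefix + (label,)
    counts = Counter({path: 1})
    for child in children:
        counts.update(path_counts(child, path))
    return counts


def normalise(counts):
    """Turn path counts into a probability distribution."""
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_distance(tree_a, tree_b):
    """Square root of the base-2 Jensen-Shannon divergence (assumed stand-in metric)."""
    p = normalise(path_counts(tree_a))
    q = normalise(path_counts(tree_b))
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    divergence = entropy(mixture) - 0.5 * (entropy(p) + entropy(q))
    return math.sqrt(max(divergence, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    left = ("a", [("b", []), ("c", [])])
    right = ("a", [("c", []), ("b", [])])   # same tree, siblings swapped
    other = ("x", [("y", [])])              # shares no paths with `left`
    print(js_distance(left, right))
    print(js_distance(left, other))
```

Under these assumptions, two trees that differ only in sibling order are at distance 0 and two trees with no shared paths are at distance 1. Because the distance is defined on path distributions, distinct trees that happen to share a distribution would also be at distance 0, so this sketch is a metric on the summaries rather than on the trees themselves.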
Language: English
Pages: 748-764
Number of pages: 17
Journal: Information Systems
Volume: 36
Issue number: 4
DOI: https://doi.org/10.1016/j.is.2010.12.003
Publication status: Published - Jun 2011

Keywords

  • unordered tree
  • tree comparison
  • distance metric
  • algorithmic information theory
  • information content
  • information distance
  • entropy

Cite this

Connor, R., Simeoni, F., Iakovos, M., & Moss, R. (2011). A bounded distance metric for comparing tree structure. Information Systems, 36(4), 748-764. https://doi.org/10.1016/j.is.2010.12.003
@article{16753fd0676246a0940f789541e26543,
title = "A bounded distance metric for comparing tree structure",
keywords = "unordered tree, tree comparison, distance metric, algorithmic information theory, information content, information distance, entropy",
author = "R. Connor and F. Simeoni and M. Iakovos and R. Moss",
year = "2011",
month = "6",
doi = "10.1016/j.is.2010.12.003",
language = "English",
volume = "36",
pages = "748--764",
journal = "Information Systems",
issn = "0306-4379",
number = "4",

}

