Reference point hyperplane trees

Richard Connor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

Abstract

We make the simple observation that, the deeper a data item is within the tree, the higher the probability of that item being excluded from a search. Assuming a fixed and independent probability p of any subtree being excluded at query time, the probability of an individual data item being accessed is (1-p)^d for a node at depth d. In a balanced binary tree half of the data will be at the maximum depth of the tree so this effect should be significant and observable. We test this hypothesis with two experiments on partition trees. First, we force a balance by adjusting the partition/exclusion criteria, and compare this with unbalanced trees where the mean data depth is greater. Second, we compare a generic hyperplane tree with a monotone hyperplane tree, where also the mean depth is greater. In both cases the tree with the greater mean data depth performs better in high-dimensional spaces. We then experiment with increasing the mean depth of nodes by using a small, fixed set of reference points to make exclusion decisions over the whole tree, so that almost all of the data resides at the maximum depth. Again this can be seen to reduce the overall cost of indexing. Furthermore, we observe that having already calculated reference point distances for all data, a final filtering can be applied if the distance table is retained. This reduces further the number of distance calculations required, whilst retaining scalability. The final structure can in fact be viewed as a hybrid between a generic hyperplane tree and a LAESA search structure.
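The access-probability model in the abstract can be sketched numerically. The snippet below is a hypothetical illustration (not the paper's code): under a fixed, independent per-subtree exclusion probability p, an item at depth d is reached with probability (1-p)^d, so a structure that pushes data deeper accesses a smaller fraction of it.

```python
# Hypothetical illustration of the access-probability model in the abstract:
# with a fixed, independent probability p of any subtree being excluded at
# query time, an item at depth d is reached with probability (1 - p) ** d.

def expected_fraction_accessed(depths, p):
    """Mean of (1 - p) ** d over all items' depths."""
    return sum((1 - p) ** d for d in depths) / len(depths)

# A balanced binary tree storing one item per node has 2**k items at depth k,
# so half of the data sits at the maximum depth.
balanced = [k for k in range(1, 11) for _ in range(2 ** k)]

# A structure that pushes (almost) all data to the maximum depth, as the
# reference-point tree described in the abstract does.
deep = [10] * len(balanced)

p = 0.2
print(expected_fraction_accessed(balanced, p))  # larger fraction accessed
print(expected_fraction_accessed(deep, p))      # (1 - 0.2) ** 10, about 0.107
```

With these (arbitrary) parameters, the all-at-maximum-depth layout accesses a smaller expected fraction of the data than the balanced layout, matching the paper's hypothesis that greater mean data depth reduces indexing cost.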
Language: English
Title of host publication: 9th International Conference on Similarity Search and Applications
Editors: Laurent Amsaleg, Michael E. Houle, Erich Shubert
Place of Publication: New York
Publisher: Springer-Verlag
Pages: 65-78
Number of pages: 14
Volume: 9939
ISBN (Print): 978-3-319-46758-0
DOIs: 10.1007/978-3-319-46759-7
Publication status: Published - 24 Oct 2016
Event: SISAP 2016 - 9th International Conference on Similarity Search and Applications - Tokyo, Japan
Duration: 24 Oct 2016 - 26 Oct 2016

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer-Verlag
ISSN (Print): 0302-9743

Conference

Conference: SISAP 2016 - 9th International Conference on Similarity Search and Applications
Country: Japan
City: Tokyo
Period: 24/10/16 - 26/10/16

Keywords

  • metric search
  • partition tree
  • reference point
  • monotonic hyperplane tree
  • LAESA

Cite this

Connor, R. (2016). Reference point hyperplane trees. In L. Amsaleg, M. E. Houle, & E. Shubert (Eds.), 9th International Conference on Similarity Search and Applications (Vol. 9939, pp. 65-78). (Lecture Notes in Computer Science). New York: Springer-Verlag. https://doi.org/10.1007/978-3-319-46759-7
Connor, Richard. / Reference point hyperplane trees. 9th International Conference on Similarity Search and Applications. editor / Laurent Amsaleg ; Michael E. Houle ; Erich Shubert. Vol. 9939 New York : Springer-Verlag, 2016. pp. 65-78 (Lecture Notes in Computer Science).
@inproceedings{33ee79857dcd4affaf1f1a149d425d67,
title = "Reference point hyperplane trees",
abstract = "We make the simple observation that, the deeper a data item is within the tree, the higher the probability of that item being excluded from a search. Assuming a fixed and independent probability p of any subtree being excluded at query time, the probability of an individual data item being accessed is (1-p)^d for a node at depth d. In a balanced binary tree half of the data will be at the maximum depth of the tree so this effect should be significant and observable. We test this hypothesis with two experiments on partition trees. First, we force a balance by adjusting the partition/exclusion criteria, and compare this with unbalanced trees where the mean data depth is greater. Second, we compare a generic hyperplane tree with a monotone hyperplane tree, where also the mean depth is greater. In both cases the tree with the greater mean data depth performs better in high-dimensional spaces. We then experiment with increasing the mean depth of nodes by using a small, fixed set of reference points to make exclusion decisions over the whole tree, so that almost all of the data resides at the maximum depth. Again this can be seen to reduce the overall cost of indexing. Furthermore, we observe that having already calculated reference point distances for all data, a final filtering can be applied if the distance table is retained. This reduces further the number of distance calculations required, whilst retaining scalability. The final structure can in fact be viewed as a hybrid between a generic hyperplane tree and a LAESA search structure.",
keywords = "metric search, partition tree, reference point, monotonic hyperplane tree, LAESA",
author = "Richard Connor",
note = "The final publication is available at Springer via https://doi.org/10.1007/978-3-319-46759-7",
year = "2016",
month = "10",
day = "24",
doi = "10.1007/978-3-319-46759-7",
language = "English",
isbn = "978-3-319-46758-0",
volume = "9939",
series = "Lecture Notes in Computer Science",
publisher = "Springer-Verlag",
pages = "65--78",
editor = "Laurent Amsaleg and Houle, {Michael E.} and Erich Shubert",
booktitle = "9th International Conference on Similarity Search and Applications",

}

Connor, R 2016, Reference point hyperplane trees. in L Amsaleg, ME Houle & E Shubert (eds), 9th International Conference on Similarity Search and Applications. vol. 9939, Lecture Notes in Computer Science, Springer-Verlag, New York, pp. 65-78, SISAP 2016 - 9th International Conference on Similarity Search and Applications, Tokyo, Japan, 24/10/16. https://doi.org/10.1007/978-3-319-46759-7

Reference point hyperplane trees. / Connor, Richard.

9th International Conference on Similarity Search and Applications. ed. / Laurent Amsaleg; Michael E. Houle; Erich Shubert. Vol. 9939 New York : Springer-Verlag, 2016. p. 65-78 (Lecture Notes in Computer Science).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution book

TY - GEN

T1 - Reference point hyperplane trees

AU - Connor, Richard

N1 - The final publication is available at Springer via https://doi.org/10.1007/978-3-319-46759-7

PY - 2016/10/24

Y1 - 2016/10/24

N2 - We make the simple observation that, the deeper a data item is within the tree, the higher the probability of that item being excluded from a search. Assuming a fixed and independent probability p of any subtree being excluded at query time, the probability of an individual data item being accessed is (1-p)^d for a node at depth d. In a balanced binary tree half of the data will be at the maximum depth of the tree so this effect should be significant and observable. We test this hypothesis with two experiments on partition trees. First, we force a balance by adjusting the partition/exclusion criteria, and compare this with unbalanced trees where the mean data depth is greater. Second, we compare a generic hyperplane tree with a monotone hyperplane tree, where also the mean depth is greater. In both cases the tree with the greater mean data depth performs better in high-dimensional spaces. We then experiment with increasing the mean depth of nodes by using a small, fixed set of reference points to make exclusion decisions over the whole tree, so that almost all of the data resides at the maximum depth. Again this can be seen to reduce the overall cost of indexing. Furthermore, we observe that having already calculated reference point distances for all data, a final filtering can be applied if the distance table is retained. This reduces further the number of distance calculations required, whilst retaining scalability. The final structure can in fact be viewed as a hybrid between a generic hyperplane tree and a LAESA search structure.

AB - We make the simple observation that, the deeper a data item is within the tree, the higher the probability of that item being excluded from a search. Assuming a fixed and independent probability p of any subtree being excluded at query time, the probability of an individual data item being accessed is (1-p)^d for a node at depth d. In a balanced binary tree half of the data will be at the maximum depth of the tree so this effect should be significant and observable. We test this hypothesis with two experiments on partition trees. First, we force a balance by adjusting the partition/exclusion criteria, and compare this with unbalanced trees where the mean data depth is greater. Second, we compare a generic hyperplane tree with a monotone hyperplane tree, where also the mean depth is greater. In both cases the tree with the greater mean data depth performs better in high-dimensional spaces. We then experiment with increasing the mean depth of nodes by using a small, fixed set of reference points to make exclusion decisions over the whole tree, so that almost all of the data resides at the maximum depth. Again this can be seen to reduce the overall cost of indexing. Furthermore, we observe that having already calculated reference point distances for all data, a final filtering can be applied if the distance table is retained. This reduces further the number of distance calculations required, whilst retaining scalability. The final structure can in fact be viewed as a hybrid between a generic hyperplane tree and a LAESA search structure.

KW - metric search

KW - partition tree

KW - reference point

KW - monotonic hyperplane tree

KW - LAESA

UR - http://www.sisap.org/2016/

UR - http://www.springer.com/gb/computer-science/lncs

UR - http://www.springer.com/

U2 - 10.1007/978-3-319-46759-7

DO - 10.1007/978-3-319-46759-7

M3 - Conference contribution book

SN - 978-3-319-46758-0

VL - 9939

T3 - Lecture Notes in Computer Science

SP - 65

EP - 78

BT - 9th International Conference on Similarity Search and Applications

A2 - Amsaleg, Laurent

A2 - Houle, Michael E.

A2 - Shubert, Erich

PB - Springer-Verlag

CY - New York

ER -

Connor R. Reference point hyperplane trees. In Amsaleg L, Houle ME, Shubert E, editors, 9th International Conference on Similarity Search and Applications. Vol. 9939. New York: Springer-Verlag. 2016. p. 65-78. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-319-46759-7