Abstract
It is well known that, as the dimensionality of a metric space increases, metric search techniques become less effective and the cost of indexing mechanisms becomes greater than the savings they provide. This is due to the so-called curse of dimensionality.
One effect of increasing dimensionality is that the ratio of unit hypersphere volume to unit hypercube volume decreases rapidly, making the solution to a similarity query (the query ball, or hypersphere) ever more difficult to identify using metric invariants such as the triangle inequality.
In this paper we take a different approach, by identifying points within a query polyhedron rather than a ball. We show how this can be achieved by constructing a surrogate metric space, such that a query ball in the surrogate space corresponds to a polyhedron in the original space. If the polyhedron contains the ball, the overall cost of the query is likely to be increased in high dimensions; however, we show that shrinking the polyhedron can capture a surprisingly high proportion of the points within the ball, whilst at the same time giving a more efficient, and more scalable, search.
We show results which confirm our underlying hypothesis. In some cases we can retrieve significant volumes of query results from spaces which are otherwise intractable.
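The volume claim in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes that "unit hypersphere" means a ball of radius 1 and that the hypercube is the bounding cube of side 2 that encloses it, and the function name `ball_to_cube_volume_ratio` is a hypothetical one chosen for illustration. It uses the standard closed form for the volume of an n-dimensional ball, π^(n/2) / Γ(n/2 + 1).

```python
import math


def ball_to_cube_volume_ratio(n: int) -> float:
    """Volume of a radius-1 ball divided by the volume of its bounding cube.

    Assumption (not stated in the source): the cube has side 2, so that the
    ball is inscribed in it.
    """
    # Standard formula for the volume of an n-dimensional ball of radius 1.
    ball_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    # Bounding cube of side 2.
    cube_volume = 2.0 ** n
    return ball_volume / cube_volume


if __name__ == "__main__":
    # Print the ratio for a few dimensions to show how quickly it shrinks.
    for n in (2, 3, 5, 10, 20):
        print(f"n = {n:2d}  ratio = {ball_to_cube_volume_ratio(n):.3e}")
```

In two dimensions the ratio is π/4 and in three it is π/6; beyond that it falls off very quickly, which is consistent with the abstract's point that a query ball occupies a vanishing fraction of the enclosing region as dimensionality grows.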
Original language | English |
---|---|
Title of host publication | Similarity Search and Applications |
Subtitle of host publication | 7th International Conference, SISAP 2014, Los Cabos, Mexico, October 29-31, 2014. Proceedings |
Publisher | Springer-Verlag |
Pages | 176-188 |
Number of pages | 13 |
Volume | 8821 |
ISBN (Print) | 9783319119878 |
Publication status | Published - 20 Oct 2014 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer International Publishing |
Volume | 8821 |
ISSN (Print) | 0302-9743 |
Keywords
- high dimensional search
- similarity search
- hypersphere
- hyperpolyhedron
- high performance search