Abstract
Jensen-Shannon divergence is a symmetrised, smoothed version of Kullback-Leibler divergence. It has been shown to be the square of a proper distance metric, and has other properties which make it an excellent choice for many high-dimensional spaces in R*.
The metric as defined is, however, expensive to evaluate. In sparse spaces over many dimensions the intrinsic dimensionality of the metric space is typically very high, making similarity-based indexing ineffectual. Exhaustive searching over large data collections may be infeasible.
Using a property that allows the distance to be evaluated from only those dimensions which are non-zero in both arguments, and through the identification of a threshold function, we show that the cost of the function can be dramatically reduced.
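The abstract does not give the formula behind this property, so the following is only a sketch under stated assumptions: vectors are probability distributions (non-negative, summing to 1), logarithms are base 2, and h(x) = -x log2(x) with h(0) = 0. Under those assumptions the divergence can be rewritten as 1 - (1/2) Σ [h(v_i) + h(w_i) - h(v_i + w_i)], and each summand is zero whenever v_i or w_i is zero, so only the shared non-zero dimensions need to be visited. The function names and the dictionary representation are illustrative choices, not the authors' implementation, and the threshold function mentioned above is not covered here.

```python
import math


def h(x):
    """Entropy-style term -x * log2(x), taken as 0 at x == 0."""
    return -x * math.log2(x) if x > 0.0 else 0.0


def jsd_dense(v, w):
    """Jensen-Shannon divergence from the textbook definition.

    v and w are equal-length sequences that each sum to 1.
    Every dimension is visited, including those that are zero.
    """
    total = 0.0
    for a, b in zip(v, w):
        m = (a + b) / 2.0
        total += h(m) - 0.5 * (h(a) + h(b))
    return total


def jsd_sparse(v, w):
    """Same divergence, touching only dimensions non-zero in both arguments.

    v and w are dicts mapping dimension index -> positive value,
    each summing to 1 (an assumed sparse representation).
    """
    if len(v) > len(w):
        v, w = w, v  # iterate over the smaller vector
    acc = 0.0
    for i, a in v.items():
        b = w.get(i, 0.0)
        if b > 0.0:
            acc += h(a) + h(b) - h(a + b)
    return 1.0 - 0.5 * acc


def js_distance_sparse(v, w):
    """Square root of the divergence, which is the proper metric."""
    return math.sqrt(max(0.0, jsd_sparse(v, w)))
```

In this sketch the cost of `jsd_sparse` grows with the number of non-zero entries in the smaller vector rather than with the full dimensionality, which is the source of the saving in sparse spaces.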
Original language | English |
---|---|
Title of host publication | Similarity Search and Applications |
Subtitle of host publication | 6th International Conference, SISAP 2013, A Coruña, Spain, October 2-4, 2013, Proceedings |
Editors | Nieves Brisaboa, Oscar Pedreira, Pavel Zezula |
Place of Publication | Berlin |
Publisher | Springer |
Pages | 163-168 |
Number of pages | 6 |
Volume | 8199 |
ISBN (Print) | 9783642410611 |
Publication status | Published - 13 Sept 2013 |
Event | 6th International Conference on Similarity Search and Applications, SISAP 2013 - Hotel Riazor, A Coruña, Spain Duration: 2 Oct 2013 → 4 Oct 2013 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 8199 |
ISSN (Print) | 0302-9743 |
Conference
Conference | 6th International Conference on Similarity Search and Applications, SISAP 2013 |
---|---|
Country/Territory | Spain |
City | A Coruña |
Period | 2/10/13 → 4/10/13 |
Keywords
- distance metrics
- exhaustive searching
- high dimensional spaces
- intrinsic dimensionalities
- Jensen-Shannon divergence
- metric spaces
- other properties
- threshold functions
- artificial intelligence
- computer science