A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces.
Roger Weber, Hans-Jörg Schek, Stephen Blott:
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces.
VLDB 1998: 194-205@inproceedings{DBLP:conf/vldb/WeberSB98,
author = {Roger Weber and
Hans-J{\"o}rg Schek and
Stephen Blott},
editor = {Ashish Gupta and
Oded Shmueli and
Jennifer Widom},
title = {A Quantitative Analysis and Performance Study for Similarity-Search
Methods in High-Dimensional Spaces},
booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
Large Data Bases, August 24-27, 1998, New York City, New York,
USA},
publisher = {Morgan Kaufmann},
year = {1998},
isbn = {1-55860-566-5},
pages = {194-205},
ee = {db/conf/vldb/WeberSB98.html},
crossref = {DBLP:conf/vldb/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
For similarity search in high-dimensional vector spaces (or 'HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning.
However, the performance of these methods generally degrades as dimensionality increases.
Although this phenomenon - known as the 'dimensional curse'- is well known, little or no quantitative analysis of the phenomenon is available.
In this paper, we provide a detailed analysis of partitioning and clustering techniques for similarity search in HDVSs.
We show formally that these methods exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10.
Consequently, we come up with an alternative organization based on approximations to make the unavoidable sequential scan as fast as possible. We describe a simple vector approximation scheme, called VA-file, and report on an experimental evaluation of this and of two tree-based index methods (an R*-tree and an X-tree).
Copyright © 1998 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
Printed Edition
Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.):
VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA.
Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents
References
- [1]
- Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik:
The New Jersey Data Reduction Report.
IEEE Data Eng. Bull. 20(4): 3-45(1997)
- [2]
- Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.
SIGMOD Conference 1990: 322-331
- [3]
- Jon Louis Bentley, Jerome H. Friedman:
Data Structures for Range Searching.
ACM Comput. Surv. 11(4): 397-409(1979)
- [4]
- Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel:
Fast Parallel Similarity Search in Multimedia Databases.
SIGMOD Conference 1997: 1-12
- [5]
- Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel:
A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space.
PODS 1997: 78-86
- [6]
- Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel:
Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations.
EDBT 1998: 216-230
- [7]
- Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel:
The X-tree : An Index Structure for High-Dimensional Data.
VLDB 1996: 28-39
- [8]
- Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider:
Comparison of Approximations of Complex Objects Used for Approximation-based Query Processing in Spatial Database Systems.
ICDE 1993: 40-49
- [9]
- Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
Multi-Step Processing of Spatial Joins.
SIGMOD Conference 1994: 197-208
- [10]
- Paolo Ciaccia, Marco Patella, Pavel Zezula:
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces.
VLDB 1997: 426-435
- [11]
- ...
- [12]
- ...
- [13]
- ...
- [14]
- Christos Faloutsos:
Access Methods for Text.
ACM Comput. Surv. 17(1): 49-74(1985)
- [15]
- ...
- [16]
- Christos Faloutsos, Stavros Christodoulakis:
Description and Performance Analysis of Signature File Methods for Office Filing.
ACM Trans. Inf. Syst. 5(3): 237-257(1987)
- [17]
- Christos Faloutsos, Ibrahim Kamel:
Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension.
PODS 1994: 4-13
- [18]
- Raphael A. Finkel, Jon Louis Bentley:
Quad Trees: A Data Structure for Retrieval on Composite Keys.
Acta Inf. 4: 1-9(1974)
- [19]
- Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker:
Query by Image and Video Content: The QBIC System.
IEEE Computer 28(9): 23-32(1995)
- [20]
- Jerome H. Friedman, Jon Louis Bentley, Raphael A. Finkel:
An Algorithm for Finding Best Matches in Logarithmic Expected Time.
ACM Trans. Math. Softw. 3(3): 209-226(1977)
- [21]
- Antonin Guttman:
R-Trees: A Dynamic Index Structure for Spatial Searching.
SIGMOD Conference 1984: 47-57
- [22]
- Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou:
On the Analysis of Indexing Schemes.
PODS 1997: 249-256
- [23]
- Gísli R. Hjaltason, Hanan Samet:
Ranking in Spatial Databases.
SSD 1995: 83-95
- [24]
- Norio Katayama, Shin'ichi Satoh:
The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries.
SIGMOD Conference 1997: 369-380
- [25]
- King-Ip Lin, H. V. Jagadish, Christos Faloutsos:
The TV-Tree: An Index Structure for High-Dimensional Data.
VLDB J. 3(4): 517-542(1994)
- [26]
- David B. Lomet, Betty Salzberg:
The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance.
ACM Trans. Database Syst. 15(4): 625-658(1990)
- [27]
- Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik:
The Grid File: An Adaptable, Symmetric Multikey File Structure.
ACM Trans. Database Syst. 9(1): 38-71(1984)
- [28]
- John T. Robinson:
The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes.
SIGMOD Conference 1981: 10-18
- [29]
- Hanan Samet:
The Design and Analysis of Spatial Data Structures.
Addison-Wesley 1990
- [30]
- Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos:
The R+-Tree: A Dynamic Index for Multi-Dimensional Objects.
VLDB 1987: 507-518
- [31]
- Robert F. Sproull:
Refinements to Nearest-Neighbor Searching in k-Dimensional Trees.
Algorithmica 6(4): 579-589(1991)
- [32]
- Markus A. Stricker, Markus Orengo:
Similarity of Color Images.
Storage and Retrieval for Image and Video Databases (SPIE) 1995: 381-392
- [33]
- Jeffrey D. Ullman:
Principles of Database and Knowledge-Base Systems, Volume I.
Computer Science Press 1988, ISBN 0-7167-8158-1
Contents - [34]
- ...
Copyright © Tue Mar 16 02:22:07 2010
by Michael Ley (ley@uni-trier.de)