FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.
Christos Faloutsos, King-Ip Lin:
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.
SIGMOD Conference 1995: 163-174
@inproceedings{DBLP:conf/sigmod/FaloutsosL95,
author = {Christos Faloutsos and
King-Ip Lin},
editor = {Michael J. Carey and
Donovan A. Schneider},
title = {FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization
of Traditional and Multimedia Datasets},
booktitle = {Proceedings of the 1995 ACM SIGMOD International Conference on
Management of Data, San Jose, California, May 22-25, 1995},
publisher = {ACM Press},
year = {1995},
pages = {163-174},
ee = {http://doi.acm.org/10.1145/223784.223812, db/conf/sigmod/sigmod95-12.html},
crossref = {DBLP:conf/sigmod/95},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions provided by a domain expert [Jagadish, SIGMOD Conf. 1991: 208-217]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [Brinkhoff, SIGMOD Conf. 1994: 197-208]); the nearest-neighbor or best-match query, etc.
However, designing feature-extraction functions can be hard. It is often easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points.
This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before; and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for.
We introduce an older method from pattern recognition, namely Multi-Dimensional Scaling (MDS); although it is unsuitable for indexing, we use it as a yardstick for our method. Then, we propose a much faster algorithm to solve the problem at hand, which in addition allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS (being linear, as opposed to quadratic, in the database size N), while it manages to preserve distances and the overall structure of the data-set.
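The abstract does not spell out the mapping itself, so the following is only a minimal sketch of a FastMap-style projection as it is commonly described: pick two far-apart pivot objects, place every object on the line through them using the law of cosines, then repeat on the residual distances for the next coordinate. The function names `fastmap` and `euclid`, the `seed` parameter, and the simple two-step pivot heuristic are assumptions made for this illustration and are not taken from this record.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def euclid(p: Sequence[float], q: Sequence[float]) -> float:
    """Plain Euclidean distance, used here only as an example distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fastmap(objects: Sequence[T],
            dist: Callable[[T, T], float],
            k: int,
            seed: int = 0) -> List[List[float]]:
    """Map each object to a point in k-d space using only pairwise distances.

    Hypothetical sketch: the pivot heuristic and names are illustrative.
    """
    n = len(objects)
    if n == 0:
        return []
    coords = [[0.0] * k for _ in range(n)]
    rng = random.Random(seed)

    def residual_sq(i: int, j: int, axis: int) -> float:
        # Squared distance left over after removing the part already
        # explained by the first `axis` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(axis):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # guard against small negative round-off

    for axis in range(k):
        # Pivot heuristic (assumed): start anywhere, jump to the farthest
        # object, then jump once more to the object farthest from that one.
        b = rng.randrange(n)
        a = max(range(n), key=lambda i: residual_sq(b, i, axis))
        b = max(range(n), key=lambda i: residual_sq(a, i, axis))
        dab_sq = residual_sq(a, b, axis)
        if dab_sq == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        dab = math.sqrt(dab_sq)
        for i in range(n):
            # Law of cosines: position of object i along the line a -> b.
            coords[i][axis] = (
                residual_sq(a, i, axis) + dab_sq - residual_sq(b, i, axis)
            ) / (2.0 * dab)
    return coords
```

In this sketch each coordinate needs a constant number of passes over the N objects, which is in line with the linear dependence on N stated above. The resulting points could then be loaded into a spatial access method, or plotted directly when k is 2 or 3.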
Copyright © 1995 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Printed Edition
Michael J. Carey, Donovan A. Schneider (Eds.):
Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, May 22-25, 1995.
ACM Press 1995, SIGMOD Record 24(2), June 1995
References
- [1]
- Rakesh Agrawal, Christos Faloutsos, Arun N. Swami:
Efficient Similarity Search In Sequence Databases.
FODO 1993: 69-84
- [2]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami:
Mining Association Rules between Sets of Items in Large Databases.
SIGMOD Conference 1993: 207-216
- [3]
- Rakesh Agrawal, Ramakrishnan Srikant:
Fast Algorithms for Mining Association Rules in Large Databases.
VLDB 1994: 487-499
- [4]
- ...
- [5]
- Manish Arya, William F. Cody, Christos Faloutsos, Joel E. Richardson, Arthur Toya:
QBISM: A Prototype 3-D Medical Image Database System.
IEEE Data Eng. Bull. 16(1): 38-42(1993)
- [6]
- Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, Sun Wu:
Proximity Matching Using Fixed-Queries Trees.
CPM 1994: 198-212
- [7]
- Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles.
SIGMOD Conference 1990: 322-331
- [8]
- Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
Multi-Step Processing of Spatial Joins.
SIGMOD Conference 1994: 197-208
- [9]
- Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger:
Efficient Processing of Spatial Joins Using R-Trees.
SIGMOD Conference 1993: 237-246
- [10]
- Walter A. Burkhard, Robert M. Keller:
Some Approaches to Best-Match File Searching.
Commun. ACM 16(4): 230-236(1973)
- [11]
- ...
- [12]
- ...
- [13]
- Susan T. Dumais:
Latent Semantic Indexing (LSI): TREC-3 Report.
TREC 1994: 105-115
- [14]
- Christos Faloutsos, Shari Roseman:
Fractals for Secondary Key Retrieval.
PODS 1989: 247-252
- [15]
- ...
- [16]
- Peter W. Foltz, Susan T. Dumais:
Personalized Information Delivery: An Analysis of Information Filtering Methods.
Commun. ACM 35(12): 51-60(1992)
- [17]
- ...
- [18]
- Irene Gargantini:
An Effective Way to Represent Quadtrees.
Commun. ACM 25(12): 905-910(1982)
- [19]
- ...
- [20]
- Antonin Guttman:
R-Trees: A Dynamic Index Structure for Spatial Searching.
SIGMOD Conference 1984: 47-57
- [21]
- ...
- [22]
- ...
- [23]
- H. V. Jagadish:
Linear Clustering of Objects with Multiple Attributes.
SIGMOD Conference 1990: 332-342
- [24]
- H. V. Jagadish:
Spatial Search with Polyhedra.
ICDE 1990: 311-319
- [25]
- H. V. Jagadish:
A Retrieval Technique for Similar Shapes.
SIGMOD Conference 1991: 208-217
- [26]
- ...
- [27]
- Ibrahim Kamel, Christos Faloutsos:
Hilbert R-tree: An Improved R-tree using Fractals.
VLDB 1994: 500-509
- [28]
- ...
- [29]
- ...
- [30]
- Karen Kukich:
Techniques for Automatically Correcting Words in Text.
ACM Comput. Surv. 24(4): 377-439(1992)
- [31]
- David B. Lomet, Betty Salzberg:
The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance.
ACM Trans. Database Syst. 15(4): 625-658(1990)
- [32]
- Fionn Murtagh:
A Survey of Recent Advances in Hierarchical Clustering Algorithms.
Comput. J. 26(4): 354-359(1983)
- [33]
- A. Desai Narasimhalu, Stavros Christodoulakis:
Multimedia Information Systems: The Unfolding of a Reality (Guest Editors' Introduction).
IEEE Computer 24(10): 6-8(1991)
- [34]
- Raymond T. Ng, Jiawei Han:
Efficient and Effective Clustering Methods for Spatial Data Mining.
VLDB 1994: 144-155
- [35]
- Wayne Niblack, Ron Barber, William Equitz, Myron Flickner, Eduardo H. Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, Gabriel Taubin:
The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape.
Storage and Retrieval for Image and Video Databases (SPIE) 1993: 173-187
- [36]
- Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik:
The Grid File: An Adaptable, Symmetric Multikey File Structure.
ACM Trans. Database Syst. 9(1): 38-71(1984)
- [37]
- Jack A. Orenstein:
Spatial Query Processing in an Object-Oriented Database System.
SIGMOD Conference 1986: 326-336
- [38]
- Jack A. Orenstein:
A Comparison of Spatial Query Processing Techniques for Native and Parameter Spaces.
SIGMOD Conference 1990: 343-352
- [39]
- ...
- [40]
- ...
- [41]
- ...
- [42]
- Nick Roussopoulos, Stephen Kelley, Frédéric Vincent:
Nearest Neighbor Queries.
SIGMOD Conference 1995: 71-79
- [43]
- Gerard Salton, Michael McGill:
Introduction to Modern Information Retrieval.
McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
- [44]
- ...
- [45]
- Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos:
The R+-Tree: A Dynamic Index for Multi-Dimensional Objects.
VLDB 1987: 507-518
- [46]
- Marvin B. Shapiro:
The Choice of Reference Points in Best-Match File Searching.
Commun. ACM 20(5): 339-343(1977)
- [47]
- Dennis Shasha, Jason Tsong-Li Wang:
New Techniques for Best-Match Retrieval.
ACM Trans. Inf. Syst. 8(2): 140-158(1990)
- [48]
- ...
- [49]
- ...
- [50]
- ...
- [51]
- ...
- [52]
- ...
- [53]
- ...
- [54]
- ...
- [55]
- ...