Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications.
John C. Shafer, Rakesh Agrawal:
Parallel Algorithms for High-dimensional Similarity Joins for Data Mining Applications.
VLDB 1997: 176-185
@inproceedings{DBLP:conf/vldb/ShaferA97,
author = {John C. Shafer and
Rakesh Agrawal},
editor = {Matthias Jarke and
Michael J. Carey and
Klaus R. Dittrich and
Frederick H. Lochovsky and
Pericles Loucopoulos and
Manfred A. Jeusfeld},
title = {Parallel Algorithms for High-dimensional Similarity Joins for
Data Mining Applications},
booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
Large Data Bases, August 25-29, 1997, Athens, Greece},
publisher = {Morgan Kaufmann},
year = {1997},
isbn = {1-55860-470-7},
pages = {176-185},
ee = {db/conf/vldb/ShaferA97.html},
crossref = {DBLP:conf/vldb/97},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
We consider the problem of parallelizing high-dimensional
proximity joins. We present
a parallel multidimensional join algorithm
based on an epsilon-kdB tree and compare
it with the more common approach of space
partitioning. An evaluation of the algorithm
on an IBM SP2 shared-nothing multiprocessor
is presented using both synthetic and real-life
datasets. We also examine the effectiveness
of the algorithms in the context of a specific
data-mining problem, that of finding similar
time-series. The empirical results show that
our algorithm exhibits good performance and
scalability, as well as ability to handle data-skew.
Copyright © 1997 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.):
VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece.
Morgan Kaufmann 1997, ISBN 1-55860-470-7
References
- [1] Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84
- [2] Rakesh Agrawal, King-Ip Lin, Harpreet S. Sawhney, Kyuseok Shim: Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. VLDB 1995: 490-501
- [3] Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Parallel Processing of Spatial Joins Using R-trees. ICDE 1996: 258-265
- [4] Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Efficient Processing of Spatial Joins Using R-Trees. SIGMOD Conference 1993: 237-246
- [5] David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62 (1990)
- [6] Christos Faloutsos: Multiattribute Hashing Using Gray Codes. SIGMOD Conference 1986: 227-238
- [7] Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos: Fast Subsequence Matching in Time-Series Databases. SIGMOD Conference 1994: 419-429
- [8] ...
- [9] ...
- [10] ...
- [11] H. V. Jagadish: Linear Clustering of Objects with Multiple Atributes. SIGMOD Conference 1990: 332-342
- [12] Nick Koudas, Kenneth C. Sevcik: Size Separation Spatial Join. SIGMOD Conference 1997: 324-335
- [13] Ming-Ling Lo, Chinya V. Ravishankar: Generating Seeded Trees from Data Sets. SSD 1995: 328-347
- [14] Ming-Ling Lo, Chinya V. Ravishankar: Spatial Hash-Joins. SIGMOD Conference 1996: 247-258
- [15] ...
- [16] Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71 (1984)
- [17] Jack A. Orenstein, T. H. Merrett: A Class of Data Structures for Associative Searching. PODS 1984: 181-190
- [18] Jignesh M. Patel, David J. DeWitt: Partition Based Spatial-Merge Join. SIGMOD Conference 1996: 259-270
- [19] ...
- [20] Kyuseok Shim, Ramakrishnan Srikant, Rakesh Agrawal: High-Dimensional Similarity Joins. ICDE 1997: 301-311