Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases.
Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan:
Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases.
VLDB 1997: 446-455
@inproceedings{DBLP:conf/vldb/ChakrabartiDAR97,
author = {Soumen Chakrabarti and
Byron Dom and
Rakesh Agrawal and
Prabhakar Raghavan},
editor = {Matthias Jarke and
Michael J. Carey and
Klaus R. Dittrich and
Frederick H. Lochovsky and
Pericles Loucopoulos and
Manfred A. Jeusfeld},
title = {Using Taxonomy, Discriminants, and Signatures for Navigating
in Text Databases},
booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
Large Data Bases, August 25-29, 1997, Athens, Greece},
publisher = {Morgan Kaufmann},
year = {1997},
isbn = {1-55860-470-7},
pages = {446-455},
ee = {db/conf/vldb/ChakrabartiDAR97.html},
crossref = {DBLP:conf/vldb/97},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as internet directories, digital libraries, and patent databases enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy.
We use techniques from statistical pattern recognition to efficiently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. However, owing to the use of context-sensitive features, the classifier is very accurate.
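The idea of per-node discriminants and a multi-level classifier can be illustrated with a small sketch. The abstract does not name the statistic or the classifier used, so everything concrete here is an assumption made for illustration: terms are ranked by a Fisher-style ratio of between-class to within-class variance of their relative frequency, each taxonomy node keeps only its top `k` terms, and a naive Bayes model with uniform priors routes a document from the root to a leaf. The names `fisher_scores`, `select_discriminants`, `NodeClassifier`, `TAXONOMY`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def term_freqs(doc):
    """Relative term frequencies of a whitespace-tokenized document."""
    words = doc.lower().split()
    return {t: c / len(words) for t, c in Counter(words).items()}


def fisher_scores(docs_by_class):
    """Assumed score: between-class over within-class variance of term frequency."""
    freqs = {c: [term_freqs(d) for d in docs] for c, docs in docs_by_class.items()}
    vocab = {t for fs in freqs.values() for f in fs for t in f}
    classes = sorted(freqs)
    scores = {}
    for t in vocab:
        mean = {c: sum(f.get(t, 0.0) for f in freqs[c]) / len(freqs[c]) for c in classes}
        between = sum((mean[a] - mean[b]) ** 2
                      for i, a in enumerate(classes) for b in classes[i + 1:])
        within = sum(sum((f.get(t, 0.0) - mean[c]) ** 2 for f in freqs[c]) / len(freqs[c])
                     for c in classes)
        scores[t] = between / (within + 1e-9)  # small constant avoids division by zero
    return scores


def select_discriminants(docs_by_class, k):
    """Keep the k best-scoring terms; ties are broken alphabetically."""
    scores = fisher_scores(docs_by_class)
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def leaves(subtree):
    """All training documents beneath a subtree (a list at a leaf, a dict otherwise)."""
    if isinstance(subtree, list):
        return subtree
    return [d for child in subtree.values() for d in leaves(child)]


class NodeClassifier:
    """Naive Bayes over the discriminants of one taxonomy node (uniform priors)."""

    def __init__(self, subtree, k):
        docs = {name: leaves(child) for name, child in subtree.items()}
        self.features = select_discriminants(docs, k)
        self.log_prob = {}
        for name, ds in docs.items():
            counts = Counter(w for d in ds for w in d.lower().split() if w in self.features)
            total = sum(counts.values()) + len(self.features)  # Laplace smoothing
            self.log_prob[name] = {t: math.log((counts[t] + 1) / total) for t in self.features}
        # Internal children get their own classifier with their own discriminants.
        self.children = {name: NodeClassifier(child, k)
                         for name, child in subtree.items() if isinstance(child, dict)}

    def classify(self, doc):
        """Return the path of topic names from this node down to a leaf."""
        words = [w for w in doc.lower().split() if w in self.features]  # noise words dropped
        best = max(sorted(self.log_prob),
                   key=lambda c: sum(self.log_prob[c][w] for w in words))
        path = [best]
        if best in self.children:
            path += self.children[best].classify(doc)
        return path


# Toy taxonomy (hypothetical data): internal nodes are dicts, leaves are document lists.
TAXONOMY = {
    "science": {
        "physics": ["the atom energy field", "the atom particle energy"],
        "biology": ["the cell gene plant", "the cell gene organism"],
    },
    "sports": {
        "football": ["the team goal league", "the team goal match"],
        "tennis": ["the player serve match", "the player set serve"],
    },
}

root = NodeClassifier(TAXONOMY, k=8)
print(root.classify("the energy of the particle"))
print(root.classify("the player won the match"))
```

In this toy setup the word "the" occurs equally often in every class, so it scores zero and is excluded at the root, while a word such as "match" is a useful discriminant at the root (sports versus science) but not below "sports", where it appears in both children. That is one way to read "context-sensitive features": the feature set is chosen separately at each node.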
We report on experiences with the Reuters newswire benchmark, the
US Patent database, and web document
samples from Yahoo!.
Copyright © 1997 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.):
VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece.
Morgan Kaufmann 1997, ISBN 1-55860-470-7
References
- [1]
- Peter G. Anick, Shivakumar Vaithyanathan:
Exploiting Clustering and Phrases for Context-based Information Retrieval.
SIGIR 1997: 314-323
- [2]
- Chidanand Apté, Fred Damerau, Sholom M. Weiss:
Automated Learning of Decision Rules for Text Categorization.
ACM Trans. Inf. Syst. 12(3): 233-251(1994)
- [3]
- James O. Berger:
Statistical Decision Theory and Bayesian Analysis, 2nd Edition.
Springer 1985, ISBN 3-540-96098-8
- [4]
- ...
- [5]
- ...
- [6]
- ...
- [7]
- Douglas R. Cutting, David R. Karger, Jan O. Pedersen:
Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections.
SIGIR 1993: 126-134
- [8]
- Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, Richard A. Harshman:
Indexing by Latent Semantic Analysis.
JASIS 41(6): 391-407(1990)
- [9]
- ...
- [10]
- ...
- [11]
- William B. Frakes, Ricardo A. Baeza-Yates (Eds.):
Information Retrieval: Data Structures & Algorithms.
Prentice-Hall 1992, ISBN 0-13-463837-9
- [12]
- ...
- [13]
- ...
- [14]
- ...
- [15]
- Anil K. Jain, Jianchang Mao, K. Moidin Mohiuddin:
Artificial Neural Networks: A Tutorial.
IEEE Computer 29(3): 31-44(1996)
- [16]
- ...
- [17]
- ...
- [18]
- ...
- [19]
- Pat Langley:
Elements of Machine Learning.
Morgan Kaufmann 1994, ISBN 1-55860-301-8
- [20]
- ...
- [21]
- ...
- [22]
- ...
- [23]
- Balas K. Natarajan:
Machine Learning: A Theoretical Approach.
Morgan Kaufmann 1991, ISBN 1-55860-148-1
- [24]
- ...
- [25]
- J. Ross Quinlan:
C4.5: Programs for Machine Learning.
Morgan Kaufmann 1993, ISBN 1-55860-238-0
- [26]
- ...
- [27]
- ...
- [28]
- Stephen E. Robertson, Steve Walker:
Some Simple Effective Approximations to the 2-Poisson Model for Probabilistic Weighted Retrieval.
SIGIR 1994: 232-241
- [29]
- Gerard Salton, Chris Buckley:
Term-Weighting Approaches in Automatic Text Retrieval.
Inf. Process. Manage. 24(5): 513-523(1988)
- [30]
- Gerard Salton, Michael McGill:
Introduction to Modern Information Retrieval.
McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
- [31]
- Hinrich Schütze, David A. Hull, Jan O. Pedersen:
A Comparison of Classifiers and Document Representations for the Routing Problem.
SIGIR 1995: 229-237
- [32]
- ...
- [33]
- C. J. van Rijsbergen:
Information Retrieval.
Butterworth 1979, ISBN 0-408-70929-4
- [34]
- Ellen M. Voorhees:
Using WordNet to Disambiguate Word Senses for Text Retrieval.
SIGIR 1993: 171-180
- [35]
- A. Wald:
Statistical Decision Functions.
John Wiley 1950
- [36]
- Sholom M. Weiss, Casimir A. Kulikowski:
Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems.
Morgan Kaufmann 1990, ISBN 1-55860-065-5
- [37]
- Tzay Y. Young, Thomas W. Calvert:
Classification, Estimation and Pattern Recognition.
Elsevier 1974
- [38]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949