
Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases.

Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases. VLDB 1997: 446-455
@inproceedings{DBLP:conf/vldb/ChakrabartiDAR97,
  author    = {Soumen Chakrabarti and
               Byron Dom and
               Rakesh Agrawal and
               Prabhakar Raghavan},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Using Taxonomy, Discriminants, and Signatures for Navigating
               in Text Databases},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {446-455},
  ee        = {db/conf/vldb/ChakrabartiDAR97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We explore how to organize a text database hierarchically to aid better searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as internet directories, digital libraries, and patent databases, enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy. We use techniques from statistical pattern recognition to efficiently separate the feature words or discriminants from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. However, owing to the use of context-sensitive features, the classifier is very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
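
The abstract describes selecting discriminant words at each taxonomy node and using them in a multi-level classifier. The following is a minimal sketch of that idea, not the authors' implementation. The abstract does not name the scoring function or the per-node classifier, so the Fisher-style variance ratio, the naive Bayes model with add-one smoothing, the toy documents, and all names (fisher_scores, train, classify, DOCS) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def _rel_freq(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    return {t: n / len(tokens) for t, n in counts.items()}


def fisher_scores(docs_by_class):
    """Score terms by between-class over within-class variance (assumed criterion)."""
    freqs = {c: [_rel_freq(d) for d in docs] for c, docs in docs_by_class.items()}
    vocab = {t for fs in freqs.values() for f in fs for t in f}
    scores = {}
    for t in vocab:
        means = {c: sum(f.get(t, 0.0) for f in fs) / len(fs) for c, fs in freqs.items()}
        grand = sum(means.values()) / len(means)
        between = sum((m - grand) ** 2 for m in means.values())
        within = sum(
            sum((f.get(t, 0.0) - means[c]) ** 2 for f in fs) / len(fs)
            for c, fs in freqs.items()
        )
        scores[t] = between / (within + 1e-9)  # epsilon avoids division by zero
    return scores


def train(docs, k=8, prefix=(), models=None):
    """Fit one small model per internal node; docs are (taxonomy_path, tokens) pairs."""
    models = {} if models is None else models
    depth = len(prefix)
    groups = defaultdict(list)
    for path, tokens in docs:
        if len(path) > depth:
            groups[path[depth]].append((path, tokens))
    if not groups:  # leaf node: nothing to discriminate
        return models
    by_class = {c: [toks for _, toks in ds] for c, ds in groups.items()}
    scores = fisher_scores(by_class)
    # Keep only the k highest-scoring terms; every other word is treated as noise here.
    features = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    total = sum(len(ds) for ds in by_class.values())
    model = {"features": features, "classes": {}}
    for c, ds in by_class.items():
        counts = Counter(t for d in ds for t in d if t in features)
        denom = sum(counts.values()) + len(features)  # add-one smoothing
        model["classes"][c] = (
            math.log(len(ds) / total),
            {t: math.log((counts[t] + 1) / denom) for t in features},
        )
    models[prefix] = model
    for c, ds in groups.items():
        train(ds, k, prefix + (c,), models)
    return models


def classify(tokens, models):
    """Walk down the taxonomy, using only the current node's features at each step."""
    prefix = ()
    while prefix in models:
        classes = models[prefix]["classes"]

        def score(c):
            prior, like = classes[c]
            return prior + sum(like[t] for t in tokens if t in like)

        prefix += (max(sorted(classes), key=score),)
    return prefix


# Toy two-level taxonomy (hypothetical data, for illustration only).
DOCS = [
    (("sports", "soccer"), "the goal keeper saved the penalty".split()),
    (("sports", "soccer"), "the striker scored a goal".split()),
    (("sports", "tennis"), "the serve and the volley won the set".split()),
    (("sports", "tennis"), "a serve ace ended the set".split()),
    (("science", "physics"), "the quantum particle has energy".split()),
    (("science", "physics"), "energy of the particle".split()),
    (("science", "biology"), "the cell divides and the gene mutates".split()),
    (("science", "biology"), "a gene in the cell".split()),
]
models = train(DOCS, k=8)
print(classify("the striker missed the goal".split(), models))
```

Because each node keeps its own short feature list, a word such as "goal" can be a discriminant at the root and again under "sports" while being ignored elsewhere, which is one way to read the abstract's "context-sensitive features".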



Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7


