Extracting Large-Scale Knowledge Bases from the Web.
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins:
Extracting Large-Scale Knowledge Bases from the Web.
VLDB 1999: 639-650
@inproceedings{DBLP:conf/vldb/KumarRRT99,
author = {Ravi Kumar and
Prabhakar Raghavan and
Sridhar Rajagopalan and
Andrew Tomkins},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Extracting Large-Scale Knowledge Bases from the Web},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-7},
pages = {639-650},
ee = {db/conf/vldb/KumarRRT99.html},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
The subject of this paper is the creation of knowledge bases by enumerating and
organizing all web occurrences of certain subgraphs. We focus on subgraphs that
are signatures of web phenomena such as tightly-focused topic communities, webrings,
taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central
page with bidirectional links to a number of other pages. We develop novel algorithms
for such enumeration problems. A key technical contribution is the development of a
model for the evolution of the web graph, based on experimental observations derived
from a snapshot of the web. We argue that our algorithms run efficiently in this model,
and use the model to explain some statistical phenomena on the web that emerged during
our experiments. Finally, we describe the design and implementation of Campfire,
a knowledge base of over one hundred thousand web communities.
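The webring signature described in the abstract (a central page with bidirectional links to a number of other pages) can be illustrated with a small sketch. This is not the paper's enumeration algorithm, which is not given here; it is a naive in-memory scan over a toy edge list. The function name `find_webring_candidates`, the `min_members` threshold, and the sample page names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_webring_candidates(edges, min_members=3):
    """Return {central_page: sorted list of mutually linked pages}.

    A page qualifies when it has bidirectional links with at least
    `min_members` other pages. `min_members` is an assumed threshold,
    not a value taken from the paper.
    """
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out_links[src].add(dst)

    candidates = {}
    for page, targets in list(out_links.items()):
        # Keep only targets that link back to `page`.
        mutual = sorted(t for t in targets if page in out_links.get(t, ()))
        if len(mutual) >= min_members:
            candidates[page] = mutual
    return candidates


# Toy graph: "hub" has mutual links with a, b, c and a one-way link to d.
edges = [
    ("hub", "a"), ("a", "hub"),
    ("hub", "b"), ("b", "hub"),
    ("hub", "c"), ("c", "hub"),
    ("hub", "d"),
    ("a", "b"),
]
print(find_webring_candidates(edges))
```

A scan like this holds the whole graph in memory, so it only conveys the shape of the signature; enumerating such subgraphs at web scale is the problem the paper's algorithms address.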
Copyright © 1999 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.):
VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK.
Morgan Kaufmann 1999, ISBN 1-55860-615-7
References
- [1]
- Rakesh Agrawal, Ramakrishnan Srikant:
Fast Algorithms for Mining Association Rules in Large Databases.
VLDB 1994: 487-499
- [2]
- ...
- [3]
- ...
- [4]
- Krishna Bharat, Andrei Z. Broder, Monika Rauch Henzinger, Puneet Kumar, Suresh Venkatasubramanian:
The Connectivity Server: Fast Access to Linkage Information on the Web.
Computer Networks 30(1-7): 469-477 (1998)
- [5]
- Krishna Bharat, Monika Rauch Henzinger:
Improved Algorithms for Topic Distillation in a Hyperlinked Environment.
SIGIR 1998: 104-111
- [6]
- Sergey Brin, Lawrence Page:
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Computer Networks 30(1-7): 107-117 (1998)
- [7]
- ...
- [8]
- ...
- [9]
- Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, Jon M. Kleinberg:
Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text.
Computer Networks 30(1-7): 65-74 (1998)
- [10]
- ...
- [11]
- ...
- [12]
- Jeffrey Dean, Monika Rauch Henzinger:
Finding Related Pages in the World Wide Web.
Computer Networks 31(11-16): 1467-1479 (1999)
- [13]
- Daniela Florescu, Alon Y. Levy, Alberto O. Mendelzon:
Database Techniques for the World-Wide Web: A Survey.
SIGMOD Record 27(3): 59-74 (1998)
- [14]
- ...
- [15]
- ...
- [16]
- ...
- [17]
- ...
- [18]
- ...
- [19]
- Jon M. Kleinberg:
Authoritative Sources in a Hyperlinked Environment.
SODA 1998: 668-677
- [20]
- ...
- [21]
- Jon M. Kleinberg, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins:
The Web as a Graph: Measurements, Models, and Methods.
COCOON 1999: 1-17
- [22]
- ...
- [23]
- ...
- [24]
- ...
- [25]
- Alberto O. Mendelzon, Peter T. Wood:
Finding Regular Simple Paths in Graph Databases.
SIAM J. Comput. 24(6): 1235-1258 (1995)
- [26]
- ...
- [27]
- Ehud Rivlin, Rodrigo A. Botafogo, Ben Shneiderman:
Navigating in Hyperspace: Designing a Structure-Based Toolbox.
Commun. ACM 37(2): 87-96 (1994)
- [28]
- ...
- [29]
- Shalom Tsur, Jeffrey D. Ullman, Serge Abiteboul, Chris Clifton, Rajeev Motwani, Svetlozar Nestorov, Arnon Rosenthal:
Query Flocks: A Generalization of Association-Rule Mining.
SIGMOD Conference 1998: 1-12
- [30]
- George Kingsley Zipf:
Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology.
Addison-Wesley 1949