The XPS Approach to Loading and Unloading Terabyte Databases.
Sanket Atal:
VLDB 1996: 589
@inproceedings{DBLP:conf/vldb/Atal96,
author = {Sanket Atal},
editor = {T. M. Vijayaraman and
Alejandro P. Buchmann and
C. Mohan and
Nandlal L. Sarda},
title = {The XPS Approach to Loading and Unloading Terabyte Databases},
booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
publisher = {Morgan Kaufmann},
year = {1996},
isbn = {1-55860-382-4},
pages = {589},
ee = {db/conf/vldb/Atal96.html},
crossref = {DBLP:conf/vldb/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
XPS (eXtended Parallel Server), Informix's MPP solution,
is designed for enterprise-wide database management,
which includes not only the DBMS but also scalable
utilities. This talk focuses on our load/unload utility,
which is fast, flexible, scalable, and easy to use.
Architecture
The loader was developed on top of the existing Parallel
Data Query (PDQ) iterator infrastructure. The loader
and converter are iterators. The server treats the load
iterator tree just like any other iterator tree and is able to
use existing algorithms for parallelization and resource
allocation. This design also gives the loader
low-level access to SQL functions.
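The iterator composition described above can be illustrated with a minimal Python sketch. It is not the XPS implementation: the function names (scan, convert, load), the two-column row type, and the "|" delimiter are all assumptions made for illustration. It only shows how a loader and a converter can be ordinary nodes in an iterator tree that the caller drives like any other query plan.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # hypothetical two-column row type


def scan(lines: Iterable[str]) -> Iterator[str]:
    """Leaf iterator: yields raw records from an external source."""
    for line in lines:
        yield line.rstrip("\n")


def convert(records: Iterator[str], delimiter: str = "|") -> Iterator[Row]:
    """Converter iterator: parses delimited text into typed rows."""
    for record in records:
        key, name = record.split(delimiter)
        yield int(key), name


def load(rows: Iterator[Row], table: List[Row]) -> int:
    """Root iterator: drains its child and appends rows to the target table."""
    count = 0
    for row in rows:
        table.append(row)
        count += 1
    return count


target: List[Row] = []
loaded = load(convert(scan(["1|alpha\n", "2|beta\n"])), target)
print(loaded, target)
```

Because each stage only consumes its child iterator, a scheduler could in principle place several scan/convert subtrees under one load node, which is the kind of reuse of existing parallelization machinery the text describes.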
The XPS loader design introduces the concept of an
external table -- a table that has a catalog entry in
a database but does not reside in a materialized form within
that database. Any source for a load or target for an
unload can be treated as an external table. That is, an
external table can be used as an interface to an application
program or system device that is external to the
server. To create an external table, one uses our
extended create table or select...into... statement syntax.
Easy to Use
Although a GUI is available, its use is optional
because SQL can be used to perform load, unload, and
other operations on external tables.
- For loading, one can create a catalog entry for an
external table by using the create table statement. A
set-oriented insert statement can then be used to load
the data from this table. The insert statement can
contain complex filters on the columns of the external
table, thus allowing filtering/scrubbing of the
incoming data.
- For unloading, one can create an external table using
the create table statement and then use an insert
statement to unload into this table. Alternatively, one
can use the Informix select...into... statement to
automatically create the external table definition.
- External tables can be used in queries so one can
analyze the data that is to be loaded.
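The load pattern in the list above (declare the source as a table, then load it with one filtered, set-oriented insert) can be sketched with Python's built-in sqlite3 module. This is only an analogy: SQLite has no external tables, so an ordinary staging table named ext_orders stands in for one, and the table names, columns, and sample rows are hypothetical. The Informix create table extensions themselves are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for an external table: raw, untyped text columns.
conn.execute("CREATE TABLE ext_orders (id TEXT, amount TEXT)")
# Target table inside the database.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

raw = [("1", "10.50"), ("2", ""), ("3", "7.25")]
conn.executemany("INSERT INTO ext_orders VALUES (?, ?)", raw)

# The source can be queried before loading, to inspect incoming data.
rejected = conn.execute(
    "SELECT COUNT(*) FROM ext_orders WHERE amount = ''"
).fetchone()[0]

# One set-oriented statement loads, filters, and converts types.
conn.execute(
    """
    INSERT INTO orders (id, amount)
    SELECT CAST(id AS INTEGER), CAST(amount AS REAL)
    FROM ext_orders
    WHERE amount <> ''
    """
)

loaded_rows = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(rejected, loaded_rows)
```

The point of the sketch is the shape of the statements: filtering and scrubbing live in the where clause and select list of a single insert, rather than in per-row application code.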
Flexible
External tables provide flexibility in load processing:
- They support a variety of formats (delimited, fixed,
ASCII, Informix row, etc.).
- They support a variety of input devices either
directly or by the use of named pipes. An external
table can also consolidate multiple input sources in parallel.
- Data conversion can be parallelized independent of
the layout of the source. This means that incoming
data does not need to be manually split prior to
loading to achieve parallel conversions.
- External table data can be manipulated in several ways:
  - Operations (aggregation, trimming, etc.) can be
    performed on columns.
  - Certain types of data scrubbing can be done in
    the server, in parallel, by the loader.
  - Columns may be omitted, duplicated, remapped, have
    their types converted, etc.
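As a rough illustration of parallel conversion that does not depend on how the source is laid out, the following Python sketch cuts a single input stream into batches on the fly and converts the batches with a worker pool. Everything here is assumed for illustration: the helper names (chunks, convert_batch, parallel_convert), the "qty|name" record layout, and the batch size. A thread pool is used for simplicity; in CPython it shows the structure rather than real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunks(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Cut one input stream into batches; no manual pre-splitting needed."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def convert_batch(batch: List[str]) -> List[Tuple[str, int]]:
    """Trim, remap column order, and convert types for one batch."""
    out = []
    for record in batch:
        qty, name = record.split("|")
        out.append((name.strip().upper(), int(qty)))
    return out


def parallel_convert(
    records: Iterable[str], workers: int = 4, batch_size: int = 2
) -> List[Tuple[str, int]]:
    """Convert batches concurrently; pool.map keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        converted = pool.map(convert_batch, chunks(records, batch_size))
        return [row for batch in converted for row in batch]


stream = ["3| bolt ", "10|nut", "7| washer", "1|gear ", "4|cog"]
rows = parallel_convert(stream)
print(rows)
```

The conversion step also shows the column manipulations mentioned in the list: the two columns are swapped, the name is trimmed, and the quantity is converted from text to an integer.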
Benefits
- Data that resides in applications, devices, or files
external to the server can be read or written in
parallel with the full power and generality of SQL
functionality.
- Application level data can be loaded in bulk via
set-oriented SQL statements instead of using slow,
per-tuple cursor mechanisms.
- Loads are parallelized across all available nodes,
resulting in a very fast and scalable loader. Load
performance numbers collected so far have been
very encouraging: 2+ GB/hr on an SP2 Thin 1 node
(with a 66.7 MHz processor with a SPECint rating of
114.3) and 100 GB/hr on 48 such nodes.
- External tables provide a good infrastructure for
future development towards more abstract types
where the input methods can be invoked in the
server.
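The throughput figures quoted in the list above can be checked with a few lines of arithmetic. The sketch below treats "2+ GB/hr" as a lower bound of 2.0 GB/hr and assumes 1 TB = 1000 GB; both are assumptions, as is the one-terabyte load size, which the text does not give.

```python
single_node_gb_per_hr = 2.0   # "2+ GB/hr", taken as a lower bound (assumption)
cluster_gb_per_hr = 100.0     # reported rate on 48 nodes
nodes = 48

# Rate each node contributes in the 48-node run.
per_node_in_cluster = cluster_gb_per_hr / nodes

# Rate expected if the single-node lower bound scaled perfectly linearly.
linear_projection = single_node_gb_per_hr * nodes

# Time to load one terabyte at the cluster rate (1 TB = 1000 GB assumed).
hours_per_terabyte = 1000.0 / cluster_gb_per_hr

print(round(per_node_in_cluster, 2), linear_projection, hours_per_terabyte)
```

Under these assumptions each node contributes about 2.08 GB/hr in the 48-node run, which is in line with the single-node figure and therefore consistent with close to linear scaling.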
Copyright © 1996 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.):
VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India.
Morgan Kaufmann 1996, ISBN 1-55860-382-4