Ludwig-Maximilians-Universität München, Institut für Informatik

Technical Report 95-08

Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification
May 1995
Martin Ester
Hans-Peter Kriegel
Xiaowei Xu
{ester | kriegel | xwxu}
Institut für Informatik
Universität München
Leopoldstr. 11B
D-80802 München (Germany)
Keywords: knowledge discovery in databases, spatial query processing, architecture of spatial database systems, clustering, application in molecular biology.
Both the number and the size of spatial databases are growing rapidly because of the large amount of data obtained from satellite images, X-ray crystallography and other scientific equipment. Automated knowledge discovery is therefore becoming increasingly important in spatial databases. So far, most methods for knowledge discovery in databases (KDD) have been based on relational database systems. In this paper, we address the task of class identification in spatial databases using clustering techniques. We put special emphasis on integrating the discovery methods with the DB interface, which is crucial for the efficiency of KDD on large databases. The key to this integration is the use of a well-known spatial access method, the R*-tree. The focusing component of a KDD system determines which parts of the database are relevant for the knowledge discovery task. We present several strategies for focusing: selecting representatives from a spatial database, focusing on the relevant clusters, and retrieving all objects of a given cluster. We have applied the proposed techniques to real data from a large protein database used for predicting protein-protein docking. A performance evaluation on this database indicates that clustering on large spatial databases can be performed both efficiently and effectively.
