Ludwig-Maximilians-Universität München, Institut für Informatik

Technical Report 95-10

A Database Interface for Clustering in Large Spatial Databases
May 1995
Martin Ester
Hans-Peter Kriegel
Xiaowei Xu
{ester | kriegel | xwxu}
Institut für Informatik
Universität München
Leopoldstr. 11B
D-80802 München (Germany)
discovery algorithms for large databases, database interfaces, spatial data sampling, clustering, application in molecular biology.
Both the number and the size of spatial databases are growing rapidly because of the large amount of data obtained from satellite images, X-ray crystallography, and other scientific equipment. Automated knowledge discovery is therefore becoming increasingly important in spatial databases. So far, most methods for knowledge discovery in databases (KDD) have been based on relational database systems. In this paper, we address the task of class identification in spatial databases using clustering techniques. We present an interface to the database management system (DBMS); such an interface is crucial for the efficiency of KDD on large databases. Our interface is based on a spatial access method, the R*-tree, which clusters the objects according to their spatial neighborhood and supports efficient processing of spatial queries. Furthermore, we propose a method for spatial data sampling as part of the focusing component, which significantly reduces the number of objects to be clustered. As a result, we achieve a considerable speed-up for clustering in large databases. We have applied the proposed techniques to real data from a large protein database used for predicting protein-protein docking. A performance evaluation on this database indicates that our approach can perform clustering on large spatial databases both efficiently and effectively.
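The abstract names spatial data sampling as a way to reduce the number of objects to be clustered but does not say how the sample is drawn. The following is a minimal sketch under one assumed scheme, not the authors' method. Fixed-size grid cells stand in for the data pages of a spatial index such as an R*-tree, and from each non-empty cell the point nearest the cell's centroid is kept as a representative. The function name `sample_representatives` and the parameter `cell_size` are hypothetical.

```python
from collections import defaultdict
from math import dist, floor

Point = tuple[float, float]


def sample_representatives(points: list[Point], cell_size: float) -> list[Point]:
    """Return one representative point per non-empty grid cell.

    Assumption: grid cells are a hypothetical stand-in for the data pages
    of a spatial index such as an R*-tree. Nearby points share a cell, just
    as spatially close objects tend to share a data page.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")

    # Group points by the grid cell that contains them.
    cells: dict[tuple[int, int], list[Point]] = defaultdict(list)
    for p in points:
        key = (floor(p[0] / cell_size), floor(p[1] / cell_size))
        cells[key].append(p)

    reps: list[Point] = []
    for key in sorted(cells):  # deterministic output order
        members = cells[key]
        # Centroid of the points in this cell.
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        # Keep the member closest to the centroid as the cell's representative.
        reps.append(min(members, key=lambda q: dist(q, (cx, cy))))
    return reps


if __name__ == "__main__":
    pts = [(0.25, 0.25), (0.5, 0.5), (0.75, 0.75),
           (5.25, 5.25), (5.5, 5.5), (5.75, 5.75)]
    print(sample_representatives(pts, cell_size=1.0))
```

With six points falling into two cells, the sketch hands only two representatives to the clustering step. Under this assumption the sample size is bounded by the number of occupied cells (or, in an index-based variant, data pages) rather than by the number of objects, which is the source of the speed-up the abstract describes.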
