Lehr- und Forschungseinheit für Datenbanksysteme Datenbanksysteme Database Systems

Diplomarbeit

Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects

Inhalt

Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging and many others. Especially for CAD applications, suitable similarity models and a clear representation of the results can help to reduce the cost of developing and producing new parts by maximizing the reuse of existing parts.

Most of the existing similarity models are based on the paradigm of feature vectors. We adapt four of these models to voxelized 3-D data. Based on the most promising of these four models, we explain how sets of feature vectors can be used for more effective and still efficient similarity search instead of single feature vectors.

We first introduce an intuitive distance measure on sets of feature vectors together with an algorithm for its efficient computation. Furthermore, we present a method for accelerating the processing of similarity queries on vector set data by using a multi-step query processing strategy. The experimental evaluation is based on two real world test data sets provided by the car and aircraft industry and points out that our new similarity approach yields more meaningful results in comparatively short time. This evaluation is based on hierarchical clustering as a new and effective way to analyze and compare similarity models.

Ergebnis

We surveyed four feature transformations that are suitable for the use on voxelized CAD data: the volume model, the solid angle model, the eigen value model and the cover sequence model. The cover sequence model generates a set of covers of a 3-dimensional object that can be stored in a feature vector. In comparison to the other three models it offers a better notion of similarity. A major problem of the cover sequence model is the order in which the covers are stored within the feature vector. For calculating the similarity of two objects the order realizing minimum distance offers a better similarity measure, but is prohibitive in calculation cost. Our new approach to represent an object as a set of feature vectors instead of a single feature vector avoids this problem. Furthermore, it offers a more general approach for applications working with set-valued objects. Then we formally described the distance measure on vector sets we used, called minimal matching distance. Minimal matching distance is a metric and computable in O(k3). To demonstrate how similarity queries can be answered efficiently, we introduced a highly selective filter step that is able to speed up similarity queries by the use of spatial index structures.

To evaluate our system we used two CAD data sets. To demonstrate the good notion of similarity provided by the combination of the cover sequence model and the vector set representation, we introduced an algorithm for hierarchical clustering called OPTICS as a comparably more objective way to examine similarity measures. Since k-nearest neighbor queries are the most common operation in similarity search systems, we evaluated the efficiency of the filter step using 100 sample k-nearest neighbor queries. It turned out that our new approach yields more meaningful results without sacrificing efficiency.

Personen

Bearbeiter Stefan Brecheisen
Betreuer Martin Pfeifle

Downloads

Ausarbeitung (PDF 1.557 KB)
Oberseminar-Präsentation (PDF 802 KB)
Homepages: DBS Institut LMU
21.01.2003 Stefan Brecheisen