Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects
Similarity search in database systems is becoming an increasingly
important task in modern application domains such as multimedia,
molecular biology, medical imaging and many others.
Especially for CAD applications, suitable similarity models and a clear
representation of the results can help to reduce the cost of developing
and producing new parts by maximizing the reuse of existing parts.
Most of the existing similarity models are based on the paradigm of
feature vectors. We adapt four of these models to voxelized 3-D data.
Based on the most promising of these four models, we explain how sets of
feature vectors can be used for more effective and still efficient
similarity search instead of single feature vectors.
We first introduce an intuitive distance measure on sets of feature
vectors together with an algorithm for its efficient computation.
Furthermore, we present a method for accelerating the processing of
similarity queries on vector set data by using a multi-step query
processing strategy. The experimental evaluation is based on two real
world test data sets provided by the car and aircraft industry and points
out that our new similarity approach yields more meaningful results in
comparatively short time. This evaluation is based on hierarchical
clustering as a new and effective way to analyze and compare similarity
We surveyed four feature transformations that are suitable for the use on
voxelized CAD data: the volume model, the solid angle model, the eigen
value model and the cover sequence model.
The cover sequence model generates a set of covers of a 3-dimensional
object that can be stored in a feature vector. In comparison to the
other three models it offers a better notion of similarity. A major
problem of the cover sequence model is the order in which the covers
are stored within the feature vector. For calculating the similarity
of two objects the order realizing minimum distance offers a better
similarity measure, but is prohibitive in calculation cost. Our new
approach to represent an object as a set of feature vectors instead
of a single feature vector avoids this problem. Furthermore, it offers
a more general approach for applications working with set-valued
objects. Then we formally described the distance measure on vector sets
we used, called minimal matching distance. Minimal matching distance
is a metric and computable in O(k3).
To demonstrate how similarity queries can be answered efficiently,
we introduced a highly selective filter step that is able to speed up
similarity queries by the use of spatial index structures.
To evaluate our system we used two CAD data sets. To demonstrate the good
notion of similarity provided by the combination of the cover sequence
model and the vector set representation, we introduced an algorithm for
hierarchical clustering called OPTICS as a comparably more objective way
to examine similarity measures.
Since k-nearest neighbor queries are the most common operation
in similarity search systems, we evaluated the efficiency of the filter
step using 100 sample k-nearest neighbor queries. It turned out that our
new approach yields more meaningful results without sacrificing efficiency.
Ausarbeitung (PDF 1.557 KB)
Oberseminar-Präsentation (PDF 802 KB)
21.01.2003 Stefan Brecheisen