Ludwig-Maximilians-Universität München
University of Munich
Institute for Computer Science
Database and Information Systems

Knowledge Discovery in Large Collections of Complex Objects


Objective

In recent years, Knowledge Discovery in Databases has become an important topic in many application areas. The huge amounts of information held in data warehouses, on the internet, or in the results of gene analyses could not be evaluated adequately without methods for clustering, classification, or association rule mining. Although research has produced various approaches to these problems, there are still areas in which most of the well-established algorithms do not yield satisfactory solutions. For large collections of complex objects, the common approach of transforming objects into descriptive feature vectors does not capture all the information relevant for data mining.

Techniques

Multi Representation Clustering

Data Mining in Multi-Represented and Multi-Instance Objects

Complex objects often consist of several representations, such as images with descriptive text, or the primary, secondary, and 3D structure of a protein. Furthermore, an object may be described by a whole set of instances of the same kind. The goal of this area is to enable data mining algorithms to incorporate all representations and instances of an object, for more precise data mining.
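As a rough illustration of the idea, the following Python sketch models an object with several representations, each holding one or more instance vectors, and combines per-representation distances into a single value. It is not the project's algorithm: the class `MultiRepObject`, the functions `set_distance` and `combined_distance`, the averaged nearest-neighbour set distance, and the weighting scheme are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance between two equally long vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class MultiRepObject:
    # Hypothetical container: representation name -> list of instance vectors.
    # A single-instance representation simply holds a one-element list.
    representations: Dict[str, List[Vector]]


def set_distance(xs: List[Vector], ys: List[Vector]) -> float:
    """Symmetric average of nearest-neighbour distances between two instance sets.

    This is one simple choice of multi-instance distance, used here as an assumption.
    """
    d_xy = sum(min(euclidean(x, y) for y in ys) for x in xs) / len(xs)
    d_yx = sum(min(euclidean(x, y) for x in xs) for y in ys) / len(ys)
    return (d_xy + d_yx) / 2


def combined_distance(a: MultiRepObject, b: MultiRepObject,
                      weights: Dict[str, float]) -> float:
    """Weighted mean of the set distances over all representations both objects share."""
    shared = [name for name in weights
              if name in a.representations and name in b.representations]
    if not shared:
        raise ValueError("objects share no weighted representation")
    total_weight = sum(weights[name] for name in shared)
    weighted_sum = sum(
        weights[name] * set_distance(a.representations[name], b.representations[name])
        for name in shared
    )
    return weighted_sum / total_weight


# Toy objects with a single-instance "text" and a multi-instance "image" representation.
a = MultiRepObject({"text": [(0.0, 0.0)], "image": [(1.0, 0.0), (0.0, 1.0)]})
b = MultiRepObject({"text": [(3.0, 4.0)], "image": [(1.0, 0.0)]})
weights = {"text": 0.5, "image": 0.5}
distance_ab = combined_distance(a, b, weights)
```

A distance of this kind could be plugged into any distance-based clustering or nearest-neighbour classifier, so that no representation or instance has to be discarded beforehand.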


Data Mining in Structured Objects

In recent years, the amount of structured data such as XML documents, protein shell graphs, or website trees has increased continuously. To integrate object structure into data mining, we are researching new data mining algorithms that work with structured objects instead of plain feature vectors.
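To make the contrast with plain feature vectors concrete, the sketch below compares two small XML documents by their structure alone. It collects every root-to-node path of tag names and computes the Jaccard similarity of the two path sets. This is only one simple structural measure, assumed for illustration; the function names `label_paths` and `path_similarity` and the sample documents are hypothetical and do not come from the project.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Path = Tuple[str, ...]


def label_paths(element: ET.Element, prefix: Path = ()) -> Set[Path]:
    """Collect the tag path from the root to every node in the subtree."""
    path = prefix + (element.tag,)
    paths = {path}
    for child in element:
        paths |= label_paths(child, path)
    return paths


def path_similarity(xml_a: str, xml_b: str) -> float:
    """Jaccard similarity of the root-to-node tag paths of two XML documents."""
    paths_a = label_paths(ET.fromstring(xml_a))
    paths_b = label_paths(ET.fromstring(xml_b))
    # The union is never empty because every document has a root element.
    return len(paths_a & paths_b) / len(paths_a | paths_b)


# Two made-up website trees that share the root and one kind of child node.
doc_a = "<site><page><link/></page><page/></site>"
doc_b = "<site><page/><image/></site>"
similarity = path_similarity(doc_a, doc_b)
```

The measure ignores text content and sibling order on purpose, so two documents count as similar when they are nested in the same way, regardless of what their nodes contain.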

Applications


Automatic Classification of Swissprot entries into Gene Ontology

To apply our algorithm to a practical purpose, we developed a prototype that is capable of determining the set of classes a given protein belongs to within the well-known Gene Ontology. As input, we use a combination of the textual description and the amino acid sequence derived from Swissprot (a well-known protein database). Using Swissprot entries that were already mapped to Gene Ontology as a training set, we were able to build a highly accurate classification system that can map new Swissprot entries to Gene Ontology automatically and thus help to determine the function of proteins.
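The general setting, assigning a set of classes to an entry from its text and its sequence, can be sketched in a few lines of Python. The example below is not the prototype described above: it assumes a simple nearest-neighbour vote over a combined feature set of description words and amino acid 3-mers, and the function names, the made-up sequences, and the placeholder labels `class_A` to `class_C` are purely illustrative.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

Example = Tuple[FrozenSet[str], Set[str]]  # (features, set of class labels)


def features(description: str, sequence: str, k: int = 3) -> FrozenSet[str]:
    """Combine both representations: description words plus sequence k-mers."""
    words = {"w:" + word for word in description.lower().split()}
    kmers = {"s:" + sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return frozenset(words | kmers)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_classes(query: FrozenSet[str], training: List[Example],
                    n_neighbors: int = 3, min_votes: int = 2) -> Set[str]:
    """Return every class that at least min_votes of the nearest neighbours carry."""
    ranked = sorted(training, key=lambda item: jaccard(query, item[0]), reverse=True)
    votes = Counter(label for _, labels in ranked[:n_neighbors] for label in labels)
    return {label for label, count in votes.items() if count >= min_votes}


# Made-up toy entries; descriptions, sequences, and labels are placeholders.
training = [
    (features("atp binding kinase", "MKTAYIAKQR"), {"class_A", "class_B"}),
    (features("kinase atp binding protein", "MKTAYIAKQL"), {"class_A"}),
    (features("dna binding factor", "GSHMLEDPVD"), {"class_C"}),
    (features("dna repair factor", "GSHMLEDPAA"), {"class_C"}),
]
query = features("putative atp binding kinase", "MKTAYIAKQK")
predicted = predict_classes(query, training, n_neighbors=2, min_votes=2)
```

Because a protein can belong to several classes at once, the function returns a set of labels rather than a single best class.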



Publications

  List of Papers

Project leader

Prof. Dr. Hans-Peter Kriegel

Team


For problems or suggestions, please contact: wwwmaster@dbs.informatik.uni-muenchen.de