Institut für Informatik
Lehr- und Forschungseinheit für Datenbanksysteme
Institute for Computer Science
Database and Information Systems
In recent years, Knowledge Discovery in Databases has become an important topic in many application areas. Huge amounts of information in data warehouses, on the internet, or in the results of gene analyses could not be evaluated adequately without methods for clustering, classification, or association rule mining. Although research has offered various approaches to these problems, there are still areas in which most of the well-established algorithms do not yield proper solutions. For large collections of complex objects, the common approach of transforming objects into descriptive feature vectors does not capture all information relevant for data mining.
Data Mining in Multi-Represented and Multi-Instance Objects
Complex objects often consist of several representations, such as images with descriptive text, or the primary, secondary, and 3D structure of a protein. Furthermore, a single object may be described by a whole set of instances of the same kind. The goal of this area is to enable data mining algorithms to incorporate all representations and instances of an object for more precise data mining.
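As a rough illustration of the idea, the following minimal sketch models an object that has several representations, each holding a set of instances, and compares two such objects. The `MultiObject` class, the function names, and the chosen set distance (a symmetric mean of nearest-neighbour distances, averaged over shared representations) are assumptions made for this example and are not necessarily the methods used in our research.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class MultiObject:
    """Hypothetical container: representation name -> set of instance vectors."""
    representations: Dict[str, List[Vector]] = field(default_factory=dict)


def set_distance(a: List[Vector], b: List[Vector]) -> float:
    """Symmetric mean of nearest-neighbour distances between two instance sets."""
    a_to_b = sum(min(dist(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(min(dist(x, y) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


def object_distance(p: MultiObject, q: MultiObject) -> float:
    """Average the set distances over the representations both objects share."""
    shared = p.representations.keys() & q.representations.keys()
    if not shared:
        raise ValueError("objects share no representation")
    total = sum(
        set_distance(p.representations[name], q.representations[name])
        for name in shared
    )
    return total / len(shared)


# Toy data, made up for illustration only.
p = MultiObject({"image": [(0.0, 0.0), (1.0, 0.0)], "text": [(1.0, 1.0)]})
q = MultiObject({"image": [(0.0, 0.0)], "text": [(1.0, 1.0)]})
print(object_distance(p, q))
```

A distance of this kind can be plugged into any distance-based clustering or classification method, so that all representations and instances contribute to the result instead of a single feature vector.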
Data Mining in Structured Objects
In recent years, the amount of structured data such as XML documents, protein shell graphs, or website trees has increased continuously. In order to integrate object structure into data mining, we are researching new data mining algorithms that work with structured objects instead of plain feature vectors.
Automatic Classification of Swissprot Entries into Gene Ontology
To apply our algorithm to a practical purpose, we developed a prototype that is capable of determining the set of classes to which a given protein belongs within the well-known Gene Ontology. As input, we use a combination of the textual description and the amino acid sequence derived from Swissprot (a well-known protein database). Using Swissprot entries that were already mapped to Gene Ontology as the training set, we were able to build a highly accurate classification system that can map new Swissprot entries to Gene Ontology automatically and can thus help to determine the function of proteins.
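The general setup described above, combining text and sequence input to predict a set of classes, can be sketched as follows. This is only a toy illustration under stated assumptions: the word and k-mer features, the cosine similarity, the 1-nearest-neighbour rule, and the placeholder labels `term_A`, `term_B`, `term_C` are all chosen for the example and do not describe the actual prototype or real Gene Ontology terms.

```python
from collections import Counter
from math import sqrt


def features(description: str, sequence: str, k: int = 2) -> Counter:
    """Combine word counts from the text with k-mer counts from the sequence."""
    feats = Counter("w:" + word for word in description.lower().split())
    feats.update("s:" + sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b[key] for key, value in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict(train, description: str, sequence: str) -> set:
    """Return the class set of the most similar training entry (1-nearest neighbour)."""
    query = features(description, sequence)
    _, classes = max(train, key=lambda entry: cosine(query, entry[0]))
    return classes


# Toy training entries, made up for illustration; labels are placeholders.
train = [
    (features("atp binding kinase", "MKKLAT"), {"term_A", "term_B"}),
    (features("membrane transport protein", "GGAVLL"), {"term_C"}),
]
print(sorted(predict(train, "putative kinase", "MKKLSS")))
```

Because each training entry carries a set of classes, the prediction is naturally multi-label: a new entry can be assigned to several classes at once.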