Data Mining in Bioinformatics

Ludwig-Maximilians-Universität München
Institut für Informatik
Lehr- und Forschungseinheit für Datenbanksysteme

University of Munich
Institute for Computer Science
Database and Information Systems



Objective

We develop, apply and analyze data mining techniques for tackling problems in bioinformatics. Our main interests are classification and clustering algorithms for protein and microarray data analysis.

Projects

	Microarray time series classification: We use kernel methods to classify microarray time series data. Classifying gene expression time series has many potential applications in medicine and pharmacogenomics, such as disease diagnosis, drug response prediction and disease outcome prognosis, and thereby contributes to individualized medical treatment.
	Protein function prediction: We have designed graph representations of proteins that integrate sequence, structure and biochemical information, and we have applied graph kernels to these models for protein function prediction. Future work will aim at designing faster and more expressive graph kernels and at exploring new approaches to protein function prediction.
	Subspace Clustering: Searching for clusters in the full feature space of high-dimensional data is usually futile. However, high-dimensional data may cluster differently in different subspaces of the feature space. Subspace clustering aims at finding all subspaces of high-dimensional data in which clusters exist.
	RIS: A method for finding all subspaces of high-dimensional data that contain density-based clusters.
	Retrieval of Feature Graphs and Activity Maps: Potential docking sites, represented by feature graphs and activity maps, are retrieved from a 3D protein database to provide an efficient filter step for one-to-many protein docking prediction.
	3D Shape Similarity Search in Biomolecular Databases: Molecules with a similar 3D shape are retrieved from a 3D protein database using a similarity model based on 3D shape histograms.
	Similarity Search for 3D Surface Segments: As part of protein-protein docking prediction, we perform a similarity search on 3D surface segments. Parametric surface functions, including paraboloids and trigonometric polynomials, are used to approximate the surface segments.
	Histogram-Based Shape Similarity: Sector, shell and web histogram models.
	k-Nearest Neighbor Classification: Whereas performance is a serious problem for many k-NN classifiers, our query processor supports this data mining technique efficiently.
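
To illustrate the shell histogram idea named in the project list above, here is a minimal sketch in Python. It is not the group's implementation. It assumes that a molecule is given as a plain list of 3D points, that the space around the centroid is split into equally wide concentric shells, and that two histograms are compared with the Euclidean distance. The names shell_histogram and histogram_distance and the parameters num_shells and max_radius are hypothetical and chosen only for this example.

```python
import math


def shell_histogram(points, num_shells, max_radius):
    """Return the fraction of points falling into each concentric shell.

    Shells are centered at the centroid of the point set and have equal
    width max_radius / num_shells (an assumption made for this sketch).
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n

    counts = [0] * num_shells
    for x, y, z in points:
        r = math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2)
        # Points at or beyond max_radius are put into the outermost shell.
        idx = min(int(r / max_radius * num_shells), num_shells - 1)
        counts[idx] += 1
    return [c / n for c in counts]


def histogram_distance(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


if __name__ == "__main__":
    # Toy "molecules": a cube, the same cube rotated by 90 degrees, and a bar.
    cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    rotated = [(-y, x, z) for x, y, z in cube]
    bar = [(x, y, z) for x in (-3, 3) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]

    h_cube = shell_histogram(cube, num_shells=4, max_radius=4.0)
    print(histogram_distance(h_cube, shell_histogram(rotated, 4, 4.0)))
    print(histogram_distance(h_cube, shell_histogram(bar, 4, 4.0)))
```

Because a shell histogram depends only on distances from the centroid, it does not change when the point set is rotated, so the cube and its rotated copy receive identical histograms in this sketch. Sector and web models additionally partition the space by direction and are not covered here.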
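
For readers unfamiliar with k-nearest neighbor classification, the following sketch shows the basic technique as a brute-force linear scan in Python. It is only a baseline and does not reproduce the query processor mentioned in the project list: every query compares against all training objects, which is exactly the performance problem that index-supported query processing is meant to avoid. The function name knn_classify and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def knn_classify(query, training, k):
    """Classify query by majority vote among its k nearest training objects.

    training is a list of (feature_vector, label) pairs. Distances are
    Euclidean and are computed against every training object (linear scan).
    """
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Two well-separated toy classes in a 2D feature space.
    training = [
        ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
        ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
    ]
    print(knn_classify((0.5, 0.5), training, k=3))
    print(knn_classify((5.5, 5.5), training, k=3))
```

The scan costs one distance computation per training object and query, so the cost grows linearly with the size of the database.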

Funding

Current Funding

Bioinformatics for the Functional Analysis of Mammalian Genomes (BFAM)
We currently take part in BFAM, a joint project with several partners from universities and industry. The project is funded by the BMBF (German Ministry for Education, Science, Research and Technology).

Past Funding

Development of efficient methods for the one-to-many protein docking problem taking local flexibility into account (Entwicklung von effizienten Methoden für das 1:n-Docking von Proteinen unter Berücksichtigung lokaler Flexibilität). This joint project was funded by the DFG (Deutsche Forschungsgemeinschaft, German Research Foundation) within the Priority Program "Informatikmethoden zur Analyse und Interpretation großer genomischer Datenmengen" (Computer science methods for the analysis and interpretation of large volumes of genomic data). Partners were Prof. Dr. Dietmar Schomburg (University of Cologne, Institute of Biochemistry; Coordinator) and Prof. Dr. Gerhard Sagerer (University of Bielefeld, Applied Computer Science).
Biomolecular Interactions of Proteins (BIOWEPRO). This joint project was funded by the BMBF (German Ministry for Education, Science, Research and Technology) within the Strategy Concept Molecular Bioinformatics from 1993 to 1997. Partners were Prof. Dr. Dietmar Schomburg (GBF Braunschweig; Coordinator), Prof. Dr. Gerhard Sagerer (University of Bielefeld) and Dr. D. Mario Soumpasis (MPI Göttingen).