Supervised Projects



  • Advanced Project: Designing and Evaluating a Relational Database for a Publication Search Engine in Intranets.

    • Problem Description: Like all research groups, the database group at the University of Munich has published quite a number of papers in recent years. This information is an important part of our presence on the web and receives many page hits every day, both from group members on the intranet and from visitors on the internet. Currently, the information is stored in a flat file with a rather primitive query interface and no update/insert functionality apart from editing the file in a simple text editor.
    • Solution: In this project, we design, implement and evaluate a relational database for the publications. The easy-to-use web interface provides a powerful query engine and a convenient update/insert interface. Technically, we use a 3-tier architecture: a MySQL database engine running under Linux stores the publication data, and the Apache web server hosts the static HTML pages as well as CGI-based Java programs that communicate with the database engine through JDBC.
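To make the 3-tier design more concrete, here is a minimal sketch of the data tier together with the insert and query logic that the CGI programs would send to the database. It is written in Python, with the standard-library `sqlite3` module standing in for MySQL and JDBC. The table and column names (`publication`, `author`, `authorship`) and the helpers `insert_publication` and `search` are hypothetical; the project description does not specify the actual schema.

```python
import sqlite3

# Hypothetical schema (not from the project description). SQLite stands in
# for the MySQL engine; the Java/JDBC tier would issue equivalent SQL.
SCHEMA = """
CREATE TABLE publication (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year  INTEGER NOT NULL,
    venue TEXT
);
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Many-to-many link between publications and authors, keeping author order.
CREATE TABLE authorship (
    publication_id INTEGER REFERENCES publication(id),
    author_id      INTEGER REFERENCES author(id),
    position       INTEGER NOT NULL,
    PRIMARY KEY (publication_id, author_id)
);
"""


def insert_publication(conn, title, year, venue, authors):
    """Insert one publication and its ordered author list."""
    cur = conn.execute(
        "INSERT INTO publication (title, year, venue) VALUES (?, ?, ?)",
        (title, year, venue))
    pub_id = cur.lastrowid
    for pos, name in enumerate(authors, start=1):
        # Reuse an existing author row if the name is already known.
        conn.execute("INSERT OR IGNORE INTO author (name) VALUES (?)", (name,))
        (author_id,) = conn.execute(
            "SELECT id FROM author WHERE name = ?", (name,)).fetchone()
        conn.execute(
            "INSERT INTO authorship (publication_id, author_id, position) "
            "VALUES (?, ?, ?)", (pub_id, author_id, pos))
    conn.commit()
    return pub_id


def search(conn, author=None, year_from=None, keyword=None):
    """Combine optional search criteria into one parameterized query."""
    sql = ("SELECT DISTINCT p.title, p.year FROM publication p "
           "JOIN authorship a ON a.publication_id = p.id "
           "JOIN author au ON au.id = a.author_id WHERE 1 = 1")
    params = []
    if author is not None:
        sql += " AND au.name = ?"
        params.append(author)
    if year_from is not None:
        sql += " AND p.year >= ?"
        params.append(year_from)
    if keyword is not None:
        sql += " AND p.title LIKE ?"
        params.append(f"%{keyword}%")
    sql += " ORDER BY p.year DESC, p.title"
    return conn.execute(sql, params).fetchall()
```

Binding user input as query parameters, rather than splicing it into the SQL string, keeps a web-facing search form safe from SQL injection.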

  • Advanced Project: Speeding up Hierarchical Clustering using Data Compression.

    • Problem Description: Clustering algorithms are an important data mining method. They identify groups in the data such that the data objects within a group are as similar to each other as possible, while data objects from different groups are as different as possible (data segmentation). In flat clustering algorithms all groups lie on one level, whereas in hierarchical methods the groups can be nested inside each other (e.g. a large group of customers buying fiction books may contain two smaller groups: customers buying mostly science fiction and customers buying mostly detective stories). The price for the high quality of the results that hierarchical methods provide is a runtime that usually grows at least quadratically with the number of objects to segment, which makes these algorithms impractical for very large databases.
    • Solution: In order to make hierarchical clustering methods applicable to very large databases, we investigate combining the clustering algorithm with a data compression algorithm. This reduces the runtime dramatically, but incurs a loss in the accuracy (quality) of the result. We therefore extend the clustering method to make maximal use of the information contained in the compressed data items, thus keeping the loss of quality as small as possible.
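The following sketch illustrates the general idea under explicit assumptions. Points are compressed into per-grid-cell summaries (count, linear sum, sum of squares, in the style of BIRCH clustering features), and a naive single-link agglomerative algorithm then runs on the summaries instead of the raw points. The grid compression, the helper names, and `cf_distance` are illustrative choices, not the method developed in the project. `cf_distance` subtracts each summary's spread from the centroid distance, so the clustering uses more of the compressed information than just the centroids.

```python
import math


def compress(points, cell_size):
    """Summarize the points in each grid cell as (n, linear sum, sum of squares).

    Grid-based compression is an assumption; the project text does not name
    the compression method it uses.
    """
    cells = {}
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        n, ls, ss = cells.get(key, (0, [0.0] * len(p), 0.0))
        cells[key] = (n + 1, [a + b for a, b in zip(ls, p)],
                      ss + sum(x * x for x in p))
    return list(cells.values())


def centroid(cf):
    n, ls, _ = cf
    return [x / n for x in ls]


def extent(cf):
    """Root-mean-square distance of the summarized points from their centroid."""
    n, _, ss = cf
    c = centroid(cf)
    return math.sqrt(max(0.0, ss / n - sum(x * x for x in c)))


def cf_distance(a, b):
    """Illustrative distance: centroid distance minus both extents, floored at 0."""
    return max(0.0, math.dist(centroid(a), centroid(b)) - extent(a) - extent(b))


def single_link(items, k):
    """Naive single-link agglomerative clustering of compressed items.

    Merges the two closest clusters until k remain; recording each merge
    would yield the full hierarchy. Every step scans all pairs, so the cost
    depends on the number of summaries rather than the number of raw points.
    """
    clusters = [[it] for it in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(cf_distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters
```

The trade-off described above is visible in the `cell_size` parameter. Larger cells give fewer summaries and a faster clustering step, but they also discard more detail about where the individual points lie.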

  • Advanced Project: Data Compression in Java.

    • Problem Description: This project builds on the results of the project "Speeding up Hierarchical Clustering using Data Compression" discussed above. The data compression method we used there is part of a large software distribution implemented in C/C++, whereas our data mining software is mostly implemented in Java, which causes many data conversion and code integration problems.
    • Solution: The data compression algorithm is extracted from the large system, ported to Java and integrated into our existing system.

  • Advanced Project: Next Generation Sampling: Recovering Lost Information.

    • Problem Description: available soon...
    • Solution:

  • Advanced Project: Evolving Optimal Cluster Descriptions Using Genetic Algorithms.

    • Problem Description: Clustering algorithms are an important data mining method. They identify groups in the data such that the data objects within a group are as similar to each other as possible, while data objects from different groups are as different as possible (data segmentation). The result of many clustering algorithms is, for each group, the set of objects belonging to it. However, such a set of objects is hard for a human data analyst to analyze further, so concise and easy-to-understand descriptions of the clusters are needed. This is a hard problem, as clusters can often be of arbitrary shape, e.g. they may contain holes.
    • Solution: In this project we apply the search strategy pioneered by genetic algorithms to compute descriptions of clusters from the sets of objects belonging to them. Genetic algorithms mimic natural evolutionary processes such as crossover and mutation. They start with a large number of "individuals" (cluster descriptions) and assign each one a "fitness" value representing how well the description fits the clusters. Then the best individuals are allowed to propagate and create offspring for further generations. After a number of generations, the best individual describes the clusters very well.
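Here is a minimal sketch of such a genetic search. It assumes a hypothetical description language in which a cluster is described by a union of axis-parallel boxes in two dimensions. Fitness is the F1 score for covering the cluster's objects while excluding all others. Selection keeps the fitter half of the population, crossover exchanges boxes between two parents, and mutation jitters box coordinates. All names and parameters (`evolve`, `n_boxes`, `pop_size`, `generations`) are illustrative assumptions, not the project's actual encoding.

```python
import random

# Hypothetical description language (not from the project): a cluster is
# described by a union of axis-parallel 2-D boxes (xmin, xmax, ymin, ymax).


def covers(desc, p):
    """True if point p lies inside at least one box of the description."""
    return any(x0 <= p[0] <= x1 and y0 <= p[1] <= y1 for x0, x1, y0, y1 in desc)


def fitness(desc, inside, outside):
    """F1 score: cover the cluster's objects (inside), exclude the rest (outside)."""
    tp = sum(covers(desc, p) for p in inside)
    fp = sum(covers(desc, p) for p in outside)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(inside)
    return 2 * precision * recall / (precision + recall)


def random_box(rng, lo, hi):
    x0, x1 = sorted(rng.uniform(lo, hi) for _ in range(2))
    y0, y1 = sorted(rng.uniform(lo, hi) for _ in range(2))
    return (x0, x1, y0, y1)


def crossover(rng, a, b):
    """The child inherits each box position from one of its two parents."""
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(rng, desc, step, rate=0.2):
    """Jitter the coordinates of some boxes with Gaussian noise."""
    out = []
    for box in desc:
        if rng.random() < rate:
            x0, x1, y0, y1 = (v + rng.gauss(0, step) for v in box)
            box = (min(x0, x1), max(x0, x1), min(y0, y1), max(y0, y1))
        out.append(box)
    return out


def evolve(inside, outside, n_boxes=2, pop_size=60, generations=100,
           lo=0.0, hi=10.0, seed=0):
    rng = random.Random(seed)

    def score(desc):
        return fitness(desc, inside, outside)

    pop = [[random_box(rng, lo, hi) for _ in range(n_boxes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = [mutate(rng, crossover(rng, *rng.sample(parents, 2)), step=0.5)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=score)
```

Because the fitter half always survives, the best fitness never decreases from one generation to the next. A union of several boxes can also approximate non-convex shapes, including a region around a hole.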

  • Diploma Thesis: Incremental Hierarchical Clustering.

    • Problem Description: Many databases used for knowledge discovery are dynamic, i.e. new objects are added and old ones are deleted. One example is web log data: every time a user accesses a web page, a new page hit is recorded and added to the database. As these databases usually grow very large, re-running a clustering algorithm on such a database from scratch becomes more and more expensive.
    • Solution: The idea of incremental algorithms is to re-use the information from the last run of the algorithm and simply update it with the new information added in the meantime. In this project, our hierarchical clustering algorithm OPTICS is extended into such an incremental version.
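The key observation behind an incremental version can be sketched for one ingredient of OPTICS, the core-distance. Inserting or deleting an object only changes the eps-neighborhoods of objects within distance eps of it, so only their core-distances need to be recomputed. The class below is a hypothetical illustration of this idea. It uses linear scans instead of a spatial index, and it does not update the reachability distances or the cluster ordering, which a full incremental OPTICS would also have to repair.

```python
import math


class IncrementalCoreDistances:
    """Hypothetical sketch: maintain OPTICS core-distances under updates.

    Only core-distances are maintained; reachability distances and the
    cluster ordering are out of scope. Linear scans replace a spatial index.
    """

    def __init__(self, eps, min_pts):
        self.eps, self.min_pts = eps, min_pts
        self.points = {}  # object id -> coordinates
        self.core = {}    # object id -> core-distance, or None if undefined

    def _neighbors(self, coords):
        """Ids of all stored objects within eps of the given location."""
        return [q for q, c in self.points.items()
                if math.dist(c, coords) <= self.eps]

    def _core_distance(self, pid):
        p = self.points[pid]
        dists = sorted(math.dist(p, self.points[q]) for q in self._neighbors(p))
        # The point itself (distance 0) counts toward min_pts here;
        # conventions differ between descriptions of OPTICS.
        return dists[self.min_pts - 1] if len(dists) >= self.min_pts else None

    def _refresh(self, coords):
        """Recompute core-distances only for objects within eps of coords."""
        affected = self._neighbors(coords)
        for q in affected:
            self.core[q] = self._core_distance(q)
        return affected

    def insert(self, pid, coords):
        self.points[pid] = coords
        return self._refresh(coords)

    def delete(self, pid):
        coords = self.points.pop(pid)
        self.core.pop(pid, None)
        return self._refresh(coords)
```

For example, inserting an object far away from all others only touches that object, while an insertion next to existing objects refreshes just their core-distances.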


