Data Mining for High Dimensional Dynamic Data

Motivation

Modern data pose new challenges for data mining because of their special characteristics. One of these is high dimensionality: an object may be described by a large number of attributes, and these attributes may be correlated or overlapping. Another is a high degree of variability: the data evolve over time as new records are inserted and old records are removed. A particularly interesting category of dynamic data is stream data, which flow continuously in and out of systems at high speed (e.g. Internet traffic data, sensor data, position tracking data); it is usually impossible to store them all or to scan them multiple times. This project considers both of these aspects of modern data.

Application scenario

A concrete application addressed at the LFE DBS and in this project is the analysis of data derived from tracking wild animals in a wildlife sanctuary. Each animal carries a sensor that measures position (GPS), health parameters such as body temperature, and environmental parameters such as air temperature and humidity. The rangers of the sanctuary are particularly interested in applying data mining techniques to this stream of sensor data in order to identify groups of animals with similar parameter values. Each group may require special actions (rescue, feeding, etc.), depending on its measurement values.

[Image from mapsworldwide.com]

Goal

This project is about data mining in high dimensional dynamic data. To address the high dimensionality, we draw on subspace clustering, which aims at finding clusters in different subspaces of the original feature space. To address the high degree of variability, we rely on stream mining, in particular on clustering and evolution monitoring over data streams.
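
As a rough illustration of how these two ingredients could meet, the following Java sketch keeps a one-pass summary of a cluster (count, per-dimension sum and sum of squares). The summary can be updated as records arrive or leave, and it reports the dimensions in which the cluster is tight. The class name ClusterSummary, its methods, the variance threshold and the sample readings are assumptions made for this example; this is not the project's method.

```java
import java.util.ArrayList;
import java.util.List;

/** One-pass cluster summary: count plus per-dimension linear and squared sums. */
class ClusterSummary {
    private final double[] linearSum;
    private final double[] squaredSum;
    private long count;

    ClusterSummary(int dimensions) {
        linearSum = new double[dimensions];
        squaredSum = new double[dimensions];
    }

    /** Absorbs one record; the record itself need not be stored afterwards. */
    void insert(double[] record) {
        checkLength(record);
        for (int d = 0; d < record.length; d++) {
            linearSum[d] += record[d];
            squaredSum[d] += record[d] * record[d];
        }
        count++;
    }

    /** Undoes the effect of a record that was inserted earlier. */
    void remove(double[] record) {
        checkLength(record);
        if (count == 0) {
            throw new IllegalStateException("summary is empty");
        }
        for (int d = 0; d < record.length; d++) {
            linearSum[d] -= record[d];
            squaredSum[d] -= record[d] * record[d];
        }
        count--;
    }

    long count() {
        return count;
    }

    double mean(int d) {
        if (count == 0) {
            throw new IllegalStateException("summary is empty");
        }
        return linearSum[d] / count;
    }

    /** Population variance in dimension d, clamped at 0 against rounding error. */
    double variance(int d) {
        double m = mean(d);
        return Math.max(0.0, squaredSum[d] / count - m * m);
    }

    /** Dimensions with variance <= maxVariance (a hypothetical relevance criterion). */
    List<Integer> relevantDimensions(double maxVariance) {
        List<Integer> relevant = new ArrayList<>();
        for (int d = 0; d < linearSum.length; d++) {
            if (variance(d) <= maxVariance) {
                relevant.add(d);
            }
        }
        return relevant;
    }

    private void checkLength(double[] record) {
        if (record.length != linearSum.length) {
            throw new IllegalArgumentException("unexpected number of dimensions");
        }
    }

    public static void main(String[] args) {
        // Made-up readings: {body temperature, air temperature}
        ClusterSummary group = new ClusterSummary(2);
        group.insert(new double[] {38.5, 12.0});
        group.insert(new double[] {38.6, 25.0});
        group.insert(new double[] {38.4, 3.0});
        System.out.println("Tight dimensions: " + group.relevantDimensions(0.5));
    }
}
```

Because only sums are kept, memory use does not grow with the number of records seen, which is the property a stream setting needs. A real subspace clustering method would use a more careful criterion than a fixed variance threshold.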

[Figure: Subspace clustering example]

[Figure: Cluster monitoring example]

Open issues (project, diploma, bachelor's, master's thesis)

As part of our future work, new techniques for mining and monitoring in the environment described above will be developed. In particular, we are interested in:

  • Study and development of methods for incremental update of subspace clusters
  • Study and development of methods for subspace clustering over data streams
  • Modeling, detecting and monitoring subspace cluster changes (e.g. new cluster, cluster drift, cluster shift) in order to gain insights into the evolution of the population
  • Summarizing cluster changes over an evolving population
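
One simple way to make the monitoring task concrete is to compare the clusters found at two consecutive time points by their member overlap. The Java sketch below treats a cluster as a set of object ids (for example animal ids). It matches each current cluster to the old cluster with the highest Jaccard overlap and reports a current cluster as new when no old cluster reaches a minimum overlap. The class name ClusterTransitions, the 0.5 threshold and the ids are assumptions for illustration; drift and shift of a cluster within the feature space are not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Matches clusters of two time points by the overlap of their members. */
class ClusterTransitions {

    /** Jaccard overlap |a ∩ b| / |a ∪ b|; defined as 0 if both sets are empty. */
    static double jaccard(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    /**
     * Returns the index of the old cluster with the highest overlap with
     * current, or -1 if no old cluster reaches minOverlap (a "new" cluster).
     */
    static int bestMatch(Set<Integer> current, List<Set<Integer>> oldClusters,
                         double minOverlap) {
        int best = -1;
        double bestOverlap = 0.0;
        for (int i = 0; i < oldClusters.size(); i++) {
            double overlap = jaccard(current, oldClusters.get(i));
            if (overlap >= minOverlap && (best == -1 || overlap > bestOverlap)) {
                best = i;
                bestOverlap = overlap;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up clusterings at two time points, given as sets of object ids.
        List<Set<Integer>> oldClusters = new ArrayList<>();
        oldClusters.add(new HashSet<>(Arrays.asList(1, 2, 3)));
        oldClusters.add(new HashSet<>(Arrays.asList(4, 5)));

        List<Set<Integer>> currentClusters = new ArrayList<>();
        currentClusters.add(new HashSet<>(Arrays.asList(1, 2, 3, 6)));
        currentClusters.add(new HashSet<>(Arrays.asList(7, 8)));

        for (Set<Integer> current : currentClusters) {
            int match = bestMatch(current, oldClusters, 0.5);
            String label = match == -1 ? "new cluster" : "continues old cluster " + match;
            System.out.println(current + ": " + label);
        }
    }
}
```

Matching by members fits the tracking scenario, where each object keeps a stable id over time. For anonymous stream records, a comparison based on cluster summaries would be needed instead.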

Requirements

Good programming skills in Java
Knowledge of KDD concepts (e.g. clustering, classification)
Ability to work independently

Contacts

If you are interested in this topic or have further questions, please contact:
Irene Ntoutsi, Arthur Zimek, Peer Kroger
