
Mining A Billion Notes

(Project; Bachelor's, Diploma or Master's thesis; or working student / student research assistant position)


Motivation

Hundreds of thousands of music scores are being digitized by libraries all over the world. Unlike books, however, they generally remain inaccessible to content-based retrieval and algorithmic analysis. There is no Google Books equivalent for music scores, and no large corpora of symbolic music data exist that could empower musicology the way large text corpora are empowering computational linguistics, sociology, history and other disciplines whose major source of evidence about their research subjects is the printed word.

We want to change that.


We have developed Peachnote, the first music score search engine and analysis platform of its kind. It lets users search for music inside scanned scores. We are currently working with scores from the Petrucci Music Library, the Wikipedia of music scores.

Google recently supported this work with a Google Research Award.


Tens of thousands of people from more than 150 countries used the search engine in the first two months of its existence. But this is just the beginning.

The current version supports only simple searches, and the interface is rather modest. There are many ways we can improve the system and make it more functional, convenient and fun to use. The data set behind the search engine holds insights that were previously available only to trained musicologists and experienced musicians, if at all. Using data mining, we can now make these insights available to a much wider audience.

Which other pieces contain the “Für Elise” motif? Which composers influenced Wagner? What are the characteristic patterns in Schumann’s music? Who else could have composed a given passage? How do melodies and rhythms spread over time and space? Where did changes in musical style originate, and how did they come about?

Finding answers to such questions and sharing them with the public is the goal of this project.


Tasks

  • Discovering previously unknown facts about the evolution of music over time and space
  • Identifying similar passages in music scores and making the results accessible to a wide audience through the search engine interface
  • Incorporating information from query logs into the music similarity model to improve the relevance of search results
  • Advancing the state of the art in optical music recognition (OMR) using data mining algorithms and large data sets
  • Estimating the quality of OMR output in the absence of ground-truth data, using statistics collected over large data sets
  • Acquiring and indexing all major score collections publicly available on the Internet
  • And more

Requirements

Good Java programming skills are a plus, since you may want to build on and extend the current code base, which is written mostly in Java (Hadoop on the back end, GWT on the front end). If you prefer other languages, that’s also fine.

Relevant links

http://www.peachnote.com

http://imslp.org/wiki

http://ngrams.googlelabs.com/

http://sappingattention.blogspot.com/2011/04/age-cohort-and-vocabulary-use.html

Contact

If you are interested or have any questions, please contact Vladimir Viro, Tobias Emrich or Andreas Züfle in German, English or Russian.
