In this paper we give an overview of four algorithms that we have developed for pattern matching, pattern discovery and data compression in multidimensional datasets. We show that these algorithms can fruitfully be used for processing musical data. In particular, we show that our algorithms can discover instances of perceptually significant musical repetition that cannot be found using previous approaches. We also describe results that suggest the possibility of using our data-compression algorithm for modelling expert motivic-thematic music analysis.
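The abstract does not name or specify the four algorithms. Point-set pattern discovery of this kind is often built on translation-vector bookkeeping: for every vector v between two points, collect the points p for which p + v is also in the dataset. The sketch below shows only that idea. It assumes a 2-D (onset time, pitch) encoding and uses a hypothetical function name. It is not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]   # assumed encoding: (onset time, pitch)
Vector = Tuple[int, int]


def maximal_translatable_patterns(points: List[Point]) -> Dict[Vector, List[Point]]:
    """For every translation vector v between two dataset points, return the
    points p such that p + v is also in the dataset (hypothetical helper)."""
    pts = sorted(set(points))  # lexicographic order, duplicates removed
    patterns: Dict[Vector, List[Point]] = defaultdict(list)
    # Pair each point only with later points, so every vector is stored
    # once, in its forward orientation. This takes O(n^2) time.
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            v = (q[0] - p[0], q[1] - p[1])
            patterns[v].append(p)  # p is appended in sorted order
    return dict(patterns)


if __name__ == "__main__":
    # A three-note figure repeated four time units later at the same pitch.
    data = [(0, 60), (1, 62), (2, 64), (4, 60), (5, 62), (6, 64)]
    mtps = maximal_translatable_patterns(data)
    print(mtps[(4, 0)])  # the figure that recurs under translation (4, 0)
```

For the vector (4, 0), the collected pattern is exactly the three-note figure that is repeated four time units later. Because the encoding is geometric, the same bookkeeping also finds transposed repetitions, where the pitch component of v is nonzero.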
The importance of repetition in music is well known. In this paper, we study musical repetition in the context of effective and efficient automatic genre classification in large-scale music databases. We aim to improve access to and organization of pieces of music in Digital Libraries by allowing entire collections to be categorized automatically on the basis of their musical content alone. We also make publicly available a set of genre-specific patterns to support research in musicology. The patterns can be used, for instance, to explore and analyze the relations between musical genres. There are many existing algorithms that could be used to identify and extract repeating patterns in symbolically encoded music. In our case, the extracted patterns are used as representations of the pieces of music in the underlying corpus and, subsequently, to train and evaluate a classifier that automatically identifies genres. We apply two very fast algorithms, which enable us to experiment on large and diverse corpora. Thus, we are able to find patterns with strong discriminative power that can be used in various applications. We carried out experiments on a corpus containing over 40,000 MIDI files, each annotated with at least one genre. The experiments suggest that our approach is scalable and capable of dealing with real-world-size music collections.
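The abstract names neither the two pattern-extraction algorithms nor the classifier. As an illustration of the general pipeline (repeating patterns become a representation of each piece, which can then feed a classifier), here is a minimal sketch. It makes several assumptions. Each piece is a single line of MIDI pitch numbers. A "repeating pattern" is an n-gram of pitch intervals that occurs at least twice within the piece. Each piece becomes a binary vector over all patterns found in the corpus. The function names and the defaults `n=3` and `min_count=2` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[int, ...]


def intervals(pitches: Sequence[int]) -> List[int]:
    # Transposition-invariant encoding: successive differences in semitones.
    return [b - a for a, b in zip(pitches, pitches[1:])]


def repeating_patterns(pitches: Sequence[int], n: int = 3, min_count: int = 2) -> Set[Pattern]:
    """Interval n-grams occurring at least `min_count` times in one piece (hypothetical)."""
    ivs = intervals(pitches)
    counts = Counter(tuple(ivs[i:i + n]) for i in range(len(ivs) - n + 1))
    return {p for p, c in counts.items() if c >= min_count}


def feature_vectors(corpus: Dict[str, Sequence[int]], n: int = 3) -> Tuple[List[Pattern], Dict[str, List[int]]]:
    """Represent each piece as a 0/1 vector over the union of all repeating patterns."""
    per_piece = {name: repeating_patterns(p, n) for name, p in corpus.items()}
    vocab = sorted(set().union(*per_piece.values()))
    vectors = {
        name: [1 if pat in pats else 0 for pat in vocab]
        for name, pats in per_piece.items()
    }
    return vocab, vectors


if __name__ == "__main__":
    corpus = {"a": [60, 62, 64, 60, 62, 64, 60], "b": [60, 60, 60, 60, 60]}
    vocab, vecs = feature_vectors(corpus)
    print(vocab, vecs)  # the vectors can be passed to any off-the-shelf classifier
```

Encoding melodies as intervals rather than absolute pitches makes a pattern match its transpositions. Whether that suits genre classification is a design choice the sketch only assumes.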
We introduce fast filtering methods for content-based music retrieval problems in which the music is modeled as a set of points in the Euclidean plane, formed by (onset time, pitch) pairs. The filters exploit a precomputed index of the database. Their running time depends on the query length and on the intermediate output sizes of the filters, and is almost independent of the database size. With a quadratic-size index, the filters are provably lossless for general point sets of this kind. In the context of music, the search space can be narrowed down, which enables the use of a linear-size index for effective and efficient lossless filtering. For the checking phase, which dominates the overall running time, we exploit previously designed algorithms suitable for local checking. In our experiments on a music database, our best filter-based methods performed several orders of magnitude faster than the previously designed solutions.
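The abstract does not detail the filters or the index. The following is a simplified filter-then-check sketch for exact matching under translation. It assumes a quadratic-size index that maps every difference vector between two database points to the earlier point of each pair. The filter looks up the vector between the first two query points to obtain candidate translations. The check phase then verifies each candidate against the whole query. The sketch is lossless for general point sets because any true occurrence contributes its first two points to the index. It does not attempt the linear-size, music-specific index mentioned above. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]   # (onset time, pitch)
Vector = Tuple[int, int]


def build_pair_index(db: List[Point]) -> Dict[Vector, List[Point]]:
    """Quadratic-size index: difference vector q - p -> points p (p < q lexicographically)."""
    pts = sorted(set(db))
    index: Dict[Vector, List[Point]] = defaultdict(list)
    for i, p in enumerate(pts):
        for q in pts[i + 1:]:
            index[(q[0] - p[0], q[1] - p[1])].append(p)
    return index


def find_exact_occurrences(query: List[Point], db: List[Point],
                           index: Dict[Vector, List[Point]]) -> Set[Vector]:
    """All translations t such that every query point shifted by t lies in db."""
    q = sorted(set(query))
    if not q:
        raise ValueError("query must contain at least one point")
    db_set = set(db)
    if len(q) == 1:
        return {(p[0] - q[0][0], p[1] - q[0][1]) for p in db_set}
    # Filter: candidate translations come only from pairs matching q[1] - q[0].
    v = (q[1][0] - q[0][0], q[1][1] - q[0][1])
    candidates = {(p[0] - q[0][0], p[1] - q[0][1]) for p in index.get(v, [])}
    # Check: keep candidates under which every query point is present.
    return {t for t in candidates
            if all((x + t[0], y + t[1]) in db_set for x, y in q)}


if __name__ == "__main__":
    db = [(0, 60), (1, 62), (2, 64), (4, 60), (5, 62), (6, 64), (7, 50)]
    idx = build_pair_index(db)
    print(find_exact_occurrences([(10, 62), (11, 64), (12, 66)], db, idx))
```

The cost of a query is driven by the number of candidates the filter returns, not by the database size, which mirrors the behaviour the abstract describes. The index itself, however, grows quadratically in the number of database points.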
This paper deals with content-based music retrieval (CBMR) of symbolically encoded polyphonic music, one of the key problems in the field of music information retrieval. Thanks to extensive research, satisfactory methods already exist for monophonic CBMR. Unfortunately, this is not the case for the polyphonic task. The problem has been approached in various ways, but most of the proposed methods fall into two frameworks. The first framework models music as linear strings, and similarity is based on the well-known edit-distance concept. The second models music as sets of two-dimensional geometric objects (consider the piano-roll representation), but the definition of similarity varies considerably within this framework. We scrutinise these frameworks, trying to find common, relevant properties that either inhibit or boost the effectiveness of the methods. Although the edit-distance framework offers more efficient solutions, we conclude that the geometric framework is the better choice for the CBMR task, because it models music in a very natural way while still preserving the features intrinsic to the task.
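To make the string framework concrete, here is a minimal sketch of the classic edit (Levenshtein) distance applied to melodies. It assumes that each melody is a single monophonic line encoded as pitch intervals, and that insertions, deletions and substitutions all cost 1. Published CBMR methods in this framework typically use more elaborate cost schemes, so treat this as an illustration of the concept only. Function names are hypothetical.

```python
from typing import List, Sequence


def intervals(pitches: Sequence[int]) -> List[int]:
    # Successive pitch differences, so transposed melodies become identical strings.
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Unit-cost Levenshtein distance, O(len(a) * len(b)) time, O(len(b)) memory."""
    prev = list(range(len(b) + 1))      # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,              # delete x
                          curr[j - 1] + 1,          # insert y
                          prev[j - 1] + (x != y))   # substitute, or match at cost 0
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    theme = [60, 62, 64, 65]
    transposed = [67, 69, 71, 72]
    print(edit_distance(intervals(theme), intervals(transposed)))  # transposition-invariant
```

The dynamic-programming table is what makes this framework efficient. The sketch handles only one monophonic line, which is the setting in which, according to the abstract, satisfactory methods already exist.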