The first step of understanding the structure of a music piece is to segment it into formative parts. A recently successful method for finding segment boundaries employs a Convolutional Neural Network (CNN) trained on spectrogram excerpts. While setting a new state of the art, it often misses boundaries defined by non-local musical cues, such as segment repetitions. To account for this, we propose a refined variant of self-similarity lag matrices representing long-term relationships. We then demonstrate different ways of fusing this feature with spectrogram excerpts within a CNN, resulting in a boundary recognition performance superior to the previous state of the art. We assume that the integration of more features in a similar fashion would improve the performance even further.
Abstract-We present and compare two approaches to detect the presence of bird calls in audio recordings using convolutional neural networks on mel spectrograms. In a signal processing challenge using environmental recordings from three very different sources, only two of them available for supervised training, we obtained an Area Under Curve (AUC) measure of 89% on the hidden test set, higher than any other contestant. By comparing multiple variations of our systems, we find that despite very different architectures, both approaches can be tuned to perform equally well. Further improvements will likely require a radically different approach to dealing with the discrepancy between data sources.
One of the central goals of Music Information Retrieval (MIR) is the quantification of similarity between or within pieces of music. These quantitative relations should mirror the human perception of music similarity, which is however highly subjective with low inter-rater agreement. Unfortunately this principal problem has been given little attention in MIR so far. Since it is not meaningful to have computational models that go beyond the level of human agreement, these levels of inter-rater agreement present a natural upper bound for any algorithmic approach. We will illustrate this fundamental problem in the evaluation of MIR systems using results from two typical application scenarios: (i) modelling of music similarity between pieces of music; (ii) music structure analysis within pieces of music. For both applications, we derive upper bounds of performance which are due to the limited inter-rater agreement. We compare these upper bounds to the performance of state-of-the-art MIR systems and show how the upper bounds prevent further progress in developing better MIR systems.
The usability of Application Programming Interfaces (APIs) is one of the main factors defining the success of a software based framework. Research in the area of human computer interaction (HCI) currently mainly focuses on end-user usability and only little research has been done regarding the usability of APIs. In this paper, we present a methodology on how to use and combine HCI methods with the goal to evaluate the usability of APIs. The methodology consist of three phasesa heuristic evaluation, a developer workshop and interviews. We setup a case-study according to the methodology, in which we are evaluating the usability of a service-oriented framework API. The goal was to explore different HCI methods and compare the applicability of such methods to find usability problems in an API. The case-study combined qualitative and quantitative methods in order to investigate the usability and intuitiveness of the API itself. It allowed us to identify relevant problem areas for usability related issues that could be mapped to specific types of HCI methods. Examples for this are e.g. structural problems, which are identified mainly in inspection methods, while problems regarding errors and exception handling are mainly identified during the hands-on example part of the developer workshops conducted. The resulting problem areas allow us to develop a first classification of API related usability problems that are making the relevancy of usability issues for APIs more explicit and applicable.
Audio signal processing frequently requires time-frequency representations and in many applications, a non-linear spacing of frequency-bands is preferable. This paper introduces a framework for efficient implementation of invertible signal transforms allowing for non-uniform and in particular non-linear frequency resolution. Non-uniformity in frequency is realized by applying nonstationary Gabor frames with adaptivity in the frequency domain. The realization of a perfectly invertible constant-Q transform is described in detail. To achieve real-time processing, independent of signal length, slicewise processing of the full input signal is proposed and referred to as sliCQ transform.By applying frame theory and FFT-based processing, the presented approach overcomes computational inefficiency and lack of invertibility of classical constant-Q transform implementations. Numerical simulations evaluate the efficiency of the proposed algorithm and the method's applicability is illustrated by experiments on real-life audio signals.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.