Recently, Kernel Additive Modelling was proposed as a new framework for performing sound source separation. Kernel Additive Modelling assumes that a source at some location can be estimated using its values at nearby locations where nearness is defined through a source-specific proximity kernel. Different proximity kernels can be used for different sources, which are then separated using an iterative kernel backfitting algorithm. These kernels can efficiently account for features such as continuity, stability in time or frequency and self-similarity. Here, we show that Kernel Additive Modelling can be used to generalise, extend and improve on a widely-used harmonic/percussive separation algorithm which attempts to separate pitched from percussive instruments.
International audienceSource separation consists of separating a signal into additive components. It is a topic of considerable interest with many applications that has gathered much attention recently. Here, we introduce a new framework for source separation called Kernel Additive Modelling, which is based on local regression and permits efficient separation of multidimensional and/or nonnegative and/or non-regularly sampled signals. The main idea of the method is to assume that a source at some location can be estimated using its values at other locations nearby, where nearness is defined through a source-specific proximity kernel. Such a kernel provides an efficient way to account for features like periodicity, continuity, smoothness, stability over time or frequency, self-similarity, etc. In many cases, such local dynamics are indeed much more natural to assess than any global model such as a tensor factorization. This framework permits one to use different proximity kernels for different sources and to separate them using the iterative kernel backfitting algorithm we describe. As we show, kernel additive modelling generalizes many recent and efficient techniques for source separation and opens the path to creating and combining source models in a principled way. Experimental results on the separation of synthetic and audio signals demonstrate the effectiveness of the approach
Popular music is often composed of an accompaniment and a lead component, the latter typically consisting of vocals. Filtering such mixtures to extract one or both components has many applications, such as automatic karaoke and remixing. This particular case of source separation yields very specific challenges and opportunities, including the particular complexity of musical structures, but also relevant prior knowledge coming from acoustics, musicology or sound engineering. Due to both its importance in applications and its challenging difficulty, lead and accompaniment separation has been a popular topic in signal processing for decades. In this article, we provide a comprehensive review of this research topic, organizing the different approaches according to whether they are model-based or data-centered. For model-based methods, we organize them according to whether they concentrate on the lead signal, the accompaniment, or both. For data-centered approaches, we discuss the particular difficulty of obtaining data for learning lead separation systems, and then review recent approaches, notably those based on deep learning. Finally, we discuss the delicate problem of evaluating the quality of music separation through adequate metrics and present the results of the largest evaluation, to-date, of lead and accompaniment separation systems. In conjunction with the above, a comprehensive list of references is provided, along with relevant pointers to available implementations and repositories.
Recently, shift-invariant tensor factorisation algorithms have been proposed for the purposes of sound source separation of pitched musical instruments. However, in practice, existing algorithms require the use of log-frequency spectrograms to allow shift invariance in frequency which causes problems when attempting to resynthesise the separated sources. Further, it is difficult to impose harmonicity constraints on the recovered basis functions. This paper proposes a new additive synthesis-based approach which allows the use of linear-frequency spectrograms as well as imposing strict harmonic constraints, resulting in an improved model. Further, these additional constraints allow the addition of a source filter model to the factorisation framework, and an extended model which is capable of separating mixtures of pitched and percussive instruments simultaneously.
Nonnegative matrix factorization (NMF) is an effective and popular low-rank model for nonnegative data. It enjoys a rich background, both from an optimization and probabilistic signal processing viewpoint. In this study, we propose a new cost-function for NMF fitting, which is introduced as arising naturally when adopting a Cauchy process model for audio waveforms. As we recall, this Cauchy process model is the only probabilistic framework known to date that is compatible with having additive magnitude spectrograms for additive independent audio sources. Similarly to the Gaussian power-spectral density, this Cauchy model features time-frequency nonnegative scale parameters, on which an NMF structure may be imposed. The Cauchy cost function we propose is optimal under that model in a maximum likelihood sense. It thus appears as an interesting newcomer in the inventory of useful cost-functions for NMF in audio. We provide multiplicative updates for Cauchy-NMF and show that they give good performance in audio source separation as well as in extracting nonnegative low-rank structures from data buried in very adverse noise.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.