2010
DOI: 10.1109/jproc.2009.2030345

Sparse Representations in Audio and Music: From Coding to Source Separation

Abstract: Sparse representations have proved a powerful tool in the analysis and processing of audio signals and already lie at the heart of popular coding standards such as MP3 and Dolby AAC. In this paper we give an overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the "cocktail party problem". In each case we will show how the prior assumption that the au…

Cited by 175 publications (102 citation statements)
References 39 publications
“…In this work we will focus on greedy algorithms. These methods have found widespread use in audio processing applications ranging from source separation to audio coding and compression [15], due to their computational efficiency and robust performance. We note that while algorithms based on L1-norm relaxation have received significant attention in the context of signal denoising, greedy methods have been comparatively ignored in this context.…”
Section: Sparse Representation (mentioning)
confidence: 99%
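The citing statement above does not say which greedy algorithm it uses. As an illustration only, the sketch below shows plain matching pursuit, one common greedy method. The function name `matching_pursuit`, the hand-built DCT dictionary `D`, and the test signal are assumptions made for this example, not taken from the cited works.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse approximation; dictionary columns must be unit-norm atoms."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        # Correlate every atom with what is still unexplained.
        correlations = dictionary.T @ residual
        # Greedy step: keep the single best-matching atom.
        k = int(np.argmax(np.abs(correlations)))
        coeffs[k] += correlations[k]
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual

# Hypothetical dictionary: DCT-II cosines, normalised to unit-norm columns.
n = 64
t = np.arange(n)
D = np.cos(np.pi * (t[:, None] + 0.5) * t[None, :] / n)
D /= np.linalg.norm(D, axis=0)

# Test signal built from exactly two atoms, so two greedy steps suffice.
x = 2.0 * D[:, 5] - 1.5 * D[:, 20]
coeffs, residual = matching_pursuit(x, D, n_atoms=2)
```

Each iteration costs one matrix-vector product, which is the kind of computational efficiency the statement attributes to greedy methods. With a redundant (non-orthogonal) dictionary the same loop still runs, but exact recovery in a fixed number of steps is no longer guaranteed.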
“…This is mainly due to the following three reasons. Firstly, as mentioned earlier, music audio can be made sparser if it is transformed into another domain, such as the TF domain, using an analytically pre-defined dictionary such as discrete Fourier transform (DFT) or discrete cosine transform (DCT) [69] [70]. Recent studies show that signal dictionaries directly adapted from training data using machine learning techniques, based on some optimisation criterion (such as the reconstruction error regularised by a sparsity constraint), can offer better performance than the pre-defined dictionary [71] [72].…”
Section: Future Directions (mentioning)
confidence: 99%
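The claim that a pre-defined transform can make audio sparser can be checked on a toy signal. The sketch below counts how many coefficients are needed to hold 99% of the energy in the time domain and in the one-sided DFT domain. The helper `n_coeffs_for_energy`, the 99% threshold, and the two-tone signal are assumptions chosen for illustration; real music is only approximately sparse.

```python
import numpy as np

def n_coeffs_for_energy(coeffs, fraction=0.99):
    """Smallest number of coefficients whose energy reaches `fraction` of the total."""
    energy = np.sort(np.abs(coeffs) ** 2)[::-1]   # largest first
    cumulative = np.cumsum(energy)
    return int(np.searchsorted(cumulative, fraction * cumulative[-1])) + 1

# Two tones placed exactly on DFT bins 8 and 20 (an idealised case).
n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 8 * t / n) + 0.5 * np.sin(2 * np.pi * 20 * t / n)

time_count = n_coeffs_for_energy(x)               # samples needed in time
freq_count = n_coeffs_for_energy(np.fft.rfft(x))  # bins needed in frequency
```

Here `freq_count` is 2, while `time_count` is far larger, because the energy of a sinusoid is spread over nearly all time samples. A dictionary learned from training data, as the statement describes, would replace the fixed DFT basis with atoms fitted to the signals of interest.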
“…models which describe the magnitude spectra of complex sounds as being composed of a purely additive (no negative components) combinations of spectral atoms, have proven to be adept at separating the target speech from interfering sounds such as noise [1,2], other speakers [3,4], music [5,6,7] and even reverberation [8]. For noise-robust automatic speech recognition (ASR), such compositional models really excel when the atoms also have some temporal extent [9,10].…”
Section: Introduction (mentioning)
confidence: 99%
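A purely additive model of magnitude spectra is often realised as non-negative matrix factorisation (NMF). The statement above does not name its exact model, so the sketch below is an assumption: it uses the standard multiplicative updates for the Euclidean cost, with a hypothetical function `nmf` and synthetic data standing in for a magnitude spectrogram.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (freq x time) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # spectral atoms (columns)
    H = rng.random((rank, V.shape[1])) + eps   # non-negative activations
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "spectrogram": two non-negative atoms mixed additively.
rng = np.random.default_rng(1)
V = rng.random((40, 2)) @ rng.random((2, 60))

W0, H0 = nmf(V, rank=2, n_iter=0)    # random initialisation only
W, H = nmf(V, rank=2, n_iter=200)    # same start, after the updates
err_init = np.linalg.norm(V - W0 @ H0)
err_fit = np.linalg.norm(V - W @ H)
```

For separation, the columns of `W` would be grouped by source (for example speech atoms and noise atoms) and each source rebuilt from its own atoms and activations. Atoms with temporal extent, as mentioned in the statement, would need a convolutive variant that this sketch does not cover.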