2014 International Joint Conference on Neural Networks (IJCNN) 2014
DOI: 10.1109/ijcnn.2014.6889522

Semi-supervised non-negative tensor factorisation of modulation spectrograms for monaural speech separation

Abstract: This paper details the use of a semi-supervised approach to audio source separation. Where only a single source model is available, the model for an unknown source must be estimated. A mixture signal is separated through factorisation of a feature-tensor representation, based on the modulation spectrogram. Harmonically related components tend to modulate in a similar fashion, and this redundancy of patterns can be isolated. This feature representation requires fewer parameters than spectrally based methods and…

Cited by 16 publications (20 citation statements)
References 18 publications
“…The MS representation for acoustical data is obtained using the procedure explained in [14]. For the NMF based system, this 3D representation of size b × T × B is converted to a 2D representation by stacking the truncated spectra belonging to different channels to get a matrix of size (B · b) × T, where b, T and B are the number of truncated bins, number of frames in the MS and number of filter banks used to obtain the MS representation, respectively.…”
Section: Proposed System With MS Features (mentioning)
confidence: 99%
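
The stacking step described in this statement can be illustrated with a short NumPy sketch. It is not taken from the cited papers: the function name `stack_channels`, the axis order (bins, frames, channels) and the toy sizes are assumptions chosen only to match the b × T × B notation above.

```python
import numpy as np


def stack_channels(ms):
    """Reshape a (b, T, B) modulation-spectrogram tensor to a (B*b, T) matrix.

    Assumed layout: axis 0 = truncated modulation bins (b),
    axis 1 = frames (T), axis 2 = filter-bank channels (B).
    """
    b, T, B = ms.shape
    # Bring the channel axis to the front, giving (B, b, T), then merge the
    # first two axes so that each channel's b bins form a contiguous row block.
    return ms.transpose(2, 0, 1).reshape(B * b, T)


# Toy tensor with b=2 bins, T=3 frames, B=4 channels (illustrative sizes).
ms = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
stacked = stack_channels(ms)
```

With this layout, row `c * b + i` of `stacked` holds bin `i` of channel `c`, so a 2D NMF can operate on the matrix while the channel grouping remains recoverable.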
“…The simulation results obtained on the AURORA-2 database revealed that the proposed system with the Mel features as front-end results in better SDRs when compared to both the baseline systems. The paper also investigates the use of coupled dictionaries for modulation spectrogram (MS) [13] features which has recently been successfully used for blind source separation [14]. The proposed system with MS features also yields improved SDRs over the baseline systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…As there is a low-pass filtering operation, it is possible to truncate each of these modulation spectrograms to their lowest few, say k, bins [3,19], i.e., each modulation spectrogram now has size k × T. To obtain a two-dimensional representation, we stack these modulation spectrograms originating from B channels to a matrix of size (B · k) × T.…”
Section: MS-DFT Setting (mentioning)
confidence: 99%
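
As a rough illustration of the truncation described in this statement, the sketch below keeps the lowest k modulation bins of each channel before stacking. The helper name `truncate_and_stack`, the per-channel array shape (bins × frames) and the random test data are assumptions for demonstration, not details from the cited work.

```python
import numpy as np


def truncate_and_stack(channel_specs, k):
    """Keep the lowest k modulation bins per channel, then stack vertically.

    channel_specs: list of B arrays, each assumed to have shape (n_bins, T).
    Returns an array of shape (B * k, T).
    """
    # Row slicing keeps bins 0..k-1, i.e. the lowest modulation frequencies.
    truncated = [spec[:k, :] for spec in channel_specs]
    return np.vstack(truncated)


# Illustrative data: B=3 channels, 16 modulation bins, T=5 frames.
rng = np.random.default_rng(0)
specs = [rng.random((16, 5)) for _ in range(3)]
stacked = truncate_and_stack(specs, k=4)
```

Here `stacked` has shape (3 · 4) × 5, with rows 4 to 7 holding the truncated bins of the second channel.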
“…Fitzgerald brought the NTF to sound source separation [11]. Barker performed separation by a Wiener-like filter generated from the estimated tensor factors and obtained better results [12]. Gemmeke proposed exemplar-based sparse representations for speech recognition in a noisy condition.…”
Section: Introduction (mentioning)
confidence: 99%
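
The Wiener-like filtering mentioned in this statement is commonly realised as a soft mask: each source's non-negative estimate divided by the sum of all source estimates. The sketch below shows that generic idea only. It assumes the per-source estimates have already been reconstructed from the factors, and the name `wiener_like_masks`, the `eps` guard and the toy values are hypothetical, not the cited author's implementation.

```python
import numpy as np


def wiener_like_masks(source_estimates, eps=1e-12):
    """Compute soft masks from non-negative per-source estimates.

    source_estimates: list of S arrays with identical shape (F, T).
    Returns an array of shape (S, F, T) whose masks sum to ~1 in every bin.
    """
    stacked = np.stack(source_estimates)   # (S, F, T)
    total = stacked.sum(axis=0) + eps      # eps guards against division by zero
    return stacked / total


# Toy estimates for two sources (illustrative values only).
speech = np.array([[3.0, 1.0], [0.0, 2.0]])
noise = np.array([[1.0, 1.0], [4.0, 2.0]])
masks = wiener_like_masks([speech, noise])

# Applying a mask to the mixture representation yields that source's share.
mixture = speech + noise
separated_speech = masks[0] * mixture
```

Because the masks sum to approximately one in every bin, the separated sources add back up to the mixture.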