2022
DOI: 10.48550/arxiv.2211.08553
Preprint

Hybrid Transformers for Music Source Separation

Abstract: A natural question arising in Music Source Separation (MSS) is whether long-range contextual information is useful, or whether local acoustic features are sufficient. In other fields, attention-based Transformers [1] have shown their ability to integrate information over long sequences. In this work, we introduce Hybrid Transformer Demucs (HT Demucs), a hybrid temporal/spectral bi-U-Net based on Hybrid Demucs [2], where the innermost layers are replaced by a cross-domain Transformer Encoder, using self-attent…

Citations: cited by 7 publications (7 citation statements)
References: 13 publications
“…We then applied one of two models to the audio mixture (or individual channels taken from this) to separate each instrumental source. We used Demucs, a hybrid spectrogram- and waveform-based separation model using transformers (Rouard et al., 2022), to separate double bass and drums. We used Spleeter, a spectrogram-based model using convolutional neural networks (Hennequin et al., 2020), to separate the piano.…”
Section: Model Selection
Citation type: mentioning
confidence: 99%
“…Both Demucs and Spleeter have achieved good results in comparison to other available models and have appeared as baselines in several music demixing community challenges. Demucs has performed better than Spleeter on tests of drums and bass separation (Rouard et al., 2022), but the Demucs authors warn that the quality of separation for piano is poor; this led to our decision to use Spleeter for this instrument.…”
Section: Model Selection
Citation type: mentioning
confidence: 99%
“…Simon et al. use a hybrid model in the newest Demucs system. The hybrid model has a parallel time branch in addition to the spectrogram branch [26]. Kong et al. constructed a residual U-Net architecture with a time branch and a spectrogram branch and estimated the phase by cIRMs [27].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Hybrid spectrogram and waveform source separation [14] extends the original U-Net architecture and provides two parallel branches: one in the time (temporal) domain and one in the frequency (spectral) domain, with shared features in the U-Net core while having a decoder for each branch. Hybrid Transformer Demucs [15] replaces the innermost layers of the two U-Net architectures with transformer layers, which then requires large amounts of data for training.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In contrast to our proposed Y-shaped architecture, previous studies have adopted an X-shaped architecture where separate branches were used for each input [15], [14]. Our approach utilises a single decoder branch, resulting in lower parameter count, faster training, lower computational cost, and less data requirement than the aforementioned studies.…”
Section: Related Work
Citation type: mentioning
confidence: 99%