LaSAFT: Latent Source Attentive Frequency Transformation for Conditioned Source Separation

Choi, Woosung; Kim, Minseok; Chung, Jaehwa; Jung, Soonyoung

doi:10.48550/arxiv.2010.11631

Cited by 1 publication (9 citation statements)



“…Source Separation: Our work is also related to deep-learning-based source separation methods [3,4,8,11,18,24,35]. While early methods separate either a single source [4,8,35] or multiple sources at once [11], conditioned source separation methods [3,18,24] isolate the source specified by an input symbol. A conditioned source separation task can be viewed as an AMSS task where we want to simply mute all the unwanted sources.…”

Section: Related Work (mentioning)

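The first excerpt frames conditioned source separation as a special case of AMSS in which every unwanted source is muted. The sketch below illustrates only that framing, under a strong simplifying assumption: it takes already-separated, symbol-level sources as plain Python lists, which neither LaSAFT nor AMSS-Net actually has access to. The helper names `manipulate_and_remix`, `mute`, and `conditioned_separation` are hypothetical.

```python
from typing import Callable, Dict, List

Signal = List[float]


def manipulate_and_remix(
    sources: Dict[str, Signal],
    transforms: Dict[str, Callable[[Signal], Signal]],
) -> Signal:
    """Apply a transform to each named source (identity if none is given) and remix.

    Assumes every source has the same length; this is an illustration of the
    task definition, not of how AMSS-Net works internally.
    """
    length = len(next(iter(sources.values())))
    out = [0.0] * length
    for name, signal in sources.items():
        # Sources without a transform pass through unchanged ("preserve other sources").
        transform = transforms.get(name, lambda s: s)
        for i, value in enumerate(transform(signal)):
            out[i] += value
    return out


def mute(signal: Signal) -> Signal:
    """Replace a source with silence."""
    return [0.0] * len(signal)


def conditioned_separation(sources: Dict[str, Signal], target: str) -> Signal:
    """Conditioned separation viewed as AMSS: mute every source except `target`."""
    transforms = {name: mute for name in sources if name != target}
    return manipulate_and_remix(sources, transforms)
```

For example, `manipulate_and_remix(sources, {"vocals": lambda s: [0.5 * x for x in s]})` halves only the vocals and leaves every other source untouched, while `conditioned_separation(sources, "vocals")` is the same call with the "mute" transform applied to everything else.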

“…The concept of latent source has been introduced in recent source separation methods [3,32]. [32] have trained their model to separate the given input into a variable number of latent sources, which can be remixed to approximate the original mixture.…”

Section: Related Work (mentioning)

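The second excerpt describes latent sources that can be remixed to approximate the original mixture. A minimal sketch of that remix check is below, assuming each latent source is a time-domain signal stored as a list of floats. `project_to_mixture` is an added assumption, not something the excerpt attributes to [32]: it is a simple mixture-consistency projection that spreads the remix residual equally over the K latent sources so that they sum exactly to the mixture. All function names are hypothetical.

```python
from typing import List

Signal = List[float]


def remix(latents: List[Signal]) -> Signal:
    """Sum K latent-source estimates sample by sample."""
    return [sum(samples) for samples in zip(*latents)]


def remix_error(mixture: Signal, latents: List[Signal]) -> float:
    """Mean squared error between the mixture and the remixed latent sources."""
    remixed = remix(latents)
    return sum((m - r) ** 2 for m, r in zip(mixture, remixed)) / len(mixture)


def project_to_mixture(mixture: Signal, latents: List[Signal]) -> List[Signal]:
    """Distribute the remix residual equally across the K latent sources.

    After this (assumed) projection the latent sources sum exactly to the mixture.
    """
    k = len(latents)
    residual = [m - r for m, r in zip(mixture, remix(latents))]
    return [[s + e / k for s, e in zip(latent, residual)] for latent in latents]
```

Spreading the residual equally is the simplest choice; a learned or energy-weighted split would also satisfy the same constraint.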

“…By carefully taking the weighted sum of separated latent sources, we can extract the desired source, such as clean speech. [3] also use the concept of latent source for conditioned source separation. They proposed the Latent Source-Attentive Frequency Transformation (LaSAFT) method, which extracts the feature map for each latent source and takes the weighted sum of them by using an attention mechanism.…”

Section: Related Work (mentioning)

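The third excerpt summarizes LaSAFT as extracting one feature map per latent source and combining them with an attention-weighted sum. The sketch below shows only that combination step, using standard scaled dot-product attention. It assumes that the condition embedding acts as the query, that each latent source has its own key vector, and that each latent source's feature map is a small 2-D list. These shapes, the scaling by the square root of the key dimension, and the name `latent_source_attention` are illustrative assumptions rather than the paper's exact formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[float]]  # assumed layout, e.g. [frequency][channel]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def latent_source_attention(
    query: Vector, keys: List[Vector], values: List[FeatureMap]
) -> Tuple[FeatureMap, Vector]:
    """Weight one feature map per latent source by attention, then sum them.

    query:  condition embedding (assumed), length d_k
    keys:   one key vector per latent source (assumed learnable), length d_k each
    values: one feature map per latent source, all of the same shape
    Returns the combined feature map and the attention weights.
    """
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    rows, cols = len(values[0]), len(values[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for w, fmap in zip(weights, values):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * fmap[r][c]
    return out, weights
```

Because the weights come from a softmax, they are non-negative and sum to one, so the output is a convex combination of the latent-source feature maps. This matches the excerpt's "weighted sum" description without claiming to be LaSAFT's exact parameterization.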

“…We also use the concept of latent sources in AMSS-Net, where each latent source deals with a more detailed aspect of acoustic features than a symbolic-level source (e.g., 'vocals'). Similar to [3], we assume that a weighted sum of latent sources can represent a source, while [32] assumed that latent sources are independent. Unlike previous works, our approach is based on channel-level separation as described in §5.4.1.…”

Section: Related Work (mentioning)


“…Although many machine learning approaches have been proposed for audio processing [3,8,13,14,18,19,23,24,28,32,33], to the best of our knowledge, there is no existing method that can directly address AMSS (see §3). This paper proposes a novel end-to-end neural network that performs AMSS according to the given textual query.…”

Section: Introduction (mentioning)



AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries

Choi, Kim, Ramírez, et al. 2021

Preprint (self-citation)


This paper proposes a neural network that performs audio transformations to user-specified sources (e.g., vocals) of a given audio track according to a given description while preserving other sources not mentioned in the description. Audio Manipulation on a Specific Source (AMSS) is challenging because a sound object (i.e., a waveform sample or frequency bin) is 'transparent'; it usually carries information from multiple sources, in contrast to a pixel in an image. To address this challenging problem, we propose AMSS-Net, which extracts latent sources and selectively manipulates them while preserving irrelevant sources. We also propose an evaluation benchmark for several AMSS tasks, and we show that AMSS-Net outperforms baselines on several AMSS tasks via objective metrics and empirical verification.

CCS Concepts: • Applied computing → Sound and music computing; • Computing methodologies → Neural networks.
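The abstract says AMSS-Net extracts latent sources and selectively manipulates them while preserving irrelevant sources. One way to picture why this sidesteps the 'transparent' sound-object problem is a soft per-latent-source selection: instead of editing a mixed frequency bin directly, each latent source k is blended between its original and manipulated versions by a selection weight a_k in [0, 1]. The sketch below assumes those weights have already been computed from the textual query and uses a plain gain as the manipulation. The function names, the blending rule, and the gain effect are hypothetical and are not taken from the AMSS-Net paper.

```python
from typing import Callable, List

Signal = List[float]


def selective_manipulation(
    latents: List[Signal],
    selection: List[float],
    manipulate: Callable[[Signal], Signal],
) -> Signal:
    """Blend each latent source between untouched and manipulated, then remix.

    selection[k] = 1.0 fully manipulates latent source k; 0.0 leaves it untouched.
    The selection weights are assumed to be given (e.g. derived from a query).
    """
    length = len(latents[0])
    out = [0.0] * length
    for latent, a in zip(latents, selection):
        edited = manipulate(latent)
        for i in range(length):
            out[i] += (1.0 - a) * latent[i] + a * edited[i]
    return out


def gain(factor: float) -> Callable[[Signal], Signal]:
    """Return a manipulation that scales a signal by `factor` (hypothetical example effect)."""
    return lambda signal: [factor * x for x in signal]
```

With all selection weights at zero the function simply remixes the latent sources, so irrelevant content is preserved by construction; only the latent sources the query selects are changed.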
