In this article, we propose a new method for sound transformation based on control parameters that are intuitive and relevant for musicians. This method uses a variational autoencoder (VAE) model that is first trained in an unsupervised manner on a large dataset of synthesizer sounds. A perceptual regularization term is then added to the training loss, and the model is fine-tuned in a supervised manner on a small subset of perceptually labeled sounds. The labels were obtained from a Verbal Attribute Magnitude Estimation perceptual test in which listeners rated sounds from this training dataset along eight perceptual dimensions (French equivalents of metallic, warm, breathy, vibrating, percussive, resonating, evolving, aggressive). These dimensions had been identified as relevant for describing synthesizer sounds in a preliminary Free Verbalization test. The resulting VAE model was evaluated with objective reconstruction measures and a perceptual test. Both showed that the model was able, to a certain extent, to capture the acoustic properties of most of the perceptual dimensions and to transform sound timbre along at least two of them (aggressive and vibrating) in a perceptually relevant manner. Moreover, the model generalized to unseen samples even though only a small set of labeled sounds was used.
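To make the two-stage training scheme more concrete, the sketch below shows one plausible way to combine the standard VAE objective with a perceptual regularization term during fine-tuning. It is a minimal illustration only: the network architecture, the input representation, the specific form of the regularizer (here, a mean-squared error between the first eight latent dimensions and the listener ratings), and the weights `beta` and `gamma` are assumptions, not the formulation actually used in this work.

```python
# Hypothetical sketch of a perceptually regularized VAE loss (PyTorch).
# Only the general scheme (ELBO + perceptual term on a labeled subset)
# follows the description above; all specifics are illustrative.
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Standard ELBO terms: reconstruction error plus KL divergence."""
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

def perceptually_regularized_loss(x, x_hat, mu, logvar, ratings,
                                  beta=1.0, gamma=0.1):
    """Fine-tuning loss: ELBO plus a term encouraging a subset of latent
    dimensions to match the eight perceptual ratings (metallic, warm,
    breathy, vibrating, percussive, resonating, evolving, aggressive)
    of the small labeled subset."""
    base = vae_loss(x, x_hat, mu, logvar, beta)
    perceptual = F.mse_loss(mu[:, :ratings.shape[1]], ratings)
    return base + gamma * perceptual
```

Under this reading, the first stage would minimize `vae_loss` on the full unlabeled dataset, and the second stage would fine-tune the same model with `perceptually_regularized_loss` on the small perceptually labeled subset, so that selected latent coordinates become usable as intuitive control parameters.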