ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9052944

Perceptual loss function for neural modeling of audio systems

Abstract: This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order highpass pre-emphasis filter applied to both the target signal and neural network output. This work considers more perceptually relevant pre-emphasis filters, which include lowpass filtering at high frequencies. We conducted listening tests to determ…
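The error-to-signal ratio with pre-emphasis described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names and the 0.95 filter coefficient are assumptions for the sake of the example.

```python
import numpy as np

def pre_emphasis(x, coeff=0.95):
    """First-order high-pass pre-emphasis: y[n] = x[n] - coeff * x[n-1]."""
    return np.concatenate(([x[0]], x[1:] - coeff * x[:-1]))

def esr_loss(target, output, coeff=0.95):
    """Error-to-signal ratio between the pre-emphasised target and output.

    Both signals are filtered before the ratio is computed, so errors in
    the emphasised frequency band weigh more heavily in training.
    """
    t = pre_emphasis(target, coeff)
    y = pre_emphasis(output, coeff)
    return np.sum((t - y) ** 2) / np.sum(t ** 2)
```

For a perfect match the loss is zero, and for an all-zero output it equals one, since the residual energy then equals the target energy.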

Cited by 17 publications (13 citation statements)
References 12 publications
“…The tests in Section 2 and in previous work [21,23] used a first-order high-pass pre-emphasis filter. For the experiments presented in this section we use a perceptually motivated pre-emphasis filter based on a low-passed A-Weighting filter, which was first proposed in and tested in [37]. The frequency response of this pre-emphasis filter is shown in Figure 11.…”
Section: Loss Function and Training (mentioning; confidence: 99%)
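The frequency response of a pre-emphasis filter, as referenced in the quoted Figure 11, can be inspected numerically. The sketch below evaluates the response of a plain first-order high-pass FIR [1, −0.95] as a stand-in, since the exact coefficients of the low-passed A-weighting filter are not given here; the grid size and sample rate are arbitrary choices.

```python
import numpy as np

def fir_freq_response(h, n_points=512, fs=44100.0):
    """Evaluate H(e^{jw}) of an FIR filter h on a uniform frequency grid.

    Returns the frequencies in Hz and the magnitude response in dB.
    """
    w = np.linspace(0, np.pi, n_points)          # normalised frequency grid
    k = np.arange(len(h))                        # tap indices
    H = np.exp(-1j * np.outer(w, k)) @ np.asarray(h, dtype=float)
    freqs_hz = w * fs / (2 * np.pi)
    mag_db = 20 * np.log10(np.abs(H) + 1e-12)    # small offset avoids log(0)
    return freqs_hz, mag_db

freqs, mag_db = fir_freq_response([1.0, -0.95])
```

At DC the response is 1 − 0.95 = 0.05 (about −26 dB), rising monotonically toward Nyquist, which is the high-pass shape that motivates adding a low-pass component for perceptual weighting.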
“…As proposed in [21] and used in [37], an additional loss function representing the difference in DC offset between the target and neural network output was also included:…”
Section: Loss Function and Training (mentioning; confidence: 99%)
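A DC-offset loss term of the kind quoted above might take the following form. This is an illustrative sketch: the normalisation by target energy is an assumption for the example, not necessarily the exact definition used in the cited works.

```python
import numpy as np

def dc_loss(target, output):
    """Penalise a difference in DC offset between target and output.

    The squared difference of the signal means, normalised by the mean
    target energy so the term is scale-invariant.
    """
    num = np.mean(target - output) ** 2
    den = np.mean(target ** 2)
    return num / den
```

The term is zero when both signals share the same mean and grows with any constant offset the model introduces, which a pre-emphasised ESR loss alone would largely ignore.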
“…We use weight normalization [51], which we found to improve results and training stability. At the end of the trainable blocks, we add a fixed pre-emphasis (PE) filter with impulse response [1, −0.97], which amplifies the high frequencies, as common in similar tasks [58,59]. An illustration of the generator's architecture is shown in Fig.…”
Section: Methods (mentioning; confidence: 99%)
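Applying the fixed pre-emphasis FIR with impulse response [1, −0.97] quoted above can be sketched as a simple convolution; the helper name is illustrative, not from the cited paper.

```python
import numpy as np

def pe_filter(x):
    """Fixed FIR pre-emphasis with impulse response [1, -0.97].

    Equivalent to y[n] = x[n] - 0.97 * x[n-1] with x[-1] = 0; the output
    is truncated to the input length.
    """
    return np.convolve(x, [1.0, -0.97], mode="full")[: len(x)]
```

A constant (DC) input is attenuated to 1 − 0.97 = 0.03 after the first sample, while high-frequency content passes with gain up to 1.97, matching the stated high-frequency amplification.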
“…Various deep learning approaches have already been proposed for the task of modeling audio effects [12][13][14][15][16][17]. While previous approaches have focused on training a single model for each effect, we believe our work is the first to consider building a model that emulates a series connection of effects and their parameters, jointly.…”
Section: Transformation Network (mentioning; confidence: 99%)