Interspeech 2019
DOI: 10.21437/interspeech.2019-2219

Fréchet Audio Distance: A Reference-Free Metric for Evaluating Music Enhancement Algorithms

Abstract: We propose the Fréchet Audio Distance (FAD), a novel, reference-free evaluation metric for music enhancement algorithms. We demonstrate how typical evaluation metrics for speech enhancement and blind source separation can fail to accurately measure the perceived effect of a wide variety of distortions. As an alternative, we propose adapting the Fréchet Inception Distance (FID) metric used to evaluate generative image models to the audio domain. FAD is validated using a wide variety of artificial distortions an…

Cited by 88 publications (45 citation statements)
References: 15 publications
“…For each model, the network is trained for about 120 epochs and the weights are saved each 8 epochs. We generated drum sounds with the regular weights and with the EMA weights and we observed the same phenomenon as in Song and Ermon [2020]: for the regular weights the quality of the sounds is not necessarily increasing with the training time whereas the EMA weights provide better and more homogeneous Fréchet Audio Distance Kilgour et al [2019] (FAD) during training 2 .…”
Section: Models and Process
Citation type: supporting
confidence: 64%
“…We perform SSIM in the frequency domain to compare the synthetic spectrogram with the real-world sample. • Fréchet Audio Distance (FAD) [22] measures the quality and diversity of the generated samples. FAD score is the distance between two multivariate Gaussian estimated on sets of embeddings, i.e.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
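
The statement above describes FAD as a distance between two multivariate Gaussians fitted to sets of embeddings. A minimal sketch of that computation follows, assuming the standard Fréchet distance between Gaussians, ||mu_a - mu_b||^2 + tr(S_a + S_b - 2 (S_a S_b)^(1/2)). The function names `gaussian_stats` and `frechet_distance` are hypothetical, and the embeddings are random placeholders rather than outputs of a real audio embedding model.

```python
import numpy as np


def gaussian_stats(embeddings):
    """Return the mean vector and covariance matrix of an (n_samples, dim) array."""
    x = np.asarray(embeddings, dtype=np.float64)
    mu = x.mean(axis=0)
    sigma = np.atleast_2d(np.cov(x, rowvar=False))
    return mu, sigma


def frechet_distance(mu_a, sigma_a, mu_b, sigma_b):
    """Fréchet distance between N(mu_a, sigma_a) and N(mu_b, sigma_b)."""
    diff = mu_a - mu_b
    # tr((sigma_a sigma_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_a @ sigma_b. For covariance matrices these are real
    # and non-negative up to rounding, so take the real part and clip at zero.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * trace_sqrt)


# Placeholder data (assumption): 2000 random 8-dimensional "embeddings".
rng = np.random.default_rng(0)
reference = rng.normal(size=(2000, 8))
shifted = reference + 0.5  # same covariance, mean moved by 0.5 in every dimension

mu_ref, sigma_ref = gaussian_stats(reference)
mu_shift, sigma_shift = gaussian_stats(shifted)

same_set_distance = frechet_distance(mu_ref, sigma_ref, mu_ref, sigma_ref)
shifted_distance = frechet_distance(mu_ref, sigma_ref, mu_shift, sigma_shift)
print(same_set_distance, shifted_distance)
```

In this sketch, comparing a set with itself gives a distance near zero, while a pure shift of the mean leaves only the squared-mean term. In practice the two sets would be embeddings of a reference audio collection and of the audio under evaluation.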
“…We use the Mean Opinion Score (MOS) test as a subjective evaluation. To evaluate each of the different vocoders objectively, we used the following four different evaluation metrics: Structural Similarity Index Measure (SSIM) [21], Fréchet Audio Distance (FAD) [22], Log-mel Spectrogram Mean Squared Error (LS-MSE), and Peak Signal-to-Noise Ratio (PSNR). More details about the experiment setup and evaluation metrics are presented in § 3.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Several studies indicate that widely-adopted source separation metrics such as signal to distortion ratio (SDR), signal to inference ratio (SIR), and signal to artifacts ratio (SAR) [56] do not always agree with human perception [7], [9], [35], [57]. Moreover, as brought out in [35], an increment of noise or interferences in the separated source produces an increment of the SAR value.…”
Section: Metrics
Citation type: mentioning
confidence: 99%