2021
DOI: 10.48550/arxiv.2104.11222
Preprint

On Aliased Resizing and Surprising Subtleties in GAN Evaluation

Abstract: We investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing process. Image resizing functions in commonly-used deep learning libraries often introduce aliasing artifacts. We observe that numerous subtle choices need to be made for FID calculation and a lack of consistencies…
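To make the aliasing point above concrete, here is a minimal sketch (my own illustration, not code from the paper): it downsamples a synthetic checkerboard with two nominally "bicubic" resize functions, Pillow's (which applies an antialiasing prefilter when shrinking) and torch.nn.functional.interpolate's (which does not by default), and prints how far apart the results are.

import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F

# Synthetic high-frequency image: a checkerboard, the worst case for aliasing
# when downsampled without a proper low-pass (antialiasing) filter.
size, cell = 256, 4
pattern = (np.indices((size, size)).sum(axis=0) // cell) % 2
img = (pattern * 255).astype(np.uint8)

# 1) Pillow bicubic resize: prefilters the image before shrinking.
pil_small = np.asarray(
    Image.fromarray(img).resize((32, 32), Image.BICUBIC), dtype=np.float32)

# 2) torch.nn.functional.interpolate bicubic: no antialiasing filter by default,
#    so the high-frequency pattern aliases badly.
t = torch.from_numpy(img).float()[None, None]  # shape (1, 1, H, W)
torch_small = F.interpolate(t, size=(32, 32), mode="bicubic",
                            align_corners=False)[0, 0].numpy()

# Both calls are nominally "bicubic", yet the outputs differ substantially;
# this is exactly the kind of low-level discrepancy that shifts FID scores.
print("mean abs difference:", float(np.abs(pil_small - torch_small).mean()))

Carried into the Inception preprocessing step, a discrepancy of this kind is what makes FID values reported by different codebases hard to compare directly.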

Cited by 26 publications (40 citation statements) | References 17 publications
“…As reported in Table 2, to guarantee the most reliable performance of the previous methods, we evaluate the comparison results using a publicly available pre-trained model and its corresponding experiment setting. Because these visual quality metrics are highly sensitive to measurement tools showing incorrect implementations across different image processing libraries (Parmar, Zhang, and Zhu 2021), we evaluate all of the comparison methods using the same measurement tools. Based on the performances we measured, the proposed method shows better scores in terms of FID, precision, and recall compared to the existing is close to that of the real image in terms of the mean and standard deviation.…”
Section: Results
Mentioning confidence: 99%
“…We additionally compare with StyleGAN2 [25] in the unconditional setting. We use Clean-FID [44] for benchmarking due to its reported benefits over previous implementations of FID [16]. 3 Results on MM-CelebA-HQ and MS-COCO are summarized in Tab.…”
Section: Results
Mentioning confidence: 99%
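For context, the Clean-FID package referenced above [44] exposes a folder-to-folder interface. A minimal usage sketch, assuming the clean-fid PyPI package is installed and with placeholder directory paths:

from cleanfid import fid

# Compute FID between a folder of real images and a folder of generated images,
# using the package's antialiased ("clean") resizing pipeline.
score = fid.compute_fid("path/to/real_images", "path/to/generated_images")
print("FID:", score)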
“…Metrics. We apply the Fréchet Inception Distance (FID) [27] to measure the similarity between real and synthesized images, and perform human evaluation to quantitatively evaluate the synthesis quality of different methods. For the human evaluation, we design three questionnaires corresponding to the three used datasets.…”
Section: Methods
Mentioning confidence: 99%
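For reference (notation mine, not part of the quoted passage), FID compares Inception-V3 feature statistics of the real and generated image sets:

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the feature means and covariances of the real and generated images, respectively; lower values indicate closer distributions.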
“…4 We directly use the pre-trained model of these methods as their training procedure depends on the paired data of garment-person or person-person image pairs, which are unavailable in our dataset. When testing paired methods under the unpaired try-on setting, we extract the desired garment from the person image and regard it as the in-shop garment to meet the need of paired approaches. (Table 1: The FID score [27] and human evaluation score among different methods under the unpaired setting on the DeepFashion dataset [21] and our UPT dataset.)…”
Section: Methods
Mentioning confidence: 99%