Attentive Generative Adversarial Network To Bridge Multi-Domain Gap For Image Synthesis

Wang, Min; Lang, Congyan; Liang, Liqian; Lyu, Gengyu; Feng, Songhe; Wang, Tao

doi:10.1109/icme46284.2020.9102761

Cited by 12 publications

(8 citation statements)

References 10 publications


“…AGAN-CL [114] consists of a network which is trained to produce masks, thereby providing fine-grained information such as the number of objects, location, size and shape. The authors employed a multi-scale loss between real and generated masks, and an additional perceptual loss for global coherence.…”

Section: Semantic Masks (mentioning)

confidence: 99%

“…Input: Method
caption: [16], [33], [40], [42], [41], [48], [35], [54], [43], [55], [58], [61], [65], [67], [68], [69], [70], [75], [80], [86], [34], [87], [128]
caption + dialogue: [93], [95], [99]
caption + layout: [104], [97], [108], [103]
caption + semantic masks: [109], [110], [113], [114], [115], [38]
scene graphs: [116], [121], [122], [124],…”

Section: Evaluation Of T2I Models (mentioning)

confidence: 99%

“…Real Images - -
GAN-INT-CLS [16] 2.66 79.55
TAC-GAN [31] 3.45 -
StackGAN [33] 3.20 55.28
StackGAN++ [40] 3.26 48.68
CVAEGAN [186] 4.21 -
HDGAN [42] 3.45 -
Lao et al [70] - 37.94
PPAN [43] 3.52 -
C4Synth [91] 3.52 -
HfGAN [48] 3.57 -
LeicaGAN [113] 3.92 -
Text-SeGAN [65] 4.03 -
RiFeGAN [92] 4.53 -
AGAN-CL [114] 4.72 -
Souza et al [34] 3.71 16.47
- - -
GAWWN [104] 3.62 67.22 -
StackGAN [33] 3.70 51.89 -
StackGAN++ [40] 4.04 15.30 -
CVAEGAN [186] 4.97 - -
HDGAN [42] 4.15 - -
FusedGAN [41] 3.92 - -
PPAN [43] 4.38 - -
HfGAN [48] 4.48 - -
LeicaGAN [113] 4.62 - -
AttnGAN [35] 4…”

Section: Model IS ↑ FID ↓ (mentioning)

confidence: 99%


Adversarial Text-to-Image Synthesis: A Review

Frolov, Hinz, Raue et al. 2021. Preprint


With the advent of generative adversarial networks, synthesizing images from textual descriptions has recently become an active research area. It is a flexible and intuitive way for conditional image generation with significant progress in recent years regarding visual realism, diversity, and semantic alignment. However, the field still faces several challenges that require further research efforts such as enabling the generation of high-resolution images with multiple objects, and developing suitable and reliable evaluation metrics that correlate with human judgement. In this review, we contextualize the state of the art of adversarial text-to-image synthesis models, their development since their inception five years ago, and propose a taxonomy based on the level of supervision. We critically examine current strategies to evaluate text-to-image synthesis models, highlight shortcomings, and identify new areas of research, ranging from the development of better datasets and evaluation metrics to possible improvements in architectural design and model training. This review complements previous surveys on generative adversarial networks with a focus on text-to-image synthesis which we believe will help researchers to further advance the field.


“…Fréchet Inception Distance (FID). With visual features extracted by a pre-trained Inception-v3 [205] model, FID [206] measures the distance between the real image distribution and generated image distribution. Compared with IS, FID is a more consistent evaluation metric as it captures various kinds of disturbances [206].…”

Table fragment interleaved in the same excerpt (“-” marks a missing value):
- - -
GAWWN [182] 3.62 67.22 -
StackGAN [16] 3.70 51.89 -
StackGAN++ [17] 4.04 15.30 -
CVAEGAN [183] 4.97 - -
HDGAN [108] 4.15 - -
FusedGAN [184] 3.92 - -
PPAN [109] 4.38 - -
HfGAN [185] 4.48 - -
LeicaGAN [186] 4.62 - -
AttnGAN [14] 4.36 - 67.82
MirrorGAN [18] 4.56 - 57.67
SEGAN [114] 4.67 18.17 -
ControlGAN [116] 4.58 - 69.33
DM-GAN [187] 4 [188] 4.67 - -
textStyleGAN [120] 4.78 - 74.72
AGAN-CL [189] 4.97 - 63.87
TVBi-GAN [125] 5.03 11.83 -
Souza et al [124] 4.23 11.17 -
RiFeGAN [190] 5.23 - -
Wang et al [191] 5.06 12.34 86.50
Bridge-GAN [122] 4.74 - -

Section: Image Quality Metrics (mentioning)

confidence: 99%
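
The FID description in the excerpt above can be illustrated with a minimal sketch. It assumes the Inception-v3 features have already been extracted into two NumPy arrays of shape (samples, dimensions) and fits a Gaussian to each set; the function name `frechet_distance` is hypothetical and not taken from any of the cited papers. The trace of the matrix square root is obtained from the eigenvalues of the covariance product, which is one of several possible ways to compute it.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Both inputs are assumed to be arrays of shape (n_samples, n_dims),
    e.g. pooled Inception-v3 activations computed elsewhere.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    diff = mu_r - mu_g
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
```

Lower values indicate that the generated features are statistically closer to the real ones, which matches the "FID ↓" convention used in the tables above.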

“…On the other hand, FID share the same problem with IS such as struggling to detect overfitting results. Except for above common image quality metrics, some evaluation metrics are specially designed for certain generation tasks. For image synthesis conditioned on semantic map, the image quality can be assessed by leveraging pretrained segmentation model to compute the mean average precision (mAP) and pixel accuracy (Acc).…”

Table fragment interleaved in the same excerpt (“-” marks a missing value):
3.52 -
C4Synth [192] 3.52 -
HfGAN [185] 3.57 -
LeicaGAN [186] 3.92 -
Text-SeGAN [193] 4.03 -
RiFeGAN [190] 4.53 -
AGAN-CL [189] 4.72 -
Souza et al [124] 3.71 16.47

Section: Image Quality Metrics (mentioning)

confidence: 99%

Multimodal Image Synthesis and Editing: A Survey

Zhan, Yu, Wu et al. 2021. Preprint


As information exists in various modalities in the real world, effective interaction and fusion among multimodal information play a key role in the creation and perception of multimodal data in computer vision and deep learning research. With superb power in modelling the interaction among multimodal information, multimodal image synthesis and editing have become a hot research topic in recent years. Different from traditional visual guidance which provides explicit clues, multimodal guidance offers intuitive and flexible means in image synthesis and editing. On the other hand, this field is also facing several challenges in alignment of features with inherent modality gaps, synthesis of high-resolution images, faithful evaluation metrics, etc. In this survey, we comprehensively contextualize recent advances in multimodal image synthesis and editing and formulate taxonomies according to data modality and model architectures. We start with an introduction to different types of guidance modalities in image synthesis and editing. We then describe multimodal image synthesis and editing approaches extensively with detailed frameworks including Generative Adversarial Networks (GANs), GAN Inversion, Transformers, and other methods such as NeRF and Diffusion models. This is followed by a comprehensive description of benchmark datasets and corresponding evaluation metrics as widely adopted in multimodal image synthesis and editing, as well as detailed comparisons of different synthesis methods with analysis of respective advantages and limitations. Finally, we provide insights into the current research challenges and possible future research directions. We hope this survey could lay a sound and valuable foundation for future development of multimodal image synthesis and editing. A project associated with this survey is available at https://github.com/fnzhan/MISE.
