ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9746970
Generative Adversarial Network Including Referring Image Segmentation For Text-Guided Image Manipulation

Cited by 7 publications (5 citation statements)
References 15 publications
“…Note that this paper is an extension of a previously published paper [19]. The main difference between our previous study and this one is the introduction of the CLIP loss.…”
Section: Introduction
confidence: 85%
“…The model is trained on features extracted from the training images, achieving an accuracy of 79% for 4k images and 99.5% for 51k images. Y. Watanabe and R. Togo [26] introduce text-guided image manipulation, in which natural language descriptions control image generation for user-friendly editing. Methods such as CMPC-Refseg and text-guided feature exchange modules are proposed to semantically alter image appearance to meet user requirements, overcoming limitations of traditional image manipulation techniques.…”
Section: Literature Survey
confidence: 99%
“…The research on text-guided image editing was rapidly accelerated by the emergence of the generative adversarial network (GAN) [39]. Approaches to GAN-based text-guided image editing can be divided into two categories: (1) the approaches [10,11,[40][41][42][43] utilizing a unique network with a single or multi-stage architecture, and (2) the approaches [12,13,[44][45][46] leveraging the representation capabilities of a pretrained StyleGAN [16,47,48]. In approach (1), some studies [40,41] have applied an encoder-decoder architecture and successfully generated 64 × 64 resolution edited images on datasets such as Oxford-102 flower [49] and Caltech-UCSD Birds [50].…”
Section: Text-guided Image Editing
confidence: 99%
“…In approach (1), some studies [40,41] have applied an encoder-decoder architecture and successfully generated 64 × 64 resolution edited images on datasets such as Oxford-102 flower [49] and Caltech-UCSD Birds [50]. To generate high-resolution edited images on complex image datasets such as MSCOCO [51], several studies [10,11,42,43] construct a multi-stage architecture with a generator and discriminator at each stage. The three stages are trained simultaneously and progressively generate edited images at three resolutions, i.e., 64 × 64 → 128 × 128 → 256 × 256.…”
Section: Text-guided Image Editing
confidence: 99%
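The multi-stage scheme quoted above (a generator/discriminator pair per stage, each stage doubling the output resolution) can be sketched as follows. This is a minimal structural illustration only: the stage function here is a nearest-neighbour upsampler standing in for a learned, text-conditioned generator, and all names (`stage`, `multi_stage_edit`) are hypothetical, not from the cited papers.

```python
import numpy as np

def upsample2x(img):
    # Nearest-neighbour 2x upsampling; a stand-in for one learned generator stage.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def stage(img, text_feat):
    # A real stage would fuse text features with image features and refine the
    # result; here we only model the resolution change between stages.
    return upsample2x(img)

def multi_stage_edit(img64, text_feat):
    # Progressively generate edited images at 64x64 -> 128x128 -> 256x256,
    # mirroring the three-stage pipeline described in the excerpt.
    outputs = [img64]
    for _ in range(2):
        outputs.append(stage(outputs[-1], text_feat))
    return outputs

images = multi_stage_edit(np.zeros((64, 64, 3)), text_feat=None)
print([im.shape[:2] for im in images])  # [(64, 64), (128, 128), (256, 256)]
```

In the actual GAN training described, each of the three outputs would additionally be scored by a resolution-specific discriminator, with all stages optimized jointly.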