2023
DOI: 10.1109/access.2023.3269847
Text-Guided Image Manipulation via Generative Adversarial Network With Referring Image Segmentation-Based Guidance

Abstract: This study proposes a novel text-guided image manipulation method that introduces referring image segmentation into a generative adversarial network. The proposed text-guided image manipulation method aims to manipulate images containing multiple objects while preserving text-unrelated regions. The proposed method assigns the task of distinguishing between text-related and unrelated regions in an image to segmentation guidance based on referring image segmentation. With this architecture, the adversarial gener…

Cited by 3 publications (2 citation statements) · References 38 publications
“…The research on text-guided image editing was rapidly accelerated by the emergence of the generative adversarial network (GAN) [39]. Approaches to GAN-based text-guided image editing can be divided into two categories: (1) the approaches [10,11,40–43] utilizing a unique network with a single- or multi-stage architecture, and (2) the approaches [12,13,44–46] leveraging the representation capabilities of a pretrained StyleGAN [16,47,48]. In approach (1), some studies [40,41] have applied an encoder–decoder architecture and successfully generated 64 × 64 resolution edited images on datasets such as Oxford-102 Flower [49] and Caltech-UCSD Birds [50].…”
Section: Text-guided Image Editing
confidence: 99%
“…In approach (1), some studies [40,41] have applied an encoder–decoder architecture and successfully generated 64 × 64 resolution edited images on datasets such as Oxford-102 Flower [49] and Caltech-UCSD Birds [50]. To generate high-resolution edited images on complex image datasets such as MSCOCO [51], several studies [10,11,42,43] construct a multi-stage architecture with a generator and discriminator at each stage. The three stages are trained simultaneously and progressively generate edited images at three resolutions, i.e., 64 × 64 → 128 × 128 → 256 × 256.…”
Section: Text-guided Image Editing
confidence: 99%
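The progressive multi-stage scheme described in the citation statement above can be sketched in a few lines. This is purely illustrative and not the cited papers' code: the `upsample2x` stand-in replaces a learned, text-conditioned generator stage (each real stage is also paired with its own discriminator), but the resolution schedule 64 × 64 → 128 × 128 → 256 × 256 matches the description.

```python
# Illustrative sketch (assumption: not the actual architecture of any cited
# paper) of a three-stage pipeline that progressively doubles resolution.
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upsampling; a stand-in for one generator stage,
    which in the real models would also condition on a text embedding."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def multi_stage_edit(img64):
    """Run a 64x64 input through three 'stages', collecting the output of
    each stage as the multi-stage GANs described above do."""
    outputs = [img64]                      # stage 1 output: 64 x 64
    for _ in range(2):                     # stages 2 and 3 double resolution
        outputs.append(upsample2x(outputs[-1]))
    return outputs

stages = multi_stage_edit(np.zeros((64, 64, 3)))
print([s.shape[:2] for s in stages])  # [(64, 64), (128, 128), (256, 256)]
```

In the actual models, each stage's discriminator judges realism and text-image consistency at that stage's resolution, which is what allows all three stages to be trained at the same time.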