2021
DOI: 10.48550/arxiv.2107.03021
Preprint

Bi-level Feature Alignment for Versatile Image Translation and Manipulation

Abstract: Generative adversarial networks (GANs) have achieved great success in image translation and manipulation. However, high-fidelity image generation with faithful style control remains a grand challenge in computer vision. This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. To handle the quadratic complexity incurred by building the dense correspondences, we introduce a bi-level…

Cited by 7 publications (9 citation statements)
References 89 publications
“…For example, Liu et al [19] exploit optimal transport in a unified framework to model semantic correspondences. To tackle unbalanced distributions with different masses and deviations, unbalanced optimal transport [4,15] has also been explored for different tasks, such as image translation [36,37,41]. To the best of our knowledge, the proposed VMRF is the first that adapts unbalanced optimal transport for optimizing NeRF representations.…”
Section: Related Work
confidence: 99%
“…Wang et al [53] only involve the tokens with the top K attention scores in feature aggregation. A similar strategy is also adopted in [69,63,44]. Although effective, fixing the number of attentive tokens is not flexible enough, and fails to model the complex and changeable matching patterns that arise in practice.…”
Section: Efficient Transformer
confidence: 99%
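The top-K strategy described above can be sketched as a toy single-query routine: score every key against the query, keep only the K highest-scoring tokens, and softmax-aggregate their values (pure Python with illustrative names; this is a sketch of the general idea, not the implementation in [53]):

```python
import math

def topk_attention(query, keys, values, k=2):
    """Sparse attention for one query vector: aggregate only the k keys
    with the highest dot-product similarity to the query."""
    # Similarity score of the query against every key
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k largest scores
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only
    exps = {i: math.exp(scores[i]) for i in top}
    z = sum(exps.values())
    weights = {i: e / z for i, e in exps.items()}
    # Weighted sum of the selected value vectors
    dim = len(values[0])
    out = [sum(weights[i] * values[i][d] for i in top) for d in range(dim)]
    return out, top
```

With k fixed, each query touches only k value vectors regardless of sequence length, which is exactly the inflexibility the quoted passage criticizes.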
“…To alleviate this issue, the works of [69,63] propose to fix the number of feature points that each query position attends to, thereby achieving a linear-complexity model. The reasoning behind such a design is that each query position in the target image is, in reality, independent of and dissimilar to most points in the reference.…”
Section: Introduction
confidence: 99%
“…Conditional image generation has achieved remarkable progress by learning mappings among data of different domains. To achieve high-fidelity yet flexible image generation, various conditional inputs have been adopted, including semantic segmentation [12,48,35,59,62], scene layouts [42,65,18], key points [26,29,61,57], edge maps [12,55,56], etc. Recently, several studies have explored generating images with cross-modal guidance [58,53].…”
Section: Conditional Image Generation
confidence: 99%