XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation

Liu, Wei; Liu, Fangyue; Ding, Fei; Qian, He; Yi, Zili

doi:10.1109/cvpr52688.2022.00775

Cited by 32 publications

(13 citation statements)

References 21 publications

“…To address this, SA-VAE (Sun et al 2017) and EMD (Zhang, Zhang, and Cai 2018) generate unseen fonts by disentangling style and content representations. To enable the generator to capture local style characteristics, some methods (Wu, Yang, and Hsu 2020; Huang et al 2020; Cha et al 2020; Park et al 2021a,b; Liu et al 2022; Kong et al 2022)…”

Section: Related Work (mentioning)

confidence: 99%

“…Although these methods have achieved remarkable success in font generation, they still suffer from complex character generation and large style variation transfer, leading to severe stroke missing, artifacts, blurriness, layout errors, and style inconsistency as shown in Figure 1(b)(c). Retrospectively, most font generation approaches (Park et al 2021a,b; Xie et al 2021; Tang et al 2022; Liu et al 2022; Kong et al 2022; Wang et al 2023) adopt a GAN-based (Goodfellow et al 2014) framework which potentially suffers from unstable training due to their adversarial training nature. Moreover, most of these methods perceive content information through only single-scale high-level features, omitting the fine-grained details that are crucial to preserving the source content, especially for complex characters.…”

Section: Introduction (mentioning)

confidence: 99%

“…Moreover, most of these methods perceive content information through only single-scale high-level features, omitting the fine-grained details that are crucial to preserving the source content, especially for complex characters. There are also a number of methods (Cha et al 2020; Park et al 2021a,b; Liu et al 2022; Kong et al 2022; He et al 2022) that employ prior knowledge to facilitate font generation, such as stroke or component composition of characters; however, this information is costly to annotate for complex characters. Furthermore, the target style is commonly represented by a simple classifier or a discriminator in previous literature, which struggles to learn the appropriate style and hinders the style transfer with large variations.…”

Section: Introduction (mentioning)

confidence: 99%

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Yang, Peng, Kong et al. 2024, AAAI

Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser's state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods. The code is available at https://github.com/yeungchenwa/FontDiffuser.

“…To improve the quality of the generated images, [13] proposed the Deep Feature Similarity (DFS) architecture to leverage the feature similarity between the input content and style images to synthesize target images. Recently, researchers [9, 19, 20, 44–46] have made significant progress by exploiting the compositionality of compositional scripts. However, our experimental results indicate poor performance for the constructed multi-language dataset.…”

Section: Font Generation (mentioning)

confidence: 99%

Cross-language font style transfer

Taniguchi, Min et al. 2023, Appl Intell

In this paper, we propose a cross-language font style transfer system that can synthesize a new font by observing only a few samples from another language. Automatic font synthesis is a challenging task and has attracted much research interest. Most previous works addressed this problem by transferring the style of the given subset to the content of unseen ones. Nevertheless, they only focused on the font style transfer in the same language. In many cases, we need to learn font style from one language and then apply it to other languages. Existing methods make this difficult to accomplish because of the abstraction of style and language differences. To address this problem, we specifically designed the network into a multi-level attention form to capture both local and global features of the font style. To validate the generative ability of our model, we constructed an experimental font dataset of 847 fonts, each containing English and Chinese characters with the same style. Results show that our model generates 80.3% of users’ preferred images compared with state-of-the-art models.

“…Disentanglement is mostly used for few-shot or one-shot font generation tasks [24, 25, 26, 27, 28, 29, 30]. Additional techniques for inter- or intra-radical style consistency for compounded characters with multiple radicals (such as Chinese and Korean letters) are introduced [26, 31, 32, 33]. AGISNet [28] and TET-GAN [29, 30] are proposed for dealing with more decorative and colorful font styles.…”

Section: Disentanglement For Font Images (mentioning)

confidence: 99%