Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions

Nasir, Osaid Rehman; Jha, Shailesh; Grover, Manraj Singh; Yu, Yi; Kumar, Ajit; Shah, Rajiv Ratn

doi:10.1109/bigmm.2019.00-42

Cited by 39 publications

(11 citation statements)

References 16 publications


“…It is to be noted that each of the techniques presented uses CelebA as their image datasets but different labels for the images from CelebA. For example, [30] has auto-generated captions with the same sentence repeated for any particular sentence. [21] generated their own dataset of labels of 400 images from CelebA.…”

Section: Results (mentioning, confidence: 99%)

“…By having two fully trained models and three datasets to test on, we are able to present our results and comparisons across data with more variations. We also compare our performance with current state-of-the-art techniques, including TediGAN [42], ControlGAN [19], AttnGAN [43], and Text2FaceGAN [30]. In order to evaluate the performance of these methods, we use the dataset we have gathered along with test images.…”

Section: Methods (mentioning, confidence: 99%)

“…Text-to-image synthesis has made substantial progress towards realistic image generation, with numerous approaches being already published. Text-to-face generation, however, has remained largely unaddressed except for several attempts to create datasets for this purpose [30,42]. Recently, a text-to-face generator and manipulator method called TediGAN [42] used a new dataset Multi-Modal-CelebA-HQ which the authors have created.…”

Section: Related Work (mentioning, confidence: 99%)


Semantic Text-to-Face GAN -ST^2FG

Oza¹, Chanda², Doermann³

2021

Preprint


Faces generated using generative adversarial networks (GANs) have reached unprecedented realism. These faces, also known as "Deep Fakes", appear as realistic photographs with very few pixel-level distortions. While some work has enabled the training of models that generate specific properties of the subject, generating a facial image from a natural language description has not been fully explored. For security and criminal identification, a GAN-based system that works like a sketch artist would be incredibly useful. In this paper, we present a novel approach to generate facial images from semantic text descriptions. The learned model is provided with a text description and an outline of the type of face, which the model uses to sketch the features. Our models are trained using an Affine Combination Module (ACM) mechanism to combine the text embedding from BERT and the GAN latent space using a self-attention matrix. This avoids the loss of features due to inadequate "attention", which may happen if the text embedding and latent vector are simply concatenated. Our approach generates images that are very accurately aligned with exhaustive textual descriptions of faces containing many fine-grained facial features, and this helps generate better images. The proposed method can also make incremental changes to a previously generated image when it is given additional textual descriptions or sentences.

Preprint. Under review.
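The abstract describes fusing a BERT text embedding with the GAN latent vector through an affine combination guided by attention, instead of plain concatenation. The preprint's exact formulation is not reproduced on this page, so the following is only a minimal pure-Python sketch under stated assumptions: `attend`, `affine_combine`, `w_scale` and `w_bias` are hypothetical names, the attention is ordinary scaled dot-product attention of the latent over text-token embeddings, and the latent and token dimensions are assumed equal. The fused vector is latent ⊙ W(c) + b(c), where c is the attention-weighted text context.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(latent: Vector, tokens: List[Vector]) -> Vector:
    """Scaled dot-product attention of the latent over the text-token embeddings.

    Returns the attention-weighted text context vector (assumes every token
    embedding has the same dimension as the latent).
    """
    d = len(latent)
    scores = [sum(a * b for a, b in zip(latent, t)) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]


def affine_combine(latent: Vector, tokens: List[Vector],
                   w_scale: Matrix, w_bias: Matrix) -> Vector:
    """Hypothetical ACM-style fusion: latent * W(context) + b(context).

    w_scale and w_bias stand in for learned projections; here they are
    plain matrices applied to the attended text context.
    """
    context = attend(latent, tokens)
    scale = matvec(w_scale, context)
    bias = matvec(w_bias, context)
    return [z * s + b for z, s, b in zip(latent, scale, bias)]
```

For example, `affine_combine([2.0, 3.0], [[0.5, -1.0]], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])` attends entirely to the single token and returns `[1.0, -3.0]`. Unlike concatenation, which leaves text and latent in separate coordinates for later layers to mix, this affine form lets each text-derived value rescale and shift the latent directly.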





“…As the image quality was poor, they used MSG-GAN [20] as the generator and improved the image quality. Text2FaceGAN [21] was based on the GAN-INT-CLS architecture by Reed et al. [7]. The Text2Face dataset was also introduced using the attributes of the CelebA dataset and an algorithm for caption generation.…”

Section: Text-to-face Generation (mentioning, confidence: 99%)
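The statement above notes that Text2FaceGAN builds on GAN-INT-CLS by Reed et al. As a reminder of what that base method optimizes, here is a minimal plain-Python sketch of its two ingredients. The CLS part is the matching-aware objective: the discriminator scores a real image with matching text (s_r), a real image with mismatched text (s_w), and a generated image with matching text (s_f). The INT part conditions the generator on interpolations of two text embeddings, with β = 0.5 as Reed et al. suggest. The function names are hypothetical, and the scores are assumed to be probabilities for which every logarithm is defined (s_r > 0, s_w < 1, 0 < s_f < 1 as needed).

```python
import math
from typing import List


def gan_cls_discriminator_objective(s_real_right: float,
                                    s_real_wrong: float,
                                    s_fake_right: float) -> float:
    """GAN-CLS discriminator objective (to be maximized):
    log(s_r) + (log(1 - s_w) + log(1 - s_f)) / 2.
    Mismatched real pairs and generated pairs are both treated as 'fake'."""
    return math.log(s_real_right) + 0.5 * (
        math.log(1.0 - s_real_wrong) + math.log(1.0 - s_fake_right)
    )


def gan_cls_generator_objective(s_fake_right: float) -> float:
    """GAN-CLS generator objective (to be maximized): log(s_f)."""
    return math.log(s_fake_right)


def interpolate_embeddings(t1: List[float], t2: List[float],
                           beta: float = 0.5) -> List[float]:
    """GAN-INT: blend two text embeddings, beta * t1 + (1 - beta) * t2,
    to obtain extra conditioning vectors that have no paired image."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(t1, t2)]
```

A discriminator that is perfectly confident and correct (s_r = 1, s_w = 0, s_f = 0) reaches the maximum value 0 of its objective.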

Text-to-Face Generation with StyleGAN2

Ayanthi¹, Munasinghe²

2022

Computer Science & Technology Trends


Synthesizing images from text descriptions has become an active research area with the advent of Generative Adversarial Networks. The main goal is to generate photo-realistic images that are aligned with the input descriptions. Text-to-Face generation (T2F) is a sub-domain of Text-to-Image generation (T2I) that is more challenging due to the complexity and variation of facial attributes. It has a number of applications, mainly in the domain of public safety. Even though several models are available for T2F, there is still a need to improve image quality and semantic alignment. In this research, we propose a novel framework to generate facial images that are well aligned with the input descriptions. Our framework uses the high-resolution face generator StyleGAN2 and explores its use for T2F. We embed text in the input latent space of StyleGAN2 using BERT embeddings and guide the generation of facial images with text descriptions. We trained our framework on attribute-based descriptions to generate images at 1024x1024 resolution. The generated images exhibit a 57% similarity to the ground-truth images, with a face semantic distance of 0.92, outperforming state-of-the-art work. The generated images have an FID score of 118.097, and the experimental results show that our model generates promising images.
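The abstract reports an FID of 118.097. For reference, the Fréchet Inception Distance (Heusel et al., 2017) fits a Gaussian to Inception-v3 features of real images (mean μ_r, covariance Σ_r) and another to features of generated images (μ_g, Σ_g), then measures the Fréchet distance between them; lower values mean the two feature distributions are closer:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```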


“…To address the data scarcity issue, O.R. Nasir et al. [20] proposed to utilise the labels of CelebA [11] to produce structured pseudo text descriptions automatically. In this way, the samples in the dataset are paired with sentences which contain the positive feature names separated by conjunctions and punctuation.…”

Section: B. Text-to-face Synthesis (mentioning, confidence: 99%)
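The statement above describes pairing each CelebA image with a pseudo-description built from its positive attribute labels, joined by punctuation and conjunctions. The sketch below shows one hypothetical way to do this; the sentence template, the function names `attribute_phrase` and `make_caption`, and the choice of commas plus a final "and" are assumptions for illustration, not the template Nasir et al. actually used. It relies only on CelebA's convention of marking each of its 40 binary attributes as 1 (present) or -1 (absent).

```python
from typing import Dict, List


def attribute_phrase(name: str) -> str:
    """Turn a CelebA-style attribute name such as 'Black_Hair' into 'black hair'."""
    return name.replace("_", " ").lower()


def make_caption(labels: Dict[str, int]) -> str:
    """Build a pseudo-description from the positive (+1) attributes of one image.

    Attributes labelled -1 are ignored; the remaining phrases are joined with
    commas and a final 'and' (hypothetical template).
    """
    positives: List[str] = [attribute_phrase(n) for n, v in labels.items() if v == 1]
    if not positives:
        return "A face."
    if len(positives) == 1:
        body = positives[0]
    else:
        body = ", ".join(positives[:-1]) + " and " + positives[-1]
    return f"A face with the following features: {body}."
```

For example, `make_caption({"Black_Hair": 1, "Eyeglasses": 1, "Smiling": 1, "Male": -1})` returns `"A face with the following features: black hair, eyeglasses and smiling."`. Because the template is deterministic, any two images that share the same positive attributes receive identical captions.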