The idea of computers generating content has been around since the 1950s. Some of the earliest attempts focused on replicating human creativity by having computers generate visual art and music [1]. Unlike today's synthesized media, computer-generated content from that early era was far from realistic and easily distinguishable from content created by humans. It has taken decades and major leaps in artificial intelligence (AI) for generated content to reach a high level of realism.

Generative and discriminative models are two different approaches to learning from data. Whereas a discriminative model can identify a person in an image, a generative model can produce a new image of a person who has never existed. Recent leaps in generative modelling include generative adversarial networks (GANs) [2]. Since their introduction, models for AI-generated media such as GANs have enabled the hyper-realistic synthesis of digital content, including the generation of photorealistic images, the cloning of voices, the animation of faces and the translation of images from one form to another [3-6]. The GAN architecture comprises two neural networks, a generator and a discriminator. The generator is responsible for producing new content that resembles the input data, while the discriminator's job is to distinguish the generated (fake) output from the real data. The two networks compete and try to outperform each other in a closed feedback loop, gradually increasing the realism of the generated output (see the sketch at the end of this section).

GAN architectures can generate images of things that have never existed, such as human faces [3,4]. StyleGAN, for example, is a modifiable GAN that enables intuitive control over the facial details of generated images by separating high-level attributes, such as a person's identity, from low-level features, such as hair or freckles, with few visible artefacts [4]. Researchers have also proposed an in-domain GAN inversion approach that enables the editing of GAN-generated images, allowing existing photographs to be de-aged or given new facial expressions [7]. Meanwhile, transformers, such as those used in the massive generative GPT-3 language model, have already proved successful for text-to-image generation [8].
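To make the closed feedback loop concrete, the following is a minimal sketch of adversarial training in PyTorch. It is not the architecture of any cited model: the toy two-dimensional "real" data, the network sizes and the learning rates are all illustrative assumptions, chosen only to show how the generator and discriminator updates alternate.

```python
# Minimal GAN training-loop sketch (illustrative; toy data and
# hyperparameters are assumptions, not details from the cited works).
import torch
import torch.nn as nn

LATENT_DIM = 8   # size of the random noise vector fed to the generator (assumed)
DATA_DIM = 2     # toy "real" data: points from a 2-D Gaussian (assumed)

# Generator: maps random noise to candidate samples that should
# come to resemble the real data distribution.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 32), nn.ReLU(),
    nn.Linear(32, DATA_DIM),
)

# Discriminator: outputs the probability that its input is real.
discriminator = nn.Sequential(
    nn.Linear(DATA_DIM, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(2000):
    # "Real" samples: a Gaussian centred at (3, 3) stands in for real data.
    real = torch.randn(64, DATA_DIM) + 3.0
    noise = torch.randn(64, LATENT_DIM)
    fake = generator(noise)

    # Discriminator update: push outputs for real data towards 1
    # and outputs for generated data towards 0.
    d_opt.zero_grad()
    d_loss = (bce(discriminator(real), torch.ones(64, 1))
              + bce(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_loss.backward()
    d_opt.step()

    # Generator update: try to fool the discriminator into
    # labelling generated samples as real (target 1).
    g_opt.zero_grad()
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()
```

Each iteration alternates the two objectives: the discriminator is rewarded for telling real from fake, and the generator is rewarded for defeating it, which is the competitive loop that gradually drives the generated output towards realism.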