Automatic synthesis of realistic images is challenging, and even state-of-the-art artificial intelligence and machine learning algorithms fall short of this goal. Nevertheless, advances in image processing, which operates on an image to enhance it or extract information from it, have made it possible to synthesize pictures from textual descriptions, and text-to-image synthesis has become an active research area in recent years. OpenAI's text-to-image model surprised the world upon its launch, but its capabilities come at a cost. The model fails to address problems including gender bias, stereotypes, language structure, viewpoint, writing, symbolism, and the generation of explicit material, so further study is necessary. This survey aims to supplement past studies on the use of different image processing techniques to create synthetic images. It critically reviews current approaches for evaluating text-to-image synthesis models, highlights the limitations of existing architectures, and identifies new research directions. To advance the field further, the architectural design and training of these models must be improved, which can be achieved by developing better datasets and evaluation metrics.