Generative AI models can produce large amounts of new content based on their training data. Besides text, they can create images, music, video, and more. One explanation for their unexpected popularity is their widespread effect on numerous sectors. Their applications include text, image, and music generation, and extend to healthcare, education, and the metaverse. However, designing and deploying these models remains difficult. Challenges include unreliable output, biased content, overfitting, and other limitations. This study examines the similarities and differences among multimodal generative AI systems. The comparison criteria include input, output, developing authority, frameworks, and tools. The study also illustrates how multimodal generative AI models are used across many industries.