In the modern era, virtual communication between individuals is common. Many people’s lives have been made simpler in a number of circumstances by providing subtitles, generating automated captions for social media videos, and language translation from a source language to a targeted language. Both are included, which offers face-to-face translated captions during video conversations. React is used for application development. To send the data, socket programming is utilized. Context is understood and translated using Google translate API and speech recognition modules. With OpenAI and Whisper, captions are generated. This paper will directly create the translated user voice rather than translating the text and creating subtitles.