This translator program is intended to help the general public understand foreign-language videos, to be useful in education and technology, and to help people with disabilities communicate. The method used is classification, which detects patterns in order to predict the class attribute as a function of the input attributes, and it generates output automatically through three stages: Machine Learning, Natural Language Processing, and Speech. The results showed that 90.38% of videos were successfully translated into text and audio, while 9.62% could not be translated because the video owner had restricted public interaction; synchronization between text and audio ranged from 89% to 97%. In this research, a text and audio translator program was created using an Application Programming Interface (API). The program combines deep learning, machine translation, and text-to-speech, and is written in the high-level programming language Python. It is a predictive system: it tries to predict the output that matches the user's wishes.
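The three-stage flow described above (speech to text, machine translation, text-to-speech) can be sketched as a simple chain of functions. This is a minimal illustration only, not the authors' implementation: the names `translate_video` and `TranslationResult` and the three stand-in stage functions are hypothetical, and in a real system each stage would call whichever speech recognition, translation, and text-to-speech API is actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:
    source_text: str       # transcript of the original video audio
    translated_text: str   # transcript translated into the target language
    audio: bytes           # synthesized speech for the translated text


def translate_video(
    video_ref: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
    target_lang: str,
) -> TranslationResult:
    """Run the three stages in order; each stage is passed in as a callable."""
    source_text = transcribe(video_ref)                    # stage 1: speech to text
    translated_text = translate(source_text, target_lang)  # stage 2: machine translation
    audio = synthesize(translated_text, target_lang)       # stage 3: text-to-speech
    return TranslationResult(source_text, translated_text, audio)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_transcribe(video_ref: str) -> str:
    return "hello world"


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_synthesize(text: str, target_lang: str) -> bytes:
    return text.encode("utf-8")


result = translate_video(
    "video-id", fake_transcribe, fake_translate, fake_synthesize, "id"
)
print(result.translated_text)  # prints "[id] hello world"
```

Passing the stages in as callables keeps the sketch independent of any particular API, so a stage that fails (for example, when a video's owner has restricted access) can be handled or swapped without touching the rest of the chain.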