Abstract-Facial/voice-based authentication is becoming increasingly popular (e.g., already adopted by MasterCard and AliPay), because it is easy to use. In particular, users can now authenticate themselves to online services by using their mobile phone to show themselves performing simple tasks like blinking or smiling in front of its built-in camera. Our study shows that many of the publicly available facial/voice recognition services (e.g. Microsoft Cognitive Services or Amazon Rekognition) are vulnerable to even the most primitive attacks. Furthermore, recent work on modeling a person's face/voice (e.g. Face2Face [1]) allows an adversary to create very authentic video/audio of any target victim to impersonate that target. All it takes to launch such attacks are a few pictures and voice samples of a victim, which can all be obtained by either abusing the camera and microphone of the victim's phone, or through the victim's social media account. In this work, we propose the Real Time Captcha (rtCaptcha) system, which stops/slows down such an attack by turning the adversary's task from creating authentic video/audio of the target victim performing known authentication tasks (e.g., smile, blink) to figuring out what is the authentication task, which is encoded as a Captcha. Specifically, when a user tries to authenticate using rtCaptcha, they will be presented a Captcha and will be asked to take a "selfie" video while announcing the answer to the Captcha. As such, the security guarantee of our system comes from the strength of Captcha, and not how well we can distinguish real faces/voices from synthesized ones. To demonstrate the usability and security of rtCaptcha, we conducted a user study to measure human response times to the most popular Captcha schemes. Our experiments show that, thanks to the humans' speed of solving Captchas, adversaries will have to solve Captchas in less than 2 seconds in order to appear live/human and defeat rtCaptcha, which is not possible for the best settings on the attack side.