Interactive real-time communication between people and machines enables innovations in transportation, health care, and other fields. Voice and gesture commands improve the usability and broad public appeal of such systems. In this paper we experimentally evaluate Google speech recognition and Apple Siri, two of the most popular cloud-based speech recognition systems. Our goal is to evaluate the performance of these systems under different network conditions in terms of command recognition accuracy and round-trip delay, two metrics that affect the usability of interactive applications. Our results show that speech recognition systems are affected by loss and jitter, both commonly present in cellular and WiFi networks. Finally, we propose and evaluate a network coding transport solution to improve the quality of voice transmission to cloud-based speech recognition systems. Experiments show that our approach improves the accuracy and delay of cloud speech recognizers under different loss and jitter values.