Deep neural networks represent a powerful option for many realworld applications due to their ability to model even complex data relations. However, such neural networks can also be prohibitively expensive to train, making it common to either outsource the training process to third parties or use pretrained neural networks. Unfortunately, such practices make neural networks vulnerable to various attacks, where one attack is the backdoor attack. In such an attack, the third party training the model may maliciously inject hidden behaviors into the model. Still, if a particular input (called trigger) is fed into a neural network, the network will respond with a wrong result.In this work, we explore the option of backdoor attacks to automatic speech recognition systems where we inject inaudible triggers. By doing so, we make the backdoor attack challenging to detect for legitimate users, and thus, potentially more dangerous. We conduct experiments on two versions of datasets and three neural networks and explore the performance of our attack concerning the duration, position, and type of the trigger. Our results indicate that less than 1% of poisoned data is sufficient to deploy a backdoor attack and reach a 100% attack success rate. What is more, while the trigger is inaudible, making it without limitations with respect to the duration of the signal, we observed that even short, non-continuous triggers result in highly successful attacks.
CCS CONCEPTS• Security and privacy → Systems security; • Computing methodologies → Speech recognition; Neural networks.