Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security 2021
DOI: 10.1145/3460120.3484742

"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World

Abstract: Advances in deep learning have introduced a new wave of voice synthesis tools, capable of producing audio that sounds as if spoken by a target speaker. If successful, such tools in the wrong hands will enable a range of powerful attacks against both humans and software systems (aka machines). This paper documents efforts and findings from a comprehensive experimental study on the impact of deep-learning based speech synthesis attacks on both human listeners and machines such as speaker recognition and voicesig…

Cited by 21 publications (12 citation statements)
References 62 publications
“…Thus, it is interesting to investigate whether input transformations can defend against those attacks. As a first attempt, we carry out a preliminary evaluation against hidden voice attack [78] and speech synthesis attack [79] (cf. Appendix A.8).…”
Section: Discussion Of Limitations (mentioning)
confidence: 99%
“…[63], [80] for survey). There are other voice attacks in the speaker recognition domain, such as hidden voice attacks [78] and spoofing attacks [79], [88], [89], [90], [91], [92]. Though these attacks have different attack goals and scenarios from adversarial attacks [15], our preliminary evaluation shows that it is possible to mitigate hidden voice attack [78] and speech synthesis attack [79] via input transformations.…”
Section: Related Work (mentioning)
confidence: 94%
“…A successful low-resource attack can be trained with approximately 2 minutes of data. Confirming the results in [110], these systems lack the element of verifying the data source validity, thus, there is an urgent need to design solutions that check the integrity of the data before blocking it.…”
Section: Attacking Commercial Voice (mentioning)
confidence: 99%