ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747186
Deepfake Speech Detection Through Emotion Recognition: A Semantic Approach

Abstract: In recent years, audio and video deepfake technology has advanced relentlessly, severely impacting people's reputation and trustworthiness. Several factors have facilitated the growing deepfake threat. On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real time, facilitating the dissemination of counterfeit material. On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that…

Citations: Cited by 29 publications (16 citation statements)
References: References 24 publications
“…Works on how to include emotional cues in deepfake detection are starting to be performed [15][16][17], which indicates that emotional cues in video recordings can be used effectively for deepfake detection.…”

[Table caption interleaved in the extracted snippet: columns display the number of original recordings by the performer (OR), the number of performers of deepfakes on the original recordings of the current performer (PF), and the number of fake recordings made on recordings by the performer (FRP).]

Section: Discussion
Confidence: 99%
“…Consequently, the effect of bad neutral photograms in fakes, mainly in initial photograms, may suggest that emotion recognition is prone to work worse than in their corresponding original recordings. Works on how to include emotional cues in deepfake detection are starting to be performed [15–17], which indicates that emotional cues in video recordings can be used effectively for deepfake detection.…”

Section: Discussion
Confidence: 99%