Srpdet, a Pedestrian Detection Dataset in Severe Weather

Zhao, Yue; Cai, Yuanqiang; Yan, Dongpeng; Song, Yan

doi:10.2139/ssrn.4345508

Cited by 1 publication

(1 citation statement)

References 0 publications


“…AI-based speech synthesis techniques (e.g., Speech to Text [3], Voice Conversion [3], Scene Fake [4], Emotion Fake [5]), posing significant threats to the integrity and authenticity of voice-activated systems. Consequently, the detection of audio deepfakes has become a crucial area of research, drawing considerable attention from the research community.…”

Section: Introduction (mentioning)

confidence: 99%

An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

Pham, Ngo, Nguyen, et al. 2022

International Conference on Content-Based Multimedia Indexing


In this paper, we propose a deep learning based system for the task of deepfake audio detection. In particular, the raw input audio is first transformed into various spectrograms using three transformation methods, Short-time Fourier Transform (STFT), Constant-Q Transform (CQT), and Wavelet Transform (WT), combined with different auditory-based filters: Mel, Gammatone, linear filters (LF), and discrete cosine transform (DCT). Given the spectrograms, we evaluate a wide range of classification models based on three deep learning approaches. The first approach is to train directly on the spectrograms using our proposed baseline models: a CNN-based model (CNN-baseline), an RNN-based model (RNN-baseline), and a C-RNN model (C-RNN-baseline). The second approach is transfer learning from computer vision models such as ResNet-18, MobileNet-V3, EfficientNet-B0, DenseNet-121, ShuffleNet-V2, Swin-T, ConvNeXt-Tiny, GoogLeNet, MNASNet, and RegNet. In the third approach, we leverage the state-of-the-art audio pre-trained models Whisper, Seamless, Speechbrain, and Pyannote to extract audio embeddings from the input spectrograms. The audio embeddings are then fed to a multilayer perceptron (MLP) model to classify audio samples as fake or real. Finally, high-performance deep learning models from these approaches are fused to achieve the best performance. We evaluated our proposed models on the ASVspoof 2019 benchmark dataset. Our best ensemble model achieved an Equal Error Rate (EER) of 0.03, which is highly competitive with top-performing systems in the ASVspoof 2019 challenge. Experimental results also highlight the potential of selective spectrograms and deep learning approaches to enhance the task of audio deepfake detection.

Index Terms: deepfake audio, deep learning model, spectrogram, ASVspoof dataset.
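The abstract reports its headline result as an Equal Error Rate, the operating point at which the false acceptance rate and the false rejection rate coincide. The abstract does not say how the EER was computed, so the following is only a minimal sketch of the metric using a plain threshold sweep. The function name `equal_error_rate` and the convention that a higher score means "more likely bona fide" are assumptions made for illustration; this is not the official ASVspoof scoring tool.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more likely bona fide'.

    Hypothetical helper for illustration: sweeps every observed score as a
    candidate threshold and keeps the one where the two error rates are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: a bona fide sample scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed sample scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # On a finite sample the two rates rarely match exactly,
            # so report their average at the closest crossing point.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    bonafide = [0.9, 0.8, 0.7, 0.35]
    spoof = [0.1, 0.2, 0.3, 0.6]
    eer, threshold = equal_error_rate(bonafide, spoof)
    print(f"EER = {eer:.2f} at threshold {threshold}")
```

With the toy scores, one bona fide sample and one spoofed sample fall on the wrong side of the chosen threshold, so both error rates are one in four. Evaluations on a real benchmark typically interpolate between thresholds rather than picking the nearest observed score.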
