Speaker-independent source cell-phone identification for re-compressed and noisy audio recordings

Verma, Vinay Kumar; Khanna, Nitin

doi:10.1007/s11042-020-10205-z

Cited by 10 publications

(8 citation statements)

References 33 publications

“…We have used the method proposed by Vinay et al [24] as the baseline in our experiments. In this method, the authors proposed a CNN architecture that was used to classify audio recordings from 19 different devices.…”

Section: Baseline Methods (mentioning)

confidence: 99%

“…Yanxiong et al [23] used features extracted using a deep auto-encoder network for recording device classification. Verma et al [24,25] proposed convolutional neural network (CNN)-based classification with absolute discrete Fourier transform (DFT) features. A detailed review of the recording device classification literature can be found in [4,24].…”

Section: Introduction (mentioning)

confidence: 99%

“…Verma et al [24,25] proposed convolutional neural network (CNN)-based classification with absolute discrete Fourier transform (DFT) features. A detailed review of the recording device classification literature can be found in [4,24]. Among these existing methods, CNN-based classification with absolute DFT features proposed by Vinay et al [24] showed the best result.…”

Section: Introduction (mentioning)

confidence: 99%

“…A detailed review of the recording device classification literature can be found in [4,24]. Among these existing methods, CNN-based classification with absolute DFT features proposed by Vinay et al [24] showed the best result. However, we observed a significant margin for improvement in recording device classification accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

“…Considering the above limitations, we in this work proposed a single-attention pooling network (SA-CNN) and a dual-attention pooling network (DA-CNN) for recording device classification. Recently, CNN-based models showed state-of-the-art results for recording device classification [24,25], which motivated us to keep the primary network based on CNNs. We observed that not all parts of the speech spectrum contribute equally to the recording device classification.…”

Section: Introduction (mentioning)

confidence: 99%

Dual Attention Pooling Network for Recording Device Classification Using Neutral and Whispered Speech

Naini; Singhal; Ghosh

2022

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this work, we proposed a method for recording device classification using the recorded speech signal. With the rapid increase in different mobile and professional recording devices, determining the source device has many applications in forensics and in further improving various speech-based applications. This paper proposes dual and single attention pooling-based convolutional neural networks (CNN) for recording device classification using neutral and whispered speech. Experiments using five recording devices with simultaneous direct recordings from 88 speakers speaking both in neutral and whisper and recordings from 21 mobile devices with simultaneous playback recordings reveal that the proposed dual attention pooling based CNN method performs better than the best baseline scheme. We show that we achieve a better performance in recording device classification with whispered speech recordings than corresponding neutral speech. We also demonstrate the importance of voiced/unvoiced speech and different frequency bands in classifying the recording devices.
