Recent studies have reported the success of linear prediction analysis (LPA)-related features, extracted as short-term spectral features, for replay attack detection; these features exploit the imperfections that recording and playback devices introduce into LPA-based signals. However, existing work on LPA-based signals focuses only on magnitude-based features and ignores phase-based features. In this paper, we propose two novel LPA-based relative phase features, namely, linear prediction residual-based relative phase (LPR-RP) and linear prediction analysis estimated speech-based relative phase (LPAES-RP). The key idea of both LPR-RP and LPAES-RP is to extract phase information from LPA-based signals. For the LPR-RP feature, we modify the relative phase (RP) feature extraction to use the linear prediction residual (LPR), derived as the difference between the original/raw speech signal and the LPA estimated speech (LPAES) signal, instead of the original/raw speech signal. The LPAES-RP feature uses the LPAES signal in place of the original/raw speech signal. Because traces of recording and playback device artifacts are the primary evidence for detecting a replayed signal, the imperfections in the LPR and LPAES signals are expected to provide effective phase information for the replay attack detection task. In addition to using the LPR-RP and LPAES-RP features individually, we combine them at the score level with two standard features, mel-frequency cepstral coefficients (MFCC) and constant Q transform cepstral coefficients (CQCC), and with the original RP feature, to further improve the detection decision. The performance of the proposed LPR-RP/LPAES-RP features and their combinations is evaluated using the ASVspoof 2017 version 2 database. On the evaluation subset, the proposed LPR-RP and LPAES-RP features achieve a promising improvement over the baseline features (MFCC/CQCC). Moreover, the combined system of LPR-RP, RP, and CQCC obtains an equal error rate of 9.26%.
INDEX TERMS Replay attack detection, phase information, speaker recognition anti-spoofing, linear prediction analysis-based feature.
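
As a rough illustration of the two LPA-based signals named in the abstract, the following sketch computes the LPA estimated speech (LPAES) and the linear prediction residual (LPR) for a single frame using the autocorrelation method. The function name `lpa_signals`, the prediction order of 12, and the small regularization term are assumptions made for illustration and are not settings taken from the paper; the relative phase extraction itself is not shown.

```python
import numpy as np


def lpa_signals(frame, order=12):
    """Return (lpaes, lpr) for one speech frame.

    Hypothetical helper: the order and the ridge term are illustrative
    assumptions, not values from the paper.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)

    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])

    # Normal equations R a = r[1:], where R is the Toeplitz
    # autocorrelation matrix built from lags 0..order-1.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    a = np.linalg.solve(R, r[1:])

    # LPAES: each sample is predicted from the previous `order` samples,
    # x_hat[t] = sum_{k=1..order} a[k-1] * x[t-k], with zeros before the frame.
    lpaes = np.convolve(x, np.concatenate(([0.0], a)))[:n]

    # LPR: difference between the original frame and its LPA estimate.
    lpr = x - lpaes
    return lpaes, lpr
```

Under these assumptions, either returned signal could stand in for the raw speech frame as the input to a phase-based feature extractor, which is the substitution the abstract describes for LPR-RP and LPAES-RP.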