Speaker Recognition for Digital Forensic Audio Analysis using Support Vector Machine

Mardhotillah, Rinda; Dirgantoro, Burhanuddin; Setianingsih, Casi

doi:10.1109/isriti51436.2020.9315351

Cited by 5 publications

(2 citation statements)

References 12 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Furthermore, the stacked GRU recurrent network layer learns a speaker's acoustic features. Farhatullah et al [15] successfully recognized a recording of a telephone conversation compared to unexpected sound recordings using an SVM model. Saleem et al [16] introduced a novel FSR methodology that was dependent on extracting language and accent data from short words.…”

Section: Literature Reviewmentioning

confidence: 99%

Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model

2023

View full text Add to dashboard Cite

While speaker verification represents a critically important application of speaker recognition, it is also the most challenging and least well-understood application. Robust feature extraction plays an integral role in enhancing the efficiency of forensic speaker verification. Although the speech signal is a continuous one-dimensional time series, most recent models depend on recurrent neural network (RNN) or convolutional neural network (CNN) models, which are not able to exhaustively represent human speech, thus opening themselves up to speech forgery. As a result, to accurately simulate human speech and to further ensure speaker authenticity, we must establish a reliable technique. This research article presents a Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification (TTFEM-AFSV) model, which aims to overcome the limitations of the previous models. The TTFEM-AFSV model focuses on verifying speakers in forensic applications by exploiting the average median filtering (AMF) technique to discard the noise in speech signals. Subsequently, the MFCC and spectrograms are considered as the inputs to the deep convolutional neural network-based Inception v3 model, and the Ant Lion Optimizer (ALO) algorithm is utilized to fine-tune the hyperparameters related to the Inception v3 model. Finally, a long short-term memory with a recurrent neural network (LSTM-RNN) mechanism is employed as a classifier for automated speaker recognition. The performance validation of the TTFEM-AFSV model was tested in a series of experiments. Comparative study revealed the significantly improved performance of the TTFEM-AFSV model over recent approaches.

show abstract

Section: Literature Reviewmentioning

confidence: 99%

Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model

2023

View full text Add to dashboard Cite

show abstract

“…[20][21][22]. and deep learning based "Convolutional Neural Network" (CNN)[5][6][7][23]. In this article we have used CNN as classification model for speaker recognition.…”

mentioning

confidence: 99%

Improved Speaker Recognition for Degraded Human Voice using Modified-MFCC and LPC with CNN

Moondra¹,

Chahal²

2023

IJACSA

View full text Add to dashboard Cite

Economical speaker recognition solution from degraded human voice signal is still a challenge. This article is covering results of an experiment which targets to improve feature extraction method for effective speaker identification from degraded human audio signal with the help of data science. Every speaker's audio has identical characteristics. Human ears can easily identify these different audio characteristics and classify speaker from speaker's audio. Mel-Frequency Cepstral Coefficient (MFCC) supports to get same intelligence in machine also. MFCC is extensively used for human voice feature extraction. In our experiment we have effectively used MFCC and Linear Predictive Coding (LPC) for better speaker recognition accuracy. MFCC first outlines frames and then finds cepstral coefficient for each frame. MFCC use human audio signal and convert it in numerical value of audio features, which is used to recognize speaker efficiently by Artificial Intelligence (AI) based speaker recognition system. This article covers how effectively audio features can be extracted from degraded human voice signal. In our experiment we have observed improved Equal Error Rate (EER) and True Match Rate (TMR) due to high sampling rate and low frequency range for mel-scale triangular filter. This article also covers pre-emphasis effects on speaker recognition when high background noise comes with audio signal.

show abstract

Tulu Language Text Recognition and Translation

Prathwini,

Rodrigues,

Vijaya

et al. 2024

IEEE Access

View full text Add to dashboard Cite

Language is a primary means of communication, but it is not the only means; knowing a language does, however, assist speed up the process. Many distinct languages are spoken worldwide, and people use them to communicate. This is only one of the many reasons why language is so crucial. Based on the literature survey, it is evident that there is a lack of available translators for the Tulu language. Despite being prevalent predominantly in Karnataka, the Tulu language has not been as widely spoken as other Indian languages until recently, although it gained enough recognition to become the second language in Karnataka. The purpose of our research work aims at translating the English language into the Tulu language. During the evaluation the system was tested on a dataset consisting of handwritten characters during the evaluation process Convolutional Neural Networks used achieved an accuracy rate of 92%. To translate English to the Tulu language, we employed a parallel sentence dataset for the neural approach and a parallel word dataset for the rule-based approach. The rule-based approach resulted in an 89% accuracy rate for word-based analysis and an 81% accuracy rate for sentence-based analysis for the English-to-Tulu language translation. The neural machine translation approach of the Encoder-Decoder model with LSTM is been used to accomplish translation from English to Tulu with a BLEU score of 0.83 and Tulu to English with a BLUE score of 0.65. The model also employed hybrid machine translation to enhance the translation.

show abstract

Speaker Recognition for Digital Forensic Audio Analysis using Support Vector Machine

Cited by 5 publications

References 12 publications

Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model

Two-Tier Feature Extraction with Metaheuristics-Based Automated Forensic Speaker Verification Model

Improved Speaker Recognition for Degraded Human Voice using Modified-MFCC and LPC with CNN

Tulu Language Text Recognition and Translation

Contact Info

Product

Resources

About