FPGA Implementation for GMM-Based Speaker Identification

Ehkan, Phaklen; Allen, Timothy P.; Quigley, Steven F.

doi:10.1155/2011/420369

Cited by 21 publications

(7 citation statements)

References 10 publications

“…Fully considering the arithmetic property, Jo et al [4] proposed an energy-efficient floating-point MFCC extraction architecture based on field-programmable gate array (FPGA) with the improvement of frequency transformation and optimization of bit-width. Some other works [11], [12] about efficient MFCC extraction are also proposed based on FPGA for low-cost speech recognition systems. In addition, efficient parallel implementation of MFCC feature extraction on graphics processing units (GPU) [13] and digital signal processor (DSP) [14] are presented showing faster extraction than CPU implementation.…”

Section: (A). Citation type: mentioning

confidence: 99%

“…It hardly decreases compared to the conventional implementation and proves that the difference is uninfluential to the recognition. Some conventional works [4], [11], [12] compare their recognition accuracy under different algorithms and even different datasets. It is unreasonable because the increase in accuracy may not be caused by feature improvement, but by better algorithms or datasets.…”

Section: A System Function. Citation type: mentioning

confidence: 99%

MSP-MFCC: Energy-Efficient MFCC Feature Extraction Method With Mixed-Signal Processing Architecture for Wearable Speech Recognition Applications

Yang, Lan, et al. 2020

IEEE Access

Feature extraction is an essential part of automatic speech recognition (ASR): it compresses raw speech data and enhances features. Conventional implementations in the digital domain have run into energy consumption and processing speed bottlenecks. Thus, we propose a Mixed-Signal Processing (MSP) architecture to efficiently extract Mel-Frequency Cepstrum Coefficients (MFCC) features. We design MSP-MFCC to pre-process speech signals in the analog domain, which significantly reduces the cost of the analog-to-digital converter (ADC), as well as the computational complexity of the digital backend. Moreover, MSP-MFCC eliminates the time-consuming Fourier transform of the conventional digital realization by improving the processing flow. We fabricated the analog part in 180 nm CMOS mixed-signal technology, then measured the chip. The measured results show that the energy consumption of MSP-MFCC is 0.72 µJ/frame, and the processing speed is up to 45.79 µs/frame. MSP-MFCC achieves a 95% energy saving and about a 6.4× speedup over the state of the art. Further, using the features extracted by MSP-MFCC, speech recognition simulation reaches an accuracy of 98.2%, which remains competitive with its current counterparts. The proposed MFCC extractor is competitive for integration in ultra-low-power always-on wearable speech recognition applications. INDEX TERMS: Mixed signal processing architecture, energy-efficient feature extraction, mel-frequency cepstrum coefficients (MFCC), wearable speech recognition application.

“…Most existing speaker recognition accelerators are based on traditional algorithms, such as GMM or SVM (Table 7). The execution time of the accelerator according to MFCC and GMM proposed by Ehkan in 2011 is 0.8 ms per vector for speaker set of size 20 when the main frequency is 48 MHz [22]. Ramos-Lara proposed an accelerator on the basis of MFCC and SVM.…”

Section: Related Work. Citation type: mentioning

confidence: 99%

A Simplified Speaker Recognition System Based on FPGA Platform

Jiang et al. 2020

IEEE Access

Speaker recognition is a crucial bio-identification technology, which is extensively used in our daily life. With the development of deep learning, convolutional neural networks (CNNs) are applied to speaker recognition tasks given their excellent performance. However, in real life, speaker recognition systems are frequently deployed on end-devices. Therefore, while maintaining recognition accuracy, the speaker recognition model is expected to be as simple as possible. Inspired by the 1-max pooling CNN and the Gaussian mixture model-universal background model (GMM-UBM), this study proposes a one-dimensional convolutional neural network (1D CNN) derived from the original 2D CNN. The proposed model reduces the computational complexity of ResNet20 by 64% and the number of parameters by 53%. Compared with the original ResNet20 models, recognition accuracy is reduced by about one percent on the 15s dataset. Then, on the basis of the 1D CNN, we propose a pyramid layer-folding pipeline structure and implement it on the Xilinx VC709 platform. Thanks to the time-dimension partition, the proposed pyramid pipeline structure can process speech data of various lengths. Moreover, our accelerator is 5.1× faster on the 3s dataset and 6.8× faster on the 15s dataset than the CPU platform. INDEX TERMS: Speaker recognition, 1D convolution neural networks, pyramid pipeline, folding pipeline, FPGA.

“…Gaussian mixture models are used for a wide variety of tasks, including background subtraction [15] and speaker verification [16], [17]. Additionally, several GMM implementations for these domains have been published for GPUs [18], [19] and FPGAs [20], [21], as well as general-purpose GMM implementations for GPUs [22] and FPGAs [23]. Arguably the most similar work to ours is [19], in which the authors accelerated a GMM-based background subtraction algorithm by processing each pixel-wise GMM in a separate thread, in the same way that the KINCv3 GMM kernel devotes a thread to each gene pair.…”

Section: Related Work. Citation type: mentioning

confidence: 99%

GPU Implementation of Pairwise Gaussian Mixture Models for Multi-Modal Gene Co-Expression Networks

et al. 2019

Gene co-expression networks (GCNs) are widely used in bioinformatics research to perform system-level analyses of organisms based on the pairwise correlation between all expressed genes. For large datasets which contain samples from multiple sources, gene pairs can exhibit multiple modes of coexpression which confound typical correlation approaches. A clustering method such as Gaussian Mixture Models (GMMs) may be used to separate the modes of each gene pair in an unsupervised manner, prior to computing the correlation of each mode. However, pairwise clustering significantly increases the computational cost of constructing a GCN, as several clustering models must be evaluated for each gene pair, and the number of gene pairs grows rapidly with the number of genes. In this paper, we present a heterogeneous, high-throughput multi-CPU/GPU software package for multi-modal GCN construction, implemented in version 3 of the Knowledge Independent Network Construction (KINC) software. We determine the optimal values for several execution parameters of the GPU implementation, and we benchmark our CPU and GPU implementations for up to 8 CPUs/GPUs. Our GPU implementation achieves a 167x speedup over the corresponding CPU implementation, as well as a 500x speedup over KINCv1. INDEX TERMS Bioinformatics, Gaussian mixture model, gene co-expression network, gene expression matrix, GPU computing, high-performance computing, high-throughput computing.
