A scalable large vocabulary, speaker independent speech recognition system is being developed using Hidden Markov Models (HMMs) for acoustic modeling and a Weighted Finite State Transducer (WFST) to compile sentence, word, and phoneme models. The system comprises a software backend search and an FPGA-based Gaussian calculation which are covered here. In this paper, we present an efficient pipelined design implemented both as an embedded peripheral and as a scalable, parallel hardware accelerator. Both architectures have been implemented on an Alpha Data XRC-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 9.03 ms which coupled with a backend search of 5000 words has provided an accuracy of over 80%. Parallel implementations have been designed with up to 32 cores and have been successfully implemented with a clock frequency of 133 MHz.
An FPGA-based custom core which computes the Gaussian calculation portion of a Hidden Markov Model (HMM) based speech recognition system, is presented. The work is part of the development of a custom embedded system which will provide speaker independend, large vocabulary continuos speech recognition and is currently presented as a hardware/software codesign. By de-coupling the Gaussian calculation from the backend search, calculation of Gaussian results is performed with minimal communication between backend search software and an FPGA based Gaussian core. Several implementations have been investigated in order to minimize memory bandwidth and FPGA resource requirements and are presented. The system has been implemented using an Alpha Data XCR-5T1, reconfigurable computer housing a Virtex 5 SX95T FPGA and has achieved better than real-time performance at 133MHz. The core has been tested and is capable of calculating a full set of Gaussian results from 3825 acoustic models in 5.3ms which coupled with a backend search of 5000 words has provided over 80% accuracy.
The implementation of a complex, large vocabulary, speech recognition application on a modern graphic processors (GPUs) is presented. The parallel single instruction, multiple data (SIMD) architecture is effectively exploited by performing various optimizations to expose the algorithmic parallelism. The work addresses particularly the realization of the Gaussian calculation, a key function. The result is an implementation that runs 3.75 faster than real-time and gives a tenfold speedup when compared to a highly optimized sequential CPU-based implementation. The work is also compared with some earlier work involved in building the same system on a Virtex 5-based, Alpha Data XRC-5T1 reconfigurable computer.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.