In this paper, we investigate a new paradigm of objective speech quality estimation. The proposed non-intrusive method uses only the processed speech signal, whereas conventional objective models require both the processed speech and the source speech that was applied as input to the system under test. The proposed method is based on the temporal envelope representation of speech, which reflects perceptual characteristics of the human auditory system and the human speech production system, and which we found to provide a useful cue for non-intrusive objective speech quality estimation. The performance of the proposed method is demonstrated on four different subjective mean opinion score (MOS) databases.
Index Terms: Modulation spectrum, non-intrusive estimation, objective model, perception, speech quality, temporal envelope.
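As an illustration of the temporal envelope and modulation spectrum named in the abstract above, the following is a minimal sketch, not the authors' method. It assumes the envelope is taken as the magnitude of the analytic signal (an FFT-based Hilbert transform) and the modulation spectrum as the magnitude spectrum of the mean-removed envelope. The function names and the synthetic test signal are hypothetical, and the abstract does not say how the paper computes either quantity.

```python
import numpy as np

def temporal_envelope(x):
    """Envelope as the magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Weights that keep DC (and Nyquist), double positive frequencies,
    # and zero out negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

def modulation_spectrum(envelope, fs):
    """Magnitude spectrum of the mean-removed envelope and its frequency axis in Hz."""
    env = envelope - np.mean(envelope)
    mags = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, mags

# Synthetic stand-in for speech (assumption): a 1 kHz carrier
# amplitude-modulated at 4 Hz, sampled at 8 kHz for one second.
fs = 8000
t = np.arange(fs) / fs
modulator = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)
x = modulator * np.cos(2 * np.pi * 1000 * t)

env = temporal_envelope(x)
freqs, mags = modulation_spectrum(env, fs)
peak_hz = freqs[np.argmax(mags)]  # dominant modulation frequency
```

For this synthetic signal the recovered envelope follows the 4 Hz modulator, so the modulation spectrum peaks at the modulation rate rather than at the carrier frequency. A real system would typically apply this per auditory band rather than to the full-band signal.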
A computational model to predict the MOS of processed speech is proposed. The system measures the distortion of the processed speech relative to the source speech, using a peripheral model of the mammalian auditory system and a psychophysically inspired measure, and maps the distortion value onto the MOS scale. This paper describes our attempt to derive a "universal", database-independent distortion-to-MOS mapping function. Preliminary experimental evaluation shows that the performance of the proposed system is comparable with ITU-T Recommendation P.861 for clean speech sources, and that it outperforms P.861 for speech sources corrupted by either car or babble noise at 30 dB SNR.
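To make the idea of a distortion-to-MOS mapping concrete, here is a minimal sketch under stated assumptions. It uses a monotonically decreasing logistic curve bounded to the 1 to 5 MOS range. The logistic form and the `midpoint` and `slope` parameters are hypothetical choices for illustration; the abstract does not give the form of the mapping function used in the paper.

```python
import math

def distortion_to_mos(d, midpoint=1.0, slope=3.0):
    """Map a distortion value to the 1..5 MOS range.

    Hypothetical decreasing logistic mapping: larger distortion gives lower MOS.
    `midpoint` is the distortion that maps to MOS 3; `slope` sets the steepness.
    """
    # Clamp the exponent so very large distortions cannot overflow math.exp.
    z = min(slope * (d - midpoint), 700.0)
    return 1.0 + 4.0 / (1.0 + math.exp(z))

# Example: predicted MOS falls as distortion grows.
scores = [distortion_to_mos(d) for d in (0.0, 0.5, 1.0, 2.0)]
```

In practice the parameters of such a curve would be fitted to subjective scores, and a database-independent mapping would use one fixed parameter set across all databases.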