Perceived speech quality is most directly measured by subjective listening tests. These tests are often slow and expensive, and numerous attempts have been made to supplement them with objective estimators of perceived speech quality. These attempts have found limited success, primarily in analog and higher-rate, error-free digital environments where speech waveforms are preserved or nearly preserved. The objective estimation of the perceived quality of highly compressed digital speech, possibly with bit errors or frame erasures, has remained an open question. We report our findings regarding two essential components of objective estimators of perceived speech quality: perceptual transformations and distance measures. A perceptual transformation modifies a representation of an audio signal in a way that is approximately equivalent to the human hearing process. A distance measure reflects the magnitude of a perceived distance between two perceptually transformed signals. We then describe a new objective estimation approach that uses a simple but effective perceptual transformation and a distance measure that consists of a hierarchy of measuring normalizing blocks. Each measuring normalizing block integrates two perceptually transformed signals over some time or frequency interval to determine the average difference across that interval. This difference is then normalized out of one signal, and is further processed to generate one or more measurements. The resulting new estimators, and several established estimators, are thoroughly evaluated and compared in Part II of this paper. Hierarchical structures of measuring normalizing blocks, or other structures of measuring normalizing blocks, may also address open issues in perceived audio quality estimation, layered speech or audio coding, automatic speech or speaker recognition, audio signal enhancement, and other areas.
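The measuring normalizing block described above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the 1-D sample lists, the halving of intervals at each level, and the choice of the positive and negative parts of the average difference as the "measurements" are all assumptions made for illustration.

```python
def measuring_normalizing_block(reference, distorted, start, stop):
    """Hypothetical single block over samples [start, stop).

    Returns the distorted signal with the average difference removed
    on that interval, plus two illustrative measurements.
    """
    if len(reference) != len(distorted):
        raise ValueError("signals must have equal length")
    if not 0 <= start < stop <= len(reference):
        raise ValueError("invalid interval")
    width = stop - start
    # Integrate both signals over the interval: average difference.
    avg_diff = sum(distorted[i] - reference[i] for i in range(start, stop)) / width
    # Normalize that difference out of one signal (here, the distorted one).
    normalized = list(distorted)
    for i in range(start, stop):
        normalized[i] -= avg_diff
    # Further processing (assumed): positive and negative parts of the difference.
    measurements = (max(avg_diff, 0.0), max(-avg_diff, 0.0))
    return normalized, measurements


def mnb_hierarchy(reference, distorted, levels):
    """Hypothetical hierarchy: apply blocks on successively halved intervals."""
    n = len(reference)
    current = list(distorted)
    measurements = []
    for level in range(levels):
        blocks = 2 ** level
        for b in range(blocks):
            start, stop = b * n // blocks, (b + 1) * n // blocks
            if stop > start:
                current, m = measuring_normalizing_block(reference, current, start, stop)
                measurements.extend(m)
    return current, measurements


reference = [0.0, 0.0, 0.0, 0.0]
distorted = [1.0, 1.0, 3.0, 3.0]
residual, measurements = mnb_hierarchy(reference, distorted, levels=2)
```

In this toy case the first level removes the overall offset and the second level removes the remaining offset in each half, so each level measures only what the coarser level left behind.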
Acknowledgments
The authors would like to acknowledge Michael Frey, from the NIST Information Technology Laboratory, for his willingness to discuss the concept of the measurement system and his assistance in the uncertainty calculations for the measurement system.

Abstract
Access time generally describes the time associated with the establishment of a talk path upon user request to speak and has been identified as a key component of quality of experience (QoE) in communications. NIST's Public Safety Communications Research (PSCR) Division developed a method to measure and quantify the access time of any push-to-talk (PTT) communication system. This measurement method is a follow-on development to the mouth-to-ear (M2E) latency measurement system presented in Ref. [1]. Here, a broad definition of access time is created that is applicable across multiple PTT technologies.
In this paper, a speech intelligibility-based access delay measurement system is introduced. This system measures the Modified Rhyme Test (MRT) intelligibility of a target word based on when PTT was pushed within a predefined message. It relies only on speech going into and coming out of a voice communications system and PTT timing, so it functions as a fair platform to compare different technologies. Example measurements were performed across the following land mobile radio (LMR) technologies: analog direct and conventional modes, and digital Project 25 (P25) direct, trunked Phase 1, and trunked Phase 2 modes.

QoS: quality of service.
RMSE: root mean square error.
SUT: system under test.
TIA: Telecommunications Industry Association.
UE: user equipment.

This publication is available free of charge from: https://doi.org/10.6028/NIST.IR.8275

Symbols
α: Intelligibility scaling factor.
I_0: Asymptotic intelligibility.
λ: Logistic parameter, intelligibility curve steepness.
L_w: Word length.
P_1: First utterance of MRT keyword.
P_2: Second utterance of MRT keyword.
T: Time preceding P_1 and P_2 in audio clips.
t: Word invariant time.
t_0: Logistic parameter, intelligibility curve midpoint.
τ_A: Access delay, function of α.
T_ptt: Time PTT pressed within an audio clip.
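The symbol list suggests a logistic intelligibility curve with asymptote I_0, midpoint t_0, and steepness λ, from which an access delay τ_A is read off at a fraction α of the asymptote. The Python sketch below shows one way such a relationship could look. The exact functional form, the sign convention (intelligibility rising with t), and the function names are assumptions and are not taken from the report.

```python
import math


def intelligibility(t, i0, t0, lam):
    """Assumed logistic curve: approaches i0 as t grows, equals i0 / 2 at t = t0."""
    return i0 / (1.0 + math.exp(-(t - t0) / lam))


def access_delay(alpha, t0, lam):
    """Time at which the assumed curve reaches alpha * i0, for 0 < alpha < 1.

    Solving i0 / (1 + exp(-(t - t0) / lam)) = alpha * i0 for t gives
    t = t0 + lam * ln(alpha / (1 - alpha)).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return t0 + lam * math.log(alpha / (1.0 - alpha))


# Illustrative parameter values only; they are not measurements.
i0, t0, lam = 0.95, 0.40, 0.05
tau = access_delay(0.9, t0, lam)
reached = intelligibility(tau, i0, t0, lam)  # equals 0.9 * i0 up to rounding
```

Under this assumed form, α = 0.5 returns the midpoint t_0 itself, and values of α closer to 1 push τ_A further along the curve by an amount that scales with λ.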
Part I of this paper describes a new approach to the objective estimation of perceived speech quality. This new approach uses a simple but effective perceptual transformation and a distance measure that consists of a hierarchy of measuring normalizing blocks. Each measuring normalizing block integrates two perceptually transformed signals over some time or frequency interval to determine the average difference across that interval. This difference is then normalized out of one signal, and is further processed to generate one or more measurements. In Part II, the resulting estimates of perceived speech quality are correlated with the results of nine subjective listening tests. Together, these tests include 219 speech codecs, transmission systems, and reference conditions, all with 4-kHz bandwidth and with bit rates ranging from 2.4 to 64 kb/s. When compared with six other estimators, significant improvements are seen in many cases, particularly at lower bit rates and when bit errors or frame erasures are present. These hierarchical structures of measuring normalizing blocks, or other structures of measuring normalizing blocks, may also address open issues in perceived audio quality estimation, layered speech or audio coding, automatic speech or speaker recognition, audio signal enhancement, and other areas.
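Correlating objective estimates with subjective test results, as described above, can be sketched with a plain Pearson correlation coefficient. The abstract does not state which correlation statistic the authors used, so this choice is an assumption, and the scores below are made-up placeholders rather than data from the paper.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, left unnormalized since the factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one objective estimate and one subjective score per condition.
objective_estimates = [0.2, 0.4, 0.5, 0.8, 0.9]
subjective_scores = [1.5, 2.4, 2.9, 4.0, 4.3]
rho = pearson_correlation(objective_estimates, subjective_scores)
```

A coefficient near 1 would indicate that the estimator ranks and spaces the conditions much as listeners did within that test.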