The value or harm associated with an increase in speech coding quality depends on the type of increase as well as its temporal location within an utterance. For example, some increases in speech coding bandwidth can be perceived as impairments. The higher quality associated with the wider bandwidth can offset the impairment, but only if the increase happens early enough in an utterance. We present a subjective speech-quality experiment that quantifies these relationships at the talk-spurt time scale for six different combinations of AMR and SILK speech coders. If a quality increase does not include a bandwidth increase, then, on average, it is beneficial only if it occurs in the first 2.8 seconds of a talk-spurt. If a quality increase includes a bandwidth increase, then it is beneficial only if it occurs in the first 1.8 seconds of a talk-spurt.

Index Terms: AMR, SILK, speech bandwidth, speech coding, speech quality, subjective testing, time-varying speech quality
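The talk-spurt thresholds above suggest a simple rule that a rate-adaptation controller might apply when resources improve mid-call. The Python sketch below is a hypothetical illustration, not a method described in this paper: the names `UpswitchRequest`, `upswitch_expected_beneficial`, and `plan_upswitch` are invented for the example. It assumes the controller knows how far into the current talk-spurt it is and whether the proposed switch widens the audio bandwidth. It encodes only the average 2.8 s and 1.8 s break-even points, so it ignores differences across talkers, content, and the six coder combinations.

```python
from dataclasses import dataclass

# Average break-even points reported in the abstract, in seconds into a talk-spurt.
# Past these points a quality increase was, on average, not beneficial.
MAX_UPSWITCH_TIME_SAME_BANDWIDTH_S = 2.8
MAX_UPSWITCH_TIME_WIDER_BANDWIDTH_S = 1.8


@dataclass(frozen=True)
class UpswitchRequest:
    """Hypothetical description of a proposed speech-coding quality increase."""
    elapsed_in_talk_spurt_s: float  # time since the current talk-spurt began
    increases_bandwidth: bool       # True for, e.g., an NB-to-WB switch


def upswitch_expected_beneficial(req: UpswitchRequest) -> bool:
    """Return True if an increase at this point is, on average, expected to help.

    Assumption: only the average thresholds are used; per-coder and
    per-talker variation is ignored.
    """
    if req.elapsed_in_talk_spurt_s < 0:
        raise ValueError("elapsed time must be non-negative")
    limit = (MAX_UPSWITCH_TIME_WIDER_BANDWIDTH_S if req.increases_bandwidth
             else MAX_UPSWITCH_TIME_SAME_BANDWIDTH_S)
    return req.elapsed_in_talk_spurt_s < limit


def plan_upswitch(req: UpswitchRequest) -> str:
    """Hypothetical policy: switch now if expected to help, else wait for the next talk-spurt."""
    if upswitch_expected_beneficial(req):
        return "switch_now"
    return "defer_to_next_talk_spurt"


if __name__ == "__main__":
    # Two seconds into a talk-spurt: a same-bandwidth increase is still worthwhile,
    # but a bandwidth increase is better deferred.
    print(plan_upswitch(UpswitchRequest(2.0, increases_bandwidth=False)))  # switch_now
    print(plan_upswitch(UpswitchRequest(2.0, increases_bandwidth=True)))   # defer_to_next_talk_spurt
```

Deferring to the next talk-spurt is one plausible reading of the results, since a new talk-spurt resets the elapsed time; a real controller would also need to weigh how long the extra resources are likely to remain available.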
BACKGROUND AND MOTIVATION

Available resources on modern voice networks vary with time. This variation, along with the mobility of many voice network users, results in dynamic resource availability for any given call. Service providers strive to provide graceful degradation of speech quality when network resources become scarce during a call. When additional network resources become available during a call, it may be possible to increase the speech coding rate and deliver higher speech quality.

But the effect of the quality transition itself must be considered. For example, wideband (WB) speech (50 to 7000 Hz nominal passband) has documented higher perceived quality than narrowband (NB) speech (300 to 3400 Hz nominal passband) [1]-[3], yet a transition from NB to WB speech coding is perceived as an impairment [4]-[6]. If the transition happens early enough in a speech recording, the value of the WB portion can exceed the harm of the transition, giving a net improvement in overall speech quality relative to NB alone. This was the case for NB-to-WB transitions at the 15 or 30 second point of a 60 second recording [4], [5]. But if the transition happens later in a recording, the WB portion is shorter and its value does not overcome the harm of the transition. This was the case for NB-to-WB transitions at the 45 second point of a 60 second recording [4], [5] or at the three-second point of a six-second recording [6]. In [6] we also experimented with gradual transitions (up to 2.5 seconds long) but found that they did not mitigate the harm of the transition.

Even quality transitions within a fixed bandwidth can be perceived as impairments.
In [7], [8], short NB recordings with distinct quality levels were concatenated to form longer recordings, and subjective scores were provided for both the short and long recordings. Analysis of these scores shows that when average quality is held constant, increases in quality variation lead to reductions in long-term speech quality. In [9] subjects evaluated three-second NB speech recordings with a low-high-l...