We report a series of computer experiments aimed at increasing our understanding of the sufficiency of the short-time amplitude spectrum for speech coding, and at examining how bandpass segments of the speech spectrum might be represented parametrically. For this purpose we use the absolute value of the short-time Fourier transform and the time derivative of the short-time phase, evaluated at frequency intervals chosen according to auditory criteria. We analyze and digitally encode these parameters. We find that a frequency resolution corresponding to contiguous 1/6-octave bands, spanning the range 200 to 3200 Hz, is a perceptually satisfactory design, and permits digital coding of respectable quality at transmission rates in the range 20 to 16 kbits/s. We also find that a combination of subband coding and short-time spectrum coding leads to comparable results and provides added economy in processing.
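To make the band layout concrete, the following minimal sketch computes the edges of contiguous 1/6-octave bands between 200 and 3200 Hz. The function name `sixth_octave_edges` and its parameters are hypothetical, introduced only for illustration; the abstract does not say how the authors placed their band edges, so exact geometric spacing is an assumption.

```python
import math

def sixth_octave_edges(f_lo=200.0, f_hi=3200.0, bands_per_octave=6):
    """Return band edges (Hz) for contiguous fractional-octave bands.

    Assumption: edges are spaced geometrically, each band spanning
    exactly 1/bands_per_octave of an octave.
    """
    n_octaves = math.log2(f_hi / f_lo)            # 3200/200 = 16, i.e. 4 octaves
    n_bands = round(n_octaves * bands_per_octave)
    return [f_lo * 2 ** (k / bands_per_octave) for k in range(n_bands + 1)]

edges = sixth_octave_edges()
n_bands = len(edges) - 1
# Geometric mean of each band's edges, a common choice of center frequency.
centers = [math.sqrt(lo * hi) for lo, hi in zip(edges[:-1], edges[1:])]
```

Under this assumption the 200 to 3200 Hz range covers four octaves, so the analysis would use 24 bands, each with an upper edge about 12 percent above its lower edge.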
In aperture coding, one refrains from encoding waveform samples until the waveform crosses an appropriately wide aperture centered around the last encoded value. If the waveform is slowly varying in some sense, this procedure can be a basis for bit rate reduction. The identification of aperture-crossing samples can be either explicit or implicit, and it is the latter case that this paper mainly addresses. We follow a finite-length, converging-aperture procedure proposed recently for picture waveforms, and show that it can be used for speech coding as well if the aperture width is designed to be syllabically adaptive. We also describe, for Nyquist-sampled speech, desirable designs for aperture shape and aperture length L. The special case of L = 1 corresponds to ternary delta modulation with a constant encoding rate of log2 3 ≈ 1.6 bits/sample. Using longer apertures (e.g., L = 2, 3), we show that it is possible to obtain average encoding rates as low as 1.2 bits/sample without significantly changing output speech quality. With 8- to 12-kHz sampling, the average bit rate would then be 9.6 to 14.4 kb/s. At these transmission rates, adaptive aperture coding, used in conjunction with a simple (first-order) adaptive predictor, can provide communications-quality speech.
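The L = 1 case and the rate arithmetic can be illustrated with a minimal sketch. This is not the paper's coder: the step size is fixed rather than syllabically adaptive, no predictor is used, and the aperture is assumed to be one step wide and centered on the current estimate. The names `ternary_dm_encode` and `ternary_dm_decode` are hypothetical.

```python
import math

def ternary_dm_encode(samples, step=0.5):
    """Ternary delta modulation with a fixed step (an assumption).

    Emits 0 while the input stays inside an aperture of width `step`
    centered on the current estimate, and +1 or -1 when it leaves it.
    """
    estimate = 0.0
    symbols = []
    for x in samples:
        err = x - estimate
        if err > step / 2:
            s = 1
        elif err < -step / 2:
            s = -1
        else:
            s = 0
        estimate += s * step          # decoder tracks the same estimate
        symbols.append(s)
    return symbols

def ternary_dm_decode(symbols, step=0.5):
    """Rebuild the staircase approximation from the ternary symbols."""
    estimate = 0.0
    out = []
    for s in symbols:
        estimate += s * step
        out.append(estimate)
    return out

# A three-level symbol carries log2(3) bits, roughly 1.585, per sample.
bits_per_symbol = math.log2(3)

# Average bit rates at 1.2 bits/sample for 8 kHz and 12 kHz sampling.
rates_bps = [1.2 * fs for fs in (8000, 12000)]
```

With 1.2 bits/sample, the two sampling rates give about 9600 and 14400 bits/s, which matches the 9.6 to 14.4 kb/s range quoted above.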
This paper shows the utility of using adaptive quantizers in the tree-encoding of speech waveforms based on the (M, L) algorithm [1]. The resulting adaptive differential PCM (ADPCM) and adaptive delta modulation (ADM) encoders, with time-invariant prediction networks, can provide useful speech outputs at bit rates on the order of 24 kbits/s; at 16 kbits/s, on the other hand, the encoders exhibit clearly perceptible amounts of quantization noise.