This paper describes a digital cepstrum-pitch detector designed to work in real time on a 12-bit binary machine. Spectrum and cepstrum analysis of raw speech data is accomplished through the use of a single complex Fast Fourier Transform (FFT) algorithm. Pitch detection is accomplished by a cepstrum-peak scanning logic designed to locate the position of a peak corresponding to the pitch period. Since other peaks may exist in the data, logic is required to differentiate these peaks from the desired one. Because of the limited word size of the machine, special processing algorithms were developed to perform logging and to avoid the squaring of variables. Various logical operations performed on the spectrum and cepstrum functions in voicing detection and in the extraction of pitch are presented and discussed. A demonstration of the performance of this pitch detector when operated with a 2400-bit/sec digital vocoder will be presented by means of a tape recording. [Work supported under contract.]
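The spectrum-to-cepstrum path and peak scan described in this abstract can be illustrated with a minimal floating-point sketch. It is not the authors' 12-bit implementation: the special logging and square-avoiding algorithms are not reproduced, the peak logic is reduced to taking the largest value in a search range, and the names `fft`, `cepstrum_pitch`, `fmin`, and `fmax`, the Hamming window, and the 60 to 400 Hz search range are all assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch in Hz from the largest cepstral peak (hypothetical helper)."""
    n = len(frame)
    # Hamming window (an assumption; the abstract does not name a window).
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = fft(windowed)
    # Log magnitude spectrum; the small floor avoids log(0).
    log_mag = [math.log(abs(c) + 1e-12) for c in spectrum]
    # log_mag is real and even, so a forward FFT scaled by 1/n
    # equals the inverse transform, which is the real cepstrum.
    cep = [c.real / n for c in fft(log_mag)]
    # Scan only quefrencies that correspond to plausible pitch periods.
    lo = int(fs / fmax)
    hi = int(fs / fmin)
    peak = max(range(lo, hi + 1), key=lambda q: cep[q])
    return fs / peak
```

As a quick check, a 512-sample impulse train with a 64-sample period at an 8 kHz sampling rate should yield an estimate of 125 Hz. A practical detector would also need the voicing decision and the logic for rejecting spurious peaks that the abstract mentions.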
A system for the analysis and synthesis of speech is described. In the analyzer, speech sounds are first classified into nonturbulent and turbulent groups. The first three formants of the former group and the first three moments of the latter group constitute six significant parameters of speech spectra. By measuring the zero-crossing densities and/or envelopes of automatically selected frequency bands, these parameters or their equivalents are extracted. To test the feasibility of using these parameters in a speech compression system, a synthesis procedure is carried out. The synthesized speech and its spectrograms are demonstrated. [The research in this paper has been made possible through support and sponsorship extended by the Electronics Research Directorate of the Air Force Cambridge Research Center, under Contract No. AF 19(604)-1039, Item I.]
The central frequencies of the principal resonance regions, or formants, have been found to contain sufficient information to resynthesize intelligible speech, as far as vowels and other nonturbulent sounds are concerned. When a single formant is isolated by filtering, one-half the average zero-crossing density, ρ0, is shown to be a convenient approximation to the formant's central frequency. Filters which are controlled electronically by a preliminary approximation to the first-formant frequency, in such a way as to isolate the first two formant regions, have been built and tested. LP and HP filters for the first and second formants, respectively, are based on a circuit by Linvill which uses a transistor negative-impedance converter. An alternate filter of the tuned-circuit type has proved more satisfactory for the first formant. The variable elements in the HP and tuned-circuit filters are Increductors, while the LP filter uses a circuit based on the Miller effect to represent an electronically controllable capacitor. [The research in this paper has been made possible through support and sponsorship extended by the Electronics Research Directorate of the Air Force Cambridge Research Center, under Contract No. AF19(604)-1039, Item I.]
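The zero-crossing approximation in this abstract can be sketched in a few lines, assuming the formant has already been isolated by filtering and is available as a list of samples. The function name `half_zero_crossing_density` and the sampled-data setting are assumptions for illustration; the original work used analog circuitry. The idea rests on the fact that a pure tone of frequency f crosses zero 2f times per second.

```python
import math


def half_zero_crossing_density(samples, fs):
    """Return one-half the average zero-crossing density, in Hz.

    `samples` is assumed to hold a single, already-filtered formant band.
    """
    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(samples) - 1) / fs  # seconds spanned by the samples
    return 0.5 * crossings / duration


# Usage: one second of a 500 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 500 * i / fs + 0.3) for i in range(fs + 1)]
estimate = half_zero_crossing_density(tone, fs)
```

For a clean tone the estimate matches the tone frequency closely; for a real formant band it is only the convenient approximation the abstract describes, and its quality depends on how well the band has been isolated.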
It has long been speculated that an information handling rate of 1000 binits/sec or lower might be adequate for transmission of speech, if the redundancy of its original form could first be reduced. The Formoder, a speech-band-compression system, accomplishes the reduction by extracting the main information-bearing elements from speech and presenting them in a seven-channel output which requires a total bandwidth of about 175 cps. The latest articulation tests, using Harvard P.B. word lists processed by the Formoder and given to a trained crew, have yielded scores of 75% and higher for some male speakers. In order to measure the digital channel capacity necessary to transmit the Formoder output intelligibly, the seven channels were digitized using a technique known as “delta modulation.” This scheme achieves time and amplitude quantization by means of a clipper, a sampler, and a linear feedback network. Compared with standard PCM techniques, “delta modulation” represents considerable economy in instrumentation. Articulation tests were made for several rates in the range 500–2500 binits/sec. Results of these tests will be reported and the important features of the digitizing technique will be given. [This research is sponsored by the Air Force Cambridge Research Center under Contract No. AF 19(604)-3465.]
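The clipper, sampler, and feedback arrangement named in this abstract can be illustrated with a minimal one-bit delta modulator. This is a sketch under stated assumptions, not the Formoder's circuit: the linear feedback network is modeled as an ideal integrator with a fixed step, and the names `delta_modulate`, `delta_demodulate`, and `step` are hypothetical.

```python
import math


def delta_modulate(samples, step):
    """Encode samples as one bit each (hypothetical fixed-step encoder)."""
    bits = []
    estimate = 0.0  # output of the feedback integrator (assumed ideal)
    for s in samples:  # the loop plays the role of the sampler
        bit = 1 if s >= estimate else 0  # clipper: keep only the sign
        estimate += step if bit else -step  # feedback tracks the input
        bits.append(bit)
    return bits


def delta_demodulate(bits, step):
    """Rebuild a staircase approximation by integrating the bit stream."""
    out = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


# Usage: a slowly varying control signal, 200 samples per cycle.
signal = [math.sin(2 * math.pi * i / 200) for i in range(400)]
bits = delta_modulate(signal, step=0.1)
rebuilt = delta_demodulate(bits, step=0.1)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Because each sample costs one bit, the bit rate equals the sampling rate, which is why slowly varying channel signals suit this scheme. When the input changes by less than one step per sample, as in this example, the reconstruction error stays within one step; faster inputs would cause slope overload.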