An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
Vocal attractiveness has a profound influence on listeners-a bias known as the "what sounds beautiful is good" vocal attractiveness stereotype [1]-with tangible impact on a voice owner's success at mating, job applications, and/or elections. The prevailing view holds that attractive voices are those that signal desirable attributes in a potential mate [2-4]-e.g., lower pitch in male voices. However, this account does not explain our preferences in more general social contexts in which voices of both genders are evaluated. Here we show that averaging voices via auditory morphing [5] results in more attractive voices, irrespective of the speaker's or listener's gender. Moreover, we show that this phenomenon is largely explained by two independent by-products of averaging: a smoother voice texture (reduced aperiodicities) and a greater similarity in pitch and timbre with the average of all voices (reduced "distance to mean"). These results provide the first evidence for a phenomenon of vocal attractiveness increases by averaging, analogous to a well-established effect of facial averaging [6, 7]. They highlight prototype-based coding [8] as a central feature of voice perception, emphasizing the similarity in the mechanisms of face and voice perception.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.