<p><strong>The natural world is full of sounds produced by plants, animals and even landscape elements. Animals can be incredibly creative in their sound production, and the study of their vocalisations is an essential part of behavioural and conservation ecology. Counting animal vocalisations is often used as a survey method to monitor population abundance, especially for cryptic or nocturnal species. Call counts, once performed only by human listeners, have evolved in recent decades: Automatic Recording Units (ARUs) are now often used for this purpose, and terabytes of data are collected during acoustic surveys, whose analysis requires long hours of tedious work.</strong></p><p>As a consequence, in recent years there has been growing interest in using signal processing and artificial intelligence methods to speed up and facilitate this process.</p><p>Studying sounds, including animal sounds, entails studying how their frequency and intensity change over time, information not easily read from the data type collected by ARUs: the waveform. A waveform only describes the change of amplitude with respect to time, so other analysis tools are needed to recover the instantaneous changes in frequency and intensity.</p><p>In 1946, Koenig, Dunn and Lacy introduced a 3D representation of sound in which time, frequency and intensity can be read simultaneously: the spectrogram. The spectrogram then became one of the main tools of the broad field of signal processing and, more specifically, of the study of animal and natural sounds: Bioacoustics. Because the spectrogram combines the representation of a signal in both the time and frequency domains, it is called a Time-Frequency Representation (TFR) of sound.</p><p>In the seven decades since, further TFRs have been introduced to improve on the spectrogram, whose performance is constrained by its limited time-frequency resolution. 
Every TFR is defined by a set of parameters that influence its ability to depict important sound features effectively. Choosing the best TFR and the best parameters is therefore a challenging task, highly dependent on the characteristics of the sound under study. In this thesis, we explore the differences between the main TFRs in the literature, and we propose methods to test their performance on real-world bioacoustics problems involving echolocation data from Aotearoa/New Zealand bats and call data from the North Island brown kiwi. We also demonstrate that the choice of a TFR and its parameters is crucial for obtaining optimal results from data analysis, and we discuss the importance of TFRs in Bioacoustics.</p>
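The trade-off the abstract describes, that a spectrogram's window length sets how finely frequency can be resolved at the cost of blurring timing, can be sketched in a few lines. The following is a minimal, illustrative short-time Fourier transform spectrogram (function and parameter names are ours, not from the thesis); the frequency resolution is fs / n_win per bin, so longer windows sharpen pitch but smear onsets.

```python
import numpy as np

def spectrogram(x, fs, n_win=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform.

    n_win controls the time-frequency trade-off: each frequency bin is
    fs / n_win Hz wide, while each frame spans n_win / fs seconds.
    """
    win = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop : i * hop + n_win] * win
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T     # (freq bins, time frames)
    freqs = np.fft.rfftfreq(n_win, d=1 / fs)
    times = (np.arange(n_frames) * hop + n_win / 2) / fs
    return freqs, times, spec

# A 1 kHz test tone sampled at 16 kHz: the spectral peak should sit at 1000 Hz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
freqs, times, spec = spectrogram(x, fs)
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

With n_win=256 at 16 kHz the bins are 62.5 Hz wide; halving n_win doubles the bin width but halves the frame duration, which is exactly the resolution limit the more recent TFRs try to work around.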
Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5–7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This “harmonic attraction” can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.
The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). “Comparing measurement errors for formants in synthetic and natural vowels,” J. Acoust. Soc. Am. 139(2), 713–727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
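The core of the reassigned spectrogram, the step that lets estimates snap to the true instantaneous frequency rather than the nearest bin (or nearest strong harmonic), can be sketched with two STFTs: one with the analysis window and one with its derivative. This is an illustrative numpy sketch of the frequency-reassignment half of the Auger-Flandrin method only, not the authors' implementation; all names are ours.

```python
import numpy as np

def reassigned_frequencies(x, fs, n_win=256, hop=128):
    """Frequency reassignment: per-bin instantaneous-frequency estimates.

    For a locally sinusoidal signal, Im(X_dh / X_h) is proportional to the
    offset between the bin centre and the component's true frequency, so
    subtracting that offset "reassigns" energy off the coarse bin grid.
    """
    h = np.hanning(n_win)
    dh = np.gradient(h)                       # window derivative, per sample
    n_frames = 1 + (len(x) - n_win) // hop
    frames = np.stack([x[i * hop : i * hop + n_win] for i in range(n_frames)])
    X_h = np.fft.rfft(frames * h, axis=1)     # ordinary STFT
    X_dh = np.fft.rfft(frames * dh, axis=1)   # STFT with derivative window
    bin_hz = np.fft.rfftfreq(n_win, d=1 / fs)
    eps = 1e-12                               # guard against empty bins
    corr = -fs * np.imag(X_dh / (X_h + eps)) / (2 * np.pi)
    return bin_hz[None, :] + corr, np.abs(X_h)

# A tone at 1030 Hz falls between the 62.5 Hz-wide bins (1000 and 1062.5 Hz),
# yet the reassigned estimate at the peak bin lands close to 1030 Hz.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1030 * t)
f_hat, mag = reassigned_frequencies(x, fs)
peak_bin = mag[0].argmax()
estimate = f_hat[0, peak_bin]
```

The bias the abstract discusses arises because, for voiced speech, the "locally sinusoidal" component a bin locks onto is a harmonic of F0 rather than the resonance itself; reassignment sharpens the estimate but does not by itself decide which ridge is the formant, which is why automating resonance detection remains the open problem.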