Deep neural networks have advanced the field of detection and classification and allowed for effective identification of signals in challenging data sets. Numerous time-critical conservation needs may benefit from these methods. We developed and empirically studied a variety of deep neural networks to detect the vocalizations of endangered North Atlantic right whales (Eubalaena glacialis). We compared the performance of these deep architectures to that of traditional detection algorithms for the primary vocalization produced by this species, the upcall. We show that deep-learning architectures are capable of producing false-positive rates that are orders of magnitude lower than alternative algorithms while substantially increasing the ability to detect calls. We demonstrate that a deep neural network trained with recordings from a single geographic region recorded over a span of days is capable of generalizing well to data from multiple years and across the species' range, and that the low false positives make the output of the algorithm amenable to quality control for verification. The deep neural networks we developed are relatively easy to implement with existing software, and may provide new insights applicable to the conservation of endangered species.
Frequency spectra from deep-ocean near-bottom acoustic measurements obtained contemporaneously with wind, wave, and seismic data are described and used to determine the correlations among these data and to discuss possible causal relationships. Microseism energy appears to originate in four distinct regions relative to the hydrophone: wind waves above the sensors contribute microseism energy observed on the ocean floor; a fraction of this local wave energy propagates as seismic waves laterally, and provides a spatially integrated contribution to microseisms observed both in the ocean and on land; waves in storms generate microseism energy in deep water that travels as seismic waves to the sensor; and waves reflected from shorelines provide opposing waves that add to the microseism energy. Correlations of local wind speed with acoustic and seismic spectral time series suggest that the local Longuet-Higgins mechanism is visible in the acoustic spectrum from about 0.4 Hz to 80 Hz. Wind speed and acoustic levels at the hydrophone are poorly correlated below 0.4 Hz, implying that the microseism energy below 0.4 Hz is not typically generated by local winds. Correlation of ocean floor acoustic energy with seismic spectra from Oahu and with wave spectra near Oahu implies that wave reflections from Hawaiian coasts, wave interactions in the deep ocean near Hawaii, and storms far from Hawaii contribute energy to the seismic and acoustic spectra below 0.4 Hz. Wavefield directionality strongly influences the acoustic spectrum at frequencies below about 2 Hz, above which the acoustic levels imply near-isotropic surface wave directionality.
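The band-by-band comparison of wind speed with spectral levels described above can be illustrated with a minimal sketch. It assumes the analysis reduces to a Pearson correlation between a wind-speed time series and the level time series in each frequency band; the abstract does not state the exact statistic used. The function names, band labels, and numbers are hypothetical and purely synthetic, not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    norm = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    return cov / norm

def correlate_wind_with_bands(wind_speed, band_levels):
    """Correlate wind speed with the level time series of each band.

    band_levels maps a band label (hypothetical) to a list of spectral
    levels sampled at the same times as wind_speed.
    """
    return {band: pearson(wind_speed, levels)
            for band, levels in band_levels.items()}

# Synthetic illustration only: one band that tracks the wind linearly
# and one band whose fluctuations are unrelated to it.
wind = [1.0, 2.0, 3.0, 4.0, 5.0]
levels = {
    "above_0.4_Hz": [2.0 * w + 1.0 for w in wind],
    "below_0.4_Hz": [1.0, 0.0, 0.0, 0.0, 1.0],
}
result = correlate_wind_with_bands(wind, levels)
```

A band whose levels follow the local wind gives a coefficient near 1, while a band driven by other sources gives a coefficient near 0, which is the kind of contrast the abstract reports across 0.4 Hz.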
The question of what is the optimal reverberation time for speech intelligibility in an occupied classroom has been studied recently in two different ways, with contradictory results. Experiments have been performed under various conditions of speech-signal to background-noise level difference and reverberation time, finding an optimal reverberation time of zero. Theoretical predictions of appropriate speech-intelligibility metrics, based on diffuse-field theory, found nonzero optimal reverberation times. These two contradictory results are explained by the different ways in which the two methods account for background noise, both of which are unrealistic. To obtain more realistic and accurate predictions, noise sources inside the classroom are considered. A more realistic treatment of noise is incorporated into diffuse-field theory by considering both speech and noise sources and the effects of reverberation on their steady-state levels. The model shows that the optimal reverberation time is zero when the speech source is closer to the listener than the noise source, and nonzero when the noise source is closer than the speech source. Diffuse-field theory is used to determine optimal reverberation times in unoccupied classrooms given optimal values for the occupied classroom. Resulting times can be as high as several seconds in large classrooms; in some cases, optimal values are unachievable, because the occupants contribute too much absorption.
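The dependence on which source is closer can be illustrated with a minimal sketch of the steady-state part of the argument only. It assumes the textbook diffuse-field expression in which the mean-square pressure from an omnidirectional source is proportional to W(1/(4πr²) + 4/A), with the absorption A obtained from the Sabine relation A = 0.161 V/T. The function names, room volume, distances, and source powers are hypothetical, and the sketch computes only the speech-to-noise level difference, not any speech-intelligibility metric from the paper.

```python
import math

def steady_state_level_db(power_w, distance_m, volume_m3, rt_s):
    """Relative steady-state level (dB, arbitrary reference) in a diffuse field."""
    absorption_m2 = 0.161 * volume_m3 / rt_s        # Sabine relation (metric units)
    direct = 1.0 / (4.0 * math.pi * distance_m ** 2)  # direct-field term
    reverberant = 4.0 / absorption_m2                 # reverberant-field term
    return 10.0 * math.log10(power_w * (direct + reverberant))

def speech_to_noise_db(rt_s, speech_dist_m, noise_dist_m,
                       volume_m3=200.0, speech_power_w=1.0, noise_power_w=1.0):
    """Speech-minus-noise steady-state level difference at the listener."""
    speech = steady_state_level_db(speech_power_w, speech_dist_m, volume_m3, rt_s)
    noise = steady_state_level_db(noise_power_w, noise_dist_m, volume_m3, rt_s)
    return speech - noise

# Speech source closer than the noise source: difference falls as T grows.
near_speech = [speech_to_noise_db(t, 1.0, 5.0) for t in (0.3, 0.6, 1.2)]
# Noise source closer than the speech source: difference rises as T grows.
near_noise = [speech_to_noise_db(t, 5.0, 1.0) for t in (0.3, 0.6, 1.2)]
```

In this simplified picture, added reverberation raises both levels but benefits the more distant source more, so the level difference moves toward 0 dB from whichever side it started on.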
In an earlier paper [Nosal and Frazer, Appl. Acoust. 61, 1187-1201 (2006)], a sperm whale was tracked in three dimensions using direct and surface-reflected time differences (DRTD) of clicks recorded on five bottom-mounted hydrophones, a passive method that is robust to timing errors between hydrophones. This paper refines the DRTD method and combines it with a time of (direct) arrival method to improve the accuracy of the track. The position and origin time of each click having been estimated, pitch and yaw are then obtained by assuming the main axis of the whale is tangent to the track. Roll is then found by applying the bent horn model of sperm whale phonation, in which each click is composed of two pulses, p0 and p1, that exit the whale at different points. With instantaneous pitch, roll, and yaw estimated from time differences, amplitudes are then used to estimate the beam patterns of the p0 and p1 pulses. The resulting beam patterns independently confirm those obtained by Zimmer et al. [J. Acoust. Soc. Am. 117, 1473-1485 (2005); 118, 3337-3345 (2005)] with a very different experimental setup. A method for estimating relative click levels is presented and used to find that click levels decrease toward the end of a click series, prior to the "creak" associated with prey capture.
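Two of the geometric ideas above can be sketched in a few lines. The sketch assumes a constant sound speed and a flat sea surface, so the surface-reflected path is modeled with an image source mirrored above the surface; the paper may well use a more realistic propagation model. It also approximates the track tangent by the chord between two consecutive click positions. The function names and the (x, y, depth) convention, with depth positive downward and positive pitch meaning nose up, are hypothetical choices for illustration.

```python
import math

def drtd_seconds(horizontal_range_m, source_depth_m, receiver_depth_m,
                 sound_speed_m_s=1500.0):
    """Direct-reflected time difference for straight-line paths (assumed isovelocity)."""
    direct = math.hypot(horizontal_range_m, receiver_depth_m - source_depth_m)
    # Image source: mirror the source across the surface at depth 0.
    reflected = math.hypot(horizontal_range_m, receiver_depth_m + source_depth_m)
    return (reflected - direct) / sound_speed_m_s

def pitch_yaw_deg(p0, p1):
    """Pitch and yaw (degrees) of the chord from p0 to p1, points given as (x, y, depth)."""
    dx, dy, d_depth = (b - a for a, b in zip(p0, p1))
    yaw = math.degrees(math.atan2(dy, dx))
    # Depth increases downward, so a decreasing depth is a positive (upward) pitch.
    pitch = math.degrees(math.atan2(-d_depth, math.hypot(dx, dy)))
    return pitch, yaw

# Source directly above a hydrophone: 500 m direct path, 1500 m reflected path.
delay = drtd_seconds(0.0, 500.0, 1000.0)
# Whale moving along +x while ascending at 45 degrees.
pitch, yaw = pitch_yaw_deg((0.0, 0.0, 100.0), (1.0, 0.0, 99.0))
```

Because the time difference is measured on a single hydrophone, it does not depend on clock offsets between hydrophones, which is consistent with the robustness to timing errors noted above.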
This paper explores acoustical (or time-dependent) radiosity, a geometrical-acoustics sound-field prediction method that assumes diffuse surface reflection. The literature of acoustical radiosity is briefly reviewed and the advantages and disadvantages of the method are discussed. A discrete form of the integral equation that results from meshing the enclosure boundaries into patches is presented and used in a discrete-time algorithm. Furthermore, an averaging technique is used to reduce computational requirements. To generalize to nonrectangular rooms, a spherical-triangle method is proposed as a means of evaluating the integrals over solid angles that appear in the discrete form of the integral equation. The evaluation of form factors, which also appear in the numerical solution, is discussed for rectangular and nonrectangular rooms. This algorithm and associated methods are validated by comparison of the steady-state predictions for a spherical enclosure to analytical solutions.
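As an illustration of what a form factor between two patches is, the following minimal sketch integrates the standard diffuse-exchange kernel cos θ₁ cos θ₂ / (π r²) between two directly opposed, parallel square patches using a midpoint rule. This is a generic numerical approach, not the evaluation scheme or the spherical-triangle method of the paper, and the function name and parameters are hypothetical.

```python
import math

def parallel_square_form_factor(side_m=1.0, gap_m=1.0, n=12):
    """Form factor between two aligned, facing squares via midpoint quadrature."""
    h = side_m / n
    centers = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for x1 in centers:
        for y1 in centers:
            for x2 in centers:
                for y2 in centers:
                    r2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + gap_m ** 2
                    # For parallel facing patches both cosines equal gap / r,
                    # so the kernel reduces to gap^2 / (pi * r^4).
                    total += gap_m ** 2 / (math.pi * r2 * r2)
    sub_area = h * h
    # Double area integral divided by the area of the emitting patch.
    return total * sub_area * sub_area / (side_m * side_m)

# Opposite faces of a unit cube.
f_opposite = parallel_square_form_factor()
```

For opposite faces of a cube, the closed-form expression for equal parallel rectangles gives a value of about 0.1998, which the quadrature above approximates closely at this resolution.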