This paper presents a study of automatic detection and recognition of tonal bird sounds in noisy environments. Spectro-temporal regions containing tonal bird vocalisations are detected by exploiting the spectral shape to identify sinusoidal components in the short-time spectrum. The detection method provides a tonal-based feature representation that is employed for automatic bird recognition. The recognition system uses Gaussian mixture models to model 165 different bird syllables, produced by 95 bird species. Standard models, as well as models compensating for the effect of the noise, are employed. Experiments are performed on bird sound recordings corrupted by white noise and real-world environmental noise. The proposed detection method shows high detection accuracy for tonal bird components. The tonal-based features show significant improvements in recognition accuracy over Mel-frequency cepstral coefficients, in both standard and noise-compensated models, and strong robustness to mismatch between the training and testing conditions.
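The Gaussian-mixture recognition step described above can be illustrated with a minimal sketch. It assumes diagonal covariances, frame independence, and a simple maximum-likelihood decision; the function names `gmm_log_likelihood` and `classify`, the model layout, and the toy parameters are hypothetical and are not taken from the paper, which may use different features and noise-compensated models.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            # Log of a univariate Gaussian density, summed over dimensions.
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def classify(frames, models):
    """Return the model name with the highest total log-likelihood.

    `models` maps a name to a (weights, means, variances) tuple (assumed layout).
    Frames are treated as independent, so their log-likelihoods are summed.
    """
    totals = {
        name: sum(gmm_log_likelihood(f, *params) for f in frames)
        for name, params in models.items()
    }
    return max(totals, key=totals.get)


# Toy one-dimensional models for two hypothetical syllable classes.
toy_models = {
    "low": ([1.0], [[1.0]], [[0.5]]),
    "high": ([1.0], [[4.0]], [[0.5]]),
}
predicted = classify([[3.8], [4.1]], toy_models)
```

In this toy case both frames lie close to the mean of the "high" model, so that model receives the larger summed log-likelihood.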
This paper presents an automatic system for detection of bird species in field recordings. A sinusoidal detection algorithm is employed to segment the acoustic scene into isolated spectro-temporal segments. Each segment is represented as a temporal sequence of frequencies of the detected sinusoid, referred to as a frequency track. Each bird species is represented by a set of hidden Markov models (HMMs), each HMM modelling an individual type of bird vocalisation element. These HMMs are obtained in an unsupervised manner. The detection is based on a likelihood ratio of the test utterance against the target bird species model and a non-target background model. We explore cohort selection for the background model, z-norm and t-norm score normalisation techniques, and score compensation to deal with outlier data. Experiments are performed using over 40 hours of audio field recordings from 48 bird species, plus an additional 16 hours of field recordings as impostor trials. Evaluations are performed using detection error trade-off plots. An equal error rate of 5% is achieved when impostor trials are non-target bird species vocalisations, and 1.2% when they are field recordings that do not contain bird vocalisations.
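The scoring chain mentioned above (likelihood ratio, z-norm, t-norm, equal error rate) can be sketched as follows. This uses the textbook definitions of these techniques as commonly applied in speaker verification; whether the paper uses exactly these variants is an assumption, and all function names and the toy scores are hypothetical.

```python
from statistics import mean, pstdev


def log_likelihood_ratio(target_loglik, background_loglik):
    """Detection score: target-model log-likelihood minus background log-likelihood."""
    return target_loglik - background_loglik


def _normalise(raw_score, reference_scores):
    """Shift and scale a score by the mean and standard deviation of reference scores."""
    sigma = pstdev(reference_scores)
    if sigma == 0:
        raise ValueError("reference scores have zero variance")
    return (raw_score - mean(reference_scores)) / sigma


def z_norm(raw_score, impostor_scores_for_model):
    """Z-norm: reference scores come from impostor utterances scored against the target model."""
    return _normalise(raw_score, impostor_scores_for_model)


def t_norm(raw_score, cohort_scores_for_utterance):
    """T-norm: reference scores come from the test utterance scored against cohort models."""
    return _normalise(raw_score, cohort_scores_for_utterance)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the operating point where false-accept and false-reject rates are closest."""
    best_gap, best_rate = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        false_accept = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        false_reject = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(false_accept - false_reject)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (false_accept + false_reject) / 2.0
    return best_rate


# Toy scores, purely illustrative.
eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
```

Both normalisations share the same arithmetic and differ only in where the reference scores come from, which is why they delegate to one helper here.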
An automatic system for recognition of multiple bird species in audio recordings is presented. Time-frequency segmentation of the acoustic scene is obtained by employing a sinusoidal detection algorithm, which does not require any estimate of noise and is able to handle multiple simultaneous bird vocalisations. Each segment is characterised as a sequence of frequencies over time, referred to as a frequency track. Each bird species is represented by a hidden Markov model that models the temporal evolution of frequency tracks. The decision on the number and identity of bird species in a given recording is made by maximising the overall likelihood of the set of detected segments, with a penalty applied for increasing the number of bird models used. Experimental evaluations used audio field recordings containing 30 bird species. The presence of multiple bird species is simulated by joining the sets of detected segments from several bird species. Results show that the proposed method achieves recognition performance for multiple bird species close to that obtained for a single bird species, and considerably outperforms majority voting methods.
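The penalised-likelihood decision described above can be illustrated with a small sketch. It assumes that each detected segment is assigned to its best-scoring model within a candidate species set, that the penalty grows linearly with the number of models, and that an exhaustive search over small sets is acceptable; none of these details is confirmed by the abstract, and the function name `select_species`, the `penalty` parameter, and the toy log-likelihoods are hypothetical.

```python
from itertools import combinations


def select_species(segment_logliks, penalty, max_species=3):
    """Pick the species set maximising total segment log-likelihood minus a size penalty.

    `segment_logliks` maps a species name to a list holding the log-likelihood
    of every detected segment under that species' model (assumed layout).
    """
    names = sorted(segment_logliks)
    n_segments = len(segment_logliks[names[0]])
    best_set, best_score = None, None
    for size in range(1, min(max_species, len(names)) + 1):
        for subset in combinations(names, size):
            # Each segment is explained by its best-matching model in the subset.
            total = sum(
                max(segment_logliks[name][i] for name in subset)
                for i in range(n_segments)
            )
            score = total - penalty * size
            if best_score is None or score > best_score:
                best_set, best_score = subset, score
    return best_set, best_score


# Toy log-likelihoods for four segments under three hypothetical species models.
toy = {
    "a": [-1.0, -1.0, -10.0, -10.0],
    "b": [-10.0, -10.0, -1.0, -1.0],
    "c": [-5.0, -5.0, -5.0, -5.0],
}
chosen, score = select_species(toy, penalty=3.0)
```

With a small penalty the two well-matched models are selected together; raising the penalty makes a single compromise model preferable, which is the trade-off the penalisation is meant to control.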