2023
DOI: 10.1371/journal.pone.0283396

Deep audio embeddings for vocalisation clustering

Abstract: The study of non-human animals’ communication systems generally relies on the transcription of vocal sequences using a finite set of discrete units. This set is referred to as a vocal repertoire, which is specific to a species or a sub-group of a species. When conducted by human experts, the formal description of vocal repertoires can be laborious and/or biased. This motivates computerised assistance for this procedure, for which machine learning algorithms represent a good opportunity. Unsupervised clustering…

Cited by 12 publications (5 citation statements)
References 51 publications
“…HDBSCAN has the particularity of allowing soft clustering, where each vocal unit is not assigned to a single cluster, but instead to all clusters with varying probabilities. Both UMAP and HDBSCAN have become state-of-the-art algorithms due to their performance and robustness [ 69 , 70 ].…”
Section: Methods (mentioning)
confidence: 99%
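The UMAP + HDBSCAN pipeline described in this statement can be sketched as follows. This is a minimal illustration, not the cited studies' code: it assumes the umap-learn and hdbscan Python packages, the parameter values are arbitrary, and random embeddings stand in for real vocalisation features.

```python
# Minimal sketch, not the cited studies' code: soft clustering of vocal-unit
# embeddings with UMAP + HDBSCAN. Assumes the umap-learn and hdbscan packages;
# the random `embeddings` array stands in for real vocalisation features.
import numpy as np
import umap
import hdbscan

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 64))   # placeholder: (n_vocal_units, n_features)

# Reduce to a low-dimensional space before density-based clustering.
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.0, random_state=0)
reduced = reducer.fit_transform(embeddings)

# prediction_data=True lets HDBSCAN expose soft cluster memberships afterwards.
clusterer = hdbscan.HDBSCAN(min_cluster_size=10, prediction_data=True)
hard_labels = clusterer.fit_predict(reduced)           # -1 marks noise points

# Soft clustering: one membership score per cluster for every vocal unit,
# instead of a single hard assignment.
soft_memberships = hdbscan.all_points_membership_vectors(clusterer)
print(hard_labels[:5])
print(soft_memberships[:5])
```

Setting min_dist=0.0 is a common choice when the UMAP output feeds a density-based clusterer, as it keeps local neighbourhoods tight; the remaining values are dataset-specific choices, as the later statements note.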
“…Dias et al [ 20 ] also explored autoencoders for feature extraction and data visualization; additionally, they used acoustic indices and spectral features to characterize sites from Costa Rica and Brazil. Best et al [ 21 ] introduced a new method for encoding vocalizations, using an autoencoder network to obtain embeddings from eight datasets across six species, including birds and marine mammals, also employing clustering and dimension reduction techniques such as DBSCAN and UMAP. Akbal et al [ 22 ] collected a new anuran sound dataset and proposed a hand-modeled sound classification system through an improved one-dimensional local binary pattern (1D-LBP) and Tunable Q Wavelet Transform (TQWT), obtaining 99.35% accuracy in classifying 26 anuran species.…”
Section: Related Work (mentioning)
confidence: 99%
“…While selecting the best representation model to extract features and applying the most appropriate clustering method have been obvious factors to consider in bioacoustics research, we highlight the performance variations brought by the large search space of hyperparameter configurations (Best et al., 2023), which have remained obscure in the literature. As these configurations, mostly related to algorithm hyperparameters, are often ambiguous and dataset-specific, grid search is a step that should be considered when applying any algorithm.…”
Section: Discussion (mentioning)
confidence: 99%
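One way to operationalise the grid search recommended above is sketched below. The choice of UMAP and HDBSCAN as the algorithms being configured, the hyperparameter names and ranges, and the silhouette score as the internal selection criterion are all assumptions made for illustration, not settings reported in the cited work.

```python
# Illustrative sketch only: grid search over assumed UMAP/HDBSCAN
# hyperparameters, scored with an internal metric (silhouette score), since the
# best configuration is ambiguous and dataset-specific.
from itertools import product
import numpy as np
from sklearn.metrics import silhouette_score
import umap
import hdbscan

def score_config(embeddings, n_neighbors, min_cluster_size):
    reduced = umap.UMAP(n_neighbors=n_neighbors, min_dist=0.0,
                        random_state=0).fit_transform(embeddings)
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)
    clustered = labels != -1                     # drop points labelled as noise
    if clustered.sum() < 2 or len(set(labels[clustered])) < 2:
        return -1.0                              # degenerate clustering
    return silhouette_score(reduced[clustered], labels[clustered])

embeddings = np.random.default_rng(0).normal(size=(500, 64))   # placeholder data
grid = list(product([5, 15, 30], [5, 10, 25]))   # (n_neighbors, min_cluster_size)
best = max(grid, key=lambda cfg: score_config(embeddings, *cfg))
print("best configuration:", best)
```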
“…As it is not clear which acoustic features best describe and differentiate unknown underwater sounds, we decided to use available published state-of-the-art deep learning algorithms pre-trained and/or tested on bioacoustic data containing (at least partly) underwater sounds. Two options were considered, namely the Animal Vocalization Encoder based on Self-Supervision (AVES; Hagiwara, 2023) and a convolutional autoencoder network (CAE; Best et al., 2023), to obtain acoustic features. Since the autoencoder approach is unsupervised, we trained it on our own data (for training details see Supplementary Table S3).…”
Section: Automatic Feature Extraction (mentioning)
confidence: 99%
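As a rough illustration of the unsupervised autoencoder option mentioned above, a minimal convolutional autoencoder for spectrogram patches might look as follows in PyTorch. The layer sizes, latent dimension, and 128x128 input shape are assumptions made for this sketch; they are not the architecture of the CAE of Best et al. (2023), nor of AVES.

```python
# Minimal sketch (assumed architecture): an unsupervised convolutional
# autoencoder whose bottleneck vector serves as the acoustic feature
# representation of a spectrogram patch.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)                     # embedding used downstream
        return self.decoder(z), z

model = ConvAutoencoder()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
spectrograms = torch.rand(8, 1, 128, 128)       # placeholder batch of spectrogram patches

for _ in range(3):                              # shortened training loop
    reconstruction, _ = model(spectrograms)
    loss = nn.functional.mse_loss(reconstruction, spectrograms)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

features = model.encoder(spectrograms).detach() # embeddings passed to clustering
```

Because the reconstruction objective needs no labels, such a model can be trained directly on the target recordings, which is why the quoted study retrained the autoencoder on its own data.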