An approach to diarize taniavartanam segments of a Carnatic music concert is proposed in this paper. Information bottleneck (IB) based approach used for speaker diarization is applied for this task. IB system initializes the segments to be clustered uniformly with fixed duration. The issue with diarization of percussion instruments in taniavartanam is that the stroke rate varies highly across the segments. It can double or even quadruple within a short duration, thus leading to variable information rate in different segments. To address this issue, the IB system is modified to use the stroke rate information to divide the audio into segments of varying durations. These varying duration segments are then clustered using the IB approach which is then followed by Kullback-Leibler hidden Markov model (KL-HMM) based realignment of the instrument boundaries. Performance of the conventional IB system and the proposed system is evaluated on standard Carnatic music dataset. The proposed technique shows a best case absolute improvement of 8.2% over the conventional IB based system in terms of diarization error rate.
For challenging machine learning problems such as zero-shot learning and fine-grained categorization, embedding learning is the machinery of choice because of its ability to learn generic notions of similarity, as opposed to class-specific concepts in standard classification models. Embedding learning aims at learning discriminative representations of data such that similar examples are pulled closer, while pushing away dissimilar ones. Despite their exemplary performances, supervised embedding learning approaches require huge number of annotations for training. This restricts their applicability for large datasets in new applications where obtaining labels require extensive manual efforts and domain knowledge. In this paper, we propose to learn an embedding in a completely unsupervised manner without using any class labels. Using a graph-based clustering approach to obtain pseudo-labels, we form triplet-based constraints following a metric learning paradigm. Our novel embedding learning approach uses a probabilistic notion, that intuitively minimizes the chances of each triplet violating a geometric constraint. Due to nature of the search space, we learn the parameters of our approach using Riemannian geometry. Our proposed approach performs competitive to state-of-the-art approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.