Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet demonstrated superior results in this area. This system optimizes softmax loss, which is integrated in the classification layer that is responsible for making predictions. Since the nature of this loss is only to increase interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome the aforementioned issues, this study proposes a family of models that improve upon the state-of-the-art SincNet model. Proposed models AF-SincNet, Ensemble-SincNet, and ALL-SincNet serve as a potential successor to the successful SincNet model. The proposed models are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, with their own unique challenges. Performance improvements are demonstrated compared to competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and current prior models, but also generalized well on unseen and diverse tasks such as Bengali speaker recognition.
In this study, we present a Bangla multi-domain sentiment analysis dataset, named as SentiGOLD, developed using 70,000 samples, which was compiled from a variety of sources and annotated by a gender-balanced team of linguists. This dataset was created in accordance with a standard set of linguistic conventions that were established after multiple meetings between the Government of Bangladesh and a nationally recognized Bangla linguistics committee. Although there are standard sentiment analysis datasets available for English and other rich languages, there are not any such datasets in Bangla, especially because, there was no standard linguistics framework agreed upon by national stakeholders. Senti-GOLD derives its raw data from online video comments, social media posts and comments, blog posts and comments, news and numerous other sources. Throughout the development of this dataset, domain distribution and class distribution were rigorously maintained. SentiGOLD was created using data from a total of 30 domains (e.g. politics, entertainment, sports, etc.) and was labeled using 5 classes (e.g. strongly negative, weakly negative, neutral, weakly positive, and strongly positive). In order to maintain annotation quality, the national linguistics committee approved an annotation scheme to ensure a rigorous Inter Annotator Agreement (IAA) in a multi-annotator annotation scenario. This procedure yielded an IAA score of 0.88 using Fleiss' kappa method, which is elaborated upon in the paper. A protocol for intra-and crossdataset evaluation was utilized in our efforts to develop a classification system as a standard. The cross-dataset evaluation was performed on the SentNoB dataset, which contains noisy Bangla text samples, thereby establishing a demanding test scenario. We Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only. KDD
Deep learning models have become an increasingly preferred option for biometric recognition systems, such as speaker recognition. SincNet, a deep neural network architecture, gained popularity in speaker recognition tasks due to its parameterized sinc functions that allow it to work directly on the speech signal. The original SincNet architecture uses the softmax loss, which may not be the most suitable choice for recognition-based tasks. Such loss functions do not impose inter-class margins nor differentiate between easy and hard training samples. Curriculum learning, particularly those leveraging angular margin-based losses, has proven very successful in other biometric applications such as face recognition. The advantage of such a curriculum learning-based techniques is that it will impose inter-class margins as well as taking to account easy and hard samples. In this paper, we propose Curricular SincNet (CL-SincNet), an improved SincNet model where we use a curricular loss function to train the SincNet architecture. The proposed model is evaluated on multiple datasets using intra-dataset and inter-dataset evaluation protocols. In both settings, the model performs competitively with other previously published work. In the case of inter-dataset testing, it achieves the best overall results with a reduction of 4% error rate compare to SincNet and other published work.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.