Despite the success of deep neural networks (DNNs) in enabling on-device voice assistants, increasing evidence of bias and discrimination in machine learning is raising the urgency of investigating the fairness of these systems. Speaker verification is a form of biometric identification that gives access to voice assistants. Due to a lack of fairness metrics and evaluation frameworks appropriate for testing the fairness of speaker verification components, little is known about how model performance varies across subgroups, and what factors influence performance variation. To tackle this emerging challenge, we design and develop SVEva Fair, an accessible, actionable and model-agnostic framework for evaluating the fairness of speaker verification components. The framework provides evaluation measures and visualisations to interrogate model performance across speaker subgroups and compare fairness between models. We demonstrate SVEva Fair in a case study with end-to-end DNNs trained on the VoxCeleb datasets to reveal potential bias in existing embedded speech recognition systems based on the demographic attributes of speakers. Our evaluation shows that publicly accessible benchmark models are not fair and consistently produce worse predictions for some nationalities, and for female speakers of most nationalities. To pave the way for fair and reliable embedded speaker verification, SVEva Fair has been implemented as an open-source Python library and can be integrated into the embedded ML development pipeline to help developers and researchers troubleshoot unreliable speaker verification performance and select high-impact approaches for mitigating fairness challenges.
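To make the evaluation concrete: a standard way to interrogate speaker verification performance across subgroups is to compute the equal error rate (EER) separately for each demographic group and compare the results. The sketch below is not the SVEva Fair implementation, only a minimal illustration of the idea, assuming trial scores, binary target/impostor labels, and a subgroup attribute (e.g. nationality or sex) per trial; the function names are hypothetical.

```python
import numpy as np

def eer(scores, labels):
    """Equal error rate: operating point where the false accept rate
    (impostors accepted) equals the false reject rate (targets rejected)."""
    order = np.argsort(scores)
    scores, labels = scores[order], labels[order]
    n_target = labels.sum()
    n_impostor = len(labels) - n_target
    # Sweeping the threshold over the sorted scores:
    frr = np.cumsum(labels) / n_target            # targets scored below threshold
    far = 1 - np.cumsum(1 - labels) / n_impostor  # impostors scored above threshold
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

def eer_by_subgroup(scores, labels, groups):
    """Report EER separately per speaker subgroup; large gaps between
    subgroups indicate a fairness problem."""
    return {g: eer(scores[groups == g], labels[groups == g])
            for g in np.unique(groups)}
```

A model that is "fair" under this lens would show similar EERs across subgroups; the case study in the abstract corresponds to observing systematically higher EERs for some nationalities and for female speakers.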
When deploying machine learning (ML) models on embedded and IoT devices, performance encompasses more than an accuracy metric: inference latency, energy consumption, and model fairness are necessary to ensure reliable performance under heterogeneous and resource-constrained operating conditions. To this end, prior research has studied model-centric approaches, such as tuning the hyperparameters of the model during training and later applying model compression techniques to tailor the model to the resource needs of an embedded device. In this paper, we take a data-centric view of embedded ML and study the role that pre-processing parameters in the data pipeline can play in balancing the various performance metrics of an embedded ML system. Through an in-depth case study with audio-based keyword spotting (KWS) models, we show that pre-processing parameter tuning is a remarkable tool that model developers can adopt to trade off between a model's accuracy, fairness, and system efficiency, as well as to make an embedded ML model resilient to unseen deployment conditions.
CCS CONCEPTS: • Hardware → Emerging tools and methodologies; • Computing methodologies → Neural networks; Speech recognition; • Social and professional topics → User characteristics.
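The trade-off the abstract describes can be seen directly in the audio front end: for a typical log-mel KWS pipeline, the frame length, hop length, and number of mel bins determine the size of the feature matrix the model consumes, and hence its compute and memory cost. The sketch below is not from the paper; it is a simple illustration under assumed, commonly used KWS settings (1 s clips at 16 kHz), and the function name is hypothetical.

```python
def feature_shape(duration_s, sr, frame_ms, hop_ms, n_mels):
    """Shape (frames x mel bins) of the log-mel feature matrix a KWS
    model would see, as a function of pre-processing parameters."""
    sig_len = int(duration_s * sr)
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + (sig_len - frame_len) // hop_len
    return n_frames, n_mels

# Finer pre-processing gives higher temporal/spectral resolution;
# coarser settings shrink the input tensor (a proxy for latency/energy).
fine = feature_shape(1.0, 16000, 30, 10, 40)
coarse = feature_shape(1.0, 16000, 40, 20, 20)
```

Here the coarse configuration yields a feature matrix roughly a quarter the size of the fine one, so sweeping these parameters lets a developer trade resolution (and potentially accuracy and subgroup fairness) against system efficiency without touching the model architecture.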
Machine learning systems (MLSys) are emerging in the Internet of Things (IoT) to provision edge intelligence, which is paving our way towards the vision of ubiquitous intelligence. However, despite the maturity of machine learning systems and the IoT, we face severe challenges when integrating MLSys and IoT in practical contexts. For instance, many machine learning systems have been developed for large-scale production (e.g., cloud environments), but IoT introduces additional demands due to heterogeneous and resource-constrained devices and decentralized operating environments. To shed light on this convergence of MLSys and IoT, this paper analyzes the tradeoffs by covering the latest developments (up to 2020) on scaling and distributing ML across cloud, edge, and IoT devices. We position machine learning systems as a component of the IoT, and edge intelligence as a socio-technical system. On the challenges of designing trustworthy edge intelligence, we advocate a holistic design approach that takes multi-stakeholder concerns, design requirements and trade-offs into consideration, and highlight future research opportunities in edge intelligence.
Clustering is frequently used in the energy domain to identify dominant electricity consumption patterns of households, which can be used to construct customer archetypes for long-term energy planning. Selecting a useful set of clusters, however, requires extensive experimentation and domain knowledge. While internal clustering validation measures are well established in the electricity domain, they are of limited use for selecting useful clusters. Based on an application case study in South Africa, we present an approach for formalising implicit expert knowledge as external evaluation measures to create customer archetypes that capture variability in residential electricity consumption behaviour. By combining internal and external validation measures in a structured manner, we were able to evaluate clustering structures based on the utility they present for our application. We validate the selected clusters in a use case where we successfully reconstruct customer archetypes previously developed by experts. Our approach shows promise for transparent and repeatable cluster ranking and selection by data scientists, even if they have limited domain knowledge.
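The combination of internal and external validation described above can be sketched in a few lines. This is not the paper's method: the external measures there formalise expert knowledge about electricity consumption, whereas the toy check below (a minimum cluster-size constraint) merely stands in for such an application-driven requirement alongside a standard internal measure (silhouette score). Function names and the constraint are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_clustering(X, labels, min_frac=0.05):
    """Internal measure (silhouette) plus a simple external usability check:
    reject structures with a cluster smaller than min_frac of the samples,
    standing in for a formalised expert requirement."""
    internal = silhouette_score(X, labels)
    _, counts = np.unique(labels, return_counts=True)
    usable = counts.min() / len(labels) >= min_frac
    return internal, usable

def select_k(X, k_range=range(2, 8), seed=0):
    """Rank candidate cluster counts, keeping only structures that pass
    the external check, then pick the best internal score among them."""
    best_k, best_score = None, -np.inf
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        internal, usable = score_clustering(X, labels)
        if usable and internal > best_score:
            best_k, best_score = k, internal
    return best_k, best_score
```

The structured combination matters: an internal measure alone can favour statistically tidy clusters that are useless as customer archetypes, while the external check filters candidates down to those with practical utility before ranking them.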