ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414305
Federated Acoustic Modeling for Automatic Speech Recognition

Abstract: Data privacy and protection are crucial issues for any automatic speech recognition (ASR) service provider when dealing with clients. In this paper, we investigate federated acoustic modeling using data from multiple clients. Each client's data is stored on a local data server, and the clients communicate only model parameters with a central server, not their data. The communication happens infrequently to reduce the communication cost. To mitigate the non-IID issue, client adaptive federated training (CAFT) i…

Cited by 34 publications (16 citation statements); references 13 publications.
“…FL-based adaptation for ASR models faces several unique challenges, including the lack of ground-truth transcriptions, high compute and cross-device network communication costs, the non-independent and identically distributed nature of the data (non-IIDness), and the difficulty of providing privacy guarantees. Several recent works have considered cross-device FL for ASR applications [14,15,16,17,18,19]. In particular, the challenge of training on non-IID data has been addressed using weighted model averaging [14,15] and federated variational noise [17].…”
Section: Introduction (mentioning; confidence: 99%)
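The weighted model averaging mentioned in the statement above can be sketched as a FedAvg-style update, where each client's parameters are weighted by its share of the total training data. The function name `fedavg` and the toy tensors below are illustrative assumptions, not taken from the cited works:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted model averaging: each client's parameter list is weighted
    by its fraction of the total number of training samples. Weighting by
    data size is one common way to cope with non-IID client distributions."""
    total = float(sum(client_sizes))
    avg = [np.zeros_like(p) for p in client_weights[0]]
    for weights, n in zip(client_weights, client_sizes):
        for i, p in enumerate(weights):
            avg[i] += (n / total) * p
    return avg

# Hypothetical example: three clients, one 2x2 parameter tensor each.
clients = [[np.full((2, 2), 1.0)], [np.full((2, 2), 2.0)], [np.full((2, 2), 4.0)]]
sizes = [100, 100, 200]
print(fedavg(clients, sizes)[0])  # each entry = (100*1 + 100*2 + 200*4) / 400 = 2.75
```

In a federated round, the central server would apply this averaging to the parameters uploaded by the clients and broadcast the result back, with no raw audio leaving the client side.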
“…This paper was submitted to Interspeech 2022. A common assumption made by many existing works on cross-device FL for ASR is the availability of ground-truth transcriptions [14,15,16,17,18,19]. In reality, however, users of on-device ASR applications have neither the incentive to transcribe their audio, nor the desire to frequently edit inaccurate automatic transcriptions.…”
Section: Introduction (mentioning; confidence: 99%)
“…Various federated ASR methods have been proposed to train ASR models in FL systems [11,12,13,14]. Specific challenges arising from data heterogeneity (speech characteristics, amount of data, acoustic environments, etc.)…”
Section: Introduction (mentioning; confidence: 99%)
“…Specific challenges arising from data heterogeneity (speech characteristics, amount of data, acoustic environments, etc.) are addressed via client-dependent data transformations [14] and by imposing upper limits on the number of client samples [13]. Improvements to the distributed optimization of models, such as Word Error Rate (WER) based aggregation [12] and a hierarchical gradient weighting scheme [11], have also been proposed.…”
Section: Introduction (mentioning; confidence: 99%)
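One plausible way to realize the WER-based aggregation idea above is to turn per-client word error rates into normalized aggregation weights, so that clients whose local models transcribe better contribute more to the average. The inverse-WER scheme and the name `wer_weights` below are assumptions for illustration; the cited work [12] defines its own weighting formula:

```python
def wer_weights(client_wers, eps=1e-8):
    """Map per-client word error rates (in [0, 1]) to aggregation weights.
    Lower WER -> larger weight; weights are normalized to sum to 1.
    eps guards against division by zero for a perfect (WER = 0) client."""
    inverse = [1.0 / (w + eps) for w in client_wers]
    total = sum(inverse)
    return [x / total for x in inverse]

# Hypothetical example: two clients with 10% and 20% WER on held-out speech.
weights = wer_weights([0.10, 0.20])
print(weights)  # the 10%-WER client receives twice the weight of the 20% one
```

These weights could then replace the data-size fractions in a FedAvg-style update, trading representativeness of large clients for emphasis on clients whose models generalize well.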
“…For example, in [14], federated learning is applied with an end-to-end (E2E) model on the French set of the Common Voice dataset [15]. In [16], an RNN-T ASR architecture [17] is used on the LibriSpeech corpus, and [18] applies FL to a hybrid ASR model. Additionally, in [19] Dimitriadis et al.…”
Section: Introduction (mentioning; confidence: 99%)