“…There are a handful of attempts in the literature to apply FL to speech-related tasks, including ASR [10,11,12,13,14], keyword spotting [15,16], emotion recognition [17,18,16], and speaker verification [19]. For combining FL with SSL, the only available works are Federated Self-Supervised Learning (FSSL) [20] for acoustic event detection and [21], which highlights the challenges of combining FL and SSL under client hardware limitations, trains a wav2vec 2.0 [4] model with FL on Common Voice Italian data [22], and fine-tunes it for ASR.…”