“…The model learns representations from raw speech, and these representations can then be used in the required downstream task. Examples of pre-trained models are wav2vec2 and HuBERT, which have shown good performance in various speech technology tasks, such as ASR, emotion recognition, speaker and language identification, and voice disorder detection [47,48,49,50,51,52,53]. There are, however, no studies on using recent self-supervised pre-trained models, such as wav2vec2 [47] and HuBERT [54], for voice quality classification.…”