The purpose of this work is to contribute to the understanding and improvement of deep neural networks in the field of vocal quality. We present a neural network that predicts the perceptual assessment of the overall severity of dysphonia on the GRBAS scale. The design focuses on amplitude perturbations, frequency perturbations, and noise. Results are compared with the performance of human raters on the same data. Both the precision and the mean absolute error of the neural network are close to human intra-rater performance and exceed inter-rater performance.
Auditory-perceptual judgment is a central part of the routine clinical assessment of patients with voice disorders, where it is used to document voice quality [1]. Assessing voice quality is important for detecting both vocal pathologies and other diseases that, although they do not originate in the vocal cords, show signs of impaired vocal quality, for example Parkinson's disease [2,3]. This assessment is also important for monitoring treatment.
GRBAS
In order to standardize the evaluation and relate the auditory and physiological aspects of vocal production, methods and scales for auditory-perceptual evaluation have been proposed, such as GRBAS and CAPE-V [1].
The GRBAS scale is an auditory-perceptual voice assessment method. It is based on studies begun in 1966 by the Japan Society of Logopedics and Phoniatrics [4] and was later popularized by Minoru Hirano in 1981 [5]. The scale has been adopted globally and validated in many countries [6,7,8,9]. It is currently used by both clinicians and researchers.
GRBAS comprises five dimensions (which form the acronym GRBAS) for the assessment of the glottal source: Grade (overall severity of dysphonia), Roughness, Breathiness, Asthenia, and Strain.
Each dimension is rated on an integer four-point scale, from "0" (no dysphonia) to "3" (severe dysphonia).
Auditory-perceptual methods have two major weaknesses: the subjectivity of voice assessment and the need for experienced listeners [10,11].
Variability in quality assessment
Even with scales such as GRBAS, the assessment of voice quality varies considerably between health professionals (inter-rater variability). Variability is also found between different assessment sessions carried out by the same rater (intra-rater variability) [12].
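As an illustration of how disagreement between two sets of ratings on the 0 to 3 scale might be quantified, the following minimal sketch computes the mean absolute error and the fraction of exactly matching ratings. The function names and the example ratings are hypothetical and are not taken from this work; in particular, the exact-agreement fraction is only an assumed stand-in and may differ from the precision measure reported here.

```python
from typing import Sequence


def _check_ratings(a: Sequence[int], b: Sequence[int]) -> None:
    """Validate two rating lists: same non-zero length, values in 0..3."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    for r in (*a, *b):
        if r not in range(4):
            raise ValueError(f"rating {r!r} is outside the 0-3 scale")


def mean_absolute_error(a: Sequence[int], b: Sequence[int]) -> float:
    """Average absolute difference between paired ratings."""
    _check_ratings(a, b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of voices that received the same rating in both lists."""
    _check_ratings(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical Grade (G) ratings of six voices, for illustration only.
first_ratings = [0, 1, 2, 3, 1, 2]
second_ratings = [0, 2, 2, 3, 0, 2]

mae = mean_absolute_error(first_ratings, second_ratings)
agreement = exact_agreement(first_ratings, second_ratings)
print(f"MAE = {mae:.3f}, exact agreement = {agreement:.3f}")
```

The same two functions apply to either kind of variability: passing ratings from two different raters gives an inter-rater comparison, while passing two sessions by the same rater gives an intra-rater comparison.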