2016
DOI: 10.3389/fbioe.2016.00001

Voice Pathology Detection Using Modulation Spectrum-Optimized Metrics

Abstract: Many acoustic parameters are used in pathological assessment tasks and have served as tools for clinicians to distinguish between normophonic and pathological voices. However, many of these parameters require appropriate tuning to maximize their efficiency. In this work, a group of new and previously proposed modulation spectrum (MS) metrics is optimized over different time and frequency ranges, with the aim of maximizing efficiency in the detection of pathological voices. The …

Cited by 25 publications (40 citation statements)
References 31 publications
“…The most coherent features, as ascertained in several trials, are two estimators of perturbation noise and one descriptor of dispersion based on the modulation spectrum; namely, GNE, CHNR and RALA. Interestingly, the latter is a novel characteristic that has been recently introduced in [27,28].…”
Section: Discussion (mentioning)
confidence: 99%
“…For the trials based on sustained phonation, Hamming windows of 40 ms are employed for the Pert and SCs sets to ensure that each frame contains at least one pitch period, whereas windows of 55 ms length are used in the Comp sets as suggested in [23]. Likewise, for experiments in the MSs set, segments of 180 ms are utilised as in [27,28].…”
Section: Ancillary Datasets (mentioning)
confidence: 99%
“…Likewise, windows of 55 ms length are used with the complexity features as suggested in [8]. Finally, for the experiments in the modulation spectrum set, frames of 180 ms are utilized as suggested in [6], [7].…”
Section: B Methodology (mentioning)
confidence: 99%
“…For the purposes of this paper, a representation learning approach based on MS is employed to characterize modulation and acoustic frequencies of input voices [39], following a short-time basis using frames of 180 ms as proposed in [5], [40]. The MS have been successfully used in different works related with the characterization of pathological voices, but because of the large amount of data they contain, it is always necessary to extract some hand tuned statistics [5], [40] or to use feature selection techniques [41]. In the representation learning approach considered in this paper, Convolutional Neural Network (CNN) are used to automatically extract information from MS in the context of voice quality assessment.…”
Section: A Characterization (mentioning)
confidence: 99%
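
Several of the statements above mention analysing voices on a short-time basis with frames of 180 ms before computing the modulation spectrum. The following is a minimal illustrative sketch of that idea, not the cited authors' implementation: it cuts a signal into 180 ms frames and, for each frame, takes a short-time Fourier magnitude followed by a second Fourier transform along time. The function names, the 50% frame overlap, the inner FFT size and hop, and the Hamming window in the inner analysis are all assumptions chosen for illustration.

```python
import numpy as np


def split_into_frames(signal, fs, frame_ms=180.0, overlap=0.5):
    """Cut a 1-D signal into fixed-length frames (one frame per row).

    frame_ms=180 follows the quoted statements; the 50% overlap is an
    assumption made here for illustration only.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    index = starts[:, None] + np.arange(frame_len)[None, :]
    return signal[index]


def modulation_spectrum(frame, n_fft=256, hop=32):
    """Magnitude modulation spectrum of a single frame.

    Step 1: short-time spectra give one spectral envelope per time step.
    Step 2: an FFT along the time axis gives, for every acoustic-frequency
    bin, the strength of each modulation frequency.
    n_fft and hop are hypothetical values, not taken from the cited work.
    """
    frame = np.asarray(frame, dtype=float)
    if len(frame) < n_fft:
        raise ValueError("frame is shorter than the inner FFT size")
    n_steps = 1 + (len(frame) - n_fft) // hop
    index = hop * np.arange(n_steps)[:, None] + np.arange(n_fft)[None, :]
    segments = frame[index] * np.hamming(n_fft)
    envelopes = np.abs(np.fft.rfft(segments, axis=1))  # time x acoustic freq
    return np.abs(np.fft.rfft(envelopes, axis=0))      # modulation x acoustic


# Usage on a synthetic amplitude-modulated tone (purely illustrative data).
fs = 16000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.sin(2 * np.pi * 20 * t)) * np.sin(2 * np.pi * 200 * t)
frames = split_into_frames(x, fs)
ms = modulation_spectrum(frames[0])
print(frames.shape, ms.shape)
```

The resulting matrix is large for every frame, which is consistent with the remark quoted above that summary statistics, feature selection, or a learned representation are typically applied on top of it.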