2006
DOI: 10.1109/tasl.2006.883253

Single-Ended Speech Quality Measurement Using Machine Learning Methods

Abstract: We describe a novel single-ended algorithm constructed from models of speech signals, including clean and degraded speech, and speech corrupted by multiplicative noise and temporal discontinuities. Machine learning methods are used to design the models, including Gaussian mixture models, support vector machines, and random forest classifiers. Estimates of the subjective mean opinion score (MOS) generated by the models are combined using hard or soft decisions generated by a classifier which has learne…

Cited by 87 publications (56 citation statements)
References 24 publications
“…The auditory spectrum is approximated by an all-pole autoregressive model, whose coefficients are transformed to th-order PLP cepstral coefficients . The zeroth cepstral coefficient is employed as an energy measure [49], and is chosen from previous experiments [50]. When describing the PLP vector for a given frame , the notation and will be used.…”
Section: A Preprocessing VAD and Feature Extraction (mentioning)
confidence: 99%
“…On the other hand, approximately 10% of "normal" abrupt starts, such as those experienced with certain plosive consonants (e.g., /d/), are misclassified as clippings. To improve classification performance, more complex machine learning methods can be used [50]. Since abrupt starts have, intuitively, less significant impact on perceived speech quality [30], [56], such classification errors are shown not to be detrimental to overall speech quality measurement.…”
Section: Temporal Discontinuity Detection (mentioning)
confidence: 99%
“…The zeroth cepstral coefficient is used as a log-energy term. We also experiment with delta and double-delta coefficients [11] as measures of signal spectral dynamics.…”
Section: Pre-processing VAD and Feature Extraction (mentioning)
confidence: 99%
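
The delta and double-delta coefficients mentioned in the statement above are commonly computed with a regression over neighbouring frames. The following is a minimal sketch of that textbook formula, not the cited authors' implementation: the function name `delta`, the window half-width `width=2`, and the edge-padding choice are all assumptions made for illustration.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta coefficients.

    features: array of shape (frames, coefficients), e.g. cepstral vectors.
    width:    half-width of the regression window (assumed value, not from the source).
    """
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has `width` neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

# Double-delta coefficients are the delta of the delta coefficients.
cepstra = np.random.default_rng(0).normal(size=(20, 6))  # placeholder feature matrix
d1 = delta(cepstra)
d2 = delta(d1)
```

Applying `delta` twice gives the double-delta term. Away from the padded edges, a feature track that rises by one unit per frame yields a delta of exactly one.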
“…Different features extracted from speech have been detected to be useful for speech quality assessment. Spectral dynamics, spectral flatness, spectral centroid, spectral variance, fundamental frequency or pitch (F0) excitation variance and perceptual linear prediction (PLP) coefficients were used for quality prediction in [5], [6]. In [7] and [8], the quality assessment problem is posed as a regression problem and the mapping between acoustic features and the subjective score was found using Mel Frequency Cepstral Coefficients (MFCCs) and filterbank energies, respectively.…”
Section: Introduction (mentioning)
confidence: 99%
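
Two of the features named in the statement above, spectral centroid and spectral flatness, have standard per-frame definitions that can be sketched briefly. The code below uses those textbook definitions and is not taken from the cited works: the function name, the Hann window, and the small `1e-12` floor are assumptions for illustration.

```python
import numpy as np

def spectral_centroid_and_flatness(frame, sample_rate):
    """Return (centroid in Hz, flatness in [0, 1]) for one speech frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))
    # Power spectrum; the small floor avoids log(0) in the flatness term.
    power = np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Centroid: power-weighted mean frequency.
    centroid = np.sum(freqs * power) / np.sum(power)
    # Flatness: geometric mean over arithmetic mean of the power spectrum.
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    return centroid, flatness

sample_rate = 8000
t = np.arange(256) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)                 # narrowband test signal
noise = np.random.default_rng(0).normal(size=256)     # broadband test signal
tone_centroid, tone_flatness = spectral_centroid_and_flatness(tone, sample_rate)
noise_centroid, noise_flatness = spectral_centroid_and_flatness(noise, sample_rate)
```

A pure tone concentrates its power near one frequency, so its flatness is close to zero and its centroid sits near the tone frequency. Broadband noise gives a much higher flatness.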