2009
DOI: 10.1109/tasl.2008.2011515

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

Abstract: While the accuracy of feature measurements heavily depends on changing environmental conditions, the consequences of this fact for pattern recognition tasks have received relatively little attention to date. In this paper, we explicitly take feature measurement uncertainty into account and show how multimodal classification and learning rules should be adjusted to compensate for its effects. Our approach is particularly fruitful in multimodal fusion scenarios, such as audiovisual speech recognition …
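
The basic effect of compensating for measurement uncertainty can be illustrated on a single Gaussian class model. This is a minimal sketch, assuming a diagonal-covariance Gaussian per class and additive, zero-mean Gaussian measurement noise whose per-feature variance is known. Integrating out the clean feature x turns the class likelihood N(y; μ, Σ) into N(y; μ, Σ + Σ_e), so noisier features produce flatter, less decisive scores. The function names are illustrative and not taken from the paper.

```python
import numpy as np


def gaussian_loglik(y, mean, var):
    """Log-density of a diagonal Gaussian N(y; mean, diag(var))."""
    y, mean, var = map(np.asarray, (y, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var))


def compensated_loglik(y, mean, var, meas_var):
    """Uncertainty-compensated log-likelihood (illustrative assumption).

    Assumes y = x + e with x ~ N(mean, var) and independent measurement
    noise e ~ N(0, meas_var). Marginalizing x gives N(y; mean, var + meas_var).
    """
    return gaussian_loglik(y, mean, np.asarray(var) + np.asarray(meas_var))


# Two unit-variance classes (means 0 and 2) scored on the observation y = 0.5.
margin_plain = gaussian_loglik([0.5], [0.0], [1.0]) - gaussian_loglik([0.5], [2.0], [1.0])
margin_comp = (compensated_loglik([0.5], [0.0], [1.0], [3.0])
               - compensated_loglik([0.5], [2.0], [1.0], [3.0]))
print(margin_plain, margin_comp)
```

With these numbers, the log-likelihood margin in favour of the first class is 1.0 without compensation. It drops to 0.25 once a measurement variance of 3 is added, which shows how an unreliable feature is automatically de-emphasized.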


Cited by 81 publications (32 citation statements). References: 44 publications.

“…For visual features the approach used histogram-based descriptors around twelve lip landmarks determined using an AAM fitting technique and the classification involved multiple kernel learning and SVM. Similar results were reported by Papandreou et al [68] who achieved a best recognition rate of 83% in speaker independent experiments when using AAM visual features obtained from the entire lower face with six shape and six texture coefficients and when using HMM for classification.…”
Section: Comparison With Other Studies (supporting)
confidence: 87%
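
The visual front end described in this statement represents each frame by a few AAM shape and texture coefficients. The following is a simplified sketch. It assumes the landmark shape and the shape-normalized texture of a frame are already available (the AAM fitting step itself is not shown) and that both models are plain PCA bases learned from training data. The function names and dimensions are hypothetical.

```python
import numpy as np


def pca_basis(samples, k):
    """Return the mean and top-k principal directions (as columns) of row-wise samples."""
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    # The right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T


def aam_style_features(shape, texture, shape_model, texture_model):
    """Concatenate shape and texture PCA coefficients into one visual feature vector."""
    s_mean, s_basis = shape_model
    t_mean, t_basis = texture_model
    p = s_basis.T @ (np.asarray(shape, dtype=float) - s_mean)     # shape coefficients
    q = t_basis.T @ (np.asarray(texture, dtype=float) - t_mean)   # texture coefficients
    return np.concatenate([p, q])
```

With k = 6 for each model, this yields a 12-dimensional feature vector per frame, matching the six shape plus six texture coefficients mentioned in the statement.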
“…As already noted by [31], in coupled HMM decoding, stream weight adaptation and uncertainty compensation by UD both provide significant advantages in isolation, but using uncertainty compensation in addition to optimized stream weighting provides only small benefits. This finding was replicated in our experiments.…”
Section: Discussion (mentioning)
confidence: 86%
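
The interplay between stream weighting and uncertainty compensation in this statement can be sketched for a two-stream (audio, visual) state-conditional score. This is a minimal sketch, assuming diagonal Gaussian stream models, known per-stream measurement variances, and the common multistream convention of combining per-stream log-likelihoods with exponent weights. The dictionary layout and names are hypothetical, and this computes a single-state score rather than running a coupled-HMM decoder.

```python
import numpy as np


def diag_gauss_loglik(y, mean, var):
    """Log-density of a diagonal Gaussian N(y; mean, diag(var))."""
    y, mean, var = map(np.asarray, (y, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var))


def fused_loglik(obs, model, meas_var, stream_weights):
    """Stream-weighted sum of uncertainty-compensated per-stream log-likelihoods.

    obs, meas_var and stream_weights are dicts keyed by stream name
    (e.g. "audio", "visual"); model[s] is a (mean, var) pair for stream s.
    """
    total = 0.0
    for stream, weight in stream_weights.items():
        mean, var = model[stream]
        # Uncertainty compensation: inflate the model variance by the measurement variance.
        comp_var = np.asarray(var) + np.asarray(meas_var[stream])
        total += weight * diag_gauss_loglik(obs[stream], mean, comp_var)
    return total
```

Setting all measurement variances to zero reduces this to plain stream weighting, while setting every weight to 1 leaves uncertainty compensation alone. The two mechanisms can therefore be compared in isolation and in combination, mirroring the comparison described in the statement.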
“…Uncertainty Decoding (denoted by GDU in the following tables) was used successfully for audiovisual speech recognition in [31]. In conjunction with uncertainty propagation techniques and stream weight optimization, however, the respective performance gains of UD become small.…”
Section: Uncertainty-based Decoding (mentioning)
confidence: 99%
“…These approaches were all proposed by Matthews et al. [7] to extract visual features. Continuing their work on visual feature extraction, Papandreou et al. [32] focused on multimodal fusion scenarios, using audiovisual speech recognition as an example. They demonstrated that their visemic AAM (based on digits 0-9) with six texture coefficients outperforms their PCA-based technique with 18 texture coefficients, achieving a word accuracy rate of 83% and 71%, respectively.…”
Section: Introduction (mentioning)
confidence: 99%