Y. Yemez scite author profile

Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis

Abstract: It is well known that early integration (also called data fusion) is effective when the modalities are correlated, and late integration (also called decision or opinion fusion) is optimal when the modalities are uncorrelated. In this paper, we propose a new multimodal fusion strategy for open-set speaker identification that combines early and late integration following canonical correlation analysis (CCA) of speech and lip texture features. We also propose a method for high-precision synchronization of the speech and lip features using CCA prior to the proposed fusion. Experimental results show that i) the proposed fusion strategy yields the best equal error rates (EER), which are used to quantify the performance of the fusion strategy for open-set speaker identification, and ii) precise synchronization prior to fusion improves the EER; hence, the best EER is obtained when the proposed synchronization scheme is employed together with the proposed fusion strategy. We note that the proposed fusion strategy outperforms others because the features used in the late integration are truly uncorrelated, since they are outputs of the CCA.
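To make the CCA step mentioned in this abstract concrete, the following is a minimal NumPy sketch of canonical correlation analysis between two feature sets. It is not the authors' implementation: the function name `cca`, the regularization term `reg`, and the whitening-plus-SVD formulation are assumptions chosen for illustration, and the sketch does not reproduce the paper's fusion or synchronization scheme.

```python
import numpy as np

def cca(X, Y, reg=1e-6):
    """Illustrative CCA (hypothetical helper, not from the paper).

    X: (n, p) array, Y: (n, q) array, rows are paired observations.
    Returns projection matrices Wx (p, k), Wy (q, k) and the k = min(p, q)
    canonical correlations in descending order.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariances; a small ridge term keeps the Cholesky factors stable.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten each set: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Cross-covariance of the whitened variables: T = Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original coordinates.
    Wx = np.linalg.solve(Lx.T, U)
    Wy = np.linalg.solve(Ly.T, Vt.T)
    return Wx, Wy, s
```

Projecting the centered features with `Wx` and `Wy` gives pairs of variates whose cross-covariance is diagonal, which is the property the abstract relies on when it describes the CCA outputs as uncorrelated.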


3D Model Retrieval Using Probability Density-Based Shape Descriptors

Akgül

Sankur

Yemez

et al. 2009

IEEE Trans. Pattern Anal. Mach. Intell.

107


Abstract: We address content-based retrieval of complete 3D object models by a probabilistic generative description of local shape properties. The proposed shape description framework characterizes a 3D object with sampled multivariate probability density functions of its local surface features. This density-based descriptor can be efficiently computed via kernel density estimation (KDE) coupled with the fast Gauss transform. The nonparametric KDE technique allows reliable characterization of a diverse set of shapes and yields descriptors that remain relatively insensitive to small shape perturbations and mesh resolution. Density-based characterization also induces a permutation property that can be used to guarantee invariance at the shape matching stage. As shown by extensive retrieval experiments on several 3D databases, our framework provides state-of-the-art discrimination over a broad and heterogeneous set of shape categories.
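The idea of a descriptor made of sampled density values can be illustrated with a small sketch. The code below evaluates a Gaussian KDE of local feature vectors at a fixed set of target points and returns those values as a vector. It uses direct evaluation rather than the fast Gauss transform named in the abstract, and the function name `density_descriptor`, the single scalar `bandwidth`, and the optional per-feature `weights` are assumptions made for illustration only.

```python
import numpy as np

def density_descriptor(features, targets, bandwidth, weights=None):
    """Sample a Gaussian KDE at fixed target points (illustrative sketch).

    features: (n, d) array of local feature vectors from one object.
    targets:  (m, d) array of points where the density is sampled.
    bandwidth: positive scalar kernel width (isotropic, an assumption).
    weights:  optional (n,) array summing to 1; uniform if omitted.
    Returns an (m,) vector of density values, usable as a descriptor.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n, d = features.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)

    # Scaled pairwise differences between every target and every feature.
    diff = (targets[:, None, :] - features[None, :, :]) / bandwidth
    sq_dist = np.sum(diff ** 2, axis=-1)            # shape (m, n)

    # Normalizing constant of a d-dimensional isotropic Gaussian kernel.
    norm = (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d
    return (np.exp(-0.5 * sq_dist) @ weights) / norm
```

Because the result is a weighted sum over features, reordering the rows of `features` (together with their weights) leaves the descriptor unchanged, and two objects sampled at the same `targets` can be compared with any vector distance. Direct evaluation costs O(mn) kernel evaluations, which is the cost that a fast summation method is meant to reduce.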


Coarse‐to‐Fine Combinatorial Matching for Dense Isometric Shape Correspondence

Sahillioğlu

Yemez

2011

Computer Graphics Forum


Scene Representation Technologies for 3DTV—A Survey

Alatan

Yemez

Güdükbay

et al. 2007

IEEE Trans. Circuits Syst. Video Technol.


Y. Yemez is with Department of Computer Engineering, Koç University, 34450 Istanbul, Turkey (e-mail: yyemez@ku.edu.tr).

U. Güdükbay is with Department of Computer Engineering, Bilkent University, Bilkent 06800, Turkey (e-mail: gudukbay@cs.bilkent.edu.tr).

X. Zabulis is with ITI-CERTH, Thessaloniki 57001, Greece (e-mail: xenophon@iti.gr).

K. Müller is with Fraunhofer Institute for Telecommunications-Heinrich-Hertz-Institut, 10587 Berlin, Germany (e-mail: kmueller@hhi.de).

Ç. E. Erdem is with Momentum A.Ş., TÜBİTAK-MAM-TEKSEB, 41470 Kocaeli, Turkey (e-mail: cigdem.erdem@momentum-dmt.com).

[...] potential for use in a 3DTV framework for modeling and animating dynamic scenes. As a concluding remark, it can be argued that 3-D scene and texture representation techniques are mature enough to serve and fulfill the requirements of 3-D extraction, transmission and display sides in a 3DTV scenario.


Discriminative Analysis of Lip Motion Features for Speaker Identification and Speech-Reading

Çetingül

Yemez

Erzin

et al. 2006

IEEE Trans. on Image Process.

101


Abstract: There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.

Index Terms: Bayesian discriminative feature selection, lip motion, speaker identification, speech recognition, temporal discriminative feature selection.



Y. Yemez

Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis
