In this paper we propose a new method for face recognition using fractal codes. Fractal codes represent local contractive affine transformations that, when applied iteratively to range-domain pairs of an arbitrary initial image, yield a fixed point close to a given image. The transformation parameters, such as brightness offset, contrast factor, orientation and the address of the corresponding domain for each range, are used directly as features in our method. The features of an unknown face image are compared with those pre-computed for the images in a database. The proposed method requires no iteration, fractal neighbor distances or fractal dimensions for comparison. It is robust to scale change, frame-size change and rotation, as well as to some noise, facial expressions and blur distortion in the image.
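As a rough illustration of using fractal-code parameters directly as features, the Python sketch below encodes an image into one tuple per range block (domain address, orientation, contrast factor, brightness offset) and compares two faces by a simple distance over those tuples. The block sizes, the four-orientation candidate set and the distance measure are illustrative assumptions, not the paper's exact settings.

```python
# Hypothetical sketch of fractal-code feature extraction for face matching.
import numpy as np

def fractal_features(img, range_size=4):
    """Encode img (2-D float array) as one feature row per range block:
    (domain index, orientation index, contrast s, brightness offset o)."""
    h, w = img.shape
    dsize = 2 * range_size                      # domains are twice the range size
    # Build the pool of candidate domain blocks, downsampled to range size.
    domains = []
    for y in range(0, h - dsize + 1, dsize):
        for x in range(0, w - dsize + 1, dsize):
            d = img[y:y + dsize, x:x + dsize]
            d = d.reshape(range_size, 2, range_size, 2).mean(axis=(1, 3))
            domains.append(d)
    features = []
    for y in range(0, h - range_size + 1, range_size):
        for x in range(0, w - range_size + 1, range_size):
            rv = img[y:y + range_size, x:x + range_size].ravel()
            best = None
            for di, d in enumerate(domains):
                for rot in range(4):            # four 90-degree orientations
                    dr = np.rot90(d, rot).ravel()
                    # Least-squares contrast s and offset o for rv ~ s*dr + o.
                    var = dr.var()
                    s = 0.0 if var == 0 else np.cov(dr, rv, bias=True)[0, 1] / var
                    o = rv.mean() - s * dr.mean()
                    err = np.sum((s * dr + o - rv) ** 2)
                    if best is None or err < best[0]:
                        best = (err, di, rot, s, o)
            features.append(best[1:])
    return np.array(features, dtype=float)

def feature_distance(f1, f2):
    """Compare two same-size faces by their fractal-code parameters directly,
    without iterating the fractal transform."""
    return np.abs(f1 - f2).mean()
```

In this sketch, identification would simply pick the database face whose feature array minimizes `feature_distance` against the unknown face.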
Surveillance imagery is usually of lower resolution than the hardware is capable of, owing to demands on storage and processing. Additionally, a large field of view reduces the number of pixels covering a face even further. As this number decreases, face recognition performance has been observed to eventually degrade considerably. Super-resolution helps overcome this by fusing complementary information from multiple frames of video to produce higher-resolution images. Because many existing techniques assume rigid objects and simple global motion between frames, their performance suffers when applied to human faces. Optical flow can be used to solve this problem by generating a dense motion field that tracks the inter-frame motion. This paper presents a novel optical flow based super-resolution face recognition system. Results from preliminary experiments show consistent improvement in recognition performance using super-resolved images, making the system viable for use in a surveillance environment.
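The sketch below illustrates the general flow-based fusion step described above, assuming OpenCV is available: dense Farneback optical flow registers each frame to a reference face crop, and the registered frames are upscaled and averaged. The specific flow algorithm, bicubic interpolation and averaging-based fusion are stand-ins for the system's actual reconstruction stage.

```python
# Illustrative multi-frame fusion guided by dense optical flow.
import cv2
import numpy as np

def superresolve(frames, scale=2):
    """frames: list of same-size grayscale uint8 face crops from video.
    Returns a fused, upscaled image registered to frames[0]."""
    ref = frames[0]
    h, w = ref.shape
    acc = np.zeros((h * scale, w * scale), dtype=np.float64)
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    for f in frames:
        # Dense per-pixel motion from the reference to the current frame,
        # instead of a single rigid global transform.
        flow = cv2.calcOpticalFlowFarneback(ref, f, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Warp the current frame onto the reference coordinate grid.
        warped = cv2.remap(f, xs + flow[..., 0], ys + flow[..., 1],
                           cv2.INTER_LINEAR)
        # Upsample the registered frame and accumulate.
        acc += cv2.resize(warped.astype(np.float64), (w * scale, h * scale),
                          interpolation=cv2.INTER_CUBIC)
    return (acc / len(frames)).astype(np.uint8)
```

The fused output would then be passed to a standard face recognition engine in place of a single low-resolution frame.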
This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been successful for the task of speech recognition. Temporal lip information has been used for speaker identification previously; however, this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
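A minimal sketch of the stream-weighting idea behind an MSHMM is given below: synchronised audio and visual features share one state sequence, and their per-state emission log-likelihoods are combined with stream weights before scoring each speaker model. The diagonal-Gaussian emissions, forward-pass scorer and weight values are assumptions for illustration, not the paper's configuration.

```python
# Illustrative multi-stream HMM scoring for speaker identification.
import numpy as np

def gaussian_loglik(obs, means, variances):
    """Per-frame, per-state diagonal-Gaussian log-likelihoods.
    obs: (T, d); means, variances: (n_states, d). Returns (T, n_states)."""
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances),
                         axis=2)

def mshmm_score(audio_obs, visual_obs, model, w_audio=0.7, w_visual=0.3):
    """Log-likelihood of synchronised audio/visual observations under one
    speaker's MSHMM: the two streams share the state sequence, and their
    emission log-likelihoods are combined per state with stream weights."""
    log_b = (w_audio * gaussian_loglik(audio_obs, *model["audio"]) +
             w_visual * gaussian_loglik(visual_obs, *model["visual"]))
    log_pi, log_A = np.log(model["startprob"]), np.log(model["transmat"])
    alpha = log_pi + log_b[0]                   # forward recursion in log space
    for t in range(1, len(log_b)):
        alpha = log_b[t] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(alpha)

def identify(audio_obs, visual_obs, speaker_models):
    """Text-dependent identification: pick the speaker model with the
    highest combined audio-visual score."""
    scores = {name: mshmm_score(audio_obs, visual_obs, m)
              for name, m in speaker_models.items()}
    return max(scores, key=scores.get)
```

Setting the visual weight to zero recovers an audio-only HMM, which is one way such a system can be compared against a single-stream baseline.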