Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. This paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.
This paper assesses the merits of three different approaches to pixel-level human skin detection. The basis for the three approaches has been reported recently in the literature. The first two approaches [1, 2] use simple ratios and colour space transforms, respectively, whereas the third is a numerically efficient approach based on a 3-D RGB probability map, first implemented by Rehg [3]. The Bayesian probabilities can be computed only because a large, appropriately labeled database is available. Over 12,000 images from the Compaq skin and non-skin databases [4] are used to quantitatively assess the three approaches. Thresholds are determined empirically so that 95% of all skin-associated pixels are detected, and each approach is then assessed by the percentage of non-skin pixels incorrectly accepted. The lowest of these false acceptance rates, about 20%, is given by the 3-D probability map.
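The 3-D RGB probability map and the evaluation protocol can be illustrated with a short, dependency-free Python sketch. It assumes 32 bins per colour channel and takes the class priors to be the class frequencies in the training pixels, so that Bayes' rule reduces to a per-cell count ratio. It then picks the threshold that accepts 95% of skin pixels and reports the fraction of non-skin pixels accepted at that threshold. The function names, the bin size and the synthetic pixel data are hypothetical illustrations, not the paper's implementation. For brevity the sketch also scores the same pixels it was built from, whereas a real assessment would use held-out images.

```python
import random
from collections import Counter
from math import ceil

BINS = 32              # assumption: 32 bins per RGB channel
STEP = 256 // BINS     # 8 intensity levels per bin


def bin_index(pixel):
    """Quantise an (r, g, b) pixel with 0-255 channels to a 3-D histogram cell."""
    r, g, b = pixel
    return (r // STEP, g // STEP, b // STEP)


def build_probability_map(skin_pixels, nonskin_pixels):
    """Estimate P(skin | rgb) per cell from labelled pixel counts.

    With class priors equal to the training class frequencies, Bayes' rule
    reduces to skin_count / (skin_count + nonskin_count) in each cell.
    """
    skin = Counter(bin_index(p) for p in skin_pixels)
    nonskin = Counter(bin_index(p) for p in nonskin_pixels)
    return {cell: skin[cell] / (skin[cell] + nonskin[cell])
            for cell in skin.keys() | nonskin.keys()}


def skin_probability(prob_map, pixel):
    """Look up P(skin | rgb); cells never seen in training count as non-skin."""
    return prob_map.get(bin_index(pixel), 0.0)


def threshold_for_detection_rate(prob_map, skin_pixels, rate=0.95):
    """Largest threshold that still accepts at least `rate` of the skin pixels."""
    scores = sorted((skin_probability(prob_map, p) for p in skin_pixels), reverse=True)
    k = max(1, ceil(rate * len(scores)))   # number of skin pixels that must be accepted
    return scores[k - 1]


def false_acceptance_rate(prob_map, nonskin_pixels, threshold):
    """Fraction of non-skin pixels whose skin probability reaches the threshold."""
    accepted = sum(skin_probability(prob_map, p) >= threshold for p in nonskin_pixels)
    return accepted / len(nonskin_pixels)


# Hypothetical labelled data: reddish "skin" pixels versus uniformly random colours.
rng = random.Random(0)
skin = [(rng.randint(150, 255), rng.randint(80, 180), rng.randint(60, 150))
        for _ in range(2000)]
nonskin = [tuple(rng.randint(0, 255) for _ in range(3)) for _ in range(2000)]

pmap = build_probability_map(skin, nonskin)
threshold = threshold_for_detection_rate(pmap, skin, rate=0.95)
far = false_acceptance_rate(pmap, nonskin, threshold)
print(f"threshold={threshold:.3f}  false acceptance={far:.1%}")
```

A dictionary keyed by quantised (r, g, b) cells keeps the sketch free of third-party dependencies; with NumPy, a dense 32x32x32 array indexed by the bin coordinates would be the more natural and faster layout.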
Footstep recognition is a relatively new biometric that aims to discriminate between persons using walking characteristics extracted from floor-based sensors. This paper reports, for the first time, a comparative assessment of the spatio-temporal information contained in footstep signals for person recognition. Experiments are carried out on the largest footstep database collected to date, with almost 20,000 valid footstep signals from more than 120 persons. Results show very similar performance for the spatial and temporal approaches (5% to 15% EER, depending on the experimental setup), and their fusion yields a significant improvement (2.5% to 10% EER). The assessment protocol focuses on the influence of the amount of data used in the reference models, which simulates the conditions of different potential applications such as smart homes or security access scenarios.
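The equal error rate (EER) and the gain from fusing two score streams can be illustrated with a Python sketch. It assumes that higher scores indicate a genuine match, and it approximates the EER as the operating point where the false acceptance and false rejection rates are closest. The abstract does not state which fusion rule the paper uses, so the rule here (z-score normalisation of each modality followed by averaging, a simple sum rule) is an assumption, as are the function names and the synthetic Gaussian scores.

```python
import random
from bisect import bisect_left
from statistics import mean, pstdev


def equal_error_rate(genuine, impostor):
    """Approximate EER: the threshold at which FAR and FRR are closest.

    Assumes higher scores mean "more likely the claimed person".
    """
    gen, imp = sorted(genuine), sorted(impostor)
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(gen) | set(imp)):
        far = (len(imp) - bisect_left(imp, t)) / len(imp)  # impostors with score >= t
        frr = bisect_left(gen, t) / len(gen)                # genuine with score < t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def fuse(spatial, temporal, stats_s, stats_t):
    """Sum-rule fusion (assumption): average of z-normalised scores per trial."""
    (mu_s, sd_s), (mu_t, sd_t) = stats_s, stats_t
    return [((a - mu_s) / sd_s + (b - mu_t) / sd_t) / 2
            for a, b in zip(spatial, temporal)]


# Hypothetical scores: each trial has one spatial and one temporal score
# with independent Gaussian noise.
rng = random.Random(1)
N = 2000
gen_s = [rng.gauss(2.0, 1.0) for _ in range(N)]
imp_s = [rng.gauss(0.0, 1.0) for _ in range(N)]
gen_t = [rng.gauss(2.0, 1.0) for _ in range(N)]
imp_t = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Normalisation statistics from the pooled scores of each modality
# (in practice these would come from a separate development set).
stats_s = (mean(gen_s + imp_s), pstdev(gen_s + imp_s))
stats_t = (mean(gen_t + imp_t), pstdev(gen_t + imp_t))

eer_s = equal_error_rate(gen_s, imp_s)
eer_t = equal_error_rate(gen_t, imp_t)
eer_f = equal_error_rate(fuse(gen_s, gen_t, stats_s, stats_t),
                         fuse(imp_s, imp_t, stats_s, stats_t))
print(f"EER spatial={eer_s:.1%}  temporal={eer_t:.1%}  fused={eer_f:.1%}")
```

Because the synthetic spatial and temporal scores carry independent noise, averaging them lowers the EER, which mirrors in spirit the fusion gain reported in the abstract; the printed values reflect only the synthetic data, not the footstep database.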