This paper describes a method of model-matching, applicable as a verification procedure within a knowledge-based vision system containing three-dimensional geometric models. Most approaches to object verification in model-based vision merely extend the initial model instantiation process, which uses a symbolic edge description, and thereby refine the initial hypothesis until a solution is reached. Symbolic edge descriptions are always inaccurate, since it is difficult to translate real-world scenes into a set of discrete entities [1]. The iconic approach returns to the original image and can thus make use of information missed by the low-level processes.

If a three-dimensional object is to be represented as a geometric model, the model description should be analogical, to reflect the spatial isomorphism between the two entities. Geometric models used in vision systems are usually converted into an entirely symbolic form, for example graph structures [2] or bit strings [3], to facilitate matching with a symbolic image segmentation. A preferable approach is to exploit the spatial isomorphism present in the model and match it to an iconic representation of the image. Instead of simply applying global operations to the image to produce a fixed set of data structures, which can only be used uniformly, computational procedures of arbitrary complexity may be devised to manipulate the information in the image. Reliance on the output of region or edge segmentations for the final classification then becomes unnecessary.
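As a concrete illustration of matching a model directly against an iconic representation, the following sketch (in Python with NumPy) projects hypothesised model points into the image and scores them against the image gradient magnitude, rather than against a symbolic segmentation. The function names, the pose parametrisation and the gradient-based score are illustrative assumptions only, not the procedure developed in this paper.

```python
import numpy as np

def project_points(points_3d, pose, focal_length):
    """Project 3D model points into the image with a pinhole camera.

    points_3d    : (N, 3) array of model points in object coordinates.
    pose         : (R, t) hypothesised object-to-camera rotation (3x3)
                   and translation (3,).
    focal_length : focal length in pixels.
    Returns image coordinates relative to the principal point.
    """
    R, t = pose
    cam = points_3d @ R.T + t                        # object -> camera frame
    return focal_length * cam[:, :2] / cam[:, 2:3]   # perspective division

def edge_support(gradient_magnitude, image_points):
    """Average gradient magnitude sampled at the projected model points.

    A high score means the hypothesised model edges lie along strong
    intensity changes in the original image, i.e. the image supports
    the hypothesis.
    """
    h, w = gradient_magnitude.shape
    score = 0.0
    for x, y in image_points:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < w and 0 <= yi < h:
            score += gradient_magnitude[yi, xi]
    return score / len(image_points)
```

Because the score is computed by sampling the image itself, the evaluation procedure can be made as simple or as elaborate as the hypothesis requires, instead of being fixed in advance by a global segmentation step.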
ICONIC MODEL-MATCHING

There are a number of methods available with which iconic model-matching to an image can be performed. One example is image rendering, discussed in particular by Besl and Jain [4]. They believed that a common fault of vision systems is that "high-level results are not projected back into a low-level form for final error checking" and said that "more research is needed in this area". Their proposed solution was to use computer graphics techniques in conjunction with the object models to predict the sensor data; this prediction could then be tested for its correspondence with the pixels in the image. On a sequential machine this is a very slow process, and it is excessively sensitive to minor detail: in any real image, and particularly in natural scene analysis, it is impossible to predict the conditions in the image.

Another similar iconic/iconic process, normalised correlation, suffers from the same problems. In this technique, template functions are produced which specify the expected binary or grey-level distribution of the image. The image is convolved with these masks, and a metric is used to measure the "best" or "sufficiently good" matches in the image. Correlation is more flexible than image rendering, as it is less dependent on local properties, and in addition deformations of the image are allowed. It still, however, relies on specifying what a portion of the image will look like and performing a quantitative match. This is very difficult, and requires a much deeper ...
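To make the normalised correlation technique concrete, the sketch below (Python with NumPy) computes the normalised cross-correlation coefficient of a grey-level template at every valid image position. The function name and the simple dense search loop are assumptions for illustration; a practical system would use a far more efficient implementation.

```python
import numpy as np

def normalised_correlation(image, template):
    """Slide `template` over `image` and return the normalised
    cross-correlation coefficient at every valid position.

    The score at each position is
        sum((I - mean(I)) * (T - mean(T))) / (N * std(I) * std(T)),
    which lies in [-1, 1] and is insensitive to linear changes in
    brightness and contrast.
    """
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            window = image[y:y + th, x:x + tw]
            w = window - window.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            scores[y, x] = (w * t).sum() / denom if denom > 0 else 0.0
    return scores

# A "sufficiently good" match is usually declared where the coefficient
# exceeds some threshold, e.g. peaks = np.argwhere(scores > 0.8).
```

The difficulty noted above is visible in this form: the template itself must already encode what the relevant portion of the image will look like, so any unmodelled variation in appearance degrades the score.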