This paper discusses an approach to describing the structure of models that can be trained to recognize object representations within the class of generative models; in particular, the architecture of a convolutional autoencoder is considered in detail. Qualitative results of the operation of the convolutional autoencoder are also presented, showing that it is valid to regard this model as generative, since inference and sampling procedures can be implemented for it; this is illustrated by the example of restoring images in missing regions.
RELEVANCE

One of the most important problems that must be solved in order to construct efficient recognition systems is the selection of attributes of the object to be classified. In other words, it is necessary to choose a description of the object of interest that retains the most general characteristics of the given class of objects while eliminating everything superfluous. When working with real-world objects, the most widely used and at the same time most informative input descriptions are, as a rule, various images of the objects themselves. However, an ordinary pixel-based representation of object images is not the most convenient from the viewpoint of the recognition problem, because it cannot adequately satisfy the conditions indicated above. It is therefore necessary to transform the original image into some other representation that increases recognition quality by comparison with the use of a "raw" pixel representation [1]. In many successful recognition systems, this problem is currently solved by a human, at the level of developing and selecting an appropriate image-processing algorithm that extracts particular global or local attributes from the images in accordance with criteria that are, again, established by a human, even if these criteria are sometimes less heuristic because they are based on fairly strict mathematical models. Many such attribute descriptions possess a certain stability with respect to a number of image transformations, be it a brightness change, an affine transformation, a change of scale, etc.
Examples of such descriptions are the now-standard and popular key-point descriptors SIFT and SURF, as well as a number of less well-known descriptors. However, from the viewpoint of machine learning, there is special interest in models of attribute-selecting systems that can be trained to recognize these attributes, i.e., that are capable of independently extracting specific attributes from an image as a function of the context imposed by a specific problem. A successful universal solution of this problem would make it possible to advance substantially toward constructing completely automatic computer-vision systems, and possibly even to improve our understanding of how biological visual systems function.
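The encode/decode round trip that such a trainable attribute extractor performs can be sketched in a few lines of NumPy. This is only an illustrative toy, not the architecture described in this paper: a single 3×3 encoder filter with a ReLU, a single transposed-convolution decoder filter, and a mean-squared reconstruction objective; the kernel sizes and the single-channel setup are assumptions chosen for brevity.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution (cross-correlation) of image x with kernel k."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_transpose(z, k):
    """Transposed convolution: scatter each code value through the kernel."""
    zh, zw = z.shape
    kh, kw = k.shape
    out = np.zeros((zh + kh - 1, zw + kw - 1))
    for i in range(zh):
        for j in range(zw):
            out[i:i + kh, j:j + kw] += z[i, j] * k
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))                      # toy single-channel "image"
k_enc = rng.standard_normal((3, 3)) * 0.1       # encoder filter (untrained)
k_dec = rng.standard_normal((3, 3)) * 0.1       # decoder filter (untrained)

code = np.maximum(conv2d(image, k_enc), 0.0)    # encoder: conv + ReLU
recon = conv2d_transpose(code, k_dec)           # decoder: transposed conv
loss = np.mean((recon - image) ** 2)            # reconstruction objective

print(code.shape, recon.shape)                  # → (6, 6) (8, 8)
```

Training would adjust `k_enc` and `k_dec` to minimize `loss` over a dataset, so the code layer is forced to keep only the attributes needed to reconstruct the input; this is the sense in which the attributes are selected automatically rather than designed by hand.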
MODEL FOR AUTOMATIC ATTRIBUTE SELECTION

A typical example of a model that solves the problem of automatic...