This paper is a short review of studies of second-order visual mechanisms, whose contribution to the control of visual attention is currently of great interest. Basic and neural-network approaches to modeling second-order visual mechanisms are discussed. The authors report the results of training networks on modulated textures and present, as an example, the architecture of a fast-learning classifier with an accuracy of more than 98% on the test set. The representations obtained through learning are demonstrated. Results of training convolutional autoencoders to extract the envelope of textures modulated in contrast, orientation, and spatial frequency are also presented, and the architectures that learned successfully are given as examples. The authors suggest that convolutional networks hold great promise for modeling second-order visual mechanisms and that the results can be used in developing saliency-map algorithms.