The human visual recognition system is more efficient than any current robotic vision system. One reason for this superiority is that humans utilize different parts of the visual field depending on the recognition task. For instance, experiments on human subjects show that peripheral vision is more useful than central vision in recognizing scenes. We tested our recently developed elastic net-regularized hierarchical MAX (En-HMAX) model in recognizing objects and scenes. Across various experimental conditions, images were occluded with windows and scotomas of varying sizes. The model achieved classification accuracies of up to 90% for both objects and scenes. Mirroring the human experiments, window and scotoma analyses with the En-HMAX model revealed that object and scene recognition are most sensitive to the availability of information in the centre and the periphery of the image, respectively. Similarly, results from deep learning models have shown that classification accuracy diminishes dramatically in the absence of peripheral vision. These findings led us, in a second study, to further analyse the performance of the En-HMAX model on the parafoveal versus peripheral areas of vision. Results of the second study show that approximately 50% of the visual field is sufficient to achieve 96% accuracy in the classification of unseen images. Like the human visual system, the En-HMAX model assigns a relative order of importance to regions of the visual field depending on the image category. We showed that utilizing only the task-relevant regions of vision can significantly reduce image size and processing time.
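
The window/scotoma paradigm mentioned above can be illustrated with a short sketch. The following is a minimal, hedged example and not the authors' implementation: the function names, the use of NumPy, the circular mask shape, and the fill value are all illustrative assumptions.

```python
import numpy as np

def radial_mask(shape, radius):
    """Boolean mask that is True inside a central disc of the given radius.

    A circular mask is an assumption here; it loosely mirrors the radial
    organization of the visual field (fovea at the centre, periphery outside).
    """
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def apply_window(image, radius, fill=0.0):
    """Keep only the central disc: simulates loss of peripheral vision."""
    mask = radial_mask(image.shape[:2], radius)
    out = np.full_like(image, fill)
    out[mask] = image[mask]
    return out

def apply_scotoma(image, radius, fill=0.0):
    """Occlude the central disc: simulates loss of central vision."""
    mask = radial_mask(image.shape[:2], radius)
    out = image.copy()
    out[mask] = fill
    return out

# Hypothetical usage: generate occluded variants at varying sizes, then feed
# each variant to a classifier to measure how accuracy changes with occlusion.
img = np.random.rand(256, 256)        # stand-in for a grayscale scene image
for r in (32, 64, 96, 128):
    windowed = apply_window(img, r)   # central information only
    occluded = apply_scotoma(img, r)  # peripheral information only
```

Sweeping the radius and recording classification accuracy for the windowed versus scotoma conditions is one straightforward way to probe, as the abstract describes, whether a model relies more on central or peripheral image content for a given category.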