Computer vision is currently one of the most exciting and rapidly evolving fields of science, and it affects numerous industries. Research and development breakthroughs, mainly in the field of convolutional neural networks (CNNs), have opened the way to unprecedented sensitivity and precision in object detection and recognition tasks. Nevertheless, findings in recent years on the sensitivity of neural networks to additive noise, lighting conditions, and the completeness of the training dataset indicate that this technology still lacks the robustness needed for the autonomous robotics industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system were analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when the visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its predictions and the visual feedback. CNNs are feed-forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, that information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with state-of-the-art bottom-up object recognition models, e.g., deep CNNs.
The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms, together with the state-of-the-art bottom-up feed-forward algorithms, resulted in an accurate, robust, and continuously improving target recognition model.
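The idea of updating knowledge from the gap between a prediction and the visual feedback can be illustrated with a minimal sketch. This is not the architecture described in this work; it assumes, purely for illustration, that contextual knowledge is a probability vector over target classes and that feedback arrives as a confirmed class label. The function name `update_prior` and the `learning_rate` parameter are hypothetical.

```python
import numpy as np

def update_prior(prior, observed_class, learning_rate=0.1):
    """Move a contextual prior toward the confirmed class (hypothetical helper).

    The update is a simple delta rule: the prior is shifted by a fraction of
    the gap between what was predicted and what the feedback confirmed.
    """
    target = np.zeros_like(prior)
    target[observed_class] = 1.0
    prediction_gap = target - prior          # feedback minus prediction
    new_prior = prior + learning_rate * prediction_gap
    return new_prior / new_prior.sum()       # keep it a valid distribution

# Start with no contextual knowledge (uniform prior over 3 assumed classes).
prior = np.full(3, 1.0 / 3.0)

# Repeated feedback that class 2 appears in this context shifts the prior.
for _ in range(20):
    prior = update_prior(prior, observed_class=2)
```

A larger `learning_rate` adapts faster to a changing scene but makes the prior more sensitive to occasional wrong feedback.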
As the quantity of visual information sources increases, so does the need for sensors that can automatically alert the user to exceptional events. This capability requires the sensors to detect and classify the targets they "see". Achieving this goal will dramatically improve the efficiency of CCTV-based security systems, improve search and retrieval engines, increase the autonomy of robotic systems, and contribute to many other areas of life. The performance of the human visual system and its robustness to image degradation still surpass those of the best computer vision systems. Particularly remarkable is the human brain's high accuracy in ultra-rapid object categorization tasks. Recent studies show that the mechanism behind the recognition process includes predictions based on prior knowledge about the world. These predictions enable rapid generation of hypotheses that bias the outcome of the recognition process in situations of uncertainty. In this work, we implemented the concepts behind this top-down prediction mechanism of the human visual system. This work focuses on the role of the orbitofrontal cortex (OFC) in the prediction process; the OFC appears to be attuned to the associative content of visual information and to facilitate recognition of sensory inputs via predictive feedback to the sensory cortices. Specifically, coarse representations reach the OFC, which generates "initial guesses" regarding the target's identity. These predictions are projected to the inferior temporal cortex, where they facilitate perception and help select the most likely interpretations. We show that imitating this mechanism can potentially create more robust target recognition models than exist today.
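The way an "initial guess" can bias an ambiguous recognition result can be sketched as follows. This is an illustrative assumption, not the model implemented in this work: the top-down prediction is treated as a prior over classes and combined with bottom-up class probabilities by a Bayes-style product. The names `fuse_with_prior` and `prior_weight`, and the example numbers, are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def fuse_with_prior(bottom_up_probs, top_down_prior, prior_weight=1.0, eps=1e-12):
    """Bias bottom-up class probabilities with a top-down prior (hypothetical helper).

    Works in log space: log posterior = log likelihood + weight * log prior,
    followed by renormalization.
    """
    log_posterior = (np.log(bottom_up_probs + eps)
                     + prior_weight * np.log(top_down_prior + eps))
    return softmax(log_posterior)

# Ambiguous bottom-up output: classes 0 and 1 are nearly tied.
bottom_up = np.array([0.40, 0.38, 0.22])

# Assumed coarse "initial guess" that favors class 1 in this context.
initial_guess = np.array([0.10, 0.80, 0.10])

posterior = fuse_with_prior(bottom_up, initial_guess)
best_class = int(np.argmax(posterior))
```

With a uniform prior the function returns the bottom-up probabilities essentially unchanged, so the prior only matters when it carries contextual information.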