This study presents a systematic review of automatic vision-based assessment systems and models from related fields over the last six years. Many studies apply automatic vision-based assessment in robotics, artificial intelligence, and virtual reality or augmented reality (AR) to enhance communication between humans and machines for general purposes. Three reliable databases, IEEE Xplore, ScienceDirect and Web of Science, were searched for relevant studies on the topic. The 3505 papers obtained, published from 2015 to 2020, were filtered and scanned in several stages according to the inclusion/exclusion criteria. Finally, 97 papers met the criteria. They were classified by field, following the scientific taxonomy, into four main categories: computer-vision-based, robotics-based, AR-based and hybrid-based, accounting for 42.26% (n = 41/97 papers), 48.45% (n = 47/97 papers), 6.18% (n = 6/97 papers) and 4.12% (n = 4/97 papers), respectively. A deep and critical analysis of this multifield review then highlighted the research opportunities, motivations, challenges and recommendations that need attention in order to integrate interdisciplinary studies. Automatic vision-based assessment, a field that requires automated solutions, tools and methods, enhances the capability of assistive technology and facilitates interaction between individuals and machines. Many studies have examined automatic vision-based assessment systems and their subtypes to promote accurate communication and performance evaluation in human-machine interaction. This study offers researchers guidance and reference information for future research, and it addresses the ambiguity of automatic vision-based assessment models across interdisciplinary fields.