With the rapid development of artificial intelligence, the integration of multimedia and human-artificial intelligence (H-AI) has become an active research topic. In multimedia environments in particular, effective remote visual monitoring has attracted the attention of many researchers. Using traditional correlation filter (CF) algorithms for real-time monitoring in multimedia settings is a practical strategy. However, most existing CF-based visual monitoring algorithms still lack robustness and effectiveness. Inspired by the way human memory is updated, this paper therefore proposes a multi-layer template update mechanism for effective monitoring in a multimedia environment. In this strategy, the weighted template of the high-confidence matching memory serves as the confidence memory, and the unweighted template of the low-confidence matching memory serves as the cognitive memory. Alternating among the confidence memory, the matching memory, and the cognitive memory ensures that the target is not lost during monitoring. Experimental results show that this strategy preserves real-time speed while improving robustness in multimedia backgrounds.
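To make the three-memory idea concrete, the following is a minimal sketch of one way such an update could be organized. It is an illustration under stated assumptions, not the paper's implementation: the class name `TemplateMemory`, the threshold `HIGH_CONF`, the blending weight `BLEND`, the running-average form of the "weighted template", and the nearest-template selection rule are all hypothetical choices made here for readability.

```python
from dataclasses import dataclass
from typing import List, Optional

Template = List[float]

# Hypothetical constants; the abstract does not specify either value.
HIGH_CONF = 0.6  # score at or above this counts as "high confidence"
BLEND = 0.1      # weight given to the new template in the running average


def squared_distance(a: Template, b: Template) -> float:
    """Sum of squared element-wise differences between two templates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class TemplateMemory:
    matching: Template                   # most recent matched template
    confidence: Template                 # weighted blend of high-confidence matches
    cognitive: Optional[Template] = None  # latest low-confidence match, stored as is

    def update(self, new_template: Template, score: float) -> None:
        """Store the new match, then refresh one of the two longer-term memories."""
        self.matching = list(new_template)
        if score >= HIGH_CONF:
            # Assumed form of the "weighted template": exponential moving average.
            self.confidence = [
                (1.0 - BLEND) * c + BLEND * t
                for c, t in zip(self.confidence, new_template)
            ]
        else:
            # Assumed form of the "unweighted template": a plain copy.
            self.cognitive = list(new_template)

    def select(self, patch: Template) -> str:
        """Return the name of the stored template closest to the current patch.

        This nearest-template rule is an assumption standing in for the
        alternation among memories described in the text.
        """
        candidates = {"confidence": self.confidence, "matching": self.matching}
        if self.cognitive is not None:
            candidates["cognitive"] = self.cognitive
        return min(candidates, key=lambda name: squared_distance(candidates[name], patch))


memory = TemplateMemory(matching=[1.0, 0.0], confidence=[1.0, 0.0])
memory.update([0.0, 1.0], score=0.9)  # high confidence: blends into confidence memory
memory.update([0.5, 0.5], score=0.2)  # low confidence: stored as cognitive memory
chosen = memory.select([1.0, 0.0])    # patch resembling the original appearance
```

In this sketch the confidence memory changes slowly, so it can still match the original appearance after the matching memory has drifted, which is the kind of fallback the alternation is meant to provide.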