“…Especially in the computer vision field, the level of situation awareness based on visual information has been dramatically improved. For example, AlexNet (Krizhevsky et al, 2012) significantly improved object classification performance by intro-… environments (e.g., depth images) during training (Loquercio et al, 2021; Ramezani Dooraki & Lee, 2018; Wu et al, 2018), or use virtual simulators that are similar to real environments (He et al, 2020; Roghair et al, 2021; Sadeghi & Levine, 2016). Ahn & Song (2020) trained a robot arm grasping policy in simulation using a vision sensor and deployed the learned policy in the physical world with additional training in real-world environments.…”