“…The navigation policy of our proposed system processes depth observation using a 2-D CNN to obtain l depth . The layer configurations include input channel number, output channel number, kernel size, and stride of (1,32,8,4), (32,64,4,2), and (64, 64, 3, 1), with a zero-padding of (2, 2, 2, 2) and (1, 1, 1, 1) for the first two layers. Task observation is processed using a 2-layer MLP with (3, 256) and (256, 256), resulting in l task .…”