“…As a result, each group of components is represented by a three-dimensional tensor of shape (n, h, d), where n is the number of features (7 for obstacles, 4 for side margins, 3 for oriented margins, and 1 for shot distances), h is the height of the frame, and d is the maximum horizontal distance from the plane/shot. Altogether, there are four independent input populations: one for the obstacles, of shape (7, 101, 175); one for the side margins, of shape (4, 25, 17); one for the oriented margins, of shape (3, 101, 41); and one for the distance of obstacles from the shot, of shape (1, 101, 21). Note that the lowest bar in the game frame is excluded from the ToM-based agent's visual access to reduce the complexity of the network's input (see Figure 5).…”
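To make the input sizes concrete, the four populations described in the passage can be written down as a small sketch. This is only an illustration under stated assumptions: the names `INPUT_SHAPES`, `make_zero_tensor`, and `element_counts` are hypothetical and do not come from the original work, and the shapes are copied exactly as quoted (including the side-margin height of 25). Nested Python lists stand in for whatever tensor library the authors actually used.

```python
from math import prod

# Shapes (n, h, d) copied from the quoted passage:
# n = number of features, h = frame height, d = maximum horizontal distance.
# The dictionary keys are hypothetical labels chosen for this sketch.
INPUT_SHAPES = {
    "obstacles": (7, 101, 175),
    "side_margins": (4, 25, 17),
    "oriented_margins": (3, 101, 41),
    "shot_distances": (1, 101, 21),
}


def make_zero_tensor(shape):
    """Build an all-zero nested list with the given (n, h, d) shape."""
    n, h, d = shape
    return [[[0.0] * d for _ in range(h)] for _ in range(n)]


def element_counts(shapes):
    """Return the number of scalar inputs in each population."""
    return {name: prod(shape) for name, shape in shapes.items()}


counts = element_counts(INPUT_SHAPES)
total_inputs = sum(counts.values())

for name, shape in INPUT_SHAPES.items():
    print(f"{name}: shape={shape}, elements={counts[name]}")
print(f"total scalar inputs: {total_inputs}")
```

Counting elements this way shows that the obstacle population accounts for most of the input, which gives a sense of why trimming the frame (for example, by dropping the lowest bar) reduces the size of the network's input.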