“…However, dynamic human bodies occupy most of the image pixels in multiperson scenarios. To overcome this obstacle, [50,12,8,50,13] obtain structure cues and estimate camera parameters from the semantics of the scene (e.g., the lines of a basketball court). [24,55] estimate the extrinsic camera parameters from tracked human trajectories in more general multiperson scenes.…”