“…V2B [33] designs a voxel-to-BEV object localization network to tackle sparse point clouds. Other techniques such as LTTR [34], PTT [35], PTTR [36], STNet [37], and CMT [38] develop sophisticated transformer structures to improve feature fusion or object localization. Nevertheless, none of them challenges … [Figure: panels labeled "Vote Cluster", "Features", and "3D Proposals"; elements annotated with p, s, and q, indexed 1 through K] …”