“…To enable intra-category any-pose manipulation, an object representation that achieves category-level generalization is crucial. Existing representations can be roughly classified into three kinds: 6-DOF pose estimators [39], [38], [37], [40], [18], 3D keypoints [33], [29], [19], [20], [4], and dense correspondence models [28], [12], [32], [31]. Despite the disparities in form, their ultimate goals are consistent: to determine the local coordinate frame of the object.…”