“…Nevertheless, in situations where clothes significantly self-occlude, point cloud representations become ambiguous because different layers of the cloth cannot be distinguished based solely on the observable set of points. While classical computer vision approaches such as the Harris corner detector [22] or a wrinkle detector [23] can be used for detecting cloth features, they are typically not robust to variations in texture, lighting conditions, and non-static observations. This study tackles these perception challenges by integrating semantic descriptors, derived from RGB observations through pre-trained VLMs, with point cloud representations.…”