“…Language and Shape. Works that explore the intersection of language and geometry have taken many forms: resolving language references [2,3,36], generating language descriptions of a shape [3,19], and generating a shape from a language description [22,34]. Most relevant to our work are those that attempt the language reference game, where the task is to select, based on a language description, a target shape from a set of candidates, either in a collection of individual 3D shapes [3,36] or within a scene [2,20,33,40,43,45]. While most of these works treat the reference game as a classification problem over the set of candidates, [20] instead outputs a segmentation mask over the scene.…”