Human visual systems can parse a scene composed of novel objects and infer their surfaces and occlusion relationships without relying on object-specific shapes or textures. Perceptual grouping can bind spatially disjoint entities into a single object, even when that object is entirely novel, and can attach other perceptual properties such as color and texture to it using object-based attention. Border-ownership assignment, the assignment of perceived occlusion boundaries to specific perceived surfaces, is an intermediate representation in the mammalian visual system that facilitates this perceptual grouping. Since objects in a scene can be entirely novel, inferring border ownership requires integrating global figural information while dynamically postulating what the figure is, a chicken-and-egg process that is complicated further by missing or conflicting local evidence about the presence of boundaries. Motivated by neuroscience observations, we introduce the cloned Markov random field (CMRF), a model that can learn attention-controllable representations for border ownership. Higher-order contour representations that distinguish border ownership emerge as part of learning in this model. When tested on cluttered scenes of novel 2D objects with noisy, contour-only evidence, the CMRF perceptually groups the objects despite clutter and missing edges. Moreover, it can use occlusion cues to bind disconnected surface elements of novel objects into coherent wholes, and can use top-down attention to assign border ownership between overlapping objects. Our work is a step toward the dynamic binding of surface elements into objects, a capability that is crucial for intelligent agents to interact with the world and form entity-based abstractions.
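The core "cloning" idea can be illustrated with a minimal sketch (an assumption for exposition, not the paper's implementation): each observable symbol, such as a local edge orientation, is expanded into several hidden clone states, so identical local evidence can carry different contextual meanings, such as which surface owns a border. Here, forward filtering on a chain stands in for message passing on the full MRF.

```python
# Illustrative sketch of clone-structured inference (assumed toy setup,
# not the CMRF from the paper): hidden states are "clones" of observable
# symbols, and context (transitions) disambiguates between clones.
import numpy as np

n_symbols, n_clones = 3, 2            # 3 edge types, 2 clones per type
n_hidden = n_symbols * n_clones       # total hidden-state space

rng = np.random.default_rng(0)
T = rng.random((n_hidden, n_hidden))  # hidden-to-hidden transition weights
T /= T.sum(axis=1, keepdims=True)     # row-normalize into probabilities

def clones_of(symbol):
    """Indices of the hidden clones that emit this symbol."""
    return slice(symbol * n_clones, (symbol + 1) * n_clones)

def filter_chain(observations):
    """Forward messages over hidden clones given a symbol sequence."""
    belief = np.zeros(n_hidden)
    belief[clones_of(observations[0])] = 1.0 / n_clones
    for obs in observations[1:]:
        msg = belief @ T              # propagate belief along the chain
        mask = np.zeros(n_hidden)
        mask[clones_of(obs)] = 1.0    # evidence restricts to obs's clones
        belief = msg * mask
        belief /= belief.sum()
    return belief

b = filter_chain([0, 1, 1, 2])
# The posterior over clones of the final symbol depends on the preceding
# context -- the mechanism by which identical local edges can be assigned
# different border ownerships.
```

The same emission structure (each hidden state tied to exactly one symbol) is what lets higher-order contour representations emerge: clones of the same edge symbol specialize to different figural contexts during learning.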