Visual object-based attention marks a key process of mammalian perception. By which mechanisms this process is implemented and how it can be interacted with by means of attentional control is not completely understood yet. Incremental binding is a mechanism required in more demanding scenarios of object-based attention and is likewise experimentally investigated quite well. Attention spreads across a representation of the visual object and labels bound elements by constant up-modulation of neural activity. The speed of incremental binding was found to be dependent on the spatial arrangement of distracting elements in the scene and to be scale invariant giving rise to the growth-cone hypothesis. In this work, we propose a neural dynamical model of incremental binding that provides a mechanistic account for these findings. Through simulations, we investigate the model properties and demonstrate how an attentional spreading mechanism tags neurons that participate in the object binding process. They utilize Gestalt properties and eventually show growth-cone characteristics labeling perceptual items by delayed activity enhancement of neuronal firing rates. We discuss the algorithmic process underlying incremental binding and relate it to the model’s computation. This theoretical investigation encompasses complexity considerations and finds the model to be not only of explanatory value in terms of neurohpysiological evidence, but also to be an efficient implementation of incremental binding striving to establish a normative account. By relating the connectivity motifs of the model to neuroanatomical evidence, we suggest thalamo-cortical interactions to be a likely candidate for the flexible and efficient realization suggested by the model. There, pyramidal cells are proposed to serve as the processors of incremental grouping information. Local bottom-up evidence about stimulus features is integrated via basal dendritic sites. It is combined with an apical signal consisting of contextual grouping information which is gated by attentional task-relevance selection mediated via higher-order thalamic representations.Author SummaryUnderstanding a visual scene requires us to tell apart visual objects from one another. Object-based attention is the process by which mammals achieve this. Mental processing of object components determines whether they are compatible to the overall object and, thus, should be grouped together to be perceived as a whole or not. For complicated objects, this processing needs to happen serially, determining the compatibility step by step. In this work, we propose a neural model of this process and try to answer the question of how it might be implemented in the brain. We test the model on a case of object-based attention for grouping elongated lines and compare it to the available experimental evidence. We additionally show that the model not only explains this evidence, but it does so also by spending neurons and connections efficiently — a property likewise desirable for brains and machines. Together, these findings suggest which brain areas might be involved in realizing this process and how to reason about the complexity of this computation.