Motion-compensated prediction induces a chain of coding dependencies between pixels in video. In principle, an optimal selection of encoding parameters (motion vectors, quantization parameters, coding modes) should take into account the whole temporal horizon of a group of pictures (GOP). However, in practical coding schemes these choices are made on a frame-by-frame basis, with a possible loss of performance. In this paper we describe a tree-based model of pixelwise coding dependencies: each pixel in a frame is the child of a pixel in a previous reference frame. We show that some tree structures are more favorable than others from a rate-distortion perspective, e.g., because they entail a large set of descendant pixels that are well predicted from a common ancestor. In those cases, a higher quality should be assigned to the pixels at the top of such trees. We promote the creation of these structures by adding a discount term to the conventional Lagrangian cost adopted at the encoder. The proposed model can be implemented through a two-pass encoding procedure. Specifically, we devise heuristic cost functions to drive the selection of quantization parameters and motion vectors, which can be readily implemented in a state-of-the-art H.264/AVC encoder. Our experiments show that coding efficiency improves for video sequences with low motion, while no appreciable gains are observed for sequences with more complex motion. We argue that this is due both to encoder features not captured by the model and to the complexity of the source to be encoded.
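
To fix ideas, the modification can be sketched as follows (the notation for the discount term is ours; its exact form is defined in the body of the paper). A conventional encoder selects parameters by minimizing the per-block Lagrangian cost
\[
J = D + \lambda R ,
\]
where $D$ is the distortion, $R$ the rate, and $\lambda$ the Lagrange multiplier. The proposed cost subtracts a discount for blocks expected to seed large trees of well-predicted descendants, schematically
\[
J' = D + \lambda R - \Delta , \qquad \Delta \ge 0 ,
\]
so that spending extra bits (i.e., assigning higher quality) to such ancestors is favored whenever the expected benefit $\Delta$ to future dependent pixels outweighs the immediate rate-distortion penalty.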