Motion compensated inter prediction is a common component of all video coders. The concept was established in traditional hybrid coding and successfully transferred to learning-based video compression. To compress the residual signal after prediction, usually the difference of the two signals is compressed using a standard autoencoder. However, information theory tells us that a general conditional coder is more efficient. In this paper, we provide a solid foundation based on information theory and Shannon entropy to show the potentials but also the limits of conditional coding. Building on those results, we then propose the generalized difference coder, a special case of a conditional coder designed to avoid limiting bottlenecks. With this coder, we are able to achieve average rate savings of 27.8% compared to a standard autoencoder, by only adding a moderate complexity overhead of less than 7%.