The ability to predict how objects behave during manipulation is an important problem. Models informed by mechanics are powerful, but are hard to tune. An alternative is to learn a model of the object's motion from data, to learn to predict. We study this for push manipulation. The paper starts by formulating a quasi-static prediction problem. We then pose the problem of learning to predict in two different frameworks: (i) regression and (ii) density estimation. Our architecture is modular: many simple, object specific, and context specific predictors are learned. We show empirically that such predictors outperform a rigid body dynamics engine tuned on the same data. We then extend the density estimation approach using a product of experts. This allows transfer of learned motion models to objects of novel shape, and to novel actions. With the right representation and learning method, these transferred models can match the prediction performance of a rigid body dynamics engine for novel objects or actions.