Deep discriminative models provide remarkable insights into hierarchical processing in the brain by predicting neural activity along the visual pathway. However, these models differ from biological systems in their computational and architectural properties. Unlike biological systems, they require teaching signals for supervised learning. Moreover, they rely on feed-forward processing of stimuli, which contrasts with the extensive top-down connections in the ventral pathway. Here, we address both issues by developing a hierarchical deep generative model and show that it predicts an extensive set of experimental results in the primary and secondary visual cortices (V1 and V2). We show that the widely documented nonlinear sensitivity of V2 neurons to texture statistics is a consequence of learning a hierarchical representation of natural images. Further, we show that top-down influences are inherent to inference in hierarchical generative models and that they explain both neuronal responses to illusory contours and systematic modulations of noise correlations in V1.