Higher levels of visual processing are progressively more invariant to low-level visual factors such as contrast. Although this invariance trend has been well documented for simple stimuli like gratings and lines, it is difficult to characterize such invariances in images with naturalistic complexity. Here, we use a generative image model based on a hierarchy of learned visual features (a Generative Adversarial Network) to constrain image manipulations to remain within the vicinity of the manifold of natural images. This allows us to quantitatively characterize visual discrimination behaviour for naturalistically complex, non-linear image manipulations. We find that human tuning to such manipulations has a factorial structure. The first factor governs image contrast, with discrimination thresholds following a power law with an exponent between 0.5 and 0.6, similar to contrast discrimination performance for simpler stimuli. A second factor governs image content, with approximately constant discrimination thresholds throughout the range of images studied. These results support the idea that human perception factors out image contrast relatively early on, allowing later stages of processing to extract higher-level image features in a stable and robust way.
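As a minimal illustrative sketch (not part of the study), the reported power-law relation between pedestal contrast and discrimination threshold can be written as delta_c = k * c^alpha with alpha between 0.5 and 0.6; the scale constant `k` and the particular alpha used below are assumptions chosen only for illustration.

```python
import numpy as np

def discrimination_threshold(c, alpha=0.55, k=1.0):
    """Predicted contrast discrimination threshold at pedestal contrast c.

    Assumes a power law delta_c = k * c**alpha; alpha and k are
    illustrative values, not parameters estimated in the study.
    """
    return k * np.power(c, alpha)

# Thresholds rise with pedestal contrast, but sublinearly (alpha < 1).
contrasts = np.array([0.1, 0.2, 0.4, 0.8])
thresholds = discrimination_threshold(contrasts)

# On log-log axes a power law is a straight line whose slope is alpha,
# which is how such exponents are typically read off threshold-vs-contrast data.
slope = np.polyfit(np.log(contrasts), np.log(thresholds), 1)[0]
```

Fitting the slope on log-log axes recovers the assumed exponent, mirroring how a 0.5 to 0.6 exponent would be estimated from measured thresholds.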