In daily life, we see images of real-life objects on posters, television, and virtually any type of smooth physical surface. We seldom confuse these images with the objects themselves, mainly because of contextual information from the surrounding environment and nearby objects. Without this contextual information, distinguishing an object from an image of the object becomes subtle; this is precisely the effect that a large immersive display aims to achieve. In this work, we study and address a problem that mirrors this recognition problem, i.e., distinguishing images of real natural scenes from recaptured images. With the ability to detect recaptured images, robot vision can become more intelligent, and a single-image-based countermeasure against re-broadcast attacks on face authentication systems becomes feasible. This work is timely, as face authentication systems are becoming common on consumer mobile devices such as smartphones and laptop computers. We present a physical model for image recapturing, and the features derived from the model are used in a recaptured image detector. Our physics-based method outperforms a statistics-based method by a significant margin on images of VGA (640×480) and QVGA (320×240) resolutions, which are common for mobile devices. In our study, we find that, apart from the contextual information, the unique properties of the recaptured image rendering process are crucial for the recognition problem.