Fig. 1: We propose representing scenes as composed focal stacks, computed from registered images, for view synthesis. Our approach reduces local artifacts such as motion blur and ISO noise, improving multi-layer scene representations. (a) The user casually captures photos in a continuous motion. (b) From the captured images, we compose a synthetic focal stack, from which a CNN derives a multi-layer scene representation. (c) The resulting multi-layer representation enables continuous viewpoint changes. (d) Our approach supports several applications, including 6-degrees-of-freedom, wide-field-of-view scene representations that enable photo-realistic VR (left), scene representations from underexposed images (right), and multi-layer image generation from captured focal stacks (top).