Optical lenses can focus only a single scene plane onto the sensor, leaving the remainder of the scene subject to varying degrees of defocus. The apparent depth of field can be extended by capturing a sequence of images with different focal planes and merging them by selecting, for each pixel in the target image, the most focused corresponding pixel from the stack. This process depends heavily on capturing a stabilised sequence, a requirement that is impractical for hand-held cameras. Here we develop a novel method that can merge a focus stack captured by a hand-held camera despite changes in shooting position and focus. Our approach registers the sequence using affine transformations before fusing the focus stack. Our merging process identifies, at each pixel position, which frames of the stack are in focus, and therefore selects the most appropriate pixels for the synthetically focused image. We also propose a novel approach for capturing a qualified focus stack on mobile phone cameras. Finally, we test our approach on a mobile phone platform that automatically captures a focus stack as easily as a photographer captures a conventional image.
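The per-pixel selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes the frames are already registered (for example by the affine step) and grayscale, and it uses the absolute discrete Laplacian averaged over a small odd-sized window as the focus measure. The function names `laplacian_energy` and `merge_focus_stack` are hypothetical.

```python
import numpy as np


def laplacian_energy(img: np.ndarray, window: int = 3) -> np.ndarray:
    """Per-pixel focus measure: |discrete Laplacian| box-averaged over a window.

    Assumption: this common sharpness measure stands in for whatever focus
    measure the original method uses. `window` should be odd.
    """
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    # 4-neighbour Laplacian; large magnitude means strong local detail (in focus).
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    energy = np.abs(lap)
    # Box-filter the energy so isolated noise spikes do not dominate selection.
    r = window // 2
    q = np.pad(energy, r, mode="edge")
    h, w = energy.shape
    out = np.zeros_like(energy)
    for dy in range(window):
        for dx in range(window):
            out += q[dy:dy + h, dx:dx + w]
    return out / (window * window)


def merge_focus_stack(stack: np.ndarray, window: int = 3):
    """Fuse an already-registered grayscale stack of shape (n, H, W).

    Returns the fused image and the per-pixel index of the chosen frame.
    """
    scores = np.stack([laplacian_energy(frame, window) for frame in stack])
    index = np.argmax(scores, axis=0)  # most focused frame at each pixel
    fused = np.take_along_axis(stack, index[None, ...], axis=0)[0]
    return fused, index
```

For instance, with two frames that are each sharp in a different half of the scene, the returned `index` map picks frame 0 on one side and frame 1 on the other, and `fused` is sharp everywhere away from the seam. A windowed measure is used rather than the raw Laplacian because a single-pixel score is easily swayed by sensor noise in flat regions.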