Obtaining highly accurate depth from stereo images in real time has many applications across computer vision and robotics, but in some contexts, upper bounds on power consumption constrain the feasible hardware to embedded platforms such as FPGAs. Whilst various stereo algorithms have been deployed on these platforms, usually cut down to better match the embedded architecture, certain key parts of the more advanced algorithms, e.g. those that rely on unpredictable access to memory or are highly iterative in nature, are difficult to deploy efficiently on FPGAs, and thus the depth quality that can be achieved is limited. In this paper, we leverage a FPGA-CPU chip to propose a novel, sophisticated, stereo approach that combines the best features of SGM and ELAS-based methods to compute highly accurate dense depth in real time. Our approach achieves an 8.7% error rate on the challenging KITTI 2015 dataset at over 50 FPS, with a power consumption of only 5W.Obtaining information about the 3D structure of a scene is important for many computer vision and robotics applications, e.g. 3D scene reconstruction [1]-[3], camera relocalisation [4]-[6], navigation and obstacle avoidance [7]. Often, this information will be obtained in the form of a depth image, and various options for acquiring such images exist. Passive approaches, which rely only on one or more image sensors, are popular due their low cost, low weight and size, lack of active/moving components, ability to work at longer ranges, deployability in a wider range of operating environments and lack of interference. Among them, binocular stereo relies on a pair of synchronised cameras to acquire the same scene from two different points of view. Given the two frames, a dense and reliable depth map can be computed by finding correspondences between the pixels in the two images [8]. State-of-the-art algorithms for this problem usually rely on costly global image optimisations or on massive convolutional neural networks that involve significant computational costs, making them hard to deploy on resource-limited systems such as embedded devices [9].Two popular solutions offering a good trade-off between speed and accuracy are Semi-Global Matching (SGM) [10] and ELAS [11]. SGM computes initial matching hypotheses by comparing patches around pixels in the left and right images, then approximates a costly image-wide smoothness constraint with the sum of several directional minimizations over the Correspondence: {oscar@robots.ox.ac.uk} O. Rahnama is with the University of Oxford and FiveAI Ltd. T. Joy and P. Torr are with the University of Oxford. A. Tonioni and L. Di Stefano are with the University of Bologna. T. Cavallari, S. Golodetz and S. Walker are with FiveAI Ltd. Work done whilst A. Tonioni was visiting the University of Oxford.disparity range. By contrast, ELAS first identifies a set of sparse but reliable correspondences to provide a coarse approximation of the scene geometry, then uses them to define slanted plane priors that guide the final dense matching stage. ...