ABSTRACT:Due to good scalability, systems for image-based dense surface reconstruction often employ stereo or multi-baseline stereo methods. These types of algorithms represent the scene by a set of depth or disparity maps which eventually have to be fused to extract a consistent, non-redundant surface representation. Generally the single depth observations across the maps possess variances in quality. Within the fusion process not only preservation of precision and detail but also density and robustness with respect to outliers are desirable. Being prune to outliers, in this article we propose a local median-based algorithm for the fusion of depth maps eventually representing the scene as a set of oriented points. Paying respect to scalability, points induced by each of the available depth maps are streamed to cubic tiles which then can be filtered in parallel. Arguing that the triangulation uncertainty is larger in the direction of image rays we define these rays as the main filter direction. Within an additional strategy we define the surface normals as the principle direction for median filtering/integration. The presented approach is straight-forward to implement since employing standard oc-and kd-tree structures enhanced by nearest neighbor queries optimized for cylindrical neighborhoods. We show that the presented method in combination with the MVS (Rothermel et al., 2012) produces surfaces comparable to the results of the Middlebury MVS benchmark and favorably compares to an state-of-the-art algorithm employing the Fountain dataset (Strecha et al., 2008). Moreover, we demonstrate its capability of depth map fusion for city scale reconstructions derived from large frame airborne imagery.