High-Level Structure (HLS) extraction in a set of images consists of recognizing 3D elements with useful information to the user or application. There are several approaches to HLS extraction. However, most of these approaches are based on processing two or more images captured from different camera views or on processing 3D data in the form of point clouds extracted from the camera images. In contrast and motivated by the extensive work developed for the problem of depth estimation in a single image, where parallax constraints are not required, in this work, we propose a novel methodology towards HLS extraction from a single image with promising results. For that, our method has four steps. First, we use a CNN to predict the depth for a single image. Second, we propose a region-wise analysis to refine depth estimates. Third, we introduce a graph analysis to segment the depth in semantic orientations aiming at identifying potential HLS. Finally, the depth sections are provided to a new CNN architecture that predicts HLS in the shape of cubes and rectangular parallelepipeds.