Manual inspection of infrastructure damage is challenging: it is labour-intensive, expensive, inefficient, and error-prone. A computer-vision method that automatically detects cracks in 2D images is a viable way to automate the process. Among the commonly used deep learning methods, convolutional neural networks (CNNs) allow end-to-end learning of image features, replacing manual and often suboptimal feature extraction. Specifically, CNNs do not require human supervision during feature extraction, are suitable for both indoor and outdoor applications, and are less influenced by internal and external noise. In addition, CNNs are computationally efficient because their convolution and pooling operations allow CNN frameworks to run in full on a range of hardware devices. With this in mind, we propose a deep CNN framework with 10 convolutional layers, combined with a CycleGAN (a Generative Adversarial Network), that predicts crack segmentation pixel by pixel in an end-to-end manner. The proposed method builds on Deeply Supervised Nets (DSN) and Fully Convolutional Networks (FCN). DSN provides integrated feature supervision at each convolutional stage. Furthermore, the model is designed to learn and aggregate multi-level and multi-scale features from the lower to the higher convolutional layers during training. The architecture therefore differs from those in common practice, which use only the final convolutional layer. To further refine the predictions, we apply methods based on a guided filter and Conditional Random Fields (CRFs). The proposed framework was validated on a set of 537 images. 
The deep hierarchical CNN framework with 10 convolutional layers and guided filtering achieved state-of-the-art performance on the acquired dataset, with F-score, Recall, and Precision values of 0.870, 0.861, and 0.881, respectively, higher than those of existing methods such as SegNet, Crack-BN, and Crack-GF.
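
For readers unfamiliar with the reported metrics, the following minimal Python sketch shows how Precision, Recall, and F-score relate for a binary crack mask. It assumes a strict pixel-wise comparison between predicted and ground-truth masks; the evaluation protocol actually used in this work (for example, any pixel tolerance or per-image averaging) is not stated in the abstract, and the function names `precision_recall_f` and `f_score` are illustrative only.

```python
def precision_recall_f(pred, truth):
    """Pixel-wise precision, recall and F-score for flat 0/1 masks.

    Assumption: `pred` and `truth` are equal-length sequences of 0/1
    labels (1 = crack pixel). This is a generic definition, not
    necessarily the exact protocol used in the paper.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_score(precision, recall)


def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy masks, made up purely for illustration.
    print(precision_recall_f([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    # Harmonic mean of the Precision and Recall reported above.
    print(round(f_score(0.881, 0.861), 3))
```

Under this definition, the harmonic mean of the reported Precision (0.881) and Recall (0.861) is about 0.871, which agrees with the reported F-score of 0.870 to within rounding or averaging effects.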