Information on industrial estates is important because these estates can boost economic growth, industrial goods production, and export activity. Industrial areas also need to be identified so that they do not encroach on agricultural land, natural resources, or cultural heritage sites. To obtain this information, building footprints are extracted and classified from orthophoto data using a deep learning approach. The main challenge is that buildings vary widely in shape and size, so additional data, namely height data in the form of a normalized Digital Surface Model (nDSM), is needed to make them easier to identify. The Mask Region-based Convolutional Neural Network (Mask R-CNN) method used for extraction achieves a precision of 88.49%, a completeness (recall) of 66.82%, and an F1-score of 76.15%. The classification model, built with a Deep Neural Network (DNN), performs well, with average precision, recall, and F1-score of 0.94, 0.90, and 0.92, respectively.
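For readers less familiar with the reported metrics, the F1-score is the harmonic mean of precision and recall. The short sketch below is illustrative only and is not taken from the study's code; the function name `f1_score` and the variable names are assumptions. It recomputes the F1-score from the precision and recall values quoted above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract (extraction in percent, classification as fractions).
extraction_f1 = f1_score(88.49, 66.82)
classification_f1 = f1_score(0.94, 0.90)

print(round(extraction_f1, 2))
print(round(classification_f1, 2))
```

The extraction result comes out at roughly 76.14, which matches the reported 76.15% to within rounding of the published precision and recall. The classification figures are averages over classes, so the harmonic mean of the averaged precision and recall need not equal the averaged F1-score exactly, although here the two agree at two decimal places.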