Robust perception is generally achieved with complex multimodal pipelines, but such methods are ill suited to autonomous UAV deployment given the resource restrictions of these platforms. This chapter describes the development and experimental evaluation of new deep learning (DL) solutions for industrial perception problems. An earlier solution, which combined camera, LiDAR, GPS, and IMU sensors to provide high-rate, accurate, and robust detection and positioning of pipes in industrial environments, is to be replaced by a computationally lightweight convolutional neural network (CNN) perception technique that relies on a single camera. Developing DL solutions requires large image datasets with ground-truth labels, so the previous multimodal technique is adapted to capture and label such datasets. The resulting labeling method automatically computes labels, when possible, for the images captured with the UAV platform. To validate the automated dataset generator, a dataset is produced and used to train a lightweight AlexNet-based fully convolutional network (FCN). As a point of comparison, a weakened version of the multimodal approach, without the use of prior data, is evaluated with the same metrics as the DL-based method.
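To make the architecture concrete, the following is a minimal sketch of an AlexNet-based FCN for pipe segmentation, assuming a binary pipe/background output, a torchvision AlexNet trunk, and a simple 1x1-convolution head with bilinear upsampling; the chapter's actual layer configuration may differ.

```python
# Hypothetical sketch of a lightweight AlexNet-based FCN for pipe segmentation.
# The head design and class count are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import alexnet


class AlexNetFCN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Reuse the AlexNet convolutional trunk as a lightweight feature extractor.
        self.backbone = alexnet(weights=None).features
        # Replace the fully connected classifier with 1x1 convolutions so the
        # network accepts arbitrary input sizes and yields a dense score map.
        self.head = nn.Sequential(
            nn.Conv2d(256, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = self.backbone(x)      # coarse feature map from the AlexNet trunk
        scores = self.head(feats)     # per-class scores at reduced resolution
        # Bilinear upsampling restores the input resolution for pixel-wise labels.
        return F.interpolate(scores, size=(h, w), mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = AlexNetFCN(num_classes=2)    # e.g., pipe vs. background
    frame = torch.randn(1, 3, 224, 224)  # one RGB frame from the onboard camera
    print(model(frame).shape)            # torch.Size([1, 2, 224, 224])
```

Converting the fully connected layers to 1x1 convolutions keeps the parameter count low while allowing dense, per-pixel predictions on frames of arbitrary size, which is the property that makes an FCN suitable for onboard UAV inference.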