Post-processing pipelines for image analysis in reverse engineering modelling, such as photogrammetry applications, still require manual intervention, mainly to correct shadows and reflections and, often, to remove the background. Convolutional Neural Networks (CNNs) can conveniently assist in recognition and background removal. This paper presents a CNN-based approach to background removal and assesses its efficiency. Its relevance lies in comparing CNN approaches with manual processing, in terms of accuracy versus automation, with reference to cultural heritage targets. Through a bronze statue test case, pros and cons are discussed with respect to the final model accuracy. The adopted CNN is based on the U-Net-MobileNetV2 architecture, a combination of two deep networks that converges faster and achieves higher efficiency with small datasets. The dataset consists of over 700 RGB images, from which the CNN extracts features to distinguish the pixels of the statue from those of the background. To extend the CNN's capabilities, training sets with and without dataset integration are investigated. The Dice coefficient is applied to evaluate CNN efficiency. The results are used for the photogrammetric reconstruction of the Principe Ellenistico model, which is then compared with a model obtained through a 3D scanner. Performance is also evaluated through a comparison with a photogrammetric 3D model obtained without CNN background removal. Despite a few errors caused by poor lighting conditions, the advantages in terms of process automation are substantial (over 50% time reduction).
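As an illustrative sketch (not taken from the paper), the Dice coefficient mentioned above can be computed on a pair of binary segmentation masks as follows; the function name and the toy masks are hypothetical, chosen only to show the metric:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1 = statue, 0 = background).

    Dice = 2 * |pred AND target| / (|pred| + |target|); eps avoids
    division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: two 2x2 masks agreeing on one foreground pixel
pred = [[1, 1], [0, 0]]     # predicted statue pixels
target = [[1, 0], [0, 0]]   # ground-truth statue pixels
print(round(dice_coefficient(pred, target), 3))  # → 0.667
```

A value of 1 indicates perfect overlap between the predicted and ground-truth statue masks, while 0 indicates no overlap at all.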