Anthropometric landmarks obtained from three-dimensional (3D) body scans are widely used in medicine, civil engineering, and virtual reality. For all those fields, an acquisition of certain and accurate landmark positions is crucial for obtaining satisfying results. Manual marking is time-consuming and is affected by the subjectivity of the human operator. Therefore, an automatic approach has become increasingly popular. This paper provides a short survey of different attempts for automatic landmark localization, from which one machine learning-based method was further analyzed and extended in the case of input data preparation for a convolutional neural network (CNN). A novel method of data processing is presented which utilize a mid-surface projection followed by further unwrapping. The article emphasizes its significance and the way it affects the outcome of a deep neural network. The workflow and the detailed description of algorithms used are included in this paper. To validate the method, it was compared with the orthogonal projection used for the state-of-the-art approach. Datasets consisting of 200 specimens, acquired using both methods, were used for convolutional neural networks training and 20 for validation. In this paper, we used YOLO v.3 architecture for detection and ResNet-152 for classification. For each approach, localizations of 22 normalized body landmarks for 10 male and 10 female subjects of different ages and various postures were obtained. To compare the accuracy of approaches, errors and their distribution were measured for each characteristic point. Experiments confirmed that the mid-surface projections resulted, on average, in a 14% accuracy improvement and up to 15% enhancement of resistance on errors related to scan imperfections.