“…D EEP Learning, and (Convolutional) Neural Networks (CNN) in general, whose growth in popularity begun in the early 2010s, have marked a shift of Computer Vision, permeating most of the academic research fields of the last decade. Thanks to their ability of learning a hierarchical representation of raw input data without relying on handcrafted features, CNNs have rapidly become a methodology of choice for analyzing medical images [1], [2], [3], [4], [5], perceiving and elaborating an interpretation of dynamic scenes [6], [7], [8], [9], [10], handwriting analysis and speech recognition [11], [12], surveillance, traffic monitoring and autonomous driving [13], [14], [15], [16], people tracking [17], [18], skeletonization [19], image synthesis [20] and so on. This became possible thanks to the increase of processing capabilities, aided by the fast development of Graphics Processing Units (GPUs), and thanks to the collection of massive amounts of datasets [16], [17], [21], [22], required during the training of the models.…”