“…The most commonly employed CNNs consist of a ~10-layer encoder network, after which the latent space is used for patient- or lesion-wise classification [178,179,180,182,184,187,193] or subsequently expanded in a decoder configuration to produce full-image probability maps [173,174,176,189]. In some papers, residual connections within the encoder (e.g., ResNet architectures) are used to ease training and reduce the impact of vanishing gradients by preserving information for subsequent layers [175,190,193]. Auto-encoder networks enforcing sparsity or non-negativity in the feature space have been developed for feature extraction [181,182,218], with other classification algorithms subsequently employed to produce the final classification output.…”
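
As an illustrative sketch only (the layer counts, channel widths, and class names below are assumptions for exposition, not details taken from the cited works), an encoder of this kind with residual blocks and a latent-space classification head might look as follows in PyTorch:

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Residual connection: output = ReLU(x + F(x)), preserving information
        for subsequent layers and mitigating vanishing gradients."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            return self.relu(x + self.conv2(self.relu(self.conv1(x))))

    class EncoderClassifier(nn.Module):
        """Roughly 10-layer convolutional encoder whose pooled latent vector
        feeds a patient-/lesion-wise classification head."""
        def __init__(self, in_channels=1, num_classes=2):
            super().__init__()
            blocks, ch = [], in_channels
            for out_ch in (16, 32, 64, 128, 256):  # five downsampling stages
                blocks += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                           nn.ReLU(),
                           ResidualBlock(out_ch)]
                ch = out_ch
            self.encoder = nn.Sequential(*blocks)
            self.head = nn.Linear(ch, num_classes)

        def forward(self, x):
            z = self.encoder(x).mean(dim=(2, 3))  # global average pooling -> latent vector
            return self.head(z)                   # class logits

    logits = EncoderClassifier()(torch.randn(2, 1, 128, 128))  # two 128x128 single-channel slices

The decoder variant would instead upsample the final feature map (e.g., via transposed convolutions) back to input resolution to yield per-pixel probability maps, and the sparse auto-encoder variant would train encoder and decoder for reconstruction with a sparsity penalty on the latent code, which is then passed to a separate classifier.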