…Currently, the most prevalent and high-performing methods in the computer vision literature for automated tumor segmentation use a variety of encoder-decoder architectures in an end-to-end approach to generate lesion masks directly from MR image inputs [30,32,35]. These methods are primarily inspired by pioneering convolutional neural network (CNN) architectures capable of 3D segmentation, such as U-Net [19] and V-Net [20], with more recent advances employing techniques such as multi-task learning [22-24], generative modeling for augmenting training data or adversarial approaches [26,36-41], hybrid machine learning approaches [27,42], domain adaptation and transfer learning [29,43-53], task-specific loss modification [18,25,27,31,34,54], diffusion models [41,55-57], and attention mechanisms such as transformer modules [58-60], as well as federated learning approaches [34,…
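To make the end-to-end encoder-decoder pattern described above concrete, the following is a minimal sketch of a U-Net-style 3D segmentation network in PyTorch that maps an MR volume to a voxel-wise lesion probability map. The `TinyUNet3D` name, the channel widths, the single-modality input, and the binary output are illustrative assumptions and do not reproduce any of the cited architectures.

```python
# Minimal sketch of the end-to-end encoder-decoder pattern: a 3D MR volume
# goes in, a per-voxel lesion probability map comes out. Depth, channel
# sizes, and the single-channel input are illustrative assumptions only.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Two 3D convolutions with normalization and non-linearity,
    # the basic building block used at every resolution level.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )


class TinyUNet3D(nn.Module):
    """U-Net-style encoder-decoder with a single skip connection."""

    def __init__(self, in_channels=1, num_classes=1):
        super().__init__()
        self.enc1 = conv_block(in_channels, 16)
        self.down = nn.MaxPool3d(2)
        self.enc2 = conv_block(16, 32)
        self.up = nn.ConvTranspose3d(32, 16, kernel_size=2, stride=2)
        self.dec1 = conv_block(32, 16)  # 16 (skip) + 16 (upsampled) channels
        self.head = nn.Conv3d(16, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)              # full-resolution features
        e2 = self.enc2(self.down(e1))  # half-resolution features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d1)           # per-voxel logits


if __name__ == "__main__":
    model = TinyUNet3D(in_channels=1, num_classes=1)
    mr_volume = torch.randn(1, 1, 64, 64, 64)      # dummy single-modality MR patch
    lesion_logits = model(mr_volume)
    lesion_mask = torch.sigmoid(lesion_logits) > 0.5
    print(lesion_logits.shape, lesion_mask.shape)  # both (1, 1, 64, 64, 64)
```

In practice the cited methods use deeper encoder-decoder stacks, multi-modal MR inputs, and the additional training strategies enumerated above (multi-task objectives, adversarial or diffusion-based augmentation, modified losses, attention modules); the sketch only shows the shared input-to-mask structure.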