“…While the loss function was deliberately designed to weigh recall higher than precision (at β = 0.7), consistent improvements in all test performance metrics including DSC and F2 scores on the test set indicate improved generalization through this type of training. Compared to DSC, which weighs recall and precision equally, and the ROC analysis, we consider the area under the PR curves (APR, shown in Figure 2) the most reliable performance metric for such highly skewed data [8,1]. To put the work in context, we reported average DSC, F2, and APR scores (equal to 56.4, 57.3, and 56.0, respectively), which indicate that our approach performed very well compared to the latest results in MS lesion segmentation [6,20].…”
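
The relation between the two overlap scores named in the excerpt can be illustrated with a minimal sketch. It assumes both scores are computed from voxel-level counts of true positives, false positives, and false negatives. The function names and the example counts are hypothetical and not taken from the source. The `beta` argument here is the standard F-score parameter and is not necessarily the same quantity as the loss weighting β = 0.7 mentioned in the excerpt.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts.

    beta > 1 weighs recall more heavily than precision;
    beta = 1 weighs them equally.
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: return 0.0 when there is nothing to score.
    return (1 + b2) * tp / denom if denom else 0.0


def dice(tp: int, fp: int, fn: int) -> float:
    """Dice similarity coefficient (DSC), identical to F-beta with beta = 1."""
    return f_beta(tp, fp, fn, beta=1.0)


# Illustrative counts only (not results from the paper):
# precision = 80/90, recall = 80/110, so recall is the weaker of the two.
tp, fp, fn = 80, 10, 30
dsc = dice(tp, fp, fn)            # 160 / 200
f2 = f_beta(tp, fp, fn, beta=2)   # 400 / 530
```

With these counts recall is lower than precision, so F2, which emphasizes recall, comes out below DSC. When recall exceeds precision the ordering reverses, which is why the two scores are reported separately.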