“…However, there is currently no consensus on the criterion for optimality. A variety of metrics have been used in previous pipeline validations, including, to name just a few, spatial smoothness (Esteban et al, 2019), consistency within and between datasets (Cruces et al, 2022), between‐group difference detection (Cui et al, 2013; Xu et al, 2018), discriminability (Lawrence et al, 2021), inter‐pipeline agreement (Li et al, 2021), and age prediction/correlation (Alfaro‐Almagro et al, 2018; Tustison et al, 2014; Yan et al, 2016). However, these previous validations were usually either not systematic, considering only one brain feature in one dataset, or relied on metrics that were not informative for end‐users in experimental design and statistical analysis.…”