We address the alignment of a group of images with simultaneous registration. Therefore, we provide further insights into a recently introduced framework for multivariate similarity measures, referred to as accumulated pair-wise estimates (APE), and derive efficient optimization methods for it. More specifically, we show a strict mathematical deduction of APE from a maximum-likelihood framework and establish a connection to the congealing framework. This is only possible after an extension of the congealing framework with neighborhood information. Moreover, we address the increased computational complexity of simultaneous registration by deriving efficient gradient-based optimization strategies for APE: Gauss-Newton and the efficient second-order minimization (ESM). We present next to SSD the usage of intrinsically nonsquared similarity measures in this least squares optimization framework. The fundamental assumption of ESM, the approximation of the perfectly aligned moving image through the fixed image, limits its application to monomodal registration. We therefore incorporate recently proposed structural representations of images which allow us to perform multimodal registration with ESM. Finally, we evaluate the performance of the optimization strategies with respect to the similarity measures, leading to very good results for ESM. The extension to multimodal registration is in this context very interesting because it offers further possibilities for evaluations, due to publicly available datasets with ground-truth alignment.