Multimodal image registration is an important component of medical image processing, allowing the integration of complementary information from various imaging modalities to improve clinical applications like diagnosis and treatment planning. We proposed a novel multistage neural network for three-dimensional multimodal medical image registration, which addresses the challenge of larger rigid deformations commonly present in medical images due to variations in patient positioning in different scanners and rigid anatomical structures. This multistage network combines rigid, affine and deformable transformations in three stages. The network was trained unsupervised with Mutual Information and Gradient L2 loss. We compared the results of our proposed multistage network with a rigid-affine-deformable registration with the classical registration method NiftyReg as a baseline and a multistage network, which combines affine and deformable transformation, as a benchmark. To evaluate the performance of the proposed multistage network, we used four three-dimensional multimodal in vivo datasets: three renal MR datasets consisting of T1-weighted and T2-weighted MR scans and one liver dataset containing CT and T1-weighted MR scans. Experimental results showed that combining rigid, affine and deformable transformations in a multistage network leads to registration results with a high structural similarity, overlap of the corresponding structures (Dice: 76.7 ± 12.5, 61.1 ± 14.0, 64.8 ± 16.2, 68.1 ± 24.6 for the four datasets) and a low level of image folding (|J| ≤ 0: less than or equal to 1.1%), resulting in a medical plausible registration result.