Neonates with marked delays in brain development are at increased risk of neurodevelopmental disorders. Brain age is a valuable biomarker for assessing abnormal maturation in the developing brain; however, accurately estimating brain age at birth remains challenging. In this study, we introduce a cross-modal relationship inference network (CMRINet) that integrates structural and diffusion magnetic resonance imaging data to improve the accuracy of neonatal brain age estimation. CMRINet employs a Transformer encoder and a relational inference module to capture both the long- and short-range dependencies of multimodal features among cortical parcels. Our model outperformed competing models in predicting neonatal brain age, achieving a mean squared error of 0.51 and a mean absolute error of 0.55 on the test set. When the model trained on full-term neonates was applied to preterm infants at term-equivalent age, the predicted age was significantly lower than the chronological age, suggesting delayed development in preterm brains. Furthermore, the deviation of the predicted age from the chronological age was significantly associated with long-term motor development in preterm infants. These findings highlight the effectiveness of CMRINet for neonatal brain age estimation, with potential clinical utility in the early detection of neurodevelopmental risks during the perinatal period.
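As a concrete illustration of the quantities reported above, the following minimal sketch computes the mean squared error, the mean absolute error, and the deviation of predicted age from chronological age. It is not the study's evaluation code: the function names are hypothetical, the ages are made-up illustrative values, and expressing age in weeks is an assumption, since the unit is not stated here.

```python
from statistics import mean


def mean_squared_error(predicted, chronological):
    """Average of squared differences between predicted and chronological age."""
    return mean((p - c) ** 2 for p, c in zip(predicted, chronological))


def mean_absolute_error(predicted, chronological):
    """Average of absolute differences between predicted and chronological age."""
    return mean(abs(p - c) for p, c in zip(predicted, chronological))


def age_deviation(predicted, chronological):
    """Per-subject deviation: predicted age minus chronological age.

    A negative value means the brain is predicted to be younger than
    the infant's chronological age.
    """
    return [p - c for p, c in zip(predicted, chronological)]


if __name__ == "__main__":
    # Illustrative values only (assumed to be in weeks), not data from the study.
    predicted = [39.0, 40.5, 41.0]
    chronological = [40.0, 40.0, 41.5]

    print("MSE:", mean_squared_error(predicted, chronological))
    print("MAE:", mean_absolute_error(predicted, chronological))
    print("Mean deviation:", mean(age_deviation(predicted, chronological)))
```

Under this sign convention, a group whose mean deviation is below zero would correspond to the pattern described for preterm infants, where predicted age falls below chronological age.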