Neonates admitted to neonatal intensive care units (NICUs) are at risk of respiratory decompensation and may require endotracheal intubation. Delayed intubation is associated with increased morbidity and mortality, particularly when the intubation is urgent and unplanned. Accurately predicting the need for intubation in real time gives clinicians additional time to prepare, widening the safety margin by avoiding high-risk late intubation. In this study, a deep neural network was used to predict the probability of intubation in neonatal patients with respiratory problems. A multimodal transformer model was developed to simultaneously analyze time-series data (1-3 h of vital signs and FiO2 setting values) and numeric data, including initial clinical information. On a dataset of 128 neonatal patients who underwent noninvasive ventilation, the proposed model predicted the need for intubation 3 h in advance (area under the receiver operating characteristic curve (AUROC) = 0.880 ± 0.051, F1-score = 0.864 ± 0.031, sensitivity = 0.886 ± 0.041, specificity = 0.849 ± 0.035, and accuracy = 0.857 ± 0.032). Moreover, the proposed model generalized well, achieving an AUROC of 0.890, an F1-score of 0.893, a specificity of 0.871, a sensitivity of 0.745, and an accuracy of 0.864 on an additional test dataset (n = 91).