Early and accurate prediction of endotracheal tube (ETT) location is pivotal for critically ill patients. Automatic and timely detection of misplaced ETTs on chest X-ray images may help avert morbidity and mortality. We therefore designed convolutional neural network (CNN)-based algorithms that evaluate the appropriateness of ETT position relative to four key points detected on chest radiographs: the tube end, the carina, and the left and right clavicular heads. From these key points, we estimated the distances from the tube end to the tracheal carina and to the midpoint of the clavicular heads. A DenseNet121 encoder transformed images into embedding features, and a CNN-based decoder generated the probability distributions. Four models were built, one for each candidate range of acceptable tube-to-carina distance: (i) 30–70 mm, (ii) 30–60 mm, (iii) 20–60 mm, and (iv) 20–55 mm. Key point accuracy was evaluated by the L1 distance between predicted and ground-truth coordinates. Using both the tube-to-carina and tube-to-clavicle distances, the 20–55 mm range yielded the highest sensitivity and specificity, 92.85% and 84.62%, respectively. This suggests that a tube-to-carina distance between 20 and 55 mm is optimal for an AI-based key point appropriateness detection system and is empirically comparable to physicians’ consensus.
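
The distance-based criterion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`KeyPoints`, `tube_distances_mm`, `is_appropriate`, `l1_distance`, `sensitivity_specificity`) is hypothetical, and it assumes Euclidean distances, a known pixel spacing `mm_per_pixel`, and inclusive range bounds. How the tube-to-clavicle distance enters the final decision is not specified here, so the sketch only computes that distance and applies the range check to the tube-to-carina distance.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


@dataclass(frozen=True)
class KeyPoints:
    """The four detected key points (hypothetical container)."""
    tube_end: Point
    carina: Point
    left_clavicle: Point
    right_clavicle: Point


def l1_distance(predicted: Point, truth: Point) -> float:
    """L1 (Manhattan) distance between predicted and ground-truth coordinates."""
    return abs(predicted[0] - truth[0]) + abs(predicted[1] - truth[1])


def tube_distances_mm(kp: KeyPoints, mm_per_pixel: float) -> Tuple[float, float]:
    """Return (tube-to-carina, tube-to-clavicular-midpoint) distances in mm.

    Assumption: Euclidean distance scaled by an isotropic pixel spacing.
    """
    midpoint = (
        (kp.left_clavicle[0] + kp.right_clavicle[0]) / 2.0,
        (kp.left_clavicle[1] + kp.right_clavicle[1]) / 2.0,
    )
    to_carina = math.dist(kp.tube_end, kp.carina) * mm_per_pixel
    to_clavicle = math.dist(kp.tube_end, midpoint) * mm_per_pixel
    return to_carina, to_clavicle


def is_appropriate(tube_to_carina_mm: float,
                   low_mm: float = 20.0,
                   high_mm: float = 55.0) -> bool:
    """Range check on the tube-to-carina distance (bounds assumed inclusive)."""
    return low_mm <= tube_to_carina_mm <= high_mm


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity for boolean labels (True = positive class)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the reference labels")
    return tp / (tp + fn), tn / (tn + fp)
```

Under these assumptions, comparing the four candidate ranges amounts to calling `is_appropriate` with different `low_mm` and `high_mm` values and recomputing sensitivity and specificity against the reference labels.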