In the United States, the Federal Railroad Administration (FRA) requires regular inspection of track components for defects to keep railroads safe and operating efficiently. Various types of inspection equipment are in use, such as ground penetrating radar, laser, and LiDAR, but they are usually very expensive and require extensive training and experience to operate. To date, track inspection still relies heavily on manual work, which is inefficient, subjective, and less accurate than desired, especially for missing and broken track components such as spikes, clips, and tie plates. To address this issue, this study proposes a real‐time pixel‐level rail component detection framework for timely and accurate track inspection. The first public rail component image database, covering rails, spikes, and clips, is built and released online. The framework is built on improved real‐time instance segmentation models that combine fast object detection with highly accurate instance segmentation. Backbones with more granular levels and receptive fields are implemented in the proposed models. Compared with the original YOLACT and Mask R‐CNN models, the proposed models (1) achieve 59.9 bbox mAP and 63.6 mask mAP on the customized dataset, both higher than those of the other models, and (2) run in real time, at over 30 FPS, when processing high‐resolution video (1,080 × 1,092) on a single GPU. This processing speed allows inspection videos to be turned quickly into useful information to assist track maintenance. The railroad track components image dataset can be accessed at https://github.com/jonguo111/Rail_components_image_data
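
To make the real‐time criterion above concrete: 30 FPS leaves roughly 33 ms to process each frame. The following is a minimal Python sketch of how such a throughput figure could be checked; it is not the authors' benchmarking code. The names `measure_fps`, `per_frame_budget_ms`, `is_real_time`, and `REAL_TIME_FPS` are hypothetical, and `process_frame` stands in for whatever per-frame inference call is being timed.

```python
import time
from typing import Callable, Iterable, TypeVar

Frame = TypeVar("Frame")

# Threshold taken from the abstract ("over 30 FPS"); the constant name is illustrative.
REAL_TIME_FPS = 30.0


def per_frame_budget_ms(fps: float) -> float:
    """Return the time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(process_frame: Callable[[Frame], object],
                frames: Iterable[Frame]) -> float:
    """Run process_frame over every frame and return average frames per second."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)  # placeholder for the model's per-frame inference
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("no frames were processed")
    # Guard against a zero interval on very coarse clocks.
    return count / max(elapsed, 1e-9)


def is_real_time(fps: float, threshold: float = REAL_TIME_FPS) -> bool:
    """Return True when the measured rate reaches the assumed real-time threshold."""
    return fps >= threshold


if __name__ == "__main__":
    # Dummy workload standing in for segmentation of one video frame.
    fps = measure_fps(lambda frame: sum(range(1000)), range(200))
    print(f"budget at 30 FPS: {per_frame_budget_ms(REAL_TIME_FPS):.1f} ms per frame")
    print(f"measured: {fps:.1f} FPS, real time: {is_real_time(fps)}")
```

When the timed call runs on a GPU, the measurement is only meaningful if the device is synchronized before the clock is read (in PyTorch, for example, with `torch.cuda.synchronize()`), because GPU kernels are launched asynchronously.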