In this paper, we address the challenge of detecting pedestrians in natural scene images, where variations in scale, rotation, and illumination are ubiquitous. Pedestrian instances subject to such variations exhibit markedly different characteristics, which strongly affects the performance of pedestrian detection techniques. We propose a robust Scale, Illumination, Rotation, and Affine invariant Mask R-CNN (SIRA M-RCNN) framework to overcome the difficulties of its predecessors. The first phase of the proposed system handles illumination variation through histogram analysis. Next, we use the contourlet transform and a directional filter bank to generate rotation-invariant features. Finally, we use the Affine Scale Invariant Feature Transform (ASIFT) to find keypoints that are invariant to translation and scale. Extensive evaluation on a benchmark database demonstrates the effectiveness of SIRA M-RCNN. The experimental results show state-of-the-art performance and a significant improvement in pedestrian detection accuracy.
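
To make the first phase concrete, the following is a minimal sketch of one common form of histogram-based illumination normalization, namely global histogram equalization. The abstract does not state which histogram analysis is used, so the choice of equalization, the function name `equalize_histogram`, and the list-of-rows image representation are all assumptions made for illustration only.

```python
from typing import List


def equalize_histogram(image: List[List[int]], levels: int = 256) -> List[List[int]]:
    """Globally equalize a grayscale image given as rows of ints in [0, levels - 1].

    Illustrative assumption: the paper's "histogram analysis" step may differ.
    """
    # Count how often each intensity level occurs.
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1

    # Build the cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    # Smallest non-zero CDF value, i.e. the CDF at the darkest level present.
    cdf_min = next((c for c in cdf if c > 0), 0)
    if total == cdf_min:
        # Empty or constant image: there is no contrast to stretch.
        return [list(row) for row in image]

    # Map each level so that the output intensities span the full range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[px] for px in row] for row in image]


# Usage: a dark, low-contrast patch is stretched to the full intensity range.
dark_patch = [[10, 10, 12], [12, 14, 14]]
normalized = equalize_histogram(dark_patch)
```

The mapping is monotonic, so the relative ordering of pixel intensities is preserved while under- or over-exposed regions are spread across the available range. A detector fed with such normalized input sees less variation that is due to lighting alone.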