Advances in machine vision have opened new avenues for deploying multimodal biometric identification systems in real-world applications. These systems can address the shortcomings of unimodal biometric systems, which are susceptible to spoofing, noise, nonuniversality, and intra-class variations. Among the various biometric traits, ocular traits are preferred in such recognition systems owing to their high uniqueness, permanence, and recognition performance. However, segmenting ocular biometric traits under unconstrained conditions remains challenging due to factors such as Purkinje reflexes, specular reflections, gaze deviation, off-angle images, low resolution, and various occlusions. To address these challenges, this research presents SIPFormer, a novel framework comprising encoder, decoder, and transformer blocks that simultaneously segments three ocular traits (sclera, iris, and pupil) using a discriminative multi-head self-attention mechanism. In addition, we evaluated the framework on a large, publicly available iris database reflecting diverse unconstrained acquisition settings, with inherent noise effects such as scanner artifacts, intensity and illumination variations, motion blur, and occlusions caused by eyelashes, eyelids, and eyeglasses. The experimental results demonstrate the efficacy of the proposed SIPFormer model, which achieved mean Dice similarity coefficient scores of 0.9018, 0.9176, and 0.9229 for the sclera, iris, and pupil classes, respectively.
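For reference, the reported scores follow the standard per-class definition of the Dice similarity coefficient between a predicted mask $P$ and its ground-truth mask $G$:

\[
\mathrm{DSC}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}
\]

The sketch below illustrates how such per-class scores are commonly computed from a predicted label map and a ground-truth label map; it is a minimal illustration of the standard metric, not the authors' evaluation code, and the function names and class indices (1: sclera, 2: iris, 3: pupil) are assumptions for the example.

```python
import numpy as np

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    # eps guards against division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

def per_class_dice(pred_labels: np.ndarray, gt_labels: np.ndarray, classes=(1, 2, 3)) -> dict:
    """Per-class DSC for label maps; classes are assumed to be 1: sclera, 2: iris, 3: pupil."""
    return {c: dice_score(pred_labels == c, gt_labels == c) for c in classes}
```

In practice, the per-class scores would be averaged over all test images to obtain mean DSC values such as those reported above.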