Intracranial aneurysms (IAs) are abnormal dilatations of the intracranial arteries, and their rupture is associated with high mortality and morbidity. Current clinical protocols require radiologists to manually annotate IAs on Magnetic Resonance Angiography (MRA) images, a process that is inherently subjective and time-consuming. There is therefore an urgent need for automated and accurate segmentation of IAs from MRA images. In recent years, deep learning algorithms, especially the 3D U-Net and its derivatives, have become prominent in segmentation tasks. Nevertheless, convolutional neural network (CNN)-based models are inherently limited in capturing long-range spatial dependencies, which compromises the retention of the global features critical for segmentation. To address this challenge, we introduce a novel architecture, staged cluster transformers (SCTR), which incorporates a clustering mechanism into vision transformers to perform volumetric MRA image segmentation. In addition to the MRA clustering branch, a spatially aligned brain Magnetic Resonance Imaging (MRI) representation branch extracts structural features and helps the network learn richer contextual and boundary information for accurate voxel prediction. For validation, we used both a publicly available challenge dataset and an internal clinical dataset. Our proposed model achieves Dice similarity coefficients (DSC) of 0.5587 and 0.8110 on these two datasets, respectively, outperforming other state-of-the-art approaches. The results suggest that SCTR is a promising method for automatic segmentation of IAs. Our code is available at https://github.com/guolilin/SCTR.
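For readers unfamiliar with the reported metric, the Dice similarity coefficient between a predicted mask P and a reference mask G is 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that standard definition on binary 3D masks, not the evaluation code from the SCTR repository. The function name `dice_coefficient`, the toy volumes, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape.

    Hypothetical helper for illustration; not taken from the SCTR code base.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Number of voxels labelled as foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as perfect agreement.
    if denom == 0:
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 4x4x4 volumes: the prediction covers 8 voxels, the reference 12,
# and all 8 predicted voxels lie inside the reference.
pred = np.zeros((4, 4, 4), dtype=bool)
target = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True
target[1:3, 1:3, 1:4] = True

dsc = dice_coefficient(pred, target)  # 2 * 8 / (8 + 12) = 0.8
```

Because the score is normalised by the combined foreground size, small targets such as aneurysms are penalised heavily for even a few misclassified voxels, which is one reason DSC values can differ substantially between datasets.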