In recent years, the use of deep learning-based models for developing advanced healthcare systems has been growing due to the results they can achieve. However, the majority of the proposed deep learning-models largely use convolutional and pooling operations, causing a loss in valuable data and focusing on local information. In this paper, we propose a deep learning-based approach that uses global and local features which are of importance in the medical image segmentation process. In order to train the architecture, we used extracted three-dimensional (3D) blocks from the full magnetic resonance image resolution, which were sent through a set of successive convolutional neural network (CNN) layers free of pooling operations to extract local information. Later, we sent the resulting feature maps to successive layers of self-attention modules to obtain the global context, whose output was later dispatched to the decoder pipeline composed mostly of upsampling layers. The model was trained using the Mindboggle-101 dataset. The experimental results showed that the self-attention modules allow segmentation with a higher Mean Dice Score of 0.90 ± 0.036 compared with other UNet-based approaches. The average segmentation time was approximately 0.038 s per brain structure. The proposed model allows tackling the brain structure segmentation task properly. Exploiting the global context that the self-attention modules incorporate allows for more precise and faster segmentation. We segmented 37 brain structures and, to the best of our knowledge, it is the largest number of structures under a 3D approach using attention mechanisms.