Medical imaging is central to accurate diagnosis, and precise segmentation of medical images is essential, especially given the variability introduced by different practitioners. As the volume of medical imaging data grows, automated and efficient segmentation methods have become necessary. This study introduces an approach to heart image segmentation that embeds multi-scale features and an attention mechanism within an inverted pyramid framework. Because contextual information is difficult to extract from low-resolution medical images, our method adopts an inverted pyramid architecture: the network is trained on images at multiple scales and the prediction outcomes are integrated, which enhances its contextual understanding. Because the relative positions of organs follow consistent patterns, we introduce an attention module enriched with positional encoding information. This module enables the network to capture positional cues and thereby improves segmentation accuracy. Our research sits at the intersection of medical imaging and sensor technology: sensors generate the data on which medical image analysis depends, and our use of sensor-generated data illustrates how sensor technology and advanced machine learning techniques complement each other. Evaluation on two heart datasets shows that our approach outperforms state-of-the-art techniques in terms of the Dice coefficient, Jaccard coefficient, recall, and F-measure. In conclusion, the proposed heart image segmentation method addresses the challenges posed by diverse medical images and offers a promising solution for efficiently processing 2D/3D sensor data in contemporary medical imaging.
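For readers unfamiliar with the four evaluation metrics named above, the following is a minimal sketch of how they are commonly computed for a binary segmentation mask. It uses the standard textbook definitions and is not taken from the study's own evaluation code. The function name `segmentation_metrics`, the toy masks, and the convention of returning 0 for an empty denominator are assumptions made for illustration.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Overlap metrics for two flattened binary masks (1 = heart, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixel-wise agreement and disagreement between prediction and ground truth.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)  # false negatives

    def ratio(num: float, den: float) -> float:
        # Assumed convention: an empty denominator scores 0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "recall": recall,
        # F-measure with equal weighting of precision and recall (F1).
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


# Toy 2x4 masks flattened row by row (illustrative values only, not study data).
pred = [0, 1, 1, 0, 0, 1, 1, 1]
truth = [0, 1, 1, 1, 0, 1, 0, 0]
scores = segmentation_metrics(pred, truth)
print(scores)
```

With equal weighting, the F-measure of a single binary mask coincides with its Dice coefficient, so the two values differ in practice only when the F-measure uses another weighting or is averaged differently across classes or images. The abstract does not specify which variant the study reports.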