This paper introduces EFFB7-UNet, a semantic segmentation framework for Indoor Autonomous Vision Systems (IAVSs) built on the U-Net architecture. The framework employs EfficientNetB7 as its encoder, substantially strengthening feature extraction, and integrates a spatial and channel Squeeze-and-Excitation (scSE) attention block that emphasizes critical regions and features to refine segmentation outcomes. Comprehensive evaluations were conducted on the NYUv2 dataset and several augmented variants of it. The study systematically compares EFFB7-UNet against U-Net configurations with other encoders, including ResNet50, ResNet101, MobileNetV2, VGG16, VGG19, and EfficientNets B0-B6. The results show that EFFB7-UNet surpasses all of these configurations in accuracy and confirm the effectiveness of the scSE attention block in achieving superior segmentation results. Without using depth information, EFFB7-UNet achieves a 12\% improvement in mean Intersection over Union (mIoU). This gain, together with the model's adaptability across domains, marks substantial progress toward more effective and reliable IAVS technologies.
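The mIoU figure reported above can be computed as sketched below. This is a minimal illustration of the standard metric, not the paper's evaluation code; the handling of classes absent from both prediction and ground truth (skipping them rather than scoring them as 1) is an assumption.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes.

    pred, target: integer class-label arrays of the same shape.
    Classes absent from both pred and target are skipped (assumption).
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class c appears nowhere: skip rather than count as 1
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

For example, with a 2x2 prediction `[[0, 0], [1, 1]]` against ground truth `[[0, 1], [1, 1]]`, class 0 scores IoU 1/2 and class 1 scores 2/3, giving mIoU 7/12.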