Semantic segmentation of images with deep learning algorithms plays a vital role in identifying different rock-forming minerals. In this paper, we employ the U-Net model, whose architecture is designed for precise localization and efficient use of limited training data. We apply this model to two distinct datasets: (1) a dataset from the ALEX Streckeisen website and (2) a dataset from the Gabal Nikeiba area, South Eastern Desert of Egypt. The model performs well on both datasets, achieving average precision of 0.89 and 0.83, recall of 0.80 and 0.78, and F1 scores of 0.82 and 0.79, respectively, in identifying and detecting rock-forming minerals in thin-section images. Its strongest performance is on eleven different basement rock-forming minerals, with an average precision of 0.89, recall of 0.80, and F1 score of 0.82. This study provides a basis for identifying and detecting minerals in thin sections of rock samples from Egypt and the Arabian–Nubian Shield as a whole. By substantially reducing analysis time and improving accuracy compared with manual methods, the approach can advance geological research and resource exploration in the region.
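The abstract reports averaged precision, recall, and F1 scores but does not state how they are computed. The following is a minimal sketch that assumes pixel-wise scoring of integer label masks (one class index per mineral) and unweighted (macro) averaging over classes. The function names `per_class_scores` and `macro_average` are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def per_class_scores(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    num_classes: int,
) -> Dict[int, Scores]:
    """Pixel-wise precision, recall, and F1 for each class of a label mask.

    Assumption: both masks are 2-D grids of integer class indices
    in the range 0..num_classes-1 with identical shapes.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("masks have different numbers of rows")
    tp, fp, fn = Counter(), Counter(), Counter()
    for row_t, row_p in zip(y_true, y_pred):
        if len(row_t) != len(row_p):
            raise ValueError("masks have different row lengths")
        for t, p in zip(row_t, row_p):
            if t == p:
                tp[t] += 1  # pixel correctly assigned to its true class
            else:
                fp[p] += 1  # predicted class gains a false positive
                fn[t] += 1  # true class loses a pixel (false negative)

    scores: Dict[int, Scores] = {}
    for c in range(num_classes):
        # Convention (an assumption): a zero denominator yields 0.0.
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[int, Scores]) -> Scores:
    """Unweighted mean of each metric over all classes in `scores`."""
    n = len(scores)
    return (
        sum(s[0] for s in scores.values()) / n,
        sum(s[1] for s in scores.values()) / n,
        sum(s[2] for s in scores.values()) / n,
    )


if __name__ == "__main__":
    # Toy 2x3 masks with three hypothetical mineral classes (0, 1, 2).
    truth = [[0, 0, 1], [1, 2, 2]]
    pred = [[0, 1, 1], [1, 2, 0]]
    per_class = per_class_scores(truth, pred, num_classes=3)
    for cls, (p, r, f) in per_class.items():
        print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
    print("macro average:", macro_average(per_class))
```

Under macro averaging, the mean F1 need not equal the F1 computed from the mean precision and recall. For the first dataset, 2 × 0.89 × 0.80 / (0.89 + 0.80) ≈ 0.84, whereas 0.82 is reported. This is consistent with (though not proof of) per-class averaging, and classes with very few pixels can pull such an average down.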