Artificial intelligence is driving the development of automatic mineral/rock recognition technology, which reduces labor costs and reliance on personal experience. With the digitization of instruments, image recognition has played an increasingly important role. This review focuses on image sensing systems applied to mineral/rock identification and divides such a system into three modules: a sensing module, an imaging module, and an information processing module. The sensing and imaging mechanisms are summarized in terms of sensing, imaging, and display technologies. Information processing methods are classified as either single-modal or multimodal recognition. This review analyzes the limitations of single-modal recognition and discusses the advantages and significance of multimodal recognition. The current challenges in mineral/rock identification are summarized, including public databases, multimodal learning, and portable sensing systems. Interdisciplinary sensing technologies are discussed, and the feasibility of applying them is explored. This review aims to provide a reference for the research community working on automatic mineral/rock identification and sensing system optimization.