Rapid advances in computer vision and natural language processing are driving growing user demand for cross-modal retrieval (CMR). The openKylin operating system plays an important role in promoting domestically developed operating systems, and it focuses on enhancing operating system functionality with safe and dependable technologies. Integrating computer vision research into domestic operating systems therefore has considerable practical value. This article presents a multimodal retrieval approach based on openKylin that improves the usability of text-based search for images, audio, and video. It first reviews pertinent domestic and international research on this technology, then describes the design, the implementation procedure, and the final implementation results, with a demonstration. Results from actual operation show that the approach achieves high accuracy and performance.