Image retrieval technology has emerged as a popular research area of China’s development of cultural digital image dissemination and creative creation with the growth of the Internet and the digital information age. This study uses the shadow image in Shaanxi culture as the research object, suggests a shadow image retrieval model based on CBAM-ResNet50, and implements it in the IoT system to achieve more effective deep-level cultural information retrieval. First, ResNet50 is paired with an attention mechanism to enhance the network’s capacity to extract sophisticated semantic characteristics. The second step is configuring the IoT system’s picture acquisition, processing, and output modules. The image processing module incorporates the CBAM-ResNet50 network to provide intelligent and effective shadow play picture retrieval. The experiment results show that shadow plays on GPU can retrieve images at a millisecond level. Both the first image and the first six photographs may be accurately retrieved, with a retrieval accuracy of 92.5 percent for the first image. This effectively communicates Chinese culture and makes it possible to retrieve detailed shadow-play images.