Images are among the most intuitive and important sources of information for humans. With the development of information technology, digital image processing is widely used to locate and identify targets, so detecting targets of interest in an image quickly and accurately is particularly important. Traditional image detection systems suffer from low detection accuracy, long processing time, and poor stability. This paper therefore proposes the design of an artificial intelligence image detection system based on the Internet of Things and cloud computing. The system comprises three stages: image processing and analysis in the cloud computing environment, image feature collection, and integrated image detection. Image processing and analysis in the cloud computing environment relies mainly on virtualization technology, distributed massive data storage, and distributed computing. In the image feature collection module, distorted images are preprocessed and perspective-corrected before being input to the neural network, and a deep residual network is then used to extract features. In the integrated image detection stage, the regions produced by the candidate region generation network first undergo target category judgment and position correction, and integrated image detection is then performed with an improved target detection method based on the frame difference method. In simulation experiments, the system designed in this paper shows a clear speed advantage over the traditional image detection system when the number of images increases substantially.
For images at different pixel levels, the accuracy of the proposed system is consistently higher than that of traditional image detection systems, while its CPU and memory usage remain lower. In addition, over a three-month period, the stability of the system remains at a relatively high level of 0.9.
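The frame difference method named above can be illustrated with a minimal sketch. This is the basic two-frame form only, not the improved method the paper proposes. The function names `frame_difference` and `bounding_box`, the list-of-lists grayscale frame representation, and the threshold value of 25 are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: a frame is a grayscale image stored as rows of 0-255 intensities.
Frame = List[List[int]]


def frame_difference(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[int]]:
    """Return a binary mask with 1 where |curr - prev| exceeds the threshold."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same shape")
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of changed pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # no pixel changed by more than the threshold
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return rows[0], min(cols), rows[-1], max(cols)


# Usage: a bright object appears in the second frame of a 4x4 scene.
previous = [[0] * 4 for _ in range(4)]
current = [[0] * 4 for _ in range(4)]
current[1][1] = current[1][2] = current[2][2] = 200

mask = frame_difference(previous, current)
box = bounding_box(mask)
```

Pixels whose intensity changes by more than the threshold between consecutive frames are marked as foreground, and the bounding box of those pixels gives a coarse candidate region for the moving target. A practical system would typically add noise filtering and handle multiple targets.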