With the rapid development of the Internet of Things industry today, it is imperative to classify the Internet of Things scenarios to adapt to the needs of the Internet of Things in different scenarios. In recent years, Convolutional Neural Network (CNN) has succeeded in scene parsing. The methods based on CNN generally extract the visual features of the input images through the step-by-step abstraction of multiple convolutional layers and then parse the corresponding segmentation results of the output images through the step-by-step restoration of multiple deconvolutions. Screen the scene pictures of the Internet of Things from the large data set, use the image recognition algorithm to extract the features of the photographs, and use the extracted features to cluster the image set (for example, it can be divided into a family environment, car environment, etc.), divide the pictures with similar features into the same set, and finally measure the clustering accuracy to realize the Internet of Things scene classification. Finally, scene recognition accuracy is about 87%, and the clustering of feature vectors is also satisfactory.