With the recent development of smart farms, researchers are very interested in such fields. In particular, the field of disease diagnosis is the most important factor. Disease diagnosis belongs to the field of anomaly detection and aims to distinguish whether plants or fruits are normal or abnormal. The problem can be solved by binary or multi-classification based on a Convolutional Neural Network (CNN), but it can also be solved by image reconstruction. However, due to the limitation of the performance of image generation, SOTA’s methods propose a score calculation method using a latent vector error. In this paper, we propose a network that focuses on chili peppers and proceeds with background removal through GrabCut. It shows a high performance through an image-based score calculation method. Due to the difficulty of reconstructing the input image, the difference between the input and output images is large. However, the serial autoencoder proposed in this paper uses the difference between the two fake images, instead of the actual input, as a score. We propose a method of generating meaningful images using the GAN structure and classifying three results simultaneously by one discriminator. The proposed method showed a higher performance than previous research, and image-based scores showed the best performance.