As deep learning is increasingly integrated with molecular biology techniques, object detection models must accurately locate each cell in an image and classify it correctly. We present a multi-scale feature fusion model, based on Gaussian mixture clustering, for an existing human cell image dataset. First, we propose a novel feature extraction network that extracts preliminary features at multiple image scales; it is built on a residual neural network with Instance Normalization and the Mish activation function. Second, the model adopts the idea of feature fusion and introduces a new feature fusion network that integrates feature maps at different scales. Furthermore, a Gaussian mixture clustering algorithm is proposed to cluster the hyperparameters. In our experiments, the average accuracy of the proposed model on the human cell image dataset exceeds 0.96, an improvement of 11.9% over existing object detection methods in the same field. The experiments also show that the proposed model adapts well to datasets with uneven sample distribution, providing new ideas for research on medical images.
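As a rough illustration of the two building blocks named above, the following minimal sketch implements the Mish activation, x · tanh(softplus(x)), and Instance Normalization over a single channel of a single image in plain Python. The `residual_unit` helper is hypothetical: it omits the convolution layers entirely and only shows how an identity shortcut could be combined with a normalized, Mish-activated branch. It is not the network described in this paper.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish activation: x * tanh(softplus(x)).
    return x * math.tanh(softplus(x))


def instance_norm(channel, eps=1e-5):
    # Normalize one channel of one sample over its spatial positions
    # (no learned scale or shift in this sketch).
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in channel]


def residual_unit(channel):
    # Hypothetical residual unit (assumption, convolutions omitted):
    # output = input + Mish(InstanceNorm(input)).
    normed = instance_norm(channel)
    return [
        [x + mish(n) for x, n in zip(row, nrow)]
        for row, nrow in zip(channel, normed)
    ]


if __name__ == "__main__":
    feature_map = [[1.0, 2.0], [3.0, 4.0]]
    print(residual_unit(feature_map))
```

Because Instance Normalization computes its statistics per image and per channel rather than across a batch, its output does not depend on which other images share the batch.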