Fire is a serious safety threat that can cause casualties, property loss, and environmental damage. Despite the availability of object detection algorithms, fire and smoke detection still faces several challenges: slow convergence, poor performance on small targets, and high computational cost that limits deployment. In this paper, we propose Fire smoke and human detection based on ConvNeXt and Mixed encoder (FCM-DETR), an end-to-end object detection algorithm based on Deformable DETR. First, we replace ResNet with ConvNeXt as the backbone, which greatly reduces the amount of computation and improves the extraction of irregular flame features. Second, to process multi-scale features effectively, we decouple the original encoder into two modules and propose the Mixed encoder, a new encoder structure that performs well at fusing multi-scale fire and smoke features. Notably, this encoder block is compatible with any DETR-based model. Finally, we accelerate convergence and improve mean average precision (mAP) by applying a novel loss function, Powerful IoU v2. Experimental results show that our model achieves the highest detection accuracy among the compared models, with mAP reaching 66.7% and mAP_s and Accuracy_fire reaching 50.2% and 98.05%, respectively.
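For readers unfamiliar with IoU-based box regression, the following is a minimal sketch of the plain intersection-over-union (IoU) overlap and the basic 1 - IoU loss that IoU-based losses build on. It is not the Powerful IoU v2 formulation, whose additional terms are not reproduced here. The function names `box_iou` and `iou_loss` and the (x1, y1, x2, y2) box format are assumptions chosen for illustration.

```python
def box_iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative helper only; the box format is an assumption.
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Basic IoU loss, 1 - IoU: zero for a perfect match, one for no overlap."""
    return 1.0 - box_iou(pred, target)
```

With this plain loss, a predicted box that does not overlap the target at all always gives a loss of exactly 1, no matter how far away it is, so the loss carries no information about which direction to move the box. Extended IoU losses add penalty terms to address this kind of limitation.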