To address deficiencies in existing image fusion methods, this paper proposes a multi-level and multi-classification generative adversarial network (GAN)-based method (MMGAN) for fusing visible and infrared images of forest fire scenes (the surroundings of firefighters), which solves the problem that GANs tend to ignore the contrast-ratio information of visible images and the detailed texture information of infrared images. The study is based on real-time visible and infrared image data acquired by binocular visible and infrared cameras mounted on forest firefighters' helmets. We improved the GAN in two ways. First, we split the generator's input channels into gradient and contrast-ratio paths, increased the depth of the convolutional layers, and improved the feature-extraction capability of the shallow network. Second, we designed a discriminator with a multi-classification constraint structure and trained it adversarially against the generator, continuously supervising the generator to produce higher-quality fused images. Our results indicate that, compared with mainstream infrared and visible image fusion methods, including anisotropic diffusion fusion (ADF), guided filtering fusion (GFF), convolutional neural networks (CNN), FusionGAN, and the dual-discriminator conditional GAN (DDcGAN), the MMGAN model performed best overall and produced the best visual results when fusing images of forest fire surroundings. It achieved the best score on five of the six objective metrics and ranked second on the remaining one, and its image fusion speed was more than five times that of the other methods. The MMGAN model significantly improves the quality of fused images of forest fire scenes, preserving the contrast-ratio information of visible images and the detailed texture information of infrared images, and can accurately reflect the surroundings of a forest fire scene.
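To make the described architecture concrete, the following is a minimal PyTorch sketch of a dual-path generator and a multi-classification discriminator. It is an illustration under stated assumptions, not the paper's implementation: the single-channel inputs, all layer widths and kernel sizes, the three-class labeling scheme (fused / visible / infrared), and the names `DualPathGenerator` and `MultiClassDiscriminator` are hypothetical.

```python
import torch
import torch.nn as nn


class DualPathGenerator(nn.Module):
    """Hypothetical dual-path generator: a gradient path for the detailed
    texture of the infrared frame and a contrast path for the contrast-ratio
    information of the visible frame. Widths and kernels are illustrative."""

    def __init__(self):
        super().__init__()

        def path():
            return nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
            )

        self.gradient_path = path()  # fed the infrared frame
        self.contrast_path = path()  # fed the visible frame
        # Fusion head merges the two 64-channel feature streams into one image.
        self.fuse = nn.Sequential(
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, kernel_size=1), nn.Tanh(),
        )

    def forward(self, ir, vis):
        g = self.gradient_path(ir)
        c = self.contrast_path(vis)
        return self.fuse(torch.cat([g, c], dim=1))


class MultiClassDiscriminator(nn.Module):
    """Hypothetical multi-classification discriminator: instead of a binary
    real/fake decision, it classifies its input as fused, visible, or
    infrared, which is one way to realize a multi-classification constraint."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))  # raw class logits


# Smoke test: fuse a dummy 256x256 infrared/visible pair.
gen, disc = DualPathGenerator(), MultiClassDiscriminator()
ir, vis = torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256)
fused = gen(ir, vis)   # shape: (1, 1, 256, 256)
logits = disc(fused)   # shape: (1, 3); trainable with nn.CrossEntropyLoss
```

The intuition behind the three-way classification is that the generator is pressured to produce outputs the discriminator cannot attribute to either single source modality; the actual loss formulation and layer configuration used by MMGAN are specified in the paper, not here.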