The objective of infrared and visual image fusion is to combine the salient and complementary features of an infrared image and a visual image into a single informative image. To accomplish this, we introduce a novel local-extrema-driven image filter that smooths an image by reconstructing pixel intensities from their local extrema. The filter is applied iteratively to the input infrared and visual images, and multiple scales of bright and dark feature maps are extracted from the differences between successively filtered images. The bright and dark feature maps of the infrared and visual images at each scale are then fused using elementwise-maximum and elementwise-minimum strategies, respectively. The two base images, which are the final-scale smoothed versions of the infrared and visual images, are fused using a novel structural similarity- and intensity-based strategy. Finally, the fused image is obtained by combining the fused bright feature map, the fused dark feature map, and the fused base image. Experiments on the widely used TNO dataset show that our method performs on par with or better than eleven state-of-the-art image-fusion methods in both qualitative and quantitative assessments.
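The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the filter, so `local_extrema_smooth` is a hypothetical stand-in that takes the midpoint of the local maximum and local minimum in a square window. The split of each difference image into its positive part (bright features) and negative part (dark features), the window radius growing with scale, and the plain averaging of the base images (in place of the structural similarity- and intensity-based strategy) are likewise assumptions made only for illustration. The function names `decompose` and `fuse` and the parameters `radius` and `scales` are invented here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_extrema_smooth(img, radius=1):
    """Hypothetical stand-in filter: midpoint of local max and local min."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    windows = sliding_window_view(padded, (k, k))  # shape (H, W, k, k)
    local_max = windows.max(axis=(-2, -1))
    local_min = windows.min(axis=(-2, -1))
    return 0.5 * (local_max + local_min)


def decompose(img, scales=3):
    """Return (bright maps, dark maps, base image) for one input image."""
    bright, dark = [], []
    current = img.astype(np.float64)
    for s in range(1, scales + 1):
        smoothed = local_extrema_smooth(current, radius=s)
        diff = current - smoothed
        # Assumption: positive residue = bright features, negative = dark.
        bright.append(np.maximum(diff, 0.0))
        dark.append(np.minimum(diff, 0.0))
        current = smoothed
    return bright, dark, current  # current is the final-scale base image


def fuse(ir, vis, scales=3):
    """Fuse two registered grayscale images of identical shape."""
    b_ir, d_ir, base_ir = decompose(ir, scales)
    b_vis, d_vis, base_vis = decompose(vis, scales)
    # Per-scale fusion: elementwise max for bright, elementwise min for dark.
    fused_bright = sum(np.maximum(a, b) for a, b in zip(b_ir, b_vis))
    fused_dark = sum(np.minimum(a, b) for a, b in zip(d_ir, d_vis))
    # Placeholder only: simple average instead of the paper's base strategy.
    fused_base = 0.5 * (base_ir + base_vis)
    return fused_base + fused_bright + fused_dark
```

Because the bright and dark maps at each scale sum to the difference between successive filtered images, the decomposition in this sketch is exactly invertible, so `fuse(img, img)` returns `img` up to floating-point error. This is a convenient sanity check before substituting a different filter or base-fusion rule.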