Synthetic Aperture Sonar (SAS) carried by autonomous underwater vehicles (AUVs) has found applications in archaeological searches, underwater mine detection, and wildlife monitoring. However, natural objects are easily confused with the target object, leading to high false-positive rates. To improve detection, combining SAS and optical images has recently attracted attention: while SAS data provides a large-scale survey, optical information can help contextualize it. This combination creates the need to match multimodal, optical–acoustic image pairs. The two images are not aligned and are taken from different viewing angles and at different times. As a result, challenges such as the differing resolution, scaling, and pose of the two sensors need to be overcome. In this research, motivated by the information gain from using both modalities, we turn to statistical feature analysis to investigate the relationship between the two modalities. In particular, we propose an entropic method for recognizing matching multimodal images of the same object and investigate the probabilistic dependency between the images of the two modalities through their conditional probabilities. The results on a real dataset of SAS and optical images of the same and different objects on the seafloor confirm our assumption that the conditional probability of SAS images given an optical image differs from their marginal probability, and show a favorable trade-off between detection and false-alarm rate that outperforms current benchmarks. For reproducibility, we share our database.
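The core statistical idea — that for a true match the conditional distribution of SAS features given an optical image differs from the marginal distribution — can be illustrated with a toy sketch. The example below is not the paper's method; it is a minimal, hypothetical demonstration using discretized feature labels, where empirical mutual information between correctly paired labels is compared against the near-zero value obtained after shuffling the pairing (all variable names and the 70% agreement rate are illustrative assumptions):

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete label arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        joint[a, b] += 1
    joint /= joint.sum()                      # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = joint > 0                            # avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
# Toy "optical" feature labels, and "SAS" labels that agree with them 70% of
# the time (a stand-in for a genuine statistical dependency between modalities).
optical = rng.integers(0, 4, size=5000)
sas = np.where(rng.random(5000) < 0.7, optical, rng.integers(0, 4, size=5000))

mi_paired = mutual_information(optical, sas)             # dependency present
mi_shuffled = mutual_information(optical, rng.permutation(sas))  # pairing destroyed
```

When the pairing is correct, `mi_paired` is clearly positive, i.e. p(SAS | optical) deviates from p(SAS); after shuffling, the estimate collapses toward zero, mimicking the non-matching case.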