A robust and efficient crowd counting framework is a vital step towards analysing crowded scenes, with applications in social distance monitoring, traffic management, and video surveillance. We argue that visible RGB images fail to yield high-quality density maps under poor lighting conditions, such as at night, when anomalies are more likely to occur. To address this scenario, we introduce a novel architecture, the Toggle-Fusion Network (TFNet), that effectively utilises RGBT-CC, a multimodal dataset containing pairs of thermal and RGB images. Our approach eliminates the need for a separate branch for each modality, as used in the baseline, by conditionally fusing the thermal and RGB images with the help of a toggle, saliency maps, and a rolling guidance filter. We conduct extensive experiments and delineate the importance of the individual components of our method in an ablation study. Evaluated on RGBT-CC, TFNet achieves an RMSE of 6.11, establishing a new state of the art for the dataset. The code is publicly available at https://github.com/ShreyasSR/TFNet
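The abstract names the ingredients of the conditional fusion (a toggle, saliency maps, and a rolling guidance filter) but not how they are combined. The following is a minimal NumPy sketch of one way such a pipeline could work, not TFNet's actual implementation. Several parts are hypothetical: the function names, the brightness-based toggle with its `dark_threshold`, the saliency definition (deviation of the smoothed image from its mean), and the saliency-ratio weighting. The rolling guidance filter follows the general scheme of the technique: start from a constant guide, so the first pass is a plain Gaussian blur, then repeatedly apply a joint bilateral filter guided by the previous result. Inputs are assumed to be single-channel float arrays in [0, 1] with the same shape.

```python
import numpy as np


def joint_bilateral(src, guide, sigma_s, sigma_r):
    """Smooth `src` with spatial weights and range weights computed from `guide`."""
    radius = max(1, int(np.ceil(2 * sigma_s)))
    h, w = src.shape
    src_p = np.pad(src, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    num = np.zeros(src.shape, dtype=float)
    den = np.zeros(src.shape, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour values of the source and the guide at offset (dy, dx)
            s = src_p[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            g = guide_p[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            wgt = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s**2)
                         - (g - guide) ** 2 / (2 * sigma_r**2))
            num += wgt * s
            den += wgt
    return num / den


def rolling_guidance_filter(img, sigma_s=2.0, sigma_r=0.1, iters=4):
    """Remove small structures, then iteratively recover large-scale edges."""
    # A constant initial guide makes the first pass a Gaussian blur
    guide = np.full(img.shape, float(img.mean()))
    for _ in range(iters):
        guide = joint_bilateral(img, guide, sigma_s, sigma_r)
    return guide


def saliency(smoothed):
    """Hypothetical saliency: distance from the mean, normalised to [0, 1]."""
    s = np.abs(smoothed - smoothed.mean())
    return s / (s.max() + 1e-8)


def toggle_fuse(rgb_gray, thermal, dark_threshold=0.35, **rgf_kwargs):
    """Hypothetical toggle: fuse with thermal only when the RGB frame is dark."""
    if rgb_gray.mean() >= dark_threshold:
        return rgb_gray.astype(float)  # toggle off: keep the RGB frame as-is
    s_rgb = saliency(rolling_guidance_filter(rgb_gray, **rgf_kwargs))
    s_th = saliency(rolling_guidance_filter(thermal, **rgf_kwargs))
    # Per-pixel weight in [0, 1): favour whichever modality is more salient
    w = s_th / (s_rgb + s_th + 1e-8)
    return (1 - w) * rgb_gray + w * thermal
```

Because the weights form a convex combination, the fused image stays in the input range. A bright frame passes through unchanged, while in a uniformly dark frame the salient thermal regions dominate the output. Gating on mean brightness is only one plausible toggle criterion; a learned or per-region toggle would slot into the same place.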