Loop closure detection (LCD), also known as place recognition, is a crucial component of visual simultaneous localization and mapping (vSLAM) systems, as it helps reduce accumulated localization error on a global scale. However, changes in environmental appearance and viewpoint pose significant challenges to the accuracy of LCD algorithms. To address this issue, this paper presents MetricNet, a novel end-to-end framework for LCD that enhances detection performance in complex scenes with distinct appearance variations. To obtain highly distinguishable deep features, an attention-based Channel Weighting Module (CWM) is designed to adaptively detect salient regions of interest. In addition, a patch-by-patch Similarity Measurement Module (SMM) is incorporated to steer the network toward handling challenging situations that tend to cause perceptual aliasing. Experiments on three typical datasets demonstrate MetricNet’s appealing detection performance and generalization ability compared with many state-of-the-art learning-based methods, with the mean average precision increased by up to 11.92%, 18.10%, and 5.33%, respectively. Moreover, detection results on additional open datasets with apparent viewpoint variations, and on the odometry dataset for localization problems, also demonstrate the dependability of MetricNet under different adaptation scenarios.