Abstract-Loop closure detection is an essential component of simultaneous localization and mapping in a variety of robotics applications. One of the most challenging problems is to perform long-term place recognition under strong perceptual aliasing and appearance variations caused by changes in illumination, vegetation, weather, etc. To address this challenge, we propose a novel Robust Multimodal Sequence-based (ROMS) method for long-term loop closure detection that formulates image sequence matching as an optimization problem regularized by structured sparsity-inducing norms. Our method models the sparse nature of place recognition, i.e., the current location should match only a small subset of previously visited places, as well as the underlying structure of image sequences, and it incorporates multiple feature modalities to construct a discriminative scene representation. In addition, a new optimization algorithm is developed to efficiently solve the formulated problem, with a theoretical guarantee of convergence to the global optimal solution. To evaluate the ROMS algorithm, extensive experiments are performed on large-scale benchmark datasets, including the St Lucia, CMU-VL, and Nordland datasets. Experimental results validate that our algorithm outperforms previous loop closure detection methods and achieves state-of-the-art performance on long-term place recognition.
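For intuition only, the sketch below illustrates the general idea of structured sparsity-regularized sequence matching referred to in the abstract: a query scene representation is reconstructed from a dictionary of previously visited places grouped into short sequences, with a group-sparsity penalty so that only a few place groups are selected. The objective, grouping scheme, regularization weight, and function names here are illustrative assumptions solved with a generic proximal gradient method; this is not the ROMS formulation or the optimization algorithm developed in the paper.

```python
import numpy as np

def group_soft_threshold(w, groups, t):
    """Block soft-thresholding: proximal operator of t * sum_g ||w_g||_2."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * w[g]
    return out

def match_query_sequence(D, q, groups, lam=0.5, n_iter=300):
    """
    Minimize 0.5 * ||D w - q||^2 + lam * sum_g ||w_g||_2 with proximal
    gradient descent (ISTA). Columns of D are features of previously
    visited frames; each group collects one short place sequence.
    Nonzero (large-norm) groups of w indicate candidate loop closures.
    """
    w = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ w - q)             # gradient of the least-squares term
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w

# Toy usage: 3 hypothetical "places", each a group of 5 frames, 64-dim features.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 15))
groups = [range(0, 5), range(5, 10), range(10, 15)]
q = D[:, 7] + 0.01 * rng.standard_normal(64)     # query resembles a frame of place 2
w = match_query_sequence(D, q, groups)
print([np.linalg.norm(w[g]) for g in groups])    # largest group norm flags the match
```

In this toy setup the group covering column 7 receives the dominant weight, while the penalty drives unrelated groups toward zero, which is the sparsity behavior described above; the actual method additionally exploits sequence structure and multiple feature modalities as detailed in the paper.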