Multivariate time series (MTS) clustering has been an important research topic across many domains over the past decades. However, two inherent properties of MTS data, namely temporal dynamics and inter-variable correlations, make MTS clustering challenging. These challenges can be addressed by combining Grassmann manifold learning with state-space dynamical modeling, which makes existing clustering techniques applicable through similarity measures defined on MTS data. In this paper, we present a systematic overview of Grassmann MTS clustering from a geometrical perspective and categorize the methods into three approaches: (i) extrinsic, (ii) intrinsic, and (iii) semi-intrinsic. Within this framework, we outline 11 methods for Grassmann clustering and demonstrate their effectiveness through a comparative experimental study on MTS data derived from human motion gestures.
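To make the general pipeline concrete (state-space model, then a point on a Grassmann manifold, then a similarity measure, then an off-the-shelf clustering algorithm), here is a minimal NumPy sketch. It is not one of the 11 surveyed methods. Several pieces are illustrative assumptions. The linear dynamical system x_{t+1} = A x_t, y_t = C x_t is fit with a simple closed-form SVD procedure. Each series is represented by the span of its extended observability matrix [C; CA; ...; CA^{m-1}]. Two such subspaces are compared with the projection distance built from their principal angles, and the groups are found by a naive k-medoids loop. The function names `fit_lds_subspace`, `projection_distance`, `pairwise_distances` and `k_medoids`, and the parameters `n_states` and `horizon`, are hypothetical.

```python
import numpy as np


def fit_lds_subspace(X, n_states=3, horizon=4):
    """Map an MTS X (T x d) to an orthonormal basis of an observability subspace.

    Assumption: closed-form SVD-based LDS fit (x_{t+1} = A x_t, y_t = C x_t);
    the column span of [C; CA; ...; CA^{horizon-1}] is the Grassmann point.
    """
    Y = X.T - X.T.mean(axis=1, keepdims=True)      # d x T, each variable centered
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                            # d x n observation matrix
    Z = np.diag(s[:n_states]) @ Vt[:n_states]      # n x T estimated state sequence
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])       # n x n least-squares transition
    blocks = [C]
    for _ in range(horizon - 1):
        blocks.append(blocks[-1] @ A)              # next block: C A^k
    O = np.vstack(blocks)                          # (horizon*d) x n observability matrix
    Q, _ = np.linalg.qr(O)                         # orthonormal basis of its column span
    return Q


def projection_distance(Q1, Q2):
    """Projection distance sqrt(sum sin^2(theta_i)) over the principal angles."""
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cos = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cos**2)))


def pairwise_distances(Qs):
    """Symmetric matrix of projection distances between all subspaces."""
    n = len(Qs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = projection_distance(Qs[i], Qs[j])
    return D


def k_medoids(D, k, n_iter=100):
    """Naive alternating k-medoids on a precomputed distance matrix."""
    medoids = np.arange(k)                         # deterministic init (assumption)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign to nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # new medoid: member with smallest total distance to its cluster
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1), medoids
```

A typical call builds `Qs = [fit_lds_subspace(X) for X in series]`, forms `D = pairwise_distances(Qs)`, and runs `k_medoids(D, k)`. Because the clustering step only consumes a distance matrix, the same pipeline also works with spectral or hierarchical clustering, or with a different Grassmann distance.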