This paper proposes a dual-network-based feature extractor for multivariate time series classification (MTSC), the perceptive capsule network (PCapN), which comprises a local feature network (LFN) and a global relation network (GRN). The LFN has two heads (i.e., Head A and Head B), each containing two squash CNN blocks and one dynamic routing block, to extract local features from the data and mine the connections among them. The GRN consists of two capsule-based transformer blocks and one dynamic routing block, which capture the global patterns of each variable and correlate the useful information across multiple variables. However, PCapN is difficult to deploy directly on mobile devices because of its heavy demand for computing resources, so this paper also designs a lightweight capsule network (LCapN) to mimic the cumbersome PCapN. To promote knowledge transfer from PCapN to LCapN, this paper proposes a deep transformer capsule mutual (DTCM) distillation method. DTCM is targeted and offline, using one- and two-way operations to supervise the knowledge distillation process between the dual-network-based teacher and student models. Experimental results show that the proposed PCapN and DTCM achieve excellent top-1 accuracy on the UEA2018 datasets.
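The abstract does not spell out the squash and dynamic-routing operations used by the LFN and GRN blocks; the minimal sketch below shows the standard capsule-network primitives (the squash nonlinearity and routing-by-agreement of Sabour et al., 2017) on which such blocks are typically built. It is not the authors' implementation, and all tensor shapes, the routing-iteration count, and the example sizes are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of the standard capsule
# "squash" nonlinearity and a dynamic-routing block; PCapN's squash CNN
# blocks and dynamic routing blocks presumably rely on similar primitives.
import torch
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Squash capsule vectors so that their norms lie in [0, 1)."""
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, num_iters=3):
    """Route prediction vectors u_hat of shape
    (batch, in_caps, out_caps, out_dim) to output capsules by agreement."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                        # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)       # weighted sum over input capsules
        v = squash(s)                                  # (batch, out_caps, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)   # agreement update
    return v


# Example (illustrative sizes): route 32 eight-dimensional input capsules
# to 10 sixteen-dimensional output capsules for a batch of 4 series.
u_hat = torch.randn(4, 32, 10, 16)
print(dynamic_routing(u_hat).shape)  # torch.Size([4, 10, 16])
```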