Classification datasets often contain a large number of features, many of which are irrelevant or redundant and can even reduce the efficiency of classifiers. Feature selection (FS) is the most common preprocessing technique for overcoming the drawbacks of high-dimensional datasets, and it typically involves two conflicting objectives: the first is to maximize classification performance (equivalently, to minimize the classifier's error rate), whereas the second is to minimize the number of selected features. However, the majority of wrapper FS techniques are developed for single-objective scenarios. The multi-verse optimizer (MVO) is considered one of the well-regarded optimization approaches of recent years. In this paper, a binary multi-objective variant of MVO (MOMVO) is proposed for feature selection tasks. Because the standard MOMVO suffers from local optima stagnation, we propose an improved binary MOMVO that addresses this issue using a memory concept and the personal best of the universes. The experimental results and comparisons indicate that the proposed binary MOMVO approach can effectively eliminate irrelevant and/or redundant features while maintaining a minimum classification error rate across different datasets, compared with the most popular feature selection techniques. Furthermore, experiments on 14 benchmark datasets showed that the proposed approach outperforms state-of-the-art multi-objective optimization algorithms for feature selection.
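
The two conflicting objectives described above can be illustrated with a minimal sketch. It is not the proposed MOMVO algorithm; it only shows how a binary feature mask might be scored on both objectives and how non-dominated masks could be identified. The names `evaluate`, `dominates`, `pareto_front`, and `toy_error` are hypothetical, and `toy_error` is an assumed stand-in for the error rate a wrapper classifier would report.

```python
from typing import Callable, List, Sequence, Tuple

Mask = Sequence[int]             # 1 = feature selected, 0 = feature dropped
Objectives = Tuple[float, float]  # (error rate, fraction of features kept)


def evaluate(mask: Mask, error_fn: Callable[[Mask], float]) -> Objectives:
    """Score a binary feature mask on both objectives (both minimized)."""
    n_selected = sum(mask)
    if n_selected == 0:
        # Assumption: an empty subset is penalized with the worst score.
        return (1.0, 1.0)
    return (error_fn(mask), n_selected / len(mask))


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def toy_error(mask: Mask) -> float:
    """Hypothetical error model: only features 0 and 2 are informative."""
    return (5 - 2 * mask[0] - 2 * mask[2]) / 10


candidates = [
    [1, 1, 1, 1],  # all features
    [1, 0, 1, 0],  # both informative features only
    [1, 0, 0, 0],  # one informative feature
    [0, 1, 0, 1],  # only uninformative features
]
scores = [evaluate(m, toy_error) for m in candidates]
front = pareto_front(scores)
```

In this toy setting, selecting every feature is dominated by the mask that keeps only the two informative ones, because both reach the same error with different subset sizes. The smaller single-feature mask stays on the front as a trade-off between error and subset size.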