Discretization-based feature selection approaches have shown interesting results when using several metaheuristic algorithms such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Ant Colony Optimization (ACO), etc. However, these methods share the same shortcoming which consists in encoding the problem solution as a sequence of cut-points. From this cut-points vector, the decision of deleting or selecting any feature is induced. Indeed, the number of generated cutpoints varies from one feature to another. Thus, the higher the number of cut-points, the higher the probability of selecting the considered feature and vice versa. This fact leads to the deletion of possibly important features having a single or a low number of cut-points, such as the infection rate, the glycemia level, and the blood pressure. In order to solve the issue of the dependency relation between the feature selection (or removal) event and the number of its generated potential cut-points, we propose to model the discretization-based feature selection task as a bi-level optimization problem and then solve it using an improved version of an existing co-evolutionary algorithm, named I-CEMBA. The latter ensures the variation of the number of features during the migration process in order to deal with the multimodality aspect. The resulting algorithm, termed Bi-DFS (Bi-level Discretization-based Feature Selection), performs selection at the upper level while discretization is done at the lower level. The experimental results on several high-dimensional datasets show that Bi-DFS outperforms relevant state-of-the-art methods in terms of classification accuracy, generalization ability, and feature selection bias.