Feature selection involves identifying the most relevant subset of input features, with a view to improving the generalization of predictive models by reducing overfitting. Directly searching for the most relevant combination of attributes is NP-hard. Variable selection is of critical importance in many applications, such as micro-array data analysis, where selecting a small number of discriminative features is crucial to developing useful models of disease mechanisms, as well as to prioritizing targets for drug discovery.

In this paper, we build on recent results in machine learning to develop a novel feature selection strategy. The recently proposed Minimal Complexity Machine (MCM) provides a way to learn a hyperplane classifier by minimizing an exact (Θ) bound on its VC dimension. It is well known that a lower VC dimension contributes to good generalization. Experimental results show that the MCM learns very sparse representations; on many datasets, the kernel MCM yields test set accuracies comparable to or better than those of SVMs while using less than one-tenth the number of support vectors. For a linear hyperplane classifier in the input space, the VC dimension is upper bounded by the number of features; hence, a linear classifier with a small VC dimension is parsimonious in the set of features it employs. In this paper, we use the linear MCM to learn a classifier in which a large number of weights are zero; features corresponding to zero weights can be discarded, yielding an effective feature selection strategy.
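To make the connection concrete, the following is a minimal sketch of the hard-margin linear MCM optimization problem, reconstructed from the description above; the precise formulation (and its soft-margin variant with slack terms) is an assumption here, not quoted from this paper.

\begin{align}
  \min_{w,\,b,\,h}\; & h \\
  \text{s.t.}\;\; & y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, M, \\
                  & y_i\,(w^{\top} x_i + b) \le h, \qquad i = 1, \dots, M,
\end{align}

so that h bounds the ratio of the largest to the smallest (positive) functional margin over the M training points, and the MCM literature relates h^2 to the VC dimension bound being minimized. The well-known upper bound alluded to above is that hyperplane classifiers in \mathbb{R}^n have VC dimension n + 1.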
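As an illustration of how such a classifier can drive feature selection, the sketch below solves the linear program above with scipy.optimize.linprog and keeps only the features whose weights are nonzero. It assumes linearly separable data and the hard-margin formulation sketched above; the function name mcm_feature_selection and the tolerance tol are hypothetical, not taken from the paper.

```python
# A minimal sketch of feature selection with a linear MCM-style LP.
# Assumption (not the paper's code): hard-margin linear MCM,
#   min h   s.t.   1 <= y_i (w . x_i + b) <= h   for all i,
# solved as a linear program; near-zero-weight features are dropped.
import numpy as np
from scipy.optimize import linprog

def mcm_feature_selection(X, y, tol=1e-6):
    """X: (M, n) data, y: (M,) labels in {-1, +1}. Returns (w, b, selected)."""
    M, n = X.shape
    # Decision variables: [w_1, ..., w_n, b, h]; objective: minimize h.
    c = np.zeros(n + 2)
    c[-1] = 1.0
    yX = y[:, None] * X                                   # rows y_i * x_i
    # y_i (w . x_i + b) >= 1  ->  -y_i x_i . w - y_i b <= -1
    A_lower = np.hstack([-yX, -y[:, None], np.zeros((M, 1))])
    # y_i (w . x_i + b) <= h  ->   y_i x_i . w + y_i b - h <= 0
    A_upper = np.hstack([yX, y[:, None], -np.ones((M, 1))])
    A_ub = np.vstack([A_lower, A_upper])
    b_ub = np.concatenate([-np.ones(M), np.zeros(M)])
    bounds = [(None, None)] * (n + 1) + [(1.0, None)]     # w, b free; h >= 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, b = res.x[:n], res.x[n]
    selected = np.flatnonzero(np.abs(w) > tol)            # surviving features
    return w, b, selected
```

Because LP optima lie at vertices of the feasible polytope, the solution w tends to have many zero entries, which is consistent with the sparsity described above; handling non-separable data would require adding slack variables to the constraints.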