A set of points $X = X_B \cup X_R \subseteq \mathbb{R}^d$ is linearly separable if the convex hulls of $X_B$ and $X_R$ are disjoint; in that case there exists a hyperplane separating $X_B$ from $X_R$. Such a hyperplane provides a method for classifying new points according to the side of the hyperplane on which they lie. When such a linear separation is not possible, it may still be possible to partition $X_B$ and $X_R$ into prespecified numbers of groups in such a way that every group from $X_B$ is linearly separable from every group from $X_R$. We may also discard some points as outliers and seek to minimize the number of outliers needed to find such a partition. Based on these ideas, Bertsimas and Shioda proposed the classification and regression by integer optimization (CRIO) method in 2007. In this work we explore the integer programming aspects of the classification part of CRIO, in particular the theoretical properties of the associated formulation. We identify facet-inducing inequalities derived from the stable set polytope, showing that this classification problem has exploitable combinatorial properties.
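To make the hyperplane-based classification rule concrete, here is a minimal sketch in plain Python. It assumes a separating hyperplane $\{x : w \cdot x = b\}$ is already known (finding one, for instance by linear programming, is not shown), and it puts $X_B$ on the positive side by convention. The function names `side`, `separates`, and `classify` and the toy data are hypothetical illustrations, not part of CRIO or of the formulation studied in this work.

```python
from typing import Sequence

Point = Sequence[float]


def side(w: Point, b: float, x: Point) -> float:
    """Signed value w.x - b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def separates(w: Point, b: float, XB: list, XR: list) -> bool:
    """True if the hyperplane {x : w.x = b} strictly separates XB (positive side)
    from XR (negative side). The side convention is an assumption of this sketch."""
    return all(side(w, b, x) > 0 for x in XB) and all(side(w, b, x) < 0 for x in XR)


def classify(w: Point, b: float, x: Point) -> str:
    """Assign a new point to 'B' or 'R' by the side of the hyperplane it lies on.
    Points exactly on the hyperplane go to 'R' here, an arbitrary tie-breaking choice."""
    return "B" if side(w, b, x) > 0 else "R"


# Toy data in R^2 (hypothetical), separated by the line x1 + x2 = 2.
XB = [(2.0, 2.0), (3.0, 1.5)]
XR = [(0.0, 0.0), (-1.0, 0.5)]
w, b = (1.0, 1.0), 2.0

print(separates(w, b, XB, XR))    # True
print(classify(w, b, (4.0, 0.0)))  # B
```

When no single hyperplane separates the two sets, this check fails, which is the situation that motivates partitioning $X_B$ and $X_R$ into groups and discarding outliers.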