SEQENS is an ensemble method for feature identification that has shown strong performance in identifying relevant features across several synthetic tasks (Signol et al., 2023). In this paper, we present a framework based on SEQENS with the following contributions: (1) computing the hypergeometric p-value of each feature in a SEQENS output ranking, so that the ranking can be thresholded into relevant and non-relevant features; (2) extending SEQENS to use preselected features as relevance hypotheses in the sequential forward selection (SFS), which may help attract other features that are only weakly correlated with the target on their own but gain relevance when combined with the preselected ones; (3) designing an automated process, based on a 2D cascade of SEQENS ensembles, that yields a \emph{purged feature set} (PFS), i.e., a set containing as many relevant features and as few non-relevant ones as possible; and (4) integrating all of these techniques so that the PFS is used as the set of preselected features in a SEQENS ensemble, which constitutes the complete framework, named pc-SEQENS.
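To illustrate the idea behind contribution (2), the following is a minimal Python sketch of a sequential forward selection seeded with preselected features. It is not the authors' implementation: the names `seeded_sfs` and `toy_score` are hypothetical, the stopping rule (stop when no candidate improves the score, or when `max_features` is reached) is an assumption, and `toy_score` is an invented stand-in for whatever subset-evaluation criterion SEQENS actually uses. The toy score makes features 0 and 3 useless on their own but informative together, so an unseeded SFS never picks them, while a run seeded with feature 0 "attracts" feature 3.

```python
from typing import Callable, FrozenSet, Iterable, List


def seeded_sfs(
    candidates: Iterable[int],
    score: Callable[[FrozenSet[int]], float],
    preselected: Iterable[int] = (),
    max_features: int = 5,
) -> List[int]:
    """Greedy sequential forward selection started from a preselected set.

    At each step, adds the remaining candidate that most increases
    score(selected); stops when no candidate improves the score or
    when max_features features have been selected.
    """
    selected = list(dict.fromkeys(preselected))  # keep order, drop duplicates
    remaining = [f for f in dict.fromkeys(candidates) if f not in selected]
    current = score(frozenset(selected))
    while remaining and len(selected) < max_features:
        best_f, best_s = None, current
        for f in remaining:
            s = score(frozenset(selected) | {f})
            if s > best_s:  # strict improvement; ties keep the earlier candidate
                best_f, best_s = f, s
        if best_f is None:  # no candidate improves the current subset
            break
        selected.append(best_f)
        remaining.remove(best_f)
        current = best_s
    return selected


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical subset score used only for illustration."""
    s = 0.0
    if 1 in subset:  # feature 1: weak marginal relevance
        s += 0.2
    if 0 in subset and 3 in subset:  # features 0 and 3: relevant only jointly
        s += 0.7
    return s - 0.01 * len(subset)  # small penalty per selected feature


if __name__ == "__main__":
    print(seeded_sfs(range(5), toy_score))                   # no hypothesis
    print(seeded_sfs(range(5), toy_score, preselected=[0]))  # seeded with 0
```

Without a seed, the greedy search stops after feature 1, because adding 0 or 3 alone only lowers the score; seeding with feature 0 lets the search discover the interacting feature 3 before adding feature 1.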
The performance of pc-SEQENS is evaluated on a task of identifying relevant genes from gene expression data, using the E-MTAB-3732 public database and synthetic ground truths. pc-SEQENS is compared with other state-of-the-art feature identification methods, including SEQENS. On average, the proposed framework identifies the relevant genes better, especially at the most unfavorable sample-to-dimension ratios, and exhibits greater stability.