Simultaneous feature selection and outlier detection with optimality guarantees

Insolia, Luca; Kenney, Ana; Chiaromonte, Francesca; Felici, G.

doi:10.1111/biom.13553

Cited by 12 publications

(20 citation statements)

References 60 publications


“…Nevertheless, nowadays it can be solved effectively and at times also efficiently with specialized solvers. Importantly, it relates to the use of a trimmed loss function as in [10], and it extends the work in [7] for sparse linear regression models affected by data contamination in the form of mean-shift outliers. However, here the use of a nonlinear and nonquadratic objective function complicates the matter and requires special attention.…”

Section: Miprob: Robust Variable Selection Under the Logistic Slippage Model (mentioning)

confidence: 90%

“…For instance, an ensemble method based on existing heuristic and robust procedures to create suitable big-M bounds was considered in [7]. However, a similar approach is challenging in this framework given a "pool" of openly available robust algorithms is not available for logistic regression models, unlike in linear regression.…”

Section: Algorithmic Implementation (mentioning)

confidence: 99%

“…For instance, one might consider robust counterparts of information criteria or cross-validation. In our simulation study, we do not include the L2-constraint and, for a given trimming level $k_n$, we use a robust version of the Bayesian information criterion (BIC) similarly to [7]. In symbols, this is $\mathrm{BIC} = k_p \ln(n - k_n) + \sum_{i=1}^{n} d(x_i^T \beta, y_i)$, where $d(x_i^T \beta, y_i)$ are the final deviances for a given estimator; recall that deviances corresponding to trimmed points are equal to 0.…”

Section: Additional Details (mentioning)

confidence: 99%
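
The robust BIC quoted above can be sketched in a few lines. This is only an illustration under stated assumptions, not the citing authors' implementation: the deviance d is taken to be the usual binomial deviance for a 0/1 response, k_p is taken to be the number of nonzero coefficients, and the names `binomial_deviance`, `robust_bic`, and `trimmed` are hypothetical.

```python
import math


def binomial_deviance(eta, y):
    """Deviance of one observation with linear predictor eta and label y in {0, 1}.

    Assumption: d(eta, y) = 2 * (log(1 + exp(eta)) - y * eta), i.e. minus twice
    the Bernoulli log-likelihood under a logit link.
    """
    # Numerically stable evaluation of log(1 + exp(eta)).
    log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    return 2.0 * (log1pexp - y * eta)


def robust_bic(X, y, beta, trimmed):
    """BIC = k_p * ln(n - k_n) + sum of deviances over non-trimmed points.

    X       : list of rows (each a list of floats)
    y       : list of 0/1 labels
    beta    : list of coefficients
    trimmed : set of row indices flagged as outliers (their deviance counts as 0)
    """
    n = len(y)
    k_n = len(trimmed)
    if n - k_n <= 0:
        raise ValueError("at least one observation must remain untrimmed")
    # Assumption: k_p counts the selected (nonzero) coefficients.
    k_p = sum(1 for b in beta if b != 0.0)
    total = 0.0
    for i in range(n):
        if i in trimmed:
            continue  # trimmed points contribute zero deviance
        eta = sum(x_ij * b_j for x_ij, b_j in zip(X[i], beta))
        total += binomial_deviance(eta, y[i])
    return k_p * math.log(n - k_n) + total
```

In a tuning loop, one would evaluate `robust_bic` for each candidate sparsity level at a fixed trimming level and keep the fit with the smallest value.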

“…This motivates the development of robust estimation techniques. Notably, penalized estimation and robustness with respect to the presence of outliers are very closely related topics [6][7][8], and they have recently also been combined for logistic regression settings [9,10].…”

Section: Introduction (mentioning)

confidence: 99%

“…Specifically, we consider an L0 sparsity assumption on the coefficients [13] and a logistic slippage model for the outlying observations [14]. We further build upon the work in [7] and rely on L0-constraints to detect outlying cases and select relevant features. This requires us to solve a double combinatorial problem, across both the units and the covariates.…”

Section: Introduction (mentioning)

confidence: 99%


Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression

Insolia, Kenney, Calovi, et al. 2021. Stats. Self-citation.

High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods to improve model interpretability and ensure the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Previous studies mainly focused on predictive performance; however, our approach produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study where our proposal outperforms other methods across most performance measures and settings.


A Mixed-Integer Formulation for the Simultaneous Input Selection and Outlier Filtering in Soft Sensor Training

Sildir, Boy, Sarrafi. 2024. Inf Syst Front.

Soft sensors are used to calculate the real-time values of process variables which can be measured in the laboratory only or require expensive online measurement tools. A set of mathematical expressions is developed and trained from historical data to exploit the statistical knowledge between online and offline measurements to ensure a reliable prediction performance, for optimization and control purposes. This study focuses on the development of a mixed-integer optimization problem to perform input selection and outlier filtering simultaneously using rigorous algorithms during the training procedure, unlike traditional heuristic and sequential methods. Nonlinearities and nonconvexities in the optimization problem are further tailored for global optimality and computational advancements by reformulations and piecewise linearizations to address the complexity of the task with additional binary variables, representing the selection of a particular input or data. The proposed approach is implemented on actual data from two different industrial plants and compared to the traditional approach.


Robust variable selection and estimation via adaptive elastic net S-estimators for linear regression

Kepplinger. 2023. Computational Statistics & Data Analysis.
