Recent research in biostatistics and bioinformatics focuses on diagnosing diseases using non-clinical approaches that involve machine learning methods. Several algorithmic procedures have been applied to experimental problems involving the simulation and modelling of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) proteins. [1][2][3][4][5][6] DNA and RNA are essential biological measurements used to monitor abnormal cell growth in genetic sequencing, which serves as the bedrock of non-clinical diagnosis.
Objective: Breast cancer is a leading cause of cancer-related death among women worldwide, with approximately 2.3 million new cases and 685,000 deaths reported in 2020 alone. One critical step in developing effective classification and prediction models is variable selection, which involves identifying a subset of relevant variables from a larger set of potential predictors. Accurate variable selection is crucial for building interpretable and robust models that do not overfit to noise, which in turn improves predictive performance and generalization. In this paper, we propose an alternative, objective approach for comparing the Akaike Information Criterion (AIC) values of two competing models, in which the magnitude of the difference is subjected to a statistical test of significance. Materials and Methods: We developed a new backward elimination variable selection procedure, similar in spirit to the existing "stepAIC", within the R statistical software environment. We used both simulated data and the Wisconsin breast cancer diagnostic dataset to compare the variable selection and predictive performance of the proposed method with those of "stepAIC" and LASSO. Results: In the simulations, the proposed AIC procedure achieved higher variable selection sensitivity, specificity and accuracy than stepAIC and LASSO. The prediction results of the proposed AIC method were comparable with those of stepAIC and LASSO across the simulated data dimensions. Similarly superior variable selection results were observed with the breast cancer dataset. Conclusion: The proposed AIC-based variable selection approach is a promising method that integrates AIC with statistical testing for improved variable selection in breast cancer classification and prediction.
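The abstract does not state which significance test is applied to the AIC difference, so the following is only a minimal sketch of one way such a test could look, not the authors' procedure. It assumes the two competing models are nested and differ by exactly one parameter, as in a single backward elimination step. In that case the AIC difference and the likelihood-ratio statistic are related by LR = ΔAIC + 2, and LR can be referred to a chi-square distribution with one degree of freedom. The function names, the `alpha` threshold and the log-likelihood values are hypothetical.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2.0 * n_params - 2.0 * log_lik


def aic_difference_test(log_lik_full, k_full, log_lik_reduced, alpha=0.05):
    """Test the AIC difference between a full model and a nested model
    that drops exactly one parameter (assumption of this sketch).

    delta_aic = AIC(reduced) - AIC(full) = LR - 2, where
    LR = 2 * (log_lik_full - log_lik_reduced) is the likelihood-ratio
    statistic, referred here to a chi-square with 1 degree of freedom.
    """
    delta_aic = aic(log_lik_reduced, k_full - 1) - aic(log_lik_full, k_full)
    lr_stat = max(delta_aic + 2.0, 0.0)  # guard against tiny negative values
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))
    return {
        "delta_aic": delta_aic,
        "lr_stat": lr_stat,
        "p_value": p_value,
        # Drop the variable when removing it does not worsen the fit significantly
        "drop_variable": p_value > alpha,
    }


# Hypothetical log-likelihoods for a 5-parameter model and two reduced models
weak = aic_difference_test(log_lik_full=-100.0, k_full=5, log_lik_reduced=-100.5)
strong = aic_difference_test(log_lik_full=-100.0, k_full=5, log_lik_reduced=-104.0)
print(weak["delta_aic"], weak["drop_variable"])
print(strong["delta_aic"], strong["drop_variable"])
```

Under the same assumptions, a plain comparison that keeps whichever model has the smaller AIC drops a variable whenever LR < 2, which corresponds to a fixed significance level of about 0.157 for one degree of freedom. An explicit test lets that threshold be chosen instead.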