Genomic profiles among different breast cancer survivors who received similar treatment may provide clues about the key biological processes involved in the cells and finding the right treatment. More specifically, such profiling may help personalize the treatment based on the patients’ gene expression. In this paper, we present a hierarchical machine learning system that predicts the 5-year survivability of the patients who underwent though specific therapy; The classes are built on the combination of two parts that are the survivability information and the given therapy. For the survivability information part, it defines whether the patient survives the 5-years interval or deceased. While the therapy part denotes the therapy has been taken during that interval, which includes hormone therapy, radiotherapy, or surgery, which totally forms six classes. The Model classifies one class vs. the rest at each node, which makes the tree-based model creates five nodes. The model is trained using a set of standard classifiers based on a comprehensive study dataset that includes genomic profiles and clinical information of 347 patients. A combination of feature selection methods and a prediction method are applied on each node to identify the genes that can predict the class at that node, the identified genes for each class may serve as potential biomarkers to the class’s treatment for better survivability. The results show that the model identifies the classes with high-performance measurements. An exhaustive analysis based on relevant literature shows that some of the potential biomarkers are strongly related to breast cancer survivability and cancer in general.
Analyzing the genetic activity of breast cancer survival for a specific type of
therapy provides a better understanding of the body response to the treatment
and helps select the best course of action and while leading to the design of
drugs based on gene activity. In this work, we use supervised and nonsupervised
machine learning methods to deal with a multiclass classification problem in
which we label the samples based on the combination of the 5-year survivability
and treatment; we focus on hormone therapy, radiotherapy, and surgery. The
proposed nonsupervised hierarchical models are created to find the highest
separability between combinations of the classes. The supervised model consists
of a combination of feature selection techniques and efficient classifiers used
to find a potential set of biomarker genes specific to response to therapy. The
results show that different models achieve different performance scores with
accuracies ranging from 80.9% to 100%. We have investigated the roles of many
biomarkers through the literature and found that some of the discriminative
genes in the computational model such as ZC3H11A,
VAX2, MAF1, and ZFP91 are
related to breast cancer and other types of cancer.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.