Purpose: Individualized therapy of lung adenocarcinoma depends on accurately classifying patients into subgroups of poor and good prognosis, which reflect different probabilities of disease recurrence and survival following therapy. However, it is currently impossible to reliably identify specific high-risk patients. Here, we propose a computational model system that accurately predicts the clinical outcome of individual patients based on their gene expression profiles. Experimental Design: Gene signatures were selected using three feature selection algorithms: random forests, correlation-based feature selection, and gain ratio attribute selection. Prediction models were built using random committee and Bayesian belief networks. The prognostic power of the survival predictors was also evaluated using hierarchical cluster analysis and Kaplan-Meier analysis. Results: The predictive accuracy of an identified 37-gene survival signature is 0.96, as measured by the area under the time-dependent receiver operating characteristic curves. The cluster analysis, using the 37-gene signature, aggregates the patient samples into three groups with distinct prognoses (Kaplan-Meier analysis, P < 0.0005, log-rank test). All patients in cluster 1 were in stage I, with N0 lymph node status (no metastasis) and smaller tumor size (T1 or T2). Additionally, a 12-gene signature correctly predicts the stage of 94.2% of patients. Conclusions: Our results show that prediction models based on the expression levels of a small number of marker genes could accurately predict patient outcome for individualized therapy of lung adenocarcinoma. Such individualized treatment may significantly increase survival by optimizing treatment procedures, and may improve lung cancer survival in every year through the 5-year checkpoint.
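The Kaplan-Meier analysis mentioned in the abstract above estimates a survival curve from follow-up times in which some patients are censored. The following is a minimal sketch of the standard product-limit estimator, not the authors' code; the function name `kaplan_meier` and the toy follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up duration for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Comparing such curves between patient clusters, for example with a log-rank test, is what gives a P value of the kind reported in the abstract; that test is not implemented in this sketch.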
This paper describes a novel methodology for predicting fault-prone modules. The methodology is based on Dempster-Shafer (D-S) belief networks. Our approach consists of three steps: first, building the Dempster-Shafer network by the induction algorithm; second, selecting the predictors (attributes) by the logistic procedure; third, feeding the predictors describing the modules of the current project into the induced Dempster-Shafer network and identifying fault-prone modules. We applied this methodology to a NASA dataset. The prediction accuracy of our methodology is higher than that achieved by logistic regression or discriminant analysis on the same dataset.
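Dempster-Shafer theory, which underlies the belief networks in the abstract above, merges independent pieces of evidence with Dempster's rule of combination. The sketch below shows only that basic rule, not the paper's network induction algorithm or predictor selection; the function name `combine`, the two-hypothesis frame, and the mass values are assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination.

    m1, m2 : dicts mapping frozenset hypotheses to mass (each sums to 1).
    Returns the combined, renormalized mass function.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Mass that the two sources assign to incompatible hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical frame of discernment for one software module.
FAULTY = frozenset({"faulty"})
CLEAN = frozenset({"clean"})
UNKNOWN = FAULTY | CLEAN  # mass left uncommitted

# Hypothetical evidence from two metrics (values are illustrative only).
evidence_a = {FAULTY: 0.6, UNKNOWN: 0.4}
evidence_b = {FAULTY: 0.5, CLEAN: 0.3, UNKNOWN: 0.2}

belief = combine(evidence_a, evidence_b)
for hypothesis, mass in belief.items():
    print(sorted(hypothesis), round(mass, 4))
```

In this toy case the two sources agree more often than they conflict, so the combined mass on "faulty" ends up higher than either source assigned on its own.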