Conventional clinical decision support systems are generally based on a single classifier or a simple combination of classifiers, and show only moderate performance. In this paper, we propose a classifier ensemble-based method for supporting the diagnosis of cardiovascular disease (CVD) based on aptamer chips. This AptaCDSS-E system overcomes conventional performance limitations by utilizing ensembles of different classifiers. Recent surveys show that CVD is one of the leading causes of death and that many lives can be saved if a precise diagnosis can be made. For CVD diagnosis, our system combines a set of four different classifiers with ensembles. Support vector machines and neural networks are adopted as base classifiers. Decision trees and Bayesian networks are also adopted to augment the system. Four aptamer-based biochip data sets, including CVD data containing 66 samples, were used to train and test the system. Three other supplementary data sets were used to alleviate data insufficiency. We investigated the effectiveness of the ensemble-based system with several different aggregation approaches by comparing the results with single classifier-based models. The prediction performance of the AptaCDSS-E system was assessed with a cross-validation test. The experimental results show that our system achieves high diagnosis accuracy (>94%) and comparatively small prediction difference intervals (<6%), demonstrating its usefulness in the clinical decision process of disease diagnosis. Additionally, 10 possible biomarkers were identified for further investigation.
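As an illustration of what aggregating several classifiers can look like, the following is a minimal sketch of simple majority voting over per-classifier label predictions. It is not the AptaCDSS-E implementation: the abstract does not specify which aggregation approaches were used, and the function name `majority_vote`, the tie-breaking rule, and the toy predictions are all assumptions made for this example.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Aggregate label predictions from several classifiers by majority vote.

    predictions[i][j] is the label that classifier i assigns to sample j.
    Ties are broken in favor of the smallest label (an arbitrary assumption).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same number of samples")

    aggregated = []
    for j in range(n_samples):
        votes = Counter(p[j] for p in predictions)
        top = max(votes.values())
        # Pick the smallest label among those with the highest vote count.
        aggregated.append(min(label for label, count in votes.items() if count == top))
    return aggregated


# Hypothetical outputs of four classifiers on four samples (1 = disease, 0 = healthy).
svm_pred = [1, 0, 1, 1]
nn_pred = [1, 0, 0, 1]
tree_pred = [0, 0, 1, 1]
bayes_pred = [1, 1, 0, 0]

ensemble_pred = majority_vote([svm_pred, nn_pred, tree_pred, bayes_pred])
print(ensemble_pred)
```

With an even number of voters, ties are possible (as on the third sample here), so a real system would need an explicit tie-breaking or weighting policy.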
Abstract. Human papillomavirus (HPV) infection is considered to be the most common sexually transmitted disease and is known as the major factor for cervical cancer. There are more than 100 types of HPV, and each type belongs to one of two risk types, low or high. In particular, the high-risk type is known to be the most important factor in medical judgment. Thus, classifying the risk type of HPV is very important to the treatment of cervical cancer. In this paper, we present a machine learning approach to mine the structure of HPV DNA sequences for effective classification of the HPV risk types. We learn the most informative subsequence segment sets and their weights with a genetic algorithm to classify the risk type of each HPV. To resolve the problem of the computational complexity of the genetic algorithm, we use a distributed intelligent data engineering platform based on the active grid concept, called "IDEA@Home." The proposed genetic mining method, with the described platform, shows about 85.6% classification accuracy with relatively fast mining speed.
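One way to picture learning "subsequence segment sets and their weights" is through the fitness function a genetic algorithm would maximize. The sketch below is an assumption-laden illustration, not the paper's method: the chromosome encoding as (subsequence, weight) pairs, the presence-based scoring, the zero threshold, and the toy sequences are all hypothetical.

```python
from typing import Sequence

# Hypothetical encoding: a chromosome is a set of (subsequence, weight) pairs.
Chromosome = Sequence[tuple[str, float]]


def predict_high_risk(dna: str, chromosome: Chromosome, threshold: float = 0.0) -> bool:
    """Sum the weights of the segments found in the sequence and compare to a threshold."""
    score = sum(weight for segment, weight in chromosome if segment in dna)
    return score > threshold


def fitness(chromosome: Chromosome, sequences: Sequence[str], labels: Sequence[bool]) -> float:
    """Fitness of a chromosome = classification accuracy on labeled sequences."""
    correct = sum(
        predict_high_risk(seq, chromosome) == label
        for seq, label in zip(sequences, labels)
    )
    return correct / len(sequences)


# Toy data for illustration only; these are not real HPV sequences.
toy_chromosome = [("ACG", 1.0), ("TTT", -1.5)]
toy_sequences = ["AACGT", "ACGTTT", "GGGG"]
toy_labels = [True, False, True]  # True = high risk

toy_fitness = fitness(toy_chromosome, toy_sequences, toy_labels)
print(f"fitness = {toy_fitness:.3f}")
```

A genetic algorithm would evaluate this fitness for every chromosome in each generation, which is the kind of repeated, independent computation that lends itself to a distributed platform.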
Abstract. PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature, is introduced. PubMiner utilizes natural language processing and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature data. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language analysis. The extracted interactions are further analyzed with a set of features of each entity, constructed from the related public databases, to infer more interactions from the original interactions. Both the interactions inferred by this analysis and the native interactions are provided to the user with links to the literature sources. System performance was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.
Abstract. Human papillomavirus (HPV) is known as the main cause of cervical cancer and is classified as low- or high-risk type according to its malignant potential. Detection of high-risk HPVs is critical for understanding the mechanisms and recognizing potential patients in medical judgments. In this paper, we present a simple kernel approach to classify HPV risk types from E6 protein sequences. Our method uses support vector machines combined with gap-spectrum kernels. The gap-spectrum kernel is introduced to compute the similarity between amino acid pairs at a fixed distance, which can be useful for the helical structure of proteins. In the experiments, the proposed method is compared with a mismatch kernel approach in terms of accuracy and F1-score, and the predictions for unknown types are presented.
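The idea of comparing amino acid pairs at a fixed distance can be sketched as a kernel that counts, for each sequence, the ordered pairs of residues separated by a given distance, and then takes the inner product of the two count vectors. This is a simplified reading of the abstract, not the paper's exact definition: the function names, the unnormalized inner product, and the toy peptide strings are assumptions.

```python
from collections import Counter


def gapped_pair_counts(seq: str, distance: int) -> Counter:
    """Count ordered residue pairs (seq[i], seq[i + distance]) in a sequence."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return Counter((seq[i], seq[i + distance]) for i in range(len(seq) - distance))


def gap_pair_kernel(x: str, y: str, distance: int) -> int:
    """Inner product of the gapped-pair count vectors of two sequences."""
    counts_x = gapped_pair_counts(x, distance)
    counts_y = gapped_pair_counts(y, distance)
    # Counter returns 0 for pairs that never occur in y.
    return sum(count * counts_y[pair] for pair, count in counts_x.items())


# Toy peptide strings for illustration only; these are not E6 sequences.
a = "MHQK"
b = "MFQK"
k_ab = gap_pair_kernel(a, b, distance=2)  # shared pair at distance 2: (M, Q)
k_aa = gap_pair_kernel(a, a, distance=2)  # pairs in a: (M, Q) and (H, K)
print(k_ab, k_aa)
```

Because the kernel is an inner product of explicit count vectors, it is symmetric and positive semidefinite, so a precomputed Gram matrix built from it could be passed to a support vector machine.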