Prediction is one of the most attractive aspects of data mining. Link prediction has recently attracted many researchers as an effective technique for graph-based models in general and for social network analysis in particular, owing to the recent popularity of the field. Link prediction helps to understand associations between nodes in social communities. Existing link prediction approaches described in the literature are limited to predicting links that are anticipated to exist in the future. To the best of our knowledge, no previous work in this area has explored the prediction of links that could disappear in the future. We argue that the latter set of links is important to know about; these links are at least as important as, and complement, positive link prediction when planning for the future. In this paper, we propose a link prediction model that is capable of predicting both links that might exist and links that may disappear in the future. The model has been successfully applied in two different though closely related domains, namely health care and gene expression networks. The former application concentrates on physicians and their interactions, while the latter covers genes and their interactions. We have tested our model using different classifiers, and the reported results are encouraging. Finally, we compare our approach with the internal links approach and conclude that our approach performs very well in both bipartite and non-bipartite graphs.
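The idea of scoring both kinds of links can be illustrated with a small sketch. It is not the model proposed in the abstract, which relies on trained classifiers; it only assumes a simple neighbourhood-similarity heuristic (Jaccard), where non-adjacent pairs with high similarity are candidates to appear and existing links with low similarity are candidates to disappear. The toy edge list and all function names are hypothetical.

```python
from itertools import combinations

def neighbor_sets(edges):
    """Build an adjacency map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Jaccard similarity of the two neighbourhoods, excluding u and v themselves."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def score_pairs(edges):
    """Return (appear, disappear) candidate lists, most likely candidate first."""
    adj = neighbor_sets(edges)
    existing = {frozenset(e) for e in edges}
    appear, disappear = [], []
    for u, v in combinations(sorted(adj), 2):
        s = jaccard(adj, u, v)
        if frozenset((u, v)) in existing:
            disappear.append((s, u, v))  # assumption: weak support = may vanish
        else:
            appear.append((s, u, v))     # assumption: strong support = may form
    appear.sort(key=lambda t: (-t[0], t[1], t[2]))
    disappear.sort()
    return appear, disappear

# Hypothetical toy network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
appear, disappear = score_pairs(edges)
```

In a classifier-based setting such as the one the abstract describes, scores like these would more plausibly serve as input features than as final predictions.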
Background: Predicting type-1 Human Immunodeficiency Virus (HIV-1) protease cleavage sites in protein molecules and determining their specificity is an important task that has attracted considerable attention in the research community. Achievements in this area are expected to result in effective drug design (especially for HIV-1 protease inhibitors) against this life-threatening virus. However, some drawbacks (such as the shortage of available training data and the high dimensionality of the feature space) turn this task into a difficult classification problem. Thus, various machine learning techniques, and specifically several classification methods, have been proposed in order to increase the accuracy of the classification model. In addition, for classification problems characterized by few samples and many features, selecting the most relevant features is a major factor in increasing classification accuracy.
Results: We propose for HIV-1 data a consistency-based feature selection approach in conjunction with recursive feature elimination of support vector machines (SVMs). We used various classifiers to evaluate the results obtained from the feature selection process. We further demonstrated the effectiveness of our proposed method by comparing it with a state-of-the-art feature selection method applied to HIV-1 data, and we evaluated the reported results based on attributes selected from different combinations.
Conclusion: Applying feature selection to training data before carrying out the classification task seems to be a reasonable data-mining process when working with types of data similar to HIV-1. On HIV-1 data, some feature selection or extraction operations in conjunction with different classifiers have been tested and noteworthy outcomes have been reported. These facts motivate the work presented in this paper.
Software availability: The software is available at http://ozyer.etu.edu.tr/c-fs-svm.rar. It can also be downloaded at esnag.etu.edu.tr/software/hiv_cleavage_site_prediction.rar; a readme file there explains how to set up the software.
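As a rough illustration of the consistency criterion mentioned in the results, the sketch below computes the inconsistency rate of a feature subset (the fraction of samples that disagree with the majority class among samples sharing the same values on that subset) and uses it in a simple greedy backward elimination. It assumes discrete features and leaves out the SVM-based ranking entirely, so it is not the combined method of the paper. The toy data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """Fraction of samples that disagree with the majority class of their pattern."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    # Within each pattern, every sample outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)

def backward_eliminate(rows, labels, tolerance=0.0):
    """Greedily drop features while the inconsistency rate stays within tolerance."""
    subset = list(range(len(rows[0])))
    for feature in list(subset):
        trial = [f for f in subset if f != feature]
        if trial and inconsistency_rate(rows, labels, trial) <= tolerance:
            subset = trial
    return subset

# Hypothetical toy data: three discrete features, binary class label.
rows = [("A", "x", "p"), ("A", "y", "p"), ("G", "x", "q"), ("G", "y", "q")]
labels = [1, 1, 0, 0]
selected = backward_eliminate(rows, labels)
```

Because elimination is greedy and visits features in index order, the result depends on that order; here features 0 and 2 are equally informative and the sketch happens to keep the later one.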
Classification is a technique widely and successfully used for prediction, which is one of the most attractive features of data mining. However, building the classifier is the most challenging part of the process, and it is followed by testing the classifier to check its effectiveness. This article introduces a classification framework that integrates fuzzy association rules into the learning process of machine learning techniques. The integrated framework involves three major components. First, we employ multiobjective optimization twice to decide on the fuzzy sets and then optimize their ranges to extract a set of interesting fuzzy association rules. Second, we use a special subset of the extracted fuzzy association rules, namely fuzzy class association rules, to build a set of new feature vectors that measure the compatibility between the rules and the given data objects. Third, we train a classifier on the generated feature vectors to predict the class of unseen objects. Most of the earlier algorithms proposed for mining fuzzy association rules assume that the fuzzy sets are given. In contrast, the fuzzy association rule mining component of the proposed framework mines both fuzzy sets and fuzzy association rules automatically. For this purpose, fuzzy sets are first constructed using a clustering method based on a multiobjective genetic algorithm, which determines and optimizes the membership functions of the fuzzy sets. Then, a method is applied to extract interesting fuzzy association rules. Further, the proposed framework adds a new layer to the learning process of the machine learning algorithm by constructing the rule-based compatibility feature vectors; this serves the aim of better understandability. Once used by the learning algorithm, the compatibility feature vectors represent a rich source of discrimination knowledge that can substantially improve the predictive power of the final classifier. The experimental study and the reported results show the efficiency and effectiveness of our framework on benchmark datasets. To further demonstrate and evaluate the applicability of the proposed method to a variety of domains, it is also applied to the task of gene expression classification.
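The second component, turning fuzzy class association rules into feature vectors, can be sketched as follows. This is only an illustration under stated assumptions: triangular membership functions, the minimum as the way memberships are combined within a rule, and hand-written fuzzy sets and rules. In the framework described above, the fuzzy sets and rules are mined automatically, and the abstract does not specify how compatibility is computed.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets over values normalised to [0, 1].
FUZZY_SETS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical fuzzy class association rules: ({feature index: fuzzy set}, class).
RULES = [
    ({0: "high", 1: "low"}, "tumor"),
    ({0: "low"}, "normal"),
]

def compatibility(obj, antecedent):
    """Degree to which an object matches a rule antecedent (assumed: minimum)."""
    return min(triangular(obj[i], *FUZZY_SETS[name]) for i, name in antecedent.items())

def rule_features(obj, rules=RULES):
    """One compatibility value per rule; this vector is what a classifier would see."""
    return [compatibility(obj, antecedent) for antecedent, _ in rules]

features = rule_features([0.75, 0.25])
```

Each position in the resulting vector corresponds to one rule, which is what keeps the learned classifier interpretable in terms of the rules.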
The tremendous research effort on diseases and drug discovery has produced a huge amount of important biomedical information, most of which is hidden on the web. In addition, many databases have been created to store enormous amounts of information and high-throughput experiments related to the effects of drugs and diseases on genes. Thus, developing an algorithm to integrate biological data from different sources is one of the greatest challenges in the field of computational biology. Based on our belief that data integration would result in a better understanding of a drug's mode of action or a disease's pathophysiology, we have developed a novel paradigm that integrates data from three major sources in order to predict novel therapeutic drug indications. Microarray data, biomedical text mining data, and gene interaction data have all been integrated to predict ranked lists of genes based on their relevance to the molecular action of a particular drug or disease. These ranked lists of genes have then been used as raw material for building a disease-drug connectivity map based on the enrichment between the up/down tags of a particular disease signature and the ranked lists of drugs. Using this paradigm, we have reported a 13% sensitivity improvement in comparison with using microarray or text mining data independently. In addition, our paradigm is able to predict many clinically validated disease-drug associations that could not be captured using microarray or text mining data independently.
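The enrichment step can be pictured with a simplified running-sum statistic, in the spirit of Kolmogorov-Smirnov style enrichment scores. The abstract does not state the exact statistic used, so the scoring rule below, the way the up and down scores are combined, and the gene names are all assumptions made for illustration.

```python
def enrichment_score(ranked_genes, tag_set):
    """Signed maximum deviation of a running sum walked down the ranked list."""
    hits = [g in tag_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a tag gene, step down otherwise; the walk ends at zero.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def connectivity(ranked_genes, up_tags, down_tags):
    """Combine up and down enrichment (assumed rule: zero when both share a sign)."""
    up = enrichment_score(ranked_genes, up_tags)
    down = enrichment_score(ranked_genes, down_tags)
    return 0.0 if up * down > 0 else up - down

# Hypothetical drug-ranked gene list and disease signature tags.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
up_tags, down_tags = {"g1", "g2"}, {"g5", "g6"}
score = connectivity(ranked, up_tags, down_tags)
reversed_score = connectivity(ranked[::-1], up_tags, down_tags)
```

Under this sketch, the score is largest when the up tags sit at the top of the ranked list and the down tags at the bottom, and it flips sign when the ranking is reversed; how the sign maps to a therapeutic indication depends on the ranking convention, which the abstract leaves open.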