Abstract-Linear discriminant analysis (LDA) is a wellknown dimension reduction approach, which projects highdimensional data into a low-dimensional space with the best separation of different classes. In many tasks, the data accumulates over time, and thus incremental LDA is more desirable than batch LDA. Several incremental LDA algorithms have been developed and achieved success; however, the eigenproblem involved requires a large computation cost, which hampers the efficiency of these algorithms. In this paper, we
Motivation
The complete characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. We develop in this work a technique, Enzymatic Link Prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions catalogued in the KEGG database as a graph. ELP is innovative over prior works in using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity.
Results
We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in AUC over fingerprint-based similarity approaches and by 8% over Support Vector Machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstruction of edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization.
Availability
The code and datasets are available through https://github.com/HassounLab/ELP
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.