Advanced biotechnology makes it possible to access a multitude of heterogeneous proteomic, interactomic, genomic, and functional annotation data. One challenge in computational biology is to integrate these data to enable automated prediction of the subcellular localizations (SCLs) of human proteins. For proteins that have multiple biological roles, their correct in silico assignment to different SCLs can be considered an imbalanced multi-label classification problem. In this study, we developed a Bayesian Collective Markov Random Fields (BCMRFs) model for multi-SCL prediction of human proteins. Given a set of unknown proteins and their corresponding protein-protein interaction (PPI) network, the SCLs of each protein can be inferred from the SCLs of its interacting partners. To do so, we integrate PPIs, the adjacency of SCLs, and protein features, and perform transductive learning on the re-balanced dataset. Our experimental results show that the spatial adjacency of the SCLs improves multi-SCL prediction, especially for the SCLs with few annotated instances. Our approach outperforms the state-of-the-art PPI-based and feature-based multi-SCL prediction methods for human proteins.
KEYWORDS
Human protein subcellular localization; Markov random field; transductive learning; imbalanced multi-label classification.