Results
According to candidate gene prediction performance evaluations under four semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks showed better receiver operating characteristic (ROC) and precision-recall curve performance than the PPI networks for both zebrafish and mouse.
Conclusion
Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improves network quality, which makes the networks better suited to predicting candidate genes for anatomical entities.

... molecular and phenotypic functions of proteins is a cornerstone in molecular biology. In particular, understanding the genes associated with the formation of anatomical structures, also termed 'anatomical entities', is essential in developmental biology [1-4]. The majority of genes associated with anatomical entities are obtained using wet-lab methods, such as gene knockout [5, 6], gene knockdown [7], and overexpression [8, 9]. These methods, however, are time-consuming and require significant resources, and thus only a few genes may be associated with the development of a particular anatomical entity, though there are likely many more genes involved.

Alternatively, computational prediction methods for discovering gene-anatomical entity associations can be employed because of their higher speed and lower resource consumption. Sequence similarity-based function prediction is one such example, which is widely used to predict the molecular functions of proteins [10, 11]. However, using it to predict anatomical associations of genes is questionable, because anatomical entities develop from a combination of several biological pathways that include proteins with diverse molecular functions and sequences [12]. On the other hand, protein-protein interaction (PPI) networks can be used to predict candidate genes for anatomical entities, based on the assumption that proteins that regulate the same term or function are more likely to physically interact with each other [13, 14]. PPI networks represent such interactions as graphs in which proteins are represented by nodes and their interactions by edges.
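To make the graph representation and the underlying "interacting proteins share function" assumption concrete, the following is a minimal sketch of neighbour voting on a toy PPI graph. It is an illustration only, not the prediction method evaluated in this work: the function names (`build_ppi_graph`, `rank_candidates`), the protein identifiers, and the scoring rule (fraction of interaction partners already associated with the anatomical entity) are all assumptions introduced for this example.

```python
from collections import defaultdict


def build_ppi_graph(interactions):
    """Build an undirected adjacency map: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)  # proteins are nodes
        graph[b].add(a)  # interactions are undirected edges
    return graph


def rank_candidates(graph, known_genes):
    """Rank unannotated proteins by the fraction of their partners that are
    already associated with the anatomical entity (simple neighbour voting)."""
    scores = {}
    for protein, neighbours in graph.items():
        if protein in known_genes or not neighbours:
            continue
        scores[protein] = len(neighbours & known_genes) / len(neighbours)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network and annotations (not real data).
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
    ("geneD", "geneE"),
]
known = {"geneA", "geneB"}  # genes already linked to the anatomical entity

graph = build_ppi_graph(interactions)
ranking = rank_candidates(graph, known)
for protein, score in ranking:
    print(f"{protein}\t{score:.2f}")
```

In this toy network, `geneC` ranks first because two of its three interaction partners are already annotated, whereas `geneD` and `geneE` have no annotated partners. A false positive or missing edge changes these fractions directly, which is why the quality of the network matters for prediction accuracy.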
PPI networks have been widely used to predict candidate genes for human disease phenotypes [15-17]. PPI networks should therefore also be suitable for predicting candidate genes associated with anatomical entities. However, the challenge with PPI network-based candidate gene prediction is improving the accuracy of the predictions [13, 18-21], which is low because of the poor quality of large-scale PPI network data sets [14, 21-23]. PPI networks are generated by experimental methods such as the yeast two-hybrid assay and high-throughput mass-spectrometric protein complex identification (HMS-PCI), which can generate false positive interactions [19]. Furthermore, PPI networks for model organisms are still incomplete ...