Comprehensive functional annotation of vertebrate genomes is fundamental to biological discovery. Reverse genetic screening has been highly useful for determining gene function, but it is untenable as a systematic approach in vertebrate model organisms given the number of surveyable genes and observable phenotypes. Unbiased prediction of gene-phenotype relationships offers a strategy to direct finite experimental resources towards likely phenotypes, thus maximizing de novo discovery of gene functions. Here we prioritized genes for phenotypic assay in zebrafish through machine learning, predicting the effect of loss of function of each of 15,106 zebrafish genes on 338 distinct embryonic anatomical processes. Focusing on cardiovascular phenotypes, the learning procedure predicted known knockdown and mutant phenotypes with high precision. In proof-of-concept studies we validated 16 high-confidence cardiac predictions using targeted morpholino knockdown and initial blinded phenotyping in embryonic zebrafish, confirming a significant enrichment for cardiac phenotypes compared with morpholino controls. Subsequent detailed analyses of cardiac function confirmed these results, identifying novel physiological defects for 11 tested genes. Among these we identified tmem88a, a recently described attenuator of Wnt signaling, as a discrete regulator of the patterning of intercellular coupling in the zebrafish cardiac epithelium. Thus, we show that systematic prioritization in zebrafish can accelerate the pace of developmental gene function discovery.
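The abstract reports two quantitative checks: how precisely the ranked predictions recover known phenotypes, and whether validated genes are enriched for cardiac phenotypes relative to morpholino controls. The text does not specify the exact metric or statistical test, so the following is only a minimal sketch under stated assumptions: precision is taken among the top-k ranked genes, and enrichment is assessed with a one-sided Fisher's exact test on a 2x2 table. The function names and the toy counts in the usage note are hypothetical, not the study's data.

```python
from math import comb


def precision_at_k(ranked_genes, known_positive, k):
    """Fraction of the top-k ranked genes that carry a known phenotype annotation.

    Assumption: precision is evaluated over a ranked list cut at k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_genes[:k]
    if not top:
        raise ValueError("ranked_genes is empty")
    hits = sum(1 for gene in top if gene in known_positive)
    return hits / len(top)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment in a 2x2 table.

    a: predicted genes with the phenotype    b: predicted genes without it
    c: control injections with the phenotype d: controls without it
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # number of predicted (tested) genes
    col1 = a + c          # total number showing the phenotype
    n = a + b + c + d     # grand total
    total = comb(n, row1)
    p = 0.0
    # Sum hypergeometric probabilities for outcomes at least as extreme as observed.
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(n - col1, row1 - x) / total
    return p
```

With toy inputs, `precision_at_k(["g1", "g2", "g3"], {"g1", "g3"}, 2)` returns 0.5, and `fisher_one_sided(3, 0, 0, 3)` gives a p-value of 0.05. In practice a library routine such as SciPy's `fisher_exact` with a one-sided alternative would serve the same purpose; the stdlib version is shown only to make the hypergeometric calculation explicit.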
KEY WORDS: Systems biology, Zebrafish, Cardiovascular, tmem88a
INTRODUCTION
De novo gene function discovery has been greatly facilitated by systematic gene deletion and observation of resulting phenotypes in scalable model organisms. Indeed, systematic gene disruptions in S. cerevisiae (Costanzo et al., 2010; Giaever et al., 2002), C. elegans (Kamath et al., 2003) and Drosophila (Boutros et al., 2004) have each revealed molecular functions for thousands of genes. However, given the breadth and complexity of observable phenotypes in vertebrates, comprehensive assessment of gene function through serial observation of all possible phenotypes following gene disruption remains infeasible. A more efficient alternative would be to prioritize candidate genes for detailed phenotypic testing using function predictions derived from a variety of known gene and protein properties and relationships.
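The prioritization strategy described above amounts to ranking genes by a predicted score for a phenotype of interest and spending a limited assay budget on the top-ranked genes whose phenotype is not yet known. The sketch below assumes a hypothetical gene-to-phenotype score dictionary and a hypothetical `prioritize` helper; it illustrates the selection step only and is not the authors' learning procedure.

```python
def prioritize(scores, phenotype, already_known, budget):
    """Return up to `budget` genes with the highest predicted score for `phenotype`.

    scores: hypothetical mapping {gene: {phenotype: predicted score}}
    already_known: genes whose phenotype is already annotated (skipped)
    Ties are broken alphabetically so the output is deterministic.
    """
    candidates = [
        (gene, per_pheno[phenotype])
        for gene, per_pheno in scores.items()
        if phenotype in per_pheno and gene not in already_known
    ]
    # Highest score first; gene name as a stable tie-breaker.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [gene for gene, _ in candidates[:budget]]
```

For example, with `scores = {"geneA": {"heart": 0.9}, "geneB": {"heart": 0.4}, "geneC": {"heart": 0.7}}`, calling `prioritize(scores, "heart", {"geneA"}, 1)` returns `["geneC"]`, because geneA is already annotated and geneC outranks geneB.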
RESEARCH ARTICLE TECHNIQUES AND RESOURCES
Computational prediction of molecular function has been effective in assigning physiological roles to genes across eukaryotic model organisms (Deng et al., 2004; Guan et al., 2008; Huttenhower et al., 2009; Karaoz et al., 2004; Lee et al., 2004; Mostafavi et al., 2008; Tasan et al., 2012; Tasan et al., 2008; Troyanskaya et al., 2003). Similar prediction frameworks have been applied to predict associated phenotypes in yeast (King et al., 2003; Saha and Heber, 2006) and worm, and to identify putative human disease gene candidates (Linghu et al., 2...