A catalog of all human protein-protein interactions would provide scientists with a framework to study protein deregulation in complex diseases such as cancer. Here we demonstrate that a probabilistic analysis integrating model organism interactome data, protein domain data, genome-wide gene expression data and functional annotation data predicts nearly 40,000 protein-protein interactions in humans, a result comparable to those obtained with experimental and computational approaches in model organisms. We validated the accuracy of the predictive model on an independent test set of known interactions and also experimentally confirmed two predicted interactions relevant to human cancer, assigning uncharacterized proteins to defined pathways. We also applied the human interactome network to cancer genomics data and identified several interaction subnetworks activated in cancer. This integrative analysis provides a comprehensive framework for exploring the human protein interaction network.

We began by assembling a collection of genomic and proteomic data potentially useful in predicting human protein-protein interactions, comprising model organism protein-protein interactions 1, protein domain assignments 2, gene expression measurements in human tissue samples 3 and biological function annotations 4 (Table 1). Based on previous reports, we suspected that (i) model organism interactions may suggest interactions among orthologous human proteins 5,6, (ii) similar gene expression profiles across a panel of human tissue samples may identify interacting protein products 7,8, (iii) protein domain pairs enriched among known human protein-protein interactions may suggest novel interactions 9, (iv) shared functional annotations from Gene Ontology 4 may suggest physical interactions, and (v) combining evidence from independent data sources may strongly predict protein-protein interactions 10-12.
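Hypothesis (v) can be illustrated with the standard naive Bayes rule for combining evidence: if the data sources are treated as conditionally independent, the posterior odds of interaction equal the prior odds multiplied by the likelihood ratio contributed by each source. The following is a minimal sketch of that rule, not the authors' implementation. The function names and the per-source likelihood ratios are hypothetical, and reading the reported "1 in 381" as odds of 1/381 is an assumption.

```python
from math import prod


def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes combination of evidence.

    posterior odds = prior odds * product of per-source likelihood ratios,
    assuming the sources are conditionally independent.
    """
    return prior_odds * prod(likelihood_ratios)


def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# Assumption: the reported prior of "1 in 381" is interpreted as odds of 1/381.
PRIOR_ODDS = 1.0 / 381.0

# Hypothetical likelihood ratios for one protein pair, one per evidence source.
# These values are illustrative only and are not taken from the paper.
example_lrs = {
    "ortholog_interaction": 50.0,
    "coexpression": 4.0,
    "domain_pair": 6.0,
    "shared_go_annotation": 2.0,
}

if __name__ == "__main__":
    odds = posterior_odds(PRIOR_ODDS, example_lrs.values())
    print(f"posterior odds: {odds:.2f}")
    print(f"posterior probability: {odds_to_probability(odds):.2f}")
```

Under this reading, a pair needs a combined likelihood ratio above 381 for its posterior odds to exceed 1, that is, for interaction to be more likely than not.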
To test these hypotheses, we applied a naive Bayes classifier 7, a method well-suited for integrating disparate data types. A gold standard positive set (GSP) of 11,678 distinct protein-protein interactions among 5,505 proteins was retrieved from the Human Protein Reference Database (HPRD) 12, a resource that contains known protein-protein interactions manually curated from the literature by expert biologists. A gold standard negative set (GSN) of 3,106,928 protein pairs was defined, in which one protein was assigned to the plasma membrane cellular component and the other to the nuclear cellular component by the Gene Ontology Consortium 4. Although membrane proteins are known to interact occasionally with nuclear proteins, we demonstrated that there are far fewer known interactions within the GSN than would be expected by chance (Supplementary Methods online). By averaging the number of interactions per protein in the GSP, we estimated the prior odds of interaction between two randomly selected proteins to be 1 in 381. This is likely an underestimate of the true prior odds because all protein-protein intera...