Background
Deciphering complete networks of interactions between proteins is key to understanding cellular regulatory mechanisms. Considerable effort has been devoted to expanding the coverage of the proteome-wide interaction space at the molecular level. A growing body of research shows that protein docking can, in principle, be used to predict biologically relevant interactions. However, both the accuracy of proteome-wide identification of interacting partners and the selection of near-native complex structures still need to be improved.

Results
In this study, we developed a new method to discover and model protein interactions using an exhaustive all-to-all docking strategy. This approach integrates molecular modeling, structural bioinformatics, machine learning, and functional annotation filters to provide interaction data for the bottom-up assembly of protein interaction networks. Encouragingly, the success rates for dimer modeling are 57.5% and 48.7% when experimental and computer-generated monomer structures are used, respectively. Further, our protocol correctly identifies 81% of protein-protein interactions at a false positive rate of only 19%. As a proof of concept, 61,913 protein-protein interactions were confidently predicted and modeled for the proteome of E. coli. Finally, we validated our method against the human immune disease pathway.

Conclusions
Protein docking supported by evolutionary restraints and machine learning can be used to reliably identify and model biologically relevant protein assemblies at the proteome scale. Moreover, the accuracy of protein-protein interaction identification improves when only those protein pairs that are co-localized in the same cellular compartment and involved in the same biological process are considered.
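The functional annotation filter mentioned above keeps a candidate pair only if both proteins share a cellular compartment and a biological process. The sketch below illustrates that idea under stated assumptions: annotations are held as hypothetical dictionaries mapping protein IDs to sets of terms (for example, Gene Ontology cellular component and biological process terms), and a pair passes if the two proteins share at least one term in each table. The function names, protein IDs, and terms are illustrative only and do not reproduce the authors' implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical annotation table: protein ID -> set of annotation terms.
# These could be, e.g., GO cellular component or biological process terms
# (an assumption; the source only says "functional annotation filters").
Annotations = Dict[str, Set[str]]


def shares_term(a: str, b: str, annotations: Annotations) -> bool:
    """Return True if proteins a and b have at least one annotation term in common."""
    # Proteins missing from the table are treated as having no terms.
    return bool(annotations.get(a, set()) & annotations.get(b, set()))


def filter_candidate_pairs(
    pairs: Iterable[Tuple[str, str]],
    compartments: Annotations,
    processes: Annotations,
) -> List[Tuple[str, str]]:
    """Keep only pairs that are co-localized AND share a biological process."""
    return [
        (a, b)
        for a, b in pairs
        if shares_term(a, b, compartments) and shares_term(a, b, processes)
    ]


if __name__ == "__main__":
    # Illustrative toy annotations (hypothetical, not real E. coli data).
    compartments = {
        "P1": {"cytoplasm"},
        "P2": {"cytoplasm", "membrane"},
        "P3": {"periplasm"},
    }
    processes = {
        "P1": {"glycolysis"},
        "P2": {"glycolysis"},
        "P3": {"glycolysis"},
    }
    candidates = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
    print(filter_candidate_pairs(candidates, compartments, processes))
    # → [('P1', 'P2')]
```

Here only P1 and P2 survive, because P3 is annotated to a different compartment even though all three proteins share a process. Requiring both conditions trades some sensitivity for fewer false positives, which matches the direction of the accuracy gain reported in the abstract.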
The modeling protocol described in this communication can be applied to detect protein-protein interactions in other organisms and pathways, as well as to construct dimer structures and to estimate the confidence of protein interactions identified experimentally with high-throughput techniques.

Electronic supplementary material
The online version of this article (doi:10.1186/s12859-017-1675-z) contains supplementary material, which is available to authorized users.