Background
Accurate determination of protein complexes is crucial for understanding cellular
organization and function. High-throughput experimental techniques have generated
a large amount of protein-protein interaction (PPI) data, allowing prediction of
protein complexes from PPI networks. However, high-throughput data often
include false positives and false negatives, making accurate prediction of
protein complexes difficult.

Method
The biomedical literature contains large quantities of PPI data that, along with
high-throughput experimental PPI data, are valuable for protein complex
prediction. In this study, we employ a natural language processing technique to
extract PPI data from the biomedical literature. These data are subsequently
integrated with high-throughput PPI and gene ontology data by constructing
attributed PPI networks, and a novel method for predicting protein complexes from
the attributed PPI networks is proposed. This method allows calculation of the
relative contribution of high-throughput and biomedical literature PPI data.

Results
Many well-characterized protein complexes are accurately predicted by this method
when applied to two different yeast PPI datasets. The results show that (i)
biomedical literature PPI data can effectively improve the performance of protein
complex prediction; and (ii) our method makes good use of high-throughput and
biomedical literature PPI data along with gene ontology data to achieve
state-of-the-art protein complex prediction capabilities.