Highlights
• Global phage diversity is now primarily explored through cultivation-independent metagenomics
• Metagenome-derived phages are typically linked to their host(s) via in silico predictions
• Multiple alignment-dependent and alignment-free methods for host predictions have been proposed
• Recent integrative approaches combining several methods into a single prediction seem most promising
• Eventually, complementary in silico predictions and in vitro assays will enable the reconstruction of entire phage-host networks
Abstract
Bacterial communities play critical roles across all of Earth's biomes, affecting human health and global ecosystem functioning. They do so under strong constraints exerted by viruses, i.e., bacteriophages or "phages". Phages can reshape the structure of bacterial communities, influence the long-term evolution of bacterial populations, and alter host cell metabolism during infection. Metagenomic approaches, i.e., shotgun sequencing of environmental DNA or RNA, have recently enabled large-scale exploration of phage genomic diversity, yielding several million phage genomes that now await further analysis and characterization. One major challenge, however, is the lack of direct host information for these phages. Several methods and tools have been proposed to bioinformatically predict the potential host(s) of uncultivated phages based only on genome sequence information. Here we review these different approaches and highlight their distinct strengths and limitations. We also outline complementary experimental assays that are being proposed to validate and refine these bioinformatic predictions.