Horizontal transfer of genomic elements is an essential force that shapes microbial genome evolution. This process occurs via various mechanisms and has been studied in detail for a variety of biological systems. However, a coarse-grained, global picture of horizontal gene transfer (HGT) in the microbial world is still missing. One reason is the difficulty of processing large amounts of microbial genomic data to find and characterize HGT events, especially for highly distant organisms. Here, we exploit the fact that HGT between distant species creates long identical DNA sequences in their genomes, which can be found efficiently using alignment-free methods. We analyzed over 90,000 bacterial genomes and thus identified over 100,000 events of HGT. We further develop a mathematical model to analyze the statistical properties of those long exact matches and thus estimate the transfer rate between any pair of taxa. Our results demonstrate that long-distance gene exchange (across phyla) is very frequent, as more than 8% of the bacterial genomes analyzed have been involved in at least one such event. Finally, we confirm that the function of the transferred sequences strongly impacts the transfer rate, as we observe a variation of 3.5 orders of magnitude between the most and the least transferred categories. Overall, we provide a unique view of horizontal transfer across the bacterial tree of life, illuminating one fundamental process driving bacterial evolution.

Microbial genomes are subject to loss and gain of genetic material from other organisms [5, 59], via a variety of mechanisms: conjugation, transduction, and transformation, collectively known as horizontal gene transfer (HGT) [69, 26]. The exchange of genetic material is a key driver of microbial evolution that allows rapid adaptation to local niches [6].
Gene acquisition via HGT can provide microbes with adaptive traits, conferring a selective advantage in particular conditions [34, 42], and can eliminate deleterious mutations, resolving Muller's ratchet paradox [70].
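
To illustrate why long identical sequences shared by distant species are informative, the following minimal sketch estimates how many long exact matches one would expect by chance alone. It assumes a deliberately crude null model that is not the mathematical model developed in this work: an ungapped alignment of two genomes in which every site matches independently with the same probability. The function name `expected_long_runs` and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def expected_long_runs(genome_length: int, identity: float, min_run: int) -> float:
    """Expected number of maximal exact-match runs of at least `min_run` bases.

    Assumed toy model (not the authors' model): an ungapped alignment of
    `genome_length` sites, each matching independently with probability
    `identity`. Edge effects at the alignment ends are ignored.
    """
    if genome_length <= 0 or min_run <= 0:
        raise ValueError("genome_length and min_run must be positive")
    if not 0.0 < identity < 1.0:
        raise ValueError("identity must lie strictly between 0 and 1")
    # A maximal run begins right after a mismatch (probability 1 - identity)
    # and then needs `min_run` consecutive matches (probability identity**min_run).
    # Working in log space keeps the intermediate values well behaved.
    log_expectation = (
        math.log(genome_length)
        + math.log1p(-identity)
        + min_run * math.log(identity)
    )
    return math.exp(log_expectation)


if __name__ == "__main__":
    # Illustrative values only: a 5 Mbp alignment at 70% identity.
    for run_length in (20, 50, 300):
        count = expected_long_runs(5_000_000, 0.70, run_length)
        print(f"runs >= {run_length} bp expected by chance: {count:.3g}")
```

Under these assumptions the expected count falls off geometrically with the run length, so an exact match of a few hundred base pairs between divergent genomes is very unlikely to arise from vertical inheritance alone. Real genomes violate the independence assumption (conserved genes, low-complexity regions), so the sketch conveys the intuition rather than a usable significance test.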
Since the discovery of HGT more than 50 years ago [24], many cases of HGT have been intensively studied. Several methods to infer HGT rely on identifying shifts in (oligo-)nucleotide composition along genomes [62]. Other methods are based on discrepancies between gene and species distances, i.e., surprising similarity between genomic regions belonging to distant organisms that cannot be satisfactorily explained by their conservation [38, 49, 35, 51, 18, 19, 9]. For example, genomes from different genera are typically up to 60–70% identical, meaning that roughly one in every three base pairs is expected to differ. The presence of regions in different genomes that are significantly more similar than expected can, therefore, be interpreted as recent HGT events. Using such methods, the transfer of drug- and metal-resistance genes [31], toxin-antitoxin systems [72], and virulence factors [22, 50] has been observed numerous times. It is also known that some bacterial taxa, such as members of t...