Background: Taxonomic classification of microbiomes has provided tremendous insight into the underlying genome dynamics of microbial communities, but it has relied on known microbial genomes contained in curated reference databases.
Methods: We propose K-core graph decomposition, implemented in our tool KOMB, as a novel taxonomy-oblivious approach for tracking metagenome dynamics. K-core decomposition hierarchically partitions the graph into shells. The K-core is the maximal subgraph in which every node has degree at least K, and the K-shell contains the nodes that belong to the K-core but not to the (K+1)-core. The decomposition runs in O(V + E) time.
Results: The results of the paper are two-fold: (1) KOMB can efficiently identify homologous regions in metagenomes, and (2) KOMB reveals community profiles that capture intra- and inter-genome dynamics, as supported by our results on simulated, synthetic, and real data.

Background

Graph-based representations and analyses have paved the way for several advances in computational biology over the last few decades [1][2][3]. This is particularly evident in the progress made in genome assembly, both for isolate genomes [4,5] and for metagenomes, as well as in the efficient detection of structural variants [6-8] using genome graphs [9][10][11]. Indeed, state-of-the-art graph-based metagenome assemblers [12-15] have achieved remarkable improvements in both run time and accuracy in recent years [16] through the use of efficient data structures and clever heuristics. Recent examples include compacted de Bruijn graph construction and traversal for assembly [17,18], as well as scaffold graphs for metagenomic samples that can generate scaffolds from contiguous overlapping sequences ...

Balaji et al. Page 2 of 34

... regions into various shells that can then be used to analyze genomic variation in the sample. We show that the distribution of nodes could lead to a new methodology that describes metagenomic community structure based on sample-specific signatures obtained from KOMB profiles.
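To make the shell definition and the O(V + E) bound concrete, here is a minimal Python sketch of the standard bucket-based peeling algorithm for core numbers (in the style of Batagelj and Zaversnik). It is a generic illustration, not KOMB's implementation. The helper names `build_adjacency`, `core_numbers` and `shell_sizes` are hypothetical. Treating a sample's profile as the number of nodes in each K-shell is our simplifying assumption and is not necessarily how KOMB defines its profiles.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """Core number of every node via bucket-based peeling, O(V + E).

    The core number of a node is the largest K such that the node belongs
    to the K-core; nodes with core number K form the K-shell.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    # buckets[d] holds the not-yet-removed nodes whose current degree is d.
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    core = {}
    k = 0
    for _ in range(len(adj)):
        # Remaining degrees never drop below k, so k only moves upward.
        while not buckets[k]:
            k += 1
        v = buckets[k].pop()
        core[v] = k
        # Peeling v lowers the degree of neighbours still above level k.
        for u in adj[v]:
            if u not in core and degree[u] > k:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return core


def shell_sizes(core):
    """Number of nodes in each K-shell, keyed by K in ascending order."""
    sizes = defaultdict(int)
    for k in core.values():
        sizes[k] += 1
    return dict(sorted(sizes.items()))


# Toy graph: a triangle a-b-c with a pendant node d attached to a.
core = core_numbers(build_adjacency([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))
print(sorted(core.items()))  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
print(shell_sizes(core))     # {1: 1, 2: 3}
```

The buckets are what yield the linear bound. Each edge is inspected at most twice, each inspection causes at most one constant-time bucket move (assuming constant-time hash-set operations), and the level pointer `k` only ever increases.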
In Methods, found towards the end of the manuscript, we describe the pipeline of the tool, explain unitig graph construction, and elaborate on the concept of K-core decomposition. In the Results section, we provide a rigorous validation of our novel K-core decomposition tool KOMB as applied to unitig graphs constructed from simulated data as well as synthetic and real metagenomes. We demonstrate its effectiveness in identifying repetitive regions across sample types and sizes and illustrate how KOMB profiles can be used to visualize community structure. Finally, in the Discussion and Conclusions, we cover the salient points and main conclusions from our study and lay out future directions of our research.
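KOMB's own unitig graph construction is described in Methods. As a generic illustration of what a unitig is, here is a minimal Python sketch that builds a node-centric de Bruijn graph from k-mers and compacts its maximal non-branching paths into unitigs. It then links unitig i to unitig j when the last k-mer of i is followed by the first k-mer of j. This sketch makes several assumptions:

- The reads are error-free.
- Reverse complements are ignored.
- All function names are hypothetical.

KOMB's edge definition may differ from the one used here.

```python
ALPHABET = "ACGT"


def kmers_from_reads(reads, k):
    """Distinct k-mers of the reads (forward strand only)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(km, kmers):
    return [km[1:] + c for c in ALPHABET if km[1:] + c in kmers]


def predecessors(km, kmers):
    return [c + km[:-1] for c in ALPHABET if c + km[:-1] in kmers]


def unique_next(km, kmers):
    """Next k-mer if km -> next is a non-branching link, else None."""
    succ = successors(km, kmers)
    if len(succ) == 1 and succ[0] != km and len(predecessors(succ[0], kmers)) == 1:
        return succ[0]
    return None


def build_unitigs(kmers):
    """Compact maximal non-branching k-mer paths; returns (sequences, paths)."""
    visited, paths = set(), []

    def walk(start):
        path = [start]
        visited.add(start)
        nxt = unique_next(start, kmers)
        while nxt is not None and nxt not in visited:
            path.append(nxt)
            visited.add(nxt)
            nxt = unique_next(nxt, kmers)
        paths.append(path)

    # First pass: start a walk at every k-mer that does not continue a
    # non-branching link from its predecessor.
    for km in sorted(kmers):
        preds = predecessors(km, kmers)
        interior = len(preds) == 1 and unique_next(preds[0], kmers) == km
        if not interior and km not in visited:
            walk(km)
    # Second pass: any k-mer still unvisited lies on an isolated cycle.
    for km in sorted(kmers):
        if km not in visited:
            walk(km)
    # Spell each path: first k-mer plus the last base of every later k-mer.
    seqs = [p[0] + "".join(x[-1] for x in p[1:]) for p in paths]
    return seqs, paths


def unitig_graph(paths, kmers):
    """Edge i -> j when unitig i's last k-mer is followed by unitig j's first."""
    first = {p[0]: i for i, p in enumerate(paths)}
    return {(i, first[s]) for i, p in enumerate(paths)
            for s in successors(p[-1], kmers) if s in first}


kmers = kmers_from_reads(["ACGTC", "ACGGA"], k=3)
seqs, paths = build_unitigs(kmers)
print(seqs)                                # ['ACG', 'CGGA', 'CGTC']
print(sorted(unitig_graph(paths, kmers)))  # [(0, 1), (0, 2)]
```

In the toy example, the two reads share the prefix `ACG` and then diverge. This produces one short unitig that branches into two others. Practical assemblers also merge each k-mer with its reverse complement and must tolerate sequencing errors. Both steps are omitted here for brevity.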
Results
We present a thorough validation of KOMB as applied to various simulated, synthetic, and real datasets. We do this through three major sets of experiments.
First, we demonstrate the efficacy of applying the K-core decomposition algorithm in genomics by testing it on simulated genomes constructed as random sequences to wh...