“…To this end, the de Bruijn graph has become an object of central importance in many genomic analysis tasks. While it was initially used mostly in the context of genome (and transcriptome) assembly (EULER (Pevzner et al, 2001), EULER-SR (Chaisson and Pevzner, 2008), Velvet (Zerbino and Birney, 2008; Zerbino et al, 2009), ALLPATHS (Butler et al, 2008; MacCallum et al, 2009), ABySS (Simpson et al, 2009), Trans-ABySS (Robertson et al, 2010), SPAdes (Bankevich et al, 2012), Minia (Chikhi and Rizk, 2013), SOAPdenovo (Li et al, 2010; Luo et al, 2015)), it has seen increasing use in comparative genomics (Cortex (Iqbal et al, 2012), DISCOSNP (Uricaru et al, 2014), Scalpel (Fang et al, 2016), BubbZ (Minkin and Medvedev, 2020)) and in the indexing of genomic data, whether from raw sequencing reads (Vari (Muggli et al, 2017), Mantis (Pandey et al, 2018; Almodaresi et al, 2019), VariMerge (Muggli et al, 2019), MetaGraph (Karasikov et al, 2020)), from assembled reference sequences (deBGA (Liu et al, 2016), Pufferfish (Almodaresi et al, 2018), deSALT (Liu et al, 2019)), or from both (BLight (Marchet et al, 2019), Bifrost (Holley and Melsted, 2020)). These latter applications most frequently use the (colored) compacted de Bruijn graph, a variant of the de Bruijn graph in which maximal non-branching paths (also referred to as unitigs) are condensed into single vertices of the underlying graph structure.…”
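To make the notion of compaction concrete, here is a minimal Python sketch that builds a de Bruijn graph from reads and condenses its maximal non-branching paths into unitig strings. It rests on several simplifying assumptions that are not taken from the text above: vertices are distinct k-mers, an edge joins two k-mers overlapping by k−1 characters, only the forward strand is used (no reverse complements or canonical k-mers, which real tools handle), and there are no colors. The helper names `kmers_from_reads` and `compact` are hypothetical and do not correspond to any of the cited tools.

```python
from typing import List, Optional, Set

ALPHABET = "ACGT"


def kmers_from_reads(reads: List[str], k: int) -> Set[str]:
    """Collect every distinct k-mer (forward strand only; a simplifying assumption)."""
    out: Set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            out.add(read[i:i + k])
    return out


def successors(kmer: str, kmers: Set[str]) -> List[str]:
    """k-mers that overlap `kmer` by k-1 characters on its right."""
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer: str, kmers: Set[str]) -> List[str]:
    """k-mers that overlap `kmer` by k-1 characters on its left."""
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def compact(kmers: Set[str]) -> List[str]:
    """Condense maximal non-branching paths of the k-mer graph into unitig strings."""

    def next_in_path(u: str) -> Optional[str]:
        # Edge u -> v is mergeable only if u has exactly one successor v,
        # v has exactly one predecessor (u), and v is not a self-loop on u.
        succ = successors(u, kmers)
        if len(succ) != 1:
            return None
        v = succ[0]
        if v == u or len(predecessors(v, kmers)) != 1:
            return None
        return v

    def prev_in_path(v: str) -> Optional[str]:
        # Mirror of next_in_path: prev_in_path(v) == u exactly when next_in_path(u) == v.
        pred = predecessors(v, kmers)
        if len(pred) != 1:
            return None
        u = pred[0]
        if u == v or len(successors(u, kmers)) != 1:
            return None
        return u

    visited: Set[str] = set()
    unitigs: List[str] = []

    def walk(start: str) -> str:
        # Extend forward, appending one character per merged k-mer.
        seq, cur = start, start
        visited.add(start)
        while True:
            nxt = next_in_path(cur)
            if nxt is None or nxt in visited:
                break
            seq += nxt[-1]
            visited.add(nxt)
            cur = nxt
        return seq

    # A unitig starts at every k-mer that cannot be merged with a predecessor.
    for km in sorted(kmers):
        if prev_in_path(km) is None:
            unitigs.append(walk(km))
    # Any k-mer still unvisited lies on an isolated cycle; break it at any vertex.
    for km in sorted(kmers):
        if km not in visited:
            unitigs.append(walk(km))
    return unitigs


if __name__ == "__main__":
    # Two reads sharing the prefix "AACG" create a branch after the k-mer "ACG".
    print(compact(kmers_from_reads(["AACGT", "AACGG"], 3)))
```

With k = 3, the two reads yield the k-mers AAC, ACG, CGG and CGT. Because ACG has two successors, the path AAC → ACG ends there and becomes the unitig `AACG`, while CGG and CGT each form a single-k-mer unitig, so the script prints `['AACG', 'CGG', 'CGT']`. Each unitig would be one vertex in the compacted graph; a colored variant would additionally record which input sequences contain each k-mer.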