“…First, we downloaded the virus integration site information from VISDB (Tang et al, 2020) and lifted it over to hg19 with the liftOver tool from the UCSC Genome Browser (Navarro Gonzalez et al, 2021), because FusionAI was trained on hg19 sequence. We integrated 13 types of repeats (Alu repeats, A-Phased repeats, Directed repeats, DNA transposons, G-Quadruplex forming repeats, Inverted repeats, L1 repeats, L2 repeats, “Low_complexity, A/T rich regions”, Microsatellites, MIR repeats, Mirror repeats, and Z-DNA motifs) from RepeatMasker (Bao et al, 2015) and the MicroSatellite DataBase (MSDB) (Avvaru et al, 2020). For the diverse types of structural variants, including copy number variants, we downloaded the arranged breakpoint information from dbVar (Lappalainen et al, 2013).…”
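The liftover step in the passage above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the integration sites are available as 1-based single-base positions and that the source assembly is hg38; the excerpt states neither. The names `sites_to_bed` and `liftover_command`, the file names, and the example coordinates are all hypothetical. The sketch only prepares BED input and assembles the UCSC `liftOver` command line (`liftOver oldFile map.chain newFile unMapped`); it does not run the tool.

```python
from typing import Iterable, List, Tuple

# (chromosome, 1-based position, label); this layout is an assumption,
# not the actual VISDB export format.
Site = Tuple[str, int, str]


def sites_to_bed(sites: Iterable[Site]) -> List[str]:
    """Convert 1-based single-base sites to 0-based, half-open BED lines."""
    lines = []
    for chrom, pos, label in sites:
        if pos < 1:
            raise ValueError(f"position must be >= 1, got {pos}")
        # BED start is 0-based and the end is exclusive, so a 1-based
        # position p becomes the interval [p - 1, p).
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{label}")
    return lines


def liftover_command(bed_in: str, chain: str, bed_out: str, unmapped: str) -> List[str]:
    """Build the argument list for the UCSC liftOver command-line tool."""
    return ["liftOver", bed_in, chain, bed_out, unmapped]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [("chr1", 1000, "site_A"), ("chr2", 1, "site_B")]
    print("\n".join(sites_to_bed(example)))
    # The chain file name assumes an hg38 source assembly.
    print(" ".join(liftover_command(
        "sites.bed", "hg38ToHg19.over.chain.gz",
        "sites.hg19.bed", "sites.unmapped.bed")))
```

Sites that liftOver cannot map are written to the unmapped file, so it is worth checking how many records are lost before any downstream use of the hg19 coordinates.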