Cassava brown streak disease (CBSD), caused by cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV), is one of the most devastating threats to cassava production in Africa, including Rwanda, where a severe epidemic in 2014 reduced cassava production from 3.3 million to 900,000 tonnes (1). Studying viral genetic diversity at the genome level is essential for disease management, as it can provide valuable information on the origin and dynamics of epidemics. To address the current lack of genome-based diversity studies of UCBSV, we performed a nationwide survey of cassava ipomovirus genomic sequences in Rwanda using high-throughput sequencing (HTS) of pooled plants sampled from 130 cassava fields in 13 cassava-producing districts, spanning seven agro-ecological zones with contrasting climatic conditions and different cassava cultivars. HTS enabled the assembly of a nearly complete UCBSV consensus genome for 12 of the 13 districts. Phylogenetic analysis revealed high similarity among UCBSV genome sequences, with a maximum nucleotide divergence of 0.8% between genomes. To explore genome diversity beyond the consensus sequences, we conducted an in-depth analysis based on single nucleotide polymorphisms (SNPs). To validate the results, a panel of SNPs was first confirmed by independent RT-PCR and Sanger sequencing.
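To make the divergence figure concrete, the following is a minimal sketch of one common way to compare aligned consensus genomes: the uncorrected p-distance (proportion of differing sites). It assumes the genomes have already been aligned to equal length and that gaps and ambiguous bases are skipped. The abstract does not state which distance measure or software was used, and the function names, district labels, and toy alignment below are hypothetical.

```python
from itertools import combinations

# Characters treated as non-informative and skipped when comparing sites (assumption).
SKIP = set("-N?")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences, ignoring positions where either has a gap or N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIP or b in SKIP:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def max_pairwise_divergence(alignment: dict[str, str]) -> tuple[str, str, float]:
    """Return the most divergent pair of sequences and their p-distance."""
    best = None
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        d = p_distance(seq_a, seq_b)
        if best is None or d > best[2]:
            best = (name_a, name_b, d)
    if best is None:
        raise ValueError("need at least two sequences")
    return best


if __name__ == "__main__":
    # Hypothetical toy alignment (not real UCBSV data).
    toy = {
        "district_A": "ACGTACGTAC",
        "district_B": "ACGTACGTTC",
        "district_C": "ACGAACGTTC",
    }
    a, b, d = max_pairwise_divergence(toy)
    print(f"{a} vs {b}: {d:.1%} divergence")  # prints: district_A vs district_C: 20.0% divergence
```

On real genome alignments, a maximum divergence of 0.8% would appear here as a p-distance of about 0.008; model-corrected distances used in phylogenetics would give slightly different values.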
Combining fixation index (FST) calculation with principal component analysis (PCA) of SNP patterns identified three geographically clustered UCBSV haplotypes. Haplotype 2 (H2) was restricted to the central regions, where the cultivar NAROCAS 1 is predominantly grown, and RT-PCR and Sanger sequencing of individual NAROCAS 1 plants confirmed their association with H2. Haplotype 1 (H1) was widespread, with 100% occurrence in the Eastern region, whereas haplotype 3 (H3) was found only in the Western region. The associations of these haplotypes with specific cultivars or regions require further confirmation. Our results show that a much more complex picture of genetic diversity can be deciphered beyond the consensus sequences, with practical implications for virus epidemiology, evolution, and disease management. Our methodology provides a high-resolution analysis of genome diversity beyond the consensus sequence, both between and within samples, and can be applied at various scales, from individual plants to pooled samples of virus-infected plants. Our findings also show how subtle genetic differences can shed light on the potential impact of agricultural practices, as the presence and frequency of a virus haplotype may be correlated with the dissemination and adoption of improved cultivars.
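The SNP-based FST and PCA steps described above can be illustrated with a small sketch. It assumes each pooled sample has already been reduced to alternate-allele frequencies at a shared set of biallelic SNP positions. The FST shown is Nei's heterozygosity-based estimator (GST-style, combined as a ratio of averages over sites), and PC1 is obtained by power iteration. Both are chosen here for illustration only: the abstract does not specify the estimator or PCA implementation, and dedicated pool-seq tools apply read-sampling corrections that this sketch omits. Function names and pool frequencies are hypothetical.

```python
from statistics import mean


def pairwise_fst(freqs_a: list[float], freqs_b: list[float]) -> float:
    """Nei-style FST between two pooled samples over many biallelic SNPs.

    freqs_a, freqs_b: alternate-allele frequency (0..1) at the same SNP
    positions in each pool. Combined as sum(HT - HS) / sum(HT).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both pools must cover the same SNP positions")
    numerator = denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        ht = 2 * p_bar * (1 - p_bar)                       # total expected heterozygosity
        hs = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2   # mean within-pool heterozygosity
        numerator += ht - hs
        denominator += ht
    return 0.0 if denominator == 0 else numerator / denominator


def pc1_scores(matrix: list[list[float]], iterations: int = 200) -> list[float]:
    """PC1 score of each sample (row) in a samples x SNPs frequency matrix,
    via power iteration on the Gram matrix of the column-centred data.
    The sign of PC1 is arbitrary."""
    n_snps = len(matrix[0])
    col_means = [mean(row[j] for row in matrix) for j in range(n_snps)]
    centred = [[row[j] - col_means[j] for j in range(n_snps)] for row in matrix]
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in centred] for r1 in centred]
    v = [1.0 + 0.1 * i for i in range(len(matrix))]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            return [0.0] * len(matrix)  # no variation among samples
        v = [x / norm for x in w]
    # Rayleigh quotient gives the top eigenvalue (squared singular value).
    eigenvalue = sum(v[i] * sum(g * x for g, x in zip(gram[i], v)) for i in range(len(v)))
    return [x * max(eigenvalue, 0.0) ** 0.5 for x in v]


if __name__ == "__main__":
    # Hypothetical alternate-allele frequencies at four SNPs in four pools.
    pools = {
        "pool_1": [0.95, 0.05, 0.90, 0.10],
        "pool_2": [0.90, 0.10, 0.95, 0.05],
        "pool_3": [0.05, 0.90, 0.10, 0.95],
        "pool_4": [0.10, 0.95, 0.05, 0.90],
    }
    print(f"FST pool_1 vs pool_2: {pairwise_fst(pools['pool_1'], pools['pool_2']):.3f}")
    print(f"FST pool_1 vs pool_3: {pairwise_fst(pools['pool_1'], pools['pool_3']):.3f}")
    for name, score in zip(pools, pc1_scores(list(pools.values()))):
        print(f"{name}: PC1 = {score:+.3f}")
```

In this toy example, pools with similar SNP frequency patterns have a low pairwise FST and fall on the same side of PC1, which is the kind of signal used to group samples into haplotypes.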