Cacao swollen shoot virus (CSSV) is a member of the family Caulimoviridae, genus Badnavirus, and is naturally transmitted to Theobroma cacao L. by several mealybug species. CSSV populations in West African countries are highly variable and are genetically structured into several groups based on the diversity of the first part of ORF3, which encodes the movement protein. To unravel the extent of isolate diversity and to address the problems of low viral titer and mixed viral sequences in samples, we used Illumina MiSeq and HiSeq sequencing. We reconstructed de novo 20 new complete genomes from cacao samples collected in the Cocoa Research Institute of Ghana (CRIG) Museum and from field samples collected in Côte d'Ivoire and Ghana. Based on the 20% threshold of nucleotide divergence in the reverse transcriptase/ribonuclease H (RT/RNase H) region, which is the species demarcation criterion, we conclude that seven new species are associated with cacao swollen shoot disease (CSSD). Together with the three previously described species, this brings the total number of viral species in the complex associated with the disease to ten. A sample from Sri Lanka showing leaf symptoms similar to those of CSSD-affected plants in West Africa was also included in the study; its sequence represents the genome of a new virus named cacao bacilliform SriLanka virus (CBSLV).
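The species demarcation step above amounts to comparing aligned RT/RNase H sequences pairwise against the 20% divergence threshold. The Python sketch below is not the authors' pipeline. It assumes the sequences are already aligned, uses an uncorrected p-distance that skips gap columns, and treats a divergence of exactly 20% as the same species. The function names and the toy reference dictionary are hypothetical.

```python
from typing import Dict, Optional


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return diffs / compared


def assign_species(query: str, references: Dict[str, str],
                   threshold: float = 0.20) -> Optional[str]:
    """Return the closest reference within the threshold, else None.

    Assumption: divergence <= threshold means the same species; None means
    the query exceeds the threshold against every reference (candidate new species).
    """
    best_name, best_d = None, None
    for name, seq in references.items():
        d = p_distance(query, seq)
        if d <= threshold and (best_d is None or d < best_d):
            best_name, best_d = name, d
    return best_name
```

For example, a 10-site query differing from its only reference at one site (10% divergence) is assigned to that reference, while one differing at three sites (30%) returns `None`. Published analyses typically work from curated multiple alignments of the RT/RNase H region, so this sketch only illustrates the thresholding logic.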
Motivation
To keep up with the scale of genomic databases, several methods rely on locality-sensitive hashing to efficiently find potential matches within large genome collections. Existing solutions rely on MinHash or HyperLogLog fingerprints and must read the whole index to perform a query. Such solutions do not scale with the growing number of documents to index.
Results
We present NIQKI, a novel structure using well-designed fingerprints that yield theoretical and practical query-time improvements, outperforming the state of the art by orders of magnitude. Our contribution is threefold. First, we generalize the concept of HyperMinHash fingerprints into (h,m)-HMH fingerprints, which can be tuned to achieve the lowest false positive rate given the expected sub-sampling. Second, we provide a structure based on inverted indexes that can index any kind of fingerprint and answers queries optimally, namely in time linear in the size of the output. Third, we implemented these approaches in a tool dubbed NIQKI that can index over one million bacterial genomes from GenBank and compute their pairwise distances in a matter of days on a small cluster. We show that our approach can be orders of magnitude faster than the state of the art with comparable precision. We believe that this approach can bring major improvements, allowing fast queries that scale to extensive genomic databases.
Availability and implementation
We wrote the NIQKI index as an open-source C++ library under the AGPL3 license, available at https://github.com/Malfoy/NIQKI. It is designed as a user-friendly tool and comes with usage examples.
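The combination of fingerprints with an inverted index described in the Results can be illustrated with a minimal Python sketch. It is not NIQKI's C++ code and does not implement (h,m)-HMH fingerprints. Instead, it substitutes a simpler one-permutation MinHash in which each bucket keeps its minimum k-mer hash, truncated to `m_bits` bits (a b-bit MinHash-style compaction). The blake2b hash, `k`, the bucket count, forward-strand-only k-mers, and all names are hypothetical choices. Each (bucket, value) pair points to a posting list of document ids, so a query visits only documents that share at least one fingerprint value instead of scanning every stored fingerprint.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional

Fingerprint = List[Optional[int]]


def kmer_hash(kmer: str) -> int:
    # Deterministic 64-bit hash of a k-mer (hypothetical choice: blake2b).
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def fingerprint(seq: str, k: int = 11, n_buckets: int = 64,
                m_bits: int = 8) -> Fingerprint:
    """One-permutation MinHash over the forward-strand k-mers of seq.

    The hash modulo n_buckets picks a bucket; each bucket keeps the minimum
    of the remaining hash bits, then only its low m_bits are stored.
    Buckets that receive no k-mer stay None.
    """
    mins: Fingerprint = [None] * n_buckets
    for i in range(len(seq) - k + 1):
        h = kmer_hash(seq[i:i + k])
        bucket, value = h % n_buckets, h // n_buckets
        current = mins[bucket]
        if current is None or value < current:
            mins[bucket] = value
    mask = (1 << m_bits) - 1
    return [None if v is None else v & mask for v in mins]


class InvertedIndex:
    """Maps (bucket, value) to the ids of documents holding that value."""

    def __init__(self, n_buckets: int = 64) -> None:
        # One posting table per bucket: value -> list of document ids.
        self.tables: List[Dict[int, List[int]]] = [
            defaultdict(list) for _ in range(n_buckets)
        ]
        self.n_docs = 0

    def add(self, fp: Fingerprint) -> int:
        doc_id = self.n_docs
        for bucket, value in enumerate(fp):
            if value is not None:
                self.tables[bucket][value].append(doc_id)
        self.n_docs += 1
        return doc_id

    def query(self, fp: Fingerprint) -> Dict[int, int]:
        """Count, per indexed document, the buckets it shares with fp.

        Cost is one lookup per bucket plus one step per retrieved posting,
        so documents sharing no value with fp are never touched.
        """
        hits: Dict[int, int] = defaultdict(int)
        for bucket, value in enumerate(fp):
            if value is None:
                continue
            # dict.get does not insert missing keys into the defaultdict.
            for doc_id in self.tables[bucket].get(value, ()):
                hits[doc_id] += 1
        return dict(hits)
```

Dividing a document's hit count by the number of non-empty buckets gives a rough Jaccard-similarity estimate. Lowering `m_bits` shrinks the index but increases spurious matches between unrelated genomes. This is the kind of size versus false-positive trade-off that the tunable fingerprints in the abstract address.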