2017
DOI: 10.1101/179960
Preprint

Skip-mers: increasing entropy and sensitivity to detect conserved genic regions with simple cyclic q-grams

Abstract: Bioinformatic analyses and tools make extensive use of k-mers (fixed contiguous strings of k nucleotides) as an informational unit. K-mer analyses are both useful and fast, but are strongly affected by single-nucleotide polymorphisms or sequencing errors, effectively hindering direct analyses of whole regions and decreasing their usability between evolutionarily distant samples. Q-grams or spaced seeds, subsequences generated with a pattern of used-and-skipped nucleotides, overcome many of these limitations but…
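As an illustration of the contrast the abstract draws, the following minimal sketch compares contiguous k-mers with a cyclic used-and-skipped pattern. It is not the authors' implementation: the mask `110110110`, the toy sequence, and the choice to place substitutions at every third base (as can happen at codon third positions) are assumptions made for this example, and the helper names `windows` and `shared_positions` are hypothetical.

```python
def windows(seq, mask):
    """Yield the q-gram at each start position, keeping bases where mask is '1'."""
    used = [j for j, c in enumerate(mask) if c == "1"]
    for i in range(len(seq) - len(mask) + 1):
        yield "".join(seq[i + j] for j in used)


def shared_positions(a, b, mask):
    """Count start positions where both sequences give the same q-gram."""
    return sum(x == y for x, y in zip(windows(a, mask), windows(b, mask)))


# Toy 36-base sequence (assumption, not data from the preprint).
ref = "ATGGCTAAAGTTCTGGATCCGTTTACCGGTAAATAA"

# Substitute every third base so that the two sequences differ periodically.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}
alt = "".join(SWAP[c] if i % 3 == 2 else c for i, c in enumerate(ref))

contiguous = "111111"    # ordinary 6-mer: six used bases in a row
spaced = "110110110"     # six used bases, every third base skipped

print(shared_positions(ref, alt, contiguous))  # 0: every 6-mer spans a substitution
print(shared_positions(ref, alt, spaced))      # 10: windows whose skipped bases absorb the changes
```

Both patterns sample six bases per window, so the difference in shared windows comes only from where the skipped positions fall relative to the substitutions.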

Cited by 4 publications (3 citation statements)
References 24 publications (20 reference statements)
“…Extending the generalised trio binning approach to not only consider unique k-mers but also k-mer frequencies is another interesting option. Further, all k-mer based methods could be used with skip-mers instead, a concept to include information from more distant genomic positions and to decrease the impact of SNVs (Clavijo et al 2017).…”
Section: Discussion (mentioning)
confidence: 99%
“…Taking each genome in turn as a reference, kmers of length 51 were identified from genic regions using the annotation for that reference. These kmers were used to search the genomes of the other cultivars and a coverage score was computed 52 between each gene in the reference and every other genome. The coverage score (a value between 0 and 1) can be used as a proxy for sequence similarity/difference between genes in different cultivars where values closer to 0 indicate greater difference and values closer to 1 indicate similarity.…”
Section: Methods (mentioning)
confidence: 99%
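The quoted statement does not give the formula for its coverage score, so the following is only one plausible reading, sketched under stated assumptions: the score is taken to be the fraction of a gene's distinct k-mers that also occur in the other genome. The function names `kmer_set` and `coverage_score` are hypothetical, reverse-complement matching is ignored for brevity, and the demo uses k=5 on toy strings rather than the k=51 mentioned in the quote.

```python
def kmer_set(seq, k):
    """Return the set of distinct contiguous k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage_score(gene_seq, genome_seq, k=51):
    """Fraction (0 to 1) of the gene's distinct k-mers found in the genome.

    Assumed definition for illustration; the cited paper may compute it differently.
    """
    gene_kmers = kmer_set(gene_seq, k)
    if not gene_kmers:
        return 0.0  # gene shorter than k: no k-mers to look up
    genome_kmers = kmer_set(genome_seq, k)
    return len(gene_kmers & genome_kmers) / len(gene_kmers)


# Toy inputs (assumptions): a short "gene" and two "genomes".
gene = "ACGTACGGTCA"
print(coverage_score(gene, "TT" + gene + "GG", k=5))  # gene fully present
print(coverage_score(gene, "T" * 20, k=5))            # gene absent
```

Under this reading, a value near 1 means almost every k-mer of the gene is found in the other genome, and a value near 0 means almost none is, which matches the interpretation given in the quote.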
“…The new assembly was compared to the old D. firmibasis ASM27748v1 assembly (GenBank GCA_000277485.1) 17 using satsuma2 v2016-12-07 32,33. Synteny analysis revealed that, although the old assembly is highly fragmented, there are no major regions of the new assembly entirely missing in the old assembly (Fig.…”
Section: Genome Assembly and Polishing (mentioning)
confidence: 99%