2019
DOI: 10.2174/1389202919666181026101326

Estimating the k-mer Coverage Frequencies in Genomic Datasets: A Comparative Assessment of the State-of-the-art

Abstract: Background: In bioinformatics, estimating k-mer abundance histograms, or just counting the number of unique k-mers and the number of singletons, is desirable in many genome sequence analysis applications. The applications include predicting genome sizes, data pre-processing for de Bruijn graph assembly methods (tuning runtime parameters for analysis tools), repeat detection, sequencing coverage estimation, measuring sequencing error rates, etc. Different methods for cardinality estimation in sequencing dat…
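The quantities named in the abstract can be illustrated with a minimal Python sketch. It counts k-mers exactly on toy reads, which is not one of the estimation methods the paper compares; the function names and the example reads are hypothetical, and reverse complements are not merged into canonical k-mers. "Distinct" here means the number of different k-mers observed (which the abstract appears to call unique k-mers), and singletons are k-mers seen exactly once.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(counts):
    """Map each abundance value to the number of distinct k-mers having it."""
    return Counter(counts.values())


# Toy input, made up for illustration only.
reads = ["ACGTACGT", "ACGTT"]
counts = kmer_counts(reads, k=3)
hist = abundance_histogram(counts)

distinct = len(counts)   # number of different k-mers observed
singletons = hist[1]     # k-mers that occur exactly once

print(dict(counts))
print(dict(hist))
print(distinct, singletons)
```

Exact counting like this keeps every k-mer in memory, which is why streaming estimators of the kind the abstract refers to are of interest for real sequencing datasets.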

Citations: cited by 13 publications (12 citation statements)
References: 41 publications
“…The overall genome duplication level in H. meleagridis was computed directly as the fraction of the genome that is duplicated instead of inferring this figure by indirect methods (i.e. [14]). To calculate this, we aligned the contigs to themselves with nucmer and extracted homologous regions with the show-coords tool [15].…”
Section: Results (mentioning)
confidence: 99%
“…Briefly, k‐mers are sequences of length “k” contained within the genome assembly and k‐mer profiles provide estimates of data quality, completeness, and complexity (Simpson, 2014). There are several programs available for k‐mer analyses with trade‐offs between speed, accuracy, and memory usage for analyses (Manekar & Sathe, 2019; Mapleson et al, 2017; Vurture et al, 2017). These k‐mer profiles provide critical information regarding genome assemblies and may also identify mis‐assembly problems such as repeated sequences that have been inappropriately collapsed.…”
Section: Evaluating Assemblies (mentioning)
confidence: 99%
“…for the first gaze sequence example above) are extracted as features contained in each gaze sequence with look-away fixations (see Appendix A for a detailed example). The use of k-mers to break down the gaze fixation sequences is inspired by k-mer analysis of nucleic acid sequences in bioinformatics [64,65]. Single fixations or sequences of two fixations (2-mers) hold very little information about participants' sequential gaze.…”
Section: Plos One (mentioning)
confidence: 99%