2017
DOI: 10.1093/bioinformatics/btx304

KMC 3: counting and manipulating k-mer statistics

Abstract: Supplementary data are available at Bioinformatics online.

Cited by 524 publications (450 citation statements)
References 9 publications
“…HCoV-NL63 genotype A and B sequence sets were prepared from GenBank plus the Kilifi HCoV-NL63 sequences. KMC3 [41] was used to identify all 30-nt sequences (k-mers) present in genotype A sequences and not in genotype B sequences and vice versa. Quality-controlled short read sequences from each sample were then classified as HCoV-NL63 genotype A or genotype B based on the read's content of genotype A and B-specific 30-nt kmers using a threshold of 20 kmer per read as defining identity to a genotype.…”
Section: K-mer Methods Of Genotype Classification (mentioning)
confidence: 99%
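The classification scheme described in this statement can be sketched in a few lines. This is a minimal illustration, not the cited authors' pipeline and not KMC's interface: the function names are hypothetical, k-mers are held in plain Python sets rather than a KMC database, reverse complements are ignored, and the tie-breaking rule (the genotype with more specific hits wins) is an assumption the quoted text does not state.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(group_a, group_b, k):
    """Return (k-mers only in group A, k-mers only in group B)."""
    a = set().union(*(kmers(s, k) for s in group_a))
    b = set().union(*(kmers(s, k) for s in group_b))
    return a - b, b - a


def classify_read(read, a_only, b_only, k, threshold):
    """Label a read "A" or "B" if it carries at least `threshold`
    genotype-specific k-mers; return None otherwise.

    Assumption: when both genotypes reach the threshold, the one with
    more hits wins, and an exact tie is left unclassified.
    """
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    hits_a = sum(km in a_only for km in read_kmers)
    hits_b = sum(km in b_only for km in read_kmers)
    if hits_a >= threshold and hits_a > hits_b:
        return "A"
    if hits_b >= threshold and hits_b > hits_a:
        return "B"
    return None


# Toy data with k=3 and a threshold of 2; the statement above uses
# 30-nt k-mers and a threshold of 20 k-mers per read.
a_only, b_only = specific_kmers(["ACGTACGT"], ["TTGCATTG"], k=3)
label = classify_read("ACGTA", a_only, b_only, k=3, threshold=2)
```

For real data the set difference would be computed on k-mer databases rather than in memory, which is the role KMC3 plays in the quoted workflow.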
“…The VCFs with these variants were then normalized using bcftools norm (1.9) and combined with the SVs across samples using bayesTyperTools combine to produce the input candidate set. k-mers in the raw reads were counted using kmc [41] (3.1.1) with a k-mer size of 55. A Bloom filter was constructed from these k-mers using bayesTyperTools makeBloom.…”
Section: Bayestyper (V15 Beta 62888d6) (mentioning)
confidence: 99%
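To make the Bloom filter step concrete, here is a generic sketch of the data structure: a bit array plus several hash positions per k-mer, so membership queries may give false positives but never false negatives. It does not reproduce bayesTyperTools makeBloom; the class name, the SHA-256 based double hashing, and the sizes are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from two
        # 64-bit values taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Insert a few toy k-mers; every inserted k-mer is reported as present.
counted_kmers = ["ACGTA", "CGTAC", "GTACG"]
bloom = BloomFilter(num_bits=1024, num_hashes=3)
for km in counted_kmers:
    bloom.add(km)
```

A query for a k-mer that was never inserted usually returns False, but can return True with a probability that depends on the filter size and the number of hashes.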
“…The compression of k-mer sets has not been extensively studied, except in the context of how k-mer counters store their output [17][18][19][20]. DSK [18] uses an HDF5-based encoding, KMC3 [17] combines a dense storage of prefixes with a sparse storage of suffixes, and Squeakr [20] uses a counting quotient filter [21]. The compression of read data, on the other hand, stored in either unaligned or aligned formats, has received a lot of attention [22][23][24].…”
Section: Related Work (mentioning)
confidence: 99%
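The idea of pairing a dense prefix table with a sparse suffix list can be illustrated as follows. This is a conceptual sketch under stated assumptions, not KMC3's actual on-disk format: k-mers are kept as strings, counts are omitted, and the names `build_index` and `contains` are hypothetical. The dense part has one entry for every possible prefix of length `p`, whether or not it occurs; the sparse part stores only the suffixes of k-mers that are present.

```python
from bisect import bisect_left

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(s):
    """Encode a nucleotide string as a base-4 integer."""
    value = 0
    for ch in s:
        value = value * 4 + CODE[ch]
    return value


def build_index(kmer_set, p):
    """Return (starts, suffixes) for a set of equal-length k-mers.

    starts has 4**p + 1 entries (dense: one per possible prefix);
    suffixes holds one entry per stored k-mer (sparse).
    """
    ordered = sorted(kmer_set)  # lexicographic order groups equal prefixes
    starts = [0] * (4 ** p + 1)
    for km in ordered:
        starts[encode(km[:p]) + 1] += 1
    for i in range(1, len(starts)):
        starts[i] += starts[i - 1]  # prefix sums turn counts into offsets
    suffixes = [km[p:] for km in ordered]
    return starts, suffixes


def contains(starts, suffixes, kmer, p):
    """Look up a k-mer: jump to its prefix bucket, then binary-search."""
    bucket = encode(kmer[:p])
    lo, hi = starts[bucket], starts[bucket + 1]
    suffix = kmer[p:]
    i = bisect_left(suffixes, suffix, lo, hi)
    return i < hi and suffixes[i] == suffix


starts, suffixes = build_index({"ACGT", "ACGA", "TTTT", "CAGT"}, p=2)
found = contains(starts, suffixes, "ACGT", p=2)
```

Because the shared prefix is implied by the bucket, each stored entry only needs its suffix, which is where the space saving in this kind of layout comes from.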
“…We measure the compressed space-usage (Table 2), compression time and memory (Table 3), and decompression time and memory. We compare against the following lossless compression strategies: 1) the binary output of the k-mer counters DSK [18], KMC [17], and Squeakr-exact [20]; 2) the original FASTA sequences, with headers removed; 3) the maximal unitigs; and 4) the BOSS representation [31] (as implemented in COSMO [42]). In all cases, the stored data is additionally compressed using MFC (for nucleotide sequences, i.e.…”
Section: Evaluation Of Ust-compress (mentioning)
confidence: 99%