2020
DOI: 10.1186/s12859-020-03779-w

RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads

Abstract: Background: Repetitive sequences account for a large proportion of eukaryotic genomes. Identification of repetitive sequences plays a significant role in many applications, such as structural variation detection and genome assembly. Many existing de novo repeat identification pipelines or tools make use of assembly of the high-frequency k-mers to obtain repeats. However, a certain degree of sequence coverage is required for assemblers to get the desired assemblies. On the other hand, assemblers cut the reads in…

citations
Cited by 5 publications
(4 citation statements)
references
References 34 publications
“…Nonetheless, a few other issues related to de Bruijn graph obstruct the genome assembly procedure. The splitting of reads into k-mers may destroy the structure of the repetitive regions, which is detrimental to the recovery of the repetitive segments 183. The frequency of k-mers obtained from reads with many repeats are often much higher than the regular coverage of sequencing, but those with few repetitions may fail to meet the basic coverage criteria, making assembly tough to obtain 183.…”
Section: Results (mentioning)
confidence: 99%
“…The splitting of reads into k-mers may destroy the structure of the repetitive regions, which is detrimental to the recovery of the repetitive segments 183. The frequency of k-mers obtained from reads with many repeats are often much higher than the regular coverage of sequencing, but those with few repetitions may fail to meet the basic coverage criteria, making assembly tough to obtain 183. The de Bruijn-based assemblers use cutoff criteria to prune out low coverage regions, which reduces the complexity and makes the algorithms viable, but it has an inevitable consequence on the final assembly's effective length and genome coverage 184.…”
Section: Results (mentioning)
confidence: 99%
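
The k-mer splitting and coverage cutoff described in the statement above can be illustrated with a minimal Python sketch. It is not the method of RepAHR or of any cited assembler: the function names (count_kmers, high_frequency_kmers), the toy reads, and the threshold value are all hypothetical and chosen only for illustration. The sketch counts every k-mer across a set of reads and keeps those whose count meets a cutoff, which shows how k-mers from repeated segments rise above the rest while low-count k-mers are pruned.

```python
from collections import Counter


def count_kmers(reads, k):
    """Split each read into overlapping k-mers and count them across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n yields n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_frequency_kmers(counts, threshold):
    """Keep k-mers seen at least `threshold` times (hypothetical cutoff)."""
    return {kmer for kmer, n in counts.items() if n >= threshold}


# Toy reads (made up for illustration); "ACGT" recurs across them.
reads = ["ACGTACGT", "ACGTTTGA", "CCACGTAA"]
counts = count_kmers(reads, k=4)
frequent = high_frequency_kmers(counts, threshold=3)
print(sorted(frequent))
```

With these toy reads only the recurring 4-mer passes the cutoff; every k-mer below the threshold is discarded, which is the pruning effect on low-coverage regions that the statement describes.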
“…RepeatFinder 186, RepeatScout 187, ReAS 188, and Generic Repeat Finder (GRF) 189 are representative of this class of approaches. The third class of methods includes RepARK 190, REPdenovo 191, RepAHR 192, and RepLong 193, which rely on de novo sequence assembly and community detection in sequence similarity network to identify repeats (Supplementary Table S8). Among these four tools, the first three obtain repeats by performing assembly of high-frequency reads or k-mers (Supplementary Fig.…”
Section: Repeat Detection (mentioning)
confidence: 99%
“…In the first, DNA regions in which the concentration of different k-mers shows statistically significant deviation from the random level are recognized as locations of potential repeats. The existing algorithms for finding dispersed repeats based on k-mers can include expansion of the region with non-random k-mer distribution [19][20][21], training a classifier on areas with high k-mer frequencies [22], k-mers assembly [23,24], or grouping them into clouds [25].…”
Section: Introduction (mentioning)
confidence: 99%