2021
DOI: 10.1093/nar/gkab563

A sensitive repeat identification framework based on short and long reads

Abstract: Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot achieve satisfactory performance on identifying repeats in terms of both accuracy and size, since NGS reads are too short to identify long repeats whereas SMS (Single Molecule Sequencing) long reads are with high error rates. In this study, we present a novel identification framework, LongRepMarker, based on the global de …

Cited by 11 publications (9 citation statements)
References 60 publications
“…Although it is routine that the identification and masking of repeats are performed before the gene prediction, repeat identification tools available at present can not totally find and mask all the repeats. 37 This is exactly the case for the YHSC genome with a high proportion of repeats. In this study, 24 of the 173 expanded domains exist in transposon-encoded proteins ( Table S11 ).…”
Section: Results (mentioning)
confidence: 67%
“…Thorough annotation of TEs is ideal for dealing with the deluge of genome data. Therefore, de novo repeat identification was performed by Repeat Modeller2 ( Flynn et al, 2020 ), LongRepMarker ( Liao et al, 2021 ), and Extensive De Novo TE Annotator (EDTA) ( Ou et al, 2019 ). The unclassified repeats were further classified using DeepTE ( Yan et al, 2020 ).…”
Section: Methods (mentioning)
confidence: 99%
“…Generation of the multi-alignment unique k-mers and their coverage regions on overlap sequences . The multi-alignment unique k-mers were first proposed in the paper of LongRepMarker ( 24 ), which refers to the unique k-mers that can be aligned to multiple different locations in the overlap sequences. Due to the sequencing bias, the high frequency threshold is often difficult to obtain accurately, which has a great impact on the range of the high frequency k-mers ( 29–31 ).…”
Section: Methods (mentioning)
confidence: 99%
“…LongRepMarker ( 24 ) is a new framework developed recently by our group for comprehensive identification of genomic repetitive sequences. Comprehensive evaluations carried out in the study of LongRepMarker not only show that LongRepMarker can achieve more satisfactory results than the existing detection methods, but can also discover a large number of new repeat sequences and families.…”
Section: Introduction (mentioning)
confidence: 99%