2021
DOI: 10.1101/2021.09.10.459798
Preprint

Benchmarking tools for DNA repeat identification in diverse genomes

Abstract: Continued progress in genomics shows that repeats are important elements of genomes that perform many regulatory and other functions. Many computational tools have therefore been developed and are frequently used to identify and analyze genomic repeats. A single tool cannot detect all types of repeats in diverse species; a pipeline of tools is more effective. However, choosing rigorous and robust tools is highly challenging. A method has been implemented to select …

Citations: cited by 4 publications (4 citation statements)
References: 45 publications
“…To assess our newly developed tool, we used variants based on human reference assembly GRCh38 from the ClinVar database [46], Platinum Genome [47], the National Institute of Standards and Technology’s Genome in a Bottle (GIAB) [48,49], and the 1000 Genomes Project [50,51] to study the breakpoint ambiguities of indels and small variants (1–50 bp) in STR regions. Due to the varying algorithmic parameters used in different studies for STR detection, such as the minimum length of an STR and the tolerance of mismatches and indels between STR units, the definitions of an STR may vary widely and lead to highly variable interpretations [52–54]. In our study, we restricted our analysis to perfect STRs with motif sizes of 1–6 bp based the common definitions in the literature [38,55–57].…”
Section: Results (mentioning)
confidence: 99%
“…Due to the varying algorithmic parameters used in different studies for STR detection, such as the minimum length of an STR and the tolerance of mismatches and indels between STR units, the definitions of an STR may vary widely and lead to highly variable interpretations [52–54]. In our study, we restricted our analysis to perfect STRs with motif sizes of 1-6 bp based the common definitions in the literature [38,55–57].…”
Section: An Overview Of Varscat (mentioning)
confidence: 99%
“…This could be explained by the use of different tools to identify repetitive sequences. Das & Ghosh [72] mention that specialized programs for the identification of repetitive sequences often present different results depending on the algorithm used. Despite that differences between algorithms make accurate comparisons difficult, general comparisons, such as the higher number of repetitive mononucleotides among SSRs in the chloroplast genomes of distant species, are still possible [40,73,74].…”
Section: Discussion (mentioning)
confidence: 99%
“…To assess our newly developed VarSCAT tool, we used variants based on human reference assembly GRCh38 from the ClinVar database [44], Platinum Genome [45], the National Institute of Standards and Technology's Genome in a Bottle (GIAB) [46,47], and the 1000 Genomes Project [48,49] to study the breakpoint ambiguities of indels and small variants (1-50 bp) in STR regions. Due to the varying algorithmic parameters used in different studies for STR detection, such as the minimum length of an STR and the tolerance of mismatches and indels between STR units, the definitions of an STR may vary widely and lead to highly variable interpretations [50–52]. In our study, we restricted our analysis to perfect STRs (except benchmarking of VarSCAT in the following section) with motif sizes of 1-6 bp based the common definitions in the literature [39,53–55].…”
Section: Plos Computational Biology (mentioning)
confidence: 99%
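
The citation statements above note that what counts as an STR depends on parameters such as the minimum length and the motif size. A minimal Python sketch of that effect follows, assuming perfect repeats only and motif sizes of 1-6 bp as quoted. The function name `find_perfect_strs` and the `min_repeats` threshold are hypothetical and chosen for illustration; this is not the method of VarSCAT or of any tool benchmarked in the preprint.

```python
import re

def find_perfect_strs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats.

    Illustrative only: min_repeats is an assumed threshold, not a value
    taken from the cited papers. Coordinates are 0-based, end-exclusive.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # One k-bp motif followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "TT" is reported once, as the mononucleotide "T").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)

seq = "GGCACACACACATTTTTTG"
loose = find_perfect_strs(seq, min_repeats=3)
strict = find_perfect_strs(seq, min_repeats=6)
print(loose)   # both the CA run and the T run are reported
print(strict)  # only the T run passes the stricter threshold
```

On the same sequence, raising the assumed copy-number threshold from 3 to 6 drops the (CA)5 run, which is the kind of parameter-driven disagreement between tools that the statements describe.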