2019
DOI: 10.1093/dnares/dsy046

Search for potential reading frameshifts in cds from Arabidopsis thaliana and other genomes

Abstract: A new mathematical method for potential reading frameshift detection in protein-coding sequences (cds) was developed. The algorithm is adjusted to the triplet periodicity of each analysed sequence using dynamic programming and a genetic algorithm. This does not require any preliminary training. Using the developed method, cds from the Arabidopsis thaliana genome were analysed. In total, the algorithm found 9,930 sequences containing one or more potential reading frameshift(s). This is ∼2…
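
The abstract relies on the notion of triplet periodicity, so a small illustration may help. The sketch below is not the paper's algorithm (which uses dynamic programming and a genetic algorithm); it only shows, on a toy sequence, how a single-base deletion changes the codon-position phase of nucleotide counts. The function names `triplet_matrix` and `best_phase`, the toy sequence, and the dot-product score are all assumptions made for illustration.

```python
BASES = "ACGT"

def triplet_matrix(seq, start=0):
    """Return a 3 x 4 count matrix.

    Rows are codon positions 0..2, columns are A, C, G, T.
    `start` is the index of seq[0] in the full sequence, so that
    windows cut from a longer sequence keep their original phase.
    """
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq):
        if base in BASES:
            counts[(i + start) % 3][BASES.index(base)] += 1
    return counts

def best_phase(reference, window):
    """Cyclic row shift (0, 1 or 2) of `window` that best matches `reference`.

    The match score is a plain dot product of the two matrices; this is a
    hypothetical, deliberately simple similarity measure.
    """
    def score(shift):
        return sum(
            reference[r][c] * window[(r + shift) % 3][c]
            for r in range(3)
            for c in range(4)
        )
    return max(range(3), key=score)

# Toy sequence with perfect triplet periodicity (assumed example data).
intact = "GCT" * 30
# Delete one base at index 45 to simulate a frameshift.
shifted = intact[:45] + intact[46:]

reference = triplet_matrix(intact[:45])
phase_intact = best_phase(reference, triplet_matrix(intact[45:], start=45))
phase_shifted = best_phase(reference, triplet_matrix(shifted[45:], start=45))

print(phase_intact, phase_shifted)
```

A non-zero phase downstream of a position, against a zero phase upstream, is the kind of signal a frameshift search looks for; real coding sequences have much weaker periodicity than this toy example, which is why the paper's method needs a more elaborate optimisation.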

Citations: cited by 6 publications (5 citation statements)
References: 43 publications
“…Step 4 of the algorithm excluded all DNA regions with triplet periodicity without indels. The presence of TR with n = 3 in Figure 2 indicated that there were a large number of regions where only triplets with indels could be detected, which is confirmed by the fact that many potential reading frame shifts that have been identified are associated with the presence of indels in the triplets [ 40 ]. Figure 3 also shows that a significant number of repeats were 2 nt long; other peaks were detected at 11, 22–23, and 31 nt.…”
Section: Results (mentioning)
confidence: 85%
“…Although triplets without indels were not considered ( Section 2.1 , step 4), still more than 20 thousand regions with triplet repeats containing indels were observed ( Figure 3 ). It was also possible to calculate the total number of regions with triplet periodicity within the coding sequences of the rice genome, which was 14,534 ( Table 4 ), indicating that 75% of the detected regions containing triplet repeats with indels are in the coding sequences, where they may represent potential frame shifts [ 40 ]. Therefore, it would be more correct to say that most TRs we identified had a repeat length of 2 nt.…”
Section: Discussion (mentioning)
confidence: 99%
“…The obtained classes of promoter sequences were used to search for other promoter sequences in the rice genome. A profile matrix with a size of (16.600) was created for each class [33,34]. The search for potential promoter sequences was performed for each template with a global alignment.…”
Section: Identification Of Various Artificial Insertions Of Dna Fragm… (mentioning)
confidence: 99%
“…HMMs of some SINE families (Alu and MIR) constructed for several model organisms and stored in the Dfam database [ 22 ] can also be used by RepeatMasker to search for divergent copies of these repeats. A limitation of this approach is that the initial sample for HMM construction is created using BLAST or similar methods that do not consider correlation between neighboring nucleotides; as a result, the correlation properties of different copies can eliminate each other, which can greatly reduce the search potential of an HMM [ 28 ].…”
Section: Introduction (mentioning)
confidence: 99%
“…To overcome the described limitations, we used the Highly Divergent Repeat Search Method (HDRSM), which considers both sequence similarity and correlations of nucleotide pairs within the compared sequences. Previously, a similar method was used to search for frameshifts in protein-coding sequences [ 28 ]. In this work, we applied the HDRSM to identification of SINEs in the genome of rice ( Oryza sativa subsp.…”
Section: Introduction (mentioning)
confidence: 99%