2014
DOI: 10.1515/jib-2014-252

Geometric approach to string analysis for biosequence classification

Abstract: Tools that effectively analyze and compare sequences are of great importance in various areas of applied computational research, especially in the framework of molecular biology. In the present paper, we introduce simple geometric criteria based on the notion of string linearity and use them to compare DNA sequences of various organisms, as well as to distinguish them from random sequences. Several other theoretical and statistical results are outlined as well. Our experiments reveal a substantial diffe…

Citations: cited by 1 publication (2 citation statements)
References: 14 publications (23 reference statements)
“…Due to the motivation given earlier that fluctuations in the linearity measures first disappear in samples of length 50,000, we randomly selected substrings of length 50,000 from the larger excerpts of genomes. Random sequences have been generated with independent symbols with uniform distribution (25% each) (see [3] for more details). Given such a string, using elementary techniques we compute and store an array of distances from points of a monotone path representing a biosequence to the corresponding straight line.…”
Section: Computational Procedures (mentioning)
confidence: 99%
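
The procedure quoted above (a uniform random sequence, a monotone path, and an array of distances to the corresponding straight line) might be sketched as follows. This is only an illustration under stated assumptions: the quoted text does not say how symbols map to path steps, so the rule used here (A and G step right, C and T step up) is hypothetical, as are the function names `random_dna` and `distances_to_chord`. The "corresponding straight line" is taken to be the segment joining the path's first and last points.

```python
import math
import random


def random_dna(length, seed=None):
    """Generate a string of independent symbols, each of A, C, G, T with probability 25%."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def distances_to_chord(seq, right_symbols="AG"):
    """Return the distance from every point of the monotone path to the line
    through the path's start and end points.

    Assumption (not from the source): symbols in `right_symbols` move one unit
    right, all other symbols move one unit up.
    """
    x = y = 0
    points = [(0, 0)]
    for ch in seq:
        if ch in right_symbols:
            x += 1
        else:
            y += 1
        points.append((x, y))

    end_x, end_y = points[-1]
    norm = math.hypot(end_x, end_y)
    if norm == 0:
        # Empty sequence: the path is a single point.
        return [0.0]

    # Perpendicular distance from (px, py) to the line through (0, 0) and (end_x, end_y).
    return [abs(end_y * px - end_x * py) / norm for px, py in points]


if __name__ == "__main__":
    sample = random_dna(50_000, seed=0)
    dists = distances_to_chord(sample)
    print(len(dists), max(dists))
```

A sequence of length n yields n + 1 distances, one per path point including the origin. The stored array can then feed whatever linearity measure is of interest, such as the maximum or mean deviation.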
“…In the framework of this project we have obtained several other related theoretical and experimental results, which are not presented here because of the page limit. Some of these, together with a more detailed description of the experiments outlined in this article are available in a technical report [3].…”
Section: Introduction (mentioning)
confidence: 99%