2021
DOI: 10.1016/j.isci.2021.102048

Spatial constraints and information content of sub-genomic regions of the human genome

Abstract: Complexity metrics and machine learning (ML) models have been used to analyze the lengths of segmental genomic entities of DNA sequences (exonic, intronic, intergenic, repeat, unique) in order to investigate the segmental organization of the human genome through the size distributions of these sequences. To this end, we developed an integrated methodology based on the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques…
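The abstract names two mathematical ingredients without stating their parameters. The following is a minimal Python sketch of both under assumptions that are ours, not the paper's: the ordered list of segment lengths is treated as a one-dimensional series and reconstructed with the standard time-delay (Takens) embedding, and the Tsallis entropy S_q = (1 - Σ_i p_i^q)/(q - 1) is computed on the empirical length distribution. The embedding dimension m, delay tau, entropic index q, the sample lengths, and all function names are hypothetical.

```python
import math
from collections import Counter


def delay_embed(series, m, tau):
    """Time-delay embedding: rows are (x[t], x[t+tau], ..., x[t+(m-1)*tau])."""
    n = len(series) - (m - 1) * tau
    if m < 1 or tau < 1 or n <= 0:
        raise ValueError("series too short for the chosen m and tau")
    return [tuple(series[t + k * tau] for k in range(m)) for t in range(n)]


def empirical_distribution(values):
    """Relative frequency of each distinct value (e.g. each segment length)."""
    counts = Counter(values)
    total = len(values)
    return [c / total for c in counts.values()]


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1); Shannon entropy (natural log) when q == 1."""
    p = [v for v in probs if v > 0]
    norm = sum(p)
    p = [v / norm for v in p]
    if math.isclose(q, 1.0):
        return -sum(v * math.log(v) for v in p)
    return (1.0 - sum(v ** q for v in p)) / (q - 1.0)


if __name__ == "__main__":
    # Hypothetical segment lengths (bp), for illustration only.
    lengths = [120, 85, 300, 85, 42]
    print(delay_embed(lengths, m=3, tau=1))
    # -> [(120, 85, 300), (85, 300, 85), (300, 85, 42)]
    print(round(tsallis_entropy(empirical_distribution(lengths), q=2.0), 4))
    # -> 0.72
```

For q → 1 the Tsallis entropy reduces to the Shannon entropy, which the code handles as a special case. For q ≠ 1 the entropy is non-additive over independent subsystems, which is the property behind the term "non-extensive".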

Cited by 8 publications (5 citation statements). References: 62 publications.
“…Clearly the functional role of intronic sequences of the HLA genes is a domain that requires focused attention. Recent reports reveal that information content in the intronic sequences is comparable to exonic sequences [20]…”
Section: Discussion (mentioning; confidence: 99%)
“…Recent reports reveal that information content in the intronic sequences is comparable to exonic sequences.[20] The HLA polymorphisms in individual populations and their evolutionary relationships will be analyzed in the context of their geographical locations in another report. Their connections to other known alleles will be identified and possible environmental pressures that have contributed to the formation of these new alleles will be assessed and discussed further (manuscript in preparation).…”
Section: Discussion (mentioning; confidence: 99%)
“…As it can be seen in this panel, considering the cancer samples as positives, the true positives (TP) were found equal to 23, true negatives (TN) = 28, false positives (FP) = 3 and false negatives (FN) = 8. Based on these values, we estimated the performance measures using Equations (11)-(14). In particular, the accuracy of the classifier was found to be 51/62 × 100% = 82.3%.…”
Section: SVM Classifier (mentioning; confidence: 99%)
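The quoted passage confirms only the accuracy, (TP + TN)/N = 51/62. It does not reproduce Equations (11)-(14). The sketch below assumes they are the usual four confusion-matrix measures (accuracy, sensitivity, specificity, precision). That choice, and the function name, are our assumptions for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts (assumed definitions)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


if __name__ == "__main__":
    # Counts from the quoted passage, with cancer samples as the positive class.
    metrics = classification_metrics(tp=23, tn=28, fp=3, fn=8)
    for name, value in metrics.items():
        print(f"{name}: {100 * value:.1f}%")
    # prints accuracy: 82.3%, sensitivity: 74.2%, specificity: 90.3%, precision: 88.5% (one per line)
```

Only the 82.3% accuracy can be checked against the source. The other three values follow from the quoted counts under the assumed definitions.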
“…On the other hand, numerous models based on statistical physics consistently attempt to represent statistical features, such as long-range and short-range correlations, in light of the large DNA sequence data. Some approaches used statistical tools in connection with random-walk simulations [12, 13, 14], wavelet transforms [15, 16], Ising models [17] (see e.g., [18] and references therein), and Tsallis’ statistics together with Machine Learning [19]. Many live creatures’ coding and non-coding sequence length distributions have been studied by some models in relation to long- and short-range correlations [20, 21, 22, 23].…”
Section: Introduction (mentioning; confidence: 99%)
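The quoted Introduction lists random-walk methods for detecting correlations in DNA, but the excerpt does not say which mapping the cited works [12, 13, 14] use. The following is a hedged Python sketch of one classic variant, the purine/pyrimidine "DNA walk". It steps +1 for pyrimidines (C, T) and -1 for purines (A, G), then measures the root-mean-square fluctuation F(l) of the displacement over windows of length l. If F(l) ~ l^alpha, then alpha ≈ 0.5 indicates an uncorrelated sequence and alpha > 0.5 indicates long-range correlations. The function names are hypothetical.

```python
import math


def dna_walk(seq):
    """Purine/pyrimidine DNA walk: +1 for pyrimidines (C, T), -1 for purines (A, G).

    Symbols outside ACGT (e.g. N) are skipped. Returns the cumulative
    displacement y(l), starting from y(0) = 0.
    """
    step = {"C": 1, "T": 1, "A": -1, "G": -1}
    y, walk = 0, [0]
    for base in seq.upper():
        if base in step:
            y += step[base]
            walk.append(y)
    return walk


def fluctuation(walk, window):
    """RMS fluctuation F(l) of the displacement y(i + l) - y(i) over all start points i."""
    diffs = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    if not diffs:
        raise ValueError("window is longer than the walk")
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) vs log(x); estimates alpha in F(l) ~ l**alpha (needs x, y > 0)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    walk = dna_walk("ACGTN")
    print(walk)                  # -> [0, -1, 0, -1, 0]
    print(fluctuation(walk, 1))  # -> 1.0
```

On a real sequence, alpha would be estimated by evaluating `fluctuation` at several window sizes l and passing them to `loglog_slope`. The windows should be chosen so that F(l) > 0 for each one.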