2014
DOI: 10.1016/j.compbiolchem.2014.08.010

Human–chimpanzee alignment: Ortholog exponentials and paralog power laws

Abstract: Genomic subsequences conserved between closely related species such as human and chimpanzee exhibit an exponential length distribution, in contrast to the algebraic length distribution observed for sequences shared between distantly related genomes. We find that the former exponential can be further decomposed into an exponential component primarily composed of orthologous sequences, and a truncated algebraic component primarily composed of paralogous sequences.
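
The distinction the abstract draws can be illustrated with a small sketch. An exponential length distribution is a straight line when log(count) is plotted against length, while an algebraic (power-law) distribution is a straight line when log(count) is plotted against log(length). The code below compares the two linear fits on a histogram of match lengths. The function names and the use of R² as the deciding score are assumptions made for illustration; this is a crude heuristic, not the analysis used in the paper, and it does not attempt the decomposition into orthologous and paralogous components.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def classify_length_distribution(lengths, counts):
    """Label a histogram as 'exponential' or 'algebraic' (hypothetical helper).

    lengths: match lengths (all > 0); counts: number of matches (all > 0).
    Returns the label and the fitted slope in the corresponding coordinates.
    """
    log_counts = [math.log(c) for c in counts]
    # Exponential: log(count) is linear in length.
    exp_slope, _, exp_r2 = linear_fit(lengths, log_counts)
    # Power law: log(count) is linear in log(length).
    log_lengths = [math.log(n) for n in lengths]
    pow_slope, _, pow_r2 = linear_fit(log_lengths, log_counts)
    if exp_r2 >= pow_r2:
        return "exponential", exp_slope
    return "algebraic", pow_slope
```

For the exponential label the returned slope is the decay rate per base; for the algebraic label it is the power-law exponent. Real histograms are noisy and sparse in the tail, so a likelihood-based comparison would be more reliable than R² in practice.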

Cited by 10 publications (14 citation statements)
References 26 publications

“…Since Figures 2E,F are in log-log scale, we have shown that repeats with mismatches have power-law distribution for both D and C. This power-law distribution for size is consistent with other studies: the self-alignment for smaller genomes shows similar power-law like distribution in Gao and Miller (2011, 2014). We also draw power-law functions with the known exponents: 1/D, 1/D^2, and 1/D^3 for size distribution, and 1/C^3 for copy number distribution.…”
Section: Distribution Of Approximate Repeats In the Human Referenc
Citation type: supporting
confidence: 92%
“…Interestingly, in this asymptotic regime, the MLD exhibits a power-law tail M(r) ∼ r^α (identified as a straight line in the double logarithmic plot), where the exponent α is close to −5 for exonic sequences. This is in contrast to the MLD of non-coding sequences, where the exponent α is close to −4 [9,10,16]. This property appears to be impressively reproducible in the comparison of various pairs of species (see Fig S1).…”
Section: Introduction
Citation type: mentioning
confidence: 79%
“…To obtain all non-embedded and embedded maximal matches between two genomes, we must first of all specify the matching criteria and compare their sequences either by intersection or by alignment. In this paper, our non-embedded and embedded maximal matches are obtained without loss of generality from 4-base exact matched genome intersections on pristine (un-repeatmasked) whole-genome sequences (see section 5.2 and supplementary material 1 for computation details); non-embedded and embedded maximal matches identified in a Lastz raw alignment are referred to as "non-nested and nested CMRs" in (Gao and Miller 2014). Alternatively, we also examine another set of maximal matches, maximal unique matches ("MUMs" for short) (Delcher et al 1999; Delcher et al 2002; Kurtz et al 2004), as an approximation to the non-embedded maximal matches defined in this paper.…”
Section: Identifying Lineal Orthologs By Non-embedded Maximal Matches
Citation type: mentioning
confidence: 99%
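
As a rough illustration of what a maximal exact match is, the sketch below enumerates every exact match between two short sequences that cannot be extended to the left or to the right. It is a brute-force version for toy inputs only, not the intersection procedure of the citing paper or the algorithm used by Lastz or MUMmer, which rely on indexed data structures to work at genome scale. The function name and the `min_len` parameter are assumptions made for this example, and the embedded versus non-embedded distinction is not modeled.

```python
def maximal_exact_matches(a, b, min_len=4):
    """Return (start_in_a, start_in_b, length) for each maximal exact match.

    A match is maximal when the characters just before it differ (or a
    sequence boundary is reached) and the same holds just after it.
    Brute force, quadratic in the sequence lengths: toy inputs only.
    """
    matches = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip positions that merely continue a match begun further left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


print(maximal_exact_matches("ACGTACGTTT", "GGACGTAC"))
# [(0, 2, 6), (4, 2, 4)]
```

Collecting the third element of each tuple over a whole comparison gives the match length distribution that the citation statements above describe.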
“…We compare our non-embedded maximal matches identified by intersection (nem for short, see section 5.2 and supplementary material 1 for computation details) with exact matches returned by Lastz net alignment (net for short, see (Gao and Miller 2011) and (Gao and Miller 2014) for computation details). Table 1 shows such comparisons for two pairs of genomes: human/chimp and human/mouse.…”
Section: Comparison On Nucleotide Level With a Lastz Net Alignment
Citation type: mentioning
confidence: 99%