We analyze the three-dimensional structure of proteins with a
computer program that finds regions of sequence that contain module
boundaries, defining a module as a segment of polypeptide chain bounded
in space by a specified diameter. The program defines a set of
“linker regions” that have the property that if an intron were to
be placed into each linker region, the protein would be dissected into
a set of modules all less than the specified diameter. We test a set of
32 proteins, all of ancient origin, and a corresponding set of 570
intron positions, to ask if there is a statistically significant excess
of intron positions within the linker regions. For 28-Å modules, a
standard size used historically, we find such an excess, with
P < 0.003. This correlation is due neither to a
compositional or sequence bias in the linker regions nor to a surface
bias in intron positions. Furthermore, a subset of 20 introns, which
can be putatively identified as old, lies even more explicitly within
the linker regions, with P < 0.0003. Thus, there
is a strong correlation between intron positions and three-dimensional
structural elements of ancient proteins as expected by the
introns-early approach. We then study a range of module diameters and
show that, as the diameter varies, significant peaks of correlation
appear for module diameters centered at 21.7, 27.6, and 32.9 Å. These
preferred module diameters roughly correspond to predicted exon sizes
of 15, 22, and 30 residues. Thus, there are significant correlations
between introns, modules, and a quantized pattern of the lengths of
polypeptide chains, which is the prediction of the “Exon Theory of
Genes.”
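
As a rough illustration of the two ideas above, the following Python sketch checks whether a chain segment fits within a given module diameter and computes a one-sided probability for an excess of introns in linker regions. It rests on assumptions not stated in the text: a segment is treated as a module when every pair of its C-alpha atoms is closer than the diameter, and the excess is scored with a simple binomial tail, which may differ from the statistic actually used. The function names `is_module` and `binomial_excess_p` are hypothetical.

```python
from itertools import combinations
from math import comb, dist


def is_module(ca_coords, diameter):
    """Return True if every pair of C-alpha positions is closer than `diameter`.

    Assumption: "bounded in space" is read as a limit on all pairwise
    C-alpha distances (in angstroms) within the segment.
    """
    return all(dist(a, b) < diameter for a, b in combinations(ca_coords, 2))


def binomial_excess_p(n_introns, n_in_linkers, linker_fraction):
    """One-sided P(X >= n_in_linkers) for X ~ Binomial(n_introns, linker_fraction).

    Assumption: under the null hypothesis each intron falls in a linker
    region independently, with probability equal to the fraction of
    sequence positions covered by linker regions.
    """
    p = linker_fraction
    return sum(
        comb(n_introns, k) * p**k * (1 - p) ** (n_introns - k)
        for k in range(n_in_linkers, n_introns + 1)
    )


# Toy coordinates (not real protein data): three atoms spaced 10 A apart
# fit inside a 28-A module; adding a fourth at 30 A does not.
segment = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
fits = is_module(segment, 28.0)
too_long = is_module(segment + [(30.0, 0.0, 0.0)], 28.0)
```

In this sketch a small returned probability would indicate more introns in linker regions than expected by chance; the inputs shown are toy values for illustration only.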