The amino acid sequence of the heavy chain of Bombyx mori silk fibroin was derived from the gene sequence. The 5,263-residue (391-kDa) polypeptide chain comprises 12 low-complexity "crystalline" domains made up of Gly-X repeats and covering 94% of the sequence; X is Ala in 65%, Ser in 23%, and Tyr in 9% of the repeats. The remainder includes a nonrepetitive 151-residue header sequence, 11 nearly identical copies of a 43-residue spacer sequence, and a 58-residue C-terminal sequence. The header sequence is homologous to the N-terminal sequence of other fibroins with a completely different crystalline region. In Bombyx mori, each crystalline domain is made up of subdomains of approximately 70 residues, which in most cases begin with repeats of the GAGAGS hexapeptide and terminate with the GAAS tetrapeptide. Within the subdomains, the Gly-X alternance is strict, which strongly supports the classic Pauling-Corey model, in which beta-sheets pack on each other in alternating layers of Gly/Gly and X/X contacts. When fitting the actual sequence to that model, we propose that each subdomain forms a beta-strand and each crystalline domain a two-layered beta-sandwich, and we suggest that the beta-sheets may be parallel, rather than antiparallel, as has been assumed up to now.
The complete sequence of the Bombyx mori fibroin gene has been determined by means of combining a shotgun sequencing strategy with physical map-based sequencing procedures. It consists of two exons (67 and 15 750 bp, respectively) and one intron (971 bp). The fibroin coding sequence presents a spectacular organization, with a highly repetitive and G-rich (approximately 45%) core flanked by non-repetitive 5' and 3' ends. This repetitive core is composed of alternate arrays of 12 repetitive and 11 amorphous domains. The sequences of the amorphous domains are evolutionarily conserved and the repetitive domains differ from each other in length by a variety of tandem repeats of subdomains of approximately 208 bp which are reminiscent of the repetitive nucleosome organization. A typical composition of a subdomain is a cluster of repetitive units, Ua, followed by a cluster of units, Ub, (with a Ua:Ub ratio of 2:1) flanked by conserved boundary elements at the 3' end. Moreover some repeats are also perfectly conserved at the peptide level indicating that the evolutionary pressure is not identical along the sequence. A tentative model for the constitution and evolution of this unusual gene is discussed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.