We report the structure and developmental expression of collagen gene sequences in Drosophila melanogaster. Collagen-like genomic clones were isolated by screening a Drosophila genomic library with a chicken proα2(I) cDNA clone as a hybridization probe. We present a 1.5-kilobase (kb) DNA sequence from a 9.2-kb DNA clone (pDCg1). The genes for vertebrate type I collagen are highly fragmented, but this gene segment shows no evidence of a 54-base-pair primordial unit. Instead, the fragment consists of two large coding sequences, which together specify a sequence of 469 amino acids. This collagen product is composed almost entirely of the Gly-X-Y repeat characteristic of peptides involved in triple helix formation. Within the polypeptide there are four minor discontinuities in the Gly-X-Y pattern. Similar interruptions have been observed in a mouse basement membrane collagen protein sequence. The Drosophila collagen gene may therefore encode a nonfibrous collagen, such as a basement membrane or cuticle collagen, or a novel collagenous protein. Using the DNA segment of known sequence as a hybridization probe, we screened a developmental series of polyadenylylated RNA samples for homologous sequences. An RNA species 6.4 kb in length was detected as a prominent band only in the first- and second-instar larval stages. This developmental pattern of hybridization correlates with the production of the cuticle and basement membranes, and the large size of the RNA is consistent with its identification as a collagen-encoding RNA.

Much of the current interest in the collagen gene family comes from the key role these extracellular structural proteins play in developmental processes and tissue architecture, as well as from their complex gene structure. The highly unusual genomic organization of procollagen genes raises fascinating questions about the evolution of the collagen gene family.
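The abstract characterizes the Drosophila product by its nearly continuous Gly-X-Y repeat and a small number of discontinuities in that pattern. As a rough illustration of how such interruptions might be located in a one-letter peptide sequence, the sketch below walks the sequence in steps of three residues, expecting Gly at each step, and records a span whenever that expectation fails. It is a minimal sketch under stated assumptions, not the method used in this study: the function name `find_gly_xy_interruptions` and its `min_repeats` parameter are hypothetical, the rule that the register is "re-established" once two consecutive triplets begin with Gly is an arbitrary choice, and the example inputs are toy sequences, not the Drosophila sequence.

```python
from typing import List, Optional, Tuple


def find_gly_xy_interruptions(peptide: str, min_repeats: int = 2) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans where the Gly-X-Y register breaks.

    Hypothetical helper: the register is treated as (re)established at position q
    when `min_repeats` consecutive triplets starting at q all begin with Gly ('G').
    """
    seq = peptide.upper()
    n = len(seq)

    def in_register(q: int) -> bool:
        # True if seq[q], seq[q + 3], ... (min_repeats positions) are all Gly.
        return all(q + 3 * k < n and seq[q + 3 * k] == "G" for k in range(min_repeats))

    def next_register(start: int) -> Optional[int]:
        # First position at or after `start` where the Gly-X-Y register holds.
        return next((q for q in range(start, n) if in_register(q)), None)

    pos = next_register(0)  # skip any non-repeating N-terminal residues
    interruptions: List[Tuple[int, int]] = []
    while pos is not None and pos < n:
        if seq[pos] == "G":
            pos += 3  # intact triplet: Gly found at the expected position
            continue
        resync = next_register(pos + 1)
        if resync is None:
            break  # no further repeat: treat the remainder as a non-repeating tail
        interruptions.append((pos, resync))
        pos = resync
    return interruptions


# Toy example: one extra residue (Ala) inserted after three Gly-Pro-Pro triplets.
print(find_gly_xy_interruptions("GPPGPPGPPAGPPGPPGPP"))  # [(9, 10)]
```

In this toy scheme, a one-residue span such as `(9, 10)` corresponds to an inserted residue that shifts the register, whereas a three-residue span marks a triplet whose expected Gly is replaced. The two-triplet resynchronization rule is deliberately simple and can be fooled by Gly occurring in the X or Y position.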
However, the only collagen genes examined to date encode the constituent polypeptide (proα) chains of vertebrate type I collagen: the proα2(I) genes from chicken (1, 2) and sheep (3) and the proα1(I) gene from mice (4). Apart from the immunoglobulin, viral, and mitochondrial genes, these procollagen genes have the most complex genomic organization of all the eukaryotic genes studied. For example, the proα2(I) gene may contain 50 or more intervening sequences distributed over 38 kilobases (kb) of genomic DNA. Even more unusual is the preponderance of 54-base-pair (bp) coding sequences in this gene. This observation led to the hypothesis that procollagen genes arose by amplification of a primordial 54-bp unit (1). However, a proα1(I) procollagen gene has a more compact genomic organization (4): only five of the eight coding sequences studied are 54 bp, and the remainder are 108 bp. A more complex evolutionary history has therefore been postulated for procollagen genes (4). Whether other fibrous procollagen genes resemble either the proα1(I)...