The availability of large volumes of genomic sequences presents an unprecedented proteomic challenge to characterize the structure and function of various protein motifs. Primary structural alignment is often unable to accurately identify a given motif due to sequence divergence; however, with the aid of secondary structural prediction, it becomes feasible to explore protein motifs on a proteome-wide scale. Here we report the use of secondary structural alignment to characterize the Src homology 2 (SH2) domains of both conventional and divergent sequences and divide them into two groups, Src-type and STAT-type. In addition to the basic "αα" structure (β), the Src-type SH2 domain contains an extra β-strand (E or E-F motif). Alternatively, the linker domain-conjugated SH2 domain in STAT contains the αB motif. Combining BLAST data from β core motif sequences with predicted secondary structural alignment, we have screened for SH2 domains in various eukaryotic model systems including Arabidopsis, Dictyostelium, and Saccharomyces. Two novel genes carrying the linker-SH2 domain of STAT were discovered and subsequently cloned from Arabidopsis. These genes, designated as STAT-type linker-SH2 domain factors (STATL), are found in a wide array of vascular and nonvascular plants, suggesting that the linker-SH2 domain evolved prior to the divergence of plants and animals. Using this approach, we expanded the number of putative SH2 domain-bearing genes in Dictyostelium and comparatively studied the secondary structural profiles of both typical and atypical SH2 domains. Our results indicate that the linker-SH2 domain of the transcription factor STAT is one of the most ancient and fully developed functional domains, serving as a template for the continuing evolution of the SH2 domain essential for phosphotyrosine signal transduction.
Molecular & Cellular Proteomics 3:704-714, 2004.
The Src homology 2 (SH2) domain is an ~100-aa-long motif that recognizes and interacts with phosphotyrosine-containing motifs on the same or different protein molecules during signal transduction in animal cells. About 200 SH2 domain-containing genes have been identified in human cells, suggesting that this domain is one of the most rapidly expanded protein modules (1). In animal cells, SH2 domains are predominantly present in signaling molecules, i.e. signaling-related enzymes (including protein tyrosine kinases, protein tyrosine phosphatases, inositol phosphatase, and phospholipase) and signaling adapters. However, the SH2 domain has also been found in members of the STAT family of transcription factors (2, 3). In a signaling molecule with catalytic activity, the SH2 domain is often conjugated immediately upstream with another functional motif such as the SH3 domain, whereas in STAT the linker domain is immediately upstream of the SH2 domain. Recently, two STAT proteins have been discovered in Dictyostelium, a facultative slime mold capable of both growing as a single cell and differentiating into multicellular structures (4, 5). More recently,...