Splicing of short introns by the nuclear pre-mRNA splicing machinery is thought to proceed via an "intron definition" mechanism, in which the 5′ and 3′ splice sites (5′ ss, 3′ ss, respectively) are initially recognized and paired across the intron. Here, we describe a computational analysis of sequence features involved in recognition of short introns by using available transcript data from five eukaryotes with complete or nearly complete genomic sequences. The information content of five different transcript features was measured by using methods from information theory, and Monte Carlo simulations were used to determine the amount of information required for accurate recognition of short introns in each organism. We conclude: (i) that short introns in Drosophila melanogaster and Caenorhabditis elegans contain essentially all of the information for their recognition by the splicing machinery, and computer programs that simulate splicing specificity can predict the exact boundaries of ≈95% of short introns in both organisms; (ii) that in yeast, the 5′ ss, branch signal, and 3′ ss can accurately identify intron locations but do not precisely determine the location of 3′ cleavage in every intron; and (iii) that the 5′ ss, branch signal, and 3′ ss are not sufficient to accurately identify short introns in plant and human transcripts, but that specific subsets of candidate intronic enhancer motifs can be identified in both human and Arabidopsis that contribute dramatically to the accuracy of splicing simulators.

RNA splicing is an essential step in the expression of most eukaryotic genes. An important goal of research on this process is to determine a set of rules that accurately predicts the splicing pattern of primary transcripts. Unlike the process of mRNA translation by the ribosome, which follows a set of rules that is essentially invariant in all known organisms, the rules governing RNA splicing clearly differ between different groups of eukaryotes. 
Therefore, there is not one but several variants of the "splicing code" that remain to be worked out. In addition, the rules for splicing appear to be significantly more complex than those for translation, involving presence of multiple degenerate motifs occurring with appropriate spacing in the transcript. Development of computer algorithms that directly model recognition by the splicing machinery is recognized as an important challenge (1).

In human transcripts, the exons are usually short (typically 100-200 bases) and the introns are much longer, averaging about 3 kb (2). The realization that the splicing machinery would face great difficulty in locating splice sites across such long introns led to the exon definition model in which splice sites are paired first across the exons, with spliceosome assembly proceeding through subsequent pairing of exon units (3). The alternative intron definition model derives from the observation that introns in some transcripts (especially in invertebrates) are quite short relative to exons, and so the splicing machinery may initiall...