The authors of the first yeast chromosome sequence set a minimum length threshold of 100 codons, above which an open reading frame (ORF) is retained as a putative coding sequence. However, at least 58 yeast genes shorter than 100 codons have an assigned protein function. The yeast genome may therefore contain other tiny but functionally important genes that this simple filtering rule discards from analyses.
We have established discriminant functions from the in‐phase hexamer frequencies of functional genes and of simulated ORFs derived from a stationary Markov chain model. Fifty‐two of the 58 genes were recognized as coding ORFs by our discrimination method. The test was also applied to all the small ORFs (36 to 100 codons) found in the intergenic regions of published chromosomes. It retained 140 new potential tiny coding sequences, among which we identified seven new genes by similarity searches. Used in conjunction with similarity searches, our method can also highlight sequencing errors that disrupt the coding frame of longer ORFs. Because it detects potential coding ORFs, the method can be a useful tool for functional analysis.
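
As an illustration of the idea of in‐phase hexamer statistics, the following minimal Python sketch counts hexamers that start on codon boundaries, simulates null sequences from a Markov chain, and scores an ORF by a mean log-odds ratio. It is not the discriminant function used in this work: the first-order chain, the use of base composition as a stand-in for the stationary start distribution, the pseudocount, the log-odds score, and all function names are assumptions made only for this example.

```python
import math
import random
from collections import Counter
from itertools import product

BASES = "ACGT"
ALL_HEXAMERS = ["".join(p) for p in product(BASES, repeat=6)]


def in_phase_hexamers(seq):
    """Return hexamers that start on codon boundaries (offsets 0, 3, 6, ...)."""
    seq = seq.upper()
    return [seq[i:i + 6] for i in range(0, len(seq) - 5, 3)]


def hexamer_frequencies(seqs, pseudocount=1.0):
    """Estimate in-phase hexamer frequencies with additive smoothing (assumed)."""
    counts = Counter()
    for s in seqs:
        counts.update(in_phase_hexamers(s))
    total = sum(counts[h] for h in ALL_HEXAMERS) + pseudocount * len(ALL_HEXAMERS)
    return {h: (counts[h] + pseudocount) / total for h in ALL_HEXAMERS}


def simulate_markov(train_seqs, length, rng):
    """Simulate one sequence from a first-order Markov chain fitted to train_seqs.

    Assumption: the first base is drawn from the overall base composition,
    used here as an approximation of the stationary distribution.
    """
    start = Counter({b: 1 for b in BASES})
    trans = {a: Counter({b: 1 for b in BASES}) for a in BASES}
    for s in train_seqs:
        s = s.upper()
        start.update(c for c in s if c in BASES)
        for a, b in zip(s, s[1:]):
            if a in BASES and b in BASES:
                trans[a][b] += 1
    out = rng.choices(BASES, weights=[start[b] for b in BASES])
    while len(out) < length:
        prev = out[-1]
        out += rng.choices(BASES, weights=[trans[prev][b] for b in BASES])
    return "".join(out)


def coding_score(seq, f_coding, f_null):
    """Mean log-odds of in-phase hexamers: positive values favour 'coding'."""
    hexes = [h for h in in_phase_hexamers(seq) if h in f_coding]
    if not hexes:
        return 0.0
    return sum(math.log(f_coding[h] / f_null[h]) for h in hexes) / len(hexes)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for known genes; real input would be annotated coding sequences.
    genes = ["ATGGCTGAAGCTGAAGCTGAAGCTTAA", "ATGGAAGCTGAAGCTGCTGAATAA"]
    simulated = [simulate_markov(genes, 27, rng) for _ in range(200)]
    f_coding = hexamer_frequencies(genes)
    f_null = hexamer_frequencies(simulated)
    print(coding_score("ATGGCTGAAGCTGAATAA", f_coding, f_null))
```

In practice a decision threshold on such a score would have to be calibrated on known genes and simulated ORFs; no threshold is proposed here.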