We identify a set of 575 human genes that are expressed in all conditions tested in a publicly available database of microarray results. Because of this ubiquitous expression, the set is expected to be rich in "housekeeping" genes, showing constitutive expression in all tissues. We compare selected aspects of their genomic structure with those of a set of background genes. We find that the introns, untranslated regions and coding sequences of the housekeeping genes are shorter, indicating a selection for compactness in these genes.
The amazing diversity of the human body stems from the different expression patterns of genes in different tissues. Although most genes show constitutive expression in only a subset of tissues, some gene products are required for the maintenance of basal cellular function and are constitutively found in all human cells. These genes are called housekeeping genes (HK genes) [1]. HK genes can be used to calibrate measurements of gene expression [2]. They might also help to define the minimal gene complement needed for a human cell [1]. Several attempts have been made recently to define the complete set of HK genes [3,4].
Microarrays are often used to identify sets of genes that are expressed either ubiquitously or in specific tissues or conditions. However, the technique is technically demanding and prone to artifacts, so independent evidence is often required to confirm the results. In principle, identifying the set of HK genes using microarray data is straightforward; one need only look for genes that are expressed in all tissues and all experimental conditions. Employing such an approach has so far resulted in two lists of HK genes [3,4]. However, problems in probe design, measurement noise and other artifacts introduce inevitable errors in such lists. Because a northern blot experiment for each gene in each tissue is impractical, an independent test is needed to validate any list of HK genes. Here, we report a validation test that uses a recently discovered property of highly expressed genes.
The transcription process is both slow and costly; transcribing a single nucleotide takes approximately 50 milliseconds [5,6] and two ATP molecules [7]. This might be expected to provide selective pressure to make genes as short as functionally possible. The more copies of a gene required for the organism, the stronger this pressure should be.
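As a rough illustration of the arithmetic behind this argument, the sketch below scales the per-nucleotide figures quoted above (about 50 milliseconds and two ATP molecules) to whole transcripts. The function name, the transcript lengths and the copy number are hypothetical values chosen for illustration only; they are not measurements from this study.

```python
# Approximate per-nucleotide costs quoted in the text.
MS_PER_NT = 50    # ~50 milliseconds per nucleotide [5,6]
ATP_PER_NT = 2    # ~2 ATP molecules per nucleotide [7]


def transcription_cost(length_nt, copies=1):
    """Return (seconds to transcribe one copy, total ATP for `copies` copies).

    `length_nt` is the length of the primary transcript in nucleotides,
    introns included, because introns are transcribed before splicing.
    """
    seconds_per_copy = length_nt * MS_PER_NT / 1000
    total_atp = length_nt * ATP_PER_NT * copies
    return seconds_per_copy, total_atp


# Hypothetical example: two genes differing only in total intron length,
# each transcribed 1,000 times.
for label, length in [("compact gene", 10_000), ("long-intron gene", 100_000)]:
    seconds, atp = transcription_cost(length, copies=1_000)
    print(f"{label}: {seconds:.0f} s per transcript, {atp:,} ATP in total")
```

Under these assumptions both costs grow linearly with transcript length, and the total ATP cost also grows linearly with the number of copies, which is why the pressure for compactness is expected to be strongest for highly expressed genes.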
The first demonstration of this principle [8] showed that genes with a large number of expressed sequence tags (ESTs) in public libraries (and hence more mRNAs) have a significantly shorter average intron length than those with fewer ESTs. Here, an implication of this principle is used to validate a set of HK genes. The HK genes, which are transcribed in all somatic cells and under all circumstances, are by nature highly expressed, and therefore should be selected to have shorter introns. We used a recently published database of microarray experiments [9] to identify a set of HK genes. As a further validation step, we checked the Gene Ontology (GO) annotation of thes...