Diatoms are among the most important constituents of phytoplankton communities in aquatic environments, yet large-scale diatom-sequencing projects have been undertaken only recently. With the genome of the centric species Thalassiosira pseudonana available since mid-2004, accumulating sequence information for a pennate model species is a natural next aim. We generated over 12,000 expressed sequence tags (ESTs) from the pennate diatom Phaeodactylum tricornutum; assembly into a nonredundant set yielded 5,108 sequences. Significant similarity (E < 1E-04) to entries in the GenBank nonredundant protein database, the COG profile database, and the Pfam protein domains database was detected in 45.0%, 21.5%, and 37.1% of the nonredundant sequences, respectively. This information was used to functionally annotate the P. tricornutum nonredundant set and to create an internet-accessible, queryable diatom EST database. The nonredundant collection was then compared with the putative complete proteomes of the green alga Chlamydomonas reinhardtii, the red alga Cyanidioschyzon merolae, and the centric diatom T. pseudonana. A number of intriguing differences were identified between the pennate and the centric diatoms in activities relevant to general cell metabolism, e.g. genes involved in carbon-concentrating mechanisms, cytosolic acetyl-Coenzyme A production, and fructose-1,6-bisphosphate metabolism. Finally, codon usage and the utilization of C and G relative to gene expression (as measured by EST redundancy) were studied, and preferences for the use of C and of CpG doublets were noted among the P. tricornutum EST coding sequences.