As genomic sequences become easier to acquire, shotgun proteomics will play an increasingly important role in genome annotation. With proteomics, researchers can confirm and revise existing genome annotations and discover completely new genes. Proteomic-based de novo gene discovery should be especially useful for sets of genes with characteristics that make them difficult to predict with gene-finding algorithms. Here, we report the proteomic discovery of 19 previously unannotated genes encoding seminal fluid proteins (Sfps) that are transferred from males to females during mating in Drosophila. Using bioinformatics, we detected putative orthologs of these genes, as well as 19 others detected by the same method in a previous study, across several related species. Gene expression analysis revealed that nearly all predicted orthologs are transcribed and that most are expressed in a male-specific or male-biased manner. We suggest several reasons why these genes escaped computational prediction. Like annotated Sfps, many of these new proteins show a pattern of adaptive evolution, consistent with their potential role in influencing male sperm competitive ability. However, in contrast to annotated Sfps, these new genes are shorter, have a higher rate of nonsynonymous substitution, and have a markedly lower GC content in coding regions. Our data demonstrate the utility of applying proteomic gene discovery methods to a specific biological process and provide a more complete picture of the molecules that are critical to reproductive success in Drosophila.
[Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. FJ460563-FJ460581. Mass spectrometry data are available in the PRIDE database under accession nos. 9199-9203.]
Advances in DNA sequencing technology have made it cheaper and easier to determine the complete genome sequences of a variety of organisms. However, a fully sequenced genome is only a starting point for understanding an organism's biology. One critical, subsequent step is to annotate the complete sets of proteins used by the organism in specific biological processes. The first pass at genome annotation often comes from gene prediction algorithms, which scan DNA sequences for features of genes (such as open reading frames and GC content) and examine cross-species conservation to infer functionally important regions (Burge and Karlin 1997; Brent and Guigo 2004). These computational methods have identified many new genes, but they remain imperfect and cannot provide experimental validation of their predicted gene models. Mass spectrometry (MS)-based proteomic methods can be used to refine computational gene annotations and identify novel genes (Ansong et al. 2008; Gupta et al. 2008). Mass spectra are typically searched against a database of predicted proteins; the peptides that are identified confirm and refine gene models derived from computational work. When these searches are exp...