The diamondback moth, Plutella xylostella (L.), is the major cosmopolitan pest of brassica and other cruciferous crops. Its larval midgut is a dynamic tissue that interfaces with a wide variety of toxicological and physiological processes. The draft sequence of the P. xylostella genome was recently released, but its annotation remains challenging because of the low sequence coverage of this branch of life and the poor description of exon/intron splicing rules for these insects. Peptide sequencing by computational assignment of tandem mass spectra to genome sequence information provides an experimentally independent approach for confirming or refuting protein predictions, a concept that has been termed proteogenomics. In this study, we carried out an in-depth proteogenomic analysis to complement the genome annotation of the P. xylostella larval midgut, based on shotgun HPLC-ESI-MS/MS data processed with a multialgorithm pipeline. A total of 876,341 tandem mass spectra were searched against the predicted P. xylostella protein sequences and a whole-genome six-frame translation database. Based on a data set comprising 2694 novel genome-search-specific peptides, we discovered 439 novel protein-coding genes and corrected 128 existing gene models. To obtain the most accurate data to seed further insect genome annotation, more than half of the novel protein-coding genes (235 of 439) were further validated by RT-PCR amplification and sequencing of the corresponding transcripts. Furthermore, we validated 53 novel alternative splicing events. Finally, a total of 6764 proteins were identified, making this one of the most comprehensive proteogenomic studies of a nonmodel animal. As the first tissue-specific proteogenomic analysis of P. xylostella, this study provides the fundamental basis for high-throughput proteomics and functional genomics approaches aimed at deciphering the molecular mechanisms of resistance and controlling this pest.
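To illustrate what a six-frame translation database contains, the following minimal Python sketch translates a DNA sequence in all three reading frames of both strands using the standard genetic code. It is not the pipeline used in this study: the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the toy input sequence are assumptions introduced here for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate the three frames of the forward (+) and reverse (-) strands."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


# Hypothetical toy sequence, not taken from the P. xylostella genome.
frames = six_frame_translation("ATGGCCTAA")
for name, protein in frames.items():
    print(name, protein)
```

In a search database, each translated frame would typically be split at stop codons (`*`) into candidate open reading frames before spectra are matched against it; that step is omitted here to keep the sketch short.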
Thousands of eukaryotic genomes have been sequenced with the rapid development of massively parallel sequencing technologies. The annotation of their genomic DNA sequences is a prerequisite for delineating conserved elements and understanding the specific encoded functions. Today, conventional approaches used to identify protein-coding genes are highly dependent on predictions based on computational algorithms and homology searches against known proteins. Nonmodel organisms, such as most arthropods, are by nature distantly related to well-studied organisms (1). As a result, their genomes encode a large majority of proteins with little similarity to known proteins, which challenges the annotation process. Much recent work on computational gene finding has focused on using transcript evidence to complement this prediction-based approach. Indeed, the use of information from deep sequencing of mRNA-derived cDNA libraries (RNA-seq) or expressed sequence tag (EST) libraries can dramatically improve confidence in genome annotation (2-4). However, this analysis remains at the transcript level and cannot make a crystal-clear distinct...