In order to upgrade the genome sequence information of J. curcas L., we integrated de novo assembly of a total of 537 million paired-end reads generated from the Illumina sequencing platform into the current genome assembly which was obtained by a combination of the conventional Sanger method and the Roche/454 sequencing platform. The total length of the upgraded genome sequences thus obtained was 297,661,187 bp consisting of 39,277 contigs. The average and N50 lengths of the generated contigs were 7,579 bp and 15,950 bp, both of which were increased fourfold from the previous genome assembly. Along with genome sequence upgrading, the currently available transcriptome data were collected from the public databases and assembled into 19,454 tentative consensus sequences. Based on a comparison between these tentative consensus sequences of transcripts and the predictions of computer programs, a total of 30,203 complete and partial structures of protein-encoding genes were deduced. The number of genes with complete structures was increased about threefold from the previous genome annotation. By applying the upgraded genome sequence and predicted protein-coding gene information, the number and features of the tandemly arrayed genes, syntenic relations between Jatropha and other plant genomes, and structural features of transposable elements were investigated. The detailed information on the updated J. curcas genome is available at http://www.kazusa.or.jp/jatropha/.
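For readers unfamiliar with the contig statistics quoted above, the following is a minimal sketch of how an average contig length and an N50 value are conventionally computed. It is not the pipeline used in this study: the function names `n50` and `mean_length` and the toy length list are illustrative assumptions, and only the total length (297,661,187 bp) and contig count (39,277) are taken from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def mean_length(total_bp, n_contigs):
    """Average contig length in bp."""
    return total_bp / n_contigs


# Hypothetical toy assembly (not real data): total is 54, half is 27,
# and the three longest contigs (10 + 9 + 8) reach 27, so N50 is 8.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))

# Figures reported in the text: 297,661,187 bp in 39,277 contigs.
print(round(mean_length(297_661_187, 39_277)))
```

Under these definitions, dividing the reported total length by the reported number of contigs reproduces the stated average of 7,579 bp. The N50 of 15,950 bp cannot be recomputed here because it requires the individual contig lengths.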
Key words: Jatropha curcas, genome sequencing, transcriptome sequences, tentative consensus sequence, tandem gene duplication, database.

Jatropha curcas L. is a perennial small tree or large shrub that belongs to the Euphorbiaceae family. J. curcas is endemic to Central America but is distributed throughout the tropics and subtropics of Asia and Africa. J. curcas is an important non-edible oilseed crop with great potential for the production of biodiesel fuel. Since J. curcas is an undomesticated plant, its positive attributes in terms of breeding and utilization are not fully understood.

In order to accelerate its genetic improvement, it is desirable to understand the genome information of J. curcas. With this goal in mind, we have analyzed the genome sequence of J. curcas by applying combined sequencing methods, and have made the obtained sequence information available through the public and web databases. The accumulated genome information (JAT_r3.0) was 285,858,490 bp consisting of 120,586 contigs and 29,831 singlets, and this accounted for approximately 95% of the gene-containing regions. A total of 40,929 complete and partial structures of protein-encoding genes have been deduced on the accumulated genome sequences. However, the majority of the predicted genes were only partially predicted because the contigs in JAT_r3.0 were relatively short. Further improvement of the genome sequence information is therefore needed.

Along with the genome sequence approach, several transcriptome analyses have been attempted. Natarajan et al. have reported 12,084 ESTs using...