Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs.
Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96% of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing.
Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
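The read-sorting step of the post-assembly pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the data layout, the function names (`sort_read`, `sort_reads`), the homoeolog labels "A" and "B", and the majority-vote rule are all assumptions made for illustration. The sketch assumes SNPs on a contig have already been phased into two haplotypes, and it assigns each aligned read to the haplotype whose alleles it matches more often.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: contig position -> (allele in homoeolog "A", allele in homoeolog "B").
PhasedSnps = Dict[int, Tuple[str, str]]
# A read is (0-based alignment start on the contig, ungapped read sequence).
Read = Tuple[int, str]


def sort_read(read_start: int, read_seq: str, phased: PhasedSnps) -> str:
    """Assign one read to "A", "B", or "unassigned" by majority vote over phased SNPs."""
    votes_a = 0
    votes_b = 0
    for pos, (allele_a, allele_b) in phased.items():
        offset = pos - read_start
        if 0 <= offset < len(read_seq):  # the read covers this SNP
            base = read_seq[offset]
            if base == allele_a:
                votes_a += 1
            elif base == allele_b:
                votes_b += 1
    if votes_a > votes_b:
        return "A"
    if votes_b > votes_a:
        return "B"
    # No informative SNP covered, or conflicting evidence.
    return "unassigned"


def sort_reads(reads: List[Read], phased: PhasedSnps) -> Dict[str, List[Read]]:
    """Split reads into per-homoeolog bins that could each be re-assembled separately."""
    bins: Dict[str, List[Read]] = {"A": [], "B": [], "unassigned": []}
    for start, seq in reads:
        bins[sort_read(start, seq, phased)].append((start, seq))
    return bins


# Toy data: two phased SNPs at contig positions 2 and 5.
phased_snps = {2: ("A", "G"), 5: ("C", "T")}
toy_reads = [(0, "TTATTC"), (0, "TTGTTT"), (3, "TTT"), (6, "ACGT"), (0, "TTATTT")]
bins = sort_reads(toy_reads, phased_snps)
```

In this toy case the first read carries both "A" alleles, the second and third carry "B" alleles, the fourth covers no SNP, and the fifth carries one allele from each haplotype, so the last two stay unassigned. A real pipeline would work from alignment records and handle gaps, base qualities, and paired-end mates, none of which is modelled here.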
The mce operons constitute four homologous regions in the Mycobacterium tuberculosis genome, each of which has 8-13 ORFs. Although the function of the Mce protein family has not been clearly established, its members are believed to be membrane lipid transporters. Based on functional experiments, we found that the regulator of the mce3 locus, Mce3R, negatively regulates the expression of the Rv1933c-Rv1935c and Rv1936-Rv1941 transcriptional units. These operons are adjacent to one another and divergently transcribed. The predicted functions of most of these genes are related to either lipid metabolism or redox reactions. Bioinformatic analysis of the 5' UTR sequences of the differentially expressed genes allowed us to define a putative Mce3R motif. Importantly, the Mce3R motif was present six and three times in the mce3R-yrbE3A and Rv1935c-Rv1936 intergenic regions, respectively. Two occurrences of this motif mapped within the two regions of the mce3 operon that were protected by Mce3R in a footprinting analysis, thus indicating that this motif is likely to serve as an operator site for the Mce3R regulator in the promoter. In addition, alterations in the lipid content of M. tuberculosis were detected in the absence of Mce3R. Taken together, these results suggest that Mce3R controls the expression of both the putative transport system encoded in the mce3 operon and the enzymes implicated in the modification of the Mce3-transported substrates.
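Counting motif occurrences in an intergenic region, as described above, can be sketched as follows. This is an illustrative sketch only: the abstract does not give the Mce3R motif sequence or the search tool used, so the motif `ACGW` and the sequence below are placeholders, and the helper names are hypothetical. The sketch expands an IUPAC-coded motif into a regular expression and reports overlapping matches on both strands.

```python
import re
from typing import List

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_positions(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    # A lookahead makes finditer report overlapping occurrences as well.
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]


def count_both_strands(seq: str, motif: str) -> int:
    """Count matches on the forward and reverse strands.

    Note: a site that is its own reverse complement is counted once per strand.
    """
    forward = motif_positions(seq, motif)
    reverse = motif_positions(reverse_complement(seq), motif)
    return len(forward) + len(reverse)


# Placeholder inputs, not the real Mce3R motif or an M. tuberculosis sequence.
toy_region = "ACGATTACGA"
toy_motif = "ACGW"
forward_hits = motif_positions(toy_region, toy_motif)
total_hits = count_both_strands(toy_region, toy_motif)
```

Dedicated motif-discovery and scanning tools typically score matches against a position weight matrix rather than an exact degenerate pattern, so this pattern match is a simplification.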