Plasmodium vivax gene regulation remains difficult to study because of the lack of a robust in vitro culture method, low parasite densities in peripheral circulation and asynchronous parasite development. We adapted an RNA-seq protocol, "DAFT-seq", to sequence the transcriptomes of four P. vivax field isolates that were cultured for a short period ex vivo before schizonts were enriched on a density gradient. Transcription was detected from 78% of the PvP01 reference genome, even though the samples were schizont-enriched. This extensive dataset was used to define thousands of 5' and 3' untranslated regions (UTRs), some of which overlapped with neighbouring transcripts, and to improve the gene models of 352 genes, including identifying 20 novel gene transcripts. The dataset has also significantly increased the known amount of heterogeneity between P. vivax schizont transcriptomes from individual patients. The majority of genes found to be differentially expressed between the isolates lack Plasmodium falciparum homologs and are predicted to be involved in host-parasite interactions, with an enrichment in reticulocyte binding proteins, merozoite surface proteins and exported proteins of unknown function. An improved understanding of the diversity within P. vivax transcriptomes will be essential for the prioritisation of novel vaccine targets.
Results
Preparation of purified late-stage schizont transcriptomes
Four blood samples from Cambodian patients were selected to undergo short-term ex vivo culture. After the majority of parasites had matured, as judged by microscopy, late-stage schizonts were purified using Percoll gradients. Four RNA-seq libraries were generated using a modified version of the DAFT-seq protocol, which was optimised for highly AT-rich Plasmodium parasites 23, and sequenced using the Illumina platform to generate 55-63 million reads per patient sample. In all cases >85% of reads mapped to the P. vivax PvP01 reference genome (Table S1). These data were used to improve the gene models of 352 genes in the PvP01 genome, including identifying 20 novel gene transcripts (Table S2). Comparison of the expression values of these RNA-seq libraries to a blood-stage microarray time course from a P. vivax dataset (containing three patient isolates) 12 (Fig. S1) and to a more densely sampled P. falciparum dataset 24 (Fig. S2) confirmed that the samples were late-stage schizonts, as they correlated most strongly with the late time points in the prior datasets. The transcriptomes were also highly similar to each other, as the correlation of the normalised expression values of the RNA-seq libraries was close to 1 (Fig. S3). More apparent heterogeneity was found between the patient isolates using the PvP01 genome than using only the gene IDs present in the Sal1 genome (Fig. S4). This is consistent with most of the heterogeneity within the patient isolate transcriptomes being present in multigene families that are difficult to assemble and annotate, and are thus under-represented in the Sal1 genome.
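The staging comparison described above can be illustrated with a minimal sketch. It is not the analysis code used for Figs. S1-S3: the choice of Pearson correlation on log2-transformed expression values, the function names `pearson` and `best_matching_timepoint`, and the toy gene IDs and values are all assumptions made for illustration. The idea is simply to correlate one sample's expression profile against each time point of a reference time course, over the genes shared by both, and report the best-correlated time point.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def best_matching_timepoint(sample, time_course):
    """Return (best_label, scores) for one sample against a reference time course.

    sample:      dict mapping gene ID -> normalised RNA-seq expression value
    time_course: dict mapping time point label -> dict of gene ID -> reference value
    Only genes present in both the sample and the reference profile are used.
    """
    scores = {}
    for label, profile in time_course.items():
        shared = sorted(set(sample) & set(profile))
        # Assumption: RNA-seq values are log2-transformed (with a pseudocount of 1)
        # so they are on a scale comparable to log-ratio microarray values.
        xs = [math.log2(sample[g] + 1) for g in shared]
        ys = [profile[g] for g in shared]
        scores[label] = pearson(xs, ys)
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy data: gene IDs and values are invented for illustration only.
reference = {
    "ring": {"geneA": 2.0, "geneB": 0.5, "geneC": -1.0, "geneD": -1.5},
    "trophozoite": {"geneA": -0.5, "geneB": 1.5, "geneC": 0.5, "geneD": -1.5},
    "schizont": {"geneA": -1.5, "geneB": -0.5, "geneC": 1.0, "geneD": 2.0},
}
isolate = {"geneA": 3, "geneB": 20, "geneC": 150, "geneD": 900, "geneE": 10}

best, scores = best_matching_timepoint(isolate, reference)
```

In this toy example `geneE` is ignored because it has no counterpart in the reference, which mirrors the situation where a comparison is restricted to gene IDs shared between two annotations. The same `pearson` function could be applied to pairs of isolates to obtain a between-sample correlation; a rank-based correlation would be a reasonable alternative when the two platforms are not on comparable scales.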