DOI: 10.12688/wellcomeopenres.17795.1

An open dataset of Plasmodium vivax genome variation in 1,895 worldwide samples

Abstract: This report describes the MalariaGEN Pv4 dataset, a new release of curated genome variation data on 1,895 samples of Plasmodium vivax collected at 88 worldwide locations between 2001 and 2017. It includes 1,370 new samples contributed by MalariaGEN and VivaxGEN partner studies in addition to previously published samples from these and other sources. We provide genotype calls at over 4.5 million variable positions including over 3 million single nucleotide polymorphisms (SNPs), as well as short indels and tande…

Cited by 30 publications
(57 citation statements)
References 30 publications
“…In parallel we are in the process of validating another version of the Pv AmpliSeq, which has a different within-country barcode designed with genomes from Peru (manuscript in preparation) with potential applicability to the wider Amazon basin. For other countries where it is not feasible to design a specific within-country barcode, allele frequencies of existing Pv AmpliSeq barcodes can be evaluated using the resources provided in this manuscript (supplementary file S5) or other genomic resources such as the MalariaGEN datasets (Adam et al., 2022), if the country or region is represented in these datasets. Ideally, MAF should vary between 0.1-0.5 with differences in MAF at smaller geographic scales within the country or region.…”
Section: Discussion (mentioning)
confidence: 99%
“…The study used genomic data on P. vivax derived from the Malaria Genomic Epidemiology (MalariaGEN) P. vivax Genome Variation Project release 4 (Pv4), which has recently been published as an open dataset [26]. The Pv4 open data set comprises genomes from 26 countries.…”
Section: Methods (mentioning)
confidence: 99%
“…For the analysis in this study, the dataset was divided into two parts, a training dataset, and a validation dataset. The validation set consisted of isolates from 7 countries (Brazil, Cambodia, Colombia, Ethiopia, Peru, Thailand, and Vietnam) derived from a clinical trial conducted by GlaxoSmithKline (GSK) [26]. All remaining isolates were included in the training dataset, which comprised representation of all the countries in the validation set.…”
Section: Methods (mentioning)
confidence: 99%