We have determined the exact splicing patterns of the mRNAs of the minute virus of mice by a combination of cDNA sequencing and S1 nuclease protection analysis. There are four virus-specific mRNA species, each coding for one of the four polypeptides identified by in vitro translation. The R1 mRNA comprises sequences from nucleotide ~200 to 2281 and from 2378 to ~4800 and codes for the NS1 protein. The R2 mRNA is derived from nucleotides ~200 to 515, 1991 to 2281, and 2378 to ~4800 and codes for the NS2 protein. Between nucleotides 1991 and 2281, the coding sequence for NS2 overlaps that of NS1, but in a different reading frame. R3 covers nucleotides ~2007 to 2281 and 2378 to ~4800 and codes for VP2. The fourth species, R3', differs from R3 by using an alternative splice donor and acceptor in the region around 47 map units (nucleotide 2400); it extends from nucleotide ~2007 to 2317 and from 2400 to ~4800 and almost certainly codes for VP1. The R2 transcript is unusual in that the intron removed from it (nucleotides 516 to 1990) starts with GC rather than the canonical GU. With the exception of the splice acceptor at position 2378, which is found only in rodent parvoviruses, the splice junctions are highly conserved among autonomous parvoviruses. These results show that minute virus of mice, like other small DNA viruses, uses multiple strategies to compress the coding information for several viral proteins into a short (5,104-nucleotide) genome.
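The intron boundaries implied by these exon coordinates can be checked with a little arithmetic. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the transcript ends (200, 2007, 4800) are treated as nominal values because the abstract gives them only approximately. It lists each intron with its length and that length modulo 3. If one assumes that R1 and R2 are read in the same frame upstream of nucleotide 515 (an assumption not stated in the abstract), an intron whose length is not a multiple of 3 would shift the downstream reading frame, which is consistent with NS2 being read in a different frame from NS1 between nucleotides 1991 and 2281.

```python
# Exon coordinates (1-based, inclusive) as given in the abstract.
# The 5' and 3' transcript ends are approximate in the source, so the
# values 200, 2007 and 4800 are nominal placeholders.
TRANSCRIPTS = {
    "R1": [(200, 2281), (2378, 4800)],
    "R2": [(200, 515), (1991, 2281), (2378, 4800)],
    "R3": [(2007, 2281), (2378, 4800)],
    "R3'": [(2007, 2317), (2400, 4800)],
}


def introns(exons):
    """Return (start, end, length) for each gap between consecutive exons."""
    return [
        (left_end + 1, right_start - 1, right_start - left_end - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]


for name, exons in TRANSCRIPTS.items():
    for start, end, length in introns(exons):
        print(f"{name}: intron {start}-{end}, {length} nt, length mod 3 = {length % 3}")
```

The large R2 intron computed this way spans nucleotides 516 to 1990, matching the interval quoted in the abstract.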
Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex, interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized using Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community.
Conclusion: The H3ABioNet workflows have been implemented with a view to offering ease of use for the end user and high levels of reproducibility and portability, while following modern, state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows serve the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa, and will serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.