In this paper we propose a workflow for studying the genetic architecture of ischemic stroke outcomes. It develops further the candidate gene approach. The workflow is based on the animal model of brain ischemia, comparative genomics, human genomic variations, and algorithms of selection of tagging single nucleotide polymorphisms (tagSNPs) in genes which expression was changed after ischemic stroke. The workflow starts from a set of rat genes that changed their expression in response to brain ischemia and results in a set of tagSNPs, which represent other SNPs in the human genes analyzed and influenced on their expression as well.
Background
The imputation of genotypes increases the power of genome-wide association studies. However, the imputation quality should be assessed in each particular case. Nevertheless, not all imputation softwares control the error of output, e.g., the last release of fastPHASE program (1.4.8) lacks such an option. In this particular software there is also an uncertainty in choosing the model parameters. fastPHASE is based on haplotype clusters, which size should be set a priori. The parameter influences the results of imputation and downstream analysis.
Results
We present a software toolkit
imputeqc
to assess the imputation quality and/or to choose the model parameters for imputation. We demonstrate the efficacy of toolkit for evaluation of imputations made with both fastPHASE and BEAGLE software for HapMap and 1000 Genomes data. The discordance of genotypes received correlated well in both methods. Using
imputeqc
, we also shown how to choose the optimal number of haplotype clusters and expectation-maximization cycles for fastPHASE program. The found number of haplotype clusters of 25 was further applied for hapFLK testing that revealed signatures of selection at LCT region on chromosome 2. We also demonstrated how to decrease the computational time in the case of hapFLK testing from 3 days to 20 h.
Conclusions
The toolkit is implemented as an R package
imputeqc
and command line scripts. The code is freely available at
https://github.com/inzilico/imputeqc
under the MIT license.
We performed an exhaustive pairwise comparison of whole-genome sequences of 3120 individuals, representing 232 populations from all continents and seven prehistoric people including archaic and modern humans. In order to reveal an intricate picture of worldwide human genetic relatedness, 65 million very rare single nucleotide polymorphic (SNP) alleles have been bioinformatically processed. The number and size of shared identical-by-descent (IBD) genomic fragments for every pair of 3127 individuals have been revealed. Over 17 million shared IBD fragments have been described. Our approach allowed detection of very short IBD fragments (<20 kb) that trace common ancestors who lived up to 200,000 years ago. We detected nine distinct geographical regions within which individuals had strong genetic relatedness, but with negligible relatedness between the populations of these regions. The regions, comprising nine unique genetic components for mankind, are the following: East and West Africa, Northern Europe, Arctica, East Asia, Oceania, South Asia, Middle East, and South America. The level of admixture in every studied population has been apportioned among these nine genetic components. Genetically, long-term neighboring populations are strikingly similar to each other in spite of any political, religious, and cultural differences. The topmost admixture has been observed at the center of Eurasia. These admixed populations (including Uyghurs, Azerbaijanis, Uzbeks, and Iranians) have roughly equal genetic contributions from the Middle East, Europe, China, and India, with additional significant traces from Africa and Arctic. The entire picture of relatedness of all the studied populations unfolds and presents itself in the form of shared number/size of IBDs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.