MotifCut server and other materials can be found at motifcut.stanford.edu.
The genome of an admixed individual with ancestors from isolated populations is a mosaic of chromosomal blocks, each following the statistical properties of variation seen in those populations. By analyzing polymorphisms in the admixed individual against those seen in representatives from the populations, we can infer the ancestral source of the individual's haploblocks. In this paper we describe a novel approach for ancestry inference, HAPAA (HMM-based analysis of polymorphisms in admixed ancestries), that models the allelic and haplotypic variation in the populations and captures the signal of correlation due to linkage disequilibrium, resulting in greatly improved accuracy. We also introduce a methodology for evaluating the effect of genetic divergence between ancestral populations and time-to-admixture on inference accuracy. Using HAPAA, we explore the limits of ancestry inference in closely related populations.
[HAPAA is available at http://hapaa.stanford.edu.]
Human population migration, adaptation, and admixture have a chaotic and mostly undocumented history. However, nature has auspiciously recorded its account of events within our genomes, and we are at the cusp of an era where we will be able to unlock these records. An individual's genome is a mosaic of ancestral haploblocks whose sizes depend on how far back in the ancestry we compare them. Because recombination can occur essentially anywhere in the genome, the precise boundaries and sources of these haploblocks cannot be easily inferred. However, if the haploblocks are derived from isolated human subpopulations, they will tend to follow the patterns of variation seen in those populations. Using these patterns, we can partition an admixed individual's genome into a mosaic of blocks derived from different populations. The inference of admixed ancestries is intriguing from a personal perspective because it speaks to an individual's origins.
In addition, it can be used in association mapping studies to identify loci relevant in genetic disease (McKeigue 1998; Hoggart et al. 2004; Montana and Pritchard 2004; Patterson et al. 2004; Zhu et al. 2004, 2005) and will help unravel some of the complexities in the history of human evolution.
Although recent work suggests that human genomes differ significantly in many ways (Redon et al. 2006), single nucleotide polymorphisms (SNPs) are ubiquitous and can serve as markers for the variation. Recent advances in genotyping technology allow us to genotype hundreds of thousands of SNPs in a single experiment, making them a convenient vehicle for studying genome-wide variation. For example, the Illumina HumanHap550 genotyping chip can assay over 550,000 tag-SNP loci for a few hundred dollars (http://illumina.com/pages.ilmn?ID=154). Because linkage disequilibrium (LD) has a strong effect at short genetic distances, the high-density coverage of such genotyping chips makes it possible to infer much of the intervening genomic variation (Carlson et al. 2004). Using SNPs as a basis for variation, methods have been describ...
Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs.
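The contrast drawn above, between scoring a candidate k-mer against a model of the motif as a whole and scoring it by similarity to specific known k-mers, can be illustrated with a small Python sketch. This is not the MotifScan algorithm, whose details are not given in the abstract; the toy sites, the pseudocount, the uniform background, the Hamming-based similarity, and the function names `build_pssm`, `pssm_score`, and `neighbor_score` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a column-wise log-odds matrix from equal-length known sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pssm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pssm


def pssm_score(pssm, kmer):
    """Sum per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pssm, kmer))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighbor_score(sites, kmer, n_neighbors=2):
    """Mean fractional identity to the n_neighbors most similar known sites.

    A hypothetical stand-in for a similarity-to-known-k-mers criterion.
    """
    k = len(kmer)
    nearest = sorted(hamming(site, kmer) for site in sites)[:n_neighbors]
    return sum(1 - d / k for d in nearest) / len(nearest)


# Toy motif made of two clusters of similar k-mers (invented data).
sites = ["AAAAAA", "AAAAAA", "AAAAAT", "CCCCCC", "CCCCCC", "CCCCCG"]
pssm = build_pssm(sites)

member = "CCCCCC"   # identical to a known site
chimera = "ACACAC"  # mixes the two clusters; resembles no known site

for kmer in (member, chimera):
    print(kmer, round(pssm_score(pssm, kmer), 3), neighbor_score(sites, kmer))
```

On this toy data the column-wise model assigns the known site and the chimera the same score, because each position is scored independently, while the similarity-based score separates them. The example only shows why clustered motifs can be hard for a position-independent model; it says nothing about the performance reported for MotifScan.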