The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and approximately 4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms we describe in larger sets of accessions will enable a detailed understanding of forces shaping population-wide sequence variation in A. thaliana.
Mutations in the gene encoding the methyl-CG binding protein MeCP2 cause several neurological disorders including Rett syndrome. The di-nucleotide methyl-CG (mCG) is the classical MeCP2 DNA recognition sequence, but additional methylated sequence targets have been reported. Here we show by in vitro and in vivo analyses that MeCP2 binding to non-CG methylated sites in brain is largely confined to the tri-nucleotide sequence mCAC. MeCP2 binding to chromosomal DNA in mouse brain is proportional to mCAC + mCG density and unexpectedly defines large genomic domains within which transcription is sensitive to MeCP2 occupancy. Our results suggest that MeCP2 integrates patterns of mCAC and mCG in the brain to restrain transcription of genes critical for neuronal function.
Background: For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.