We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From these data, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Chromatin composition and histone modification patterns differed between chromosome arms and centers, and similarly prominent differences distinguished the autosomes from the X chromosome. Integrating these data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Allele-specific DNA methylation (ASM) is a hallmark of imprinted genes, but ASM in the larger, nonimprinted fraction of the genome is less well characterized. Using methylation-sensitive SNP analysis (MSNP), we surveyed the human genome at 50K and 250K SNP resolution, identifying ASM as recurrent genotype call conversions from heterozygosity to homozygosity when genomic DNAs were predigested with the methylation-sensitive restriction enzyme HpaII. Using independent assays, we confirmed ASM at 16 SNP-tagged loci distributed across various chromosomes. At 12 of these loci (75%), the ASM tracked strongly with the sequence of adjacent SNPs. Further analysis showed allele-specific mRNA expression at two loci from this methylation-based screen, the vanin and CYP2A6-CYP2A7 gene clusters, both of which are implicated in traits of medical importance. This recurrent phenomenon of sequence-dependent ASM has practical implications for mapping and interpreting associations of noncoding SNPs and haplotypes with human phenotypes.
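The MSNP calling logic described above (a heterozygous call in undigested DNA that becomes homozygous after HpaII predigestion, recurring across individuals) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the genotype encoding ("AA", "AB", "BB"), the per-sample input dictionaries, the function names, and the `min_samples` recurrence threshold are all hypothetical. The sketch also checks whether the same allele is retained across individuals, a rough proxy for the sequence-dependent ASM the abstract reports.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input: per-sample mapping of SNP id ->
# (genotype call in undigested DNA, genotype call in HpaII-digested DNA).
Calls = Dict[str, Tuple[str, str]]

HET = "AB"
HOMS = {"AA", "BB"}


def conversions(sample: Calls) -> Dict[str, str]:
    """Return SNP id -> retained homozygous call for het-to-hom conversions in one sample."""
    out = {}
    for snp, (undigested, digested) in sample.items():
        if undigested == HET and digested in HOMS:
            # HpaII cuts unmethylated CCGG sites, so the allele still called after
            # digestion is presumably the methylated one.
            out[snp] = digested
    return out


def recurrent_asm(samples: List[Calls], min_samples: int = 2) -> List[str]:
    """SNPs showing a het-to-hom conversion in at least `min_samples` samples (hypothetical threshold)."""
    counts: Counter = Counter()
    for sample in samples:
        counts.update(conversions(sample).keys())
    return sorted(snp for snp, n in counts.items() if n >= min_samples)


def retained_allele_bias(samples: List[Calls], snp: str) -> Optional[Tuple[str, float]]:
    """Most frequently retained call at `snp` and its fraction among conversions, or None if never converted."""
    retained: Counter = Counter()
    for sample in samples:
        call = conversions(sample).get(snp)
        if call is not None:
            retained[call] += 1
    if not retained:
        return None
    genotype, n = retained.most_common(1)[0]
    return genotype, n / sum(retained.values())
```

With three toy samples in which rs1 converts from "AB" to "AA" in two individuals, `recurrent_asm(samples)` flags only rs1, and `retained_allele_bias(samples, "rs1")` returns `("AA", 1.0)`, meaning the same allele survived digestion every time. A real analysis would also need to account for genotyping error and array-specific call thresholds, which this sketch ignores.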
The immense growth in the volume of research literature and experimental data in molecular biology calls for efficient automatic methods to capture and store information. In recent years, several groups have worked on specific problems in this area, such as automated selection of articles pertinent to molecular biology, automated information extraction using natural-language processing, information visualization, and generation of specialized knowledge bases for molecular biology. GeneWays is an integrated system that combines several of these subtasks. It analyzes interactions between molecular substances, drawing on multiple sources of information to infer a consensus view of molecular networks. GeneWays is designed as an open platform, allowing researchers to query, review, and critique the stored information.