Chromatin immunoprecipitation followed by tag sequencing (ChIP-Seq) using high-throughput next-generation instrumentation is replacing ChIP-chip for mapping sites of transcription-factor binding and chromatin modification. To develop a scoring approach for this new technique, we produce two deeply sequenced datasets for human RNA polymerase II and STAT1 with matching input-DNA controls. In these, we observe that signal peaks corresponding to sites of potential binding are strongly correlated with peaks in the control, likely revealing features of open chromatin. Based on these observations, we develop a two-pass approach for scoring ChIP-Seq relative to controls. The first pass identifies putative binding sites and compensates for genomic variation in the mappability of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Using our scoring, we investigate optimal experimental design, i.e., depth of sequencing and the value of replicates (showing marginal information gain beyond two).

With the advent of new high-throughput sequencing technologies (Helicos HeliScope, Illumina Genome Analyzer, ABI SOLiD, Roche 454), most genome-scale assays that previously could only be done cost-effectively using genomic tiling microarrays can now be performed using DNA sequencing. One of the most common uses of tiling microarrays is for performing ChIP-chip [1-3]. In ChIP-chip, DNA associated with a protein of interest is immunoprecipitated using an antibody specific to that protein (chromatin immunoprecipitation, or ChIP), and the resulting DNA is labeled and hybridized to a genomic tiling microarray. Early adaptations of ChIP sequencing (e.g. STAGE [4], ChIP-PET [5,6]) used Sanger-based sequencing, which generally provided limited tags and/or was expensive. The
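The two-pass idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed windows, the `min_tags` threshold, the background-based scaling factor, and the one-sided binomial test are all assumptions chosen for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_pass_score(chip, control, mappable_fraction, min_tags=10, alpha=0.05):
    """Score per-window tag counts; all parameters here are hypothetical.

    chip, control: tag counts per genomic window.
    mappable_fraction: fraction of each window that is uniquely mappable.
    """
    # Pass 1: call candidate windows, lowering the threshold where
    # less of the window is mappable.
    candidates = [
        i
        for i, count in enumerate(chip)
        if mappable_fraction[i] > 0 and count >= min_tags * mappable_fraction[i]
    ]
    candidate_set = set(candidates)

    # Normalize the control to the ChIP sample using background windows only.
    bg_chip = sum(c for i, c in enumerate(chip) if i not in candidate_set)
    bg_ctrl = sum(c for i, c in enumerate(control) if i not in candidate_set)
    scale = bg_chip / bg_ctrl if bg_ctrl else 1.0

    # Pass 2: keep candidates significantly enriched over the scaled control.
    results = []
    for i in candidates:
        scaled = round(control[i] * scale)
        p_value = binomial_tail(chip[i], chip[i] + scaled)
        if p_value < alpha:
            enrichment = chip[i] / max(scaled, 1)
            results.append((i, enrichment, p_value))
    return results


chip = [2, 3, 40, 2, 30, 3]
control = [2, 3, 4, 2, 28, 3]
peaks = two_pass_score(chip, control, [1.0] * 6)
```

In this toy input, windows 2 and 4 both pass the first pass, but window 4 also has a peak in the control and is filtered in the second pass, mirroring the observation that ChIP peaks can coincide with control peaks.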
Background: Genetic association studies, thus far, have focused on the analysis of individual main effects of SNP markers. Nonetheless, there is a clear need for modeling epistasis or gene-gene interactions to better understand the biologic basis of existing associations. Tree-based methods have been widely studied as tools for building prediction models based on complex variable interactions. An understanding of the power of such methods for the discovery of genetic associations in the presence of complex interactions is of great importance. Here, we systematically evaluate the power of three leading algorithms: random forests (RF), Monte Carlo logic regression (MCLR), and multifactor dimensionality reduction (MDR). Methods: We use the algorithm-specific variable importance measures (VIMs) as statistics and employ permutation-based resampling to generate the null distribution and associated p values. The power of the three is assessed via simulation studies. Additionally, in a data analysis, we evaluate the associations between individual SNPs in pro-inflammatory and immunoregulatory genes and the risk of non-Hodgkin lymphoma. Results: The power of RF is highest in all simulation models, that of MCLR is similar to RF in half, and that of MDR is consistently the lowest. Conclusions: Our study indicates that the power of RF VIMs is most reliable. However, in addition to tuning parameters, the power of RF is notably influenced by the type of variable (continuous vs. categorical) and the chosen VIM.
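As a rough illustration of the permutation-based resampling described in the Methods, the sketch below shuffles case/control labels to build a null distribution for an importance statistic. The `importance` function is a deliberately simple stand-in (absolute difference in mean genotype), not an RF, MCLR, or MDR variable importance measure; the function names and the `n_perm` and `seed` parameters are assumptions for illustration.

```python
import random


def importance(genotypes, labels):
    """Stand-in importance statistic (an assumption, not a real VIM):
    absolute difference in mean genotype between cases and controls."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes, labels, n_perm=1000, seed=0):
    """Estimate a p value by permuting labels to generate the null."""
    rng = random.Random(seed)
    observed = importance(genotypes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        if importance(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)


# Toy data: one SNP perfectly associated with case status, one unrelated.
labels = [1] * 20 + [0] * 20
p_associated = permutation_p_value([2] * 20 + [0] * 20, labels)
p_null = permutation_p_value([0, 1, 2, 1] * 10, [0, 1] * 20)
```

A real analysis would replace `importance` with the algorithm-specific measure and refit the model on each permuted dataset, which is far more expensive than this stand-in suggests.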
orthogonal components. Informally, the computer is one component, and the asynchronous ensemble in which computers are embedded is the other. By computer we mean any activity locus, any active synchronous device: any device that does (in effect) one thing at a time. An asynchronous ensemble is a communicating collection of activities.

A multiprocessor is an asynchronous ensemble; so is a desktop computer system. The basic computer (processor and memory) and the devices are separate activities. For our purposes, a human user is just another activity locus, and a user-plus-computer equals an asynchronous ensemble.

We have argued these two basic components (coordination of the ensemble, and computation of the activity locus) are orthogonal. By orthogonal, we mean we can develop independent models for each component. As long as both models are fully general in their own domains, any coordination model plus any computation model equals a complete model: a model of everything in computing. Over the last two decades, we have developed a coordination model (Linda, or tuple spaces), a computation model (symmetric languages) and another coordination model (lifestreams). Together, these three are a model of more than everything because the two coordination models are complementary in one sense, competitive in another. But no model of this sort can claim to be the definitive one. Designers seek a good combination of simplicity and power; in other words, elegance. Only the field as a whole can determine whether they have succeeded.

We will briefly discuss each of these three systems and the ways in which they (purportedly) add up to a computational model of everything.

The Tuple Space Coordination Model (and the coordination-based view of computing)

Our original problem was this: Given a collection of concurrently active processes or programs, supply a model that can turn the collection into an ensemble, whether a distributed system or a parallel application or some other kind of ensemble.
(A distributed system ordinarily copes with networked resources, for example, moves mail or files; a parallel application solves a compute-intensive problem fast by focusing many processors on it simultaneously. There is no logical difference between the two species. They are each an asynchronous ensemble.)

Our goal was to provide communication through

Footnote: A claim made originally by the authors in 1992.