The immense increase in the generation of genome-scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.
Identification of three-dimensional (3D) interactions between regulatory elements across the genome is crucial to unravel the complex regulatory machinery that orchestrates proliferation and differentiation of cells. ChIA-PET is a novel method to identify such interactions, where physical contacts between regions bound by a specific protein are quantified using next-generation sequencing. However, determining the significance of the observed interaction frequencies in such datasets is challenging, and few methods have been proposed. Although regions that are close in linear genomic distance have a much higher tendency to interact by chance, no method to date takes this dependency into account. Here, we propose a statistical model that accounts for the genomic distance relationship, as well as the general propensity of anchors to be involved in contacts overall. Using both real and simulated data, we show that the previously proposed statistical test, based on Fisher's exact test, leads to invalid results when data are dependent on genomic distance. We also evaluate our method on previously validated cell-line specific and constitutive 3D interactions, and show that relevant interactions are significant, while avoiding over-estimating the significance of short nearby interactions.
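To make the criticised baseline concrete, the following is a minimal sketch of a one-sided Fisher's exact test for a single anchor pair, computed as a hypergeometric upper tail. It is not the model proposed in the abstract. The setup is an assumption made for illustration: `n_total` paired-end tags overall, `c_a` and `c_b` of them touching anchors A and B, and `k` linking both. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_total, c_a, c_b):
    """P(X >= k) for X ~ Hypergeometric(n_total, c_a, c_b).

    Equals the one-sided Fisher's exact test p-value for the 2x2 table
    [[k, c_a - k], [c_b - k, n_total - c_a - c_b + k]].
    """
    lo = max(k, 0, c_a + c_b - n_total)   # smallest feasible overlap to sum from
    hi = min(c_a, c_b)                    # largest feasible overlap
    denom = comb(n_total, c_b)
    numer = sum(comb(c_a, i) * comb(n_total - c_a, c_b - i)
                for i in range(lo, hi + 1))
    return numer / denom


# Hypothetical toy counts, not taken from any dataset.
p_value = hypergeom_upper_tail(k=5, n_total=10, c_a=5, c_b=5)
```

Genomic distance between the two anchors does not appear anywhere in this calculation. A nearby pair and a distant pair with identical counts therefore receive the same p-value, which is the limitation the abstract describes.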
We propose an adaptive independent Metropolis-Hastings algorithm with the ability to learn from all previous proposals in the chain except the current location. It is an extension of the independent Metropolis-Hastings algorithm. Convergence is proved provided a strong Doeblin condition is satisfied, which essentially requires that all the proposal functions have uniformly heavier tails than the stationary distribution. The proof also holds if proposals depending on the current state are used intermittently, provided the information from these iterations is not used for adaptation. The algorithm gives samples from the exact distribution within a finite number of iterations with probability arbitrarily close to 1. The algorithm is particularly useful when a large number of samples from the same distribution is needed, as in Bayesian estimation, and in CPU-intensive applications such as inverse problems and optimization.
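The structure described above can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation. The target is a standard normal, and the proposal is a Cauchy, chosen because its tails are heavier than the target's. The adaptation rule is a heuristic invented for this example: importance-weighted moments of the history, clamped to a bounded range. The one feature taken directly from the abstract is that the proposal is fitted only to past proposals that are not the current location.

```python
import math
import random


def target_pdf(x):
    # Unnormalised standard normal density; an illustrative target only.
    return math.exp(-0.5 * x * x)


def cauchy_pdf(x, loc, scale):
    u = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + u * u))


def cauchy_draw(rng, loc, scale):
    # Inverse-CDF draw from a Cauchy distribution.
    return loc + scale * math.tan(math.pi * (rng.random() - 0.5))


def adaptive_independent_mh(n_iter, seed=0, warmup=100):
    rng = random.Random(seed)
    loc, scale = 2.0, 3.0            # deliberately poor initial proposal
    sw = swz = swzz = 0.0            # weighted running sums over the history
    n_hist = 0
    x = cauchy_draw(rng, loc, scale)
    w_x_at_proposal = target_pdf(x) / cauchy_pdf(x, loc, scale)
    samples, n_accept = [], 0
    for _ in range(n_iter):
        # The proposal ignores the current state x (independence sampler).
        y = cauchy_draw(rng, loc, scale)
        w_y = target_pdf(y) / cauchy_pdf(y, loc, scale)
        w_x = target_pdf(x) / cauchy_pdf(x, loc, scale)
        # Accept with probability min(1, w_y / w_x), written without division.
        if rng.random() * w_x < w_y:
            retired, w_retired = x, w_x_at_proposal   # old state joins history
            x, w_x_at_proposal = y, w_y
            n_accept += 1
        else:
            retired, w_retired = y, w_y               # rejected proposal joins history
        # The history never contains the current location x.
        sw += w_retired
        swz += w_retired * retired
        swzz += w_retired * retired * retired
        n_hist += 1
        if n_hist >= warmup and sw > 0.0:
            mean = swz / sw
            var = max(swzz / sw - mean * mean, 0.0)
            # Clamp so the proposal family stays bounded (assumed heuristic).
            loc = min(max(mean, -5.0), 5.0)
            scale = min(max(math.sqrt(var), 0.5), 5.0)
        samples.append(x)
    return samples, n_accept / n_iter


samples, acceptance_rate = adaptive_independent_mh(20000, seed=1)
```

Because the Cauchy proposal keeps heavier tails than the normal target and its parameters are clamped, the ratio of target to proposal density stays bounded in this toy. That is the kind of requirement the strong Doeblin condition expresses. The weights use the proposal in force when each point was drawn, which is a convenience of this sketch rather than something stated in the abstract.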