It is now widely accepted that knowledge can be acquired from networks by clustering their vertices according to connection profiles. Many methods have been proposed and in this paper we concentrate on the Stochastic Block Model (SBM). The clustering of vertices and the estimation of SBM model parameters have been subject to previous work and numerous inference strategies such as variational Expectation Maximization (EM) and classification EM have been proposed. However, SBM still suffers from a lack of criteria to estimate the number of components in the mixture. To our knowledge, only one model based criterion, ICL, has been derived for SBM in the literature. It relies on an asymptotic approximation of the Integrated Complete-data Likelihood and recent studies have shown that it tends to be too conservative in the case of small networks. To tackle this issue, we propose a new criterion that we call ILvb, based on a non asymptotic approximation of the marginal likelihood. We describe how the criterion can be computed through a variational Bayes EM algorithm.
We show that, for every l, the family F l of circuits of length at least l satisfies the Erdős-Pósa property, with f (k) = 13l(k−1)(k−2)+(2l+3)(k−1), thereby sharpening a result of C. Thomassen. We obtain as a corollary that graphs without k disjoint circuits of length l or more have tree-width O(lk 2 ).
Published in at http://dx.doi.org/10.1214/10-AOAS382 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)International audienceComplex systems in nature and in society are often represented as networks, describing the rich set of interactions between objects of interest. Many deterministic and probabilistic clustering methods have been developed to analyze such structures. Given a network, almost all of them partition the vertices into disjoint clusters, according to their connection profile. However, recent studies have shown that these techniques were too restrictive and that most of the existing networks contained overlapping clusters. To tackle this issue, we present in this paper the Overlapping Stochastic Block Model. Our approach allows the vertices to belong to multiple clusters, and, to some extent, generalizes the well-known Stochastic Block Model [Nowicki and Snijders (2001)]. We show that the model is generically identifiable within classes of equivalence and we propose an approximate inference procedure, based on global and local variational techniques. Using toy data sets as well as the French Political Blogosphere network and the transcriptional network of Saccharomyces cerevisiae, we compare our work with other approaches
Mediation analysis aims at disentangling the effects of a treatment on an outcome through alternative causal mechanisms and has become a popular practice in biomedical and social science applications. The causal framework based on counterfactuals is currently the standard approach to mediation, with important methodological advances introduced in the literature in the last decade, especially for simple mediation, that is with one mediator at the time. Among a variety of alternative approaches, Imai et al. showed theoretical results and developed an R package to deal with simple mediation as well as with multiple mediation involving multiple mediators conditionally independent given the treatment and baseline covariates. This approach does not allow to consider the often encountered situation in which an unobserved common cause induces a spurious correlation between the mediators. In this context, which we refer to as mediation with uncausally related mediators, we show that, under appropriate hypothesis, the natural direct and joint indirect effects are non-parametrically identifiable. Moreover, we adopt the quasi-Bayesian algorithm developed by Imai et al. and propose a procedure based on the simulation of counterfactual distributions to estimate not only the direct and joint indirect effects but also the indirect effects through individual mediators. We study the properties of the proposed estimators through simulations. As an illustration, we apply our method on a real data set from a large cohort to assess the effect of hormone replacement treatment on breast cancer risk through three mediators, namely dense mammographic area, nondense area and body mass index.
Polymorphisms in DNA-or RNA-seq data lead to recognisable patterns in a de Bruijn graph representation of the reads obtained by sequencing. Such patterns have been called mouths, or bubbles in the literature. They correspond to two vertex-disjoint directed paths between a source s and a target t. Due to the high number of such bubbles that may be present in real data, their enumeration is a major issue concerning the efficiency of dedicated algorithms. We propose in this paper the first linear delay algorithm to enumerate all bubbles with a given source.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.