2016 IEEE International Symposium on Information Theory (ISIT) 2016
DOI: 10.1109/isit.2016.7541627
|View full text |Cite
|
Sign up to set email alerts
|

Active learning for community detection in stochastic block models

Abstract: Abstract-The stochastic block model (SBM) is an important generative model for random graphs in network science and machine learning, useful for benchmarking community detection (or clustering) algorithms. The symmetric SBM generates a graph with 2n nodes which cluster into two equally sized communities. Nodes connect with probability p within a community and q across different communities. We consider the case of p = a ln(n)/n and q = b ln(n)/n. In this case, it was recently shown that recovering the communit… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

2
8
0

Year Published

2018
2018
2022
2022

Publication Types

Select...
4
2
1

Relationship

0
7

Authors

Journals

citations
Cited by 9 publications
(10 citation statements)
references
References 16 publications
2
8
0
Order By: Relevance
“…We propose two active learning algorithms for the GBM that exactly recover the community memberships with high probability using a sub-linear number of queries, even in regimes below the limit of the state-of-the-art unsupervised algorithm in (Galhotra et al 2019). Similar to the result of (Gadde et al 2016) in the SBM, our results offer a smooth trade-off between query complexity and clustering accuracy in the GBM. Both algorithms exploit the idea of motifcounting to remove cross-cluster edges, while combining it with active learning in a different way.…”
Section: Contributionssupporting
confidence: 68%
See 1 more Smart Citation
“…We propose two active learning algorithms for the GBM that exactly recover the community memberships with high probability using a sub-linear number of queries, even in regimes below the limit of the state-of-the-art unsupervised algorithm in (Galhotra et al 2019). Similar to the result of (Gadde et al 2016) in the SBM, our results offer a smooth trade-off between query complexity and clustering accuracy in the GBM. Both algorithms exploit the idea of motifcounting to remove cross-cluster edges, while combining it with active learning in a different way.…”
Section: Contributionssupporting
confidence: 68%
“…In the active learning framework, we are allowed to query node labels up to a budget constraint in order to improve overall classification accuracy. The authors of (Gadde et al 2016) showed that a sub-linear number of queries is sufficient to achieve exact recovery below the limit (in terms of difference between p and q) of unsupervised methods in the SBM, and that the number of queries needed for exact recovery depends on how far we are below such limit -hence providing a smooth trade-off between query complexity and clustering accuracy in the SBM.…”
Section: Introductionmentioning
confidence: 99%
“…In particular, the observable E = g(X N , L N ) can be described by an N × N binary random matrix where element (i, j) is given by E i,j = h(X i , X j , L i , L j ), with h(•) being a properly defined (possibly noisy) function. A renowned example is given by the SBM [9]- [13], a popular random graph model for community detection that generalizes the well known Erdös-Renyi model. In this case, h(X i , X j , L i , L j ) is a noisy binary function defined as…”
Section: Bayesian Classification Setupmentioning
confidence: 99%
“…On the theoretical front, a few recent works have studied the AL problem assuming an underlying statistical model and providing performance guarantees. However, such works focus on specific models, typically graph models such as the stochastic block model (SBM), and their performance analysis is only valid in the asymptotic regime where the number of data items goes to infinity [9]- [13]. As a result, while one would expect a significant performance boost when exploiting the knowledge of the underlying statistical model, their results tend to be pessimistic and with limited practical insight.…”
Section: Introductionmentioning
confidence: 99%
“…In this paper, we consider a particular model with latent community structure: the stochastic block model (SBM) proposed by Holland et al [19]. This model is a useful benchmark for some statistical tasks as recovering community (also called blocks or types in the sequel) structure in network science [14,15,23]. By block structure, we mean that the set of vertices in the graph is partitioned into subsets called blocks and nodes connect to each other with probabilities that depend only on their types, i.e.…”
Section: Introductionmentioning
confidence: 99%