2013
DOI: 10.21236/ada599141

Greedy Learning of Graphical Models with Small Girth

Abstract: This paper develops two new greedy algorithms for learning the Markov graph of discrete probability distributions from samples thereof. For finding the neighborhood of a node (i.e., variable), the simple, naive greedy algorithm iteratively adds the new node that gives the biggest improvement in prediction performance over the existing set. While fast to implement, this can yield incorrect graphs when there are many short cycles, as now the single node that gives the best prediction can be outside the …
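To make the naive greedy step concrete, here is a minimal sketch (my own illustration, not the paper's pseudocode): prediction performance is measured by empirical conditional entropy, and at each iteration the candidate node that reduces it the most is added. The function names, the 0/1 integer sample encoding, and the stopping tolerance are assumptions.

```python
import numpy as np

def empirical_conditional_entropy(samples, target, given):
    """Empirical H(X_target | X_given) in bits for 0/1 integer samples of shape (n, p)."""
    n = samples.shape[0]
    if not given:
        probs = np.bincount(samples[:, target], minlength=2) / n
        return float(-sum(q * np.log2(q) for q in probs if q > 0))
    h = 0.0
    keys = [tuple(row) for row in samples[:, given]]
    for cfg in set(keys):
        idx = [i for i, k in enumerate(keys) if k == cfg]
        probs = np.bincount(samples[idx, target], minlength=2) / len(idx)
        h -= (len(idx) / n) * sum(q * np.log2(q) for q in probs if q > 0)
    return h

def greedy_neighborhood(samples, target, max_size, tol=1e-3):
    """Naive greedy neighborhood selection: repeatedly add the single node that
    most reduces the empirical conditional entropy of the target variable."""
    nbhd = []
    current = empirical_conditional_entropy(samples, target, [])
    while len(nbhd) < max_size:
        candidates = [j for j in range(samples.shape[1]) if j != target and j not in nbhd]
        if not candidates:
            break
        scores = {j: empirical_conditional_entropy(samples, target, nbhd + [j])
                  for j in candidates}
        best = min(scores, key=scores.get)
        if current - scores[best] < tol:  # stop once the improvement is negligible
            break
        nbhd.append(best)
        current = scores[best]
    return nbhd
```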

Cited by 6 publications (15 citation statements) · References 7 publications
“…A famous early example of such an algorithmic result is due to Chow and Liu from 1968 [CL68], who gave an efficient algorithm for learning graphical models where the underlying graph is a tree. Subsequent work considered generalizations of trees [ATHW11] and graphs under various strong assumptions (e.g., restricted strong convexity [NRWY10] or correlation decay [BMS13, RSS12]).…”
Section: Introduction
confidence: 99%
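For context, the Chow-Liu procedure mentioned in this excerpt estimates pairwise mutual information from samples and returns a maximum-weight spanning tree over it. The sketch below is my own illustration of the idea (assumed function names, plain Prim's algorithm), not the original 1968 implementation.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete 1-D arrays."""
    mi, n = 0.0, len(x)
    for a in np.unique(x):
        for b in np.unique(y):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(samples):
    """Edge set of a maximum-weight spanning tree on pairwise empirical
    mutual information (Prim's algorithm over a dense weight matrix)."""
    p = samples.shape[1]
    w = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            w[i, j] = w[j, i] = mutual_information(samples[:, i], samples[:, j])
    in_tree, edges = {0}, []
    while len(in_tree) < p:
        best = max(((i, j) for i in in_tree for j in range(p) if j not in in_tree),
                   key=lambda e: w[e])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```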
“…The summations indexed by $j \notin \{u, v\}$ are over nodes in the size-$(d+1)$ clique under consideration. The last inequality follows by observing that the largest value is achieved in (22) when $\sigma_v^{(l-1)} = -1$ and $\sum_{j \notin \{u,v\}} \sigma_j^{(l-1)} \to -\infty$. By symmetry the same bound holds for the ratio of conditional probabilities of $\sigma_u = -1$.…”
Section: A Bound on KL Divergence
confidence: 81%
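Equation (22) of the citing paper is not reproduced on this page, but the object being bounded is a ratio of Ising conditional probabilities. As a reminder of the standard form such quantities take (assuming zero external field and pairwise parameters $\theta_{uj}$, which are my notation rather than the cited paper's):

```latex
% Standard Ising conditionals (assumed zero external field, parameters \theta_{uj}):
\[
  P(\sigma_u = +1 \mid \sigma_{\setminus u})
  = \frac{\exp\!\big(\sum_{j \ne u} \theta_{uj}\sigma_j\big)}
         {2\cosh\!\big(\sum_{j \ne u} \theta_{uj}\sigma_j\big)},
  \qquad
  \frac{P(\sigma_u = +1 \mid \sigma_{\setminus u})}{P(\sigma_u = -1 \mid \sigma_{\setminus u})}
  = \exp\!\Big(2\sum_{j \ne u} \theta_{uj}\sigma_j\Big).
\]
```

Such ratios are monotone in the weighted neighbor sum, which is why extremal values in arguments of this kind are attained at extreme configurations, e.g. $\sigma_v = -1$ with the remaining sum tending to $-\infty$.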
“…It was first observed in [6] that it is possible to efficiently learn models with (exponential) decay of correlations, under the additional assumption that neighboring variables have correlation bounded away from zero. A variety of other papers, including [21], [22], [23], [24], give alternative low-complexity algorithms, but also require the CDP [correlation decay property]. A number of structure learning algorithms are based on convex optimization, such as Ravikumar et al.'s [25] approach using regularized node-wise logistic regression.…”
Section: Complexity of Graphical Model Learning
confidence: 99%
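The node-wise regularized logistic regression approach attributed to Ravikumar et al. [25] in this excerpt regresses each variable on all the others with an $\ell_1$ penalty and reads the estimated neighborhood off the nonzero coefficients. Below is a minimal sketch using scikit-learn; the regularization level, coefficient threshold, and OR-combination rule are illustrative assumptions, not the settings of [25].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_logreg(samples, target, reg=0.1, threshold=1e-3):
    """Estimate the neighborhood of `target` via l1-regularized logistic
    regression of that variable on all others (node-wise regression).
    `samples` is an (n, p) array of +/-1 spins; `reg` is the l1 strength (1/C)."""
    X = np.delete(samples, target, axis=1)
    y = samples[:, target]
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0 / reg)
    clf.fit(X, y)
    others = [j for j in range(samples.shape[1]) if j != target]
    return [others[k] for k, w in enumerate(clf.coef_[0]) if abs(w) > threshold]

def estimate_graph(samples, reg=0.1):
    """Combine per-node neighborhoods into an undirected edge set (OR rule)."""
    edges = set()
    for u in range(samples.shape[1]):
        for v in neighborhood_logreg(samples, u, reg):
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)
```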
“…• From (19), the KL divergence from a single-edge graph to the empty graph is upper bounded by $\lambda \tanh\lambda$. Using this fact along with (17), any graph in $\mathcal{T}$ has a KL divergence to the empty graph of at most $\alpha\lambda\tanh\lambda$. Combining these with (12) gives the necessary condition…”
Section: Ensemble 1(α) [Isolated Edges Ensemble]
confidence: 99%
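A short sketch of where the $\lambda\tanh\lambda$ bound comes from, assuming the single edge is a zero-field Ising pair with coupling $\lambda$ and the empty graph is the uniform distribution on $\{-1,+1\}^2$ (my own derivation for orientation, not the cited paper's (19)):

```latex
% Single-edge Ising pair versus the empty (independent, uniform) graph:
\[
  P_\lambda(\sigma_u,\sigma_v) = \frac{e^{\lambda \sigma_u \sigma_v}}{4\cosh\lambda},
  \qquad
  P_0(\sigma_u,\sigma_v) = \tfrac{1}{4},
\]
\[
  D(P_\lambda \,\|\, P_0)
  = \mathbb{E}_{P_\lambda}\!\left[\lambda\sigma_u\sigma_v - \log\cosh\lambda\right]
  = \lambda\tanh\lambda - \log\cosh\lambda
  \;\le\; \lambda\tanh\lambda .
\]
```

The exact divergence is $\lambda\tanh\lambda - \log\cosh\lambda$, which is indeed at most $\lambda\tanh\lambda$.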
“…• The total number of possible edges is $\alpha\binom{m}{2}$, and hence the total number of graphs is $|\mathcal{T}| = 2^{\alpha\binom{m}{2}}$. • The maximal degree of each graph is at most $m-1$, due to (17). Substituting these into (12), setting $q_{\max} = \theta_2\,\alpha\binom{m}{2}$ for some $\theta_2 \in \bigl(0, \tfrac{1}{2}\bigr)$, and applying some simplifications, we obtain the following necessary condition for $P_e(q_{\max}) \le \delta$:…”
Section: Ensemble 1(α) [Isolated Edges Ensemble]
confidence: 99%
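Inequality (12) of the citing paper is not shown on this page; for orientation, necessary conditions of this kind typically follow from a Fano-type bound of the standard form below (exact-recovery version; the approximate-recovery variant with $q_{\max}$ differs in its details):

```latex
% Standard Fano-type lower bound over a restricted ensemble \mathcal{T},
% given n i.i.d. samples (exact-recovery form, shown for orientation only):
\[
  \mathbb{P}_e \;\ge\; 1 \;-\;
  \frac{n \,\max_{G \ne G'} D(P_G \,\|\, P_{G'}) + \log 2}{\log |\mathcal{T}|},
  \qquad\text{so}\qquad
  \mathbb{P}_e \le \delta
  \;\Rightarrow\;
  n \;\ge\; \frac{(1-\delta)\log|\mathcal{T}| - \log 2}{\max_{G \ne G'} D(P_G \,\|\, P_{G'})}.
\]
```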