Abstract-This paper develops two new greedy algorithms for learning the Markov graph of discrete probability distributions from samples thereof. To find the neighborhood of a node (i.e., a variable), the simple, naive greedy algorithm iteratively adds the new node that gives the biggest improvement in prediction performance over the existing set. While fast to implement, this can yield incorrect graphs when there are many short cycles, since the single node that gives the best prediction can lie outside the true neighborhood. Our new algorithms get around this in two different ways. The forward-backward greedy algorithm includes a deletion step, which goes back and prunes incorrect nodes that may have been added initially. The recursive greedy algorithm uses forward steps in a two-level process, running greedy iterations in an inner loop but including only the final node. We show, both analytically and empirically, that these algorithms can learn graphs with small girth that other algorithms, both greedy and those based on convex optimization, cannot.
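To make the forward-backward idea concrete, the following is a minimal Python sketch of greedy neighborhood selection with a pruning step. The prediction score (empirical error of the conditional mode), the threshold `eps`, and the function names are illustrative assumptions, not the paper's exact formulation or analysis.

```python
"""Sketch of forward-backward greedy neighborhood selection (illustrative;
the paper's exact prediction score and stopping rule may differ)."""
from collections import Counter, defaultdict

import numpy as np


def prediction_error(samples, target, predictors):
    """Empirical error of predicting column `target` from the columns in
    `predictors` via the conditional mode (a stand-in prediction score)."""
    groups = defaultdict(Counter)
    for row in samples:
        key = tuple(row[p] for p in predictors)
        groups[key][row[target]] += 1
    wrong = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return wrong / len(samples)


def forward_backward_neighborhood(samples, target, eps=0.01):
    """Forward steps add the node with the largest error reduction; after
    each addition, backward steps prune any node whose removal costs less
    than eps/2, so nodes added incorrectly early on can be discarded."""
    n_vars = samples.shape[1]
    nbhd = []
    err = prediction_error(samples, target, nbhd)
    while True:
        # Forward step: find the single node whose addition helps most.
        best_gain, best_node = 0.0, None
        for v in range(n_vars):
            if v == target or v in nbhd:
                continue
            gain = err - prediction_error(samples, target, nbhd + [v])
            if gain > best_gain:
                best_gain, best_node = gain, v
        if best_node is None or best_gain <= eps:
            break  # no remaining node improves prediction enough
        nbhd.append(best_node)
        err -= best_gain
        # Backward steps: prune nodes whose removal barely hurts prediction.
        pruned = True
        while pruned:
            pruned = False
            for v in list(nbhd):
                rest = [u for u in nbhd if u != v]
                err_without = prediction_error(samples, target, rest)
                if err_without - err < eps / 2:
                    nbhd, err, pruned = rest, err_without, True
                    break
    return sorted(nbhd)


# Usage (hypothetical data): `samples` is an (n_samples, n_vars) integer array.
# nbhd = forward_backward_neighborhood(samples, target=0, eps=0.01)
```

Using a backward threshold of eps/2, half the forward threshold, ensures the procedure terminates: each addition lowers the error by more than eps while each deletion raises it by less than eps/2, so the potential (error plus eps/2 times neighborhood size) strictly decreases.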