2018
DOI: 10.1007/s10994-018-5718-0
Accurate parameter estimation for Bayesian network classifiers using hierarchical Dirichlet processes

Abstract: This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, albeit only on categorical datasets. As data assets become larger, entering the hyped world of "big", efficient accurate classification requires three main elements: (…

Cited by 19 publications (13 citation statements). References 28 publications (40 reference statements).
“…Although more sophisticated smoothing methods such as Kneser-Ney [8] and Modified Kneser-Ney [2] have been used in language modelling for a long time, M-branch was the first hierarchical smoothing method for decision trees proposed in 2003 [6]. A recent smoothing method called Hierarchical Dirichlet Process (HDP) has had great success on language modelling [17] and Bayesian Network Classifiers [13], whereas it has not been used on decision trees. The following part introduces these methods in detail.…”
Section: Related Work
confidence: 99%
“…Equation 3 can also be explained by the Hierarchical Chinese Restaurant Process (CRP) [19]. Please refer to [13] for more detail of HDP smoothing on Bayesian Network Classifiers.…”
Section: HDP Smoothing
confidence: 99%
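The quoted passage treats HDP as a hierarchical smoothing method, where a sparse leaf context backs off toward its parent in a hierarchy. A minimal sketch of that back-off idea (not the paper's actual HDP estimator; the function name and concentration value `c` are illustrative assumptions):

```python
# Illustrative sketch of hierarchical back-off smoothing, in the spirit
# of HDP / M-branch smoothing: a node's conditional probability estimate
# is shrunk toward its parent's estimate, with strength controlled by a
# concentration parameter c. All names and values here are hypothetical.

def smoothed_prob(count_xy, count_y, parent_prob, c=2.0):
    """Estimate P(x | y) by mixing the empirical ratio count_xy/count_y
    with the parent (back-off) estimate, weighted by concentration c."""
    return (count_xy + c * parent_prob) / (count_y + c)

# A leaf context with few observations leans heavily on its parent:
p_sparse = smoothed_prob(count_xy=0, count_y=2, parent_prob=0.5)  # 0.25
```

With only two observations, the raw estimate would be 0; the smoothed value 0.25 sits between the empirical ratio and the parent's 0.5, which is the shrinkage behaviour hierarchical smoothing is designed to give.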
“…n] is the observation over n random variables: x_1 ∼ X_1, …, x_n ∼ X_n. Under this assumption, a Bayesian network can be formally described by B = ⟨G, Θ_G⟩, where G is a directed acyclic graph and Θ_G the set of parameters that maximize the likelihood [7,23]. The i-th node in G corresponds to a random variable X_i, and an edge between two connected nodes indicates the direct dependency.…”
Section: Inference Graph
confidence: 99%