The cooperative hierarchical structure is a common and significant data structure observed in, or adopted by, many research areas, such as text mining (author-paper-word) and multi-label classification (label-instance-feature). Renowned Bayesian approaches for cooperative hierarchical structure modeling are mostly based on topic models. However, these approaches suffer from a serious issue in that the number of hidden topics/factors needs to be fixed in advance, and an inappropriate number may lead to overfitting or underfitting. One elegant way to resolve this issue is Bayesian nonparametric learning, but existing work in this area still cannot be applied to cooperative hierarchical structure modeling. In this paper, we propose a cooperative hierarchical Dirichlet process (CHDP) to fill this gap. Each node in a cooperative hierarchical structure is assigned a Dirichlet process to model its weights on the infinite hidden factors/topics. Together with measure inheritance from the hierarchical Dirichlet process, two kinds of measure cooperation, i.e., superposition and maximization, are defined to capture the many-to-many relationships in the cooperative hierarchical structure. Furthermore, two constructive representations for CHDP, i.e., stick-breaking and the international restaurant process, are designed to facilitate model inference. Experiments on synthetic and real-world data with cooperative hierarchical structures demonstrate the properties and the ability of CHDP for cooperative hierarchical structure modeling, as well as its potential for practical application scenarios.

A hierarchical structure has multiple layers, and each layer contains a number of nodes that are linked to nodes in the higher and lower layers, as illustrated in Figure 1. This kind of structure is common and pervasive, and has been adopted in many different sub-fields of artificial intelligence. One example of such a structure is found in text mining. Consider all the papers in a scientific journal (e.g., Artificial Intelligence). An author-paper-word [1] hierarchical structure emerges, since each author writes and publishes a number of scientific papers in the journal, and each paper is composed of several different words. Learning from the author-paper-word structure is useful for collaborator recommendation, author disambiguation, paper clustering, statistical machine translation [2], and so on. Another example occurs in image processing. The scene-image-feature hierarchical structure is formed because each image may belong to several scenes, such as beach or urban [3], and an image is also described by an abundance of features, such as grayscale and texture.
Learning from the scene-image-feature structure could, at the very least, benefit image search and context-sensitive image enhancement.

Current state-of-the-art Bayesian approaches to learning from this hierarchical structure are mainly based on topic models [4, 5], which are a kind of probabilistic graphical model [6] and were originally designed for modeling a two-level hierar...
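To make the constructions named in the abstract concrete, the following Python snippet gives a minimal sketch of a truncated stick-breaking construction of a Dirichlet process and of the two cooperation operations on the resulting weights. It is an illustration only, not the paper's construction: the truncation level T, the concentration parameter alpha, the equal mixing in superposition, and the renormalization step in both operations are assumptions made here for readability; the paper defines superposition and maximization on random measures directly, with atoms shared across nodes through the inheritance mechanism of the hierarchical Dirichlet process.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(alpha, T):
    """Truncated stick-breaking for a Dirichlet process:
    beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j).
    With finite T the weights sum to slightly less than 1 (truncation residue)."""
    betas = rng.beta(1.0, alpha, size=T)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    return betas * remaining

# Weights of two nodes over the same T atoms; the shared atom set mimics
# the measure inheritance that an HDP base measure provides to child DPs.
T = 10
pi_a = stick_breaking_weights(alpha=1.0, T=T)
pi_b = stick_breaking_weights(alpha=1.0, T=T)

def superpose(*weight_vectors):
    """Superposition (assumed form): add the weights atom-wise, renormalize."""
    s = np.sum(weight_vectors, axis=0)
    return s / s.sum()

def maximize(*weight_vectors):
    """Maximization (assumed form): keep the largest weight per atom, renormalize."""
    m = np.max(weight_vectors, axis=0)
    return m / m.sum()

print("superposition:", np.round(superpose(pi_a, pi_b), 3))
print("maximization: ", np.round(maximize(pi_a, pi_b), 3))
```

Under these assumed definitions, superposition blends the two nodes' weights so that every atom used by either node keeps some mass, whereas maximization lets the stronger node dominate each atom, which is the qualitative contrast between the two cooperation mechanisms.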