Efficient learning of hierarchical latent class models

Zhang, N.L.; Kocka, Tomas

doi:10.1109/ictai.2004.55

Cited by 35 publications

(43 citation statements)

References 2 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…To answer this question, SHC was tested on one of the synthetic data sets mentioned in Section 5.1 [20]. SHC turned out to be 22 times faster than DHC, and it obtained the same model as DHC.…”

Section: Empirical Results With Shc and Hshcmentioning

confidence: 99%

“…Can HSHC find high quality models? To answer those questions, experiments were conducted on five synthetic data sets with 6, 9, 12, 15 and 18 manifest variables, respectively and 10,000 records [20]. For the top-K scheme in HSHC, three values were used for K, namely 1, 2 and 3.…”

Section: Empirical Results With Shc and Hshcmentioning

confidence: 99%

“…Three search-based algorithms have been proposed for this task, namely DHC (double hill-climbing) [13,14], SHC (single hill-climbing) [20], and HSHC (heuristic single hill-climbing) [20]. All those algorithms aim at finding the model with the highest BIC score [21].…”

Section: Learning Latent Tree Modelsmentioning

confidence: 99%

See 2 more Smart Citations

Latent tree models and diagnosis in traditional Chinese medicine

Zhang

Yuan²,

Chen

et al. 2008

Artificial Intelligence in Medicine

100

View full text Add to dashboard Cite

SummaryObjective: TCM (traditional Chinese medicine) is an important avenue for disease prevention and treatment for the Chinese people and is gaining popularity among others. However, many remain skeptical and even critical of TCM because of a number of its shortcomings. One key shortcoming is the lack of objective diagnosis standards. We endeavor to alleviate this shortcoming using machine learning techniques. Method: TCM diagnosis consists of two steps, patient information gathering and syndrome differentiation. We focus on the latter. When viewed as a black box, syndrome differentiation is simply a classifier that classifies patients into different classes based on their symptoms. A fundamental question is: do those classes exist in reality? To seek an answer to the question from the machine learning perspective, one would naturally use cluster analysis. Previous clustering methods are unable to cope with the complexity of TCM. We have therefore developed a new clustering method in the form of latent tree models. We have conducted a case study where we first collected a data set about a TCM domain called KIDNEY DEFICIENCY and then used latent tree models to analyze the data set. Results: Our analysis has found natural clusters in the data set that correspond well to TCM syndrome types. This is an important discovery because (1) it provides statistical validation to TCM syndrome types and (2) it suggests the possibility of establishing objective and quantitative diagnosis standards for syndrome differentiation. In this paper, we provide a summary of research work on latent tree models and report the aforementioned case study. #

show abstract

Section: Empirical Results With Shc and Hshcmentioning

confidence: 99%

Section: Empirical Results With Shc and Hshcmentioning

confidence: 99%

See 1 more Smart Citation

Latent tree models and diagnosis in traditional Chinese medicine

Zhang

Yuan²,

Chen

et al. 2008

Artificial Intelligence in Medicine

100

View full text Add to dashboard Cite

show abstract

“…The algorithms we consider are abbreviated in this section as follows: algorithm IND generates baseline results using the model in which all variables are independent of each other. Algorithm ZHANG is the compiled Java code provided by N. L. Zhang implementing the method described in Zhang and Kočka (2004). Algorithm LCM estimates a non-hierarchical Latent Class Model inferring a single latent variable.…”

Section: Methodsmentioning

confidence: 99%

Greedy Learning of Binary Latent Trees

Harmeling

Williams

2011

IEEE Trans. Pattern Anal. Mach. Intell.

View full text Add to dashboard Cite

Abstract-Inferring latent structures from observations helps to model and possibly also understand underlying data generating processes. A rich class of latent structures are the latent trees, i.e. tree-structured distributions involving latent variables where the visible variables are leaves. These are also called hierarchical latent class (HLC) models. Zhang (2004) proposed a search algorithm for learning such models in the spirit of Bayesian network structure learning. While such an approach can find good solutions it can be computationally expensive. As an alternative we investigate two greedy procedures: the BIN-G algorithm determines both the structure of the tree and the cardinality of the latent variables in a bottom-up fashion. The BIN-A algorithm first determines the tree structure using agglomerative hierarchical clustering, and then determines the cardinality of the latent variables as for BIN-G. We show that even with restricting ourselves to binary trees we obtain HLC models of comparable quality to Zhang's solutions (in terms of cross-validated log-likelihood), while being generally faster to compute. This claim is validated by a comprehensive comparison on several datasets. Furthermore, we demonstrate that our methods are able to estimate interpretable latent structures on real-world data with a large number of variables. By applying our method to a restricted version of the 20 newsgroups data these models turn out to be related to topic models, and on data from the PASCAL Visual Object Classes (VOC) 2007 challenge we show how such tree-structured models help us understand how objects co-occur in images. For reproducibility of all experiments in this paper, all code and datasets (or links to data) is available 1 .

show abstract

“…It can deal with data sets with about one dozen manifest variables. Zhang and Kočka [8] recently proposed another algorithm called heuristic single hill-climbing (HSHC). HSHC combines the two search routines of DHC into one and incorporates the idea of structural EM [2] to reduce the time spent in parameter optimization.…”

Section: Hirarchical Latent Class Modelsmentioning

confidence: 99%

Discovering Latent Structures: Experience with the CoIL Challenge 2000 Data Set

Zhang

2007

Computational Science – ICCS 2007

View full text Add to dashboard Cite

Abstract. We present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier [6], but that study involved only small applications with 4 or 5 observed variables. Due to recent progress in algorithm research, it is now possible to learn HLC models with dozens of observed variables. We have successfully analyzed a version the CoIL Challenge 2000 data set that consists of 42 observed variable. The model obtained consists of 22 latent variables, and its structure is intuitively appealing.

show abstract

Efficient learning of hierarchical latent class models

Abstract: Hierarchical latent class (HLC)

Cited by 35 publications

References 2 publications

Latent tree models and diagnosis in traditional Chinese medicine

Latent tree models and diagnosis in traditional Chinese medicine

Greedy Learning of Binary Latent Trees

Discovering Latent Structures: Experience with the CoIL Challenge 2000 Data Set

Contact Info

Product

Resources

About