“…When the variable C is observed in the training data, naive Bayes can be used for classification, by assigning test example (X 1 , ···, X n ) to the class C with highest P(C|X 1 , ···, X n ) [2]. When C is unobserved, data points (X 1 , ···, X n ) can be clustered by applying the EM algorithm with C as the missing information, each value of C corresponds to a different cluster, and P(C|X 1 , ···, X n ) is the point's probability of membership in cluster C [14].…”