Over the last decade, document clustering, as one of the key tasks in information organization and navigation, has been widely studied. Many algorithms have been developed for addressing various challenges in document clustering and for improving clustering performance. However, relatively few research efforts have been reported on evaluating and understanding document clustering results. In this article, we present
DClusterE
, a comprehensive and effective framework for document clustering evaluation and understanding using information visualization.
DClusterE
integrates cluster validation with user interactions and offers rich visualization tools for users to examine document clustering results from multiple perspectives. In particular, through informative views including force-directed layout view, matrix view, and cluster view,
DClusterE
provides not only different aspects of document inter/intra-clustering structures, but also the corresponding relationship between clustering results and the ground truth. Additionally,
DClusterE
supports general user interactions such as zoom in/out, browsing, and interactive access of the documents at different levels. Two new techniques are proposed to implement
DClusterE
: (1) A novel multiplicative update algorithm (MUA) for matrix reordering to generate narrow-banded (or clustered) nonzero patterns from documents. Combined with coarse seriation, MUA is able to provide better visualization of the cluster structures. (2) A Mallows-distance-based algorithm for establishing the relationship between the clustering results and the ground truth, which serves as the basis for coloring schemes. Experiments and user studies are conducted to demonstrate the effectiveness and efficiency of
DClusterE
.