Recently, different studies have demonstrated the use of co-clustering, a data mining technique which simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, a probabilistic approach to co-clustering. A Stochastic Expectation-Maximization algorithm is proposed for the model's inference, together with a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model through its ability to easily identify relevant co-clusters.

[…] related to the field of e-Health. In [6], the authors describe the Biterm Topic Model (BTM). It outperforms LDA on short texts (such as instant messages and tweets), for which LDA performs poorly due to the sparsity of the data. In [7], the authors propose another version of the BTM: they represent the biterms (word pairs) as graphs and use a deep convolutional network to encode word co-relationships.

This work presents the Self-Organised Co-Clustering model (SOCC). It aims at providing a tool to summarize large document-term matrices, whose rows correspond to documents and whose columns correspond to terms.
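To make the generative view of the Poisson Latent Block Model concrete, the following is a minimal simulation sketch: each entry of the document-term matrix is drawn from a Poisson distribution whose mean depends only on the (row-cluster, column-cluster) block of that entry. All numerical settings below (dimensions, number of clusters, block intensities) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Illustrative sketch: simulate a document-term matrix under a Poisson
# Latent Block Model. Sizes and gamma values are hypothetical.
rng = np.random.default_rng(0)

N, J = 60, 40   # documents x terms
G, H = 3, 2     # row-clusters x column-clusters

# gamma[g, h] is the Poisson mean of block (g, h).
gamma = np.array([[5.0, 0.1],
                  [0.1, 5.0],
                  [2.0, 2.0]])

# Draw latent row and column partitions uniformly at random.
z = rng.integers(0, G, size=N)   # row-cluster of each document
w = rng.integers(0, H, size=J)   # column-cluster of each term

# Each count x[i, j] ~ Poisson(gamma[z[i], w[j]]).
x = rng.poisson(gamma[z][:, w])

print(x.shape)  # (60, 40)
```

In this sketch, sparsity arises naturally in blocks with small intensity (e.g. 0.1), which is the regime where separating noisy from significant co-clusters matters.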
The clustering approach, which forms homogeneous groups of observations (documents in this case), is a useful unsupervised technique with proven efficiency in several domains. However, in high-dimensional and sparse contexts, it is sometimes less suitable and difficult to interpret. When considering such data sets, co-clustering, which groups observations and features simultaneously, turns out to be more efficient. It exploits the duality between rows and columns, and the data set is summarized in blocks (the crossing of a row-cluster and a column-cluster). The clusters of documents help in finding similar documents, while the clusters of terms tell us what the clusters of documents are about. In this context, our work helps in finding similar documents and their interactions with term clusters.
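The "blocks" idea above can be sketched in a few lines: given a row partition and a column partition, the matrix is summarized by one statistic (here, the mean) per block. The matrix and partition labels below are arbitrary illustrative values.

```python
import numpy as np

# Minimal sketch: summarize an N x J matrix by the mean of each
# (row-cluster, column-cluster) block, given fixed partitions.
x = np.arange(24, dtype=float).reshape(4, 6)
z = np.array([0, 0, 1, 1])        # row-cluster label of each row
w = np.array([0, 1, 0, 1, 0, 1])  # column-cluster label of each column

G, H = z.max() + 1, w.max() + 1
summary = np.zeros((G, H))
for g in range(G):
    for h in range(H):
        # np.ix_ selects the submatrix (block) at the crossing of
        # row-cluster g and column-cluster h.
        summary[g, h] = x[np.ix_(z == g, w == h)].mean()

print(summary)  # [[ 5.  6.] [17. 18.]]
```

The 4 x 6 matrix is thus compressed into a 2 x 2 block summary, which is how co-clustering makes a large document-term matrix readable.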
The co-clustering task can be done in several ways. For example, in [8], the authors describe an original approach that uses optimal transport theory to co-cluster continuous data. However, we mostly distinguish between two kinds of co-clustering approaches. Matrix factorization based methods, e.g. [9, 10], consist of factorizing the N × J data matrix x into three matrices a (of size N × G), b (of size G × H) and c (...