Deep collective matrix factorization for augmented multi-view learning

Mariappan, Ragunathan; Rajan, Vaibhav

doi:10.1007/s10994-019-05801-6

Cited by 15 publications

(15 citation statements)

References 41 publications


“…Representation learning from arbitrary collections of matrices have been studied in Collective Matrix Factorization (CMF) [Singh and Gordon, 2008], group-sparse CMF [Klami et al, 2014] and a neural approach Deep CMF [Mariappan and Rajan, 2019]. These approaches learn two latent factors for the row and column entities of each matrix, to reconstruct them.…”

Section: Related Work (mentioning)

confidence: 99%
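The statement above says that CMF-style methods learn one latent factor per entity and reconstruct each matrix from its row and column factors. A minimal NumPy sketch of that idea follows. It assumes a squared-error loss, plain gradient descent, and hypothetical entity names (`user`, `item`, `tag`); it is an illustration only, not the optimization procedure of any of the cited papers.

```python
import numpy as np

def cmf(matrices, entity_sizes, k=2, lr=0.005, steps=4000, seed=0):
    """Toy collective factorization (assumed squared loss, gradient descent).

    matrices: list of (row_entity, col_entity, X) triples.
    Returns a dict mapping each entity to its (size, k) latent factor.
    """
    rng = np.random.default_rng(seed)
    U = {e: 0.1 * rng.standard_normal((n, k)) for e, n in entity_sizes.items()}
    for _ in range(steps):
        grads = {e: np.zeros_like(F) for e, F in U.items()}
        for r, c, X in matrices:
            err = U[r] @ U[c].T - X      # reconstruction residual
            grads[r] += err @ U[c]       # a shared entity accumulates gradients
            grads[c] += err.T @ U[r]     # from every matrix it appears in
        for e in U:
            U[e] -= lr * grads[e]
    return U

def total_error(matrices, U):
    """Sum of squared reconstruction errors over the whole collection."""
    return sum(float(np.sum((U[r] @ U[c].T - X) ** 2)) for r, c, X in matrices)

# Synthetic rank-2 collection in which "user" is shared by two matrices.
rng = np.random.default_rng(1)
sizes = {"user": 6, "item": 5, "tag": 4}
true = {e: rng.standard_normal((n, 2)) for e, n in sizes.items()}
matrices = [
    ("user", "item", true["user"] @ true["item"].T),
    ("user", "tag", true["user"] @ true["tag"].T),
]

U = cmf(matrices, sizes)
zero = {e: np.zeros((n, 2)) for e, n in sizes.items()}
initial = total_error(matrices, zero)   # error of the all-zero baseline
final = total_error(matrices, U)        # error after fitting
```

Because the `user` factor receives gradient contributions from both matrices, information from one relation influences the reconstruction of the other, which is the sense in which the factorization is collective.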

“…Entity-specific representations may be learned by using N different autoencoders, one for each entity, where each autoencoder takes as input the concatenation of all matrices containing that entity. This approach is inadequate for matrices with different datatypes and sparsity levels as discussed in [Mariappan and Rajan, 2019]. Our approach addresses these problems but with higher computational cost.…”

Section: Neural Collective Multi-way Spectral Clustering Network (mentioning)

confidence: 99%


Multi-way Clustering and Discordance Analysis through Deep Collective Matrix Tri-Factorization

Mariappan, Rajan. 2021. Preprint. Self Cite


Heterogeneous multi-typed, multimodal relational data is increasingly available in many domains and their exploratory analysis poses several challenges. We advance the state-of-the-art in neural unsupervised learning to analyze such data. We design the first neural method for collective matrix tri-factorization of arbitrary collections of matrices to perform spectral clustering of all constituent entities and learn cluster associations. Experiments on benchmark datasets demonstrate its efficacy over previous non-neural approaches. Leveraging signals from multiway clustering and collective matrix completion we design a unique technique, called Discordance Analysis, to reveal information discrepancies across subsets of matrices in a collection with respect to two entities. We illustrate its utility in quality assessment of knowledge bases and in improving representation learning.


“…In this paper, we analyze patient representation learning in light of 2 recent advances in CMF and KG representation learning. A deep autoencoder-based architecture, called deep CMF (DCMF), was developed for CMF, which was found to outperform classical nonneural variants of CMF in several tasks [9]. Using DCMF, which provides a seamless way of integrating heterogeneous data, we evaluate the effectiveness of patient representations when the input data are augmented with additional information from literature-derived KGs.…”

Section: Introduction (mentioning)

confidence: 99%

“…A model for CMF based on deep learning was developed by Mariappan and Rajan [9], which is briefly described next. Given $M$ matrices (indexed by $m$) that describe the relationships between $E$ entities (indexed by $e$), each with dimension $d_e$, DCMF jointly obtains latent representations of each entity $U_e$ and low-rank factorizations of each matrix such that $U_e = f_\theta([C]^{(e)})$, where $f_\theta$ is an entity-specific nonlinear transformation, obtained through a neural network–based encoder with weights $\theta$ and $[C]^{(e)}$ denotes all matrices in the collection that contain a relationship of entity $e$. The entities corresponding to the rows and columns of the $m$th matrix are denoted by indices $r_m$ and $c_m$, respectively.…”

Section: Introduction (mentioning)

confidence: 99%
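The notation in the statement above can be made concrete with a short forward-pass sketch: build the input for each entity from every matrix that involves it, pass it through an entity-specific encoder, and reconstruct a matrix from its row and column representations. Several details here are assumptions rather than facts from the source: the entity names, the matrix shapes, the two-layer `tanh` encoder with random untrained weights, and the product form of the reconstruction. Training is omitted.

```python
import numpy as np

def entity_input(e, matrices):
    """Concatenate all matrices involving entity e, oriented so e indexes rows."""
    blocks = []
    for r, c, X in matrices:
        if r == e:
            blocks.append(X)      # e is the row entity: use X as is
        if c == e:
            blocks.append(X.T)    # e is the column entity: transpose first
    return np.concatenate(blocks, axis=1)

def make_encoder(in_dim, k, hidden=8, rng=None):
    """Hypothetical entity-specific encoder: a two-layer map with random weights."""
    if rng is None:
        rng = np.random.default_rng(0)
    W1 = 0.1 * rng.standard_normal((in_dim, hidden))
    W2 = 0.1 * rng.standard_normal((hidden, k))
    return lambda C: np.tanh(C @ W1) @ W2

# Hypothetical collection: two matrices sharing the "patient" entity.
rng = np.random.default_rng(42)
sizes = {"patient": 5, "drug": 4, "diagnosis": 3}
matrices = [
    ("patient", "drug", rng.standard_normal((5, 4))),
    ("patient", "diagnosis", rng.standard_normal((5, 3))),
]

k = 2
C = {e: entity_input(e, matrices) for e in sizes}               # concatenated inputs
enc = {e: make_encoder(C[e].shape[1], k, rng=rng) for e in sizes}
U = {e: enc[e](C[e]) for e in sizes}                            # one row per entity instance

# Assumed reconstruction of each matrix from its row and column representations.
recon = [U[r] @ U[c].T for r, c, _ in matrices]
```

Here the input for `patient` has shape (5, 7) because it joins a 5 x 4 and a 5 x 3 matrix, while each of `drug` and `diagnosis` sees only the transposed matrix it appears in.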


Patient Representation Learning From Heterogeneous Data Sources and Knowledge Graphs Using Deep Collective Matrix Factorization: Evaluation Study

Kumar, Shi, Mariappan, et al. 2022. JMIR Med Inform. Self Cite


Background: Patient representation learning aims to learn features, also called representations, from input sources automatically, often in an unsupervised manner, for use in predictive models. This obviates the need for cumbersome, time- and resource-intensive manual feature engineering, especially from unstructured data such as text, images, or graphs. Most previous techniques have used neural network–based autoencoders to learn patient representations, primarily from clinical notes in electronic medical records (EMRs). Knowledge graphs (KGs), with clinical entities as nodes and their relations as edges, can be extracted automatically from biomedical literature and provide complementary information to EMR data that have been found to provide valuable predictive signals.

Objective: This study aims to evaluate the efficacy of collective matrix factorization (CMF), both the classical variant and a recent neural architecture called deep CMF (DCMF), in integrating heterogeneous data sources from EMR and KG to obtain patient representations for clinical decision support tasks.

Methods: Using a recent formulation for obtaining graph representations through matrix factorization within the context of CMF, we infused auxiliary information during patient representation learning. We also extended the DCMF architecture to create a task-specific end-to-end model that learns to simultaneously find effective patient representations and predictions. We compared the efficacy of such a model to that of first learning unsupervised representations and then independently learning a predictive model. We evaluated patient representation learning using CMF-based methods and autoencoders for 2 clinical decision support tasks on a large EMR data set.

Results: Our experiments show that DCMF provides a seamless way for integrating multiple sources of data to obtain patient representations, both in unsupervised and supervised settings. Its performance in single-source settings is comparable with that of previous autoencoder-based representation learning methods. When DCMF is used to obtain representations from a combination of EMR and KG, where most previous autoencoder-based methods cannot be used directly, its performance is superior to that of previous nonneural methods for CMF. Infusing information from KGs into patient representations using DCMF was found to improve downstream predictive performance.

Conclusions: Our experiments indicate that DCMF is a versatile model that can be used to obtain representations from single and multiple data sources and combine information from EMR data and KGs. Furthermore, DCMF can be used to learn representations in both supervised and unsupervised settings. Thus, DCMF offers an effective way of integrating heterogeneous data sources and infusing auxiliary knowledge into patient representations.
