Gene Prioritization by Compressive Data Fusion and Chaining

Žitnik, Marinka; Nam, Edward A.; Dinh, Christopher; Kuspa, Adam; Shaulsky, Gad; Zupan, Blaž

doi:10.1371/journal.pcbi.1004552

Cited by 22 publications

(20 citation statements)

References 43 publications

(50 reference statements)

“…In particular, we compared Mashup to a recently proposed matrix factorization-based approach, CMF (Žitnik et al., 2015), which views heterogeneous data matrices as relations between different object types that can be approximated via a low-rank factorization. While straightforward CMF has limited use of network data as additional constraints on the parameters to be learned, we considered a favorably modified CMF that directly factorizes the network data (i.e., more similar to Mashup) and found that Mashup significantly outperforms this approach as well (Figure S1).…”

Section: Results (mentioning)

confidence: 99%

Compact Integration of Multi-Network Topology for Functional Analysis of Genes

Cho et al., 2016

Summary The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating, diverse biological network data and can be broadly applied to other network science domains.
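The diffusion step described in the summary above can be illustrated with a random walk with restart on a single network. This is a minimal sketch under stated assumptions, not the Mashup implementation: the function name `random_walk_with_restart`, the restart probability of 0.5, and the toy path graph are all hypothetical, and the subsequent step of compressing the diffusion profiles into low-dimensional vectors is not shown.

```python
def random_walk_with_restart(adjacency, start, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the stationary visiting distribution of a walker that, at each
    step, jumps back to `start` with probability `restart` and otherwise
    moves to a neighbor chosen in proportion to edge weight.

    `adjacency` is a square list of lists of non-negative edge weights.
    All names and defaults here are illustrative assumptions.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    state = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(max_iter):
        # Restart mass always returns to the start node.
        nxt = [restart if i == start else 0.0 for i in range(n)]
        # The remaining mass at each node j is spread over j's neighbors.
        for j in range(n):
            if degrees[j] == 0:
                continue  # an isolated node cannot pass mass on
            share = (1.0 - restart) * state[j] / degrees[j]
            for i in range(n):
                if adjacency[j][i]:
                    nxt[i] += share * adjacency[j][i]
        change = sum(abs(a - b) for a, b in zip(nxt, state))
        state = nxt
        if change < tol:
            break
    return state


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
profile = random_walk_with_restart(path, start=0)
print(profile)
```

Each node's resulting profile summarizes its topological context: nodes close to the start node receive more probability mass than distant ones. In an integration setting, one such profile per node would be computed in every network before any joint low-dimensional representation is learned.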

“…The overall goal is to identify these genes and, in a second step, experimentally validate these genes only. Many different computational methods that use different algorithms, datasets, and strategies have been developed [195,224,226,229,230,231,232,233,234]. Some of these approaches have been implemented as publicly available tools and several of these approaches have been experimentally validated [195,226,230,231,227].…”

Section: Protein Function Prediction (mentioning)

confidence: 99%

Machine learning for integrating data in biology and medicine: Principles, practice, and opportunities

Žitnik, Nguyen, Wang, et al., 2019

Information Fusion

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include a myriad of properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

“…The question of distinguishing different semantics that exist within biomedical data systems remains largely unexplored. Two notable exceptions include a meta-path-based approach for gene–disease link prediction in heterogeneous networks (Himmelstein and Baranzini, 2015) and a latent-chain-based approach for gene prioritization (Zitnik et al., 2015). These approaches, however, are algorithmically different.…”

Section: Related Work (mentioning)

confidence: 99%

“…Challenges in the joint consideration of systems of datasets, such as that in Figure 1, include inferring accurate models to predict disease traits and outcomes, elucidating important disease genes and generating insight into the genetic underpinnings of complex diseases (Barabási et al., 2011; Han et al., 2013; Ruffalo et al., 2015; Taşan et al., 2015). We would like these models to collectively consider the breadth of available data, from whole-genome sequencing to transcriptomic, methylomic and metabolic data (Navlakha and Kingsford, 2010; Greene et al., 2015; Zitnik et al., 2015). A major barrier preventing existing methods from fully exploiting entire data collections is that individual datasets usually cannot be directly related to each other.…”

Section: Introduction (mentioning)

confidence: 99%

Jumping across biomedical contexts using compressive data fusion

Žitnik and Zupan, 2016

Bioinformatics

Motivation: The rapid growth of diverse biological data allows us to consider interactions between a variety of objects, such as genes, chemicals, molecular signatures, diseases, pathways and environmental exposures. Often, any pair of objects, such as a gene and a disease, can be related in different ways, for example, directly via gene–disease associations or indirectly via functional annotations, chemicals and pathways. Different ways of relating these objects carry different semantic meanings. However, traditional methods disregard these semantics and thus cannot fully exploit their value in data modeling. Results: We present Medusa, an approach to detect size-k modules of objects that, taken together, appear most significant to another set of objects. Medusa operates on large-scale collections of heterogeneous datasets and explicitly distinguishes between diverse data semantics. It advances research along two dimensions: it builds on collective matrix factorization to derive different semantics, and it formulates the growing of the modules as a submodular optimization program. Medusa is flexible in choosing or combining semantic meanings and provides theoretical guarantees about detection quality. In a systematic study on 310 complex diseases, we show the effectiveness of Medusa in associating genes with diseases and detecting disease modules. We demonstrate that in predicting gene–disease associations Medusa compares favorably to methods that ignore diverse semantic meanings. We find that the utility of different semantics depends on disease categories and that, overall, Medusa recovers disease modules more accurately when combining different semantics. Availability and implementation: Source code is at http://github.com/marinkaz/medusa Contact: marinka@cs.stanford.edu, blaz.zupan@fri.uni-lj.si
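The idea of growing a size-k module through submodular optimization can be illustrated with the standard greedy algorithm for a monotone submodular objective under a cardinality constraint. This is a generic sketch, not Medusa's objective or code: the function `greedy_module`, the set-coverage objective, and the gene and disease labels are hypothetical stand-ins chosen only because coverage is a simple example of a submodular function.

```python
def greedy_module(candidates, covers, k):
    """Greedily pick up to k candidates that maximize the number of
    distinct target objects covered.

    `covers` maps each candidate to the set of targets it is associated
    with. Set coverage is used here as an illustrative submodular
    objective; it is an assumption, not the objective used by Medusa.
    """
    selected = []
    covered = set()
    for _ in range(min(k, len(candidates))):
        best, best_gain = None, 0
        for c in candidates:
            if c in selected:
                continue
            # Marginal gain: targets newly covered by adding c.
            gain = len(covers[c] - covered)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break  # no remaining candidate adds anything
        selected.append(best)
        covered |= covers[best]
    return selected, covered


# Hypothetical genes and the (hypothetical) disease indices each relates to.
covers = {
    "geneA": {1, 2, 3},
    "geneB": {3, 4},
    "geneC": {4, 5, 6, 7},
    "geneD": {1},
}
module, covered = greedy_module(list(covers), covers, k=2)
print(module, sorted(covered))
```

For monotone submodular objectives, this greedy procedure is known to reach at least a (1 - 1/e) fraction of the optimal value for a given k, which is the kind of guarantee that makes a submodular formulation attractive for module detection.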
