Deep neural networks for interpreting RNA binding protein target preferences

Ghanbari, Mahsa; Ohler, Uwe

doi:10.1101/518191

Cited by 10 publications

(22 citation statements)

References 37 publications


“…Our work also demonstrates that the removal of border artefacts is crucial for an end-to-end learning system to learn non-trivial protein binding hypothesis. An interesting alternative to our de-biasing technique is the one introduced by Ghanbari and Ohler (2019), who formulate the RBP binding prediction problem as a multi-class classification, aiming to simultaneously predict the binding of all RBPs in a collection of CLIP-seq data sets, which combines multiple biased RBP dataset into one. If all datasets are equally affected by sequence biases introduced by the experimental protocol, then this bias is uninformative for the prediction task and should not significantly affect the training.…”

Section: Discussion (mentioning)

confidence: 99%

“…Integrated gradients (Sundararajan et al, 2017) are an effective approach to assign an "attribution score" Attr(i) to each position i of a given input sequence s, measuring the extent to which the nucleotide at that position contributes to the entire sequence's prediction score. Figure 2: Positive examples from the PAR-CLIP dataset can be biased at the beginning and at the end of the viewpoint regions, emitting an unusually high frequency of Guanine (Ghanbari and Ohler, 2019) and some correlated residuals, which can be revealed by aligning the viewpoint borders of all positive examples. Such pattern, however, does not seem to exist in the negative examples, e.g.…”

Section: Sequence and Secondary Structure Motif Extraction (mentioning)

confidence: 99%
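The integrated-gradients attribution described in the statement above can be illustrated with a minimal sketch. It is not the model or code of any of the cited papers: the toy logistic scorer, its random weights `w`, the all-zeros baseline, and the helper names (`one_hot`, `model`, `model_grad`, `integrated_gradients`) are all assumptions made for illustration. The sketch approximates Attr(i) = (x_i - x'_i) · ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα with a midpoint Riemann sum and sums over the four nucleotide channels to get one score per position.

```python
import numpy as np

def one_hot(seq, alphabet="ACGU"):
    """Encode an RNA sequence as a (length, 4) one-hot matrix."""
    x = np.zeros((len(seq), len(alphabet)))
    for i, ch in enumerate(seq):
        x[i, alphabet.index(ch)] = 1.0
    return x

def model(x, w, b):
    """Toy stand-in for a trained network: logistic score of the input."""
    return 1.0 / (1.0 + np.exp(-(np.sum(w * x) + b)))

def model_grad(x, w, b):
    """Analytic gradient of the toy score with respect to the input."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.zeros_like(x)
    for a in alphas:
        # Gradient at a point on the straight path from baseline to x.
        avg_grad += model_grad(baseline + a * (x - baseline), w, b)
    avg_grad /= steps
    attr = (x - baseline) * avg_grad
    # Collapse the nucleotide channels: one attribution score per position.
    return attr.sum(axis=1)

rng = np.random.default_rng(0)
seq = "GGACUUAG"                      # hypothetical input sequence
x = one_hot(seq)
w = rng.normal(size=x.shape)          # hypothetical "learned" weights
b = 0.1
baseline = np.zeros_like(x)           # all-zeros reference input
attr = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property of integrated gradients: the per-position scores should sum, up to numerical error, to the difference between the score of the input and the score of the baseline.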

“…Finally, another contribution of our paper is to quantify the impact of and rectify a specific type of sequence bias present in certain CLIP-Seq datasets, originally identified by Ghanbari and Ohler (2019), which artificially inflated the reported prediction accuracy of several approaches published recently.…”

Section: Introduction (mentioning)

confidence: 99%


Graph neural representational learning of RNA secondary structures for predicting RNA-protein interactions

Yan, Hamilton, Blanchette. 2020. Preprint.


Motivation: RNA-protein interactions are key effectors of post-transcriptional regulation. Significant experimental and bioinformatics efforts have been expended on characterizing protein binding mechanisms on the molecular level, and on highlighting the sequence and structural traits of RNA that impact the binding specificity for different proteins. Yet our ability to predict these interactions in silico remains relatively poor. Results: In this study, we introduce RPI-Net, a graph neural network approach for RNA-protein interaction prediction. RPI-Net learns and exploits a graph representation of RNA molecules, yielding significant performance gains over existing state-of-the-art approaches. We also introduce an approach to rectify a particular type of sequence bias present in many CLIP-Seq data sets, and we show that correcting this bias is essential in order to learn meaningful predictors and properly evaluate their accuracy. Finally, we provide new approaches to interpret the trained models and extract simple, biologically-interpretable representations of the learned sequence and structural motifs. Availability: Source code can be accessed at https://www.github.com/HarveyYan/RNAonGraph. Contact:


“…Models that exploit the latter may not necessarily generalize well. Currently, the main approach to interpret a convolutional neural network (CNN) is to visualize learned representations in the input space. In genomics, such methods include visualizing the convolutional filters (Alipanahi et al, 2015; Kelley et al, 2016; Quang & Xie, 2016; Angermueller et al, 2016; Cuperus et al, 2017; Chen et al, 2018; Ben-Bassat et al, 2018; Wang et al, 2018), attribution methods (Alipanahi et al, 2015; Zhou & Troyanskaya, 2015; Kelley et al, 2016; Shrikumar et al, 2017; Ghanbari & Ohler, 2019), and more recently in silico experiments (Koo et al, 2018; Avsec et al, 2019). These approaches can be grouped into local and global interpretability methods.…”

Section: Overview (mentioning)

confidence: 99%

“…For example, gradients (from predictions to the inputs) have been employed to reveal known transcription factor (TF) binding sites when trained to predict read profiles from high-throughput sequencing datasets (Kelley et al, 2018). Integrated gradients were used to uncover motifs for RNA-protein interactions (Ghanbari & Ohler, 2019). Recently, DeepLift was used to uncover known and novel TF binding sites, including their syntax with respect to other binding sites.…”

Section: Local Interpretability (mentioning)

confidence: 99%

Interpreting Deep Neural Networks Beyond Attribution Methods: Quantifying Global Importance of Genomic Features

Koo, Ploenzke. 2020. Preprint.


Despite deep neural networks (DNNs) having found great success at improving performance on various prediction tasks in computational genomics, it remains difficult to understand why they make any given prediction. In genomics, the main approaches to interpret a high-performing DNN are to visualize learned representations via weight visualizations and attribution methods. While these methods can be informative, each has strong limitations. For instance, attribution methods only uncover the independent contribution of single nucleotide variants in a given sequence. Here we discuss and argue for global importance analysis which can quantify population-level importance of putative features and their interactions learned by a DNN. We highlight recent work that has benefited from this interpretability approach and then discuss connections between global importance analysis and causality.


Long-read RNA sequencing of human and animal filarial parasites improves gene models and discovers operons

Wheeler, Airs, Zamanian. 2020. Preprint.


Filarial nematodes (Filarioidea) cause substantial disease burden to humans and animals around the world. Recently there has been a coordinated global effort to generate and curate genomic data from nematode species of medical and veterinary importance. This has resulted in two chromosome-level assemblies (Brugia malayi and Onchocerca volvulus) and 10 additional draft genomes from Filarioidea. These reference assemblies facilitate comparative genomics to explore basic helminth biology and prioritize new drug and vaccine targets. While the continual improvement of genome contiguity and completeness advances these goals, experimental functional annotation of genes is often hindered by poor gene models. Short-read RNA sequencing data and expressed sequence tags, in cooperation with ab initio prediction algorithms, are employed for gene prediction, but these can result in missing clade-specific genes, fragmented models, imperfect mapping of gene ends, and lack of isoform resolution. Long-read RNA sequencing can overcome these drawbacks and greatly improve gene model quality. Here, we present Iso-Seq data for B. malayi and Dirofilaria immitis, etiological agents of lymphatic filariasis and canine heartworm disease, respectively. These data cover approximately half of the known coding genomes and substantially improve gene models by extending untranslated regions, cataloging novel splice junctions from novel isoforms, and correcting mispredicted junctions. Furthermore, we validated computationally predicted operons, identified new operons, and merged fragmented gene models. We carried out analyses of poly(A) tails in both species, leading to the identification of non-canonical poly(A) signals. Finally, we prioritized and assessed known and putative anthelmintic targets, correcting or validating gene models for molecular cloning and target-based antiparasitic screening efforts. Overall, these data significantly improve the catalog of gene models for two important parasites, and they demonstrate how long-read RNA sequencing should be prioritized for future improvement of parasitic nematode genome assemblies.

Author Summary: Reference genomes for parasitic nematodes are important resources that enable the study of nematode evolution and molecular biology, and they also hold promise for hastening the development of chemotherapeutics to treat parasitic diseases. Recent years have seen an explosion in the availability of reference genomes for filarial worms, which cause diseases in both humans and animals, but much work remains to be done in order to fully potentiate the true utility of these resources. We carried out long-read RNA sequencing of Brugia malayi and Dirofilaria immitis, two important filarial worms that cause lymphatic filariasis and canine heartworm disease, respectively. We used these RNA sequencing data to correct many errors in the gene models of the reference genomes of these two species, and we also carried out novel analyses of poly(A) tails and operons. These datasets will greatly improve the B. malayi a...
