Deep learning has become a powerful paradigm for analyzing the binding sites of regulatory factors, including RNA-binding proteins (RBPs), owing to its ability to learn complex features from raw data, possibly drawn from multiple sources. However, the interpretability of these models, which is crucial for improving our understanding of RBP binding preferences and functions, has not yet been investigated in detail. We have designed a multitask and multimodal deep neural network for characterizing in vivo RBP targets. The model takes as input not only the sequence but also the region type of each binding site, which improves prediction performance. To interpret the model, we quantified the contribution of each input feature to the predictive score of each RBP. By learning across multiple RBPs at once, we are able to avoid experimental biases and to identify the RNA sequence motifs and transcript context patterns that are most important for the predictions of each individual RBP. Our findings are consistent with known motifs and binding behaviors and can provide new insights into the regulatory functions of RBPs.
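The abstract does not specify the architecture or the attribution method, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' model. It assumes a one-hot sequence channel concatenated with a one-hot region-type channel (the `REGIONS` vocabulary is hypothetical), a shared convolutional layer with random weights, and one sigmoid output per RBP as the multitask head. For interpretation it swaps in in silico mutagenesis (ISM), which scores each position by how much the prediction for one RBP drops when the base is mutated; the class and function names are illustrative.

```python
import numpy as np

ALPHABET = "ACGU"
# Hypothetical region-type vocabulary; the source does not list its categories.
REGIONS = ["5UTR", "CDS", "3UTR", "intron", "ncRNA"]


def one_hot_sequence(seq):
    """Encode an RNA sequence as an (L, 4) one-hot matrix."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        x[i, ALPHABET.index(base)] = 1.0
    return x


def one_hot_regions(regions):
    """Encode per-position region labels as an (L, R) one-hot matrix."""
    x = np.zeros((len(regions), len(REGIONS)))
    for i, r in enumerate(regions):
        x[i, REGIONS.index(r)] = 1.0
    return x


class MultitaskMultimodalNet:
    """Toy forward pass (hypothetical, untrained): shared conv filters over
    concatenated sequence + region channels, ReLU, global max pooling, then
    one sigmoid output per RBP (each RBP is one task)."""

    def __init__(self, n_rbps, n_filters=8, width=5, seed=0):
        rng = np.random.default_rng(seed)
        in_ch = len(ALPHABET) + len(REGIONS)
        self.filters = rng.normal(0.0, 0.5, (n_filters, width, in_ch))
        self.bias = np.zeros(n_filters)
        self.head = rng.normal(0.0, 0.5, (n_filters, n_rbps))
        self.width = width

    def predict(self, seq_x, reg_x):
        x = np.concatenate([seq_x, reg_x], axis=1)  # (L, 4 + R): both modalities
        n_windows = x.shape[0] - self.width + 1
        acts = np.empty((n_windows, self.filters.shape[0]))
        for j in range(n_windows):
            window = x[j:j + self.width]  # (width, channels)
            # Contract width and channel axes -> one activation per filter.
            acts[j] = np.tensordot(self.filters, window, axes=([1, 2], [0, 1])) + self.bias
        pooled = np.maximum(acts, 0.0).max(axis=0)  # ReLU, then global max pool
        logits = pooled @ self.head  # shared features -> one logit per RBP
        return 1.0 / (1.0 + np.exp(-logits))


def ism_attribution(model, seq, regions, task):
    """In silico mutagenesis for one RBP (task): entry [i, b] is the score
    drop when position i is mutated to base b (0 for the reference base)."""
    reg_x = one_hot_regions(regions)
    ref = model.predict(one_hot_sequence(seq), reg_x)[task]
    scores = np.zeros((len(seq), len(ALPHABET)))
    for i in range(len(seq)):
        for b in ALPHABET:
            if b == seq[i]:
                continue
            mutant = seq[:i] + b + seq[i + 1:]
            scores[i, ALPHABET.index(b)] = ref - model.predict(one_hot_sequence(mutant), reg_x)[task]
    return scores
```

Because the convolutional filters are shared across all RBP outputs, every task draws on the same learned features, which is one way multitask learning can pool evidence across datasets. Gradient-based attribution would be the usual choice in a framework with automatic differentiation; ISM is used here only because it needs nothing beyond NumPy.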
RNA-binding proteins (RBPs) control and coordinate each stage in the life cycle of RNAs. Although in vivo binding sites of RBPs can now be determined genome-wide, most studies have focused on individual RBPs. Here, we examined a large compendium of 114 high-quality transcriptome-wide in vivo RBP–RNA cross-linking interaction datasets, generated by the same protocol in the same cell line and representing 64 distinct RBPs. A comparative analysis of binding preference for categories of target RNA, sequence preference, and transcript region specificity identified potential post-transcriptional regulatory modules, i.e., specific combinations of RBPs that bind to specific sets of RNAs and targeted regions. These regulatory modules represented functionally related proteins and exhibited distinct differences in RNA metabolism, expression variance, and subcellular localization. This integrative investigation of experimental RBP–RNA interaction evidence and RBP regulatory function in a human cell line will be a valuable resource for understanding the complexity of post-transcriptional regulation.
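The abstract does not say how the regulatory modules were derived. As a rough, hypothetical illustration of the idea of "combinations of RBPs that bind to specific sets of RNAs", the sketch below compares RBPs by the Jaccard similarity of their target sets and groups them into connected components above a chosen threshold. The function names, threshold, and toy data are assumptions; in a fuller version each target could be a (gene, region) pair so that targeted regions are also taken into account.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| between two target sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_modules(targets, threshold):
    """Hypothetical module finder: connected components of the graph whose
    edges join RBP pairs with target-set Jaccard similarity >= threshold."""
    parent = {rbp: rbp for rbp in targets}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(targets, 2):
        if jaccard(targets[a], targets[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for rbp in targets:
        groups.setdefault(find(rbp), set()).add(rbp)
    # Deterministic output: each module sorted, modules ordered by first member.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy, made-up target sets (not data from the study).
targets = {
    "RBP_A": {"g1", "g2", "g3", "g4"},
    "RBP_B": {"g1", "g2", "g3", "g5"},
    "RBP_C": {"g7", "g8", "g9"},
    "RBP_D": {"g7", "g8", "g9", "g10"},
    "RBP_E": {"g11"},
}
print(find_modules(targets, threshold=0.5))
# [['RBP_A', 'RBP_B'], ['RBP_C', 'RBP_D'], ['RBP_E']]
```

Thresholded connected components are the simplest grouping rule; hierarchical clustering on the same similarity matrix would be a common alternative that avoids committing to a single cutoff.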