Machine Learning-Based State-of-the-Art Methods for the Classification of RNA-Seq Data

Jabeen, Almas; Ahmad, Nadeem; Raza, Khalid

doi:10.1007/978-3-319-65981-7_6

Cited by 31 publications

(13 citation statements)

References 65 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…For example, DeepCpG utilizes both DNA sequence patterns and neighboring methylation states for predicting single-cell methylation state and modeling the sources of DNA methylation variability ( Angermueller et al, 2017 ). However, the deep learning methods usually run as a “black box”, which is hard to interpret ( Almas Jabeen and Raza, 2017 ). Great efforts have been made to improve the interpretability of deep learning models.…”

Section: Introductionmentioning

confidence: 99%

MultiCapsNet: A General Framework for Data Integration and Interpretable Classification

et al. 2022

View full text Add to dashboard Cite

The latest progresses of experimental biology have generated a large number of data with different formats and lengths. Deep learning is an ideal tool to deal with complex datasets, but its inherent “black box” nature needs more interpretability. At the same time, traditional interpretable machine learning methods, such as linear regression or random forest, could only deal with numerical features instead of modular features often encountered in the biological field. Here, we present MultiCapsNet (https://github.com/wanglf19/MultiCapsNet), a new deep learning model built on CapsNet and scCapsNet, which possesses the merits such as easy data integration and high model interpretability. To demonstrate the ability of this model as an interpretable classifier to deal with modular inputs, we test MultiCapsNet on three datasets with different data type and application scenarios. Firstly, on the labeled variant call dataset, MultiCapsNet shows a similar classification performance with neural network model, and provides importance scores for data sources directly without an extra importance determination step required by the neural network model. The importance scores generated by these two models are highly correlated. Secondly, on single cell RNA sequence (scRNA-seq) dataset, MultiCapsNet integrates information about protein-protein interaction (PPI), and protein-DNA interaction (PDI). The classification accuracy of MultiCapsNet is comparable to the neural network and random forest model. Meanwhile, MultiCapsNet reveals how each transcription factor (TF) or PPI cluster node contributes to classification of cell type. Thirdly, we made a comparison between MultiCapsNet and SCENIC. The results show several cell type relevant TFs identified by both methods, further proving the validity and interpretability of the MultiCapsNet.

show abstract

Section: Introductionmentioning

confidence: 99%

MultiCapsNet: A General Framework for Data Integration and Interpretable Classification

et al. 2022

View full text Add to dashboard Cite

show abstract

“… [16]. Shallow learning basically uses neural networks with single layers or SVMs (Support Vector Machines) while deep learning uses neural network with more than one hidden layers.…”

Section: Who's Using Machine Learning?mentioning

confidence: 99%

Machine Learning and Deep Learning

Chahal¹,

Gulia²

2019

IJITEE

View full text Add to dashboard Cite

 Abstract: Now-a-days artificial intelligence has become an asset for engineering and experimental studies, just like statistics and calculus. Data science is a growing field for researchers and artificial intelligence, machine learning and deep learning are roots of it. This paper describes the relation between these roots of data science. There is a need of machine learning if any kind of analysis is to be performed. This study describes machine learning from the scratch. It also focuses on Deep Learning. Deep learning can also be known as new trend of machine learning. This paper gives a light on basic architecture of Deep learning. A comparative study of machine learning and deep learning is also given in the paper and allows researcher to have a broad view on these techniques so that they can understand which one will be preferable solution for a particular problem.

show abstract

“…For example, DeepCpG utilizes both DNA sequence patterns and neighboring methylation states for predicting single-cell methylation states and modeling the sources of DNA methylation variability [13]. But the deep learning methods, which usually operate as a 'black box', are hard to interpret [14]. There have been substantial efforts to increase the interpretability of deep learning model.…”

Section: Introductionmentioning

confidence: 99%

MultiCapsNet: a interpretable deep learning classifier integrate data from multiple sources

Wang

Miao

Zhang

2019

Preprint

View full text Add to dashboard Cite

Recent advances in experimental biology have generated huge amount of data. Due to differences present in detection targets and detection mechanisms, the produced data comes with different formats and lengths. There is an urgent call for computational methods to integrate these diverse data. Deep learning model is an ideal tool to cope with complex datasets, but its inherent 'black box' nature needs more interpretability.Here, we present MultiCapsNet, a deep learning model built on CapsNet and scCapsNet. The MultiCapsNet model possesses the merits of both easier data integration and higher model interpretability. In the first example, we use the labeled variant call dataset, which is originally used to test the models for automating somatic variant refinement. We divide the 71 features listed in the dataset into eight groups according to data source and data property. Then, the data from those eight groups with different formats and lengths are integrated by our MultiCapsNet to predict the labels associated with each variant call. The performance of our MultiCapsNet matches the previous deep learning model well, given much less parameters than those needed by the previous model. After training, the MultiCapsNet model provides importance scores for each data source directly, while the previous deep learning model needs an extra importance determination step to do so. Despite that our MultiCapsNet model is substantially different from the previous deep learning model and the source importance measuring methods are also different, the importance score correlation between these two models is very high. In the second example, the prior knowledge，including information for protein-protein interactions and protein-DNA interactions, is used to determine the structure of MultiCapsNet model. The single cell RNA sequence data are decoupled into multiple parts according to the structure of MultiCapsNet model that has been integrated with prior knowledge, with each part represents genes influenced by a transcription factor or involved in a protein-protein interaction network and then could be viewed as a data source. The MultiCapsNet model could classify cells with high accuracy as well as reveal the contribution of each data source for cell type recognition. The high ranked contributors are often relevant to the contributed cell type.

show abstract

Machine Learning-Based State-of-the-Art Methods for the Classification of RNA-Seq Data

Cited by 31 publications

References 65 publications

MultiCapsNet: A General Framework for Data Integration and Interpretable Classification

MultiCapsNet: A General Framework for Data Integration and Interpretable Classification

Machine Learning and Deep Learning

MultiCapsNet: a interpretable deep learning classifier integrate data from multiple sources

Contact Info

Product

Resources

About