T4SS Effector Protein Prediction with Deep Learning

Açıcı, Koray; Asuroglu, Tunc; Erdaş, Çağatay Berke; Ogul, Hasan

doi:10.3390/data4010045

Cited by 12 publications

(7 citation statements)

References 26 publications


“…There have been four common variations of DNNs, including the CNNs, the RNNs, the CNN-RNNs, and the DNNs. The CNNs have outstanding spatial information analysis capabilities and have been successfully applied in the prediction of secreted effectors (Xue et al, 2018, 2019; Açıcı et al, 2019), protein solubility (Khurana et al, 2018), and crystallization (Elbasir et al, 2019). Compared to CNNs, RNNs can handle sequential inputs effectively and recognize sequence motifs of varying length extraordinarily well, making them the preferred choice for machine translation, text generation, and image captioning (Esteva et al, 2019).…”

Section: Methods (mentioning)

confidence: 99%

“…Over the past decade, dozens of machine learning-based computational approaches have been proposed to identify different types of secreted effectors (Zeng and Zou, 2019), including support vector machine (SVM) (Samudrala et al, 2009; Yang et al, 2010; Wang et al, 2011, 2014, 2017; Dong et al, 2013; Zou et al, 2013; Goldberg et al, 2016; Esna Ashari et al, 2019a, b), random forest (RF) (Yang et al, 2013), artificial neural network (ANN) (Löwer and Schneider, 2009), naive Bayes (NB) (Arnold et al, 2009), hidden Markov model (HMM) (Xu et al, 2010; Lifshitz et al, 2013; Wang et al, 2013), logistic regression (LR) (Esna Ashari et al, 2018), decision tree (DT) (Wang et al, 2019a), gradient boosting (Chen et al, 2020), deep learning (DL) (Xue et al, 2018, 2019; Açıcı et al, 2019; Fu and Yang, 2019; Hong et al, 2020; Li et al, 2020a), and their ensemble methods (Burstein et al, 2009; Hobbs et al, 2016; Wang et al, 2018, 2019b; Xiong et al, 2018; Li et al, 2020b). Some of these methods have achieved relatively high predictive accuracy, while they can recognize only one type of secreted effector, such as SIEVE (Samudrala et al, 2009), EffectiveT3 (Arnold et al, 2009), T3_MM (Wang et al, 2013), GenSET (Hobbs et al, 2016), Bastion3 (Wang et al, 2019a), DeepT3 (Xue et al, 2019), WEDeepT3 (Fu and Yang, 2019), ACNNT3 (Li et al, 2020a), and EP3 (Li et al, 2020b) for T3SEs; T4EffPred (Zou et al, 2013), T4SEpre (Wang et al, 2014), DeepT4 (Xue et al, 2018), PredT4SE-Stack (Xiong et al, 2018), Bastion4 (Wang et al, 2019b), T4SE-XGB (Chen et al, 2020), and CNN-T4SE (Hong et al, 2020) for T4SEs; and Bastion6 (Wang et al, 2018) for T6SEs.…”

Section: Introduction (mentioning)

confidence: 99%


DeepT3_4: A Hybrid Deep Neural Network Model for the Distinction Between Bacterial Type III and IV Secreted Effectors

Liu et al. 2021, Front. Microbiol.


Gram-negative bacteria can deliver secreted proteins (also known as secreted effectors) directly into host cells through type III secretion system (T3SS), type IV secretion system (T4SS), and type VI secretion system (T6SS) and cause various diseases. These secreted effectors are heavily involved in the interactions between bacteria and host cells, so their identification is crucial for the discovery and development of novel anti-bacterial drugs. It is currently challenging to accurately distinguish type III secreted effectors (T3SEs) and type IV secreted effectors (T4SEs) because neither T3SEs nor T4SEs contain N-terminal signal peptides, and some of these effectors have similar evolutionary conserved profiles and sequence motifs. To address this challenge, we develop a deep learning (DL) approach called DeepT3_4 to correctly classify T3SEs and T4SEs. We generate amino-acid character dictionary and sequence-based features extracted from effector proteins and subsequently implement these features into a hybrid model that integrates recurrent neural networks (RNNs) and deep neural networks (DNNs). After training the model, the hybrid neural network classifies secreted effectors into two different classes with an accuracy, F-value, and recall of over 80.0%. Our approach stands for the first DL approach for the classification of T3SEs and T4SEs, providing a promising supplementary tool for further secretome studies.


“…Convolutional neural network (CNN) architecture is a deep learning approach commonly used in theoretical and practical studies on topics such as disease classification, MRI reconstruction, and effector protein prediction. [11-13] A common CNN architecture consists of input, convolutional, pooling, and fully connected layers. Additionally, batch normalization, rectified linear unit, and dropout layers can also be used to speed up the process and reduce overfitting.…”

Section: Methods (mentioning)

confidence: 99%
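The layer sequence named in the statement above (input, convolution, rectified linear unit, pooling, fully connected) can be illustrated with a minimal sketch. This is not the architecture of any cited paper: it is a one-dimensional toy forward pass in plain Python, the function names and all numeric values are hypothetical, and batch normalization and dropout are left out because they only matter during training.

```python
# Toy 1-D forward pass through the layer types named in the statement.
# All names, kernel values and weights here are illustrative assumptions.

def conv1d(signal, kernel, bias=0.0):
    """Valid (no padding) sliding-window product, as used in CNN layers."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def dense(values, weights, bias=0.0):
    """Fully connected layer with a single output unit."""
    return sum(v * w for v, w in zip(values, weights)) + bias

def forward(signal, kernel, weights, pool_size=2):
    """Input -> convolution -> ReLU -> pooling -> fully connected."""
    features = max_pool1d(relu(conv1d(signal, kernel)), pool_size)
    return dense(features, weights)

if __name__ == "__main__":
    print(forward([1, 3, 2, 5, 4], kernel=[-1, 1], weights=[0.5, 0.25]))
```

An image classifier would apply the same steps with two-dimensional kernels and many filters per layer; a deep learning framework would normally supply these layers rather than hand-written loops.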

Femoral neck fracture detection in X-ray images using deep learning and genetic algorithm approaches

2020


Results: Performance in terms of sensitivity, specificity, accuracy, F1 score, and Cohen's kappa coefficient were evaluated using five-fold cross validation tests. Best performance was obtained when cropped images were rescaled to 50×50 pixels. The kappa metric showed more reliable classifier performance when 50×50 pixels image size was used to feed the CNN. The classifier performance was more reliable according to other image sizes. Sensitivity and specificity rates were computed to be 83% and 73%, respectively. With the inclusion of the GA, this rate increased by 1.6%. The detection rate of fractured bones was found to be 83%. A kappa coefficient of 55% was obtained, indicating an acceptable agreement. Conclusion: This experimental study utilized deep learning techniques in the detection of bone fractures in radiography. Although the dataset was unbalanced, the results can be considered promising. It was observed that use of smaller image size decreases computational cost and provides better results according to evaluation metrics.
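For readers unfamiliar with the metrics listed in this abstract, the sketch below shows how sensitivity, specificity, accuracy, F1 score, and Cohen's kappa are conventionally derived from a binary confusion matrix. The function name and the counts in the usage line are hypothetical and are not taken from the study; the abstract does not report its confusion matrix.

```python
# Standard binary-classification metrics from confusion-matrix counts.
# The function name and the example counts are illustrative assumptions.

def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy, F1 and Cohen's kappa."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / total   # observed agreement
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the marginal totals.
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "kappa": kappa,
    }

if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    print(binary_metrics(tp=40, fp=20, tn=30, fn=10))
```

Kappa discounts the agreement expected by chance, which is why it can sit well below accuracy on an unbalanced dataset such as the one the abstract describes.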


“…Instead, a large number of computational methods have been developed for prediction of T4SEs in the last decade, which successfully speed up the process in terms of time and efficiency. These computational approaches can be categorized into two main groups: the first group of approaches infer new effectors based on sequence similarity with currently known effectors (Chen et al, 2010; Lockwood et al, 2011; Marchesini et al, 2011; Meyer et al, 2013; Sankarasubramanian et al, 2016; Noroy et al, 2019) or phylogenetic profiling analysis (Zalguizuri et al, 2019), and the second group of approaches involve learning the patterns of known secreted effectors that distinguish them from nonsecreted proteins based on machine learning and deep learning techniques (Burstein et al, 2009; Lifshitz et al, 2013; Zou et al, 2013; Wang et al, 2014; Ashari et al, 2017; Wang Y. et al, 2017; Esna Ashari et al, 2018; Guo et al, 2018; Xiong et al, 2018; Xue et al, 2018; Acici et al, 2019; Chao et al, 2019; Hong et al, 2019; Wang J. et al, 2019; Yan et al, 2020). In the latter group of methods, Burstein et al (2009) worked on Legionella pneumophila to identify T4SEs and validated 40 novel effectors which were predicted by machine learning algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, only few information of protein sequences can be extracted, which showed a slightly weaker performance compared with the Bastion4 (Wang J. et al, 2019). Later, Acici et al (2019) developed the CNN-based model based on the conversion from protein sequences to images using AAC, DPC and PSSM feature extraction methods. More recently, Hong et al (2019) developed the new tool CNN-T4SE based on CNN, which integrated three encoding strategies: PSSM, protein secondary structure & solvent accessibility (PSSSA) and one-hot encoding scheme (Onehot), respectively.…”

Section: Introduction (mentioning)

confidence: 99%
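Two of the feature extraction methods named in the statement above, amino acid composition (AAC) and dipeptide composition (DPC), have simple conventional definitions that a short sketch can make concrete. The function names below are hypothetical, the code follows the usual textbook definitions rather than the cited authors' implementation, and neither the PSSM features (which require an alignment search) nor the conversion of feature vectors to images is shown.

```python
# AAC and DPC feature vectors for a protein sequence, using the usual
# definitions. Function names are illustrative assumptions.
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def _clean(sequence):
    """Upper-case the sequence and drop non-standard residue symbols."""
    return [c for c in sequence.upper() if c in AMINO_ACIDS]

def aac(sequence):
    """Amino acid composition: 20 relative residue frequencies."""
    residues = _clean(sequence)
    n = len(residues)
    counts = Counter(residues)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]

def dpc(sequence):
    """Dipeptide composition: 400 relative frequencies of adjacent pairs."""
    residues = _clean(sequence)
    pairs = [a + b for a, b in zip(residues, residues[1:])]
    total = len(pairs)
    counts = Counter(pairs)
    return [
        counts[a + b] / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]

if __name__ == "__main__":
    example = "MKTAYIAKQR"  # made-up sequence, for illustration only
    print(len(aac(example)), len(dpc(example)))
```

Both vectors have a fixed length regardless of protein length, which is what allows sequences of different sizes to be fed to a fixed-input model.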

T4SE-XGB: Interpretable Sequence-Based Prediction of Type IV Secreted Effectors Using eXtreme Gradient Boosting Algorithm

et al. 2020


Type IV secreted effectors (T4SEs) can be translocated into the cytosol of host cells via type IV secretion system (T4SS) and cause diseases. However, experimental approaches to identify T4SEs are time- and resource-consuming, and the existing computational tools based on machine learning techniques have some obvious limitations such as the lack of interpretability in the prediction models. In this study, we proposed a new model, T4SE-XGB, which uses the eXtreme gradient boosting (XGBoost) algorithm for accurate identification of type IV effectors based on optimal features based on protein sequences. After trying 20 different types of features, the best performance was achieved when all features were fed into XGBoost by the 5-fold cross validation in comparison with other machine learning methods. Then, the ReliefF algorithm was adopted to get the optimal feature set on our dataset, which further improved the model performance. T4SE-XGB exhibited highest predictive performance on the independent test set and outperformed other published prediction tools. Furthermore, the SHAP method was used to interpret the contribution of features to model predictions. The identification of key features can contribute to improved understanding of multifactorial contributors to host-pathogen interactions and bacterial pathogenesis. In addition to type IV effector prediction, we believe that the proposed framework can provide instructive guidance for similar studies to construct prediction methods on related biological problems. The data and source code of this study can be freely accessed at https://github.com/CT001002/T4SE-XGB.
