2017
DOI: 10.1038/s41598-017-01699-z

Predicting influenza antigenicity from Hemagglutinin sequence data based on a joint random forest method

Abstract: Timely identification of emerging antigenic variants is critical to influenza vaccine design. The accuracy of a sequence-based antigenic prediction method relies on the choice of amino acid substitution matrices. In this study, we first compared a comprehensive set of 95 substitution matrices reflecting various amino acid properties in predicting the antigenicity of influenza viruses by a random forest model. We then proposed a novel algorithm called joint random forest regression (JRFR) to jointly consider top sub…
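The abstract notes that prediction accuracy depends on which amino acid substitution matrix is used to encode sequence differences. The sketch below shows one plausible encoding, assuming two HA sequences that are already aligned: each selected position contributes one feature, the matrix score for the residue pair observed there. The matrix values and the helper names `score` and `encode_pair` are hypothetical; this is not the authors' feature construction, which the truncated abstract does not describe in full.

```python
# Toy substitution scores for a handful of residue pairs.
# HYPOTHETICAL values for illustration only: they are NOT taken from
# BLOSUM62 or from any of the 95 matrices compared in the study.
TOY_MATRIX = {
    ("K", "K"): 5, ("E", "E"): 5, ("N", "N"): 6, ("D", "D"): 6,
    ("S", "S"): 4, ("G", "G"): 6,
    ("K", "E"): 1, ("N", "D"): 1, ("S", "G"): 0, ("K", "N"): 0,
}


def score(matrix, a, b):
    """Look up a symmetric substitution score, trying both residue orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is missing entirely


def encode_pair(seq1, seq2, matrix, positions=None):
    """Turn two aligned HA sequences into a numeric feature vector.

    One feature per selected position: the substitution score of the
    residue pair observed there. `positions` are 1-based alignment
    indices; by default every position is used.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if positions is None:
        positions = range(1, len(seq1) + 1)
    return [score(matrix, seq1[p - 1], seq2[p - 1]) for p in positions]
```

For example, `encode_pair("KNSG", "ENSG", TOY_MATRIX)` returns `[1, 6, 4, 6]`. Vectors like this, one per virus pair, could then serve as inputs to a random forest regressor trained on measured antigenic distances; a real pipeline would use a full 20 × 20 matrix.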

Cited by 57 publications (67 citation statements)
References: 32 publications
“…While there was considerable overlap between the positions in our model with the highest cumulative importance (Supplemental Table 3) compared to the positions in the JRFR algorithm (positions 62, 121, 131, 133, 135, 137, 142, 144, 145, 155, 156, 158, 159, 172, 173, 189, 193, 196, 276), the relative importance of these predictor features varied. Specifically, position 189 was the most important site in human H3 with ferret antisera, whereas our model identified position 145 as the most important position in swine H3 with swine sera (31). These differences of importance may be reflective of host-specific interactions.…”
Section: Discussion (mentioning)
confidence: 78%
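The excerpt above ranks HA positions by cumulative importance. A minimal sketch of one way to compute such a ranking, assuming each position contributes several features (for example, one per substitution matrix) and that a position's cumulative importance is the sum of its features' importances. That definition is an assumption, not taken from the cited work, and `cumulative_importance` is a hypothetical helper.

```python
from collections import defaultdict


def cumulative_importance(feature_importances):
    """Sum per-feature importances into one score per HA position.

    `feature_importances` maps (position, feature_name) -> importance,
    e.g. one feature per substitution matrix at each position.
    Returns (position, total) pairs sorted from most to least important.
    """
    totals = defaultdict(float)
    for (position, _feature), importance in feature_importances.items():
        totals[position] += importance
    # Descending total; ties broken by position number for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Illustrative, made-up importances (NOT results from either study).
example = {
    (145, "matrix_a"): 0.25, (145, "matrix_b"): 0.125,
    (189, "matrix_a"): 0.25, (189, "matrix_b"): 0.0625,
    (155, "matrix_a"): 0.125,
}
print(cumulative_importance(example))  # [(145, 0.375), (189, 0.3125), (155, 0.125)]
```

Summing across features means a position can rank highly because several matrices each find it moderately useful, which is one reason two models can share top positions yet order them differently.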
“…Our process included a robust analysis of prediction error and was able to identify the limits of the models. Using 10-fold cross-validation, our ensemble model had a higher RMSE when compared to a different machine learning approach developed for human IAV by Yao et al. (2017) (31). This approach used a Joint Random Forest Regression (JRFR) algorithm that also included substitution matrices for predicting antigenic distances and had an RMSE < 1.0 (31).…”
Section: Discussion (mentioning)
confidence: 94%
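The comparison in this excerpt rests on RMSE estimated by 10-fold cross-validation. The sketch below shows that estimate in generic form, assuming any model can be wrapped as a `fit(train_xs, train_ys)` function that returns a predictor. `k_fold_rmse` and the mean-predicting baseline `fit_mean` are hypothetical placeholders, not JRFR or the ensemble model described in the excerpt.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_rmse(xs, ys, fit, k=10, seed=0):
    """Estimate RMSE by k-fold cross-validation.

    `fit(train_xs, train_ys)` must return a callable model(x) -> prediction.
    Every sample is held out exactly once; RMSE is computed over all
    pooled out-of-fold predictions.
    """
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, non-empty folds
    preds = [None] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])  # predict only on unseen samples
    return rmse(ys, preds)


def fit_mean(train_xs, train_ys):
    """Baseline regressor: always predicts the training-set mean."""
    mean = sum(train_ys) / len(train_ys)
    return lambda x: mean
```

Pooling all out-of-fold predictions into a single RMSE is one common convention; averaging the ten per-fold RMSE values is another, and the two can differ slightly, so reported figures are most comparable when the convention matches.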
“…Classification Algorithm. Two robust machine learning techniques, i.e., SVM and RF, are applied to perform the prediction of DBPs, which have been widely used for many classification tasks in the field of computational biology [43][44][45][46]. SVM is an outstanding classification method that is used to deal with a binary pattern recognition problem [47].…”
Section: Feature (mentioning)
confidence: 99%
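This excerpt names random forests (RF) as one of two classifiers. A minimal standard-library sketch of the RF idea follows, assuming numeric feature vectors: each tree is trained on a bootstrap resample of the rows using a random subset of features, and the forest predicts by majority vote. As a simplification, every tree is a single-split stump; the names `gini`, `majority`, `fit_stump` and `fit_forest` are hypothetical, this is not the cited paper's implementation, and SVM is not shown.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Fit a one-split tree on the given features by minimising weighted Gini."""
    best_impurity = gini(y)
    best_split = None  # (feature, threshold, left_label, right_label)
    for feat in feature_ids:
        for thr in sorted({row[feat] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[feat] <= thr]
            right = [lab for row, lab in zip(X, y) if row[feat] > thr]
            if not left or not right:
                continue  # this threshold does not split the data
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if impurity < best_impurity:
                best_impurity = impurity
                best_split = (feat, thr, majority(left), majority(right))
    if best_split is None:
        fallback = majority(y)  # no useful split: predict the majority class
        return lambda row: fallback
    feat, thr, left_label, right_label = best_split
    return lambda row: left_label if row[feat] <= thr else right_label


def fit_forest(X, y, n_trees=25, n_features=None, seed=0):
    """Random-forest-style ensemble: bootstrap rows + random feature subset per tree."""
    rng = random.Random(seed)
    n_rows, n_cols = len(X), len(X[0])
    m = n_features or max(1, int(n_cols ** 0.5))  # default: about sqrt(d) features
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample with replacement
        feats = rng.sample(range(n_cols), m)
        trees.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))

    def predict(row):
        # Each stump casts one vote; the forest returns the majority class.
        return majority([tree(row) for tree in trees])

    return predict
```

A full random forest grows deep trees and draws a fresh feature subset at every split; because a stump has only one split, drawing one subset per tree is equivalent here.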