A context-free encoding scheme of protein sequences for predicting antigenicity of diverse influenza A viruses

Zhou, Xinrui; Yin, Rui; Kwoh, Chee-Keong; Zheng, Jie

doi:10.1186/s12864-018-5282-9

Cited by 18 publications

(24 citation statements)

References 22 publications

“…Herein, we improved our proposed method in [21] so that the encoding scheme can be applied to both protein sequences with varying lengths and protein sequence pairs, which covers the most situation of sequence analyses in bioinformatics. The method, CFreeEnS, is based on the AAindex database [22], which is the collection of amino acid indexes and mutation matrices from published work, representing physiochemic and biochemical properties related to the specificity and diversity of protein structures and functions.…”

Section: Methods (mentioning)

confidence: 99%

“…Our previous work has demonstrated that CFreeEnS is effective in predicting the antigenic similarity between the hemagglutinin protein of influenza viral strains [21], indicating that CFreeEnS for protein sequence pairs can distinguish subtle differences between proteins within the same family.…”

Section: B Subtle Distinctions Between Proteins Within the Same Family (mentioning)

confidence: 99%

“…Previously, we proposed an encoding scheme for protein sequence pairs, named CFreeEnS, to predict the antigenic similarity between the hemagglutinin proteins of influenza viruses, which was effective across multiple subtypes of influenza [21]. We hypothesized that the method captured intrinsic distinctions between amino acid pairs and was promising to be applied to other problems with aligned protein sequence pairs as the input.…”

Section: Introduction (mentioning)

confidence: 99%

“…1) Section II describes how CFreeEnS encodes the protein sequences and protein sequences pairs. The module dealing with protein sequence pairs has been presented in our previous work [21] so that we would only briefly introduce the framework. 2) Section III presents the applications of CFreeEnS.…”

Section: Introduction (mentioning)

confidence: 99%

An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification

et al. 2019

Self Cite

Feature engineering aims at representing non-numeric data with numeric features that keep the essential information of the underlying problem, and it is a non-trivial process in building a predictive model. In bioinformatics, there is a profound scale of DNA and protein sequences available, but far from being fully utilized. Computational models can facilitate the analyses of large-scale data. However, most computational models require a numeric representation as input. Expert knowledge can help design features to cast the raw symbolic data effectively. But generally, the features vary from case to case and have to be redesigned for a problem. Automated feature engineering, i.e., an encoding scheme automating the construction of features, saves the redesigning process and allows the researchers to try different representations with minimal effort. This is more in line with the explosion of data and the goal of building an intelligent system. In this paper, we introduce an encoding scheme for protein sequences, which encodes the representative sequence dataset into a numeric matrix that can be fed into a downstream learning model. The method, Context-Free Encoding Scheme (CFreeEnS), was proposed for a dataset with labels for pairwise sequences. Here, we improve the method by making it applicable to a batch of protein sequences, requiring no sequence alignment beforehand. The improved method is applied to protein classification at the functional level, including identifying antimicrobial peptides, screening tumor homing peptides, and detecting hemolytic peptides and phage virion proteins. Compared with the traditional methods using task-specific designed features, CFreeEnS improves the predicting accuracy, with an increase ranging from 5.54% to 14.14%. 
The results indicate that the improved CFreeEnS, free from dependence on carefully designed features, is promising in capturing generic priors and essential properties of amino acids, thereby serving as an automated feature engineering method for protein sequences.

“…Zhou et al. [24] proposed a Context-Free Encoding Scheme (CFreeEnS) prediction method that allows to be integrated with a large number of different substitution matrices for protein sequences.…”

Section: Introduction (mentioning)

confidence: 99%

Predicting Antigenicity of Influenza A Viruses Using biophysical ideas

Degoot, Adabor, Chirove, et al. 2019. Sci Rep

Antigenic variations of influenza A viruses are induced by genomic mutation in their trans-membrane protein HA1, eliciting viral escape from neutralization by antibodies generated in prior infections or vaccinations. Prediction of antigenic relationships among influenza viruses is useful for designing (or updating the existing) influenza vaccines, provides important insights into the evolutionary mechanisms underpinning viral antigenic variations, and helps to understand viral epidemiology. In this study, we present a simple and physically interpretable model that can predict antigenic relationships among influenza A viruses, based on biophysical ideas, using both genomic amino acid sequences and experimental antigenic data. We demonstrate the applicability of the model using a benchmark dataset of four subtypes of influenza A (H1N1, H3N2, H5N1, and H9N2) viruses and report on its performance profiles. Additionally, analysis of the model’s parameters confirms several observations that are consistent with the findings of other previous studies, for which we provide plausible explanations.

Revisiting the Principles of Designing a Vaccine

et al. 2021

Immune principles formulated by Jenner, Pasteur, and early immunologists served as fundamental propositions for vaccine discovery against many dreadful pathogens. However, decisive success in the form of an efficacious vaccine still eludes for diseases such as tuberculosis, leishmaniasis, and trypanosomiasis. Several antileishmanial vaccine trials have been undertaken in past decades incorporating live, attenuated, killed, or subunit vaccination, but the goal remains unmet. In light of the above facts, we have to reassess the principles of vaccination by dissecting factors associated with the hosts' immune response. This chapter discusses the pathogen-associated perturbations at various junctures during the generation of the immune response which inhibits antigenic processing, presentation, or remodels memory T cell repertoire. This can lead to ineffective priming or inappropriate activation of memory T cells during challenge infection. Thus, despite a protective primary response, vaccine failure can occur due to altered immune environments in the presence of pathogens.
