2020
DOI: 10.1159/000509084

A Comparative Analysis of Allergen Proteins between Plants and Animals Using Several Computational Tools and Chou’s PseAAC Concept

Abstract: Background: A large number of allergens are derived from plant and animal proteins. A major challenge for researchers is to study the possible allergenic properties of proteins. The aim of this study was the in silico analysis and comparison of several physicochemical and structural features of plant- and animal-derived allergen proteins, as well as the classification of these proteins based on Chou’s pseudo-amino acid composition (PseAAC) concept combined with bioinformatics algorithms.…

Cited by 8 publications (5 citation statements)
References 49 publications
“…Motifs are basically signature sequences that aid in the identification of any protein. The e-value shows accuracy of the predicted motif; less the e-value, more the precision of the possible motifs [56]. From our study, it has been found that the GTs retrieved from all the three different environments based on growth temperatures (i.e., mesophile, thermophile, and hyperthermophile), e-value were less than 3.0e+000.…”
Section: Results (mentioning)
confidence: 99%
“…Therefore, more attention has recently been given to bioinformatics and machine learning strategies as potential tools for detecting and classifying food allergens. Among the great variety of methods, intelligence neural networks, supervised learning, support vector machines with linear kernel functions, and different classifiers such as k-nearest neighbor are used as reliable options for identifying, modeling, and predicting allergenic properties. Wang et al. developed a new deep learning model (transformer with a self-attention mechanism combining the learning models Light Gradient Boosting Machine [LightGBM] and eXtreme Gradient Boosting [XGBoost]) for the prediction of food allergens. Machine learning is proving to be a tremendously helpful solution in this field.…”
Section: Introduction (mentioning)
confidence: 99%
“…Subsequently, various approaches are proposed from different perspectives for allergen prediction, such as motif-based approaches, similarity searches, machine learning-based modeling, etc. Prediction tools, such as AllerCatPro 2.0, AlgPred 2.0, have been used extensively for preliminary allergenic risk assessment and identification of new allergens. One representative project derived from the FAO/WHO guidelines was AllerCatPro/AllerCatPro 2.0, which implemented a hierarchical workflow employing five criteria with improved similarity-checking methods for both sequences and 3D structures. The criteria used in AllerCatPro were based on statistical analysis of existing allergens, representing a form of experience/knowledge-based decision-making. For instance, if a protein sequence shares >35% identity with known allergens over 90 windows, it would be classified into the group with strong evidence for allergenic potential. So far, this approach has achieved a great performance in the allergen benchmark data sets.…”
Section: Introduction (mentioning)
confidence: 99%
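
The identity criterion quoted above can be illustrated with a minimal Python sketch. It is not AllerCatPro's implementation: it assumes a simplified, ungapped comparison of fixed-length segments, and the function names, the `window` parameter, and the default `threshold` of 0.35 are hypothetical choices made for illustration only.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length segments match."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_window_identity(query: str, allergen: str, window: int) -> float:
    """Best ungapped identity over all pairs of length-`window` segments.

    Returns 0.0 when either sequence is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(query) < window or len(allergen) < window:
        return 0.0
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            best = max(best, window_identity(q, allergen[j:j + window]))
    return best


def exceeds_identity_threshold(query: str, allergen: str,
                               window: int, threshold: float = 0.35) -> bool:
    """True if some segment pair exceeds the identity threshold (assumed 35%)."""
    return max_window_identity(query, allergen, window) > threshold
```

Real tools rely on gapped alignments and curated allergen databases, so this brute-force scan is only meant to show what "identity over a window" measures. The window length is left to the caller because the sketch does not assume a particular value.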
“…Another representative technique is the machine learning (ML) approach where the classification criterion is determined by the ML model by learning from a separate training data set. Besides the selection of modeling method, the most challenging task in the ML approach is the representation or encoding of protein/peptide sequences into a numerical vector/matrix. Various encoding methods, including amino acid descriptors, amino acid composition (AAC), pseudoamino acid composition (PseAAC), dipeptide composition (DPC), amino acid descriptors (AAD), position-specific scoring matrix (PSSM), physicochemical descriptors, biomedical properties, k-mer dictionary-based binary representation, etc., have been widely used in predicting allergenicity and other properties/bioactivities. However, these features may not always accurately represent protein sequences and simple combinations can cause high-dimensional problems as well as the feature redundancy …”
Section: Introduction (mentioning)
confidence: 99%
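
Two of the encodings named above, AAC and DPC, are simple enough to sketch in a few lines of Python. This is an illustrative sketch rather than the code of any cited study: the function names `aac` and `dpc` are hypothetical, and it assumes that residues outside the 20 standard amino acids are skipped.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: 20 relative frequencies, one per residue type."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in seq.upper():
        if residue in counts:  # assumption: non-standard residues are ignored
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: 400 relative frequencies of adjacent residue pairs."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are ignored
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [counts[p] / total for p in DIPEPTIDES]
```

Concatenating the two vectors, for example `aac(seq) + dpc(seq)`, already yields 420 features per sequence, which gives a sense of how simple feature combinations grow in dimension.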