Background: An allergic reaction is the immune system's overreacting to a previously encountered, typically benign molecule, frequently a protein. Allergy reactions can result in rashes, itching, mucous membrane swelling, asthma, coughing, and other bizarre symptoms. To anticipate allergies, a wide range of principles and methods have been applied in bioinformatics. The sequence similarity approach's positive predictive value is very low and ineffective for methods based on FAO/WHO criteria, making it difficult to predict possible allergens. Method: This work advocated the use of a deep learning model LSTM (Long Short-Term Memory) to overcome the limitations of traditional approaches and machine learning lower performance models in predicting the allergenicity of dietary proteins. A total of 2,427 allergens and 2,427 non-allergens, from a variety of sources, including the Central Science Laboratory and the NCBI are used. The data was divided 80:20 for training and testing purposes. These techniques have all been implemented in Python. To describe the protein sequences of allergens and non-allergens, five E-descriptors were used. E1 (hydrophilic character of peptides), E2 (length), E3(propensity to form helices), E4(abundance and dispersion), and E5 (propensity of beta strands) are used to make the variable-length protein sequence to uniform length using ACC transformation. A total of eight machine learning techniques have been taken into consideration. Results: The Gaussian Naive Bayes as accuracy of 64.14 %, Radius Neighbour's Classifier with 49.2 %, Bagging Classifier was 85.8 %, ADA Boost was 76.9 %, Linear Discriminant Analysis has 76.13 %, Quadratic Discriminant Analysis was 84.2 %, Extra Tree Classifier was 90%, and LSTM is 91.5 %. Conclusion: As the LSTM, has an AUC value of 91.5 % is regarded best in predicting allergens. A web server called ProAll-D has been created that successfully identifies novel allergens using the LSTM approach. Users can use the link https://doi.org/10.17632/tjmt97xpjf.1 to access the ProAll-D server and data.
©2022 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
Keywords
Allergen prediction; ACC transformation; LSTM model; Gaussian naive bayes; Classifier; Extra tree classifier; Bagging classifier; ADA boost; Linear discriminant analysis; Quadratic discriminant analysis