“…Another representative technique is the machine learning (ML) approach where the classification criterion is determined by the ML model by learning from a separate training data set. ,,,− , Besides the selection of modeling method, the most challenging task in the ML approach is the representation or encoding of protein/peptide sequences into a numerical vector/matrix. Various encoding methods, including amino acid descriptors, amino acid composition (AAC), pseudoamino acid composition (PseAAC), dipeptide composition (DPC), amino acid descriptors (AAD), position-specific scoring matrix (PSSM), physicochemical descriptors, biomedical properties, k-mer dictionary-based binary representation, etc., have been widely used in predicting allergenicity and other properties/bioactivities. ,,,,,− However, these features may not always accurately represent protein sequences and simple combinations can cause high-dimensional problems as well as the feature redundancy …”