“…Most existing studies extracted ncRNA and protein sequence features using simple k-mer frequencies: 3-mer frequencies for proteins and 4-mer frequencies for ncRNAs [22,24,26,29,33]. For proteins, the 20 amino acids can be classified into seven groups based on their dipole moments and side-chain volumes: 1 = {A, G, V}, 2 = {I, L, F, P}, 3 = {Y, M, T, S}, 4 = {H, N, Q, W}, 5 = {R, K}, 6 = {D, E}, and 7 = {C} [33].…”
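The grouping and k-mer counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of the cited studies. The function names are hypothetical, each k-mer count is divided by the number of valid sliding windows (step 1), T is mapped to U for ncRNA input, and any window containing a symbol outside the expected alphabet is skipped. Under this encoding a protein yields 7^3 = 343 features and an ncRNA yields 4^4 = 256.

```python
from itertools import product

# Seven amino-acid groups as listed in the passage (dipole moment / side-chain volume).
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i + 1) for i, group in enumerate(AA_GROUPS) for aa in group}


def kmer_frequencies(seq, alphabet, k):
    """Normalized frequency of every k-mer over `alphabet`, in lexicographic order.

    Assumption: counts are divided by the number of valid windows; windows that
    contain symbols outside `alphabet` are ignored.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def protein_features(protein):
    """Hypothetical helper: 343-dim vector of 3-mer frequencies over the 7-group alphabet."""
    # Unknown residues become "X", which is not in the alphabet, so their windows are skipped.
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_frequencies(reduced, "1234567", 3)


def ncrna_features(rna):
    """Hypothetical helper: 256-dim vector of 4-mer frequencies over A, C, G, U."""
    return kmer_frequencies(rna.upper().replace("T", "U"), "ACGU", 4)


if __name__ == "__main__":
    p = protein_features("MAGVILFP")
    r = ncrna_features("ACGUACGU")
    print(len(p), len(r))  # 343 256
```

Reducing the protein alphabet before counting keeps the 3-mer vector at 343 dimensions instead of the 20^3 = 8000 a raw amino-acid alphabet would give, which is one practical motivation for grouping residues.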