“…In the past few years, several tools for predicting protein subcellular location and protein type, including sub-Golgi protein identification tools (Teasdale and Yuan, 2002; Van Dijk et al., 2008; Chou et al., 2010; Ding et al., 2011, 2013; Jiao et al., 2014; Lin et al., 2014; Nikolovski et al., 2014; Jiao and Du, 2016a,b; Yang R. et al., 2016; Ahmad et al., 2017; Wang et al., 2017; Rahman et al., 2018; Ahmad and Hayat, 2019; Wuritu et al., 2019), have been developed with various machine learning algorithms, such as increment diversity Mahalanobis discriminant (IDMD) (Ding et al., 2011), support vector machine (SVM) (Ding et al., 2013, 2017; Jiao et al., 2014; Lin et al., 2014; Jiao and Du, 2016a,b), random forest (RF) (Ding et al., 2016a,b; Yang R. et al., 2016; Yu et al., 2017; Liu et al., 2018), and k-nearest neighbor (KNN) (Ahmad et al., 2017; Ahmad and Hayat, 2019). To generate feature vectors for sub-Golgi protein identification, researchers have used amino acid composition (AAC) (Rahman et al., 2018), k-gapped dipeptide composition (k-gapDC) (Ding et al., 2011, 2013), pseudo amino acid composition (PseAAC) (Jiao et al., 2014; Liu et al., 2015), and evolutionary information derived from protein sequences (e.g., the position-specific scoring matrix, PSSM) together with features derived from it (Yang et al., 2014; Jiao and Du, 2016a,b; Yang R. et al., 2016; Ahmad et al., 2017; Rahman et al., 2018).…”
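The two simplest sequence encodings named above can be illustrated with a short sketch. It assumes the 20 standard residues, defines the gap k as the number of residues lying between the two members of a dipeptide (k = 0 gives ordinary dipeptide composition), and normalizes counts to frequencies. The cited works may use different gap conventions or normalizations, and the function names `aac` and `k_gapped_dc` are hypothetical. PseAAC and PSSM features are not shown, since PSSM generation requires an external alignment search such as PSI-BLAST.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are encoded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order for the feature vector.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]  # drop non-standard residues
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def k_gapped_dc(seq: str, k: int) -> list[float]:
    """Hypothetical k-gapped dipeptide composition (400 values).

    Counts pairs (seq[i], seq[i + k + 1]), i.e. two residues separated by
    k intervening residues, and divides by the number of valid pairs.
    """
    if k < 0:
        raise ValueError("gap k must be non-negative")
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # toy sequence, not real data
    features = aac(seq) + k_gapped_dc(seq, 0) + k_gapped_dc(seq, 1)
    print(len(features))  # 20 + 400 + 400 features
```

Concatenating AAC with dipeptide compositions for several gaps, as in the usage lines, yields a fixed-length vector regardless of protein length, which is what classifiers such as SVM, RF, or KNN require as input.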