“…The feature extraction methods explored in stuttering recognition systems include: autocorrelation function and envelope parameters [78]; duration, energy peaks, and spectral features of word-based and part-word-based segments [79][80][81]; age, sex, type of disfluency, frequency of disfluency, duration, physical concomitants, rate of speech, historical, attitudinal, and behavioral scores, and family history [38]; duration and frequency of disfluent portions, and speaking rate [26]; frequency, 1st-to-3rd formant frequencies and their amplitudes [81,82]; spectral measures (512-point fast Fourier transform (FFT)) [83,84]; mel-frequency cepstral coefficients (MFCCs) [81,85-87]; linear predictive cepstral coefficients (LPCCs) [81,86]; pitch and shimmer [88]; zero-crossing rate (ZCR) [81]; short-time average magnitude and spectral spread [81]; linear predictive coefficients (LPCs) and weighted linear prediction cepstral coefficients (WLPCCs) [86]; maximum autocorrelation value (MACV) [81]; linear prediction-Hilbert transform based MFCC (LH-MFCC) [89]; noise-to-harmonic ratio, shimmer, harmonic-to-noise ratio, harmonicity, amplitude perturbation quotient, formants and their statistics (min, max, mean, median, mode, std), and spectrum centroid [88]; Kohonen's self-organizing maps [84]; i-vectors [90]; perceptual linear prediction (PLP) coefficients [87]; respiratory biosignals [39]; and the sample entropy feature [91]. With recent developments in convolutional neural networks, the feature representation of stuttered speech is moving from conventional MFCCs toward spectrogram representations.…”
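A few of the simpler features named above, zero-crossing rate, short-time average magnitude, and the FFT-based spectrogram now favored as CNN input, can be sketched with NumPy alone. This is a minimal illustration, not code from any of the cited systems; the frame length, hop size, and synthetic 200 Hz test tone are assumed values chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Slice a 1-D signal into overlapping frames -> (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes within each frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def short_time_avg_magnitude(frames):
    """Mean absolute amplitude per frame (a crude energy measure)."""
    return np.mean(np.abs(frames), axis=1)

def magnitude_spectrogram(frames):
    """Windowed magnitude spectrum per frame -> (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

# Synthetic stand-in for a speech recording: 1 s of a 200 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)

frames = frame_signal(x)                 # (61, 512)
zcr = zero_crossing_rate(frames)         # one value per frame, in [0, 1]
mag = short_time_avg_magnitude(frames)   # one value per frame
spec = magnitude_spectrogram(frames)     # (61, 257) time-frequency image
print(frames.shape, spec.shape)
```

The `spec` array is exactly the kind of 2-D time-frequency representation that spectrogram-based convolutional models consume in place of an MFCC matrix; MFCCs themselves would add a mel filterbank, a log, and a discrete cosine transform on top of this magnitude spectrogram.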