Stuttering is a widespread speech disorder affecting about 5% of the population and 2.5% of children under the age of 5. Much of the literature studies its causes, mechanisms and epidemiology, and much work is devoted to treatments, prognosis and diagnosis. Notably, a stuttering evaluation requires the skills of a multidisciplinary team. An expert speech-language therapist conducts a precise evaluation through a series of tests, observations, and interviews. During an evaluation, the speech-language therapist perceives, records and transcribes the number and types of speech disfluencies that a person produces in different situations. Stuttering varies widely in the number of repeated syllables or words and in the secondary aspects that alter the clinical picture. This work aims to support the difficult task of evaluating stuttering by recognizing occurrences of disfluency episodes such as repetitions and prolongations of sounds, syllables, words or phrases, silent pauses, hesitations, and blocks before speech. In particular, we propose a deep-learning-based approach that automatically detects disfluent production points in speech, supporting early classification of the problem by providing the number of disfluencies and the time intervals in which they occur. A deep learner is built to preliminarily evaluate audio fragments. However, the scenario at hand has some peculiarities that make detection challenging. Indeed, (i) fragments that are too short lead to ineffective classification, since a too-short audio fragment cannot capture the stuttering episode; and (ii) fragments that are too long lead to ineffective classification, since a stuttering episode can have a very short duration and the large amount of fluent speech contained in the fragment then masks the disfluency.
We therefore design an ad-hoc segment classifier that, exploiting the output of a deep learner working on fragments that are not too short, classifies each small segment composing an audio fragment by estimating the probability that it contains a disfluency.
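As an illustration only, the following sketch shows one way a segment-level estimate could be derived from fragment-level outputs. It assumes, and this is not stated in the text above, that fragments are overlapping windows spanning a fixed number of segments and that the score of a segment is the mean of the probabilities of the fragments covering it. The function names, the stride, and the 0.5 threshold are hypothetical choices made for the example.

```python
from typing import List, Tuple


def segment_scores(fragment_probs: List[float],
                   fragment_len: int,
                   stride: int = 1) -> List[float]:
    """Score each segment as the mean probability of the fragments covering it.

    Assumption (hypothetical): fragment k covers segments
    [k * stride, k * stride + fragment_len).
    """
    if fragment_len < 1 or stride < 1 or stride > fragment_len:
        raise ValueError("need 1 <= stride <= fragment_len")
    if not fragment_probs:
        return []
    n_segments = (len(fragment_probs) - 1) * stride + fragment_len
    totals = [0.0] * n_segments
    counts = [0] * n_segments
    for k, prob in enumerate(fragment_probs):
        start = k * stride
        for s in range(start, start + fragment_len):
            totals[s] += prob
            counts[s] += 1
    # stride <= fragment_len guarantees every segment is covered at least once.
    return [t / c for t, c in zip(totals, counts)]


def disfluent_intervals(scores: List[float],
                        segment_sec: float,
                        threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Merge consecutive segments scoring >= threshold into (start, end) times in seconds."""
    intervals = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            intervals.append((start * segment_sec, i * segment_sec))
            start = None
    if start is not None:
        intervals.append((start * segment_sec, len(scores) * segment_sec))
    return intervals


if __name__ == "__main__":
    # Made-up fragment probabilities standing in for the deep learner's output.
    probs = [0.0, 1.0, 1.0, 0.0]
    scores = segment_scores(probs, fragment_len=2, stride=1)
    episodes = disfluent_intervals(scores, segment_sec=0.25)
    print(scores, episodes, len(episodes))
```

Under these assumptions, the length of the returned list gives the number of detected disfluencies and each pair gives the time interval of one episode. Averaging is only one possible aggregation rule and is used here for simplicity.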