An effective model for dynamic texture (DT) description, which jointly captures shape and motion cues, is introduced by exploiting volumes of blur-invariant features in the following three main stages. First, a 3-dimensional Gaussian kernel is used to form smoothed sequences that help to address well-known limitations of local encoding, such as near-uniform regions and sensitivity to noise. Second, a receptive volume of the Difference of Gaussians (DoG) is computed to mitigate the negative impacts of environmental and illumination changes, which are major challenges in DT understanding. Finally, a local encoding operator is applied to construct a discriminative descriptor from the enhanced patterns extracted from the filtered volumes. Evaluations on benchmark datasets (i.e., UCLA, DynTex, and DynTex++) for the DT classification task have positively validated our contributions.
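To make the three stages concrete, the following is a minimal sketch, not the paper's exact method: it assumes a grayscale video stored as a NumPy array of shape (T, H, W), uses illustrative sigma values for the 3D Gaussian and DoG filtering, and substitutes a simple LBP-style operator for the local encoding step; all function names and parameters are hypothetical.

```python
# Minimal sketch of the three-stage pipeline, under the assumptions stated above.
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_volumes(video, sigma1=1.0, sigma2=2.0):
    """Stage 1: 3D Gaussian smoothing of the video volume at two scales."""
    g1 = gaussian_filter(video.astype(np.float64), sigma=sigma1)
    g2 = gaussian_filter(video.astype(np.float64), sigma=sigma2)
    return g1, g2

def dog_volume(video, sigma1=1.0, sigma2=2.0):
    """Stage 2: Difference of Gaussians (DoG) volume, i.e. the difference of the
    two smoothed volumes, which suppresses slowly varying illumination."""
    g1, g2 = smoothed_volumes(video, sigma1, sigma2)
    return g1 - g2

def encode_xy_planes(volume, threshold=0.0):
    """Stage 3 (simplified): a local binary encoding of each spatial (XY) plane;
    every pixel is compared to its 8 spatial neighbours and the bits are packed
    into an 8-bit code."""
    # Offsets of the 8 neighbours in the XY plane.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    T, H, W = volume.shape
    codes = np.zeros((T, H - 2, W - 2), dtype=np.uint8)
    center = volume[:, 1:H - 1, 1:W - 1]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = volume[:, 1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        codes |= ((neighbour - center > threshold).astype(np.uint8) << bit)
    return codes

def dt_descriptor(video):
    """Histogram of local codes over the DoG-filtered volume, used here as the
    dynamic-texture descriptor of this sketch."""
    codes = encode_xy_planes(dog_volume(video))
    hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
    return hist
```

In this sketch the descriptor is a single 256-bin histogram of spatial codes computed on the filtered volume; the actual model additionally encodes temporal planes and combines several filtered volumes, which is omitted here for brevity.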