Although most state‐of‐the‐art action recognition models adopt a two‐stream 3D convolutional backbone, few works have studied how the loss function affects these models. Sparsity, meanwhile, is used as key prior knowledge in many fields, yet to the best of our knowledge no work has studied how the sparsity of the network output influences the predictions of deep learning‐based action recognition models. This paper therefore proposes a novel two‐stream inflated 3D ConvNet based on sparse regularization (SRI3D) for action recognition. To let the network learn a sparse output, the ℓ1 norm is embedded in the loss function as a plug‐and‐play regularization term. As a result, after the two streams are fused, the predicted class can only be the category to which one of the streams assigns its highest confidence, and no other. Because the proposed loss drives the output vector of the network to be as sparse as possible, the classification results are not ambiguous. Experimental results show that SRI3D has a competitive advantage over other state‐of‐the‐art models on Kinetics‐400, Something‐Something V2, UCF‐101 and HMDB‐51.
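
The idea of embedding the ℓ1 norm in the loss as a regularization term can be illustrated with a minimal sketch. It is not the SRI3D implementation, and the abstract does not give the exact formulation, so several details are assumptions: the penalty is applied to the raw (pre-softmax) output vector, it is added to a standard cross-entropy term with a hypothetical weight `lam`, and the two streams are fused by averaging their softmax scores. All function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw network outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of the softmax distribution against one target class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def l1_norm(vec):
    """Sum of absolute values; penalizing it pushes entries toward zero."""
    return sum(abs(v) for v in vec)


def sparse_regularized_loss(logits, target, lam=0.01):
    """Cross-entropy plus an l1 penalty on the output vector.

    Assumption: the penalty acts on the raw outputs, and `lam` is a
    hypothetical trade-off weight; neither is specified in the abstract.
    """
    return cross_entropy(logits, target) + lam * l1_norm(logits)


def fuse_two_streams(rgb_logits, flow_logits):
    """Assumed late fusion: average the two softmax vectors, return argmax."""
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    fused = [(a + b) / 2 for a, b in zip(rgb_p, flow_p)]
    return max(range(len(fused)), key=fused.__getitem__)
```

Because the penalty is a separate additive term, it can be attached to an existing classification loss without changing the network, which is the sense in which such a regularizer is plug‐and‐play; setting `lam=0` recovers plain cross-entropy.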