The performance of a speaker recognition system depends strongly on which acoustic features are used. Most speaker recognition systems use short-term acoustic features extracted from a single speech frame, and the most popular of these are the Mel-frequency cepstral coefficients (MFCCs). Short-term features are generally static: neither the cepstral coefficients nor a single MFCC frame carries any dynamic information about the speech signal. In this paper, we use an analysis sparse representation model to introduce the long-term acoustic (LTA) feature for text-independent speaker recognition, which is a sparse representation of both the static features and the dynamic information in the speaker's speech. First, the speech signal is segmented into overlapping frames, and MFCC features are extracted from each frame. Super MFCC frames are then constructed by stacking each frame with several of the frames that follow it, which captures the dynamic information of the speech signal. The super MFCC frames are combined into a 2-D MFCC feature map (MFCCsmap). Finally, the speaker model is built on the analysis sparse model, and the sparse representations of the MFCCsmap are used as the LTA features. A state-of-the-art deep neural network (DNN) is employed as the classifier for speaker recognition. The experimental results illustrate the effectiveness and robustness of the proposed system.

INDEX TERMS Analysis sparse representation, deep neural network, long-term acoustic features, Mel-frequency cepstral coefficients, speaker recognition.
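
The framing and frame-stacking steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `frame_signal` and `stack_super_frames`, the parameters `frame_len`, `hop`, and `n_following`, and the row-per-super-frame layout of the resulting map are all assumptions made for illustration, and a small integer matrix stands in for real MFCCs.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into frames; hop < frame_len makes them overlap."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def stack_super_frames(features, n_following):
    """Concatenate each frame's feature vector with its next n_following frames.

    features: array of shape (n_frames, n_coeffs), e.g. one MFCC vector per frame.
    Returns an array of shape (n_frames - n_following, (n_following + 1) * n_coeffs).
    """
    n_frames = features.shape[0]
    n_super = n_frames - n_following
    if n_super <= 0:
        raise ValueError("not enough frames for the requested context")
    return np.stack(
        [features[t:t + n_following + 1].reshape(-1) for t in range(n_super)]
    )


# Overlapping frames from a toy signal (frame length 4, hop 2).
frames = frame_signal(np.arange(10), frame_len=4, hop=2)

# Placeholder per-frame features standing in for MFCCs (6 frames, 2 coefficients).
toy_features = np.arange(12).reshape(6, 2)

# Assumed layout: one super frame per row forms the 2-D feature map.
feature_map = stack_super_frames(toy_features, n_following=2)
```

In this sketch, consecutive rows of `feature_map` share most of their entries, which is how the stacked representation carries frame-to-frame dynamics that a single static frame lacks. The sparse analysis model and the DNN classifier are outside the scope of the example.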