Learner corpora, with their detailed records of learner language use, have been widely explored in second language acquisition and language teaching. Drawing on a self-built longitudinal EFL learner corpus, this paper works toward a long-standing goal: measuring and describing the general features and the dynamics of learner language, particularly at the beginner level. The study uses NLP tools to compute the variables needed to measure lexical development: types, tokens, TTR, unit length indices, COCA frequency list coverage, and lexical sophistication indices. Because the data are not normally distributed, Kruskal-Wallis tests for independent samples are used to test for significance, followed by pairwise comparisons to locate differences between year groups. The study finds that conventional global measures are better suited to tracking beginners' lexical development. These measures include the number of tokens and types, the number of letters per word and of words per sentence, bigram frequency, and bigram mutual information. By contrast, several indices, many of them more recent, show no significant differences across years: TTR, MATTR, MTLD, MTLD-MA-Wrap, COCA frequency list coverage, trigram frequency, and trigram mutual information. The study also notes that spelling mistakes reduce statistical accuracy when processing beginner language. The real difficulty for beginners lies in their lack of knowledge and practice of the non-literal, suggestive, or affective uses of content words; the correct use of topic-specific words, lexical bundles, and set collocations also poses great challenges. The findings offer new insights into EFL learner language and useful pedagogical implications.
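The lexical diversity indices and the significance test named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the NLP tooling used in the study: the tokenizer is a naive regular expression, MATTR uses an assumed 50-token window, MTLD uses the commonly cited 0.72 TTR threshold averaged over a forward and a backward pass, MTLD-MA-Wrap is not implemented, and `kruskal_wallis_h` returns only the tie-corrected H statistic. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, keep runs of letters/apostrophes.
    # A misspelled form is kept as its own type, with no spelling correction.
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens):
    # Type-token ratio: distinct word forms divided by running words.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mattr(tokens, window=50):
    # Moving-average TTR: mean TTR over every window of `window` tokens,
    # sliding one token at a time; plain TTR for texts shorter than the window.
    if len(tokens) < window:
        return ttr(tokens)
    counts = Counter(tokens[:window])
    total = len(counts) / window
    for i in range(window, len(tokens)):
        old = tokens[i - window]
        counts[old] -= 1
        if counts[old] == 0:
            del counts[old]
        counts[tokens[i]] += 1
        total += len(counts) / window
    return total / (len(tokens) - window + 1)


def _mtld_pass(tokens, threshold):
    # One directional MTLD pass: count "factors", i.e. segments whose running
    # TTR falls to the threshold, then add a partial factor for the remainder.
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1
            types, count = set(), 0
    if count:
        factors += (1 - len(types) / count) / (1 - threshold)
    # Undefined when no factor accumulates; returned as infinity here.
    return len(tokens) / factors if factors else math.inf


def mtld(tokens, threshold=0.72):
    # Average of the forward and backward passes.
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2


def kruskal_wallis_h(*groups):
    # Pool all observations (value, group index) and assign average ranks to ties.
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    # Tie correction; raises ZeroDivisionError if every value is identical.
    return h / (1 - tie_term / (n ** 3 - n))
```

To obtain a p-value, H can be compared against a chi-square distribution with k - 1 degrees of freedom, or the whole test can be run with `scipy.stats.kruskal`, which returns both the statistic and the p-value. The sketch also illustrates the spelling problem reported above: with this tokenizer a misspelling such as "becuase" counts as a new type, inflating both the type count and TTR.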