This study aims to develop a corpus and to explore students' perceptions of it in vocabulary learning. The corpus was developed following the ADDIE instructional design model (IDM). The research began with a problem analysis that revealed students' obstacles in learning a language: the students were found to have a limited vocabulary in the language they were learning. Construction of the corpus began with the creation of a script in PHP. The research produced a corpus of 377,880 tokens in five sub-corpora, namely Indonesian, English, German, Arabic, and art and design. The vocabulary is presented in order of frequency within the field of language and language teacher education. The evaluation carried out by experts in materials, language, and media indicates that the corpus is suitable for integration into learning. In addition, the assessment by students who took part in the implementation of the corpus with a data-driven learning (DDL) approach shows that the corpus helps them broaden their vocabulary, including word meaning, form, and usage, by observing concordance and collocation lines.