After studying existing methods for generating antivirus test sets and sample analysis methods based on manual experience, this paper proposes a software gene-based framework for automatically generating test sets for antivirus software. Most current automatic test set generation frameworks suffer from unstable performance and long running times, and the test sets they produce do not reflect the density distribution of the original dataset well. This paper makes several improvements to address these problems. Experimental results show that the framework efficiently generates a test sample set no larger than one tenth of the original dataset while retaining the distribution characteristics of the original dataset.