Malicious code used in attacks such as advanced persistent threats (APTs) often does not act immediately after infecting a system; it waits for commands from the attacker’s command and control (C&C) server. The infected system tries to reach the C&C server through the server's IP address or domain name. If that address is hard-coded in the malicious code, analysts can extract it by analyzing the code and then block access to the C&C server through security policy. To circumvent this address blocking, malware includes a domain generation algorithm (DGA) that generates domain names dynamically. Because a DGA generates domains randomly, it is very difficult to identify and block the malicious domains. This paper therefore proposes a model that detects and classifies unknown DGA domains. We extract features that are effective for TextCNN-based label prediction and add domain knowledge-based features to improve the model for detecting and classifying DGA-generated malicious domains. The proposed model achieved 99.19% accuracy for DGA classification and 88.77% accuracy for DGA class classification. We expect that the proposed model can be applied to effectively detect and block DGA-generated domains.
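To make the idea of dynamically generated domains and domain knowledge-based features concrete, the following is a minimal sketch. It is not the paper's method: `toy_dga` is a hypothetical date-seeded generator written only for illustration, and the features in `lexical_features` (length, character entropy, digit ratio, vowel ratio) are assumed examples of hand-crafted lexical features, since the abstract does not list the features actually used.

```python
import hashlib
import math
from collections import Counter
from datetime import date
from typing import Dict, List


def toy_dga(seed: str, day: date, count: int = 3, tld: str = ".com") -> List[str]:
    """Hypothetical DGA: hash (seed, date, index) and map bytes to letters.

    Malware and attacker can both run this with the same seed and date,
    so they agree on the domains without hard-coding any address.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}-{day.isoformat()}-{i}".encode()).digest()
        # Map the first 12 digest bytes onto the letters a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:12])
        domains.append(label + tld)
    return domains


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(domain: str) -> Dict[str, float]:
    """Assumed example features, computed on the leftmost label of the domain."""
    label = domain.split(".")[0].lower()
    n = len(label)
    if n == 0:
        return {"length": 0.0, "entropy": 0.0, "digit_ratio": 0.0, "vowel_ratio": 0.0}
    return {
        "length": float(n),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
    }


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2020, 1, 1)) + ["wikipedia.org"]:
        print(name, lexical_features(name))
```

In a setup like the one the abstract describes, features of this kind would be combined with the features learned by the TextCNN before classification; how the paper combines them is not stated here, so that step is left out of the sketch.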
According to the International Data Corporation (IDC), an agency that investigates trends in the international ICT market, the use of a variety of smart devices and the internet has increased dramatically, and the volume of data such as digital content has surged. The amount of stored data is expected to exceed 40,000 exabytes, a 50-fold increase compared to 2010, and accordingly more than 10 times as many servers will be required (Korea IT promotion agency, 2012).
Recently, many information security incidents, such as APT (advanced persistent threat) attacks, ransomware, drive-by downloads, and the distribution of malicious code through e-mail, have occurred in various public and financial institutions. Such malicious attacks are becoming more intelligent, and the number