Although the DNS over HTTPS (DoH) protocol has desirable properties for Internet users such as privacy and security, it also causes a problem in that network administrators are prevented from detecting suspicious network traffic generated by malware and malicious tools. To support their efforts in maintaining network security, in this paper, we propose a novel system that identifies malicious DNS tunnel tools through a hierarchical classification method that uses machine-learning technology on DoH traffic. We implemented a prototype of the proposed system and evaluated its performance on the CIRA-CIC-DoHBrw-2020 dataset, obtaining 99.81% accuracy in DoH traffic filtering, 99.99% accuracy in suspicious DoH traffic detection, and 97.22% accuracy in identification of malicious DNS tunnel tools.
Some of the most serious security threats facing computer networks involve malware. To prevent this threat, administrators need to swiftly remove the infected machines from their networks. One common way to detect infected machines in a network is by monitoring communications based on blacklists. However, detection using this method has the following two problems: no blacklist is completely reliable, and blacklists do not provide sufficient evidence to allow administrators to determine the validity and accuracy of the detection results. Therefore, simply matching communications with blacklist entries is insufficient, and administrators should pursue their detection causes by investigating the communications themselves. In this paper, we propose an approach for classifying malicious DNS queries detected through blacklists by their causes. This approach is motivated by the following observation: a malware communication is divided into several transactions, each of which generates queries related to the malware; thus, surrounding queries that occur before and after a malicious query detected through blacklists help in estimating the cause of the malicious query. Our cause-based classification drastically reduces the number of malicious queries to be investigated because the investigation scope is limited to only representative queries in the classification results. In experiments, we have confirmed that our approach could group 388 malicious queries into 3 clusters, each consisting of queries with a common cause. These results indicate that administrators can briefly pursue all the causes by investigating only representative queries of each cluster, and thereby swiftly address the problem of infected machines in the network.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.