2022
DOI: 10.3390/electronics11030414

Detection of DGA-Generated Domain Names with TF-IDF

Abstract: Botnets often apply domain name generation algorithms (DGAs) to evade detection by generating large numbers of pseudo-random domain names of which only few are registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review on recent prior work in which machine learning and deep learning have been applied for detecting DGA-generated domain names. We observe that a common method…

Citations: cited by 25 publications (9 citation statements)
References: 52 publications
“…The selected families of malicious domain names and the number of them are shown in Table 2. Refer to Vranken [20] For dataset D, there were 1417 phrase elements after bigram processing, 31,220 phrase elements after trigram processing, 127,469 phrase elements after 4-g processing, and 138,636 phrase elements after 5-g processing. The N-gram algorithm was used for the legitimate and malicious domain names in D. The distribution of the phrase elements of the domain names was obtained as shown in Figure 5a,b.…”
Section: Discussion (mentioning)
confidence: 99%
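The N-gram processing described in the statement above can be illustrated with a minimal sketch. It assumes character-level n-grams taken from the leftmost label of each domain name; the function names `char_ngrams` and `ngram_vocabulary` and the sample domains are hypothetical, and the citing paper's exact preprocessing is not given here.

```python
from collections import Counter


def char_ngrams(label, n):
    """Return all contiguous character n-grams of a string."""
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def ngram_vocabulary(domains, n):
    """Count how often each distinct n-gram occurs across a set of domain names.

    Assumption: only the leftmost label (the text before the first dot) is used.
    """
    counts = Counter()
    for domain in domains:
        label = domain.lower().split(".")[0]
        counts.update(char_ngrams(label, n))
    return counts


# Illustrative input only; not the dataset D from the citing paper.
sample_domains = [
    "google.com",
    "example.org",
    "b83ed4877eec1997fcc39b7ae590007a.info",
]
for n in (2, 3, 4, 5):
    print(n, len(ngram_vocabulary(sample_domains, n)))
```

The number of distinct n-grams generally grows with n until the labels become too short to supply new ones, which is in line with the rising element counts quoted above.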
“…An example of a domain from this family is "b83ed4877eec1997fcc39b7ae590007a.info". This example appears to be very random and obviously fits the features of a generated hash [12,13]. Arithmetic-based DGAs are the most common ones in this category, the domains are constructed by generating sequences of values that have either an ASCII representation directly or index hardcoded arrays that constitute the DGA alphabet.…”
Section: Related Work (mentioning)
confidence: 96%
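To make the idea of an arithmetic-based DGA concrete, the following is a toy sketch in which a sequence of values indexes a hardcoded alphabet, as the statement above describes. It does not reproduce any real malware family: the recurrence constants, the name `toy_dga`, the alphabet, and the reserved `.example` suffix are all assumptions chosen for illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # hypothetical DGA alphabet


def toy_dga(seed, length=12, tld=".example"):
    """Derive a pseudo-random name from a seed by indexing a fixed alphabet.

    A linear congruential recurrence supplies the sequence of values;
    each value selects one character from ALPHABET.
    """
    state = seed
    chars = []
    for _ in range(length):
        state = (1103515245 * state + 12345) % 2**31
        chars.append(ALPHABET[state % len(ALPHABET)])
    return "".join(chars) + tld


# The same seed always yields the same name, which is what lets a bot
# and its operator agree on a name without communicating.
print(toy_dga(20220101))
print(toy_dga(20220102))
```

Because the output is a deterministic function of the seed, a detector cannot rely on a blocklist alone and instead looks at statistical properties of the generated strings.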
“…TF-IDF is one of the traditional methods based on statistics [21]. It has been used in many different applications, such as document clustering [22], text classification [23], detection of domain name generation algorithms [24], and comparing research trends [25]. Term frequency or word frequency is a rarer method used in information retrieval systems compared to TF-IDF.…”
Section: Related Work (mentioning)
confidence: 99%
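As a rough illustration of how TF-IDF can be applied to domain names, the sketch below weights character bigrams using the textbook formulation, tf(t, d) · log(N / df(t)). This is an assumption for illustration: libraries and papers use differing smoothing and normalisation variants, and the helper names `bigrams` and `tfidf` are hypothetical rather than taken from the cited work.

```python
import math
from collections import Counter


def bigrams(label):
    """Split a string into overlapping character bigrams."""
    return [label[i:i + 2] for i in range(len(label) - 1)]


def tfidf(docs):
    """Compute unsmoothed TF-IDF weights for a list of term lists.

    tf  = term count / number of terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    df = Counter()
    for terms in docs:
        df.update(set(terms))  # each document counts once per term
    weights = []
    for terms in docs:
        tf = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Illustrative labels only.
labels = ["google", "goggle", "x7qz9k"]
for label, w in zip(labels, tfidf([bigrams(l) for l in labels])):
    top = sorted(w.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(label, top)
```

Bigrams shared by every label receive a weight of zero under this formulation, while bigrams unique to one label are weighted most heavily, which is the property that makes the representation useful as classifier input.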