Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1 (2017)
DOI: 10.18653/v1/e17-1004

Classifying Illegal Activities on Tor Network Based on Web Textual Contents

Abstract: The freedom of the Deep Web offers a safe place where people can express themselves anonymously, but they can also conduct illegal activities. In this paper, we present and make publicly available a new dataset of active Darknet domains, which we call "Darknet Usage Text Addresses" (DUTA). We built DUTA by sampling the Tor network over two months and manually labeling each address with one of 26 classes. Using DUTA, we conducted a comparison between two well-known text representation techniques crossed by three…

Cited by 74 publications (55 citation statements)
References 26 publications
“…Research papers N. of papers Content classification [23], [24], [25], [26], [9], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38] 17…”
Section: Research Field
Citation type: mentioning
confidence: 99%
“…This dataset was proposed by Fidalgo et al [9] after crawling Tor images domains. In Tor, several types of services can be found [6] but also illegal activities or hidden services, such as illegal drugs selling [2] and other as identified by Al-Nabki et al [1], which include drug selling, weapons and personal ID forgery.…”
Section: State of the Art
Citation type: mentioning
confidence: 99%
“…Due to this focus on anonymity, it is a common source of illegal content and media. According to [1], it is estimated that 25% of the content found in Tor network may involve potentially illegal activities, such as counterfeiting ID documents, credit cards, weapons, drug selling, and other types of illegal content. With the increase in number of these hidden services as well as the size of the available information, automated techniques are required to analyze the content and detect potential threats or illegal activities.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Due to its active learning ability, machine learning has been applied to website categorization and has become a hotspot of illegal website detection. There are many practical solutions based on machine learning, which generally extract features from the URL [7] or the content of websites [8]. The URL-based illegal detection method only extracts features from the URL of the website with a short detection time.…”
Section: Introduction
Citation type: mentioning
confidence: 99%