2019
DOI: 10.1145/3292577

Code Authorship Attribution

Abstract: Code authorship attribution is the process of identifying the author of a given piece of code. With increasing numbers of malware and advanced mutation techniques, the authors of malware are creating a large number of malware variants. To better deal with this problem, methods for examining the authorship of malicious code are necessary. Code authorship attribution techniques can thus be utilized to identify and categorize the authors of malware. This information can help predict the types of tools and techniques that …

Cited by 68 publications (20 citation statements)
References 68 publications

“…To demonstrate how to estimate the probability in (9), such that it generalizes to problems of the testing set, and without over-fitting samples of the learning set, consider the following hypothetical example of four subsets of represented texts that are obtained from the learning set $X_{y_1,1} \subseteq X_{y_1}$, $X_{y_2,2} \subseteq X_{y_2}$, $X_{y_3,3} \subseteq X_{y_3}$, and $X_{y_4,4} \subseteq X_{y_4}$, where we know beforehand that $y_1 = y_2$, $y_3 = y_4$, but $y_1 \neq y_3$. Additionally, let $X_{y_1,1}$, $X_{y_2,2}$, $X_{y_3,3}$, and $X_{y_4,4}$ be random variables that take values in the subsets, respectively.…”
Section: Single-domain Open-set Classification (SOC) (mentioning)
confidence: 99%
“…Improving solvers of stylometry problems is essential for enhancing various application domains, such as forensics, privacy (or anti-forensics), active-authentication [1]- [3], the detection of compromised accounts [4], recommender systems [5], deception detection, market analysis, and medical diagnosis [6], [7]. Author identification can also be accurately performed on program source codes [8], [9] as well as compiled binaries [10]. Enhancing such application domains is growing increasingly more interesting thanks to the availability of large amounts of textual data via the Internet.…”
Section: Introduction (mentioning)
confidence: 99%
“…The overall goal of such a software is to help to identify the authors of malicious software. This domain has been very active in the last years [7,11,19]. Our tool is designed to identify coding style pattern used by PDF producer tools to detect PDF producer tool.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although authorship attribution began with the stylistic analyses of humanities scholars (i.e., stylometry), with the advent of digital computers, the related techniques have been applied in music (e.g., musical style recognition and disputed musical authorship attribution; Brinkman et al, 2016; Tsai & Ji, 2020), art and painting (e.g., the identification of genuine paintings; Kokensparger, 2018; Yukimura et al, 2018), plagiarism detection (e.g., collaboration detection in documents; Gollub et al, 2013; AlSallal et al, 2019), spam detection (e.g., the detection of unsolicited and virus‐infested emails; Argamon et al, 2003; Rocha et al, 2017), and forensic investigation (e.g., author identification in anonymous or phishing emails; Gollub et al, 2013; Edwards, 2018). In the recent past, there has been increased research on code stylometry (Kokensparger, 2018; Kalgutkar et al, 2019; Quiring et al, 2019), which attempts to identify software authors from program source code using a feature analysis of programming styles. Its aims are to counter problems such as computer viruses and cyberattacks, as well as to detect unauthorized copying and plagiarism of software.…”
Section: Introduction (mentioning)
confidence: 99%