2019
DOI: 10.18293/seke2019-026

Logical Segmentation of Source Code

Abstract: Many software analysis methods have come to rely on machine learning approaches. Code segmentation, the process of decomposing source code into meaningful blocks, can augment these methods by featurizing code, reducing noise, and limiting the problem space. Traditionally, code segmentation has been done using syntactic cues; current approaches do not intentionally capture logical content. We develop a novel deep learning approach to generate logical code segments regardless of the language or syntactic correctne…

Cited by 2 publications (6 citation statements)
References 10 publications
“…Among the reviewed works several of them use some form of RNNs. classical RNNs [86], long short‐term memory (LSTM) [16, 67, 69, 70, 73, 76, 83], contextual long short‐term memory (CLSTM) [72], bidirectional recurrent neural networks (BRNNs) [17, 74], bidirectional gated recurrent unit (BGRU) [83], and bidirectional long short‐term memory (BLSTM) [14, 15, 19, 77, 79, 82–85].…”
Section: Taxonomy Of Deep Learning Techniques For Source Code Vulne…
Citation type: mentioning
confidence: 99%
“…This issue has been highlighted by several reviewed papers as a limitation that can be explored in the future. On several of the surveyed works the programming language of the data set focus ranges from C/C++ [14, 15, 17, 19, 69–72, 77–87] and JavaScript [18, 60, 79]. Having a data set that consists of multiple programming languages would require further exploration.…”
Section: Issues and Challenges
Citation type: mentioning
confidence: 99%