2016
DOI: 10.1145/3022671.2984041

Probabilistic model for code with decision trees

Abstract: In this paper we introduce a new approach for learning precise and general probabilistic models of code based on decision tree learning. Our approach directly benefits an emerging class of statistical programming tools which leverage probabilistic models of code learned over large codebases (e.g., GitHub) to make predictions about new programs (e.g., code completion, repair, etc.). The key idea is to phrase the problem of learning a probabilistic model of code as learning a decision tree in a domain-specific …
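The gist of the approach, as the abstract describes it, is to learn a decision tree over syntactic contexts of a partial program, with each leaf holding a probability distribution over the next AST node. The sketch below is a minimal illustration of that general idea only, not the paper's actual domain-specific conditioning language or training procedure; all names and the toy feature set (parent kind, previous sibling kind) are hypothetical.

```python
# Minimal sketch: a decision tree over hand-picked AST context features,
# with leaves holding probability distributions over the next node kind.
# Hypothetical names throughout; not the paper's DSL-based formulation.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, features, min_leaf=5):
    """examples: list of (context dict, label) pairs; label = next node kind."""
    labels = [y for _, y in examples]
    # Stop when the node is small or pure; the leaf stores a distribution.
    if len(examples) <= min_leaf or len(set(labels)) == 1 or not features:
        counts = Counter(labels)
        total = sum(counts.values())
        return {"leaf": {tok: c / total for tok, c in counts.items()}}
    # Greedily pick the context feature with the highest information gain.
    def gain(f):
        parts = {}
        for ctx, y in examples:
            parts.setdefault(ctx[f], []).append(y)
        rem = sum(len(p) / len(examples) * entropy(p) for p in parts.values())
        return entropy(labels) - rem
    best = max(features, key=gain)
    node = {"feature": best, "children": {}}
    groups = {}
    for ctx, y in examples:
        groups.setdefault(ctx[best], []).append((ctx, y))
    rest = [f for f in features if f != best]
    for value, group in groups.items():
        node["children"][value] = build_tree(group, rest, min_leaf)
    return node

def predict(tree, ctx):
    while "leaf" not in tree:
        child = tree["children"].get(ctx.get(tree["feature"]))
        if child is None:  # unseen feature value: back off to any child
            child = next(iter(tree["children"].values()))
        tree = child
    return tree["leaf"]

# Toy usage: contexts extracted from ASTs (parent kind, previous sibling kind).
data = [
    ({"parent": "Call", "prev": "Name"}, "Arg"),
    ({"parent": "Call", "prev": "Arg"}, "Arg"),
    ({"parent": "If", "prev": "Compare"}, "Block"),
    ({"parent": "If", "prev": "Name"}, "Compare"),
] * 3
model = build_tree(data, ["parent", "prev"], min_leaf=2)
print(predict(model, {"parent": "Call", "prev": "Name"}))  # e.g. {'Arg': 1.0}
```

In the paper the conditioning contexts are themselves programs synthesized in a DSL rather than a fixed feature set; the shared ingredient this sketch keeps is the tree of context tests with probability distributions at the leaves.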

Cited by 98 publications (91 citation statements) · References 28 publications
“…Apart from that, various deep neural embeddings and models have also been applied to different program analysis or software engineering tasks, such as HAGGIS for mining idioms from source code [8], Gemini for binary code similarity detection [43], Code Vectors for code analogies, bug finding, and repair/suggestion [22], Dynamic Program Embeddings for classifying the types of errors in programs [41], DYPRO for recognizing loop invariants [39], Import2Vec for learning embeddings of software libraries [38], NeurSA for catching static bugs in code [42], and HOPPITY for detecting and fixing bugs in programs [18]. Researchers have also studied language models for the code completion [23,34,36], code suggestion [7], and code retrieval [25] tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…Paradigms of program synthesis [90-102]. Code completion and suggestion [84, 103-115].…”
Section: [49] (mentioning)
confidence: 99%
“…There have been several recent successes in applying (supervised) machine learning to programming languages research. For example, machine learning has been used to infer program invariants [Padhi et al. 2016; …], improve program analysis [Liang et al. 2011; Mangal et al. 2015; Raghothaman et al. 2018; Raychev et al. 2015] and synthesis [Balog et al. 2016; Feng et al. 2018; Kalyan et al. 2018; Lee et al. 2018; Raychev et al. 2016b; Schkufza et al. 2013, 2014], build probabilistic models of code [Bielik et al. 2016; Raychev et al. 2016a, 2014], infer specifications [Bastani et al. 2017, 2018b; Beckman and Nori 2011; Bielik et al. 2017; Heule et al. 2016; Kremenek et al. 2006; Livshits et al. 2009], test software [Clapp et al. 2016; Godefroid et al. 2017; Liblit et al. 2005], and select lemmas for automated proofs.…”
Section: Related Work (mentioning)
confidence: 99%