Design patterns are elegant and well-tested solutions to recurring software development problems. They are the result of software developers dealing with problems that frequently occur and solving them in the same or a slightly adapted way. A pattern's semantics provide the intent, motivation, and applicability, describing what it does, why it is needed, and where it is useful. Consequently, design patterns encode a wealth of information. Developers weave this information into their systems whenever they use design patterns to solve problems. This work presents Feature Maps, a flexible human- and machine-comprehensible software representation based on micro-structures. Our algorithm, Feature-Role Normalization, compresses the high-dimensional, inhomogeneous vector space of micro-structures into a feature map. We apply these concepts to the problem of detecting instances of design patterns in source code. We evaluate our methodology on four design patterns and a wide range of balanced and imbalanced labeled training data, and we compare classical machine learning (Random Forests) with modern deep learning approaches (Convolutional Neural Networks). Feature maps yield robust classifiers even under challenging settings of strongly imbalanced data distributions, without sacrificing human comprehensibility. Results suggest that feature maps are an excellent addition to the software analysis toolbox that can reveal useful information hidden in source code.
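To make the classification step concrete, here is a minimal sketch (an illustration on our part, not the authors' pipeline) that trains a Random Forest on hypothetical fixed-length feature-map vectors labeled by whether a class participates in a design pattern; the data, dimensions, and parameter values are placeholder assumptions.

```python
# Sketch only: random placeholder data stands in for real feature maps.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((500, 64))      # placeholder: one flattened feature map per candidate class
y = rng.integers(0, 2, 500)    # placeholder labels: 1 = pattern instance, 0 = not

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The `class_weight="balanced"` setting is one common way to handle the strongly imbalanced label distributions the abstract mentions; a CNN variant would instead consume the two-dimensional feature map directly.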
The reuse of code fragments by copying and pasting is widely practiced in software development and results in code clones. Cloning is considered an anti-pattern as it negatively affects program correctness and increases maintenance effort. Programmable Logic Controller (PLC) software is no exception in the code clone discussion, as reuse during development and maintenance is frequently achieved through copy, paste, and modification. Even though the presence of code clones may not necessarily be a problem per se, it is important to detect, track, and manage clones as the software system evolves. Unfortunately, tool support for clone detection and management is either not commonly available for PLC software systems or limited to generic tools with a reduced set of features. In this paper, we investigate code clones in a real-world PLC software system based on IEC 61131-3 Structured Text and C/C++. We extended a widely used clone detection tool with normalization support. Furthermore, we evaluated the different types and natures of code clones in the studied system and their relevance for refactoring. The results shed light on the applicability and usefulness of clone detection in the context of industrial automation systems and demonstrate the benefit of adapting detection and management tools to IEC 61131-3 languages.
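As an illustration of the kind of normalization support mentioned above, the following sketch (our own assumption, not the paper's actual tool extension) abstracts identifiers and literals in IEC 61131-3 Structured Text so that renamed or re-parameterized copies compare as identical token sequences, which is the usual prerequisite for detecting Type-2 clones.

```python
# Sketch only: keyword list and tokenization are simplified assumptions.
import re

ST_KEYWORDS = {"IF", "THEN", "ELSE", "END_IF", "FOR", "TO", "DO", "END_FOR",
               "VAR", "END_VAR", "FUNCTION", "END_FUNCTION", "RETURN", ":="}

def normalize(line: str) -> str:
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|:=|[^\s\w]", line)
    out = []
    for tok in tokens:
        if tok.upper() in ST_KEYWORDS:
            out.append(tok.upper())
        elif re.fullmatch(r"\d+(?:\.\d+)?", tok):
            out.append("LIT")   # abstract away literal values
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")    # abstract away identifier names
        else:
            out.append(tok)
    return " ".join(out)

# Both lines normalize to "ID := ID + LIT ;" and can be matched as clones:
print(normalize("motorSpeed := baseSpeed + 10;"))
print(normalize("pumpRate := minRate + 25;"))
```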
Semantic clone detection is the process of finding program elements with similar or equal runtime behavior, for example, detecting the semantic equality between the recursive and the iterative implementation of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors. This boundary has been tested in recent years with interesting new approaches. This work contributes a semantic clone detection approach that detects clones with 0% syntactic similarity. We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise solution to semantic clone detection. PSM builds a probabilistic model of a program that is capable of evaluating and generating runtime data. SCD-PSM leverages this model and its model elements to find behaviorally equal model elements. This behavioral equality is then generalized to semantic equality of the original program elements. It uses the likelihood between model elements as a distance metric. It then employs the likelihood ratio significance test to decide whether this distance is significant, given a pre-specified and controllable false-positive rate. The output of SCD-PSM is a set of pairs of program elements (i.e., methods), their distance, and a decision on whether they are clones. SCD-PSM yields excellent results with a Matthews Correlation Coefficient greater than 0.9. These results are obtained not only on classical semantic clone detection problems, such as detecting recursive and iterative versions of an algorithm, but also on complex problems used in coding competitions.
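The recursive/iterative factorial pair mentioned above can be written out as follows; the code is our own minimal example, not taken from the paper, and shows two implementations with essentially no syntactic overlap but identical input/output behavior, which is exactly the kind of pair a semantic clone detector is meant to report.

```python
# Two semantically equal implementations with ~0% syntactic similarity.
def factorial_recursive(n: int) -> int:
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def factorial_iterative(n: int) -> int:
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Identical runtime behavior over the sampled inputs:
assert all(factorial_recursive(n) == factorial_iterative(n) for n in range(10))
```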
Software testing helps developers identify bugs. However, awareness of a bug is only the first step: finding and correcting the faulty program components is equally hard and equally essential for high-quality software. Fault localization automatically pinpoints the location of an existing bug in a program. It is a hard problem, and existing methods are not yet precise enough for widespread industrial adoption. We propose fault localization via Probabilistic Software Modeling (PSM). PSM analyzes the structure and behavior of a program and synthesizes a network of Probabilistic Models (PMs). Each PM models a method with its inputs and outputs and is capable of evaluating the likelihood of runtime data. We use this likelihood evaluation to find fault locations and their impact on dependent code elements. Results indicate that PSM is a robust framework for accurate fault localization.
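A minimal sketch of the underlying idea, assuming (for illustration only) a simple Gaussian stand-in for the richer per-method probabilistic models described in the abstract: methods whose runtime values from a failing run are unlikely under a model fitted on passing runs are ranked as more suspicious.

```python
# Sketch only: method names, observations, and the Gaussian model are assumptions.
import numpy as np
from scipy import stats

def fit_model(passing_values):
    # One Gaussian per method, fitted from passing executions.
    return stats.norm(np.mean(passing_values), np.std(passing_values) + 1e-9)

def suspiciousness(model, failing_values):
    # Lower likelihood of the failing run's data => more suspicious.
    return -np.mean(model.logpdf(failing_values))

passing = {"parse": [1.0, 1.1, 0.9], "render": [5.0, 5.2, 4.8]}
failing = {"parse": [1.05], "render": [12.7]}   # hypothetical observations from a failing run

models = {m: fit_model(v) for m, v in passing.items()}
ranking = sorted(failing, key=lambda m: suspiciousness(models[m], failing[m]), reverse=True)
print(ranking)  # methods ordered from most to least suspicious
```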