The identification of security-critical vulnerabilities is a key for protecting computer systems. Being able to perform this process at the binary level is very important given that many software projects are closed-source. Even if the source code is available, compilation may create a mismatch between the source code and the binary code that is executed by the processor, causing analyses that are performed on source code to fail at detecting certain bugs and thus potential vulnerabilities. Existing approaches to find bugs in binary code 1) use dynamic analysis, which is difficult for firmware; 2) handle only a single architecture; or 3) use semantic similarity, which is very slow when analyzing large code bases. In this paper, we present a new approach to efficiently search for similar functions in binary code. We use this method to identify known bugs in binaries as follows: starting with a vulnerable binary function, we identify similar functions in other binaries across different compilers, optimization levels, operating systems, and CPU architectures. The main idea is to compute similarity between functions based on the structure of the corresponding control flow graphs. To minimize this costly computation, we employ an efficient pre-filter based on numeric features to quickly identify a small set of candidate functions. This allows us to efficiently search for similar functions in large code bases. We have designed and implemented a prototype of our approach, called discovRE, that supports four instruction set architectures (x86, x64, ARM, MIPS). We show that discovRE is four orders of magnitude faster than the state-of-the-art academic approach for cross-architecture bug search in binaries. We also show that we can identify Heartbleed and POODLE vulnerabilities in an Android system image that contains over 130,000 native ARM functions in about 80 milliseconds. Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment.
Decompilation is important for many security applications; it facilitates the tedious task of manual malware reverse engineering and enables the use of source-based security tools on binary code. This includes tools to find vulnerabilities, discover bugs, and perform taint tracking. Recovering high-level control constructs is essential for decompilation in order to produce structured code that is suitable for human analysts and sourcebased program analysis techniques. State-of-the-art decompilers rely on structural analysis, a pattern-matching approach over the control flow graph, to recover control constructs from binary code. Whenever no match is found, they generate goto statements and thus produce unstructured decompiled output. Those statements are problematic because they make decompiled code harder to understand and less suitable for program analysis. In this paper, we present DREAM, the first decompiler to offer a goto-free output. DREAM uses a novel patternindependent control-flow structuring algorithm that can recover all control constructs in binary programs and produce structured decompiled code without any goto statement. We also present semantics-preserving transformations that can transform unstructured control flow graphs into structured graphs. We demonstrate the correctness of our algorithms and show that we outperform both the leading industry and academic decompilers: Hex-Rays and Phoenix. We use the GNU coreutils suite of utilities as a benchmark. Apart from reducing the number of goto statements to zero, DREAM also produced more compact code (less lines of code) for 72.7% of decompiled functions compared to Hex-Rays and 98.8% compared to Phoenix. We also present a comparison of Hex-Rays and DREAM when decompiling three samples from Cridex, ZeusP2P, and SpyEye malware families. Permission to freely reproduce all or part of this paper for noncommercial purposes is granted provided that copies bear this notice and the full citation on the first page. Reproduction for commercial purposes is strictly prohibited without the prior written consent of the Internet Society, the first-named author (for reproduction of an entire paper only), and the author's employer if the paper was prepared within the scope of employment.
Abstract.A technique commonly used by malware for hiding on a targeted system is the host-based code injection attack. It allows malware to execute its code in a foreign process space enabling it to operate covertly and access critical information of other processes. Since there exists a plethora of different ways for injecting and executing code in a foreign process space, a generic approach spanning all these possibilities is needed. Approaches just focussing on low-level operating system details (e.g. API hooking) do not suffice since the suspicious API set is constantly extended. Thus, approaches focussing on low level operating system details are prone to miss novel attacks. Furthermore, such approaches are restricted to intimate knowledge of exactly one operating system.In this paper, we present Bee Master, a novel approach for detecting host-based code injection attacks. Bee Master applies the honeypot paradigm to OS processes and by that it does not rely on low-level OS details. The basic idea is to expose regular OS processes as a decoy to malware. Our approach focuses on concepts -such as threads or memory pages -present in every modern operating system. Therefore, Bee Master does not suffer from the drawbacks of low-level OS-based approaches. Furthermore, it allows OS independent detection of host-based code injection attacks. To test the capabilities of our approach, we evaluated Bee Master qualitatively and quantitatively on Microsoft Windows and Linux. The results show that it reaches reliable and robust detection for various current malware families.
Reverse engineering of binary code is an essential step for malware analysis. However, it is a tedious and time-consuming task. Decompilation facilitates this process by transforming machine code into a high-level representation that is more concise and easier to understand. This paper describes REcompile, an efficient and extensible decompilation framework. REcompile uses the static single assignment form (SSA) as its intermediate representation and performs three main classes of analysis. Data flow analysis removes machine-specific details from code and transforms it into a concise high-level form. Type analysis finds variable types based on how those variables are used in code. Control flow analysis identifies high-level control structures such as conditionals, loops, and switch statements. These steps enable REcompile to produce well-readable decompiled code. The overall evaluation, using real programs and malware samples, shows that REcompile achieves a comparable and in many cases better performance than state-of-the-art decompilers
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.