Code reuse is widespread in software development as well as internet of things (IoT) devices. However, code reuse introduces many problems, e.g., software plagiarism and known vulnerabilities. Solving these problems requires extensive manual reverse analysis. Fortunately, binary clone detection can help analysts mitigate manual work by matching reusable code and known parts. However, many binary clone detection methods are not robust to various compiler optimization options and different architectures. While some clone detection methods can be applied across different architectures, they rely on manual features based on human prior knowledge to generate feature vectors for assembly functions and fail to consider the internal associations between features from a semantic perspective. To address this problem, we propose and implement a prototype GeneDiff, a semantic-based representation binary clone detection approach for cross-architectures. GeneDiff utilizes a representation model based on natural language processing (NLP) to generate high-dimensional numeric vectors for each function based on the Valgrind intermediate representation (VEX) representation. This is the first work that translates assembly instructions into an intermediate representation and uses a semantic representation model to implement clone detection for cross-architectures. GeneDiff is robust to various compiler optimization options and different architectures. Compared to approaches using symbolic execution, GeneDiff is significantly more efficient and accurate. The area under the curve (AUC) of the receiver operating characteristic (ROC) of GeneDiff reaches 92.35%, which is considerably higher than the approaches that use symbolic execution. Extensive experiments indicate that GeneDiff can detect similarity with high accuracy even when the code has been compiled with different optimization options and targeted to different architectures. We also use real-world IoT firmware across different architectures as targets, therein proving the practicality of GeneDiff in being able to detect known vulnerabilities.
At present, embedded devices have become a part of people’s lives, so detecting security vulnerabilities contained in devices becomes imperative. There are three challenges in detecting embedded device vulnerabilities: (1) Most network protocols are stateful; (2) the communication between the web front-end and the device is encrypted or encoded; and (3) the conditional constraints of programs in the device reduce the depth and breadth of fuzz testing. To address these challenges, we propose a new type of gray-box fuzz testing framework in this paper, called EWVHunter, which is mainly used to find authentication bypass and command injection vulnerabilities in embedded devices. The key idea in this paper is based on the observation that most embedded devices are controlled through the web front-end. Such embedded devices often contain rich information in the communication protocol between the web front-end and device. Therefore, by filling data at the input source on the web front-end and reusing web front-end program logic, we can effectively solve the impact of the stateful network protocol and communication data encryption on fuzzing without relying on any knowledge about the communication protocol. Additionally, we use firmware information extraction to enhance EWVHunter so that it can detect vulnerabilities in deep layer codes and hidden interfaces. In our research, we implemented EWVHunter and evaluated 8 real-world embedded devices, and our approach identified 12 vulnerabilities (including 7 zero-days), which affect a total of 31,996 online devices.
Despite greater attention being paid to sensitive-information leakage in the cyberdomain, the sensitive-information problem of the physical domain remains neglected. Anonymous users can easily access the sensitive information of other users, such as transaction information, health status, and addresses, without any advanced technologies. Ideally, secret messages should be protected not only in the cyberdomain but also in the complex physical domain. However, popular steganography schemes only work in the traditional cyberdomain and are useless when physical distortions of messages are unavoidable. This paper first defines the concept of cross-domain steganography, and then proposes EasyStego, a novel cross-domain steganography scheme. EasyStego is based on the use of QR barcodes as carriers; therefore, it is robust to physical distortions in the complex physical domain. Moreover, EasyStego has a large capacity for embeddable secrets and strong scalability in various scenarios. EasyStego uses an AES encryption algorithm to control the permissions of secret messages, which is more effective in reducing the possibility of sensitive-information leakage. Experiments show that EasyStego has perfect robustness and good efficiency. Compared with the best current steganography scheme based on barcodes, EasyStego has greater steganographic capacity and less impact on barcode data. In robustness tests, EasyStego successfully extracts secret messages at different angles and distances. In the case of adding natural textures and importing quantitative error bits, other related steganography techniques fail, whereas EasyStego can extract secret messages with a success rate of nearly 100%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.