In recent years, dynamic program analysis (DPA) has been widely used in fields such as profiling, bug finding, and security. However, existing solutions have their own weaknesses. Software solutions provide flexibility in DPA but suffer from tremendous performance overhead. In contrast, core-level hardware engines rely on specialized integrated logic and attain extremely fast computation, but their functional extensibility is limited because the logic is tightly coupled with the host processor. To remedy this, a prior system-level approach utilized an existing channel to integrate its hardware without modifying the host architecture and showed great performance potential. Nevertheless, that work did not address the detailed design and implementation of the engine, which is essential for deployment on real systems. To address this, in this article, we propose an implementation of a programmable DPA hardware engine, called the program analysis unit (PAU). PAU is an application-specific instruction-set processor (ASIP) whose instruction set is customized to reflect common features of various DPA methods. With its specialized architecture and software programmability, PAU aims at both fast computation and sufficient flexibility. In our case studies on several DPA techniques, we show that our ASIP approach is applicable to complex DPA schemes while providing hardware-backed performance and software-based flexibility in analysis. Experiments on our FPGA prototype show that PAU is 4.7-13.6 times faster than pure software DPA, and its power and area consumption is acceptably small compared to today's mobile processors.
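To give a concrete sense of the per-instruction work that such a DPA engine offloads from the host, the following is a minimal sketch in C of shadow-memory taint tracking, one common DPA pattern. The shadow layout, window size, and function names are illustrative assumptions, not details of the PAU design.

/* Illustrative sketch: byte-granular shadow memory for taint-style DPA.
 * Offsets are assumed to fall inside a fixed analysis window. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WINDOW_SIZE (1u << 20)          /* 1 MiB analysis window (assumption) */
static uint8_t shadow[WINDOW_SIZE];     /* one taint byte per data byte       */

/* Mark bytes as tainted, e.g., when data arrives from an untrusted source. */
static void taint_source(uint32_t off, uint32_t len) {
    memset(&shadow[off], 1, len);       /* assumes off + len <= WINDOW_SIZE */
}

/* Propagate taint on a copy or move: the destination inherits the source's taint. */
static void propagate(uint32_t dst, uint32_t src, uint32_t len) {
    memmove(&shadow[dst], &shadow[src], len);
}

/* Check a sink, e.g., data about to be used as a jump target. */
static int is_tainted(uint32_t off, uint32_t len) {
    for (uint32_t i = 0; i < len; i++)
        if (shadow[off + i]) return 1;
    return 0;
}

int main(void) {
    taint_source(0x100, 16);            /* untrusted input lands at offset 0x100 */
    propagate(0x400, 0x100, 16);        /* the program copies it elsewhere       */
    printf("sink tainted: %d\n", is_tainted(0x400, 16));
    return 0;
}

Doing this kind of bookkeeping in host software for every memory operation is what drives the overhead of pure software DPA; a customized instruction set can execute these shadow operations in far fewer cycles.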
For decades, various concepts in security monitoring have been proposed. In principle, they all share a common idea: monitoring the execution behavior of a program (e.g., its control flow or data flow) running on the machine to find symptoms of attacks. Among the proposed monitoring schemes, software-based ones are known for their adaptability to commercial products, but there have been concerns that they suffer from non-negligible runtime overhead. On the other hand, hardware-based solutions are recognized for their high performance. However, most of them have an inherent problem in that they usually mandate drastic changes to the internal processor architecture. More recent ones have strived to minimize such modifications by employing external hardware security monitors in the system. However, these approaches intrinsically suffer from the overhead caused by communication between the host and the external monitor. Our solution also relies on external hardware for security monitoring, but unlike the others, it tackles the communication overhead by using the core debug interface (CDI), which is readily available in most commercial processors for debugging. We build our system simply by plugging our monitoring hardware into the processor via the CDI, precluding the need to alter the processor internals. To validate the effectiveness of our approach, we implement two well-known monitoring techniques on the proposed framework: dynamic information flow tracking and branch regulation. The experimental results on our FPGA prototype show that the external hardware monitors perform their monitoring tasks with negligible performance overhead, mainly thanks to the CDI, which substantially reduces communication costs.
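As an illustration of the second technique, the C sketch below shows the kind of check a branch-regulation monitor applies to each indirect-branch event: the target must be either a function entry point or an address within the function that issued the branch. The event format, table layout, and function names are assumptions for exposition, not the paper's interface.

/* Illustrative branch-regulation check for indirect-branch events
 * delivered to an external monitor (e.g., over a debug interface). */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
#include <stddef.h>

typedef struct { uint32_t entry; uint32_t end; } func_bounds_t;

/* Table of function boundaries, e.g., extracted from the binary's symbols. */
static const func_bounds_t *funcs;
static size_t nfuncs;

static const func_bounds_t *lookup(uint32_t pc) {
    for (size_t i = 0; i < nfuncs; i++)     /* linear scan keeps the sketch short */
        if (pc >= funcs[i].entry && pc < funcs[i].end) return &funcs[i];
    return NULL;
}

/* An indirect branch is legal if it targets a function entry (a call)
 * or stays inside the function that issued it (an intra-function jump). */
static bool branch_ok(uint32_t branch_pc, uint32_t target) {
    const func_bounds_t *cur = lookup(branch_pc);
    const func_bounds_t *dst = lookup(target);
    if (!cur || !dst) return false;
    return target == dst->entry || dst == cur;
}

int main(void) {
    static const func_bounds_t table[] = { {0x1000, 0x1200}, {0x1200, 0x1800} };
    funcs = table; nfuncs = 2;
    printf("call to entry:      %d\n", branch_ok(0x1100, 0x1200)); /* legal   */
    printf("jump into mid-func: %d\n", branch_ok(0x1100, 0x1300)); /* illegal */
    return 0;
}

The benefit of driving such checks from the CDI is that the branch events arrive as a side effect of the existing debug trace, so no extra instrumentation traffic crosses the host-monitor boundary.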
Reconfigurable architectures (RAs), which provide extremely high energy efficiency for certain application domains, have one problem: current mapping algorithms for them do not scale well with the number of cores. One approach to this problem is the SIMD (single instruction, multiple data) paradigm. However, SIMD can complicate the mapping problem by adding an additional dimension, iteration mapping, to the already interdependent problems of data mapping and operation mapping, and it can significantly affect performance through memory bank conflicts. In this paper, we introduce a SIMD reconfigurable architecture that allows SIMD mapping at multiple levels of granularity, and we investigate ways to minimize bank conflicts in a SIMD reconfigurable architecture while taking the related sub-problems into consideration. We further present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of iteration and data mappings.
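To make the bank-conflict issue concrete, the following C sketch models how per-lane addresses map onto memory banks under low-order interleaving and how strided accesses serialize lanes. The bank count, word size, lane count, and interleaving scheme are assumptions for illustration, not parameters of the architecture described above.

/* Illustrative model of SIMD memory bank conflicts under low-order interleaving. */
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS  8
#define WORD_BYTES 4
#define NUM_LANES  8

/* Low-order interleaving: consecutive words go to consecutive banks. */
static unsigned bank_of(uint32_t addr) {
    return (addr / WORD_BYTES) % NUM_BANKS;
}

/* Extra cycles this access costs: lanes hitting the same bank are serialized. */
static unsigned conflict_penalty(const uint32_t addr[NUM_LANES]) {
    unsigned hits[NUM_BANKS] = {0}, worst = 0;
    for (int i = 0; i < NUM_LANES; i++) {
        unsigned b = bank_of(addr[i]);
        if (++hits[b] > worst) worst = hits[b];
    }
    return worst - 1;   /* 0 means conflict-free */
}

int main(void) {
    uint32_t unit[NUM_LANES], strided[NUM_LANES];
    for (int i = 0; i < NUM_LANES; i++) {
        unit[i]    = (uint32_t)i * WORD_BYTES;              /* row access: one word per bank */
        strided[i] = (uint32_t)i * NUM_BANKS * WORD_BYTES;  /* column access: all lanes hit bank 0 */
    }
    printf("unit-stride penalty:   %u cycles\n", conflict_penalty(unit));
    printf("column-stride penalty: %u cycles\n", conflict_penalty(strided));
    return 0;
}

Data tiling changes the address layout so that the lanes of a SIMD iteration fall into distinct banks, which is exactly the situation the conflict-free scheduling mentioned above tries to guarantee.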