“…
• BBB: a binary branch-and-bound method [8] of complexity O((mn + log σ)σ) in the worst case and O((mn + log log σ) log σ) in the best case,
• CDP: a classical dynamic programming repeated for all t ∈ [−σ, σ], of complexity O(mnσ) [10],
• KBB: a k-ary branch-and-bound method [8] of complexity O((mn + log(σk/(k − 1)))σk/(k − 1)) in the worst case and O((mn + log(k log_k σ))k log_k σ) in the best case (we used k = 3 in our experiments, following [8], where the authors found this value best in their experiments; the choice of k was also confirmed in our preliminary experiments),
• SDP: a sparse dynamic programming method [11] of complexity O(mn log m),
• YBP: a bit-parallel algorithm [2] of complexity O(mn⌈σ/w⌉), where w is the machine word size (in bits),
• HBP: a bit-parallel LCS algorithm [6] repeated for all possible t values, of complexity O(⌈n/w⌉mσ),
• NGMD: a recent algorithm [14] of complexity O(mn log log σ).
…”
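The simplest baseline in the list, CDP, can be sketched directly: run the classical O(mn) LCS dynamic program once per transposition t ∈ [−σ, σ], for O(mnσ) total. This is a minimal illustration, not the papers' implementations; the function names are ours.

```python
def lcs_length(a, b):
    # Classical O(mn) dynamic-programming LCS length, two rows at a time.
    n = len(b)
    prev = [0] * (n + 1)
    for x in a:
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if x == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[n]

def cdp_transposition_invariant_lcs(a, b, sigma):
    # CDP baseline: repeat the classical DP for every transposition
    # t in [-sigma, sigma], shifting sequence a by t each time.
    # Total cost O(m * n * sigma).
    best = 0
    for t in range(-sigma, sigma + 1):
        shifted = [x + t for x in a]
        best = max(best, lcs_length(shifted, b))
    return best

# Example over alphabet {0..4}: b is a copy of a shifted by t = 1,
# so the transposition-invariant LCS is the full length 4.
print(cdp_transposition_invariant_lcs([0, 2, 1, 3], [1, 3, 2, 4], sigma=4))  # 4
```

The faster entries in the list (SDP, the branch-and-bound methods, the bit-parallel variants) all improve on exactly this loop, either by pruning transpositions or by packing DP cells into machine words.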
“…In particular, both points we mentioned are part of the MLCS in the top box of Figure 1. In contrast, the match (2, 9, 4, 8), corresponding to letter 'B', does not dominate the match (4, 2, 1, 1), because in the first coordinate we have 2 < 4. These two matches are incomparable and therefore cannot both occur in an MLCS.…”
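The dominance test described above is a coordinate-wise comparison; a minimal sketch (function names are ours, following the convention in the excerpt that a match dominates another when every coordinate is strictly greater):

```python
def dominates(p, q):
    # Match point p dominates match point q when every coordinate of p
    # is strictly greater than the corresponding coordinate of q,
    # i.e. q can precede p on one common subsequence.
    return all(pi > qi for pi, qi in zip(p, q))

def compatible(p, q):
    # Two matches can both lie on a common subsequence only if one
    # dominates the other; otherwise they are incomparable.
    return dominates(p, q) or dominates(q, p)

# The example from the excerpt: (2, 9, 4, 8) fails to dominate
# (4, 2, 1, 1) because 2 < 4 in the first coordinate.
print(dominates((2, 9, 4, 8), (4, 2, 1, 1)))   # False
print(compatible((2, 9, 4, 8), (4, 2, 1, 1)))  # False: incomparable
```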
We consider the problem of maintaining information about multiple longest common subsequences. Such subsequences highlight information shared across several sequences and are therefore used extensively in bioinformatics and computational genomics. In this paper we propose a way to maintain this information when the underlying sequences are subject to modification, namely when letters are added to and removed from the ends of a sequence. Experimentally, our data structure obtains significant improvements over the state of the art.
“…Since the amount of malware is increasing, we need a faster algorithm for finding an LCS. Crochemore et al. [10] have proposed a bit-vector algorithm with a processing time of O(MN/w), where w is the number of bits in a machine word. The method assigns one bit to each cell in the DP matrix and calculates w cells in bulk using four operations (and, or, not, and add).…”
Section: A LCS Problem and Bit-vector Algorithm
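The four-operation bit-vector update can be sketched in the style of Crochemore et al., using Python's unbounded integers in place of machine words (so the ⌈N/w⌉-word arithmetic is handled implicitly). This is an illustrative sketch, not the paper's code; variable names are ours.

```python
def bit_vector_lcs(a, b):
    # Bit-parallel LCS length: one bit per column of the DP row.
    n = len(b)
    mask = (1 << n) - 1
    # Match vectors: bit j of match[c] is set iff b[j] == c.
    match = {}
    for j, c in enumerate(b):
        match[c] = match.get(c, 0) | (1 << j)
    v = mask  # row vector, all ones initially
    for c in a:
        m = match.get(c, 0)
        u = v & m
        # One row update with and, or, not, and add.
        v = ((v + u) | (v & ~m)) & mask
    # Each zero bit of v marks one matched position of b.
    return n - bin(v).count("1")

# Classic example pair with LCS length 4 (e.g. "BCBA").
print(bit_vector_lcs("ABCBDAB", "BDCABA"))  # 4
```

With true w-bit words this processes w DP cells per operation, which is where the O(MN/w) bound comes from.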
We propose two novel techniques for reducing the workload of malware analysis. The first technique is restricted instructions, which accelerates finding the longest common subsequence (LCS) between machine-code instruction sequences of malware. The second technique is probabilistic disassembly, which can find the most probable disassembly of a binary stream without clues such as debug symbols or import-function information. By combining the two proposals with our generic unpacker, we built an automatic malware classification system. Given an unknown malware program, the system enables analysts to find the most similar known malware program and even to estimate differing and common instructions. In one of our experiments, we classified 3,233 malware samples in the wild and found that 75% of the samples belong to the seven largest clusters. As a result, only seven samples, one from each cluster, needed to be analyzed to reveal the functionality of that 75%, a substantial increase in the efficiency of analysis.