Malware is any malicious code that has the potential to harm any computer or network. The amount of malware is increasing faster every year and poses a serious security threat. Thus, malware detection is a critical topic in computer security. Currently, signature-based detection is the most extended method for detecting malware. Although this method is still used on most popular commercial computer antivirus software, it can only achieve detection once the virus has already caused damage and it is registered. Therefore, it fails to detect new malware. Applying a methodology proven successful in similar problem-domains, we propose the use of ngrams (every substring of a larger string, of a fixed lenght n) as file signatures in order to detect unknown malware whilst keeping low false positive ratio. We show that n-grams signatures provide an effective way to detect unknown malware.
Malware is any kind of program explicitly designed to harm, such as viruses, trojan horses or worms. Since the amount of malware is growing exponentially, it already poses a serious security threat. Therefore, every incoming code must be analysed in order to classify it as malware or benign software. These tests commonly combine static and dynamic analysis techniques in order to extract the major amount of information from distrustful files. Moreover, the increment of the number of attacks hinders manually testing the thousands of suspicious archives that every day reach antivirus laboratories. Against this background, we address here an automatised system for malware behaviour analysis based on emulation and simulation techniques. Hence, creating a secure and reliable sandbox environment allows us to test the suspicious code retrieved without risk. In this way, we can also generate evidences and classify the samples with several machine-learning algorithms. We have developed the proposed solution, testing it with real malware. Finally, we have evaluated it in terms of reliability and time performance, two of the main aspects for such a system to work.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.