Shay Kels scite author profile

PowerShell is a command-line shell, supporting a scripting language, that is widely used in organizations for configuration management and task automation. Unfortunately, PowerShell is also increasingly used by cybercriminals for launching cyber attacks against organizations, mainly because it is pre-installed on Windows machines, exposes strong functionality that may be leveraged by attackers, and its code can be deeply obfuscated in many ways. This makes the problem of detecting malicious PowerShell scripts both urgent and challenging. To the best of our knowledge, our work is the first to address this important problem. We do so by presenting several novel deep-learning-based detectors of malicious PowerShell scripts. Our best model obtains a true positive rate of nearly 90% while maintaining a low false positive rate of less than 0.1%, indicating that it can be of practical value. Our models employ pre-trained contextual embeddings of words from the PowerShell "language". A contextual word embedding is able to project semantically similar words to proximate vectors in the embedding space. A known problem in the cybersecurity domain is that labeled data is relatively scarce in comparison with unlabeled data, making it difficult to devise effective supervised detection of malicious activity of many types. This is also the case with PowerShell scripts. Our work shows that this problem can be largely mitigated by learning a pre-trained contextual embedding based on unlabeled data. We trained our models' embedding layer using a scripts dataset that was enriched by a large corpus of unlabeled PowerShell scripts collected from public repositories. As established by our performance analysis, the use of unlabeled data for the embedding significantly improved the performance of our detectors.
We estimate that the usage of pre-trained contextual embeddings based on unlabeled data for improved classification accuracy will find additional applications in the cybersecurity domain.

Reconstruction of 3D objects from 2D cross-sections with the 4-point subdivision scheme adapted to sets

Kels

Dyn

2011

Computers & Graphics

Subdivision Schemes of Sets and the Approximation of Set-Valued Functions in the Symmetric Difference Metric

Kels

Dyn

2013

Found Comput Math

AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings

Hendler

Kels

Rubin

2020

PowerShell is a command-line shell, supporting a scripting language. It is widely used in organizations for configuration management and task automation but is also increasingly used for launching cyber attacks against organizations, mainly because it is pre-installed on Windows machines and exposes strong functionality that may be leveraged by attackers. This makes the problem of detecting malicious PowerShell code both urgent and challenging. Microsoft's Antimalware Scan Interface (AMSI), built into Windows 10, allows defending systems to scan all the code passed to scripting engines such as PowerShell prior to its execution. In this work, we conduct the first study of malicious PowerShell code detection using the information made available by AMSI. We present several novel deep-learning based detectors of malicious PowerShell code that employ pretrained contextual embeddings of words from the PowerShell "language". A contextual word embedding is able to project semantically-similar words to proximate vectors in the embedding space. A known problem in the cybersecurity domain is that labeled data is relatively scarce, making it difficult to devise effective supervised detection of malicious activity of many types. This is also the case with PowerShell code. Our work shows that this problem can be mitigated by learning a pretrained contextual embedding based on unlabeled data. We trained and evaluated our models using real-world data, collected using AMSI. The contextual embedding was learnt using a large corpus of unlabeled PowerShell scripts and modules collected from public repositories. Our performance analysis establishes that the use of unlabeled data for the embedding significantly improved the performance of our detectors. 
Our best-performing model uses an architecture that enables the processing of textual signals from both the character and token levels and obtains a true positive rate of nearly 90% while maintaining a low false positive rate of less than 0.1%.

AMSI-Based Detection of Malicious PowerShell Code Using Contextual Embeddings

Rubin

Kels

Hendler

2019

Preprint

Shay Kels

Detecting Malicious PowerShell Commands using Deep Neural Networks
