Abstract: The rise in popularity of the Android platform has resulted in an explosion of malware threats targeting it. As both Android malware and the operating system itself constantly evolve, it is very challenging to design robust malware mitigation techniques that can operate for long periods of time without the need for modifications or costly re-training. In this paper, we present MAMADROID, an Android malware detection system that relies on app behavior. MAMADROID builds a behavioral model, in the form of a Markov chain, from the sequence of abstracted API calls performed by an app, and uses it to extract features and perform classification. By abstracting calls to their packages or families, MAMADROID maintains resilience to API changes and keeps the feature set size manageable. We evaluate its accuracy on a dataset of 8.5K benign and 35.5K malicious apps collected over a period of six years, showing not only that it effectively detects malware (with up to 99% F-measure), but also that the model built by the system keeps its detection capabilities for long periods of time (on average, 86% and 75% F-measure one and two years after training, respectively). Finally, we compare against DROIDAPIMINER, a state-of-the-art system that relies on the frequency of API calls performed by apps, showing that MAMADROID significantly outperforms it.
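The modeling step described in this abstract can be illustrated with a minimal sketch. It assumes that abstraction to a package means keeping the first two dot-separated components of a fully qualified call, and that transitions are counted between consecutive calls in a flat sequence; both are simplifying assumptions made for illustration, and the function names and the sample call list are hypothetical rather than taken from the paper.

```python
from collections import Counter, defaultdict

def abstract_call(call, depth=2):
    # Assumed abstraction scheme: keep the leading `depth` name components.
    return ".".join(call.split(".")[:depth])

def markov_chain(calls, depth=2):
    # Count transitions between consecutive abstracted calls, then
    # normalize each row of counts into transition probabilities.
    states = [abstract_call(c, depth) for c in calls]
    counts = defaultdict(Counter)
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1
    chain = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        chain[src] = {dst: n / total for dst, n in dsts.items()}
    return chain

def feature_vector(chain, states):
    # Flatten the transition matrix over a fixed state ordering;
    # transitions never observed get probability 0.0.
    return [chain.get(s, {}).get(d, 0.0) for s in states for d in states]

# Hypothetical call sequence, for illustration only.
calls = [
    "java.lang.String.length",
    "android.util.Log.d",
    "java.lang.String.length",
    "java.io.File.exists",
]
chain = markov_chain(calls)
states = sorted({abstract_call(c) for c in calls})
features = feature_vector(chain, states)
```

Because the state space is defined by packages rather than individual methods, the feature vector length depends only on the number of packages, which is one way to read the claim that abstraction keeps the feature set manageable.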
Classifying streaming data requires the development of methods that are computationally efficient and able to cope with changes in the underlying distribution of the stream, a phenomenon known in the literature as concept drift. We propose a new method for detecting concept drift which uses an Exponentially Weighted Moving Average (EWMA) chart to monitor the misclassification rate of a streaming classifier. Our approach is modular and can hence be run in parallel with any underlying classifier to provide an additional layer of concept drift detection. Moreover, our method is computationally efficient with overhead O(1) and works in a fully online manner with no need to store data points in memory. Unlike many existing approaches to concept drift detection, our method allows the rate of false positive detections to be controlled and kept constant over time.
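A minimal sketch of an EWMA chart on a stream of 0/1 misclassification indicators is given here. It uses the textbook EWMA recursion and variance formula with a fixed control limit `L`; the abstract does not say how the limit is chosen to hold the false positive rate constant, so the fixed `L`, the smoothing weight `lam`, the class name, and the synthetic error stream are all assumptions made for illustration.

```python
import math

class EwmaDriftDetector:
    """EWMA chart on 0/1 error indicators; O(1) time and memory per update."""

    def __init__(self, lam=0.2, L=3.0):
        self.lam = lam    # EWMA smoothing weight (assumed value)
        self.L = L        # control limit width (assumed fixed here)
        self.t = 0        # number of observations seen
        self.p_hat = 0.0  # running estimate of the error rate
        self.z = 0.0      # EWMA statistic

    def update(self, error):
        # error is 1 if the classifier misclassified the point, else 0.
        self.t += 1
        self.p_hat += (error - self.p_hat) / self.t
        self.z = (1 - self.lam) * self.z + self.lam * error
        # Bernoulli variance of one indicator, then the EWMA variance at time t.
        var_x = self.p_hat * (1 - self.p_hat)
        var_z = var_x * self.lam / (2 - self.lam) * (1 - (1 - self.lam) ** (2 * self.t))
        return self.z > self.p_hat + self.L * math.sqrt(var_z)

# Synthetic stream: 10% error rate for 200 points, then every point misclassified.
stream = [1 if i % 10 == 0 else 0 for i in range(1, 201)] + [1] * 50
detector = EwmaDriftDetector()
alarms = [i for i, e in enumerate(stream, start=1) if detector.update(e)]
```

The detector only needs the error indicator produced by whatever classifier is running, which is what makes it usable alongside any underlying model. In practice a detector would typically be reset, and the classifier retrained, after the first alarm; that step is omitted here.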
Genetic association studies are commonly conducted to identify genes that explain the variability in a measured trait (e.g., disease status or disease progression). Often, results of these studies are summarized in the form of a p value corresponding to a test of association between each single nucleotide polymorphism (SNP) and the trait under study. As genes are comprised of multiple SNPs, post hoc approaches are generally applied to determine gene-level association. For example, if any SNP within a gene is significantly associated with the trait at a genome-wide significance level (p < 5 × 10^-8), then the corresponding gene is considered significant. A complementary strategy, termed mixed modeling of meta-analysis p values (MixMAP), was proposed recently to characterize formally the associations between genes (or gene regions) and a trait based on multiple SNP-level p values. Here, the MixMAP package is presented as a means for implementing the MixMAP procedure in R.
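The post hoc rule given as an example in this abstract (a gene is significant if any of its SNPs passes the genome-wide threshold) can be written in a few lines. This sketch covers only that simple rule, not the MixMAP procedure or the R package's interface; the gene names, SNP identifiers, and p values are made up for illustration.

```python
GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance level quoted in the text

def significant_genes(snp_results, alpha=GENOME_WIDE_ALPHA):
    # snp_results: iterable of (gene, snp_id, p_value) tuples.
    # A gene is flagged if at least one of its SNPs has p < alpha.
    return {gene for gene, _snp, p in snp_results if p < alpha}

# Hypothetical SNP-level results, for illustration only.
snp_results = [
    ("GENE_A", "rs0001", 3e-9),
    ("GENE_A", "rs0002", 0.12),
    ("GENE_B", "rs0003", 1e-6),
    ("GENE_B", "rs0004", 4e-7),
]
hits = significant_genes(snp_results)
```

Here `GENE_B` is not flagged even though both of its SNPs show fairly small p values, because neither crosses the threshold on its own. A method that combines multiple SNP-level p values per gene, as the abstract describes for MixMAP, is aimed at this kind of case.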
The volatility of financial instruments is rarely constant, and usually varies over time. This creates a phenomenon called volatility clustering, where large price movements on one day are followed by similarly large movements on successive days, creating temporal clusters. The GARCH model, which treats volatility as a drift process, is commonly used to capture this behavior. However, research suggests that volatility is often better described by a structural break model, where the volatility undergoes abrupt jumps in addition to drift. Most efforts to integrate these jumps into the GARCH methodology have resulted in models that are either very computationally demanding or make problematic assumptions about the distribution of the instruments, often assuming that they are Gaussian. We present a new approach which uses ideas from nonparametric statistics to identify structural break points without making such distributional assumptions, and then models drift separately within each identified regime. Using our method, we investigate the volatility of several major stock indexes, and find that our approach can potentially give an improved fit compared to more commonly used techniques.
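The two-stage idea (find break points without distributional assumptions, then model each regime separately) can be sketched as follows. The abstract does not specify the test used, so this sketch substitutes a simple rank-based scan for a single break in the absolute returns, in the style of a Mann-Whitney statistic, and uses a per-regime standard deviation as a stand-in for the per-regime drift model. The function name and the synthetic return series are hypothetical.

```python
import math
import statistics

def rank_break_point(returns):
    # Rank the absolute returns, then scan every split point k and compute
    # the standardized rank sum of the first k observations. The scan is
    # distribution-free because it depends only on ranks.
    x = [abs(r) for r in returns]
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    best_k, best_stat = None, 0.0
    rank_sum = 0
    for k in range(1, n):
        rank_sum += ranks[k - 1]
        mean = k * (n + 1) / 2
        sd = math.sqrt(k * (n - k) * (n + 1) / 12)
        stat = abs(rank_sum - mean) / sd
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat

# Synthetic returns: a calm regime followed by a volatile regime.
calm = [0.01 * (1 + i % 3) * (-1) ** i for i in range(30)]
volatile = [0.05 * (1 + i % 3) * (-1) ** i for i in range(30)]
returns = calm + volatile

k, stat = rank_break_point(returns)
# Stand-in for per-regime modeling: one volatility estimate per regime.
vol_before = statistics.pstdev(returns[:k])
vol_after = statistics.pstdev(returns[k:])
```

A full treatment would compare the maximized statistic against a threshold calibrated for the scan, handle multiple breaks, and fit a drift model such as GARCH within each regime; none of that is attempted here.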