Anomaly detection is commonly regarded as an unsupervised learning task because
anomalies stem from adversarial or unlikely events with unknown distributions.
However, the predictive performance of purely unsupervised anomaly detection
often falls short of the detection rates required in practice, so labeled data
is needed to guide model generation. Our first
contribution shows that classical semi-supervised approaches, originating from
a supervised classifier, are inappropriate and hardly detect new and unknown
anomalies. We argue that semi-supervised anomaly detection needs to be grounded in
the unsupervised learning paradigm and devise a novel algorithm that meets this
requirement. Although the optimization problem is intrinsically non-convex, we
further show that it has a convex equivalent under relatively mild assumptions.
Additionally, we propose an active learning strategy to automatically filter
candidates for labeling. In an empirical study on network intrusion detection
data, we observe that the proposed learning methodology requires much less
labeled data than the state of the art while achieving higher detection
accuracy.
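The idea of automatically filtering candidates for labeling can be illustrated with a minimal sketch. The abstract does not specify the query rule, so everything here is an assumption made for illustration: the anomaly score is taken to be the distance from the data centroid, `radius` is a hypothetical decision threshold, and `select_queries` is a hypothetical helper that picks the unlabeled points whose score lies closest to that threshold (plain uncertainty sampling, not necessarily the strategy proposed in the paper).

```python
import math

def centroid(points):
    """Mean of all points, used here as a stand-in model of 'normal' data."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def anomaly_score(point, center):
    """Assumed score: Euclidean distance from the center (larger = more anomalous)."""
    return math.dist(point, center)

def select_queries(points, labeled_idx, radius, budget):
    """Return indices of unlabeled points whose score is closest to the threshold.

    Points near the decision boundary are the ones the model is least sure
    about, so a label there is assumed to be the most informative.
    """
    center = centroid(points)
    candidates = [
        (abs(anomaly_score(p, center) - radius), i)
        for i, p in enumerate(points)
        if i not in labeled_idx
    ]
    candidates.sort()
    return [i for _, i in candidates[:budget]]

# Toy data: four points near the origin and one distant point that is already labeled.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (3.0, 3.0)]
queries = select_queries(points, labeled_idx={4}, radius=1.0, budget=2)
print(queries)
```

In an actual system the score would come from the trained detector rather than a centroid, and the loop of querying, labeling, and retraining would repeat until the labeling budget is spent.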
Deep learning has revolutionized data science in many fields by greatly improving prediction performance in comparison to conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning methodologies through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing each position in the genome individually, the novel three-step algorithm, called DeepCOMBI, first trains a neural network for the classification of subjects into their respective phenotypes. Second, it explains the classifier's decisions by applying layer-wise relevance propagation as one example from the pool of explanation techniques. Third, the resulting importance scores are used to determine a subset of the most relevant locations for multiple hypothesis testing. The performance of DeepCOMBI in terms of power and precision is investigated on generated datasets and on data from a 2007 study. The findings on the latter are validated against independent studies published up until 2020. DeepCOMBI is shown to outperform ordinary raw P-value thresholding and other baseline methods. Two novel disease associations (rs10889923 for hypertension, rs4769283 for type 1 diabetes) were identified.
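The three steps described above (train a classifier, score each position by relevance, test only the top-scoring positions) can be sketched as follows. This is a toy illustration under stated assumptions, not the DeepCOMBI implementation: a logistic regression stands in for the neural network, relevance for a linear model is taken as the mean of |w_j * x_j| (the form layer-wise relevance propagation reduces to for a single linear layer), genotypes are simplified to binary values, and the chi-square test with a Bonferroni threshold over the selected positions is an assumed choice of test and correction. All function names are hypothetical.

```python
import math
import random

def train_logistic(X, y, epochs=300, lr=0.5):
    """Step 1 (stand-in): linear classifier trained by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def relevance_scores(X, w):
    """Step 2: per-position relevance |w_j * x_j|, averaged over subjects."""
    n = len(X)
    return [sum(abs(wj * xi[j]) for xi in X) / n for j, wj in enumerate(w)]

def chi2_pvalue(column, y):
    """Chi-square test (1 degree of freedom) on the 2x2 genotype/phenotype table."""
    n = len(y)
    a = sum(1 for g, p in zip(column, y) if g == 1 and p == 1)
    b = sum(1 for g, p in zip(column, y) if g == 1 and p == 0)
    c = sum(1 for g, p in zip(column, y) if g == 0 and p == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2.0))  # survival function for 1 dof

def screen_and_test(X, y, top_k=2, alpha=0.05):
    """Step 3: test only the top_k most relevant positions (assumed Bonferroni)."""
    w, _ = train_logistic(X, y)
    scores = relevance_scores(X, w)
    selected = sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]
    return [j for j in sorted(selected)
            if chi2_pvalue([xi[j] for xi in X], y) < alpha / top_k]

# Synthetic data: position 0 drives the phenotype, with 10% label noise.
rng = random.Random(0)
X = [[rng.randint(0, 1) for _ in range(6)] for _ in range(200)]
y = [xi[0] if rng.random() > 0.1 else 1 - xi[0] for xi in X]
significant = screen_and_test(X, y)
print(significant)
```

Restricting the tests to a screened subset is what reduces the multiple-testing burden; the significance threshold used in this sketch is a simplification and does not reproduce how the paper calibrates it.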