Xiyang Hu scite author profile

COPOD: Copula-Based Outlier Detection

Outlier detection refers to the identification of rare items that deviate from the general data distribution. Existing approaches suffer from high computational complexity, low predictive capability, and limited interpretability. As a remedy, we present a novel outlier detection algorithm called COPOD, which is inspired by copulas for modeling multivariate data distributions. COPOD first constructs an empirical copula, and then uses it to predict the tail probabilities of each given data point to determine its level of "extremeness". Intuitively, we think of this as calculating an anomalous p-value. This makes COPOD parameter-free, highly interpretable, and computationally efficient. In this work, we make three key contributions: we 1) propose a novel, parameter-free outlier detection algorithm with both strong performance and interpretability, 2) perform extensive experiments on 30 benchmark datasets to show that COPOD outperforms competing detectors in most cases and is also one of the fastest algorithms, and 3) release an easy-to-use Python implementation for reproducibility.
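The tail-probability idea in this abstract can be illustrated with a minimal sketch. It is a simplified illustration and not the authors' released implementation: the function name `tail_scores` and the toy data are hypothetical, and the published method may include refinements that are not reproduced here. The sketch estimates per-dimension left-tail and right-tail empirical probabilities, sums their negative logarithms across dimensions, and scores each point by the larger of the two sums.

```python
import math
from bisect import bisect_left, bisect_right


def tail_scores(X):
    """Return one outlier score per row of X (a list of equal-length numeric rows).

    Hypothetical helper for illustration only: a higher score means the row
    sits further out in the empirical tails of the data.
    """
    n = len(X)
    d = len(X[0])
    left = [0.0] * n   # accumulated -log of left-tail probabilities
    right = [0.0] * n  # accumulated -log of right-tail probabilities
    for j in range(d):
        col = sorted(row[j] for row in X)
        for i, row in enumerate(X):
            # Fraction of points at or below this value (never 0: the point counts itself).
            p_left = bisect_right(col, row[j]) / n
            # Fraction of points at or above this value (never 0 for the same reason).
            p_right = (n - bisect_left(col, row[j])) / n
            left[i] += -math.log(p_left)
            right[i] += -math.log(p_right)
    # A point is scored by whichever tail makes it look more extreme.
    return [max(l, r) for l, r in zip(left, right)]


# Toy data (made up for this sketch): four points near the origin and one far away.
data = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2], [8.0, 9.0]]
scores = tail_scores(data)
most_extreme = max(range(len(data)), key=scores.__getitem__)
```

Because the score depends only on empirical ranks, the sketch has no tuning parameters, and each dimension's contribution to a point's score can be inspected separately, which is the kind of interpretability the abstract refers to.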

ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions

Li¹, Zhao et al. 2023, IEEE Trans. Knowl. Data Eng.

125

ADBench: Anomaly Detection Benchmark

Han¹, Huang et al. 2022, SSRN Journal

PyGOD: A Python Library for Graph Outlier Detection

Liu¹, Dou², Zhao³ et al. 2022, Preprint

ADBench: Anomaly Detection Benchmark

Han¹, Hu², Huang³ et al. 2022, Preprint

Given the long list of anomaly detection algorithms developed in the last few decades, how do they perform with regard to (i) varying levels of supervision, (ii) different types of anomalies, and (iii) noisy and corrupted data? In this work, we answer these key questions by conducting (to the best of our knowledge) the most comprehensive anomaly detection benchmark, named ADBench, with 30 algorithms on 55 benchmark datasets. Our extensive experiments (93,654 in total) yield meaningful insights into the role of supervision and anomaly types, and point to future directions for researchers in algorithm selection and design. With ADBench, researchers can easily conduct comprehensive and fair evaluations of newly proposed methods on the datasets (including our contributed ones from the natural language and computer vision domains) against the existing baselines. To foster accessibility and reproducibility, we fully open-source ADBench and the corresponding results.
