Outlier detection is an important task in statistical analysis. What counts as an outlier depends on the application: in some settings it is merely natural extreme noise, whereas in others it may be the most interesting observation. The molic package implements the novel outlier detection method for high-dimensional contingency tables (Lindskou, Eriksen, & Tvedebrink, 2019). In other words, the method works for data sets in which all variables are categorical, meaning that each variable can take only a finite set of values (also called levels). The software uses decomposable graphical models (DGMs), in which the probability mass function is associated with an interaction graph from which conditional independences among the variables can be read off. This offers a way to investigate the underlying nature of outliers, a property also called understandability in the literature. Outlier detection has many applications, including areas such as
• Fraud detection
• Medical and public health
• Anomaly detection in text data
• Fault detection (on critical systems)
• Forensic science
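The link between the probability mass function and the interaction graph mentioned above can be made concrete with a standard result for decomposable models. The notation here is introduced for illustration.

Suppose the graph has cliques C_1, ..., C_m listed in a perfect ordering, with separators S_j = C_j ∩ (C_1 ∪ ... ∪ C_{j-1}). Then the pmf factorizes as a product of clique margins divided by separator margins. Write n(x_A) for the number of observations in D that agree with x on the variables in A, and n = |D|. The maximum likelihood estimate then has the closed form in the second line. An empty separator contributes a factor of 1.

```latex
p(x) = \frac{\prod_{j=1}^{m} p(x_{C_j})}{\prod_{j=2}^{m} p(x_{S_j})},
\qquad
\hat{p}(x) = \frac{\prod_{j=1}^{m} n(x_{C_j})/n}{\prod_{j=2}^{m} n(x_{S_j})/n}
```

For example, if the graph is the path A - B - C, the cliques are {A, B} and {B, C} with separator {B}. This gives p(a, b, c) = p(a, b) p(b, c) / p(b), which is exactly the statement that A and C are conditionally independent given B.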
The Method
The method can be described by the outlier test procedure below. Assume we are interested in whether a new observation z is an outlier in some data set D.

First, an interaction graph G is fitted to the variables in D. G is a decomposable undirected graph that describes the association structure between the variables in D. Under the hypothesis that z belongs to D, z is added to the data; denote by D_z the data set D with z included. Finally, the outlier model M is constructed from G and D_z. From M we can obtain the p-value p for the hypothesis that z belongs to D. If p is less than a chosen threshold (significance level), say 0.05, z is declared an outlier in D.
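The test procedure above can be sketched in Python as a minimal illustration. It is not the molic API (molic is an R package), and all function names are hypothetical. The sketch rests on several assumptions:

- The graph G is taken as given, as a list of cliques in a perfect ordering, rather than fitted to D.
- The outlier model M is taken to be the maximum likelihood DGM fitted to D_z.
- The test statistic T(y) = -log p̂(y) and its Monte Carlo p-value are choices made here for illustration. They may differ from the statistic used in the paper.

```python
import math
import random
from collections import Counter


def separators(cliques):
    """Separators S_j = C_j ∩ (C_1 ∪ ... ∪ C_{j-1}) for cliques in a perfect ordering."""
    seps, seen = [], set()
    for c in cliques:
        seps.append(tuple(sorted(set(c) & seen)))
        seen |= set(c)
    return seps  # the first separator is always the empty tuple


def fit_dgm(data, cliques):
    """Return the ML pmf of the decomposable model: clique counts over separator counts."""
    seps = separators(cliques)
    c_counts = [Counter(tuple(row[v] for v in c) for row in data) for c in cliques]
    s_counts = [Counter(tuple(row[v] for v in s) for row in data) for s in seps]

    def pmf(x):
        p = 1.0
        for c, s, cc, sc in zip(cliques, seps, c_counts, s_counts):
            num = cc[tuple(x[v] for v in c)]
            if num == 0:
                return 0.0  # cell unseen in some clique margin
            p *= num / sc[tuple(x[v] for v in s)]  # empty separator counts all n rows
        return p

    return pmf


def sample_dgm(data, cliques, rng):
    """Draw one observation from the fitted DGM, clique by clique, given each separator."""
    x = {}
    for c, s in zip(cliques, separators(cliques)):
        # Rows agreeing with the values already drawn on the separator; picking one
        # uniformly samples x_C with probability n(x_C) / n(x_S).
        matches = [row for row in data if all(row[v] == x[v] for v in s)]
        row = rng.choice(matches)
        for v in c:
            x[v] = row[v]
    return tuple(x[v] for v in range(len(data[0])))  # assumes cliques cover all variables


def outlier_test(data, z, cliques, n_sim=2000, alpha=0.05, seed=1):
    """Monte Carlo test of H0: z belongs to D, with T(y) = -log p_hat(y) (an assumption)."""
    rng = random.Random(seed)
    data_z = list(data) + [tuple(z)]  # D_z: the data set with z included
    pmf = fit_dgm(data_z, cliques)    # outlier model fitted on D_z with the fixed graph
    t_z = -math.log(pmf(z))
    sims = [-math.log(pmf(sample_dgm(data_z, cliques, rng))) for _ in range(n_sim)]
    p_value = sum(t >= t_z for t in sims) / n_sim  # share of draws at least as extreme as z
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical toy data: three binary variables; the graph is the path X0 - X1 - X2.
    data = [(0, 0, 0)] * 50 + [(1, 1, 1)] * 50
    cliques = [(0, 1), (1, 2)]
    print(outlier_test(data, (0, 1, 0), cliques))  # rare cell: small p-value, flagged
    print(outlier_test(data, (0, 0, 0), cliques))  # common cell: (1.0, False)
```

A simulated p-value is used because computing the exact distribution of the statistic would require summing over every cell of the table, which quickly becomes infeasible in high-dimensional contingency tables. Increasing `n_sim` gives a more precise estimate of p.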