Summary
Outlier detection in large datasets is one of the most important research problems in data analytics and data mining, and it has attracted considerable research attention in recent years. In this paper, we investigate the problem of summarizing large datasets to support efficient local outlier detection. We propose efficient summarization algorithms that produce a highly compact summary of the original dataset, which can then be used to detect local outliers in future similar datasets. We also propose a novel automatic parameter optimization method that determines the optimal settings of the key parameters used in the summarization algorithms, as well as parallel processing methods that significantly accelerate the summarization process. The experimental evaluation results demonstrate that the proposed algorithms scale well to large datasets and are effective in producing dataset summaries for local outlier detection.