The success and performance of Machine Learning (ML) algorithms depend closely on the datasets used, their sample and feature spaces, and their sampling quality. Researchers who train and test a classifier on a dataset report its classification performance in terms of standard metrics such as accuracy, true positive rate, or F1 [1]. These classifiers are then compared, via the same performance metrics, with other classifiers that were trained and tested on different datasets. The datasets themselves are usually not compared or analyzed. Similarly, researchers who wish to enrich their datasets usually merge in new datasets acquired from other sources without analyzing them, so they cannot be sure how the new datasets differ from the existing ones.