“…In this regard, only a few publicly available web spam corpora can be used to train, test, compare and rank existing and novel approaches to web spam detection and filtering. Moreover, most of the available alternatives are outdated and distributed in mutually incompatible formats [8, 9, 11, 18, 19, 23, 26, 28, 29, 30, 31, 32]. This situation forces research teams to carry out a compulsory preliminary task of data preparation and preprocessing [29], which in the web spam-filtering domain is typically hard, costly, time-consuming and error-prone.…”