Abstract. Scalable feature selection algorithms should remove irrelevant and redundant features and scale well to very large datasets. We observe that the current state-of-the-art methods perform well on binary classification tasks but often underperform on multi-class tasks. We suggest that they suffer from a so-called accumulative effect, which becomes more visible as the number of classes grows and results in the removal of relevant and non-redundant features. To remedy the problem, we propose two new feature filtering methods that are both scalable and well adapted to multi-class cases. We report evaluation results on 17 different datasets, which include both binary and multi-class cases.
Abstract. We address the problems of structuring and annotating layout-oriented documents. We model the annotation problems as collective classification on graph-like structures with typed instances and links that capture domain-specific knowledge. We use relational dependency networks (RDNs) for collective inference on the multi-typed graphs. We then describe a variant of RDNs in which a stacked approximation replaces Gibbs sampling in order to accelerate inference. We report evaluation results for both the Gibbs sampling and stacking inference on two document structuring examples.