Applying classification methods to real-world problems incurs costs that early algorithms usually neglected, which causes inefficiencies in practice. One of these costs, significant in many cases, is the cost of obtaining the feature values for each instance, referred to as the test cost. The classifier ensemble, a common and practical classification method, is also considered from this perspective. Each classifier needs a certain set of features to classify a sample; if, instead of using all classifiers, the best arrangement of classifiers that minimizes the required features is found, the test cost can be lowered effectively. In this paper, a method is proposed that uses reinforcement learning to construct such a classifier ensemble. The proposed method learns the best sequence of classifiers for each sample so as to minimize the test cost. Two problems, an easy one and a hard one, are used to test the proposed method, and it yields very good results on both.
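The test cost of a classifier sequence can be illustrated with a small sketch. It assumes that each classifier is described by the set of features it needs and that a feature is paid for only once, the first time any consulted classifier requires it. The function name `cumulative_test_cost`, the classifier names, the feature names and the costs are all hypothetical. The sketch covers only the cost bookkeeping, not the reinforcement learning that chooses the sequence. The negated per-step increase in cost is one possible reward signal for such a learner, but that is an assumption rather than the paper's stated design.

```python
from typing import Dict, List, Set


def cumulative_test_cost(
    sequence: List[str],
    classifier_features: Dict[str, Set[str]],
    feature_costs: Dict[str, float],
) -> List[float]:
    """Return the total test cost paid after consulting each classifier in `sequence`.

    Assumption: a feature is paid for only the first time any classifier needs it,
    so classifiers that share features are cheaper to consult together.
    """
    acquired: Set[str] = set()
    total = 0.0
    costs: List[float] = []
    for name in sequence:
        # Only features not yet acquired add to the cost.
        new_features = classifier_features[name] - acquired
        total += sum(feature_costs[f] for f in new_features)
        acquired |= new_features
        costs.append(total)
    return costs


# Hypothetical ensemble: c1 and c2 share "blood_pressure"; c3 needs an expensive feature.
classifier_features = {
    "c1": {"age", "blood_pressure"},
    "c2": {"blood_pressure", "cholesterol"},
    "c3": {"mri_scan"},
}
feature_costs = {"age": 1.0, "blood_pressure": 2.0, "cholesterol": 5.0, "mri_scan": 50.0}

print(cumulative_test_cost(["c1", "c2"], classifier_features, feature_costs))
```

Consulting c1 and then c2 brings the total to 3.0 and then 8.0, because `blood_pressure` is paid only once; starting with c3 would cost 50.0 immediately. An ordering learner would prefer sequences that reach a confident decision before the expensive classifiers are needed.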
The cost of acquiring training instances for inducing data mining models is one of the main concerns in real-world problems. The web is a comprehensive source of many types of data that can be used for data mining tasks, but its distributed and dynamic nature calls for solutions that can handle these characteristics. In this paper, we introduce an automatic method for topical data acquisition from the web. We propose a new type of topical crawler that uses a hybrid link context extraction method to acquire on-topic web pages with minimal bandwidth usage and cost. The new link context extraction method, called Block Text Window (BTW), combines a text window method with a block-based method and uses the strengths of each to overcome the weaknesses of the other. Experimental results on standard metrics show that BTW outperforms state-of-the-art automatic topical web data acquisition methods.
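One plausible reading of how a text window and a block-based method can be combined is to take a fixed number of tokens on each side of a link's anchor text but never cross the boundary of the page block that contains the link. The sketch below contrasts that clipped window with a plain text window over the whole page. The clipping rule, the function names `text_window` and `block_text_window`, the token-list page representation and the example page are assumptions for illustration, not the authors' definition of BTW.

```python
from typing import List, Tuple

Block = List[str]            # tokens of one page block (e.g. a paragraph or list item)
Anchor = Tuple[int, int, int]  # (block index, start token, end token exclusive)


def text_window(blocks: List[Block], anchor: Anchor, k: int) -> List[str]:
    """Plain text window: k tokens on each side of the anchor, over the whole page."""
    b, start, end = anchor
    offset = sum(len(blk) for blk in blocks[:b])  # position of the block in the page
    flat = [tok for blk in blocks for tok in blk]
    lo = max(0, offset + start - k)
    hi = min(len(flat), offset + end + k)
    return flat[lo:hi]


def block_text_window(blocks: List[Block], anchor: Anchor, k: int) -> List[str]:
    """Assumed BTW-like window: same k tokens, but clipped to the anchor's own block."""
    b, start, end = anchor
    blk = blocks[b]
    return blk[max(0, start - k):min(len(blk), end + k)]


# Hypothetical page with three blocks; the link anchor is "neural networks".
blocks = [
    ["Sponsored", "links", "cheap", "flights"],
    ["Read", "more", "about", "neural", "networks", "in", "our", "tutorial"],
    ["Contact", "us"],
]
anchor = (1, 3, 5)

print(text_window(blocks, anchor, 4))
print(block_text_window(blocks, anchor, 4))
```

With k = 4, the plain window leaks "flights" and "Contact" from neighboring, unrelated blocks into the link context, while the clipped window stays within the paragraph that contains the link. A small k still keeps the context short when the block itself is large.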