We demonstrate that the previously introduced Widening framework is applicable to state-of-the-art Machine Learning algorithms. Using Krimp, an itemset mining algorithm, we show that parallelizing the search finds better solutions in nearly the same time as the original, sequential, greedy algorithm. We also introduce Reverse Standard Candidate Order (RSCO) as a candidate ordering heuristic for Krimp.

1 Introduction

Research into parallelism in Machine Learning has primarily focused on reducing the execution time of existing algorithms, e.g., parallelized k-Means [23,17,14,26] and DBSCAN [11,4,7]. There have been some exceptions, such as metalearning and ensemble methods [9], which have employed heterogeneous algorithms in parallel, and [3], which describes the application of parallelism to simple examples. Recent work [2,15] describes Widening, a framework for employing parallel resources to increase accuracy. With Widening, measures of diversity are used to guarantee that the parallel search paths explore disparate regions of the solution space, thereby stepping around the common tendency of greedy algorithms to settle in local optima. Thus far, work has concentrated on a proof of concept and demonstrative applications to algorithms for solving the Set Cover Problem and for the creation of Decision Trees.

This document describes the same approach applied to a state-of-the-art algorithm, Krimp [24]. Krimp finds "interesting" itemsets in a transactional database via the Minimum Description Length (MDL) principle [21]. The authors summarize the method as "the best set of patterns [being] the set of patterns that describes the data best," where the best set of itemsets is the one that provides the highest compression under MDL. The algorithm not only provides a solution to the problem of pattern explosion, thereby greatly reducing the set of itemsets used to generate association rules, but also delivers exceptional performance in other applications such as classification [24].
This paper demonstrates that it is possible to apply Widening to find even more interesting sets of itemsets than those found by the standard Krimp algorithm.
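The MDL idea behind Krimp can be made concrete with a small sketch. The function below is an illustrative simplification, not the authors' implementation: it charges each transaction the Shannon-optimal code length, -log2(usage/total usage), for every code-table itemset used in a greedy cover. The function name `encoded_db_length` and the dict-based code table are assumptions for this example; Krimp's full score also includes the encoded size of the code table itself, which is omitted here.

```python
import math

def encoded_db_length(database, code_table):
    """Sketch of Krimp-style MDL scoring: encode each transaction with a
    greedy cover drawn from the code table (longer itemsets first), charging
    Shannon-optimal -log2(usage / total_usage) bits per code used."""
    total = sum(code_table.values())
    # code length for each itemset, derived from how often it is used
    bits = {iset: -math.log2(use / total)
            for iset, use in code_table.items() if use > 0}
    ordered = sorted(bits, key=lambda s: (-len(s), s))  # longer itemsets first
    length = 0.0
    for transaction in database:
        remaining = set(transaction)
        for iset in ordered:
            if remaining >= set(iset):  # itemset still fits the uncovered part
                length += bits[iset]
                remaining -= set(iset)
    return length
```

Under this scoring, a better set of itemsets is simply one that yields a smaller total encoded length; Widening's parallel search paths each maintain their own candidate code table and compare these lengths.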
Abstract. We demonstrate the application of Widening to learning performant Bayesian networks for use as classifiers. Widening is a framework for utilizing parallel resources and diversity to find models in a hypothesis space that are potentially better than those of a standard greedy algorithm. This work demonstrates that widened learning of Bayesian networks, using the Frobenius Norm of the networks' graph Laplacian matrices as a distance measure, can create Bayesian networks that are better classifiers than those generated by popular Bayesian network algorithms.
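The distance measure named in the abstract can be sketched in a few lines. This is a minimal illustration under the assumption that each network structure is given as an adjacency matrix over the same node set and that the standard combinatorial Laplacian L = D - A is used; the paper may treat edge direction differently, and the function name `laplacian_distance` is made up for this example.

```python
import numpy as np

def laplacian_distance(adj_a, adj_b):
    """Distance between two network structures: the Frobenius norm of the
    difference of their graph Laplacians L = D - A, where D is the diagonal
    degree matrix. Assumes both adjacency matrices cover the same nodes."""
    lap_a = np.diag(adj_a.sum(axis=1)) - adj_a
    lap_b = np.diag(adj_b.sum(axis=1)) - adj_b
    return float(np.linalg.norm(lap_a - lap_b, ord="fro"))
```

A Widening-style search can then use this distance to keep parallel candidate networks structurally far apart from one another.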
Widening is a method where parallel resources are used to find better solutions from greedy algorithms instead of merely trying to find the same solutions more quickly. To date, every example of Widening has used some form of communication between the parallel workers to maintain their distances from one another in the model space. For the first time, we present a communication-free, widened extension to a standard machine learning algorithm. By using Locality Sensitive Hashing on the Bayesian networks' Fiedler vectors, we demonstrate the ability to learn classifiers superior to those of standard implementations and to those generated with a greedy heuristic alone.
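The communication-free mechanism described above can be illustrated with a short sketch: each worker hashes its network's Fiedler vector (the eigenvector of the graph Laplacian belonging to the second-smallest eigenvalue) with random-hyperplane Locality Sensitive Hashing, so similar networks land in the same bucket without any coordination. This is an assumed minimal rendering, not the paper's implementation; an undirected Laplacian is used, and the helper names `fiedler_vector` and `lsh_bucket` are invented for the example.

```python
import numpy as np

def fiedler_vector(adj):
    """Fiedler vector of a graph given by an adjacency matrix: the eigenvector
    of the Laplacian L = D - A for its second-smallest eigenvalue.
    np.linalg.eigh returns eigenvalues in ascending order, so column 1 is it."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1]

def lsh_bucket(vec, planes):
    """Random-hyperplane LSH: the sign pattern of the projections onto a fixed
    set of random hyperplanes is the bucket key, so nearby vectors tend to
    share a bucket -- and workers need no communication to agree on buckets."""
    return tuple(int(np.dot(p, vec) >= 0) for p in planes)
```

Because every worker draws the same fixed hyperplanes (e.g., from a shared random seed), two workers holding similar networks will tend to compute the same bucket key independently; diversity is then enforced by keeping at most one candidate per bucket.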
Index investing has an advantage over active investment strategies because less frequent trading results in lower expenses, yielding higher long-term returns. Index tracking is a popular investment strategy that attempts to find a portfolio replicating the performance of a collection of investment vehicles. This paper considers index tracking from the perspective of solution space exploration. Three search space heuristics, in combination with three portfolio tracking error methods, are compared in order to select a tracking portfolio whose returns mimic a benchmark index. Experimental results on real-world datasets show that Widening, a metaheuristic using diverse parallel search paths, finds solutions superior to those found by the reference heuristics. Presented here are the first results using Widening on time-series data.
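The objective in index tracking can be made concrete with one common tracking-error definition: the standard deviation of the difference between portfolio returns and benchmark returns. This is an illustrative choice, not necessarily one of the three methods compared in the paper, and the function name `tracking_error` is an assumption for this sketch.

```python
import numpy as np

def tracking_error(portfolio_returns, index_returns):
    """One common tracking-error definition: the sample standard deviation
    (ddof=1) of the per-period difference between portfolio and benchmark
    returns. A perfect tracking portfolio scores exactly zero."""
    diff = np.asarray(portfolio_returns) - np.asarray(index_returns)
    return float(np.std(diff, ddof=1))
```

A search heuristic, widened or not, then scores each candidate portfolio by such a measure and keeps the candidates with the smallest tracking error.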