This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required, instead of the total order induced by their respective expected long-term rewards. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on utilities observed for trajectories. To that end, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking.
Abstract. Pairwise classification is a class binarization procedure that converts a multi-class problem into a series of two-class problems, one problem for each pair of classes. While it can be shown that this procedure is more efficient for training than the more commonly used one-against-all approach, it still has to evaluate a quadratic number of classifiers when computing the predicted class for a given example. In this paper, we propose a method that allows a faster computation of the predicted class when weighted or unweighted voting is used for combining the predictions of the individual classifiers. While its worst-case complexity is still quadratic in the number of classes, we show that even in the case of completely random base classifiers, our method still outperforms the conventional pairwise classifier. For the more practical case of well-trained base classifiers, its asymptotic computational complexity seems to be almost linear.
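As a rough illustration of the idea behind this kind of shortcut, the following sketch shows a QWeighted-style prediction loop for pairwise (weighted) voting: the class with the fewest accumulated voting losses is repeatedly probed against an opponent it has not yet faced, and the loop stops once that class has no open comparisons left, because the losses of all other classes can only grow. The interface (a dictionary of pairwise base classifiers returning a vote in [0, 1]) is an assumption made for illustration, not the paper's exact implementation.

    # Minimal sketch of early-stopping pairwise voting (assumed interface:
    # classifiers[(a, b)](x) returns the fraction of the vote given to class a).
    def qweighted_predict(classifiers, classes, x):
        loss = {c: 0.0 for c in classes}                      # voting losses accumulated so far
        pending = {c: set(classes) - {c} for c in classes}    # opponents not yet evaluated
        while True:
            top = min(classes, key=lambda c: loss[c])         # current front-runner
            if not pending[top]:
                return top     # all its comparisons are done; the other losses can only grow
            opp = pending[top].pop()
            pending[opp].discard(top)
            a, b = (top, opp) if (top, opp) in classifiers else (opp, top)
            vote_for_a = classifiers[(a, b)](x)
            loss[a] += 1.0 - vote_for_a                       # unweighted voting: 0 or 1
            loss[b] += vote_for_a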
Abstract. This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a "preference-based" approach to reinforcement learning is a possible extension of the type of feedback an agent may learn from. In particular, while conventional RL methods are essentially confined to deal with numerical rewards, there are many applications in which this type of information is not naturally available, and in which only qualitative reward signals are provided instead. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. Concretely, in this paper, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of our preference-based policy iteration method are illustrated by means of two case studies.
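As a rough illustration of how such qualitative training information can be generated, the following sketch compares roll-out returns of different actions in a state and turns sufficiently large differences into pairwise action preferences, which would then serve as training examples for a label ranker. The simulator interface (env.rollout) and the simple mean-difference criterion are assumptions for illustration, not the exact procedure of the paper.

    # Illustrative sketch: derive pairwise action preferences from roll-outs.
    # env.rollout(state, action, policy) is assumed to return the utility of one
    # simulated trajectory that starts with `action` and then follows `policy`.
    import statistics

    def pairwise_action_preferences(env, policy, state, actions, n_rollouts=10, margin=0.0):
        returns = {a: [env.rollout(state, a, policy) for _ in range(n_rollouts)]
                   for a in actions}
        prefs = []
        for a in actions:
            for b in actions:
                if a != b and statistics.mean(returns[a]) - statistics.mean(returns[b]) > margin:
                    prefs.append((a, b))   # training preference: in `state`, a is preferred to b
        return prefs

A label ranker trained on such (state, a preferred to b) examples yields, for every state, a ranking of the available actions from most to least promising.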
The pairwise approach to multilabel classification reduces the problem to learning and aggregating preference predictions among the possible labels. A key problem is the need to query a quadratic number of preferences for making a prediction. To solve this problem, we extend the recently proposed QWeighted algorithm for efficient pairwise multiclass voting to the multilabel setting, and evaluate the adapted algorithm on several real-world datasets. We achieve an average-case reduction of classifier evaluations from n² to n + dn log n, where n is the total number of labels and d is the average number of labels per example, which is typically quite small in real-world datasets.
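As a rough, illustrative calculation of what this reduction means in practice (the base of the logarithm is an assumption made here for concreteness):

    # Rough comparison for n = 100 labels and d = 3 relevant labels per example.
    import math
    n, d = 100, 3
    full_pairwise = n * (n - 1) // 2              # query every pairwise classifier: 4950
    qweighted_ml = n + d * n * math.log2(n)       # roughly 2093 evaluations
    print(full_pairwise, round(qweighted_ml))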
Binary decomposition methods transform multiclass learning problems into a series of two-class learning problems that can be solved with simpler learning algorithms. As the number of such binary learning problems often grows super-linearly with the number of classes, we need efficient methods for computing the predictions. In this paper, we discuss an efficient algorithm that queries only a dynamically determined subset of the trained classifiers, but still predicts the same classes that would have been predicted if all classifiers had been queried. The algorithm is first derived for the simple case of pairwise classification, and then generalized to arbitrary pairwise decompositions of the learning problem in the form of ternary error-correcting output codes under a variety of different code designs and decoding strategies.
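The following sketch illustrates the general principle for ternary error-correcting output codes with Hamming decoding: classifiers are queried one at a time, a simple bound tracks how much the distance of each class can still grow, and decoding stops as soon as the current front-runner can no longer be overtaken. The bound-based stopping rule and the next-classifier heuristic are illustrative assumptions, not necessarily the exact algorithm discussed in the paper.

    # Illustrative sketch: Hamming decoding of a ternary ECOC ensemble that stops
    # querying classifiers once the prediction can no longer change (ties ignored,
    # at least two classes assumed).
    import numpy as np

    def ecoc_predict_early_stop(code_matrix, classifiers, x):
        # code_matrix: (k, l) array over {-1, 0, +1}; classifiers[j](x) returns -1 or +1
        k, l = code_matrix.shape
        dist = np.zeros(k)                        # Hamming distance accumulated so far
        remaining = set(range(l))                 # classifiers not yet queried
        while True:
            best = int(np.argmin(dist))           # current front-runner
            # each unqueried classifier can add at most 1 to the distance of an incident class
            slack = sum(code_matrix[best, j] != 0 for j in remaining)
            others = np.delete(dist, best)
            if not remaining or dist[best] + slack <= others.min():
                return best                       # no remaining classifier can change the winner
            # heuristic: prefer a classifier that involves the current front-runner
            j = next((j for j in remaining if code_matrix[best, j] != 0),
                     next(iter(remaining)))
            remaining.discard(j)
            pred = classifiers[j](x)
            incident = code_matrix[:, j] != 0
            dist[incident] += (code_matrix[incident, j] != pred)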