2010
DOI: 10.1007/s10994-010-5176-9
Preference-based learning to rank

Abstract: This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it makes use of that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary classification. This reduction is based on the familiar QuickSort and guarantees an expected pairwise misranking loss of at most twice that …
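The second stage described in the abstract can be sketched in a few lines: a randomized QuickSort that orders items by querying a learned pairwise preference function in place of a comparison operator. This is a minimal illustration, not the paper's implementation; `prefer` is a hypothetical oracle standing in for the learned binary classifier, and it need not be transitive.

```python
import random

def quicksort_rank(items, prefer):
    """Rank `items` with randomized QuickSort driven by a pairwise oracle.

    `prefer(a, b)` returns True when `a` should be ranked ahead of `b`.
    Because the pivot is chosen uniformly at random, the procedure is
    well-defined even for noisy, non-transitive preference functions.
    """
    if len(items) <= 1:
        return list(items)
    # Pick a random pivot and partition the remaining items by preference.
    pivot_idx = random.randrange(len(items))
    pivot = items[pivot_idx]
    rest = items[:pivot_idx] + items[pivot_idx + 1:]
    ahead = [x for x in rest if prefer(x, pivot)]
    behind = [x for x in rest if not prefer(x, pivot)]
    # Recurse on each side and splice the pivot between them.
    return quicksort_rank(ahead, prefer) + [pivot] + quicksort_rank(behind, prefer)

# With a consistent preference ("larger first") this reduces to sorting:
print(quicksort_rank([3, 1, 4, 1, 5], lambda a, b: a > b))  # → [5, 4, 3, 1, 1]
```

With a total order the output is an exact descending sort; the paper's contribution is the guarantee that even with an imperfect preference function, the expected pairwise misranking loss of the output is at most twice that of the preference function itself.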

Cited by 15 publications (28 citation statements)
References 22 publications
“…Active preference-based learning has been successfully used in many domains [1,6,7,14], but what makes applying it to learning reward functions difficult is the complexity of the queries, as well as the continuous nature of the underlying hypothesis space of possible reward functions. We focus on dynamical systems with continuous or hybrid discrete-continuous state.…”
Section: Introduction (mentioning)
confidence: 99%
“…Preference-based models have been considered in various works [17]- [19]. In preference-based models, instead of learning the scoring function for each particular item, a preference function over pairs of items is learned in the training stage.…”
Section: Introduction (mentioning)
confidence: 99%
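The training stage that this citation statement describes — learning a preference function over pairs rather than a per-item score — amounts to binary classification on feature differences. The sketch below uses a plain perceptron for concreteness; the learner, feature layout, and function names are all illustrative assumptions, not the method of any cited paper.

```python
def train_preference(pairs, dim, epochs=50):
    """Learn a linear pairwise preference function with a perceptron.

    `pairs` is a list of (x, y) feature-vector tuples where x is the
    preferred item. Each pair becomes one binary example on the
    difference vector x - y, whose score should be positive.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in pairs:
            diff = [a - b for a, b in zip(x, y)]
            score = sum(wi * di for wi, di in zip(w, diff))
            if score <= 0:  # misranked pair: nudge w toward the difference
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def prefer(w, a, b):
    """True when the learned model ranks item `a` ahead of item `b`."""
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)) > 0
```

The learned `prefer` can then be plugged into any comparison-driven ranking procedure, which is exactly the reduction from ranking to binary classification that the paper formalizes.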
“…For instance, [18] proves that a simple approach [17], which is of time complexity quadratic to the number of test instances, is a 2-approximation of the optimal solution for a special ranking task called bipartite ranking. [19], [23] make improvements by using a quick-sort-like approach to achieve 3-approximation within sub-quadratic time complexity, and [24], [25] further achieve (1 + ε)-approximation within sub-quadratic time. Given the theoretical nature of the works, though many of them have discussed the possibility to employ preference-based LTR [19], [25], [26], few have yet to design algorithms that work well in practice or examine them properly on real-world data.…”
Section: Introduction (mentioning)
confidence: 99%
“…There are many known regret reductions for such problems as multiclass classification [33,47], cost-sensitive classification [12,39], and ranking [2,3]. There is also a rich body of work on so-called surrogate regret bounds.…”
(mentioning)
confidence: 99%