We give a compression scheme for any maximum class of VC dimension d that compresses any sample consistent with a concept in the class to at most d unlabeled points from the domain of the sample.
We consider the following type of online variance minimization problem: in every trial $t$ our algorithms get a covariance matrix $C_t$ and try to select a parameter vector $w_{t-1}$ such that the total variance over a sequence of $T$ trials, $\sum_{t=1}^{T} w_{t-1}^\top C_t\, w_{t-1}$, is not much larger than the total variance of the best parameter vector $u$ chosen in hindsight. Two parameter spaces in $\mathbb{R}^n$ are considered: the probability simplex and the unit sphere. The first space is associated with the problem of minimizing risk in stock portfolios, and the second leads to an online calculation of the eigenvector with minimum eigenvalue of the total covariance matrix $\sum_{t=1}^{T} C_t$. For the first parameter space we apply the Exponentiated Gradient algorithm, which is motivated by a relative entropy regularization. In the second case, the algorithm has to maintain uncertainty information over all unit directions $u$. For this purpose, directions are represented as dyads $uu^\top$ and the uncertainty over all directions as a mixture of dyads, which is a density matrix. The motivating divergence for density matrices is the quantum version of the relative entropy, and the resulting algorithm is a special case of the Matrix Exponentiated Gradient algorithm. In each of the two cases we prove bounds on the additional total variance incurred by the online algorithm over the best offline parameter.
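To make the simplex case concrete, here is a minimal Python sketch (our own illustration, not the paper's exact algorithm or tuning) of an Exponentiated Gradient step on the variance loss $w^\top C_t w$; the learning rate `eta` and the toy covariance stream are assumptions.

```python
import numpy as np

def eg_variance_step(w, C, eta=0.1):
    """One Exponentiated Gradient step for variance minimization on the
    probability simplex. The loss at trial t is w^T C w, whose gradient
    is 2 C w (for symmetric C); EG multiplies each weight by
    exp(-eta * gradient) and renormalizes back onto the simplex."""
    grad = 2.0 * C @ w
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()

# Toy run over T trials with random symmetric PSD covariance matrices.
rng = np.random.default_rng(0)
n, T = 4, 50
w = np.full(n, 1.0 / n)            # uniform start on the simplex
total_variance = 0.0
for _ in range(T):
    A = rng.normal(size=(n, n))
    C = A @ A.T / n                # symmetric positive semidefinite
    total_variance += w @ C @ w    # variance incurred before the update
    w = eg_variance_step(w, C)
print(total_variance)
```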
One of the main concepts in quantum physics is a density matrix, which is a symmetric positive definite matrix of trace one. Finite probability distributions can be seen as a special case in which the density matrix is restricted to be diagonal. We develop a probability calculus based on these more general distributions that includes definitions of joints and conditionals, together with formulas that relate them, including analogs of the Theorem of Total Probability and various Bayes rules for the calculation of posterior density matrices. The resulting calculus parallels the familiar "conventional" probability calculus and always retains the latter as a special case when all matrices are diagonal. Whereas conventional Bayesian methods maintain uncertainty about which model has the highest data likelihood, the generalization maintains uncertainty about which unit direction has the largest variance. Surprisingly, the bounds also generalize: as in the conventional setting, we upper bound the negative log likelihood of the data by the negative log likelihood of the MAP estimator. Footnotes: (1) The machinery for infinite-dimensional vector spaces is available; however, in this paper we start with the simplest finite-dimensional setting. (2) In quantum physics complex numbers are used instead of reals; in that case "symmetric" is replaced by "hermitian" and all our formulas hold for that case as well.
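As a minimal numeric sketch (our own illustration, with made-up weights and directions, not the paper's notation), the following checks the two facts the abstract leans on: a mixture of dyads $uu^\top$ is a density matrix of trace one, and restricting to diagonal density matrices recovers an ordinary finite probability distribution.

```python
import numpy as np

# A density matrix: symmetric, trace one, built as a mixture of dyads u u^T.
rng = np.random.default_rng(1)
dirs = rng.normal(size=(3, 4))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)    # unit directions
mix = np.array([0.5, 0.3, 0.2])                        # mixture weights
W = sum(p * np.outer(u, u) for p, u in zip(mix, dirs))
assert np.isclose(np.trace(W), 1.0)

# The probability assigned to a unit direction v is v^T W v = tr(W v v^T),
# the standard quantum-measurement rule.
v = np.array([1.0, 0.0, 0.0, 0.0])
print(v @ W @ v)

# Diagonal special case: W = diag(p) is a finite probability distribution,
# and the standard basis direction e_i gets probability p_i back.
p = np.array([0.1, 0.2, 0.3, 0.4])
W_diag = np.diag(p)
e2 = np.eye(4)[2]
assert np.isclose(e2 @ W_diag @ e2, p[2])
```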
A number of updates for density matrices have recently been developed that are motivated by relative entropy minimization problems. The updates involve a softmin calculation based on matrix logarithms and matrix exponentials. We show that these updates can be kernelized. This is important because the bounds provable for these algorithms are logarithmic in the feature dimension (provided that the 2-norm of the feature vectors is bounded by a constant). The main problem we focus on is the kernelization of an online PCA algorithm that belongs to this family of updates.
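As a sketch of the un-kernelized softmin update this family is built on (the kernelized versions rework the same computation in terms of Gram-matrix quantities), here is a hedged Python implementation of a Matrix Exponentiated Gradient step of the form $W_{t+1} \propto \exp(\log W_t - \eta C_t)$; the learning rate and the toy loss matrices are assumptions.

```python
import numpy as np

def sym_logm(W):
    """Matrix logarithm of a symmetric positive definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * np.log(vals)) @ vecs.T

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via eigh."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.exp(vals)) @ vecs.T

def meg_step(W, C, eta=0.1):
    """One Matrix Exponentiated Gradient step: move the matrix log of the
    density matrix against the loss matrix C, exponentiate back, and
    renormalize so the trace stays one."""
    M = sym_expm(sym_logm(W) - eta * C)
    return M / np.trace(M)

# Toy run starting from the maximally mixed state (uniform density matrix).
rng = np.random.default_rng(2)
n = 4
W = np.eye(n) / n
for _ in range(20):
    A = rng.normal(size=(n, n))
    C = A @ A.T / n                    # symmetric PSD loss matrix
    W = meg_step(W, C)
print(np.round(np.linalg.eigvalsh(W), 3))  # W drifts toward low-loss directions
```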
Many recent advances in neural information retrieval models, which predict top items given a query, learn directly from a large training set of (query, item) pairs. However, these models often fall short when many (query, item) combinations were never seen in training, commonly referred to as the cold-start problem. Furthermore, the search system can be biased towards items that have frequently been shown for a query in the past, known as the "rich get richer" (feedback loop) problem. In light of these problems, we observe that most online content platforms have both a search and a recommender system that, while having heterogeneous input spaces, can be connected through their common output item space and a shared semantic representation. In this paper, we propose a new Zero-Shot Heterogeneous Transfer Learning framework that transfers learned knowledge from the recommender system component to improve the search component of a content platform. First, it learns representations of items and their natural-language features by predicting (item, item) correlation graphs derived from the recommender system as an auxiliary task. Then, the learned representations are transferred to solve the target search retrieval task, performing query-to-item prediction without having seen any (query, item) pairs in training. We conduct online and offline experiments on one of the world's largest search and recommender systems from Google, and present the results and lessons learned. We demonstrate that the proposed approach achieves high performance on offline search retrieval tasks and, more importantly, yields significant improvements in relevance and user interactions over the highly optimized production system in online experiments.
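The two-stage structure can be sketched as follows. Everything here is a hypothetical toy, not Google's system: the `encode_text` hashing encoder, the item strings, and the dot-product scoring are all our assumptions. The point is the shape of the transfer: items and queries share one text encoder, the encoder would be trained on the recommender's (item, item) correlation graph as the auxiliary task, and retrieval then scores a never-seen query against item embeddings in the shared output space.

```python
import numpy as np

def encode_text(text, dim=64):
    """Stand-in text encoder (hashing trick); a real system would use a
    learned natural-language encoder shared by items and queries."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

items = ["wireless noise cancelling headphones",
         "bluetooth over ear headphones",
         "stainless steel water bottle"]
item_vecs = np.stack([encode_text(t) for t in items])

# Stage 1 (auxiliary task, sketched only): the recommender's (item, item)
# correlation graph would supervise the encoder so correlated items embed
# close together.
# Stage 2 (zero-shot retrieval): embed the query with the same encoder and
# rank items by similarity, with no (query, item) pairs seen in training.
query_vec = encode_text("noise cancelling bluetooth headphones")
scores = item_vecs @ query_vec
print(items[int(np.argmax(scores))])
```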