The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However, in real-world scenarios this label set, although large, is often incomplete, and experts frequently need to refine it. To develop systems that simplify this process, we introduce the task of open vocabulary XMC (OXMC): given a piece of content, predict a set of labels, some of which may be outside of the known tag set. Hence, in addition to not having training data for some labels, as is the case in zero-shot classification, models need to invent some labels on the fly. We propose GROOV, a fine-tuned seq2seq model for OXMC that generates the set of labels as a flat sequence and is trained using a novel loss independent of predicted label order. We show the efficacy of the approach by experimenting with popular XMC datasets, for which GROOV is able to predict meaningful labels outside the given vocabulary while performing on par with state-of-the-art solutions for known labels.
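To make the open-vocabulary setting concrete, the following minimal sketch shows how a flat generated label sequence might be turned back into a set and split into known and novel labels. It is not GROOV's actual decoding or loss; the function name `split_labels`, the `;` separator, and the lowercase normalization are all assumptions made for illustration.

```python
def split_labels(generated: str, known: set[str], sep: str = ";"):
    """Parse a flat label sequence and separate known from novel labels.

    Assumptions (not from the paper): labels are joined by `sep`, and
    `known` already holds lowercase label strings.
    """
    seen = []
    for raw in generated.split(sep):
        label = raw.strip().lower()
        # Skip empty fragments and duplicates: the output is a set of
        # labels, so the order of generation should not matter.
        if label and label not in seen:
            seen.append(label)
    in_vocab = [label for label in seen if label in known]
    novel = [label for label in seen if label not in known]
    return in_vocab, novel


# Hypothetical vocabulary and model output.
known_tags = {"physics", "chemistry"}
in_vocab, novel = split_labels(
    "physics; Quantum computing ;physics;;qubits", known_tags
)
```

Here `in_vocab` holds the labels already in the tag set, while `novel` holds the candidates an expert would review before extending the vocabulary.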
We consider a broad category of analytic queries, denoted by scalar product queries, which can be expressed as a scalar product between a known function over multiple database attributes and an unknown set of parameters. More specifically, given a set of d-dimensional data points, we retrieve all points x which satisfy an inequality given by a scalar product: ⟨a, φ(x)⟩ ≤ b. We assume that the function φ : R^d → R^d is application specific and known a priori, while the query parameters a and the inequality parameter b are known only at the time of querying. Efficiently answering such scalar product queries is essential in a wide range of applications including evaluation of complex SQL functions, time series prediction, scientific simulation, and active learning. Although some specific subclasses of the aforementioned scalar product queries and their applications have been studied in computational geometry, machine learning, and in moving-object queries, surprisingly no generalized indexing scheme has been proposed for efficiently computing scalar product queries. We present a lightweight, yet scalable, dynamic, and generalized indexing scheme, called the Planar index, for answering scalar product queries accurately. It is based on the idea of indexing the function φ(x) for each data point x using multiple sets of parallel hyperplanes. The Planar index has loglinear indexing time and linear space complexity, and its query time ranges from logarithmic to linear in the number of data points. Based on an extensive set of experiments on several real-world and synthetic datasets, we show that the Planar index is not only scalable and dynamic, but also effective in various real-world applications including intersection finding between moving objects and active learning.
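The query semantics can be illustrated with a linear-scan baseline, which is what an index such as the Planar index is meant to improve on. This sketch is not the Planar index itself: the function name `scalar_product_query`, the example mapping `phi`, and the sample data are hypothetical choices made only to show what ⟨a, φ(x)⟩ ≤ b selects.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def scalar_product_query(
    points: Sequence[Vector],
    phi: Callable[[Vector], Vector],
    a: Vector,
    b: float,
) -> list[Vector]:
    """Return every point x with <a, phi(x)> <= b using a full scan.

    This is a brute-force baseline (linear in the number of points),
    not the indexing scheme described in the abstract.
    """
    result = []
    for x in points:
        fx = phi(x)
        if sum(ai * fi for ai, fi in zip(a, fx)) <= b:
            result.append(x)
    return result


def phi(x: Vector) -> Vector:
    # Hypothetical application-specific function, fixed before query time.
    return (x[0] ** 2, x[0] * x[1])


points = [(1.0, 2.0), (3.0, 1.0), (0.5, 0.5), (2.0, 2.0)]

# The parameters a and b arrive only when the query is issued.
hits = scalar_product_query(points, phi, a=(1.0, 1.0), b=5.0)
```

Because `phi` is known in advance while `a` and `b` are not, the values `phi(x)` can be precomputed once per point, which is the property an index over φ(x) exploits.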