LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification

Jiang, Ting; Wang, Deqing; Sun, Leilei; Yang, Huayi; Zhao, Zhengyang; Zhuang, Fuzhen

doi:10.48550/arxiv.2101.03305

Cited by 5 publications

(19 citation statements)

References 16 publications

“…Although previous works leverage negative sampling to alleviate the problem (Jiang et al, 2021; Chang et al, 2020), we argue that it is important to initialize the label embedding with the label side information.…”

Section: Rethinking Dense and Sparse XMTC (mentioning)

confidence: 99%

“…The XML-CNN (Liu et al, 2017) and SLICE (Jain et al, 2019) employ the convolutional neural network on word embeddings for document representation. More recently, X-Transformer (Chang et al, 2020), LightXML (Jiang et al, 2021) and APLC-XLNet tames large pre-trained Transformer models to encode the input document into a fixed vector. AttentionXML (You et al, 2018) applies a label-word attention mechanism to calculate label-aware document embeddings, but it requires more computational cost proportional to the document length.…”

Section: Related Work (mentioning)

confidence: 99%

“…The DEPL+c model looks like an ensemble of the two systems at the first sight, but there are two major differences: 1) As the BERT encoder is shared between the classification and retrieval modules, it doesn't significantly increase the number of parameters as in (Chang et al, 2020; Jiang et al, 2021); and 2) when the two modules are optimized together, the system can take advantages of both units according to the situation of head or tail label predictions.…”

Section: Enhance Classification With Retrieval (mentioning)

confidence: 99%

“…For the tail label evaluation, our method is compared with the SOTA deep learning models including X-Transformer (Chang et al, 2020), XLNet-APLC, LightXML (Jiang et al, 2021), and AttentionXML (You et al, 2018). X-Transformer, LightXML, and XLNet-APLC employ pre-trained Transformers for document representation.…”

Section: Baselines (mentioning)

confidence: 99%

Long-tailed Extreme Multi-label Text Classification with Generated Pseudo Label Descriptions

Ruohong, Yau-Shian, Yang et al. 2022

Preprint

Extreme Multi-label Text Classification (XMTC) has been a tough challenge in machine learning research and applications due to the sheer size of the label spaces and the severe data scarcity associated with the long tail of rare labels in highly skewed distributions. This paper addresses the challenge of tail label prediction by proposing a novel approach that combines the effectiveness of a trained bag-of-words (BoW) classifier in generating informative label descriptions under severe data scarcity with the power of neural embedding-based retrieval models in mapping input documents (as queries) to relevant label descriptions. The proposed approach achieves state-of-the-art performance on XMTC benchmark datasets and significantly outperforms the best methods so far in tail label prediction. We also provide a theoretical analysis relating the BoW and neural models with respect to a performance lower bound.

“…This problem is also present for other machine learning techniques that do self-supervision through contrastive learning in different domains such as computer vision, natural language and graphs [21,52,70]. For example the well-known word2vec [38] word embedding technique randomly samples words that are not relevant for the context (other words in the sentence) to distinguish from the actual word that is part of the context.…”

Section: Negative Sampling For Ranking (mentioning)

confidence: 99%
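
The negative sampling idea described in the statement above can be sketched in a few lines. This is a minimal illustration only, not code from any of the cited papers: the function name `sample_negatives`, the toy vocabulary, and the uniform sampling are all assumptions made for the example. word2vec itself draws negatives from a smoothed unigram distribution rather than uniformly.

```python
import random


def sample_negatives(vocab, context, k, seed=None):
    """Draw k distinct words from vocab that do not appear in the context.

    Hypothetical helper for illustration; sampling is uniform, which is a
    simplification of what word2vec does.
    """
    rng = random.Random(seed)
    excluded = set(context)
    # Candidate negatives are all vocabulary words outside the context.
    candidates = [w for w in vocab if w not in excluded]
    if k > len(candidates):
        raise ValueError("not enough non-context words to sample from")
    return rng.sample(candidates, k)


# Toy data, made up for the example.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "tree"]
context = ["the", "cat", "sat"]
negatives = sample_negatives(vocab, context, k=3, seed=0)
```

A model trained with this scheme would then learn to score the true context word above each of the sampled `negatives`.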

Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues

Penha, Hauff 2022

Preprint

Ranking responses for a given dialogue context is a popular benchmark in which the setup is to re-rank the ground-truth response over a limited set of 𝑛 responses, where 𝑛 is typically 10. The predominance of this setup in conversation response ranking has led to a great deal of attention to building neural re-rankers, while the first-stage retrieval step has been overlooked. Since the correct answer is always available in the candidate list of 𝑛 responses, this artificial evaluation setup assumes that there is a first-stage retrieval step which is always able to rank the correct response in its top-𝑛 list. In this paper we focus on the more realistic task of full-rank retrieval of responses, where 𝑛 can be up to millions of responses. We investigate both dialogue context and response expansion techniques for sparse retrieval, as well as zero-shot and fine-tuned dense retrieval approaches. Our findings, based on three different information-seeking dialogue datasets, reveal that a learned response expansion technique is a solid baseline for sparse retrieval. We find the best performing method overall to be dense retrieval with intermediate training (a step after the language model pre-training where sentence representations are learned) followed by fine-tuning on the target conversational data. We also investigate the intriguing phenomenon that harder negative sampling techniques lead to worse results for the fine-tuned dense retrieval models. The code and datasets are available at https://github.com/Guzpenha/transformer_rankers/tree/full_rank_retrieval_dialogues.

Towards Extreme Multi-label Text Classification Through Group-wise Label Ranking

Xiong, Li, Duan et al. 2023

International Conferences on Software Engineering and Knowledge Engineering
