Learning Document Representations Using Subspace Multinomial Model

Kesiraju, Santosh; Burget, Lukáš; Szőke, Igor; Černocký, Jan

doi:10.21437/interspeech.2016-1634

Cited by 12 publications

(17 citation statements)

References 14 publications

“…In this work, we use the l1-SMM variant [11], which adds regularization term to the objective function (3): An l1 regularization on the entries of matrix T and l2 regularization on the i-vectors themselves.…”

Section: Subspace Multinomial Model (mentioning)

confidence: 99%

“…For details of the training procedure, see the respective paper [11]. We use a publicly available implementation 2 of l1-SMM to obtain i-vectors.…”

Section: Subspace Multinomial Model (mentioning)

confidence: 99%

“…Both regularization coefficients (l1 on T and l2 on i_d) were set to 10^-4. We have trained the models using orthant-wise learning [11] until convergence. Figure 3 shows how partial i-vectors evolve in the latent space.…”

Section: Behaviour Of Partial I-vectors (mentioning)

confidence: 99%

“…To address these issues, we propose to enhance FN-LMs by a document summary i-vector estimated from a variant of Subspace Multinomial Model (SMM) [11]. This way, we allow the model to exploit a long context.…”

Section: Introduction (mentioning)

confidence: 99%

i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models

Beneš, Kesiraju, Burget. Interspeech 2018. (Self-citation)

We show an effective way of adding context information to shallow neural language models. We propose to use Subspace Multinomial Model (SMM) for context modeling and we add the extracted i-vectors in a computationally efficient way. By adding this information, we shrink the gap between shallow feed-forward network and an LSTM from 65 to 31 points of perplexity on the Wikitext-2 corpus (in the case of neural 5-gram model). Furthermore, we show that SMM i-vectors are suitable for domain adaptation and a very small amount of adaptation data (e.g. endmost 5 % of a Wikipedia article) brings a substantial improvement. Our proposed changes are compatible with most optimization techniques used for shallow feedforward LMs.

“…There are several natural language (NLP) processing tasks that involve such long sequences. Of particular interest are topic identification of spoken conversations [4,5,6] and call center customer satisfaction prediction [7,8,9,10]. Call center conversations, while usually quite short and to the point, often involve agents trying to solve very complex issues that the customers experience, resulting in some calls taking even an hour or more.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Transformers for Long Document Classification

Pappagari, Żelasko, Villalba, et al. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019.

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations: applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. Then, we propagate each output through a single recurrent layer, or another transformer, followed by a softmax activation. We obtain the final classification decision after the last segment has been consumed. We show that both BERT extensions are quick to fine-tune and converge after as little as 1 epoch of training on a small, domain-specific data set. We successfully apply them in three different tasks involving customer call satisfaction prediction and topic classification, and obtain a significant improvement over the baseline models in two of them.

KNN-Based Pseudo-supervised RCNN Framework for Text Clustering

Chen, Guo. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 2019.
