We propose to learn distributed sentence representations from the visual features of text. Unlike existing methods that render the words (or characters) of a sentence into separate images, we stack these images into a 3-dimensional sentence tensor. Multiple 3-dimensional convolutions with different kernel lengths along the third dimension are then applied to the sentence tensor, acting jointly as bi-gram, tri-gram, four-gram, and even five-gram detectors. Similar to Bi-LSTMs, these n-gram detectors learn both forward and backward distributional semantic knowledge from the sentence tensor: the model applies bi-directional convolutions so that the text embedding respects the semantic order of words, and the feature maps from the two directions are concatenated to learn the final sentence embedding. Our model involves only a single convolutional layer, which makes it easy and fast to train. We evaluate the resulting sentence embeddings on several downstream natural language processing (NLP) tasks, on which the proposed model achieves surprisingly strong performance.
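The sketch below illustrates the kind of architecture the abstract describes, written in PyTorch. The image resolution, number of filters, activation, max-over-positions pooling, and the use of sequence reversal for the backward direction are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: 3-D n-gram convolutions over a tensor of rendered word images,
# applied in both word orders, with the two directions' features concatenated.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualSentenceEncoder(nn.Module):
    """Encodes a sentence tensor of rendered word images with 3-D n-gram convolutions."""

    def __init__(self, img_h=16, img_w=16, n_filters=64, ngram_sizes=(2, 3, 4, 5)):
        super().__init__()
        # One 3-D convolution per n-gram size; the kernel depth (third dimension)
        # spans n consecutive word images, so each branch acts as an n-gram detector.
        self.convs = nn.ModuleList([
            nn.Conv3d(in_channels=1, out_channels=n_filters,
                      kernel_size=(n, img_h, img_w))
            for n in ngram_sizes
        ])

    def _encode_direction(self, x):
        # x: (batch, 1, seq_len, img_h, img_w) -> (batch, n_filters * len(ngram_sizes))
        feats = []
        for conv in self.convs:
            h = F.relu(conv(x)).squeeze(-1).squeeze(-1)            # (batch, n_filters, seq_len - n + 1)
            feats.append(F.max_pool1d(h, h.size(-1)).squeeze(-1))  # max over n-gram positions
        return torch.cat(feats, dim=1)

    def forward(self, sent_tensor):
        # sent_tensor: (batch, seq_len, img_h, img_w) of rendered word images
        x = sent_tensor.unsqueeze(1)                            # add channel dimension
        fwd = self._encode_direction(x)                         # forward word order
        bwd = self._encode_direction(torch.flip(x, dims=[2]))   # reversed word order
        return torch.cat([fwd, bwd], dim=1)                     # bi-directional sentence embedding


# Usage: 8 sentences of 20 words, each word rendered as a 16x16 grayscale image.
images = torch.rand(8, 20, 16, 16)
embedding = VisualSentenceEncoder()(images)
print(embedding.shape)  # torch.Size([8, 512])  (64 filters x 4 n-gram sizes x 2 directions)
```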