2017
DOI: 10.1162/tacl_a_00051
Enriching Word Vectors with Subword Information

Abstract: Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to eac…
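The bag-of-character-n-grams representation described in the abstract can be sketched in a few lines. The boundary markers `<` and `>` and the inclusion of the full word as an extra feature follow the paper; the function name is illustrative:

```python
def char_ngrams(word, n=3):
    # Wrap the word in boundary markers so that prefixes and suffixes
    # are distinguished from word-internal n-grams.
    marked = f"<{word}>"
    grams = [marked[i:i + n] for i in range(len(marked) - n + 1)]
    grams.append(marked)  # the whole word is also kept as one feature
    return grams

print(char_ngrams("where"))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

Because `her` in "where" and `her` in "her" map to different strings than the standalone word `<her>`, the boundary markers keep affix information separate from full words.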

Cited by 7,648 publications (5,950 citation statements); references 24 publications.
“…The most widely used word embedding models are Word2Vec, 20 Global Vectors (GloVe), 21 and FastText. 35 Word2Vec 20 is perhaps the most popular neural-network-based language model. Word2Vec has two architectures: Skip-gram and Continuous Bag-of-Words (CBOW).…”
Section: Deep Learning-based Similarity Measures (mentioning)
confidence: 99%
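The two Word2Vec architectures named in the statement above differ only in the direction of prediction: Skip-gram predicts each context word from the center word, while CBOW predicts the center word from the bag of its context words. A minimal sketch of how the training examples are formed (the window size and tokenization here are illustrative):

```python
def skipgram_pairs(tokens, window=2):
    # Skip-gram: one (center, context) training pair per context position.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    # CBOW: predict the center word from the bag of its context words.
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

print(skipgram_pairs(["the", "cat", "sat"]))
# [('the', 'cat'), ('the', 'sat'), ('cat', 'the'),
#  ('cat', 'sat'), ('sat', 'the'), ('sat', 'cat')]
```

In an actual model each pair or example feeds a shallow network whose learned input weights become the word vectors; this sketch shows only the example construction.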
“…The main idea is to build the word embedding of a word from its context words in a large corpus of text. The most widely used word embedding models are Word2Vec, Global Vectors (GloVe), and FastText.…”
Section: Introduction (mentioning)
confidence: 99%
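FastText, cited above, extends Skip-gram with the subword idea proposed in this paper: a word's vector is the sum of the vectors of its character n-grams, so rare and unseen words still receive representations through shared subwords. A toy sketch, where the 4-dimensional random vectors and the dict lookup stand in for FastText's real hashed embedding table:

```python
import random

def ngrams(word, n=3):
    marked = f"<{word}>"  # boundary markers, as in the paper
    return [marked[i:i + n] for i in range(len(marked) - n + 1)] + [marked]

def word_vector(word, table, dim=4):
    # Sum the vectors of all character n-grams; unseen n-grams get a
    # fresh random vector here (a stand-in for FastText's hashing trick).
    vec = [0.0] * dim
    for g in ngrams(word):
        if g not in table:
            table[g] = [random.uniform(-0.1, 0.1) for _ in range(dim)]
        for k in range(dim):
            vec[k] += table[g][k]
    return vec

table = {}
v = word_vector("where", table)  # fills 6 n-gram entries
print(set(ngrams("where")) & set(ngrams("here")))
# "where" and "here" share subwords such as 'her' and 'ere'
```

Because morphologically related words share many n-grams, their summed vectors start out correlated, which is exactly the advantage the abstract claims for languages with many rare words.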
“…In order to do this, we turned to the UMLS SPECIALIST NLP toolset [1] as well as [16] and [9, 23]. Our process for constructing 𝒜 is summarized in Figure 2.…”
Section: Knowledge Network Construction (mentioning)
confidence: 99%
“…By using state-of-the-art tools, such as [16] and [9], we are able to find novel hypotheses without restricting the domain of our knowledge network or the resulting vocabulary when creating topics. As a result, is more generalized and yet still capable of identifying useful hypotheses.…”
Section: Introduction (mentioning)
confidence: 99%