Proceedings of the 1st Workshop on Representation Learning for NLP 2016
DOI: 10.18653/v1/w16-1609

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

Abstract: Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This paper presents a rigorous empirical evaluation of doc2vec over two tasks. We compare doc2vec to two baselines and two state-of-the-art document embedding methodologies. We found that doc2vec performs robustly when using models trained on large external corpora, and can be f…
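For readers unfamiliar with the model, here is a minimal training sketch using gensim's Doc2Vec implementation; the toy corpus and hyperparameters are illustrative assumptions, not the paper's experimental settings.

```python
# Minimal doc2vec training sketch with gensim.
# Corpus and hyperparameters are illustrative assumptions only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the cat sat on the mat",
    "dogs are loyal companions",
    "word embeddings capture semantics",
]
# Each document gets a unique tag; its embedding is looked up by that tag.
documents = [TaggedDocument(words=text.split(), tags=[i])
             for i, text in enumerate(corpus)]

# dm=0 selects PV-DBOW, dm=1 selects PV-DM: the two doc2vec
# architectures compared in the paper.
model = Doc2Vec(documents, vector_size=50, window=5, min_count=1,
                dm=0, epochs=40, workers=1)

print(model.dv[0])  # learned embedding of the first training document
```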

Cited by 440 publications (196 citation statements). References 11 publications.

“…However, no reports in the relevant literature describe an attempt to give meaning to each hidden node. One model of paragraph vectors (PV-DBOW) [5] uses pre-trained word embeddings that reportedly improve task performance [18]. Although this paper shows the possibility of learning proper document embeddings with good initialization of word embeddings, it does not demonstrate the possibility of interpretation of hidden nodes.…”
Section: Related Work
confidence: 94%
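The finding cited as [18] is that PV-DBOW works better when its word vectors are seeded from pre-trained embeddings. A rough sketch of that idea with gensim follows; the embedding file path is a placeholder, the corpus and hyperparameters are assumptions, and the manual vector-copying step is one possible way to do the seeding, not the cited paper's exact procedure.

```python
# Sketch: seed PV-DBOW's word vectors from pre-trained embeddings
# before training. "pretrained.bin" is a placeholder path.
from gensim.models import KeyedVectors
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

documents = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(
    ["pretrained embeddings improve doc2vec", "good initialization matters"])]

pretrained = KeyedVectors.load_word2vec_format("pretrained.bin", binary=True)

# dbow_words=1 makes PV-DBOW also update word vectors during training,
# so the seeded vectors participate instead of staying frozen.
model = Doc2Vec(vector_size=pretrained.vector_size, dm=0, dbow_words=1,
                min_count=1, epochs=40)
model.build_vocab(documents)

# Copy pre-trained vectors for every in-vocabulary word.
for word in model.wv.index_to_key:
    if word in pretrained:
        model.wv.vectors[model.wv.get_index(word)] = pretrained[word]

model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)
```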
“…However, word embeddings came to the foreground through Mikolov, Chen, Corrado, and Dean (2013), who presented the popular Continuous Bag‐of‐Words model (CBOW) and the Continuous Skip‐gram model. Additionally, sentence embeddings (Doc2Vec (Lau & Baldwin, 2016) or Sent2vec (Pagliardini, Gupta, & Jaggi, 2018)) as well as the popular GloVe (Global Vectors) (Pennington, Socher, & Manning, 2014) method are utilized by keyphrase extraction methods.…”
Section: Unsupervised Methods
confidence: 99%
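As a quick illustration of the CBOW versus Skip-gram distinction named in this statement, a gensim sketch; the toy sentences are made up, and the sg flag is what switches between the two architectures.

```python
# Sketch: the two word2vec architectures, selected via the sg flag.
# Toy corpus and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

sentences = [["keyphrase", "extraction", "uses", "embeddings"],
             ["glove", "and", "word2vec", "learn", "word", "vectors"]]

cbow = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1)

# Nearest neighbours in the learned (toy) vector space.
print(skipgram.wv.most_similar("embeddings", topn=3))
```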
“…Word‐embedding approaches have been applied to various problems, such as text classification, text clustering, and textual similarity tasks. Text similarity can be measured through document or word embeddings. Recently, word‐embedding approaches have been mainly used for generating the features of the classifier, such as the support vector machine (SVM), the convolutional neural network (CNN), or the recurrent neural network (RNN).…”
Section: Related Work
confidence: 99%
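A hedged sketch of the pipeline this statement describes: embedding vectors used both for similarity measurement and as classifier features. Random vectors stand in for real doc2vec outputs here, and the labels are made up.

```python
# Sketch: document embeddings as SVM features, plus cosine similarity
# between two documents. Data and labels are fabricated placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
doc_vectors = rng.normal(size=(6, 50))   # stand-in for doc2vec outputs
labels = np.array([0, 0, 0, 1, 1, 1])    # stand-in class labels

# Embeddings as input features for a classifier.
clf = SVC(kernel="linear").fit(doc_vectors, labels)
print(clf.predict(doc_vectors[:2]))

# Text similarity measured directly on the embedding vectors.
print(cosine_similarity(doc_vectors[0:1], doc_vectors[1:2])[0, 0])
```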
“…For a forum question duplication task and semantic textual similarity task, document‐embedding‐based approaches perform better than word‐embedding‐based approaches, while a PV‐DBOW‐based approach performs better than PV‐DM. Therefore, the proposed model utilizes the document‐embedding method PV‐DBOW to obtain a pre‐trained document vector.…”
Section: Learning Document Embedding With Training Set
confidence: 99%
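A minimal sketch of that step, assuming gensim's PV-DBOW (dm=0): train on the task's training set, then infer a fixed-length vector for any document to serve as the downstream model's input. Corpus, tokenization, and hyperparameters are illustrative assumptions.

```python
# Sketch: obtain a pre-trained document vector from PV-DBOW for
# downstream use. Training data and settings are placeholders.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

train = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(
    ["duplicate question detection", "semantic textual similarity"])]

pv_dbow = Doc2Vec(train, dm=0, vector_size=50, min_count=1, epochs=40)

# A fixed-length vector for a new, unseen document; a downstream model
# (e.g. a classifier) consumes this as its input feature.
vec = pv_dbow.infer_vector("are these two questions duplicates".split())
print(vec.shape)  # (50,)
```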