2022
DOI: 10.48550/arxiv.2210.16848
Preprint

Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

Abstract: Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; 2) proposing a p…
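The abstract is truncated, so the sketch below illustrates only point 1, under one plausible reading: a standard Skip-gram negative-sampling objective combined with an auxiliary term that pulls each static vector toward a frozen contextual "teacher" embedding (e.g., per-word averaged hidden states from a pre-trained model). This is a minimal, assumption-laden sketch, not the paper's implementation; `teacher_vecs`, `lambda_ctx`, and all names here are hypothetical.

```python
# Hypothetical sketch: Skip-gram with negative sampling plus an auxiliary
# term aligning static vectors to frozen contextual teacher embeddings.
# `teacher_vecs`, `lambda_ctx`, and all names are illustrative assumptions,
# not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextGuidedSkipGram(nn.Module):
    def __init__(self, vocab_size, dim, teacher_vecs, lambda_ctx=0.1):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, dim)   # center-word vectors
        self.out_emb = nn.Embedding(vocab_size, dim)  # context-word vectors
        # Frozen per-word contextual embeddings, e.g. averaged BERT states.
        self.register_buffer("teacher", teacher_vecs)
        self.lambda_ctx = lambda_ctx

    def forward(self, center, context, negatives):
        v = self.in_emb(center)                       # (B, d)
        u_pos = self.out_emb(context)                 # (B, d)
        u_neg = self.out_emb(negatives)               # (B, k, d)

        # Standard Skip-gram negative-sampling objective.
        pos = F.logsigmoid((v * u_pos).sum(-1))                           # (B,)
        neg = F.logsigmoid(-(u_neg @ v.unsqueeze(-1)).squeeze(-1)).sum(-1)
        sg_loss = -(pos + neg).mean()

        # Auxiliary term: pull static vectors toward teacher embeddings.
        ctx_loss = 1.0 - F.cosine_similarity(v, self.teacher[center]).mean()
        return sg_loss + self.lambda_ctx * ctx_loss

# Example usage with toy data (hypothetical shapes):
model = ContextGuidedSkipGram(vocab_size=1000, dim=64,
                              teacher_vecs=torch.randn(1000, 64))
loss = model(torch.randint(0, 1000, (32,)),
             torch.randint(0, 1000, (32,)),
             torch.randint(0, 1000, (32, 5)))
```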

Cited by 2 publications (2 citation statements, published 2023)
References 3 publications
“…In previous SLR works, cross-modal alignment focuses only on positive samples [15,48]. Inspired by contrastive learning [5,16,45], we construct both positive and negative samples within the same mini-batch and apply a contrastive cross-modal alignment method that pulls similar features closer while pushing dissimilar ones apart. The normalized spatial features from the CNN are given as $S_{\text{logits}} \in \mathbb{R}^{B \times T \times d}$ and the normalized temporal features from the VAE as $V_{\text{logits}} \in \mathbb{R}^{B \times T \times d}$, where $B$ denotes the number of samples.…”
Section: Contrastive Alignment
confidence: 99%
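As a minimal sketch of the contrastive cross-modal alignment the citing authors describe (under assumed shapes and a symmetric InfoNCE-style loss, not their actual code), the snippet below treats time-aligned CNN/VAE feature pairs in a mini-batch as positives and all other pairs as negatives; the temperature `tau` and the function name are assumptions.

```python
# Minimal sketch of contrastive cross-modal alignment between normalized
# spatial (CNN) and temporal (VAE) features of shape (B, T, d).
# The temperature `tau` and all names are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_alignment(s_logits, v_logits, tau=0.07):
    B, T, d = s_logits.shape
    s = F.normalize(s_logits.reshape(B * T, d), dim=-1)
    v = F.normalize(v_logits.reshape(B * T, d), dim=-1)
    sim = s @ v.t() / tau                       # (B*T, B*T) similarity matrix
    targets = torch.arange(B * T, device=sim.device)
    # Matching (spatial, temporal) pairs on the diagonal are positives;
    # every other pair in the mini-batch serves as a negative.
    loss_s2v = F.cross_entropy(sim, targets)
    loss_v2s = F.cross_entropy(sim.t(), targets)
    return 0.5 * (loss_s2v + loss_v2s)

# Example: batch of 4 samples, 16 time steps, 512-dim features.
loss = contrastive_alignment(torch.randn(4, 16, 512), torch.randn(4, 16, 512))
```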
“…More recently, hybrid architectures combining GNNs and transformers (Rong et al., 2020; Ying et al., 2021; Min et al., 2022) have emerged to capture the topological structure of molecular graphs. Additionally, because labels for molecules are often expensive to obtain or incorrect (Xia et al., 2021; Tan et al., 2021; Xia et al., 2022a), the emerging self-supervised pre-training strategies (You et al., 2020; Xia et al., 2022c; Yue et al., 2022; Liu et al., 2023) on graph-structured data are promising for molecular graph data (Hu et al., 2020; Xia et al., 2023a; Gao et al., 2022), mirroring the overwhelming success of pre-trained language models in the natural language processing community (Devlin et al., 2019; Zheng et al., 2022).…”
Section: C.3 2D and 3D Graph-based Molecular Descriptors
confidence: 99%
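For readers unfamiliar with the self-supervised graph pre-training this passage cites, below is a heavily hedged, GraphCL-style sketch (contrastive learning over two augmented views of each graph, in the spirit of You et al., 2020). The tiny mean-pooling encoder, the node-dropping augmentation, and all names are illustrative assumptions, not any cited paper's method.

```python
# Hedged sketch of GraphCL-style contrastive pre-training on graphs,
# in plain PyTorch; encoder, augmentation, and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGraphEncoder(nn.Module):
    """One round of mean-neighbor message passing + mean pooling."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        h = F.relu(self.lin(adj @ x / deg))   # aggregate neighbor features
        return h.mean(0)                      # graph-level embedding

def drop_nodes(x, adj, p=0.2):
    """Augmentation: randomly drop a fraction p of the nodes."""
    keep = torch.rand(x.size(0)) > p
    if keep.sum() == 0:
        keep[0] = True                        # always keep at least one node
    return x[keep], adj[keep][:, keep]

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss: view pairs of the same graph are positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(sim, targets)

# Usage: two augmented views of each graph in a batch become positives.
enc = TinyGraphEncoder(in_dim=8, hid_dim=32)
graphs = [(torch.randn(10, 8), (torch.rand(10, 10) > 0.7).float())
          for _ in range(4)]
z1 = torch.stack([enc(*drop_nodes(x, a)) for x, a in graphs])
z2 = torch.stack([enc(*drop_nodes(x, a)) for x, a in graphs])
loss = nt_xent(z1, z2)
```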