2022
DOI: 10.1089/cmb.2021.0536

kmer2vec: A Novel Method for Comparing DNA Sequences by word2vec Embedding

Cited by 19 publications (6 citation statements)
References 16 publications
“…However, current mainstream feature coding methods typically encode DNA sequences in the form of individual vectors, which does not meet the input format requirements of our network framework. Therefore, we proposed three simple feature matrix coding methods and one complex feature matrix coding method, including One-hot coding, NCP coding, One-hot + NCP coding and Word2Vec coding [34]. Simple feature codes have only two numbers 0 and 1, they are very simple and efficient.…”
Section: Results (mentioning)
confidence: 99%
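
The "simple feature matrix" idea in the statement above can be illustrated with a minimal Python sketch of one-hot coding, in which each base becomes one row of 0/1 values. The column order `ACGT`, the function name `one_hot_matrix`, and the all-zero row for unknown symbols are assumptions made for illustration; the quoted text does not specify them.

```python
# Assumed column order; the quoted statement does not give one.
ALPHABET = "ACGT"


def one_hot_matrix(seq):
    """Return an L x 4 matrix of 0/1 values, one row per base in seq."""
    rows = []
    for base in seq.upper():
        # Symbols outside ACGT (e.g. N) become an all-zero row (an assumption).
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


for row in one_hot_matrix("ACGTN"):
    print(row)
```

The result is a matrix rather than a single flat vector, which is the property the citing authors say their network input requires.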
“…For example, researchers have adapted it for the construction of universal feature vectors for small molecules [9,27]. Also, it has been employed to create meaningful representations of nucleic acids for phylogenetic analysis [23], predicting drug-miRNA associations [8] and RNA degradation prediction [13]. Additionally, word2vec embeddings have been utilized for proteins in tasks such as drug-target interaction [30], drug-target affinity [32], protein-protein interaction [31], and others.…”
Section: Theory (mentioning)
confidence: 99%
“…Subsequently, we applied a novel method firstly introduced in this paper, referred to as merged CGR, to convert the promoter sequence into image data capturing evolutionary information. Alongside image information, we applied the kmer2vec method [34] to extract textual information from the promoter sequences.…”
Section: Overview of the Model Architecture (mentioning)
confidence: 99%
“…To better extract textual information from promoter sequences, we employed the word2vec method to obtain k-mer word embeddings in promoter sequences, following the specific strategies of the kmer2vec method [34]. Initially, we divided all promoter sequences in dataset_pro into a series of 3-mers using an overlapping division, treating them as complete text.…”
Section: Word2vec Word Embedding (mentioning)
confidence: 99%
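
The overlapping 3-mer division described in the statement above can be sketched in Python as follows. This is a minimal illustration, not the citing authors' code: the function name `overlapping_kmers` and the toy sequences are hypothetical, and a stride of 1 is assumed as the meaning of "overlapping division".

```python
def overlapping_kmers(seq, k=3):
    """Split seq into overlapping k-mers, advancing one base at a time."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical toy sequences standing in for promoter sequences.
sequences = ["ATGCGT", "GGCAT"]

# Each sequence becomes one "sentence" whose "words" are its 3-mers.
corpus = [overlapping_kmers(s) for s in sequences]
print(corpus[0])  # ['ATG', 'TGC', 'GCG', 'CGT']
```

A token list of this shape is what word2vec implementations typically take as input, one list per sequence. The statement does not say which implementation or hyperparameters were used, so none are assumed here.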