2018
DOI: 10.1101/401851
Preprint

Nucl2Vec: Local alignment of DNA sequences using Distributed Vector Representation

Abstract: Next Generation Sequencing (NGS) has provided an affordable and fast method for generating genetic data. Generating a whole genome sequence and extracting relevant information from this data is still a computationally expensive process. In this paper we demonstrate a novel approach for local alignment of DNA reads with respect to a reference genome. For this process we have used a Skip-gram model to create the encoding (Nucl2Vec) and k-nearest neighbors for the alignment. With our new approach we have reduce…

Citations: cited by 2 publications (4 citation statements)
References: 7 publications
“…Aoki and Sakakibara [12] have used Word2Vec encoding to do alignment and generate motif of non-coding regions of RNA. Nucl2Vec [13] also uses encoding similar to word2Vec and uses KNN algorithm to do alignment of sequencer data.…”
Section: Current Methodology (mentioning)
confidence: 99%
“…Due to their computational demands, heuristic-based methods are less commonly used than probabilistic methods [4]. Some studies have been conducted which use machine learning techniques for indel identification using random forests [14]. But still GATK is the de facto industry standard for identification of SNVs in Illumina datasets.…”
Section: Current Methodology (mentioning)
confidence: 99%
“…Nucl2vec. Another similar approach, Nucl2vec [10] trains word2vec on k-mers with the goal of aligning a small query sequence to a long reference genome. Due to the fact that the query might not match exactly with the reference due to natural divergence and sequencing errors, Ganesh et al modify word2vec to accommodate for insertions and substitutions.…”
Section: Distributed Representations (mentioning)
confidence: 99%
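
The citing statement above describes the general idea as encoding k-mers and then matching a short query against a long reference. A minimal sketch of that nearest-neighbor matching step follows, under loud assumptions: it uses plain k-mer count vectors as a stand-in for the learned word2vec/skip-gram embeddings, compares windows by cosine similarity, and picks k = 4 purely for illustration. The names `kmers`, `embed`, `cosine`, and `nearest_windows` are hypothetical and are not taken from the Nucl2Vec preprint.

```python
import math
from collections import Counter
from itertools import product

K = 4  # illustrative k-mer length (an assumption of this sketch)
VOCAB = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(VOCAB)}


def kmers(seq, k=K):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq):
    """Stand-in encoding: a k-mer count vector, NOT a learned skip-gram embedding."""
    vec = [0.0] * len(VOCAB)
    for kmer, n in Counter(kmers(seq)).items():
        if kmer in INDEX:  # skip k-mers containing ambiguous bases such as N
            vec[INDEX[kmer]] = float(n)
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_windows(query, reference, n_neighbors=1):
    """Return the n_neighbors reference windows most similar to query.

    Each result is (start_position, similarity); ties go to the lower position.
    """
    width = len(query)
    q = embed(query)
    scored = [
        (cosine(q, embed(reference[start:start + width])), start)
        for start in range(len(reference) - width + 1)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(start, score) for score, start in scored[:n_neighbors]]
```

Calling `nearest_windows(read, reference, n_neighbors=3)` returns candidate start positions ranked by similarity. Because the comparison is made on vectors rather than exact strings, a read with a substitution can still rank its true origin first, which is the property the citing statement attributes to the embedding-based approach. A brute-force scan like this one is only practical for toy inputs.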
“…Genomic Signatures are an effective method of discriminating between sequences from different organisms. [Table excerpt, column headers not recoverable: [42], 20-mers, 108; dna2vec [31], k-mers with k ∈ [3, 8], 100; Genomic Signatures [35], 5-mers, 16; N-gram graph [22,35], 5-mers, 3-mers, 2,6; Nucl2vec [10], 4-mers, 1…]”
Section: Genomic Signatures (GS) (mentioning)
confidence: 99%