2019
DOI: 10.1038/s41598-019-42966-5

SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning

Abstract: Multiple sequence alignment (MSA) is an integral part of molecular biology. However, handling a massive number of large sequences is still a bottleneck for most state-of-the-art software tools. Knowledge-driven algorithms that utilize features of the input sequences, such as the high similarity among DNA sequences, can help improve the efficiency of DNA MSA to assist in phylogenetic tree construction, comparative genomics, etc. This article showcases the benefit of utilizing similarity features while performing …

citations
Cited by 7 publications
(3 citation statements)
references
References 26 publications
(18 reference statements)
“…To deal with the computational complexity of this task, various heuristics have been proposed in the literature. SPARK-MSNA is an MSA algorithm on Spark proposed by Vineetha et al. (2019) [56]. The algorithm uses both a suffix tree and a modified Needleman-Wunsch algorithm.…”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%
“…SparkBAW aims to speed up the alignment phase of DNA sequence analysis by targeting short-read mapping. Other Spark-based multiple sequence alignment implementations are PASTASpark [26] and [27], with a supervised learning approach. In-memory data analytics applications that process columnar data, such as ArrowSAM [29], which employs Apache Arrow, have also been reported in the literature. In PipeMEM [33], a pipeline parallel pattern that ensures no local disk access, the authors optimized the computation phase by employing standard stream and PipeRDD.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [26], existing algorithms such as Needleman-Wunsch, Smith-Waterman, and BLAST are employed along with Hadoop or other big data technologies to scale down the time, memory, and CPU consumption. Spark MSNA (Multiple Sequence Nucleotide Alignment) services are used to compare the suffix tree approach [27]. FASTdoop [28] is able to load FASTA and FASTQ input files for bioinformatics applications on the MapReduce framework.…”
Section: Related Work (mentioning)
confidence: 99%