2014
DOI: 10.1016/j.jda.2014.03.001

An elegant algorithm for the construction of suffix arrays

Abstract: The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It was introduced as a memory-efficient alternative to suffix trees. The suffix array consists of the sorted suffixes of a string, each represented by its starting position. There are several linear-time suffix array construction algorithms (SACAs) known in the literature. However, one of the fastest algorithms in practice has a worst-case run time of O(n²). The problem of designing practically and the…
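To make "the sorted suffixes of a string" concrete, here is a minimal Python sketch of suffix array construction by prefix doubling, the idea behind Manber and Myers' original construction. It is not the algorithm of this paper and not a linear-time SACA: with Python's comparison sort it runs in O(n log² n) time. The function name `suffix_array` is illustrative.

```python
def suffix_array(text: str) -> list[int]:
    """Prefix-doubling suffix array construction (illustrative, O(n log^2 n))."""
    n = len(text)
    if n == 0:
        return []
    rank = [ord(c) for c in text]  # initial rank: first character of each suffix
    sa = list(range(n))
    k = 1
    while True:
        # Key of suffix i: rank of its first k chars, then rank of the next k chars
        # (-1 when the suffix is too short, so a proper prefix sorts first).
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        sa.sort(key=keys.__getitem__)
        # Re-rank: suffixes whose 2k-character prefixes are equal share a rank.
        new_rank = [0] * n
        for idx in range(1, n):
            prev, cur = sa[idx - 1], sa[idx]
            new_rank[cur] = new_rank[prev] + (1 if keys[prev] != keys[cur] else 0)
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: suffixes fully sorted
            return sa
        k *= 2
```

For example, `suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]`, the starting positions of a, ana, anana, banana, na, nana in sorted order.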

Cited by 14 publications (9 citation statements)
References 31 publications
“…All convolutions were performed using the FFTW [28] library, version 3.3.3. We used the suffix array algorithm RadixSA of [29]. Figure 1 shows run times for varying the length of the text n. All algorithms scale linearly with the length of the text.…”
Section: Results (mentioning; confidence: 99%)
“…We implemented k-mer counting using a generalized suffix array and the derived longest common prefix (LCP) array. The generalized suffix array SA is created from the concatenated reads (delimited by special characters such as $) using a linear-time algorithm [33]. Then, we create the LCP array using both the suffix array SA and the reversed [i.e., inverse] suffix array SA′ [34, 35].…”
Section: Methods (mentioning; confidence: 99%)
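The k-mer counting scheme in the statement above can be sketched in Python under stated assumptions: reads never contain the separator `$`, the generalized suffix array is built by naive suffix sorting (a stand-in, not the linear-time algorithm cited as [33]), and adjacent LCP values are computed by direct comparison. The function name `count_kmers` is hypothetical. The key fact is that every run of adjacent suffix array entries with LCP at least k shares one k-mer, so the run length is that k-mer's count.

```python
def count_kmers(reads: list[str], k: int, sep: str = "$") -> dict[str, int]:
    """Count k-mers across reads via a generalized suffix array and LCP array."""
    text = sep.join(reads) + sep          # assumes no read contains `sep`
    n = len(text)
    sa = sorted(range(n), key=lambda i: text[i:])  # naive stand-in SA construction
    # lcp[i] = length of the longest common prefix of suffixes sa[i-1] and sa[i].
    lcp = [0] * n
    for i in range(1, n):
        a, b = sa[i - 1], sa[i]
        length = 0
        while a + length < n and b + length < n and text[a + length] == text[b + length]:
            length += 1
        lcp[i] = length
    counts: dict[str, int] = {}
    i = 0
    while i < n:
        # Extend the run while adjacent suffixes share at least k characters.
        j = i + 1
        while j < n and lcp[j] >= k:
            j += 1
        kmer = text[sa[i]:sa[i] + k]
        # Skip suffixes shorter than k and k-mers that span a read boundary.
        if len(kmer) == k and sep not in kmer:
            counts[kmer] = j - i
        i = j
    return counts
```

For instance, `count_kmers(["ACGT", "CGTA"], 2)` returns `{"AC": 1, "CG": 2, "GT": 2, "TA": 1}`.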
“…For each position i in the LCP array, LCP[i] contains the length of the longest common prefix of the suffixes SA[i] and SA[i−1]. The key observation [33] for efficient computation of LCP[i] is: for a position j in T, if LCP[SA′[j−1]] = L, then LCP[SA′[j]] ≥ L−1. The whole LCP array construction takes time linear in the size of T [33].…”
Section: Methods (mentioning; confidence: 99%)
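The observation quoted above is the basis of the well-known Kasai et al. LCP algorithm. A minimal Python sketch follows; the names `lcp_kasai`, `rank` (playing the role of SA′) and `h` are illustrative, and the suffix array is assumed to be precomputed.

```python
def lcp_kasai(text: str, sa: list[int]) -> list[int]:
    """lcp[i] = length of the longest common prefix of suffixes sa[i] and sa[i-1]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n  # inverse suffix array: rank[sa[i]] = i (SA' in the quoted text)
    for i, s in enumerate(sa):
        rank[s] = i
    lcp = [0] * n
    h = 0  # LCP length carried over from text position j - 1
    for j in range(n):  # visit suffixes in text order, not sorted order
        if rank[j] > 0:
            prev = sa[rank[j] - 1]  # suffix just before suffix j in sorted order
            while j + h < n and prev + h < n and text[j + h] == text[prev + h]:
                h += 1
            lcp[rank[j]] = h
            if h > 0:
                h -= 1  # key observation: the next suffix's LCP is at least h - 1
        else:
            h = 0  # the smallest suffix has no predecessor
    return lcp


if __name__ == "__main__":
    t = "banana"
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive SA, fine for a demo
    print(lcp_kasai(t, sa))  # [0, 1, 3, 0, 0, 2]
```

Because h never exceeds n and drops by at most one per outer iteration, the inner `while` loop runs O(n) times in total, which gives the linear bound quoted above.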
“…This data structure requires ~10 bytes/nucleotide. This array is constructed by a new, fast sorting method that is highly scalable (Rajasekaran and Nicolae 2014), having worst case run times of O(L log L) and usually much better than this in practice. Once the suffix array is sorted, exact k-mer matches form contiguous blocks in the array.…”
Section: Methods (mentioning; confidence: 99%)
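Why exact k-mer matches form contiguous blocks can be shown with two binary searches: the length-k prefixes of the suffixes are non-decreasing in suffix array order, so all suffixes starting with a given k-mer occupy one interval. The sketch below assumes a plain Python string and a precomputed suffix array; `kmer_block` is a hypothetical name, not part of any cited tool.

```python
def kmer_block(text: str, sa: list[int], kmer: str) -> tuple[int, int]:
    """Return the half-open range [start, end) of SA indices whose suffixes begin with kmer."""
    k = len(kmer)
    # Lower bound: first SA index whose length-k prefix is >= kmer.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < kmer:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first SA index whose length-k prefix is > kmer.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= kmer:
            lo = mid + 1
        else:
            hi = mid
    return start, lo
```

For instance, with `text = "banana"` and `sa = [5, 3, 1, 0, 4, 2]`, `kmer_block(text, sa, "ana")` returns `(1, 3)`, so "ana" occurs at the positions in `sa[1:3]`, namely 3 and 1.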
“…To keep the speed and simplicity of k-mer based approaches but retain information about positional homology, we combined and extended several well-tested ideas in new ways (Gardner and Hall 2013; Leimeister and Morgenstern 2014; Fan et al 2015; Haubold et al 2015) and leveraged recent improvements in engineering of a key data structure (Rajasekaran and Nicolae 2014). From a set of N genomes, which may be at various stages of assembly, our algorithm builds short multiple sequence alignments, or “k-mer blocks,” starting from approximately matching k-mer “seeds” (Fig.…”
(mentioning; confidence: 99%)