2020
DOI: 10.1038/s41598-019-55627-4

SWeeP: representing large biological sequences datasets in compact vectors

Abstract: Vectoral and alignment-free approaches to biological sequence representation have been explored in bioinformatics to efficiently handle big data. Even so, most current methods involve sequence comparisons via alignment-based heuristics and fail when applied to the analysis of large data sets. Here, we present "Spaced Words Projection (SWeeP)", a method for representing biological sequences using relatively small vectors while preserving intersequence comparability. SWeeP uses spaced-words by scanning the seque…

Citations: cited by 10 publications (19 citation statements)
References: 41 publications
“…Phylogenetic analyses were conducted using SWeeP [47] for protein set representations (available at: ). The purpose of this method is to transform any set of amino acid sequences into a single vector that can, for example, represent all proteins in an organism.…”
Section: Methods (mentioning)
confidence: 99%
“…The purpose of this method is to transform any set of amino acid sequences into a single vector that can, for example, represent all proteins in an organism. In practice, the input file to generate the SWeeP vectors consists of a multiFASTA file containing amino acid sequences, in which the protein sequences in a particular organism are concatenated, each protein flanked by delimiters, to get a single sequence [47].…”
Section: Methods (mentioning)
confidence: 99%
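
The input preparation described in the statement above (proteins from a multiFASTA file concatenated into one sequence, each flanked by delimiters) can be sketched roughly as follows. This is an illustrative sketch only, not the SWeeP or rSWeeP implementation: the function names `parse_fasta` and `concatenate_proteome` are hypothetical, and the `*` delimiter is an assumption, since the quoted text does not say which delimiter is used.

```python
from typing import Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from multiFASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines of one record may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def concatenate_proteome(text: str, delimiter: str = "*") -> str:
    """Join all proteins into one string, each flanked by `delimiter`.

    The default delimiter is an assumption made for this sketch.
    """
    return "".join(f"{delimiter}{seq}{delimiter}" for _, seq in parse_fasta(text))


example = ">p1\nMKV\nLA\n>p2\nGAT\n"
print(concatenate_proteome(example))  # prints *MKVLA**GAT*
```

Flanking every protein on both sides keeps a marker at each protein boundary, so a downstream scan over the concatenated sequence can tell where one protein ends and the next begins.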
“…A tutorial for running rSWeeP and the trees are available at https://github.com/DanrleyRF/Suplementar. For all analysis, we applied the same protocol adopted in the study by De Pierri and colleagues (2020) [5].…”
Section: Methods (mentioning)
confidence: 99%
“…In the era of Big Data, vector and alignment-free approaches to compare and represent biological sequences stand out for being more efficient than most alignment-based heuristic methods [1]. Studies show that vector representation of biological data presents an effective solution for the Bioinformatics field [2][3][4][5].…”
Section: Introduction (mentioning)
confidence: 99%