2016
DOI: 10.1038/srep30308

A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF

Abstract: Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distri…

Cited by 39 publications (49 citation statements)
References 48 publications
“…(D) If the average frequency of k-mers in a potential lateral segment is less than that of all k-mers in the target genome’s own group, then we consider it an inferred lateral segment. Step B implements the IDF component, and step D the TF component (Cong et al, 2016a,b). Pseudocode is available in the Supplementary Material to Cong et al (2016a), and the TF-IDF source code at https://github.com/congyingnan/TF-IDF.git.…”
Section: Methods (mentioning)
confidence: 99%
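
The step D comparison quoted above can be illustrated with a minimal Python sketch. It is not the authors' implementation (their pseudocode and source code are at the locations cited in the statement). The function names, the use of raw k-mer counts as "frequency", and the toy sequences are all assumptions made for illustration. Steps A to C, which select the potential lateral segment, are not described in the excerpt and are not modelled here.

```python
from collections import Counter

def kmers(seq, k):
    """Return every overlapping k-mer of seq (hypothetical helper)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def group_kmer_counts(group, k):
    """Count k-mer occurrences across all sequences in a group."""
    counts = Counter()
    for seq in group:
        counts.update(kmers(seq, k))
    return counts

def is_inferred_lateral(segment, own_group, k):
    """Assumed reading of step D: flag the segment when the mean own-group
    count of its k-mers is below the mean count of all distinct k-mers
    observed in the own group."""
    counts = group_kmer_counts(own_group, k)
    seg = kmers(segment, k)
    if not seg or not counts:
        return False  # nothing to compare
    seg_mean = sum(counts[km] for km in seg) / len(seg)
    group_mean = sum(counts.values()) / len(counts)
    return seg_mean < group_mean

# Toy data (assumed, not from the paper)
own_group = ["AAAAAAAA", "AAAAAAAT"]
print(is_inferred_lateral("AAAA", own_group, 3))
print(is_inferred_lateral("GCGC", own_group, 3))
```

In this toy case the segment built from k-mers common in the group is not flagged, while the segment whose k-mers never occur in the group is.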
“…Recently we (Cong et al, 2016a,b) introduced term frequency-inverse document frequency (TF-IDF) as an accurate, scalable approach to infer LGT among microbial genomes. Using TF-IDF, edges represent only lateral signal and can be inferred directly from whole genomes without first parsing them into individual genes.…”
Section: Introduction (mentioning)
confidence: 99%
“…the use of degenerate k-mers, scoring match lengths rather than k-mer composition, and grammar-based techniques; see recent reviews [16, 17] for more detail. Methods for inferring lateral genetic transfer have also been developed [18, 19]. Importantly, evolutionary relationships can also be depicted as a network, with taxa and relationships represented respectively as nodes and edges [20–24], rather than as a strictly bifurcating tree.…”
Section: Introduction (mentioning)
confidence: 99%
“…A SVM model represents examples as points in space, different classes of examples are divided by a certain gap which must be as wide as possible. New examples when mapped into the space are predicted to belong to a class of examples based on which side of the gap they fall [10,18,22]. • KNN Algorithm The output obtained from Support Vector Machines Algorithm [9] are clusters of two sentiments with class labels "normal" and "critical".…”
Section: Tokenization Data Standardization Emoji Conversion (mentioning)
confidence: 99%