2020
DOI: 10.3389/fmicb.2020.01486

Batch-Learning Self-Organizing Map Identifies Horizontal Gene Transfer Candidates and Their Origins in Entire Genomes

Abstract: Horizontal gene transfer (HGT) has been widely suggested to play a critical role in the environmental adaptation of microbes; however, the number and origin of the genes in microbial genomes obtained through HGT remain unknown as the frequency of detected HGT events is generally underestimated, particularly in the absence of information on donor sequences. As an alternative to phylogeny-based methods that rely on sequence alignments, we have developed an alignment-free clustering method on the basis of an unsu…
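The title says the BLSOM identifies HGT candidates and their origins, but the abstract is truncated before it describes the procedure. The sketch below shows one plausible reading, stated as an assumption rather than as the authors' method: each fragment of a genome is mapped to its best-matching node on a trained map, and a fragment whose node lies in another taxon's territory is flagged as a candidate, with that taxon taken as its putative origin. The input formats (`fragment_bmus`, `node_labels`), the function names and the taxon labels are all hypothetical.

```python
from collections import Counter


def hgt_candidates(fragment_bmus, node_labels, host_label):
    """Flag fragments whose best-matching node lies in another taxon's territory.

    fragment_bmus: list of (fragment_id, node_index) pairs, i.e. the
                   best-matching node of each fragment of one genome on a
                   trained map (hypothetical input format).
    node_labels:   dict node_index -> taxon label of that node's territory;
                   nodes missing from the dict are treated as unlabeled.
    host_label:    taxon label of the genome the fragments came from.
    Returns (fragment_id, putative_origin) pairs.
    """
    candidates = []
    for frag_id, node in fragment_bmus:
        label = node_labels.get(node)
        # Unlabeled nodes and the host's own territory are not informative here.
        if label is not None and label != host_label:
            candidates.append((frag_id, label))
    return candidates


def origin_summary(candidates):
    """Count candidate fragments per putative donor taxon."""
    return Counter(origin for _, origin in candidates)


# Usage with toy labels (taxon names are placeholders):
bmus = [("frag1", 0), ("frag2", 2), ("frag3", 1), ("frag4", 3)]
labels = {0: "taxon_A", 1: "taxon_A", 2: "taxon_B"}
found = hgt_candidates(bmus, labels, host_label="taxon_A")
print(found)                  # [('frag2', 'taxon_B')]
print(origin_summary(found))  # Counter({'taxon_B': 1})
```

Keeping the origin as a per-fragment label makes it easy to aggregate candidates by donor taxon, which is what a genome-wide summary of HGT origins would need.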

Cited by 11 publications (13 citation statements)
References 114 publications
“…The phylogenetic method based on sequence alignment is a well-established and irreplaceable method for molecular evolutionary studies. The presently developed sequence alignment-free method is suitable for analyzing a massive amount of sequence data and can analyze over five million sequences simultaneously [8]; notably, this method is highly robust against sequencing errors, and therefore, no special pretreatment is required. Furthermore, the BLSOM is unsupervised AI that can be used without special models or presumptions, and has powerful visualization capabilities that enable efficient knowledge discovery from big data.…”
Section: Discussion (mentioning)
confidence: 99%
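A back-of-the-envelope bound illustrates why oligonucleotide composition tolerates sequencing errors. Assume substitution errors only: a fragment of length L contains n = L - k + 1 overlapping k-mers, a single substituted base falls in at most k of those windows, and each affected window moves one count from its old k-mer to its new one. With c and c' the count vectors before and after the error, and f = c/n, f' = c'/n the corresponding frequency vectors:

```latex
% Substitution errors only; n = L - k + 1 windows before and after the error.
\|c - c'\|_1 \le 2k
\quad\Longrightarrow\quad
\|f - f'\|_1 \le \frac{2k}{L - k + 1}

% Example (5-kb fragments, tetranucleotides): k = 4, L = 5000
\frac{2 \cdot 4}{5000 - 4 + 1} = \frac{8}{4997} \approx 1.6 \times 10^{-3}
```

With m substitutions the bound grows to 2km/(L - k + 1); even m = 50 errors in a 5-kb fragment (a 1% error rate) shift the frequency vector by at most 400/4997 ≈ 0.08 in L1 norm, against a maximum possible distance of 2 between two frequency vectors.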
“…In Fig. 4A, the tetranucleotide composition of over 5 million 5-kb fragments of almost all microorganisms (including viral sequences) available in INSDC (DDBJ, ENA/EBI and NCBI) was analyzed by a BLSOM (Abe et al., 2020). In INSDC, only one of the two complementary strands is registered, and the strand is chosen rather arbitrarily in the registration of fragment sequences.…”
Section: Large-Scale BLSOMs and Their Use in Metagenome Studies (mentioning)
confidence: 99%
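The passage notes that INSDC stores only one of the two complementary strands and that the registered strand is essentially arbitrary. One common way to make a composition vector independent of that choice, assumed here rather than taken from the truncated quote, is to pool each oligonucleotide with its reverse complement. The sketch computes such pooled tetranucleotide frequencies for one fragment; the function names are illustrative.

```python
from itertools import product

# Translation table for complementing A/C/G/T; other characters pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k):
    """Canonical k-mers: each k-mer merged with its reverse complement,
    represented by the lexicographically smaller of the two."""
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(w, revcomp(w)) for w in words})


def strand_independent_composition(fragment, k=4):
    """Relative frequencies of canonical k-mers in one fragment.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    """
    index = {w: i for i, w in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    seq = fragment.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if any(base not in "ACGT" for base in word):
            continue
        counts[index[min(word, revcomp(word))]] += 1
        total += 1
    # Normalize by the number of valid windows (all zeros if none were valid).
    return [c / total if total else 0.0 for c in counts]


print(len(canonical_kmers(4)))  # 136 = (256 + 16 palindromes) / 2
```

Because a fragment and its reverse complement yield identical vectors, the same genomic region maps to the same place on the map regardless of which strand happened to be deposited.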
“…By analyzing the compositions of short oligonucleotides (e.g., 4- and 5-mers) in a large number of genomic fragments (e.g., 10 kb) derived from a wide variety of species, the BLSOM enables the separation (self-organization) of the genomic sequences by species and phylogeny and identifies oligonucleotides that significantly contribute to the separation (Abe et al., 2003, 2006c; Iwasaki et al., 2013b; Bai et al., 2014; Kikuchi et al., 2015). In an analysis of the genomic fragments of a wide range of microbial genomes, over 5 million sequences were successfully separated by phylogenetic groups with high accuracy (Abe et al., 2020). A BLSOM program suitable for PC cluster systems is available on the following website: http://bioinfo.ie.niigata-u.ac.jp/?BLSOM.…”
Section: Introduction (mentioning)
confidence: 99%
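The passage describes unsupervised separation of genomic fragments by oligonucleotide composition. The sketch below is a minimal batch-learning SOM written from the usual description of BLSOM: weights are initialized on the plane of the first two principal components, and every node is updated once per epoch from all samples, so the result does not depend on input order. It is not the authors' program linked above; the grid size, epoch count and the Gaussian neighborhood schedule (`sigma0`, `sigma_end`) are assumptions.

```python
import numpy as np


def best_matching_units(data, weights):
    """Index of the nearest node (squared Euclidean distance) for each sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def batch_som(data, rows=10, cols=10, epochs=20, sigma0=3.0, sigma_end=0.5):
    """Minimal batch-learning SOM with PCA initialization (illustrative only).

    data: (n_samples, n_features) array with n_features >= 2,
          e.g. oligonucleotide frequencies of genomic fragments.
    Returns (weights, coords): node weight vectors (rows*cols, n_features)
    and each node's (row, col) position on the map.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    mean = data.mean(axis=0)
    # PCA initialization: lay the grid on the plane of the first two principal
    # components, spanning +/- one standard deviation along each.
    _, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    sd = s[:2] / np.sqrt(n)
    u = np.repeat(np.linspace(-1.0, 1.0, rows), cols)  # row position in [-1, 1]
    v = np.tile(np.linspace(-1.0, 1.0, cols), rows)    # column position in [-1, 1]
    weights = mean + np.outer(u * sd[0], vt[0]) + np.outer(v * sd[1], vt[1])
    coords = np.column_stack([np.repeat(np.arange(rows), cols),
                              np.tile(np.arange(cols), rows)]).astype(float)

    for t in range(epochs):
        # Neighborhood radius shrinks geometrically from sigma0 to sigma_end.
        sigma = sigma0 * (sigma_end / sigma0) ** (t / max(epochs - 1, 1))
        bmu = best_matching_units(data, weights)
        # Gaussian neighborhood between every node and every sample's BMU: (m, n).
        d2 = ((coords[:, None, :] - coords[bmu][None, :, :]) ** 2).sum(axis=2)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        # Batch update: each node becomes the h-weighted mean of all samples,
        # which makes the result independent of the order of the input data.
        denom = h.sum(axis=1)
        has_mass = denom > 0  # keep old weights where h underflows to 0
        weights[has_mass] = (h[has_mass] @ data) / denom[has_mass][:, None]
    return weights, coords
```

The brute-force distance step holds an n x m x d array in memory, so a run on millions of fragments would need chunking or a distributed implementation such as the PC-cluster program mentioned in the quote.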
“…When we constructed a BLSOM for oligonucleotide compositions in fragment sequences (e.g., 10 kb) from a wide variety of species, the sequences were clustered (self-organized) primarily according to species, despite no species information being used during machine learning [5,7]. Importantly, the BLSOM is suitable for large-scale analysis and has been used to analyze five million genomic fragments from over one thousand genera [8]. In addition, the BLSOM is explainable AI and can reveal the drivers of species-specific clustering (self-organization).…”
Section: Introduction (mentioning)
confidence: 99%
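The last passage calls the BLSOM explainable because it can reveal which oligonucleotides drive species-specific clustering. BLSOM studies typically show this by visualizing each oligonucleotide's frequency across the map's nodes; the sketch below is a simplified numeric stand-in, not the authors' procedure. Given a trained map's weight matrix and a mask of the nodes in one species' territory, it ranks oligonucleotides by how far the territory's mean weight deviates from the map-wide mean, in units of the map-wide standard deviation. The function name, the way territories are defined, and the z-score criterion are assumptions.

```python
import numpy as np


def territory_contributions(weights, node_mask, feature_names, top=5):
    """Rank features by how much a territory's node weights deviate from the map.

    weights:       (n_nodes, n_features) weight matrix of a trained SOM.
    node_mask:     boolean array of length n_nodes marking the nodes assigned
                   to one species (how territories are defined is assumed).
    feature_names: oligonucleotide label for each feature column.
    Returns the `top` (feature, z-score) pairs with the largest absolute shift.
    """
    weights = np.asarray(weights, dtype=float)
    mask = np.asarray(node_mask, dtype=bool)
    overall_mean = weights.mean(axis=0)
    overall_sd = weights.std(axis=0)
    overall_sd[overall_sd == 0] = 1.0  # constant features would divide by zero
    shift = (weights[mask].mean(axis=0) - overall_mean) / overall_sd
    order = np.argsort(-np.abs(shift))[:top]
    return [(feature_names[i], float(shift[i])) for i in order]


# Toy map: 4 nodes, 3 features; the last two nodes form one territory.
w = [[0, 0, 1], [0, 0, 1], [0, 5, 1], [0, 5, 1]]
print(territory_contributions(w, [False, False, True, True],
                              ["AAAA", "AAAC", "AAAG"], top=1))  # [('AAAC', 1.0)]
```

Standardizing by the map-wide spread keeps rare oligonucleotides, whose absolute frequencies are small, from being ignored simply because their raw differences are small.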