2022
DOI: 10.1101/2022.07.25.500264
Preprint

MetaGeneMark-2: Improved Gene Prediction in Metagenomes

Abstract: Accurate prediction of protein-coding genes in metagenomic contigs presents a well-known challenge. Particularly difficult is to identify short and incomplete genes as well as positions of translation initiation sites. It is frequently assumed that initiation of translation in prokaryotes is controlled by a ribosome binding site (RBS), a sequence with the Shine-Dalgarno (SD) consensus situated in the 5' UTR. However, ~30% of the 5,007 genomes, representing the RefSeq collection of prokaryotic genomes, have ei…

citations
Cited by 12 publications
(5 citation statements)
references
References 29 publications
“…Fragments shorter than 500 bp in the contigs were removed from the assembly. MetaGeneMark (v2.10) (Gemayel et al, 2022) was then used to predict genes on contigs, and the information with a length less than 100 nt in the prediction results was filtered out. We used the CD-HIT software (v4.5.8) 2 with the default setting to create a unique initial gene catalog.…”
Section: Methods (mentioning)
confidence: 99%
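
The two length filters in this statement (contigs shorter than 500 bp dropped before gene prediction, predicted genes shorter than 100 nt dropped afterwards) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the helper names `parse_fasta` and `filter_by_min_length` are hypothetical, sequences are assumed to be in FASTA format, and "shorter than N" is read as keeping records with length >= N.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)

MIN_CONTIG_BP = 500  # threshold quoted in the statement for contigs
MIN_GENE_NT = 100    # threshold quoted in the statement for predicted genes


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) pairs from FASTA-formatted lines (hypothetical helper)."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_min_length(records: Iterable[Record], min_len: int) -> List[Record]:
    """Keep records whose sequence has at least min_len characters."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


# Toy input: one 600 bp contig and one 120 bp contig.
demo_contigs = [">contig_1", "A" * 600, ">contig_2", "ACGT" * 30]
kept_contigs = filter_by_min_length(parse_fasta(demo_contigs), MIN_CONTIG_BP)
```

The same `filter_by_min_length` call, with `MIN_GENE_NT` as the threshold, would apply to the nucleotide sequences of the predicted genes once the gene finder's output has been converted to FASTA.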
“…Host-filtered metagenomic samples were assembled using MEGAHIT (Li et al, 2015) and open reading frames (ORFs) in contigs were predicted using MetaGeneMark-2 (Gemayel et al, 2022). The predicted genes from each sample were merged and clustered using CD-HIT (Fu et al, 2012) based on the criteria of identity >95% and coverage > 90% to remove redundant genes.…”
Section: Metagenomic Data Analysis (mentioning)
confidence: 99%
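
The clustering criteria quoted above (identity >95%, coverage >90%) could be passed to CD-HIT along the following lines. This sketch only builds the argument list and does not run anything. Several points are assumptions rather than details from the statement: that the nucleotide variant `cd-hit-est` was used, that "coverage" means alignment coverage of the shorter sequence (`-aS`), and the input and output file names. Note also that `-c 0.95` is an inclusive threshold, while the statement says ">95%".

```python
from typing import List


def build_cdhit_command(
    input_fasta: str,
    output_fasta: str,
    identity: float = 0.95,
    coverage: float = 0.90,
) -> List[str]:
    """Build an argument list for cd-hit-est (hypothetical helper).

    Assumes nucleotide clustering and coverage measured on the shorter
    sequence; the quoted statement does not specify either choice.
    """
    for name, value in (("identity", identity), ("coverage", coverage)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    return [
        "cd-hit-est",
        "-i", input_fasta,      # input FASTA of predicted genes
        "-o", output_fasta,     # output FASTA of cluster representatives
        "-c", str(identity),    # sequence identity threshold
        "-aS", str(coverage),   # alignment coverage of the shorter sequence
    ]


# File names here are placeholders, not paths from the cited work.
cmd = build_cdhit_command("merged_genes.fna", "gene_catalog.fna")
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where CD-HIT is installed.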
“…To test the performance of gene prediction tools for large-scale metagenomic predictions, 13 different tools (Supplementary Table 1) were evaluated individually as well as in combinations of two or three. Among these, three tools were designed for eukaryotic sequences [30][31][32] , nine focused on prokaryotes [24][25][26][27][33][34][35][36][37][38][39] , and one for viruses 40 . Both Prodigal, and Pyrodigal 39 (v2.1.0), an actively maintained successor to Prodigal 27 (v2.6.3), were included in the comparisons.…”
Section: Selection Of Optimal Gene Prediction Tools (mentioning)
confidence: 99%