2010
DOI: 10.1093/nar/gkq747

FragGeneScan: predicting genes in short and error-prone reads

Abstract: The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes is often not available. Gene predictors developed for whole genomes (e.g. Glimmer) and recently developed for metagenomic sequences …

Cited by 735 publications
(581 citation statements)
References 46 publications
“…Before uploading, sequences were quality trimmed using MG-RAST QC pipeline, which included removal of artificial or technical replicates [16] and removal of low quality sequences [17]. Gene-calling was performed using MG-RAST automated pipeline which included the use of FragGeneScan [18], clustering of predicted proteins at 90% identity by using uclust [19] and the use of sBLAT, and applying of the BLAT algorithm [20] for similarity analysis of each cluster. Taxonomic composition of the viromes was obtained through comparison of sequences to the curated NCBI RefSeq complete viral genomes protein sequence database using blastX with an e-value cut-off of B10^-3.…”
Section: Bioinformatic Analyses (mentioning)
confidence: 99%
“…FragGeneScan (Rho et al, 2010) identified open reading frames, which were annotated by BLASTX (Camacho et al, 2009) against the KEGG (Kanehisa, 2002) and SILVA (Pruesse et al, 2007) databases. Analyses were conducted with the MG-RAST pipeline (Meyer et al, 2008) and sequences were deposited under the accession numbers 4623131.3, 4623132.3 and 4623133.3.…”
Section: Experimental Set Up (mentioning)
confidence: 99%
“…While 6FT captures all true open reading frames (ORFs), it also increases the volume of query sequences and can lead to classification of spurious ORFs. Alternatively, ab initio metagenomic gene prediction [26], [27], [28] decreases the volume of queries, but can result in false-positive and false-negative reading frame predictions. Metagenomic simulation framework. 1) Taxonomic profiles from real metagenomes are used to construct mock microbial communities.…”
Section: Ab Initio Gene Prediction Reduces Data Volume At a Small Cos (mentioning)
confidence: 99%
“…ORFs were searched and classified into SFams using RAPsearch2 while optimizing all other parameters (Methods). We ran 6FT in addition to three different metagenomic gene predictors (FragGeneScan [27], MetaGeneMark [28], and Prodigal [26]) and classified reads into protein families according to their top-hit across predicted ORFs (per-read annotation). Additionally, we evaluated a novel heuristic to rapidly filter short spurious ORFs (S2 Fig). We found that the metagenomic gene finders reduced sequence volume by ~85% relative to ORFs that had been naively translated in 6-frames (Fig 2A), consistent with prior observations [29].…”
Section: Ab Initio Gene Prediction Reduces Data Volume At a Small Cos (mentioning)
confidence: 99%
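
The naive six-frame translation (6FT) that the two statements above compare against gene predictors can be illustrated with a minimal Python sketch. It is not FragGeneScan's method, nor the code used in the citing paper; the function names and the `min_aa` length threshold are hypothetical, chosen only to show why 6FT produces many candidate sequences per read, assuming the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(read):
    """Translate a read in all three offsets on both strands."""
    read = read.upper()
    frames = {}
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def candidate_orfs(read, min_aa=20):
    """Collect stop-free peptide segments of at least min_aa residues.

    min_aa is an illustrative (hypothetical) cut-off for discarding
    short segments that are likely to be spurious.
    """
    orfs = []
    for frame, protein in six_frame_translate(read).items():
        for segment in protein.split("*"):
            if len(segment) >= min_aa:
                orfs.append((frame, segment))
    return orfs


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

Every read yields six translations, at most one of which is usually the true coding frame, which is why ab initio predictors that select a frame per read reduce the query volume so sharply.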