2003
DOI: 10.1093/bioinformatics/btg034

TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets

Abstract: TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.

Cited by 1,717 publications (1,251 citation statements). References: 5 publications.
“…Using the Trinity assembler, 242,069 contigs were obtained with 156.6‐Mb total bases, a 647‐bp average length, and a 943‐bp N50; in addition, 113,359 contigs had a length of more than 400 bp (Figure 2). We performed clustering contigs using TGICL (Pertea et al., 2003), and 190,473 unique clusters were generated with 129.2‐Mb total bases, a 679‐bp average length, and 1,060‐bp N50 size (Table 1, Figure 1). The raw RNA‐Seq reads and assembled transcripts were deposited in the European Nucleotide Archive under the project ID PRJEB19675 and accession numbers HAGJ01000001 to HAGJ01190473 for the assembled transcripts.…”
Section: Results (mentioning)
confidence: 99%
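The statement above reports N50 before and after clustering (943 bp and 1,060 bp). As a reminder of what that statistic measures, here is a minimal sketch, assuming the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that covers at least half of the total bases. The function name `n50` and the toy lengths are illustrative and do not come from the cited study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum past half the
        # total assembly size defines the N50.
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the cited paper).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 400
```

Removing short redundant contigs tends to raise N50, which is consistent with the increase reported in the quoted statement.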
“…A parameter kmer size of 25 and a depth of at least two kmer were used for assembly with the Trinity package. The contigs resulting from Trinity were further fed to the TGI clustering Tool (version 2.1) (Pertea et al., 2003) to process alternative splicing and redundant sequences.…”
Section: Methods (mentioning)
confidence: 99%
“…In order to eliminate redundant sequences and improve the sequence quality, the TIGR Gene Indices Clustering Tools (TGICL) [29] was used to obtain consensus sequences from overlapping clusters of ESTs. Assembly criteria included a 50 bp minimum match, 95% minimum identity in the overlap region and 20 bp maximum unmatched overhangs.…”
Section: Results (mentioning)
confidence: 99%
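The assembly criteria quoted above (50 bp minimum match, 95% minimum identity, 20 bp maximum unmatched overhang) amount to a three-part filter on each pairwise overlap. The sketch below shows that filter in isolation. It is not TGICL code: the `Overlap` record, its field names, and the `passes_criteria` function are hypothetical, and whether each threshold is inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """Hypothetical summary of one pairwise overlap between two sequences."""
    length: int       # length of the matched region, in bp
    identity: float   # percent identity within the matched region
    overhang: int     # longest unmatched overhang, in bp


def passes_criteria(ov, min_length=50, min_identity=95.0, max_overhang=20):
    """Return True if the overlap meets all three quoted thresholds.

    Thresholds are treated as inclusive here, which is an assumption.
    """
    return (
        ov.length >= min_length
        and ov.identity >= min_identity
        and ov.overhang <= max_overhang
    )


# Illustrative values only.
print(passes_criteria(Overlap(length=60, identity=96.0, overhang=10)))  # True
print(passes_criteria(Overlap(length=40, identity=99.0, overhang=0)))   # False
```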
“…These ESTs were assembled using the TGICL program [29]. A Perl script known as MIcroSAtellite (MISA http://pgrc.ipk-gatersleben.de/misa/) was used to mine microsatellites.…”
Section: Methods (mentioning)
confidence: 99%
“…The ESTs were then clustered using The Institute for Genomic Research Gene Indices clustering tools (TGICL) (Pertea et al 2003) (available from http://www.tigr.org/tdb/tgi/software/). ESTs were clustered if they shared more than 30 bp of at least 95% identity.…”
Section: Methods (mentioning)
confidence: 99%
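The rule quoted above (ESTs are clustered if they share more than 30 bp at 95% identity or higher) can be illustrated with a small grouping sketch. It assumes clusters are formed transitively, so that if A matches B and B matches C, all three land in one cluster; the statement does not spell this out, and the union-find approach below is a stand-in rather than TGICL's own implementation. The function name `cluster_ests`, the hit tuple layout, and the sample data are all hypothetical.

```python
def cluster_ests(ids, hits, min_length=30, min_identity=95.0):
    """Group sequence ids into clusters by transitive pairwise similarity.

    hits: iterable of (id_a, id_b, match_length_bp, percent_identity).
    A hit links two sequences if its length exceeds min_length and its
    identity is at least min_identity.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, length, identity in hits:
        if length > min_length and identity >= min_identity:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in groups.values())


# Illustrative input only.
ids = ["e1", "e2", "e3", "e4"]
hits = [
    ("e1", "e2", 120, 98.0),  # links e1 and e2
    ("e2", "e3", 45, 95.0),   # links e2 and e3
    ("e3", "e4", 30, 99.0),   # exactly 30 bp, not "more than 30", so no link
]
print(cluster_ests(ids, hits))  # [['e1', 'e2', 'e3'], ['e4']]
```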